
SportsShooter.com: The Online Resource for Sports Photography

SportsShooter.com: Member Message Board

OT: OCR database experience?
Ethan Magoc, Student/Intern
Erie | PA | United States | Posted: 11:22 PM on 03.28.11
->> Hello SS,

As my senior project, I'm currently digitizing every issue of my college's student newspaper ever published, dating to 1929. I'm using a Canon 7D to scan each page, so there's plenty of resolution even after scaling the images down to make PDF file sizes more reasonable (25 to 30 MB each). I then use ABBYY FineReader Express for Mac, which I'd estimate is about 80 to 90 percent accurate in its PDF output.

I had initially planned to use Issuu.com to house all the issues. Its screen reader is among the best I've found and it will also pick up all that embedded, OCR'd text in the uploaded document. Only problem is I'm guessing I'll eventually run into some sort of storage limit (though the free version is rumored to have unlimited storage). I have estimated somewhere in the 30 gig range for all 80-some years.

Approx. 1,200 issues x 30 MB = 36 GB
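
For anyone who wants to sanity-check that estimate, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 GB = 1,000 MB) and the 25 to 30 MB per-issue range mentioned above; the function name is just something made up for illustration.

```python
def archive_size_gb(issues: int, mb_per_issue: float) -> float:
    """Total archive size in decimal gigabytes (assumes 1 GB = 1,000 MB)."""
    return issues * mb_per_issue / 1000


ISSUES = 1200  # approximate issue count from the post

low = archive_size_gb(ISSUES, 25)   # smaller PDFs
high = archive_size_gb(ISSUES, 30)  # larger PDFs
print(f"{low:.0f} to {high:.0f} GB")
```

So the whole run should land somewhere in the low-to-mid 30s of GB, depending on how small the PDFs end up.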

There are a few selected issues (non-OCR versions from the 1920s, 60s and 70s) up as tests on our 2010-11 issue archive:
http://issuu.com/themerciad

My question, then, for all of you: has anyone ever tried to set up a digital archive with similar PDF searching and viewing capabilities? I had been looking into hiring one of our more talented web students to build this, but he's going to be far too busy for the duration of spring term.

This post is mostly a shot in the dark here (hence the OT), but I'd be most grateful to anyone who has any advice to offer.
Copyright 2023, SportsShooter.com