All scanner software offers a chance to preview. You need to do this
to set the area that you want scanned and to adjust the brightness. Most
scanners offer some filters you can apply to monochrome work. Try them on the
previewed image.
The image needs to be very flat. For old books or very
faded scripts on delicate paper, a trip to the photocopier will produce a sheet
that you can feed into the machine with much greater confidence of
success.
There is a degree of luck as well as judgement involved in
setting things up. It is worth interrupting yourself after a few pages, just to
make sure everything is working. It might take an hour to set everything up and
check that all is working well, before you set about feeding your old opus into the
scanner. If this seems like a bit of an investment of time, just think how long it will
take you to peck away at the keyboard to retype the work.
Of course, it helps enormously if you read the scanner manual. I know
they are not written to confuse; it just seems that way. Often manuals make
little sense until you have had a go. The more you read the manual, the
more you will achieve with your scanner.
Part two is the OCR software. Every scanner seems to come
with a package, and they are outstanding. The ones I have tried work out the
typeface used, as well as the font size, and convert the image to letters in
seconds. Some packages allow you to help them learn if they find a jumble of
dots they cannot resolve, but for the last decade I have left it to the software
to guess and it has managed very well without my help.
I should mention a third part of the OCR team, which is the
computer. Converting text is something that requires raw computing power. The
basic Pentium with 16 MB of memory will trudge through the script, page by page.
You will have plenty of time to tidy the office or do some other penance while
the computer does your work. A computer made in this
millennium will convert pages just as fast as you can load them. The change in
performance over the last 5 years is remarkable. It used to take 10 minutes a
page. Now it takes 10 seconds.
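To put those figures in perspective, here is a rough Python sketch of the total conversion time for a whole manuscript. The per-page times are the ones quoted above; the 300-page length is purely an assumption for illustration, so substitute your own page count.

```python
def conversion_time_hours(pages, seconds_per_page):
    """Return the total OCR conversion time in hours."""
    return pages * seconds_per_page / 3600


PAGES = 300  # assumed manuscript length, for illustration only

old_hours = conversion_time_hours(PAGES, 10 * 60)  # 10 minutes a page
new_hours = conversion_time_hours(PAGES, 10)       # 10 seconds a page

print(f"Old machine: {old_hours:.1f} hours")
print(f"New machine: {new_hours * 60:.0f} minutes")
```

On those assumptions the older machine needs about 50 hours of processing and the newer one about 50 minutes, not counting the time spent loading pages.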
My first set-up cost £70 including the software (a Black Widow scanner
and TextBridge Classic software) and it gives me 99% recognition accuracy
with a good copy. Now I use an HP 'all-in-one' which does a
fine job. After the output has been passed through a spell checker and the
various attempts to translate coffee stains to text have been removed, the
result is 99.9% accurate.
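The difference between 99% and 99.9% sounds small, but it matters when you come to proofread. The sketch below estimates how many errors are left on a page at each level. It assumes the accuracy is counted per character and that a typed page holds about 2,000 characters; both are assumptions for illustration, not figures from any OCR package.

```python
def expected_errors(chars_per_page, accuracy_percent):
    """Estimate the number of misread characters on one page."""
    return chars_per_page * (100 - accuracy_percent) / 100


CHARS_PER_PAGE = 2000  # assumed size of a typed page, for illustration only

raw = expected_errors(CHARS_PER_PAGE, 99)       # straight from the OCR package
cleaned = expected_errors(CHARS_PER_PAGE, 99.9)  # after spell check and tidying

print(f"At 99% accuracy:   about {raw:.0f} errors a page")
print(f"At 99.9% accuracy: about {cleaned:.0f} errors a page")
```

On those assumptions, that is roughly 20 corrections a page before the spell checker and about 2 afterwards.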
The scanner is over 3 years old, which is definitely
middle-aged in computer years. But it works so well I am reluctant to change it.
The latest software is frighteningly clever and offers to translate the output
into Japanese, which seems to rather defeat the whole object of the exercise.
Sadly, OCR is not the complete answer. I tried it on my
grandfather's trench diary from the Ypres Salient written during 1915, but it didn’t
work. It also failed when fed the wonderful copperplate handwritten history of
the Stewarts of Comrie. But it has coped with many faded typescripts on yellowing
paper and faded dot-matrix printouts, so give it a go.