Optical Character Recognition – OCR
Software that captures printed text from scanned documents, making them searchable
Optical Character Recognition (OCR) is a software technology that effectively ‘reads’ images that contain printed text. Scanning a document only ever captures an image of the page, not editable text. OCR examines the scanned image and recognises the shapes of the letters, typically to a high level of accuracy when the scan is clean and legible. This recognition allows the text to be extracted for use in other applications, such as a document or content management system.
Without OCR, scanned documents would only be searchable by their file names or the metadata associated with them. For example, if you searched for a letter to a person called Sally Smith, you would only find it if that name appeared either in the scanned image's file name or in the metadata associated with the image in a database.
Once OCR has been applied, a database can search the contents of all of your documents as well as their metadata, which considerably broadens your ability to find what you are looking for.
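The difference between the two kinds of search can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `ScannedDocument` class, the two function names, the file names and the sample text are invented for illustration, and the OCR text is typed in by hand rather than produced by a real OCR engine or stored in a real database.

```python
from dataclasses import dataclass, field


@dataclass
class ScannedDocument:
    """Hypothetical record for one scanned page."""
    filename: str
    metadata: dict = field(default_factory=dict)
    ocr_text: str = ""  # stays empty until an OCR engine has processed the image


def search_metadata_only(documents, query):
    """Match the query against file names and metadata values only."""
    needle = query.lower()
    return [
        doc.filename
        for doc in documents
        if needle in doc.filename.lower()
        or any(needle in str(value).lower() for value in doc.metadata.values())
    ]


def search_with_ocr(documents, query):
    """Match the query against file names, metadata values and OCR text."""
    needle = query.lower()
    matched = set(search_metadata_only(documents, query))
    return [
        doc.filename
        for doc in documents
        if doc.filename in matched or needle in doc.ocr_text.lower()
    ]


# Hypothetical sample data: the OCR text below is written by hand.
documents = [
    ScannedDocument(
        filename="scan_0001.tif",
        metadata={"type": "letter", "year": 2019},
        ocr_text="Dear Sally Smith, thank you for your enquiry.",
    ),
    ScannedDocument(
        filename="scan_0002.tif",
        metadata={"type": "letter", "recipient": "Sally Smith"},
        ocr_text="Dear Ms Smith, please find the invoice enclosed.",
    ),
]

print(search_metadata_only(documents, "Sally Smith"))  # ['scan_0002.tif']
print(search_with_ocr(documents, "Sally Smith"))  # ['scan_0001.tif', 'scan_0002.tif']
```

In this sketch the first letter is only found once its text is searchable, because nobody recorded the recipient's name in its file name or metadata.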
There are many OCR software engines available, together with tools for analysing the text they produce. At Scan Data Experts we can help you define your requirements and then select the best OCR tool for your needs.