Optical Character Recognition

How Can Optical Character Recognition Technology Improve Your Business?


Before Optical Character Recognition

Before the digital age, data was collected entirely in physical formats such as application forms and invoices. Data from these documents was then extracted and captured manually. The documents were also sorted and stored by hand.

While digitization has transformed this process for some applications (e.g. online mortgage applications), much of the data being captured still originates in physical formats. Many companies, such as those in finance and healthcare, still rely on clients submitting data via paper documents.

OCR technology

Early applications of OCR technology addressed the needs of the blind and visually impaired by capturing data from documents and converting it to either telegraph code or audio output. Postal services also adopted OCR technology to help handle the millions of pieces of mail that pass through processing centers every day.

OCR technology follows three key steps:

1. Pre-processing

2. Text recognition

3. Post-processing

Pre-processing involves scanning the document and using various techniques to make the image easier to read.

Text recognition is then performed. Many software programs are available that recognize the written characters and translate them into a digital format such as XML.

Post-processing applies linguistic logic to the digital translation, improving its accuracy and filling gaps based on contextual clues.
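The three steps above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the 3x3 glyph templates, the brightness threshold, and the small vocabulary are hypothetical, and production OCR engines rely on trained models rather than pixel-by-pixel template matching. The sketch only shows how the stages hand work to one another.

```python
from difflib import get_close_matches

# Hypothetical 3x3 glyph templates (1 = ink, 0 = background).
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
    "O": ((1, 1, 1), (1, 0, 1), (1, 1, 1)),
}


def preprocess(gray, threshold=128):
    """Step 1: binarize a grayscale glyph; dark pixels become ink (1)."""
    return tuple(tuple(1 if px < threshold else 0 for px in row) for row in gray)


def recognize(glyph):
    """Step 2: pick the template character with the fewest mismatched pixels."""
    def distance(template):
        return sum(
            a != b
            for t_row, g_row in zip(template, glyph)
            for a, b in zip(t_row, g_row)
        )
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))


def postprocess(word, vocabulary):
    """Step 3: snap the raw word to the closest vocabulary entry, if one is close."""
    matches = get_close_matches(word, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else word


def ocr_word(gray_glyphs, vocabulary):
    """Run all three steps and return (raw text, corrected text)."""
    raw = "".join(recognize(preprocess(g)) for g in gray_glyphs)
    return raw, postprocess(raw, vocabulary)


def to_gray(bitmap):
    """Demo helper: render a bitmap as grayscale (ink = 30, paper = 220)."""
    return tuple(tuple(30 if bit else 220 for bit in row) for row in bitmap)


# A scan of "TOLL" whose last letter is smudged and lost its left stroke.
smudged_l = ((0, 1, 0), (0, 1, 0), (0, 1, 1))
scan = [to_gray(TEMPLATES[c]) for c in "TOL"] + [to_gray(smudged_l)]
raw, fixed = ocr_word(scan, ["TOLL", "TILT", "LOTTO"])
print(raw, fixed)  # TOLI TOLL
```

Recognition alone misreads the smudged letter, and the vocabulary lookup in the last step repairs it, which is the kind of contextual correction that post-processing provides.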

There is a wide variety of use cases in which specific optimization techniques can be employed to further enhance the quality of the data captured.

OCR use-cases

It’s not only the visually impaired or the post office that benefit from OCR technology. Many industries rely on data that has traditionally been captured manually by data entry staff. OCR has the potential to improve operational efficiency by increasing the volume of data captured at any given time. It could also enhance the accuracy of the captured data by largely removing human error from the process.

Here are three examples of how some industries would benefit from utilizing OCR technology:

  • Mortgage and Finance – OCR technology provides efficiency gains in loan processing, sorting, storage, and access. Loan origination expenses have been trending up: $7,452 in 3Q2020 vs. an average of $6,566 in 3Q2008 (Mortgage Bankers Association, Dec 2020). Manual processing of loan documents (applications and supporting documents such as pay stubs and bank statements) is a part of this expense that can be reduced by using OCR to extract data. OCR technology would be a weapon against the rising cost of loan origination.

  • Healthcare and Medical – Using OCR technology allows nurses to focus on their core functions and increases data accuracy (busy nurses are not ideal for the data entry role). “Eight people using the OCR technology dealt with an average of 10,000 PDFs per day, and the average document was 10 pages long. Each document only took seven seconds to process, and the content within was mostly medical-related.” With hospitals in an ongoing digitization effort, nurses are often burdened with data input tasks before the end of each shift, taking their attention away from patient care. OCR technology would improve both morale and data accuracy.

  • Government – Government agencies have been leveraging OCR technology to streamline data collection. “For example, a number of state revenue agencies now use recognition software to read tax returns. As a result, they have dramatically reduced processing costs.” The same report goes on to note that the technology can reduce data-entry costs by as much as 70 percent and shrink the data-entry labor force by as much as 60 percent. Applied annually across hundreds of millions of tax returns, police reports, license applications, and more, OCR is helping the government stay lean.

Challenges with OCR

While OCR technology promises to bring efficiencies to many organizations, it is not without challenges. The most apparent issues are accuracy and completeness.

OCR-captured results are often fraught with errors, whether due to unrecognizable handwriting or to non-standard forms where input fields are not always in the same place. Optimization techniques such as running multiple OCR engines and enhancing pre-processing can improve results, but there will always be data quality issues. Data analysts will still have to clean and remediate the output before using the captured data to generate reports, develop strategies, calculate risk, and so on.
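One way to picture the multi-engine idea is a per-field majority vote, sketched below. This is an assumption about how such a combination could work, not a description of any particular product: the three engine outputs, the field names, and the `vote_fields` helper are all hypothetical. Fields on which most engines agree are accepted, and the rest are set aside for an analyst.

```python
from collections import Counter


def vote_fields(engine_outputs):
    """Combine per-field readings from several OCR engines by majority vote.

    Returns (accepted, needs_review): fields on which a strict majority of
    engines agree, and fields on which they do not (with every reading kept).
    """
    accepted, needs_review = {}, {}
    fields = {name for output in engine_outputs for name in output}
    for name in sorted(fields):
        # An engine that did not return the field contributes None.
        readings = [output.get(name) for output in engine_outputs]
        value, count = Counter(readings).most_common(1)[0]
        if value is not None and count * 2 > len(engine_outputs):
            accepted[name] = value
        else:
            needs_review[name] = readings
    return accepted, needs_review


# Hypothetical readings of the same loan form from three engines.
engine_a = {"borrower": "Jane Doe", "loan_amount": "250000"}
engine_b = {"borrower": "Jane Doe", "loan_amount": "25O000"}
engine_c = {"borrower": "Jane Doc", "loan_amount": "2500O0"}

accepted, needs_review = vote_fields([engine_a, engine_b, engine_c])
print(accepted)      # {'borrower': 'Jane Doe'}
print(needs_review)  # {'loan_amount': ['250000', '25O000', '2500O0']}
```

Voting resolves the borrower name automatically, but the loan amount still needs a person to look at it, which illustrates why manual remediation does not disappear entirely.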

While OCR with manual data cleanup is still an improvement on the traditional data entry approach, it becomes significantly stronger when paired with an automated data quality management tool.
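To make the idea of automated checks concrete, here is a minimal, generic sketch of rule-based validation applied to OCR output. It is not the interface of any specific data quality product; the field names, the rules, and the `validate` helper are hypothetical and chosen only to show how typical OCR slips (such as a letter O read in place of a zero) can be flagged without a person reading every record.

```python
import re
from datetime import datetime


def _is_iso_date(value):
    """True if the value parses as a YYYY-MM-DD calendar date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical rules: each maps a field name to a check on its string value.
RULES = {
    "loan_amount": lambda v: v.isdecimal() and 0 < int(v) <= 10_000_000,
    "zip_code": lambda v: re.fullmatch(r"[0-9]{5}", v) is not None,
    "application_date": _is_iso_date,
}


def validate(record):
    """Return the names of fields that are missing or fail their rule."""
    return [
        name
        for name, rule in RULES.items()
        if name not in record or not rule(record[name])
    ]


# An OCR-extracted record in which a zero was misread as the letter O.
record = {
    "loan_amount": "25O000",
    "zip_code": "98101",
    "application_date": "2020-12-01",
}
print(validate(record))  # ['loan_amount']
```

Records that pass every rule can flow straight into reporting, while flagged fields are routed to an analyst, so review effort is spent only where the extraction is suspect.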

Data Quality Manager as the Foundation

BaseCap Analytics’ Data Quality Manager is a perfect complement to Optical Character Recognition technology.

By pairing Optical Character Recognition with our Data Quality Manager, a data analyst can get more out of data capture by significantly reducing the cost and effort required to clean and validate the data extracted by OCR. This can provide a boost to the near-term bottom line and, more importantly, a foundation for the long-term scalability of a business, which will invariably encounter more data as it grows.

Schedule a demo with BaseCap Analytics, and our team of data experts will help design a customized approach to turn your data into a competitive advantage for your company.