How we tackled document recognition issues for autonomous and automatic payments using OCR and NER

In this article, I would like to describe how we tackled named entity recognition (NER) at one of the largest banks in the world with the help of advanced AI techniques. NER is one of many natural language processing (NLP) tasks, and it lets you automatically extract data from unstructured text: monetary values, dates, or names, surnames and positions.

Just imagine how many text documents even a medium-sized organisation deals with on a daily basis, let alone huge corporations. Take us, for example: we are the largest financial institution in Russia and Central and Eastern Europe, with about 16,500 offices, over 250,000 employees, and 137 million retail and 1.1 million corporate clients in 22 countries. As you can imagine, at such a scale the company works with hundreds of suppliers, contractors and other counterparties, which means thousands of contracts. For instance, the estimated number of legal documents to be processed in 2022 was over 65,000, each of them 30 pages long on average. During its lifecycle, a contract is usually updated with 3 to 5 additional agreements. On top of this, a contract is accompanied by various source documents describing transactions. And all of this comes in PDF format.

Previously, the processing fell to our service centre's employees, who checked whether the payment details in a bill matched those in the contract and then sent the document to the Accounting Department, where an accountant double-checked everything. This is quite a long journey to a payment, right?

The human factor added to the problem. Entering information into an ERP system is mundane work and nobody's dream job, which significantly affects the quality of the resulting data. The entire process is also relatively slow: according to our calculations, it takes an employee about 3.5 minutes to process one contract. In addition, employees extract entities from documents only partially, for specific purposes, while all entities contain valuable information that we could use for other projects.

This is where our document recognition task comes in.

The market offers multiple intelligent document recognition solutions, but none of them quite fit our purposes (aside from, probably, ABBYY FlexiCapture) because we needed a universal solution.

Typically, optical character recognition (OCR) tasks rely on flexible templates laid on top of the document structure. If the structure is the same from document to document, the information is retrieved with high quality. The same process is applicable to tables. It may seem that recognising tabular data is a simple problem, because structure is virtually the definition of a table. There are some catches, though: for example, cells come in different types and formats, and the OCR results may associate a piece of text with the wrong cell.

With this problem in mind, and having carefully considered the pros and cons of creating our own product, we came up with a solution that works for all kinds of tables. Here is how: a flexible template first recognises the borders of the table and generates a file with their coordinates, then adds the table lines and cells. This gives us the coordinates of all elements of the table, and from there it is quite easy to proceed with the usual NLP tasks. In some respects, this solution is unique and is the heart of our proprietary AI platform.
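To illustrate why having the coordinates makes the rest easy, here is a minimal sketch, not the platform's actual code. It assumes the border step has already produced sorted coordinates of the vertical and horizontal table lines, and that OCR returns words with bounding boxes. The `Word` class and the `assign_words_to_cells` function are hypothetical names introduced for this example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR token with its bounding box (hypothetical format)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def assign_words_to_cells(xs, ys, words):
    """Map OCR words to (row, col) cells of a table grid.

    xs: sorted x-coordinates of the vertical table lines
    ys: sorted y-coordinates of the horizontal table lines
    A word belongs to the cell that contains the centre of its box.
    """
    cells = {}
    for word in words:
        cx = (word.x0 + word.x1) / 2
        cy = (word.y0 + word.y1) / 2
        col = bisect_right(xs, cx) - 1
        row = bisect_right(ys, cy) - 1
        # Ignore words whose centre lies outside the table borders.
        if 0 <= col < len(xs) - 1 and 0 <= row < len(ys) - 1:
            cells.setdefault((row, col), []).append(word)
    # Join the words of each cell from left to right.
    return {
        key: " ".join(w.text for w in sorted(group, key=lambda w: w.x0))
        for key, group in cells.items()
    }


# Made-up 2x2 table: two columns 100 units wide, two rows 50 units high.
xs = [0, 100, 200]
ys = [0, 50, 100]
words = [
    Word("Amount", 10, 10, 60, 30),
    Word("RUB", 155, 60, 190, 80),
    Word("1,000", 110, 60, 150, 80),
    Word("footer", 300, 10, 350, 30),  # outside the table
]
table = assign_words_to_cells(xs, ys, words)
```

Assigning by the centre of the bounding box is one simple way to avoid tying text to the wrong cell when a box slightly overlaps a border; merged cells and multi-line cell text would need extra handling.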

As for the architecture specifics, the system includes our ERP system, which acts as both the source and the target system, the PDF documents to be processed, and the AI platform itself, with an integration layer between SAP and the platform. First, the system recognises the document structure, then it classifies the documents and joins contracts with their additional agreements, after which the relevant entities are finally extracted.

One of the problems we faced at the first step was partial loss of information, because some of our entities are handwritten, as well as data corruption caused by the low quality of scanned documents.

Then comes the second stage, where we recognise the document structure using fundamental models to segment the document into sections, classify the segments and pages, and recognise individual clauses and subclauses in legal texts. This is where we also extract tables using the method I've described above. All entity extraction models depend heavily on this step because they look only at certain sections of the contract. The same goes for the model that joins documents together. It was in this part of the process that we saw the majority of technical issues, such as long request processing times.

Our next AI model classifies incoming documents into groups, such as contracts vs. additional agreements, and then into more specific groups, such as signed vs. unsigned contracts, and so on. This step eliminates the need to manually select the document to be recognised and allows documents to be loaded in bulk.

The contract joining model is followed by the entity extraction models. As for the types of entities, we saw that dates and numbers of contracts and signatures, as well as amounts, showed good recognition quality of more than 80%. Addresses, names and positions, however, required further refinement. The most problematic entities, with quality below 70%, were signing dates, contract start and end dates, and the subject of the agreement. These are the top-priority entities for labelling.

In our workflow, the models extract up to 44 entities from every contract and additional agreement and up to 20 entities from source documents. All the entities from the updated contract are then reconciled with those from the relevant source documents. If they match, the payment can be made automatically.
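The reconciliation step can be pictured with a short sketch. It is only an illustration under stated assumptions: entities are modelled as plain dictionaries, an additional agreement simply overrides the contract values it mentions, and a match means exact equality on a chosen set of fields. The field names, the sample values and the functions `apply_agreements` and `reconcile` are hypothetical.

```python
from decimal import Decimal

# Hypothetical subset of fields that must agree before an auto payment.
FIELDS = ("contract_number", "counterparty", "amount")


def apply_agreements(contract, agreements):
    """Return the contract entities with additional agreements applied in order."""
    updated = dict(contract)
    for agreement in agreements:
        updated.update(agreement)  # later agreements override earlier values
    return updated


def reconcile(contract, source_doc, fields=FIELDS):
    """Return the fields that are missing or differ; an empty list means a match."""
    return [
        field
        for field in fields
        if field not in contract
        or field not in source_doc
        or contract[field] != source_doc[field]
    ]


# Made-up entities for illustration only.
contract = {
    "contract_number": "17/A",
    "counterparty": "Acme LLC",
    "amount": Decimal("1000.00"),
}
agreements = [{"amount": Decimal("1200.00")}]
invoice = {
    "contract_number": "17/A",
    "counterparty": "Acme LLC",
    "amount": Decimal("1200.00"),
}

updated_contract = apply_agreements(contract, agreements)
mismatches = reconcile(updated_contract, invoice)
auto_payment_allowed = not mismatches
```

Returning the list of mismatched fields rather than a bare yes or no makes it possible to route a failed document to a person together with the reason. Real data would also need normalisation (date formats, spelling of company names) before an exact comparison is meaningful.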

Given that 100 of the 149 employees of our service centre are engaged in reconciliation activities, our solution will greatly optimise the headcount in the centre and speed up the reconciliation process (1 minute vs. 3.5 minutes).
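As a rough back-of-the-envelope check, the figures quoted in this article can be combined as below. This is an estimate, not a measured result: it assumes that each of the roughly 65,000 documents expected in 2022 takes the quoted per-document time, which the article does not state explicitly.

```python
# Figures quoted in the article.
MINUTES_MANUAL = 3.5      # manual processing time per contract
MINUTES_AUTOMATED = 1.0   # reconciliation time with the new solution
RECONCILIATION_STAFF = 100
SERVICE_CENTRE_STAFF = 149

# Assumption: every document from the 2022 estimate takes the quoted time.
DOCUMENTS_PER_YEAR = 65_000

speedup = MINUTES_MANUAL / MINUTES_AUTOMATED
hours_saved = DOCUMENTS_PER_YEAR * (MINUTES_MANUAL - MINUTES_AUTOMATED) / 60
staff_share = RECONCILIATION_STAFF / SERVICE_CENTRE_STAFF

print(f"Speed-up: {speedup:.1f}x")
print(f"Estimated hours saved per year: {hours_saved:.0f}")
print(f"Share of staff doing reconciliation: {staff_share:.0%}")
```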

As for the quality of document processing, measured with standard metrics such as precision, recall and F-measure, our models scored 90% for the majority of the entities against an expected 60%.
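For readers less familiar with these metrics, here is a minimal sketch of how they can be computed for entity extraction. It assumes an extracted entity counts as correct only if its type and value exactly match a labelled one. The function name and the sample entities are made up for the example and are not taken from our evaluation.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 over sets of (entity_type, value) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labelled (gold) and extracted (predicted) entities for one document.
gold = {
    ("signing_date", "2022-01-10"),
    ("amount", "1200.00"),
    ("contract_number", "17/A"),
    ("counterparty", "Acme LLC"),
}
predicted = {
    ("signing_date", "2022-01-10"),
    ("amount", "1200.00"),
    ("contract_number", "17A"),  # wrong value, so it counts as an error
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Precision is the share of extracted entities that are correct, recall is the share of labelled entities that were found, and the F-measure is their harmonic mean.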

Another important outcome of this project is that we have obtained valuable historical data that can be reused in many other projects.
