The client, operating in the real estate trading sector, was struggling with a manual document verification process that consumed too much time and effort. The company aimed to use AI to automate and speed up verification, thereby improving accuracy, reducing turnaround times, and ultimately raising customer satisfaction. Because the system had to handle a diverse array of client-submitted documents and photos, the challenge was demanding.
The project appeared straightforward at first glance, but the heterogeneous nature of the data required a diverse set of techniques and technologies.
The initial challenge was data classification: the system needed to accurately identify the type of document from which the data originated. Processing was the next hurdle. It became evident that the type of document dictated which technology could process the information it contained. No universal method covered all possible scenarios, so each document type needed a tailored approach that handled its unique characteristics.
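The idea that the document type selects the processing method can be sketched as a small dispatch table. This is only an illustration under stated assumptions, not the project's actual code: the handler names, the registry, and the returned fields are hypothetical, and the handlers are placeholders that do no real OCR.

```python
from typing import Callable, Dict

# An extractor takes raw image bytes and returns the fields it found.
Extractor = Callable[[bytes], Dict[str, str]]

# Hypothetical registry mapping a classified document type to its handler.
EXTRACTORS: Dict[str, Extractor] = {}


def register(doc_type: str) -> Callable[[Extractor], Extractor]:
    """Register an extraction routine for one document type."""
    def decorator(func: Extractor) -> Extractor:
        EXTRACTORS[doc_type] = func
        return func
    return decorator


@register("passport")
def extract_passport(image: bytes) -> Dict[str, str]:
    # Placeholder: a real handler would read the machine readable zone.
    return {"doc_type": "passport", "method": "mrz"}


@register("title_deed")
def extract_title_deed(image: bytes) -> Dict[str, str]:
    # Placeholder: a real handler would run general-purpose OCR.
    return {"doc_type": "title_deed", "method": "ocr"}


def process(doc_type: str, image: bytes) -> Dict[str, str]:
    """Route an already classified image to the handler for its type."""
    handler = EXTRACTORS.get(doc_type)
    if handler is None:
        raise ValueError(f"no extractor registered for {doc_type!r}")
    return handler(image)
```

Adding support for a new document type then means registering one more handler, without touching the routing logic.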
Data classification itself presented a challenge, as clients sent their documents in a wide array of formats (jpg, png, PDF). At times, a passport would be submitted on a white background in PDF format, and at other times as a jpg image. Sometimes documents were oriented vertically, sometimes horizontally. The system had to handle each case automatically so that the necessary information could be read from the documents in subsequent steps.
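One way to cope with mixed input formats is to inspect the first bytes of each file rather than trust its extension. The sketch below is an assumption about how such a check could look, not a description of the project's code; the function names are hypothetical, while the byte signatures are the standard ones for JPEG, PNG, and PDF.

```python
from typing import Optional

# Leading byte signatures for the three formats named in the text.
SIGNATURES = (
    (b"\xff\xd8\xff", "jpg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
)


def sniff_format(data: bytes) -> Optional[str]:
    """Return 'jpg', 'png' or 'pdf' based on the leading bytes, else None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def needs_rasterizing(data: bytes) -> bool:
    """A PDF has to be rendered to an image before it can be classified."""
    return sniff_format(data) == "pdf"
```

Files that match none of the signatures return None, so they can be rejected or sent for manual review instead of failing later in the pipeline.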
For document classification, we employed the YOLO model, while for information extraction, we initially used Tesseract but subsequently transitioned to DocTR. This shift was motivated by DocTR's superior ability to accurately extract information from images of highly variable quality.
The very first step was an initial preprocessing pass, during which every file was converted into an image format (jpg or png). Only after this conversion could the documents be properly classified. After classification came a further preprocessing phase, which needed to be broken down into several distinct steps. The system functioned impeccably when dealing with the horizontal front of a document in a single file but encountered difficulties with background elements and reversed orientation.
These individual steps included:
Only after the data had been properly classified and preprocessed was it possible to proceed to OCR-based data extraction. As in the previous step, each type of file required a different approach, because each document has a unique layout. Passports, in particular, required a non-standard approach. The names of certain fields (“Name,” “Country”) turned out to be impossible to read by machine.
In processing data from passports, we had to read data from the so-called machine readable zone (MRZ), and here it was found that Tesseract performed better than DocTR, as it can utilize models adapted to read data from the MRZ.
– Michał Pocztowski, Senior Data Scientist at Addepto.
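Text read from an MRZ can be sanity-checked after OCR, because the zone carries check digits. The sketch below is not taken from the project. It assumes the passport uses the ICAO 9303 TD3 layout (two lines of 44 characters) and validates three fields on the second line. The function names are hypothetical, and the input is expected to be the OCR output for that line.

```python
from typing import Dict, Union


def check_digit(field: str) -> int:
    """ICAO 9303 check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        elif ch == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


def parse_td3_line2(line: str) -> Dict[str, Union[str, bool]]:
    """Split the second TD3 line into fields and verify their check digits."""
    if len(line) != 44:
        raise ValueError("a TD3 MRZ line has exactly 44 characters")
    checked = {
        "document_number": (line[0:9], line[9]),
        "birth_date": (line[13:19], line[19]),    # YYMMDD
        "expiry_date": (line[21:27], line[27]),   # YYMMDD
    }
    result: Dict[str, Union[str, bool]] = {
        "nationality": line[10:13].replace("<", ""),
        "sex": line[20],
    }
    for name, (value, digit) in checked.items():
        result[name] = value.replace("<", "")
        result[name + "_valid"] = str(check_digit(value)) == digit
    return result
```

A failed check digit is a useful signal that the OCR misread a character, for example the digit 0 as the letter O, so the document can be re-read or routed to a person.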
All data recognized in the images is automatically uploaded to a target system, which could even be Excel, and processed in any desired way.
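As a minimal sketch of the hand-off to a spreadsheet, extracted fields can be written as CSV, which Excel opens directly. The function name, column names, and record shape are assumptions made for illustration; the source does not describe the export format.

```python
import csv
import io
from typing import Dict, Iterable, List


def records_to_csv(records: Iterable[Dict[str, str]], columns: List[str]) -> str:
    """Serialize extracted fields to CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # drop fields that are not in the column list
        restval="",             # leave missing fields empty
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

Fixing the column list up front keeps the output stable even when different document types yield different sets of fields.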
The client, a real estate firm, had been manually verifying documents (IDs, passports, and title deeds), a process that was both time-consuming and labor-intensive. Their customers would submit documents, or photos of documents, in various formats, sizes, and orientations, requiring company employees to examine each one individually.
Recognizing the inefficiency of this approach, the company made the strategic decision to develop a bespoke AI platform capable of automating the verification process. Because errors could have legal ramifications, ensuring the AI system’s accuracy became their foremost priority.