The client, a company in the real estate trading sector, was struggling with a manual document verification process that consumed considerable time and effort. The company aimed to use AI to automate and speed up verification, improving accuracy, shortening turnaround times, and ultimately raising customer satisfaction. Because the system had to handle a wide variety of client-submitted documents and photos, the task was demanding.
Customers submitted documents in multiple formats (JPG, PNG, PDF), with inconsistent orientations (vertical or horizontal) and backgrounds. This heterogeneity required robust preprocessing and dynamic input handling to ensure consistent data extraction.
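The case study does not describe its preprocessing code. As a minimal sketch, one early step in such input handling is to identify each upload's real format from its leading "magic" bytes instead of trusting the file extension. The function names `sniff_format` and `sniff_file` are hypothetical; the byte signatures are the standard ones for JPEG, PNG, and PDF.

```python
from pathlib import Path
from typing import Union

# Standard leading byte signatures for the three formats named above.
_SIGNATURES = {
    b"\xff\xd8\xff": "jpg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
}


def sniff_format(data: bytes) -> str:
    """Return 'jpg', 'png' or 'pdf' based on the file's leading bytes.

    Raises ValueError for anything else, so unsupported uploads are
    rejected before they reach classification or OCR.
    """
    for magic, fmt in _SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    raise ValueError("unsupported file format")


def sniff_file(path: Union[str, Path]) -> str:
    """Read only the first 8 bytes, which is enough for all three signatures."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(8))
```

Dispatching on content rather than on the extension means a PNG saved with a `.jpg` name still takes the correct decoding path; orientation correction and background cleanup would then follow as separate steps.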
Before any processing could begin, the system had to classify the document type correctly (e.g., passport, ID, invoice). A misclassified document would be sent down the wrong processing path, and the inconsistent formats and layouts made this step both critical and non-trivial.
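The write-up does not say how uncertain classifications were handled. A common safeguard, sketched here as an assumption, is to accept the classifier's top prediction only when its score clears a confidence threshold and to route everything else to manual review. The label set, the 0.9 threshold, and the names `gate_prediction` and `Classification` are illustrative, not taken from the project.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical label set based on the example document types above.
DOC_TYPES = ("passport", "id_card", "invoice")


@dataclass(frozen=True)
class Classification:
    label: Optional[str]  # None means "send to manual review"
    confidence: float


def gate_prediction(scores: Mapping[str, float], threshold: float = 0.9) -> Classification:
    """Accept the top-1 class only if its score clears the threshold.

    `scores` maps each document type to the classifier's probability for it.
    Low-confidence or unknown predictions are returned with label=None so a
    person can check them instead of forcing them into the wrong pipeline.
    """
    if not scores:
        return Classification(None, 0.0)
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence < threshold or label not in DOC_TYPES:
        return Classification(None, confidence)
    return Classification(label, confidence)
```

The threshold trades automation rate against error rate: raising it sends more documents to a human but lowers the chance that a misclassified document is processed automatically.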
Each document type required a custom processing pipeline tailored to its structure and content. There was no one-size-fits-all solution: technologies and techniques had to be adapted to each document category, which increased both complexity and implementation time.
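The case study does not describe how its per-type pipelines were organized. One way to keep them manageable, shown here as a hypothetical sketch, is a registry that maps each document type to its own extractor function, so adding a category means registering one function rather than growing a central if/else chain. All names, and the placeholder extractors themselves, are illustrative.

```python
from typing import Callable, Dict

# Each extractor takes the raw document bytes and returns extracted fields.
Extractor = Callable[[bytes], Dict[str, str]]
PIPELINES: Dict[str, Extractor] = {}


def pipeline(doc_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor for one document type."""
    def register(fn: Extractor) -> Extractor:
        if doc_type in PIPELINES:
            raise ValueError(f"pipeline for {doc_type!r} already registered")
        PIPELINES[doc_type] = fn
        return fn
    return register


@pipeline("passport")
def extract_passport(data: bytes) -> Dict[str, str]:
    # Placeholder: a real pipeline would apply passport-specific layout rules.
    return {"type": "passport", "bytes": str(len(data))}


@pipeline("invoice")
def extract_invoice(data: bytes) -> Dict[str, str]:
    # Placeholder: a real pipeline would locate totals, dates and parties.
    return {"type": "invoice", "bytes": str(len(data))}


def process(doc_type: str, data: bytes) -> Dict[str, str]:
    """Route an already-classified document to its matching pipeline."""
    try:
        extractor = PIPELINES[doc_type]
    except KeyError:
        raise ValueError(f"no pipeline for document type {doc_type!r}") from None
    return extractor(data)
```

Failing loudly on an unknown type keeps a new or unexpected category from silently falling through to another category's pipeline.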
The client, a real estate firm, had been verifying documents such as IDs, passports, and title deeds by hand, a process that was both time-consuming and labor-intensive. Its customers submitted documents (or photos of documents) in various formats, sizes, and orientations, and employees had to examine each one individually.
Recognizing the inefficiency of this approach, the company decided to develop a bespoke AI platform to automate verification. Because verification errors could carry legal consequences, the system's accuracy was the top priority.
For document classification, we used a YOLO model. For information extraction, we initially used Tesseract but later switched to DocTR, because DocTR extracted information more accurately from images of highly variable quality.
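The extraction code itself is not shown here. As a sketch, assuming DocTR's `ocr_predictor(pretrained=True)` was applied to pages loaded with `DocumentFile.from_images(...)` and the result converted to a dictionary with `.export()`, the nested output can be flattened into text lines while collecting low-confidence words for human review, in line with the accuracy requirement above. `flatten_doctr_export` and the 0.8 cutoff are hypothetical, and the dictionary keys follow DocTR's export layout as we understand it, so they should be checked against the installed version.

```python
from typing import Any, Dict, List, Tuple


def flatten_doctr_export(
    export: Dict[str, Any], min_confidence: float = 0.8
) -> Tuple[List[str], List[str]]:
    """Turn a DocTR-style export dict into text lines plus doubtful words.

    Assumes the nested layout pages -> blocks -> lines -> words, where each
    word dict has "value" (the recognized text) and "confidence" (0 to 1).
    Returns (lines, doubtful): every recognized line joined into a string,
    and the words scoring below `min_confidence`, for manual checking.
    """
    lines: List[str] = []
    doubtful: List[str] = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = line.get("words", [])
                lines.append(" ".join(w["value"] for w in words))
                doubtful.extend(
                    w["value"] for w in words if w["confidence"] < min_confidence
                )
    return lines, doubtful
```

Surfacing individual low-confidence words, rather than rejecting a whole page, lets a reviewer confirm only the uncertain fields.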
Addepto, a fast-paced, growing company focused on innovations in AI-related and data-oriented areas, supports digital transformation at companies working on electronics manufacturing services.
We help these companies use their data effectively through data lakes, data platforms, and data engineering.