

Client: NDA

AI for real estate: Automated Document Verification

Case study details


The Client, a company in the real estate trading sector, was struggling with a manual document verification process that consumed too much time and effort. The company aimed to use AI to automate and speed up verification, improving accuracy, reducing turnaround times, and ultimately raising customer satisfaction. Because the company handles a diverse array of client-submitted documents and photos, this was a demanding challenge.



Challenge

Overcoming Data Complexity: Challenges in Document Classification and Processing


Although the project appeared straightforward at first glance, the heterogeneous nature of the data meant that it required a diverse array of techniques and technologies.

The first challenge was data classification: the system needed to identify accurately which type of document the data came from. Processing was the next hurdle. It quickly became evident that the document type dictated which technology could process the information it contained. No single method covered every possible scenario, so each document type required a tailored approach suited to its unique characteristics.

Classification was complicated further by the fact that clients sent their documents in a wide array of formats: jpg, png, and PDF. A passport might arrive on a white background as a PDF one time and as a jpg image the next. Some documents were oriented vertically, others horizontally. The system had to handle each case automatically so that the subsequent steps could read the necessary information correctly.
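Because a file's extension does not always match its content, one defensive way to begin is to identify the real format from the file's leading bytes (its "magic number") and send PDFs to a rendering step while passing jpg and png images straight through. This is a minimal sketch under that assumption, not the project's code: the function names are hypothetical, and the PDF-to-image rendering itself would be done by a separate library and is not shown.

```python
def sniff_format(data: bytes) -> str:
    """Identify an uploaded file's real format from its leading bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # 8-byte PNG signature
        return "png"
    if data.startswith(b"\xff\xd8\xff"):  # JPEG start-of-image marker
        return "jpg"
    return "unknown"


def needs_rasterization(data: bytes) -> bool:
    """Return True if the file must be rendered to an image before classification.

    Hypothetical helper: PDFs are converted to images first, jpg/png files go
    directly to the classifier, and anything else is rejected.
    """
    fmt = sniff_format(data)
    if fmt == "unknown":
        raise ValueError("unsupported file type")
    return fmt == "pdf"
```

Checking content rather than the file name keeps a mislabeled upload (for example, a PDF saved with a .jpg extension) from reaching the image classifier in the wrong form.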










Approach

Enhancing Document Processing in Real Estate: Streamlining Preprocessing and OCR


The very first step was format normalization, during which every file was converted into a graphic format (jpg or png). Only after this conversion could the documents be properly classified. Classification was followed by the main preprocessing phase, which had to be broken down into several distinct steps: the system worked reliably on the horizontal front of a document in a single file, but struggled with background elements and reversed orientation.

These individual steps included:

  • Image Detection, during which the front of the document is identified.
  • Cropping, where the document is separated from its background.
  • Face Detection, where the face in the document photo is used to determine the document’s orientation so it can be rotated to horizontal.
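The cropping and orientation steps can be illustrated with a minimal, dependency-free sketch. It assumes a grayscale image given as rows of 0-255 values, a light background (such as the white background mentioned earlier), and a face bounding box (x, y, w, h) produced by some face detector that is not shown. The orientation rule is an assumption for illustration: on an upright document front, the holder's photo is taken to sit closest to the left edge. The function names and the `background_min` threshold are hypothetical, not the project's actual implementation.

```python
from typing import List, Tuple

Gray = List[List[int]]  # grayscale image: rows of 0-255 pixel values


def crop_to_document(img: Gray, background_min: int = 200) -> Gray:
    """Crop away a light background by keeping the bounding box of darker pixels.

    Pixels with a value below `background_min` are treated as document content.
    Returns the image unchanged if it is empty or contains no such pixels.
    """
    if not img or not img[0]:
        return img
    rows = [r for r, row in enumerate(img) if any(p < background_min for p in row)]
    cols = [c for c in range(len(img[0])) if any(row[c] < background_min for row in img)]
    if not rows or not cols:
        return img
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def correction_angle(face_box: Tuple[int, int, int, int], width: int, height: int) -> int:
    """Clockwise rotation in degrees that moves the ID photo back to the left edge.

    Assumption: in an upright document the photo is nearest the left edge, so the
    edge the face is currently nearest to reveals how the document was rotated.
    """
    x, y, w, h = face_box
    cx, cy = (x + w / 2) / width, (y + h / 2) / height  # normalized face center
    # Distance from the face center to each edge, keyed by the fix it implies:
    # left -> upright, top -> rotate 270, right -> rotate 180, bottom -> rotate 90.
    distances = {0: cx, 270: cy, 180: 1 - cx, 90: 1 - cy}
    return min(distances, key=distances.get)
```

For example, in an upright 1000 × 700 scan with the face centered near the left edge, `correction_angle` returns 0, while the same document turned upside down puts the face near the right edge and yields 180. A bounding-box crop like this only removes a uniform margin; cluttered backgrounds would call for a proper detection model.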

Only after the data had been properly classified and preprocessed was it possible to proceed to OCR-based data extraction. As in the previous step, the approach depended on the document: each document type has a unique layout, so each required its own treatment. Passports in particular needed a non-standard approach, because the labels of certain fields (“Name” and “Country”) turned out to be impossible to read by machine.
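One common way to organize "a different approach for each document type" is a registry that maps the classifier's output label to an extraction routine. The sketch below is an illustrative assumption, not the project's design: the type labels, the registry, and both extractors are hypothetical. It assumes OCR has already produced plain text, that ID cards yield readable "Label: value" lines, and that for passports only the two 44-character MRZ lines are kept, since the printed labels cannot be relied on.

```python
import re
from typing import Callable, Dict

Extractor = Callable[[str], Dict[str, str]]
EXTRACTORS: Dict[str, Extractor] = {}  # hypothetical registry: document type -> extractor

MRZ_LINE = re.compile(r"[A-Z0-9<]{44}")  # one line of a passport (TD3) MRZ


def register(doc_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor for one document type."""
    def wrap(fn: Extractor) -> Extractor:
        EXTRACTORS[doc_type] = fn
        return fn
    return wrap


@register("id_card")
def extract_labeled_fields(ocr_text: str) -> Dict[str, str]:
    """Assumes this layout's labels survive OCR as 'Label: value' lines."""
    fields: Dict[str, str] = {}
    for line in ocr_text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip().lower()] = value.strip()
    return fields


@register("passport")
def extract_mrz_lines(ocr_text: str) -> Dict[str, str]:
    """Ignore printed labels; keep the last two lines that look like MRZ lines."""
    lines = [l.replace(" ", "") for l in ocr_text.splitlines()]
    mrz = [l for l in lines if MRZ_LINE.fullmatch(l)]
    if len(mrz) < 2:
        return {}
    return {"mrz_line1": mrz[-2], "mrz_line2": mrz[-1]}


def extract(doc_type: str, ocr_text: str) -> Dict[str, str]:
    """Route OCR output to the extractor registered for the classified type."""
    try:
        extractor = EXTRACTORS[doc_type]
    except KeyError:
        raise ValueError(f"no extractor registered for {doc_type!r}") from None
    return extractor(ocr_text)
```

Adding support for a new document type then means registering one more function, without touching the routing code.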

In processing data from passports, we had to read data from the so-called machine-readable zone (MRZ), and here it was found that Tesseract performed better than docTR, as it can utilize models adapted to read data from the MRZ.

– Michał Pocztowski, Senior Data Scientist at Addepto.
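Whichever OCR engine reads the MRZ, its fields carry check digits defined in ICAO Doc 9303, so many OCR misreads can be caught before the data moves on. This is a minimal sketch of validating and parsing line 2 of a passport (TD3) MRZ; it is not necessarily how the project validated its output. OCR itself (Tesseract, in this project) is not shown, and the function names are hypothetical. The check digit is a weighted sum with repeating weights 7, 3, 1, where digits count as themselves, letters A to Z as 10 to 35, and the filler "<" as 0, taken modulo 10.

```python
from typing import Dict


def _char_value(c: str) -> int:
    """ICAO 9303 values: digits as-is, A-Z -> 10-35, filler '<' -> 0."""
    if c in "0123456789":
        return int(c)
    if "A" <= c <= "Z":
        return ord(c) - ord("A") + 10
    if c == "<":
        return 0
    raise ValueError(f"invalid MRZ character {c!r}")


def check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, modulo 10."""
    weights = (7, 3, 1)
    return sum(_char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def parse_td3_line2(line: str) -> Dict[str, str]:
    """Verify the check digits of a 44-character TD3 line 2, then extract fields."""
    if len(line) != 44:
        raise ValueError("TD3 MRZ lines are 44 characters long")
    checks = {
        "document_number": (line[0:9], line[9]),
        "birth_date": (line[13:19], line[19]),
        "expiry_date": (line[21:27], line[27]),
        "optional_data": (line[28:42], line[42]),
        "composite": (line[0:10] + line[13:20] + line[21:43], line[43]),
    }
    for name, (field, digit) in checks.items():
        # An unused optional field may carry the filler '<' as its check digit.
        if name == "optional_data" and digit == "<" and set(field) <= {"<"}:
            continue
        if str(check_digit(field)) != digit:
            raise ValueError(f"check digit mismatch in {name}")
    return {
        "document_number": line[0:9].rstrip("<"),
        "nationality": line[10:13],
        "birth_date": line[13:19],  # YYMMDD
        "sex": line[20],
        "expiry_date": line[21:27],  # YYMMDD
    }
```

Applied to the ICAO specimen line `L898902C36UTO7408122F1204159ZE184226B<<<<<10`, every check digit matches and the document number parses as `L898902C3`. Changing a single character of the document number makes the corresponding check fail and raises `ValueError`.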

All data recognized in the images is automatically uploaded to a target system of choice (even a spreadsheet such as Excel), where it can be processed in any desired way.
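For the spreadsheet case, a simple route is to write the extracted records to CSV, which Excel can open. This is a minimal sketch using Python's standard `csv` module. The column names and the `to_csv` helper are hypothetical, chosen for illustration; fields missing from a record are left blank, and extra fields are dropped.

```python
import csv
import io
from typing import Dict, Iterable, List


def to_csv(records: Iterable[Dict[str, str]], columns: List[str]) -> str:
    """Serialize extracted document fields to CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=columns,
        restval="",             # missing fields become empty cells
        extrasaction="ignore",  # fields outside `columns` are skipped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

When saving the result to a file for Excel, encoding it with Python's `utf-8-sig` codec (UTF-8 with a byte-order mark) generally helps Excel display non-ASCII names such as "Michał" correctly.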



Goal

Streamlining Real Estate Operations: Custom AI Platform for Document Verification


The client, a real estate firm, had been verifying documents (IDs, passports, and title deeds) manually, a process that was both time-consuming and labor-intensive. Customers submitted documents, or photos of documents, in various formats, sizes, and orientations, so employees had to examine each one individually.

Recognizing the inefficiency of this approach, the company decided to develop a bespoke AI platform to automate the verification process. Because verification errors could have legal consequences, the system’s accuracy was their foremost priority.



Outcome




Before


  • Heterogeneous data required a diverse array of techniques and technologies.
  • Classification challenge: accurately identifying each document type.
  • No universal method covered all scenarios, so tailored approaches were needed.
  • The system struggled with diverse document formats and orientations.


After


  • Proof of Concept (PoC) developed by Addepto to automate document verification.
  • A combination of tailored technologies and approaches handled the heterogeneous, disorganized data.
  • Achieved satisfactory accuracy with acceptable performance.
  • Saved the time previously spent on manual verification.

About Addepto


Addepto provides specialized AI consulting services to unlock the potential of integrating AI solutions into your business. Our expertise encompasses cutting-edge technologies including Computer Vision, Natural Language Processing, Predictive Analytics, Image Recognition, Recommendation Engines, Smart Search Engines, and more.

