AI for Real Estate: Automated Document Verification

The client, a company in the real estate trading sector, was struggling with a manual document verification process that took too much time and effort. The company wanted to use AI to automate and speed up verification, improving accuracy, reducing turnaround times, and ultimately raising customer satisfaction. Because the system had to handle a diverse array of client-submitted documents and photos, the challenge was a demanding one.




Case Study Shortcut


Challenge



High Variability in Document Formats and Layouts


Clients submitted documents in multiple formats (JPG, PNG, PDF), with inconsistent orientations (vertical/horizontal) and backgrounds. This heterogeneity required robust preprocessing and dynamic input handling to ensure consistent data extraction.


Accurate Document Type Classification


Before any processing could begin, the system had to correctly classify the document type (e.g., passport, ID, invoice). Misclassification would lead to incorrect data processing, making this a critical and non-trivial task due to format and layout inconsistencies.


Lack of a Universal Processing Approach


Each document type required a custom processing pipeline depending on its structure and content. There was no one-size-fits-all solution: technologies and techniques had to be tailored to each document category, which increased complexity and implementation time.
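One common way to organize per-type pipelines is a registry keyed by the classifier's label. The sketch below is illustrative only: the labels, function names, and placeholder bodies are assumptions, not the project's actual code.

```python
from typing import Callable, Dict

# A pipeline takes raw image bytes and returns the extracted fields.
Pipeline = Callable[[bytes], Dict[str, str]]

_PIPELINES: Dict[str, Pipeline] = {}


def pipeline_for(doc_type: str) -> Callable[[Pipeline], Pipeline]:
    """Decorator that registers a pipeline under a document-type label."""
    def register(func: Pipeline) -> Pipeline:
        _PIPELINES[doc_type] = func
        return func
    return register


@pipeline_for("passport")
def process_passport(image: bytes) -> Dict[str, str]:
    # Placeholder body (assumption): a real pipeline would extract the MRZ here.
    return {"doc_type": "passport"}


@pipeline_for("id")
def process_id(image: bytes) -> Dict[str, str]:
    # Placeholder body (assumption): a real pipeline would run layout-specific OCR.
    return {"doc_type": "id"}


def process(doc_type: str, image: bytes) -> Dict[str, str]:
    """Route to the registered pipeline, failing loudly on unknown types."""
    try:
        pipeline = _PIPELINES[doc_type]
    except KeyError:
        raise ValueError(f"no pipeline registered for {doc_type!r}") from None
    return pipeline(image)
```

Failing loudly on an unregistered label keeps a misclassified or unsupported document from being silently processed by the wrong pipeline.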

Goal


The client, a real estate firm, had been manually verifying documents—IDs, passports, and title deeds—a process that was both time-consuming and labor-intensive. Their customers would submit documents (or photos of documents) in various formats, sizes, and orientations, requiring company employees to meticulously examine each one individually.

Recognizing the inefficiency of this approach, the company decided to develop a bespoke AI platform to automate the verification process. Because verification errors could carry legal ramifications, high accuracy was the foremost priority.


  • Automate Manual Document Verification

  • Ensure High Accuracy and Legal Compliance

  • Handle Diverse Document Formats and Layouts

Outcome



Before


  • Diverse array of techniques and technologies required due to the heterogeneous nature of the data.
  • Data classification challenge: accurately identifying document types.
  • No universal method for processing all scenarios, demanding tailored approaches.
  • System struggles with diverse document formats and orientations.


After


  • Proof of Concept (PoC) developed by Addepto to automate document verification.
  • Combination of tailored technologies and approaches tackled data disorganization.
  • Achieved satisfactory accuracy with acceptable performance.
  • Saved time previously spent on manual verification processes.


Case Study Details


Approach


Standardized Image Preprocessing Pipeline


  • All incoming documents (regardless of format) were converted into graphic formats (JPG/PNG) as a standardized starting point. This allowed consistent handling in the downstream processing stages.
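As a minimal sketch of the first routing step, assuming uploads arrive as raw bytes, the format can be identified from the file's leading signature bytes rather than from its extension. `InputFormat`, `sniff_format`, and `needs_rasterizing` are hypothetical names; the actual conversion to JPG/PNG (for example, rendering PDF pages) is left out.

```python
from enum import Enum


class InputFormat(Enum):
    JPEG = "jpeg"
    PNG = "png"
    PDF = "pdf"
    UNKNOWN = "unknown"


# Leading "magic" bytes that identify each file format.
_SIGNATURES = (
    (b"\xff\xd8\xff", InputFormat.JPEG),
    (b"\x89PNG\r\n\x1a\n", InputFormat.PNG),
    (b"%PDF-", InputFormat.PDF),
)


def sniff_format(data: bytes) -> InputFormat:
    """Identify an upload by its content rather than its file extension."""
    for signature, fmt in _SIGNATURES:
        if data.startswith(signature):
            return fmt
    return InputFormat.UNKNOWN


def needs_rasterizing(fmt: InputFormat) -> bool:
    """PDF pages must be rendered to an image; JPG/PNG can pass through."""
    return fmt is InputFormat.PDF
```

Checking content instead of the extension guards against mislabeled uploads, which are likely when customers submit files from phones and scanners.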

Multi-Step Preprocessing for Document Cleanup


  • Image Detection to locate the document within the image.
  • Cropping to remove background elements and isolate the document.
  • Face Detection and Orientation Check to correct reversed or misaligned documents.
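The cropping and orientation steps above can be sketched in plain Python. This assumes a detector has already returned a bounding box, and that orientation is fixed by trying the four 90-degree rotations until a face detector reports an upright face. The case study does not spell out the exact method, and `has_upright_face` is a hypothetical callback standing in for a real detector.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), end-exclusive


def crop(image: Image, box: Box) -> Image:
    """Keep only the pixels inside the detected document box."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def rotate90(image: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def fix_orientation(
    image: Image, has_upright_face: Callable[[Image], bool]
) -> Tuple[Image, int]:
    """Try 0, 90, 180, and 270 degrees; return the first upright candidate.

    Returns the corrected image and the clockwise rotation applied.
    If no rotation yields an upright face, the input is returned unchanged.
    """
    candidate = image
    for turns in range(4):
        if has_upright_face(candidate):
            return candidate, turns * 90
        candidate = rotate90(candidate)
    return image, 0
```

With a stub detector that treats a bright top-left pixel as "upright", `fix_orientation([[0, 0], [0, 9]], lambda img: img[0][0] == 9)` returns `([[9, 0], [0, 0]], 180)`.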

Document-Type-Specific OCR Strategy


  • Used custom pipelines based on layout complexity and field structures.
  • Passports required MRZ (Machine Readable Zone) extraction, where Tesseract outperformed DocTR thanks to Tesseract's specialized MRZ-trained models.
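MRZ fields carry check digits defined by ICAO Doc 9303, which offer a cheap sanity check on OCR output. The case study does not say whether the PoC used them; the sketch below is one way such validation could look for line 2 of a passport (TD3) MRZ, with hypothetical function names.

```python
from typing import Dict


def mrz_check_digit(field: str) -> int:
    """Compute the ICAO 9303 check digit (weights 7, 3, 1; sum modulo 10)."""
    weights = (7, 3, 1)
    total = 0
    for index, char in enumerate(field):
        if "0" <= char <= "9":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        elif char == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {char!r}")
        total += value * weights[index % 3]
    return total % 10


def validate_td3_line2(line: str) -> Dict[str, bool]:
    """Check the document number, birth date, and expiry date check digits."""
    if len(line) != 44:
        raise ValueError("TD3 MRZ lines are 44 characters long")
    fields = {
        "document_number": (line[0:9], line[9]),
        "birth_date": (line[13:19], line[19]),
        "expiry_date": (line[21:27], line[27]),
    }
    return {
        name: "0" <= check <= "9" and mrz_check_digit(value) == int(check)
        for name, (value, check) in fields.items()
    }
```

A failed check is a signal to re-run OCR or route the document to manual review. A check digit catches many, though not all, single-character misreads, so it complements rather than replaces other accuracy controls.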

Automated Data Extraction and System Integration


  • Once processed, all extracted data was automatically uploaded to the target system (e.g., CRM or Excel), enabling seamless integration into existing workflows without manual input.
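For the Excel case, a minimal sketch is to serialize the extracted fields as CSV, which Excel opens directly. The column names and the `records_to_csv` helper are assumptions for illustration; a CRM target would instead use that system's own import mechanism, which the case study does not describe.

```python
import csv
import io
from typing import Dict, Iterable, List


def records_to_csv(records: Iterable[Dict[str, str]], columns: List[str]) -> str:
    """Serialize extracted records to CSV text with a fixed column order.

    Fields missing from a record are written as empty cells, and fields
    not listed in `columns` are ignored, so every row has the same shape.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

Fixing the column order up front keeps rows from different document types aligned, even when a given pipeline extracts only a subset of the fields.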


Our Team's Expert Opinion




For document classification, we employed the YOLO model, while for information extraction, we initially used Tesseract but subsequently transitioned to DocTR. This shift was motivated by DocTR's superior ability to accurately extract information from images of highly variable quality.


Michał Pocztowski, Senior Data Scientist at Addepto


About Addepto



Addepto, a fast-paced, growing company focused on innovations in AI-related and data-oriented areas, supports digital transformation at companies working on electronics manufacturing services.





We help them use their data effectively through data lakes, data platforms, and data engineering.


About us


We are recognized as one of the best AI, BI, and Big Data consultants


We have helped multiple companies achieve their goals. Rather than make hollow marketing claims here, we encourage you to check our Clutch score.
