AI for Real Estate: Automated Document Verification

The client, a company in the real estate trading sector, was struggling with a manual document verification process that consumed too much time and effort. The company aimed to use AI to automate and speed up verification, improving accuracy, reducing turnaround times, and ultimately raising customer satisfaction. Because the system had to handle a diverse array of client-submitted documents and photos, the challenge was demanding.

Case Study Shortcut

Challenge

High Variability in Document Formats and Layouts

Clients submitted documents in multiple formats (JPG, PNG, PDF), with inconsistent orientations (vertical/horizontal) and backgrounds. This heterogeneity required robust preprocessing and dynamic input handling to ensure consistent data extraction.

Accurate Document Type Classification

Before any processing could begin, the system had to correctly classify the document type (e.g., passport, ID, invoice). Misclassification would lead to incorrect data processing, making this a critical and non-trivial task due to format and layout inconsistencies.

Lack of a Universal Processing Approach

Each document type required a custom processing pipeline depending on its structure and content. There was no one-size-fits-all solution: technologies and techniques had to be tailored to each document category, which increased complexity and implementation time.
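As a rough illustration of what "one pipeline per document type" can look like in code, the sketch below routes a classified document to a registered handler. The registry, the handler names, and the returned fields are all hypothetical and are not taken from the project; the handlers are placeholders where real extraction logic would go.

```python
from typing import Callable, Dict

# A pipeline takes raw image bytes and returns extracted fields.
Pipeline = Callable[[bytes], Dict[str, str]]

# Hypothetical registry mapping a document-type label to its pipeline.
PIPELINES: Dict[str, Pipeline] = {}


def register(doc_type: str) -> Callable[[Pipeline], Pipeline]:
    """Decorator that registers a pipeline under a document-type label."""
    def wrap(fn: Pipeline) -> Pipeline:
        PIPELINES[doc_type] = fn
        return fn
    return wrap


@register("passport")
def process_passport(image: bytes) -> Dict[str, str]:
    # Placeholder: a real pipeline would run MRZ extraction here.
    return {"doc_type": "passport", "strategy": "mrz"}


@register("invoice")
def process_invoice(image: bytes) -> Dict[str, str]:
    # Placeholder: a real pipeline would run layout-aware OCR here.
    return {"doc_type": "invoice", "strategy": "general_ocr"}


def process(doc_type: str, image: bytes) -> Dict[str, str]:
    """Route a classified document to its pipeline; fail loudly on unknown types."""
    try:
        pipeline = PIPELINES[doc_type]
    except KeyError:
        raise ValueError(f"no pipeline registered for document type {doc_type!r}")
    return pipeline(image)
```

Failing loudly on an unregistered type is a deliberate choice in this sketch: a misclassified or unsupported document is surfaced for review instead of being pushed through the wrong pipeline.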

Goal

The client, a real estate firm, had been manually verifying documents—IDs, passports, and title deeds—a process that was both time-consuming and labor-intensive. Their customers would submit documents (or photos of documents) in various formats, sizes, and orientations, requiring company employees to meticulously examine each one individually.

Recognizing the inefficiency of this approach, the company decided to develop a bespoke AI platform to automate the verification process. Because verification errors could carry legal ramifications, the precision of the AI system was the foremost priority.

Automate Manual Document Verification

Ensure High Accuracy and Legal Compliance

Handle Diverse Document Formats and Layouts

Outcome

Before

Diverse array of techniques and technologies required due to heterogeneous data nature.
Data classification challenge: accurately identifying document types.
No universal method for processing all scenarios, demanding tailored approaches.
System struggles with diverse document formats and orientations.

After

Proof of Concept (PoC) developed by Addepto to automate document verification.
Combination of tailored technologies and approaches tackled data disorganization.
Achieved satisfactory accuracy with acceptable performance.
Saved time previously spent on manual verification processes.

Case Study Details

Approach

Standardized Image Preprocessing Pipeline

All incoming documents (regardless of format) were converted into graphic formats (JPG/PNG) as a standardized starting point. This allowed consistent handling in the downstream processing stages.
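Before anything can be converted, the pipeline has to know what it actually received, and file extensions on client uploads are not always reliable. The sketch below, which is an assumption about how such a step could work and not the project's code, identifies the three formats named above from their leading bytes and flags which ones would still need rasterizing into an image. The function names are hypothetical.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"
PDF_SIGNATURE = b"%PDF-"


def sniff_format(data: bytes) -> str:
    """Return 'png', 'jpg', or 'pdf' based on the file's leading bytes."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpg"
    if data.startswith(PDF_SIGNATURE):
        return "pdf"
    raise ValueError("unsupported or unrecognized file format")


def needs_rasterization(fmt: str) -> bool:
    """PDF pages must be rendered to an image; JPG and PNG already are images."""
    return fmt == "pdf"
```

Rendering the PDF pages themselves would need a separate library and is left out of this sketch.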

Multi-Step Preprocessing for Document Cleanup

Document detection to locate the document within the image.
Cropping to remove background elements and isolate the document.
Face Detection and Orientation Check to correct reversed or misaligned documents.
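The control flow of the cropping and orientation steps above can be sketched as follows. This is a toy illustration under stated assumptions: the image is a plain nested list of pixel values, and `has_upright_face` is a hypothetical stand-in for a real face detector, passed in as a callable. The idea is to try each quarter-turn and keep the first rotation in which a face is found.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def crop(image: Image, box: Box) -> Image:
    """Keep only the region inside the detected document bounding box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def rotate90(image: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def fix_orientation(
    image: Image,
    has_upright_face: Callable[[Image], bool],  # hypothetical detector
) -> Tuple[Image, Optional[int]]:
    """Try 0, 90, 180, 270 degrees clockwise; return the first rotation
    where the detector finds an upright face, plus the angle applied.
    If no rotation works, return the input unchanged with angle None."""
    candidate = image
    for quarter_turns in range(4):
        if has_upright_face(candidate):
            return candidate, quarter_turns * 90
        candidate = rotate90(candidate)
    return image, None
```

Returning `None` when no face is found lets the caller fall back to another orientation cue for documents that carry no photo.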

Document-Type-Specific OCR Strategy

Used custom pipelines based on layout complexity and field structures.
Passports required MRZ (Machine Readable Zone) extraction, where Tesseract outperformed DocTR due to its specialized MRZ-trained models.
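One reason the MRZ is a good extraction target is that it carries its own check digits, so OCR misreads can be caught after the fact. The sketch below shows the ICAO 9303 check-digit rule (weights 7, 3, 1 repeating; digits keep their value, A to Z map to 10 to 35, the filler `<` counts as 0) applied to the second line of a passport MRZ. Whether the project validated check digits is not stated in this case study; the function names are hypothetical.

```python
from typing import Dict


def char_value(ch: str) -> int:
    """Numeric value of one MRZ character under the ICAO 9303 rule."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    return sum(char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def validate_passport_line2(line: str) -> Dict[str, bool]:
    """Check three fields of the 44-character second MRZ line of a passport.
    The personal-number and composite check digits are omitted for brevity."""
    if len(line) != 44:
        raise ValueError("passport MRZ line 2 must be 44 characters long")
    fields = {
        "document_number": (line[0:9], line[9]),
        "birth_date": (line[13:19], line[19]),
        "expiry_date": (line[21:27], line[27]),
    }
    return {
        name: digit in "0123456789" and check_digit(value) == int(digit)
        for name, (value, digit) in fields.items()
    }
```

A field that fails its check can be sent for a second OCR pass or for manual review instead of being written to the target system.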

Automated Data Extraction and System Integration

Once processed, all extracted data was automatically uploaded to the target system (e.g., CRM or Excel), enabling seamless integration into existing workflows without manual input.
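As a minimal sketch of the export step, assuming the target is a spreadsheet that can open CSV files, the extracted records could be written out like this. The column names are hypothetical; the case study does not list the fields that were exported, and a CRM integration would go through that system's own API instead.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Hypothetical output columns, chosen for illustration only.
FIELDS = ["doc_type", "document_number", "full_name", "expiry_date"]


def write_records(records: Iterable[Dict[str, str]], out: TextIO) -> int:
    """Write extracted records as CSV rows and return how many were written.
    Missing fields become empty cells; unexpected keys are ignored."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore", restval="")
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)
        count += 1
    return count


def to_csv_text(records: Iterable[Dict[str, str]]) -> str:
    """Convenience wrapper that returns the CSV as a string."""
    buffer = io.StringIO(newline="")
    write_records(records, buffer)
    return buffer.getvalue()
```

Leaving a missing field as an empty cell, instead of dropping the row, keeps incomplete extractions visible to whoever reviews the output.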

Our Team's Expert Opinion

For document classification, we employed the YOLO model, while for information extraction, we initially used Tesseract but subsequently transitioned to DocTR. This shift was motivated by DocTR's superior ability to accurately extract information from images of highly variable quality.

Michał Pocztowski Senior Data Scientist at Addepto

About Addepto

Addepto, a fast-paced, growing company focused on innovations in AI-related and data-oriented areas, supports digital transformation at companies working on electronics manufacturing services.

We help them find ways to use their data effectively with data lakes, data platforms, data engineering and so on.
