


Intelligent Agentic RAG: Modular AI for Enterprise Knowledge Bases

This case study explores our collaboration with a major player in the industrial manufacturing and automotive sector. Organizations of this scale rarely operate on a unified technology stack—decades of growth often result in a fragmented landscape of heterogeneous tools, siloed data sources, and legacy systems not designed to interoperate with modern AI platforms.

The central challenge was extracting actionable insights from vast volumes of complex, unstructured technical documentation—such as engine damage reports, flowcharts, and engineering diagrams—while maintaining strict data security and compliance.

To address this, we developed an intelligent agentic AI platform built on a modular, future-proof architecture that functions as the enterprise’s internal “brain.” Each capability operates as an independently deployable service, allowing the platform to evolve alongside emerging employee use cases. As a result, employees can interact with company documents conversationally and securely automate daily tasks—without risking data leaks.



Meet Our Client


The client is a leading heavy-duty engineering manufacturer focused on building power systems and engines for submarines, trains, and large industrial machines. Their highly specialized manufacturing processes generate massive volumes of complex technical documentation—including damage reports, engineering diagrams, and HR policies—that are often scattered and difficult to navigate.


Case Study Shortcut


Solution



Modular Parsing Service


Developed a sophisticated document parsing module capable of identifying and extracting contextual information from roughly 20 distinct categories of visual data—including Gantt charts, flowcharts, and technical engine diagrams—with layout-aware processing that handles even the most unusual document structures.


Power Search Engine


A dedicated search service that goes far beyond keyword matching, surfacing exact text fragments matched to a user’s query and enabling document-level conversations with pinpoint precision.


Agentic RAG Service


A standalone multi-agent reasoning module capable of breaking down complex queries into sub-steps, expanding acronyms, performing calculations, and synthesizing answers across multiple sources—with full awareness of context and a hard boundary against hallucination.


Excel Integration Module


A purpose-built integration service for structured data, enabling complex cross-referencing, comparison, and reasoning across multiple Excel spreadsheets alongside unstructured PDF content.

Goal


The primary goal was to drastically reduce the time employees spend searching for critical information by building a centralized, intelligent interface that deeply understands the company’s internal knowledge base.

From the outset, the client required a solution that went far beyond standard semantic search. The platform needed deep customization to handle a long tail of edge cases inherent in complex technical documentation—and it needed to do so reliably, at enterprise scale, with zero tolerance for fabricated or misleading answers.

This last point was perhaps the most demanding engineering challenge of the entire project. Large language models are non-deterministic by design and inherently tend toward “average” outputs—producing responses that sound plausible but may blend, omit, or subtly distort information. In a domain where a single misread damage report or incorrectly attributed engine fault could have serious operational consequences, that tendency had to be systematically identified, constrained, and eliminated at every layer of the architecture.

The platform was therefore built to be fully context-aware at all times: rather than relying on a model’s pre-trained assumptions, every response is grounded in explicitly retrieved, traceable source material.

Achieving this required solving several interconnected challenges:


  • Ensuring Data Security: Keeping all enterprise data and LLM operations strictly within a secure European Microsoft Azure data center.

  • Handling Complex Document Layouts: Accurately processing multi-column PDFs containing varied fonts, quotes, and embedded images into a unified, readable format—including the countless layout edge cases that standard parsers fail silently on.

  • Bridging Text and Visuals: Making images and diagrams fully searchable by extracting their contents and linking them intelligently with the surrounding text context.

  • Enabling Multi-Step Reasoning Without Hallucination: Allowing the system to decompose complex user queries, invoke the appropriate service modules, and synthesize accurate answers from verified sources—never from the model's internal, potentially stale or fabricated knowledge.

  • Integrating Structured Data: Building a dedicated Excel integration module capable of analyzing, comparing, and connecting data across multiple complex spreadsheets in conjunction with unstructured document sources.

  • Designing for the Future: Architecting every service—parsing, power search, agentic RAG, and Excel integration—as a modular, loosely coupled component, so the platform can be extended with new capabilities as the organization's use cases mature and expand.

Outcome


Employees can now use natural language to instantly query thousands of historical damage reports, technical diagrams, HR policies, and financial information, turning hours of manual document scanning into a process that takes mere seconds.

Critically, the platform delivers those answers with full contextual grounding—users can see exactly which source documents and text fragments informed each response, eliminating the risk of acting on a hallucinated or out-of-context result. The modular architecture means that as new departments begin using the platform and new use cases surface, individual service modules can be upgraded or extended without disrupting the rest of the system.

The solution also ensures that all workflows remain fully compliant with enterprise security standards, providing a secure, controllable alternative to public tools like ChatGPT.



Before


  • Manual search through thousands of scattered, 30-page PDFs to find historical engine issues.
  • Images, flowcharts, and technical diagrams were ignored by text-based search systems.
  • Fear of data leaks prevented the use of public LLMs for daily tasks like summarizing emails.
  • Standard searches failed when queries required math, acronym expansion, or combining multiple documents.
  • Any new use case required a bespoke solution built from scratch.


After


  • Instant, structured summaries of root causes and damaged parts based on natural language queries.
  • Visual data is categorized into ~20 types and fully searchable, with text and image context intelligently linked.
  • A secure, internal Azure environment ensures enterprise data never leaves the isolated infrastructure.
  • Agentic RAG dynamically invokes tools to calculate, expand, and synthesize—always from verified sources, never from model assumptions.
  • Modular architecture allows new capabilities to be plugged in as use cases emerge without rearchitecting the platform.

Integrate these solutions into your company


Contact us below and let us design and integrate solutions tailored to your business needs.


Let's talk

Case Study Details


Approach


Parsing Module — Taming Document Complexity


  • The parsing service is the foundation of the entire platform. It classifies images into approximately 20 distinct categories to apply the most appropriate extraction method for each—for example, reading a flowchart in its correct directional sequence rather than treating it as a flat image. These categories were intentionally designed around the types of documents most frequently encountered in the client’s environment, with particular emphasis on technical documentation formats, ensuring the system could accurately interpret the structures and visual conventions typical of such materials. Handling the full spectrum of edge cases was non-trivial: real-world technical documents contain irregular multi-column layouts, mixed fonts, embedded diagrams at unusual orientations, and structural inconsistencies that cause standard parsers to silently drop or misrepresent content. Every such edge case was catalogued and explicitly addressed, because any gap at the parsing layer propagates as misinformation through every layer above it.
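The category-based dispatch described above can be sketched as a simple routing table: classify each image, then hand it to the extractor built for that category. This is a minimal illustration; the category names and handler logic here are hypothetical, and the real platform distinguishes roughly 20 categories with far richer extraction per type.

```python
# Minimal sketch of category-based extraction dispatch.
# Category names and handlers are illustrative, not the client's actual set.

def parse_flowchart(image_id: str) -> str:
    # A flowchart is traversed in its directional sequence,
    # not read as a flat grid of text boxes.
    return f"flowchart({image_id}): node order preserved"

def parse_gantt(image_id: str) -> str:
    # A Gantt chart yields tasks with start/end dates.
    return f"gantt({image_id}): tasks with start/end dates"

def parse_generic(image_id: str) -> str:
    # Fallback for uncategorized images: plain OCR text.
    return f"image({image_id}): OCR text only"

HANDLERS = {
    "flowchart": parse_flowchart,
    "gantt_chart": parse_gantt,
}

def extract(image_id: str, category: str) -> str:
    """Route each classified image to the extractor suited to its category."""
    handler = HANDLERS.get(category, parse_generic)
    return handler(image_id)
```

The key design point is that the classifier and the extractors are decoupled: adding a new visual category means registering one new handler, not rewriting the parser.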

Single-Chunk Strategy for Critical Documents


  • Standard RAG systems split documents into small chunks, which scatters context and creates ambiguity when queries span multiple passages. For high-stakes documents like engine damage reports, the parsing module instead extracts all key information into a single, richly structured chunk. This preserves the full context of each report—root cause, timeline, affected components—so the model is never reasoning from a fragment when it needs the whole picture.
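The single-chunk idea can be illustrated with a small data structure that serializes one whole report as one retrieval unit. The field names below are illustrative assumptions, not the client's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class DamageReportChunk:
    """One richly structured chunk per report (field names are illustrative)."""
    report_id: str
    root_cause: str
    timeline: str
    affected_components: list = field(default_factory=list)

    def to_text(self) -> str:
        # Serialize the whole report as a single retrieval unit so the model
        # always sees root cause, timeline, and parts together, never a fragment.
        parts = ", ".join(self.affected_components)
        return (
            f"Report {self.report_id}\n"
            f"Root cause: {self.root_cause}\n"
            f"Timeline: {self.timeline}\n"
            f"Affected components: {parts}"
        )
```

Because the entire report travels as one chunk, a query about affected components can never be answered from a passage that has lost its root-cause context.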

Power Search Engine — Transparent Retrieval


  • The power search module provides an AI search experience designed as the first step in a two-stage workflow. Using AI-powered retrieval, it identifies the documents most relevant to a user’s query while showing exactly which text fragments matched the search, allowing employees to quickly validate why specific results were returned. Once the relevant documents are identified, users can move to the second step—conducting deeper analysis on selected materials, such as extracting data, comparing information across sources, or exploring them further through conversational AI.
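The transparency requirement—returning not just documents but the exact fragments that matched—can be sketched with a toy first-stage retriever. Real retrieval in such a platform would be embedding-based; the term-overlap scoring below is only a stand-in to show the result shape.

```python
from dataclasses import dataclass

@dataclass
class SearchHit:
    doc_id: str
    score: float
    matched_fragments: list  # the exact text spans that matched the query

def search(query: str, index: dict, top_k: int = 3) -> list:
    """Toy first-stage retrieval: rank documents and keep the matching
    fragments so users can verify *why* each result was returned."""
    hits = []
    terms = set(query.lower().split())
    for doc_id, text in index.items():
        # Keep each sentence that shares at least one term with the query.
        fragments = [s.strip() for s in text.split(".")
                     if terms & set(s.lower().split())]
        if fragments:
            hits.append(SearchHit(doc_id, float(len(fragments)), fragments))
    return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
```

The `matched_fragments` field is what makes the workflow two-stage: a user inspects the evidence first, then hands the validated documents to deeper conversational analysis.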

Agentic RAG Service — Reasoning Without Hallucination


  • This is the most architecturally complex module, and the one where the risk of hallucination was highest. Rather than allowing the model to answer immediately from its parametric memory, the agentic reasoning engine first decomposes the user's query into a structured plan of sub-queries. Each sub-query is routed to the appropriate tool—document retrieval, acronym expansion, mathematical calculation, or cross-document synthesis—and only verified, retrieved content is passed back to the model for final answer composition. The model's own pre-trained knowledge is treated as inadmissible. This design directly counteracts the LLM's natural tendency to produce fluent but "averaged" responses that blend information across contexts in ways that may be subtly wrong.
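The plan-route-ground pattern described above can be reduced to a small executable sketch. The tools, the acronym table, and the hard-coded plan below are hypothetical; in the real system an LLM planner produces the plan and the model composes the final answer from the collected evidence only.

```python
# Sketch of the plan -> route -> ground loop: every sub-step runs through a
# tool, and only tool outputs (verified evidence) reach answer composition.

ACRONYMS = {"EGT": "exhaust gas temperature"}  # hypothetical lookup table

def tool_expand_acronym(arg: str) -> str:
    return ACRONYMS.get(arg, arg)

def tool_calculate(arg: str) -> str:
    # Tiny calculator for "a + b" / "a - b" style sub-steps.
    a, op, b = arg.split()
    return str(float(a) + float(b)) if op == "+" else str(float(a) - float(b))

def tool_retrieve(arg: str, corpus: dict) -> str:
    # Only retrieved text is admissible evidence; a miss is reported, never invented.
    return corpus.get(arg, "NOT FOUND")

def run_plan(plan: list, corpus: dict) -> list:
    """Execute each (tool, argument) sub-step and collect verified evidence."""
    evidence = []
    for tool, arg in plan:
        if tool == "expand":
            evidence.append(tool_expand_acronym(arg))
        elif tool == "calc":
            evidence.append(tool_calculate(arg))
        elif tool == "retrieve":
            evidence.append(tool_retrieve(arg, corpus))
    return evidence  # the final answer is composed from this list alone
```

The crucial constraint is visible in the return value: nothing reaches answer composition that did not come out of a tool, which is exactly how parametric memory is kept inadmissible.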

Excel Integration Module — Working with Structured Data


  • Structured data stored in spreadsheets comes with different challenges than unstructured content like PDFs, so we built a separate module to handle it. This service allows the platform to compare, aggregate, and cross-reference data across multiple Excel sheets, and combine those insights with information pulled from documents—something that general-purpose tools often struggle to do reliably.
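The core cross-referencing operation is essentially a keyed join across sheets. A production module would read real workbooks (for example with openpyxl or pandas); the plain-dict version below only illustrates the join logic, with invented column names.

```python
# Illustrative cross-sheet join in plain Python. Sheet rows are dicts;
# column names ("part", etc.) are invented for the example.

def join_sheets(left: list, right: list, key: str) -> list:
    """Match rows across two 'sheets' on a shared key column and merge them."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match:
            # Merge the two rows; right-hand columns win on conflicts.
            joined.append({**row, **match})
    return joined
```

Once sheets are joined on a shared key, the merged rows can be handed to the same agentic reasoning layer that consumes unstructured document chunks, which is what enables spreadsheet-plus-PDF answers.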

Technology


Microsoft Azure OpenAI

OpenAI GPT Models


Our Team Expert Opinion




The hardest part wasn't building the AI—it was making it right. Anyone can wire up a language model and demo it on clean data. The real work is in the edge cases: the malformed PDFs, the legacy system that speaks a protocol nobody remembers, the diagram that breaks every assumption your parser was built on. We don't come in to execute a spec, we come in to understand how a business actually operates, where its knowledge lives, where it gets lost, and then build something that survives contact with that reality. The modular architecture wasn't a technical preference, it was a strategic one—we knew that what the client needed on day one would look different from what they'd need in year two.


Bartłomiej Grasza, Principal AI Engineer – Addepto

Take the next step


Schedule an intro call to get to know each other better and understand the way we work


Let's talk

About Addepto


About us


We are recognized as one of the best AI, BI, and Big Data consultants


We have helped multiple companies achieve their goals, but instead of making hollow marketing claims here, we encourage you to check our Clutch scoring.

Our customers love to work with us
