


Intelligent Agentic RAG: Modular AI for Enterprise Knowledge Bases

This case study explores our collaboration with a major player in the industrial manufacturing and automotive sector. Organizations of this scale rarely operate on a unified technology stack—decades of growth often result in a fragmented landscape of heterogeneous tools, siloed data sources, and legacy systems not designed to interoperate with modern AI platforms.

The central challenge was extracting actionable insights from vast volumes of complex, unstructured technical documentation—such as engine damage reports, flowcharts, and engineering diagrams—while maintaining strict data security and compliance.

To address this, we developed an intelligent agentic AI platform built on a modular, future-proof architecture that functions as the enterprise’s internal “brain.” Each capability operates as an independently deployable service, allowing the platform to evolve alongside emerging employee use cases. As a result, employees can interact with company documents conversationally and securely automate daily tasks—without risking data leaks.



Meet Our Client


The client is a leading heavy-duty engineering manufacturer focused on building power systems and engines for submarines, trains, and large industrial machines. Their highly specialized manufacturing processes generate massive volumes of complex technical documentation—including damage reports, engineering diagrams, and HR policies—that are often scattered and difficult to navigate.


Case Study Shortcut


Solution



Modular Parsing Service


Developed a sophisticated document parsing module capable of identifying and extracting contextual information from roughly 20 distinct categories of visual data—including Gantt charts, flowcharts, and technical engine diagrams—with layout-aware processing that handles even the most unusual document structures.


Power Search Engine


A dedicated search service that goes far beyond keyword matching, surfacing exact text fragments matched to a user’s query and enabling document-level conversations with pinpoint precision.


Agentic RAG Service


A standalone multi-agent reasoning module capable of breaking down complex queries into sub-steps, expanding acronyms, performing calculations, and synthesizing answers across multiple sources—with full awareness of context and a hard boundary against hallucination.


Excel Integration Module


A purpose-built integration service for structured data, enabling complex cross-referencing, comparison, and reasoning across multiple Excel spreadsheets alongside unstructured PDF content.

Goal


The primary goal was to drastically reduce the time employees spend searching for critical information by building a centralized, intelligent interface that deeply understands the company’s internal knowledge base.

From the outset, the client required a solution that went far beyond standard semantic search. The platform needed deep customization to handle a long tail of edge cases inherent in complex technical documentation—and it needed to do so reliably, at enterprise scale, with zero tolerance for fabricated or misleading answers.

This last point was perhaps the most demanding engineering challenge of the entire project. Large language models are non-deterministic by design and inherently tend toward “average” outputs—producing responses that sound plausible but may blend, omit, or subtly distort information. In a domain where a single misread damage report or incorrectly attributed engine fault could have serious operational consequences, that tendency had to be systematically identified, constrained, and eliminated at every layer of the architecture.

The platform was therefore built to be fully context-aware at all times: rather than relying on a model’s pre-trained assumptions, every response is grounded in explicitly retrieved, traceable source material.

Achieving this required solving several interconnected challenges:


  • Ensuring Data Security: Keeping all enterprise data and LLM operations strictly within a secure European Microsoft Azure data center.

  • Handling Complex Document Layouts: Accurately processing multi-column PDFs containing varied fonts, quotes, and embedded images into a unified, readable format—including the countless layout edge cases that standard parsers fail silently on.

  • Bridging Text and Visuals: Making images and diagrams fully searchable by extracting their contents and linking them intelligently with the surrounding text context.

  • Enabling Multi-Step Reasoning Without Hallucination: Allowing the system to decompose complex user queries, invoke the appropriate service modules, and synthesize accurate answers from verified sources—never from the model's internal, potentially stale or fabricated knowledge.

  • Integrating Structured Data: Building a dedicated Excel integration module capable of analyzing, comparing, and connecting data across multiple complex spreadsheets in conjunction with unstructured document sources.

  • Designing for the Future: Architecting every service—parsing, power search, agentic RAG, and Excel integration—as a modular, loosely coupled component, so the platform can be extended with new capabilities as the organization's use cases mature and expand.

Outcome


Employees can now use natural language to instantly query thousands of historical damage reports, technical diagrams, HR policies, and financial information, turning hours of manual document scanning into a process that takes mere seconds.

Critically, the platform delivers those answers with full contextual grounding—users can see exactly which source documents and text fragments informed each response, eliminating the risk of acting on a hallucinated or out-of-context result. The modular architecture means that as new departments begin using the platform and new use cases surface, individual service modules can be upgraded or extended without disrupting the rest of the system.

The solution also ensures that all workflows remain fully compliant with enterprise security standards, providing a secure, controllable alternative to public tools like ChatGPT.



Before


  • Manual search through thousands of scattered, 30-page PDFs to find historical engine issues.
  • Images, flowcharts, and technical diagrams were ignored by text-based search systems.
  • Fear of data leaks prevented the use of public LLMs for daily tasks like summarizing emails.
  • Standard searches failed when queries required math, acronym expansion, or combining multiple documents.
  • Any new use case required a bespoke solution built from scratch.


After


  • Instant, structured summaries of root causes and damaged parts based on natural language queries.
  • Visual data is categorized into ~20 types and fully searchable, with text and image context intelligently linked.
  • A secure, internal Azure environment ensures enterprise data never leaves the isolated infrastructure.
  • Agentic RAG dynamically invokes tools to calculate, expand, and synthesize—always from verified sources, never from model assumptions.
  • Modular architecture allows new capabilities to be plugged in as use cases emerge without rearchitecting the platform.

Integrate these solutions into your company


Contact us below and let us design and integrate solutions tailored to your business needs.


Let's talk

Case Study Details


Approach


Parsing Module — Taming Document Complexity


  • The parsing service is the foundation of the entire platform. It classifies images into approximately 20 distinct categories to apply the most appropriate extraction method for each—for example, reading a flowchart in its correct directional sequence rather than treating it as a flat image. These categories were intentionally designed around the types of documents most frequently encountered in the client’s environment, with particular emphasis on technical documentation formats, ensuring the system could accurately interpret the structures and visual conventions typical of such materials. Handling the full spectrum of edge cases was non-trivial: real-world technical documents contain irregular multi-column layouts, mixed fonts, embedded diagrams at unusual orientations, and structural inconsistencies that cause standard parsers to silently drop or misrepresent content. Every such edge case was catalogued and explicitly addressed, because any gap at the parsing layer propagates as misinformation through every layer above it.
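The category-based dispatch described above can be sketched as a simple routing table: classify each image, then hand it to the extractor built for that category. This is a minimal illustration; the category names and handler logic here are hypothetical, and the real platform distinguishes roughly 20 categories with far richer extraction per type.

```python
# Minimal sketch of category-based extraction dispatch.
# Category names and handlers are illustrative, not the client's actual set.

def parse_flowchart(image_id: str) -> str:
    # A flowchart is traversed in its directional sequence,
    # not read as a flat grid of text boxes.
    return f"flowchart({image_id}): node order preserved"

def parse_gantt(image_id: str) -> str:
    # A Gantt chart yields tasks with start/end dates.
    return f"gantt({image_id}): tasks with start/end dates"

def parse_generic(image_id: str) -> str:
    # Fallback for uncategorized images: plain OCR text.
    return f"image({image_id}): OCR text only"

HANDLERS = {
    "flowchart": parse_flowchart,
    "gantt_chart": parse_gantt,
}

def extract(image_id: str, category: str) -> str:
    """Route each classified image to the extractor suited to its category."""
    handler = HANDLERS.get(category, parse_generic)
    return handler(image_id)
```

The key design point is that the classifier and the extractors are decoupled: adding a new visual category means registering one new handler, not rewriting the parser.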

Single-Chunk Strategy for Critical Documents


  • Standard RAG systems split documents into small chunks, which scatters context and creates ambiguity when queries span multiple passages. For high-stakes documents like engine damage reports, the parsing module instead extracts all key information into a single, richly structured chunk. This preserves the full context of each report—root cause, timeline, affected components—so the model is never reasoning from a fragment when it needs the whole picture.
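The single-chunk idea can be illustrated with a small data structure that serializes one whole report as one retrieval unit. The field names below are illustrative assumptions, not the client's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class DamageReportChunk:
    """One richly structured chunk per report (field names are illustrative)."""
    report_id: str
    root_cause: str
    timeline: str
    affected_components: list = field(default_factory=list)

    def to_text(self) -> str:
        # Serialize the whole report as a single retrieval unit so the model
        # always sees root cause, timeline, and parts together, never a fragment.
        parts = ", ".join(self.affected_components)
        return (
            f"Report {self.report_id}\n"
            f"Root cause: {self.root_cause}\n"
            f"Timeline: {self.timeline}\n"
            f"Affected components: {parts}"
        )
```

Because the entire report travels as one chunk, a query about affected components can never be answered from a passage that has lost its root-cause context.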

Power Search Engine — Transparent Retrieval


  • The power search module provides an AI search experience designed as the first step in a two-stage workflow. Using AI-powered retrieval, it identifies the documents most relevant to a user’s query while showing exactly which text fragments matched the search, allowing employees to quickly validate why specific results were returned. Once the relevant documents are identified, users can move to the second step—conducting deeper analysis on selected materials, such as extracting data, comparing information across sources, or exploring them further through conversational AI.
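The transparency requirement—returning not just documents but the exact fragments that matched—can be sketched with a toy first-stage retriever. Real retrieval in such a platform would be embedding-based; the term-overlap scoring below is only a stand-in to show the result shape.

```python
from dataclasses import dataclass

@dataclass
class SearchHit:
    doc_id: str
    score: float
    matched_fragments: list  # the exact text spans that matched the query

def search(query: str, index: dict, top_k: int = 3) -> list:
    """Toy first-stage retrieval: rank documents and keep the matching
    fragments so users can verify *why* each result was returned."""
    hits = []
    terms = set(query.lower().split())
    for doc_id, text in index.items():
        # Keep each sentence that shares at least one term with the query.
        fragments = [s.strip() for s in text.split(".")
                     if terms & set(s.lower().split())]
        if fragments:
            hits.append(SearchHit(doc_id, float(len(fragments)), fragments))
    return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
```

The `matched_fragments` field is what makes the workflow two-stage: a user inspects the evidence first, then hands the validated documents to deeper conversational analysis.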

Agentic RAG Service — Reasoning Without Hallucination


  • This is the most architecturally complex module, and the one where the risk of hallucination was highest. Rather than allowing the model to answer immediately from its parametric memory, the agentic reasoning engine first decomposes the user's query into a structured plan of sub-queries. Each sub-query is routed to the appropriate tool—document retrieval, acronym expansion, mathematical calculation, or cross-document synthesis—and only verified, retrieved content is passed back to the model for final answer composition. The model's own pre-trained knowledge is treated as inadmissible. This design directly counteracts the LLM's natural tendency to produce fluent but "averaged" responses that blend information across contexts in ways that may be subtly wrong.
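The plan-route-ground pattern described above can be reduced to a small executable sketch. The tools, the acronym table, and the hard-coded plan below are hypothetical; in the real system an LLM planner produces the plan and the model composes the final answer from the collected evidence only.

```python
# Sketch of the plan -> route -> ground loop: every sub-step runs through a
# tool, and only tool outputs (verified evidence) reach answer composition.

ACRONYMS = {"EGT": "exhaust gas temperature"}  # hypothetical lookup table

def tool_expand_acronym(arg: str) -> str:
    return ACRONYMS.get(arg, arg)

def tool_calculate(arg: str) -> str:
    # Tiny calculator for "a + b" / "a - b" style sub-steps.
    a, op, b = arg.split()
    return str(float(a) + float(b)) if op == "+" else str(float(a) - float(b))

def tool_retrieve(arg: str, corpus: dict) -> str:
    # Only retrieved text is admissible evidence; a miss is reported, never invented.
    return corpus.get(arg, "NOT FOUND")

def run_plan(plan: list, corpus: dict) -> list:
    """Execute each (tool, argument) sub-step and collect verified evidence."""
    evidence = []
    for tool, arg in plan:
        if tool == "expand":
            evidence.append(tool_expand_acronym(arg))
        elif tool == "calc":
            evidence.append(tool_calculate(arg))
        elif tool == "retrieve":
            evidence.append(tool_retrieve(arg, corpus))
    return evidence  # the final answer is composed from this list alone
```

The crucial constraint is visible in the return value: nothing reaches answer composition that did not come out of a tool, which is exactly how parametric memory is kept inadmissible.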

Excel Integration Module — Working with Structured Data


  • Structured data stored in spreadsheets comes with different challenges than unstructured content like PDFs, so we built a separate module to handle it. This service allows the platform to compare, aggregate, and cross-reference data across multiple Excel sheets, and combine those insights with information pulled from documents—something that general-purpose tools often struggle to do reliably.
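The core cross-referencing operation is essentially a keyed join across sheets. A production module would read real workbooks (for example with openpyxl or pandas); the plain-dict version below only illustrates the join logic, with invented column names.

```python
# Illustrative cross-sheet join in plain Python. Sheet rows are dicts;
# column names ("part", etc.) are invented for the example.

def join_sheets(left: list, right: list, key: str) -> list:
    """Match rows across two 'sheets' on a shared key column and merge them."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match:
            # Merge the two rows; right-hand columns win on conflicts.
            joined.append({**row, **match})
    return joined
```

Once sheets are joined on a shared key, the merged rows can be handed to the same agentic reasoning layer that consumes unstructured document chunks, which is what enables spreadsheet-plus-PDF answers.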

Technology


Microsoft Azure OpenAI

OpenAI GPT Models


Our Team Expert Opinion




The hardest part wasn't building the AI—it was making it right. Anyone can wire up a language model and demo it on clean data. The real work is in the edge cases: the malformed PDFs, the legacy system that speaks a protocol nobody remembers, the diagram that breaks every assumption your parser was built on. We don't come in to execute a spec, we come in to understand how a business actually operates, where its knowledge lives, where it gets lost, and then build something that survives contact with that reality. The modular architecture wasn't a technical preference, it was a strategic one—we knew that what the client needed on day one would look different from what they'd need in year two.


Bartłomiej Grasza, Principal AI Engineer – Addepto

Take the next step


Schedule an intro call to get to know each other better and understand the way we work


Let's talk

About Addepto


About us


We are recognized as one of the best AI, BI, and Big Data consultants


We have helped multiple companies achieve their goals, but instead of making hollow marketing claims here, we encourage you to check our Clutch scoring.

Our customers love to work with us
