

March 27, 2025

Leveraging Knowledge Graphs with LLMs: A Business Guide to Enhanced Decision-Making

Author: Artur Haponik, CEO & Co-Founder

Reading time: 21 minutes


Today, businesses rely heavily on data to guide their decisions and spark new ideas. But with so much unorganized data out there, it can be hard to extract useful insights. Knowledge graphs and large language models (LLMs) have emerged as valuable tools that enable businesses to transform complex data into actionable insights.

Knowledge graphs organize data into connected networks, linking related concepts and ideas. Large language models (LLMs), on the other hand, process data and generate text resembling natural language. When businesses merge knowledge graphs with large language models (LLMs), they are able to uncover hidden patterns in their data that will help them make better business decisions.

This article explores how combining knowledge graphs with large language models (LLMs) can improve data processing and decision-making.


What are knowledge graphs and why are they important?

A knowledge graph is like an intelligent map that displays real-world entities and how they relate to each other. Knowledge graphs are usually stored in graph databases, which are well suited to tracking the relationships between different data entities. Entities can be anything from objects and events to concepts and situations, and the connections between them show how they relate within specific contexts.
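To make the idea concrete, here is a minimal plain-Python sketch (no graph database involved; the entities and relationships are illustrative) of a knowledge graph as a set of entity–relationship–entity triples:

```python
# A toy knowledge graph: each triple links two entities via a named relationship.
triples = [
    ("Marie Curie", "WON", "Nobel Prize"),
    ("Marie Curie", "SPOUSE", "Pierre Curie"),
    ("Nobel Prize", "AWARDED_IN", "1903"),
]

def neighbors(entity):
    """Return every (relationship, target) pair connected to the given entity."""
    return [(rel, tgt) for src, rel, tgt in triples if src == entity]

print(neighbors("Marie Curie"))
# [('WON', 'Nobel Prize'), ('SPOUSE', 'Pierre Curie')]
```

Graph databases such as Neo4j generalize exactly this pattern: nodes, typed edges, and fast traversal of the connections between them.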

Why knowledge graphs matter for your business

  1. Combine siloed data sources: Knowledge graphs consolidate separate data silos to deliver an all-encompassing overview of your data. This means you can see not just what’s happening within individual departments but also how everything connects across teams and across the wider organization.
  2. Combine structured and unstructured data: Unlike conventional knowledge management systems, knowledge graphs can establish relationships among data in different formats, helping you identify intricate patterns concealed in the information.
  3. Help business leaders make more informed decisions: A knowledge graph eliminates the need to search through stacks of paperwork when you want to find particular information. These graphs give you relevant, contextual answers that match your questions instead of overwhelming you with irrelevant search results. They also provide a structured view of how everything is interconnected, making larger-scale relationships easy to understand.
  4. Visualize knowledge flow: A knowledge graph creates an information network that illustrates the relationships between different data entities. This makes knowledge graphs great tools for business workflow monitoring because they help identify problem areas and detect patterns over time.
  5. Enable real-time data analysis: Knowledge graphs keep track of relationships and interactions, allowing them to reflect changes in data patterns and trends almost instantly. This means organizations can respond quickly to shifting circumstances, making informed decisions based on the latest information.

Building knowledge graphs with LLMs: Methods and techniques

Traditional natural language processing methods were used to construct knowledge graphs before the emergence of modern large language models (LLMs). This involved three main steps:

  • Named entity recognition (NER)
  • Entity linking
  • Relation extraction (RE)

These techniques mostly used part-of-speech tagging, thorough text preprocessing, and heuristic rules to capture meanings and relationships in datasets. While they got the job done, they were very labor-intensive. Fast forward to today, and the process has been completely transformed by instruction fine-tuned large language models (LLMs). Businesses can now automate knowledge graph creation by dividing text into smaller segments and using LLMs to extract entities and relationships based on user prompts.

That said, creating strong and accurate LLM-based knowledge graphs still demands careful consideration of some key factors:

  • Schema or ontology definition: The relationship structure between data elements needs to be customized according to the specific application area or industry requirements. A schema or ontology serves as the framework which establishes formal guidelines to structure the graph.
  • Entity consistency: It’s important to maintain consistent entity representation to avoid duplications and inconsistencies. For instance, names like America, USA, US, and United States must all resolve to the same entity.
  • Enforced structured output: It’s crucial for LLM outputs to follow a set structure to ensure usability. There are two primary methods to achieve this:
    • Post-processing: when the LLM output fails to meet format requirements, a post-processing step parses and repairs the responses before they enter the graph.
    • Using JSON mode or function calling: certain large language models (LLMs) can natively constrain their output to a specific format such as JSON. When this native support isn’t available, you can fine-tune the model to generate JSON outputs through instruction-based training.
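As a sketch of the post-processing option, the snippet below validates that a raw LLM response (hard-coded here as a stand-in) parses into the required JSON structure before it is used; the schema and field names are assumptions for illustration:

```python
import json

REQUIRED_KEYS = {"source", "relation", "target"}

def parse_triples(raw):
    """Post-process raw LLM output: parse JSON and enforce the expected triple schema."""
    data = json.loads(raw)
    triples = data["triples"]
    for t in triples:
        missing = REQUIRED_KEYS - t.keys()
        if missing:
            raise ValueError(f"Triple missing keys: {missing}")
    return triples

# Stand-in for a raw LLM response requested in JSON mode.
raw_output = '{"triples": [{"source": "Marie Curie", "relation": "WON", "target": "Nobel Prize"}]}'
print(parse_triples(raw_output))
# [{'source': 'Marie Curie', 'relation': 'WON', 'target': 'Nobel Prize'}]
```

A real pipeline would catch the ValueError and either retry the model call or send the malformed output back to the LLM for repair.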

After evaluating these elements and fine-tuning models appropriately, companies can deploy LLM-generated knowledge graphs to construct accurate, scalable data representations.


Step by step: building a knowledge graph with LLMs

The first phase of knowledge graph creation with LLMs involves collecting unstructured data from multiple sources, including articles and reports. The unstructured data functions as the primary source from which you will extract meaningful insights.

Next, you want to use Large Language Models (LLMs) to identify key entities, including people, organizations, locations, and their relationships. This gives you a structured representation of information. In the graph construction phase, you will organize the extracted entities and relationships into a structured knowledge graph format with tools like Neo4j. Here’s a detailed overview of the whole process:

Step 1: Setting Up the Neo4j Environment

We need a database to store and visualize connections between various elements. Neo4j is excellent for this purpose since it is a dedicated graph database.

Neo4j offers two setup options:

  1. You can use Neo4j Aura, a free database service that runs on cloud servers.
  2. You can also install Neo4j Desktop (Local Instance) directly onto your computer.

We connect to Neo4j using the Neo4jGraph module in LangChain.

How to connect to Neo4j

from langchain_community.graphs import Neo4jGraph

graph = Neo4jGraph(
    url="bolt://54.87.130.140:7687",        # replace with your instance's Bolt URL
    username="neo4j",                       # replace with your username
    password="cables-anchors-directories",  # replace with your password
    refresh_schema=False
)

Step 2: Set Up the LLM Graph Transformer

An LLM graph transformer is a tool that helps extract meaningful data (like entities and relationships) from plain text using a Large Language Model (LLM). The LLM Graph Transformer converts text into a structured knowledge graph using two modes:

  1. Tool-Based Mode (Default): Uses the model’s native tool/function-calling support to return structured output when extracting entities and relationships.
  2. Prompt-Based Mode (Fallback): Falls back to prompt-driven, text-based extraction when tool support is not available.

That said, we are going to use a tool-based approach for extraction since it minimizes the need for extensive prompt engineering and custom parsing functions.

We start by defining a Node class. Nodes are things in the graph, like people, places, organizations, awards, etc. We must define them to standardize how they are stored in our program.

class Node(BaseNode):
    id: str = Field(..., description="Name or human-readable unique identifier")
    label: str = Field(..., description=f"Available options are {enum_values}")
    properties: Optional[List[Property]]

  • id: the unique identifier for the node (e.g., “Marie Curie”).
  • label: the category/type of the node (e.g., “Person” or “Award”).
  • properties: additional details like birthdate, occupation, etc.

Step 3: Defining the relationship class

Relationships connect two nodes and define how they are related. We need to define a Relationship class to ensure all relationships follow a standard structure.

class Relationship(BaseRelationship):
    source_node_id: str
    source_node_label: str = Field(..., description=f"Available options are {enum_values}")
    target_node_id: str
    target_node_label: str = Field(..., description=f"Available options are {enum_values}")
    type: str = Field(..., description=f"Available options are {enum_values}")
    properties: Optional[List[Property]]

Each relationship has the following:

  • source_node_id: the starting point of the relationship (e.g., “Marie Curie”).
  • source_node_label: the type of the source node (e.g., “Person”).
  • target_node_id: the endpoint of the relationship (e.g., “Nobel Prize”).
  • target_node_label: the type of the target node (e.g., “Award”).
  • type: the relationship name (e.g., “WON”).

Step 4: Defining properties for nodes and relationships

Properties are extra details about nodes and relationships.

Example:

Node: Marie Curie → property: { "birth year": 1867 }
Relationship: Marie Curie WON Nobel Prize → property: { "year": 1903 }

class Property(BaseModel):
    """A single property consisting of key and value"""
    key: str = Field(..., description=f"Available options are {enum_values}")
    value: str

  • key: the name of the property (e.g., “birth year”).
  • value: the property value (e.g., “1867”).

Step 5: Defining the graph schema

The graph schema serves as a guide for generative AI to identify which nodes and relationships should be extracted. Node type examples include person, organization, award. On the other hand, relationship types could be things like ‘won’, ‘for’, or ‘works.’ The schema serves as a blueprint, ensuring consistent information retrieval by the LLM.
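One way to picture such a schema is as explicit allow-lists that extraction results are checked against (LLMGraphTransformer accepts allowed_nodes and allowed_relationships arguments for this purpose; the plain-Python filter below is only an illustrative sketch, not the library's internals):

```python
# Hypothetical schema for this example: allowed node labels and relationship types.
ALLOWED_NODES = {"Person", "Organization", "Award"}
ALLOWED_RELATIONSHIPS = {"WON", "FOR", "WORKS_AT"}

def conforms(triple):
    """Check an extracted (source_label, relationship, target_label) triple against the schema."""
    src_label, rel, tgt_label = triple
    return (src_label in ALLOWED_NODES
            and rel in ALLOWED_RELATIONSHIPS
            and tgt_label in ALLOWED_NODES)

extracted = [
    ("Person", "WON", "Award"),
    ("Person", "MARRIED", "Person"),  # relationship type not in the schema -> rejected
]
print([t for t in extracted if conforms(t)])
# [('Person', 'WON', 'Award')]
```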

Example Input:

from langchain_core.documents import Document

text = """
Marie Curie, 7 November 1867 – 4 July 1934, was a Polish and naturalized-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
"""
documents = [Document(page_content=text)]

Step 6: Using GPT-4o for extraction

GPT-4o is a powerful generative AI model that helps in information retrieval from text. We need to set up GPT-4o and connect it to our LangChain pipeline to be able to extract the information we want.

How do we connect it?

from langchain_openai import ChatOpenAI
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI api key")
llm = ChatOpenAI(model='gpt-4o')

  • We set up an OpenAI API key to use GPT-4o.
  • We specify the model name (gpt-4o).

Step 7: Extracting the graph without a schema

Next, we test the generative AI’s ability to extract relationships without defining strict rules. We are going to use LLMGraphTransformer to process the text.

from langchain_experimental.graph_transformers import LLMGraphTransformer
no_schema = LLMGraphTransformer(llm=llm)

Now we can process the documents using the aconvert_to_graph_documents function.

data = await no_schema.aconvert_to_graph_documents(documents)

The extracted graph should consist of:

Nodes (Entities)

[
    Node(id="Marie Curie", type="Person", properties={}),
    Node(id="Pierre Curie", type="Person", properties={}),
    Node(id="Nobel Prize", type="Award", properties={}),
    Node(id="University Of Paris", type="Organization", properties={}),
    Node(id="Robin Williams", type="Person", properties={}),
]

Relationships

[
    Relationship(source="Marie Curie", target="Nobel Prize", type="WON"),
    Relationship(source="Marie Curie", target="University Of Paris", type="PROFESSOR"),
]

We can then use the Neo4j Browser to visualize the outputs, providing a clearer and more intuitive understanding of the data.
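Under the hood, persisting such results amounts to emitting Cypher statements for each node and relationship (LangChain's Neo4jGraph can write graph documents directly via add_graph_documents; the string-building helpers below are hypothetical and shown only to illustrate the shape of the generated Cypher):

```python
def node_to_cypher(node_id, node_type):
    """Build a Cypher MERGE statement for a single extracted node."""
    return f'MERGE (:{node_type} {{id: "{node_id}"}})'

def rel_to_cypher(source, rel_type, target):
    """Build a Cypher statement linking two previously merged nodes."""
    return (f'MATCH (a {{id: "{source}"}}), (b {{id: "{target}"}}) '
            f'MERGE (a)-[:{rel_type}]->(b)')

print(node_to_cypher("Marie Curie", "Person"))
# MERGE (:Person {id: "Marie Curie"})
print(rel_to_cypher("Marie Curie", "WON", "Nobel Prize"))
```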

Integrating LLMs with knowledge graphs: Use cases and best practices

While LLMs do a great job at producing language-based content, they fall short in delivering precise contextual output. When faced with a complex inquiry such as “What were the key contributions of Adam Smith to economics?”, a large language model will often deliver a generalized response based on its training. The response might lack important details, such as the exact dates of his contributions and the impact of his work.

On the other hand, knowledge graphs store information in structured formats but cannot by themselves produce natural-language answers.

By bringing large language models (LLMs) and knowledge graphs together, we can leverage the best of both worlds—natural language and well-structured knowledge organization. This powerful combination allows systems to understand complex questions and deliver precise, context-aware answers.

The process of joining large language models (LLMs) with knowledge graphs involves several distinct steps. Here’s a simplified overview of the process:

  • Query Understanding: The large language model analyzes the user’s query to understand its meaning and context. The process involves breaking down the question into small parts and then pinpointing the primary entities and the relationships that connect them.
  • Knowledge Graph Access: After gaining an understanding of the user query, the large language model interacts with the knowledge graph to find relevant information. The retrieval process might involve finding nodes and edges within the knowledge graph that correspond to the entities cited in the user’s query.
  • Contextualization: The LLM generates a response based on the retrieved information from the knowledge graph. The model leverages its language generation abilities alongside structured knowledge from the knowledge graph to deliver coherent and contextually precise answers.
  • Response Generation: The LLM produces a response that resembles human language while incorporating factual accuracy and contextual depth from the knowledge graph.
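The four steps above can be sketched end to end with a toy in-memory graph, where simple functions stand in for both the LLM's query understanding and the graph lookup (all names and facts here are illustrative only):

```python
# Toy knowledge graph used in place of a real graph database.
TRIPLES = [
    ("Adam Smith", "WROTE", "The Wealth of Nations"),
    ("Adam Smith", "FOUNDED", "Classical Economics"),
]

def understand_query(question):
    """Step 1 (stand-in for the LLM): pick out the entity the question is about."""
    return next(e for e, _, _ in TRIPLES if e in question)

def kg_lookup(entity):
    """Step 2: retrieve facts about the entity from the knowledge graph."""
    return [(r, t) for s, r, t in TRIPLES if s == entity]

def generate_response(entity, facts):
    """Steps 3-4 (stand-in for the LLM): turn retrieved facts into an answer."""
    parts = [f"{r.lower().replace('_', ' ')} {t}" for r, t in facts]
    return f"{entity} " + " and ".join(parts) + "."

entity = understand_query("What were the key contributions of Adam Smith to economics?")
print(generate_response(entity, kg_lookup(entity)))
# Adam Smith wrote The Wealth of Nations and founded Classical Economics.
```

In a production system, the two stand-in functions would be LLM calls and graph-database queries, but the division of labor is the same: the graph supplies the facts, the model supplies the language.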

Applications of LLM and knowledge graph integration

Combining LLMs with Knowledge Graphs unlocks a variety of applications across multiple industries:

  • Healthcare: Integrating large language models (LLMs) with knowledge graphs can help doctors and researchers get accurate answers to complex medical questions by combining clinical study data from knowledge graphs with general medical knowledge from LLMs.
  • Education: Educational platforms can use this technology to help students understand complex topics better by providing clear explanations, accurate information, and relevant examples.
  • Customer Support: Businesses achieve better customer support by merging knowledge graphs with LLMs to provide context-aware and precise answers to customer questions.
  • Content Creation: Content creators and writers can utilize these integrated systems to produce content that’s both creative and factually correct.

Best practices for integrating knowledge graphs with LLMs

The combination of knowledge graphs and large language models enhances your ability to understand data while improving decision-making processes and automating tasks. Here are some best practices to follow:

  1. Define clear objectives: The first thing to do is decide what you want to achieve with the integration. Do you want to improve search results, deliver smarter recommendations, or enable more accurate data analysis?
  2. Ensure high-quality data: The effectiveness of a knowledge graph depends entirely on the quality of its underlying data. To achieve accurate and reliable outcomes, the data should be clean, structured, and up to date.
  3. Use the knowledge graph for context: LLMs produce text content that sometimes lacks accuracy. Connecting them with a knowledge graph provides verified and structured knowledge, leading to improved results.
  4. Optimize for Scalability: Your system should be able to process larger volumes of data without experiencing performance degradation as data expands. As such, you should choose flexible and scalable technologies.
  5. Fine-Tune LLMs with Domain-Specific Knowledge: Tailoring LLMs with industry-specific knowledge graphs results in more relevant and accurate responses.
  6. Maintain Explainability and Transparency: Users should understand how decisions are made. A knowledge graph with proper organization assists in generating clear explanations for responses created by LLMs.
  7. Monitor Performance and Improve: Continuously assess your system to verify its alignment with your business requirements. Make necessary modifications to achieve improved outcomes.

Automating knowledge graph creation and validation with LLMs

To create a knowledge graph from scratch, you need to put in manual work and have specialized domain knowledge. However, you can make the work easier by using LLMs to automate the following processes:

  • Ontology Creation: The ontology creation process basically entails defining knowledge graph structures such as categories, attributes, and relationships.
  • Data Extraction: Identifying entities and relationships from unstructured data.
  • Validation & Refinement: Ensuring extracted data follows the ontology.

Step 1: Automatically creating ontologies with LLMs

An ontology defines the structure of a knowledge graph by specifying:

  • Classes e.g., Patient, Doctor, Condition.
  • Data Properties e.g., name, age.
  • Object Properties e.g., treated by.

Instead of manually writing the ontology, we use an LLM to generate it from a simple text description of the domain. The LLM then outputs the ontology in OWL (XML) format.

Example input

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...", # if you prefer to pass api key in directly instead of using env vars
    # base_url="...",
    # organization="...",
    # other params...
)

# Define system prompt
system_prompt = """
You are an expert in ontology engineering. Generate an OWL ontology based on the following domain description:

Define classes, data properties, and object properties.
Include domain and range for each property.
Provide the output in OWL (XML) format.
"""

# Function to generate ontology
def generate_ontology(domain_description):
    prompt = f"Domain description: {domain_description}\nGenerate OWL ontology."
    response = llm.invoke([
        ("system", system_prompt),
        ("human", prompt),
    ])
    return response.content

Example Output (Excerpt in OWL format):

<owl:Class rdf:ID="Patient"/>
<owl:Class rdf:ID="Condition"/>
<owl:Class rdf:ID="Doctor"/>
<owl:DatatypeProperty rdf:ID="hasName">
    <rdfs:domain rdf:resource="#Patient"/>
    <rdfs:range rdf:resource="&xsd;string"/>
</owl:DatatypeProperty>
<owl:ObjectProperty rdf:ID="treatedBy">
    <rdfs:domain rdf:resource="#Patient"/>
    <rdfs:range rdf:resource="#Doctor"/>
</owl:ObjectProperty>

Step 2: Importing and validating the ontology

After generating the ontology, the next step is to import it into your system and check its structure.

Importing an OWL ontology

from owlready2 import get_ontology

# Load dynamically generated ontology
ontology_path = "healthcare.owl"  # Replace with the OWL file path
ontology = get_ontology(ontology_path).load()

# Print ontology structure
print("Classes:")
for cls in ontology.classes():
    print(cls)

print("\nProperties:")
for prop in ontology.properties():
    print(f"{prop}: Domain={prop.domain}, Range={prop.range}")

Example Output:

Classes:
healthcare.owl.Patient
healthcare.owl.Condition
healthcare.owl.Doctor

Properties:
healthcare.owl.hasName: Domain=[healthcare.owl.Patient], Range=[xsd:string]
healthcare.owl.treatedBy: Domain=[healthcare.owl.Patient], Range=[healthcare.owl.Doctor]

Step 3: Extract RDF triples from text

Next, you want to extract structured RDF triples from unstructured data using LLMs. The LLM reads the text, extracts entities and relationships, and then outputs RDF triples in Turtle format.

Extracting RDF triples

from rdflib import Graph

# Function to generate RDF triples using LLM
def generate_rdf_triples(text_input, ontology_schema):
    system_prompt = f"""
    Extract RDF triples from the following text in Turtle format, adhering to the ontology:
    - Patient: hasName, hasAge, hasGender, hasCondition, treatedBy.
    - Doctor: hasName.
    - Condition: hasName.
    Ontology: {ontology_schema}
    """
    user_prompt = f"Text: {text_input}\nGenerate RDF triples in Turtle format."
    response = llm.invoke([
        ("system", system_prompt),
        ("human", user_prompt),
    ])
    return response.content

Step 4: Validate and Refine RDF Data

You should validate the extracted RDF triples to ensure they match the ontology.

Examples of validation issues include:

  • If a property expects a number but gets a string, it’s an error.
  • If an undefined property appears, it’s an error.

Code to Validate RDF Data

from rdflib import Graph, Literal

def validate_rdf(rdf_data, ontology):
    g = Graph()
    g.parse(data=rdf_data, format="turtle")
    errors = []
    for s, p, o in g:
        prop_name = p.split("#")[-1]
        ontology_prop = getattr(ontology, prop_name, None)
        if not ontology_prop:
            errors.append(f"Property '{prop_name}' not found in ontology.")
        elif isinstance(o, Literal) and str not in ontology_prop.range:
            errors.append(f"Range Error: {p} expects {ontology_prop.range}, but found a string.")
    return errors

# If validation fails, refine triples
def refine_rdf(rdf_data, feedback):
    refinement_prompt = f"""
    The following RDF output has errors:
    {rdf_data}
    Errors: {feedback}
    Refine the RDF triples to fix these issues while adhering to the ontology schema.
    """
    response = llm.invoke([
        ("system", system_prompt),
        ("human", refinement_prompt),
    ])
    return response.content

Step 5: Fine-tuning RDF data automatically

If you find any errors, you can ask the LLM to fix them.

Code to fix errors in RDF Data

# Fix errors in RDF triples, reusing refine_rdf from Step 4
if errors:
    refined_rdf = refine_rdf(rdf_triples, errors)
    print("Refined RDF Data:", refined_rdf)

Real-world applications of knowledge graphs in different industries

Knowledge graphs are transforming data management across industries by structuring complex information into interconnected networks. Here are some key real-world applications of knowledge graphs in various sectors:

Medicine

Textual medical knowledge plays an important role in healthcare information systems. Therefore, there have been efforts to integrate textual medical knowledge into knowledge graphs in order to enhance information retrieval and inference-based reasoning. A great real-life example is the collaboration between Mayo Clinic and IBM Watson on knowledge graph-powered AI systems that assist in clinical diagnosis and treatment planning. [1]

Cybersecurity

Knowledge graphs can enhance cybersecurity by providing contextual information that helps detect and predict dynamic attacks and safeguard cyber assets. Microsoft, for example, uses knowledge graphs in Azure Sentinel to correlate security data, detect threats, and automate response actions. [2]

Finance

You can build an enterprise knowledge graph by collecting news about different companies, and mapping business relationships between related stocks. By combining this with news sentiments about connected stocks, you can better predict stock price movements. For instance, Bloomberg’s KG-powered Terminal links news articles, stock prices, and financial data to uncover relationships between companies, industries, and economic events [3]. This helps traders and investors make data-driven decisions.


The future of knowledge graphs and LLMs: Trends and predictions

The rise of hybrid generative AI systems

Knowledge graphs (KGs) combined with Large Language Models (LLMs) are defining future AI-driven solution development. LLMs demonstrate strong capabilities in understanding and producing natural language, but face difficulties maintaining factual accuracy and structured reasoning. On the other hand, knowledge graphs deliver structured knowledge that can enhance the outputs generated by LLMs. Future generative AI systems will likely see deeper integration of these technologies, leading to the creation of more reliable and contextually smart systems.

Automated knowledge graph construction

The main obstacle in developing knowledge graphs lies in the extensive manual work needed to create ontologies, extract entities, and map relationships. In the future, the development of knowledge graphs will be faster and more accurate, thanks to LLM technologies. Generative AI-driven methods will enable automatic extraction and validation of knowledge from massive datasets which will allow for dynamic knowledge graph updates without requiring human inputs. As a result, knowledge graphs will be adaptable, scalable, and efficient for real-world applications.

Improved reasoning and explainability

Current LLMs produce answers based on probabilistic predictions, which frequently cause hallucinations or incorrect results. Integrating KGs into generative AI systems allows responses to be based on structured knowledge which enhances factual accuracy. As generative AI becomes more widely used, people will need to understand how it makes decisions. Knowledge graphs will play a key role in making generative AI more transparent by allowing users to trace answers back to trusted data sources.

Industry-specific knowledge graphs

As more industries adopt AI, we’ll see a rise in knowledge graphs built for specific fields like healthcare, finance, law, and cybersecurity. These graphs help organize complex information, making generative AI smarter and more accurate. In the future, industry-specific knowledge graphs will become even more common, improving things like medical diagnoses and financial risk analysis by providing clear, structured knowledge.

Real-time knowledge updating

Static knowledge graphs struggle to adapt to rapidly evolving data. Future AI systems will incorporate live updates to their knowledge graphs, which will allow large language models to use the most current data. Real-time updates to knowledge graphs will be especially beneficial for fields like news aggregation, market analysis, and fraud detection, because relying on outdated data in these fields can lead to incorrect findings.

AI agents with long-term memory

Integrating large language models (LLMs) with knowledge graphs enables AI systems to develop long-term memory capabilities. Future AI models will learn continuously by interacting with structured knowledge graphs instead of depending on static training datasets like traditional LLMs. As such, AI agents will maintain contextual awareness throughout their operations which will enhance their performance in roles requiring historical data recognition such as customer service tasks, legal research projects, and scientific exploration.

The role of knowledge graphs in AI regulation

Knowledge graphs will play a big role in making AI more accountable as ethical concerns and governance demands increase. AI regulators and policymakers will use knowledge graphs to monitor AI decision-making, ensure compliance with legal standards, and reduce algorithmic biases.

The future of AI-powered search

Traditional search engines depend on keyword matching which produces results that lack deep contextual understanding. The combination of large language models (LLMs) with knowledge graphs will transform search capabilities by enabling semantic understanding and thus generating responses relevant to the user’s queries rather than just giving links to pages.

Wrapping up

When businesses combine knowledge graphs with LLMs, they can make sense of complex data and make smarter decisions. These tools help organize information and reveal important insights, giving companies a competitive advantage. Using this technology isn’t just about managing data, it’s about turning it into a valuable asset for growth and innovation.

 

 

Sources
[1] Mayoclinic.org, Mayo Clinic and IBM Task Watson to Improve Clinical Trial Research, https://newsnetwork.mayoclinic.org/discussion/mayo-clinic-and-ibm-task-watson-to-improve-clinical-trial-research/, Accessed on March 19, 2025
[2] Microsoft.com, Threat detection in Microsoft Sentinel, https://learn.microsoft.com/en-us/azure/sentinel/threat-detection, Accessed on March 19, 2025
[3] Bloomberglp.com, Getting Started Guide for Students, https://data.bloomberglp.com/professional/sites/10/Getting-Started-Guide-for-Students-English.pdf, Accessed on March 19, 2025



Category: Generative AI, Artificial Intelligence