Today, businesses rely heavily on data to guide their decisions and create new ideas. But with so much unorganized data out there, it can be hard to find useful insights. Knowledge graphs and large language models (LLMs) have emerged as valuable tools that enable businesses to transform complex data into easy-to-use actionable insights.
Knowledge graphs organize data into connected networks, linking related concepts and ideas. LLMs, on the other hand, process data and generate natural-sounding text. When businesses merge knowledge graphs with LLMs, they can uncover hidden patterns in their data and make better-informed business decisions.
This article explores how combining knowledge graphs with large language models (LLMs) can improve data processing and decision-making.
A knowledge graph is like an intelligent map that displays real-world entities and how they relate to each other. Knowledge graphs are usually stored in graph databases, which are the ideal storage solution because they are perfect for keeping track of the relationships between different data entities. Entities can be anything from objects and events to concepts and situations. The connections between entities in knowledge graphs demonstrate how they relate to each other within specific contexts.
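Conceptually, a knowledge graph boils down to a set of (subject, relationship, object) triples. The minimal sketch below uses plain Python (no graph database) just to make the idea concrete; the entities and relationships are illustrative:

```python
# A toy knowledge graph as a set of (subject, relationship, object) triples.
triples = {
    ("Marie Curie", "WON", "Nobel Prize"),
    ("Marie Curie", "SPOUSE", "Pierre Curie"),
    ("Marie Curie", "PROFESSOR_AT", "University of Paris"),
}

def related_to(entity, graph):
    """Return every (relationship, object) pair connected to an entity."""
    return {(rel, obj) for subj, rel, obj in graph if subj == entity}

print(related_to("Marie Curie", triples))
```

A graph database like Neo4j adds indexing, a query language, and visualization on top of this same underlying model.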
Before the emergence of modern large language models (LLMs), knowledge graphs were constructed with traditional natural language processing methods: part-of-speech tagging, thorough text preprocessing, and heuristic rules for capturing meanings and relationships in datasets. While these techniques got the job done, they were very labor-intensive. Fast forward to today, and the process has been completely transformed by instruction-fine-tuned LLMs. Businesses can now automate knowledge graph creation by dividing text into smaller segments and prompting an LLM to extract entities and relationships from each one.
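The "dividing text into smaller segments" step can be sketched in a few lines. Below is a simple fixed-size splitter with overlap; the sizes are arbitrary, and libraries like LangChain ship more robust text splitters:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into overlapping fixed-size segments for LLM extraction.

    Overlap keeps entities that straddle a boundary visible in both chunks.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

chunks = chunk_text("some long document " * 100, chunk_size=200, overlap=20)
```

Each chunk would then be sent to the LLM with an extraction prompt, and the per-chunk results merged into one graph.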
That said, creating strong and accurate LLM-based knowledge graphs still demands careful consideration of some key factors:
After evaluating these elements and fine-tuning models appropriately, companies can deploy LLM-generated knowledge graphs as accurate, scalable data representations.
The first phase of knowledge graph creation with LLMs involves collecting unstructured data from multiple sources, including articles and reports. The unstructured data functions as the primary source from which you will extract meaningful insights.
Next, you want to use Large Language Models (LLMs) to identify key entities, including people, organizations, locations, and their relationships. This gives you a structured representation of information. In the graph construction phase, you will organize the extracted entities and relationships into a structured knowledge graph format with tools like Neo4j. Here’s a detailed overview of the whole process:
We need a storage solution to store and visualize connections between various elements. Neo4j is an excellent fit since it is a purpose-built graph database.
Neo4j offers two setup options:
We connect to Neo4j using the Neo4jGraph module in LangChain.
How to connect to Neo4j
```python
from langchain_community.graphs import Neo4jGraph

graph = Neo4jGraph(
    url="bolt://54.87.130.140:7687",
    username="neo4j",
    password="cables-anchors-directories",
    refresh_schema=False,
)
```
An LLM graph transformer is a tool that helps extract meaningful data (like entities and relationships) from plain text using a Large Language Model (LLM). The LLM Graph Transformer converts text into a structured knowledge graph using two modes:
That said, we are going to use a tool-based approach for extraction since it minimizes the need for extensive prompt engineering and custom parsing functions.
We start by defining a Node class. Nodes are things in the graph, like people, places, organizations, awards, etc. We must define them to standardize how they are stored in our program.
```python
class Node(BaseNode):
    id: str = Field(..., description="Name or human-readable unique identifier")
    label: str = Field(..., description=f"Available options are {enum_values}")
    properties: Optional[List[Property]]
```
Relationships connect two nodes and define how they are related. We need to define a Relationship class to ensure all relationships follow a standard structure.
```python
class Relationship(BaseRelationship):
    source_node_id: str
    source_node_label: str = Field(..., description=f"Available options are {enum_values}")
    target_node_id: str
    target_node_label: str = Field(..., description=f"Available options are {enum_values}")
    type: str = Field(..., description=f"Available options are {enum_values}")
    properties: Optional[List[Property]]
```
Each relationship has the following:
Properties are extra details about nodes and relationships.
Example:
Node: Marie Curie → property: {"birth year": 1867}
Relationship: Marie Curie WON Nobel Prize → property: {"year": 1903}
```python
class Property(BaseModel):
    """A single property consisting of key and value"""
    key: str = Field(..., description=f"Available options are {enum_values}")
    value: str
```
Key represents the name of the property (e.g., "birth year").
Value is the property value (e.g., "1867").
The graph schema serves as a guide for generative AI to identify which nodes and relationships should be extracted. Node type examples include person, organization, award. On the other hand, relationship types could be things like ‘won’, ‘for’, or ‘works.’ The schema serves as a blueprint, ensuring consistent information retrieval by the LLM.
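What the schema buys you can be shown without an LLM at all: it is essentially a whitelist against which extracted nodes and relationships are checked. A minimal sketch, with illustrative type names:

```python
# Illustrative schema: the allowed node and relationship types.
ALLOWED_NODES = {"Person", "Organization", "Award"}
ALLOWED_RELATIONSHIPS = {"WON", "WORKS_AT"}

def conforms_to_schema(node_type, rel_type=None):
    """Check an extracted node (and optionally a relationship) against the schema."""
    if node_type not in ALLOWED_NODES:
        return False
    return rel_type is None or rel_type in ALLOWED_RELATIONSHIPS

# A (Person)-[WON]->(Award) extraction passes; an off-schema type is rejected.
print(conforms_to_schema("Person", "WON"))
print(conforms_to_schema("Movie"))
```

Constraining the LLM this way keeps extraction consistent across documents instead of letting each chunk invent its own vocabulary.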
Example Input:
```python
from langchain_core.documents import Document

text = """
Marie Curie, 7 November 1867 – 4 July 1934, was a Polish and naturalized-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
"""

documents = [Document(page_content=text)]
```
GPT-4o is a powerful generative AI model that helps in information retrieval from text. We need to set up GPT-4o and connect it to our LangChain pipeline to be able to extract the information we want.
How do we connect it?
```python
import getpass
import os

from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI api key")
llm = ChatOpenAI(model="gpt-4o")
```
Next, we test the generative AI’s ability to extract relationships without defining strict rules. We are going to use LLMGraphTransformer to process the text.
```python
from langchain_experimental.graph_transformers import LLMGraphTransformer

no_schema = LLMGraphTransformer(llm=llm)
```
Now we can process the documents using the asynchronous aconvert_to_graph_documents method.

```python
data = await no_schema.aconvert_to_graph_documents(documents)
```
The extracted graph should consist of:
Nodes (Entities)
```python
[
    Node(id="Marie Curie", type="Person", properties={}),
    Node(id="Pierre Curie", type="Person", properties={}),
    Node(id="Nobel Prize", type="Award", properties={}),
    Node(id="University Of Paris", type="Organization", properties={}),
    Node(id="Robin Williams", type="Person", properties={}),
]
```
Relationships
```python
[
    Relationship(source="Marie Curie", target="Nobel Prize", type="WON"),
    Relationship(source="Marie Curie", target="University Of Paris", type="PROFESSOR"),
]
```
We can then use the Neo4j Browser to visualize the outputs, providing a clearer and more intuitive understanding of the data.
While LLMs do a great job at producing language-based content, they fall short in delivering precise contextual output. When faced with a complex inquiry such as “What were the key contributions of Adam Smith to economics?”, a large language model will often deliver a generalized response based on its training. The response might lack important details, such as the exact dates of his contributions and the impact of his work.
Knowledge graphs, on the other hand, store information in structured formats but cannot by themselves understand or generate natural language.
By bringing large language models (LLMs) and knowledge graphs together, we can leverage the best of both worlds—natural language and well-structured knowledge organization. This powerful combination allows systems to understand complex questions and deliver precise, context-aware answers.
Joining large language models (LLMs) with knowledge graphs involves several distinct steps. Here’s a simplified overview of the process:
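In outline, the answer flow looks like this: the LLM translates a natural-language question into a graph query, the graph returns facts, and the LLM phrases the answer. The sketch below fakes both LLM steps with simple lookups so the plumbing is visible; in a real pipeline a component such as LangChain's GraphCypherQAChain plays this role against Neo4j, and the graph content here is invented for illustration:

```python
# Toy graph: (subject, relationship, object) triples.
GRAPH = {
    ("Adam Smith", "WROTE", "The Wealth of Nations"),
    ("The Wealth of Nations", "PUBLISHED_IN", "1776"),
}

def question_to_query(question):
    """Stand-in for the LLM: map a question to a (subject, relationship) lookup."""
    if "write" in question.lower():
        return ("Adam Smith", "WROTE")
    return None

def answer(question):
    """Run the full flow: question -> graph query -> facts -> phrased answer."""
    query = question_to_query(question)
    if query is None:
        return "I don't know."
    subj, rel = query
    facts = [obj for s, r, obj in GRAPH if (s, r) == (subj, rel)]
    # Stand-in for the LLM's answer-generation step.
    return f"{subj} {rel.lower()} {', '.join(facts)}" if facts else "I don't know."

print(answer("What did Adam Smith write?"))
```

Because the facts come from the graph rather than the model's weights, the answer can be traced back to stored triples.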
Combining LLMs with Knowledge Graphs unlocks a variety of applications across multiple industries:
The combination of knowledge graphs and large language models enhances your ability to understand data while improving decision-making processes and automating tasks. Here are some best practices to follow:
To create a knowledge graph from scratch, you need to put in some manual work and have specialized domain knowledge. However, you can make work easier by using LLMs to automate the following processes:
An ontology defines the structure of a knowledge graph by specifying:
Instead of writing the ontology by hand, we use an LLM to generate it from a simple text description of the domain. The LLM then outputs the ontology in OWL (XML) format.
Example input
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...",  # if you prefer to pass the API key directly instead of using env vars
    # base_url="...",
    # organization="...",
    # other params...
)

# Define system prompt
system_prompt = """
You are an expert in ontology engineering. Generate an OWL ontology based on the following domain description:
- Define classes, data properties, and object properties.
- Include domain and range for each property.
- Provide the output in OWL (XML) format."""
```
```python
# Function to generate ontology
def generate_ontology(domain_description):
    prompt = f"Domain description: {domain_description}\nGenerate OWL ontology."
    response = llm.invoke([
        ("system", system_prompt),
        ("human", prompt),
    ])
    return response.content
```
Example Output (Excerpt in OWL format):
```xml
<owl:Class rdf:ID="Patient"/>
<owl:Class rdf:ID="Condition"/>
<owl:Class rdf:ID="Doctor"/>
<owl:DatatypeProperty rdf:ID="hasName">
  <rdfs:domain rdf:resource="#Patient"/>
  <rdfs:range rdf:resource="&xsd;string"/>
</owl:DatatypeProperty>
<owl:ObjectProperty rdf:ID="treatedBy">
  <rdfs:domain rdf:resource="#Patient"/>
  <rdfs:range rdf:resource="#Doctor"/>
</owl:ObjectProperty>
```
After generating the ontology, the next step is to import it into your system and check its structure.
Importing an OWL ontology
```python
from owlready2 import get_ontology

# Load dynamically generated ontology
ontology_path = "healthcare.owl"  # Replace with the OWL file path
ontology = get_ontology(ontology_path).load()

# Print ontology structure
print("Classes:")
for cls in ontology.classes():
    print(cls)

print("\nProperties:")
for prop in ontology.properties():
    print(f"{prop}: Domain={prop.domain}, Range={prop.range}")
```
Example Output:
Classes:
healthcare.owl.Patient
healthcare.owl.Condition
healthcare.owl.Doctor
Properties:
healthcare.owl.hasName: Domain=[healthcare.owl.Patient], Range=[xsd:string]
healthcare.owl.treatedBy: Domain=[healthcare.owl.Patient], Range=[healthcare.owl.Doctor]
Next, you want to extract structured RDF triples from the unstructured data using an LLM: the model reads the text, extracts entities and relationships, and outputs RDF triples in Turtle format.
Extracting RDF triples
```python
from rdflib import Graph

# Function to generate RDF triples using LLM
def generate_rdf_triples(text_input, ontology_schema):
    system_prompt = f"""
    Extract RDF triples from the following text in Turtle format, adhering to the ontology:
    - Patient: hasName, hasAge, hasGender, hasCondition, treatedBy.
    - Doctor: hasName.
    - Condition: hasName.
    Ontology: {ontology_schema}
    """
    user_prompt = f"Text: {text_input}\nGenerate RDF triples in Turtle format."
    response = llm.invoke([
        ("system", system_prompt),
        ("human", user_prompt),
    ])
    return response.content
```
You should validate the extracted RDF triples to ensure they match the ontology.
Examples of validation issues include:
Code to Validate RDF Data
```python
from rdflib import Graph, Literal

def validate_rdf(rdf_data, ontology):
    g = Graph()
    g.parse(data=rdf_data, format="turtle")
    errors = []
    for s, p, o in g:
        prop_name = p.split("#")[-1]
        ontology_prop = getattr(ontology, prop_name, None)
        if not ontology_prop:
            errors.append(f"Property '{prop_name}' not found in ontology.")
        # owlready2 exposes a datatype property's range as Python types (e.g., str)
        elif isinstance(o, Literal) and str not in ontology_prop.range:
            errors.append(f"Range Error: {p} expects {ontology_prop.range}, but found a string.")
    return errors
```
If you find any errors, you can ask the LLM to fix them.
Code to fix errors in RDF Data
```python
# Function to refine RDF data
def refine_rdf(rdf_data, feedback):
    refinement_prompt = f"""
    The following RDF has errors:
    {rdf_data}
    Errors: {feedback}
    Refine the RDF triples to fix these issues while following the ontology.
    """
    response = llm.invoke([
        ("system", system_prompt),
        ("human", refinement_prompt),
    ])
    return response.content

# Fix errors in RDF triples
if errors:
    refined_rdf = refine_rdf(rdf_triples, errors)
    print("Refined RDF Data:", refined_rdf)
```
Knowledge graphs are transforming data management across industries by structuring complex information into interconnected networks. Here are some key real-world applications of knowledge graphs in various sectors:
Textual medical knowledge plays an important role in healthcare information systems, so there have been efforts to integrate it into knowledge graphs to enhance information retrieval and inference-based reasoning. A great real-life example is the collaboration between Mayo Clinic and IBM Watson on knowledge graph-powered AI systems that assist in clinical diagnosis and treatment planning. [1]
KGs can enhance cybersecurity by providing context information useful to detect and predict dynamic attacks and safeguard people’s cyber assets. Microsoft uses knowledge graphs in Azure Sentinel to correlate security data, detect threats, and automate response actions. [2]
You can build an enterprise knowledge graph by collecting news about different companies, and mapping business relationships between related stocks. By combining this with news sentiments about connected stocks, you can better predict stock price movements. For instance, Bloomberg’s KG-powered Terminal links news articles, stock prices, and financial data to uncover relationships between companies, industries, and economic events [3]. This helps traders and investors make data-driven decisions.
Knowledge graphs (KGs) combined with Large Language Models (LLMs) are defining future AI-driven solution development. LLMs demonstrate strong capabilities in understanding and producing natural language, but face difficulties maintaining factual accuracy and structured reasoning. On the other hand, knowledge graphs deliver structured knowledge that can enhance the outputs generated by LLMs. Future generative AI systems will likely see deeper integration of these technologies, leading to the creation of more reliable and contextually smart systems.
The main obstacle in developing knowledge graphs lies in the extensive manual work needed to create ontologies, extract entities, and map relationships. In the future, the development of knowledge graphs will be faster and more accurate, thanks to LLM technologies. Generative AI-driven methods will enable automatic extraction and validation of knowledge from massive datasets which will allow for dynamic knowledge graph updates without requiring human inputs. As a result, knowledge graphs will be adaptable, scalable, and efficient for real-world applications.
Current LLMs produce answers based on probabilistic predictions, which frequently cause hallucinations or incorrect results. Integrating KGs into generative AI systems allows responses to be based on structured knowledge which enhances factual accuracy. As generative AI becomes more widely used, people will need to understand how it makes decisions. Knowledge graphs will play a key role in making generative AI more transparent by allowing users to trace answers back to trusted data sources.
As more industries adopt AI, we’ll see a rise in knowledge graphs built for specific fields like healthcare, finance, law, and cybersecurity. These graphs help organize complex information, making generative AI smarter and more accurate. In the future, industry-specific knowledge graphs will become even more common, improving things like medical diagnoses and financial risk analysis by providing clear, structured knowledge.
Static knowledge graphs struggle to adapt to rapidly evolving data. Future AI systems will incorporate live updates to their knowledge graphs, allowing large language models to use the most current data. Real-time updates will be especially valuable in fields like news aggregation, market analysis, and fraud detection, where outdated data can lead to incorrect conclusions.
Integrating large language models (LLMs) with knowledge graphs enables AI systems to develop long-term memory. Instead of depending on static training datasets like traditional LLMs, future AI models will learn continuously by interacting with structured knowledge graphs. As a result, AI agents will maintain contextual awareness throughout their operations, enhancing their performance in roles that depend on historical context, such as customer service, legal research, and scientific exploration.
Knowledge graphs will play a big role in making AI more accountable as ethical concerns and governance demands increase. AI regulators and policymakers will use knowledge graphs to monitor AI decision-making, ensure compliance with legal standards, and reduce algorithmic biases.
Traditional search engines depend on keyword matching which produces results that lack deep contextual understanding. The combination of large language models (LLMs) with knowledge graphs will transform search capabilities by enabling semantic understanding and thus generating responses relevant to the user’s queries rather than just giving links to pages.
When businesses combine knowledge graphs with LLMs, they can make sense of complex data and make smarter decisions. These tools help organize information and reveal important insights, giving companies a competitive advantage. Using this technology isn’t just about managing data, it’s about turning it into a valuable asset for growth and innovation.
Sources
[1] Mayoclinic.org, Mayo Clinic and IBM Task Watson to Improve Clinical Trial Research
https://newsnetwork.mayoclinic.org/discussion/mayo-clinic-and-ibm-task-watson-to-improve-clinical-trial-research/, Accessed on March 19, 2025
[2] Microsoft.com, Threat detection in Microsoft Sentinel
https://learn.microsoft.com/en-us/azure/sentinel/threat-detection, Accessed on March 19, 2025
[3] Bloomberglp.com, Getting Started Guide for Students
https://data.bloomberglp.com/professional/sites/10/Getting-Started-Guide-for-Students-English.pdf, Accessed on March 19, 2025