Creating Knowledge Graphs with Neo4j and Graphlit | Prescience Decision Solutions, a Movate company

Knowledge graphs have become an integral part of modern data management and information retrieval systems. Introduced in their current form in 2012 by Google, knowledge graphs represent a structured data model that depicts real-world entities as nodes and their relationships as edges, in a graph format.

Knowledge graphs are increasingly important in Generative Artificial Intelligence (AI) applications because they offer a robust framework for organizing and contextualizing domain-specific or proprietary company information. By capturing relationships and adding semantic layers to data, knowledge graphs enhance the accuracy, explainability, and reliability of AI systems. They play a crucial role in grounding Large Language Models (LLMs) for applications like enterprise search, enabling AI to deliver precise answers by leveraging connections across structured and unstructured data. This capability is particularly valuable for industries aiming to break down data silos and improve decision-making processes. Generative AI systems powered by knowledge graphs are transforming fields such as fraud detection and supply chain management by providing deeper insights into complex data relationships.

In this article, we will take a closer look at how Knowledge Graphs can be created using Neo4j and Graphlit.

Components of Knowledge Graphs

The 3 key components of Knowledge Graphs are:

Entities (Nodes)
An entity can be a person, organization, label or concept. Each entity has properties and attributes.
Relationships (Edges)
These connect entities meaningfully. They are directional and labelled.
Properties
They describe characteristics of entities or store metadata about relationships.
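
As a minimal sketch of how these three components fit together, the Python snippet below models nodes, directed and labelled edges, and properties with plain dataclasses. The class names (Node, Edge) and the sample data are illustrative assumptions and do not come from any particular library.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity: a label (its type) plus free-form properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labelled relationship from `source` to `target`."""
    source: Node
    rel_type: str
    target: Node
    properties: dict = field(default_factory=dict)  # metadata about the relationship


# Hypothetical sample data, for illustration only
alice = Node("Person", {"name": "Alice"})
acme = Node("Organization", {"name": "Acme Corp"})
works_for = Edge(alice, "WORKS_FOR", acme, {"since": 2020})


def describe(edge: Edge) -> str:
    """Render an edge as a readable (subject)-[relation]->(object) string."""
    return (f"({edge.source.properties['name']})"
            f"-[{edge.rel_type}]->"
            f"({edge.target.properties['name']})")


print(describe(works_for))  # (Alice)-[WORKS_FOR]->(Acme Corp)
```

Note that the edge carries its own properties, which is what allows metadata such as a start date to live on the relationship rather than on either entity.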

Common Examples of Knowledge Graphs:

Consider Google’s knowledge graph, which has revolutionized how we search for information. When you search for “Steve Jobs,” Google doesn’t just match your keywords to web pages. Instead, it understands that Steve Jobs was the CEO of Apple and connects him to various products and regions. This allows Google to provide rich, contextual information in its search results, often answering your questions directly without requiring you to click through to websites.

Image: A Knowledge Graph of Steve Jobs
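
To illustrate the difference between keyword matching and following connections, here is a small sketch that stores facts as (subject, relation, object) triples and walks them. It is only a toy model under stated assumptions: the relation names and the helper functions (facts_about, two_hop) are hypothetical, and it says nothing about how Google's system is actually implemented.

```python
# Facts stored as (subject, relation, object) triples.
# The Apple-related facts are well-known examples chosen for illustration.
TRIPLES = [
    ("Steve Jobs", "CEO_OF", "Apple"),
    ("Apple", "MAKES", "iPhone"),
    ("Apple", "MAKES", "Macintosh"),
    ("Apple", "HEADQUARTERED_IN", "Cupertino"),
]


def facts_about(entity: str, triples=TRIPLES) -> list[tuple[str, str]]:
    """Return (relation, object) pairs for edges leaving `entity`."""
    return [(rel, obj) for subj, rel, obj in triples if subj == entity]


def two_hop(entity: str, triples=TRIPLES) -> list[tuple[str, str, str]]:
    """Follow one edge from `entity`, then list facts about each neighbour."""
    results = []
    for _, neighbour in facts_about(entity, triples):
        for rel, obj in facts_about(neighbour, triples):
            results.append((neighbour, rel, obj))
    return results


print(facts_about("Steve Jobs"))  # direct facts
print(two_hop("Steve Jobs"))      # facts reached through Apple
```

A search for "Steve Jobs" can therefore surface Apple's products and location even though none of those triples mention him by name.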

LinkedIn offers another compelling example of knowledge graphs. Their knowledge graph goes beyond simple professional connections as it is designed to understand the complex web of professional relationships. It comprehends how skills relate to jobs, how career paths typically progress, and how industry trends affect career opportunities. This deep understanding enables LinkedIn to provide sophisticated job recommendations and identify relevant professional connections that might not be obvious through traditional networking approaches.

Image: A Knowledge Graph of LinkedIn’s Network

Introduction to Neo4j

Neo4j is a leading open-source graph database management system designed to model, store, and analyse complex relationships between data points. Unlike traditional relational databases that organize data in tables with rows and columns, Neo4j uses a graph-based structure consisting of nodes, relationships (edges), and properties. This architecture makes it highly suitable for applications involving connected or semi-structured data.

Creating Knowledge Graphs with Neo4j

The different steps involved in creating Knowledge Graphs with Neo4j are:

Entity Identification
Begin by identifying different entities in your data source. This can be achieved using various libraries and strategies such as natural language processing, regular expressions, or domain-specific heuristics. The goal is to extract meaningful entities like people, organizations, locations, products, etc.
Defining Relationships
Once you have your entities, define how they interact with each other. This involves determining the nature of their relationships—whether hierarchical, associative, or causal. For example, a person might “work for” an organization, or a product might be “manufactured by” a company.
Data Modelling
Develop a graph data model where entities become nodes and relationships become edges. This model should accurately reflect the complexities and nuances of your data. It’s often an iterative process where you refine the schema to best represent the underlying information.
Data Ingestion
Import the structured data into Neo4j using appropriate tools. You can use CSV files with the Neo4j import tool or programmatically load data using libraries like py2neo or the official Neo4j driver for your preferred programming language. Map your entities and relationships into nodes and edges based on your graph model.
Graph Validation
After loading your data, validate the graph by running Cypher queries. Check that relationships are correctly defined and that there are no orphaned nodes or inconsistent data. This helps ensure that your knowledge graph accurately represents the intended information.
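
The steps above can be sketched end to end in Python. This is a minimal illustration, not a recommended pipeline: the regular expression, the "works for" sentence pattern, the labels, and the function names (extract_triples, to_cypher, load) are all assumptions made for the example. Only the load function touches the official neo4j Python driver, and it is defined but not called here because it needs a running server and real credentials.

```python
import re

# Hypothetical pattern: "<Person> works for <Organization>"
WORKS_FOR = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) works for "
    r"(?P<org>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def extract_triples(text: str) -> list[dict]:
    """Entity identification and relationship definition via a regex."""
    return [
        {"subject": m["person"], "subject_label": "Person",
         "rel": "WORKS_FOR",
         "object": m["org"], "object_label": "Organization"}
        for m in WORKS_FOR.finditer(text)
    ]


def to_cypher(triple: dict) -> tuple[str, dict]:
    """Data modelling: map a triple onto a parameterised MERGE statement.

    Labels and relationship types cannot be passed as Cypher parameters,
    so they are validated and interpolated; values go in as parameters.
    """
    for name in (triple["subject_label"], triple["rel"], triple["object_label"]):
        if not IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    query = (
        f"MERGE (a:{triple['subject_label']} {{name: $subject}}) "
        f"MERGE (b:{triple['object_label']} {{name: $object}}) "
        f"MERGE (a)-[:{triple['rel']}]->(b)"
    )
    return query, {"subject": triple["subject"], "object": triple["object"]}


# Graph validation: a query that lists nodes with no relationships at all.
ORPHAN_CHECK = "MATCH (n) WHERE NOT (n)--() RETURN n"


def load(uri: str, user: str, password: str, statements) -> None:
    """Data ingestion with the official driver (pip install neo4j)."""
    from neo4j import GraphDatabase  # imported lazily; needs a running server

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for query, params in statements:
                session.run(query, params)


text = "Alice Smith works for Acme Corp. Bob Lee works for Globex."
statements = [to_cypher(t) for t in extract_triples(text)]
for query, params in statements:
    print(query, params)
```

Using MERGE rather than CREATE means re-running the load does not duplicate nodes or relationships, which is convenient while the schema is still being refined.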

Challenges of Knowledge Graphs With Neo4j

While creating knowledge graphs in Neo4j helps users accurately depict entities and their corresponding relationships, there are certain challenges that users might encounter.

Handling Diverse Document Formats
Working with a wide array of document types (PDF, JSON, CSV, Excel, etc.) requires developing data pipelines that can handle various formats. Additionally, because the schema can vary and change across these formats, you must convert the data into a standardized, generic format suitable for integration into a knowledge graph.
Entity Extraction and Relevance
While building a knowledge graph, the primary goal is to extract meaningful entities and relationships. However, existing frameworks like spaCy might identify a broad set of entities, many of which are not relevant. This results in significant post-processing to filter out unnecessary data and isolate the entities that truly matter for your use case.
Domain Expertise Requirement
Creating accurate entities and mapping relationships effectively often requires deep domain expertise. Without subject matter experts guiding the extraction and mapping process, there is a risk of misinterpreting data, which can lead to irrelevant or inaccurate nodes and connections. Domain experts ensure that the knowledge graph truly reflects the nuances of the field, leading to a more reliable and actionable graph.
Lack of Automatic Knowledge Graph Creation
While Neo4j provides powerful tools for data import and modelling, it does not offer an out-of-the-box solution for automatic knowledge graph creation. Users must define their own schema, mapping rules, and data pipelines. This can be time-consuming and requires a thorough understanding of both the data and the domain to ensure that the resulting graph is meaningful and accurate.
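
The first two challenges can be made concrete with a short sketch: records arriving as CSV and JSON with different field names are converted into one generic record format, and entity types that are not relevant to the use case are filtered out. The field names, the whitelist, and the sample data are assumptions for illustration; the type names mimic the kind of labels an NER framework such as spaCy produces.

```python
import csv
import io
import json

# Hypothetical whitelist of entity types relevant to the use case
RELEVANT_TYPES = {"PERSON", "ORG"}


def from_csv(raw: str) -> list[dict]:
    """Read rows with `name,type` columns into the common record format."""
    return [{"name": r["name"], "type": r["type"].upper(), "source": "csv"}
            for r in csv.DictReader(io.StringIO(raw))]


def from_json(raw: str) -> list[dict]:
    """Read a JSON list whose items use `entity`/`label` keys."""
    return [{"name": item["entity"], "type": item["label"].upper(), "source": "json"}
            for item in json.loads(raw)]


def keep_relevant(records: list[dict]) -> list[dict]:
    """Drop entity types that do not matter for this graph."""
    return [r for r in records if r["type"] in RELEVANT_TYPES]


csv_data = "name,type\nAlice Smith,person\n2020,date\n"
json_data = ('[{"entity": "Acme Corp", "label": "org"}, '
             '{"entity": "42%", "label": "percent"}]')

records = keep_relevant(from_csv(csv_data) + from_json(json_data))
for record in records:
    print(record)
```

Each additional format needs its own small adapter, which is the pipeline effort the challenges above refer to.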

Introduction to Graphlit

Graphlit is a serverless RAG-as-a-Service platform designed for developers creating AI applications that utilize both structured and unstructured data. It simplifies development by managing the underlying infrastructure, which traditionally requires extensive integration of various components like vector databases and data pipelines.

Creating Knowledge Graphs with Graphlit

The different steps involved in creating Knowledge Graphs with Graphlit are:

Ingestion
Identifying the source content from its original location and caching it on cloud storage for later processing.
Indexing
Parsing the source content, and creating technical metadata which describes the content type, file type, file size, and specific properties such as document title, audio duration or image resolution. For some content types, this also includes creating a list of hyperlinks referenced by the source content.
Preparation
Preparing the cached content for processing by ML models or Application Programming Interfaces (APIs), by extracting text from documents or web pages, transcribing audio from media files, or resizing images.
Extraction
From the extracted document text or audio transcript, identifying named entities such as people, places or organizations, and connecting them with their source content in the Graphlit knowledge graph.
Enrichment
For extracted entities, enriching their metadata via third-party APIs such as Diffbot, Crunchbase, and Wikipedia to provide more precise entity resolution and deduplication, as well as additional structured data related to entities. For content, optionally ingesting linked content via link crawling.
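
To make the flow of these five stages easier to follow, the sketch below mimics them with plain Python functions operating in memory. This is not Graphlit's SDK or implementation: every function name is hypothetical, the dictionary stands in for cloud storage, the regex stands in for an ML-based entity extractor, and enrichment is reduced to simple name-based deduplication with no external API calls.

```python
import re

cache: dict[str, bytes] = {}  # stands in for cloud storage


def ingest(uri: str, content: bytes) -> str:
    """Ingestion: cache the source content under its URI."""
    cache[uri] = content
    return uri


def index(uri: str) -> dict:
    """Indexing: record technical metadata about the cached content."""
    return {"uri": uri, "file_size": len(cache[uri]),
            "file_type": uri.rsplit(".", 1)[-1]}


def prepare(uri: str) -> str:
    """Preparation: turn cached bytes into text (plain text only here)."""
    return cache[uri].decode("utf-8")


def extract(uri: str, text: str) -> list[dict]:
    """Extraction: naive capitalised-phrase matcher, linked to the source."""
    names = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+", text)
    return [{"name": n, "source": uri} for n in names]


def enrich(entities: list[dict]) -> dict[str, set[str]]:
    """Enrichment (deduplication only): merge mentions by normalised name."""
    merged: dict[str, set[str]] = {}
    for e in entities:
        merged.setdefault(e["name"].lower(), set()).add(e["source"])
    return merged


uri = ingest("notes.txt", b"Steve Jobs met Tim Cook. Later, Steve Jobs spoke.")
metadata = index(uri)
entities = extract(uri, prepare(uri))
graph = enrich(entities)
print(metadata)
print(graph)
```

The two mentions of the same person collapse into a single entry that still points back to its source content, which is the essence of the extraction and enrichment stages.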

Image: An overview of Graphlit

Benefits of Knowledge Graphs with Graphlit

Integrated Data Handling
Graphlit automates the ingestion of unstructured data from multiple sources, including websites, audio, video, and documents. It supports various formats and automatically transcribes audio content using advanced speech-to-text technology.
Knowledge Graph Creation
The platform builds a knowledge graph from ingested content, maintaining relationships and enabling semantic search capabilities. This allows for enhanced data retrieval and interaction with AI models.
Multi-Modal Support
Graphlit is LLM agnostic, supporting models from providers like OpenAI and Anthropic.
User-Friendly API
It offers high-level APIs that facilitate rapid development and enable quicker prototyping and deployment of AI applications.

Conclusion

Working with Neo4j is ideal when users have complete domain knowledge of the use case and a detailed understanding of the desired end output. In comparison, Graphlit stands out as a powerful end-to-end solution that changes how users handle knowledge management and graph creation, which makes it very useful for rapid prototyping. As a comprehensive content management system, Graphlit streamlines the entire process from data ingestion to knowledge graph implementation. Its automated knowledge graph creation capabilities significantly reduce the manual effort typically required, while offering extensive customization options to meet specific business needs.

Shyam Krishna Kirithivasan

As a Machine Learning Engineer, I’m deeply passionate about building intelligent systems, experimenting with new algorithms, and exploring emerging technologies. Outside of work, I enjoy playing badminton and unwinding with a good movie.