CrimeKGQA: A Crime Investigation System Based
on Retrieval-Augmented Generation and
Knowledge Graphs
Ka Lok Kuok, Hao Hui Liu, Wai Weng Lo
Hou Kong Middle School, Macau, S.A.R., China
20112037@hkms.hktedu.com, 20240370@hkms.hktedu.com, 2109@hkms.hktedu.com
Abstract: In this paper, we introduce CrimeKGQA, an investigative system that integrates Large Language Models (LLMs) with an improved Crime Knowledge Graph (KG) to enhance the accuracy and speed of crime investigation. Traditional LLMs are often prone to producing inaccurate responses when challenged with specialized domain knowledge (e.g., sensitive crime data), as such data typically belongs to specialized and privacy-sensitive domains beyond what is included in their pre-training datasets. This can have dire consequences in critical fields such as crime investigation. To address this issue, we create the CrimeKGQA system, which combines the Retrieval-Augmented Generation (RAG) framework with the Neo4j graph database. With this, CrimeKGQA generates precise Cypher queries on the crime investigation knowledge graph stored in Neo4j, and dynamically retrieves and incorporates the corresponding information to deliver high-quality and contextually grounded answers. To the best of our knowledge, this is the first work that aims to exploit the potential of large language models within a crime knowledge graph to build efficient and accurate functionality for crime investigation. The experimental results show that the CrimeKGQA system can assist in answering questions and aid the investigative process, validating the efficacy of our proposed method.
Index Terms: Knowledge Graphs (KGs), Retrieval-Augmented Generation Framework (RAG), Large Language Models (LLMs), Crime Investigation

I. INTRODUCTION
In the field of Natural Language Processing (NLP), LLMs such as ChatGPT and LLaMA have achieved remarkable results. Built on deep learning and pretrained on large-scale text data, these models have excellent natural language understanding and generation abilities. They excel in various tasks such as text generation, semantic understanding, machine translation, question answering, and text summarization [1], [2]. Despite their exceptional performance in various application contexts, LLMs have fundamental restrictions when it comes to handling proprietary or sensitive data, especially in the field of crime investigation.
Crime investigation is a highly specialized task that requires extreme accuracy and demands that analysts process large amounts of data, perform pattern recognition, and engage in complex reasoning. The traditional method of crime investigation involves law enforcement collaborating with multiple data sources to gather and analyze information. Still, the huge amount of crime data and the ever-increasing complexity of the forms of crime make traditional techniques inefficient and inaccurate. LLMs can therefore aid crime investigations, but their ability to provide sufficiently accurate and trustworthy support is constrained by their limited training data, which rarely includes proprietary crime data. They may even produce fabricated data that, if perceived as true, could mislead investigations and waste resources [3].
To address these issues, this work proposes an innovative crime investigation question-answering system, which we name CrimeKGQA. In this system, we combine the Retrieval-Augmented Generation (RAG) [4] framework with a Neo4j-based Knowledge Graph (KG) [5] to improve the effectiveness of LLMs in crime investigation. By using the LLM to parse complex natural language queries and grounding its answers in a structured knowledge graph, CrimeKGQA can support in-depth investigation over the crime knowledge graph.
The main contributions of this paper include:
1) Combining GPT-4o LLM with Neo4j Knowledge Graph for Crime Investigation Q&A: To our knowledge, this represents the first attempt to augment a large model with a crime knowledge graph in order to provide efficient and accurate crime investigation functionality.
2) Applying the Retrieval-Augmented Generation (RAG) Framework to a Crime Investigation Q&A System: With the RAG framework, the LLM dynamically fetches and merges data from the knowledge graph to generate correct answers instead of hallucinated content.
3) Utilizing an Interactive Visualization Framework to Support Intuitive Crime Investigation Analysis: Offering a graphical data visualization tool to help investigators understand and analyze crime data smoothly.
The structure of this paper is organized as follows: In Section II, we introduce the background related to large language models, knowledge graphs, and the retrieval-augmented generation framework. Related work is discussed in Section
III. The design and implementation of the CrimeKGQA system are described in Section IV. Section V presents the dataset we use, and Section VI presents the experimental design and results. Section VII concludes the study and outlines future work.

II. BACKGROUND
Information technology and artificial intelligence are evolving rapidly, with particularly significant progress in Natural Language Processing (NLP). Large Language Models (LLMs) and Knowledge Graphs (KGs) have reached a critical mass in their impact on innovation in several applications. The potential of these technologies is particularly prominent in the highly specialized and accuracy-demanding field of crime investigation. In this section, we review the development status, core principles, and applications of these technologies to crime investigation, to provide a theoretical foundation for the subsequent system design and implementation.

A. Large Language Models
Large Language Models (LLMs) are deep learning-based models for natural language processing. By pretraining on massive text data, they learn statistical properties and semantic information of language. Since BERT [6] and the GPT [7] series appeared, LLMs have performed very well in various NLP tasks, including machine translation, text generation, question answering, and sentiment analysis. GPT models are particularly popular due to their efficient parameter configurations and excellent performance.
The Transformer architecture is the core of LLMs: its self-attention mechanism effectively captures long-range dependencies, allowing the model to produce contextually relevant and semantically coherent text. However, LLMs still face challenges when handling specialized domain knowledge, mainly in the following aspects:
• Limited Knowledge Coverage: LLMs mainly rely on the publicly accessible datasets used in their pre-training. They normally lack adequate coverage and understanding of specialized knowledge and proprietary data in specific domains like crime investigation.
• Difficulties in Knowledge Updating: LLMs must be retrained to absorb the new knowledge and information that continuously emerges over time, so their knowledge loses timeliness between retraining runs. Such retraining is costly and difficult to carry out in practice.
• Risk of Generating Hallucinated Content: LLMs may produce inaccurate or even hallucinated content in fields where errors can have serious consequences, such as crime investigation.
To overcome these limitations, researchers have proposed various methods to enhance the professional knowledge and application capabilities of LLMs, including retrieval-augmented generation frameworks and fine-tuning.

B. Knowledge Graphs
A Knowledge Graph (KG) [8] is a graph-based data representation that models entities and their interrelationships as nodes (entities) and edges (relationships). KGs not only store a large amount of factual information but also provide capabilities for knowledge reasoning and querying, making them well suited to analyzing and understanding complex relationships between entities in areas such as crime analysis, medical diagnosis, and recommendation systems. In crime investigation, KGs can represent crime-related entities (such as suspects, victims, crime locations, and crime events) and the relationships between them (such as involved or known), providing precise knowledge support and reasoning.
The application of KGs in crime investigation offers the following advantages:
• Structured Representation: KGs can clearly represent entities and their relationships through nodes and edges, providing clarity that may be lacking in other data organization and management designs.
• Efficient Querying and Reasoning: Through graph database query languages, KGs support multi-level and multi-dimensional data analysis, making it possible to find and quickly analyze complex relations between entities.
• Knowledge Expansion: KGs can be updated continuously, so the knowledge coverage, depth, and support available to the system expand as new data is added.
• Semantic Connections: In KGs, entities and relationships carry clear semantics, which facilitates more precise knowledge reasoning and semantic understanding and further improves the system's intelligence level.
Among the most widely used graph databases, Neo4j [5] provides powerful graph data storage, querying, and analysis capabilities. Using the Cypher query language, it efficiently processes graph-structured data and supports complex graph query and analysis operations.

C. Retrieval-Augmented Generation (RAG)
The Retrieval-Augmented Generation (RAG) framework [4] is an architecture that combines a retriever and a generator model to enhance the performance of generation models in terms of answer accuracy and knowledge coverage. The RAG framework dynamically retrieves relevant information from external knowledge sources during answer generation, enabling the generation model to utilize the latest, specialized, and structured knowledge, thereby reducing the generation of
fabricated content and improving the accuracy and reliability of answers.
The core components of the RAG framework include:
• Retriever: Responsible for retrieving information relevant to the user's query from external knowledge sources (such as document libraries and databases). The retriever typically employs vector retrieval techniques, converting queries into vector representations and finding the most similar vectors in the knowledge source to quickly locate relevant information.
• Generator: Based on the information provided by the retriever, the generator produces natural language answers. The generator is usually an LLM capable of understanding and integrating the retrieved information to generate coherent, accurate, and contextually relevant responses.
Currently, the RAG framework predominantly uses vector-based retrieval techniques, converting user queries into vector representations to retrieve relevant information. However, RAG can also utilize graph databases like Neo4j as retrievers, offering unique advantages in handling structured and highly relational data. In the RAG framework, a Neo4j-based retriever converts the user's natural language query into a structured Cypher query, extracts relevant nodes and relationships from the graph database, and provides this structured data to the generator model (such as GPT) to produce coherent and accurate answers.

III. RELATED WORK
Previous work in the field of crime investigation has aimed at applying natural language processing, graph databases, and machine learning tools to improve the functionality of question-answering systems. For example, Dasgupta et al. [9] presented a system for extracting text related to crime using named entity recognition and dependency parsing to find crime patterns through topic modeling and sentiment analysis. Wang et al. [10] analyzed crime data using a graph neural network-based approach and predicted crime hotspots based on spatiotemporal analysis. Elezaj et al. [11] proposed a framework based on knowledge graphs to analyze crime information for controlling criminal activities on online social networks. Furthermore, Shi et al. [12] created a knowledge graph for job-related crimes and mapped it to Neo4j to assist in developing the prosecution system. Sarzaeim et al. [13] utilized BART, GPT-3, and GPT-4 LLMs in crime classification and prediction tasks using zero-shot prompting, few-shot prompting, and refinement, demonstrating the superior performance of LLMs in smart policing tasks. Nikolakopoulos et al. [14] proposed the use of RAG LLMs in the forensic science domain, introducing an Investigation Enhancement Model (IEM) that combines biometric data with LLMs to enhance forensic data analysis while addressing challenges related to computational complexity and bias control. However, no prior work has systematically combined the RAG framework with KGs and LLMs to create a crime investigation system. CrimeKGQA achieves this by integrating the RAG framework with a crime knowledge graph stored in Neo4j, utilizing GPT-4o.

IV. METHODOLOGY
Fig. 1: CrimeKGQA Architecture

A. Architecture
CrimeKGQA is a multi-layered architecture that integrates several core components to provide a rich, intuitive interface for retrieving crime data stored in the Neo4j graph database. The system flow is illustrated in Fig. 1. First, the user asks a question through the interface. The LLM takes the question as input and turns it into a Cypher query, which is then sent to the Neo4j database; the database returns the requested information. Finally, the LLM processes the received results to generate an answer and returns it to the user. Decoupling response generation from data retrieval allows each of the two components to be optimized independently according to its strengths.

B. User Interface
The user interface is the main bridge between the user and the system and a core component of CrimeKGQA. It is developed with Streamlit [15], an open-source framework for machine learning applications that is well suited to building intuitive web applications. The aim is to provide a user-friendly interface with smooth navigation and query submission. Specific functionalities include:
• Text Input Field: Allows users to enter their natural language queries.
• Chat Message Display Area: Displays the conversation history between the user and the system in real time.

C. Knowledge Retrieval Component
The knowledge retrieval component retrieves relevant data from the Neo4j graph database. This component uses LangChain's [16] GraphCypherQAChain to query the database efficiently. The specific implementation steps are as follows:
1) Cypher Query Generation: When a user submits a question, the LLM is prompted, according to a prompt template, to produce a Cypher query that retrieves the relevant data from the Neo4j knowledge graph.
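As a rough illustration of the Cypher Query Generation step, the sketch below fills a prompt template with a graph schema and the user's question. The template wording, the `build_cypher_prompt` helper, and the schema excerpt are assumptions made for illustration (the schema reuses node labels and relationship types mentioned in this paper); this is not the prompt used in CrimeKGQA, and the sketch calls neither LangChain nor an LLM.

```python
from string import Template

# Hypothetical prompt template; the actual CrimeKGQA prompt is not published here.
CYPHER_PROMPT = Template(
    "You translate questions into Cypher for a Neo4j crime graph.\n"
    "Schema:\n$schema\n"
    "Return only one read-only Cypher statement, with no explanation.\n"
    "Question: $question\n"
)

# Assumed schema excerpt built from labels and relationship types named in the paper.
SCHEMA = (
    "(:Person)-[:PARTY_TO]->(:Crime)\n"
    "(:Person)-[:KNOWS]->(:Person)\n"
    "(:Person)-[:FAMILY_REL]->(:Person)"
)


def build_cypher_prompt(question: str, schema: str = SCHEMA) -> str:
    """Fill the template with the graph schema and the user's question."""
    return CYPHER_PROMPT.substitute(schema=schema, question=question.strip())


if __name__ == "__main__":
    print(build_cypher_prompt("Which people are involved in multiple crimes?"))
```

Passing the schema explicitly in the prompt is one common way to keep generated queries restricted to labels and relationship types that exist in the graph; the resulting string would then be sent to the LLM of choice.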
2) Query Execution: The generated Cypher query is executed on the Neo4j graph database to fetch the nodes and relationships that satisfy its criteria.
3) Result Parsing: Finally, the LLM parses the retrieved results, turns them into a natural language response, and presents the user with the relevant information.

D. Implementation of the System
The CrimeKGQA system is implemented using several tools that together support effective crime investigation. The specific technologies include:
• Streamlit [15]: Used to build the user interface as an interactive web application in which users can input their questions and retrieve answers.
• LangChain [16]: A framework for combining LLMs with external tools and data sources. LangChain's GraphCypherQAChain is utilized to convert natural language queries into Cypher queries for interactions with the Neo4j KG.
• Neo4j [5]: A graph database platform used for storing and managing the crime investigation knowledge graph. It has powerful querying capabilities and can efficiently manage complex data relationships.
• GPT-4o [17]: We utilise GPT-4o, a large language model based on the Transformer architecture and trained on large-scale datasets. Its natural language understanding and generation capabilities have been widely used in automatic text generation, machine translation, intelligent question answering, and content recommendation.

V. DATASET
This study utilizes a crime investigation graph dataset provided by Neo4j [18]. The dataset covers Manchester, UK, and is based on the POLE (Person, Object, Location, Event) model. The specific characteristics of the data are as follows:
• Number of Entities: The dataset includes 61,521 nodes across several entity types (suspects, victims, locations, crime events, and evidence).
• Number of Relationships: The dataset has 105,840 relationships capturing complex interactions between entities, including personal relationships and participation in crimes.

VI. EXPERIMENTS
We selected a series of complex crime investigation questions that require in-depth traversal of the knowledge graph and multi-level reasoning to comprehensively assess the system's capabilities.

A. Query Execution and Result Retrieval
To comprehensively evaluate the performance of the CrimeKGQA system, we designed two complex queries:
1) Query 1: “Identify people involved in multiple crimes, and find their connections.”: The purpose of selecting this query is to test the system's ability to handle complex requests that require deep traversal of the KG and reasoning about interconnected data.
a) Process:
1) Natural Language Understanding: After the user inputs the query through the Streamlit interface, the GPT-4o LLM parses the question, identifying key entities and relationships involved, particularly individuals participating in multiple crimes and their interconnections.
2) Cypher Query Generation: Based on the parsing results, the system uses LangChain's GraphCypherQAChain to generate a Cypher query targeting the Neo4j KG. This query aims to match Person nodes connected to multiple Crime nodes and identify the relationships between these Person nodes.
3) Data Retrieval: The generated Cypher query is executed on the Neo4j KG to retrieve individuals involved in multiple crimes and their detailed connection information, including KNOWS, FAMILY_REL, and joint crime participation relationships.
4) Answer Generation: The GPT-4o LLM processes the retrieved results to generate a natural language answer, detailing the identified individuals and their relationships, and providing relevant insights.
b) Experimental Results:
Fig. 2: Response of Criminal Relationship
c) Text Output: As shown in Fig. 2, the GPT-4o-generated answer provides a detailed explanation of the investigation results, including the identified participants in multiple crimes and their mutual relationships. For example:
“According to the investigation results, Amy is involved in multiple crime cases. Amy has a KNOWS relationship with Pamela and Raymond, and a FAMILY_REL relationship with Bardon.”
d) Graphical Visualization: Using Neo4j's native visualization tools, a graphical representation of the crime network was generated from the generated query, as shown in Fig. 3. The graph displays:
• Nodes: Represent entities such as Person, Crime, Location, etc.
• Edges: Depict relationships like PARTY_TO, KNOWS, FAMILY_REL, etc.
This visualization allows users to intuitively see the complex connections between crime participants, facilitating the discovery of potential crime patterns and relationships.
Fig. 3: Graphical representation of relationship
2) Query 2: “Please analyze and identify the geographical areas where crime events consistently occur, taking into account various factors such as time of day, type of crime, and demographic influences, and subsequently determine the common patterns that emerge from this data.”:
a) Process:
1) Natural Language Understanding: After the user inputs the query through the Streamlit interface, the GPT-4o LLM parses the question, identifying key entities and relationships, particularly those related to geographical areas of crime events, time, type of crime, and demographic factors.
2) Cypher Query Generation: Based on the parsing results, the system uses LangChain's GraphCypherQAChain to generate a Cypher query targeting the Neo4j KG. This query aims to extract crime events related to specific geographical areas while considering time periods, crime types, and relevant demographic data.
3) Data Retrieval: The generated Cypher query is executed on the Neo4j KG to retrieve crime event data related to the target geographical areas, including specific times, types of crimes, and associated demographic characteristics.
4) Data Analysis and Pattern Recognition: The retrieved data is analyzed to identify which time periods and types of crimes are most frequent in specific geographical areas. Demographic influences are also analyzed to extract common crime patterns and trends.
5) Answer Generation: The GPT-4o LLM generates a natural language answer based on the analysis results, detailing the identified high-crime-rate geographical areas and their related factors, and providing insights into common patterns, such as the high occurrence of certain types of crimes during specific time periods and their association with population structures.
b) Experimental Results: The GPT-4o-generated answer provides a detailed explanation of the investigation results, identifying the regions where crimes, especially violent and sexual crimes, are concentrated, such as the M1, M6, BL2, OL16, OL1, BL1, and M40 areas. Notably, the M1 area exhibited a high number of incidents on multiple dates in August 2017. These patterns indicate that certain areas are more prone to specific types of crimes, highlighting the need for targeted interventions and further analysis of demographic and temporal factors.

VII. CONCLUSION
In this paper, we introduce the CrimeKGQA system, which combines GPT-4o, a large language model, with a crime knowledge graph stored in Neo4j and employs Retrieval-Augmented Generation (RAG) to build an efficient and accurate crime investigation question-answering system. Our experimental results suggest that combining LLMs with KGs is a promising approach for improving both answer accuracy and query efficiency in crime investigations. Future work will focus on improving system performance, expanding the scope of the knowledge graph, and incorporating additional data sources to enhance the system's comprehensiveness and usefulness. We also plan to introduce advanced visualization techniques to support more sophisticated crime network analysis and pattern recognition.

REFERENCES
[1] W. Zhu, H. Liu, Q. Dong, et al., “Multilingual machine translation with large language models: Empirical results and analysis,” arXiv preprint arXiv:2304.04675, 2023.
[2] Y. Chang, X. Wang, J. Wang, et al., “A survey on evaluation of large language models,” ACM Transactions on Intelligent Systems and Technology, vol. 15, no. 3, pp. 1–45, 2024.
[3] R. Azamfirei, S. R. Kudchadkar, and J. Fackler, “Large language models and the perils of their hallucinations,” Critical Care, vol. 27, no. 1, p. 120, 2023.
[4] Y. Gao, Y. Xiong, X. Gao, et al., “Retrieval-augmented generation for large language models: A survey,” arXiv preprint arXiv:2312.10997, 2023.
[5] J. J. Miller, “Graph database applications and concepts with Neo4j,” in Proceedings of the Southern Association for Information Systems Conference, Atlanta, GA, USA, vol. 2324, 2013, pp. 141–147.
[6] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2019.
[7] T. B. Brown, “Language models are few-shot learners,” arXiv preprint arXiv:2005.14165, 2020.
[8] X. Chen, S. Jia, and Y. Xiang, “A review: Knowledge reasoning over knowledge graph,” Expert Systems with Applications, vol. 141, p. 112948, 2020.
[9] T. Dasgupta, A. Naskar, R. Saha, and L. Dey, “Crime-
profiler: Crime information extraction and visualization
from news media,” in Proceedings of the international
conference on web intelligence, 2017, pp. 541–549.
[10] Y. Wang, L. Ge, S. Li, and F. Chang, “Deep tempo-
ral multi-graph convolutional network for crime pre-
diction,” in Conceptual Modeling: 39th International
Conference, ER 2020, Vienna, Austria, November 3–6,
2020, Proceedings 39, Springer, 2020, pp. 525–538.
[11] O. Elezaj, S. Y. Yayilgan, E. Kalemi, L. Wendelberg,
M. Abomhara, and J. Ahmed, “Towards designing
a knowledge graph-based framework for investigat-
ing and preventing crime on online social networks,”
in E-Democracy–Safeguarding Democracy and Human
Rights in the Digital Age: 8th International Conference,
e-Democracy 2019, Athens, Greece, December 12-13,
2019, Proceedings 8, Springer, 2020, pp. 181–195.
[12] S. Yong, A. Wenlu, X. Jiayu, and Q. Yi, “A knowledge
graph constructed for job-related crimes,” Procedia
Computer Science, vol. 199, pp. 540–547, 2022.
[13] P. Sarzaeim, Q. H. Mahmoud, and A. Azim, “Ex-
perimental Analysis of Large Language Models in
Crime Classification and Prediction,” Proceedings of the
Canadian Conference on Artificial Intelligence, 2024,
https://caiac.pubpub.org/pub/flaj2ttj.
[14] A. Nikolakopoulos, S. Evangelatos, E. Veroni, et al.,
“Large language models in modern forensic investi-
gations: Harnessing the power of generative artificial
intelligence in crime resolution and suspect identifica-
tion,” in 2024 5th International Conference in Elec-
tronic Engineering, Information Technology & Educa-
tion (EEITE), IEEE, 2024, pp. 1–5.
[15] M. Khorasani, M. Abdou, and J. H. Fernández, Web
application development with streamlit: Develop and
deploy secure and scalable web applications to the
cloud using a pure Python framework. Springer, 2022.
[16] A. Kansal, “Langchain: Your swiss army knife,” in
Building Generative AI-Powered Apps: A Hands-on
Guide for Developers, Springer, 2024, pp. 17–40.
[17] R. Islam and O. M. Moushi, “Gpt-4o: The cutting-edge
advancement in multimodal llm,” Authorea Preprints,
2024.
[18] neo4j-graph-examples, “pole: Crime investigation -
explore connections in crime data using the POLE
(Person, Object, Location, Event) model in a public
dataset from Manchester, U.K.,” GitHub, 2020.
[Online]. Available:
https://github.com/neo4j-graph-examples/pole.