Usage of K

The document discusses using k-means clustering on crime data to predict crime prone areas and visualize crime trends. It describes clustering crime data to group similar crime types and determine low, medium, and high crime areas. The methodology section explains the k-means clustering algorithm and system architecture. Modules for admin and users are outlined.

Uploaded by

Anjali sinha

CRIME RATE PREDICTION USING K-MEANS

Abstract
Crime analysis is a set of techniques that help police departments
become more effective through better information. Our proposed system
helps to predict the probability of crime occurrence in a city by analyzing
a crime dataset and visualizing the results on a Google map for better
understanding. It predicts the places where crime is most likely to
occur. This is done using the K-means clustering algorithm, which groups
similar objects into clusters. The system can be used by police for
crime prediction and for taking proactive steps to catch criminals.

Keywords: Crime, Data mining, Clustering, K-Means, Google Map, Web application.

Introduction
Nowadays, the crime rate is increasing day by day, and governments worldwide are working to
reduce it. Data mining is well suited to large datasets: its techniques make use of historical
data, and they can benefit police departments by analyzing that data and extracting knowledge
from it. Data mining comprises many techniques, such as clustering, classification and
prediction.

Clustering groups objects with similar attributes from a dataset for further analysis, and one
of the algorithms used for clustering is k-means. Most of the crimes in the dataset are unsolved
and their records contain empty values, whereas classification would need complete, labelled
records, which led us to choose clustering over classification. K-means is known as one of the
simplest and fastest clustering algorithms, and it is a good choice for clustering a high volume
of data because of its low computational cost. In this project we create a web application with
five features that can be helpful for police. The project uses a crime dataset for the city of
Chicago, USA, and we implemented the k-means algorithm in NetBeans to cluster it. The results
show the crime-prone areas together with the rate of crime there, and determine the percentage
of a particular crime occurring in a particular area. The system also visualizes the results for
each crime on a Google map, including the type, number and time, so that the police can plan
strategies for crime reduction in advance.

Literature Review

Aditya Raj [1] proposed a system for crime analysis that starts with the extraction of crime
patterns, predicts crime using the k-means algorithm, and ends with crime detection, using the
RapidMiner tool.

Sayyad Wahab [2] studied the existing system used by the Indian police and highlighted its
basic features, then introduced a criminal analysis tool based on data mining techniques that
helps police departments carry out their activities efficiently.

Aditya Raj [3] proposed a crime analysis tool for the Indian scenario and showed that it is
effective for identifying crime patterns and for prediction. To build the tool, he developed a
data cleaning algorithm and enhanced clustering algorithms (K-means, DBSCAN).

Sayyad Wahab [4] proposed a framework for crime and criminal data analysis. He used decision
tree algorithms for data classification and simple K-means for data clustering. The results are
used to predict trends and behavior relating crimes and criminals.

Aditya Raj [5] pointed out the importance of criminal investigators all over the world
identifying crime trends and patterns and analyzing them tactically.

Sayyad Wahab [6] studied the use of clustering in data mining to analyze crime patterns, which
helped to recognize which types of crime occur most often and showed the distribution of each
crime type in every area category.

Methodology

Data mining techniques can be applied to crime analysis: they help to take advantage of
historical data and extract knowledge from it, and therefore help to make better decisions. A
data mining approach such as clustering divides the data into groups in which similar objects
are placed together. In this system, clustering groups the same crime types together, so that
crimes such as “murder” are grouped together, as are “rape” cases, and so on. In addition, the
system classifies areas as low, medium or high crime based on the dataset.
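
The text above does not state how the low, medium and high categories are defined. The
following is only a minimal sketch in Java, assuming the crimes have already been counted per
area and assuming thresholds at one third and two thirds of the largest count. The class name,
method name and thresholds are illustrative assumptions, not taken from the implemented system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: names and thresholds are assumptions.
class CrimeLevelSketch {

    // Labels each area LOW, MEDIUM or HIGH by comparing its crime count with
    // one third and two thirds of the largest count among all areas.
    static Map<String, String> label(Map<String, Integer> crimesPerArea) {
        int max = 0;
        for (int count : crimesPerArea.values()) {
            max = Math.max(max, count);
        }
        Map<String, String> levels = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : crimesPerArea.entrySet()) {
            int count = entry.getValue();
            String level;
            if (3 * count <= max) {
                level = "LOW";          // at most one third of the maximum
            } else if (3 * count <= 2 * max) {
                level = "MEDIUM";       // at most two thirds of the maximum
            } else {
                level = "HIGH";
            }
            levels.put(entry.getKey(), level);
        }
        return levels;
    }
}
```

Other definitions, for example fixed thresholds agreed with the police department, would fit the
same method signature.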

The K-means process consists of the following steps:

[1] Choose the number of clusters, k, as the initial step.

[2] Choose a set of k instances as the centers of the clusters.

[3] Assign each instance to the cluster whose center is closest.

[4] Recalculate the cluster centroids, either after a whole cycle of re-assignment or after
each individual instance assignment.

[5] Repeat steps 3 and 4 until the assignments no longer change.
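
The steps above can be sketched in Java as follows. This is a minimal illustration rather than
the project's actual code: the class and method names are hypothetical, the distance measure is
assumed to be Euclidean, the initial centers are assumed to be k distinct instances picked at
random, and centroids are recalculated after each whole cycle of re-assignment. It also assumes
k is not larger than the number of points.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only: names and design choices are assumptions.
class KMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Step 3 helper: index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = squaredDistance(point, centroids[0]);
        for (int c = 1; c < centroids.length; c++) {
            double distance = squaredDistance(point, centroids[c]);
            if (distance < bestDistance) {
                best = c;
                bestDistance = distance;
            }
        }
        return best;
    }

    // Runs k-means and returns the cluster index assigned to each point.
    static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        int n = points.length;
        int dim = points[0].length;
        Random random = new Random(seed);

        // Step 2: pick k distinct instances as initial centers (partial shuffle).
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + random.nextInt(n - c);
            int tmp = order[c]; order[c] = order[j]; order[j] = tmp;
            centroids[c] = points[order[c]].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Step 3: assign every instance to its closest center.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) break; // Step 5: stop once assignments are stable.

            // Step 4: recalculate each centroid as the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old center
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }
}
```

For crime locations, each point could be a {latitude, longitude} pair, for example
KMeansSketch.cluster(points, 10, 100, 42L); the iteration limit and seed here are arbitrary.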

The algorithm requires the number of clusters, k, to be specified in advance. It cannot handle
noisy data and outliers well, and it is not suitable for discovering clusters with non-convex
shapes.

Fig. 1. K-means Clustering Algorithm

K-means was chosen here because it is easy to implement in Java, and because
its simplicity and speed are very appealing in practice. It is suitable for a
high-volume crime dataset and can help to extract useful information.

Fig. 2. System Architecture

The dataset, imported from the Chicago Police Department, includes 6.54M rows of
committed crimes from 2001 until the present and 22 columns (id, date, type,
location...).

We import this data into NetBeans and implement the k-means algorithm in Java,
aiming for an optimized implementation that ensures efficiency and speed as well
as accuracy. The next step is to choose the number of clusters, which is somewhat
subjective in general. Because we are dealing with a huge dataset, we chose 10
clusters for better prediction. We can then visualize the clusters created by
k-means on the Google map and determine the crime-prone areas; each cluster is
marked with a number indicating the crimes included in it. The system then
visualizes the places where a specific crime occurs (for example, more theft
incidents occur at tourist destinations). This is the main function, which helps
the police to focus on particular areas and take more precautions to prevent
future crimes and hence reduce the crime rate. We can also visualize all the
crime locations of the city on the Google map for a good understanding of the
situation. An additional feature shows the percentage of a specific crime that
occurs in a specific part of the city; for example, “rape” incidents are twice as
frequent in villages as in cities. In addition, performance features are needed
to monitor the crime rate and see whether it is increasing or decreasing.
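
The percentage feature described above can be illustrated with a short Java sketch. It assumes
each record has been reduced to a crime type and an area label; the Incident class, its field
names and the method name are hypothetical and are not taken from the dataset's 22 columns.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: class, field and method names are assumptions.
class CrimeShareSketch {

    // Hypothetical minimal record holding only the two fields needed here.
    static final class Incident {
        final String type;
        final String area;

        Incident(String type, String area) {
            this.type = type;
            this.area = area;
        }
    }

    // For one crime type, returns the percentage of its incidents that
    // occurred in each area (areas with no such incident are omitted).
    static Map<String, Double> shareByArea(List<Incident> incidents, String crimeType) {
        Map<String, Integer> counts = new TreeMap<>();
        int total = 0;
        for (Incident incident : incidents) {
            if (incident.type.equals(crimeType)) {
                counts.merge(incident.area, 1, Integer::sum);
                total++;
            }
        }
        Map<String, Double> shares = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            shares.put(entry.getKey(), 100.0 * entry.getValue() / total);
        }
        return shares;
    }
}
```

Running the same count separately for each year would give a simple series for checking whether
a crime is increasing or decreasing.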

Finally, reports containing the crime rate graphs and map images must be generated
and made available for download.

Module Description

The proposed system consists of two modules, the Admin Module and the User Module.
In the Admin Module, the admin's work is to prepare the dataset and upload it to
make it available to authorized users. The crime dataset should not contain any
empty values, and some datasets contain a large number of attributes, so we filter
our dataset according to certain requirements to ensure accuracy in the clustering
process.
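
As an illustration of this filtering, the Java sketch below drops every row in which a required
column is missing or blank. It assumes each row has already been split into an array of column
values; the class name, method name and the choice of required columns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and the row representation are assumptions.
class DatasetFilterSketch {

    // Keeps only the rows in which every required column index exists
    // and holds a non-blank value.
    static List<String[]> dropIncomplete(List<String[]> rows, int[] requiredColumns) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            boolean complete = true;
            for (int column : requiredColumns) {
                if (column >= row.length
                        || row[column] == null
                        || row[column].trim().isEmpty()) {
                    complete = false;
                    break;
                }
            }
            if (complete) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```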

In the User Module, the user determines the number of clusters, which depends on
the method used for measuring similarity and the parameters used for partitioning.
The user then applies the k-means algorithm to the dataset, analyzes the crime
patterns and monitors the crime rate, and generates the reports required by
employees of the police department. Many users can thus share the work of
analyzing the clustering results and trying different numbers of clusters to
ensure accuracy, and finally they can identify common crime patterns with the
help of the system.
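
The text does not say how results for different numbers of clusters are compared. One common
heuristic, named here as an assumption rather than as the system's method, is to compute the
within-cluster sum of squares (WCSS) for each candidate k and look for the value of k beyond
which it stops improving noticeably. A minimal Java sketch, with hypothetical names, that scores
one clustering result:

```java
// Illustrative sketch only: the WCSS criterion and all names are assumptions.
class ClusterQualitySketch {

    // Within-cluster sum of squares: the total squared Euclidean distance
    // from every point to the mean of the cluster it is assigned to.
    static double wcss(double[][] points, int[] assignment, int k) {
        int dim = points[0].length;
        double[][] centroids = new double[k][dim];
        int[] counts = new int[k];

        // Accumulate coordinate sums per cluster, then divide to get means.
        for (int i = 0; i < points.length; i++) {
            counts[assignment[i]]++;
            for (int d = 0; d < dim; d++) {
                centroids[assignment[i]][d] += points[i][d];
            }
        }
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) continue; // no members, nothing to average
            for (int d = 0; d < dim; d++) {
                centroids[c][d] /= counts[c];
            }
        }

        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            for (int d = 0; d < dim; d++) {
                double diff = points[i][d] - centroids[assignment[i]][d];
                total += diff * diff;
            }
        }
        return total;
    }
}
```

Because WCSS tends to shrink as k grows, the smallest value alone is not a good criterion; the
useful signal is where the decrease levels off.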
