Unit - 5
Data Mining Trends and Research Frontiers: Other methodologies of data mining: Web mining, temporal mining, spatial mining, statistical data mining, visual and audio data mining; Data mining applications; Data mining and society: Ubiquitous and invisible data mining; Privacy, security, and social impacts of data mining
Similarity Search:
Data Reduction
Indexing Methods
Similarity Search Methods
Query Languages
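As a hedged illustration of how data reduction, indexing, and similarity search fit together for time-series data, the sketch below assumes Euclidean distance as the similarity measure and uses Piecewise Aggregate Approximation (PAA) as the data-reduction step. The notes do not name a specific technique, and all function names are hypothetical. Because the scaled distance between two reduced series never exceeds the true distance, any candidate whose reduced-space distance is already no better than the best match so far can be skipped without losing accuracy.

```python
import math
from typing import List, Sequence, Tuple


def paa(series: Sequence[float], w: int) -> List[float]:
    """Reduce a series to w segment means (Piecewise Aggregate Approximation)."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes len(series) is divisible by w")
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(qa: Sequence[float], ca: Sequence[float], n: int) -> float:
    """Scaled distance between reduced series; never exceeds the true distance."""
    return math.sqrt(n / len(qa)) * euclidean(qa, ca)


def nearest_neighbor(query: Sequence[float], database: Sequence[Sequence[float]],
                     w: int) -> Tuple[int, float, int]:
    """Return (index, distance, full_checks) of the closest series in database."""
    n = len(query)
    q_red = paa(query, w)
    best_idx, best_dist, full_checks = -1, math.inf, 0
    for i, cand in enumerate(database):
        if paa_lower_bound(q_red, paa(cand, w), n) >= best_dist:
            continue  # pruned: the true distance cannot beat the current best
        full_checks += 1
        d = euclidean(query, cand)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, full_checks
```

A real system would precompute the reduced series and store them in a multidimensional index (for example, an R-tree) instead of scanning the list linearly; the linear scan here only keeps the example short.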
Biological sequences are sequences of nucleotides (DNA, RNA) or amino acids (proteins). In bioinformatics and modern biology, biological sequence analysis compares, aligns, indexes, and analyzes such sequences, and it plays a crucial role in both fields. Sequence alignment can be partitioned into two tasks: pairwise sequence alignment and multiple sequence alignment.
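To make the pairwise alignment task concrete, the sketch below computes a global alignment of two sequences with the Needleman-Wunsch dynamic-programming method and recovers one optimal alignment by traceback. The scoring scheme (match +1, mismatch -1, gap -2) is an illustrative assumption, not something these notes specify; practical tools use substitution matrices such as BLOSUM or PAM and affine gap penalties.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("GATTACA", "GATACA"))
```

Local alignment (the problem BLAST addresses heuristically) differs mainly in clamping cell scores at zero and starting the traceback from the highest-scoring cell rather than the corner.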
Biological Sequence Methods:
Alignment of Biological Sequences:
Pairwise Alignment
The BLAST Local Alignment Algorithm
Multiple Sequence Alignment Methods
Biological Sequence Analysis Using a Hidden Markov Model:
Markov Chain
Hidden Markov Model
Forward Algorithm
Viterbi Algorithm
Baum-Welch Algorithm
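As a sketch of the Viterbi algorithm listed above, the code below finds the most probable hidden-state path for a DNA sequence under a small two-state hidden Markov model. The states (H for a GC-rich region, L for an AT-rich region) and all probabilities are hypothetical values chosen for illustration. The computation is done in log space to avoid numerical underflow on long sequences.

```python
import math
from typing import Dict, List, Tuple


def viterbi(obs: str, states: List[str], start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Return the most probable state path for obs and its log-probability."""
    # v[t][s]: log-probability of the best path that ends in state s at time t
    v = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]  # back[t][s]: best predecessor of s at t
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            prev, best = max(((p, v[t - 1][p] + math.log(trans[p][s])) for p in states),
                             key=lambda pair: pair[1])
            v[t][s] = best + math.log(emit[s][obs[t]])
            back[t][s] = prev
    # Start from the best final state and follow back-pointers to time 0
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, v[-1][last]


# Hypothetical two-state model: H = GC-rich region, L = AT-rich region
STATES = ["H", "L"]
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

path, log_p = viterbi("GCGCATAT", STATES, START, TRANS, EMIT)
print("".join(path))  # prints HHHHLLLL
```

The forward algorithm has the same structure but sums over predecessor states instead of taking the maximum, which yields the total probability of the sequence rather than a single best path.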
Applications of Data Mining:
Financial Information Analysis:
Loan payment prediction/consumer credit policy analysis
Design and construction of data warehouses
Financial data collected in banks and financial institutions are often relatively complete, reliable, and of high quality.
Retail Industry:
Multidimensional analysis (sales, customers, products, time, etc.)
Sales campaign analysis
Customer retention
Product recommendation
Using visualization tools for data analysis
Science and Engineering:
Data warehouses and data preprocessing
Mining complex data types
Network-based mining
Graph-based mining
Trends of Data Mining:
Exploration of applications: addressing application-specific issues
Data mining approaches that are scalable and interactive
Data mining integration with Web search engines, database systems, data warehouse systems, and cloud
computing systems
Mining social and information networks
Mining spatiotemporal, moving objects, and cyber-physical systems
Mining multimedia, text, and web data
Mining biological and biomedical data
Visual and audio data mining
Distributed data mining and real-time data stream mining
Other methodologies of data mining: Web mining, temporal mining, spatial mining, statistical data mining, visual and audio data mining
Web mining is the application of data mining techniques to automatically discover and extract information from Web documents and services. Its main purpose is to discover useful information from the World Wide Web and its usage patterns.
Applications of Web Mining:
Web mining is the process of discovering patterns, structures, and relationships in web data. It involves
using data mining techniques to analyze web data and extract valuable insights. The applications of web
mining are wide-ranging and include:
Personalized marketing:
Web mining can be used to analyze customer behavior on websites and social media platforms. This
information can be used to create personalized marketing campaigns that target customers based on their
interests and preferences.
E-commerce:
Web mining can be used to analyze customer behavior on e-commerce websites. This information can be
used to improve the user experience and increase sales by recommending products based on customer
preferences.
Search engine optimization:
Web mining can be used to analyze search engine queries and search engine results pages (SERPs). This
information can be used to improve the visibility of websites in search engine results and increase traffic to
the website.
Fraud detection:
Web mining can be used to detect fraudulent activity on websites. Detecting such activity early helps prevent financial fraud, identity theft, and other types of online fraud.
Sentiment analysis:
Web mining can be used to analyze social media data and extract sentiment from posts, comments, and
reviews. This information can be used to understand customer sentiment towards products and services and
make informed business decisions.
Web content analysis:
Web mining can be used to analyze web content and extract valuable information such as keywords,
topics, and themes. This information can be used to improve the relevance of web content and optimize
search engine rankings.
Customer service:
Web mining can be used to analyze customer service interactions on websites and social media platforms. This information can be used to identify weaknesses in customer service and improve its quality.
Healthcare:
Web mining can be used to analyze health-related websites and extract valuable information about
diseases, treatments, and medications. This information can be used to improve the quality of healthcare
and inform medical research.
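One common way to extract keywords from web content, as described under web content analysis, is TF-IDF weighting: a term scores highly on a page if it occurs often on that page but rarely across the other pages. The sketch below is a minimal version under simplifying assumptions (lowercase alphabetic tokens only, no stop-word removal or stemming); the function names and sample pages are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(docs: List[str], k: int = 3) -> List[List[str]]:
    """Return the k highest TF-IDF terms for each document (ties broken alphabetically)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of pages that contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency times inverse document frequency
        scores = {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


pages = ["cheap flights cheap hotels", "hotels near the beach", "flights to paris"]
print(top_keywords(pages, k=1))  # [['cheap'], ['beach'], ['paris']]
```

A term that appears on every page gets an IDF of zero, which is why frequent but uninformative words drop out even without an explicit stop-word list.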
Process of Web Mining:
Web mining techniques can be broadly divided into three types: Web Content Mining, Web Structure Mining, and Web Usage Mining. These are explained as follows.
1. Web Content Mining: Web content mining is the process of extracting useful information from the content of web documents. Web content consists of several types of data, such as text, images, audio, and video. Content data is the collection of facts that a web page is designed to convey, and it can reveal effective and interesting patterns about user needs. Mining text documents draws on text mining, machine learning, and natural language processing; because much web content is text, this type of mining is also known as text mining. It scans and mines the text, images, and groups of web pages according to the content of the input.
2. Web Structure Mining: Web structure mining is the process of discovering structural information from the web. The web graph consists of web pages as nodes and hyperlinks as edges connecting related pages. Structure mining produces a structural summary of a particular website and identifies relationships between web pages that are linked by information or by direct hyperlinks. For example, web structure mining can be very useful for determining the connection between two commercial websites.
3. Web Usage Mining: Web usage mining is the process of identifying or discovering interesting usage patterns from large data sets of web activity. These patterns help in understanding user behavior. In web usage mining, the data generated when users access the web is collected in the form of logs, so web usage mining is also called log mining.
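Web structure mining treats the web as a graph of pages (nodes) and hyperlinks (edges). PageRank, a well-known link-analysis algorithm that these notes do not name explicitly, is one way to mine that structure: a page is important if important pages link to it. The sketch below runs a plain power iteration on a tiny hypothetical link graph, assuming the commonly used damping factor of 0.85 and a fixed number of iterations instead of a convergence test.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             iters: int = 50) -> Dict[str, float]:
    """PageRank by power iteration.

    links maps each page to the pages it links to (its out-edges).
    Pages with no out-links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page receives the "random jump" share (1 - d) / n
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank


# Hypothetical site: A links to B and C, B links to C, C links back to A
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(ranks)  # C, which receives links from both A and B, gets the highest score
```

The same graph view supports the commercial-website example in the notes: pages or sites that are linked together, directly or through common neighbors, show up as connected regions of the graph.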
Temporal Mining and Spatial Mining:
1. Spatial Data Mining:
Spatial data mining is the process of discovering interesting, previously unknown, but potentially useful patterns from spatial databases. In spatial data mining, analysts use geographical or spatial information to produce business intelligence or other results. Challenges involved in spatial data mining include identifying patterns and finding objects that are relevant to the research question.
2. Temporal Data Mining:
Temporal data mining refers to the extraction of implicit, non-trivial, and potentially useful abstract information from large collections of temporal data. It is concerned with analyzing temporal data and finding temporal patterns and regularities in sets of temporal data. The tasks of temporal data mining are:
Data Characterization and Comparison
Cluster Analysis
Classification
Association rules
Prediction and Trend Analysis
Pattern Analysis
Comparison of Spatial and Temporal Data Mining:
Data handled: Spatial data mining deals with spatial (location, geo-referenced) data. Temporal data mining deals with implicit or explicit temporal content from large quantities of data.
Knowledge mined: Spatial data mining includes finding characteristic rules, discriminant rules, association rules, evaluation rules, etc. Temporal data mining aims at mining new and unknown knowledge that takes into account the temporal aspects of data.
Examples: Spatial data mining: determining hotspots and unusual locations. Temporal data mining: an association rule such as "Any person who buys a car also buys a steering lock" becomes, with the temporal aspect, "Any person who buys a car also buys a steering lock after that".
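The steering-lock example can be made concrete by comparing an ordinary association rule with its temporal version. The sketch below uses a small hypothetical purchase log of (day, item) pairs per customer and computes the support and confidence of the rule car → steering lock, first ignoring order and then requiring the lock to be bought after the customer's first car purchase. The data and the function name are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical purchase log: customer -> list of (day, item)
Purchases = Dict[str, List[Tuple[int, str]]]

purchases: Purchases = {
    "c1": [(1, "car"), (5, "steering lock")],   # lock bought after the car
    "c2": [(3, "steering lock"), (7, "car")],   # lock bought before the car
    "c3": [(2, "car")],                         # no lock at all
    "c4": [(4, "milk")],                        # never bought a car
}


def rule_stats(log: Purchases, a: str, b: str, ordered: bool) -> Tuple[float, float]:
    """Support and confidence of the rule a -> b, counted per customer.

    ordered=True  : b must be bought after the customer's first purchase of a.
    ordered=False : buying both items in any order counts.
    """
    n = len(log)
    has_a = both = 0
    for history in log.values():
        a_days = [day for day, item in history if item == a]
        b_days = [day for day, item in history if item == b]
        if not a_days:
            continue
        has_a += 1
        if ordered:
            satisfied = any(day > min(a_days) for day in b_days)
        else:
            satisfied = bool(b_days)
        if satisfied:
            both += 1
    support = both / n if n else 0.0
    confidence = both / has_a if has_a else 0.0
    return support, confidence


print(rule_stats(purchases, "car", "steering lock", ordered=False))
print(rule_stats(purchases, "car", "steering lock", ordered=True))
```

Adding the temporal condition lowers both measures here because customer c2 bought the lock before the car, which the plain rule counts but the temporal rule does not.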
Visual and Audio Data Mining:
Data mining is a process that works with massive sets of data to uncover interesting, previously unknown patterns. The same applies to audio and video data mining. Today, users can access a large volume of multimedia data, owing to advances in information technology and the easy availability of multimedia systems, so the amount of available audio and video data is growing rapidly. Video falls under the multimedia category, which can contain text, images, visual content, audio, and metadata.
Audio and video mining plays a major role in applications across security and surveillance, medicine, education, entertainment, and sports. The key objective of video data mining is to extract information from video sources and to discover and describe patterns and dynamics.
Visual data mining can be considered a combination of two disciplines: data visualization and data mining. It is also closely related to computer graphics, multimedia systems, human-computer interaction, pattern recognition, and high-performance computing. By combining the strengths of these components, visual data mining becomes a highly attractive and effective tool for understanding data distributions, patterns, clusters, and outliers. In general, data visualization and data mining can be integrated in the following ways:
1. Data visualization: Data in a database or data warehouse can be viewed at different granularity or
abstraction levels or as different combinations of attributes or dimensions. Data can be presented in
various visual forms, including boxplots, 3-D cubes, data distribution charts, curves, surfaces, link
graphs, etc. The visual display can help give users a clear impression and overview of the data
characteristics in a large data set.
2. Data mining result visualization: Visualization of data mining results is the presentation of the results
or knowledge obtained from data mining in visual forms. Such forms may include scatter plots and
boxplots, decision trees, association rules, clusters, outliers, generalized rules, etc.
3. Data mining process visualization: This type of visualization presents the various processes of data
mining in visual forms so that users can see how the data are extracted and from which database or data
warehouse they are extracted, as well as how the selected data are cleaned, integrated, preprocessed,
and mined. Moreover, it may also show which method is selected for data mining, where the results
are stored, and how they may be viewed.
4. Interactive visual data mining: In interactive visual data mining, visualization tools can be used in
the data mining process to help users make smart data mining decisions. For example, the data
distribution in a set of attributes can be displayed using colored sectors (where a circle represents the
whole space). This display helps users determine which sector should first be selected for classification
and where a good split point for this sector may be.
If patterns can be transformed into sound and music, we can listen to pitches, rhythm, tune, and melody instead
of watching pictures to identify anything interesting or unusual. This may relieve some of the burdens of visual
concentration and be more relaxing than visual mining. Therefore, audio data mining is an interesting
complement to visual mining.
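A minimal sketch of the idea behind audio data mining (often called sonification): each data value is mapped linearly onto a range of musical notes, so unusually large or small values stand out as unusually high or low pitches. The note range (MIDI notes 48 to 84) and the linear mapping are assumptions for illustration; the conversion from MIDI note number to frequency uses the standard equal-temperament formula with A4 (note 69) at 440 Hz.

```python
from typing import List, Sequence


def sonify(values: Sequence[float], low_note: int = 48,
           high_note: int = 84) -> List[float]:
    """Map each data value linearly to a MIDI note, then to a frequency in Hz."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant data: every value maps to low_note
    freqs = []
    for v in values:
        note = round(low_note + (v - lo) / span * (high_note - low_note))
        # Equal temperament: each semitone multiplies the frequency by 2 ** (1/12)
        freqs.append(440.0 * 2 ** ((note - 69) / 12))
    return freqs


print([round(f, 1) for f in sonify([0, 21, 36])])  # [130.8, 440.0, 1046.5]
```

Feeding the resulting frequencies to any tone generator lets an analyst hear an outlier as a sudden jump in pitch.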
Ubiquitous Data Mining:
Data analytics is one of the fastest-emerging technologies today. With growing demand for portable and remote devices such as mobile phones and personal digital assistants (PDAs), extracting data from these devices plays a crucial role in data analysis. Therefore, accessing data from remote devices is essential.
Ubiquitous Data Mining (UDM) is the process of performing data mining and analysis on data from distributed and heterogeneous systems such as mobile and embedded devices.
UDM is used to mine data in mobile environments such as cell phones and sensors, which are constrained by limited computational resources and varying network connectivity. It supports time-critical and real-time data needs and is also used for intelligent analysis; for example, UDM can be used to build classifiers and discover hidden clusters.
The Architecture of Ubiquitous Technologies:
Ubiquitous data mining involves collecting and storing data, processing it, and disseminating the results of the analysis. To achieve this, the architecture of a ubiquitous system is made up of six parts:
1. Devices: The components used for storing and processing the data, e.g., personal computers and supercomputers.
2. Communication: The mode in which devices communicate with each other, e.g., via the Internet or through a centralized system.
3. Users: The users who interact with the system; there can be a single user or multiple users.
4. Control: The component that administers all of the above parts; it can be controlled by a single administrator or by multiple administrators.
5. Data: The type of data that is stored and processed by the system, which also shapes the system's dynamics and organization. The data can be either static or dynamic.
6. Infrastructure: The infrastructure that the system uses for data discovery, e.g., the Web or databases.
Applications of UDM:
1. Traffic Safety: Sensors can detect abnormal traffic, and the data can be stored and analyzed in the system. The analyzed data can then be used to detect traffic accidents in real time, so that traffic and road safety can be monitored.
2. Health Care: Sensors can be used to create smart homes for elderly people and others who need continuous medical attention. The sensors can flag immediate medical needs, and the data they collect can help provide timely medical aid.
3. Crisis and Calamity Management: Data on previous crises and calamities is collected, stored, and analyzed. During a crisis, sensors detect the event and send the data to the controllers, so outcomes can be predicted before the effects become disastrous. This supports crisis management.
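The traffic-safety application can be sketched as a lightweight streaming anomaly detector that fits the constraints of ubiquitous devices: it keeps only a fixed-size window of recent readings and flags a new reading when it lies more than a chosen number of standard deviations from the window mean. The window size, threshold, and sample speed readings are hypothetical; production systems would typically use more robust statistics.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def detect_anomalies(readings: Iterable[float], window: int = 20,
                     threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the recent sliding-window mean.

    Only the last `window` readings are stored, so memory use stays constant.
    """
    recent: Deque[float] = deque(maxlen=window)
    for i, x in enumerate(readings):
        if len(recent) == window:  # start checking once the window is full
            mean = sum(recent) / window
            std = math.sqrt(sum((r - mean) ** 2 for r in recent) / window)
            if std > 0 and abs(x - mean) / std > threshold:
                yield i, x
        recent.append(x)  # the oldest reading is dropped automatically


# Hypothetical vehicle speeds (km/h) from a road sensor; index 20 is a sudden stop
speeds = [50, 52, 48, 51, 49] * 4 + [5, 50, 51]
print(list(detect_anomalies(speeds)))  # [(20, 5)]
```

In this simple version, flagged readings still enter the window, which temporarily inflates the standard deviation; excluding them is a straightforward refinement.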
Invisible Data Mining
Data mining is present in many major aspects of our lives. This requires mining data effectively without disclosing private information to outsiders.
Invisible data mining is data mining whose functions are built into systems and performed invisibly, often without the user being aware of them. Examples include:
1. Search engines
2. Intelligent database systems
3. E-mail managers
Advantages of ubiquitous data mining:
Improved decision-making: Ubiquitous data mining can provide valuable insights into customer
behavior, market trends, and other important factors, enabling organizations to make more
informed decisions.
Personalized services: Ubiquitous data mining can enable personalized services, such as
customized product recommendations or personalized healthcare treatments, that can improve the
overall customer experience.
Improved efficiency: Ubiquitous data mining can help streamline business processes, enabling
organizations to operate more efficiently.
Disadvantages of ubiquitous data mining:
Privacy concerns: Ubiquitous data mining can raise privacy concerns, as it can involve collecting and
analyzing data about individuals without their explicit consent. This can result in the disclosure of
sensitive information, which can have negative consequences for individuals.
Security risks: The data used in ubiquitous data mining can be subject to security risks, such as
unauthorized access or hacking, which can result in the exposure of sensitive information.
Bias: Ubiquitous data mining algorithms can be biased, which can lead to discriminatory outcomes
or reinforce existing biases.
Advantages of invisible data mining:
Less intrusive: Invisible data mining can feel less intrusive than other forms of data mining, since it
runs in the background and individuals are not interrupted by the collection and analysis of their data.
Improved decision-making: Invisible data mining can provide valuable insights into customer
behavior, market trends, and other important factors, enabling organizations to make more informed
decisions.
Personalized services: Invisible data mining can enable personalized services, such as customized
product recommendations or personalized healthcare treatments, that can improve the overall
customer experience.
Disadvantages of invisible data mining:
Lack of transparency: Invisible data mining can lack transparency, as individuals may be unaware
that their data is being collected and analyzed.
Privacy concerns: Invisible data mining can raise privacy concerns, as it can involve collecting and
analyzing data about individuals without their explicit consent.
Security risks: The data used in invisible data mining can be subject to security risks, such as
unauthorized access or hacking, which can result in the exposure of sensitive information.
Privacy, Security, and Social Impacts of Data Mining:
Advantages:
Improved security: Data mining can help identify patterns and anomalies that could indicate security
breaches, enabling organizations to take action to prevent future attacks.
Personalized services: Data mining can enable personalized services, such as customized product
recommendations or personalized healthcare treatments, that can improve the overall customer experience.
Improved decision making: Data mining can provide insights into customer behavior, market trends, and
other important factors, enabling organizations to make more informed decisions.
Improved efficiency: Data mining can help streamline business processes, enabling organizations to
operate more efficiently.
Disadvantages:
Privacy concerns: Data mining can raise privacy concerns, as it can involve collecting and analyzing data
about individuals without their explicit consent. This can result in the disclosure of sensitive information,
which can have negative consequences for individuals.
Security risks: The data used in data mining can be subject to security risks, such as unauthorized access
or hacking, which can result in the exposure of sensitive information.
Bias: Data mining algorithms can be biased, which can lead to discriminatory outcomes or reinforce
existing biases.
Social impacts: Data mining can have social impacts, such as increasing surveillance and reducing
personal autonomy, which can have negative consequences for society as a whole.
Conclusion:
Data mining has several advantages and disadvantages when it comes to privacy, security, and social
impacts. While it can provide valuable insights and improve efficiency, it is important to carefully consider
the potential risks and impacts before implementing data mining practices. Organizations must take steps
to mitigate risks and protect the privacy and security of individuals, while also ensuring that data mining is
conducted in an ethical and responsible manner.