
Unit - 5

Uploaded by

seilmonbhaii1
Copyright
© © All Rights Reserved

UNIT - 5

Data Mining Trends and Research Frontiers: Other methodologies of data mining: Web mining, Temporal
mining-Spatial mining-Statistical data mining- Visual and audio data mining- Data mining applications- Data
mining and society: Ubiquitous and invisible data mining- Privacy, Security, and Social Impacts of data mining

Data Mining Trends and Research Frontiers


Data mining is the process of analyzing large volumes of data to discover patterns and trends. Corporations
can use it to learn about customers' preferences, build good relationships with customers, increase revenue,
and reduce risk. Data mining relies on complex algorithms that segment data in order to discover trends and
patterns, detect deviations, and estimate the likelihood of particular events. Raw data can be in analog or
digital form, depending on the data's source. Companies need to keep up with the latest data mining trends
in order to stay competitive.

Types of Sequence Mining in Data Mining:

 Mining time series
 Mining symbolic sequences
 Mining biological sequences
1. Mining Time Series
A time series is a sequence of data points recorded at specific times, or of values obtained from repeated
measurements over time. The values are typically measured at equal time intervals, such as hourly, daily, or
weekly. The characteristic components of a time series are trend, seasonal, cyclic, and irregular
movements.
Application of Time Series:
 Financial: Stock market analysis
 Industry: Power consumption
 Scientific: Experiment result
 Meteorological: Precipitation
Time Series Analysis Methods:
 Trend Analysis: Categories of Time Series movements:
 Long-term or Trend Movements: General direction in which a time series is moving over a
long interval of time.
 Cyclic Movements: Long-term oscillation about a trend line or curve.
 Seasonal Movements: A time series appears to follow substantially identical patterns during
the corresponding months of subsequent years.
 Irregular or Random Movements: Changes that occur randomly due to unplanned events.

 Similarity Search:
 Data Reduction
 Indexing Methods
 Similarity Search Methods
 Query Languages
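One common way to estimate the long-term trend movement is a simple moving average, which smooths out short-term fluctuations. The sketch below is only an illustration under stated assumptions: the function name `moving_average`, the window size, and the sample readings are all hypothetical and not taken from these notes.

```python
def moving_average(values, window):
    """Return the simple moving average of `values`, one value per full window."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running_sum = sum(values[:window])
    averages = [running_sum / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running_sum += values[i] - values[i - window]
        averages.append(running_sum / window)
    return averages


# Hypothetical power-consumption readings taken at equal time intervals.
readings = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]
trend = moving_average(readings, window=3)
print(trend)
```

A larger window gives a smoother trend line but reacts more slowly to real changes in direction.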

2. Mining Symbolic Sequence


A symbolic sequence consists of an ordered list of elements or events, recorded with or without a concrete
notion of time. Examples include customer shopping sequences, web clickstreams, software execution
sequences, and biological sequences.
Sequential pattern mining identifies the subsequences that appear frequently in one or more sequences.
Substantial research in this area has produced a number of scalable algorithms. Alternatively, we can mine
only the set of closed sequential patterns, where a sequential pattern s is closed if there exists no sequential
pattern s’ such that s is a proper subsequence of s’ and s’ has the same support as s.
For example, if S = <(ab), d> and S’ = <(abc), (de)>, where a, b, c, d and e are items, then S is a
subsequence of S’.
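The notions of subsequence and support can be made concrete with a small sketch. It assumes a sequence is modelled as a list of item sets and that the support of a pattern is the number of sequences in the database that contain it. The function names and the tiny database are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if every item set of `pattern` is contained, in order,
    in a distinct item set of `sequence`."""
    position = 0
    for itemset in pattern:
        # Advance to the next element of `sequence` that contains this item set.
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1  # the next pattern element must match a later element
    return True


def support(pattern, database):
    """Number of sequences in `database` that contain `pattern`."""
    return sum(1 for sequence in database if is_subsequence(pattern, sequence))


# Hypothetical sequence database; each sequence is a list of item sets.
database = [
    [{"a", "b", "c"}, {"d", "e"}],
    [{"a", "b"}, {"c"}, {"d"}],
    [{"a"}, {"d"}],
    [{"d"}, {"a", "b"}],
]
pattern = [{"a", "b"}, {"d"}]
print(support(pattern, database))
```

Only the first two sequences contain the pattern in the right order, so its support in this database is 2.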

3. Mining Biological Sequence

Biological sequences are sequences of nucleotides or amino acids. Biological sequence analysis, which
compares, aligns, indexes, and analyzes such sequences, plays a crucial role in bioinformatics and modern
biology. The alignment part of this analysis can be divided into two tasks: pairwise sequence alignment and
multiple sequence alignment.
Biological Sequence Methods:
 Alignment of Biological Sequences:
 Pairwise Alignment
 The BLAST Local Alignment Algorithm
 Multiple Sequence Alignment Methods
 Biological Sequence Analysis Using a Hidden Markov Model:
 Markov Chain
 Hidden Markov Model
 Forward Algorithm
 Viterbi Algorithm
 Baum-Welch Algorithm
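Pairwise alignment is typically computed with dynamic programming. The sketch below scores a global alignment in the style of the Needleman-Wunsch algorithm. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; real tools use substitution matrices and more refined gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings `a` and `b` (Needleman-Wunsch style)."""
    # previous[j] is the best score for aligning a[:i-1] with b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = previous[j] + gap      # a[i-1] aligned with a gap
            gap_in_a = current[j - 1] + gap   # b[j-1] aligned with a gap
            current.append(max(diagonal, gap_in_b, gap_in_a))
        previous = current
    return previous[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("GAT", "GT"))
```

The function returns only the score. Recovering the alignment itself would require keeping the full table and tracing back through it.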
Application of Data Mining:
 Financial Information Analysis:
 Loan payment prediction/consumer credit policy analysis
 Design and construction of data warehouses
 Financial data collected by banks and financial institutions are typically relatively complete,
reliable, and of high quality.
 Retail Industry:
 Multidimensional analysis (sales, customers, products, time, etc.)
 Sales campaign analysis
 Customer retention
 Product recommendation
 Using visualization tools for data analysis
 Science and Engineering:
 Data processing and data warehouse
 Mining complex data types
 Network-based mining
 Graph-based mining
Trends of Data Mining:
 Exploration of applications: addressing application-specific issues
 Data mining approaches that are scalable and interactive
 Data mining integration with Web search engines, database systems, data warehouse systems, and cloud
computing systems
 Mining social and information networks
 Mining spatiotemporal, moving objects, and cyber-physical systems
 Mining multimedia, text, and web data
 Mining biological and biomedical data
 Visual and audio data mining
 Distributed data mining and real-time data stream mining.

Other methodologies of data mining: Web mining, Temporal mining-Spatial mining-Statistical data
mining- Visual and audio data mining
Web mining is the application of data mining techniques to automatically discover and extract information
from Web documents and services. Its main purpose is to discover useful information from the World Wide
Web and its usage patterns.
Applications of Web Mining:
Web mining is the process of discovering patterns, structures, and relationships in web data. It involves
using data mining techniques to analyze web data and extract valuable insights. The applications of web
mining are wide-ranging and include:
Personalized marketing:
Web mining can be used to analyze customer behavior on websites and social media platforms. This
information can be used to create personalized marketing campaigns that target customers based on their
interests and preferences.
E-commerce:
Web mining can be used to analyze customer behavior on e-commerce websites. This information can be
used to improve the user experience and increase sales by recommending products based on customer
preferences.
Search engine optimization:
Web mining can be used to analyze search engine queries and search engine results pages (SERPs). This
information can be used to improve the visibility of websites in search engine results and increase traffic to
the website.
Fraud detection:
Web mining can be used to detect fraudulent activity on websites. This information can be used to prevent
financial fraud, identity theft, and other types of online fraud.
Sentiment analysis:
Web mining can be used to analyze social media data and extract sentiment from posts, comments, and
reviews. This information can be used to understand customer sentiment towards products and services and
make informed business decisions.
Web content analysis:
Web mining can be used to analyze web content and extract valuable information such as keywords,
topics, and themes. This information can be used to improve the relevance of web content and optimize
search engine rankings.
Customer service:
Web mining can be used to analyze customer service interactions on websites and social media platforms.
This information can be used to improve the quality of customer service and identify areas for
improvement.
Healthcare:
Web mining can be used to analyze health-related websites and extract valuable information about
diseases, treatments, and medications. This information can be used to improve the quality of healthcare
and inform medical research.
Process of Web Mining:

[Figure: Web Mining Process]

Web mining can be broadly divided into three types: Web Content Mining, Web Structure Mining, and Web
Usage Mining. These are explained below.

[Figure: Categories of Web Mining]

1. Web Content Mining: Web content mining extracts useful information from the content of web
documents. Web content consists of several types of data, such as text, images, audio, and video.
Content data is the collection of facts that a web page is designed to convey, and it can reveal effective
and interesting patterns about user needs. Mining text documents draws on text mining, machine
learning, and natural language processing, so when applied to text this form of mining is also known as
text mining. It scans and mines the text, images, and groups of web pages according to the content of
the input.
2. Web Structure Mining: Web structure mining discovers structure information from the web. The web
graph consists of web pages as nodes and hyperlinks as edges connecting related pages. Structure
mining produces a structured summary of a particular website and identifies relationships between web
pages that are linked by information or by direct links. It can be very useful, for example, for
determining the connection between two commercial websites.
3. Web Usage Mining: Web usage mining identifies or discovers interesting usage patterns from large
data sets, and these patterns help in understanding user behavior. User access data on the web is
collected in the form of logs, so web usage mining is also called log mining.
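As a rough illustration of log mining, the sketch below counts page hits and page-to-page transitions per user. The log layout ("user timestamp path"), the sample lines, and the function name are hypothetical. Real web server logs use richer formats, and the sketch assumes the lines are already in chronological order.

```python
from collections import Counter

# Hypothetical access-log lines in a simplified "user timestamp path" layout.
LOG_LINES = [
    "u1 2024-01-01T10:00 /home",
    "u1 2024-01-01T10:01 /products",
    "u2 2024-01-01T10:02 /home",
    "u1 2024-01-01T10:03 /cart",
    "u2 2024-01-01T10:04 /products",
    "u3 2024-01-01T10:05 /home",
]


def mine_usage(lines):
    """Return (page hit counts, counts of consecutive page transitions per user)."""
    page_hits = Counter()
    transitions = Counter()
    last_page = {}  # most recent page seen for each user
    for line in lines:
        user, _timestamp, path = line.split()
        page_hits[path] += 1
        if user in last_page:
            transitions[(last_page[user], path)] += 1
        last_page[user] = path
    return page_hits, transitions


page_hits, transitions = mine_usage(LOG_LINES)
print(page_hits.most_common(1))
print(transitions.most_common(1))
```

Frequent transitions such as "/home" to "/products" are one simple kind of usage pattern that can guide site design or recommendations.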
Temporal mining-Spatial mining
1. Spatial Data Mining:
Spatial data mining is the process of discovering interesting, previously unknown, but potentially useful
patterns from spatial databases. In spatial data mining, analysts use geographical or spatial information to
produce business intelligence or other results. Challenges include identifying patterns and finding objects
that are relevant to the research project.
2. Temporal Data Mining:
Temporal data mining refers to the extraction of implicit, non-trivial, and potentially useful abstract
information from large collections of temporal data. It is concerned with analyzing temporal data and
finding temporal patterns and regularities in sets of temporal data. The tasks of temporal data mining are:
 Data Characterization and Comparison
 Cluster Analysis
 Classification
 Association rules
 Prediction and Trend Analysis
 Pattern Analysis

Difference between Spatial and Temporal Data Mining:

| S.No. | Spatial data mining | Temporal data mining |
|---|---|---|
| 1. | It requires space (location information). | It requires time (temporal information). |
| 2. | Spatial mining is the extraction of knowledge, spatial relationships, and interesting measures that are not explicitly stored in a spatial database. | Temporal mining is the extraction of knowledge about the occurrence of events, such as whether they follow cyclic, random, or seasonal variations. |
| 3. | It deals with spatial (location, geo-referenced) data. | It deals with implicit or explicit temporal content from large quantities of data. |
| 4. | Spatial databases store spatial objects defined by spatial data types, together with the spatial associations among such objects. | Temporal data mining comprises the subject itself as well as its use in the modification of fields. |
| 5. | It includes finding characteristic rules, discriminant rules, association rules, evaluation rules, etc. | It aims at mining new and unknown knowledge that takes into account the temporal aspects of data. |
| 6. | It is the method of identifying unusual and unexplored, but useful, models from spatial databases. | It deals with useful knowledge from temporal data. |
| 7. | Examples: determining hotspots and unusual locations. | Examples: an association rule such as "Any person who buys a car also buys a steering lock". With the temporal aspect, this rule becomes "Any person who buys a car also buys a steering lock after that". |
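The car and steering-lock example can be checked on data by comparing the confidence of the plain rule with that of its temporal version. The sketch below is a minimal illustration: the purchase records and the function name are hypothetical, and it simplifies by looking only at each customer's first purchase of an item.

```python
from datetime import date

# Hypothetical purchase records: (customer, item, purchase date).
PURCHASES = [
    ("alice", "car", date(2024, 1, 5)),
    ("alice", "steering lock", date(2024, 1, 9)),
    ("bob", "steering lock", date(2024, 2, 1)),
    ("bob", "car", date(2024, 2, 20)),
    ("carol", "car", date(2024, 3, 3)),
    ("dave", "car", date(2024, 3, 10)),
    ("dave", "steering lock", date(2024, 3, 10)),
]


def rule_confidence(purchases, antecedent, consequent, ordered):
    """Confidence of 'buys antecedent => buys consequent'.
    With ordered=True the consequent must be bought strictly after the antecedent."""
    first_purchase = {}  # (customer, item) -> earliest purchase date
    for customer, item, day in purchases:
        key = (customer, item)
        if key not in first_purchase or day < first_purchase[key]:
            first_purchase[key] = day
    buyers = [c for (c, item) in first_purchase if item == antecedent]
    if not buyers:
        return 0.0
    hits = 0
    for customer in buyers:
        consequent_day = first_purchase.get((customer, consequent))
        if consequent_day is None:
            continue
        # A same-day purchase does not count as "after".
        if not ordered or consequent_day > first_purchase[(customer, antecedent)]:
            hits += 1
    return hits / len(buyers)


plain = rule_confidence(PURCHASES, "car", "steering lock", ordered=False)
temporal = rule_confidence(PURCHASES, "car", "steering lock", ordered=True)
print(plain, temporal)
```

On this sample the plain rule holds for three of the four car buyers, but only one of them bought the lock afterwards, which is the extra information the temporal rule captures.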

Statistical data mining


Data mining refers to extracting or mining knowledge from large amounts of data. In other words, data
mining is the science, art, and technology of exploring large and complex bodies of data in order to
discover useful patterns. Theoreticians and practitioners are continually seeking improved techniques to
make the process more efficient, cost-effective, and accurate. Any situation can be analyzed in two ways in
data mining:
 Statistical Analysis: In statistics, data is collected, analyzed, explored, and presented to identify
patterns and trends. It is also referred to as quantitative analysis.
 Non-statistical Analysis: This analysis provides generalized information and covers data such as sound,
still images, and moving images.
In statistics, there are two main categories:
 Descriptive Statistics: The purpose of descriptive statistics is to organize data and identify its main
characteristics. Graphs or numbers summarize the data. The average (mean), mode, standard deviation
(SD), and correlation are some of the commonly used descriptive statistical measures.
 Inferential Statistics: Inferential statistics is the process of drawing conclusions from data and
generalizing them on the basis of probability theory. By analyzing sample statistics, you can infer
parameters of populations and build models of relationships within the data.
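The descriptive measures named above can be computed with Python's standard `statistics` module. The two small samples are hypothetical, and the Pearson correlation is written out by hand so the sketch does not depend on a recent Python version. The standard error at the end is one simple inferential quantity: it estimates how far the sample mean may lie from the population mean.

```python
import math
import statistics

# Hypothetical sample: hours studied and exam scores for five students.
hours = [2, 4, 4, 6, 8]
scores = [50, 60, 65, 70, 90]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


mean_hours = statistics.mean(hours)    # average
mode_hours = statistics.mode(hours)    # most frequent value
sd_hours = statistics.stdev(hours)     # sample standard deviation
correlation = pearson(hours, scores)

# Inferential step: standard error of the sample mean.
standard_error = sd_hours / math.sqrt(len(hours))

print(mean_hours, mode_hours, round(sd_hours, 3), round(correlation, 3))
print(round(standard_error, 3))
```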
There are various statistical terms that one should be aware of while dealing with statistics. Some of these
are:
 Population
 Sample
 Variable
 Quantitative Variable
 Qualitative Variable
 Discrete Variable
 Continuous Variable

Visual and Audio Data Mining

Data mining is a process that works on massive data sets and uncovers interesting patterns in data whose
structure is not known in advance. The same applies to audio and video data mining. Today, users can access a
large volume of multimedia data thanks to advances in information technology and the easy availability of
multimedia systems, so the amount of audio and video data available is growing rapidly. Video belongs to the
multimedia category and combines several kinds of data: text, image, visual, audio, and metadata.

Audio-video mining plays a major role in applications across security and surveillance, medicine, education,
entertainment, and sports. The key objective of video data mining is to extract data from video sources and to
discover and describe patterns and dynamics.

What is Visual Data Mining?


Visual data mining uses data and knowledge visualization techniques to discover implicit and useful knowledge
from large data sets. The human visual system is controlled by the eyes and the brain, the latter of which can
be thought of as a powerful, highly parallel processing and reasoning engine containing a large knowledge base.

Visual data mining essentially combines the power of these components, making it a highly attractive and
effective tool for understanding data distributions, patterns, clusters, and outliers.

Visual data mining can be viewed as an integration of two disciplines: data visualization and data mining. It is
also closely related to computer graphics, multimedia systems, human-computer interaction, pattern
recognition, and high-performance computing. In general, data visualization and data mining can be
integrated in the following ways:

1. Data visualization: Data in a database or data warehouse can be viewed at different granularity or
abstraction levels or as different combinations of attributes or dimensions. Data can be presented in
various visual forms, including boxplots, 3-D cubes, data distribution charts, curves, surfaces, link
graphs, etc. The visual display can help give users a clear impression and overview of the data
characteristics in a large data set.
2. Data mining result visualization: Visualization of data mining results is the presentation of the results
or knowledge obtained from data mining in visual forms. Such forms may include scatter plots and
boxplots, decision trees, association rules, clusters, outliers, generalized rules, etc.
3. Data mining process visualization: This type of visualization presents the various processes of data
mining in visual forms so that users can see how the data are extracted and from which database or data
warehouse they are extracted, as well as how the selected data are cleaned, integrated, preprocessed,
and mined. Moreover, it may also show which method is selected for data mining, where the results
are stored, and how they may be viewed.
4. Interactive visual data mining: In interactive visual data mining, visualization tools can be used in
the data mining process to help users make smart data mining decisions. For example, the data
distribution in a set of attributes can be displayed using colored sectors (where a circle represents the
whole space). This display helps users determine which sector should first be selected for classification
and where a good split point for this sector may be.
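As a very small illustration of showing a data distribution visually, the sketch below prints a text histogram of one attribute. It is only a stand-in for real plotting tools: the function name, the bin width, and the sample ages are hypothetical, and integer values are assumed.

```python
def text_histogram(values, bin_width):
    """Return one text line per non-empty bin, with one '#' per value in the bin."""
    counts = {}
    for value in values:
        bin_start = (value // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    lines = []
    for bin_start in sorted(counts):
        label = f"{bin_start:>3}-{bin_start + bin_width - 1:<3}"
        lines.append(f"{label} | {'#' * counts[bin_start]}")
    return lines


# Hypothetical customer ages.
ages = [23, 25, 31, 34, 35, 38, 41, 47, 52, 36]
histogram = text_histogram(ages, bin_width=10)
for line in histogram:
    print(line)
```

Even this crude display makes the cluster of customers in their thirties visible at a glance, which is the kind of overview that data visualization is meant to give.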

What is Audio Data Mining?


Audio data mining uses audio signals to indicate data patterns or the features of data mining results. Although
visual data mining may disclose interesting patterns using graphical displays, it requires users to concentrate
on watching patterns and identifying interesting or novel features within them. This can sometimes be quite
tiresome.

If patterns can be transformed into sound and music, we can listen to pitches, rhythm, tune, and melody instead
of watching pictures to identify anything interesting or unusual. This may relieve some of the burdens of visual
concentration and be more relaxing than visual mining. Therefore, audio data mining is an interesting
complement to visual mining.
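Turning patterns into sound can be sketched as a mapping from data values to musical pitches. The example below scales each value linearly onto a range of MIDI note numbers and converts a note to its frequency using the standard equal-temperament relation (note 69 is A4 at 440 Hz). The note range, function names, and data series are assumptions for illustration, and the sketch produces note numbers only, not audio.

```python
def value_to_midi_note(value, low, high, note_low=48, note_high=84):
    """Linearly map `value` from [low, high] onto a MIDI note number."""
    if high == low:
        return note_low
    fraction = (value - low) / (high - low)
    return round(note_low + fraction * (note_high - note_low))


def midi_note_to_frequency(note):
    """Frequency in Hz of a MIDI note in equal temperament (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Hypothetical data series with one unusual value.
series = [10, 12, 11, 30, 12, 11]
notes = [value_to_midi_note(v, min(series), max(series)) for v in series]
print(notes)
print([round(midi_note_to_frequency(n), 1) for n in notes])
```

When such a sequence is played back, the outlier stands out as a single much higher pitch among otherwise similar notes.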

Applications of Audio and Visual Mining

Use cases of audio and video data mining in business include:

1. Traffic control management
2. Vehicle monitoring
3. Enhanced security with live video streaming
4. Health status monitoring
5. Ready access to customer demographic data
6. Automated transcription of audio/video data
7. Accurate understanding of customer opinions

Ubiquitous and Invisible Data Mining

Data analytics is one of the fastest-emerging technologies today. With the growing demand for portable and
remote devices such as mobile phones and personal digital assistants (PDAs), extracting data from these
devices for analysis has become crucial, and so has the ability to access data on a remote device.
Ubiquitous Data Mining (UDM) is the process of mining and analyzing data on distributed and
heterogeneous systems such as mobile and embedded devices.
UDM is used for mining data in mobile environments such as cell phones and sensors, which are constrained
by limited computational resources and varying network conditions. It supports time-critical and real-time
data needs and is also used for intelligent analysis. Using UDM, we can extract hidden classifiers and clusters.
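Because such devices have limited memory and must react in real time, stream-style algorithms that process each reading once are a natural fit. The sketch below keeps a running mean and variance in constant memory using Welford's online method and flags readings that lie far from the mean. The class name, the threshold of three standard deviations, and the sample readings are assumptions for illustration.

```python
class RunningStats:
    """Running mean and variance in constant memory (Welford's online method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance of the values seen so far."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

    def is_anomaly(self, value, threshold=3.0):
        """True if `value` lies more than `threshold` standard deviations from the mean."""
        if self.count < 2:
            return False
        std = self.variance ** 0.5
        return std > 0 and abs(value - self.mean) > threshold * std


# Hypothetical sensor readings arriving one at a time.
stats = RunningStats()
for reading in [2, 4, 4, 6, 8]:
    stats.update(reading)
print(stats.mean, stats.variance, stats.is_anomaly(100))
```

No past readings are stored, so the memory footprint stays the same however long the device keeps running.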
The architecture of Ubiquitous technologies
Ubiquitous Data Mining involves collecting and storing data, processing it, and disseminating the analyzed
results. To achieve this, the architecture of a ubiquitous system is made up of six parts.

| Sl. No | Parts | Function |
|---|---|---|
| 1 | Devices | The component used for storing and processing the data. E.g.: personal computers, supercomputers |
| 2 | Communication | The mode in which devices communicate with each other. E.g.: Internet, via a centralized system |
| 3 | Users | The users who interact with the system. It can have a single user or multiple users. |
| 4 | Control | The component that administers all the above-mentioned parts. It can be controlled by a single administrator or by multiple administrators. |
| 5 | Data | The type of data stored and processed by the system. It also has implications for the system's dynamics and organization. The data can be static data or dynamic data. |
| 6 | Infrastructure | The infrastructure that the system employs for data discovery. E.g.: Web, database |

Application of UDM:
1. Traffic Safety: Sensors can detect abnormal traffic, and the data can be stored and analyzed in the
system. The analyzed data can then be used to detect traffic mishaps in real time, so that traffic and
road safety are monitored.
2. Health Care: Sensors can be used to create smart homes for elderly people and others who need
continuous medical attention. The sensors can signal an immediate medical need, and the data
collected can help provide timely medical aid.
3. Crisis and Calamity Management: Data on previous crises and calamities is collected, stored, and
analyzed. During a crisis, sensors detect the event and send the data to the controllers, so outcomes can
be predicted before the effects become disastrous. This supports crisis management.
Invisible Data Mining
Data mining is present in all major aspects of our lives. This calls for effective data mining that does not
disclose private information to outsiders through the extraction of data.
Invisible data mining is data mining whose functions are performed invisibly, that is, without the user
being aware of them.

Applications of Invisible Data Mining

1. Search Engines
2. Intelligent Database System
3. e-mail managers

Ubiquitous Data Mining:

Advantages:

 Improved decision-making: Ubiquitous data mining can provide valuable insights into customer
behavior, market trends, and other important factors, enabling organizations to make more
informed decisions.
 Personalized services: Ubiquitous data mining can enable personalized services, such as
customized product recommendations or personalized healthcare treatments, that can improve the
overall customer experience.
 Improved efficiency: Ubiquitous data mining can help streamline business processes, enabling
organizations to operate more efficiently.

Disadvantages:

 Privacy concerns: Ubiquitous data mining can raise privacy concerns, as it involves collecting and
analyzing data about individuals without their explicit consent. This can result in the disclosure of
sensitive information, which can have negative consequences for individuals.
 Security risks: The data used in ubiquitous data mining can be subject to security risks, such as
unauthorized access or hacking, which can result in the exposure of sensitive information.
 Bias: Ubiquitous data mining algorithms can be biased, which can lead to discriminatory outcomes
or reinforce existing biases.

Invisible Data Mining:

Advantages:

 Less intrusive: Invisible data mining can be less intrusive than other forms of data mining, as
individuals may be less aware that their data is being collected and analyzed.
 Improved decision-making: Invisible data mining can provide valuable insights into customer
behavior, market trends, and other important factors, enabling organizations to make more informed
decisions.
 Personalized services: Invisible data mining can enable personalized services, such as customized
product recommendations or personalized healthcare treatments, that can improve the overall
customer experience.

Disadvantages:

 Lack of transparency: Invisible data mining can lack transparency, as individuals may be unaware
that their data is being collected and analyzed.
 Privacy concerns: Invisible data mining can raise privacy concerns, as it involves collecting and
analyzing data about individuals without their explicit consent.
 Security risks: The data used in invisible data mining can be subject to security risks, such as
unauthorized access or hacking, which can result in the exposure of sensitive information.

Privacy, Security, and Social Impacts of Data Mining


Data is a collection of instances, and mining filters useful information out of it. Data mining, also called
knowledge discovery in databases (KDD), analyzes data from different perspectives and classifies it.
Importance of data mining:
 It helps explore rapidly growing databases and gather only valid information by improving
segmentation.
 It is an efficient, cost-effective solution that uncovers risk and fraud, which makes operations more profitable.
 It helps customers who have difficulty deciding what to purchase, which supports decision making and increases sales.
 Data mining techniques can help organizations plan in real time and save time.
 It also saves money through fraud detection.
Application area of data mining:
 Future Healthcare
 Market Basket Analysis
 Manufacturing Engineering
 Fraud Detection
 Intrusion Detection
 Customer Segmentation
 Financial Banking
How data mining influences privacy, security, and society:
Security and privacy have always been a primary concern. Data mining aims at predicting the future from
past data. For example, when we buy a product, predictions are made from our past purchases, and these
predictions draw on our personal information. The continuous development of data mining techniques poses
serious threats to data security and privacy, which must be protected. The real threat is that once information
is exposed to unauthorized parties, it becomes impractical to stop its misuse. We therefore need a system that
protects data and its resources with respect to authenticity and integrity.
How we can protect our data:
 Minimal protection leads to data loss, so a multilayer security system is needed.
 Access controls must ensure that only authorized users can access the data.
 The system must verify each individual user’s identity.
In addition, privacy preservation methods protect sensitive or private data while still allowing useful
information to be extracted from the data set.
 Privacy-Preserving Data Mining (PPDM): The main objective of PPDM is to protect the privacy of
the data while extracting only relevant information. It protects individual data to preserve privacy and
still aims to deliver accurate results from data mining operations.
PPDM techniques are further divided into several categories:
 Data Hiding Technique: In this technique, the data is transformed in such a way that sensitive or private
information is not visible to other parties. It can be implemented in various ways, such as cryptographic
techniques, data perturbation, and anonymization.
 Knowledge Hiding Technique: In this technique, sensitive knowledge that could be extracted from the
data by a data mining algorithm is hidden. It can be implemented in different ways, such as association
rule hiding and query auditing.
 Hybrid Technique: This combines the two techniques above in order to overcome their individual
limitations.
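Two of the data hiding approaches named above, anonymization and data perturbation, can be sketched in a few lines. This is a toy illustration under stated assumptions: the record layout, field names, bucket size, and noise scale are hypothetical, and the sketch does not by itself provide a formal privacy guarantee.

```python
import random


def generalize_age(age, bucket=10):
    """Replace an exact age with a coarser range, e.g. 37 -> '30-39'."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"


def anonymize(record):
    """Drop the direct identifier and generalize quasi-identifiers of one record."""
    return {
        "age_range": generalize_age(record["age"]),
        "zip_prefix": record["zip"][:3] + "**",  # keep only a coarse location
        "diagnosis": record["diagnosis"],
    }


def perturb(values, scale, seed=0):
    """Add zero-mean Gaussian noise to each value; aggregates stay approximately intact."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [v + rng.gauss(0, scale) for v in values]


# Hypothetical patient record and income values.
record = {"name": "A. Kumar", "age": 37, "zip": "56001", "diagnosis": "flu"}
print(anonymize(record))

incomes = [50000.0] * 1000
noisy = perturb(incomes, scale=1000.0)
print(round(sum(noisy) / len(noisy)))
```

Individual perturbed values no longer reveal the true figures, yet the average stays close to the original, which is what lets mining on the perturbed data remain useful.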
Social Impacts of Data Mining:
Data mining has changed our daily lives: how we work and shop, what we buy, and how we search for
information. It saves time and offers personalized product recommendations based on our previous history,
as on Amazon, Flipkart, and similar sites.
Data mining is emerging in fields such as healthcare, finance, marketing, and social media. Its contribution
to healthcare and well-being is particularly large: data mining software is used to analyze data when
developing drugs and to find associations between patients, drugs, and outcomes. It also helps improve
patient satisfaction, provide more patient-centered care, decrease costs, and increase operating efficiency.
Insurance organizations can detect medical insurance fraud and abuse through data mining and reduce
their losses.
Older payment systems have given way to different forms of transaction that depend on usage, acceptability,
methods, technology, and availability, turning physical financial transactions into virtual payment
transactions. Data mining is used here to analyze successful transactions and keep track of fraudulent ones.
It is also used in Web-wide tracking technology, which tracks a user’s interests while they visit any site.
Information about every site visited is recorded and can later be used to provide marketers with
information reflecting the user’s interests.
It is also used for customer relationship management, which helps provide more customized, personal
service to individual customers. By studying browsing and purchasing history on Web stores, companies can
tailor advertisements and promotions to customer profiles, so that they reach only customers who are
interested and who are less likely to be annoyed by unwanted mailings. This helps reduce costs and wasted
time and improves productivity.

Advantages:

Improved security: Data mining can help identify patterns and anomalies that could indicate security
breaches, enabling organizations to take action to prevent future attacks.
Personalized services: Data mining can enable personalized services, such as customized product
recommendations or personalized healthcare treatments, that can improve the overall customer experience.
Improved decision making: Data mining can provide insights into customer behavior, market trends, and
other important factors, enabling organizations to make more informed decisions.
Improved efficiency: Data mining can help streamline business processes, enabling organizations to
operate more efficiently.

Disadvantages:

Privacy concerns: Data mining can raise privacy concerns, as it involves collecting and analyzing data
about individuals without their explicit consent. This can result in the disclosure of sensitive information,
which can have negative consequences for individuals.
Security risks: The data used in data mining can be subject to security risks, such as unauthorized access
or hacking, which can result in the exposure of sensitive information.
Bias: Data mining algorithms can be biased, which can lead to discriminatory outcomes or reinforce
existing biases.
Social impacts: Data mining can have social impacts, such as increasing surveillance and reducing
personal autonomy, which can have negative consequences for society as a whole.

Conclusion:

Data mining has several advantages and disadvantages when it comes to privacy, security, and social
impacts. While it can provide valuable insights and improve efficiency, it is important to carefully consider
the potential risks and impacts before implementing data mining practices. Organizations must take steps
to mitigate risks and protect the privacy and security of individuals, while also ensuring that data mining is
conducted in an ethical and responsible manner.
