DM Clustering UNIT4
UNIT-IV
CLUSTERING
Clustering
Clustering is the process of partitioning a set of data into meaningful similar subclasses, each of which is called a cluster.
[Or]
Clustering is the grouping of a set of objects in such a way that objects in the same group are similar to one another. In other words, in cluster analysis we first partition the data into groups based on similarity.
Examples of clustering application:
Marketing
Land use
Insurance
City-planning
1. Partitioning approach:
Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors.
Typical methods: k-means, k-medoids, CLARANS
2. Hierarchical approach:
Create a hierarchical decomposition of the set of data (or objects) using some criterion.
Typical methods: Agglomerative, Divisive clustering
BIRCH
ROCK
CHAMELEON
3. Density-based approach:
Based on connectivity and density functions.
Typical methods:
DENCLUE [DENsity-based CLUstEring]
DBSCAN [density-based clustering based on connected regions with sufficiently high density]
OPTICS [Ordering Points To Identify the Clustering Structure]
4. Grid-based methods:
Quantize the object space into a finite number of cells that form a grid structure, and perform the clustering operations on the grid.
Typical methods: STING, WaveCluster
5. Model-based methods:
A model is hypothesized for each of the clusters, and the method tries to find the best fit of the data to the given model.
Typical methods: COBWEB, EM [Expectation-Maximization]
6. Constraint-based methods:
Clustering that takes user-specified or application-specific constraints into account.
Typical methods: COD (clustering with obstacles), constrained clustering.
Partitioning methods:
1. k-means algorithm:
Step 1: Choose k mean values (randomly).
Step 2: Assign each object to the cluster with the nearest mean.
Step 3: Recompute the means and repeat Steps 2 and 3 until the means no longer change.
Step 4: The k-means method typically uses the square-error criterion function.
EX: O = {2, 3, 4, 10, 11, 12, 20, 25, 30}; K = 2
Initial means: M1 = 4, M2 = 12
C1 = {2, 3, 4}                 C2 = {10, 11, 12, 20, 25, 30}
M1 = 9/3 = 3                   M2 = 108/6 = 18
C1 = {2, 3, 4, 10}             C2 = {11, 12, 20, 25, 30}
M1 = 19/4 = 4.75 ≈ 5           M2 = 98/5 = 19.6 ≈ 20
C1 = {2, 3, 4, 10, 11, 12}     C2 = {20, 25, 30}
M1 = 42/6 = 7                  M2 = 75/3 = 25
C1 = {2, 3, 4, 10, 11, 12}     C2 = {20, 25, 30}
M1 = 7                         M2 = 25   → the means are unchanged, so the algorithm stops.
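The iteration above can be reproduced with a short script. The following is a minimal sketch of 1-D k-means on the same data; the helper name kmeans_1d and the use of plain Python lists are illustrative choices, not part of the notes.

# Minimal 1-D k-means sketch for the worked example above.
def kmeans_1d(points, means, max_iter=100):
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of the nearest mean.
        clusters = [[] for _ in means]
        for x in points:
            idx = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[idx].append(x)
        # Update step: recompute each mean from its cluster members.
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:          # converged: means did not change
            return clusters, new_means
        means = new_means
    return clusters, means

points = [2, 3, 4, 10, 11, 12, 20, 25, 30]
clusters, means = kmeans_1d(points, [4, 12])   # initial means M1=4, M2=12
print(clusters)   # [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
print(means)      # [7.0, 25.0]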
2. K-Medoids algorithm:
The k-means algorithm is sensitive to outliers, because an object with an extremely large value may substantially distort the distribution of the data.
Instead of taking the mean value of the objects in a cluster as a reference point, we can pick actual objects to represent the clusters, using one representative object (medoid) per cluster. Each remaining object is assigned to the representative object to which it is most similar. The algorithm then repeatedly tries to replace a representative object Oj with a randomly selected non-representative object Orandom, keeping the swap only if it reduces the total cost. After a swap, each object p is reassigned according to one of four cases (a small cost-comparison sketch follows the four cases).
Case 1: p currently belongs to representative object Oj. If Oj is replaced by Orandom and p is closest to one of the other representative objects Oi (i ≠ j), then p is reassigned to Oi.
Legend: • → data object; + → cluster center; – → before swapping; --- → after swapping.
Case 2: p currently belongs to representative object Oj. If Oj is replaced by Orandom and p is closest to Orandom, then p is reassigned to Orandom.
Case 3: p currently belongs to representative object Oi (i ≠ j). If Oj is replaced by Orandom and p is still closest to Oi, then the assignment does not change.
Case 4: p currently belongs to representative object Oi (i ≠ j). If Oj is replaced by Orandom and p is closest to Orandom, then p is reassigned to Orandom.
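The swap test that these four cases support can be sketched in a few lines. In the sketch below the absolute-error cost function, the data values, and the chosen candidate medoid are illustrative assumptions.

# Sketch of the k-medoids (PAM) swap test: compare total cost before and after
# replacing a medoid Oj with a randomly chosen non-medoid Orandom.
def total_cost(points, medoids):
    # Each point contributes its distance to the nearest medoid (absolute error).
    return sum(min(abs(p - m) for m in medoids) for p in points)

points  = [2, 3, 4, 10, 11, 12, 20, 25, 30]   # illustrative data
medoids = [4, 20]                              # current representative objects
candidate = 11                                 # Orandom, a non-medoid object

for j, oj in enumerate(medoids):
    trial = medoids[:j] + [candidate] + medoids[j + 1:]
    delta = total_cost(points, trial) - total_cost(points, medoids)
    # A negative delta means the swap reduces total cost, so it would be accepted.
    print(f"swap {oj} -> {candidate}: cost change = {delta}")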
Hierarchical methods:
Disadvantages:
1. Once a merge (or split) step is done, it cannot be undone.
2. To overcome this problem and to improve the quality of hierarchical methods, they are integrated with other clustering techniques, e.g.:
1. BIRCH - Balanced Iterative Reducing and Clustering using Hierarchies.
2. ROCK - RObust Clustering using linKs.
3. CHAMELEON - hierarchical clustering using dynamic modeling.
CF-tree: A CF-tree is a height-balanced tree that stores the clustering features for a hierarchical clustering (a small sketch of a clustering feature follows the list below). The size of a clustering-feature tree depends on two factors:
1. Branching factor: the maximum number of children allowed for a non-leaf node.
2. Threshold: the maximum diameter allowed for the sub-clusters stored at the leaf nodes.
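In BIRCH, a clustering feature is the triple CF = (N, LS, SS): the number of points, their linear sum, and their square sum. CFs are additive, which is what lets a CF-tree node summarize its children. A small sketch with illustrative 1-D data:

# Sketch of a BIRCH clustering feature CF = (N, LS, SS) for 1-D points.
# CFs are additive, so a CF-tree node can summarize its children without raw points.
def make_cf(points):
    return (len(points), sum(points), sum(p * p for p in points))

def merge_cf(cf1, cf2):
    return tuple(a + b for a, b in zip(cf1, cf2))

def centroid(cf):
    n, ls, _ = cf
    return ls / n

cf_a = make_cf([2, 3, 4])        # sub-cluster A
cf_b = make_cf([10, 11, 12])     # sub-cluster B
cf_ab = merge_cf(cf_a, cf_b)     # CF of the merged cluster
print(cf_ab)                     # (6, 42, 394)
print(centroid(cf_ab))           # 7.0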
CHAMELEON:
Measures the similarity based on a dynamic model: two clusters are merged only if the interconnectivity and closeness between the two clusters are high relative to the internal interconnectivity of the clusters and the closeness of items within the clusters.
→ CURE ignores information about the interconnectivity of objects.
→ ROCK ignores information about the closeness of two clusters.
The algorithm has two phases:
Phase 1: Use a graph-partitioning algorithm to divide the objects into a large number of relatively small sub-clusters.
Phase 2: Use an agglomerative hierarchical clustering algorithm that repeatedly merges these sub-clusters, based on their relative interconnectivity and relative closeness, to find the genuine clusters.
Density-based Clustering
The Density-based Clustering tool works by detecting areas where points are concentrated and
where they are separated by areas that are empty or sparse. Points that are not part of a cluster are
labeled as noise.
This tool uses unsupervised machine learning clustering algorithms which automatically detect
patterns based purely on spatial location and the distance to a specified number of neighbors.
These algorithms are considered unsupervised because they do not require any training on what
it means to be a cluster.
Clustering Methods
The Density-based Clustering tool provides three different Clustering Methods with which to
find clusters in your point data:
Defined distance (DBSCAN)—uses a specified distance to separate dense clusters from
sparser noise. The DBSCAN algorithm is the fastest of the clustering methods, but is only
appropriate if there is a very clear Search Distance to use, and that works well for all
potential clusters. This requires that all meaningful clusters have similar densities.
Multi-scale (OPTICS)—uses the distance between neighboring features to create a
reachability plot which is then used to separate clusters of varying densities from noise.
The OPTICS algorithm offers the most flexibility in fine-tuning the clusters that are
detected, though it is computationally intensive, particularly with a large Search
Distance.
For Defined distance (DBSCAN), if the Minimum Features per Cluster cannot be
found within the Search Distance from a particular point, then that point will be marked
as noise. In other words, if the core-distance (the distance required to reach the minimum
number of features) for a feature is greater than the Search Distance, the point is marked
as noise. The Search Distance, when using Defined distance (DBSCAN), is treated as a
search cut-off.
Multi-scale (OPTICS) will search all neighbor distances within the specified Search Distance,
comparing each of them to the core-distance. If any distance is smaller than the core-distance,
then that feature is assigned that core-distance as its reachability distance. If all of the distances
are larger than the core-distance, then the smallest of those distances is assigned as the
reachability distance.
While only Multi-scale (OPTICS) uses the reachability plot to detect clusters, the plot can be used to help explain, conceptually, how these methods differ from each other. A reachability plot reveals clusters of varying densities and separation distances.
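To make the two methods concrete, here is a minimal sketch that runs both on a small 2-D point set using scikit-learn; the library choice, the toy data, and the eps/min_samples/max_eps values are assumptions made for illustration, not part of the notes.

# Minimal sketch: DBSCAN with a fixed search distance vs. OPTICS
# for clusters of varying density. Requires numpy and scikit-learn.
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS

# Two dense groups plus one far-away point that should come out as noise.
X = np.array([[1, 1], [1, 2], [2, 1], [2, 2],
              [8, 8], [8, 9], [9, 8], [9, 9],
              [25, 25]], dtype=float)

db = DBSCAN(eps=2.0, min_samples=3).fit(X)       # eps acts as the search cut-off
print("DBSCAN labels:", db.labels_)              # label -1 marks noise points

op = OPTICS(min_samples=3, max_eps=10.0).fit(X)  # max_eps bounds the search distance
print("OPTICS labels:", op.labels_)
print("reachability:", np.round(op.reachability_, 2))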
Grid-based methods: STING (Statistical Information Grid)
→ Statistical information for each cell is calculated and stored beforehand and is used to answer queries.
→ Parameters of higher-level cells can easily be calculated from the parameters of the lower-level cells.
→ Remove the irrelevant cells from further consideration.
→ When finished examining the current layer, proceed to the next lower level.
→ Repeat this process until the bottom layer is reached.
Advantages:
1. Query-independent, easy to parallelize, incremental update
2. O(k), where k is the number of grid cells at the lowest level
Disadvantages: All the cluster boundaries are either horizontal or vertical, and no
diagonal boundary is detected.
WaveCluster: clustering by wavelet analysis
A multi-resolution clustering approach that applies a wavelet transform to the feature space. How it finds clusters:
Summarize the data by imposing a multi-dimensional grid structure onto the data space.
The multi-dimensional spatial data objects are represented in an n-dimensional feature space.
Apply a wavelet transform on the feature space to find the dense regions in the feature space.
Model-based clustering methods attempt to optimize the fit between the given data and some
mathematical model. Such methods are often based on the assumption that the data are generated
by a mixture of underlying probability distributions.
Expectation-Maximization (EM):
In practice, each cluster can be represented mathematically by a parametric probability
distribution. The entire data is a mixture of these distributions, where each individual distribution
is typically referred to as a component distribution.
The EM (Expectation-Maximization) algorithm is a popular iterative refinement algorithm that
can be used for finding the parameter estimates. It can be viewed as an extension of the k-means
paradigm, which assigns an object to the cluster with which it is most similar, based on the
cluster mean.
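As a minimal sketch of EM-based clustering, the example below fits a two-component Gaussian mixture with scikit-learn; the library, the 1-D toy data, and the component count are illustrative assumptions.

# Sketch: EM clustering with a mixture of Gaussians.
# Each cluster is one component distribution; EM estimates its parameters.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[2], [3], [4], [10], [11], [12], [20], [25], [30]], dtype=float)

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("means:", gm.means_.ravel())          # estimated component means
print("hard labels:", gm.predict(X))        # most likely component per object
print("soft labels:\n", np.round(gm.predict_proba(X), 2))  # membership probabilities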
Conceptual Clustering
Conceptual clustering is a form of clustering in machine learning that, given a set of unlabeled
objects, produces a classification scheme over the objects. Unlike conventional clustering, which
primarily identifies groups of like objects, conceptual clustering goes one step further by also
finding characteristic descriptions for each group, where each group represents a concept or
class. Hence, conceptual clustering is a two-step process: clustering is performed first, followed
by characterization.
COBWEB is a popular and simple method of incremental conceptual clustering. Its input objects
are described by categorical attribute-value pairs. COBWEB creates a hierarchical clustering in
the form of a classification tree.
Outlier Analysis
“What is an outlier?” Very often, there exist data objects that do not comply with the general
behavior or model of the data. Such data objects, which are grossly different from or inconsistent
with the remaining set of data, are called outliers. Outliers can be caused by measurement or
execution error.
Many data mining algorithms try to minimize the influence of outliers or eliminate them altogether. This, however, could result in the loss of important hidden information because one
person’s noise could be another person’s signal. In other words, the outliers may be of particular
interest, such as in the case of fraud detection, where outliers may indicate fraudulent activity.
Thus, outlier detection and analysis is an interesting data mining task, referred to as outlier
mining.
Outlier mining has wide applications. As mentioned previously, it can be used in fraud detection,
for example, by detecting unusual usage of credit cards or telecommunication services. In
addition, it is useful in customized marketing for identifying the spending behavior of customers
with extremely low or extremely high incomes, or in medical analysis for finding unusual
responses to various medical treatments.
Over the last few years, the World Wide Web has become a significant source of information
and simultaneously a popular platform for business. Web mining can be defined as the method of utilizing data mining techniques and algorithms to extract useful information directly from the web, such as Web documents and services, hyperlinks, Web content, and server logs. The World Wide Web contains a large amount of data that provides a rich source for data mining. The
objective of Web mining is to look for patterns in Web data by collecting and examining data in
order to gain insights.
Web mining can widely be seen as the application of adapted data mining techniques to the web,
whereas data mining is defined as the application of the algorithm to discover patterns on mostly
structured data embedded into a knowledge discovery process. A distinctive property of Web mining is that it deals with a variety of data types. The web has multiple aspects that yield different approaches for the mining process: web pages consist of text, web pages are linked via hyperlinks, and user activity can be monitored via web server logs. These three features lead to the differentiation between three areas: web content mining, web structure mining, and web usage mining.
In data mining, web terminology pertains to concepts relevant to analyzing web-related data.
This involves web crawling, scraping, URL tokenization, content mining, link analysis, usage
mining, structure mining, and more. Data mining techniques enable understanding user behavior,
preferences, and patterns on websites. Techniques include personalization, anomaly detection,
text mining, and utilizing semantic web concepts. These web-focused approaches allow for the
extraction of valuable insights from the vast amount of data available on the internet.
Web mining has numerous applications in various fields, including business, marketing, e-
commerce, education, healthcare, and more. Some common applications of web mining include -
Fraud Detection -
Web mining is used to detect fraudulent activities, such as credit card fraud, identity
theft, and online scams. This includes analyzing user behavior patterns, detecting
anomalies, and identifying potential security threats.
Social Network Analysis -
Web mining is used to analyze social media data and identify social networks,
communities, and influencers. This information can be used to understand social
dynamics, sentiment analysis, and targeted advertising.
The web mining process itself typically involves the following steps:
Data collection -
Web data is collected from various sources, including web pages, databases, and APIs.
Data pre-processing -
The collected data is pre-processed to remove irrelevant information, such as
advertisements and duplicate content.
Data integration -
The pre-processed data is integrated and transformed into a structured format for
analysis.
Pattern discovery -
Web mining techniques are applied to identify patterns, trends, and relationships.
Evaluation -
The discovered patterns are evaluated to determine their significance and usefulness.
Visualization -
The analysis results are visualized through graphs, charts, and other visualizations.
Web Content Mining is one of the three different types of techniques in Web Mining. In this article, we will purely discuss Web Content Mining. Web content mining is the mining, extraction, and integration of useful data, information, and knowledge from Web page content.
It describes the discovery of useful information from web content. In simple words, it is the application of web mining that extracts relevant or useful content from the Web. Web content mining is related to, but different from, other mining techniques such as data mining and text mining. Due to the heterogeneity and lack of structure of web data, automated discovery of new knowledge patterns can be challenging to some extent.
Web data are generally semi-structured and/or unstructured, while data mining is primarily concerned with structured data. Web content mining scans and mines text, images, and groups of web pages according to the content of the input and displays the resulting list in search engines.
For Example: if the user is searching for a particular song then the search engine will display
or provide suggestions relevant to it.
Web content mining deals with different kinds of data such as text, audio, video, image, etc.
Unstructured Web Data Mining
Unstructured data includes data such as audio, video, etc. We convert this unstructured data into structured data, i.e., into useful or structured information (this is what web content mining does). The conversion process is as follows:
2. Database Approaches:
Used for transforming unstructured data into a more structured, higher-level collection of resources, such as relational databases, and then using standard database querying mechanisms and data mining techniques to access and analyze this information.
Multilevel Databases:
Lowest level - semi-structured information is kept.
Higher levels - generalizations from the lower levels, organized into relations and objects.
Web Query Systems:
Web query systems are developed using query languages such as SQL together with natural language processing for extracting data.
Web content mining has the following problems or challenges also with their solutions, such as:
o Data Extraction: Extraction of structured data from Web pages, such as products and
search results. Extracting such data allows one to provide services. Two main types of
techniques, machine learning and automatic extraction, are used to solve this problem.
o Web Information Integration and Schema Matching: Although the Web contains a
huge amount of data, each website (or even page) represents similar information
differently. Identifying or matching semantically similar data is an important problem
with many practical applications.
o Opinion extraction from online sources: There are many online opinion sources, e.g., customer reviews of products, forums, blogs, and chat rooms. Mining opinions is of great importance for marketing intelligence and product benchmarking.
o Knowledge synthesis: Concept hierarchies or ontology are useful in many applications.
However, generating them manually is very time-consuming. The main application is to
synthesize and organize the pieces of information on the web to give the user a coherent
picture of the topic domain. A few existing methods that explore the web's information
redundancy will be presented.
o Segmenting Web pages and detecting noise: In many Web applications, one only wants
the main content of the Web page without advertisements, navigation links, copyright
notices. Automatically segmenting Web pages to extract the pages' main content is an
interesting problem.
The challenge for Web structure mining is to deal with the structure of the hyperlinks within the
web itself. Link analysis is an old area of research. However, with the growing interest in Web
mining, the research of structure analysis has increased. These efforts resulted in a newly
emerging research area called Link Mining, which is located at the intersection of the work in
link analysis, hypertext, web mining, relational learning, inductive logic programming, and graph
mining.
Web structure mining uses graph theory to analyze a website's node and connection structure.
According to the type of web structural data, web structure mining can be divided into two kinds:
The web contains a variety of objects with almost no unifying structure, with differences in the
authoring style and content much greater than in traditional collections of text documents. The
objects in the WWW are web pages, and links are in, out, and co-citation (two pages linked to by
the same page). Attributes include HTML tags, word appearances, and anchor texts. Web
structure mining includes the following terminology, such as:
An example of a technique of web structure mining is the PageRank algorithm used by Google
to rank search results. A page's rank is decided by the number and quality of links pointing to the
target node.
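Below is a minimal power-iteration sketch of the PageRank idea on a toy four-page link graph; the graph, the damping factor of 0.85, and the helper name pagerank are assumptions made for illustration, not Google's actual implementation.

# Sketch: PageRank by power iteration on a tiny link graph.
# links[p] is the list of pages that page p points to (no dangling pages here).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for p, outgoing in links.items():
            share = damping * rank[p] / len(outgoing)   # rank flows along out-links
            for q in outgoing:
                new_rank[q] += share
        rank = new_rank
    return rank

for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))   # C ranks highest: it has the most in-links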
Link mining has produced some new variations of traditional data mining tasks. Below we summarize some of these link mining tasks that are applicable in Web structure mining:
1. Link-based Classification: The most recent upgrade of a classic data mining task to linked domains. The task is to predict the category of a web page based on the words that occur on the page, the links between pages, anchor text, HTML tags, and other possible attributes found on the web page.
2. Link-based Cluster Analysis: The data is segmented into groups, where similar objects
are grouped together, and dissimilar objects are grouped into different groups. Unlike the
previous task, link-based cluster analysis is unsupervised and can be used to discover
hidden patterns from data.
3. Link Type: There is a wide range of tasks concerning predicting the existence of links,
such as predicting the type of link between two entities or predicting the purpose of a
link.
4. Link Strength: Links could be associated with weights.
5. Link Cardinality: The main task is to predict the number of links between objects.
Web structure mining and page categorization are used for:
o Finding related pages.
o Finding duplicated websites and determining the similarity between them.
Web Usage Mining focuses on techniques that can predict the behavior of users while they are interacting with the WWW. Web usage mining discovers user navigation patterns from web data and tries to extract useful information from the secondary data derived from users' interactions while surfing the web. Web usage mining collects data from Weblog records to discover user access patterns of web pages. Several research projects and commercial tools analyze those patterns for different purposes. The resulting knowledge can be utilized in personalization, system improvement, site modification, business intelligence, and usage characterization.
The only information left behind by many users visiting a Web site is the path through the pages they have accessed. Most Web information retrieval tools use only textual information and ignore the link information, which can be very valuable. In general, four kinds of data mining techniques are applied in the web usage mining domain to discover user navigation patterns:
1. Association Rules
Association rules are the most basic data mining method and are used more than any other method in web usage mining. This method enables a website to organize its content more efficiently or to provide recommendations for effective cross-selling of products.
These rules are statements of the form X => Y, where X and Y are sets of items occurring in a series of transactions. The rule X => Y states that transactions that contain the items in X are likely to also contain the items in Y. In web usage mining, association rules are used to find relationships between pages that frequently appear together in user sessions.
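As a small illustration, the sketch below computes the support and confidence of a rule X => Y over a handful of toy user sessions; the session data and helper names are assumptions made for the example.

# Sketch: support and confidence of the rule {X} => {Y} over user sessions.
sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "blog"},
    {"products", "cart"},
]

def support(itemset, sessions):
    return sum(itemset <= s for s in sessions) / len(sessions)

def confidence(x, y, sessions):
    return support(x | y, sessions) / support(x, sessions)

x, y = {"products"}, {"cart"}
print("support({products, cart}) =", support(x | y, sessions))      # 0.5
print("confidence(products => cart) =", confidence(x, y, sessions)) # 2/3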
2. Sequential Patterns
Sequential patterns are used to discover subsequences in a large volume of sequential data. In web usage mining, sequential patterns are used to find user navigation patterns that frequently appear in sessions. Sequential patterns may look like association rules, but they also include time: the order in which the events occurred is part of the pattern. Algorithms used to extract association rules can also be adapted to generate sequential patterns. Two types of algorithms are used for mining sequential patterns.
o The first type of algorithm is based on association rule mining. Many common sequential pattern mining algorithms are modified association rule mining algorithms. For example, GSP and AprioriAll are two variants of the Apriori algorithm that are used to extract sequential patterns. However, some researchers believe that association rule mining algorithms do not perform well enough when mining long sequential patterns.
o In the second type of sequential pattern mining algorithm, tree structures and Markov chains are used to represent navigation patterns. For example, in one of these algorithms, called WAP-mine, a tree structure called the WAP-tree is used to explore web access patterns. Evaluation results show that its performance is higher than that of algorithms such as GSP. (The ordering property that distinguishes sequential patterns is sketched below.)
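A minimal sketch of this ordering requirement follows: it counts how many sessions contain a given page sequence in order. The toy sessions and the helper contains_in_order are illustrative assumptions.

# Sketch: a sequential pattern must occur in order, unlike an association rule.
def contains_in_order(session, pattern):
    it = iter(session)
    # Every element of the pattern must appear, in order, somewhere in the session.
    return all(page in it for page in pattern)

sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "blog", "products"],
    ["products", "home", "cart"],
]

pattern = ["home", "products", "cart"]
count = sum(contains_in_order(s, pattern) for s in sessions)
print(count)   # 1: only the first session visits the pages in this order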
3. Clustering
Clustering techniques identify groups of similar items among high volumes of data. This is done using distance functions that measure the degree of similarity between different items. Clustering in web usage mining is used for grouping similar sessions. What is important in this type of analysis is the relation between individual users and user groups. Two types of interesting clustering can be found in this area: user clustering and page clustering.
Clustering of user records is usually used in web mining and web analytics tasks. The knowledge derived from clustering is used, for example, to partition the market in e-commerce. Different methods and techniques are used for clustering, including:
o Using a similarity graph and the amount of time spent viewing a page to estimate the similarity of sessions.
In other clustering methods, repetitive patterns are first extracted from the user sessions using association rules. These patterns are then used to construct a graph whose nodes are the visited pages. The edges of the graph connect two or more pages; if those pages occur together in an extracted pattern, a weight is assigned to the edge to show the strength of the relationship between the nodes. For clustering, this graph is then recursively partitioned until groups of user behavior are detected.
4. Classification Mining
Discovering classification rules allows one to develop a profile of items belonging to a particular group according to their common attributes. This profile can be used to classify new data items added to the database. In Web mining, classification techniques allow one to develop a profile for clients who access particular server files, based on demographic information available about those clients or on their navigation patterns.
Advantages
Web usage mining has many advantages, making this technology attractive to corporations,
including government agencies.
o There are also elements unique to web usage mining that show the technology's benefits.
These include the way semantic knowledge is applied when interpreting, analyzing and
reasoning about usage patterns during the mining phase.
Disadvantages
Web usage mining by itself does not create issues, but when used on data of personal nature, this
technology might cause concerns.
o The most criticized ethical issue involving web usage mining is the invasion of privacy.
Privacy is considered lost when information concerning an individual is obtained, used,
or disseminated, especially if this occurs without the individual's knowledge or consent.
The obtained data will be analyzed, made anonymous, and then clustered to form
anonymous profiles.
o These applications de-individualize users by judging them by their mouse clicks rather
than by identifying information. De-individualization, in general, can be defined as a
tendency to judge and treat people based on group characteristics instead of on their
characteristics and merits.
o The companies collecting the data for a specific purpose might use the data for totally
different purposes, violating the user's interests.
The main objective of web usage mining is to collect data about users' navigation patterns. This information can be used to improve websites from the user's point of view. There are three main applications of this mining:
1. Personalization
Web usage mining techniques can be used for the personalization of web users. For example, a user's behavior can be predicted by comparing their current browsing patterns with those extracted from the log files. Recommendation systems are a real application in this area: they suggest links that direct the user to their favorite pages. Some sites also organize their product catalogs based on the predicted interests of a specific user and present them accordingly.
2. Pre-fetching and caching (system improvement)
The results of web usage mining can be used to improve the performance of Web servers and Web-based applications. Web usage mining can inform pre-fetching and caching strategies and thus reduce the response time of Web servers.
3. Site modification
Usability is one of the most important issues in designing and implementing websites. The results of web usage mining can help produce an appropriate design for websites. Adaptive websites are an application of this type of mining: their content and structure are dynamically reorganized based on the data derived from user behavior on the site.
Google is the most commonly used internet search engine. Google search takes place in the
following three stages:
1. Crawling. Crawlers discover what pages exist on the web. A search engine
constantly looks for new and updated pages to add to its list of known pages. This is
referred to as URL discovery. Once a page is discovered, the crawler examines its
content. The search engine uses an algorithm to choose which pages to crawl and how often (a toy crawl loop is sketched after this list).
2. Indexing. After a page is crawled, the textual content is processed, analyzed and
tagged with attributes and metadata that help the search engine understand what the
content is about. This also enables the search engine to weed out duplicate pages and
collect signals about the content, such as the country or region the page is local to and
the usability of the page.
3. Searching and ranking. When a user enters a query, the search engine searches the
index for matching pages and returns the results that appear the most relevant on the
search engine results page (SERP). The engine ranks content on a number of factors,
such as the authoritativeness of a page, back links to the page and keywords a page
contains.
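A highly simplified sketch of the crawl stage (URL discovery plus link extraction), using only the Python standard library, is shown below; the seed URL, the page limit, and the helper names are assumptions for illustration, and a real crawler is far more elaborate.

# Sketch: a toy crawler that discovers URLs and extracts links from fetched pages.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":                      # hyperlinks drive URL discovery
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=5):
    frontier, seen = [seed], set()
    while frontier and len(seen) < max_pages:
        url = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except (OSError, ValueError):
            continue                        # unreachable or malformed URLs are skipped
        parser = LinkExtractor()
        parser.feed(html)
        frontier.extend(urljoin(url, link) for link in parser.links)
    return seen

print(crawl("https://example.com/"))        # pages discovered from the seed URL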
Specialized content search engines are more selective about the parts of the web they crawl and
index. For example, Creative Commons Search is a search engine for content shared explicitly
for reuse under Creative Commons license. This search engine only looks for that specific type
of content.
Country-specific search engines may prioritize websites presented in the native language of the
country over English websites. Individual websites, such as large corporate sites, may use a
search engine to index and retrieve only content from that company's site. Some of the major
search engine companies license or sell their search engines for use on individual sites.
Search engines crawl, index and rank content across the internet, using algorithms to decide placement on results pages.
Not every search engine ranks content the same way, but some have similar ranking algorithms.
Google search and other search engines like it rank relevant results based on the following
criteria:
Query meaning. The search engine looks at user queries to establish searcher intent,
which is the specific type of information the user is looking for. Search engines use
language models to do this. Language models are algorithms that read user input,
understand what it means and determine the type of information that a user is looking
for.
Usability. Search engines evaluate the accessibility and general user experience of
content and reward content with better page experience. One example of page
usability is mobile-friendliness, which is a measure of how easy a webpage is to use
on a mobile device.
User data. A user's past search history, search settings and location data are a few of
the data types search engines use to determine the content rankings they choose.
Search engines might use other website performance metrics, such as bounce rate and time spent
on page, to determine where websites rank on a results page. Search engines might return
different results for the same term searched as text-based content versus an image or video
search.
Search engines often provide links to videos on their search engine results pages.
Content creators use search engine optimization (SEO) to take advantage of the above processes.
Optimizing the content on a page for search engines increases its visibility to searchers and its
ranking on the SERP. For example, a content creator could insert keywords relevant to a given
search query to improve results for that query. If the content creator wants people searching for
dogs to land on their page, they might add the keywords bone, leash and hound. They might also
include links to pages that Google deems authoritative.
The primary goal of a search engine is to help people search for and find information. Search
engines are designed to provide people with the right information based on a set of criteria, such
as quality and relevance.
Webpage and website providers use search engines to make money and to collect data, such
as clickstream data, about searchers. These are secondary goals that require users to trust that the
content they are getting on a SERP is enough to engage with it. Users must see the information
they're getting is the right information.
Organic results. Unpaid organic results are seen as more trustworthy than paid, ad-
based results.
Search engines return both organic and paid results; the two differ in several ways.
How do search engines make money?
User data. Search engines also make money from the user data that they collect.
Examples include search history and location data. This data is used to create a digital
profile for a given searcher, which search engine providers can use to serve targeted
ads to that user.
Contextual ads. Search engines also capitalize on serving up contextual ads that are
directly related to the user's current search. If a search engine includes a shopping
feature on the platform, it might display contextual ads for products related to the
user's search in the sidebar of a website where advertisements are displayed. For
example, if the online store sells books, an ad may appear in the corner of the page
for reading glasses.
Donations. Some search engines are designed to help nonprofits solicit donations.
Affiliate links. Some engines include affiliate links, where the search engine has a
partnership in which the partner pays the search engine when a user clicks the
partner's link.
Search engines personalize results based on digital searcher profiles created from user data. User
data is collected from the application or device a user accesses the search engine with. User data
collected includes the following:
search history
location information
audio data
user ID
device identification
IP address
contact lists
purchase history
Cookies are used to track browsing history and other data. They are small text files sent from the
websites a user visits to their web browser. Search engines use cookies to track user preferences
and personalize results and ads. They are able to remember settings, such as passwords, language
preferences, content filters, how many results per page and session information.
Using private browsing settings or incognito browsing protects users from tracking but only at
the device level. Search history and other information accumulated during search is not saved
and is deleted after the search session. However, internet service providers, employers and the
domain owners of the websites visited are able to track digital information left behind during a
search.
Characteristics
The main purpose of a Web Search Engine is to provide website listings that are being sought by the user. To do this, the search engine usually collects words from the user that it then matches with websites to produce results. However, this process of collecting words and matching is not a simple exercise, because the engine has to know the 'stress' factor on each word. So different search engine technologies use different word resolution methods. In the Zapaat Search Engine, for example, some characteristics are:
1. Context : Words that define the type of websites that the user is interested in.
2. Keywords : Words to look for in particular websites that match Context.
3. Layering: Define a context within a context to narrow the result set.
4. Connected Words: Define adjacent words as keywords.
Crawling
The crawler, or web spider, is a vital software component of the search engine. It essentially
sorts through the Internet to find website addresses and the contents of a website for storage in
the search engine database. Crawling can scan brand new information on the Internet or it can
locate older data. Crawlers have the ability to search a wide range of websites at the same time
and collect large amounts of information simultaneously. This allows the search engine to find
current content on an hourly basis. The web spider crawls until it cannot find any more
information within a site, such as further hyperlinks to internal or external pages.
Indexing
Once the search engine has crawled the contents of the Internet, it indexes that content based on
the occurrence of keyword phrases in each individual website. This allows a particular search
query and subject to be found easily. Keyword phrases are the particular group of words used by
an individual to search a particular topic.
The indexing function of a search engine first excludes any unnecessary and common articles
such as "the," "a" and "an." After eliminating common text, it stores the content in an organized
way for quick and easy access. Search engine designers develop algorithms for searching the
web according to specific keywords and keyword phrases. Those algorithms match user-
generated keywords and keyword phrases to content found within a particular website, using the
index.
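A minimal sketch of this indexing step follows: stop-word removal and an inverted index that maps keywords to the documents containing them. The document snippets and the stop-word list are illustrative assumptions.

# Sketch: build an inverted index after removing common stop words,
# then answer a keyword query by intersecting posting lists.
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "for"}

documents = {
    "page1": "the history of the search engine",
    "page2": "an engine for searching the web",
    "page3": "the web and the internet",
}

index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        if word not in STOP_WORDS:
            index[word].add(doc_id)          # posting list: word -> documents

def search(query):
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    postings = [index.get(w, set()) for w in words]
    return set.intersection(*postings) if postings else set()

print(search("the web"))          # {'page2', 'page3'}
print(search("search engine"))    # {'page1'}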
Storage
Storing web content within the database of the search engine is essential for fast and easy
searching. The amount of content available to the user is dependent on the amount of storage
space available. Larger search engines like Google and Yahoo are able to store amounts of data
ranging in the terabytes, offering a larger source of information available for the user.
Results
Results are the hyperlinks to websites that show up in the search engine page when a certain
keyword or phrase is queried. When you type in a search term, the crawler runs through the
index and matches what you typed with other keywords. Algorithms created by the search engine
designers are used to provide the most relevant data first. Each search engine has its own set of
algorithms and therefore returns different results.
Ranking Algorithms
The ranking algorithm is used by Google to rank web pages according to the Google search algorithm. Ranking features such as the criteria described above (query meaning, usability, and user data) affect the search results.
Enterprise Search
Popular search engines like Google and Bing are so enmeshed in our everyday lives that they have become synonymous with search in most of our minds. However, though web search and enterprise search are broadly comparable, they work in quite different ways and serve distinct purposes.
Enterprise search tools are for use by employees. They retrieve information from all types of data
that an organization stores, including both structured data, which is found in databases, and
unstructured data that takes the form of documents like PDFs and media.
The term “enterprise search” describes the software used to search for information inside a
corporate organization. The technology identifies and enables the indexing, searching and
display of specific content to authorized users across the enterprise.
IT industry analysts have shared that enterprise search is growing into something new. In 2017,
for instance, Gartner created a new enterprise search category called “Insight Engines.” These
solutions help businesses synthesize information interactively, or even proactively, by ingesting,
organizing and analyzing data. Forrester, another prominent analyst firm, defines this new
category as “Cognitive Search.”
How does enterprise search work?
Content is the raw material for enterprise search
More and more data to analyze, structure and classify
Data becomes more pervasive within a business as the organization grows. There can be a huge
proliferation of product information, process information, marketing content and so forth.
Individual teams create content, which then inevitably spreads across the enterprise.
Diversity of data
The information found inside large organizations tends to be highly diverse and fragmented. It’s
invariably hosted on a broad range of repositories and enterprise applications. These include
Content Management Systems (CMS’s), Enterprise Resource Planning solutions (ERP),
Customer Relationship Management (CRM), Relational Database Management Systems
(RDBMS’s), file systems, archives, data lakes, email systems, websites, intranets and social
networks as well as both private and public cloud platforms.
The data comes from a variety of sources. Structured and unstructured data are kept in different
“containers.”
Exploration – Here, the enterprise search engine software crawls all data sources,
gathering information from across the organization and its internal and external data
sources.
Indexing – After the data has been recovered, the enterprise search platform performs
analysis and enrichment of the data by tracking relationships inside the data—and then
storing the results so as to facilitate accurate, quick information retrieval.
Search – On the front end, employees request information in their native languages. The enterprise search platform then offers answers, in the form of content and pieces of content, that appear to be the most relevant to the query. The query response also factors in the employee's work context: different people may get different answers that relate to their work and search histories.
Techniques like Natural Language Processing (NLP) and Machine Learning are often involved
in determining relevant answers to queries.
Machine Learning
Machine learning applies AI to give systems the ability to learn and improve from experience,
automatically, without the need to be programmed explicitly. It focuses on creating computer
software that can access data and then make use of it for learning purposes.
“The knowledge worker spends about 2.5 hours per day, or roughly 30% of the workday,
searching for information.” – IDC
“The research found that on average, workers in both the U.K. and U.S. spent up to 25
minutes looking for a single document in over a third of searches conducted.”
– SearchYourCloud
“The average digital worker spends an estimated 28 percent of the workweek searching
e-mail and nearly 20 percent looking for internal information or tracking down colleagues
who can help with specific tasks.” – McKinsey & Company
Enterprise search software reduces the time employees require to find the necessary information.
As a result, it opens up work schedules for more high-value tasks. This improvement is
particularly important given the current emphasis on getting optimal performance out of teams in
lean, digital, agile organizations.
Customer service – Giving customer service representatives the ability to quickly and
easily find the information they need to deliver excellent customer service.
Contact experts – Letting employees search for experts and filter results according to
expertise and knowledge.
Talent search – Matching candidates with job descriptions from a database of potential
candidates.
Intranet search – helping intranet users locate information they need from shared drives
and databases.
Insight engines – Leveraging AI to detect relationships between people, content and data
as well as connections between user interests and current and past search queries.
What are the main criteria to select an enterprise search software?
Connectors
How many data connectors will an enterprise search engine need for the data sources it has to
index? The best practice is to include the sources that are likely to be indexed in the future in
addition to what is planned for current indexing. If a company plans to decommission a data
source in a year or so, however, it may want to exclude it from the connection and indexing
processes. This is particularly true if the data is going to be migrated to a new source.
The following enterprise search platform characteristics and features help make sure that information and documents are only accessible to users with the right permissions (a toy security-filtering sketch follows this list):
Protecting content from malicious actors using built-in encryption in the indexing
pipeline
Controlling access on a per-user basis and using security filters for indexed content
Using multilayer security across the cloud, on-premises data centers, intranets and
operations
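A toy sketch of per-user security trimming of search results follows; the documents, groups, and helper functions are all illustrative assumptions rather than features of any particular product.

# Sketch: security trimming - a query only returns documents the user may see.
documents = {
    "hr-salaries": {"text": "salary bands for 2024", "allowed": {"hr_team"}},
    "benefits-faq": {"text": "employee benefits overview", "allowed": {"all_staff"}},
    "q3-roadmap": {"text": "product roadmap for q3", "allowed": {"product", "exec"}},
}

user_groups = {
    "alice": {"all_staff", "hr_team"},
    "bob": {"all_staff", "product"},
}

def search(user, keyword):
    groups = user_groups.get(user, set())
    # A document is returned only if it matches the keyword AND the user's
    # groups overlap with the document's access control list.
    return [doc_id for doc_id, doc in documents.items()
            if keyword in doc["text"] and groups & doc["allowed"]]

print(search("alice", "salary"))   # ['hr-salaries']
print(search("bob", "salary"))     # []  - filtered out by permissions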
Intelligent search or Predictive AI
Predictive AI is seen as the future of enterprise search engines. With self-learning algorithms
embedded in enterprise search tools, it is possible to innovate by learning from users and
improving results based on their usage patterns. Furthermore, by using custom APIs that are
designed to make search tools work optimally for a given audience, it is possible to deliver fine-
tuned results that improve over time.
A digital workplace solution to improve productivity
The exploding data and hours of time that employees waste looking for what they need has other
ramifications, as well. Without a reliable way to search through and contextualize all the
structured and unstructured data that exists, insights are routinely missed, and the value of the
data is lost.
Employees often need to ask colleagues for help finding the information they need, wasting
additional time and resources and slowing progress. And in the end, the digital workplace that
was meant to facilitate more creative, nonroutine work actually ends up producing the inverse.
An enterprise search solution can solve this information crisis. It can search and retrieve data
regardless of format, type, language, and location. But more than that, it can use AI to
understand the context of each piece and match it to the search intent. And the more data it is
fed, the more it learns, returning better results with each query.
To the end user, it is a simple and familiar experience that delivers powerful results. For
businesses overall, it’s a key building block in their digital transformation.