DMT Unit 1

The document provides an overview of data mining, detailing types of data such as database data, data warehouses, and transactional data, along with their structures, purposes, and mining techniques. It discusses functionalities such as characterization, classification, clustering, and outlier analysis, as well as technologies used in data mining, including statistics, machine learning, and database systems. It also highlights applications in business intelligence and web search engines, and addresses major issues in the field such as methodology, user interaction, efficiency, and data diversity.

Data Mining

1. Database Data

• Stored in: Relational databases (tables with rows and columns).

• Examples: Customer records, employee data, item listings.

• Structure: Uses a schema (e.g., customer(custID, name, age, income...)).

• Tools: SQL for querying and aggregating data.

• Data Mining Use: Identify trends, patterns, and deviations (e.g., predicting
credit risk, analyzing sales).

2. Data Warehouses

• Stored in: Centralized repositories integrating data from multiple sources.

• Structure: Multidimensional (data cubes).

• Purpose: Historical analysis and business decision support.

• Features:

o Organized by subject (customer, item, time, etc.)

o Uses OLAP operations like roll-up and drill-down.

• Data Mining Use: Discover patterns at various levels of granularity, enable


exploratory analysis.

3. Transactional Data

• Stored in: Flat files or tables representing individual transactions.

• Structure: Each record contains a transaction ID and items involved (e.g., T100:
I1, I3, I8).

• Examples: Retail sales, flight bookings, clickstream data.

• Data Mining Use:

o Market basket analysis

o Frequent itemset mining to discover what products are often bought


together.

Other Forms of Data (Mentioned briefly)

• Data Streams
• Sequence Data

• Graph or Network Data

• Spatial, Text, Multimedia Data

• Web Data (WWW)

Comparison of database data, data warehouse data, and transactional data:

| Feature | Database Data | Data Warehouse Data | Transactional Data |
|---|---|---|---|
| Definition | Structured data stored in relational databases (RDBMS) using tables. | Integrated, historical data from multiple sources stored for analytical purposes. | Data representing real-world transactions or events, often sequential and time-stamped. |
| Purpose | Real-time operations and day-to-day transactions (OLTP). | Strategic decision support and historical analysis (OLAP). | Capturing and analyzing individual user or business actions (e.g., purchases). |
| Structure | Tables (relations) with rows (tuples) and columns (attributes). | Multidimensional data cubes with summarized information. | Flat files or nested tables with transaction IDs and lists of items or events. |
| Storage System | Relational Database Management System (RDBMS) such as MySQL, Oracle, SQL Server. | Centralized data warehouse (e.g., Amazon Redshift, Snowflake, Google BigQuery). | Flat files, NoSQL DBs, or special transactional DB systems. |
| Schema Type | Normalized schemas (3NF or ER models). | Star or snowflake schemas for fast aggregation and querying. | Often denormalized or semi-structured (list of items per transaction). |
| Data Granularity | Fine-grained (detailed individual records). | Aggregated (summary over time or groups). | Fine-grained (detailed per transaction). |
| Data Sources | Single operational system (e.g., POS system). | Multiple heterogeneous sources (e.g., regional DBs, logs). | Point-of-sale systems, sensors, web logs, etc. |
| Update Frequency | Frequently updated (daily, hourly). | Periodically updated (daily, weekly, monthly). | Continuously updated or appended. |
| Examples | customer(custID, name, age, income); item(itemID, price, category) | Sales by region and time; quarterly product performance | T100: [I1, I3, I8]; clickstream log: UserID: [Page1, Page3, Page7] |
| Query Type | SQL-based queries: SELECT, JOIN, GROUP BY, etc. | OLAP queries: drill-down, roll-up, slice, dice | Pattern mining: association rule mining, sequence pattern mining |
| Mining Techniques | Classification, clustering, outlier detection, regression | Multidimensional pattern mining, trend analysis, anomaly detection | Association rule mining, sequential pattern mining, market basket analysis |
| Use Cases | Predicting customer churn, identifying fraud, customer segmentation. | Strategic decisions such as finding underperforming regions or analyzing product trends. | Recommending products, detecting buying behavior, promotion bundling. |
| Tools/Technologies | SQL, DBMS (MySQL, PostgreSQL), Python (pandas) | OLAP tools (Tableau, Power BI), ETL pipelines, cube computation | Apache Hadoop, Spark, NoSQL, association rule mining (Apriori, FP-Growth) |

Functionalities

1. Characterization and Discrimination (Class/Concept Description)

➤ Characterization:

Describes the general features of data belonging to a target class.


• Provides a concise summary, usually through descriptive statistics, OLAP
operations, or attribute-oriented induction.

• Example: A retail manager wants to know the profile of customers who spend
over $5000/year. The result might show that they are typically middle-aged,
employed, and have good credit ratings.

• Output can be presented in the form of:

o Charts (bar, pie)

o Generalized relations

o Characteristic rules (e.g., "If income > 50K → likely to spend > $5000")

• It's used for summarizing and understanding data patterns in a group.

➤ Discrimination:

• Compares the features of a target class against one or more contrasting


classes.

• Example: Comparing customers who shop frequently for computer products vs.
those who shop rarely. Differences may include age, education, etc.

• Helps identify features that distinguish between groups (e.g., age, occupation).

• Often results in discriminant rules, e.g., “If age between 20-40 and education =
university → frequent buyer.”

Key Difference: Characterization is about describing one group; discrimination is


about comparing multiple groups.

2. Mining Frequent Patterns, Associations, and Correlations

• Aims to find repetitive patterns, associations, or correlations in large


datasets.

• This includes:

o Frequent Itemsets: Sets of items that often appear together in


transactions (e.g., bread and butter).

o Sequential Patterns: Items purchased in a sequence (e.g., laptop →


camera → memory card).

o Substructures: Patterns in structural forms like graphs or trees.

➤ Association Rule Mining:


• Example: “buys(X, 'computer') → buys(X, 'software') [support: 1%, confidence:
50%]”

o Means 1% of transactions include both, and 50% of computer buyers also


buy software.

• Can be single-dimensional (same predicate) or multi-dimensional (age,


income, buys).

• Used in market basket analysis, cross-selling, product recommendations.

➤ Correlation Analysis:

• Goes beyond co-occurrence to measure statistical significance between items


(e.g., chi-square test).

Importance: Helps in identifying what tends to happen together in data, enabling


targeted marketing, inventory planning, and more.

3. Classification and Regression (Predictive Analysis)

➤ Classification:

• Builds a model (classifier) that assigns data to predefined categories or


classes.

• Requires labeled training data.

• Output can be:

o IF-THEN rules

o Decision trees

o Neural networks

o SVMs, k-NN, Bayesian classifiers

• Example: Classifying items based on sales response (good, mild, none).

• Used in spam detection, credit scoring, disease diagnosis, etc.

➤ Regression:

• Predicts continuous numeric values, not categories.

• Example: Predicting the expected revenue from a product.

• Methods include linear regression, polynomial regression, and advanced ML


techniques.
• Used in forecasting, pricing models, stock prediction.

Key Difference: Classification predicts discrete labels, regression predicts


continuous values.
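To make the distinction concrete, the sketch below fits a classifier on discrete labels and a regressor on continuous targets. It assumes scikit-learn is installed; the tiny feature arrays and target values are invented purely for illustration.

```python
# Classification predicts a discrete label; regression predicts a number.
# Assumes scikit-learn; the data below is made up for illustration only.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: predict a sales-response class from [age, income].
X_cls = [[25, 30000], [40, 80000], [35, 52000], [50, 90000]]
y_cls = ["none", "good", "good", "good"]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[45, 70000]]))      # -> ['good'], a discrete class label

# Regression: predict expected revenue from advertising spend.
X_reg = [[100], [200], [300], [400]]
y_reg = [1100.0, 1900.0, 3100.0, 3900.0]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[250]]))            # -> a continuous estimate (about 2500)
```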

4. Cluster Analysis

• Groups a set of objects into clusters so that:

o Intra-cluster similarity is high

o Inter-cluster similarity is low

• No prior labels are required (unsupervised learning).

• Each cluster can later be treated as a class for further analysis.

• Example: Segmenting customers based on purchasing behavior or geographic


location.

• Visualized often using 2D/3D plots (e.g., k-means, DBSCAN).

• Applications include customer segmentation, image recognition, bioinformatics.

Purpose: To discover natural groupings within data without predefined categories.
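A minimal k-means sketch (one of the algorithms named above), assuming scikit-learn; the two-dimensional customer features are invented for illustration.

```python
# Unsupervised clustering: no labels are supplied, the algorithm discovers groups.
# Assumes scikit-learn; the feature values are invented.
import numpy as np
from sklearn.cluster import KMeans

# Invented customer features: [annual spend, visits per month]
X = np.array([[200, 2], [220, 3], [800, 6], [820, 7], [1500, 12], [1600, 14]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment for each customer
print(km.cluster_centers_)  # centroid of each discovered segment
```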


5. Outlier Analysis

• Identifies data objects that deviate significantly from the general pattern.

• Such data points are called outliers or anomalies.

• Useful in applications where rare events are more important than common
ones:

o Fraud detection (e.g., unusual credit card activity)

o Intrusion detection

o Medical anomalies

• Techniques:

o Statistical methods (assuming distribution models)

o Distance-based methods (objects far from others)

o Density-based methods (like LOF – Local Outlier Factor)

• Not all outliers are noise—many are insightful and can drive important
decisions.
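As a small sketch of the statistical approach listed above, the snippet below flags values whose z-score exceeds a threshold; the charge amounts and the threshold are assumptions chosen for this tiny example.

```python
# Statistical outlier detection: flag values far from the mean in units of
# standard deviation (z-score). Data and threshold are illustrative assumptions.
import numpy as np

amounts = np.array([42.0, 38.5, 40.1, 45.0, 39.9, 41.2, 980.0])  # invented card charges
z = (amounts - amounts.mean()) / amounts.std()

threshold = 2.0                         # with only 7 points a z-score cannot reach 3
print(amounts[np.abs(z) > threshold])   # -> [980.]
```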

Technologies Used in Data Mining


Data mining is an application-driven field that integrates various techniques from
multiple disciplines to extract valuable insights from data. These include:

1. Statistics

o Role in Data Mining: Statistics is used to model data and target classes.
A statistical model describes the behavior of data using mathematical
functions and probability distributions.

o Applications in Data Mining:

▪ Data Characterization and Classification: Statistical models can


be used to classify and characterize data.

▪ Handling Noise and Missing Data: Statistics helps in modeling


and handling noisy or missing data during the data mining process.
▪ Prediction and Forecasting: Statistical models are key for
prediction tasks, providing a framework for making inferences
about the data.

▪ Verifying Data Mining Results: After building classification or


prediction models, statistical hypothesis testing helps verify their
accuracy and significance.

o Challenges: Scaling statistical methods for large datasets is complex


due to computational costs. This issue is exacerbated for online
applications requiring real-time processing.

2. Machine Learning

o Role in Data Mining: Machine learning focuses on enabling computers to


learn patterns and make decisions based on data. Machine learning is
used in data mining for tasks like classification and clustering.

o Types of Learning Methods:

▪ Supervised Learning (Classification): Involves training a model


with labeled data to recognize patterns, such as recognizing postal
codes from handwritten images.

▪ Unsupervised Learning (Clustering): The model learns from data


without labels, finding hidden patterns or groups (e.g., recognizing
different digits in handwritten data without predefined labels).

▪ Semi-supervised Learning: Combines both labeled and


unlabeled data. Labeled data helps build models, while unlabeled
data helps refine the model's boundaries, improving accuracy.

▪ Active Learning: The model actively queries humans (domain


experts) to label uncertain data points, thus improving the model
with minimal human input.

o Challenges: While machine learning focuses on accuracy, data mining


also emphasizes efficiency, scalability, and handling diverse types of
data.

3. Database Systems and Data Warehouses

o Role in Data Mining: Database systems handle the storage,


management, and retrieval of data, and they play a crucial role in
ensuring that data mining can scale to large datasets.

o Data Warehousing: A data warehouse integrates data from various


sources and timeframes into a unified structure. It enables advanced
data analysis by consolidating data into multidimensional space, known
as data cubes.

o Data Mining Integration: Modern database systems often incorporate


data mining capabilities to extend their analytic power. Data mining tools
can operate directly on data stored in databases to identify patterns.

o Challenges: Data mining often involves working with real-time streaming


data, which requires efficient database technologies to process large
volumes of data quickly.

4. Information Retrieval (IR)

o Role in Data Mining: Information retrieval involves searching and


retrieving relevant documents or information from a large database or the
web. Unlike database systems, IR deals with unstructured data (e.g., text
or multimedia).

o Probabilistic Models: IR uses probabilistic models to measure the


similarity between documents. Text documents are often represented as
a bag of words, where the presence and frequency of words are
important, but word order is not.

o Topic Modeling: IR systems use models to identify underlying topics in


collections of documents. These topics are represented as probability
distributions over a vocabulary, and documents may belong to multiple
topics.

o Integration with Data Mining: Combining IR with data mining techniques


enables deeper analysis of text and multimedia data, facilitating better
search and analysis of large, unstructured datasets (e.g., web data, digital
libraries, healthcare records).

Applications of Data Mining


Data mining plays a vital role in various fields where large amounts of data need to be
analyzed. Here are two major applications:

1. Business Intelligence (BI)

o Purpose: To understand business contexts such as customers, market


trends, and competitors.
o Key Techniques:

▪ Classification and Prediction for sales, market analysis, and


customer feedback.

▪ Clustering for Customer Relationship Management (CRM),


grouping customers by similarities.

▪ Characterization mining for understanding customer groups and


developing tailored programs.

o Importance: BI allows businesses to make smart decisions, retain


valuable customers, and gain insights into competitors. Without data
mining, effective market analysis would be difficult.

2. Web Search Engines

o Purpose: To retrieve information from the web in response to user


queries.

o Techniques Used:

▪ Crawling: Deciding which web pages to crawl and how frequently.

▪ Indexing: Choosing which pages to index and how to structure the


index.

▪ Ranking: Determining how to rank pages based on relevance and


quality.

o Challenges:

▪ Data Volume: Search engines deal with massive amounts of data,


requiring cloud computing for processing.

▪ Real-Time Processing: Search engines need to respond to user


queries instantly, often requiring continuous updates and real-
time data mining.

▪ Small Data Issues: Many queries are asked rarely, posing a


challenge for mining methods designed for large datasets.

Major Issues in Data Mining


Data mining, being a rapidly evolving field, faces several challenges and open research
areas. These challenges can be categorized into five main groups:

1. Mining Methodology
o New Knowledge Types: Data mining covers a broad range of tasks (e.g.,
classification, regression, clustering), and as new applications emerge,
new mining techniques are developed.

o Multidimensional Data Mining: Mining knowledge across different


dimensions, such as combining various attributes in data cubes.

o Interdisciplinary Approaches: Integrating methods from natural


language processing, software engineering, and other fields enhances
data mining.

o Handling Uncertainty: Dealing with noisy or incomplete data is a


significant challenge in data mining, requiring techniques like data
cleaning and outlier detection.

2. User Interaction

o Interactive Mining: The mining process should be flexible and dynamic,


allowing users to refine searches and explore data interactively.

o Incorporation of Background Knowledge: Including domain-specific


knowledge, constraints, or rules can guide the mining process towards
more useful results.

o Data Mining Query Languages: High-level languages or interfaces allow


users to define and optimize ad hoc queries, making the process more
user-friendly.

o Visualization of Results: Presenting mining results in an understandable,


visually intuitive way is crucial for the usability of data mining systems.

3. Efficiency and Scalability

o Algorithm Efficiency: Data mining algorithms need to handle large


datasets quickly and efficiently, especially as data volumes grow.

o Parallel and Distributed Mining: Large datasets often require parallel


processing across distributed systems. Cloud and cluster computing are
common methods to scale data mining processes.

o Incremental Mining: Incremental algorithms that can update models as


new data arrives without reprocessing all existing data are a key area of
research.

4. Diversity of Data Types

o Complex Data Types: Data mining must handle a variety of data types,
from structured databases to unstructured data like text and images.
o Dynamic Data: Some data, like online data streams or real-time sensor
data, change constantly, presenting challenges for traditional mining
methods.

o Interconnected Data: Many datasets are linked (e.g., social networks,


web data), requiring mining techniques that can handle and exploit these
connections.

5. Data Mining and Society

o Social Impact: Data mining affects privacy, security, and social


dynamics. How can we use data mining for societal benefit while
preventing misuse?

o Privacy-Preserving Mining: Safeguarding individuals' privacy while


conducting data mining is crucial. Ongoing research focuses on privacy-
preserving data mining methods.

o Invisible Data Mining: Many systems perform data mining behind the
scenes without users' awareness. For instance, e-commerce sites track
user behavior to recommend products.

What is a Data Warehouse?


A data warehouse refers to a central repository where data from different sources is
stored and organized for analysis and decision-making. It allows businesses to store
historical data that supports strategic decisions. Data warehouses are essential in
today’s competitive world as organizations use them to gain insights into various
aspects of their operations and make informed decisions.

Key Features of a Data Warehouse

1. Subject-Oriented:

o Data warehouses are designed around major subjects of interest such as


customers, products, suppliers, and sales.

o Unlike operational databases, which focus on day-to-day transactions,


data warehouses are structured to provide a more analytical view that
helps decision-makers.

o Data is organized to reflect decision support processes, not operational


activities.
2. Integrated:

o A data warehouse integrates data from multiple heterogeneous sources


like relational databases, flat files, and transaction logs.

o It ensures consistency in naming conventions, data formats, and attribute


measures.

o The data is cleaned and standardized before being loaded into the
warehouse.

3. Time-Variant:

o Data in a warehouse is typically historical, covering several years (e.g., 5-


10 years) to help analyze trends over time.

o Each data set within a warehouse includes a time element, either


implicitly or explicitly, to track changes and trends over time.

4. Nonvolatile:

o Once data is stored in a warehouse, it is not changed. New data is only


appended to the system.

o A data warehouse does not require the mechanisms for transaction


processing, recovery, or concurrency control that are needed in
operational databases.

o The primary operations in a data warehouse are data loading and


querying.

Functions of a Data Warehouse

• A data warehouse consolidates large amounts of data for analysis and decision-
making purposes. It is not primarily designed for transactional operations but
rather for answering complex queries and providing insights into various
business activities.

• A data warehouse is often constructed by integrating data from multiple sources,


using processes like data cleaning, data integration, and data consolidation.

• Decision support technologies are used to query the data, generate reports,
and make strategic decisions based on the insights derived from the data.
Difference between a Database (OLTP) and a Data Warehouse (OLAP)
| Feature | OLTP (Operational Database System) | OLAP (Data Warehouse System) |
|---|---|---|
| Purpose | Handles day-to-day transactions and query processing | Supports data analysis and decision making |
| Users | Clerks, clients, IT professionals (customer-oriented) | Managers, executives, analysts (market-oriented) |
| Data Content | Current, detailed, real-time transactional data | Historical, aggregated, summarized data |
| Data Volume | Typically smaller; focused on current data | Very large; includes years of historical data |
| Database Design | Entity-Relationship (ER) model; application-oriented | Star or snowflake schema; subject-oriented |
| View of Data | Narrow view, specific to a department or enterprise | Broad view, integrates data from multiple sources |
| Query Characteristics | Simple, short, atomic queries and transactions | Complex queries, often involving aggregation |
| Operations | Frequent inserts, updates, deletes (write-heavy) | Primarily read-only operations (read-heavy) |
| Concurrency Control | Requires concurrency control and recovery mechanisms | Less need for concurrency control, due to read-only nature |
| Access Patterns | High transaction throughput, quick response time | Low latency not critical; focus on complex query performance |
| Performance Metrics | Measured by number of transactions per second | Measured by query response time and analytical capability |
| Data Granularity | Very detailed, fine-grained data | Data stored at multiple levels of granularity (from detailed to summarized) |
| Frequency of Access | Constant, very frequent | Periodic, depending on analysis/reporting needs |
| Data Integration | Minimal; single source systems | High; integrates data from multiple heterogeneous sources |
| Storage Medium | Typically stored on a single system or server | Distributed storage systems due to large volume |

Data Warehousing: A Multitiered Architecture


Data warehouses are designed using a three-tier architecture, which helps in
separating data storage, data processing, and data presentation. This architecture
ensures scalability, flexibility, and efficient data management. Here's a breakdown of
each tier:

1. Bottom Tier: Data Warehouse Server

• Role: This tier is responsible for storing the actual data.

• Technology Used: Usually a relational database management system


(RDBMS).

• Functions:

o Data Extraction: Pulls data from various operational and external


sources.

o Data Cleaning: Removes errors, inconsistencies, and duplicates.

o Data Transformation: Converts data into a common, unified format.

o Data Loading: Transfers the processed data into the data warehouse.

o Data Refreshing: Periodically updates the data warehouse to reflect


recent changes.

• Data Sources:

o Operational Databases: e.g., banking systems, sales systems.

o External Sources: e.g., market research reports, customer profiles from


third parties.

• Tools Used:

o Gateways (APIs) to connect and query the source systems:

▪ ODBC (Open Database Connectivity)


▪ OLEDB (Object Linking and Embedding Database)

▪ JDBC (Java Database Connectivity)

• Metadata Repository:

o Stores information about data (like source, format, transformations


applied).

o Acts as a directory for warehouse management and query optimization.

2. Middle Tier: OLAP Server

• Role: Acts as the processing layer, converting data into a form suitable for
analysis.

• Two Main OLAP Models:

1. ROLAP (Relational OLAP):

▪ Works on top of relational databases.

▪ Converts multidimensional operations into relational queries.

▪ Suitable for handling large volumes of data.

2. MOLAP (Multidimensional OLAP):

▪ Uses specialized multidimensional data structures (cubes).

▪ Faster for complex analytical queries, but may have storage


limitations.

• Functionality:

o Supports advanced analytical processing, including summarization,


aggregation, and complex computations.

o Optimized for read-heavy operations.

3. Top Tier: Front-End Tools

• Role: This is the user interface layer, where users interact with the system.

• Components:

o Query and Reporting Tools: For generating standard or custom reports.


o Data Analysis Tools: For ad-hoc querying, slicing, dicing, and drill-down
analysis.

o Data Mining Tools: For predictive modeling, clustering, trend analysis,


etc.

• Users:

o Business Analysts

o Executives and Managers

o Decision Makers

• Functionality:

o Provides a visual and interactive environment for exploring and


analyzing data.

o Supports dashboards, charts, graphs, and other visualizations.


Data Warehouse Models: Enterprise Warehouse,
Data Mart, and Virtual Warehouse
1. Enterprise Data Warehouse (EDW)

• Definition: A centralized data warehouse that stores information from across the
entire organization.

• Scope: Corporate-wide, cross-functional.

• Data:

o Includes both detailed and summarized data.

o Integrated from multiple operational systems or external sources.

• Implementation:

o Requires extensive business modeling.

o Typically built on mainframes, superservers, or parallel systems.

o May take months or years to design and deploy.

• Advantages:

o Single source of truth.

o High consistency and integration.

• Disadvantages:

o Time-consuming and expensive to build.

o Inflexible in dynamic environments.

2. Data Mart

• Definition: A smaller, focused version of a data warehouse that stores data for a
specific business line or department (e.g., marketing, sales).

• Scope: Departmental or subject-specific.

• Data:

o Typically summarized and related to specific business needs.

• Types:
o Independent Data Mart: Sourced directly from operational systems or
external providers.

o Dependent Data Mart: Sourced from an existing enterprise data


warehouse.

• Implementation:

o Uses low-cost servers (e.g., Linux, Windows).

o Takes weeks to build (faster ROI).

• Advantages:

o Quick to implement.

o Cost-effective.

o Flexible and adaptable to specific needs.

• Disadvantages:

o Risk of data silos.

o Complex integration later if not aligned with enterprise strategy.

3. Virtual Warehouse

• Definition: A set of virtual views over operational databases.

• Implementation:

o Does not store data physically.

o Queries are processed in real time using views.

• Advantages:

o Easy and fast to build.

o Cost-efficient (no extra storage).

• Disadvantages:

o Performance depends on operational systems.

o Requires high processing capacity for complex queries.

o Limited historical data analysis.


Top-Down vs. Bottom-Up Approaches to Data Warehouse Development

| Aspect | Top-Down Approach | Bottom-Up Approach |
|---|---|---|
| Start Point | Begins with an enterprise data warehouse | Starts with departmental data marts |
| Time & Cost | High cost and long duration | Low cost, faster implementation |
| Flexibility | Less flexible | More adaptable |
| Integration | Minimizes integration issues later | May lead to integration challenges |
| Suitability | Best for long-term strategic planning | Best for tactical and quick solutions |
| Risk | High initial investment with late returns | Quick wins but may cause silo issues |

Recommended Approach: Incremental & Evolutionary

A hybrid approach is often best — combining top-down planning with bottom-up


implementation. The steps are:

1. Define High-Level Corporate Data Model

o Done within 1–2 months.

o Ensures consistent view of data across the organization.

2. Implement Independent Data Marts

o Developed in parallel using the high-level model.

o Quick deployment, department-level use.

3. Construct Distributed Data Marts

o Integrate various marts via hub servers.

o Enables data sharing across business units.

4. Build Multitier Data Warehouse

o Centralized Enterprise Data Warehouse becomes the primary data


store.

o Distributes data to dependent data marts as needed.


Data Warehouse Modeling: Data Cube and OLAP
Overview

• Data Warehouses and OLAP (Online Analytical Processing) tools are built on
the multidimensional data model.

• This model visualizes data as a data cube, which allows for interactive analysis
of multidimensional data.

• The modeling supports advanced operations like roll-up, drill-down, and


slicing/dicing to enable deep business insights.

Data Cube: A Multidimensional Data Model


What is a Data Cube?

• A data cube allows data to be modeled in n-dimensions (not just 3D).

• It is defined by:

o Dimensions: The perspectives for analysis (e.g., time, item, location).

o Facts/Measures: Quantitative data (e.g., dollars sold, units sold).


Key Concepts:

• Dimensions:

o Examples: time, item, branch, location.

o Each has a dimension table (e.g., for item: item name, brand, type).

• Fact Table:

o Contains numeric measures like dollars sold, units sold, etc.

o Links to each dimension via foreign keys.

Representation:

• 2-D Cube: Like a spreadsheet/table (e.g., time × item, for location = Vancouver).

• 3-D Cube: time × item × location.

• 4-D Cube: time × item × location × supplier — hard to visualize but conceptually
a series of 3D cubes.

Cuboids and Lattice

What is a Cuboid?

• A cuboid is a cube at a certain level of summarization (group-by).

• Base Cuboid: The lowest level (e.g., time, item, location, supplier).

• Apex Cuboid (0-D): The highest level — summarized over all dimensions.

Data Cube Lattice

• Given n dimensions, 2^n possible cuboids exist.

• Forms a lattice structure, representing all possible levels of summarization.

Example from the notes:

For dimensions: time, item, location, supplier, the lattice includes:

• 0-D Cuboid: total sales (summarized across all dimensions)

• 1-D Cuboids: {time}, {item}, {location}, {supplier}

• 2-D Cuboids: {time, item}, {item, supplier}, etc.

• 3-D Cuboids: {time, item, location}, etc.

• 4-D Cuboid: {time, item, location, supplier} (base cuboid)
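The 2^n count of cuboids can be checked by enumerating every group-by subset of the dimensions; the short sketch below does this for the four dimensions of the example.

```python
# Enumerate all cuboids (group-by subsets) of n dimensions: 2**n in total.
from itertools import combinations

dimensions = ["time", "item", "location", "supplier"]

cuboids = [c for k in range(len(dimensions) + 1)
             for c in combinations(dimensions, k)]

print(len(cuboids))   # 2**4 = 16
print(cuboids[0])     # ()  -> apex cuboid (everything summarized away)
print(cuboids[-1])    # ('time', 'item', 'location', 'supplier') -> base cuboid
```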


Multidimensional Schema Models
1. Star Schema

• Structure:

o Central fact table (large, non-redundant).

o Connected dimension tables (flat, possibly redundant).

• Pros:

o Simple, fast query performance.

o Easy to understand.

• Cons:

o Some redundancy in dimension tables.

• Use case: Most common in data marts.

2. Snowflake Schema

• Structure:

o Like a star schema but dimension tables are normalized into sub-tables.

• Pros:

o Reduces redundancy.

o Easier maintenance.

• Cons:

o More complex queries due to joins.


o Slight performance trade-off.

• Use case: Less common; used when storage efficiency is more critical.

3. Fact Constellation (Galaxy Schema)

• Structure:

o Multiple fact tables sharing dimension tables.

• Pros:

o Models multiple interrelated subjects.

o Captures enterprise-wide data.

• Cons:

o Complex structure.

• Use case: Suitable for enterprise data warehouses.


Concept Hierarchies for Dimensions
What is a Concept Hierarchy?

• Maps low-level values (e.g., city) to higher-level concepts (e.g., country).

• Helps summarize or roll-up data in OLAP operations.

Types:

• Schema Hierarchy: Total/partial order (e.g., street < city < province < country).

• Lattice: Partial order where attributes don’t follow a single path (e.g., week <
year; day < month).

• Set-grouping Hierarchy: Value ranges grouped (e.g., price ranges: ($0–$200],


($200–$400], ...).

• Can be manually defined or automatically generated.

Measures in a Data Cube

Measures = Numeric values aggregated over dimension values (e.g., total sales).

| Type | Definition | Examples | Efficiency |
|---|---|---|---|
| Distributive | Can be computed from subaggregates and combined. | sum(), count(), min(), max() | Very efficient |
| Algebraic | Computed using a fixed number of distributive aggregates. | avg() = sum()/count(), stddev() | Efficient |
| Holistic | Requires a full data scan; cannot be broken into subaggregates. | median(), mode(), rank() | Inefficient |

Most OLAP tools focus on distributive and algebraic measures for performance.
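The difference shows up when data is partitioned across nodes: per-partition results of a distributive measure combine into the exact global answer, while a holistic measure such as the median does not. The partitions below are invented for illustration.

```python
# Distributive vs. holistic measures over a partitioned dataset (invented values).
import statistics

partitions = [[4, 8, 15], [16, 23], [42, 7, 1]]
all_values = [v for p in partitions for v in p]

# Distributive: combining per-partition sums gives the exact global sum.
print(sum(sum(p) for p in partitions), "==", sum(all_values))            # 116 == 116

# Holistic: the median of per-partition medians is NOT the global median,
# so the measure needs access to the full data.
per_part = [statistics.median(p) for p in partitions]                    # [8, 19.5, 7]
print(statistics.median(per_part), "!=", statistics.median(all_values))  # 8 != 11.5
```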
OLAP Operations
1. Roll-Up

• Definition: Aggregates data by climbing up a concept hierarchy or by reducing


dimensions.

• Example: Aggregating sales data from city to country (Toronto → Canada).

• Also called: Drill-Up (by some vendors).

2. Drill-Down

• Definition: The reverse of roll-up; navigates from summary data to more detailed
data.

• Example: Moving from quarterly sales data to monthly sales data.

• Also includes: Adding a new dimension (e.g., customer group) for more detail.

3. Slice

• Definition: Selects a single dimension value, resulting in a subcube.

• Example: Selecting data where time = Q1 only.

4. Dice

• Definition: Selects a range of values on two or more dimensions, resulting in a


subcube.

• Example: Data for location = Toronto or Vancouver, time = Q1 or Q2, and item
= home entertainment or computer.

5. Pivot (Rotate)

• Definition: Rotates the cube to view data from different perspectives.

• Example: Swapping the axes item and location for alternate visual layout.
6. Drill-Across

• Definition: Executes queries across multiple fact tables.

7. Drill-Through

• Definition: Accesses the bottom-level data in the data cube using SQL, typically
reaching into backend relational tables.

8. Other Advanced Operations

• Examples:

o Top-N/Bottom-N ranking.

o Moving averages, growth rates, depreciation.

o Currency conversion, internal return rates.

o Forecasting, trend/statistical analysis, variance calculations.
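The core operations can be imitated on a small pandas DataFrame, with groupby and pivot_table standing in for an OLAP engine; the sales rows below are invented for illustration.

```python
# Imitating roll-up, drill-down, slice, dice, and pivot with pandas.
import pandas as pd

sales = pd.DataFrame({
    "city":    ["Toronto", "Toronto", "Vancouver", "Vancouver"],
    "country": ["Canada",  "Canada",  "Canada",    "Canada"],
    "quarter": ["Q1",      "Q2",      "Q1",        "Q2"],
    "item":    ["computer", "phone",  "computer",  "phone"],
    "dollars_sold": [1000, 800, 1200, 600],
})

# Roll-up: climb the location hierarchy from city to country.
print(sales.groupby(["country", "quarter"])["dollars_sold"].sum())

# Drill-down is the reverse: group by the finer city level again.
print(sales.groupby(["city", "quarter"])["dollars_sold"].sum())

# Slice: fix one dimension value (time = Q1).
print(sales[sales["quarter"] == "Q1"])

# Dice: restrict two or more dimensions to value sets.
print(sales[sales["city"].isin(["Toronto"]) & sales["item"].isin(["computer", "phone"])])

# Pivot: view item vs. quarter instead of city vs. quarter.
print(pd.pivot_table(sales, values="dollars_sold", index="item",
                     columns="quarter", aggfunc="sum"))
```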

Role of Concept Hierarchies in OLAP

• Enable aggregation and drilling at various levels of detail.

• Facilitate multilevel data exploration across dimensions.


Concept Hierarchies
1. Definition

• Concept hierarchy is a sequence of mappings from low-level (specific)


concepts to high-level (general) concepts.

• Purpose: Allows multilevel data abstraction.

2. Types

• Schema Hierarchies (based on database attributes):


o Example for location: Street < City < Province/State < Country

o Example for time: Day < Month < Quarter < Year

• Lattice Structure:

o Supports partial orders, e.g., Day < Month < Quarter, and Week < Year.

• Set-Grouping Hierarchies:

o Created by grouping values into ranges or categories.

o Example for price: $0–$100, $100–$200, etc.

o User-defined groups: cheap, moderate, expensive

• Multiple Hierarchies:

o A single attribute can have multiple concept hierarchies depending on the


analysis (e.g., price can be by range or by category).

3. Sources of Concept Hierarchies

• Manual: Provided by users, domain experts.

• Automatic: Generated using statistical analysis (e.g., clustering).
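A set-grouping hierarchy such as the price ranges above can be produced mechanically; the sketch below uses pandas.cut, with assumed bin edges and category labels.

```python
# Building a set-grouping hierarchy for price: raw values -> ranges -> concepts.
# The bin edges and labels are assumptions chosen for illustration.
import pandas as pd

prices = pd.Series([35, 120, 180, 260, 410, 750])

# Level 1: group raw prices into ranges such as (0, 100], (100, 200], ...
ranges = pd.cut(prices, bins=[0, 100, 200, 400, 800])

# Level 2: map the same prices into coarser user-defined concepts.
concepts = pd.cut(prices, bins=[0, 200, 400, 800],
                  labels=["cheap", "moderate", "expensive"])

print(pd.DataFrame({"price": prices, "range": ranges, "concept": concepts}))
```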

Measures: Categorization & Computation

Measure: A numerical value computed for each multidimensional point (e.g.,


sales).

Categories of Measures:

| Type | Definition | Examples | Notes |
|---|---|---|---|
| Distributive | Can be computed in parts and then aggregated. | sum(), count(), min(), max() | Easy and efficient to compute. |
| Algebraic | Computed from a fixed number (M) of distributive measures. | avg() = sum()/count(), stddev() | Depends on multiple distributive functions. |
| Holistic | Cannot be expressed with a bounded number of distributive results. | median(), mode(), rank() | Complex, may require approximation. |
OLAP Engine Capabilities

• Enables complex analytical computations.

• Supports:

o Aggregations, hierarchies, ratios

o Forecasting, trend and statistical analysis

• Provides a user-friendly, interactive environment for querying multidimensional data.

Detailed Definition of Mining Frequent Patterns

Frequent Pattern Mining is a fundamental task in data mining that involves discovering
patterns (like itemsets, sequences, or structures) that occur frequently in a
dataset. These patterns reveal relationships and associations between data
items that can be useful in decision-making, prediction, recommendation, and
classification.

Patterns
Key Concepts in Frequent Pattern Mining

1. Frequent Pattern

A frequent pattern is a set of items, subsequences, or structures that appear together


frequently in a dataset.

Examples:

• Frequent Itemset: {milk, bread} appears together in many transactions.

• Frequent Sequential Pattern: <PC → Digital Camera → Memory Card> appears in


many customer purchase histories.

• Frequent Structured Pattern: A frequently recurring subgraph in a chemical


compound dataset.

2. Itemset

A collection of one or more items. For instance, in a supermarket:

• {milk}, {milk, bread}, {bread, butter, eggs} are itemsets.


A k-itemset contains k items.

3. Support

• Support of an itemset is the proportion (or count) of transactions that contain


the itemset.

• It measures how frequently an itemset appears in the dataset.

Support(A) = (Number of transactions containing A) / (Total number of transactions)

4. Confidence

• Confidence of a rule A → B is the probability that transactions containing A also


contain B.

Confidence(A → B) = Support(A ∪ B) / Support(A)

It shows how reliable the rule is.
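A minimal sketch of these two definitions over a toy transaction list (the transactions are invented):

```python
# Computing support and confidence directly from their definitions.
# The transactions are invented for illustration.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Support(A ∪ B) / Support(A) for the rule A -> B."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

print(support({"milk", "bread"}))        # 2/4 = 0.5
print(confidence({"milk"}, {"bread"}))   # 0.5 / 0.75 ≈ 0.67
```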

5. Association Rules

An association rule is an implication of the form:

A → B [support = s%, confidence = c%]

It means that if A occurs, B is likely to occur with support s and confidence c.

6. Closed and Maximal Frequent Itemsets

• Closed Frequent Itemset: A frequent itemset that has no superset with the
same support.

• Maximal Frequent Itemset: A frequent itemset that has no frequent supersets.


Frequent Itemsets, Closed Itemsets, and Association Rules

1. Basic Definitions

➤ Itemset:

• A group of items.

• A k-itemset contains k items (e.g., {bread, milk} is a 2-itemset).

➤ Transaction (T):

• A set of items bought together.

• Identified by a unique TID.

2. Support and Confidence

➤ Support:

• Fraction of transactions that contain an itemset.

Support(A ∪ B) = (transactions containing both A and B) / (total transactions)

➤ Confidence:

• Likelihood of item B occurring given item A.

Confidence(A → B) = Support(A ∪ B) / Support(A)

3. Association Rules

• Form: A → B

• Indicates a strong relationship: "If A occurs, B is likely to occur."

Strong Rules:

• Satisfy both:

o Minimum Support (minsup)

o Minimum Confidence (minconf)


4. Frequent Itemsets

• An itemset is frequent if its support ≥ minsup.

• Support count = number of transactions containing the itemset.

5. Closed Frequent Itemsets

• An itemset is closed if no proper superset has the same support count.

• Captures complete support info.

• Used to eliminate redundancy.

6. Maximal Frequent Itemsets

• An itemset is maximal frequent if it is frequent and none of its supersets are


frequent.

• Represents the outer boundary of frequent itemsets.

• More compact, but may lose support details of subsets.

7. Why Use Closed or Maximal?

• Mining frequent itemsets may result in an exponential number of patterns.

• Closed and maximal reduce computation and storage.

8. Example

Dataset:

• T1: {a1, a2, ..., a100}

• T2: {a1, a2, ..., a50}

• minsup = 1

➤ Frequent itemsets: All subsets of T1 and T2

Total = 2^100 - 1 → Too large!

➤ Closed frequent itemsets:


• {a1, ..., a50} → support: 2

• {a1, ..., a100} → support: 1

➤ Maximal frequent itemset:

• {a1, ..., a100} only (it has no frequent superset)

9. Association Rule Mining Steps

1. Find all frequent itemsets (support ≥ minsup).

2. Generate strong association rules from those itemsets (confidence ≥ minconf).


Apriori Algorithm – Overview


• Purpose: To mine frequent itemsets for Boolean association rules.

• Proposed by: R. Agrawal and R. Srikant (1994).

• Name Origin: Uses prior knowledge of itemset properties.

Working Principle

• Level-wise iterative approach:

o Finds frequent 1-itemsets (L1) from the database.

o Uses Lk-1 to generate Lk (frequent k-itemsets).

o Iterates until no more frequent itemsets can be found.

Apriori Property (Antimonotonicity)

• Definition: All non-empty subsets of a frequent itemset must also be frequent.

• Implication:

o If itemset I is infrequent, then any superset I ∪ A is also infrequent.

o Helps in pruning the candidate space (reducing computations).

Two-Step Process (Join & Prune)


1. Join Step:

o Generate candidate itemsets Ck by self-joining Lk-1.

o Join l1 and l2 in Lk-1 if their first k-2 items are the same.

o Ensures no duplicates using lexicographic order.

2. Prune Step:

o Remove candidate c ∈ Ck if any of its (k-1)-subsets is not in Lk-1.

Example (Using AllElectronics DB)

• Database D: 9 transactions (T100–T900).

• min_sup = 2 (support count).

• Iterations:

o C1 → L1: All 1-itemsets satisfying min_sup.

o C2 → L2: 2-itemsets from L1 × L1; all subsets are frequent → no pruning.

o C3 → L3: Prune itemsets with infrequent subsets using the Apriori


property.

o C4: Generated but pruned entirely due to infrequent subset →


termination.
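The join-and-prune loop can be sketched compactly in Python. This is a simplified version rather than the textbook pseudocode; the nine transactions are an assumption chosen to be consistent with the example above, and min_sup = 2 as stated.

```python
# Simplified Apriori: level-wise generation of frequent itemsets with the
# join and prune steps. Transactions and min_sup are assumptions.
from itertools import combinations

transactions = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
                {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
                {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]
min_sup = 2  # absolute support count

def support_count(itemset):
    return sum(itemset <= t for t in transactions)

# L1: frequent 1-itemsets.
items = sorted({i for t in transactions for i in t})
Lk = [frozenset([i]) for i in items if support_count(frozenset([i])) >= min_sup]

frequent = list(Lk)
k = 2
while Lk:
    # Join step: union pairs of (k-1)-itemsets whose union has exactly k items.
    candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
    # Prune step: drop candidates with an infrequent (k-1)-subset (Apriori property).
    candidates = {c for c in candidates
                  if all(frozenset(s) in set(Lk) for s in combinations(c, k - 1))}
    # Count support and keep the frequent k-itemsets.
    Lk = [c for c in candidates if support_count(c) >= min_sup]
    frequent.extend(Lk)
    k += 1

for itemset in frequent:
    print(sorted(itemset), support_count(itemset))
```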

Generating Association Rules


Definitions

• Frequent Itemset: An itemset whose support ≥ minimum support threshold.

• Association Rule: An implication of the form A → B, where A and B are itemsets.

• Support Count: Number of transactions containing a given itemset.

• Confidence: Measures how often items in B appear in transactions that contain A:

Confidence(A → B) = Support(A ∪ B) / Support(A)
Steps to Generate Association Rules

1. Find all frequent itemsets using algorithms like Apriori or FP-Growth.

2. For each frequent itemset l:

o Generate all non-empty subsets s of l.

o For each s, form the rule:

s → (l − s)

o Compute the confidence of each rule.

3. Filter strong rules:

o Only keep rules with confidence ≥ min_conf.

o All rules automatically satisfy min_support because they are derived


from frequent itemsets.

Example

Let X = {I1, I2, I5} be a frequent itemset.

Non-empty subsets of X:

• {I1}, {I2}, {I5}, {I1, I2}, {I1, I5}, {I2, I5}

Possible rules and confidences (assuming support counts):

• {I1, I2} → {I5} → 50%

• {I1, I5} → {I2} → 100%

• {I2, I5} → {I1} → 100%

• {I1} → {I2, I5} → 33%

• {I2} → {I1, I5} → 29%

• {I5} → {I1, I2} → 100%

With min_conf = 70%, strong rules:

• {I1, I5} → {I2}

• {I2, I5} → {I1}

• {I5} → {I1, I2}
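A sketch of the rule-generation step for a single frequent itemset; the transaction list is an assumption consistent with the support counts quoted above, and min_conf = 0.7 matches the example.

```python
# Generating rules s -> (l - s) from a frequent itemset l and keeping those
# with confidence >= min_conf. Transactions are assumed (as in the Apriori sketch).
from itertools import combinations

transactions = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
                {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
                {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]

def support_count(itemset):
    return sum(set(itemset) <= t for t in transactions)

def rules_from(l, min_conf):
    l = frozenset(l)
    for r in range(1, len(l)):                  # every non-empty proper subset s of l
        for s in combinations(l, r):
            s = frozenset(s)
            conf = support_count(l) / support_count(s)
            if conf >= min_conf:
                yield sorted(s), sorted(l - s), conf

for antecedent, consequent, conf in rules_from({"I1", "I2", "I5"}, min_conf=0.7):
    print(antecedent, "->", consequent, f"confidence={conf:.0%}")
```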


FP-Growth Algorithm

Motivation

Apriori algorithm, though effective, suffers from:

• Huge candidate generation (e.g., 10⁴ 1-itemsets → 10⁷ 2-itemsets).

• Multiple full database scans and expensive pattern matching.

FP-Growth Solution

• Avoids candidate generation by using a divide-and-conquer strategy.

• Builds a compressed data structure called the FP-tree (Frequent Pattern


Tree).

• Recursively mines conditional FP-trees for frequent patterns.

How FP-Growth Works

1. First Database Scan

• Count support of all items → generate frequent 1-itemsets.

• Sort items in descending order of support → List L.

2. Build FP-Tree

• Start with a null root.

• For each transaction:

o Sort items according to L.

o Insert path into tree, sharing common prefixes.

o Increment node counts for existing prefixes.

• Maintain node-links for quick access via a header table.
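The two scans described above can be sketched as a simplified FP-tree builder (construction only, no mining); the transaction list is an assumption consistent with the list L shown in the example summary below.

```python
# Simplified FP-tree construction: first scan counts items, second scan inserts
# each transaction (ordered by descending support) sharing common prefixes.
# Transactions are assumed, consistent with L = {I2:7, I1:6, I3:6, I4:2, I5:2}.
from collections import Counter, defaultdict

transactions = [["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
                ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
                ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"]]
min_sup = 2

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 1, {}

# Scan 1: support counts; keep frequent items ordered by descending support (list L).
counts = Counter(i for t in transactions for i in t)
L = [i for i, c in counts.most_common() if c >= min_sup]

root = Node(None, None)
header = defaultdict(list)          # header table: item -> node-links

# Scan 2: insert each transaction, items sorted by L, reusing shared prefixes.
for t in transactions:
    node = root
    for item in sorted((i for i in t if i in L), key=L.index):
        if item in node.children:
            node.children[item].count += 1      # shared prefix: increment count
        else:
            node.children[item] = Node(item, node)
            header[item].append(node.children[item])
        node = node.children[item]

print(L)   # items by descending support, e.g. I2 first (ties in arbitrary order)
print({i: sum(n.count for n in header[i]) for i in L})  # supports recovered via node-links
```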

3. Mine FP-Tree

For each item (starting from the least frequent in L):

• Construct Conditional Pattern Base (CPB):

o Paths in FP-tree ending with the item.


• Build Conditional FP-Tree from CPB.

• Recursively mine the conditional FP-tree.

Example Summary

Frequent Items (sorted by support):

L = {I2:7, I1:6, I3:6, I4:2, I5:2}

Mining I5:

• CPB: {I2, I1:1}, {I2, I1, I3:1}

• Conditional FP-tree: I2:2, I1:2

• Patterns: {I2, I5}, {I1, I5}, {I2, I1, I5}

Mining I4:

• CPB: {I2, I1:1}, {I2:1}

• Conditional FP-tree: I2:2

• Pattern: {I2, I4}

Mining I3:

• CPB: {I2, I1:2}, {I2:2}, {I1:2}

• Conditional FP-tree: Two branches

• Patterns: {I2, I3}, {I1, I3}, {I2, I1, I3}

Mining I1:

• CPB: {I2:4}

• Conditional FP-tree: I2:4

• Pattern: {I2, I1}

FP-Tree Benefits

• Compact representation of the database.

• Reduces:

o Database scans

o Candidate generation
o Search space

Recursive Mining

• Focuses on smaller projected databases.

• Combines prefix path with suffix pattern during recursion.

Why Pattern Evaluation Matters


• Even strong rules (high support & confidence) can be misleading or
uninteresting.

• This problem is especially common with:

o Low support thresholds

o Long patterns

Strong Rules May Be Misleading

• Example: The rule "buys computer games → buys videos" has:

o Support: 40%

o Confidence: 66%

• Seems strong, but actually misleading, because:


o The base probability of buying videos is 75%

o So buying games lowers the chance of buying videos (→ negative


correlation)

• Takeaway: Confidence does not imply true correlation

Correlation-Based Evaluation

To improve pattern evaluation, use correlation analysis.

Lift:

• Formula:
lift(A → B) = P(A ∩ B) / (P(A) * P(B))

• Interpretation:

o Lift > 1: Positive correlation

o Lift < 1: Negative correlation

o Lift = 1: No correlation

• In the example, lift = 0.89 → negative correlation

Chi-Square (χ²) Measure:

• Based on a contingency table of observed vs expected values

• Formula:
χ² = Σ (observed - expected)² / expected

• High χ² → Strong evidence of dependence

• In the example, χ² = 555.6, confirming negative correlation

Conclusion: Which Measures Are Interesting?

• Support & Confidence: Good for filtering, but insufficient

• Lift & Chi-square (χ²): Better for identifying true interestingness

o Capture actual dependencies (positive or negative)


Pattern Evaluation Methods in Association Rule Mining

Overview

• Most association rule mining algorithms use the support–confidence


framework.

• Issue: Even with thresholds, many generated rules may still be uninteresting or
misleading.

• Especially problematic with:

o Low support thresholds

o Long patterns

• To tackle this, additional measures are used to assess the interestingness of


patterns more effectively.

Strong Rules Are Not Necessarily Interesting

Subjective vs Objective Interestingness:

• Subjective: Depends on user preferences, domain knowledge, etc.

• Objective: Based on statistical measures derived from the data.

Objective measures help filter out misleading rules before presenting them to users.

Example 6.7 – Misleading “Strong” Rule

Scenario:

• Data from AllElectronics about purchase behavior:


o Total transactions: 10,000

o Customers who bought:

▪ Computer games: 6000

▪ Videos: 7500

▪ Both: 4000

Discovered Rule:

buys(X, "computer games") → buys(X, "videos")


Support = 40%, Confidence = 66%

Analysis:

• Rule meets minimum support (30%) and confidence (60%) → Considered strong.

• BUT, the actual probability of buying videos is 75%.

o Hence, confidence (66%) < base probability (75%).

o Indicates negative correlation.

• Conclusion: Rule is misleading.

o Buying games actually reduces the likelihood of buying videos.

Key Insight:

• Confidence alone doesn't imply meaningful association.

• Need better metrics to reveal true correlations.

From Association Analysis to Correlation Analysis

Why Use Correlation Measures?

• Support & confidence cannot detect:

o Independence

o Negative correlation

• Correlation measures evaluate statistical dependency between items.

Lift Measure

Formula:
Lift(A → B) = P(A ∩ B) / (P(A) * P(B))

Interpretation:

• Lift > 1: Positive correlation (A implies B more often than by chance)

• Lift = 1: No correlation (independence)

• Lift < 1: Negative correlation (A implies B less than by chance)

Example 6.8 – Applying Lift

• From previous data:

o P(game) = 0.60

o P(video) = 0.75

o P(game ∩ video) = 0.40

Lift = 0.40 / (0.60 × 0.75) = 0.40 / 0.45 = 0.89

• Result: Since lift < 1 → Negative correlation

Lift detects negative correlation that confidence failed to reveal.
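The lift value quoted above follows directly from the three probabilities:

```python
# Reproducing the lift computation for the games/videos example.
p_game, p_video, p_both = 0.60, 0.75, 0.40

lift = p_both / (p_game * p_video)
print(round(lift, 2))   # 0.89 -> below 1, so the items are negatively correlated
```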

Chi-Square (χ²) Measure

Purpose:

• Tests independence between itemsets

• Based on observed vs. expected values in a contingency table

Formula:

χ² = Σ [(Observed − Expected)² / Expected]


Example 6.9 – Applying χ²

Contingency Table: Observed Values (Table 6.6)

|           | Game | ¬Game | Row Total |
|---|---|---|---|
| Video     | 4000 | 3500 | 7500 |
| ¬Video    | 2000 | 500  | 2500 |
| Col Total | 6000 | 4000 | 10000 |

Expected Values (Table 6.7)

|        | Game (Exp) | ¬Game (Exp) |
|---|---|---|
| Video  | 4500 | 3000 |
| ¬Video | 1500 | 1000 |

χ² Calculation:

χ² = (4000−4500)²/4500 + (3500−3000)²/3000 + (2000−1500)²/1500 + (500−1000)²/1000

   = 500²/4500 + 500²/3000 + 500²/1500 + 500²/1000

   = 55.56 + 83.33 + 166.67 + 250.0

   = 555.6

Interpretation:

• A large χ² value indicates a statistically significant deviation from independence

• In this case:

o Observed joint occurrence (4000) < Expected (4500)

o Confirms negative correlation


Like Lift, Chi-square also detects the negative correlation missed by the confidence metric.
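For completeness, a short sketch that reproduces the χ² value from the observed table, deriving the expected counts from the row and column totals (plain NumPy, no statistics library assumed):

```python
# Chi-square from the observed 2x2 contingency table of the example.
import numpy as np

observed = np.array([[4000, 3500],      # rows: video, ¬video
                     [2000,  500]])     # columns: game, ¬game

row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row * col / observed.sum()   # [[4500, 3000], [1500, 1000]]

chi2 = ((observed - expected) ** 2 / expected).sum()
print(round(chi2, 1))                   # 555.6
```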
