OLAP operations?
OLAP operations are techniques used in Online Analytical Processing (OLAP) to explore, analyze, and
manipulate multi-dimensional data for business intelligence, reporting, and decision-making. OLAP deals with
complex queries that allow users to slice, dice, and navigate through data to derive insights.
Here are the main OLAP operations:
1. Roll-up (Drill-up):
o What it does: Aggregates data, moving from a lower level of detail to a higher one.
o Example: Summarizing daily sales into monthly or yearly totals.
2. Drill-down:
o What it does: Breaks data down into finer granularity, providing more detailed views.
o Example: Drilling from yearly sales to monthly or daily sales data.
3. Slice:
o What it does: Extracts a single layer from the data cube by fixing a specific value for one
dimension.
o Example: Viewing sales data only for a particular year or region.
4. Dice:
o What it does: Extracts a subcube by selecting a range of values from multiple dimensions.
o Example: Viewing sales data for a specific product, region, and time range.
5. Pivot (Rotation):
o What it does: Reorients the data cube to present data from different perspectives by rotating
dimensions.
o Example: Changing the view from sales per region to sales per product by switching rows and
columns.
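The operations above can be sketched with a small pandas example. This is only a rough illustration on a made-up sales table (the columns region, product, year, month, and sales are invented for the example), not the output of any particular OLAP server.

```python
# A minimal pandas sketch of OLAP-style operations on a made-up sales table.
# Column names (region, product, year, month, sales) are illustrative only.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "product": ["Pen",  "Book", "Pen",  "Book", "Pen",  "Pen"],
    "year":    [2023,   2023,   2023,   2024,   2024,   2024],
    "month":   [1,      2,      1,      3,      1,      2],
    "sales":   [100,    150,    120,    200,    130,    110],
})

# Roll-up: aggregate monthly detail up to yearly totals per region.
rollup = sales.groupby(["region", "year"])["sales"].sum()

# Drill-down: move back to finer granularity (region, year, month).
drilldown = sales.groupby(["region", "year", "month"])["sales"].sum()

# Slice: fix a single value on one dimension (year == 2023).
slice_2023 = sales[sales["year"] == 2023]

# Dice: select values or ranges on several dimensions at once.
dice = sales[(sales["region"] == "East") &
             (sales["product"] == "Pen") &
             (sales["year"].isin([2023, 2024]))]

# Pivot: rotate the view, e.g. regions as rows and products as columns.
pivot = sales.pivot_table(index="region", columns="product",
                          values="sales", aggfunc="sum")

print(rollup, drilldown, slice_2023, dice, pivot, sep="\n\n")
```

Real OLAP servers apply these operations to pre-built data cubes, but the groupby/pivot logic above captures the same idea on flat data.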
--------------------------------------------------------------------------------------------------------------------------------------------------
SCHEMAS?
The basic concepts of dimensional modeling are: facts, dimensions and measures. A fact is a
collection of related data items, consisting of measures and context data. It typically represents business
items or business transactions. A dimension is a collection of data that describe one business dimension.
Dimensions determine the contextual background for the facts; they are the parameters over which we
want to perform OLAP. A measure is a numeric attribute of a fact, representing the performance or
behavior of the business relative to the dimensions. In the relational context, there are three basic
schemas used in dimensional modeling:
1. Star schema
2. Snowflake schema
3. Fact constellation schema
Star Schema
Each dimension in a star schema is represented with only one dimension table.
This dimension table contains the set of attributes.
For example, consider the sales data of a company with respect to four dimensions, namely
time, item, branch, and location.
There is a fact table at the center. It contains the keys to each of the four dimensions.
The fact table also contains the attributes, namely dollars sold and units sold.
Snowflake Schema
Some dimension tables in the Snowflake schema are normalized.
The normalization splits up the data into additional tables.
Unlike the star schema, the dimension tables in a snowflake schema are normalized. For example, the item
dimension table of the star schema is normalized and split into two dimension tables, namely the item and
supplier tables.
Now the item dimension table contains the attributes item_key, item_name, type, brand, and supplier_key.
The supplier key is linked to the supplier dimension table. The supplier dimension table contains the
attributes supplier_key and supplier_type.
Fact Constellation Schema
A fact constellation has multiple fact tables. It is also known as a galaxy schema.
For example, consider two fact tables, namely sales and shipping.
The sales fact table is the same as that in the star schema.
The shipping fact table has five dimensions, namely item_key, time_key, shipper_key,
from_location, and to_location.
The shipping fact table also contains two measures, namely dollars cost and units shipped.
It is also possible to share dimension tables between fact tables. For example, the time, item, and location
dimension tables are shared between the sales and shipping fact tables.
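To make the star layout concrete, the following sketch uses Python's built-in sqlite3 module to create the fact and dimension tables from the example above and run a typical star-join query. The exact column lists are assumptions made for illustration, not a fixed standard.

```python
# A minimal star-schema sketch using Python's built-in sqlite3 module.
# Table and column names follow the example above but are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, country TEXT);

-- Fact table at the center: keys to each dimension plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
""")

# A typical star-join query: total dollars sold per year and city.
query = """
SELECT t.year, l.city, SUM(f.dollars_sold) AS total_sales
FROM sales_fact f
JOIN time_dim t     ON f.time_key = t.time_key
JOIN location_dim l ON f.location_key = l.location_key
GROUP BY t.year, l.city;
"""
print(cur.execute(query).fetchall())  # empty until fact rows are loaded
conn.close()
```

In the snowflake variant, the item dimension above would itself be normalized, e.g. split into item_dim(item_key, item_name, type, brand, supplier_key) and supplier_dim(supplier_key, supplier_type).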
--------------------------------------------------------------------------------------------------------------------------------------------------
OLAP & OLTP DIFFERENCES?
Aspect | OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing)
Purpose | Analytical querying and reporting | Transactional operations and data entry
Data Operations | Complex queries for data analysis | Simple, read/write queries for daily operations
Query Complexity | Complex, involving aggregation, multi-table joins | Simple and often predefined
Data Volume | Handles large volumes of historical data | Handles smaller, real-time operational data
Data Type | Historical, aggregated, and summarized data | Current, transactional, and detailed data
Response Time | Generally slower due to complex queries | Fast response time, optimized for quick data operations
Normalization | Typically uses denormalized data (e.g., star schema) | Highly normalized data (to reduce redundancy)
Concurrency | Fewer users with fewer concurrent queries | High concurrency with many users performing transactions simultaneously
Usage | Business intelligence, reporting, data mining | Day-to-day business operations like order processing, banking transactions
Backup & Recovery | Not as frequent as OLTP, focus on historical data integrity | Frequent, as data is more critical for daily operations
Examples | Data warehouses, decision support systems | ERP systems, online booking systems
--------------------------------------------------------------------------------------------------------------------------------------------------
DATA WAREHOUSE ARCHITECTURE?
Tier-1:
The bottom tier is a warehouse database server that is almost always a relational database system. Back-
end tools and utilities are used to feed data into the bottom tier from operational databases or other
external sources (such as customer profile information provided by external consultants). These tools
and utilities perform data extraction, cleaning, and transformation (e.g., to merge similar data from
different sources into a unified format), as well as load and refresh functions to update the data
warehouse. The data are extracted using application program interfaces known as gateways. A gateway
is supported by the underlying DBMS and allows client programs to generate SQL code to be executed
at a server.
Examples of gateways include ODBC (Open Database Connectivity) and OLE DB (Object Linking and
Embedding, Database) by Microsoft, and JDBC (Java Database Connectivity).
This tier also contains a metadata repository, which stores information about the data warehouse and its
contents.
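As a hedged illustration of the gateway idea (a client program generating SQL that the warehouse server executes), the sketch below uses the pyodbc package; the DSN, credentials, and table/column names are hypothetical and would need a real, configured data source to run.

```python
# Hypothetical sketch: a client program sends SQL through an ODBC gateway.
# The DSN, credentials, and table/column names are made up for illustration.
import pyodbc

conn = pyodbc.connect("DSN=SalesWarehouse;UID=analyst;PWD=secret")  # hypothetical DSN
cursor = conn.cursor()

# The client generates SQL; the warehouse server executes it and returns rows.
cursor.execute("SELECT year, SUM(dollars_sold) FROM sales_fact GROUP BY year")
for year, total in cursor.fetchall():
    print(year, total)

conn.close()
```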
Tier-2:
The middle tier is an OLAP server that is typically implemented using either a relational OLAP
(ROLAP) model or a multidimensional OLAP (MOLAP) model.
A ROLAP model is an extended relational DBMS that maps operations on multidimensional data to standard
relational operations.
A MOLAP model is a special-purpose server that directly implements multidimensional data and operations.
Tier-3:
The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and/or
data mining tools (e.g., trend analysis, prediction, and so on).
Data warehouse Architecture and its seven components
Data warehouse architecture typically consists of several layers and components designed to manage, store,
and analyze large amounts of data from different sources. The seven key components of a data warehouse
architecture are:
1. Data Sources:
o These are external systems or databases (e.g., OLTP systems, flat files, external data feeds)
from which the data warehouse extracts data. Data sources can include operational databases,
CRM, ERP systems, spreadsheets, and third-party data sources.
2. Data Extraction, Transformation, and Loading (ETL) Tools:
o Extraction: Extracts data from various sources.
o Transformation: Cleanses, formats, and transforms data into a suitable structure for the
warehouse.
o Loading: Loads the transformed data into the data warehouse.
o ETL processes ensure that data is accurate, consistent, and structured for analysis (a minimal
ETL sketch follows this list).
3. Data Staging Area:
o A temporary storage area where data is held after extraction and before transformation and
loading. It provides a space to clean and consolidate data before it moves into the data
warehouse. This stage ensures that the data is processed properly without affecting the
performance of the data warehouse.
4. Data Storage (Data Warehouse Repository):
o This is the central repository where the cleansed, integrated, and consolidated data is stored. It
can be structured in different ways:
Normalized or denormalized databases.
Star schema or snowflake schema.
Data in the repository is typically organized in a multi-dimensional model (data cubes) to
facilitate analysis and reporting.
5. Metadata:
o Metadata is data about data, describing the structure, data types, relationships, and lineage of
data in the warehouse. It is crucial for understanding, managing, and using the data warehouse
effectively. Metadata is used by ETL tools, query tools, and reporting tools to process data.
6. OLAP (Online Analytical Processing) Engine:
o OLAP tools enable users to perform complex queries, reporting, and data analysis using multi-
dimensional data models. The OLAP engine allows for operations such as slicing, dicing, roll-up,
and drill-down to analyze data across various dimensions.
7. Data Access and Reporting Tools:
o These tools allow end-users to access, visualize, and analyze data stored in the data warehouse.
Common tools include:
Business intelligence (BI) tools for dashboards and reporting.
Query tools for ad-hoc querying and analysis.
Data mining tools for discovering patterns and insights.
These tools help translate the raw data into actionable insights for decision-making.
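A minimal ETL sketch for component 2 is shown below. It is only an illustration under assumed names: the sample order rows, the columns order_date, region, and amount, and the warehouse table monthly_sales_fact are all invented for the example.

```python
# A minimal, hypothetical ETL sketch with pandas and an in-memory SQLite warehouse.
# The sample rows, column names, and table name are made up for illustration.
import pandas as pd
import sqlite3

# Extract: in practice this would come from operational databases or files
# (e.g. pd.read_csv or a source query); a small in-memory sample stands in here.
raw = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-20", None, "2024-02-11"],
    "region":     ["East", "East", "West", "West"],
    "amount":     [120.0, None, 75.0, 200.0],
})

# Transform: clean missing values, standardize formats, aggregate to monthly level.
raw = raw.dropna(subset=["order_date"])      # drop rows without a date
raw["amount"] = raw["amount"].fillna(0.0)    # handle missing measures
raw["order_date"] = pd.to_datetime(raw["order_date"])
monthly = (raw
           .assign(month=lambda d: d["order_date"].dt.to_period("M").astype(str))
           .groupby(["month", "region"], as_index=False)["amount"].sum())

# Load: write the consolidated rows into the warehouse repository.
warehouse = sqlite3.connect(":memory:")      # stand-in for the warehouse database
monthly.to_sql("monthly_sales_fact", warehouse, index=False)
print(pd.read_sql("SELECT * FROM monthly_sales_fact", warehouse))
warehouse.close()
```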
--------------------------------------------------------------------------------------------------------------------------------------------------
Knowledge Discovery in Databases (KDD)?
Knowledge Discovery in Databases refers to the overall process of discovering useful, non-trivial information
or patterns from large datasets. It involves several steps, from data preparation to the extraction of patterns,
and ultimately transforming raw data into meaningful knowledge. KDD is closely related to data mining, but
while data mining focuses on applying algorithms to extract patterns, KDD encompasses the entire process of
discovery.
Main Steps of KDD:
1. Data Selection:
o Relevant data is selected from a larger dataset for analysis. This step involves identifying which
data sources or attributes will be used in the knowledge discovery process.
2. Data Preprocessing (Cleaning and Integration):
o This step involves cleaning the data to handle missing values, noise, and inconsistencies. Data
from different sources might also be integrated into a coherent dataset. Preprocessing ensures
the data is ready for analysis.
3. Data Transformation:
o The selected and cleaned data is transformed into a suitable format for mining. This could
involve normalization, aggregation, or constructing new attributes (feature engineering) to
highlight important features in the data.
4. Data Mining:
o At this stage, algorithms are applied to the transformed data to discover patterns. Various
techniques, such as classification, clustering, association rule mining, and anomaly detection,
can be employed based on the specific goal.
5. Pattern Evaluation:
o The discovered patterns are evaluated for their significance and usefulness. Not all patterns
found are interesting or meaningful, so this step involves selecting the patterns that have the
most value.
6. Knowledge Representation:
o The final step involves representing the discovered knowledge in a comprehensible way, often
through visualization, reports, or integration into decision-support systems. This step ensures
the extracted patterns can be applied to real-world problems.
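The steps above can be traced end to end in a short Python sketch. This is only a loose illustration, not a definitive KDD implementation: the toy customer data, the choice of k-means as the mining step, and the use of scikit-learn are assumptions made for the example.

```python
# A loose sketch of the KDD steps using pandas and scikit-learn.
# The toy data and the choice of k-means are assumptions for illustration.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# 1. Data selection: keep only the attributes relevant to the analysis.
data = pd.DataFrame({
    "age":    [23, 45, 31, 52, 28, 40, 60, 35],
    "income": [28_000, 72_000, 45_000, 90_000, 33_000, 65_000, 85_000, 50_000],
})

# 2. Preprocessing: handle missing values and inconsistencies (none in this toy set).
data = data.dropna()

# 3. Transformation: scale the attributes to a common range.
scaled = MinMaxScaler().fit_transform(data)

# 4. Data mining: apply a clustering algorithm to discover groups.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)

# 5. Pattern evaluation: score how well separated the discovered clusters are.
score = silhouette_score(scaled, model.labels_)

# 6. Knowledge representation: report the result in a readable form.
print(data.assign(cluster=model.labels_))
print(f"silhouette score: {score:.2f}")
```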
--------------------------------------------------------------------------------------------------------------------------------------------------
Data mining functionalities?
Data mining functionalities refer to the different types of tasks or operations that data mining techniques can
perform to extract patterns or insights from large datasets. The main data mining functionalities include:
1. Classification:
o Assigns data to predefined categories or classes based on attributes. This is often used in
predictive modeling where the goal is to predict the class label of new data based on past data.
o Example: Predicting whether an email is spam or not based on its content.
2. Clustering:
o Groups data into clusters of similar objects where the objects in one cluster are more similar to
each other than to those in other clusters. Unlike classification, clustering does not rely on
predefined labels.
o Example: Segmenting customers based on purchasing behavior.
3. Association Rule Mining:
o Identifies relationships or correlations between different items in a dataset. The most common
example is market basket analysis, which finds associations between products frequently
bought together.
o Example: If a customer buys bread, they are likely to also buy butter.
4. Regression:
o A predictive modeling technique used to predict a numeric value based on input data.
Regression analyzes the relationship between dependent and independent variables.
o Example: Predicting house prices based on factors like location, size, and number of rooms.
5. Anomaly Detection:
o Identifies data points that deviate significantly from the norm. These outliers can indicate fraud,
errors, or rare events.
o Example: Detecting fraudulent transactions in financial datasets.
6. Sequential Pattern Mining:
o Finds patterns or trends in data over time. It is particularly useful for discovering time-based
associations.
o Example: Identifying customer purchasing patterns that follow specific sequences, like buying a
laptop followed by a mouse and then a keyboard.
7. Summarization:
o Provides a concise description of the dataset, often through generating summaries or
aggregated views of the data.
o Example: Summarizing the key statistics of a large dataset (e.g., average age of customers).
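Two of these functionalities, classification and regression, can be illustrated with short scikit-learn snippets. The tiny datasets and feature meanings below are invented purely for the example.

```python
# Illustrative sketches of classification and regression with scikit-learn.
# The tiny datasets and feature meanings are invented for the example.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: predict a class label ("spam" vs "not spam") from attributes,
# here [number of links, number of ALL-CAPS words] per email.
X_mail = [[0, 1], [8, 12], [1, 0], [10, 9], [2, 1], [7, 15]]
y_mail = ["not spam", "spam", "not spam", "spam", "not spam", "spam"]
clf = DecisionTreeClassifier(random_state=0).fit(X_mail, y_mail)
print(clf.predict([[9, 11]]))      # expected: ['spam']

# Regression: predict a numeric value (house price) from [size in square meters, rooms].
X_house = [[50, 2], [80, 3], [120, 4], [150, 5]]
y_price = [100_000, 160_000, 240_000, 300_000]
reg = LinearRegression().fit(X_house, y_price)
print(reg.predict([[100, 4]]))     # a price estimate for a 100 m2, 4-room house
```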
--------------------------------------------------------------------------------------------------------------------------------------------------
Data preprocessing techniques?
Data preprocessing is a crucial step in data mining that involves cleaning, transforming, and preparing raw
data for analysis. It ensures that the data is accurate, consistent, and ready for mining. Key data preprocessing
techniques include:
1. Data Cleaning:
o Purpose: Handles missing, inconsistent, and noisy data.
o Techniques:
Filling in missing values (e.g., using mean, median, or most frequent value).
Removing or correcting errors in the data.
Smoothing noisy data using techniques like binning or clustering.
2. Data Integration:
o Purpose: Combines data from multiple sources into a unified dataset.
o Techniques:
Merging datasets from different databases, data warehouses, or files.
Resolving data inconsistencies and redundancies (e.g., schema matching).
3. Data Transformation:
o Purpose: Converts data into a suitable format for mining.
o Techniques:
Normalization: Scaling data to a specific range (e.g., 0 to 1) to improve algorithm
performance.
Aggregation: Summarizing data (e.g., monthly sales totals instead of daily).
Feature Selection/Creation: Reducing the number of features or creating new relevant
features for better analysis.
4. Data Reduction:
o Purpose: Reduces the volume of data while retaining important information.
o Techniques:
Dimensionality reduction: Using techniques like Principal Component Analysis (PCA) to
reduce the number of variables.
Data compression: Using coding schemes to reduce data size.
Sampling: Selecting a subset of data for analysis when working with large datasets.
5. Data Discretization:
o Purpose: Converts continuous data into discrete intervals or categories.
o Techniques:
Binning: Dividing data into bins or ranges.
Histogram analysis: Grouping data based on frequency distributions.
6. Data Encoding:
o Purpose: Converts categorical data into a numerical format.
o Techniques:
Label encoding: Assigning numerical values to categories.
One-hot encoding: Creating binary variables for each category.
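Several of these techniques can be combined in a short pandas/scikit-learn sketch: mean imputation (cleaning), min-max normalization (transformation), PCA (reduction), binning (discretization), and one-hot encoding. The column names and values are invented for the example.

```python
# A small sketch of common preprocessing steps with pandas and scikit-learn.
# Column names and values are invented for the example.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

df = pd.DataFrame({
    "age":    [25, None, 47, 35, 62],
    "income": [30_000, 52_000, None, 61_000, 80_000],
    "city":   ["Delhi", "Pune", "Delhi", "Mumbai", "Pune"],
})

# Data cleaning: fill missing numeric values with the column mean.
df["age"] = df["age"].fillna(df["age"].mean())
df["income"] = df["income"].fillna(df["income"].mean())

# Data transformation: min-max normalization of the numeric columns to [0, 1].
df[["age_norm", "income_norm"]] = MinMaxScaler().fit_transform(df[["age", "income"]])

# Data reduction: project the two normalized columns onto one principal component.
df["pc1"] = PCA(n_components=1).fit_transform(df[["age_norm", "income_norm"]]).ravel()

# Data discretization: binning age into labelled intervals.
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 50, 100],
                         labels=["young", "middle", "senior"])

# Data encoding: one-hot encoding of the categorical city column.
df = pd.get_dummies(df, columns=["city"])

print(df)
```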
--------------------------------------------------------------------------------------------------------------------------------------------------