1) Explain the features of a Data Warehouse
1. Subject-Oriented
   Focus on Subjects: Data warehouses are organized around major subjects
    like sales, customers, or products. This helps in analyzing data based on
    specific themes rather than just transactions.
   Subject-Specific Data: Data is categorized and stored by subjects, making
    it easier to generate reports and conduct analysis.
2. Integrated
   Data Integration: Data from different sources is consolidated, cleaned, and
    stored in a consistent format in the data warehouse. This ensures that data
    from various operational systems is uniform and comparable.
   Consistency: Different data sources might use different conventions,
    formats, and units. The data warehouse standardizes these, ensuring that
    all data is consistent.
3. Time-Variant
   Historical Data: Data warehouses store historical data, which means they
    can store snapshots of data over time. This is crucial for trend analysis,
    forecasting, and reporting.
   Time Stamps: Data in a warehouse is associated with time periods,
    enabling the analysis of trends over various periods.
4. Non-Volatile
   Stable Data: Once data is entered into the data warehouse, it is not
    changed or deleted. This ensures data consistency over time.
   Read-Only Access: Data warehouses are primarily read-only, meaning data
    is rarely altered or deleted after it is loaded.
5. Data Granularity
   Detailed and Summary Data: A data warehouse contains data at various
    levels of granularity. It stores both highly detailed data for deep analysis
    and summarized data for quick insights.
   Flexible Queries: Users can query data at different levels of detail,
    depending on their needs.
6. Optimized for Query Performance
   Efficient Query Processing: Data warehouses are optimized to handle
    complex queries quickly, enabling fast retrieval of large datasets.
   Indexing and Partitioning: Data is often indexed and partitioned to improve
    query performance, allowing faster access to specific data segments.
7. Support for Business Intelligence Tools
   Integration with BI Tools: Data warehouses are designed to work
    seamlessly with business intelligence (BI) tools like reporting software,
    OLAP (Online Analytical Processing) tools, and data mining applications.
   Decision Support: They provide the data infrastructure required for
    decision-making processes in an organization.
8. Scalability
   Handling Large Volumes of Data: Data warehouses are scalable, meaning
    they can grow in storage capacity and processing power as the amount of
    data increases.
   Distributed Architecture: Many data warehouses use a distributed
    architecture, allowing them to handle large volumes of data efficiently.
9. Data Mart Integration
   Subsets of Data Warehouses: Data marts are smaller, more focused
    versions of data warehouses that cater to specific business lines or
    departments. They can be integrated into a data warehouse to provide
    specialized data access.
   Departmental Focus: Data marts allow departments to access relevant
    data quickly without querying the entire data warehouse.
10. Metadata Management
   Data about Data: Metadata in a data warehouse describes the data's
    source, transformation processes, structure, and usage. It helps users
    understand the data and how to use it effectively.
   User-Friendly Access: Metadata management tools in a data warehouse
    make it easier for users to find and understand the data they need.
2) Top-down vs bottom-up approach for building a data warehouse
Top-Down Approach
   Initiated by: Enterprise-wide data warehouse design.
   Development Sequence: First, create a centralized data warehouse, then
    develop data marts.
   Data Flow: Data is extracted from operational systems, transformed, and
    loaded into the centralized data warehouse, then segmented into data
    marts.
   Complexity: More complex and time-consuming due to the need for
    comprehensive design upfront.
   Advantages: Ensures consistency and uniformity across the organization;
    ideal for large enterprises.
   Disadvantages: High initial costs, longer implementation time.
Bottom-Up Approach
   Initiated by: Individual data mart development.
   Development Sequence: First, build data marts for specific departments,
    then integrate them into a centralized data warehouse.
   Data Flow: Data is extracted and loaded into individual data marts, which
    are then combined into a data warehouse.
   Complexity: Simpler and faster to implement, with a focus on specific
    business needs first.
   Advantages: Quick implementation, lower initial costs; allows for gradual
    expansion.
   Disadvantages: Risk of data inconsistency; potential challenges in
    integrating data marts later into a cohesive data warehouse.
 Top-Down Design Approach                Bottom-Up Design Approach
 Breaks the vast problem into smaller    Solves the essential low-level
 subproblems.                            problems and integrates them into a
                                         higher one.
 Inherently architected; not a union     Inherently incremental; essential
 of several data marts.                  data marts can be scheduled first.
 Single, central storage of              Information is stored per
 information about the content.          department.
 Centralized rules and control.          Departmental rules and control.
 It includes redundant information.      Redundancy can be removed.
 It may see quick results if             Less risk of failure, favorable
 implemented with iterations.            return on investment, and proof of
                                         techniques.
3)Define metadata and explain the types of metadata
Metadata is "data about data." It provides information that describes other
data, helping to understand, manage, and use the data effectively. Metadata
serves as a blueprint that helps in organizing, finding, and managing data
within a system.
Roles of Metadata
   Data Discovery and Search
    o   Helps users find relevant data quickly through descriptions, tags, and
        classifications.
   Data Understanding
    o   Provides context and meaning to data, helping users interpret and
        understand the information.
   Data Management
    o   Facilitates data organization, storage, and retrieval by describing data
        structures, relationships, and formats.
   Data Quality and Consistency
    o   Ensures accuracy by documenting data sources, transformations, and
        validation rules.
   Data Integration
    o   Enables the combination of data from different sources by providing a
        common reference and mapping of data fields.
   Data Governance
    o   Supports compliance and security by detailing access rights, usage
        restrictions, and data lineage.
   Data Analysis and Reporting
    o   Assists in generating reports and analyses by providing templates,
        business rules, and examples.
   Data Operations Monitoring
    o   Helps track and optimize data processing workflows through
        operational metadata on load times, errors, and performance.
   Data Preservation
    o   Aids in long-term data storage and retrieval by documenting version
        history, formats, and preservation strategies.
Types of Metadata
1. Operational Metadata
   Definition: Describes the processes and operations involved in managing
    data within a system.
   Details: Includes information on data load times, data source refresh rates,
    error logs, and processing times.
   Usage: Helps in monitoring and optimizing data processing workflows,
    ensuring smooth and efficient data operations.
2. Extractional and Transformational Metadata
   Definition: Describes the processes of extracting and transforming data
    from various sources before loading it into a data warehouse.
   Extractional Metadata:
    o   Details: Information about the sources of data, extraction methods,
        data formats, and timestamps of data extraction.
    o   Usage: Helps track where data comes from and how it's extracted,
        aiding in troubleshooting and ensuring data accuracy.
   Transformational Metadata:
    o   Details: Describes the rules and algorithms used to cleanse, format,
        and transform data into the desired structure.
    o   Usage: Critical for understanding how raw data is converted into the
        standardized format used in the data warehouse, ensuring data
        consistency and quality.
3. End-User Metadata
   Definition: Metadata that provides context and understanding for end
    users to effectively interact with and analyze data.
   Details: Includes data definitions, business rules, report templates, and
    query examples tailored for user interpretation.
   Usage: Facilitates easier access to data for non-technical users, helping
    them understand what the data represents and how to use it effectively in
    reports or analyses.
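The three metadata types above can be illustrated with a toy catalog entry. This is a minimal sketch; the table name, fields, and values are all invented for illustration.

```python
# Toy metadata catalog: one entry describing a warehouse table. Each field
# corresponds to one metadata type from the text; all values are invented.
catalog = {
    "sales_fact": {
        "source": "orders database (extracted nightly)",        # operational
        "transformations": ["dates normalized to YYYY-MM-DD"],  # transformational
        "columns": {"amount": "total sale value in USD"},       # end-user
        "last_load": "2024-01-05T02:00:00",
    }
}

def describe(table):
    """End-user lookup: what does this table contain and where is it from?"""
    meta = catalog[table]
    return f"{table}: from {meta['source']}; columns: {', '.join(meta['columns'])}"
```

A BI tool or a data dictionary front end would answer user questions by querying a catalog like this rather than the data itself.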
4) Difference between OLAP and OLTP
 Criteria               OLAP                              OLTP
 Purpose                OLAP helps you analyze large      OLTP helps you manage and
                        volumes of data to support        process real-time
                        decision-making.                  transactions.
 Data source            OLAP uses historical and          OLTP uses real-time and
                        aggregated data from multiple     transactional data from a
                        sources.                          single source.
 Data structure         OLAP uses multidimensional        OLTP uses relational
                        (cube) or relational databases.   databases.
 Data model             OLAP uses star schema,            OLTP uses normalized or
                        snowflake schema, or other        denormalized models.
                        analytical models.
 Volume of data         OLAP has large storage            OLTP has comparatively
                        requirements. Think terabytes     smaller storage requirements.
                        (TB) and petabytes (PB).          Think gigabytes (GB).
 Response time          OLAP has longer response          OLTP has shorter response
                        times, typically in seconds or    times, typically in
                        minutes.                          milliseconds.
 Example applications   OLAP is good for analyzing        OLTP is good for processing
                        trends, predicting customer       payments, customer data
                        behavior, and identifying         management, and order
                        profitability.                    processing.
5) Data Warehouse vs Data Mart
 Data Warehouse                          Data Mart
 A data warehouse is a vast repository   A data mart is a subset of a data
 of information collected from various   warehouse, architected to meet the
 organizations or departments within     requirements of a specific user
 a corporation.                          group.
 It may hold multiple subject areas.     It holds only one subject area,
                                         for example Finance or Sales.
 It holds very detailed information.     It may hold more summarized data.
 It works to integrate all data          It concentrates on integrating data
 sources.                                from a given subject area or set of
                                         source systems.
 Fact constellation schemas are          Star and snowflake schemas are
 commonly used.                          commonly used.
 It is a centralized system.             It is a decentralized system.
 A data warehouse is data-oriented.      A data mart is project-oriented.
6) Operations and examples of an OLAP cube
OLAP stands for Online Analytical Processing. It is a software
technology that allows users to analyze information from multiple
database systems at the same time. It is based on the multidimensional
data model and allows the user to query multidimensional data
(e.g. Delhi -> 2018 -> Sales data). An OLAP database is divided into
one or more cubes; a cube with more than three dimensions is often
called a hypercube.
OLAP operations:
There are five basic analytical operations that can be performed on
an OLAP cube:
Drill down: In the drill-down operation, less detailed data is
converted into more detailed data. It can be done by:
   o   Moving down in the concept hierarchy
   o   Adding a new dimension
In the example cube, the drill-down operation is performed by moving
down in the concept hierarchy of the Time dimension (Quarter -> Month).
Roll up: It is the opposite of the drill-down operation. It performs
aggregation on the OLAP cube. It can be done by:
   o   Climbing up in the concept hierarchy
   o   Reducing the number of dimensions
In the example cube, the roll-up operation is performed by climbing up
in the concept hierarchy of the Location dimension (City -> Country).
Dice: It selects a sub-cube from the OLAP cube by choosing values on
two or more dimensions. In the example cube, a sub-cube is selected
with the following criteria:
   o   Location = “Delhi” or “Kolkata”
   o   Time = “Q1” or “Q2”
   o   Item = “Car” or “Bus”
Slice: It selects a single value along one dimension of the OLAP cube,
which results in a new sub-cube with one fewer dimension. In the
example cube, a slice is performed on the dimension Time = “Q1”.
Pivot: It is also known as the rotation operation, as it rotates the
current view to get a new view of the data. Performing a pivot on the
sub-cube obtained after the slice operation gives a new view of it.
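Slice, dice, and roll-up (by dimension reduction) can be sketched on a tiny cube in Python. The cube below stores (Location, Time, Item) -> Sales cells as a dict; all sales figures are invented, and drill-down and pivot are omitted since they need a concept hierarchy and a display layer, respectively.

```python
# Toy OLAP cube: (Location, Time, Item) -> Sales. Figures are invented.
cube = {
    ("Delhi",   "Q1", "Car"): 100, ("Delhi",   "Q1", "Bus"): 40,
    ("Delhi",   "Q2", "Car"): 120, ("Kolkata", "Q1", "Car"): 80,
    ("Kolkata", "Q2", "Bus"): 60,  ("Mumbai",  "Q1", "Car"): 90,
}

def slice_cube(cube, dim_index, value):
    """Slice: fix one dimension to a single value (e.g. Time = 'Q1')."""
    return {k: v for k, v in cube.items() if k[dim_index] == value}

def dice_cube(cube, criteria):
    """Dice: keep cells whose coordinates fall in the given value sets."""
    return {k: v for k, v in cube.items()
            if all(k[i] in allowed for i, allowed in criteria.items())}

def roll_up(cube, drop_index):
    """Roll-up by dimension reduction: sum Sales over one dimension."""
    out = {}
    for k, v in cube.items():
        reduced = k[:drop_index] + k[drop_index + 1:]
        out[reduced] = out.get(reduced, 0) + v
    return out

q1 = slice_cube(cube, 1, "Q1")                     # Time = "Q1"
sub = dice_cube(cube, {0: {"Delhi", "Kolkata"},    # the text's dice criteria
                       1: {"Q1", "Q2"},
                       2: {"Car", "Bus"}})
by_loc_item = roll_up(cube, 1)                     # aggregate out Time
```

Real OLAP servers do the same selections on indexed multidimensional storage instead of a Python dict, but the semantics of the operations are identical.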
7) ETL process
The ETL (Extract, Transform, Load) process is a fundamental part of data
warehousing. It involves extracting data from various source systems,
transforming it into a format suitable for analysis, and loading it into a data
warehouse. Here’s a breakdown of each stage:
1. Extract
   Purpose: The extraction phase involves pulling data from various source
    systems, which can include databases, ERP systems, CRM systems, flat
    files, cloud services, or other data repositories.
   Process:
    o   Identify Data Sources: Determine the source systems from which data
        will be extracted.
    o   Data Collection: Use connectors or APIs to collect data from these
        sources. This might involve querying databases, extracting files, or
        streaming data from real-time sources.
    o   Data Handling: During extraction, it's essential to handle large volumes
        of data efficiently and manage different data formats.
   Example: Extracting customer transaction data from an e-commerce
    platform’s database and sales data from a CRM system.
2. Transform
   Purpose: The transformation phase cleanses, standardizes, and organizes
    the extracted data to make it suitable for analysis in the data warehouse.
   Process:
    o   Data Cleansing: Remove duplicates, handle missing data, correct
        errors, and filter out irrelevant information.
    o   Data Standardization: Convert data into a consistent format (e.g., date
        formats, units of measure) to ensure uniformity across the dataset.
    o   Data Integration: Combine data from multiple sources, resolve data
        conflicts, and consolidate related data into a single unified view.
    o   Data Enrichment: Add derived or calculated data, such as aggregating
        sales data to calculate monthly totals or adding geographic data based
        on customer addresses.
    o   Business Rules Application: Apply business logic to transform data
        into a form that aligns with the organization's requirements.
   Example: Converting all date fields into a standard format (e.g.,
    YYYY-MM-DD), integrating customer data from different regions, and
    calculating total sales per region.
3. Load
   Purpose: The final stage involves loading the transformed data into the
    data warehouse, where it can be stored, indexed, and made available for
    querying and analysis.
   Process:
    o   Data Loading: Insert the transformed data into the data warehouse,
        which might involve bulk loading large datasets or incremental loading
        (e.g., updating only changed records).
     o   Data Indexing: Index the data for faster querying and retrieval.
     o   Data Validation: Perform checks to ensure that the data has been
         loaded correctly and that it meets the necessary quality standards.
     o   Data Partitioning: Optionally, partition the data in the warehouse to
         optimize performance.
   Example: Loading the cleansed and standardized customer transaction
    data into a sales fact table in the data warehouse, where it can be analyzed
    by business intelligence tools.
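The three stages can be sketched end to end in Python. This is a minimal illustration, not a production pipeline: the source rows, field names, and date formats are all invented, and the "warehouse" is just a list standing in for a fact table.

```python
import datetime

def extract():
    """Extract: pull raw rows from two hypothetical source systems."""
    crm  = [{"date": "03/01/2024", "region": "north", "amount": "100.0"}]
    shop = [{"date": "2024-01-05", "region": "North", "amount": "50.5"},
            {"date": "2024-01-05", "region": "North", "amount": "50.5"}]  # duplicate
    return crm + shop

def transform(rows):
    """Transform: standardize dates to YYYY-MM-DD, normalize region names,
    cast amounts to float, and drop exact duplicates."""
    seen, clean = set(), []
    for r in rows:
        try:                                      # source 1 uses DD/MM/YYYY
            d = datetime.datetime.strptime(r["date"], "%d/%m/%Y")
        except ValueError:                        # source 2 already uses ISO
            d = datetime.datetime.strptime(r["date"], "%Y-%m-%d")
        row = (d.strftime("%Y-%m-%d"), r["region"].title(), float(r["amount"]))
        if row not in seen:                       # data cleansing: dedupe
            seen.add(row)
            clean.append(row)
    return clean

def load(rows, warehouse):
    """Load: append the transformed rows into the fact table."""
    warehouse.extend(rows)

fact_sales = []
load(transform(extract()), fact_sales)
```

In practice each stage would also log operational metadata (row counts, load times, errors) so the pipeline can be monitored.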
8) Central tendency numerical
    (Find mean, median, mode, variance and midrange value)
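A worked example on a small invented sample; Python's standard statistics module computes everything except the midrange, which is simply the average of the smallest and largest values.

```python
import statistics

data = [10, 12, 12, 13, 15, 18, 20]   # invented sample

mean     = statistics.mean(data)       # sum of values / count
median   = statistics.median(data)     # middle value of the sorted data
mode     = statistics.mode(data)       # most frequent value
variance = statistics.pvariance(data)  # population variance
midrange = (min(data) + max(data)) / 2 # (smallest + largest) / 2
```

For this sample: mean = 100/7 ≈ 14.29, median = 13, mode = 12, midrange = 15; use statistics.variance instead of pvariance if the question asks for the sample variance (n − 1 denominator).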
9) Hierarchical clustering
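A minimal sketch of the agglomerative (bottom-up) variant with single linkage on invented 1-D points: each point starts as its own cluster, and the two closest clusters are repeatedly merged until k clusters remain.

```python
def single_link_dist(a, b):
    """Single linkage: distance between the closest pair of points."""
    return min(abs(x - y) for x in a for y in b)

def agglomerative(points, k):
    clusters = [[p] for p in points]   # start: every point is its own cluster
    while len(clusters) > k:
        # find the pair of clusters with the smallest linkage distance
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: single_link_dist(clusters[ij[0]],
                                                   clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]   # merge the pair
        del clusters[j]
    return clusters

clusters = agglomerative([1, 2, 8, 9, 25], 2)
```

Recording the distance at each merge yields the dendrogram; complete or average linkage is obtained by swapping min for max or a mean in single_link_dist.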
10)K-means clustering algorithm + numerical
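A minimal sketch of Lloyd's k-means algorithm for the numerical part, on invented 1-D points with invented initial centroids: each iteration assigns every point to its nearest centroid, then moves each centroid to the mean of its cluster, stopping when the centroids no longer change.

```python
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:    # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters

# Invented exam-style data: k = 2, initial centroids 2 and 16
centroids, clusters = kmeans([2, 4, 10, 12, 3, 20, 30, 11, 25], [2, 16])
```

Tracing by hand, the centroids move 2, 16 -> 3, 18 -> 4.75, 19.6 -> 7, 25 and then stabilize, giving the clusters {2, 3, 4, 10, 11, 12} and {20, 25, 30}.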