0% found this document useful (0 votes)

14 views15 pages

Data Warehouse 9 Oct

Data warehousing, conceptualized in the late 1980s, is a relational database management system designed to support decision-making by providing a centralized repository of integrated historical data. It features characteristics such as being subject-oriented, integrated, non-volatile, and time-variant, and includes components like ETL tools, metadata, and query tools. Various architectures exist for data warehouses, including single-tier, two-tier, and three-tier structures, with data marts serving as subsets tailored for specific business areas.

Uploaded by

soumyasree

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views15 pages

Data Warehouse 9 Oct

Uploaded by

soumyasree

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 15

DATA WAREHOUSE

The idea of data warehousing came to the late 1980's when IBM researchers Barry
Devlin and Paul Murphy established the "Business Data Warehouse."

In essence, the data warehousing idea was planned to support an architectural model for
the flow of information from the operational system to decisional support
environments. The concept attempt to address the various problems associated with
the flow, mainly the high costs associated with it.

Data Warehouse is a relational database management system (RDBMS) construct

to meet the requirement of transaction processing systems. It can be loosely
described as any centralized data repository which can be queried for business
benefits. It is a database that stores information oriented to satisfy decision-making
requests. So, Data Warehousing support architectures and tool for business executives
to systematically organize, understand and use their information to make strategic
decisions.

Data Warehouse environment contains an extraction, transportation, and loading

(ETL) solution, an online analytical processing (OLAP) engine, customer analysis
tools, and other applications that handle the process of gathering information and
delivering it to business users.

What is a Data Warehouse?

A Data Warehouse (DW) is a relational database that is designed for query and analysis
rather than transaction processing. It includes historical data derived from transaction
data from single and multiple sources.

A Data Warehouse provides integrated, enterprise-wide, historical data and focuses

on providing support for decision-makers for data modeling and analysis.

Data Warehouse is a subject-oriented, integrated, non-volatile and time-variant

store of information in support of management's decisions."

Characteristics of Data Warehouse

Subject-Oriented
A data warehouse target on the modeling and analysis of data for decision-makers.
Therefore, data warehouses typically provide a concise and straightforward view
around a particular subject, such as customer, product, or sales, instead of the global
organization's ongoing operations. This is done by excluding data that are not useful
concerning the subject and including all data needed by the users to understand the
subject.

Integrated
A data warehouse integrates various heterogeneous data sources like RDBMS, flat
files, and online transaction records. It requires performing data cleaning and integration
during data warehousing to ensure consistency in naming conventions, attributes types,
etc., among different data sources.
Non-Volatile
The data warehouse is a physically separate data storage, which is transformed
from the source operational RDBMS. Once the data is entered into the warehouse, it
should not be changed, ie, data within the warehouse should not be edited.

Time-Variant
Historical information is kept in a data warehouse. For example, one can retrieve files
from 3 months, 6 months, 12 months, or even previous data from a data warehouse.
These variations with a transactions system, where often only the most current file is
kept.

Database Data Warehouse

1. It is used for Online Transactional 1. It is used for Online Analytical

Processing (OLTP) but can be used for Processing (OLAP).
other objectives such as Data
Warehousing.
2. The tables and joins are complicated 2. The tables and joins are accessible
since they are normalized for RDBMS. This since they are de-normalized. This is
is done to reduce redundant files and to done to minimize the response time for
save storage space. analytical queries.

3. Data is dynamic 3. Data is largely static

4. Entity: Relational modeling procedures 4. Data: Modeling approach are used

are used for RDBMS database design. for the Data Warehouse design.

5. Optimized for write operations. 5. Optimized for read operations.

6. Performance is low for analysis queries. 6. High performance for analytical

queries.

7. The database is the place where the 7. Data Warehouse is the place where
data is taken as a base and managed to the application data is handled for
get available fast and efficient access. analysis and reporting objectives.

Architecture of a Data Warehouse

Three common architectures are:

o Data Warehouse Architecture: Basic

o Data Warehouse Architecture: With Staging Area
o Data Warehouse Architecture: With Staging Area and Data Marts

Data Warehouse Architecture: Basic

 An operational system is a method used in data warehousing to refer to
a system that is used to process the day-to-day transactions of an organization.

 Flat Files: A Flat file system is a system of files in which transactional data is
stored, and every file in the system must have a different name.

 Meta Data: A set of data that defines and gives information about other data.

 Lightly and highly summarized data : The area of the data warehouse saves all
the predefined lightly and highly summarized (aggregated) data generated by the
warehouse manager. The goals of the summarized information are to speed
up query performance. The summarized record is updated continuously as new
information is loaded into the warehouse.

 End-User access Tools: The principal purpose of a data warehouse is to provide

information to the business managers for strategic decision-making. These
customers interact with the warehouse using end-client access tools.
Data
Warehouse Architecture: With Staging Area
We must clean and process your operational information before put it into the
warehouse. We can do this programmatically, although data warehouses uses a staging
area (A place where data is processed before entering the warehouse).

A staging area simplifies data cleansing and consolidation for operational method
coming from multiple source systems, especially for enterprise data warehouses where
all relevant data of an enterprise is consolidated.

Data Warehouse Staging Area is a temporary location where a record from source
systems is copied.
Data Warehouse Architecture: With Staging Area and Data Marts
We may want to customize our warehouse's architecture for multiple groups within our
organization.

We can do this by adding data marts. A data mart is a segment of a data warehouses
that can provided information for reporting and analysis on a section, unit, department
or operation in the company, e.g., sales, payroll, production, etc.

The figure illustrates an example where purchasing, sales, and stocks are separated. In
this example, a financial analyst wants to analyze historical data for purchases and sales
or mine historical information to make predictions about customer behavior.
Types of Data Warehouse Architectures

Single-Tier Architecture
Single-Tier architecture is not periodically used in practice. Its purpose is to minimize the
amount of data stored to reach this goal; it removes data redundancies. Not effective for
organizations with large data needs and multiple streams.

Two-Tier Architecture
The requirement for separation plays an essential role in defining the two-tier
architecture for a data warehouse system. While it is more effective at storing and
sorting the data, it is not scalable and supports minimum number of end users.

Three-Tier Architecture
The three-tier architecture consists of the source layer (containing multiple source
system), the reconciled layer and the data warehouse layer (containing both data
warehouses and data marts). The reconciled layer sits between the source data and data
warehouse.

The main advantage of the reconciled layer is that it creates a standard reference data
model for a whole enterprise. At the same time, it separates the problems of source data
extraction and integration from those of data warehouse population.

This architecture is especially useful for the extensive, enterprise-wide systems. A

disadvantage of this structure is the extra file storage space used through the extra
redundant reconciled layer. It also makes the analytical tools a little further away from
being real-time.

Data Warehouses usually have a three-level (tier) architecture that includes:

1. Bottom Tier (Data Warehouse Server)

2. Middle Tier (OLAP Server)
3. Top Tier (Front end Tools).

1. A bottom-tier that consists of the Data Warehouse server, which is almost

always an RDBMS. It may include several specialized data marts and a metadata
repository.

Data from operational databases and external sources (such as user profile data
provided by external consultants) are extracted using application program
interfaces called a gateway. A gateway is provided by the underlying DBMS and
allows customer programs to generate SQL code to be executed at a server.

Examples of gateways contain ODBC (Open Database Connection) and OLE-

DB (Open-Linking and Embedding for Databases), by Microsoft, and JDBC (Java
Database Connection).

2. A middle-tier which consists of an OLAP server for fast querying of the data
warehouse.

The OLAP server is implemented using either

 A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS that

maps functions on multidimensional data to standard relational operations.

 A Multidimensional OLAP (MOLAP) model, i.e., a particular purpose server

that directly implements multidimensional information and operations.

3. A top-tier that contains front-end tools for displaying results provided by OLAP,
as well as additional tools for data mining of the OLAP-generated data.
The metadata repository stores information that defines DW objects. It includes the
following parameters and information for the middle and the top-tier applications:

1. A description of the DW structure, including the warehouse schema, dimension,

hierarchies, data mart locations, and contents, etc.
2. Operational metadata, which usually describes the currency level of the stored
data, i.e., active, archived or purged, and warehouse monitoring information, i.e.,
usage statistics, error reports, audit, etc.
3. System performance data, which includes indices, used to improve data access
and retrieval performance.
4. Information about the mapping from operational databases, which provides
source RDBMSs and their contents, cleaning and transformation rules, etc.
5. Summarization algorithms, predefined queries, and reports business data, which
include business terms and definitions, ownership information, etc.

Datawarehouse Components
We will learn about the Datawarehouse Components and Architecture of Data
Warehouse with Diagram as shown below:

Data Warehouse Architecture

The Data Warehouse is based on an RDBMS server which is a central information

repository that is surrounded by some key Data Warehousing components to make
the entire environment functional, manageable and accessible.

There are mainly five Data Warehouse Components:

1. Data Warehouse Database
The central database is the foundation of the data warehousing environment. This
database is implemented on the RDBMS technology. Although, this kind of
implementation is constrained by the fact that traditional RDBMS system is optimized
for transactional database processing and not for data warehousing.

For instance, ad-hoc query, multi-table joins, aggregates are resource intensive and
slow down performance.

Hence, alternative approaches to Database are used as listed below-

 In a datawarehouse, relational databases are deployed in parallel to allow for

scalability. Parallel relational databases also allow shared memory on various
multiprocessor configurations or massively parallel processors.
 Use of multidimensional database (MDDBs) to overcome any limitations which
are placed because of the relational Data Warehouse Models.

2. Sourcing, Acquisition, Clean-up and Transformation Tools

(ETL)
The data sourcing, transformation, and migration tools are used for performing all the
conversions, summarizations, and all the changes needed to transform data into a
unified format in the data warehouse. They are also called Extract, Transform and
Load (ETL) Tools.

Their functionality includes:

 Eliminating unwanted data in operational databases from loading into Data

warehouse.
 Search and replace common names and definitions for data arriving from
different sources.
 In case of missing data, populate them with defaults.
 De-duplicated repeated data arriving from multiple data sources.

These Extract, Transform, and Load tools may generate cron jobs, background jobs,
Cobol programs, shell scripts, etc. that regularly update data in data warehouse.
These tools are also helpful to maintain the Metadata.

These ETL Tools have to deal with challenges of Database & Data heterogeneity.

3. Metadata
Metadata is data about data which defines the data warehouse. It is used for
building, maintaining and managing the data warehouse.
In the Data Warehouse Architecture, meta-data plays an important role as it specifies
the source, usage, values, and features of data warehouse data. It also defines how
data can be changed and processed. It is closely connected to the data warehouse.

For example, a line in sales database may contain:

4030 KJ732 299.90
This is a meaningless data until we consult the Meta that tell us it was

 Model number: 4030

 Sales Agent ID: KJ732
 Total sales amount of $299.90

Therefore, Meta Data are essential ingredients in the transformation of data into
knowledge.

Metadata helps to answer the following questions

 What tables, attributes, and keys does the Data Warehouse contain?
 Where did the data come from?
 How many times do data get reloaded?
 What transformations were applied with cleansing?

Metadata can be classified into following categories:

1. Technical Meta Data: This kind of Metadata contains information about

warehouse which is used by Data warehouse designers and administrators.
2. Business Meta Data: This kind of Metadata contains detail that gives end-users a
way easy to understand information stored in the data warehouse.

4. Query Tools
One of the primary objects of data warehousing is to provide information to
businesses to make strategic decisions. Query tools allow users to interact with the
data warehouse system.

These tools fall into four different categories:

1. Query and reporting tools

2. Application Development tools
3. Data mining tools
4. OLAP tools
1.Query and reporting tools: Query and reporting tools can be further divided into
 Reporting tools
 Managed query tools

Reporting tools: can be further divided into production reporting tools and
desktop report writer.

1. Report writers: This kind of reporting tool are tools designed for end-users for
their analysis.
2. Production reporting: This kind of tools allows organizations to generate regular
operational reports. It also supports high volume batch jobs like printing and
calculating. Some popular reporting tools are Brio, Business Objects, Oracle,
PowerSoft, SAS Institute.

Managed query tools: This kind of access tools helps end users to resolve
problems in database and SQL and database structure by inserting meta-layer
between users and database.

2.Application development tools: Sometimes built-in graphical and analytical tools do

not satisfy the analytical needs of an organization. In such cases, custom reports are
developed using Application development tools.

3. Data mining tools: Data mining is a process of discovering meaningful new

correlation, pattens, and trends by mining large amount data. Data mining tools are
used to make this process automatic.
4. OLAP tools: These tools are based on concepts of a multidimensional database. It
allows users to analyse the data using elaborate and complex multidimensional views.
5. Data Marts
A data mart is an access layer which is used to get data out to the users. It is presented
as an option for large size data warehouse as it takes less time and money to build.
However, there is no standard definition of a data mart is differing from person to
person.

In a simple word Data mart is a subsidiary of a data warehouse. The data mart is used
for partition of data which is created for the specific group of users. Data marts could
be created in the same database as the Datawarehouse or a physically separate
Database.
Benefits of Data Warehousing:

What Is a Data Mart?

A data mart is a subset of a database —usually a data warehouse— where data is stored
for a specific business area. That is, a data mart stores concise and specific data sets used
for analysis for a specific department or line of business, such as the sales department.
The data mart is query-oriented and, as in a data warehouse, the data
has a clear structure —also typically star or snowflake dimensional models—. The intent of
using a data mart is to index data and facilitate queries about specific areas of the business
and to meet the needs of a particular group of users within the organization.
What is a data lake?

A data lake is a central storage repository that holds big data from many sources in a raw,
granular format. It can store structured, semi-structured, or unstructured data, which
means data can be kept in a more flexible format for future use. When storing data, a data
lake associates it with identifiers and metadata tags for faster retrieval.

A data lake works on a principle called schema-on-read. This means that there is no
predefined schema into which data needs to be fitted before storage. Only when the data is
read during processing is it parsed and adapted into a schema as needed. This feature
saves a lot of time that’s usually spent on defining a schema. This also enables data to be
stored as is, in any format.

https://youtu.be/GHpcLEkkmLc

Introduction To Data Warehouse
No ratings yet
Introduction To Data Warehouse
34 pages
Introduction To Data Warehouse Edited
No ratings yet
Introduction To Data Warehouse Edited
34 pages
Data Mining 1
No ratings yet
Data Mining 1
41 pages
Unit One
No ratings yet
Unit One
41 pages
Data Warehousing-Notes (Module - I & II)
No ratings yet
Data Warehousing-Notes (Module - I & II)
32 pages
CH 2 Introduction To Data Warehousing
No ratings yet
CH 2 Introduction To Data Warehousing
31 pages
Data Warehousing and Mining Guide
No ratings yet
Data Warehousing and Mining Guide
46 pages
DWM 1
No ratings yet
DWM 1
15 pages
Data Warehouse Tutorial
No ratings yet
Data Warehouse Tutorial
88 pages
Lec 01 - Intro To Data Warehouse
No ratings yet
Lec 01 - Intro To Data Warehouse
54 pages
Data Warehouse Architecture
No ratings yet
Data Warehouse Architecture
4 pages
Unit 1 Notes - DW
No ratings yet
Unit 1 Notes - DW
29 pages
DWDM Imp Qnotes - Mid1
No ratings yet
DWDM Imp Qnotes - Mid1
31 pages
Data Ware House Architectures
No ratings yet
Data Ware House Architectures
34 pages
Btech Cse & Aids DWDM Material - 2025
100% (1)
Btech Cse & Aids DWDM Material - 2025
45 pages
Data Warehousing
No ratings yet
Data Warehousing
71 pages
What Is A Data Warehouse
No ratings yet
What Is A Data Warehouse
34 pages
Data Warehouse - Final
No ratings yet
Data Warehouse - Final
28 pages
Notes CS-703 (B) Data Mining & Warehousing All Units
No ratings yet
Notes CS-703 (B) Data Mining & Warehousing All Units
46 pages
DWM Exp1
No ratings yet
DWM Exp1
12 pages
3 Marks 1.what Is Data Warehouse?: o o o o o
No ratings yet
3 Marks 1.what Is Data Warehouse?: o o o o o
13 pages
Data Warehousing Essentials
No ratings yet
Data Warehousing Essentials
25 pages
Data Warehouse
No ratings yet
Data Warehouse
39 pages
DMW Unit2
No ratings yet
DMW Unit2
69 pages
DMW p1 Merged
No ratings yet
DMW p1 Merged
316 pages
Data Warehouse
No ratings yet
Data Warehouse
22 pages
R16 4-2 DataMining Notes UNIT-I
No ratings yet
R16 4-2 DataMining Notes UNIT-I
31 pages
Test
No ratings yet
Test
2 pages
Unit 1
No ratings yet
Unit 1
26 pages
Unit - I DW
No ratings yet
Unit - I DW
12 pages
Data Warehousing and Data Mining Original Notes
No ratings yet
Data Warehousing and Data Mining Original Notes
47 pages
Unit 1 Notes - DW
No ratings yet
Unit 1 Notes - DW
25 pages
Datawarehouse Unit-2
No ratings yet
Datawarehouse Unit-2
59 pages
FD Unit 2
No ratings yet
FD Unit 2
20 pages
Unit I
No ratings yet
Unit I
90 pages
Data Warehousing and Data Mining Bhoj Reddy Engineering College For Women
No ratings yet
Data Warehousing and Data Mining Bhoj Reddy Engineering College For Women
11 pages
DWDM
No ratings yet
DWDM
15 pages
Unit I Data Warehouse and OLAP
No ratings yet
Unit I Data Warehouse and OLAP
61 pages
Data Warehouse
No ratings yet
Data Warehouse
5 pages
Data Warehousing Fundamentals
No ratings yet
Data Warehousing Fundamentals
47 pages
Introduction to Data Warehousing
No ratings yet
Introduction to Data Warehousing
39 pages
Module 1 Notes
No ratings yet
Module 1 Notes
29 pages
Data Warehousing
No ratings yet
Data Warehousing
11 pages
DATA Ware House Mining NOTES
No ratings yet
DATA Ware House Mining NOTES
31 pages
DATA Ware House & Mining NOTES
100% (2)
DATA Ware House & Mining NOTES
31 pages
Data Warehousing Research Paper
50% (2)
Data Warehousing Research Paper
7 pages
Data Warehouse: From Wikipedia, The Free Encyclopedia
No ratings yet
Data Warehouse: From Wikipedia, The Free Encyclopedia
5 pages
DWDM Unit 1
No ratings yet
DWDM Unit 1
24 pages
DWDM Unit 1
No ratings yet
DWDM Unit 1
103 pages
Data Warehousing Module 1
No ratings yet
Data Warehousing Module 1
58 pages
What Is A Data Warehouse?
No ratings yet
What Is A Data Warehouse?
39 pages
Introduction To DW
No ratings yet
Introduction To DW
28 pages
Data and AI - Data Warehousing
No ratings yet
Data and AI - Data Warehousing
58 pages
Functional Dependency
No ratings yet
Functional Dependency
95 pages
Database Programming Overview
No ratings yet
Database Programming Overview
80 pages
Lecture 7 - CS50x 2024
No ratings yet
Lecture 7 - CS50x 2024
20 pages
Testblancbi
No ratings yet
Testblancbi
29 pages
DBMS Integrity Constraints Guide
No ratings yet
DBMS Integrity Constraints Guide
5 pages
SQL Lab File (1-10) - Tushar-21scse2030128
No ratings yet
SQL Lab File (1-10) - Tushar-21scse2030128
44 pages
TCS Ipa SQL Session
No ratings yet
TCS Ipa SQL Session
30 pages
Nosql Tricks
No ratings yet
Nosql Tricks
34 pages
Bahr 312 Sim
No ratings yet
Bahr 312 Sim
115 pages
Data Base Models
No ratings yet
Data Base Models
26 pages
Module 6 Access
No ratings yet
Module 6 Access
119 pages
SQL Cheat Sheet Sqlserver
No ratings yet
SQL Cheat Sheet Sqlserver
1 page
Bcom Ecom Syllabus 2015 2018
No ratings yet
Bcom Ecom Syllabus 2015 2018
87 pages
Database Management Exam Paper
No ratings yet
Database Management Exam Paper
6 pages
GIS Data Models Explained
No ratings yet
GIS Data Models Explained
17 pages
Database Systems: Nguyễn Văn Diêu
No ratings yet
Database Systems: Nguyễn Văn Diêu
121 pages
Avaya CMS ODBC/JDBC Guide R20
No ratings yet
Avaya CMS ODBC/JDBC Guide R20
74 pages
SQL Database Design and Creation
No ratings yet
SQL Database Design and Creation
12 pages
Oracle SQL & PL-SQL Optimization For Developers Documentation PDF
No ratings yet
Oracle SQL & PL-SQL Optimization For Developers Documentation PDF
103 pages
Attendance Management System Report
No ratings yet
Attendance Management System Report
29 pages
Database Engineering
No ratings yet
Database Engineering
3 pages
Database Design & Development Quiz
67% (3)
Database Design & Development Quiz
237 pages
DBMS Basics for Class 10 Students
100% (1)
DBMS Basics for Class 10 Students
7 pages
PD Dac: Concepts of Programming & Operating System
No ratings yet
PD Dac: Concepts of Programming & Operating System
5 pages
TE-2016-1-65computer Engineering DJ Sanghvi
No ratings yet
TE-2016-1-65computer Engineering DJ Sanghvi
65 pages
DBS - Lecture 1-3
No ratings yet
DBS - Lecture 1-3
61 pages
CBSE Class 12 Computer Science Syllabus 2023 24
No ratings yet
CBSE Class 12 Computer Science Syllabus 2023 24
4 pages
Database Systems Overview
No ratings yet
Database Systems Overview
75 pages
Syllabus (2020) : NTA UGC-NET Computer Science and Applications
No ratings yet
Syllabus (2020) : NTA UGC-NET Computer Science and Applications
24 pages
Joins
No ratings yet
Joins
16 pages

Data Warehouse 9 Oct

Uploaded by

Data Warehouse 9 Oct

Uploaded by

DATA WAREHOUSE

Data Warehouse is a relational database management system (RDBMS) construct

Data Warehouse environment contains an extraction, transportation, and loading

What is a Data Warehouse?

A Data Warehouse provides integrated, enterprise-wide, historical data and focuses

Data Warehouse is a subject-oriented, integrated, non-volatile and time-variant

Characteristics of Data Warehouse

Database Data Warehouse

1. It is used for Online Transactional 1. It is used for Online Analytical

3. Data is dynamic 3. Data is largely static

4. Entity: Relational modeling procedures 4. Data: Modeling approach are used

5. Optimized for write operations. 5. Optimized for read operations.

6. Performance is low for analysis queries. 6. High performance for analytical

Architecture of a Data Warehouse

o Data Warehouse Architecture: Basic

Data Warehouse Architecture: Basic

 End-User access Tools: The principal purpose of a data warehouse is to provide

This architecture is especially useful for the extensive, enterprise-wide systems. A

Data Warehouses usually have a three-level (tier) architecture that includes:

1. Bottom Tier (Data Warehouse Server)

1. A bottom-tier that consists of the Data Warehouse server, which is almost

Examples of gateways contain ODBC (Open Database Connection) and OLE-

The OLAP server is implemented using either

 A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS that

 A Multidimensional OLAP (MOLAP) model, i.e., a particular purpose server

1. A description of the DW structure, including the warehouse schema, dimension,

Data Warehouse Architecture

The Data Warehouse is based on an RDBMS server which is a central information

There are mainly five Data Warehouse Components:

Hence, alternative approaches to Database are used as listed below-

 In a datawarehouse, relational databases are deployed in parallel to allow for

2. Sourcing, Acquisition, Clean-up and Transformation Tools

Their functionality includes:

 Eliminating unwanted data in operational databases from loading into Data

For example, a line in sales database may contain:

 Model number: 4030

Metadata helps to answer the following questions

Metadata can be classified into following categories:

1. Technical Meta Data: This kind of Metadata contains information about

These tools fall into four different categories:

1. Query and reporting tools

2.Application development tools: Sometimes built-in graphical and analytical tools do

3. Data mining tools: Data mining is a process of discovering meaningful new

What Is a Data Mart?

You might also like