0% found this document useful (0 votes)
21 views16 pages

Data Warehousing

A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data designed to support management's decision-making. It differs from operational databases by focusing on data analysis rather than transaction processing, and includes components such as ETL processes, data marts, and enterprise data warehouses. Key operations in data warehousing include data extraction, transformation, loading, and various online analytical processing (OLAP) functionalities.

Uploaded by

Jane Hale
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views16 pages

Data Warehousing

A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data designed to support management's decision-making. It differs from operational databases by focusing on data analysis rather than transaction processing, and includes components such as ETL processes, data marts, and enterprise data warehouses. Key operations in data warehousing include data extraction, transformation, loading, and various online analytical processing (OLAP) functionalities.

Uploaded by

Jane Hale
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

DATA WAREHOUSING

DATA WAREHOUSING
• A data warehouse is a subject-oriented,
integrated, time-variant, nonvolatile collection
of data in support of management's
decision-making process.

• Characteristics of Data Warehousing


– Subject oriented
– Integrated
– Time variant (time series)
– Nonvolatile
Data
Warehouse—Subject-Oriented
• Organized around major subjects, such as customer, product
and sales
• Focusing on the modeling and analysis of data for decision
makers, not on daily operations or transaction processing
• Provide a simple and concise view around particular subject
issues by excluding data that are not useful in the decision
support process
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data
sources
– relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
– When data is moved to the warehouse, it is converted.
Data Warehouse—Time Variant
• The time horizon for the data warehouse is significantly longer
than that of operational systems
– Operational database: current value data
– Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not contain
“time element”
Data Warehouse—Nonvolatile
• A physically separate store of data transformed from the
operational environment
• Operational update of data does not occur in the data
warehouse environment
– Does not require transaction processing, recovery, and
concurrency control mechanisms
– Requires only two operations in data accessing:
• initial loading of data and access of data
Database Systems Vs
Datawarehouse
• The major task of online operational database systems is to perform online
transaction and query processing.
• These systems are called online transaction processing (OLTP) systems.
• They cover most of the day-to-day operations of an organization such as
purchasing, inventory, manufacturing, banking, payroll, registration, and
accounting.
• Data warehouse systems, on the other hand, serve users or knowledge
workers in the role of data analysis and decision making.
• Such systems can organize and present data in various formats in order to
accommodate the diverse needs of different users. These systems are
known as online analytical processing (OLAP) systems.
Data Warehousing : Multitired
Architecture

A three-tier
data
warehousing
architecture
Extraction, Transformation and Loading (ETL)
• Data extraction
– gathers data from multiple, heterogeneous, and external
sources
• Data cleaning
– detect errors in the data and rectify them when possible
• Data transformation
– convert data from legacy or host format to warehouse format
• Load
– sort, summarize, consolidate, compute views, check integrity,
and build indicies and partitions
• Refresh
– propagates the updates from the data sources to the
warehouse
9
Extraction,Transformation and Loading

• Besides cleaning, loading, refreshing, and metadata definition tools, data


warehouse systems usually provide a good set of data warehouse
management tools.
• Data cleaning and data transformation are important steps in improving
the data quality and, subsequently, the data mining results.
• Data Marts :
– A data mart is usually smaller and focuses on a
particular subject or department.
– A data mart is a subset of a data warehouse,
typically consisting of a single subject area (e.g.,
marketing,

• Operational Data Stores:

An operational data store (ODS) provides a fairly recent


form of customer information file (CIF).
• Enterprise Data Warehouses (EDW) :
– An enterprise data warehouse (EDW) is a
large-scale data warehouse that is used across
the enterprise for decision support.

• Metadata
– Metadata are data about data Metadata
describe the structure of and some meaning
about data, thereby contributing to their
effective or ineffective use
Data Warehouse Framework and
Views (Components)
Major components of the data
warehousing process
• Data sources
• Data extraction and transformation
• Data loading
• Comprehensive database
• Metadata
Typical OLAP Operations
• Slice and Dice
• Roll-up and Drill down
• Pivot

You might also like