CM5107
Unit 6. Data Warehousing
6.1 Basic concepts
6.2 Differences between Operational Database Systems and Data Warehouses
6.3 Datawarehouse: A Multitire architecture
6.4 Data warehouse models: Enterprise Warehouse, Data Mart, Virtual Warehouse
***************************************************************************
Data Warehousing
Data Warehousing is the process of constructing and using the data warehouse.
Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection
of data that supports management's decision-making process.
A data warehouse is a centralized repository for storing and managing large amounts of
data from various sources for analysis and reporting.
It is optimized for fast querying and analysis, enabling organizations to make informed
decisions by providing a single source of truth for data.
Data warehousing typically involves transforming and integrating data from multiple
sources into a unified, organized, and consistent format.
A Data Warehouse is built by combining data from multiple sources that support
analytical reporting, structured and unstructured queries, and decision making for the
organization. Data Warehousing is a step-by-step approach for constructing and using a
Data Warehouse.
The architecture of the data warehouse primarily consists of the proper arrangement of its
elements, to build an efficient data warehouse with software and hardware components.
The elements and components may vary based on the requirement of organizations. All of
these depend on the organization's circumstances
Data warehouse users can be divided into four categories: Statisticians, knowledge
workers, information consumers, and executives.
Some applications include financial services, banking services, customer goods, retail
sectors, controlled manufacturing.
Key Features of Data Warehousing
Subject-oriented:
o A data warehouse typically provides information on a topic (such as a sales inventory
or supply chain) rather than company operations.
o A data warehouse is also subject-oriented, which means that the data is organized
around specific subjects, such as customers, products, or sales.
o This allows for easy access to the data relevant to a specific subject, as well as the
ability to track the data over time
Integrated:
o A data warehouse combines/collects data from various sources, such as transactional
systems, and then cleaned, transformed, and consolidated into a single, unified view.
o These may include a cloud, relational databases, flat files, structured and semi-
structured data, metadata, and master data.
1|Pag e
CM5107
o The sources are combined in a manner that’s consistent, relatable, and ideally
certifiable, providing a business with confidence in the data’s quality.
o This allows for easy access and analysis of the data, as well as the ability to track data
over time.
Time-Variant:
o Time variant keys (e.g., for the date, month, time) are typically present.
o A data warehouse is also time-variant, which means that the data is stored with a time
dimension.
o This allows for easy access to data for specific time periods, such as last quarter or
last year. This makes it possible to track trends and patterns over time.
Non-Volatile and Persistent
o Prior data isn’t deleted when new data is added. Historical data is preserved for
comparisons, trends, and analytics.
o This means that the data in the warehouse is never updated or deleted, only added to.
o This is important because it allows for the preservation of historical data, making it
possible to track trends and patterns over time.
Need for Data Warehousing
1. Increasing Business Complexity
2. Global Policies
3. Technological Advancements
4. Store historical data
5. Make strategic decisions
6. For data consistency and quality
7. High response time
Advantages of Data Warehousing
1. Since a data warehouse can gather information quickly and efficiently, it can enhance
business productivity.
2. A data warehouse provides us a consistent view of customers and items, hence, it helps us
manage customer relationship.
3. A data warehouse also helps in bringing down the costs by tracking trends, patterns over
along period in a consistent and reliable manner.
4. Understand business trends and make better forecasting decisions.
5. Data Warehouses are designed to perform enormous amounts of data.
6. The structure of data warehouses is more accessible for end-users to navigate, understand,
and query.
2|Pag e
CM5107
7. Queries that would be complex in many normalized databases could be easier to build and
maintain in data warehouses.
8. Data warehousing is an efficient method to manage demand for lots of information from
lots of users.
9. Data warehousing provides the capabilities to analyze a large amount of historical data.
10. Most Cost-effective decision making: DW helps to reduce overall cost of the product by
reducing the number of channels.
Disadvantages of Data Warehousing
1. Underestimation of resources of data loading:
o Sometimes we underestimate the time reqd. To extract, clean & load the data into
DW.
o It may take significant proportion of the total development time, although some tools
are there which are used to reduce the time & effort spent on this process.
2. Increased end user demands: After satisfying the demands, the user request’s increases.
This is because of Increasing awareness of the users on capability & value of DW.
3. Data Homogenization: The concept of DW deals with similarity of data formats b/w
different data sources. Thus, results in to lose of some imp. Value of the data.
4. High Maintenance: DW is usually not static and have high cost. : DW are high
maintenance systems. Any reorganization of the business processes & the source systems
may affect the DW & result in high maintenance cost
Data Warehousing Tools
Data Warehouse
1. SQL Server 2000 DTS
2. Oracle 8i Warehouse Builder
OLAP tools
1. SQL Server Analysis Services
2. Oracle Express Server
Reporting tools
1. MS Excel Pivot Chart
2. VB Applications
Data Warehouse Architecture
Designing a data warehouse is known as data warehouse architecture and depending on
the needs of the data warehouse, can come in a variety of tiers. Typically there are tier
one, tier two, and tier three architecture designs.
Single-tier Architecture: Single-tier architecture is hardly used in the creation of data
warehouses for real-time systems. They are often used for batch and realtime processing
to process operational data. A single-tier design is composed of a single layer of hardware
with the goal of keeping data space at a minimum.
Two-tier Architecture: In a two-tier architecture design, the analytical process is separated
from the business process. The point of this is to increase levels of control and efficiency.
Three-tier Architecture: A three-tier architecture design has a top, middle, and bottom tier;
these are known as the source layer, the reconciled layer, and the data warehouse layer.
3|Pag e
CM5107
This design is suited for systems with long life cycles. When changes are made in the
data, an extra layer of review and analysis of the data is completed to ensure there have
been no errors.
Regardless of the tier, all data warehouse architectures must meet the same five
properties: separation, scalability, extensibility, security, and administrability.
Data warehouse: A Multitier Architecture
Three-TierData warehouse architecture –
Generally a data warehouses adopts a three-tier architecture. Following are the three tiers
of the data warehouse architecture.
Bottom Tier − The bottom tier of the architecture is the data warehouse database server.
It is the relational database system. We use the back end tools and utilities to feed data
into the bottom tier.
These back end tools and utilities perform the Extract, Clean, Load, and refresh functions.
Examples of gateways contain ODBC (Open Database Connection) & OLE-DB (Open-
Linking and Embedding for Databases), by Microsoft, & JDBC (Java Database
Connection).
Middle Tier − In the middle tier, we have the OLAP Server that can be implemented in
either of the following ways.
By Relational OLAP (ROLAP), which is an extended relational database management
system. The ROLAP maps the operations on multidimensional data to standard relational
operations.
By Multidimensional OLAP (MOLAP) model, which directly implements the
multidimensional data and operations.
Top-Tier − This tier is the front-end client layer. This layer holds the query tools and
reporting tools, analysis tools and data mining tools (e.g., trend analysis, prediction, and
so on).
The following diagram depicts the three-tier architecture of data warehouse
4|Pag e
CM5107
Data warehouse models
From the perspective of data warehouse architecture, we have
the following data warehouse models −
Virtual Warehouse
Data mart
Enterprise Warehouse
Virtual Warehouse-
The view over an operational data warehouse for efficient query is known as a virtual
warehouse.
It is easy to build a virtual warehouse.
Building a virtual warehouse requires excess capacity on operational database servers.
Only some of the possible summary views may be materialized.
Data Mart
Data mart contains a subset of organization-wide data. This subset of data is valuable to
specific groups of an organization.
In other words, we can claim that data marts contain data specific to a particular group. It
uses bottom up approach.
Data Mart a subset of corporate-wide data that is of value to a specific groups of users.
Its scope is confined to specific, selected groups, such as marketing data mart
Independent vs. dependent (directly from warehouse) data mart
For example, the marketing data mart may contain data related to items, customers, and
sales. Datamarts are confined to subjects.
Points to remember about data marts −
Window-based or Unix/Linux-based servers are used to implement data marts. They
are implemented on low-cost servers.
The implementation data mart cycles is measured in short periods of time, i.e., in
weeks rather than months or years.
The life cycle of a data mart may be complex in long run, if its planning and design
are not organization-wide.
Data marts are small in size and flexible.
Data marts are customized by department.
The source of a data mart is departmentally structured data warehouse.
Enterprise Warehouse
An enterprise warehouse collects all the information and the subjects spanning an entire
organization
It provides us enterprise-wide data integration.
The data is integrated from operational systems and external information providers.
This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or
beyond.
It uses the top down approach.
Implemented in mainframes, super servers or parallel architecture platforms.
5|Pag e
CM5107
Operational Database Systems Vs. Data Warehouses
***************************************************************************
6|Pag e