0% found this document useful (0 votes)
24 views6 pages

The Design of A Data Warehouse.: Lesson Introduction

This lesson covers the design and architecture of data warehouses, emphasizing the importance of understanding business needs and the various design approaches. It outlines the three-tier architecture, the significance of a metadata repository, and different OLAP server models. Students will learn about the data warehouse design process, including steps for modeling business processes and selecting appropriate data structures.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views6 pages

The Design of A Data Warehouse.: Lesson Introduction

This lesson covers the design and architecture of data warehouses, emphasizing the importance of understanding business needs and the various design approaches. It outlines the three-tier architecture, the significance of a metadata repository, and different OLAP server models. Students will learn about the data warehouse design process, including steps for modeling business processes and selecting appropriate data structures.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

7. The Design of a Data Warehouse.

Lesson Introduction

This lesson will start regarding data warehouse architecture and give an insight into how to design and
construct a data warehouse. To create an active data warehouse, a clear understanding and analysis of
business needs are essential. We will discuss the different views in data warehouse design and what are
the different approaches for data warehouse design. Then the description of the main components in the
three-tier data warehouse architecture and the importance of the metadata repository. The last section
of this lesson presents various types of warehouse servers for OLAP processing.

Learning Outcomes:
After successful completion of this lesson, the students will be able to understand data warehouse
architecture, data warehouse design process and the different types of warehouse server models.

Lesson Outline:
 The design of a data warehouse.
 Data warehouse design process.
 Importance of metadata repository
 OLAP server models.

7.1 The Design of a Data Warehouse:

We can achieve the following advantages in the data warehouse approach.

 A data warehouse may provide a competitive advantage by presenting relevant information from which to measure
performance and make critical adjustments to help win over competitors.
 A data warehouse can enhance business productivity because it can quickly and efficiently gather information that
accurately describes the organization.
 A data warehouse facilitates customer relationship management because it provides a consistent view of customers
and items across all lines of business, all departments, and all markets.
 A data warehouse may bring about cost reduction by tracking trends, patterns, and exceptions over long periods
consistently and reliably.

The construction of a large and complex information system can be viewed as the construction of a large and complex building,
for which the owner, architect, and builder have different views. Four different views regarding the design of a data warehouse
must be considered: the top-down view, the data source view, the data warehouse view, and the business query view.

 The top-down view allows the selection of the relevant information necessary for the data warehouse. This
information matches the current and future business needs.
 The data source view exposes the information being captured, stored, and managed by operational systems. This
information may be documented at various levels of detail and accuracy, from individual data source tables to
integrated data source tables. Data sources are often modeled by traditional data modeling techniques, such as the
entity-relationship model or CASE (computer-aided software engineering) tools.
 The data warehouse view includes fact tables and dimension tables. It represents the information that is stored inside
the data warehouse, including precalculated totals and counts, as well as information regarding the source, date, and
time of origin, added to provide historical context.
 The business query view is the perspective of data in the data warehouse from the viewpoint of the end user.

7.2 The Data Warehouse Design Process

A data warehouse can be built using a top-down approach, a bottom-up approach, or a combination of
both. The top-down approach starts with the overall design and planning. It is useful in cases where the
technology is mature and well known, and where the business problems that must be solved are clear
and well understood. The bottom-up approach starts with experiments and prototypes. This is useful in
the early stage of business modeling and technology development. It allows an organization to move
forward at considerably less expense and to evaluate the benefits of the technology before making
significant commitments. In the combined approach, an organization can exploit the planned and strategic
nature of the top-down approach while retaining the rapid implementation and opportunistic application
of the bottom-up approach.

From the software engineering point of view, the design and construction of a data warehouse may
consist of the following steps: planning, requirements study, problem analysis, warehouse design, data
integration, and testing, and finally deployment of the data warehouse. Large software systems can be
developed using two methodologies: the waterfall method or the spiral method. The waterfall method
performs a structured and systematic analysis at each step before proceeding to the next, which is like a
waterfall, falling from one stage to the next. The spiral method involves the rapid generation of
increasingly functional systems, with short intervals between successive releases.

In general, the warehouse design process consists of the following steps:

1. Choose a business process to model, for example, orders, invoices, shipments, inventory, account administration,
sales, or the general ledger. If the business process is organizational and involves multiple complex object collections,
a data warehouse model should be followed. However, if the process is departmental and focuses on the analysis of
one kind of business process, a data mart model should be chosen.
2. Choose the grain of the business process. The grain is the fundamental, atomic level of data to be represented in the
fact table for this process, for example, individual transactions, individual daily snapshots, and so on.
3. Choose the dimensions that will apply to each fact table record. Typical dimensions are time, item, customer, supplier,
warehouse, transaction type, and status.
4. Choose the measures that will populate each fact table record. Typical measures are numeric additive quantities like
dollars sold and units sold.
7.3 The Three-Tier DataWarehouse Architecture

Data warehouses often adopt a three-tier architecture, as presented in Figure 7-1.


The bottom tier is a warehouse database server that is almost always a relational database system. Back-
end tools and utilities are used to feed data into the bottom tier from operational databases or other
external sources. These tools and utilities perform data extraction, cleaning, and transformation, as well
as load and refresh functions to update the data warehouse. This tier also contains a metadata repository,
which stores information about the data warehouse and its contents.

Figure 7-1: The Three-Tier Data Warehouse Architecture


The middle tier is an OLAP server that is typically implemented using either a relational OLAP (ROLAP)
model, that is, an extended relational DBMS that maps operations on multidimensional data to standard
relational operations; or a multidimensional OLAP (MOLAP) model, that is, a special-purpose server
that directly implements multidimensional data and services.

The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and data
mining tools. From the architecture point of view, there are three data warehouse models: the enterprise
warehouse, the data mart, and the virtual warehouse.

7.4 Importance of the Metadata Repository

Metadata are data about data. When used in a data warehouse, metadata are the data that define
warehouse objects. Figure 7-1 showed a metadata repository within the bottom tier of the data
warehousing architecture. Metadata are created for the data names and definitions of the given
warehouse. Additional metadata are generated and captured for timestamping any extracted data, the
source of the retrieved information, and missing fields that have been added by data cleaning or
integration processes.

A metadata repository should contain the following:

 A description of the structure of the data warehouse, which includes the warehouse schema, view,
dimensions, hierarchies, and derived data definitions, as well as data mart locations and contents
 Operational metadata, which provides for data lineage (history of migrated data and the
sequence of transformations applied to it), the currency of data (active, archived, or purged), and
monitoring information (warehouse usage statistics, error reports, and audit trails)
 The algorithms used for summarization, which include measure and dimension definition
algorithms, data on granularity, partitions, subject areas, aggregation, summary, and predefined
queries and reports
 The mapping from the operational environment to the data warehouse, which includes source
databases and their contents, gateway descriptions, data partitions, data extraction, cleaning,
transformation rules and defaults, data refresh and purging rules, and security (user authorization
and access control)
 Data related to system performance, which include indices and profiles that improve data access
and retrieval performance, in addition to rules for the timing and scheduling of refresh, update,
and replication cycles
 Business metadata, which include business terms and definitions, data ownership information,
and charging policies
7.5 OLAP Server Models

Mainly there are three different types of OLAP server models they are Relational OLAP (ROLAP) servers,
Multidimensional OLAP (MOLAP) servers, and Hybrid OLAP (HOLAP) servers.

1. Relational OLAP (ROLAP) servers: These are the intermediate servers that stand in between a
relational back-end server and client front-end tools. They use a relational or extended-relational
DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces.
ROLAP servers include optimization for each DBMS back end, implementation of aggregation
navigation logic, and additional tools and services. ROLAP technology tends to have greater
scalability than MOLAP technology. The DSS server of Micro strategy, for example, adopts the
ROLAP approach.

2. Multidimensional OLAP (MOLAP) servers: These servers support multidimensional views of data
through array-based multidimensional storage engines. They map multidimensional views
directly to data cube array structures. The advantage of using a data cube is that it allows fast
indexing to precomputed summarized data. Many MOLAP servers adopt a two-level storage
representation to handle dense and sparse data sets: denser subcubes are identified and stored
as array structures, whereas sparse subcubes employ compression technology for efficient
storage utilization. A comparison of the two models is given in figure 7-2.

Figure 7-2: MOLAP vs ROLAP


3. Hybrid OLAP (HOLAP) servers: The hybrid OLAP approach combines ROLAP and MOLAP
technology, benefiting from the greater scalability of ROLAP and the faster computation of
MOLAP. For example, a HOLAP server may allow large volumes of detail data to be stored in a
relational database, while aggregations are kept in a separate MOLAP store. The Microsoft SQL
Server 2000 supports a hybrid OLAP server.

You might also like