0% found this document useful (0 votes)

9 views12 pages

DataWare Housing Concepts

Document talks about basic DataWare Housing Concepts.

Uploaded by

kapilgoel.noida

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

9 views12 pages

DataWare Housing Concepts

Document talks about basic DataWare Housing Concepts.

Uploaded by

kapilgoel.noida

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 12

OLAP

OLAP stands for On-Line Analytical Processing. The first attempt to provide a definition to OLAP was by
Dr. Codd, who proposed 12 rules for OLAP. Later, it was discovered that this particular white paper was
sponsored by one of the OLAP tool vendors, thus causing it to lose objectivity. The OLAP Report has
proposed the FASMI test, Fast Analysis of Shared Multidimensional Information. For a more detailed
description of both Dr. Codd's rules and the FASMI test, please visit The OLAP Report.

For people on the business side, the key feature out of the above list is "Multidimensional." In other
words, the ability to analyze metrics in different dimensions such as time, geography, gender, product,
etc. For example, sales for the company are up. What region is most responsible for this increase? Which
store in this region is most responsible for the increase? What particular product category or categories
contributed the most to the increase? Answering these types of questions in order means that you are
performing an OLAP analysis.

Depending on the underlying technology used, OLAP can be broadly divided into two different camps:
MOLAP and ROLAP. A discussion of the different OLAP types can be found in the MOLAP, ROLAP, and
HOLAP section.

MOLAP, ROLAP, And HOLAP

In the OLAP world, there are mainly two different types: Multidimensional OLAP (MOLAP) and Relational
OLAP (ROLAP). Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and ROLAP.

MOLAP

This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube.
The storage is not in the relational database, but in proprietary formats.

Advantages:

 Excellent performance: MOLAP cubes are built for fast data retrieval, and are optimal for slicing
and dicing operations.
 Can perform complex calculations: All calculations have been pre-generated when the cube is
created. Hence, complex calculations are not only doable, but they return quickly.

Disadvantages:

 Limited in the amount of data it can handle: Because all calculations are performed when the
cube is built, it is not possible to include a large amount of data in the cube itself. This is not to
say that the data in the cube cannot be derived from a large amount of data. Indeed, this is
possible. But in this case, only summary-level information will be included in the cube itself.
 Requires additional investment: Cube technology are often proprietary and do not already exist in
the organization. Therefore, to adopt MOLAP technology, chances are additional investments in
human and capital resources are needed.

ROLAP

This methodology relies on manipulating the data stored in the relational database to give the appearance
of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is
equivalent to adding a "WHERE" clause in the SQL statement.
Advantages:

 Can handle large amounts of data: The data size limitation of ROLAP technology is the limitation
on data size of the underlying relational database. In other words, ROLAP itself places no
limitation on data amount.
 Can leverage functionalities inherent in the relational database: Often, relational database already
comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational
database, can therefore leverage these functionalities.

Disadvantages:

 Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple
SQL queries) in the relational database, the query time can be long if the underlying data size is
large.
 Limited by SQL functionalities: Because ROLAP technology mainly relies on generating SQL
statements to query the relational database, and SQL statements do not fit all needs (for
example, it is difficult to perform complex calculations using SQL), ROLAP technologies are
therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by
building into the tool out-of-the-box complex functions as well as the ability to allow users to
define their own functions.

HOLAP

HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type
information, HOLAP leverages cube technology for faster performance. When detail information is
needed, HOLAP can "drill through" from the cube into the underlying relational data.

Factless Fact Table

A factless fact table is a fact table that does not have any measures. It is
essentially an intersection of dimensions. On the surface, a factless fact table
does not make sense, since a fact table is, after all, about facts. However,
there are situations where having this kind of relationship makes sense in data
warehousing.

For example, think about a record of student attendance in classes. In this

case, the fact table would consist of 3 dimensions: the student dimension, the
time dimension, and the class dimension. This factless fact table would look
like the following:
The only measure that you can possibly attach to each combination is "1" to
show the presence of that particular combination. However, adding a fact that
always shows 1 is redundant because we can simply use the COUNT function
in SQL to answer the same questions.

Factless fact tables offer the most flexibility in data warehouse design. For
example, one can easily answer the following questions with this factless fact
table:

 How many students attended a particular class on a particular day?

 How many classes on average does a student attend on a given day?

Without using a factless fact table, we will need two separate fact tables to
answer the above two questions. With the above factless fact table, it
becomes the only fact table that's needed.

Junk Dimension

In data warehouse design, frequently we run into a situation where there are
yes/no indicator fields in the source system. Through business analysis, we
know it is necessary to keep such information in the fact table. However, if
keep all those indicator fields in the fact table, not only do we need to build
many small dimension tables, but the amount of information stored in the fact
table also increases tremendously, leading to possible performance and
management issues.

Junk dimension is the way to solve this problem. In a junk dimension, we

combine these indicator fields into a single dimension. This way, we'll only
need to build a single dimension table, and the number of fields in the fact
table, as well as the size of the fact table, can be decreased. The content in
the junk dimension table is the combination of all possible values of the
individual indicator fields.

Let's look at an example. Assuming that we have the following fact table:
In this example, TXN_CODE, COUPON_IND, and PREPAY_IND are all
indicator fields. In this existing format, each one of them is a dimension. Using
the junk dimension principle, we can combine them into a single junk
dimension, resulting in the following fact table:

Note that now the number of dimensions in the fact table went from 7 to 5.

The content of the junk dimension table would look like the following:
In this case, we have 3 possible values for the TXN_CODE field, 2 possible
values for the COUPON_IND field, and 2 possible values for the
PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows for the junk
dimension table.

By using a junk dimension to replace the 3 indicator fields, we have

decreased the number of dimensions by 2 and also decreased the number of
fields in the fact table by 2. This will result in a data warehousing environment
that offer better performance as well as being easier to manage.

Conformed Dimension

A conformed dimension is a dimension that has exactly the same meaning

and content when being referred from different fact tables. A conformed
dimension can refer to multiple tables in multiple data marts within the same
organization. For two dimension tables to be considered as conformed, they
must either be identical or one must be a subset of another. There cannot be
any other type of difference between the two tables. For example, two
dimension tables that are exactly the same except for the primary key are not
considered conformed dimensions.

Why is conformed dimension important? This goes back to the definition of

data warehouse being "integrated." Integrated means that even if a particular
entity had different meanings and different attributes in the source systems,
there must be a single version of this entity once the data flows into the data
warehouse.

The time dimension is a common conformed dimension in an organization.

Usually the only rule to consider with the time dimension is whether there is a
fiscal year in addition to the calendar year and the definition of a week.
Fortunately, both are relatively easy to resolve. In the case of fiscal vs.
calendar year, one may go with either fiscal or calendar, or an alternative is to
have two separate conformed dimensions, one for fiscal year and one for
calendar year. The definition of a week is also something that can be different
in large organizations: Finance may use Saturday to Friday, while marketing
may use Sunday to Saturday. In this case, we should decide on a definition
and move on. The nice thing about the time dimension is once these rules are
set, the values in the dimension table will never change. For example,
October 16th will never become the 15th day in October.

Not all conformed dimensions are as easy to produce as the time dimension.
An example is the customer dimension. In any organization with some history,
there is a high likelihood that different customer databases exist in different
parts of the organization. To achieve a conformed customer dimension means
those data must be compared against each other, rules must be set, and data
must be cleansed. In addition, when we are doing incremental data loads into
the data warehouse, we'll need to apply the same rules to the new values to
make sure we are only adding truly new customers to the customer
dimension.

Building a conformed dimension also part of the process in master data

management, or MDM. In MDM, one must not only make sure the master data
dimensions are conformed, but that conformity needs to be brought back to
the source systems.

What are the Dimensions in Data warehouse?

A dimension table consists of the attributes about the facts. Dimensions store the textual
descriptions of the business attribute. Without the dimensions, we cannot measure the facts and
facts are just disordered Numbers.In Business, Customer, Products, Buyers information can be
different dimensions.

Let’s walk through commonly used types of dimensions.

Types of commonly used Dimensions in data warehouse Design
 Conformed Dimensions
 Junk Dimensions
 Role-playing Dimensions
 Slowly Changing Dimensions
 Degenerated Dimensions

Conformed Dimensions

A Dimension that is used in multiple locations is called conformed dimensions. A conformed

dimension may be used with multiple fact tables in single database, or across multiple data marts or
Data warehouses.

I.e. Above shown Customer and Product Dimensions are Conformed Dimensions as they are
connected to Shipment Fact table, Sales Order Fact table, and Service Request Fact table.

Junk Dimensions

A junk dimension is a collection of random transaction codes flags and/or text attributes that are
unrelated to any particular dimension. The junk dimension is simply a structure that provides a
convenient place to store the junk attributes.
I.e.: Assume that we have a gender dimension and marital status dimension. In the fact table we
need to maintain two keys referring to these dimensions. Instead of that create a junk dimension
which has all the combinations of gender and marital status (cross join gender and marital status
table and create a junk table). Now we can maintain only one key in the fact table.

Role-playing Dimensions

Role Playing Dimensions are the Dimensions which often used for multiple purposes within same
database.Here same dimension key is associated with more than one foreign key in the fact table in
the database for the different purposes.

I.e.: In Date dimensions, [FullDateAlternateKey] is associated with [Orderdate key], [Duedate

key], and [Shipdate] key in the fact table to solve different purpose in Data warehouse.

Slowly Changing Dimensions

This is widely used Dimensions type. It is the dimensions where attribute values changes with time.
There are various types of Slowly Changing Dimensions (SCD) based on how business manages
this dimensions.

Types of SCD

TYPE 0: It is the dimensions where we do not change attribute values at all. They are rarely
used. I.e. Employee birth date

TYPE 1: In this type, Old value of attribute is overwritten by new values of attribute and no history
kept

I.e Customer City where company decided to show only current one.

In this case previous city name London is replaced by new city name Edinburgh.

TYPE 2: In this type we tracks historical data by creating multiple records for a given Natural key
(business key) in the dimensional tables with separate surrogate key and/or different version
numbers. Unlimited history is preserved for each insert.
I.e. Customer City where company decided to have historical data then we will have to add an extra
row with column to identify the Current/Historical attributes value by start and end date columns.

TYPE 3: In this type, we tracks changes using separate columns and preserves limited history.it is
limited to how many columns we want to add in dimension table.

I.e. Customer City where New columns “previous City” and “Current City” being added.

TYPE 4: In this type, we keep all or some historical data in separate table and current data stays in
main Dimension table. Both historical and current dimension table joined to fact table with same
surrogate key, this will enhance the query performance. This type used very rarely.
I.e. we create new table to store previous Customer City and Current Customer City in Historical
table with Created date And Current Customer city in Current dimension table.

Degenerated Dimensions

A degenerate dimension is a dimension which is derived from the fact table and doesn’t have its own
dimension table.

In Data warehouse this Dimension often used to show drill through capability where in the report you
can see how aggregated no came up.
I.e. Invoice no can be stored in the fact table and then used as separate dimensions for the drill
through purpose to find out what invoices are part of total buying cost in report.

So Dimensions are one of the pillar of the data warehouse.Choosing right one can define future of
the data warehouse.It always good to use right type

Oh 3
No ratings yet
Oh 3
30 pages
Datawarehouse Concepts
No ratings yet
Datawarehouse Concepts
5 pages
DWHDoc
No ratings yet
DWHDoc
37 pages
Data Warehousing FAQ
No ratings yet
Data Warehousing FAQ
5 pages
Data Warehousing for Analysts
No ratings yet
Data Warehousing for Analysts
11 pages
DWH Unit 2
No ratings yet
DWH Unit 2
13 pages
Data Cubemod2
100% (1)
Data Cubemod2
21 pages
Data Warehouse Dimensions & Facts
No ratings yet
Data Warehouse Dimensions & Facts
8 pages
Chapter Eight
No ratings yet
Chapter Eight
33 pages
What Is The Difference Between OLTP and OLAP?
No ratings yet
What Is The Difference Between OLTP and OLAP?
33 pages
OLAP and Metadata
No ratings yet
OLAP and Metadata
6 pages
Different Types of Dimensions and Facts in Data
No ratings yet
Different Types of Dimensions and Facts in Data
5 pages
Data Database Data Mining Server Client Database Management Systems (DBMS)
No ratings yet
Data Database Data Mining Server Client Database Management Systems (DBMS)
4 pages
Unit - 2: Online Analytical Processing (OLAP)
No ratings yet
Unit - 2: Online Analytical Processing (OLAP)
32 pages
Ais Prof 1 Chapter 5
No ratings yet
Ais Prof 1 Chapter 5
39 pages
Unit 4
No ratings yet
Unit 4
41 pages
What Is A Data Warehouse
No ratings yet
What Is A Data Warehouse
5 pages
Dimensional Modeling Guide
No ratings yet
Dimensional Modeling Guide
22 pages
Dimensions DW
No ratings yet
Dimensions DW
6 pages
5.data Warehouse
No ratings yet
5.data Warehouse
19 pages
Dimensional Modeling
No ratings yet
Dimensional Modeling
59 pages
DWH Int Questions
100% (1)
DWH Int Questions
9 pages
Unit V
No ratings yet
Unit V
51 pages
OLAP
No ratings yet
OLAP
6 pages
Data Warehousing Basic
No ratings yet
Data Warehousing Basic
5 pages
Tuesday Introduction To OLAP and Dimensional Modelling
No ratings yet
Tuesday Introduction To OLAP and Dimensional Modelling
29 pages
Data Warehouse Schemas & OLAP
No ratings yet
Data Warehouse Schemas & OLAP
12 pages
Types of Dimensions - Javatpoint
No ratings yet
Types of Dimensions - Javatpoint
1 page
Data Warehouse Schema
No ratings yet
Data Warehouse Schema
10 pages
Product Table Feature Table: Sources
No ratings yet
Product Table Feature Table: Sources
23 pages
DWT Chapter 2 Part 2
No ratings yet
DWT Chapter 2 Part 2
14 pages
Junk Dimension
No ratings yet
Junk Dimension
2 pages
Online Analytical Processing (OLAP)
No ratings yet
Online Analytical Processing (OLAP)
43 pages
Dimensional Modeling Guide
No ratings yet
Dimensional Modeling Guide
59 pages
What Is Data Warehouse?: Explanatory Note
No ratings yet
What Is Data Warehouse?: Explanatory Note
11 pages
Data Warehouse Ques
No ratings yet
Data Warehouse Ques
10 pages
Olap Case Study - VJ
No ratings yet
Olap Case Study - VJ
16 pages
Dimensional Modeling Guide
No ratings yet
Dimensional Modeling Guide
14 pages
Unit 2
No ratings yet
Unit 2
33 pages
Chapter 2 Kimball Dimensional Modelling Techniques Overview
No ratings yet
Chapter 2 Kimball Dimensional Modelling Techniques Overview
14 pages
ETL Testing
No ratings yet
ETL Testing
3 pages
Unit 2
No ratings yet
Unit 2
36 pages
Abinitio Vijay - 8553385664
No ratings yet
Abinitio Vijay - 8553385664
28 pages
Star and Snowflake Schemas
No ratings yet
Star and Snowflake Schemas
4 pages
A Material
No ratings yet
A Material
191 pages
DWT Chapter 2 Part 1
No ratings yet
DWT Chapter 2 Part 1
18 pages
DWM 2
No ratings yet
DWM 2
21 pages
Facts & Dims
No ratings yet
Facts & Dims
14 pages
Data Stage
No ratings yet
Data Stage
10 pages
Some Term Relate To SQL
No ratings yet
Some Term Relate To SQL
9 pages
DWM Chp2 Notes
No ratings yet
DWM Chp2 Notes
21 pages
6 1 DWM 2019 S
No ratings yet
6 1 DWM 2019 S
7 pages
Unit 2 Remaining Notes DM
No ratings yet
Unit 2 Remaining Notes DM
17 pages
Business Objects: Frequently Asked Questions
No ratings yet
Business Objects: Frequently Asked Questions
63 pages
Normalization Detail
No ratings yet
Normalization Detail
9 pages
Types of Keys in DBMS
No ratings yet
Types of Keys in DBMS
5 pages
Introduction To Data Lakes
No ratings yet
Introduction To Data Lakes
6 pages
What Is SCD in Datastage ? Types of SCD in Datastage
No ratings yet
What Is SCD in Datastage ? Types of SCD in Datastage
7 pages
Chapter 4-3
No ratings yet
Chapter 4-3
14 pages
Mca Part II & Part III (Sem III To Vi)
No ratings yet
Mca Part II & Part III (Sem III To Vi)
42 pages
Stack and Heap
100% (1)
Stack and Heap
3 pages
It3401 Unit 5 Notes
No ratings yet
It3401 Unit 5 Notes
73 pages
Sports Club Data Dictionary
No ratings yet
Sports Club Data Dictionary
26 pages
Rupa Bject Torage: Prepared by
No ratings yet
Rupa Bject Torage: Prepared by
25 pages
SAP BW DataSource
No ratings yet
SAP BW DataSource
21 pages
DTSxChange Vs MSWizard
No ratings yet
DTSxChange Vs MSWizard
27 pages
Communication L2 Error Checking and Correction
No ratings yet
Communication L2 Error Checking and Correction
18 pages
Secondary Memory Overview
No ratings yet
Secondary Memory Overview
15 pages
SIP - VOIP - IMS Interview Questions PDF
100% (1)
SIP - VOIP - IMS Interview Questions PDF
5 pages
Universal Asynchronous Receiver/Transmitter (Uart) : EXAR Corporation, 48720 Kato Road, Fremont, CA 94538
No ratings yet
Universal Asynchronous Receiver/Transmitter (Uart) : EXAR Corporation, 48720 Kato Road, Fremont, CA 94538
3 pages
DMA Controller Code in VHDL
100% (2)
DMA Controller Code in VHDL
7 pages
Chapter 3
No ratings yet
Chapter 3
30 pages
Hard Disk Drive
No ratings yet
Hard Disk Drive
16 pages
Different Select Statements in SAP ABAP
100% (3)
Different Select Statements in SAP ABAP
4 pages
Error Control Coding Guide
0% (1)
Error Control Coding Guide
2 pages
Logical Database Design and The Relational Model
No ratings yet
Logical Database Design and The Relational Model
52 pages
Instrumentation and Control Systems Spectrum
No ratings yet
Instrumentation and Control Systems Spectrum
132 pages
Cloud Computing IIT Kanpur PDF
No ratings yet
Cloud Computing IIT Kanpur PDF
123 pages
Zend_Pdf Tutorial: Create & Manipulate PDFs
No ratings yet
Zend_Pdf Tutorial: Create & Manipulate PDFs
10 pages
Binary Numbers Lesson Plan
No ratings yet
Binary Numbers Lesson Plan
10 pages
Frequency Shift-Keying: Also Known As FSK
No ratings yet
Frequency Shift-Keying: Also Known As FSK
33 pages
Cybersecurity Forensics Quiz
No ratings yet
Cybersecurity Forensics Quiz
17 pages
Exploring CAT Grade 11 LU1,2,3
No ratings yet
Exploring CAT Grade 11 LU1,2,3
105 pages
Open Source Time Series Rasters
No ratings yet
Open Source Time Series Rasters
24 pages
Lab 01 AutomobileManagement Using LINQ and WinForms
No ratings yet
Lab 01 AutomobileManagement Using LINQ and WinForms
19 pages
Qno1: Draw Block Diagram of Computer. and Also Define Major Parts in Own Language
No ratings yet
Qno1: Draw Block Diagram of Computer. and Also Define Major Parts in Own Language
6 pages
Linux Directory Structure & rsync Guide
No ratings yet
Linux Directory Structure & rsync Guide
7 pages
CV-Noman Ali
No ratings yet
CV-Noman Ali
2 pages

DataWare Housing Concepts

Uploaded by

DataWare Housing Concepts

Uploaded by

OLAP

MOLAP, ROLAP, And HOLAP

Factless Fact Table

For example, think about a record of student attendance in classes. In this

 How many students attended a particular class on a particular day?

Junk dimension is the way to solve this problem. In a junk dimension, we

By using a junk dimension to replace the 3 indicator fields, we have

A conformed dimension is a dimension that has exactly the same meaning

Why is conformed dimension important? This goes back to the definition of

The time dimension is a common conformed dimension in an organization.

Building a conformed dimension also part of the process in master data

What are the Dimensions in Data warehouse?

Let’s walk through commonly used types of dimensions.

A Dimension that is used in multiple locations is called conformed dimensions. A conformed

I.e.: In Date dimensions, [FullDateAlternateKey] is associated with [Orderdate key], [Duedate

Slowly Changing Dimensions

You might also like