Dimensional Modeling
and Schema
Syllabus
Dimensional Modeling - Multi - Dimensional Data Modeling - Data Cube - Star Schema - Snowflake Schema - Star Vs Snowflake Schema - Fact Constellation Schema - Schema Definition - Process Architecture - Types of Data Base Parallelism - Datawarehouse Tools.
Contents
4.1 Dimensional Modeling
4.2 Multi-Dimensional Data Modeling
4.3 Data Cube
4.4 Star Schema
4.5 Snowflake Schema
4.6 Comparison of Star Schema and Snowflake Schema
4.7 Fact Constellation Schema
4.8 Schema Definition
4.9 Process Architecture
4.10 Types of Data Base Parallelism
4.11 Datawarehouse Tools
4.12 Two Marks Questions with Answers
4.1 Dimensional Modeling
• Dimensional modeling is a design technique used in data warehousing to organize and structure data for efficient analysis and reporting. It provides a way to represent the data in a dimensional format that is optimized for querying and aggregating information.
© In dimensional modeling, the transaction record is separated into "facts," which are
often numerical transaction data and "dimensions," which are the reference data that
places the facts in context. For instance, a sale transaction can be broken down into
specifics like the quantity of products ordered and the cost of those things, as well as
into dimensions like the date of the purchase, the user's name, the product number, the
ship - to and bill - to addresses and the salesperson who was in charge of receiving the
order.
• In dimensional modeling, data is organized into two main types of tables :
1. Fact tables and
2. Dimension tables.
e Fact tables : Fact tables contain the numeric and additive data that represents the
business transactions or events being analyzed. Each row in the fact table corresponds to
a specific event or transaction and contains the associated measures or metrics, such as
sales amount, quantity, or revenue. Fact tables also include foreign keys that link to the
dimension tables.
© Dimension tables : Dimension tables provide descriptive attributes or context for the
data in the fact tables. They contain the textual or qualitative data that provides
additional details about the business entities involved in the transactions. Examples of
dimension tables include customer, product, time, geography and other relevant
dimensions. Dimension tables typically have a primary key column that uniquely
identifies each row.
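• As a small illustration of the two table types, the sketch below creates one dimension table and one fact table using Python's built-in sqlite3 module. The table and column names (dim_product, fact_sales and so on) are assumptions made for this example, not a prescribed design.

```python
import sqlite3

# Minimal sketch : one dimension table and one fact table (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table : descriptive attributes, one row per product.
    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,   -- surrogate key
        product_name  TEXT,
        category      TEXT,
        unit_price    REAL
    );

    -- Fact table : additive measures plus foreign keys to the dimensions.
    CREATE TABLE fact_sales (
        date_key     INTEGER,
        product_key  INTEGER REFERENCES dim_product(product_key),
        customer_key INTEGER,
        quantity     INTEGER,   -- additive measure
        sales_amount REAL       -- additive measure
    );
""")
print("tables created:",
      [r[0] for r in conn.execute(
          "SELECT name FROM sqlite_master WHERE type='table'")])
```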
The Key Principles of Dimensional Modeling
The key principles of dimensional modeling include :
1. Star schema :
Dimensional models typically follow a star schema design, where the
fact table is at the center and dimension tables radiate from it like the points of a star.
This schema simplifies queries and enables fast aggregations.
2. Denormalization : Dimensional models denormalize data by combining multiple
tables into a single table. This improves query performance by reducing the number
of table joins required for analysis.
3. Hierarchies : Dimension tables often include hierarchies that represent different levels of granularity or aggregation. For example, a time dimension may include hierarchies like year, quarter, month and day.
4. Surrogate keys : Surrogate keys are artificially generated primary keys used in dimension tables. They provide a stable identifier for each dimension row, even if the source system's primary keys change (a minimal sketch of this idea follows the list).
5. Conformed dimensions : Conformed dimensions are dimensions that are used across multiple fact tables. They ensure consistency and enable data integration and comparison across different business processes.
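• A minimal sketch of the surrogate key idea from point 4 above : the warehouse assigns its own stable integer keys, so a dimension row keeps the same key even if the source system's natural key changes. The function and key names below are hypothetical.

```python
# Hypothetical sketch : assign warehouse-owned surrogate keys to dimension rows.
surrogate_by_natural_key = {}   # natural (source) key -> surrogate key
next_surrogate = 1

def surrogate_key_for(natural_key):
    """Return a stable surrogate key for a source-system natural key."""
    global next_surrogate
    if natural_key not in surrogate_by_natural_key:
        surrogate_by_natural_key[natural_key] = next_surrogate
        next_surrogate += 1
    return surrogate_by_natural_key[natural_key]

# Rows arriving from the source system keep the same surrogate key on reload.
print(surrogate_key_for("CUST-0017"))   # 1
print(surrogate_key_for("CUST-0042"))   # 2
print(surrogate_key_for("CUST-0017"))   # 1 again - a stable identifier
```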
• The benefits of dimensional modeling include improved query performance, simplified analysis and flexibility in data exploration and reporting. It provides a user - friendly structure for business users to navigate and understand the data within a data warehouse or data mart environment.
The Objectives of Dimensional Modeling
• The objectives of dimensional modeling are as follows :
1, Simplify data analysis : Dimensional modeling aims to simplify the process of data
analysis by providing a structure that is intuitive and easy to understand for business
users. The design focuses on organizing data into dimensions and facts, allowing
users to navigate and explore the data efficiently.
2. Optimize query performance : Dimensional models are designed to optimize query
performance by reducing the complexity of joins and aggregations. The star schema
design, denormalization and pre-aggregation techniques employed in dimensional
modeling enable faster query execution, making it easier to retrieve insights from
large volumes of data.
3. Support business reporting and analytics : Dimensional models are specifically
tailored to support business reporting and analytics requirements. By structuring data
into dimensions and facts, dimensional models facilitate the creation of meaningful
reports, dashboards and visualizations that provide actionable insights to business
users.
4. Enable ad hoc analysis : Dimensional models are flexible and enable ad hoc
analysis, allowing users to slice and dice data along various dimensions, drill down
into details and perform multidimensional analysis. The structure of dimensional
models facilitates self - service exploration and empowers users to answer their own
business questions.
5. Promote data consistency and integration : Dimensional modeling promotes data
consistency by utilizing conformed dimensions. Conformed dimensions ensure that
the same dimension attributes are used consistently across different facts, enabling
integration and comparison of data from multiple business processes.
6. Facilitate scalability and extensibility : Dimensional models are designed to
accommodate future growth and changes in business requirements. The modular
nature of dimensional models allows for easy addition of new dimensions or facts as
the data warehouse evolves, making it a scalable and extensible solution.
7. Enhance user adoption and user experience : Dimensional models are user .
centric and provide a user - friendly experience. By organizing data in a way that
aligns with the users' mental model of the business, dimensional models enhance user
adoption and acceptance of the data warehouse, fostering a data - driven culture
within the organization.
Overall, the primary objectives of dimensional modeling in a data warehouse are to
simplify data analysis, optimize query performance, support business reporting and
analytics, enable ad hoc analysis, promote data consistency and integration, facilitate
scalability and extensibility and enhance user adoption and user experience.
Elements of Dimensional Data Model
1. Fact
• Facts are the measurements or metrics from the business process. For example, in the sales business process, a measurement can be the quarterly sales number.
2. Dimension
• A dimension provides the context surrounding a business process event. In simple terms, dimensions give the who, what and where of a fact. In the sales business process, for the fact quarterly sales number, the dimensions would be :
© Who - Customer names
© Where - Location
© What - Product name
In other words, a dimension is a window to view information in the facts.
3. Attributes
The attributes are the various characteristics of the dimension in dimensional data
modeling.
In the location dimension, the attributes can be
© State
© Country
° Zipcode etc.
• Attributes are used to search, filter or classify facts.
• Dimension tables contain attributes.
4. Fact table
• A fact table is the primary table in dimensional modeling.
• A fact table contains :
1. Measurements / facts
2. Foreign key to dimension table
5. Dimension table
• A dimension table contains the dimensions of a fact.
• They are joined to the fact table via a foreign key.
• Dimension tables are de - normalized tables.
• The dimension attributes are the various columns in a dimension table.
• Dimensions offer descriptive characteristics of the facts with the help of their attributes.
• There is no set limit on the number of dimensions.
¢ The dimension can also contain one or more hierarchical relationships.
Steps of Dimensional Modeling
Fig. 4.1.1 shows steps involved in dimensional modeling.
(Figure : flow of steps - Select the business process (why) → Declare the grain (how much) → Identify the dimensions → Identify the facts → Build the schema.)
Fig. 4.1.1 Steps of dimensional modeling
Steps in dimensional modeling are :
1. Identify the business process
• Determining the precise business processes that a data warehouse should cover.
Depending on the organization's requirements for data analysis, this might be marketing,
sales, HR, etc. The quality of the data available for that process affects the choice of the
business process as well. This is the most crucial stage of the modeling process; failing
here would result in cascading and hard - to - repair design flaws. The business process can
be described using plain text, simple Business Process Modeling Notation (BPMN), or
Unified Modeling Language (UML).
2. Identify grain (Level of detail)
• The grain describes the level of detail for the business problem / solution. This step
entails finding the lowest level of information that will be stored for any table in the data
warehouse. A table has daily granularity if it contains daily sales data, and monthly
granularity if it contains total sales information for each month.
Example of grain :
The CEO at an MNC wants to find the sales for specific products in different locations on a
daily basis. So, the grain is "product sale information by location by the day."
3. Identify dimensions
e Dimensions are nouns like date, store, inventory, etc. These dimensions are where all
the data should be stored. For example, the date dimension may contain data like a year,
month and weekday.
4. Identify facts
© This step is associated with the business users of the system because this is where they
get access to data stored in the data warehouse. Most of the fact table rows are
numerical values like price or cost per unit, etc.
5. Build schema
• This step involves implementing the dimension model. A schema is the
database structure (arrangement of tables).
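• To make the steps concrete, the sketch below (Python, sqlite3) turns the grain from the earlier example - "product sale information by location by the day" - into a fact table whose composite primary key enforces exactly one row per day, product and location. All names and values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Step 5 (build schema) : the declared grain becomes the fact table's composite key,
# so the table can hold at most one row per day / product / location.
conn.execute("""
    CREATE TABLE fact_daily_sales (
        date_key     TEXT,      -- e.g. '2024-01-31'
        product_key  INTEGER,
        location_key INTEGER,
        quantity     INTEGER,
        sales_amount REAL,
        PRIMARY KEY (date_key, product_key, location_key)
    )
""")
conn.execute("INSERT INTO fact_daily_sales VALUES ('2024-01-31', 101, 7, 3, 74.85)")
print(conn.execute("SELECT * FROM fact_daily_sales").fetchall())
```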
Advantages of Dimensional Modeling
Dimensional modeling offers several advantages in the context of data warehousing and
analytics :
* Simplicity and ease of understanding : Dimensional models are designed to be intuitive
and easily comprehensible. The use of dimensions, facts and hierarchies simplifies the
structure, making it easier for business users to understand and navigate the data. This
promotes user adoption and facilitates self - service analytics.
• Query performance : Dimensional models are optimized for query performance. The
structure allows for efficient querying and aggregating of data, enabling faster response
times for analytical queries. The use of pre - aggregated measures and summarized data
in the fact tables further enhances query performance.
• Flexibility and scalability : Dimensional models provide flexibility in adding or
modifying dimensions and measures as business requirements evolve. They can
accommodate new attributes, hierarchies, or even entirely new dimensions without
disrupting the existing structure. This scalability makes it easier to adapt to changing
business needs.
+ Enhanced data analysis : Dimensional models facilitate easy data analysis and
exploration. The clear separation of dimensions and measures allows users to slice and
dice data along different dimensions, drill down into details, and perform
multidimensional analysis. This enables users to gain insights and make informed
decisions based on the data.
+ Business - focused design : Dimensional modeling aligns with business processes and
user requirements. It emphasizes the representation of business entities, attributes and
relationships, making it more business - centric. This ensures that the data warehouse
caters to the specific needs of the business, enabling better decision-making,
* Integration with reporting and visualization tools : Dimensional models are well - suited
for integration with reporting and data visualization tools. The structure aligns with the
way users typically analyze data, making it easier to create meaningful reports and
visualizations. This integration enhances the usability and effectiveness of these tools.
* Simplified ETL processes : Extract, Transform, Load (ETL) processes are typically
simplified in dimensional models. The structured nature of dimensions and facts makes
it easier to extract and transform data from source systems and load it into the data
Warehouse. This can lead to streamlined and efficient ETL pipelines.
• Overall, dimensional modeling provides a robust and user - friendly framework for organizing and analyzing data in a data warehouse. It enables efficient querying, flexibility and business - focused analysis, empowering organizations to derive valuable insights from their data.
4.2 Multi-Dimensional Data Modeling
© The way that data is seen in a multidimensional model is as a data - cube. Data may be
modeled and seen in various dimensions using a data cube. Facts and dimensions serve
to define it.
The views or things that an organization keeps data of are known as the dimensions. For
instance, a retailer may establish a sales data warehouse to document sales for the
dimensions of time, item and place. With the help of these parameters, the store may
keep track of things like monthly sales of particular commodities and the places where
those sales took place. A table called a dimensional table is associated with each
dimension and contains additional information about the dimension. For instance, an
item's dimensional table may include the attributes item_name, brand and type.
(Figure : a small sales table with time id, location id, product id and sales columns, shown first in tabular form and then pivoted into a multidimensional (cube) representation, with one slice per location id.)
Fig. 4.2.1 Example of multi - dimensional data modeling
e A multidimensional data model is structured around a central theme, for example, sales.
This theme is represented by a fact table. Facts are numerical measures. The fact table
contains the names of the facts or measures of the related dimensional tables.
• In the following table, the 3-D data are represented as a series of 2-D tables.
(Table : the 3-D sales data represented as a series of 2-D tables, one per location - Chennai, Kolkata, Mumbai and Delhi - each listing sales of Egg, Milk, Bread and Biscuit over time.)
(Figure : the same data assembled into a 3-D data cube with item type, time and location as the three dimensions.)
Fig. 4.2.2 Example of multi - dimensional data modeling
Key Components of Multi - Dimensional Data Modeling
© Multi - dimensional data modeling is a technique used in data warehousing to represent
and organize complex business data in a way that supports efficient querying and
analysis across multiple dimensions. It extends the concept of dimensional modeling by
incorporating additional dimensions, hierarchies and measures to capture the various
perspectives of the data. Here are the key components and considerations in multi-
dimensional data modelling :
1. Dimensions : Dimensions represent the descriptive attributes by which data is
analyzed. In multi - dimensional modeling, multiple dimensions are included to
capture different perspectives of the data. For example, in a sales data warehouse,
dimensions could include time, product, customer, geography and organization.
2. Hierarchies : Hierarchies define the relationships between the attributes within a
dimension. They represent the drill - down or roll - up paths to navigate through the
data at different levels of granularity. For instance, a time hierarchy could have levels
such as year, quarter, month and day, allowing users to analyze data at different time
granularities.
3. Measures : Measures represent the numeric facts or metrics that are analyzed in the
data warehouse. They provide quantitative data for analysis and reporting. Measures
TECHNICAL PUBLICATIONS® - an up-thrust for knowledgealt.
werehousing cot Dimensional Modeling and Schema
can be aggregated or calculated based on the dimensions. Examples of measures
include sales revenue, quantity sold, profit and average order value.
4. Cubes : In multi - dimensional data modeling, a cube is a fundamental structure that
represents the combination of dimensions, hierarchies and measures. A cube provides
a multi - dimensional view of the data, enabling users to analyze and aggregate data
along different dimensions. Each cell within the cube represents a specific
intersection of dimension values and contains the corresponding measure(s).
5. OLAP (Online Analytical Processing) : OLAP is a technology that supports
multi - dimensional data modeling and enables fast and interactive analysis of the
data warehouse. OLAP tools provide capabilities to slice, dice, drill down, roll up
and pivot the data along different dimensions. They allow users to perform complex
queries and calculations on the multi - dimensional data.
6. Aggregation : Aggregation is an important concept in multi - dimensional data
modeling. It involves pre - calculating and storing aggregated values for measures at
various levels of granularity in the cube. Aggregations speed up query performance
by avoiding the need to calculate measures from the detailed data during analysis.
Aggregations can be defined at different levels of the hierarchies within dimensions.
7, Data granularity : Data granularity refers to the level of detail at which data is
stored in the data warehouse. It is an important consideration in multi - dimensional
data modeling. Choosing the appropriate granularity ensures that the data warehouse
can support the desired level of analysis and reporting. It is important to strike a
balance between providing sufficient detail for analysis and maintaining acceptable
query performance.
8. Complex relationships : Multi - dimensional data modeling can handle complex
relationships between dimensions. For example, a many-to-many relationship
between dimensions can be represented using bridge tables or factless fact tables.
These relationships enable the analysis of data across multiple dimensions
simultaneously.
By employing multi - dimensional data modeling techniques, organizations can create a
Comprehensive and flexible representation of their business data in the data warehouse.
This enables in - depth analysis and reporting, supports complex queries and provides
Valuable insights to support decision - making process
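• The following sketch illustrates points 2 and 6 above (hierarchies and aggregation) by rolling daily measures up an assumed day → month → quarter time hierarchy, using only the Python standard library; the figures are invented for the example.

```python
from collections import defaultdict

# Hypothetical daily sales keyed by ISO date (lowest level of the time hierarchy).
daily_sales = {"2024-01-15": 120.0, "2024-01-20": 80.0,
               "2024-02-03": 200.0, "2024-04-10": 50.0}

def roll_up_to_month(daily):
    """Aggregate day-level measures to month level (YYYY-MM)."""
    monthly = defaultdict(float)
    for day, amount in daily.items():
        monthly[day[:7]] += amount
    return dict(monthly)

def roll_up_to_quarter(monthly):
    """Aggregate month-level measures to quarter level (YYYY-Qn)."""
    quarterly = defaultdict(float)
    for month, amount in monthly.items():
        year, mm = month.split("-")
        quarterly[f"{year}-Q{(int(mm) - 1) // 3 + 1}"] += amount
    return dict(quarterly)

monthly = roll_up_to_month(daily_sales)
print(monthly)                      # {'2024-01': 200.0, '2024-02': 200.0, '2024-04': 50.0}
print(roll_up_to_quarter(monthly))  # {'2024-Q1': 400.0, '2024-Q2': 50.0}
```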
4.3 Data Cube
© A data cube is a fundamental concept in data warehousing that represents a
multi - dimensional view of data. It provides a structured and efficient way to organize
and analyze data along multiple dimensions. A data cube combines dimensions,
hierarchies and measures to enable users to slice, dice, drill down and roll up data for
analysis and reporting purposes.
• Here are the key components and characteristics of a data cube :
1. Dimensions : Dimensions represent the descriptive attributes by which data js
analyzed. Examples of dimensions include time, product, customer, geography and
organization. Each dimension can have multiple levels or hierarchies representing
different levels of granularity.
2. Measures : Measures represent the numeric facts or metrics that are analyzed in the
data warehouse. They provide quantitative data for analysis and reporting. Examples
of measures include sales revenue, quantity sold, profit and average order value,
Measures can be aggregated or calculated based on the dimensions.
3. Hierarchies : Hierarchies define the relationships between the attributes within a
dimension. They represent the drill - down or roll - up paths to navigate through the
data at different levels of granularity. For example, a time hierarchy could have
levels such as year, quarter, month and day, allowing users to analyze data at
different time granularities.
4. Cells : Cells represent the intersections of dimension values within the data cube. —
Each cell contains the corresponding measure(s) for that specific combination of
dimension values. For example, a cell in a sales data cube may represent the revenue
generated by a specific product in a particular month.
5. Aggregation : Aggregation involves pre - calculating and storing aggregated values
for measures at various levels of granularity in the data cube. Aggregations speed up
query performance by avoiding the need to calculate measures from the detailed data
during analysis. Aggregations can be defined at different levels of the hierarchies
within dimensions.
6. OLAP Operations : Online Analytical Processing (OLAP) operations are performed
on the data cube to analyze and report on the data. OLAP operations include slicing
(selecting a specific subset of data), dicing (selecting a subcube based on specific
criteria), drilling down (viewing data at a more detailed level), rolling up (viewing
data at a higher level of aggregation) and pivoting (rotating the dimensions to
analyze data from different perspectives).
• Data cubes provide a multi - dimensional representation of data, enabling users to analyze and explore data from different angles. They facilitate efficient and interactive data analysis in data warehousing environments, supporting decision - making processes and providing insights into business performance. OLAP tools and technologies are often used to interact with data cubes and perform these multi - dimensional analysis operations.
Example : In the 2-D representation, All Electronics sales data for items sold per quarter in
the city of Vancouver is shown. The measure displayed is dollars sold (in thousands).
Location = “Vancouver”
Item (type)
Time (quarter)   Home entertainment   Computer   Phone   Security
Q1               605                  825        14      400
Q2               680                  952        31      512
Q3               812                  1023       30      301
Q4               927                  1038       38      380
Fig. 4.3.1 2-D view of sales data
(Table : similar 2-D views of the same sales data for the cities Chicago, New York and Toronto, shown side by side.)
Fig. 4.3.2 3-D view of sales data
• Conceptually, this data can be viewed in the form of a 3-D data cube, as shown in Fig. 4.3.3.
(Figure : a 3-D data cube with dimensions item type (home entertainment, computer, phone, security), time (quarters Q1 - Q4) and location (Chicago, New York, Toronto, Vancouver), each cell holding dollars sold.)
Fig. 4.3.3 3-D data cube
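• The OLAP operations described above can be sketched on a tiny in - memory cube whose cells are keyed by (time, item, location). The values below are illustrative only and do not reproduce the figures in Fig. 4.3.1 - 4.3.3.

```python
# Hypothetical cube : (quarter, item, city) -> dollars sold (illustrative values only).
cube = {
    ("Q1", "phone", "Vancouver"): 14, ("Q1", "computer", "Vancouver"): 825,
    ("Q2", "phone", "Vancouver"): 31, ("Q2", "computer", "Toronto"):   700,
    ("Q1", "phone", "Toronto"):   43, ("Q2", "phone", "Toronto"):      54,
}

# Slice : fix one dimension (location = 'Vancouver').
vancouver = {k: v for k, v in cube.items() if k[2] == "Vancouver"}

# Dice : keep a sub-cube (quarters Q1-Q2, item = 'phone').
phone_dice = {k: v for k, v in cube.items()
              if k[0] in ("Q1", "Q2") and k[1] == "phone"}

# Roll-up : aggregate away the location dimension.
by_quarter_item = {}
for (quarter, item, _city), value in cube.items():
    key = (quarter, item)
    by_quarter_item[key] = by_quarter_item.get(key, 0) + value

print(vancouver)
print(phone_dice)
print(by_quarter_item)
```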
4.4 Star Schema
Star schema is a widely used data modeling technique in the design of data warehouses.
It is named so because the diagrammatic representation of this schema resembles a star
shape, with a fact table at the center and multiple dimension tables radiating outwards.
• In the example shown in Fig. 4.4.1, SALES is a fact table having the attributes (Product ID, Order ID,
Customer ID, Employer ID, Total, Quantity, Discount) which reference the
dimension tables. The Employee dimension table contains the attributes : Employee ID,
Employee Name, Title, Department and Region. Product dimension table contains the
attributes : Product ID, Product Name, Product Category, Unit Price. Time dimension
table contains the attributes : Order ID, Order Date, Year, Quarter, Month. Customer
dimension table contains the attributes : Customer ID, Customer Name, Address, City.
Zip.
• In a star schema, the fact table represents the central focus of the schema and contains the measurements or metrics of the business process being analyzed. It typically consists of numerical values or facts that are additive or semi - additive, such as sales revenue, quantity sold, or units produced. The fact table is associated with one or more dimension tables through foreign key relationships.
(Figure : the Sales fact table (Product ID, Order ID, Customer ID, Employer ID, Total, Quantity, Discount) at the centre, linked to the Time dimension (Order ID, Order Date, Year, Quarter, Month), Product dimension (Product ID, Product Name, Product Category, Unit Price), Customer dimension (Customer ID, Customer Name, Address, City, Zip) and Employee dimension (Emp ID, Emp Name, Title, Department, Region).)
Fig. 4.4.1 Star schema
« Dimension tables provide descriptive attributes or context to the data in the fact table.
They contain textual or descriptive information about the dimensions or characteristics
that describe the business process being analyzed. For example, in a sales data
warehouse, dimension tables could include information about products, customers, time,
geography and so on. Each dimension table is linked to the fact table through a primary
key - foreign key relationship.
• The star schema is characterized by its simplicity and ease of use. It provides a
denormalized structure where each dimension is represented by a single table, making
queries and aggregations efficient and straightforward. The dimension tables are
typically small in size, as they contain the descriptive attributes, while the fact table may
be relatively larger, containing the numerical facts. This structure enables efficient
querying and reporting capabilities for analytical purposes.
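• The sketch below shows the kind of star - join query this structure is meant to serve : one join from the fact table to each required dimension, followed by a GROUP BY. Table names, columns and rows are assumed for illustration and kept tiny so the snippet (Python, sqlite3) runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales   (product_key INTEGER, customer_key INTEGER, total REAL);

    INSERT INTO dim_product  VALUES (1, 'Electronics'), (2, 'Grocery');
    INSERT INTO dim_customer VALUES (10, 'Pune'), (11, 'Mumbai');
    INSERT INTO fact_sales   VALUES (1, 10, 500.0), (1, 11, 300.0), (2, 10, 40.0);
""")

# Star join : the fact table joins directly to each dimension it needs.
query = """
    SELECT p.category, c.city, SUM(f.total) AS revenue
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY p.category, c.city
"""
for row in conn.execute(query):
    print(row)   # e.g. ('Electronics', 'Mumbai', 300.0)
```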
Features of Star Schema
• The star schema is a popular data modeling technique in data warehousing. It offers several features that make it suitable for analytical reporting and query performance. Here are some key features of the star schema :
1. Simplicity : The star schema has a simple and intuitive structure. It consists of a
central fact table surrounded by multiple dimension tables. This simplicity makes it
easy to understand, implement and query.
2. Denormalization : The star schema is denormalized, meaning that it reduces the
number of joins required to retrieve data. Dimension tables contain all the descriptive
attributes related to a dimension, eliminating the need for complex join operations
during querying.
3. Fact table : The star schema has a fact table at its core. This table holds the
measurements or metrics of the business process being analyzed, such as sales
revenue, quantity sold, or units produced. The fact table is typically large and
contains numeric facts that are additive or semi - additive.
4. Dimension tables : Dimension tables provide the descriptive context for the
measurements in the fact table. Each dimension table represents a specific dimension
or characteristic related to the business process, such as product, customer, time, or
geography. These tables contain attributes that describe the dimension, such as
product name, customer ID, or date.
5. One-to-Many relationships : The relationship between the fact table and dimension
tables in the star schema is typically one-to-many. Each row in the fact table is
associated with one or more rows in the dimension tables through foreign key
relationships. This structure simplifies data retrieval and enables efficient query
performance.
6. Query performance : The denormalized structure of the star schema facilitates fast
and efficient query performance. By reducing the number of joins and providing
pre-aggregated data in the fact table, analytical queries can be executed quickly and
easily.
7. Analytical reporting : The star schema is designed to support analytical reporting
and data analysis. It provides a structure that is optimized for querying and
aggregating data for reporting purposes. Users can easily slice and dice the data
along various dimensions to gain insights and make informed business decisions.
8. Scalability : The star schema is highly scalable and can handle large volumes of data
without sacrificing performance. It can accommodate the addition of new dimension
tables or attributes as the business requirements evolve.
9. Data quality : The star schema simplifies data validation and improves data quality.
Descriptive attributes are stored in dimension tables, allowing for better control and
consistency of attribute values.
• Overall, the star schema's simplicity, denormalized structure and efficient querying capabilities make it a widely adopted data modeling technique for data warehousing and analytical reporting. However, it is important to consider the specific requirements and trade - offs of the project when deciding whether to use a star schema.
Advantages of Star Schema
• The benefits of using a star schema in a data warehouse include :
1. Simplified querying : The star schema's simple structure allows for easy and
intuitive querying, as it eliminates the need for complex joins and enables faster
query performance.
2. Performance optimization : The denormalized structure of the star schema
enhances performance for aggregations and summaries, as it minimizes the number
of joins required to retrieve data.
3. Scalability : Star schemas are highly scalable and can accommodate large volumes
of data without sacrificing performance.
4, Flexibility : The star schema provides flexibility in adding or modifying dimensions
and measures as the business requirements evolve. New dimensions can be easily
incorporated without affecting the existing schema.
5. Improved data quality : The star schema simplifies data validation and improves
data quality by storing descriptive attributes in dimension tables separately.
* Despite its advantages, the star schema may not be suitable for all scenarios. It is best
suited for analytical reporting and data warehousing purposes, where read performance
and ease of use are important. In more transactional systems, where data integrity and
updates are critical, a more normalized schema might be preferred.
Disadvantages of Star Schema
* While star schemas offer numerous benefits, they also have some potential
disadvantages to consider. Here are a few drawbacks associated with star schemas :
1. Data redundancy : Star schemas denormalize data by duplicating dimension
attributes across multiple dimension tables. This redundancy can result in increased
storage requirements, especially when dealing with large amounts of data.
Redundancy also introduces the possibility of inconsistent data if updates or changes
are not properly managed across all instances of a dimension attribute.
2. Limited flexibility : While star schemas provide flexibility in adding or modifying
dimensions, they can be less flexible when it comes to adding new measures or
metrics. If new measures need to be added to the schema, it may require altering the
structure of the fact table, which can be complex and time - consuming. This lack of
flexibility can make star schemas less suitable for environments with rapidly
changing and diverse analytical requirements.
3. Performance challenges with complex queries : Although star schemas are
optimized for simple and straightforward queries, they may struggle to handle
complex queries involving multiple dimensions and extensive aggregations. In such
cases, the performance of star schemas can degrade, as they may require numerous
joins and calculations across large data sets.
4. Limited support for non - additive facts : Star schemas are designed primarily for
additive or semi - additive facts, which are values that can be summed up across
dimensions. However, they may not be well - suited for non - additive facts, such as
ratios or percentages, which require more sophisticated calculations and different
aggregation methods.
5. Data integrity concerns : As star schemas denormalize data, there is an increased
risk of data integrity issues if updates or changes are not properly managed.
Inconsistent or inaccurate data can occur if dimension attributes are not synchronized
or if changes are not propagated correctly across the schema.
6. Difficulty in modeling complex relationships : Star schemas work best for simple
one - to - many relationships between the fact table and dimension tables. However,
modeling more complex relationships, such as many - to - many or self - referencing
relationships, can be challenging. In such cases, additional techniques like bridge
tables or snowflake schemas may be required (a small bridge - table sketch follows this list), adding
complexity to the schema design.
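• Point 6 above mentions bridge tables; the sketch below shows the idea with assumed names : a many - to - many link (a joint bank account shared by two customers) is factored into a small bridge table instead of being forced into the star's one - to - many shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_account  (account_key  INTEGER PRIMARY KEY, account_no TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);

    -- Bridge table : resolves the many-to-many link (joint accounts).
    CREATE TABLE bridge_account_customer (
        account_key  INTEGER REFERENCES dim_account(account_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key)
    );
    CREATE TABLE fact_balance (account_key INTEGER, balance REAL);

    INSERT INTO dim_account  VALUES (1, 'ACC-001');
    INSERT INTO dim_customer VALUES (10, 'Asha'), (11, 'Ravi');
    INSERT INTO bridge_account_customer VALUES (1, 10), (1, 11);  -- joint account
    INSERT INTO fact_balance VALUES (1, 5000.0);
""")

# Balance seen per customer via the bridge.
print(conn.execute("""
    SELECT c.name, f.balance
    FROM fact_balance f
    JOIN bridge_account_customer b ON b.account_key = f.account_key
    JOIN dim_customer c            ON c.customer_key = b.customer_key
""").fetchall())
```

Note that when measures are summed across a bridge, double counting has to be handled deliberately, for example with weighting factors on the bridge rows.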
© It's important to note that while these disadvantages exist, they may not be significant
concerns for all use cases. The suitability of a star schema depends on the specific
requirements, data volume, query patterns and the trade - offs that need to be made in
the data warehousing environment.
4.5 Snowflake Schema
The snowflake schema is a data modeling technique used in data warehousing, which is
an extension of the star schema. It is designed to address some of the limitations of the
star schema, particularly related to data redundancy and the ability to handle complex
relationships.
• In a snowflake schema, dimension tables are normalized, meaning that they are divided into multiple smaller tables, each representing a different level of detail or hierarchy within the dimension. These smaller tables are connected through one - to - many relationships, forming a structure that resembles a snowflake, hence the name.
(Figure : the Sales fact table at the centre with normalized dimensions - the Product dimension linked to a ProductCategory table, the Customer dimension linked to a City table, and the Employee dimension linked to Department and Region tables.)
Fig. 4.5.1 Snowflake schema
• In Fig. 4.5.1 the Employee dimension table now contains the attributes : Employee ID, Employee Name, Department ID, Region and Territory. The Department ID attribute links the Employee table with the Department dimension table. The Department dimension is used to provide detail about each department, such as the Name and Location of the department. The Customer dimension table now contains the attributes : Customer ID, Customer Name, Address and City ID. The City ID attribute links the Customer dimension table with the City dimension table. The City dimension table has details about each city such as City name, Zipcode, State and Country.
• The snowflake schema differs from the star schema in that it allows for more granular levels of data representation within dimensions. By normalizing dimension tables, redundant attributes are eliminated, resulting in reduced storage requirements compared to the star schema. This normalization can also help improve data integrity and consistency.
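• A minimal sketch of this normalization with assumed names : the product dimension is split so that category attributes live in their own table, and a query now needs one extra join to reach them (Python, sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked product dimension : category attributes moved to their own table.
    CREATE TABLE dim_product_category (
        category_key  INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES dim_product_category(category_key)
    );
    CREATE TABLE fact_sales (product_key INTEGER, total REAL);

    INSERT INTO dim_product_category VALUES (1, 'Electronics');
    INSERT INTO dim_product          VALUES (101, 'Phone', 1);
    INSERT INTO fact_sales           VALUES (101, 250.0);
""")

# One extra join compared to the star schema : fact -> product -> category.
print(conn.execute("""
    SELECT pc.category_name, SUM(f.total)
    FROM fact_sales f
    JOIN dim_product p           ON p.product_key   = f.product_key
    JOIN dim_product_category pc ON pc.category_key = p.category_key
    GROUP BY pc.category_name
""").fetchall())
```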
Features of Snowflake Schema
Here are some key features of the snowflake schema :
1. Normalized dimension tables : In a snowflake schema, dimension tables are
normalized, meaning they are divided into multiple smaller tables. Each smaller table
represents a specific level of detail or hierarchy within the dimension. This
normalization reduces data redundancy by avoiding duplication of attributes.
2. Hierarchical structure : The snowflake schema allows for a hierarchical structure
within dimensions. The dimension tables are connected through one - to - many
relationships, forming a snowflake - like structure. This structure enables drill - down
and roll - up operations, allowing users to navigate through various levels of detail
within a dimension.
3. Complex relationship handling : The snowflake schema excels at handling
complex relationships, such as many - to - many or self - referencing relationships.
By breaking down dimension tables into smaller tables, it becomes easier to
represent and manage these relationships accurately.
4. Improved data integrity : With normalized dimension tables, updates or changes to
dimension attributes can be applied consistently across all levels of the hierarchy.
This helps maintain data integrity and ensures that changes are accurately reflected
throughout the schema.
5. Reduced data redundancy : Normalizing the dimension tables in a snowflake
schema eliminates redundant attributes. This reduction in redundancy leads to lower
storage requirements and makes data maintenance easier, especially for large data
warehouses.
6. Flexibility in querying : The snowflake schema provides flexibility in querying by
allowing drill - down and roll - up operations across different levels of detail within a
dimension. Users can navigate the hierarchy and retrieve data at various levels of
granularity as needed.
7. Scalability : The snowflake schema offers scalability as the data warehouse grows.
The normalized structure allows for efficient management of large volumes of data
without sacrificing performance or data integrity.
8. Enhanced data consistency : With the normalized structure of the snowflake
schema, data consistency can be maintained across different levels of the dimension
hierarchy. Changes made to a specific attribute in one level will propagate
consistently to all related levels, ensuring consistent and accurate data representation.
• It is important to note that while the snowflake schema provides benefits such as reduced redundancy, improved data integrity and complex relationship handling, it also introduces increased complexity to the schema design and querying process. The decision to use a snowflake schema should consider the specific requirements and trade - offs of the data warehousing project.
Advantages of Snowflake Schema
• The advantages of the snowflake schema are :
1. Reduced data redundancy : Normalizing the dimension tables reduces data redundancy by avoiding duplication of attributes. This can lead to lower storage requirements and easier data maintenance, especially for large data warehouses.
2. Improved data integrity : With normalized dimension tables, updates or changes to dimension attributes can be applied consistently across all levels of the hierarchy. This helps maintain data integrity and ensures that changes are accurately reflected throughout the schema.
3. Flexibility in querying : The snowflake schema provides more flexibility in querying by allowing drill - down and roll - up operations across various levels of detail within a dimension. Users can navigate the hierarchy and retrieve data at different levels of granularity as needed.
4. Handling complex relationships : The snowflake schema is particularly useful when dealing with complex relationships, such as many - to - many or self - referencing relationships. By breaking down dimensions into multiple tables, it becomes easier to manage and represent these relationships accurately.
Disadvantages of Snowflake Schema
• Despite its advantages, the snowflake schema also has some drawbacks to consider :
1. Increased complexity : The normalization of dimension tables introduces additional complexity to the schema design and query formulation. More joins are typically required to retrieve data from multiple levels of the hierarchy, which can impact query performance.
2. Performance implications : The increased number of joins in a snowflake schema can negatively impact query performance compared to the star schema. The additional join operations and more complex query paths can lead to slower query execution times.
3. Query complexity : Querying data from a snowflake schema can be more
challenging for users due to the increased complexity of the schema. Formulating
queries across multiple tables and understanding the relationships between them can
be more time - consuming and error - prone.
4.6 Comparison of Star Schema and Snowflake Schema
© The star schema and snowflake schema are two common data modeling techniques used
in data warehousing. While they share similarities, they also have distinct differences.
Here's a comparison between the star schema and snowflake schema :
1. Structure
Star schema : The star schema has a simple structure with a single fact table at the center and multiple dimension tables radiating outwards. Dimension tables are denormalized, containing all the attributes related to a dimension.
Snowflake schema : The snowflake schema extends the star schema by normalizing dimension tables. Dimension tables are divided into multiple smaller tables, creating a snowflake - like structure with additional levels of detail.

2. Data redundancy
Star schema : The star schema tends to have more data redundancy as dimension tables are denormalized, resulting in attribute duplication across multiple rows. This redundancy can lead to larger storage requirements.
Snowflake schema : The snowflake schema reduces data redundancy by normalizing dimension tables, resulting in smaller storage requirements.

3. Query performance
Star schema : The star schema generally offers better query performance compared to the snowflake schema. The denormalized structure requires fewer joins, leading to simpler and faster query execution.
Snowflake schema : The snowflake schema may have slightly slower query performance due to the additional joins required to navigate between the normalized dimension tables. However, with proper indexing and optimization, the performance difference can be minimized.

4. Complexity
Star schema : The star schema is simpler to understand, implement and query due to its denormalized structure. It is easier to maintain and modify as it requires fewer tables and relationships.
Snowflake schema : The snowflake schema introduces additional complexity due to the normalized structure. It involves more tables and relationships, requiring more complex joins and query formulation.

5. Relationship handling
Star schema : The star schema is suitable for simpler one - to - many relationships between the fact table and dimension tables. It may not handle complex relationships, such as many - to - many or self - referencing relationships, as effectively.
Snowflake schema : The snowflake schema excels at handling complex relationships. By normalizing dimension tables, it can handle many - to - many or self - referencing relationships more efficiently.

6. Data integrity
Star schema : Data integrity can be easier to maintain in the star schema since its denormalized structure reduces the chance of inconsistencies when updating data.
Snowflake schema : The snowflake schema requires more attention to maintain data integrity due to the presence of normalized tables. Changes or updates to attributes must be consistently propagated across multiple tables.

• The choice between the star schema and snowflake schema depends on the specific requirements, complexity of relationships, query performance needs and data integrity considerations of the data warehousing project. Both schemas have their strengths and weaknesses, and the decision should be based on careful analysis and understanding of the data and business requirements.
4.7 Fact Constellation Schema
• Fact constellation schema is a multidimensional data model used in data warehousing and business intelligence to organize and represent complex relationships between different facts and dimensions. It is an extension of the star schema and is designed to handle more complex and interrelated business scenarios.
• In a fact constellation schema, multiple fact tables are connected to a central collection of dimension tables. Each fact table represents a different aspect or measurement of the business, such as sales, inventory, or customer interactions. The dimension tables provide the context and descriptive information related to the facts, such as time, geography, product and customer.
• Fig. 4.7.1 shows the general structure of a fact constellation schema.
(Figure : several dimension tables surrounding two fact tables, each fact table linked to its own and to shared dimension tables.)
Fig. 4.7.1 General structure of fact constellation schema
Example :
(Figure : two fact tables, Placement and Workshop, sharing the dimension tables Student and TPO; Placement also links to a Company dimension and Workshop to a Training institute dimension.)
Fig. 4.7.2 Example of fact constellation schema
1. Placement is a fact table having attributes like
Stud_roll
| Company_id,
TPO_id
with the facts
Number of students eligible,
Number of students placed.
2. Workshop is a fact table having attributes like
Stud_roll
Institute_id
TPO_id
with the facts
‘Number of students selected
‘Number of students attended the workshop.
3. Company is a dimension table having attributes
Company_id
Name
Offer_package
4. Student is a dimension table having attributes
Stud_roll
Name
CGPA
5. TPO is a dimension table having attributes
TPO_id
Name
Age
6. Training institute is a dimension table having attributes
Institute_id
Name
Full_course_fee.
So, there are two fact tables namely, placement and workshop which are part of two
different star schemas having dimension tables - Company, Student and TPO in Star schema
with fact table Placement and dimension tables - Training Institute, Student and TPO in Star
schema with fact table Workshop. Both the star schemas have two dimension tables in common
and hence form a fact constellation or galaxy schema.
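• A minimal sketch of this galaxy structure with assumed column types : the Placement and Workshop fact tables both reference the shared Student and TPO dimension tables (Python, sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Shared dimension tables.
    CREATE TABLE student (stud_roll INTEGER PRIMARY KEY, name TEXT, cgpa REAL);
    CREATE TABLE tpo     (tpo_id    INTEGER PRIMARY KEY, name TEXT, age  INTEGER);

    -- Dimensions used by only one fact table each.
    CREATE TABLE company            (company_id   INTEGER PRIMARY KEY,
                                     name TEXT, offer_package REAL);
    CREATE TABLE training_institute (institute_id INTEGER PRIMARY KEY,
                                     name TEXT, full_course_fee REAL);

    -- Two fact tables sharing the Student and TPO dimensions (a fact constellation).
    CREATE TABLE placement (stud_roll  INTEGER REFERENCES student(stud_roll),
                            company_id INTEGER REFERENCES company(company_id),
                            tpo_id     INTEGER REFERENCES tpo(tpo_id),
                            no_of_students_eligible INTEGER,
                            no_of_students_placed   INTEGER);
    CREATE TABLE workshop  (stud_roll    INTEGER REFERENCES student(stud_roll),
                            institute_id INTEGER REFERENCES training_institute(institute_id),
                            tpo_id       INTEGER REFERENCES tpo(tpo_id),
                            no_of_students_selected INTEGER,
                            no_of_students_attended INTEGER);
""")
print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
```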
Features of Fact Constellation Schema
The key features of a fact constellation schema are :
1, Multiple fact tables : Unlike the star schema, which typically has one central fact
table, the fact constellation schema includes multiple fact tables. Each fact table
represents a different business process or measurement.
2. Shared dimension tables : The dimension tables are shared among the different fact
tables. This means that multiple fact tables can have relationships with the same
dimension tables, allowing for complex analysis and drill - down capabilities.
3. Complex relationships : The fact constellation schema allows for complex
relationships between the fact tables. For example, one fact table may have a direct
relationship with another fact table, enabling analysts to analyze the impact of one
business process on another.
4 Increased flexibility : The fact constellation schema provides greater flexibility in
analyzing data across multiple dimensions. Analysts can perform complex queries
and aggregations involving multiple fact tables and dimensions, enabling them to
gain deeper insights into business performance and relationships.
• Overall, the fact constellation schema is useful when dealing with complex and interrelated business processes, where multiple measurements need to be analyzed and understood together. It provides a flexible and powerful data model for decision support and business intelligence applications.
Advantages of Fact Constellation Schema
© The fact constellation schema offers several advantages over other data modeling
approaches in data warehousing and business intelligence. Here are some of the key
advantages =
Flexibility and scalability : The fact constellation schema provides a high degree of
flexibility and scalability. It can handle complex and evolving business requirements
by accommodating multiple fact tables and their relationships. As new business
processes or measurements emerge, additional fact tables can be added without
impacting existing dimensions or facts.
Enhanced analysis capabilities : The fact constellation schema enables advanced
analysis capabilities by allowing analysts to explore relationships between different
fact tables. Analysts can perform multidimensional analysis, drill across different
dimensions and investigate the impact of one business process on another. This
flexibility enhances the depth and breadth of data analysis.
Comprehensive data integration : With multiple fact tables and shared dimension
tables, the fact constellation schema promotes comprehensive data integration. It
allows for the integration of data from diverse sources and systems, enabling a
holistic view of the business. This integration supports cross - functional analysis and
helps identify correlations and patterns that might not be evident in isolated data sets.
Improved performance : By distributing the data across multiple fact tables, the
fact constellation schema can improve query performance. Queries involving specific
fact tables can be optimized, resulting in faster response times. Additionally, the
schema's structure facilitates indexing and partitioning strategies, further enhancing
performance.
Granular level of detail : The fact constellation schema allows for a granular level
of detail in data analysis. Different fact tables can capture specific measurements of
events, enabling analysts to analyze data at a highly detailed level. This granularity
supports accurate and precise reporting, forecasting and decision - making processes.
Separation of concerns : The fact constellation schema separates the concerns of
different business processes into separate fact tables while sharing common
dimensions. This separation helps maintain data integrity, as changes or updates to one fact table do not impact others. It also enables better data governance and simplifies the maintenance of the data model.
• Overall, the fact constellation schema empowers organizations to handle complex and interconnected business scenarios effectively. It offers flexibility, scalability and advanced analysis capabilities, enabling deeper insights and informed decision - making.
Disadvantages of Fact Constellation Schema
• While the fact constellation schema provides several advantages, it is important to consider its potential disadvantages as well. Here are some of the disadvantages associated with the fact constellation schema :
1. Increased complexity : The fact constellation schema introduces increased
complexity compared to simpler data models such as the star schema. Managing
multiple fact tables, their relationships and shared dimensions can be more
challenging and require a higher level of expertise. The complexity can make the
schema harder to understand, maintain and modify.
2. Higher storage requirements : Due to the presence of multiple fact tables and shared dimensions, the fact constellation schema may require more storage space compared to simpler schemas. Each additional fact table adds to the overall storage requirements and shared dimensions need to be duplicated across different fact tables. This can increase storage costs and impact system performance.
3. Data integration challenges : Integrating data from various sources and systems can be more complex in a fact constellation schema. As multiple fact tables are involved, each with its own set of dimensions, aligning data across different sources can be challenging. Ensuring consistent and accurate integration of data from diverse systems requires careful planning and robust ETL (Extract, Transform, Load) processes.
4. Query performance considerations : While the fact constellation schema can offer improved query performance, it also introduces potential performance challenges. Analyzing data across multiple fact tables and dimensions may involve complex join operations and aggregations, which can impact query response times. Careful indexing, partitioning and query optimization are required to mitigate performance issues.
5. Maintenance and upkeep : The fact constellation schema can be more demanding in terms of maintenance and upkeep compared to simpler schemas. As the schema evolves with the addition of new fact tables or dimensions, ongoing management is required to ensure data integrity and consistency. Changes to dimensions or relationships may necessitate updates to existing reports, queries and analytical tools, which can be time - consuming.
6. Learning curve for users : The complexity of the fact constellation schema may
increase the learning curve for end users. Understanding the relationships between
different fact tables and dimensions, as well as the appropriate usage of the schema
for analysis. requires specialized knowledge and training. This can potentially limit
the accessibility and usability of the data model for non - technical users.
© It is crucial to weigh these disadvantages against the specific requirements and
complexities of the business scenario when considering the adoption of a fact
constellation schema. Depending on the nature of the data and analytical needs, the fact
constellation schema may provide substantial benefits despite its drawbacks.
4.8 Schema Definition
e Ina data warehouse, schema definition refers to the process of designing the structure
and organization of the data within the warehouse. It involves determining how the data
will be organized, the relationships between different data elements and the overall
schema or structure that will be used for data storage and retrieval.
© There are two common types of schema designs used in data warehousing :
1. Star schema : The star schema is a widely adopted schema design in data
warehousing. It consists of a central fact table surrounded by dimension tables. The
fact table contains the numerical or measurable data (e.g., sales, revenue), while the
dimension tables provide descriptive attributes related to the facts (e.g., product,
customer, time). The fact table and dimension tables are connected through
primary - key and foreign - key relationships. The star schema simplifies queries and
allows for fast aggregation of data, making it suitable for reporting and analysis.
2. Snowflake schema : The snowflake schema is an extension of the star schema,
where dimension tables are further normalized into multiple levels of tables. The
normalization reduces data redundancy and can improve data integrity. In a
snowflake schema, the dimension tables are connected through primary - key and
foreign - key relationships, similar to the star schema, However, the snowflake
schema's additional normalization can lead to more complex queries and slightly
slower performance compared to the star schema. .
Schema definition involves the following steps :
1. Identify business requirements : Understand the business objectives and analytical goals of the data warehouse. Identify the key entities or concepts to be represented in the schema.
2. Define dimensions : Identify the dimensions (e.g., product, customer, time) that provide descriptive context to the data. Determine the attributes or characteristics that define each dimension. For example, the product dimension may include attributes like product name, category and price.
3. Define facts : Determine the numerical or measurable data that will be stored in the fact tables. These are typically the Key Performance Indicators (KPIs) or metrics that are of interest for analysis, such as sales, revenue, or inventory levels.
4. Establish relationships : Define the relationships between the fact table and dimension tables. This is done by identifying the primary keys in dimension tables and corresponding foreign keys in the fact table.
5. Determine granularity : Determine the level of detail or granularity at which the data will be stored in the fact table. This could be at a daily, weekly, monthly, or any other appropriate level, depending on the business requirements.
6. Validate and refine : Review the schema design and ensure it aligns with the business requirements. Make adjustments or refinements as necessary to optimize the schema for performance and data analysis. A minimal sketch of how the outcome of these steps might be recorded is shown below.
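The following sketch records the outcome of these steps as a small, plain-Python specification before any physical tables are built. The subject area, table names and the chosen grain are hypothetical examples used only to illustrate the steps above.

```python
# A hypothetical schema definition for a "sales" subject area,
# recording the outcome of the steps described above.
schema_definition = {
    "business_process": "retail sales",                    # step 1 : requirements
    "dimensions": {                                         # step 2 : dimensions
        "dim_product":  ["product_name", "category", "price"],
        "dim_customer": ["customer_name", "city"],
        "dim_date":     ["day", "month", "quarter", "year"],
    },
    "facts": {                                              # step 3 : facts / measures
        "fact_sales": ["quantity", "sales_amount"],
    },
    "relationships": {                                      # step 4 : fact -> dimensions
        "fact_sales": ["dim_product", "dim_customer", "dim_date"],
    },
    "grain": "one row per product per customer per day",   # step 5 : granularity
}

def validate(schema):                                       # step 6 : validate and refine
    """Basic sanity checks before the schema is refined further."""
    for fact, dims in schema["relationships"].items():
        assert fact in schema["facts"], f"unknown fact table {fact}"
        for dim in dims:
            assert dim in schema["dimensions"], f"unknown dimension {dim}"
    return True

print(validate(schema_definition))
```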
• Schema definition plays a crucial role in shaping the overall structure and usability of the data warehouse. It enables efficient data storage, retrieval, query and analysis, allowing users to derive insights and make informed decisions based on the data contained within the warehouse.
4.9 Process Architecture
• Data warehouse process architecture refers to the overall design and structure of the processes involved in building and maintaining a data warehouse. It encompasses various stages and components that work together to extract, transform, load and present data for analytical purposes.
• Here are the key components typically found in a data warehouse process architecture :
1. Data extraction : The first step is to extract data from multiple sources such as transactional databases, flat files, APIs, or other systems. This involves identifying the relevant data and pulling it into the data warehouse environment.
2. Data transformation : Once the data is extracted, it often needs to be transformed to ensure consistency and compatibility. Data transformation may involve cleaning, filtering, aggregating, integrating, or restructuring the data to meet the requirements of the data warehouse schema.
3. Data loading : After transformation, the data is loaded into the data warehouse. There are different loading strategies, including full loads (where all data is loaded) or incremental loads (where only the changes since the last load are loaded). Loading can be done through batch processing or real - time streaming, depending on the requirements. A minimal ETL sketch covering these first three stages is shown after this list.
4. Data storage : The loaded data is stored in the data warehouse in a structured manner to support efficient querying and analysis. Typically, a Relational Database Management System (RDBMS) is used for storage, such as Oracle, SQL Server, or PostgreSQL. Alternatively, some data warehouses employ NoSQL databases or distributed file systems for storage.
5, Data modeling : Data modeling involves designing the structure of the data
warehouse, including the definition of tables, relationships and hierarchies. The most
common modeling techniques are dimensional modeling (e.g., star schema,
snowflake schema) and entity - relationship modeling.
6. Metadata management : Metadata, which provides information about the data, is
crucial in a data warehouse. It includes data definitions, data lineage, business rules,
data quality metrics and other details. Effective metadata management ensures data
accuracy, improves data understanding and supports data governance.
7. Query and analysis : Once the data is loaded and organized, users can query and
analyze the data warehouse. Business Intelligence (BI) tools and analytics platforms
are commonly used to create reports, dashboards, visualizations and perform
advanced analytics on the data.
8. Data governance and security : Data governance ensures the quality, availability and integrity of the data in the warehouse. It involves establishing policies, standards and procedures for data management. Security measures are implemented to protect the data from unauthorized access, ensuring compliance with privacy regulations and maintaining data confidentiality.
9. Data refresh and maintenance : Data warehouses are not static; they require regular maintenance and periodic data refreshes. This involves updating the data warehouse with new or modified data, performing data quality checks and optimizing performance by indexing, partitioning, or aggregating data.
10. Monitoring and performance tuning : Continuous monitoring of the data
warehouse's performance is crucial to identify bottlenecks, optimize query execution
and ensure responsiveness. Performance tuning techniques, such as indexing,
caching, or parallel processing, may be applied to enhance the system's efficiency.
• The exact architecture of a data warehouse process may vary depending on the specific
requirements, technologies used and organizational preferences. However, these
components provide a general framework for understanding the key stages and activities
involved in building and maintaining a data warehouse.
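As referenced in the data loading step above, here is a minimal ETL sketch using only the Python standard library. The source file name (daily_sales.csv), the column names and the incremental-load cut-off date are assumptions made for illustration; a production pipeline would add error handling, logging and data quality checks.

```python
import csv
import sqlite3
from datetime import date

# --- Extraction : read raw records from a source (here a CSV flat file). ---
def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# --- Transformation : clean and filter the records to match the warehouse schema. ---
def transform(rows, since):
    cleaned = []
    for row in rows:
        if not row.get("sales_amount"):                  # cleaning : drop incomplete rows
            continue
        if date.fromisoformat(row["sale_date"]) < since:  # incremental load : keep new rows only
            continue
        cleaned.append((row["sale_date"], row["product_id"],
                        float(row["sales_amount"])))
    return cleaned

# --- Loading : write the transformed rows into the warehouse table. ---
def load(rows, conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS fact_sales
                    (sale_date TEXT, product_id TEXT, sales_amount REAL)""")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    warehouse = sqlite3.connect("warehouse.db")          # stand-in for the warehouse RDBMS
    raw = extract("daily_sales.csv")                     # hypothetical source file
    load(transform(raw, since=date(2024, 1, 1)), warehouse)
```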
Types of Process Architecture
Following are the two types of process architectures :
1, Centralized process architecture
2. Distributed process architecture
Fig. 4.9.1 Types of process architecture
Centralized Process Architecture
• Centralized process architecture refers to a design approach where all processes and operations are consolidated and controlled from a central location or system. In this architecture, a single central entity or system handles the core functionalities and operations, while other entities or components interact with it to perform their tasks.
Fig. 4.9.2 Centralized process architecture (clients connected to a central data warehouse)
Here are some key characteristics and benefits of a centralized process architecture :
• Central control : The central entity or system has control and oversight over all processes
and operations. It acts as a central hub for coordinating activities and managing resources.
• Standardization : A centralized architecture promotes standardization of processes, procedures and data formats. This ensures consistency and uniformity across the entire system, making it easier to manage and maintain.
• Data integration : Centralization facilitates data integration by consolidating data from
various sources into a single repository or system. This enables better data management,
analysis and reporting capabilities.
• Resource optimization : By consolidating processes and resources, a centralized
architecture allows for better resource allocation and optimization. It helps eliminate
duplication and redundancy, leading to improved efficiency and cost savings.
• Decision-making and governance : Centralization provides a centralized decision-
making structure, allowing for better control, governance and enforcement of policies
and guidelines across the system.
Benefits of Centralized Process Architecture
Following are the benefits of centralized Process Architecture :
1. Improved efficiency : Centralizing processes eliminates the need for multiple
systems and redundant operations. It streamlines workflows, reduces manual
intervention and enables more efficient resource utilization.
2. Enhanced data quality : Centralized architectures enable better data governance
and quality control measures. With standardized processes and data integration,
organizations can ensure data consistency, accuracy and integrity.
3. Increased collaboration : Centralized architectures foster collaboration and
information sharing across different departments or entities. It facilitates seamless
communication, data access and knowledge transfer.
4. Scalability and flexibility : Centralization simplifies scalability as resources can be
easily managed and allocated from a central point. It also offers flexibility in
adapting to changing business needs and accommodating growth.
5. Simplified maintenance : Having a centralized architecture makes maintenance and
updates more manageable. System upgrades, bug fixes and enhancements can be
applied centrally, reducing complexity and downtime.
6. Enhanced security and compliance : Centralized architectures provide better
control and security measures. Data access, user permissions, and security protocols
can be enforced centrally, reducing the risk of data breaches and ensuring
compliance with regulatory requirements.
• However, it's important to note that centralized architectures may have some limitations.
They can introduce single points of failure, and if the central entity or system
experiences issues, it may impact the entire system. Additionally, implementing and
managing a centralized architecture requires careful planning, coordination and
investment in infrastructure and resources.
Overall, a centralized process architecture offers several benefits in terms of efficiency,
data management, collaboration and scalability. It can be particularly advantageous for
organizations that require standardized processes, centralized control, and streamlined
operations.
Distributed Process Architecture
Distributed process architecture, also known as decentralized process architecture, is an
approach where processes and operations are spread across multiple entities or systems.
In this architecture, different components or entities perform specific tasks or functions
independently, yet coordinate and communicate with each other to achieve a common
goal.
Some common architectures of distributed processing are :
1. Client-server
In this architecture, the client handles all the information collection and presentation, while the server performs the processing and management of data.
2, Three-tier architecture
In this architecture, the client machines are connected to a server machine, thus mandating finite states and introducing latencies and overhead in terms of the records to be carried between clients and servers.
3. N-tier architecture
The n-tier or multi - tier architecture is where clients, middleware, applications and servers
are isolated into tiers.
4. Cluster architecture
In this architecture, machines are connected in a network architecture (software or hardware) to work together to process information or compute requirements in parallel. Each device in a cluster is associated with a function that is processed locally and the result sets are collected to a master server that returns them to the user.
5. Peer-to-Peer Architecture
This is a type of architecture where there are no dedicated servers and clients. Instead, all
the processing responsibilities are allocated among all machines, called peers. Each machine
can perform the function of a client or server or just process data.
Here are some key characteristics and benefits of a distributed process architecture :
1. Decentralized control : In a distributed architecture, control and decision - making
are distributed among multiple entities or systems. Each component has autonomy in
managing its processes and operations.
2. Task segmentation : Processes and tasks are segmented and allocated to different
entities based on their capabilities, resources, or expertise. Each entity focuses on its
assigned tasks, leading to specialization and efficiency.
3. Communication and coordination : Entities in a distributed architecture
communicate and coordinate with each other through standardized protocols,
interfaces, or messaging systems. This ensures synchronization and collaboration
between different components.
4. Fault tolerance and resilience : Distributed architectures are designed to be resilient
in the face of failures or disruptions. If one component fails, other components can
continue to operate independently, reducing the impact on the overall system.
5. Scalability : Distributed architectures offer scalability by allowing additional entities
or components to be added as the system grows. It enables the system to handle
increased workloads and accommodate expanding requirements.
Benefits of Distributed Process Architecture
1. Improved performance : By distributing tasks across multiple entities, a distributed
architecture can enhance performance and throughput. Each entity can focus on its
specific tasks, leading to parallel processing and reduced bottlenecks,
2. Increased flexibility : Distributed architectures offer flexibility in terms of adding or
removing components as needed. It allows for easy integration of new technologies,
systems, or services, enabling adaptability to changing business needs.
3. Enhanced fault tolerance : The decentralized nature of a distributed architecture
improves fault tolerance. If one component fails or experiences issues, other
components can continue functioning, minimizing the impact on the overall system.
4. Geographic distribution : Distributed architectures can be advantageous when
dealing with geographically dispersed entities or users. It enables efficient processing
and data access, even across different locations or regions.
5. Load balancing : Distributed architectures facilitate load balancing by distributing
workloads across multiple entities. This helps optimize resource utilization and
ensures smooth operation during peak demand periods.
6, Reduced dependencies : Components in a distributed architecture can operate
independently, reducing dependencies on a single system or entity. This enhances
resilience and reduces the risk of a single point of failure.
7. Data locality : In some distributed architectures, data can be distributed closer to the
entities or systems that need it, reducing data transfer latency and improving data
access performance.
Distributed process architectures offer several advantages in terms of performance,
flexibility, fault tolerance and scalability. However, they also introduce complexities in
terms of communication, coordination and management of distributed entities. Proper
design, synchronization mechanisms and robust communication protocols are crucial to
ensure the smooth functioning and effective collaboration of the distributed components.
4.10 Types of Data Base Parallelism
• Database parallelism refers to the ability to perform multiple tasks simultaneously in a
database system to improve performance and efficiency. There are different types of
parallelism techniques employed in database systems to achieve parallel processing.
Here are some commonly used types of database parallelism :
1. Task parallelism : Task parallelism involves dividing a database task or query into
multiple subtasks and executing them concurrently on different processing units.
Each subtask operates on a separate portion of the data, allowing for parallel
execution. This parallelism technique is commonly used in database systems to
improve query performance.
2. Data parallelism : Data parallelism involves splitting a large dataset into smaller
partitions and processing them concurrently across multiple processing units. Each
partition is processed independently, allowing for parallel execution. Data
parallelism is well-suited for operations that can be applied separately to different
portions of the data, such as filtering, sorting and aggregating.
3. Pipeline parallelism : Pipeline parallelism divides a database task into multiple
stages and each stage is executed concurrently. Each stage processes a subset of the
data and passes the intermediate results to the next stage. This parallelism technique
is useful for tasks that involve a sequence of operations, such as data transformation
or data integration.
4. Parallel query execution : Parallel query execution involves breaking down a
database query into smaller units and executing them simultaneously across multiple
processing units. The query optimizer determines the most efficient way to divide the
query into subtasks and distribute them for parallel processing. This technique is
commonly used in parallel database systems to improve query response time.
5. Parallel data loading : Parallel data loading is used when loading large amounts of
data into a database. It involves dividing the data into smaller chunks and loading
them simultaneously into different database partitions or tables. Parallel data loading
can significantly speed up the data loading process and reduce the overall load time.
6. Parallel indexing : Parallel indexing is a technique used to create or rebuild database
indexes in parallel. The index creation process is divided into multiple tasks and each
task is executed concurrently on different processing units. This approach accelerates
the index creation process and reduces the time required for indexing large volumes
of data.
7. Parallel backup and recovery : Parallel backup and recovery techniques involve
performing backup and recovery operations simultaneously across multiple database
components or partitions. This approach reduces the backup or recovery time,
especially for large databases, by distributing the workload across multiple resources.
• These are some of the commonly used types of database parallelism. The specific parallelism techniques employed depend on the database system, hardware infrastructure and the nature of the tasks or queries being executed. Effective utilization of parallelism can significantly improve the performance and scalability of database systems. A minimal sketch of data parallelism is shown below.
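As a simple illustration of data parallelism, the sketch below uses Python's multiprocessing module to split a dataset into partitions, aggregate each partition on a separate worker process and combine the partial results. The data, partition count and measure name are illustrative assumptions, not tied to any particular database system.

```python
from multiprocessing import Pool

def aggregate_partition(partition):
    """Process one partition independently (here : sum a single measure)."""
    return sum(row["sales_amount"] for row in partition)

def split(rows, n_partitions):
    """Split the dataset into roughly equal partitions."""
    return [rows[i::n_partitions] for i in range(n_partitions)]

if __name__ == "__main__":
    # Illustrative data : 10,000 fact rows with a single measure.
    data = [{"sales_amount": float(i % 100)} for i in range(10_000)]

    partitions = split(data, n_partitions=4)
    with Pool(processes=4) as pool:
        partial_sums = pool.map(aggregate_partition, partitions)  # parallel execution

    total = sum(partial_sums)   # combine the partial results
    print(total)
```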
4.11 Datawarehouse Tools
There are several data warehouse tools available in the market that facilitate the
development, management and analysis of data warehouses. These tools offer various
functionalities, such as data Extraction, Transformation, Loading (ETL), data modeling,
reporting and analytics. Here are some popular data warehouse tools :
1. Oracle Data Integrator (ODI) : ODI is an ETL tool provided by Oracle that enables
data extraction, transformation and loading into the data warehouse. It supports both
batch and real - time data integration and provides a graphical interface for designing
and managing data integration workflows.
2. Informatica PowerCenter : Informatica PowerCenter is a widely used ETL tool
that offers comprehensive data integration capabilities. It supports data extraction
TECHNICAL PUBLICATIONS® - an up-thrust for knowledgebh TECHNICAL PUBLICATIONS® -
1g and Schema
7 Dimensional Model
to data warehouse:
ty
from various sources, data transformation and data loading
powerCenter provides a visual development environment and extensive connec!
options:
3. Microsoft SQL Server Integration Services (SSIS) : SSIS is an ETL tool included
in Microsoft SQL Server. It allows users to create data integration workflows, extract
data from various sourees, transform data and load it into the SQL Server database or
data warehouse. SSIS provides a visual development environment within SQL Server
Management Studio.
4. IBM InfoSphere DataStage : InfoSphere DataStage is an ETL tool provided by
IBM that facilitates data integration and transformation. It offers a graphical interface
for designing and managing ETL processes. DataStage supports parallel processing
and provides a scalable platform for data integration in data warehouse
environments.
5. Talend Data Integration : Talend Data Integration is an open-source ETL tool that
enables data integration and transformation across various sources and targets,
including data warehouses. It offers a comprehensive set of data integration and
quality features and supports both batch and real - time data integration.
6. MicroStrategy : MicroStrategy is a business intelligence platform that provides
robust reporting and analytics capabilities for data warehouses. It offers a suite of
tools for data visualization, ad-hoc querying, dashboards and advanced analytics.
MicroStrategy supports integration with various data warehouse platforms.
7. Tableau : Tableau is a popular data visualization and analytics tool that connects to
data warehouses and allows users to create interactive visualizations and reports. It
supports data exploration, ad-hoc analysis and collaborative analytics. Tableau
integrates with multiple data warehouse systems and provides easy-to-use drag-and-
drop functionality.
8. QlikView / Qlik Sense : QlikView and Qlik Sense are data discovery and visualization tools that enable users to create interactive dashboards and reports from data warehouses. They offer associative data modeling, powerful visualizations and self-service analytics capabilities. Qlik tools support integration with various data warehouse platforms.
9. Amazon Redshift : Amazon Redshift is a fully managed data warehousing service provided by Amazon Web Services (AWS). It offers scalable and high - performance data warehousing capabilities in the cloud. Redshift provides data loading, querying and analytics functionalities, along with integration options with other AWS services.
10. Google BigQuery : Google BigQuery is a serverless data warehousing and
analytics platform offered by Google Cloud. It allows users to analyze large
datasets using SQL queries and supports fast and scalable data processing.
BigQuery integrates with various data sources and offers built-in machine learning
capabilities.
• These are just a few examples of data warehouse tools available in the market. The
choice of tool depends on factors such as specific requirements, budget, scalability
needs, integration capabilities and existing technology stack within an organization.
Review Questions
1. Explain the key principles of dimensional modeling. (Refer section 4.1.1)
2. Discuss the objectives of dimensional modeling. (Refer section 4.1.2)
3. Explain elements of dimensional data model. (Refer section 4.1.3)
4. Explain in detail steps of dimensional modeling. (Refer section 4.1.4)
5. Explain multi-dimensional data modeling. (Refer section 4.2)
6. Explain key components of multi - dimensional data modeling. (Refer section 4.2.1)
7. Explain data cube. (Refer section 4.3)
8. Explain star schema. (Refer section 4.4)
9. Explain snowflake schema. (Refer section 4.5)
4.12 Two Marks Questions with Answers
Q.1 What is dimensional modeling ?
Ans. : Dimensional modeling is a design technique used in data warehousing to organize and
structure data for efficient analysis and reporting. It provides a way to represent the data in a
dimensional format that is optimized for querying and aggregating information.
Q.2 What are the elements of dimensional data model ?
Ans. :
1. Fact
2. Dimension
3. Attributes
4. Fact table
5. Dimension table
Q.3 What is data cube ?
Ans. : A data cube is a fundamental concept in data warehousing that represents a multi-dimensional view of data. It provides a structured and efficient way to organize and analyze data across multiple dimensions. A data cube combines dimensions, hierarchies, and measures to enable users to slice, dice, drill down, and roll up data for analysis and reporting purposes.
4 What is star schema ?
ans. : Star schema is a widely used data modeling technique in the design of data warehouses.
It is named so because the diagrammatic representation of this schema resembles a star shape,
with a fact table at the center and multiple dimension tables radiating outwards.
5 Whatis snowflake schema ?
Ans. : The snowflake schema is a data modeling technique used in data warehousing, which is an extension of the star schema. It is designed to address some of the limitations of the star schema, particularly related to data redundancy and the ability to handle complex relationships.
Q6 What is fact constellation schema ?
Ans. : Fact constellation schema is a multidimensional data model used in data warehousing and
business intelligence to organize and represent complex relationships between different facts
and dimensions. It is an extension of the star schema and is designed to handle more complex
and interrelated business scenarios.