Business Intelligence & Data Analysis

The document provides an introduction to business intelligence and data analysis, defining key concepts like business intelligence, data warehousing, data marts, and ETL. It discusses how data is extracted from various sources, transformed and loaded into a data warehouse using ETL tools to support analysis, reporting, and strategic decision making. The document also compares operational databases to data warehouses and data marts in terms of goals, structure, size, technologies used, and performance optimization.

Uploaded by

Amine SAIED

Introduction to business intelligence and data analysis

IT300

TBS 2020-2021

What is Business Intelligence?
▪ Zeng et al. (2006) define Business Intelligence (BI)
as the process of collecting, processing, and
disseminating information with the objective of
reducing uncertainty in all strategic decision
making.

▪ Stackowiak et al. (2007) define BI as the process of
taking large amounts of data, analyzing that data,
and presenting a high-level set of reports that
condense the essence of that data into the basis
of business actions, enabling management to
make fundamental daily business decisions.
What is Business Intelligence?
▪ To learn from the past and forecast the future,
many companies are adopting BI tools and
systems.

▪ In such a rough and competitive environment,
strategic decision making is extremely complex
and often requires the consideration of several
objectives while satisfying hard constraints.
As a result, the main concern of managers is to
deliver sophisticated solution strategies in an
attempt to reach the predefined objectives and
fulfill all system constraints.
Business intelligence

▪ Institutions and firms operate in an open
system (one that is impacted by external variables).
▪ There is a strong need for business analysis and
strategic decision making.
▪ The aim is to find a match between the opportunities in
the environment and the strengths and
weaknesses of the firm.
Decision-making support systems

▪ Decision-making support systems are
information systems (IS) designed to
interactively support all phases of an end
user's decision making process in
organizations.
▪ Many organizations are turning to decision
support systems to improve decision
making.
▪ They turn a challenge into a learning curve.
Decision Making Process
1. Define the problem: the manager must identify the
problem.
2. Establish decision criteria and goals: obtain the
necessary information and data.
3. Analyze data: formulate a model relating the goals to
the important variables (for example, a physical model
such as a scale model of a building).
The approaches to decision making include:
▪ the use of models,
▪ the use of quantitative methods,
▪ the analysis of trade-offs,
▪ establishing priorities,
▪ the systems approach.
Decision Making Process
Models are often a key tool used by all decision makers.
A model is an abstraction of reality, a simplified
representation of something. For example, a child's toy
car is a model of a real automobile.
4. Identify and evaluate the various alternatives or
solutions.
5. Select the best alternative (decision theory is an
analytical approach to selecting the best alternative;
it is closely related to the field of game theory).
6. Implement your decision.

Decision Making process
1. Define the problem
2. Obtain data
3. Analyze data
4. Create alternatives
5. Select the best alternative
6. Implement the decision
Data analysis
Information processing is the analysis of a large
quantity of data or other forms of information to
support decision making and to discover
knowledge in data.
This is indeed the biggest challenge posed by big
and often unstructured data: how to analyze it in a
useful way.
Objectives of data analysis:
• Increase the effectiveness of the manager's
decision making process,
• Support the manager in the decision making
process, not replace the manager,
• Improve the direction of decision making.
Evolution of Database Technology
1960s :
• Data collection, database creation, IMS and
network DBMS
1970s :
• Relational data model, relational DBMS
implementation
1980s:
• RDBMS, advanced data models and
application-oriented DBMS
1990s-2000s:
• Data mining, data warehousing (DW),
multimedia databases and web databases
Origins of Data Warehouses
▪ Database developers understood that their
software was required for both transactional and
analytical processing.
▪ However, operational and analytical data are
separate with different requirements and different
user communities.
▪ Once these differences were understood, new
databases were created specifically for analytical use.

Origins of Data Warehouses
▪ Operational processing (transactional processing)
captures, stores and manipulates data to support
daily operations.
▪ Information processing is the analysis of data or
other forms of information to support decision
making.
▪ DW can consolidate and integrate information
from many internal and external sources and
arrange it in a meaningful format for making
business decisions.

What is a Data Warehouse?
▪ According to Inmon (the father of data warehousing),
it is a collection of integrated, subject-oriented
databases designed to support the DSS function,
where each unit of data is non-volatile and relevant
to some moment in time.
▪ Alternatively, a DW is a subject-oriented, integrated, time-
variant, non-updatable collection of data used in
support of management decision-making processes:
▪ Subject-oriented: e.g., customers, patients, products
▪ Integrated: consistent naming conventions, formats, and
encoding structures; drawn from multiple data sources
▪ Time-variant: trends and changes can be studied
▪ Non-updatable: read-only, periodically refreshed
Need for Data Warehousing

▪ A DW allows its users to extract the data required for
business analysis and strategic decision making.
▪ DW is a process, not a product.
▪ DW is an architecture: a way of organizing data.

Database, Data Warehouse and Data Set
▪ DB: contains tables; rows refer to records and
columns to fields. Most DBs are relational DBs
(tables are related to one another to reduce redundancy
and improve DB performance via the normalization process).
▪ DW: a type of DB that has been denormalized
and archived.
▪ Denormalization is the process of combining
some tables into a single table. This may
introduce duplicate data, but will reduce the
number of joins a query has to process.
▪ Data set: a subset of a DW or a DB. It is usually
denormalized so that only one table is used.
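
A minimal sketch of the denormalization trade-off described above, using Python's built-in sqlite3 module. The table names (customer, orders, orders_denorm) and the sample rows are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design (hypothetical): customer details are stored once.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Alice', 'Tunis'), (2, 'Bob', 'Sfax');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 30.0), (12, 2, 20.0);

    -- Denormalized design: both tables combined into a single wide table.
    CREATE TABLE orders_denorm AS
        SELECT o.order_id, o.amount, c.name, c.city
        FROM orders o JOIN customer c ON c.customer_id = o.customer_id;
""")

# The normalized query needs a join to reach the city.
normalized = conn.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders o JOIN customer c ON c.customer_id = o.customer_id
    GROUP BY c.city ORDER BY c.city
""").fetchall()

# The denormalized query reads one table, with no join.
denormalized = conn.execute(
    "SELECT city, SUM(amount) FROM orders_denorm GROUP BY city ORDER BY city"
).fetchall()

# The price of skipping the join: Alice's name and city are stored twice.
alice_copies = conn.execute(
    "SELECT COUNT(*) FROM orders_denorm WHERE name = 'Alice'"
).fetchone()[0]
```

Both queries return the same totals per city; the denormalized table simply trades duplicated customer data for one fewer join.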
How Do Data Warehouses Differ
From Operational Systems?
▪ Goals
▪ Structure
▪ Size
▪ Performance optimization
▪ Technologies used

Need to separate operational and information systems
Three primary factors:
▪ A DW centralizes data that are scattered
throughout disparate operational systems and
makes them available for decision making (DM).
▪ A well-designed data warehouse adds value to
data by improving their quality and consistency.
▪ A separate DW eliminates much of the contention
for resources that results when information
applications are mixed with operational
processing.

Comparison of Database Types

Data warehouse                            | Operational system
Subject oriented                          | Transaction oriented
Large (hundreds of GB up to several TB)   | Small (MB up to several GB)
Historical data                           | Current data
Denormalized table structure              | Normalized table structure
(few tables, many columns per table)      | (many tables, few columns per table)
Batch updates                             | Continuous updates
Usually very complex queries              | Simple to complex queries

From the Data Warehouse to Data Marts
▪ A data mart contains only those data that are
specific to a particular group. For example, the
marketing data mart may contain only data
related to items, customers, and sales.
▪ Data marts are confined to subjects.
▪ Data marts are small in size.
▪ Data marts are customized by department.

How Data Warehousing works
Extraction, Transformation, Loading (ETL) tools

Sources --Extract--> DSA (Transform & Clean) --Load--> DW

DSA: A Data Staging Area is a temporary location where data from
source systems are copied.
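
The Extract, Transform & Clean, and Load steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source lists, the field names, the cleaning rules, and the sales table are all hypothetical, and a plain list stands in for the staging area.

```python
import sqlite3

# Hypothetical raw rows, as they might arrive from two operational systems.
SOURCE_A = [{"id": "1", "customer": " alice ", "amount": "50.0"}]
SOURCE_B = [
    {"id": "2", "customer": "BOB", "amount": "n/a"},
    {"id": "3", "customer": "Carol", "amount": "20"},
]

def extract(sources):
    """Copy raw rows from every source into the staging area (a list here)."""
    staging = []
    for rows in sources:
        staging.extend(dict(row) for row in rows)
    return staging

def transform(staging):
    """Clean staged rows: tidy names, cast types, reject unparseable amounts."""
    clean = []
    for row in staging:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        clean.append((int(row["id"]), row["customer"].strip().title(), amount))
    return clean

def load(conn, rows):
    """Write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

dw = sqlite3.connect(":memory:")  # in-memory stand-in for the DW
load(dw, transform(extract([SOURCE_A, SOURCE_B])))
loaded = dw.execute("SELECT id, customer, amount FROM sales ORDER BY id").fetchall()
```

Keeping the three steps as separate functions mirrors the diagram: the staging list can be inspected before anything reaches the warehouse.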
Example of analytical questions: Historical
analysis of the number of passengers

• How many passengers are frequent flyers?
• How much time do passengers spend on
average in the different zones?
• How much time do passengers spend on average
before entering different zones? How is the
time spent per passenger distributed in a
given zone?
• Which day of the week are the zones used the most?
• When is there a risk of bottlenecks in specific
zones?

ER Model vs. Multidimensional Model
▪ Why don't we use the entity-relationship (ER)
model in data warehousing?
▪ ER model: a data model for general purposes
– All types of data are equal, so it is difficult to identify the
data that is important for business analysis
• No difference between what is important and what
just describes the important
• Normalized databases (many details that can affect
privacy and security)
– Hard to get an overview of a large ER diagram (e.g., over 100
entities/relations for an enterprise)
ER Model vs. Multidimensional Model
▪ Traditional DBs generally deal with two-dimensional
data. However, querying is more efficient in a
multidimensional data storage model.
▪ More built-in "meaning"
– What is important
– What describes the important
– What we want to optimize
▪ Recognized by OLAP/BI tools: tools that offer powerful
query facilities based on multidimensional (MD) design

Multidimensional Model
▪ Data is divided into facts and dimensions
▪ A fact is the important entity, e.g., a sale
▪ Facts have measures that can be aggregated, e.g., sales
price
▪ Dimensions describe facts
▪ Facts "live" in an MD cube
▪ Goal for dimensional modeling:
– Surround facts with as much context (dimensions) as
possible
– Hint: redundancy may be OK (in well-chosen places)
– But you should not try to model all relationships in the
data (unlike E/R and OO modeling!)
Dimension
▪ Dimensions are the core of MD databases
▪ Dimensions are used for
▪ Selection of data
▪ Grouping of data at the right level of detail
▪ Dimensions consist of dimension values
▪ Product dimension has values "milk", "cream", …
▪ Time dimension has values "1/1/2001", "2/1/2001", …
▪ Dimension values may have an ordering
▪ Used for comparing cube data across values
▪ Especially used for Time dimension

Dimension
▪ Dimensions have hierarchies with levels
▪ Typically 3-5 levels (of detail)
▪ Dimension values are organized in a tree structure
▪ Product: Product->Type->Category
▪ Store: Store->Area->City->County
▪ Time: Day->Month->Quarter->Year
▪ Dimensions have a bottom level and a top level
▪ Levels may have attributes
▪ Simple, non-hierarchical information
▪ Day has Workday as attribute
▪ Dimensions should contain much information
▪ Time dimension may contain holiday, season, events,…
▪ Good dimensions have 50-100 or more attributes/levels
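
As a rough illustration of the Time hierarchy (Day->Month->Quarter->Year), the sketch below maps each bottom-level day to its higher levels and aggregates a value at a chosen level. The function names (time_levels, roll_up), the label formats, and the sample facts are hypothetical.

```python
from collections import defaultdict
from datetime import date

def time_levels(d):
    """Map a bottom-level Day value to its ancestors in Day->Month->Quarter->Year."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "day": d.isoformat(),
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{quarter}",
        "year": str(d.year),
    }

def roll_up(facts, level):
    """Sum the values of (day, value) facts grouped at the requested level."""
    totals = defaultdict(float)
    for day, value in facts:
        totals[time_levels(day)[level]] += value
    return dict(totals)

# Hypothetical facts recorded at the bottom level (Day).
facts = [
    (date(2001, 1, 1), 10.0),
    (date(2001, 1, 2), 5.0),
    (date(2001, 4, 15), 7.0),
]
by_month = roll_up(facts, "month")
by_quarter = roll_up(facts, "quarter")
by_year = roll_up(facts, "year")
```

Because every day belongs to exactly one month, quarter, and year, the same facts can be grouped at any level without being stored more than once.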
Facts
▪ Facts represent the subject of the desired analysis
• The important aspect of the business that should be
analyzed
▪ A fact is identified via its dimension values
• A fact is a non-empty cell
▪ Generally, a fact should:
• Be attached to exactly one dimension value in
each dimension
• Only be attached to dimension values in the
bottom levels

Measures
▪ Measures represent the fact property that the
users want to study and optimize
▪ Example: total sales price
▪ A measure has two components
▪ Numerical value (e.g., sales price)
▪ Aggregation formula (e.g., SUM): used for
aggregating/combining a number of measure values
into one
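
The two components of a measure can be made concrete with a small sketch: a named numerical value paired with an aggregation formula. The Measure class, the cube layout (a dictionary keyed by dimension values), and the sample prices are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Measure:
    """A measure: the name of a numerical value plus its aggregation formula."""
    name: str
    aggregate: Callable[[Iterable[float]], float]

    def combine(self, values):
        """Combine a number of measure values into one."""
        return self.aggregate(list(values))

# Same numerical value (sales price), two different aggregation formulas.
total_sales = Measure("sales_price", sum)
max_sale = Measure("sales_price", max)

# Hypothetical facts: each non-empty cell is identified by one value per
# dimension (product, store, day) and holds a sales price.
facts = {
    ("milk", "store_1", "2001-01-01"): 2.0,
    ("milk", "store_2", "2001-01-01"): 3.0,
    ("cream", "store_1", "2001-01-01"): 4.5,
}

milk_values = [price for (product, _, _), price in facts.items() if product == "milk"]
milk_total = total_sales.combine(milk_values)
milk_max = max_sale.combine(milk_values)
```

Separating the value from the formula makes it explicit that the same sales price can be summed, maximized, or averaged depending on the question asked.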

Multidimensional Model
Example: sales of supermarkets
• Facts and measures
– Each sales record is a fact, and its sales value is a
measure
• Dimensions
– Group correlated attributes into the same
dimension
– Each sales record is associated with its values of
Product, Store, and Time

Granularity: Dimensionality Hierarchy
▪ Granularity of facts is important
▪ Level of detail
▪ Given by combination of bottom levels
▪ A dimensional hierarchy defines mappings from a set of
lower-level concepts to higher level concepts.
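[Figure: two example dimension hierarchies for 2D data.
Location levels, from bottom to top: ZipCode, Area, City, Region, Country.
Time levels: Day at the bottom; Week, Month, Quarter, and Season in between;
Year at the top.]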
Schema Design
▪ A schema is a logical description of the entire
database.
▪ Much like a database, a data warehouse also
needs to maintain a schema.
▪ A database uses the relational model, while a data
warehouse uses a Star, Snowflake, or Fact
Constellation schema.

Star schema
▪ A star schema consists of two types of tables:
• fact table
• dimension tables
▪ Each dimension in a star schema is represented
by only one dimension table.
▪ This dimension table contains the set of
attributes.

Star schema: Components

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, state_or_province, country
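
A minimal sketch of a star schema in Python's sqlite3 module, reduced to two of the four dimensions. The table names time_dim, item_dim, and sales_fact and the sample rows are hypothetical; the column names follow the star schema components.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per dimension (only two dimensions kept for brevity).
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
                           month INTEGER, quarter INTEGER, year INTEGER);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT,
                           brand TEXT, type TEXT);

    -- The fact table holds foreign keys to the dimensions plus the measures.
    CREATE TABLE sales_fact (
        time_key INTEGER REFERENCES time_dim(time_key),
        item_key INTEGER REFERENCES item_dim(item_key),
        units_sold INTEGER,
        dollars_sold REAL
    );

    INSERT INTO time_dim VALUES (1, 1, 1, 1, 2001), (2, 1, 4, 2, 2001);
    INSERT INTO item_dim VALUES (1, 'milk', 'BrandA', 'dairy'),
                                (2, 'cream', 'BrandA', 'dairy');
    INSERT INTO sales_fact VALUES (1, 1, 3, 6.0), (1, 2, 1, 4.5), (2, 1, 2, 4.0);
""")

# A typical star query: join the fact table to each dimension it needs,
# then group by dimension attributes and aggregate the measures.
rows = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    JOIN item_dim i ON i.item_key = f.item_key
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter, i.item_name
""").fetchall()
```

Each dimension is exactly one join away from the fact table, which is what gives the schema its star shape.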
Snowflake schema
▪ A snowflake schema is an expanded version of a
star schema in which dimension tables are
normalized into several related tables.
▪ Advantages
• Small saving in storage space
• Normalized structures are easier to update and
maintain
▪ Disadvantages
• The schema is less intuitive
• Browsing through the content is more difficult
• Query performance is degraded because of the additional
joins
Snowflake schema: Example

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
  supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
  city: city_key, city, province_or_state, country
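
The cost of snowflaking can be seen in a short sqlite3 sketch in which the item dimension is normalized into item and supplier tables. The sample rows are hypothetical and only the columns needed for the query are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- supplier_type has been moved out of item into its own table.
    CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
    CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT,
                       supplier_key INTEGER REFERENCES supplier(supplier_key));
    CREATE TABLE sales_fact (item_key INTEGER REFERENCES item(item_key),
                             dollars_sold REAL);

    INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'local');
    INSERT INTO item VALUES (1, 'milk', 1), (2, 'cream', 1), (3, 'honey', 2);
    INSERT INTO sales_fact VALUES (1, 6.0), (2, 4.5), (3, 10.0), (1, 4.0);
""")

# Reaching supplier_type now takes two joins (fact -> item -> supplier),
# where a star schema with supplier_type inside item would need only one.
by_supplier_type = conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN item i ON i.item_key = f.item_key
    JOIN supplier s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
```

In exchange for the extra join, each supplier_type string is stored once per supplier rather than once per item.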
Fact Constellation Schema
▪ A fact constellation has multiple fact tables. It is
also known as galaxy schema.
▪ The following diagram shows two fact tables,
namely sales and shipping.

Fact Constellation Schema

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Shipping Fact Table
  Keys: time_key, item_key, shipper_key, from_location, to_location
  Measures: dollars_cost, units_shipped

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_state, country
  shipper: shipper_key, shipper_name, location_key, shipper_type
On-Line Analytical Processing (OLAP)
▪ Original definition: the dynamic synthesis,
analysis, and consolidation of large volumes of
multidimensional data [Codd, 1993].

▪ OLAP describes a technology that is designed to
optimize the storing and querying of large
volumes of multidimensional data that is
aggregated (summarized) to various levels of
detail to support the analysis of this data.
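
To make "aggregated to various levels of detail" concrete, the sketch below precomputes a SUM for every subset of three dimensions, from the grand total down to the full detail. It is an illustration only, not how any particular OLAP server stores data; the dimension names, the build_cube function, and the sample facts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "store", "month")

def build_cube(facts):
    """Precompute SUM(sales) for every subset of the dimensions."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            totals = defaultdict(float)
            for fact in facts:
                key = tuple(fact[d] for d in dims)
                totals[key] += fact["sales"]
            cube[dims] = dict(totals)
    return cube

# Hypothetical sales facts.
facts = [
    {"product": "milk", "store": "s1", "month": "2001-01", "sales": 2.0},
    {"product": "milk", "store": "s2", "month": "2001-01", "sales": 3.0},
    {"product": "cream", "store": "s1", "month": "2001-02", "sales": 4.5},
]

cube = build_cube(facts)
grand_total = cube[()][()]                 # no dimensions: one overall total
by_product = cube[("product",)]            # summarized by product only
by_store_month = cube[("store", "month")]  # summarized by store and month
```

With three dimensions there are eight such summaries; answering a query then becomes a dictionary lookup instead of a scan over the facts.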

The Complete Decision Support System

[Figure: three-tier architecture.
Information sources: semi-structured sources and operational DBs, which are
extracted, transformed, loaded, and refreshed into the warehouse.
Tier 1, Data Warehouse Server: the data warehouse and the data marts.
Tier 2, OLAP Servers: e.g., MOLAP and ROLAP, served by the warehouse.
Tier 3, Clients: OLAP, query/reporting, and data mining.]
