SEMINAR REPORT
ON
DATA WAREHOUSE
SAM HIGGINBOTTOM INSTITUTE OF AGRICULTURE,
TECHNOLOGY & SCIENCES (DEEMED TO BE UNIVERSITY)
Department of Computer Science and I.T
Submitted To:
Mudita Shrivastava
Submitted By:
Ayush Barnawal (13BTCSE007)
CONTENTS
HISTORY
OBJECTIVES
EVOLUTION IN ORGANIZATIONAL USE
ARCHITECTURE
OLTP
OLAP
BENEFITS OF DATA WAREHOUSE
STRATEGIC USE
LIMITATIONS
HISTORY
The concept of data warehousing dates back to the late
1980s when IBM researchers Barry Devlin and Paul
Murphy developed the "business data warehouse.
Bill Inmon, Father Of Data Warehouse.
According to Inmons Definition:
It is a collection of integrated, subject-oriented databases
designed to support the DSS function, where each unit of
data is non-volatile and relevant to some moment in time.
DATA WAREHOUSE
A single, complete and consistent store of
data obtained from a variety of different
sources made available to end users in a what
they can understand and use in a business
context.
A subject-oriented, integrated, time-variant,
non-updatable collection of data used in
support of management decision-making
processes
DATA WAREHOUSING IS
Relational or Multidimensional Database Management System
designed to support Management Decision Making.
Technique for Assembling and Managing Data from various sources
for the purpose of answering Business questions.
A subject-oriented, integrated, time-variant, non-updatable
collection of data used in support of management decision-making
processes.
Common Accessing systems include queries, analysis and reporting.
The final result, however, is homogeneous data, which can be more
easily manipulated.
SUBJECT ORIENTED
DATA WAREHOUSE
Organized around major subjects, such as customer,
product, sales.
Focusing on the modeling and analysis of data for decision
makers, not on daily operations or transaction processing.
Provide a simple and concise view around particular
subject issues by excluding data that are not useful in the
decision support process.
INTEGRATED DATA WAREHOUSE
Constructed by integrating multiple, heterogeneous
data sources.
Relational databases, flat files, on-line transaction records.
Data cleaning and Data Integration techniques are
applied.
Ensures Consistency in naming conventions, encoding
structures, attribute structures, etc. among different data sources.
E.g., Hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
TIME VARIANT DATA WAREHOUSE
The time horizon for the data warehouse is significantly
longer than that of operational systems.
Operational database: current value data.
Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
Every key structure in the Data Warehouse.
Contains an element of time, explicitly or implicitly.
But the key of operational data may or may not contain time
element.
NON VOLATILE
DATA WAREHOUSE
Physically separate store of Data transformed from the
Operational Environment.
Operational update of data does not occur in the data
warehouse environment.
Does not require transaction processing, recovery, and concurrency
control mechanisms.
Requires only two operations in Data Accessing.
Initial loading of Data.
Access of Data.
EVOLUTION
IN
ORGANIZATIONAL USE
Data, Data Everywhere yet,
I cant find the data I need
Data is Scattered over the Network
Many versions, Subtle differences
I cant understand the data I found
Available Data poorly documented
I cant use the data I found
Results are Unexpected
Data needs to be transformed from one form to Other
WHY DATA WAREHOUSING ?
Which
Whichare
areour
our
lowest/highest
lowest/highestmargin
margin
customers
customers??
Who
Whoare
aremy
mycustomers
customers
and
andwhat
whatproducts
products
are
arethey
theybuying?
buying?
What
Whatisisthe
themost
most
effective
effectivedistribution
distribution
channel?
channel?
What
Whatproduct
productpromprom-otions
-otionshave
havethe
thebiggest
biggest
impact
impacton
onrevenue?
revenue?
Which
Whichcustomers
customers
are
aremost
mostlikely
likelyto
togo
go
to
tothe
thecompetition
competition??
What
Whatimpact
impactwill
will
new
products/services
new products/services
have
haveon
onrevenue
revenue
and
andmargins?
margins?
ARCHITECTURE
Client
Client
Query & Analysis
Metadata
Warehouse
Integration
Source
Source
Source
OLTP
OLTP- ONLINE TRANSACTION PROCESSING
Special data organization, access methods and
implementation methods are needed to support data
warehouse queries (typically multidimensional queries).
OLTP systems are tuned for known transactions and
workloads while workload is not known a priori in a Data
Warehouse.
E.g., Average Amount spent on phone calls between 9AM-5PM in
Pune during the month of December
OLTP Vs DW
OLTP
DW
Application Oriented
Subject Oriented
Used to run business
Used to Analyze Business
Detailed Data
Summarized and Refined
Current Up-date Data
Snapshot Data
Isolated Data
Integrated Data
Few Records Accessed at a Time(tens)
Large Records Accessed at a time
(millions)
Read/Update Access
Mostly Read(Batch Update)
No Data Redundancy
Redundancy Present
Database Size (100MB -100 GB)
Database Size (100GB Few Terabytes)
Transaction throughput is performance
Metric.
Query throughput is the performance
Metric.
Thousands of users
Hundreds of Users
META DATA
Data about Data. Its Important for Designing, Constructing,
Retrieving, and controlling the Warehouse Data.
TECHNICAL META DATA
Include where the data come from, how the data were changed, how the data are
organized, how the data are stored, who owns the data, who is responsible for the
data and how to contact them, who can access the data , and the date of last update.
BUSINESS META DATA
Include what data are available, where the data are, what the data mean, how to
access the data, predefined reports and queries, and how current the data are.
OLAP
Online Analytical Processing - coined by EF Codd in 1994
paper contracted by Arbor Software.
Online analytical processing refers to such end user
activities as DSS modelling using spreadsheets and
graphics that are done online.
OLAP involves many different data items in complex
relationships.
Objective of OLAP is to analyze complex relationships
and look for patterns, trends and exceptions.
OLAP IS FASMI
FAST
ANALYTICAL
SHARED
MULTI-DIMENSIONAL
INFORMATION
BENEFITS OF DW
Provides business users with a customer-centric view of the
companys Heterogeneous data by helping to integrate data from
sales, service, manufacturing and distribution, and other customerrelated Business systems.
Provides added value to the companys customers by allowing them
to access better information when Data Warehousing is coupled with
Internet technology.
Consolidates data about individual customers and provides a
repository of all customer contacts for segmentation modeling,
customer retention planning, and cross sales analysis.
Reports on trends across multidivisional, multinational operating
units, including trends or relationships in areas such as
merchandising, production planning etc.
STRATEGIC USE
LIMITATIONS
Data warehouses are not the optimal environment for
unstructured data.
Over their life, data warehouses can have high Maintenance
costs.
Data warehouses can get outdated relatively quickly. There is a
cost of delivering suboptimal information to the organization.
There is often a fine line between data warehouses and
operational systems. Duplicate, expensive functionality may be
developed. Or, functionality may be developed in the data
warehouse that, in retrospect, should have been developed in
the operational systems and vice versa .
K
N
A
H
T
!
!
!
U
O
Y