0% found this document useful (0 votes)
22 views12 pages

UNITyssu 1 LT

The document discusses topics related to databases, data warehousing, and their differences from online transaction processing systems. A data warehouse integrates data from multiple sources to provide a consistent view for analysis and decision making. It outlines the history, benefits, and challenges of data warehousing.

Uploaded by

Sahil Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
22 views12 pages

UNITyssu 1 LT

The document discusses topics related to databases, data warehousing, and their differences from online transaction processing systems. A data warehouse integrates data from multiple sources to provide a consistent view for analysis and decision making. It outlines the history, benefits, and challenges of data warehousing.

Uploaded by

Sahil Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

SELECTED TOPICS OF RECENT TRENDS IN

INFORMATION TECHNOLOGY
UNIT I

Database and Data Ware Housing

DWH Constitute Entire Information Base For All Time.

– Database Constitute Real Time Information.

– DWH Supports DM and Business Intelligence.

– Database Is Used To Running The Business

DWH Is How to Run the Business

What is a Data Warehouse?

A single, complete and consistent store of data obtained from a variety of different sources
made available to end users in a way they can understand and use in a business context.

A process of transforming data into information and making it available to users in a


timely enough manner to make a difference.

A process -

• It is a relational or multidimensional database management system designed to


support management decision making.

• A data warehousing is a copy of transaction data specifically structured for querying


and reporting.

• Technique for assembling and managing data from various sources for the purpose of
answering business questions. Thus making decisions that were not previous possible.

• Subject Oriented: Data that gives information about a particular subject instead of
about a company's ongoing operations.

• Integrated: Data that is gathered into the data warehouse from a variety of sources
and merged into a coherent whole.

• Time-variant: All data in the data warehouse is identified with a particular time
period.
• Non-volatile: Data is stable in a data warehouse. More data is added but data is never
removed. This enables management to gain a consistent picture of the business.

• Data warehousing is combining data from multiple and usually varied sources into
one comprehensive and easily manipulated database.

• Common accessing systems of data warehousing include queries, analysis and


reporting.

• Because data warehousing creates one database in the end, the number of sources can
be anything you want it to be, provided that the system can handle the volume, of
course.

• The final result, however, is homogeneous data, which can be more easily
manipulated.

History of data warehousing:

• The concept of data warehousing dates back to the late 1980s when IBM researchers
Barry Devlin and Paul Murphy developed the "business data warehouse".

• 1960s - General Mills and Dartmouth College, in a joint research project, develop the
terms dimensions and facts.

• 1970s - ACNielsen and IRI provide dimensional data marts for retail sales.

• 1983 – Tera data introduces a database management system specifically designed for
decision support.

• 1988 - Barry Devlin and Paul Murphy publish the article An architecture for a
business and information systems in IBM Systems Journal where they introduce the
term "business data warehouse".

A data warehouse is simply a single, complete, and consistent store of data obtained from a
variety of sources and made available to end users in a way they can understand and use it in
a business context.

Evolution in organizational use of data warehouses

Organizations generally start off with relatively simple use of data warehousing. Over time,
more sophisticated use of data warehousing evolves. The following general stages of use of
the data warehouse can be distinguished:

• Off line Operational Database

– Data warehouses in this initial stage are developed by simply copying the data
off an operational system to another server where the processing load of
reporting against the copied data does not impact the operational system's
performance.

• Off line Data Warehouse

– Data warehouses at this stage are updated from data in the operational systems
on a regular basis and the data warehouse data is stored in a data structure
designed to facilitate reporting.

• Real Time Data Warehouse

– Data warehouses at this stage are updated every time an operational system
performs a transaction (e.g. an order or a delivery or a booking.)

• Integrated Data Warehouse

– Data warehouses at this stage are updated every time an operational system
performs a transaction. The data warehouses then generate transactions that
are passed back into the operational systems.

Benefits of Data Warehousing

The successful implementation of a data warehouse can bring major, benefits to an


organization including:

• Potential high returns on investment

Implementation of data warehousing by an organization requires a huge investment typically


from Rs 10 lack to 50 lacks. However, a study by the International Data Corporation (IDC) in
1996 reported that average three-year returns on investment (RO I) in data warehousing
reached 401%.

• Competitive advantage

The huge returns on investment for those companies that have successfully implemented a
data warehouse is evidence of the enormous competitive advantage that accompanies this
technology. The competitive advantage is gained by allowing decision-makers access to data
that can reveal previously unavailable, unknown, and untapped information on, for example,
customers, trends, and demands.

• Increased productivity of corporate decision-makers

Data warehousing improves the productivity of corporate decision-makers by creating an


integrated database of consistent, subject-oriented, historical data. It integrates data from
multiple incompatible systems into a form that provides one consistent view of the
organization. By transforming data into meaningful information, a data warehouse allows
business managers to perform more substantive, accurate, and consistent analysis.
• More cost-effective decision-making

Data warehousing helps to reduce the overall cost of the· product· by reducing the number of
channels.

• Better enterprise intelligence.

It helps to provide better enterprise intelligence.

• Enhanced customer service.

• It is used to enhance customer" service.

OLTP- ONLINE TRANSACTION PROCESSING

• Special data organization, access methods and implementation methods are needed to
support data warehouse queries (typically multidimensional queries)

• OLTP systems are tuned for known transactions and workloads while workload is not
known a priori in a data warehouse

– e.g., average amount spent on phone calls between 9AM-5PM in Pune during
the month of December

OLTP vs Data Warehouse

OLTP Data Warehouse

• Application Oriented – Subject Oriented


• Used to run business – Used to analyze business
• Detailed data – Summarized and refined
• Current up to date – Snapshot data
• Isolated Data – Integrated Data
• Clerical User – Knowledge User (Manager)
• Few Records accessed at a time (tens) – Large volumes accessed at a time
• Read/Update Access (millions)
• No data redundancy – Mostly Read (Batch Update)
• Database Size 100MB -100 GB – Redundancy present
• Transaction throughput is the – Database Size 100 GB - few
performance metric terabytes
• Thousands of users – Query throughput is the
• Managed in entirety performance metric
– Hundreds of users
– Managed by subsets

Problems of Data Warehousing

The problems associated with developing and managing a data warehousing are as follows:
Underestimation of resources of data loading

Sometimes we underestimate the time required to extract, clean, and load the data into the
warehouse. It may take the significant proportion of the total development time.

Hidden problems with source systems

Sometimes hidden .problems associated with the source systems feeding the data warehouse
may be identified after years of being undetected. For example, when entering the details of a
new property, certain fields may allow nulls which may result in staff entering incomplete
property data, even when available and applicable.

Required data not captured

In some cases the required data is not captured by the source systems which may be very
important for the data warehouse purpose. For example the date of registration for the
property may be not used in source system but it may be very important analysis purpose.

Increased end-user demands

After satisfying some of end-users queries, requests for support from staff may increase
rather than decrease. This is caused by an increasing awareness of the users on the
capabilities and value of the data warehouse. Another reason for increasing demands is that
once a data warehouse is online, it is often the case that the number of users and queries
increase together with requests for answers to more and more complex queries.

Data homogenization

The concept of data warehouse deals with similarity of data formats between different data
sources. Thus, results in to lose of some important value of the data.

High demand for resources

The data warehouse requires large amounts of data.

Data ownership

Data warehousing may change the attitude of end-users to the ownership of data. Sensitive
data that owned by one department has to be loaded in data warehouse for decision making
purpose. But some time it results in to reluctance of that department because it may hesitate
to share it with others.

High maintenance

Data warehouses are high maintenance systems. Any reorganization· of the business
processes and the source systems may affect the data warehouse and it results high
maintenance cost.
Complexity of integration

The most important area for the management of a data warehouse is the integration
capabilities. An organization must spend a significant amount of time determining how well
the various different data warehousing tools can be integrated into the overall solution that is
needed. This can be a very difficult task, as there are a number of tools for every operation of
the data warehouse.

Operational Data and Data store -

An operational data store (ODS) is a type of database that's often used as an interim logical
area for a data warehouse.

While in the ODS, data can be scrubbed, resolved for redundancy and checked for
compliance with the corresponding business rules. An ODS can be used for integrating
disparate data from multiple sources so that business operations, analysis and reporting can
be carried out while business operations are occurring. This is the place where most of the
data used in current operation is housed before it's transferred to the data warehouse for
longer term storage or archiving.

An ODS is designed for relatively simple queries on small amounts of data (such as finding
the status of a customer order), rather than the complex queries on large amounts of data
typical of the data warehouse. An ODS is similar to your short term memory in that it stores
only very recent information; in comparison, the data warehouse is more like long term
memory in that it stores relatively permanent information.

Process managers are responsible for maintaining the flow of data both into and out of the
data warehouse. There are three different types of process managers:

 Load manager

 Warehouse manager

 Query manager

Data Warehouse Load Manager

Load manager performs the operations required to extract and load the data into the database.
The size and complexity of a load manager varies between specific solutions from one data
warehouse to another.

Load Manager Architecture

The load manager does performs the following functions:

 Extract data from the source system.


 Fast load the extracted data into temporary data store.

 Perform simple transformations into structure similar to the one in the data
warehouse.

Extract Data from Source

The data is extracted from the operational databases or the external information providers.
Gateways are the application programs that are used to extract data. It is supported by
underlying DBMS and allows the client program to generate SQL to be executed at a server.
Open Database Connection (ODBC) and Java Database Connection (JDBC) are examples of
gateway.

Fast Load

 In order to minimize the total load window, the data needs to be loaded into the
warehouse in the fastest possible time.

 Transformations affect the speed of data processing.

 It is more effective to load the data into a relational database prior to applying
transformations and checks.

 Gateway technology is not suitable, since they are inefficient when large data
volumes are involved.

Simple Transformations

While loading, it may be required to perform simple transformations. After completing


simple transformations, we can do complex checks. Suppose we are loading the EPOS sales
transaction, we need to perform the following checks:
 Strip out all the columns that are not required within the warehouse.

 Convert all the values to required data types.

Warehouse Manager

The warehouse manager is responsible for the warehouse management process. It consists of
a third-party system software, C programs, and shell scripts. The size and complexity of a
warehouse manager varies between specific solutions.

Warehouse Manager Architecture

A warehouse manager includes the following:

 The controlling process

 Stored procedures or C with SQL

 Backup/Recovery tool

 SQL scripts

Functions of Warehouse Manager

A warehouse manager performs the following functions:

 Analyzes the data to perform consistency and referential integrity checks.

 Creates indexes, business views, partition views against the base data.
 Generates new aggregations and updates the existing aggregations.

 Generates normalizations.

 Transforms and merges the source data of the temporary store into the published data
warehouse.

 Backs up the data in the data warehouse.

 Archives the data that has reached the end of its captured life.

Note: A warehouse Manager analyzes query profiles to determine whether the index and
aggregations are appropriate.

Query Manager

The query manager is responsible for directing the queries to suitable tables. By directing the
queries to appropriate tables, it speeds up the query request and response process. In addition,
the query manager is responsible for scheduling the execution of the queries posted by the
user.

Query Manager Architecture

A query manager includes the following components:

 Query redirection via C tool or RDBMS

 Stored procedures

 Query management tool

 Query scheduling via C tool or RDBMS

 Query scheduling via third-party software


Functions of Query Manager

 It presents the data to the user in a form they understand.

 It schedules the execution of the queries posted by the end-user.

 It stores query profiles to allow the warehouse manager to determine which indexes
and aggregations are appropriate.

Data Warehouse Architecture

• The data has been selected from various sources and then integrate and store the data
in a single and particular format.
• Data warehouses contain current detailed data, historical detailed data, lightly and
highly summarized data, and metadata.

• Current and historical data are voluminous because they are stored at the highest level
of detail.

• Lightly and highly summarized data are necessary to save processing time when
users request them and are readily accessible.

• Metadata are “data about data”. It is important for designing, constructing, retrieving,
and controlling the warehouse data.

Technical metadata include where the data come from, how the data were changed, how the
data are organized, how the data are stored, who owns the data, who is responsible for the
data and how to contact them, who can access the data , and the date of last update.

Business metadata include what data are available, where the data are, what the data mean,
how to access the data, predefined reports and queries, and how current the data are.

In Lightly Summarized Data the evaluational data is summarized by removing one, or a


few, data characteristic from the primary key of the data focus.

Highly Summarized Data are summarized data obtained by removing many data
characteristics from the primary key of the data focus. Dealing with highly summarized data
which refer to evaluational data, greatly falls under the domain of a data warehousing
implementation.

DW Three-tier architecture:

a. Data acquisition software (back-end).


b. The data warehouse that contains the data & software
c. Client (front-end) software that allows users to access and analyze data from the
warehouse

Two-tier architecture: First two tiers in three-tier architecture is combined into one,

The Complete Decision Support System


Data Warehouse vs. Data Marts

 Enterprise warehouse: collects all information about subjects


(customers,products,sales,assets, personnel) that span the entire organization

 Requires extensive business modeling (may take years to design and build)

 Data Marts: Departmental subsets that focus on selected subjects

 Marketing data mart: customer, product, sales

 Faster roll out, but complex integration in the long run

 Virtual warehouse: views over operational dbs

 Materialize sel. summary views for efficient query processing

 Easy to build but require excess capability on operat. db servers

You might also like