
Data Min

Uploaded by airteloffice219

A Data Mart is a subset of an organizational information store, generally oriented to a specific purpose or primary data subject, which may be distributed to support business needs. Data marts are analytical data stores designed to focus on particular business functions for a specific community within an organization. Data marts are derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology, the data warehouse is created from the union of organizational data marts. The fundamental use of a data mart is in Business Intelligence (BI) applications. BI is used to gather, store, access, and analyze data. Because a data mart is less expensive than implementing a full data warehouse, smaller businesses can use one to make use of the data they have accumulated.

Reasons for creating a data mart
o Provides a collective data set for a specific group of users
o Easy access to frequently needed data
o Ease of creation
o Improves end-user response time
o Lower cost than implementing a complete data warehouse
o Potential users are more clearly defined than in a comprehensive data warehouse
o It contains only essential business data and is less cluttered

Types of Data Marts
There are mainly two approaches to designing data marts:
o Dependent Data Marts
o Independent Data Marts

Dependent Data Marts
A dependent data mart is a logical or physical subset of a larger data warehouse. In this technique, the data marts are treated as subsets of a data warehouse: a data warehouse is created first, and the various data marts are then created from it. These data marts depend on the data warehouse and extract the essential records from it. Because the data warehouse feeds the data marts, there is no need for data mart integration. This is also known as the top-down approach.

Independent Data Marts
The second approach is independent data marts (IDM). Here, independent data marts are created first, and a data warehouse is then designed using these multiple independent data marts. Because all the data marts are designed independently, they must be integrated. This is also termed the bottom-up approach, as the data marts are integrated to develop a data warehouse.

Virtual Data Warehouses
A virtual data warehouse is created in the following stages:
1. Installing a set of data access, data dictionary, and process management facilities.
2. Training end users.
3. Monitoring how the DW facilities are used.
4. Based upon actual usage, creating a physical data warehouse to provide the high-frequency results.
In this strategy, end users are allowed to access operational databases directly, using whatever tools are enabled on the data access network. This method provides ultimate flexibility as well as the minimum amount of redundant data that must be loaded and maintained.
A data warehouse is a great idea, but it is difficult to build and requires investment. A cheaper and faster method eliminates the transformation phase and the separate repositories for metadata and warehouse data. This method is termed the 'virtual data warehouse.'
To accomplish this, four kinds of information need to be defined:
1. A data dictionary including the definitions of the various databases.
2. A description of the relationships between the data components.
3. A description of how users will interface with the system.
4. The algorithms and business rules that describe what to do and how to do it.

What is ETL?
The mechanism of extracting information from source systems and bringing it into the data warehouse is commonly called ETL, which stands for Extraction, Transformation and Loading.
The ETL process requires active input from various stakeholders, including developers, analysts, testers, and top executives, and is technically challenging.
To maintain its value as a tool for decision-makers, a data warehouse needs to change as the business changes. ETL is a recurring activity (daily, weekly, monthly) of a data warehouse system and needs to be agile, automated, and well documented.

How ETL Works
ETL consists of three separate phases:

Extraction
o Extraction is the operation of extracting information from a source system for further use in a data warehouse environment. This is the first stage of the ETL process.
o The extraction process is often one of the most time-consuming tasks in ETL.
o The source systems might be complicated and poorly documented, and thus determining which data needs to be extracted can be difficult.
o The data has to be extracted periodically, several times over, to supply all changed data to the warehouse and keep it up-to-date.

Transformation
Transformation is the core of the reconciliation phase. It converts records from their operational source format into a particular data warehouse format. If we implement a three-layer architecture, this phase outputs our reconciled data layer.
The following issues must be rectified in this phase:
o Loose text may hide valuable information. For example, the name "XYZ PVT Ltd" does not explicitly state that this is a private limited company.
o Different formats can be used for the same data. For example, a date can be saved as a string or as three integers.
Following are the main transformation processes aimed at populating the reconciled data layer:
o Conversion and normalization, which operate on both storage formats and units of measure to make data uniform.
o Matching, which associates equivalent fields in different sources.
o Selection, which reduces the number of source fields and records.
Cleansing and transformation processes are often closely linked in ETL tools.

Loading
The load is the process of writing the data into the target database. During the load step, it is necessary to ensure that the load is performed correctly and with as few resources as possible.
Loading can be carried out in two ways:
1. Refresh: Data warehouse data is completely rewritten. This means that the older data is replaced. Refresh is usually used in combination with static extraction to populate a data warehouse initially.
2. Update: Only the changes applied to source information are added to the data warehouse. An update is typically carried out without deleting or modifying preexisting data. This method is used in combination with incremental extraction to update data warehouses regularly.

What is Snowflake Schema?
A snowflake schema is a variant of the star schema. "A schema is known as a snowflake if one or more dimension tables do not connect directly to the fact table but must join through other dimension tables."
The snowflake schema is an expansion of the star schema where each point of the star explodes into more points. It is called a snowflake schema because its diagram resembles a snowflake. Snowflaking is a method of normalizing the dimension tables in a star schema. When all the dimension tables are fully normalized, the resultant structure resembles a snowflake with the fact table in the middle.
The snowflake schema consists of one fact table, which is linked to many dimension tables, which can in turn be linked to other dimension tables through a many-to-one relationship. Tables in a snowflake schema are generally normalized to the third normal form. Each dimension table represents exactly one level in a hierarchy.

Advantage of Snowflake Schema
1. The primary advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables.
2. It provides greater scalability in the interrelationship between dimension levels and components.
3. No redundancy, so it is easier to maintain.

Disadvantage of Snowflake Schema
1. The primary disadvantage of the snowflake schema is the additional maintenance effort required due to the increasing number of lookup tables. It is also known as a multi fact star schema.
2. Queries are more complex and hence difficult to understand.
3. More tables mean more joins, and so more query execution time.
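As a rough illustration of the snowflake schema described above, the sketch below builds a tiny snowflake schema in an in-memory SQLite database using Python's standard sqlite3 module. All table names, column names and sample rows (sales_fact, product_dim, category_dim) are hypothetical. The category level has been normalized out of the product dimension, so the fact table never references category_dim directly. A sales-by-category query therefore has to join through product_dim.

```python
import sqlite3


def build_snowflake(conn: sqlite3.Connection) -> None:
    """Create a tiny snowflake schema (hypothetical names): fact -> product -> category."""
    conn.executescript(
        """
        -- Outer dimension level: categories, normalized out of the product dimension.
        CREATE TABLE category_dim (
            category_id   INTEGER PRIMARY KEY,
            category_name TEXT NOT NULL
        );
        -- Inner dimension level: many products map to one category (many-to-one).
        CREATE TABLE product_dim (
            product_id   INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL,
            category_id  INTEGER NOT NULL REFERENCES category_dim(category_id)
        );
        -- Fact table: the measure plus a foreign key to the product dimension only.
        CREATE TABLE sales_fact (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES product_dim(product_id),
            amount     REAL NOT NULL
        );
        """
    )
    # Hypothetical sample rows.
    conn.executemany("INSERT INTO category_dim VALUES (?, ?)",
                     [(1, "Electronics"), (2, "Books")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(10, "Phone", 1), (11, "Laptop", 1), (20, "Novel", 2)])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                     [(100, 10, 300.0), (101, 11, 900.0),
                      (102, 20, 15.0), (103, 10, 250.0)])


def sales_by_category(conn: sqlite3.Connection) -> list:
    """Total sales per category, joining through product_dim to reach category_dim."""
    return conn.execute(
        """
        SELECT c.category_name, SUM(f.amount)
        FROM sales_fact AS f
        JOIN product_dim  AS p ON f.product_id = p.product_id
        JOIN category_dim AS c ON p.category_id = c.category_id
        GROUP BY c.category_name
        ORDER BY c.category_name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_snowflake(conn)
    print(sales_by_category(conn))
```

In the equivalent star schema, category_name would simply be a column of a denormalized product_dim, so the same report would need only one join. The extra join here is the cost behind the "more tables, more joins" disadvantage. The benefit is that each category name is stored only once.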
What is Star Schema?
A star schema is the elementary form of a dimensional model, in which data are organized into facts and dimensions. A fact is an event that is counted or measured, such as a sale or a login. A dimension includes reference data about the fact, such as date, item, or customer.
A star schema is a relational schema whose design represents a multidimensional data model. The star schema is the simplest data warehouse schema. It is known as a star schema because the entity-relationship diagram of this schema resembles a star, with points diverging from a central table. The center of the schema consists of a large fact table, and the points of the star are the dimension tables.

Characteristics of Star Schema
o It creates a denormalized database that can quickly provide query responses.
o It provides a flexible design that can be changed easily or added to throughout the development cycle, and as the database grows.
o It provides a parallel in design to how end-users typically think of and use the data.
o It reduces the complexity of metadata for both developers and end-users.

Advantages of Star Schema
1. Query performance
2. Load performance and administration
3. Built-in referential integrity

What is Data Cube?
When data is grouped or combined into multidimensional matrices, these are called data cubes. The data cube method has a few alternative names or variants, such as "multidimensional databases," "materialized views," and "OLAP (On-Line Analytical Processing)."

OLAP Operations in the Multidimensional Data Model
In the multidimensional model, records are organized into various dimensions, and each dimension includes multiple levels of abstraction described by concept hierarchies. This organization gives users the flexibility to view data from various perspectives. A number of OLAP data cube operations exist to produce these different views, allowing interactive querying and searching of the data at hand.

Roll-Up
The roll-up operation (also known as drill-up or aggregation operation) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. Roll-up is like zooming out on the data cube. The figure shows the result of a roll-up operation performed on the dimension location. The hierarchy for location is defined by the order street < city < province or state < country. The roll-up operation aggregates the data by ascending the location hierarchy from the level of the city to the level of the country.

Drill-Down
The drill-down operation (also called roll-down) is the reverse of roll-up. Drill-down is like zooming in on the data cube. It navigates from less detailed data to more detailed data. Drill-down can be performed either by stepping down a concept hierarchy for a dimension or by adding dimensions.
The figure shows a drill-down operation performed on the dimension time by stepping down a concept hierarchy defined as day < month < quarter < year. Drill-down occurs by descending the time hierarchy from the level of the quarter to the more detailed level of the month.

Slice
A slice is a subset of the cube corresponding to a single value for one or more members of a dimension. For example, a slice operation is executed when the user wants a selection on one dimension of a three-dimensional cube, resulting in a two-dimensional slice. So, the slice operation performs a selection on one dimension of the given cube, resulting in a subcube.
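The roll-up, drill-down and slice operations can be imitated on a toy cube in plain Python, which may help make the concept hierarchies concrete. This is only a sketch. The facts, the month < quarter and city < country mappings, and the function names (current_view, roll_up, drill_down, slice_cube) are all hypothetical. A real OLAP engine would precompute and index these aggregates rather than rescanning a list. The base cube has three dimensions (time, location, item) with sales as the measure. The starting view aggregates item away and shows sales by quarter and city.

```python
from collections import defaultdict

# Hypothetical base facts at the finest grain: (month, city, item, sales).
FACTS = [
    ("Jan", "Toronto", "Phone", 100),
    ("Feb", "Toronto", "Laptop", 200),
    ("Apr", "Vancouver", "Phone", 150),
    ("Jan", "Chicago", "Phone", 80),
    ("May", "Chicago", "Laptop", 120),
    ("Mar", "Vancouver", "Laptop", 50),
]

# Hypothetical concept hierarchies: month < quarter and city < country.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}
CITY_TO_COUNTRY = {"Toronto": "Canada", "Vancouver": "Canada",
                   "Chicago": "USA"}


def aggregate(facts, key):
    """Sum the sales measure into cells identified by key(fact)."""
    cells = defaultdict(int)
    for fact in facts:
        cells[key(fact)] += fact[3]
    return dict(cells)


def current_view(facts):
    """Starting view: sales by (quarter, city), with item aggregated away."""
    return aggregate(facts, lambda f: (MONTH_TO_QUARTER[f[0]], f[1]))


def roll_up(facts):
    """Roll-up on location: climb from city to country (coarser cells)."""
    return aggregate(facts,
                     lambda f: (MONTH_TO_QUARTER[f[0]], CITY_TO_COUNTRY[f[1]]))


def drill_down(facts):
    """Drill-down on time: step from quarter down to month (finer cells)."""
    return aggregate(facts, lambda f: (f[0], f[1]))


def slice_cube(facts, quarter):
    """Slice: fix time = quarter on the 3-D cube, leaving a 2-D (city, item) subcube."""
    selected = [f for f in facts if MONTH_TO_QUARTER[f[0]] == quarter]
    return aggregate(selected, lambda f: (f[1], f[2]))


if __name__ == "__main__":
    print("view:      ", current_view(FACTS))
    print("roll-up:   ", roll_up(FACTS))
    print("drill-down:", drill_down(FACTS))
    print("slice Q1:  ", slice_cube(FACTS, "Q1"))
```

With these sample facts, the roll-up shrinks the five (quarter, city) cells of the starting view to four (quarter, country) cells. The drill-down expands them to six (month, city) cells. Every view still sums to the same total of 700, because each operation only regroups the same base facts.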
