See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/340546587
COMPARATIVE STUDY ON DATA WAREHOUSE TABLES AND SCHEMA-AN
OVERVIEW
Article  in  Adalya · April 2020
DOI: 10.37896/aj9.4/016
CITATIONS                                                                                              READS
0                                                                                                      1,296
3 authors, including:
             K Rajasekaran                                                                                        Saravanan P.
             D B jain college                                                                                     Dhanraj baid jain college(Autonomous), Chennai
             7 PUBLICATIONS   9 CITATIONS                                                                         28 PUBLICATIONS   34 CITATIONS   
                SEE PROFILE                                                                                          SEE PROFILE
Some of the authors of this publication are also working on these related projects:
               PREDICTION OF PHARMA STOCKS USING DEEP LEARNING MODELS View project
               Analysis of Various Contrast Improvement Techniques for Dehazing an Image View project
 All content following this page was uploaded by Saravanan P. on 10 April 2020.
 The user has requested enhancement of the downloaded file.
ADALYA JOURNAL                                 https://doi.org/10.37896/aj9.4/016                           ISSN NO: 1301-2746
                     COMPARATIVE STUDY ON DATA
                    WAREHOUSE TABLES AND SCHEMA–AN
                              OVERVIEW
                                                     Dr.K.Rajasekaran
                        Associate Professor, P.G. and Research Dept of Computer Science,
                                        D.B.Jain College, Chennai, India
                                              krs.salem@gmail.com
                                                   P.Saravanan
                        Assistant Professor, P.G. and Research Dept of Computer Science,
                                         D.B.Jain College, Chennai, India
                                              sharandwh@gmail.com
                                                  P.Prabakaran
                        Assistant Professor, P.G. and Research Dept of Computer Science,
                                         D.B.Jain College, Chennai, India
                                            spprabakaran@yahoo.com
    Abstract- An Aim of focusing various types of tables and Schema in Data Warehouse. A Data Warehouse (DWH) is a
    subject-oriented, integrated, time-variant, non-volatile and large amount of data in support of management’s decision-
    making process. It is gaining importance day by day. The words On-Line Analytical Processing (OLAP) bring together a
    set of tools that use multidimensional modeling in the extraction of information from the Data Warehouse.
    Multidimensional data model is an connected part of On-Line Analytical Processing, or OLAP. It contains a
    multidimensional view of huge amounts of historical data from operational sources, thus supplying useful information for
    decision makers to improve their business intelligence which has become an integral part of decision making strategy.
    There are two forms for data management i.e. operational databases and data warehouse. The operational databases are
    where the put in data. Users of this type almost handle with one record at a time and they usually perform the same tasks.
    The data warehouse is where we find the data away. Users of this kind of type almost deal with set of row at a time and
    their questions require that thousands of rows be fetched into an answer set. The data stored in the warehouse is upload
    from the operational system such as marketing, sales, finance etc. There are two types of Tables first one is
    Dimension table and second one is Fact table.
    Keywords - Dimensional modeling, Dimension table, Slowly Changing, Rapidly Changing, Junk, Inferred, Conformed,
    Degenerate, Role Playing, Shrunken and Static Dimension table, Fact Table , Schema, Star, Snowflake and Fact
    constellation (or) Multi-Star.
   1. INTRODUCTION
            The term Data Warehouse (DWH) was first coined by Bill Inmon in 1990. According to Bill Inmon, a data
    warehouse is a subject-oriented, integrated, time-variant, non-volatile and large amount of data. This data (or)
    information helps analyst to take informed decision in an organization.
            An operational database undergoes frequent change on a daily basis on account of the communication that take
    place. Assume a business executive wants to analyze earlier comment on any data such as a supplier, a product, or any
    consumer data, then the executive will have no data available to study because the previous or old data has been updated
    due to transactions.
         A data warehouses provide us comprehensive and consolidate data in multidimensional view. Along with
    comprehensive and consolidate view of data, a data warehouses also provide us Online Analytical Process (OLAP) tools.
    These tools help us in interactive and efficient analysis of data in a multidimensional space. This analysis outcome in
Volume 9, Issue 4, April 2020                                 126                                        http://adalyajournal.com/
ADALYA JOURNAL                                  https://doi.org/10.37896/aj9.4/016                            ISSN NO: 1301-2746
    data generalization and data mining.
   1.1. Data Warehouse Features
         The advantages of a data warehouse are given below:
         • Subject Oriented - It provide data (or) information around a subject rather than the organization ongoing
    operations. These subjects can be customers, product, suppliers, revenue, sales, etc. A data warehouse doesn’t focus on
    the continuing operations; rather it focuses on model and study of data for decision making.
         • Integrated – A data warehouse is construct by integrate data from heterogeneous sources such as relational
    databases, flat files, etc. This integration enhances the effective analysis of data.
         • Time Variant - The data collected in a data warehouse is standard with a particular time era. The data in a data
    warehouse provide information from the historical data point of view.
         • Non-volatile - Non-volatile means the before data is not erased when new data is added to it. A data warehouse is
    kept split from the operational database and hence frequent change in operational database is not reflected in the data
    warehouse.
         Note: A data warehouse does not require transaction processing, concurrency controls, and recovery, because it is
    physically stored and separate from the operational database.
                                                       Figure 1.1. Data warehouse Design
   2. DATA WAREHOUSE DIMENSIONAL MODELING
        Dimensional modeling (DM) is the design concept used by many data warehouse designers to build their data
    warehouse. Dimensional model is the fundamental data model used by many of the commercial OLAP product available
    today in the market. Dimensional model names a set of techniques and concepts used in design of data warehouse. It is
    consider being different from entity-relationship modeling(ER). It is simpler, more communicative and easier to
    understand than ER modeling. It is a technique for conceptualize and visualize data models as a set of measures that are
    described by common aspects of the business. It is especially useful for summarizing and rearranging the data and
    presenting views of the data to support data analysis.
        Dimensional modeling focuses on numeric data such as values, counts, balances, weights, and occurrences. It does not
    necessarily involve a relational database. The same model approach, at logical level, can be used for any physical form,
    such as multidimensional database or even flat files. Dimensional model always uses the concept of facts (numeric
    values) and dimensions (text). Facts are typically numeric values that can be aggregated and dimensions are groups of
    hierarchies and descriptors that define the facts. For example, sales amount is a fact; register, timestamp, product, store,
    etc. are elements of dimensions. Dimensional models are built by business process area, for example store sales,
    inventory, claims, etc. Because the various business process areas share some but not all dimensions, efficiency in design,
    operation, and consistency, is achieved using conformed dimensions, i.e. using one copy of the shared dimension across
    subject areas. In this model, all data is contained in two types of tables called Dimension Table and Fact Table.
Volume 9, Issue 4, April 2020                                  127                                        http://adalyajournal.com/
ADALYA JOURNAL                                  https://doi.org/10.37896/aj9.4/016                           ISSN NO: 1301-2746
       3. TABLES
       A fact table consists of the measurements, metrics or facts of a business process. Tables mainly classified into two
    types. There are
               Dimensions Table
               Fact Table
       3.1. Dimensions Table
        A Dimension table contain primary key and textual message. It is a table in a star schema of a data warehouse. A
    dimension table stores attribute or dimension that describes the objects in a fact table. From the above example Item_no
    dimension, the attributes are Item_name, supplier, etc. Generally the Dimension Attributes are used in report labels, and
    query constraints such as where Supplier='P.Saravanan'.
                                               Item_no    Item_name Supplier
                                               210         LED          P.Saravanan
                                               572        Refrigerator Dr.K .Rajasekaran
                                               335        Air           Dr.S.Balaji
                                                          Conditioner
                                               348        Washing       Dr.B.Jagadhesan
                                                          Machine
                                                   Table 1.1. Dimension Table
            The dimension attributes contain one or more hierarchical relationships. Before design your data warehouse, you
    need to decide what this data warehouse contains. The measure is a numeric attribute in the fact table which illustrate the
    behavior of the business of the dimension. If you need to build a data into data warehouse contain monthly sales numbers
    across multiple store locations, across time and across products then your dimensions are:
                           1) Location
                           2) Time
                           3) Product
        Each dimension table contains data for one dimension. In above example get all your store location information and
    put that into one single table called Location. Your store location data may be spanned across multiple tables in your
    OLTP system (unlike OLAP), but you need to de-normalize all that data into one single table. A data warehouse require
    concise, subject oriented schema that make possible on-line analysis. The majority of data model for data warehouse is
    multidimensional model. There are nine types are Slowly Changing, Rapidly Changing, Junk, Inferred, Conformed,
    Degenerate, Role Playing, Shrunken and Static Dimension table.
        Slowly Changing Dimension- Attributes of a dimension that would undertake change over time. It depend on the
    business requirement whether particular attribute history of change should be sealed in the data warehouse. This is called
    a slowly changing attribute and a dimension contain such an attribute is called a slowly changing dimension.
        Rapidly Changing Dimension- A dimension attribute that changes regularly is a rapidly altering attribute. If you
    don’t need to track the change, the rapidly changing attribute is no problem, but if you do need to track the changes,
    using a standard slowly changing dimension can result in a enormous increase of the size of the dimension. One solution
    is to move the attribute to its own dimension, with a separate reference key in the fact table.
        Junk Dimension - A junk dimension is a single table with a combination of various and dissimilar attributes to avoid
    having a large number of reference keys in the fact table. Junk dimensions are often created to manage the reference keys
    created by rapidly changing dimension.
        Inferred Dimension- While loading fact(numeric) records, a dimension record may not yet be ready. One solution is
    to generate a surrogate key with empty for all the other attributes. This should technically be called an inferred member,
    but is often called an inferred dimension.
        Conformed Dimension is used in various locations is called a conformed dimension. A conformed dimension may be
    used with many fact tables in a single database, or across multiple data marts or data warehouses.
        Degenerate Dimension is when the dimension attribute is store as part of fact table, and not in a part dimension
    table. These are essentially dimension keys for which there are no other attributes. In a data warehouse, these are often
    used as the result of a drill through query to examine the source of an aggregate number in a report. It can use these
Volume 9, Issue 4, April 2020                                  128                                       http://adalyajournal.com/
ADALYA JOURNAL                                   https://doi.org/10.37896/aj9.4/016                            ISSN NO: 1301-2746
    values to outline back to communication in the OLTP system.
        Role Playing Dimension is one where the same dimension key along with its associated attributes can be joined to
    more than one reference key in the fact table. For example, a fact table may include reference keys for both ship date and
    delivery date. But the same date dimension attributes apply to each reference key, so you can join the same dimension
    table to both reference keys. Here the date dimension is taking many roles to map ship date as well as delivery date, and
    therefore the name of role in performance dimension.
        Shrunken Dimension is a subset of one more dimension. For example, the orders fact table may include a reference
    key for product, but the target fact table may include a reference key only for product category, which is in the product
    table, but much less granular. Creating a smaller dimension table, with product category as its primary key, is one way of
    deali with this situation of heterogeneous grain. If the product dimension is snowflaked, there is probably already a
    separate table for product category, which can serve as the shrunken dimension.
        Static Dimensions are not extract from the original data source, created within the context of the data warehouse. A
    static dimension can be loaded physically for example with status codes or it can be generate by a procedure, such as a
    date or time dimension.
   3.2. Fact Table
        Fact table contains reference key and numerical value of business processes. For example business process is Sales
    revenue, then a measurement of this business process such as "monthly sales number" is captured in the fact table. In
    addition to the measurements, the only other things a fact table contains are reference keys for the dimension tables. The
    fact is a set of related data, contains analytical context data and measures. It used to represent business items or business
    transactions. A fact table is the central table in a star schema or multi-star schema of a data warehouse. A fact table stores
    numerical information for analysis and is often denormalized. Suppose an electronic shop sells its product. Thus, every
    sale is a fact that happens and the fact table is used to record these facts. For example:
                                         Item_no.    Branch_code Location         Unit_Sold
                                         210         1238           Chennai       15
                                         572         4567           Salem         35
                                         335         4892           Trichy        77
                                         348         3296           Mumbai        48
                                                       Table 2.1. Fact Table
       There are three types of facts:
       Additive Facts- Additive facts can be used with any aggregation function like Sum( ), Avg( ) etc. Example is
    Quantity, sales amount etc.
       Semi Additive Facts- Semi-additive facts are those where only a few of aggregation function can be applied. For
    example, Consider bank account details. You cannot apply the Sum( ) on the bank balance that does not give useful
    results but min( ) and max( ) function may return useful information.
       Non-Additive Facts- You cannot use numeric aggregation functions such as Sum( ), Avg( ) etc on Non-additive facts.
    For example of non-additive fact is any kind of ratio or percentage. Non numerical values facts can also be a non-additive
    facts.
       Types of Fact-less Fact Tables
        Fact-less Fact Tables- A fact table that does not have any measure is a fact-less fact table. For example, a fact table
    which has only product_ID and date key is a fact-less fact table.
        Centipede Fact Table- Centipede fact table is a normalized fact table. Modeler may decide to normalize the fact
    instead of snow flaking dimensions tables.
        Conformed Fact Tables- It can be measures reused across multiple dimension model. For example, KPI such as
    profit, sales revenue etc
       Incident and Snapshot Facts- A fact table store some kind of measurement and captured against a specific time.
    Now it might so happen that the business might unable to capture all of its measure always for every point in time. Then
    those unavailable measurement can be kept empty (or) can be filled up with the last available measurements. The first
    one is the example of incident fact and the second one is the example of snapshot fact.
       Cumulative Fact- This type of fact table describes what has happened over a period of time. For example, this fact
    table may bring out the total sales by product by store by day.
Volume 9, Issue 4, April 2020                                    129                                        http://adalyajournal.com/
ADALYA JOURNAL                                   https://doi.org/10.37896/aj9.4/016                            ISSN NO: 1301-2746
   4. SCHEMA
            Schema is a logical description of the entire database and Collection of Dimension and Fact table. It contain the
   name and detail of records of all record types including all associated data-items and aggregate value. Similar to a database,
   a data warehouse also require to maintain a schema. A database uses relational model, while a data warehouse uses Star,
   Snowflake, and Fact Constellation schema or Multi-Star Schema.
   4.1. Star Schema
             Each dimension table directly connected to one central fact table is called. This dimension table contains the set of
   attributes. The following diagram show the sales fact table data of a company with respect to the four dimension tables are
   namely time, item, branch, and location. There is a fact table at the center. It contains the keys to each of four dimensions
   tables like namely time_key, item_key, branh_key and location_key. The fact table also contain the attributes are namely
   dollars_sold and units_sold.
                                                      Figure 4.1 Star Schema
   4.2. Snowflake Schema
            A Snowflake Schema is all the dimension table directly is not connected to fact table is called. A few dimension
   table in the Snowflake schema are normalized. The normalization split up the data into added tables. Dissimilar Star
   schema, the dimensions table in a snowflake schema are normalized. For example, the item dimension table in star schema
   is normalized and split into two dimension tables, namely item and supplier table.
                                                   Figure 4.2 Snowflake Schema
Volume 9, Issue 4, April 2020                                   130                                        http://adalyajournal.com/
ADALYA JOURNAL                                  https://doi.org/10.37896/aj9.4/016                            ISSN NO: 1301-2746
   4.3 Fact Constellation Schema (or) Multi-Star Schema
            Fact constellation has the entire dimension table directly connected to more than one central fact tables are called.
   The following diagram show two fact tables, namely shipping and sales. Now the item dimension table contain the
   attributes are item_key, item_name, brand, type, and supplier-key. The shipper_key is linked to the shipper dimension
   table. The shipper dimension table contain the attributes are shipper _key , shipper _name and shipper _type.
                                  Figure 4.3 Fact Constellation Schema (or) Multi-Star Schema
    5. CONCLUSION
             This paper discuss the data warehouse types of tables such as dimension tables are Slowly Changing, Rapidly
    Changing, Junk, Inferred, Conformed, Degenerate, Role Playing, Shrunken and Static Dimensions table and fact tables
    are Additive Fact , Semi- Additive Fact and Non-Additive Fact table ,Multidimensional schemas such as star schema,
    snowflake schema and fact constellation or multi-star schema. The advantage of using these table and schema are simpler
    and communicative, easy to read then E-R models. It is useful for terse and rearranging the data and presenting views of
    the data to support data analysis. Data warehouses (DWH) separate study workload from transaction workload and allow
    an organization to consolidate data from several sources. Dimensional model is a logical design technique for structure
    data so that it is instinctive to business users and delivers fast query presentation. Data presented to the business
    intelligence tools must be grounded in simplicity to stand any chance of success. Simplicity is a fundamental requirement
    because it ensures that users can easily understand databases, as well as allows software to efficiently navigate databases.
    References
    [1] Mandeep Kaur Sandhu, Amanjot Kaur, Ramandeep Kaur: Data Warehouse Schemas, International Journal of
        Innovative Research in Advanced Engineering (IJIRAE) ISSN: 2349-2163, Issue 4, Volume 2 (April 2015), pp-47-
        51.
    [2] Boris Vrdoljak, Marko Banek, and Stefano Rizzi: Designing Web Warehouses from XML Schemas Y. Kambayashi,
         M. Mohania, W. Wöß (Eds.): DaWaK 2003, LNCS 2737, pp. 89-98, 2003. SpringerVerlag Berlin Heidelberg 2003
    [3] Wolfgang Hummer, Andreas Bauer, Gunnar Harde: XCube – XML For Data Warehouses, DOLAP’03, November
         7, 003, USA.
    [4] M. Golfarelli, S. Rizzi, and B. Vrdoljak, Data warehouse design from XML sources., Proc. DOLAP’01, Atlanta, pp.
         40-47, 2001.
    [5] Jiawei Han and Micheline Kamber , “Data Mining Concepts and Technique” ,2nd Edition, , Morgan Kaufmann
Volume 9, Issue 4, April 2020                                   131                                        http://adalyajournal.com/
ADALYA JOURNAL                              https://doi.org/10.37896/aj9.4/016                      ISSN NO: 1301-2746
         Publisher.
     [6] Ralph Kimball and Margy Ross, “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modelling”
          2nd Edition, R, John Willy & Sons.
     [7] Soumya Sen, Ranak Ghosh, Debanjali Paul, Nabendu Chaki “Integrating related XML data into multiple data
          warehouse schemas” research paper 2012-2013.
     [8] R. Patel, MS.Alpa, “Data Modeling techniques for data warehouse” International Journal of Multidisciplinary
          Research, Vol.2 Issue 2, February 2012, ISSN 2231 5780.
     [9] Keshav Dev Gupta, Jyoti Gupta, Prakati Prasoon, “Novel Architecture with Dimensional Approach of Data
          Warehouse” International Journal of Advanced Research in Computer Science and Software Engineering, Volume
          3, Issue 3, March 2013.
     [10] Anirban Sarkar “Data Warehouse Requirements Analysis Framework: Business-Object Based Approach”
          International Journal of Advanced Computer Science and Applications, Vol. 3, No. 1, 2012.
     [11] https://www.folkstalk.com/2010/01/types-of-dimensions.html
     [12] http://dwgeek.com/types-of-fact-tables-data-warehouse.html/
Volume 9, Issue 4, April 2020                              132                                   http://adalyajournal.com/
   View publication stats