Unit 2
ARCHITECTURE
Types of OLAP Servers
OLAP Servers
Online Analytical Processing (OLAP) refers to a set of software tools used for data analysis
in order to make business decisions. OLAP provides a platform for gaining insights from
databases retrieved from multiple database systems at the same time. It is based on a
multidimensional data model, which enables users to extract and view data from various
perspectives. A multidimensional database is used to store OLAP data. Many Business
Intelligence (BI) applications rely on OLAP technology.
ROLAP
MOLAP
HOLAP
Relational On-Line Analytical Processing (ROLAP) is primarily used for data stored in a
relational database, where both the base data and dimension tables are stored as relational
tables. ROLAP servers are used to bridge the gap between the relational back-end server
and the client’s front-end tools. ROLAP servers store and manage warehouse data using an
RDBMS, while OLAP middleware supports the missing analytical functionality.
Benefits:
Limitations:
MOLAP stores data on disk in a specialized multidimensional array structure and relies on
the arrays’ random-access capability for OLAP. Dimension
instances determine array elements, and the data or measured value associated with each
cell is typically stored in the corresponding array element. The multidimensional array is
typically stored in MOLAP in a linear allocation based on nested traversal of the axes in
some predetermined order.
However, unlike ROLAP, which stores only records with non-zero facts, all array elements
are defined in MOLAP, and as a result, the arrays tend to be sparse, with empty elements
occupying a larger portion of them. Because both storage and retrieval costs matter for
online performance, MOLAP systems typically include provisions such as advanced indexing
and hashing to locate data in these sparse arrays while processing queries. MOLAP cubes
are ideal for slicing and dicing data and can perform complex
calculations. When the cube is created, all calculations are pre-generated.
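To make the linear-allocation idea concrete, here is a minimal Python sketch. It is an
illustration only: the dimension names, sizes, and the sales value are invented. It stores a
3-dimensional cube as one flat array in row-major (nested-traversal) order and computes the
offset of a cell from its dimension indexes.

```python
# Minimal sketch of MOLAP-style storage: a 3-D cube (product x region x month)
# kept as one flat array in row-major order, the "nested traversal" described above.
# Dimension sizes and the stored value are invented for illustration.
import array

dims = {"product": 4, "region": 3, "month": 12}            # dimension cardinalities
sizes = list(dims.values())

# Dense linear allocation: one slot per possible cell; empty cells hold 0.0 (sparsity).
cube = array.array("d", [0.0] * (sizes[0] * sizes[1] * sizes[2]))

def offset(p, r, m):
    """Map (product, region, month) indexes to a position in the flat array."""
    return (p * sizes[1] + r) * sizes[2] + m

# Store a measured value (a sales amount) in one cell.
cube[offset(2, 1, 6)] = 1250.0

# "Slice" the cube: total sales for region index 1 across all products and months.
region_total = sum(cube[offset(p, 1, m)]
                   for p in range(sizes[0])
                   for m in range(sizes[2]))
print(region_total)   # 1250.0 in this toy example
```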
Benefits:
Limitations:
Since all calculations are performed when the cube is built, a large amount of data cannot
be stored in the cube itself.
Hybrid On-Line Analytical Processing (HOLAP) combines the ROLAP and MOLAP approaches,
typically keeping detailed data in relational tables and aggregations in a multidimensional
store.
Benefits:
Limitations:
Because it supports both MOLAP and ROLAP servers, HOLAP architecture is extremely
complex.
The Three-Tier Data Warehouse Architecture is the most commonly used design for building a
Data Warehouse. It brings together the required Data Warehouse Schema Model, the required
OLAP server type, and the required front-end tools for reporting or analysis purposes. As
the name suggests, it contains three tiers, the Bottom Tier, the Middle Tier, and the Top
Tier, which are procedurally linked with one another from the Bottom Tier (data sources)
through the Middle Tier (OLAP servers) to the Top Tier (front-end tools).
Data Warehouse Architecture is the design based on which a Data Warehouse is built, to
accommodate the desired type of Data Warehouse Schema, user interface application and
database management system, for data organization and repository structure. The type of
Architecture is chosen based on the requirement provided by the project team. Three-tier
Data Warehouse Architecture is the commonly used choice, due to its detailing in the
structure. The three different tiers here are termed as:
Top-Tier
Middle-Tier
Bottom-Tier
Each tier can have different components based on the prerequisites presented by the
decision-makers of the project, but each component must suit the purpose of its respective
tier.
1. Bottom Tier
The Bottom Tier in the three-tier architecture of a data warehouse consists of the Data
Repository. Data Repository is the storage space for the data extracted from various data
sources, which undergoes a series of activities as a part of the ETL process. ETL stands for
Extract, Transform and Load. As a preliminary process, before the data is loaded into the
repository, all the relevant and required data is identified from several sources of the
system. This data is then cleaned to remove duplicate or junk records from its current
storage units. The next step is to transform all this data into a single storage format.
The final step of ETL is to load the data into the repository. A few commonly used ETL
tools are:
Informatica
Microsoft SSIS
Snaplogic
Confluent
Apache Kafka
Alooma
Ab Initio
IBM Infosphere
The storage type of the repository can be a relational database management system or a
multidimensional database management system. A relational database system can hold
simple relational data, whereas a multidimensional database system can hold data that has
more than one dimension. Whenever the Repository includes both relational and
multidimensional database management systems, there exists a metadata unit. As the name
suggests, the metadata unit consists of all the metadata fetched from both the relational
database and multidimensional database systems. This Metadata unit provides incoming
data to the next tier, that is, the middle tier. From the user’s standpoint, the data from the
bottom tier can be accessed only with the use of SQL queries. The complexity of the
queries depends on the type of database. Data from the relational database system can be
retrieved using simple queries, whereas the multidimensional database system demands
complex queries with multiple joins and conditional statements.
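As a small illustration of the querying described above, the sketch below uses Python's
built-in sqlite3 module purely as a stand-in for the relational bottom tier; the table and
column names are invented. A simple relational query needs no joins, while a dimensional
query joins a fact table to a dimension table before aggregating.

```python
# Illustrative only: sqlite3 stands in for the bottom-tier RDBMS; table and
# column names are invented for this sketch.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_fact  (product_id INTEGER, amount REAL);
INSERT INTO product_dim VALUES (1, 'Books'), (2, 'Games');
INSERT INTO sales_fact  VALUES (1, 10.0), (1, 15.0), (2, 40.0);
""")

# Simple relational query: no joins needed.
cur.execute("SELECT SUM(amount) FROM sales_fact")
print(cur.fetchone())                     # (65.0,)

# Dimensional query: join the fact table to a dimension table and aggregate.
cur.execute("""
    SELECT d.category, SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim d ON d.product_id = f.product_id
    GROUP BY d.category
""")
print(cur.fetchall())                     # [('Books', 25.0), ('Games', 40.0)]
conn.close()
```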
2. Middle Tier
The Middle tier here is the tier with the OLAP servers. The Data Warehouse can have more
than one OLAP server, and it can have more than one type of OLAP server model as well,
which depends on the volume of the data to be processed and the type of data held in the
bottom tier. There are three types of OLAP server models:
ROLAP
This is applied when the repository consists of only the relational database system.
MOLAP
This is applied when the repository consists of only the multidimensional database system.
HOLAP
When the repository contains both the relational database management system and the
multidimensional database management system, HOLAP is the best solution for a smooth
functional flow between the database systems. HOLAP allows storing data in both the
relational and the multidimensional formats.
The Middle Tier acts as an intermediary component between the top tier and the data
repository, that is, the top tier and the bottom tier respectively. From the user’s standpoint,
the middle tier gives an idea about the conceptual outlook of the database.
3. Top Tier
The Top Tier is a front-end layer, that is, the user interface that allows the user to connect
with the database systems. This user interface is usually a tool or an API call, which is used
to fetch the required data for Reporting, Analysis, and Data Mining purposes. The type of
tool depends purely on the form of outcome expected. It could be a Reporting tool, an
Analysis tool, a Query tool or a Data mining tool.
It is essential that the Top Tier be uncomplicated in terms of usability, because only
user-friendly tools give effective outcomes. Even when the bottom tier and middle tier are
designed with the utmost caution and clarity, a clumsy front-end tool in the Top Tier can
turn the whole Data Warehouse Architecture into a failure. This makes the selection of the
user interface or front-end tool, which serves as the face of the Data Warehouse system, a
very significant part of the Three-Tier Data Warehouse Architecture design process. A few
commonly used front-end tools are:
IBM Cognos
Microsoft BI Platform
Pentaho
Crystal Reports
SAP BW
Most organizations build and maintain a single centralized data warehouse environment.
This setup makes sense for many reasons:
The data in the warehouse is integrated across the corporation, and an integrated view is
used only at headquarters.
The volume of data in the data warehouse is such that a single centralized repository of
data makes sense.
Even if data could be integrated, if it were dispersed across multiple local sites, it would be
cumbersome to access.
In short, politics, economics, and technology greatly favor a single centralized data
warehouse. Still, in a few cases, a distributed data warehouse makes sense.
A virtual warehouse is another term for a data warehouse. A data warehouse is a computing
tool designed to simplify decision-making in business management. It collects and displays
business data relating to a specific moment in time, creating a snapshot of the condition of
the business at that moment. Virtual warehouses often collect data from a wide variety of
sources.
1. Requirements analysis and capacity planning: The first process in data warehousing
involves defining enterprise needs, defining architectures, carrying out capacity
planning, and selecting the hardware and software tools. This step will involve
consulting senior management as well as the different stakeholders.
2. Hardware integration: Once the hardware and software have been selected, they
need to be put together by integrating the servers, the storage methods, and the user
software tools.
3. Modeling: Modeling is a significant stage that involves designing the warehouse
schema and views. This may involve using a modeling tool if the data warehouse is
sophisticated.
4. Physical modeling: For the data warehouse to perform efficiently, physical modeling
is needed. This involves designing the physical data warehouse organization, data
placement, data partitioning, deciding on access techniques, and indexing.
5. Sources: The information for the data warehouse is likely to come from several
data sources. This step involves identifying and connecting the sources using a
gateway, ODBC drivers, or another wrapper.
6. ETL: The data from the source systems will need to go through an ETL phase.
The process of designing and implementing the ETL phase may involve selecting a
suitable ETL tool vendor and purchasing and implementing the tools. It may also
involve customizing the tool to suit the needs of the enterprise.
7. Populate the data warehouses: Once the ETL tools have been agreed upon, testing
the tools will be needed, perhaps using a staging area. Once everything is working
adequately, the ETL tools may be used in populating the warehouses given the
schema and view definition.
8. User applications: For the data warehouse to be helpful, there must be end-user
applications. This step involves designing and implementing the applications required
by the end-users.
9. Roll-out the warehouses and applications: Once the data warehouse has been
populated and the end-client applications tested, the warehouse system and the
operations may be rolled out for the user community to use.
Data Warehousing – Tuning
A data warehouse keeps evolving, and it is very difficult to predict what queries users
will pose in the future, which makes a data warehouse system harder to tune. In this
chapter, we will discuss how to tune the different aspects of a data warehouse, such as
performance, data load, and queries.
Performance Assessment
While assessing performance, the following points should be kept in mind:
Objective measures of performance, such as scan rates, should be used in the assessment.
It is of no use trying to tune response times if they are already better than those required.
To hide the complexity of the system from the user, aggregations and views should be used.
It is also possible that the user writes a query you had not tuned for.
Data load is a critical part of overnight processing. Nothing else can run until data load is
complete. This is the entry point into the system.
Note – If there is a delay in transferring the data, or in arrival of data then the entire system
is affected badly. Therefore it is very important to tune the data load first.
There are various approaches of tuning data load that are discussed below –
The very common approach is to insert data using the SQL Layer. In this approach, normal
checks and constraints need to be performed. When the data is inserted into the table, the
code will run to check for enough space to insert the data. If sufficient space is not
available, then more space may have to be allocated to these tables. These checks take time
to perform and are costly to CPU.
The second approach is to bypass all these checks and constraints and place the data
directly into the preformatted blocks. These blocks are later written to the database. It is
faster than the first approach, but it can work only with whole blocks of data. This can lead
to some space wastage.
The third approach is that, while loading the data into a table that already contains
data, we can maintain the indexes.
The fourth approach says that to load the data in tables that already contain data, drop the
indexes & recreate them when the data load is complete. The choice between the third and
the fourth approach depends on how much data is already loaded and how many indexes
need to be rebuilt.
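The fourth approach can be sketched as follows; sqlite3 is used only as a stand-in for the
warehouse RDBMS, and the table, index, and row values are invented. The index is dropped
before the bulk insert and rebuilt once the load completes.

```python
# Sketch of the "drop indexes, bulk load, recreate indexes" approach.
# sqlite3 is only a stand-in for the warehouse RDBMS; names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales_fact (sale_id INTEGER, amount REAL)")
cur.execute("CREATE INDEX idx_sales_amount ON sales_fact(amount)")

new_rows = [(i, float(i % 100)) for i in range(10_000)]   # pretend nightly batch

# 1. Drop the index so each insert does not also have to maintain the index.
cur.execute("DROP INDEX idx_sales_amount")

# 2. Bulk load inside a single transaction.
cur.executemany("INSERT INTO sales_fact VALUES (?, ?)", new_rows)

# 3. Recreate the index once, after the load is complete.
cur.execute("CREATE INDEX idx_sales_amount ON sales_fact(amount)")
conn.commit()
conn.close()
```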
Integrity Checks
Integrity checking highly affects the performance of the load. Following are the points to
remember –
Integrity checks need to be limited because they require heavy processing power.
Integrity checks should be applied on the source system to avoid degrading the performance
of the data load.
Tuning Queries
Fixed queries
Ad hoc queries
Fixed Queries
Fixed queries are well defined. Following are the examples of fixed queries –
Regular reports
Canned queries
Common aggregations
Tuning fixed queries in a data warehouse is the same as in a relational database system.
The only difference is that the amount of data to be queried may be different. It is good to
store the most successful execution plan while testing fixed queries. Storing these execution
plans will allow us to spot changes in data size and data skew, as these will cause the
execution plan to change.
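A hedged sketch of capturing the execution plan of a fixed query so it can be compared
across test runs; sqlite3's EXPLAIN QUERY PLAN is used purely as an example, and the table
name and report query are invented. A production warehouse would use its own EXPLAIN
facility.

```python
# Sketch: capture and store the execution plan of a fixed query so later runs
# can be compared against it. sqlite3's EXPLAIN QUERY PLAN is only an example;
# table names and the report query are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales_fact (region TEXT, amount REAL)")
cur.execute("CREATE INDEX idx_region ON sales_fact(region)")

fixed_query = "SELECT region, SUM(amount) FROM sales_fact GROUP BY region"

cur.execute("EXPLAIN QUERY PLAN " + fixed_query)
plan = cur.fetchall()

# Persist the plan so a change in data size or data skew that alters the plan
# is easy to spot during the next test cycle.
with open("fixed_query_plan.txt", "w") as fh:
    for row in plan:
        fh.write(repr(row) + "\n")
conn.close()
```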
Ad hoc Queries
To understand ad hoc queries, it is important to know the ad hoc users of the data
warehouse. For each user or group of users, you need to know their query patterns, such as
how often they query, at what times, and the typical size and complexity of their queries.
Testing is very important for data warehouse systems to make them work correctly and
efficiently. There are three basic levels of testing performed on a data warehouse –
Unit testing
Integration testing
System testing
Unit Testing
Each module, i.e., procedure, program, SQL script, or Unix shell script, is tested individually.
Integration Testing
In integration testing, the various modules of the application are brought together and then
tested against a number of inputs.
System Testing
The purpose of system testing is to check whether the entire system works correctly
together or not.
Since the size of the whole data warehouse is very large, it is usually possible to perform
minimal system testing before the test plan can be enacted.
ETL stands for Extract, Transform, Load and it is a process used in data warehousing to
extract data from various sources, transform it into a format suitable for loading into a data
warehouse, and then load it into the warehouse. The process of ETL can be broken down
into the following three stages:
Extract: The first stage in the ETL process is to extract data from various sources such as
transactional systems, spreadsheets, and flat files. This step involves reading data from the
source systems and storing it in a staging area.
Transform: In this stage, the extracted data is transformed into a format that is suitable for
loading into the data warehouse. This may involve cleaning and validating the data,
converting data types, combining data from multiple sources, and creating new data fields.
Load: After the data is transformed, it is loaded into the data warehouse. This step involves
creating the physical data structures and loading the data into the warehouse.
The ETL process is an iterative process that is repeated as new data is added to the
warehouse. The process is important because it ensures that the data in the data warehouse
is accurate, complete, and up-to-date. It also helps to ensure that the data is in the format
required for data mining and reporting.
Additionally, there are many different ETL tools and technologies available, such as
Informatica, Talend, DataStage, and others, that can automate and simplify the ETL
process.
ETL is a process in Data Warehousing and it stands for Extract, Transform and Load. It is a
process in which an ETL tool extracts the data from various data source systems,
transforms it in the staging area, and then finally, loads it into the Data Warehouse system.
Extraction:
The first step of the ETL process is extraction. In this step, data from various source
systems, which can be in various formats like relational databases, NoSQL, XML, and flat
files, is extracted into the staging area. It is important to extract the data from various
source systems and store it into the staging area first and not directly into the data
warehouse because the extracted data is in various formats and can be corrupted also.
Hence loading it directly into the data warehouse may damage it and rollback will be much
more difficult. Therefore, this is one of the most important steps of ETL process.
Transformation:
The second step of the ETL process is transformation. In this step, a set of rules or
functions are applied on the extracted data to convert it into a single standard format. It
may involve following processes/tasks:
Cleaning – filling up NULL values with some default values, mapping “U.S.A”, “United
States”, and “America” to “USA”, and so on (a minimal sketch follows below).
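The cleaning task mentioned above might look like the following sketch; the field names,
default values, and country spellings are illustrative assumptions.

```python
# Illustrative cleaning step: fill NULL/missing values with defaults and map
# country-name variants onto one standard code. Field names are invented.
COUNTRY_MAP = {"U.S.A": "USA", "United States": "USA", "America": "USA"}
DEFAULTS = {"quantity": 0, "country": "UNKNOWN"}

def clean(record: dict) -> dict:
    cleaned = dict(record)
    for field, default in DEFAULTS.items():
        if cleaned.get(field) is None:          # fill NULLs with default values
            cleaned[field] = default
    country = cleaned.get("country")
    cleaned["country"] = COUNTRY_MAP.get(country, country)   # normalize spellings
    return cleaned

print(clean({"quantity": None, "country": "United States"}))
# {'quantity': 0, 'country': 'USA'}
```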
Loading:
The third and final step of the ETL process is loading. In this step, the transformed data is
finally loaded into the data warehouse. Sometimes the data is updated by loading into the
data warehouse very frequently and sometimes it is done after longer but regular intervals.
The rate and period of loading depend solely on the requirements and vary from system
to system.
The ETL process can also use the pipelining concept, i.e. as soon as some data is
extracted, it can be transformed, and during that period some new data can be extracted.
While the transformed data is being loaded into the data warehouse, the already extracted
data can be transformed. This pipelined flow of the ETL process is sketched below.
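A hedged sketch of this pipelined flow using Python generators: each stage pulls records
from the previous one, so extraction, transformation, and loading overlap instead of running
strictly one after another. The source rows and the in-memory target list are invented
stand-ins.

```python
# Sketch of a pipelined ETL flow with generators: while one record is being
# loaded, the next can already be transformed, and another extracted.
# The "source" rows and in-memory "warehouse" are invented stand-ins.

def extract():
    source_rows = [{"country": "U.S.A", "amount": "10"},
                   {"country": "America", "amount": "5"}]
    for row in source_rows:          # in practice: read from files, APIs, DBs
        yield row

def transform(rows):
    mapping = {"U.S.A": "USA", "America": "USA", "United States": "USA"}
    for row in rows:                 # runs as soon as each row is extracted
        yield {"country": mapping.get(row["country"], row["country"]),
               "amount": float(row["amount"])}

def load(rows, warehouse):
    for row in rows:                 # runs as soon as each row is transformed
        warehouse.append(row)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)
# [{'country': 'USA', 'amount': 10.0}, {'country': 'USA', 'amount': 5.0}]
```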
ETL Tools: Most commonly used ETL tools are Hevo, Sybase, Oracle Warehouse builder,
CloverETL, and MarkLogic.
Data Warehouses: Most commonly used Data Warehouses are Snowflake, Redshift,
BigQuery, and Firebolt.
Designing an ETL (Extract, Transform, Load) process involves several key steps:
Define Requirements:
Clearly understand the business needs and data requirements. Identify data sources,
formats, and transformation rules.
Data Extraction:
Extract data from source systems using appropriate methods such as APIs, database
queries, or flat file imports.
Data Transformation:
Apply necessary transformations to the extracted data, such as cleaning, filtering, and
aggregating. Ensure data quality and consistency.
Data Loading:
Load the transformed data into the target system, which can be a data warehouse,
database, or other storage solutions.
Error Handling:
Implement mechanisms for handling errors during extraction, transformation, and loading.
Logging and alerting are crucial for monitoring and troubleshooting.
Scalability and Performance:
Design the ETL process to handle large volumes of data efficiently. Consider parallel
processing, batching, and optimization techniques for performance improvement.
Incremental Loading:
Implement strategies for incremental loading to update only the changed or new data,
reducing processing time and resource usage.
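One common way to implement incremental loading is a high-water mark on a last-modified
timestamp, sketched below with sqlite3; the table and column names are assumptions made for
this example.

```python
# Sketch of incremental loading with a high-water-mark timestamp.
# sqlite3 and the table/column names are stand-ins for this illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE source_orders (order_id INTEGER, amount REAL, updated_at TEXT);
CREATE TABLE dw_orders     (order_id INTEGER, amount REAL, updated_at TEXT);
INSERT INTO source_orders VALUES
  (1, 10.0, '2024-01-01'), (2, 20.0, '2024-01-05'), (3, 30.0, '2024-01-09');
INSERT INTO dw_orders VALUES (1, 10.0, '2024-01-01');
""")

# High-water mark: the newest timestamp already present in the warehouse.
cur.execute("SELECT COALESCE(MAX(updated_at), '') FROM dw_orders")
watermark = cur.fetchone()[0]

# Pull only rows that changed after the watermark, then load them.
cur.execute("SELECT order_id, amount, updated_at FROM source_orders "
            "WHERE updated_at > ?", (watermark,))
changed = cur.fetchall()
cur.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", changed)
conn.commit()

cur.execute("SELECT COUNT(*) FROM dw_orders")
print(cur.fetchone())    # (3,) - only the two newer rows were loaded
conn.close()
```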
Metadata Management:
Maintain metadata to document the ETL process. This includes data lineage,
transformation rules, and source-target mappings.
Security:
Protect sensitive data throughout extraction, transformation, and loading, with appropriate
access controls and encryption.
Testing:
Conduct thorough testing, including unit testing for individual components, integration
testing for the entire ETL process, and validation against business requirements.
Documentation:
Document the ETL design, configuration, and any specific considerations for future
reference or handover to other teams.
Charts and Graphs: Include bar charts, line graphs, pie charts, and other visual
representations to showcase trends, comparisons, and distributions within the data.
Heatmaps: Display data values using color gradients, making it easy to identify patterns,
concentrations, or anomalies in large datasets.
Pivot Tables: Allow users to manipulate and explore multidimensional data interactively,
facilitating dynamic analysis and reporting.
Word Clouds: Represent word frequency in textual data, with larger font sizes for more
frequently occurring terms, offering a quick overview of common themes.
Bubble Charts: Combine three dimensions of data by using bubbles of different sizes and
colors, conveying information about quantity, category, and another variable.
Time Series Visualizations: Display data changes over time, using line charts or other time-
focused visualizations to highlight trends, seasonality, or anomalies.
Cohort Analysis: Track and compare groups of users who share a common characteristic
over time, providing insights into user behavior and performance metrics.
Histograms: Illustrate the distribution of a single variable, showing the frequency of data
within different ranges or bins.
Scatter Plots: Visualize relationships between two variables, revealing correlations, clusters,
or outliers.
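As a small hedged illustration of two of these visualization types, the sketch below uses
matplotlib (assumed to be installed); the sales and ad-spend figures are invented. It draws a
bar chart for category comparison and a scatter plot for the relationship between two
variables.

```python
# Illustrative sketch of two common visualization types using matplotlib
# (assumed installed). The data values are invented.
import matplotlib.pyplot as plt

regions = ["North", "South", "East", "West"]
sales = [120, 95, 143, 80]
ad_spend = [30, 22, 41, 18]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.bar(regions, sales)                  # bar chart: compare categories
ax1.set_title("Sales by region")

ax2.scatter(ad_spend, sales)             # scatter plot: relationship of two variables
ax2.set_xlabel("Ad spend")
ax2.set_ylabel("Sales")
ax2.set_title("Ad spend vs sales")

fig.tight_layout()
plt.show()
```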
Data warehousing maintenance includes tasks like monitoring performance, ensuring data
quality, and managing ETL processes. Regularly update and optimize indexes, review
storage usage, and address any issues promptly. Implement data backups and recovery
plans to safeguard against data loss. Stay current with software updates and periodically
reassess the data warehouse architecture to accommodate evolving business needs.
Regularly communicate with stakeholders to understand changing requirements and adapt
the data warehouse accordingly.
The data warehouse development lifecycle is a framework that guides the development
process from inception through deployment and ongoing maintenance. It typically includes
the following phases:
Requirement
Front-end design
Implementation
Deployment
Data warehousing is a process that gathers and manages structured and unstructured data
from numerous sources into a centralized repository, enabling actionable business
decisions.
Requirement Specification: It is the first step in the development of the Data Warehouse
and is done by business analysts. In this step, Business Analysts prepare business
requirement specification documents. More than 50% of requirements are collected from
the client side and it takes 3-4 months to collect all the requirements. After the
requirements are gathered, the data modeler starts recognizing the dimensions, facts &
combinations based on the requirements. We can say that this is an overall blueprint of the
data warehouse. But, this phase is more about determining business needs and placing
them in the data warehouse.
Data Modelling: This is the second step in the development of the Data Warehouse. Data
Modelling is the process of visualizing data distribution and designing databases by
fulfilling the requirements to transform the data into a format that can be stored in the data
warehouse. For example, whenever we start building a house, we put all the things in the
correct position as specified in the blueprint. That’s what data modeling is for data
warehouses. Data modelling helps to organize data, creates connections between data sets,
and it’s useful for establishing data compliance and its security that line up with data
warehousing goals. It is the most complex phase of data warehouse development. And,
there are many data modelling techniques that businesses use for warehouse design. Data
modelling typically takes place at the data mart level and branches out in a data warehouse.
It’s the logic of how the data is stored concerning other data. There are three data models
for data warehouses:
Star Schema
Snowflake Schema
Galaxy Schema.
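As a minimal illustration of the first of these models, the sketch below defines a tiny star
schema with sqlite3; the fact and dimension names are invented, and sqlite3 only stands in
for the warehouse database.

```python
# Minimal star-schema sketch: one fact table joined to two dimension tables.
# sqlite3 is a stand-in for the warehouse database; all names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_id  INTEGER PRIMARY KEY,
    day      TEXT,
    month    TEXT,
    year     INTEGER
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    category   TEXT
);
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity   INTEGER,
    amount     REAL
);
""")
# A snowflake schema would further normalize dim_product (e.g. a separate
# category table); a galaxy schema would add more fact tables sharing dimensions.
conn.close()
```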
ETL Design and Development: This is the third step in the development of the Data
Warehouse. An ETL (Extract, Transform, Load) tool may extract data from various source
systems and store it in a data lake; an ETL process can then extract the data from the
lake, transform it, and load it into a data warehouse for reporting. For optimal speeds,
good visualization, and the ability to build easy, replicable, and consistent data pipelines
between all of the existing architecture and the new data warehouse, we need ETL tools.
This is where ETL tools like SAS Data Management, IBM Information Server, Hive, etc. come
into the picture. A good ETL process can be helpful in constructing a simple yet functional
data warehouse that’s valuable throughout every layer of the organization.
OLAP Cubes: This is the fourth step in the development of the Data Warehouse. An OLAP
cube, also known as a multidimensional cube or hypercube, is a data structure that allows
fast analysis of data according to the multiple dimensions that define a business problem. A
data warehouse would extract information from multiple data sources and formats like text
files, excel sheets, multimedia files, etc. The extracted data is cleaned and transformed and
is loaded into an OLAP server (or OLAP cube) where information is pre-processed in
advance for further analysis. Usually, data operations and analysis are performed using a
simple spreadsheet, where data values are arranged in row and column format. This is ideal
for two-dimensional data. However, OLAP contains multidimensional data, with data
typically obtained from different and unrelated sources. Employing a spreadsheet isn’t an
optimum choice. The cube stores and analyzes multidimensional data in a logical and
orderly manner. Data warehouses are now offered as fully built products that are
configurable and capable of staging multiple types of data, and OLAP cubes are becoming
outdated because they cannot deliver the real-time, high-performance analysis and reporting
that businesses now expect.
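A cube-style aggregation over two dimensions can be approximated in a few lines with pandas
(assumed to be installed; the sales records are invented), which is roughly what an OLAP
cube pre-computes across many more dimensions.

```python
# Rough sketch of what an OLAP cube pre-computes: aggregating a measure over
# combinations of dimensions. pandas is assumed installed; data is invented.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "amount":  [100.0, 120.0, 80.0, 60.0, 90.0],
})

# One "face" of the cube: total amount by region x quarter.
cube_slice = sales.pivot_table(index="region", columns="quarter",
                               values="amount", aggfunc="sum", fill_value=0)
print(cube_slice)
# quarter     Q1     Q2
# region
# North    100.0  120.0
# South    140.0   90.0
```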
UI Development: This is the fifth step in the development of the Data Warehouse. So far,
the processes discussed have taken place at the backend. There is a need for a user
interface for how the user and a computer system interact, in particular the use of input
devices and software, to immediately access the data warehouse for analysis and generating
reports. The main aim of a UI is to enable a user to effectively manage the system they are
interacting with. There are plenty of tools in the market that help with UI development;
BI tools like Tableau, or Power BI for those using BigQuery, are great choices.
Maintenance: This is the sixth step in the development of the Data Warehouse. In this
phase, we can update or make changes to the schema and data warehouse’s application
domain or requirements. Data warehouse maintenance systems must also provide means to
keep track of schema modifications. At the schema level, we can perform operations to
insert or change dimensions and categories; such changes include, for example, adding or
deleting user-defined attributes.
Test and Deployment: This is often the ultimate step in the Data Warehouse development
cycle. Businesses and organizations test data warehouses to ensure whether the required
business problems are implemented successfully or not. The warehouse testing involves the
scrutiny of enormous volumes of data. Data that has to be compared comes from
heterogeneous data sources like relational databases, flat files, operational data, etc. The
overall data warehouse project testing phases include: Data completeness, Data
Transformation, Data loading by means of ETL tools, Data integrity, etc. After testing the
data warehouse, it is deployed so that users can immediately access the data and perform
analysis. Basically, in this phase, the data warehouse is turned on and lets the users take
advantage of it. At the time of data warehouse deployment, most of its functions are
implemented. The data warehouse can be deployed in the organization's own data center or on
the cloud.
Measurement for creating KPI (Key Performance Indicators) on the basis of historic data
fed into the system.
Identify the benchmark and then set the benchmarks for different processes.
Business Intelligence systems can be used to identify market trends and also to spot
business problems that need to be identified and solved.
Business Intelligence helps with data visualization, which increases the quality of data
and, in turn, the quality of decision-making.
Business Intelligence systems can be used by large enterprises and organizations as well as
Small and Medium Enterprises, because they are quite affordable.
Analyst (Data Analyst or Business Analyst): They are the statisticians of the company; they
use BI on the basis of historical data previously stored in the system.
Head or Manager of the Company: The head of the company uses Business Intelligence to
increase the profitability of the company by making more efficient decisions based on all
the knowledge discovered.
Small Business Owners: BI can also be used by small business owners because it is quite
affordable.
Strategic Level: The strategic level is the level where the Heads of the company decide the
strategies of any business.
Tactical Level: Once the strategy is made, the details and day-to-day matters are handled
at the tactical level, where all the technologies and methodologies come under one
umbrella. This level is also responsible for continuously updating the data.
Operational Level: Operation decisions are made at this level. Operational decisions help in
operating the system.
Applications of Business Intelligence:
In Predictive Analytics.
In Prescriptive Analytics.
Tableau: A business intelligence and data visualization application that enables users to
connect to different data sources, build interactive dashboards, and share findings with
others.
Microsoft Power BI: A cloud-based business intelligence program that enables users to
connect to a variety of data sources, produce visualizations, and communicate findings.
QlikView: A business intelligence and data visualization platform that enables users to
build interactive dashboards and examine data in novel ways.
SAP BusinessObjects: A complete business intelligence suite that includes data
visualization, reporting, and analytics tools.
IBM Cognos: A tool for performance management and corporate intelligence that enables
users to build reports, scorecards, and dashboards.
Oracle Business Intelligence: A full business intelligence suite that includes data
visualization, reporting, and analytics technologies.
MicroStrategy: A business intelligence and data visualization tool for creating dynamic
dashboards and reports.
SAS Business Intelligence: A full business intelligence suite that includes data
visualization, reporting, and analytics tools.
TIBCO Spotfire: A business intelligence and data visualization platform that enables users
to build interactive dashboards and investigate data in novel ways.
Looker: A tool for business intelligence and data visualization that enables users to build
interactive dashboards and investigate data in novel ways.
AI is rapidly changing how businesses operate. It can automate routine tasks and provide
insights for strategic decision-making.
Data governance
Data governance is becoming more important as security and privacy regulations become
stricter. It’s necessary for companies that want to invest in business intelligence and get
meaningful results.
Advanced analytics
Data discovery
Data discovery techniques and visualization can empower businesses by producing relevant
insights for long-term decision-making.
Data quality management (DQM)
DQM was one of the hottest business intelligence trends in 2019. Businesses want to
implement data quality processes to enhance their ability to utilize business intelligence.
Data security
Automation
Automation is a key component of any business intelligence tool. It allows you to create a
repeatable process that can be applied across your organization.
Predictive analytics: Using machine learning models and algorithms to analyze historical
data to predict demand, optimize inventory levels, and anticipate supply chain disruptions
Cloud-based data warehousing: Highlighted by a recent Gitnux study as one of the most
important data warehouse trends, together with the following:
Hybrid data warehousing
Real-time data processing
Self-service analytics
Artificial intelligence and machine learning integration
Data virtualization
Metadata management
Data warehouses are centralized storage systems that allow for the storing, analyzing, and
interpreting of data. They provide a comprehensive view of an organization’s data, enabling
valuable insights and data-driven decisions
BUSINESS INTELLIGENCE AND DATA WAREHOUSING
The SAS System provides a data warehouse model that addresses the entire scope of
warehouse management, organization, and exploitation. It is a comprehensive model,
supported by a broad array of products.
SAS/INSIGHT is a data mining tool. The SAS System has also become a pervasive tool for
managing and mining data.
A data warehouse is used to store large amounts of data that are collected from multiple
resources. Data is typically stored in a data warehouse through an extract, transform, and
load (ETL) process.
SAS, or Statistical Analysis System, is a software suite that facilitates data management,
advanced analytics, and business intelligence. It includes tools for statistical analysis, data
visualization, and reporting. SAS is widely used in fields such as finance, healthcare, and
research for handling, analyzing, and interpreting data to make informed decisions.
Tools of SAS
SAS offers a comprehensive suite of tools for various data-related tasks. Some key
components include:
Base SAS: Core functionality for data management, analytics, and reporting.
SAS/STAT: Module for statistical analysis, including regression, analysis of variance, and
multivariate analysis.
PROC SQL (part of Base SAS): Provides a SQL interface for querying and manipulating data within SAS.
SAS/GRAPH: Enables data visualization through various charts, plots, and maps.
SAS/ETS: Focuses on time series analysis and econometrics.
SAS Enterprise Guide: A point-and-click interface for data analysis and reporting.
SAS Enterprise Miner: Used for data mining and machine learning tasks.
SAS Visual Analytics: Supports interactive, exploratory data analysis and visual reporting.
These tools cater to different aspects of data handling and analysis, making SAS a versatile
platform for diverse applications.