Foundations of Business Analytics
222MBA5103
Unit 2
Course Instructor
Dr. R Thiru Murugan
Syllabus
• Unit I: Introduction to Business Analytics 9 Hours
Business Analytics - Terminologies, Process, Importance, Overview of Analytics
model. Strategy and Business analytics Value Chain – Types of Data, Big Data –
Characteristics – Sources – Types – Structured, Semi-structured and
Unstructured Data
• Unit II: Data Preparation and Methodology 9 Hours
Data Pre-processing – Data Quality – Cleaning – Integration – Reduction –
Transformation – Normalization – Knowledge Discovery – Supervised vs.
Unsupervised Learning – Training and test data sets – Management
Information Policy, data quality and change in BA.
• Unit III: Descriptive Analytics
Introduction to Descriptive analytics - Visualizing and Exploring Data -
Descriptive Statistics - Sampling and Estimation - Probability Distribution for
Descriptive Analytics - Analysis of Descriptive analytics
• Unit IV: Predictive & Prescriptive Analytics
Introduction to Predictive analytics - Logic and Data Driven Models -
Predictive Analysis Modeling and procedure - Data Mining for Predictive
analytics. Analysis of Predictive analytics – Prescriptive Analytics:
Prescriptive Modeling - Non-Linear Optimization - Demonstrating Business
Performance Improvement.
• Unit V: Application of Data Visualization Tools
Data Objects and attribute types – Overview of data visualization
techniques for various kinds of data – Methods for visualizing text, graphs,
tags and multidimensional data – Overview of Power BI for visualization.
Data Pre-processing
Data pre-processing is a critical step in business analytics and data analysis. It involves preparing and
cleaning raw data to make it suitable for analysis. High-quality, well-preprocessed data is essential for
accurate and meaningful insights. Here are the key steps and techniques involved in data pre-processing for
business analytics:
• Data Collection:
– Gather data from various sources, including databases, spreadsheets, external APIs, or data
warehouses. Ensure you have the necessary permissions and rights to use the data.
• Data Cleaning:
– Identify and handle missing data: Decide whether to impute missing values (e.g., mean or median
imputation), remove records with missing data, or apply another strategy suited to the analysis.
– Handle duplicate records: Remove or consolidate duplicate data points to avoid bias in analysis.
– Handle outliers: Detect and deal with outliers that can skew analysis results. Options include
removing outliers, transforming data, or using robust statistical methods.
– Standardize or normalize data: If different variables have different scales, standardize or normalize
them to ensure they have the same impact during analysis.
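As an illustration of these cleaning steps, here is a minimal pandas sketch; the DataFrame and its age/income columns are hypothetical, and median imputation plus IQR clipping stand in for whichever strategies fit the data at hand:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data; column names are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 37, 37, 120],
    "income": [42000, 55000, np.nan, np.nan, 61000],
})

# Missing data: impute each numeric column with its median.
df = df.fillna(df.median())

# Duplicates: drop exact duplicate rows.
df = df.drop_duplicates()

# Outliers: clip values outside 1.5 * IQR for each column.
for col in df.columns:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Standardize: rescale every column to zero mean and unit variance.
df = (df - df.mean()) / df.std()
```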
• Data Transformation:
– Encode categorical data: Convert categorical variables into numerical representations, such as one-hot
encoding or label encoding.
– Feature engineering: Create new features or modify existing ones to capture meaningful patterns or
relationships in the data.
– Time-series data handling: Handle time-related data, such as date and time formats, by extracting relevant
features like day of the week or month.
– Binning: Group continuous data into bins or intervals to simplify analysis or address non-linearity.
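A short pandas sketch of these transformations on a hypothetical transactions table (the column names and bin edges are assumptions made for illustration):

```python
import pandas as pd

# Illustrative transactions table; names and values are assumed.
tx = pd.DataFrame({
    "region": ["north", "south", "north"],
    "order_date": pd.to_datetime(["2024-01-05", "2024-02-14", "2024-03-09"]),
    "amount": [120.0, 480.0, 1500.0],
})

# Categorical encoding: one-hot encode the 'region' column.
tx = pd.get_dummies(tx, columns=["region"])

# Time-series handling: extract day-of-week and month features.
tx["order_dow"] = tx["order_date"].dt.dayofweek
tx["order_month"] = tx["order_date"].dt.month

# Binning: group the continuous 'amount' into labeled intervals.
tx["amount_band"] = pd.cut(
    tx["amount"], bins=[0, 200, 1000, float("inf")],
    labels=["low", "mid", "high"])
```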
• Data Reduction:
– Principal Component Analysis (PCA): Use PCA to reduce the dimensionality of data while retaining the most
important features.
– Feature selection: Identify and keep the most relevant features, discarding less informative ones, to improve
model performance and reduce complexity.
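A minimal scikit-learn sketch of both reduction approaches on synthetic data (the 95% variance threshold and k = 5 are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))      # 100 rows, 10 synthetic features
y = rng.integers(0, 2, size=100)    # synthetic binary target

# PCA: keep enough components to explain 95% of the variance.
X_pca = PCA(n_components=0.95).fit_transform(X)

# Feature selection: keep the 5 features most associated with y.
X_best = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

print(X_pca.shape, X_best.shape)
```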
• Data Integration:
– Combine data from multiple sources into a single dataset, ensuring consistency and compatibility between
different data sets.
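For example, a pandas sketch that integrates two hypothetical sources on a shared customer key, aggregating one source to a common grain before joining:

```python
import pandas as pd

# Two hypothetical sources that share a 'customer_id' key.
crm = pd.DataFrame({"customer_id": [1, 2, 3],
                    "segment": ["A", "B", "A"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3],
                       "amount": [100.0, 250.0, 80.0]})

# Aggregate orders to one row per customer, then join onto the CRM data.
spend = orders.groupby("customer_id", as_index=False)["amount"].sum()
combined = crm.merge(spend, on="customer_id", how="left")
```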
• Data Scaling:
– Scale or normalize numerical features to bring them into a similar range, especially when using algorithms
sensitive to feature scaling, like k-means clustering or support vector machines.
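Both common scalers take a few lines of scikit-learn (the input matrix here is synthetic); note that in practice a scaler should be fit on the training data only and then applied to the test data, to avoid leakage:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 900.0]])

X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance
X_mm = MinMaxScaler().fit_transform(X)     # rescaled into [0, 1]
```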
• Handling Imbalanced Data:
– If your data has imbalanced classes (e.g., in fraud detection or medical diagnosis), apply
techniques such as oversampling, undersampling, or using synthetic data to balance the
classes.
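A simple oversampling sketch using scikit-learn's resample utility on a hypothetical 95/5 class split; synthetic approaches such as SMOTE (from the imbalanced-learn package) are the usual alternative:

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical imbalanced dataset: 95 negatives, 5 positives.
df = pd.DataFrame({"x": range(100),
                   "label": [0] * 95 + [1] * 5})

majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Oversample the minority class (with replacement) to the majority size.
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])
print(balanced["label"].value_counts())
```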
• Data Splitting:
– Divide the data into training, validation, and test sets for model training, evaluation, and
testing. The typical split ratios are 70-80% training, 10-15% validation, and 10-15% testing.
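One way to realize a roughly 70/15/15 split is to chain two calls to scikit-learn's train_test_split (the data here is synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)  # synthetic features
y = np.arange(100) % 2              # synthetic binary labels

# First hold out 30%, then split that half-and-half: ~70/15/15.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42)
```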
• Data Validation:
– Continuously validate and check data for inconsistencies or issues. Data quality and integrity
are crucial throughout the analysis process.
• Documentation:
– Maintain comprehensive documentation of all data pre-processing steps, including the
rationale behind each decision. This ensures transparency and reproducibility.
• Data Privacy and Security:
– Protect sensitive data by adhering to privacy and security regulations and best practices.
Anonymize or encrypt data if necessary.
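As one possible pseudonymization sketch, a direct identifier can be replaced with a salted hash; the column name and salt below are placeholders, and real deployments should follow the organization's own key-management practices:

```python
import hashlib
import pandas as pd

df = pd.DataFrame({"email": ["a@x.com", "b@x.com"], "amount": [10.0, 20.0]})

# Pseudonymize an identifier with a salted SHA-256 hash (salt is a placeholder).
SALT = "replace-with-secret-salt"
df["email"] = df["email"].map(
    lambda e: hashlib.sha256((SALT + e).encode()).hexdigest()[:16])
```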
• Automate Data Pre-processing:
– Consider using data pre-processing libraries and tools, such as Python's pandas and scikit-learn
or equivalent packages in R, to streamline and automate many of these tasks.
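For instance, a scikit-learn Pipeline combined with a ColumnTransformer can bundle imputation, scaling, and encoding into one reusable object; the table and column names below are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical input table for the sketch.
df = pd.DataFrame({
    "age": [25, None, 37],
    "income": [42000.0, 55000.0, None],
    "region": ["north", "south", "north"],
})

# Numeric columns: impute then scale; categorical columns: one-hot encode.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
])

# One call now runs every pre-processing step consistently.
X = preprocess.fit_transform(df)
```

Because the fitted object replays exactly the same steps on new data, it also supports the consistency and reproducibility goals discussed above.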
• Effective data pre-processing sets the foundation for meaningful analysis and accurate modeling in
business analytics. It helps reduce noise, address data quality issues, and ensure that the data used
for decision-making is reliable and relevant.
Data Quality in Business Analytics
Data quality is a critical aspect of business analytics as it directly impacts the
accuracy, reliability, and effectiveness of analytical processes and the decisions
made based on them. Poor data quality can lead to erroneous conclusions and
misguided strategies. Here are key considerations related to data quality in business
analytics:
1. Accuracy: Accurate data is free from errors and closely reflects the true values or
conditions it represents. Inaccuracies can arise from various sources, such as data
entry mistakes, sensor errors, or system glitches. It's crucial to implement data
validation and verification processes to catch and rectify inaccuracies.
2. Completeness: Complete data includes all the necessary information for the
intended analysis. Missing data points or records can lead to biased results and
hinder comprehensive insights. Data preprocessing techniques, like imputation, can
help address missing data, but it's important to understand the impact of imputed
values on the analysis.
3. Consistency: Consistent data maintains uniform formats, units, and
conventions throughout the dataset. Data inconsistencies can occur when
different sources or systems use varying standards. Standardization and data
transformation can resolve consistency issues.
4. Timeliness: Timely data is up-to-date and relevant to the analysis. Outdated
or stale data can lead to incorrect conclusions, especially in rapidly changing
environments. Regular data refreshes and updates are essential to maintain
data timeliness.
5. Relevance: Relevant data is directly related to the objectives of the analysis and
is not cluttered with extraneous information. Irrelevant data can obscure
insights and increase complexity. Careful data selection and feature
engineering help ensure data relevance.
6. Validity: Valid data adheres to the predefined criteria and constraints set for
the dataset. Invalid data may not conform to these rules, potentially introducing
errors. Validation checks and data profiling can identify and address invalid
data (see the sketch after this list).
7. Continuous Monitoring: Regularly monitor data quality over
time. Implement data quality metrics and automated alerts to identify
and address issues promptly. Continuous monitoring helps maintain
data quality as data evolves.
8. Conformity to Standards: Data should adhere to industry, organizational, and
regulatory standards and guidelines. Non-compliance can lead to legal and
compliance issues. Data governance practices should enforce conformity to
standards.
9. Data Security and Privacy: Protecting sensitive data is crucial to
maintain data quality. Unauthorized access, data breaches, or
improper handling of sensitive information can compromise data
quality and result in legal and reputational damage.
10. Documentation: Comprehensive documentation of data sources, data
transformations, and data cleaning processes is essential for transparency,
reproducibility, and auditing purposes. Documentation helps stakeholders
understand the data's lineage and quality.
11. Data Governance: Implementing a robust data governance framework
helps establish and maintain data quality standards, policies, and
responsibilities within an organization. Data stewards and data quality teams
play a crucial role in ensuring data quality.
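As a concrete illustration, here is a small pandas sketch (with hypothetical columns and thresholds) that automates checks for several of the dimensions above, namely completeness, uniqueness, and validity:

```python
import pandas as pd

df = pd.DataFrame({"customer_id": [1, 2, 2, 4],
                   "age": [34, -5, 41, None],
                   "email": ["a@x.com", "b@x.com", "b@x.com", "c@x"]})

report = {
    # Completeness: share of missing values per column.
    "missing_rate": df.isna().mean().to_dict(),
    # Uniqueness/consistency: duplicated key values.
    "duplicate_ids": int(df["customer_id"].duplicated().sum()),
    # Validity: ages must fall in a plausible range.
    "invalid_age": int(((df["age"] < 0) | (df["age"] > 120)).sum()),
    # Validity: crude email-format check.
    "bad_email": int((~df["email"].str.contains(r"@.+\.")).sum()),
}
print(report)
```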
Data quality is an ongoing effort and requires collaboration between data
analysts, data engineers, data scientists, and other stakeholders. It ensures
that the data used for business analytics is trustworthy, reliable, and fit for
purpose, ultimately leading to more accurate and valuable insights for informed
decision-making.
Knowledge Discovery in Business Analytics
• Knowledge discovery in databases (KDD), in a business analytics context, refers
to the non-trivial process of extracting valuable and actionable insights from
raw data. It is not just about crunching numbers, but about uncovering hidden
patterns, trends, and relationships that can inform smarter decision-making
and drive business growth.
The KDD Process
• Data selection and pre-processing: This involves identifying relevant data
sources, cleaning and integrating data, and preparing it for analysis.
• Data mining: Applying algorithms and statistical techniques to extract
patterns and trends from the data. Data mining is a subset of KDD, covering
the specific algorithms used for pattern extraction.
• Pattern interpretation and evaluation: Analyzing the extracted
patterns, assessing their validity and significance, and determining their
potential usefulness for business decisions.
• Knowledge dissemination: Presenting the discovered knowledge in a clear
and actionable way to decision-makers and stakeholders.
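A compressed end-to-end sketch of these stages, using synthetic customer data, k-means as the mining step, and the silhouette score for evaluation (all of these choices are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# 1. Selection and pre-processing: synthetic two-feature customer data, scaled.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
X = StandardScaler().fit_transform(X)

# 2. Data mining: extract patterns by clustering customers into segments.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# 3. Interpretation and evaluation: silhouette score gauges pattern quality.
print("silhouette:", round(silhouette_score(X, labels), 3))

# 4. Dissemination: report segment sizes (and profiles) to stakeholders.
print("segment sizes:", np.bincount(labels))
```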
Benefits of KDD in Business:
• Improved decision-making: KDD provides data-driven insights that can
help businesses make better strategic and tactical decisions in various
areas, like marketing, finance, operations, and risk management.
• Enhanced customer understanding: KDD can uncover valuable insights into
customer behavior, preferences, and buying patterns, enabling businesses
to personalize their offerings and improve customer engagement.
• Increased operational efficiency: KDD can identify inefficiencies and
bottlenecks in business processes, allowing businesses to streamline
operations and optimize resource allocation.
• New product and service development: KDD can reveal hidden trends and
unmet customer needs, helping businesses develop innovative products
and services that cater to evolving market demands.
Supervised and Unsupervised Learning
Supervised Learning:
• Data: Requires labeled data, where each data point has a pre-defined
outcome or category.
• Goal: Predicts outcomes or classifies new data points based on the
patterns learned from labeled training data.
• Applications: Ideal for tasks like customer churn prediction, product
recommendation, spam filtering, and sentiment analysis.
• Pros: Can achieve high accuracy in prediction tasks, especially with
well-labeled data.
• Cons: Requires significant effort and cost to label data, and
performance can suffer with poor labeling quality.
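A minimal supervised example with scikit-learn, standing in for tasks like churn prediction; it uses a bundled labeled dataset rather than real business data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled data: every record already carries a known outcome.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Learn the feature-to-label mapping, then score unseen cases.
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("test accuracy:", round(model.score(X_test, y_test), 3))
```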
Unsupervised Learning:
• Data: Deals with unlabeled data, where data points lack predefined
categories or outcomes.
• Goal: Discovers hidden patterns, trends, and relationships within the
data, without prior knowledge of what to expect.
• Applications: Useful for market segmentation, anomaly
detection, fraud analysis, and customer clustering.
• Pros: Requires less data preparation and labeling, allowing for faster
insights from raw data.
• Cons: Predictions are not always directly possible, and interpretation of
discovered patterns can be challenging.
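A minimal unsupervised example in the same spirit: an isolation forest flags anomalous transactions in unlabeled synthetic data, the kind of pattern discovery used in fraud analysis (the contamination rate is an assumption):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Unlabeled data: transaction features with no fraud labels attached.
rng = np.random.default_rng(1)
normal = rng.normal(100, 10, (200, 2))
odd = rng.normal(300, 5, (5, 2))   # a few unusual transactions
X = np.vstack([normal, odd])

# Discover outliers without labels; -1 marks a flagged anomaly.
flags = IsolationForest(contamination=0.03, random_state=1).fit_predict(X)
print("anomalies flagged:", int((flags == -1).sum()))
```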
Management Information Policy
A Management Information Policy is a formal document or set of guidelines that an
organization develops and implements to govern the management and use of information
within the organization. This policy is a critical component of an organization's overall
information management strategy and helps ensure that information is handled
appropriately, securely, and in alignment with the organization's goals and objectives.
1. Purpose and Scope: This section outlines the purpose of the policy and defines its scope,
specifying which types of information it covers and which parts of the organization it
applies to.
2. Roles and Responsibilities: Clearly defines the roles and responsibilities of individuals
and departments within the organization regarding information management. This may
include roles such as Information Security Officer, Data Stewards, and end-users.
3. Information Classification: Describes how information should be classified based on its
sensitivity or importance. Common classifications include public, internal, confidential, and
sensitive.
4. Access Control: Outlines the procedures and controls for granting and revoking
access to information systems and data. It also includes guidelines for user
authentication and authorization.
5. Data Retention and Disposal: Establishes rules for how long different types of
information should be retained and the proper methods for disposing of information
that is no longer needed.
6. Information Security: Addresses security measures to protect information assets,
including encryption, firewalls, antivirus software, and employee training on security
best practices.
7. Compliance and Legal Requirements: Specifies how the organization will comply
with relevant laws, regulations, and industry standards related to information
management and data protection.
8. Incident Response: Describes the procedures to follow in the event of a data breach
or other information security incident. This includes reporting requirements and steps
for mitigating and recovering from incidents.
9. Training and Awareness: Outlines the organization's commitment to educating
employees and stakeholders about information management policies and best
practices.
10. Monitoring and Auditing: Describes how the organization will monitor
compliance with the policy and conduct periodic audits to ensure that information
is being managed according to the established guidelines.
11. Policy Review and Updates: States how often the policy will be reviewed and
updated to ensure its continued relevance and effectiveness.
12. Enforcement and Consequences: Details the consequences of policy violations
and the disciplinary actions that may be taken against individuals or departments
that do not comply with the policy.