UNIT – II
INTRODUCTION TO ANALYTICS
2.1 Introduction to Analytics
As enormous amounts of data are generated, extracting useful insights from that data has become a must for any business enterprise. Data Analytics plays a key role in improving your business.
Here are four main factors that signify the need for Data Analytics:
• Gather Hidden Insights – Hidden insights from data are gathered and then analyzed with
respect to business requirements.
• Generate Reports – Reports are generated from the data and passed on to the respective
teams and individuals so that they can take further action to grow the business.
• Perform Market Analysis – Market Analysis can be performed to understand the strengths
and the weaknesses of competitors.
• Improve Business Requirements – Analysis of data helps a business improve its response to
customer requirements and the overall customer experience.
Data Analytics refers to the techniques to analyze data to enhance productivity and business
gain. Data is extracted from various sources and is cleaned and categorized to analyze
different behavioral patterns. The techniques and the tools used vary according to the
organization or individual.
Data analysts translate numbers into plain English. A Data Analyst delivers value to their
companies by taking information about specific topics and then interpreting, analyzing,
and presenting findings in comprehensive reports. So, if you have the capability to collect
data from various sources, analyze the data, gather hidden insights and generate reports, then
you can become a Data Analyst.
In general, data analytics also involves a degree of human knowledge, as discussed below in
figure 2.2: under each type of analytics a certain amount of human input is required for
prediction. Descriptive analytics requires the highest human input, predictive analytics
requires less, and in the case of prescriptive analytics no human input is required since all the
data is predicted.
Types of Data Analytics
There are four major types of data analytics:
1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics
Fig 2.3 Data and Human work
2.2 Introduction to Tools and Environment
In general data analytics deals with three main parts, subject knowledge, statistics and person
with computer knowledge to work on a tool to give insight in to the business. In the mainly
used tool is Rand Python as shown in figure 2.3
With the increasing demand for Data Analytics in the market, many tools have emerged with
various functionalities for this purpose. Some open-source and some commercial, the top
tools in the data analytics market are as follows.
• R programming – This tool is the leading analytics tool used for statistics and data
modelling. R compiles and runs on various platforms such as UNIX, Windows, and macOS.
It also provides tools to automatically install all packages as per user requirements.
• Python – Python is an open-source, object-oriented programming language which is easy to
read, write and maintain. It provides various machine learning and visualization libraries such
as Scikit-learn, TensorFlow, Matplotlib, Pandas and Keras. It can also connect to data sources
such as a SQL Server database, a MongoDB database, or JSON files. (A short example of a
Python analytics workflow appears after this list of tools.)
• Tableau Public – This is free software that connects to data sources such as Excel files and
corporate data warehouses. It then creates visualizations, maps, dashboards etc. with real-
time updates on the web.
• QlikView – This tool offers in-memory data processing with the results delivered to the
end-users quickly. It also offers data association and data visualization with data being
compressed to almost 10% of its original size.
•SAS – A programming language and environment for data manipulation and analytics, this
tool is easily accessible and can analyze data from different sources.
• Microsoft Excel – This tool is one of the most widely used tools for data analytics. Mostly
used for clients’ internal data, it analyzes and summarizes the data, with pivot tables
providing a quick preview of the results.
• RapidMiner – A powerful, integrated platform that can integrate with any data source
type, such as Access, Excel, Microsoft SQL, Teradata, Oracle, Sybase etc. This tool is
mostly used for predictive analytics, such as data mining, text analytics and machine learning.
• KNIME – Konstanz Information Miner (KNIME) is an open-source data analytics
platform, which allows you to analyze and model data. With the benefit of visual
programming, KNIME provides a platform for reporting and integration through its modular
data pipeline concept.
• OpenRefine – Formerly known as Google Refine, this data cleaning software helps you clean
up data for analysis. It is used for cleaning messy data, transforming data, and parsing
data from websites.
• Apache Spark – One of the most widely used large-scale data processing engines, this tool executes
applications in Hadoop clusters 100 times faster in memory and 10 times faster on disk. It
is also popular for data pipelines and machine learning model development.
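To make this concrete, here is a minimal Python sketch of the kind of workflow these tools support, using the pandas and Matplotlib libraries mentioned above. The file name sales.csv and its columns (region, revenue) are invented for the example.

# A minimal sketch: extract, clean, categorize and summarize a small dataset.
# The file "sales.csv" and its columns (region, revenue) are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")                     # extract data from a source
df = df.dropna(subset=["revenue"])                # basic cleaning: drop rows missing revenue
summary = df.groupby("region")["revenue"].sum()   # categorize and aggregate

print(summary)                                    # report the findings
summary.plot(kind="bar", title="Revenue by region")
plt.show()                                        # simple visualization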
Apart from the above-mentioned capabilities, a Data Analyst should also possess skills such
as Statistics, Data Cleaning, Exploratory Data Analysis, and Data Visualization. Also, if you
have knowledge of Machine Learning, then that would make you stand out from the crowd.
2.3 Application of modelling a business & Need for business Modelling
Data analytics is mainly involved in field of business in various concerns for the following
purpose and it varies according to business needs and it is discussed below in detail.
Nowadays majority of the business deals with prediction with large amount of data to work
with.
Using big data as a fundamental factor in decision making requires new capabilities, and most
firms are still far from being able to access all of their data resources. Companies in various
sectors have acquired crucial insights from the structured data collected through different
enterprise systems and analyzed by commercial database management systems. Eg:
1.) Facebook and Twitter are used to gauge the instantaneous influence of a campaign and to
examine consumer opinion about products.
2.) Some companies, such as Amazon, eBay and Google, considered early leaders, examine
the factors that control performance to determine what raises sales revenue and user
interactivity.
2.3.1 Utilizing Hadoop in Big Data Analytics.
Hadoop is an open-source software platform that enables processing of large data sets in a
distributed computing environment. Work in this area discusses concepts related to big data,
the rules for building, organizing and analyzing huge data sets in the business environment,
proposes a three-layer architecture, and indicates graphical tools to explore and represent
unstructured data, showing how well-known companies can improve their business. Eg:
Google, Twitter and Facebook have shown their interest in processing big data within
cloud environments.
Fig 2.4: Working of Hadoop – With Map Reduce Concept
The Map() step: Each worker node applies the Map() function to the local data and writes the
output to a temporary storage space. The Map() code is run exactly once for each K1 key
value, generating output that is organized by key values K2. A master node arranges it so that
for redundant copies of input data only one is processed.
The Shuffle() step: The map output is sent to the reduce processors, which assign the K2 key
value that each processor should work on, and provide that processor with all of the map-
generated data associated with that key value, such that all data belonging to one key are
located on the same worker node.
The Reduce() step: Worker nodes process each group of output data (per key) in parallel,
executing the user-provided Reduce() code; each function is run exactly once for each K2 key
value produced by the map step.
Produce the final output: The Map Reduce system collects all of the reduce outputs and sorts
them by K2 to produce the final outcome.
Fig.2.4 shows the classical “word count problem” using the Map Reduce paradigm. As
shown in Fig.2.4, initially a process will split the data into a subset of chunks that will later
be processed by the mappers. Once the key/values are generated by mappers, a shuffling
process is used to mix (combine) these key values (combining the same keys in the same
worker node). Finally, the reduce functions count the words and generate a common output
as the result of the algorithm. After the mappers and reducers have executed, the output is a
sorted list of word counts from the original text input.
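The word-count flow in Fig 2.4 can be imitated with a short, single-machine Python sketch; this is not Hadoop code, only a simplified illustration of the Map(), Shuffle() and Reduce() steps described above.

# Single-machine sketch of the Map Reduce word-count flow in Fig 2.4.
from collections import defaultdict

def map_step(chunk):
    # Map(): emit (word, 1) pairs; the K2 key is the word itself
    return [(word, 1) for word in chunk.split()]

def shuffle_step(mapped_pairs):
    # Shuffle(): group all values belonging to the same key together
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_step(groups):
    # Reduce(): run once per K2 key, summing the counts
    return {key: sum(values) for key, values in groups.items()}

chunks = ["deer bear river", "car car river", "deer car bear"]   # split input
mapped = [pair for chunk in chunks for pair in map_step(chunk)]
counts = reduce_step(shuffle_step(mapped))
print(sorted(counts.items()))   # final output sorted by key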
2.3.2 The Employment of Big Data Analytics on IBM.
IBM and Microsoft are prominent representatives. IBM offers many big data options that
enable users to store, manage and analyze data through various resources; it performs well in
the business-intelligence and healthcare areas. Compared with IBM, Microsoft has shown
powerful work in the area of cloud computing activities and techniques. Another example is
Facebook and Twitter, which collect various data from users' profiles and use it to increase
their revenue.
2.3.3 The Performance of Data Driven Companies.
Big data analytics and business intelligence are closely related fields that have become widely
significant in both the business and academic areas; companies are constantly trying to draw
insight from the ever-extending three V's (variety, volume and velocity) to support decision
making.
2.4 Databases
A database is an organized collection of structured information, or data, typically stored
electronically in a computer system. A database is usually controlled by a database
management system (DBMS).
Databases can be divided into various categories such as text databases, desktop database
programs, relational database management systems (RDBMS), NoSQL databases and object-
oriented databases.
A text database is a system that maintains a (usually large) text collection and provides fast
and accurate access to it. Eg: textbooks, magazines, journals, manuals, etc.
A desktop database is a database system that is made to run on a single computer or PC.
These simpler solutions for data storage are much more limited and constrained than larger
data center or data warehouse systems, where primitive database software is replaced by
sophisticated hardware and networking setups. Eg: Microsoft Excel, open access, etc.
A relational database (RDB) is a collective set of multiple data sets organized by tables,
records and columns. RDBs establish a well-defined relationship between database tables.
Tables communicate and share information, which facilitates data searchability, organization
and reporting. Eg: SQL, Oracle, DB2, DBaaS, etc.
NoSQL databases are non-tabular and store data differently than relational tables. NoSQL
databases come in a variety of types based on their data model. The main types are document,
key-value, wide-column and graph. Eg: JSON, MongoDB, CouchDB, etc.
Object-oriented databases (OODB) are databases that represent data in the form of objects
and classes. In object-oriented terminology, an object is a real-world entity, and a class is a
collection of objects. Object-oriented databases follow the fundamental principles of object-
oriented programming (OOP). Eg: databases built on C++, Java, C#, Smalltalk, LISP, etc.
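To make the relational versus NoSQL distinction concrete, the short Python sketch below stores the same record both as a relational row (using the built-in sqlite3 module) and as a document-style object of the kind MongoDB holds. The table, field names and values are invented for illustration.

# The same hypothetical customer stored relationally and as a document.
import sqlite3, json

# Relational: fixed columns, one row per record in a table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Chennai')")
print(conn.execute("SELECT * FROM customer").fetchall())

# Document (NoSQL style): a self-describing object; fields can vary per record
customer_doc = {"_id": 1, "name": "Asha", "city": "Chennai",
                "orders": [{"item": "book", "qty": 2}]}   # nested data, no fixed schema
print(json.dumps(customer_doc))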
2.5 Types of Data and Variables
In any database we work with data to perform analysis and prediction. In a relational
database management system we normally use rows to represent records and columns to
represent attributes.
In big data terms, a column from an RDBMS is referred to as an attribute or a variable.
Variables can be divided into two types: categorical (qualitative) data, and continuous or
discrete (quantitative) data, as shown below in figure 2.5.
Qualitative or categorical data is normally represented by a variable that holds characters,
and it is divided into two types: nominal data and ordinal data.
In nominal data there is no natural ordering of the values of the attribute in the dataset. Eg:
colour, gender, nouns (name, place, animal, thing). These categories cannot be given a
predefined order; for example, there is no specific way to arrange the gender of 50 students in
a class. The first student can be male or female, and similarly for all 50 students, so no
ordering is valid.
In ordinal data there is a natural ordering of the values of the attribute in the dataset. Eg: size
(S, M, L, XL, XXL), rating (excellent, good, better, worst). In these examples we can
quantify the data after ordering it, which gives valuable insights into the data.
Fig 2.5: Types of Data Variables
Quantitative data (discrete or continuous data) can be further divided into two types: discrete
attributes and continuous attributes.
A discrete attribute takes only a finite number of numerical values (integers). Eg: number of
buttons, number of days for product delivery, etc. Such data can be represented at specific
intervals in time-series data mining, or as ratio-based entries.
A continuous attribute takes fractional (real-valued) values. Eg: price, discount, height,
weight, length, temperature, speed, etc. Such data can likewise be represented at specific
intervals in time-series data mining, or as ratio-based entries.
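A small pandas sketch of these variable types is given below; the column names and values are invented. The nominal and ordinal columns are stored as categorical data (the ordinal one with an explicit order), while the discrete and continuous columns stay numeric.

# Hypothetical product data illustrating the variable types in Fig 2.5.
import pandas as pd

df = pd.DataFrame({
    "colour":  ["red", "blue", "red"],      # nominal: no natural order
    "size":    ["S", "XL", "M"],            # ordinal: S < M < L < XL
    "buttons": [2, 4, 3],                   # discrete: whole numbers only
    "price":   [199.99, 349.50, 249.00],    # continuous: fractional values
})

df["colour"] = pd.Categorical(df["colour"])                      # unordered categories
df["size"] = pd.Categorical(df["size"],
                            categories=["S", "M", "L", "XL"],
                            ordered=True)                        # ordered categories
print(df.dtypes)
print(df.sort_values("size"))   # ordering is meaningful only for the ordinal column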
2.6 Data Modelling Techniques
Data modelling is the process through which data is structured and stored in a database. Data
modelling is important because it enables organizations to make data-driven decisions and
meet varied business goals.
The entire process of data modelling, however, is not as easy as it seems. You are required to
have a deep understanding of the structure of an organization and then propose a solution
that aligns with its end goals and helps it achieve the desired objectives.
Types of Data Models
Data modeling can be achieved in various ways. However, the basic concept of each of them
remains the same. Let’s have a look at the commonly used data modeling methods:
Hierarchical model
As the name indicates, this data model makes use of hierarchy to structure the data in a tree-
like format as shown in figure 2.6. However, retrieving and accessing data is difficult in a
hierarchical database. This is why it is rarely used now.
Fig 2.6: Hierarchical Model Structure
Relational model
Proposed by an IBM researcher as an alternative to the hierarchical model, this model
represents data in the form of tables. It reduces complexity and provides a clear overview of
the data as shown below in figure 2.7.
Fig 2.7: Relational Model Structure
Network model
The network model is inspired by the hierarchical model. However, unlike the hierarchical
model, this model makes it easier to convey complex relationships as each record can be
linked with multiple parent records as shown in figure 2.8. In this model data can be shared
easily and the computation becomes easier.
Fig 2.8: Network Model Structure
Object-oriented model
This database model consists of a collection of objects, each with its own features and
methods. This type of database model is also called the post-relational database model, as
shown in figure 2.9.
Fig 2.9: Object-Oriented Model Structure
Entity-relationship model
Entity-relationship model, also known as ER model, represents entities and their relationships
in a graphical format. An entity could be anything – a concept, a piece of data, or an object.
Fig 2.10: Entity Relationship Diagram
The entity-relationship diagram explains the relationships between entities, together with
their primary keys and foreign keys, as shown in figure 2.10. It also shows the multiple
instances (cardinality) of the relationships between tables.
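A minimal sketch of such an entity-relationship design, using Python's built-in sqlite3 module, is shown below. The customer and orders tables and their columns are invented: the primary key of customer appears as a foreign key in orders, giving a one-to-many relationship.

# Hypothetical two-table design: one customer can have many orders.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),  -- foreign key
    amount      REAL
);
INSERT INTO customer VALUES (1, 'Asha');
INSERT INTO orders VALUES (10, 1, 450.0), (11, 1, 120.0);
""")

# The primary/foreign key relationship lets us join the two entities back together.
query = """SELECT c.name, o.order_id, o.amount
           FROM customer c JOIN orders o ON c.customer_id = o.customer_id"""
for row in conn.execute(query):
    print(row)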
Now that we have a basic understanding of data modeling, let’s see why it is important.
Importance of Data Modeling
• A clear representation of data makes it easier to analyze the data properly. It provides a
quick overview of the data which can then be used by the developers in varied applications.
• Data modeling represents the data properly in a model. It rules out any chances of data
redundancy and omission. This helps in clear analysis and processing.
• Data modeling improves data quality and enables the concerned stakeholders to make data-
driven decisions.
Since a lot of business processes depend on successful data modeling, it is necessary to adopt
the right data modeling techniques for the best results.
Best Data Modeling Practices to Drive Your Key Business Decisions
Have a clear understanding of your end-goals and results
You will agree with us that the main goal behind data modeling is to equip your business and
contribute to its functioning. As a data modeler, you can achieve this objective only when
you know the needs of your enterprise correctly.
It is essential to make yourself familiar with the varied needs of your business so that you can
prioritize and discard the data depending on the situation.
Key takeaway: Have a clear understanding of your organization’s requirements and organize
your data properly.
Keep it sweet and simple and scale as you grow
Things will be sweet initially, but they can become complex in no time. This is why it is
highly recommended to keep your data models small and simple, to begin with.
Once you are sure of your initial models in terms of accuracy, you can gradually introduce
more datasets. This helps you in two ways. First, you are able to spot any inconsistencies in
the initial stages. Second, you can eliminate them on the go.
Key takeaway: Keep your data models simple. The best data modeling practice here is to use
a tool which can start small and scale up as needed.
Organize your data based on facts, dimensions, filters, and order
You can find answers to most business questions by organizing your data in terms of four
elements – facts, dimensions, filters, and order.
Let’s understand this better with the help of an example. Let’s assume that you run four e-
commerce stores in four different locations of the world. It is the year-end, and you want to
analyze which e-commerce store made the most sales.
In such a scenario, you can organize your data over the last year. The facts will be the overall
sales data of the last year, the dimension will be the store location, the filter will be the last 12
months, and the order will be the top stores in decreasing order of sales (a small pandas
sketch of this follows the key takeaway below).
This way, you can organize all your data properly and position yourself to answer an array of
business intelligence questions without breaking a sweat.
Key takeaway: It is highly recommended to organize your data properly using individual
tables for facts and dimensions to enable quick analysis.
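A small pandas sketch of the store example is given below; the data is invented. The fact is the sales amount, the dimension is the store location, the filter keeps the last 12 months, and the order is descending total sales.

# Hypothetical sales facts for stores in different locations.
import pandas as pd

sales = pd.DataFrame({
    "store":  ["Delhi", "London", "Tokyo", "Delhi", "London", "Sydney"],
    "date":   pd.to_datetime(["2023-02-01", "2023-05-10", "2023-07-15",
                              "2023-09-20", "2023-11-05", "2022-01-30"]),
    "amount": [1200.0, 900.0, 1500.0, 700.0, 1100.0, 400.0],
})

cutoff = pd.Timestamp("2023-12-31") - pd.DateOffset(months=12)
last_year = sales[sales["date"] > cutoff]               # filter: last 12 months
totals = (last_year.groupby("store")["amount"]          # dimension: store location
                   .sum()                               # fact: total sales
                   .sort_values(ascending=False))       # order: top stores first
print(totals)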
Keep as much as is needed
While you might be tempted to keep all the data with you, do not ever fall for this trap!
Although storage is not a problem in this digital age, you might end up taking a toll over your
machines’ performance.
More often than not, just a small yet useful amount of data is enough to answer all the
business-related questions. Spending heavily on hosting enormous amounts of data only leads
to performance issues, sooner or later.
Key takeaway: Have a clear opinion on how much data you want to keep. Maintaining more
than what is actually required wastes your data modelling effort and leads to performance
issues.
Keep crosschecking before continuing
Data modeling is a big project, especially when you are dealing with huge amounts of data.
Thus, you need to be cautious enough. Keep checking your data model before continuing to
the next step.
For example, if you need to choose a primary key to identify each record in the dataset
properly, make sure that you pick the right attribute. Product ID could be one such attribute:
even if two counts match, their product IDs can help you distinguish the records. Keep
checking that you are on the right track. Are the product IDs the same too? In such cases, you
will need to look for another dataset to establish the relationship (a small check of this kind is
sketched after the key takeaway below).
Key takeaway: It is the best practice to maintain one-to-one or one-to-many relationships.
The many-to-many relationship only introduces complexity in the system.
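A quick check of a candidate primary key, as described above, might look like the pandas sketch below; the product_id column and its values are hypothetical.

# Crosscheck a candidate primary key before relying on it.
import pandas as pd

df = pd.DataFrame({"product_id": [101, 102, 103, 103],   # hypothetical records
                   "count":      [5, 5, 7, 9]})

if df["product_id"].is_unique:
    print("product_id can serve as the primary key")
else:
    # duplicated keys mean another attribute (or a composite key) is needed
    print("duplicate keys found:")
    print(df[df["product_id"].duplicated(keep=False)])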
Let them evolve
Data models are never written in stone. As your business evolves, it is essential to adapt your
data modelling accordingly, so keep your models updated over time. The best practice here is
to store your data models in an easy-to-manage repository so that you can make adjustments
on the go.
Key takeaway: Data models become outdated quicker than you expect. It is necessary that
you keep them updated from time to time.
The Wrap Up
Data modeling plays a crucial role in the growth of businesses, especially when you want
your organization to base its decisions on facts and figures. To achieve the varied business
intelligence insights and goals, it is recommended to model your data correctly and use
appropriate tools to ensure the simplicity of the system.
2.7 Missing Imputations
In statistics, imputation is the process of replacing missing data with substituted values.
Because missing data can create problems for analyzing data, imputation is seen as a way
to avoid the pitfalls involved with list-wise deletion of cases that have missing values.
I. Do nothing to the missing data.
II. Fill the missing values in the dataset using the mean or median.
Eg: for the sample dataset given below:
SNO Column 1 Column 2 Column 3
1 3 6 NAN
2 5 10 12
3 6 11 15
4 NAN 12 14
5 6 NAN NAN
6 10 13 16
The missing values can be replaced with the column means (computed over the observed
values) as follows; a short pandas sketch appears after the advantages and disadvantages
below.
SNO Column 1 Column 2 Column 3
1 3 6 14.25
2 5 10 12
3 6 11 15
4 6 12 14
5 6 10.4 14.25
6 10 13 16
Advantages:
• Works well with numerical datasets.
• Very fast and reliable.
Disadvantages:
• Does not work with categorical attributes.
• Does not account for correlations between columns.
• Not very accurate.
• Does not account for any uncertainty in the data.
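A minimal pandas sketch of the column-mean imputation shown in the table above (pandas computes each column mean over the observed values and fills the gaps with it):

# Replace each missing value with the mean of the observed values in its column.
import numpy as np
import pandas as pd

df = pd.DataFrame({"Column 1": [3, 5, 6, np.nan, 6, 10],
                   "Column 2": [6, 10, 11, 12, np.nan, 13],
                   "Column 3": [np.nan, 12, 15, 14, np.nan, 16]})

filled = df.fillna(df.mean())   # df.mean() skips NaNs when computing each column mean
print(filled)

Replacing df.mean() with df.median() gives median imputation in the same way.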
III. Imputation using the most frequent value or a zero/constant value
This can be used for categorical attributes (a short scikit-learn sketch follows the
disadvantages below).
Disadvantages:
• Does not account for correlations between columns.
• Introduces bias into the data.
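A minimal sketch of this approach using scikit-learn's SimpleImputer; the colour values are invented for illustration.

# Fill missing categorical values with the most frequent category.
import numpy as np
from sklearn.impute import SimpleImputer

colours = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)
imputer = SimpleImputer(strategy="most_frequent")   # or strategy="constant" with a fill_value
print(imputer.fit_transform(colours))               # the missing entry becomes "red"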
IV. Imputation using KNN
It first creates a basic mean imputation, then uses the resulting complete data to construct a
KD-tree. It then uses the KD-tree to compute the nearest neighbours (NN), and after finding
the k nearest neighbours it takes their weighted average.
k-nearest neighbours is an algorithm used for simple classification. The algorithm uses
‘feature similarity’ to predict the values of any new data points: a new point is assigned a
value based on how closely it resembles the points in the training set. This can be very useful
for predicting missing values, by finding the k closest neighbours of the observation with
missing data and then imputing the missing entries based on the non-missing values in the
neighbourhood.
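A minimal sketch with scikit-learn's KNNImputer, applied to the small numeric table used earlier; the choice of k = 2 and distance weighting is only illustrative.

# Impute each missing value from the weighted average of its k nearest rows.
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[3.0,    6.0,    np.nan],
              [5.0,    10.0,   12.0],
              [6.0,    11.0,   15.0],
              [np.nan, 12.0,   14.0],
              [6.0,    np.nan, np.nan],
              [10.0,   13.0,   16.0]])

imputer = KNNImputer(n_neighbors=2, weights="distance")
print(imputer.fit_transform(X))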
Advantage:
• This method is more accurate than mean, median or mode imputation.
Disadvantage:
• Sensitive to outliers.