472 Eb

Uploaded by

saurav.sarkar


Chapter-2

Arranging and Collecting Data

Data Collection
Data collection is the process of gathering data, using standard, validated techniques, so that it
can be analyzed to produce reliable insights. Researchers and scientists base their work on the
collected data, so data collection is a primary and essential step in most studies. The approach to
data collection differs from field to field.

For example, if we survey the temperature of many cities worldwide on the same day, the first
important step is to collect temperature data from those cities. Let us assume we have
recorded the temperature in six cities at the same time. The temperature data collected is as follows.

Now, if we represent this data in a bar chart, it will look like below.
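As a rough sketch of how such a chart can be produced, the short Python program below draws a text bar chart. The city names and temperature values are hypothetical placeholders chosen only for illustration; they are not the readings from this chapter. The function name text_bar_chart is also an assumption.

```python
# Hypothetical readings in degrees Celsius for six cities.
# These are illustrative placeholders, not the chapter's data.
temperatures = {
    "City A": 31,
    "City B": 24,
    "City C": 18,
    "City D": 27,
    "City E": 12,
    "City F": 35,
}


def text_bar_chart(data, symbol="#"):
    """Return one line per entry: label, a bar of symbols, and the value."""
    width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = symbol * max(value, 0)  # negative values get an empty bar
        lines.append(f"{label.ljust(width)} | {bar} {value}")
    return lines


chart = text_bar_chart(temperatures)
print("\n".join(chart))
```

Each city becomes one row, and the length of the bar is proportional to the temperature, which is the same idea a bar chart expresses graphically.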

Variables
A variable is an attribute of the object of study that can take different values in different cases.
In the earlier example of surveying the temperature of many cities worldwide on the same day,
the variables are "Temperature" and "City" because both attributes vary from case to case.
Types of Variables
Variables can be of two types -

1) Numerical variable
These variables take numeric values. For example, age, weight, height.
2) Categorical variable
These variables take values that are names or labels. For example, name, nationality, sport, etc.
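The distinction can be sketched in Python as follows. This is a minimal illustration, assuming a variable counts as numerical only when every recorded value is a number; the helper name variable_type and the sample values are hypothetical.

```python
def variable_type(values):
    """Return 'numerical' if every value is a number, else 'categorical'."""
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "numerical" if is_number else "categorical"


# Hypothetical survey columns, for illustration only.
survey = {
    "City": ["City A", "City B", "City C"],
    "Temperature": [31, 24, 18],
}

types = {name: variable_type(column) for name, column in survey.items()}
print(types)
```

Here "City" holds labels and is reported as categorical, while "Temperature" holds numbers and is reported as numerical.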

Types of data
Data can be broadly divided into two categories -
1) Quantitative Data
Quantitative data are numbers or values that can be measured or counted. For example, the number of
times a product has been searched for on the internet or the number of items sold per month.
Since these data can be quantified, they are comparatively easy to analyze.
2) Qualitative Data
Qualitative data, on the other hand, are descriptive and subjective. For example, a traveler's review
of a hotel or the customer service feedback given by a consumer after a telephone conversation.
These data help us understand experiences in depth.
Sources of data
Data sources can be classified into primary and secondary sources.
1) Primary
Primary sources are created specifically to collect data for analysis, for example, surveys,
interviews, questionnaires, feedback forms, etc.
Some methods of collecting primary data are:
a) Physical interviews b) Online surveys c) Feedback forms

2) Secondary
At times data has already been recorded for some other purpose and is later re-used for analysis.
These are secondary data sources. They include internal transactional databases, sensor data, etc.
Some methods of collecting secondary data are:
a) Social media data tracking b) Web traffic tracking c) Satellite data tracking
Big Data
When data volumes exceed the processing capacity of traditional databases, the data is called Big
Data. Big Data techniques are widely used in different sectors, for example retail, science, sports,
social media and health care.
Big Data systems
Millions of users use online platforms and create an enormous amount of content every minute.
Processing this vast amount of data requires specialized skills and systems. Systems capable of
extracting statistical insights from such huge amounts of data are called Big Data systems.
Some of the key characteristics that define Big Data:
1) Volume - This refers to the size of the data. Usually, data sets measured in terabytes or
petabytes and beyond are called Big Data.
2) Variety - Big Data sets are generally collected from a wide range of sources, including
transactional databases, sensor data, etc. They could include images, audio, video, etc.
So, variety of data is an essential characteristic of Big Data.
3) Velocity - This refers to the rate at which data is generated. Big Data is generally created
at a rapid speed, resulting in high volumes very quickly. For example, social media platforms
generate a massive amount of data every minute.
Big Data techniques are widely used in different sectors. Let us see some of them:
a) Retail - Popular retail chains are spread across the world. They handle millions of customers every
minute. They store and analyze their customer data and transactions in Big Data systems.
b) Science - The NASA Center for Climate Simulation (NCCS) stores 32 petabytes of climate
observations and simulations on the Discover supercomputing cluster.
c) Sports - In Formula One races, cars fitted with hundreds of sensors produce terabytes of data. These
sensors gather data points ranging from tire pressure to fuel burn efficiency. Based on the data,
analysts and engineers decide whether modifications should be made to get the best outcome in the race.
Moreover, using simulations based on data collected over the season, race teams try to predict
their finishing time before the race.

d) Social media - Most popular social media platforms store and analyze petabytes of data every day.
They use Big Data techniques for storage and analysis.
e) Healthcare - During the COVID-19 pandemic, different governments used Big Data to track the
locations of infected people in order to reduce the spread. It was also used for case identification
and medical treatment.
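To get a feel for the volumes mentioned above, the following Python sketch converts between storage units. It assumes decimal (SI) units, where each step up is a factor of 1000; binary units (factors of 1024) would give slightly different figures. The function name convert is hypothetical.

```python
# Decimal (SI) units are assumed: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def convert(value, from_unit, to_unit):
    """Convert a storage size between two units from the UNITS list."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps


# The 32 petabytes quoted for NCCS, expressed in smaller units.
nccs_in_terabytes = convert(32, "PB", "TB")
nccs_in_bytes = convert(32, "PB", "B")
print(nccs_in_terabytes, "TB")
print(nccs_in_bytes, "bytes")
```

Under this assumption, 32 petabytes is 32,000 terabytes.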
How we can interpret the data
Data is typically stored as numbers (numeric) or labels (categories). Depending on the type of data,
there are five simple kinds of questions we can ask of it, each answered by a different family of
algorithms.

1) Binary classification or two-class classification algorithms - used when a question has two
possible answers. For example,
Q1: Will a customer buy this product?
A: Yes/No
Q2: Can India win this cricket match?
A: Yes/No

Similarly, if a question has more than two possible answers, we use a multiclass
classification algorithm.
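A Yes/No question like Q1 can be sketched in Python as below. This is only one possible approach, a nearest-neighbour rule, and not a method prescribed by this chapter. The training records (number of visits to a product page and whether the customer bought) are hypothetical, as is the function name predict_purchase.

```python
# Hypothetical training records: (visits to the product page, bought?).
training = [
    (1, "No"), (2, "No"), (3, "No"),
    (8, "Yes"), (9, "Yes"), (12, "Yes"),
]


def predict_purchase(visits):
    """Answer Yes/No by copying the label of the closest training record."""
    nearest = min(training, key=lambda record: abs(record[0] - visits))
    return nearest[1]


print(predict_purchase(10))  # closest record has 9 visits
print(predict_purchase(2))   # closest record has 2 visits
```

The output is always one of two labels, which is what makes this a binary classification problem.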

2) Anomaly detection algorithms - used to find unexpected records in a set of mostly consistent data

For example:
1. If a transaction is made from your bank account that does not match your regular
transactions, it could be a case of fraud. Banks track these records and alert
the customer that an unexpected transaction has happened, protecting the customer's
money.
2. Your father is getting his blood pressure checked. Is the reading normal?
3. You are checking your car's tyre pressure. Is the reading normal?
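The bank transaction example can be sketched in Python as follows. The sketch flags values that lie far from the median, measured in units of the median absolute deviation (MAD); this is one common rule of thumb, not how any particular bank works. The transaction amounts, the threshold of 3.5 and the function name find_anomalies are assumptions made for illustration.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return the values whose MAD-based score exceeds the threshold."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # More than half the values are identical, so this simple
        # sketch cannot score deviations and flags nothing.
        return []
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical transaction amounts; one is far larger than the rest.
transactions = [40, 55, 48, 52, 45, 50, 900]
unusual = find_anomalies(transactions)
print(unusual)
```

Most amounts sit close to the median of 50, so only the transaction of 900 is reported as unexpected.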

3) Regression algorithms - used to predict numerical values based on the data

For example –
Q1: How many goals will your favorite team score in this football match?
A: 3
Q2: What will be the temperature of your city next Friday?
A: 32°C
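A question like Q2 can be sketched with simple linear regression, which fits a straight line through past readings by least squares and extends it to a future day. The daily temperatures below are hypothetical, and the function name fit_line is an assumption; real forecasts use far more data and more sophisticated models.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical temperatures (degrees Celsius) recorded on days 1 to 5.
days = [1, 2, 3, 4, 5]
temps = [30.0, 30.5, 31.0, 31.5, 32.0]

slope, intercept = fit_line(days, temps)
prediction_day_6 = slope * 6 + intercept
print(prediction_day_6)
```

Unlike classification, the answer here is a number on a continuous scale rather than a label.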
4) Clustering algorithms - used when data can be separated into distinct groups.
For example, consider a class of 60 students. We have recorded their heights and arranged them
in a table.
As you can see, students can be categorized into groups based on their height.

Similarly, if we plot these in a chart, it will look like above.
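Grouping by height can be sketched with k-means, a widely used clustering algorithm, applied here to one-dimensional data. This is an illustrative choice and not necessarily the method behind the chapter's table. The heights are hypothetical, the starting centres are simply spread evenly between the smallest and largest value, and the function name kmeans_1d is an assumption.

```python
def kmeans_1d(values, k, iterations=10):
    """Group numbers into k clusters (k must be at least 2)."""
    lo, hi = min(values), max(values)
    # Start with k centres spread evenly between the smallest and largest value.
    centres = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return groups


# Hypothetical heights in centimetres.
heights = [140, 142, 145, 150, 152, 155, 160, 165, 168]
groups = kmeans_1d(heights, k=3)
print(groups)
```

No labels are given in advance; the groups emerge only from how close the heights are to one another.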

5) Reinforcement learning - These are questions about what action to take next, typically asked
by a machine or robot. The machine tries actions by trial and error and learns from the
outcome. This type of learning is called reinforcement learning.
Consider the following questions:
Q1: I am a self-driving car. I am at a traffic signal with a red light. What should I do now?
A: Brake
Q2: I am a microwave oven. I have already heated the food for the set time. What should I do
now?
A: Stop
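The trial-and-error idea behind Q1 can be sketched in Python as below. This is a heavily simplified, one-step version of reinforcement learning: the machine tries every action in every situation, receives a reward, and gradually raises its estimate for actions that pay off. The reward values, learning rate and function name learn_policy are all hypothetical.

```python
# Hypothetical rewards: +1 for the safe action, -1 for the unsafe one.
REWARDS = {
    ("red", "brake"): 1, ("red", "go"): -1,
    ("green", "go"): 1, ("green", "brake"): -1,
}
STATES = ["red", "green"]
ACTIONS = ["brake", "go"]


def learn_policy(episodes=20, learning_rate=0.5):
    """Learn the best action for each traffic light colour by trial and error."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        for s in STATES:
            for a in ACTIONS:  # try every action in every state
                reward = REWARDS[(s, a)]
                # Nudge the estimate towards the reward just received.
                q[(s, a)] += learning_rate * (reward - q[(s, a)])
    # For each state, pick the action with the highest learned estimate.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


policy = learn_policy()
print(policy)
```

The machine is never told the correct answer directly; it works it out from the rewards it receives.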
Univariate data:
This type of data has only one variable. It does not involve multiple parameters or
relationships. For example, the heights of students are univariate data.
Multivariate data:

This type of data involves a relationship between multiple variables. For example, sales of
umbrellas increase during the rainy season, so umbrella sales depend on rainfall.
There are two variables here: "rainfall" and "sales." This type of data is more complex than
univariate data because it involves comparisons and relationships between multiple parameters.
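The relationship between two variables can be measured with the Pearson correlation coefficient, which ranges from -1 to 1. The Python sketch below computes it for rainfall and umbrella sales. The monthly figures are hypothetical and the function name pearson_correlation is an assumption.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient between two lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical monthly figures: rainfall in millimetres, umbrellas sold.
rainfall = [10, 20, 30, 40, 50]
sales = [12, 25, 33, 48, 52]

r = pearson_correlation(rainfall, sales)
print(round(r, 3))
```

A value close to 1 indicates that sales rise as rainfall rises, which matches the umbrella example. A single list such as student heights has no second variable to compare against, which is why it is univariate.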
