• Data Security, Privacy and AI
• Best Practices for Cyber Security
1.1 INTRODUCTION
The modern world is the world of technology, AI and data. In fact, data rules the world. In this data-ruled world, it becomes important for everyone to understand and befriend data. That is where 'data literacy' comes into the picture. In this session, you will learn about data literacy, its impact, importance and role in AI.
1.2 WHAT IS DATA LITERACY ?
In today's digital world, data is everywhere, and it's essential to know how to work with it. Data literacy is the ability to understand, analyze, and communicate with data effectively. Data literacy is important because it helps you think critically, make better choices, and understand the world of data.
Data Literacy
Data Literacy is the ability to understand, analyze, and communicate with data effectively.
Data literacy involves these three things (Fig. 1.1) :
1. Reading Data. It refers to the ability to read and analyse data, which may be available in various formats such as numbers, charts etc.
2. Working with Data. It refers to the process of collecting and managing data; checking facts and spotting misleading information in it; storing and transmitting it in appropriate formats.
3. Communicating with Data. It refers to the process of understanding and interpreting data; spotting trends and patterns in the data; tabulating, reporting and presenting data in different and diverse formats.
Figure 1.1 Data Literacy Components (Reading Data : theory and analysis; Working with Data : collection and management; Communicating with Data : reporting)
Data literacy is one of the most important skills needed today. It opens many doors.
The impact and importance of data literacy can be summarized through the following benefits it provides.
1. Spot Data Trends. Data literacy helps us analyze situations, spot trends, and predict outcomes. For instance, by examining historical data and spotting the data trends, we can understand the causes of significant events like economic recessions.
2. Foster Critical Thinking. With data literacy, instead of accepting information at face value, we learn to question its source, reliability, and potential biases. This saves us from misinformation.
3. Make Informed Decisions. With data skills, we can gather and analyse relevant information. This helps us make choices based on facts, and not on assumptions or guesses.
4. Communicate Effectively. Data literacy empowers us with factual power. We can use data to support our arguments and ideas. With the help of data visualization, we can present information more clearly.
5. Solve Complex Problems. Many modern problems involve large amounts of data. Data literacy makes it easier to break down bulk data, spot patterns in it, identify trends and solve challenging problems.
Overall, data literacy empowers us to navigate our data-driven world confidently.
1.4 HOW TO BECOME DATA LITERATE
Becoming proficient in data literacy involves following a structured process called the Data Literacy Process Framework. It involves majorly the following three steps :
Step 1 : Identify
In this step, all the data & information collection happens, whether it's numbers, words, or pictures, e.g., if we're studying the weather, the data might include temperature readings, rainfall amounts, wind speeds and so on.
So, to become data literate, as part of Identify step, you need to learn to :
• Understand data types (numerical, categorical, etc.).
• Get familiar with common data visualizations. Data visualisation is the graphical or visual representation of information/data, such as graphs or charts, e.g., bar charts, scatter plots etc.
• Practice reading and interpreting data from various sources.
[…] studied to uncover […] analysing sales […] need improvements, i.e., […]
So, to become data literate, as part of Analyse step, you need to learn to :
• […] data from different sources. Question the data source […]
• […] removing redundancies […] information and handle them.
• […] and errors. Look for potential biases.
• Explore & Visualize : examine patterns and create visualizations of the data.
[…] curated and visualised data goes through the lens of questions such as "[…] patterns tell us?", "what can we learn from this data?" For instance, if you are […] test scores, you might interpret the data to identify areas where students need extra help.
So, to become data literate, as part of Interpret step, you need to learn to :
• Determine what insights can be drawn from data.
• Share findings about the data, such as how can the information be applied in real-life scenarios ?
• Consider alternative interpretations or explanations.
Remember, practice makes perfect! The more you engage with data, the more proficient you'll become at interpreting and leveraging its insights.
Figure 1.2 Data Literacy Process Framework
With data all around us and data governing the decisions affecting our lives, data security and data privacy have become super important. These are twin pillars of safeguarding our […]
[…] anyone without the decryption key.
Thus, data security involves :
• Protecting data from unauthorized access, theft, or damage.
• Taking measures like encryption, access controls, firewalls, and network security to prevent data breaches.
Data Security
Data Security refers to taking steps for safeguarding data and information from unauthorized access, theft, or alteration, and involves preventive measures to stop data breaches.
1.5.2 Data Privacy
Data privacy refers to controlling the access to our personal information and its use. It's like setting boundaries to protect our digital identity from exploitation or misuse. Consider the data you share on social media platforms. By adjusting privacy settings, you can control who sees your posts, photos, and personal details, ensuring they're only accessible to trusted individuals.
Data Privacy involves :
• Concerns for safeguarding personal or sensitive information from misuse.
• Practices like data minimization and user consent.
• Control over how one's personal data and digital footprint are collected and used.
Data Privacy
Data Privacy refers to controlling the access to our personal information and its use.
1.5.3 Data Security, Privacy and AI
AI systems need vast amounts of data to learn and make decisions. Whether it's training data for machine learning algorithms or personal information used for targeted advertising, maintaining robust data security and privacy measures is crucial to prevent unauthorized access or misuse of data.
So, we can say that the relationship between AI and data security/privacy is complex (Fig. 1.3).
AI raises Data Privacy concerns
AI's requirement and use of data raises concerns about data privacy :
• AI systems require large amounts of data for training and operation; data may include personal or sensitive information.
• […] raises privacy concerns about how this data is collected and used.
• […] systems may have vulnerabilities that can be exploited.
AI provides enhanced Data Security
But AI can also enhance security measures for data :
• Machine learning models can detect cyber threats or anomalies.
• AI can help identify and protect against data breaches.
[…] about individual rights and AI […] governance and ethical AI principles […] impact on security and privacy landscapes
Figure 1.3 Relationship between AI and Data Security/Privacy.
Data security and privacy have a mixed relation with AI. Data used to train AI should be secure and from authentic sources. Compromised data can impact the performance of AI. At the same time, AI can play an important role in detecting and protecting against data breaches.
While AI presents new challenges, there are established best practices for protecting data security and privacy. Here are some best practices for cyber security :
1. Use Strong Passwords
Use unique, complex passwords for each of your accounts. Avoid using easily guessable […]
[…] It is important to take backup of your work and data regularly. Always maintain offline backups of important data. This protects against ransomware or data loss incidents.
5. Use Encryption for Stored Sensitive Data
Always encrypt sensitive data stored on devices or in transit.
6. Visit Secure Sites and Use Secure Connections
While browsing online, always make sure to visit secure sites over secure connections. Secure sites' URLs start with HTTPS and have a padlock sign on the status bar to depict it. Also, make sure not to use free public Wi-Fi unless you are sure about its security measures. Preferably use VPNs when possible.
7. Limit Access and Sharing of Sensitive Data
Only share personal or confidential information when necessary. Always make sure to restrict access to data on a "need-to-know" basis.
8. Educate Yourself and Others
Always stay updated on the latest cyber threats and best practices. It is equally important to raise awareness about data security and privacy for ourselves and our community.
By following these guidelines and prioritizing data security/privacy, one can protect oneself and one's information in our increasingly digital world.
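Two of the practices above, strong passwords and secure (HTTPS) sites, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names `generate_password` and `uses_https` are hypothetical, the 16-character default length is an arbitrary choice, and checking the URL scheme only confirms that an address claims to use HTTPS, not that the site itself is trustworthy.

```python
import secrets
import string
from urllib.parse import urlparse


def generate_password(length: int = 16) -> str:
    """Return a random password mixing letters, digits, and symbols (hypothetical helper)."""
    if length < 4:
        raise ValueError("length must be at least 4 to fit all character classes")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        # secrets.choice uses a cryptographically strong random source.
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Accept only candidates that contain every character class.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate


def uses_https(url: str) -> bool:
    """Return True when the URL's scheme is HTTPS (hypothetical helper)."""
    return urlparse(url).scheme.lower() == "https"


print(uses_https("https://example.com"))  # True
print(uses_https("http://example.com"))   # False
```

A password produced this way is hard to guess, but it still needs to be different for every account, as the first practice recommends.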
With this we have come to the end of this session. Let us quickly revise what we have
learnt so far.
Check Point
Multiple Choice Questions
1. Data literacy refers to the ability to
(a) Write computer code (b) Understand and analyze data
(c) Create databases (d) Operate software programs
2. Which of the following is NOT an example of how data literacy can be beneficial ?
(a) Fact-checking claims and spotting misinformation
(b) Understanding trends and patterns in your community
(c) Making decisions about your health based on data
(d) Writing computer programs
[…] in the Data Literacy Process Framework is :
[…] Analyze and Communicate […] Analyzing data for insights
Which of the following is a benefit of data literacy ?
(a) Helps in making informed decisions
(b) Enhances critical thinking skills
(c) Enables effective communication with data
(d) All of these
7. Which of the following is NOT a step in the Data Literacy Process Framework ?
(a) Import and Tidy Data (b) Explore and Visualize
(c) Analyze and Communicate (d) Build Machine Learning Models
8. What does it mean to "tidy" data ?
(a) Remove any errors or inconsistencies
(b) Organize data in a structured format
(c) Visualize data in charts or graphs
9. Which of the following is a measure for ensuring data security ?
(a) Obtaining user consent (b) Data minimization
(c) Encryption (d) Anonymization
10. Data privacy aims to :
(a) Prevent unauthorized access to data
(b) Give individuals control over their personal data
(c) Protect data from being lost or damaged
(d) Detect and respond to cyber threats
11. Which of the following is NOT a potential security risk associated with AI systems ?
(a) Vulnerabilities that can be exploited (b) Manipulated data
(c) Challenges to existing privacy regulations
(d) Improved detection of cyber threats
12. What does "Privacy by Design" refer to ?
(a) Developing AI systems with privacy in mind from the start
(b) Implementing strict access controls for sensitive data
(c) Obtaining explicit consent from users for data collection
(d) Regularly updating privacy policies and terms of service
13. Which of the following is NOT a best practice for cyber security ?
(a) Using strong passwords and multi-factor authentication
(b) Keeping software and systems updated with latest security patches
(c) Being cautious with emails and links from unknown sources
(d) […] information freely over unsecured channels
[…] (a) Interpret data and draw insights
[…] (a) Unencrypted email (b) Public Wi-Fi network
16. What is the purpose of limiting access to sensitive data ?
(a) To enhance data security
(b) To comply with privacy regulations
(c) To facilitate data sharing and collaboration
(d) Both (a) and (b)
17. Which of the following is NOT a way to protect data privacy ?
(a) Using data minimization techniques
(b) Obtaining user consent for data collection
(c) Implementing strong encryption measures
(d) Sharing personal data freely with third parties
18. What does it mean to "fact-check" information using data literacy skills ?
(a) Verifying the accuracy of claims or information using data
(b) Creating visualizations to present data effectively
(c) Organizing and cleaning data before analysis
(d) Applying statistical techniques to analyze data
19. Which of the following is NOT a critical question to ask when practicing data literacy ?
(a) What is the source of the data ?
(b) Are there any potential biases or errors in the data ?
(c) How much is the storage cost of data ?
(d) Are there alternative interpretations or explanations for the data ?
20. What does it mean to "explore" data in the Data Literacy Process Framework ?
(a) Gather and organize data from various sources
(b) Examine patterns and trends within the data
(c) Draw insights and communicate findings from the data
(d) Validate the accuracy and completeness of the data
21. Which of the following is NOT a potential risk associated with data breaches ?
(a) Financial losses (b) Reputational damage
(c) Legal consequences (d) Improved cybersecurity measures
22. What is the purpose of maintaining offline backups of important data ?
(a) To protect against ransomware or data loss incidents
[…] ____ with data effectively.
[…] helps you ____ […] and spot misleading information.
[…] Framework guides you through working with data effectively.
[…] concerns safeguarding ____ from misuse.
[…] involves measures like encryption, access controls, and network security.
[…] goal of data literacy is to enable ____ decision-making.
[…] and ____ are two key steps in the Data Literacy Process Framework.
[…] helps prevent ____.
"[…] by Design" refers to developing AI systems with security and privacy in mind from […]
[…] ____ passwords is one of the best practices for cyber security.
Installing ____ is important for maintaining system security.
Be cautious with emails and links from ____ sources.
[…] ____ to protect sensitive data stored on devices or in transit.
Connect to ____ websites and use ____ when possible for secure connections.
Stay updated on the latest ____ and best practices for cyber security.
____ about data security and privacy in your community.
Data literacy empowers you to navigate our ____ world confidently.
True / False
1. Data literacy is only important for data scientists and analysts.
2. Developing data literacy is a one-time process.
3. Data literacy can help you make better decisions based on facts.
4. Data security and data privacy are the same thing.
AI systems do not require any personal or sensitive data.
Data literacy only involves reading and interpreting charts and graphs.
[…] data literate can help you make better decisions based on data.
[…] Literacy Process Framework does not include any steps related to data visualization.
[…] security and data privacy are completely unrelated concepts.
[…] do not require any data for training or operation.
[…] systems do not pose any challenges to existing data privacy regulations.
[…] passwords is not an important practice for cyber security.
[…] information over unsecured email or websites.
[…] backups of data can protect against ransomware attacks.
[…] personal data freely with third parties.
[…] claims using data is an example of applying data literacy skills.
[…] potential biases or errors in data is not a critical question in data literacy.
[…] involves examining patterns and trends within the […]
[…] can only result in financial losses, but not legal or reputational consequences.
[…] oneself and others about cyber security is important for increasing awareness.
Competency Based Questions
1. Your local community organization is planning a fundraising event, and they have collected data on past attendance, donation amounts, and demographics of supporters. Apply steps to analyze this data and provide actionable insights to the organization. Your analysis should help them determine the optimal event format, target audience, and marketing strategies to maximize participation and fundraising efforts. How would you accomplish this ?
(a) Ignore the data and plan the event based on personal preferences.
(b) Analyze the data but provide vague or irrelevant insights.
(c) Apply the Data Literacy Process Framework, analyze the data, and provide actionable insights for event planning.
(d) Outsource the data analysis to a third-party without providing any guidance.
2. Investigate the use of AI in various industries (e.g., healthcare, finance, transportation). What would be the best approach to use this report ?
(a) Ignore the potential concerns and focus only on the benefits of AI.
(b) Investigate the use of AI but fail to provide any guidelines or recommendations.
(c) Develop a set of guidelines or recommendations without investigating the use of AI or potential concerns.
(d) Investigate the use of AI, identify potential concerns, and develop guidelines or recommendations to address them.
3. Examine a real-world dataset related to a topic of your interest (e.g., environmental issues, social trends, global health). Which of the following approaches would you follow to present your analysis of the same, highlighting the importance of critical questioning and data literacy skills ?
(a) Examine the dataset without any critical analysis or alternative interpretations.
(b) Propose alternative interpretations without examining or analyzing the dataset.
(c) Critically analyze the data, identify biases/limitations, propose alternative interpretations, and present findings.
(d) Ignore the dataset and present personal opinions on the topic.
4. To investigate the role of data literacy in combating misinformation and fake news, which of the following approaches could have been used to identify and counter the spread of misinformation ?
[…] Investigate the role of data literacy without any real-world case study
[…] Analyze a real-world case study […] general way
[…] Investigate the role of data literacy, analyze a real-world case study, and present findings and […]
1. Assertion (A). Data literacy involves understanding and analyzing various types of data, […] graphs, and charts.
Reason (R). Data literacy is the ability to read and interpret data in only numerical formats.
2. Assertion (A). The Data Literacy Process Framework is a linear process with no interconnected […]
Reason (R). The Data Literacy Process Framework includes steps for exploring, visualizing […] communicating data.
3. Assertion (A). Data security measures, such as encryption and access controls, help prevent unauthorized access to data.
Reason (R). Data security and data privacy are interchangeable terms that refer to the same concept.
4. Assertion (A). AI systems can enhance certain security measures by detecting cyber threats […] anomalies.
Reason (R). AI systems themselves cannot introduce any security risks or vulnerabilities.
5. Assertion (A). Implementing "Privacy by Design" principles involves developing AI systems with privacy considerations from the outset.
Reason (R). Privacy by Design is a concept that is useful in establishing robust data security and privacy protection controls.
6. Assertion (A). Using strong passwords and enabling multi-factor authentication are best practices for cyber security.
Reason (R). Regularly updating software and systems with the latest security […]
7. Assertion (A). Educating oneself and others about cyber threats and best practices can […] awareness and promote better security practices.
Reason (R). Raising awareness about data security and privacy is only relevant for IT professionals and not the general public.
• Data literacy is the ability to understand, analyze, and communicate with data effectively.
• It involves three things : Reading Data, Working with Data, and Communicating with Data.
• The benefits of data literacy include : spotting data trends, fostering critical thinking, helping in […] communication, and helping in solving complex problems.
• […] a structured process called the Data Literacy Process Framework.
• Data security refers to taking steps for safeguarding data and information from unauthorized access, theft, or alteration, and involves preventive measures to stop data breaches.
• Data privacy refers to controlling the access to our personal information and its use.
• Data security and privacy have a mixed relation with AI. Data used to train AI should be secure and from authentic sources. Compromised data can impact the performance of AI. At the same time, AI can play an important role in detecting and protecting against data breaches.
• Best Practices for Cyber Security include : using strong passwords, keeping systems and software updated, being cautious with emails and links, backing up data regularly, using encryption for stored sensitive data, visiting secure sites and using secure connections, limiting access and sharing of data, and educating oneself about the latest threats and protection measures.
Spun Time
1. What is data literacy ?
Ans. Data literacy is the ability to understand, analyze, and communicate with data effectively.
2. Why is data literacy important in today's world ?
Ans. Data literacy is important because it helps individuals think critically, make better decisions based on facts, communicate effectively using data, and solve complex problems involving large amounts of data.
3. Describe the three main steps of the Data Literacy Process Framework.
Ans. The three main steps are : (i) Import and Tidy Data (gather and organize data), (ii) Explore and Visualize (examine patterns and create visualizations), (iii) Analyze and Communicate (draw insights and share findings).
4. Differentiate between data security and data privacy.
Ans. Data security refers to protecting data from unauthorized access, theft, or damage. Data privacy concerns safeguarding personal or sensitive information from misuse and giving individuals control over how their data is collected and used.
5. How can AI enhance data security measures ?
Ans. AI can enhance security measures by using machine learning models to detect cyber threats, anomalies, or potential data breaches.
6. What are some potential security risks associated with AI systems ?
Ans. AI systems may have vulnerabilities that can be exploited, be susceptible to adversarial attacks with manipulated data, and challenge existing data privacy regulations.
7. What is the purpose of "Privacy by Design" ?
Ans. Privacy by Design refers to developing AI systems with privacy considerations and protections in mind from the very beginning, rather than as an afterthought.
8. Why is it important to use strong passwords and enable multi-factor authentication ?
Ans. Using strong, unique passwords and enabling multi-factor authentication […] security as they help prevent unauthorized access to […]
9. […] keeping software and systems updated with the latest security patches ?
Ans. […] installing software updates and security patches helps address vulnerabilities and protect against potential cyber threats or attacks.
10. […] of data encryption in ensuring data security ?
Ans. Encryption is used to prevent unauthorized access to sensitive data by […] can only be […] with the correct decryption key.
11. Why is it recommended to limit access to sensitive data on a "need-to-know" basis ?
Ans. Limiting access to sensitive data on a need-to-know basis enhances data security and […] comply with data privacy regulations by restricting access to only those who require it for their work.
12. How can educating oneself and others about cyber security promote better practices ?
Ans. Educating oneself and others about cyber threats, best practices, and the importance of data security and privacy can increase awareness and encourage the adoption of safer practices.
[…] analyze, and communicate with data effectively.
[…] to our personal information and its use.
[…] safeguarding data and information from unauthorized access.
[…] representation of information and data.
1. Which of the following is NOT a benefit of data literacy ?
(a) Helps in making evidence-based decisions
(b) Enhances problem-solving skills
(c) Enables effective storytelling with data
(d) Improves artistic abilities
2. What is the first step in the Data Literacy Process Framework ?
(a) Explore and Visualize
(b) Import and Tidy Data
(c) Analyze and Communicate
(d) Interpret and Conclude
3. Data security aims to protect data from :
(a) Being accessed by authorized individuals
[…]
[…] empowers individuals to make ____ based on facts and data. (decisions)
[…] the Data Literacy Process Framework involves organizing and structuring […] (F)
[…] security and data privacy are interchangeable terms that refer to the same concept. (F)
[…] do not require any data for training or operation. (F)
[…] strong encryption measures is a best practice for protecting data privacy. (T)
Privacy by Design principles focus solely on implementing strict access controls for sensitive data. (F)
[…] unique and complex passwords for all accounts is not an important cyber security practice. (F)
[…] public Wi-Fi networks for convenience. (F)
[…] literate and how it can impact various aspects of life.
Describe the process of applying the Data Literacy Process Framework to analyze a dataset.
Discuss the relationship between AI and data privacy concerns, providing specific examples.
Discuss two potential issues related to AI's relation with data security and privacy concerns.
What measures can individuals and organizations take to enhance data security ?
Why is it important to be cautious with emails and links from unknown sources ?
Describe the purpose and importance of maintaining offline backups of important data.
How can data literacy skills help individuals fact-check claims and spot misinformation ?
Discuss the role of critical questioning in developing data literacy.
Explain the potential consequences of data breaches for individuals and organizations.
How can education and awareness-raising efforts contribute to better cyber security practices ?
Describe the steps you would take to protect your personal data privacy online.
• Types of Data
• Data Acquisition
• Best Practices for Acquiring Data
• Features of Data and Data Preprocessing
2.1 INTRODUCTION
You learnt in the previous session that today's world is a data-driven world. […] the types of data is essential for making informed decisions and driving progress. From the moment data is generated, to its interpretation and application, every step in the data journey plays a crucial role.
In this session, we shall talk about various types of data, data acquisition, data processing and the importance of effective data interpretation.
2.2 TYPES OF DATA
Data comes in many forms, each serving a distinct purpose in analysis and decision-making. Although data is available in multiple forms, broadly it can be divided into two primary types of data.
1. Qualitative Data
Qualitative data provides insights into the characteristics, attributes, and qualities of phenomena. Qualitative data is often subjective and descriptive.
For instance, following are all examples of qualitative data :
• Customer feedback, such as comments, reviews, and testimonials, provides qualitative insights into customer satisfaction, preferences, and experiences.
• Interview transcripts, such as conversations with individuals or focus groups, yield qualitative data, offering perspectives, opinions, and personal stories.
• Social media sentiment, such as posts, comments, and discussions on social media platforms, reveals qualitative insights into public opinion, trends, and sentiment.
Qualitative data describes qualities or characteristics of phenomena.
[…]
For instance, following are all examples of quantitative data :
• Sales data, such as transactional records, revenue figures, and purchase history, provides quantitative insights into sales performance, trends, and patterns.
• Sensor readings, such as measurements of temperature, pressure, and humidity collected by sensors for monitoring environmental conditions.
• Financial metrics, such as stock prices, market indices, and financial statements, furnish quantitative insights into economic indicators, investment performance, and […] financial health.
Qualitative data is also known as Categorical data and Quantitative data is known as Numerical data. You have already read about these in section 4.2.1 page 119-120.
Quantitative & Qualitative Data
Quantitative Data is data involving numbers and measuring variables and Qualitative Data is data involving descriptive, non-numerical information.
Qualitative Data
• It cannot be measured or […]
• […] opinions, feelings, descriptions […]
• It is collected through observations, interviews, […] questions.
Quantitative Data
• It represents quantities or numeric values.
• It can be measured and expressed numerically, e.g., counts, measurements, scores, ratings.
• It is collected through structured data collection methods like surveys, sensors.
• It is objective and statistical analysis is possible.
• It provides precise, measurable, and testable data.
• It is analysed using statistical and computational methods.
• It answers "how many" and "how much" types of questions.
• Quantitative data types : numbers, percentages, […]
2.3 DATA ACQUISITION
Data Acquisition refers to processes, methods or systems that are used to […] or analyse information related to a certain theme or […] phenomenon. Acquiring data involves collecting information from diverse sources […] various methods tailored to specific objectives and contexts.
Data Acquisition
Data Acquisition refers to processes, methods or […] that […] information […] or analyse some phenomenon.
You have already read about common approaches to data acquisition, in section 5.3 page 136.
Let us quickly recall some of the common data acquisition methods, here :
1. Surveys. […] structured question[…] administered to individuals […] opinions, preferences, behaviors, and demographics.
[…] interactions between researchers and participants […] questions and probing inquiries.
[…] Observational studies involve systematic observation and recording of behaviors, interactions, and phenomena in […] or controlled settings.
There are many other ways of acquiring data. Please refer to section 5.3 page 136 for the same.
2.4 BEST PRACTICES FOR ACQUIRING DATA
Acquiring high-quality data requires using the best practices to ensure validity, reliability, and ethical integrity. For effective data acquisition, it is recommended to use following essential guidelines :
1. Define Clear Objectives. Clearly state the purpose, goals, and research questions (as per […] outcomes) that should give clear guidance to the data acquisition process.
2. Select Appropriate Methods. Choose data collection methods and techniques that are suitable for the context of data collection, objectives of data collection, and the target audience from which the data is being collected. For example, to get data from younger audience the questions and languages […] older or professional people.
3. Design Robust Instruments. Develop survey questionnaires, interview guides, observation protocols, and experimental procedures with careful attention […]
4. Ensure Data Quality. Implement measures to validate, verify, and clean the […] minimize errors, inconsistencies, and biases.
5. […] Privacy. Respect confidentiality, anonymity, and informed […] safeguard the rights and privacy of research participants.
6. Maintain Ethical Standards. Adhere to ethical guidelines, codes of conduct and […] governing research conduct, integrity, and transparency.
Figure 2.1 Best Practices for Data Acquisition.
Session 2 : ACQUIRING DATA, PROCESSING, AND INTERPRETING DATA
Note
High quality data is valid, reliable, and of ethical integrity.
25 FEATURES OF DATA AND DATA PREPROCESSING
ATA PREPROCESSING
Raw data often contains errors, inconsistencies, and missing values that require
preprocessing to enhance quality, accuracy, and usability. Thus, data is preprocessed to
clean it and make it appropriate for use,
Refer to section 4.2, page 119, which has already discussed data features and ways to preprocess data.
Data Preprocessing
Data Preprocessing refers to the process of making data appropriate for use by removing discrepancies in it.
Some common data preprocessing techniques are :
(i) Data Cleaning (ii) Data Reduction
(iii) Data Transformation (iv) Data Integration
1. Data Cleaning
Data cleaning involves identifying and correcting errors, inconsistencies, and anomalies in raw data so that it is easier to understand and work with and becomes more accurate and reliable, e.g., imagine you have a list of student names and ages, but some ages are missing. You'd fill in the missing ages or remove the incomplete entries to make sure your data is complete, with every student entry having an age.
Data Cleaning
Data Cleaning is a process of identifying and correcting errors, inconsistencies, and anomalies in raw data.
There are multiple data cleaning methods, such as :
• Removing duplicate records. Identifying and eliminating duplicate entries or observations to prevent redundancy and maintain data integrity.
• Handling missing values. Imputing missing values or deleting incomplete records to mitigate the impact of data gaps on analysis and interpretation.
• Standardizing formats. Converting data into consistent formats, units, and structures to facilitate comparison, aggregation, and analysis.
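The three cleaning methods above can be tried out on the student names-and-ages example. The following is only a minimal sketch in plain Python: the records, the function names, and the choice of filling a missing age with the mean of the known ages are all assumptions made for illustration, not a prescribed method.

```python
# Hypothetical student records: one duplicate entry, one missing age,
# and names typed with inconsistent spacing and capitalisation.
raw_records = [
    {"name": "  asha ", "age": 14},
    {"name": "Ravi", "age": None},
    {"name": "Asha", "age": 14},
    {"name": "MEERA", "age": 16},
]


def standardize(record):
    """Standardizing formats: trim spaces and use title case for names."""
    return {"name": record["name"].strip().title(), "age": record["age"]}


def remove_duplicates(records):
    """Removing duplicate records: keep the first copy of each (name, age)."""
    seen = set()
    unique = []
    for record in records:
        key = (record["name"], record["age"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def fill_missing_ages(records):
    """Handling missing values: impute a missing age with the mean known age."""
    known = [r["age"] for r in records if r["age"] is not None]
    mean_age = round(sum(known) / len(known))
    return [
        {"name": r["name"], "age": mean_age if r["age"] is None else r["age"]}
        for r in records
    ]


standardized = [standardize(r) for r in raw_records]
cleaned = fill_missing_ages(remove_duplicates(standardized))
for row in cleaned:
    print(row)
```

Formats are standardized first so that "  asha " and "Asha" are recognised as the same student before duplicates are removed. Deleting the incomplete record instead of imputing it would be an equally valid choice, as the text notes.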
2. Data Reduction
Data reduction is a technique used to reduce the size of a dataset while still preserving the most important information.
3. Data Transformation
Data transformation involves converting raw data into a suitable format or representation for analysis, visualization, or modeling. There are multiple ways of doing so, some of these are :
• Normalization. This refers to scaling of numerical data to a common range or distribution to remove differences in magnitude and facilitate comparison.
• Encoding categorical variables. This refers to converting categorical variables into numerical or binary representations for computational analysis and modeling.
• Feature engineering. This refers to creating new features or variables from existing data to capture relevant patterns, relationships, or trends for analysis, visualization, or modeling.
Data Transformation
Data Transformation refers to the process of converting raw data into a suitable format or representation for analysis, visualization, or modeling.
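The three transformation techniques can be illustrated with a short sketch. The data and function names below are hypothetical. Min-max scaling and one-hot encoding are used here as one common form of normalization and of categorical encoding respectively; other forms exist.

```python
def min_max_normalize(values):
    """Normalization: scale numbers to the common range 0 to 1."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot_encode(labels):
    """Encoding: turn each category into a binary list, one slot per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical data for four students.
marks = [40, 55, 70, 100]
streams = ["science", "arts", "science", "commerce"]
maths = [80, 60, 75, 90]
english = [90, 70, 65, 85]

normalized_marks = min_max_normalize(marks)
encoded_streams = one_hot_encode(streams)

# Feature engineering: create a new "total" feature from two existing ones.
total = [m + e for m, e in zip(maths, english)]

print(normalized_marks)
print(encoded_streams)
print(total)
```

In the encoded output the slots follow alphabetical order of the categories (arts, commerce, science), so a "science" student becomes [0, 0, 1].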
4. Data Integration
Data integration involves combining data from multiple sources or formats into a unified dataset for analysis, reporting, or decision-making. There are multiple ways of integrating data, some of these are :
• Merging datasets, which is combining datasets with common identifiers or keys to enrich data with additional attributes or variables, e.g., if you have sales data in one spreadsheet and customer data in another, you'd merge them to create a single dataset with both sales and customer information.
• Joining tables, which is linking relational databases or tables based on shared fields or relationships to consolidate related information for analysis.
• Concatenating files, which is appending or concatenating files with similar structures or formats to create a comprehensive dataset for analysis.
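The sales-and-customer example can be sketched in a few lines of Python. This is an illustrative sketch only: the records, the key name customer_id, and the helper merge_on_key are assumptions, and real projects would typically rely on a spreadsheet or database tool for the same job.

```python
# Hypothetical sales data (one "spreadsheet") and customer data (another).
january_sales = [
    {"customer_id": 1, "amount": 250},
    {"customer_id": 2, "amount": 120},
]
february_sales = [
    {"customer_id": 1, "amount": 80},
]
customers = [
    {"customer_id": 1, "name": "Asha", "city": "Pune"},
    {"customer_id": 2, "name": "Ravi", "city": "Delhi"},
]


def merge_on_key(left, right, key):
    """Merging: attach the matching right-hand row to every left-hand row."""
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


# Concatenating: the two monthly files share the same structure,
# so they can simply be appended one after the other.
all_sales = january_sales + february_sales

# Merging on the common identifier gives sales and customer details together.
sales_with_customers = merge_on_key(all_sales, customers, "customer_id")
for row in sales_with_customers:
    print(row)
```

Rows without a matching customer are dropped in this sketch. Keeping them with blank customer fields would be another reasonable design.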
Figure 2.2 Data Preprocessing.
... Data Interpretation
Data Analysis
Data analysis is like examining clues to solve a mystery. It involves studying your data to find patterns, trends, or answers to questions. For instance, suppose you have a dataset of student test scores. You might analyze the data to see if there's a relationship between study time and test performance by comparing scores of students who studied a lot versus those who studied less.
Data Analysis
Data Analysis is a process of applying many techniques on data to find/extract trends, correlations, outliers, and variations that convey a meaning or point to a specific result.
Data analysis takes place in many forms :
(i) Descriptive Analysis. Descriptive analysis shows what happened. It describes data, e.g., analysing sales data to get sales numbers for each employee and the ...
• Descriptive analysis answers What happened ?
(ii) Diagnostic Analysis. Diagnostic analysis finds why something happened, e.g., hospitals suddenly start having an increased number of patients. Diagnostic analysis may find that many hospital patients had the same virus symptoms, so the virus caused the patient increase.
• Diagnostic analysis answers Why did it happen ?
(iii) Predictive Analysis. Predictive analysis predicts what might happen in the future based on data patterns, e.g., a product sells best in September and October each year, so high sales are predicted for those months next year.
• Predictive analysis answers What might happen in the future ?
Data Interpretation
Data interpretation is about understanding the story behind the numbers. It involves making sense of the analysed data. For example, if the analysis shows that students who studied more scored higher, you can interpret it as suggesting a positive relationship between study time and academic achievement.
Data interpretation is essential for extracting insights from your data and using them to inform decision-making and problem-solving.
Data Interpretation
Data Interpretation involves making sense of the analyzed data, drawing conclusions, and deriving actionable insights to inform decision-making and problem-solving.
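The study-time example can be worked through as a small sketch that first analyses the data and then interprets the result. The records are hypothetical, the split at 5 hours is arbitrary, and the cut-off of 0.7 for calling a relationship "strong" is an assumption chosen for illustration rather than a fixed rule.

```python
from statistics import mean

# Hypothetical (hours studied per week, test score) pairs.
records = [(1, 52), (2, 58), (3, 61), (6, 75), (7, 80), (8, 86)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Analysis: compare students who studied less with those who studied more.
avg_less = mean(score for hours, score in records if hours < 5)
avg_more = mean(score for hours, score in records if hours >= 5)

hours_list = [hours for hours, _ in records]
score_list = [score for _, score in records]
r = pearson(hours_list, score_list)

# Interpretation: turn the numbers into a statement a reader can act on.
if r > 0.7:  # assumed cut-off for a "strong" relationship
    conclusion = "Study time and test scores show a strong positive relationship."
elif r > 0:
    conclusion = "Study time and test scores show a weak positive relationship."
else:
    conclusion = "No positive relationship between study time and test scores."

print(avg_less, avg_more, round(r, 2))
print(conclusion)
```

The group averages and the correlation are the analysis. The sentence stored in conclusion is the interpretation. A correlation by itself does not prove that studying more causes higher scores, so the conclusion is worded as a relationship.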
Data interpretation takes place in various ways :
• Exploratory data analysis (EDA). EDA involves analysing and interpreting data to uncover patterns and insights without preconceived hypotheses.
• Confirmatory data analysis (CDA). CDA focuses on testing hypotheses, confirming ... existing knowledge using statistical methods and inferential techniques.
• Predictive analysis. Predictive analysis aims to forecast future trends, behaviours, or outcomes based on historical data patterns and predictive modeling techniques.
• Diagnostic data analysis. Diagnostic analysis focuses on understanding the underlying causes, drivers, or factors contributing to observed patterns, anomalies, or outcomes in data.
Importance of Data Interpretation
Effective data interpretation is important for extracting actionable insights, informing decision-making, and driving organizational success. Data interpretation is important and has many benefits, such as :
• ... decision-making instead of relying on gut feelings alone.
• ... patterns, trends, and issues to solve problems and optimize ...
• ... performance of employees, products, strategies etc. to foster ...
• ... by pinpointing potential risks and developing plans.
• ... past data to find areas to reduce expenses.
• ... based on historical data trends for better pre...
... factual intelligence for smarter decision-making, forecasting, optimization ...
After studying about data processing and interpretation, we can summarise it for both data forms as given in the table below.
Qualitative vs Quantitative Data Handling and Processing
Qualitative Data | Quantitative Data
Non-numerical or categorical information, such as descriptions, opinions, observations etc. | Numerical information that can be measured or counted.
Subjective or qualitative aspects of a phenomenon. | Objective or quantitative aspects of a phenomenon.
Words, texts, images, or codes ; can be organized into categories, themes, or patterns. | Numbers or numerical values ; can be organized into tables, graphs, charts, or statistical summaries.
Interviews, focus groups, observations, or open-ended survey questions. | Surveys, experiments, or structured observations.
Identifying patterns, themes, or commonalities. | Focuses on numerical relationships, patterns, or trends.
Coding, content analysis, or discourse analysis. | Computations, statistical tests, modeling.
In-depth explanations, rich descriptions, and contextual insights. | Numerical measurements, statistical relationships, and quantifiable results.
Not easily generalizable to a larger population (findings may be specific to the studied context). | Generalizable for a larger population.
Checkpoint
Multiple Choice Questions
1. Qualitative data deals with :
(a) Descriptions and qualities (b) Numeric values and quantities
(c) Mathematical calculations (d) Statistical analysis
2. Quantitative data deals with :
(a) Descriptions and qualities (b) Numeric values and quantities
(c) Colors and characteristics (d) ...
(b) Test scores
(©) Focus |(d) Content
(d) Testable
() Measurable
(d) Exploratory
@) Objective ©) Descriptive
is NOT a method of data acquisition ?
(d) Data storytelling
p @) Sensors (@) Observations
fy the data analysis recess?
. ae ) Data interpretation (id) Data processing
¢ (b) Data acquisition iG
() Interpreting the data
(d) Making decisions based on data
data preprocessing ?
(0) Data cleaning
(d) Data storytelling
( Performing operations on the data
{d@) Interpreting the data
Visualizing the data
he following is NOT a data processing operation ?
ormatio (b) Data modeling
(d) Data integration
yn is important ?
() Quality control
(d) Anticipation of future trends
‘communication
[is NOT a reason why data interpretatior
collection
Which type of analysis describes or summarizes data using statistics ?
(a) ... analysis (b) Predictive analysis
(c) Descriptive analysis (d) Prescriptive analysis
... analysis determines the reason behind an occurrence ?
(a) ... (b) Descriptive analysis
(c) ... analysis (d) Prescriptive analysis
... analysis uses data to make projections about the future ?
(a) ... analysis (b) Diagnostic analysis
(c) ... (d) Prescriptive analysis
... answers which question ?
(a) What should we do about it ? (b) What happened ?
(c) What might happen in the future ? (d) Why did it happen ?
... recommendations based on other analysis types ?
(a) ... (b) Diagnostic analysis
(c) ... (d) Prescriptive analysis
... of a diagnostic analysis question ?
(a) "What happened ?" (b) "Why did sales decrease last quarter ?"
(c) "What should we do about it ?" (d) "What are the projected sales for next year ?"
Predictive analysis is commonly used in which industry ?
(a) ... (b) Finance
(c) Manufacturing (d) All of these
Which type of analysis provides recommendations for future actions ?
(a) Descriptive analysis (b) Diagnostic analysis
(c) Predictive analysis (d) Prescriptive analysis
Which of the following is NOT a benefit of data interpretation ?
(a) Informed decision-making (b) Risk management
(c) Data acquisition (d) Performance assessment
Fill in the Blanks
1. ______ data deals with descriptions and qualities.
2. ______ data deals with numeric values and quantities.
3. Qualitative data is ______ in nature.
4. Quantitative data is ______ in nature.
5. Data preprocessing involves ______ the data for processing.
6. Data cleaning is a step in data ______.
7. The first step in the data analysis process is data ______.
8. Sensors are a method of ______.
9. Data normalization is ...
10. The purpose of ______ is to remove errors and inconsistencies.
11. ______ is the process of combining data from multiple sources.
12. ______ analysis describes or summarizes data using statistics.
13. ______ analysis aims to determine the reason behind an occurrence.
14. ______ analysis uses data to make projections about the future.
15. ______ analysis provides recommendations based on other analysis types.
16. ______ analysis is used to identify the root cause of a problem.
True / False
1. Qualitative data is useful for exploring new phenomena.
2. Quantitative data is useful for testing hypotheses and generalizing findings.
3. ... data is objective in nature.
4. ... data is subjective in nature.
5. ... processing involves performing operations on the data.
6. ... interpretation is not important for quality control processes.
7. ... analysis aims to determine the reason behind an occurrence.
8. Diagnostic analysis is used to identify the root cause of a problem.
9. ... through structured surveys.
10. ... is collected through methods like observations and interviews.
11. ... helps answer questions like "How many ?" and "How much ?"
12. ... quantitative data include numbers and percentages.
13. ... data is useful for testing hypotheses and generalizing findings.
14. ... processing involves analyzing the data.
15. ... cleaning is a step in data processing.
16. ... aggregation involves performing operations on the data.
17. Data visualization is an example of a data processing operation.
18. Data interpretation is important for data acquisition.
Competency Based Questions
1. Your school has collected data on the academic performance of students across different ... and grade levels. Which of the following approaches can help teachers and administrators make informed decisions about curriculum planning and student support ?
(a) Create a presentation with text-based explanations of the data.
(b) Create visualizations without any analysis or interpretation.
(c) Analyze the data and create informative visualizations, explaining how they can aid ...
(d) Ignore the data and rely on personal experiences and opinions.
2. A company is analyzing customer feedback to improve its products. They collect data from online ..., social media comments, and customer support interactions. Which of the following methods would be most effective for acquiring this diverse dataset ?
(a) Conducting in-person interviews with a select group of customers
(b) Scraping data from random websites for a broader perspective
(c) Integrating data from various sources including surveys, social media APIs, and CRM systems
(d) Ignoring social media comments as they might not be relevant
3. A research team is studying the effects of climate change on biodiversity in a particular region. They have collected raw data from weather stations, satellite images, and field surveys. What is the first step they should take in preprocessing this data ?
(a) Removing outliers and anomalies
(b) Converting satellite images into numerical data
(c) Aggregating data from different sources into a single dataset
(d) Conducting statistical analysis on the raw data
4. A ... team wants to analyze the purchasing behaviour of customers. They have a ... timestamps of transactions. What preprocessing ... ?
... amounts to a standardized range
Assertion & Reasoning Questions
In the following questions, a statement of assertion (A) is followed by a statement of reason (R). Mark the correct choice as :
(a) Both A and R are true and R is the correct explanation of A.
(b) Both A and R are true but R is not the correct explanation of A.
(c) A is true but R is false (or partly true).
(d) A is false (or partly true) but R is true.
(e) Both A and R are false or not fully true.
1. Assertion (A). Descriptive analysis summarizes data using statistics.
Reason (R). Descriptive analysis is used to determine the conclusion about the distribution of data.
2. Assertion (A). Qualitative data deals with numeric values and quantities.
Reason (R). Quantitative data deals with descriptions and qualities.
3. Assertion (A). Structured surveys are a method of collecting qualitative data.
Reason (R). Qualitative data is subjective in nature.
4. Assertion (A). Qualitative data helps answer questions like "how many ?" and "how much ?"
Reason (R). Quantitative data helps answer questions like "why ?" and "how ?"
5. Assertion (A). Quantitative data is useful for testing hypotheses and generalizing findings.
Reason (R). Qualitative data is useful for exploring new phenomena.
6. Assertion (A). Data cleaning is a step in data preprocessing.
Reason (R). Data preprocessing involves analyzing the data.
7. Assertion (A). Data aggregation is an example of a data preprocessing operation.
Reason (R). Data processing involves performing operations on the data.
8. Assertion (A). Data interpretation is important for deriving outcomes.
Reason (R). Data visualization is useful for communicating insights.
LET US REVISE
• Data can be divided into two primary types : Qualitative data and Quantitative data.
• Qualitative data describes qualities or characteristics of some entity or phenomena.
• Quantitative data represents information about something through numerical values.
• Qualitative data is also known as categorical data and quantitative data is known as numerical data.
• Common data acquisition methods include surveys, questionnaires, interviews, observations and many others.
• High quality data is valid, reliable, and of ethical integrity.
• Data preprocessing refers to the process of making data appropriate for use by removing discrepancies in it.
• Some common data preprocessing techniques are : Data Cleaning, Data Reduction, Data Transformation, and Data Integration.
• Data cleaning is a process of identifying and correcting errors, inconsistencies, and anomalies in raw data.
• Data reduction is a process or a set of techniques used to reduce the size of a dataset while still preserving the most important information.
• Data transformation refers to the process of converting raw data into a suitable format or representation for analysis, visualization, or modeling.
• Data processing refers to manipulating, analyzing, and interpreting data to extract ... derive insights.
• ... Data Analysis and Data Interpretation.
• Data analysis is a process of applying many techniques on data to find/extract trends, correlations, ... that convey a meaning or point to a specific result.
• Data analysis can be descriptive, diagnostic, predictive, and prescriptive.
• Descriptive analysis answers : What happened ? ; Diagnostic analysis answers : Why did it happen ? ; Predictive analysis answers : What might happen in the future ? ; Prescriptive analysis answers : What should we do about it ?
• Data interpretation involves making sense of the analyzed data, drawing conclusions, and deriving actionable insights to inform decision-making and problem-solving.
• Effective data interpretation is important for extracting actionable insights, informing decision-making, and driving organizational success.
1. Differentiate between qualitative and quantitative data. How are these alternatively known ?
Ans. Qualitative data deals with descriptions and qualities, while quantitative data deals with numeric values and quantities that can be measured. Qualitative data is also known as Categorical Data and quantitative data is known as Numerical Data.
2. Provide an example of qualitative data.
Ans. Customer reviews, interview transcripts, and case studies are examples of qualitative data.
3. How is quantitative data collected ?
Ans. Quantitative data is collected through structured methods like surveys, sensors, and data collection instruments.
4. Give an example of a predictive analysis question.
Ans. An example of a predictive analysis question is "What are the projected sales for the next quarter based on historical data trends ?"
5. What is the purpose of data cleaning in data preprocessing ?
Ans. The purpose of data cleaning in data preprocessing is to remove errors, inconsistencies, and inaccuracies from the data.
6. Give an example of a data processing operation.
Ans. An example of a data processing operation is data aggregation, which involves summarizing or rolling up data.
7. Why is data interpretation important ?
Ans. ... informed decision-making, risk management, ... control, performance assessment, and anticipating future trends based on data insights.
8. Mention a method of data acquisition.
Ans. A method of data acquisition is using sensors to collect data, such as sensor readings or Internet of Things (IoT) data.
9. Give some examples of various types of data analysis.
Ans. Some examples of various types of data analysis are :
• Descriptive : The most popular ice cream flavors based on total sales.
• Diagnostic : Customers buy more when there are promotions/discounts.
• Predictive : ... mall traffic is expected to increase by 10% based on local population growth.
• Prescriptive : ... a new store location in the fastest growing neighborhood.
10. What is data normalization in data preprocessing ?
Ans. Data normalization is a data preprocessing step that involves scaling or transforming data to a common range or format for consistent analysis.
Practical Session
1. Trend Analysis : Analyzing Cafeteria Food Sales Trends
To figure out the trend of sales of food items in the school canteen or cafeteria, students can collect data on the daily/weekly sales of different food items in the school cafeteria over a period of time (e.g., one semester or one year). They should then analyze the data to identify trends, such as :
(i) Which food items are most popular and least popular ?
(ii) How do sales vary across different days of the week or months ?
(iii) Is there an increasing or decreasing trend in the sales of certain items over time ?
(iv) How do sales trends correlate with factors like weather, school events, or holidays ?
Solution. Sample Data :
The table below shows the daily sales (in units) of three food items (Pizza, Burgers, and Salads) in the school cafeteria for one month (20 school days).
Day | Pizza | Burgers | Salads
1 | 45 | 62 | 18
2 | 52 | 58 | 2
3 | 48 | 65 | 20
... | ... | ... | ...
20 | 38 | 72 | 5
Interpretation and Results
(i) The most popular item is Burgers, with the highest average daily sales of around 62 units.
(ii) Salads have the lowest average daily sales of around 20 units, indicating they are the least popular item.
(iii) There is a slightly increasing trend in Salad sales over the month, possibly due to health awareness campaigns or seasonal changes.
(iv) Burger sales peak on Fridays, while Pizza sales are relatively consistent throughout the week.
(v) Overall, there is a cyclical weekly pattern in sales, with higher sales on Fridays and lower sales on Mondays.
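The same kind of trend analysis can be sketched in Python. Because the full 20-day table is not reproduced here, the figures below are a hypothetical two-week sample invented for illustration, so the printed numbers will not match the results listed above. The names half_trend and weekday_average are also assumptions.

```python
from statistics import mean

# Hypothetical sample: 10 school days (Monday to Friday, two weeks).
weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri"] * 2
sales = {
    "Pizza": [45, 52, 48, 50, 49, 46, 51, 47, 50, 48],
    "Burgers": [55, 58, 60, 62, 75, 56, 59, 61, 63, 78],
    "Salads": [16, 17, 18, 18, 20, 19, 20, 21, 22, 24],
}

# (i) Most and least popular items, judged by average daily sales.
averages = {item: mean(units) for item, units in sales.items()}
most_popular = max(averages, key=averages.get)
least_popular = min(averages, key=averages.get)


def half_trend(units):
    """Average of the second half minus average of the first half.

    A positive value suggests an increasing trend, a negative one a decrease.
    """
    half = len(units) // 2
    return mean(units[half:]) - mean(units[:half])


def weekday_average(units):
    """Average sales for each day of the week."""
    by_day = {}
    for day, value in zip(weekdays, units):
        by_day.setdefault(day, []).append(value)
    return {day: mean(values) for day, values in by_day.items()}


# (iii) Is Salad demand rising or falling over the period?
salad_trend = half_trend(sales["Salads"])

# (ii) On which weekday do Burgers sell the most?
burger_by_day = weekday_average(sales["Burgers"])
peak_burger_day = max(burger_by_day, key=burger_by_day.get)

print(averages)
print(most_popular, least_popular, salad_trend, peak_burger_day)
```

Comparing the two halves of the period is a deliberately simple way to detect a trend. With a full semester of data, students could instead compare week-by-week averages.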
Data Cleaning : The process of identifying and correcting errors, inconsistencies, and anomalies in raw data.
Data Integration : The process of combining data from multiple sources or formats into a unified dataset for analysis, reporting, or decision-making.
Data Reduction : A process or a set of techniques used to reduce the size of a dataset while still preserving the most important information.
Data Transformation : The process of converting raw data into a suitable format or representation for analysis, visualization, or modeling.
Qualitative Data : Data describing qualities or characteristics of some entity or phenomena.
Quantitative Data : Data representing information about something through numerical values.
... How many ? and How much ?
(0) What are the characteristicn ?
() Tent, midio, video, Images
(d) Measurements and scores
(@) Numbers, percentages, frequencies
(@ Audio recordings
operation?
(@) Data collection (if) Data visualization
(6) Compliance and reporting
(@) All of these
?
{@) What might happen in the future 2
6 What happened 7 (d What should we do about it ?
Diagnostic analysis aims to answer which question ?
(a) What happened ? (b) What might happen in the future ?
(c) What should we do about it ? (d) Why did it happen ?
Data acquired from sensors is an example of :
(a) Qualitative data (b) Quantitative data (c) Descriptive data (d) Prescriptive data
Which of the following is a step in data preprocessing ?
Qualitative data helps answer questions like ______ and ______. ("why ?", "how ?")
Quantitative data helps answer questions like ______ and ______. ("how many ?", "how much ?")
“The purpose of data cleaning in the preprocessing stage is to
(@) Remove errors andi
(@ Visualize the data
(@ Date integration
(@) Data normalization
used to identify the root cause of a problem?
(b) Predictive analysis
at (@) Diagnostic analysis
example of __daia. (qualitative)
ple of _ data. (quantitative)
is an example of _data. (quantitative) ‘