Data Literacy
Data: Data refers to raw facts or numbers (statistics) that are collected, stored,
and analyzed for various purposes. You can think of data as puzzle pieces put
together to reveal a bigger picture.
Data Literacy:
Data literacy means knowing how to understand, work with, and talk about data.
It's about being able to collect, analyze, and show data in ways that make sense.
The process of continuously acquiring, developing, and improving the ability to
understand, interpret, and use data effectively is called cultivating data literacy.
In short, data literacy gives you the ability to analyse and get valuable insights from
the massive amount of data that surrounds us in our daily lives.
Data Pyramid
The data pyramid is a hierarchical structure used in data literacy to represent the
progression of data from its raw form to actionable insights.
Let us move from bottom to top to understand the different stages of the data
pyramid:
* Initially, data exists in its raw form, which is not very useful.
Example:
• It is like scattered pieces of a puzzle or a pile of ingredients before cooking
a meal.
* Data is processed through various methods, like analysing and organising raw
data, to provide meaningful information.
Basically, processing data makes it easier to understand and interpret.
Example:
• It is like arranging the scattered pieces of a puzzle, or using the
ingredients to make a delicious dish.
* Processed data turns information into knowledge. It helps us understand
how things are happening in the world around us.
Example:
• It is like understanding how joining the scattered pieces of a puzzle
reveals the complete image, or understanding what ingredients and steps
are involved in making a dish.
* Wisdom takes us a step further by providing an understanding of why things are
happening in a particular way.
Example:
• It is like becoming a master chef who not only knows how to cook a dish
but also understands the science behind it.
Impact of Data Literacy: Data literacy is the ability to read, analyse, and interpret
data effectively. It helps you make appropriate decisions and think critically to solve
problems on the basis of evidence or experience rather than intuition or
guesswork, in both your personal and professional life.
How to become Data Literate?
A data-literate person is someone who can interact with data to understand the world
around them.
Data literacy helps people research products while shopping on the
internet.
How do you decide the following things when you are shopping online?
● Which is the cheapest product available?
● Which product is liked by the users the most?
● Does a particular product meet all the requirements?
A data literate person can –
● Filter the category as per the requirement – If the budget is low, select the
price filter as low to high
● Check the user ratings of the products
● Check for specific requirements in the product
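The three checks above can be sketched in a few lines of Python. This is only an illustration: the product list, its field names (`price`, `rating`, `wireless`), and all the values are made up for this example and do not come from any real shopping site.

```python
# Hypothetical product listings; names, prices, and ratings are invented for illustration.
products = [
    {"name": "Headphones A", "price": 1500, "rating": 4.2, "wireless": True},
    {"name": "Headphones B", "price": 900, "rating": 3.8, "wireless": False},
    {"name": "Headphones C", "price": 1200, "rating": 4.6, "wireless": True},
]

# Which is the cheapest product? Sort by price, low to high, and take the first.
by_price = sorted(products, key=lambda p: p["price"])
cheapest = by_price[0]

# Which product is liked by the users the most? Pick the highest rating.
best_rated = max(products, key=lambda p: p["rating"])

# Does a product meet a specific requirement? Keep only the wireless ones.
wireless_only = [p for p in products if p["wireless"]]

print("Cheapest:", cheapest["name"])
print("Best rated:", best_rated["name"])
print("Wireless:", [p["name"] for p in wireless_only])
```

Each question maps to one simple operation on the data: sorting, finding a maximum, or filtering.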
Data Literacy Process Framework:
The data literacy framework provides guidance on using data efficiently and with all
levels of awareness. The data literacy framework is an iterative process.
What are Data Security and Privacy? How are they related to AI?
In the digital era, the terms data privacy and data security are often used
interchangeably, but they mean different things.
Data Privacy: Data privacy is the right to decide who can access your personal data,
such as bank account balances, credit card numbers, etc.
Data Security: Data security shields data from unauthorized use, access, and breaches.
It protects data from tampering, destruction, and alteration.
Acquiring Data, Processing, and Interpreting Data:
Working with data is crucial because it helps you identify the types of data and
ways to utilise them to produce meaningful outcomes. Data can be classified into
two types:
Textual data (Qualitative) is further subdivided into two parts:
Nominal data: This textual data is also known as categorical data. This data
represents the categories or groups that have no inherent order. For example,
gender (male, female), eye color (blue, green, brown).
Ordinal data: This data represents categories with a meaningful order. For
example, education level (high school, bachelor's, master's, PhD).
Numeric data (Quantitative) is further classified as:
Discrete data: This numeric data contains only whole numbers and cannot be
fractional. For example, the number of students in a class can only be a whole
number, not a decimal.
Continuous data: This numeric data can take any value within a range, including
fractional values, and is often obtained through measurement. For example, height,
weight, temperature, voltage, etc.
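A small Python sketch may help to tell the four kinds of data apart. The variable names and values below are assumptions chosen for illustration; the point is only that ordinal data can be ranked while nominal data cannot, and that discrete data is counted while continuous data is measured.

```python
# Nominal: categories with no inherent order; we can only count them or test equality.
eye_colors = ["blue", "green", "brown", "brown"]

# Ordinal: categories with a meaningful order, written here from lowest to highest.
education_order = ["high school", "bachelor's", "master's", "PhD"]

def education_rank(level):
    """Return the position of a level in the ordered list (0 = lowest)."""
    return education_order.index(level)

# Discrete: a whole-number count.
students_in_class = 32

# Continuous: a measured value that may include a fractional part.
height_cm = 152.4

# Ordinal values can be compared by rank; nominal values cannot.
print(education_rank("master's") > education_rank("bachelor's"))
print(eye_colors.count("brown"))
print(isinstance(students_in_class, int), isinstance(height_cm, float))
```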
Data Acquisition/Acquiring Data
Data Acquisition, also known as acquiring data, refers to the procedure of gathering
data. This involves searching for datasets suitable for training AI models. The
process typically comprises three key steps:
Data Discovery: Data discovery means searching for new datasets.
Data Augmentation: Data augmentation means increasing the amount of data by
adding copies of existing data with small changes.
Data Generation: Data generation refers to generating or recording data using
sensors. For example, recording temperature readings of a building. This recorded
data is stored in the computer in a suitable form.
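Data augmentation, as described above, can be sketched for a list of temperature readings. This is a minimal illustration, not a standard method: the function name `augment` and its parameters `copies`, `max_shift`, and `seed` are hypothetical, and the readings are invented.

```python
import random

def augment(readings, copies=2, max_shift=0.5, seed=0):
    """Return the original readings plus `copies` slightly shifted copies of each one."""
    rng = random.Random(seed)  # fixed seed so the sketch gives the same result each run
    augmented = list(readings)  # keep the original data unchanged at the front
    for value in readings:
        for _ in range(copies):
            # A "small change": shift the value by at most `max_shift` in either direction.
            shifted = value + rng.uniform(-max_shift, max_shift)
            augmented.append(round(shifted, 2))
    return augmented

original = [21.5, 22.0, 23.1]  # hypothetical temperature readings
bigger = augment(original)
print(len(original), "readings became", len(bigger))
```

The original readings are kept, and every added value stays close to the reading it was copied from, so the dataset grows without changing what it describes.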
Sources of Data:
Various Sources for Acquiring Data:
Primary Data Sources: Sources of primary data include surveys, interviews,
experiments, etc. Data generated from an experiment is an example of primary data.
Secondary Data Sources: Secondary data is obtained from external sources rather
than generated personally.
Examples: Market or sales records, government publications, websites,
social media data, satellite data.
Checklist of Factors that Make Data Good or Bad
Data acquisition from websites
Ethical concerns in data acquisition:
Data Preprocessing:
Data Preprocessing: To ensure that the collected data is of high quality and usable,
it is essential to look at certain features of the data and to carry out the steps of
data preprocessing. These features and steps are aimed at improving the accuracy,
consistency, and completeness of the data.
Usability of Data:
There are three primary factors determining the usability of data:
Structure: This defines how the data is stored. Data stored in a structured form is
organised in a pre-defined format, which is easy to access, manage, and analyse. This
structured data is stored in a database, spreadsheet, etc.
Cleanliness: Data cleaning is the process of removing duplicate data, inconsistencies,
and inaccuracies to improve its quality and usability. For example, duplicate values
are removed when the data is cleaned.
Accuracy: Accuracy indicates how well the data matches real-world values, ensuring
reliability.
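The removal of duplicates mentioned under Cleanliness can be sketched as follows. The function name `remove_duplicates` and the student records are hypothetical, and real data cleaning usually also deals with inconsistencies and inaccuracies, which this sketch does not attempt.

```python
def remove_duplicates(rows):
    """Return the rows with exact duplicates dropped, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row)  # tuples can be stored in a set
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned

# Hypothetical (name, age) records; the third entry repeats the first.
student_rows = [("Asha", 14), ("Ravi", 15), ("Asha", 14), ("Meera", 14)]
print(remove_duplicates(student_rows))
```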
Features of Data:
Data features are the characteristics or properties of the data. They describe each
piece of information in a dataset.
For example, in a table of student records, features could include things like the
student's name, age, or grade. In a photo dataset, features might be the colors
present in each image.
Independent features are the input to the model—they're the information we
provide to make predictions.
Dependent features, on the other hand, are the outputs or results of the model—
they're what we're trying to predict.
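As a rough illustration, the split between independent and dependent features can be written out for a small table of student records. The column names (`hours_studied`, `attendance`, `marks`) and the values are assumptions made for this example only.

```python
# Hypothetical student records: hours studied and attendance are the inputs,
# and marks is the result we would like a model to predict.
student_records = [
    {"hours_studied": 2, "attendance": 80, "marks": 55},
    {"hours_studied": 4, "attendance": 90, "marks": 70},
    {"hours_studied": 6, "attendance": 95, "marks": 88},
]

independent_names = ["hours_studied", "attendance"]  # inputs to the model
dependent_name = "marks"                             # output of the model

# X holds one list of input values per student; y holds the matching outputs.
X = [[row[name] for name in independent_names] for row in student_records]
y = [row[dependent_name] for row in student_records]

print(X)
print(y)
```

Every row in `X` lines up with the value at the same position in `y`, which is how a model learns which inputs lead to which output.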
Data Processing and Data Interpretation
Data Processing: Data processing is a crucial step in data analysis. It is the series of
operations performed on raw data to convert it into meaningful information.
Data Processing: It involves a number of steps to convert raw data into a usable format.
Data Interpretation: It is the process of making sense of processed data to derive
meaningful insights and inform decision-making. This involves analysing the data,
identifying patterns, trends, relationships, and drawing conclusions
that can guide actions and strategies.
Data Interpretation: It involves making sense of the processed data to derive
meaningful decisions.
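The difference between processing and interpretation can be sketched with a short temperature log. The raw values below are invented, and the two steps shown (converting text to numbers, then computing an average and checking for a rising trend) are just one simple choice among many.

```python
from statistics import mean

# Hypothetical raw log, as it might be typed in: text with stray spaces and one blank entry.
raw = [" 21.5", "22.0 ", "", "23.1", "24.4"]

# Processing: drop blank entries and convert the text into numbers.
processed = [float(item) for item in raw if item.strip()]

# Interpretation: summarise the numbers and look for a simple pattern.
average = mean(processed)
rising = all(earlier < later for earlier, later in zip(processed, processed[1:]))

print("Processed:", processed)
print("Average:", round(average, 2))
print("Temperature rising throughout:", rising)
```

Processing only changes the form of the data; interpretation is where a conclusion, such as "the temperature kept rising", is drawn from it.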
Methods of Data Interpretation:
Based on the two types of data, there are two ways to interpret data:
● Quantitative Data Interpretation
● Qualitative Data Interpretation
Qualitative Data Interpretation
● Qualitative data tells us about the emotions and feelings of people.
● Qualitative data interpretation focuses on the insights and motivations of people.
Data Collection Methods – Qualitative Data Interpretation
One-to-One Interviews: In this method, data is collected using a one-to-one
interview.
Observation: In this method, the participants' behavior and emotions are
observed carefully.
Record keeping: This method uses existing reliable documents and other similar
sources of information as the data source. It is similar to going to a library.
Quantitative Data Interpretation:
▪ Quantitative data interpretation is performed on numerical data
▪ It helps us answer questions like “when,” “how many,” and “how often”
Data Collection Methods – Quantitative Data Interpretation
Polls: A poll is a type of survey that asks simple questions to respondents. Polls are
usually limited to one question.
Longitudinal Studies: A type of study conducted over a long period of time.
Survey: Surveys can be conducted for a large number of people to collect
quantitative data.
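Once poll or survey answers are collected, a "how many" question can be answered by counting them. The sketch below assumes a one-question poll with made-up responses.

```python
from collections import Counter

# Hypothetical answers to a one-question poll.
responses = ["yes", "no", "yes", "yes", "no", "undecided"]

# How many respondents gave each answer?
tally = Counter(responses)

# What share of all respondents said "yes"?
share_yes = tally["yes"] / len(responses)

print(tally.most_common())
print("Share answering yes:", share_yes)
```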
Types of Data Interpretation:
There are three ways in which data can be presented:
Textual DI
▪ The data is described in text form, usually in a paragraph. This is used when the
data is not large and can be easily understood by reading. Textual presentation
is not suitable for large data.
Tabular DI
Data is represented systematically in the form of rows and columns. The title of the
table (Item of Expenditure) describes the table content. The column
headings (Year; Salary; Fuel and Transport; Bonus; Interest on Loans; Taxes)
describe the information contained in the columns.
Graphical DI
Graphical data interpretation involves the analysis of data represented in the form
of graphs, such as bar graphs, line graphs, and pie charts.
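To compare the three forms of presentation, the same small dataset can be shown as text, as a table, and as a simple bar graph drawn with characters. The poll result and the function names `as_text`, `as_table`, and `as_bar_graph` are hypothetical and used only for illustration.

```python
# Hypothetical poll result used only for illustration.
votes = {"Cricket": 12, "Football": 8, "Chess": 4}

def as_text(data):
    """Textual form: the numbers written out in a sentence."""
    parts = [f"{count} students chose {name}" for name, count in data.items()]
    return "; ".join(parts) + "."

def as_table(data):
    """Tabular form: a heading row followed by one row per item."""
    lines = [f"{'Sport':<10}{'Votes':>5}"]
    for name, count in data.items():
        lines.append(f"{name:<10}{count:>5}")
    return "\n".join(lines)

def as_bar_graph(data, unit=2):
    """Graphical form: one '#' for every `unit` votes."""
    return "\n".join(f"{name:<10}{'#' * (count // unit)}" for name, count in data.items())

print(as_text(votes))
print(as_table(votes))
print(as_bar_graph(votes))
```

The text form is readable for three values but would become hard to follow for thirty, which is why tables and graphs are preferred for larger data.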