classVIII DS Student Handbook
DATA SCIENCE
GRADE VIII
Student Handbook
Version 1.0
                                                 ACKNOWLEDGMENT
Patrons
  •   Sh. Ramesh Pokhriyal 'Nishank', Minister of Human Resource Development,
      Government of India
  •   Sh. Dhotre Sanjay Shamrao, Minister of State for Human Resource
      Development, Government of India
  •   Ms. Anita Karwal, IAS, Secretary, Department of School Education and Literacy,
      Ministry of Human Resource Development, Government of India
Advisory
The objective of this curriculum is to lay the foundation for Data Science:
understanding how data is collected and analyzed, and how it can be used to solve
problems and make decisions. It will also cover ethical issues with data, including
data governance, and build a foundation for AI-based applications of data science.
Therefore, CBSE is introducing ‘Data Science’ as a skill module of 12 hours' duration
in class VIII and as a skill subject in classes IX-XII.
CBSE acknowledges the initiative by Microsoft India in developing this data science
handbook for class VIII students. This handbook introduces the concepts of data
science, data visualizations and applications of data science in AI. The course covers
the theoretical concepts of data science followed by practical examples to develop
critical thinking capabilities among students.
The purpose of the book is to enable the future workforce to acquire data science skills
early in their educational phase and build a solid foundation to be industry-ready.
                                              Contents
CHAPTER
                                            Introduction to Data
2. Real-World Examples of Data
Now that we have understood what data is and the types into which it is
categorized, an obvious question comes to mind: how is this data applied
in the real world?
content. Combined with that, it analyzes the videos that people usually play after
watching a video.

These preferences are stored and studied. An algorithm running in the background
then finds patterns in people's preferences and shows you, in suggested videos,
the content that most people watched after the current clip.

This is how data analysis is applied in the entertainment industry in real life.

Some of the benefits of data in the entertainment industry are:
•   Predicting interests of the audience
•   Optimized or on-demand scheduling of media streams in digital media
    distribution platforms
•   Getting insights from customer reviews
•   Effective targeting of advertisements

Recap
•     We are surrounded by data. Every computer and every mobile device
      generates an immense amount of data.
•     Data comes in different types, such as audio, video, and text.
•     Data can be qualitative or quantitative, continuous or discrete.
•     Discrete data can take only specific, separate values.
•     Continuous data can take any value within a given range.
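The suggestion idea described above (show the content that most people watched after the current clip) can be sketched as a simple "watched next" counter. This is a minimal sketch under assumed data: the viewing history and video names are hypothetical, and real streaming platforms use far more elaborate algorithms.

```python
from collections import Counter

# Hypothetical viewing history: each inner list is the order in which
# one viewer watched videos. Video names are made up for illustration.
viewing_history = [
    ["cooking_101", "baking_bread", "pasta_night"],
    ["cooking_101", "baking_bread"],
    ["cooking_101", "street_food"],
    ["travel_vlog", "street_food"],
]

def suggest_next(history, current_video, top_n=1):
    """Suggest the videos most often watched right after current_video."""
    next_counts = Counter()
    for session in history:
        # Look at each pair of consecutive videos in a viewer's session.
        for watched, watched_next in zip(session, session[1:]):
            if watched == current_video:
                next_counts[watched_next] += 1
    # most_common returns (video, count) pairs, highest count first.
    return [video for video, _ in next_counts.most_common(top_n)]

print(suggest_next(viewing_history, "cooking_101"))  # ['baking_bread']
```

Two of the three viewers who watched `cooking_101` went on to `baking_bread`, so that is the suggestion; counting what the majority did next is exactly the "pattern of preferences" the text describes, in its simplest form.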
                                     Exercises
                          Objective Type Questions
Please choose the correct option in the questions below.
   1. Discrete data can take any value in a range.
          a. True
          b. False
   2. Continuous data cannot take decimal values.
          a. True
          b. False
   3. Information stored in a PDF is not considered data.
          a. True
          b. False
   4. Quantitative data cannot take numerical values.
          a. True
          b. False
   5. Qualitative data is descriptive in nature.
          a. True
          b. False
   6. “What is the weather like?” is what kind of data?
          a. Quantitative
          b. Qualitative
   7. Which of the following is considered data?
          a. Speech
          b. Video
          c. Messages
          d. All of the above
   8. How is data used in the entertainment industry?
          a. Predicting interests
          b. Targeting ads
          c. Both of the above
   9. The number of days in a week is an example of:
          a. Discrete Data
          b. Continuous Data
   10. What are the types of quantitative data?
          a. Discrete
          b. Continuous
          c. Both a and b
                               Standard Questions
Please answer the questions below in no less than 100 words.
   1.   Explain what data is, with the help of two real-life examples.
   2.   How is the data categorized?
   3.   What is Discrete Data?
   4.   What is Continuous Data?
   5.   Give two examples of real-life applications of data.
                                  Applied Project
Data analytics has many applications in our lives. Discuss how data analytics is applied
in the airline industry to predict flight delays. A few factors that influence flight delays:
CHAPTER
Activity 2.1
Try to find everyday applications that depend on data science.

2. Careers in Data Science
Now that we understand data, data analysis, and data science, an important
question comes up: what career options can we take up in data science?

We have learned about the real-life applications of data and data science.
Many of us may have found them interesting and may want to pursue a career
in this field.

To help you make the right choice, let us look at the different careers we
can take up in data science. Some common job titles in data science include:
   1.   Data Scientist
   2.   Business Intelligence Analyst
   3.   Data Engineer
   4.   Data Architect
   5.   Senior Data Scientist

Let us now briefly go through these job titles to get a better understanding:
1. Data Scientist - Data Scientists are data enthusiasts who gather and
   analyze large sets of structured and unstructured data. A data scientist's
   role combines computer science, statistics, and mathematics. They analyze,
   process, and model data and then interpret the results to create actionable
   plans for companies and organizations.
   Data Scientists are analytical experts who use their skills in both
   technology and social science to find trends and manage data. They use
   their industry knowledge and context-specific understanding to find
   solutions to business challenges.
2. Business Intelligence Analyst - Business Intelligence Analysts use data to
   assess the market and find the latest business trends in the industry. This
   helps to develop a clearer picture of how a company should shape its strategy.
3. Data Engineer - A Data Engineer examines data not only for their own
   business but also for third parties. In addition to mining data, a data
   engineer creates robust algorithms to help analyze the data further.
4. Data Architect - Data Architects work closely with users, system designers,
   and developers to create a blueprint that data management systems use to
   centralize, integrate, and maintain the data sources.
5. Senior Data Scientist - Senior Data Scientists anticipate the future needs
   of the business. Although they might not be involved in gathering data, they
   play a high-level role in analyzing it. Using their vast experience, they can
   design and create new standards for analyzing data. They can also create
   ways to use statistical data and develop tools to further analyze the data.

Is this an outlier?
In some cases, the objective is to find
The algorithms that are used for these types of questions are called anomaly
detection algorithms.

What will probably be the value of this variable?
Machine learning can also help us predict numerical values of continuous
variables. There are scenarios in which we must predict the numerical value
of a variable based on historical data.

Some examples are:
Q: How much rainfall will we receive this year?
A: 100 mm
A: 320
The kinds of algorithms that can predict these values are called regression
algorithms.

What should be done now?
This question usually arises in problems such as autonomous robots or
self-driving cars, which need to make decisions based on changes in external
factors. Machine learning helps to solve such problems with the help of
reinforcement learning.

These models are trained through a process of reward every time a correct
action is taken and punishment every time a wrong action is taken.
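One simple way to answer "Is this an outlier?" is to flag values that lie unusually far from the average. The sketch below uses a z-score style rule (distance from the mean, measured in standard deviations). The step-count data and the threshold of 2 are assumptions chosen for illustration, and this is only one of many anomaly detection techniques.

```python
import statistics

# Hypothetical daily step counts recorded by a fitness band.
steps = [7200, 6900, 7500, 7100, 6800, 7300, 25000, 7000]

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) > threshold * spread]

print(find_outliers(steps))  # [25000]
```

The day with 25000 steps is far from the typical value of around 7000, so it is flagged as an anomaly; every other day stays well within the threshold.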
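The regression idea described above (predicting a number, such as this year's rainfall, from historical data) can be sketched with a straight-line fit by ordinary least squares. The rainfall figures below are hypothetical, and fitting a trend against the year is a deliberately simple model for illustration, not how weather services forecast rainfall.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of (dx * dy) divided by sum of (dx squared).
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical rainfall (in mm) recorded over five past years.
years = [2016, 2017, 2018, 2019, 2020]
rainfall_mm = [82, 86, 89, 95, 98]

slope, intercept = fit_line(years, rainfall_mm)
prediction = slope * 2021 + intercept   # predict the next year's value
print(round(prediction, 1))             # prints 102.3
```

Here the fitted line rises by about 4.1 mm per year, so the prediction for 2021 continues the trend to about 102.3 mm.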
Recap
•     Data science is about extracting meaningful insights from data.
•     There are many careers in data science, such as Data Scientist, Data Engineer
      and Data Analyst.
•     Data Architect and Senior Data Scientist are two roles for experienced
      professionals.
•     Classification helps us to predict whether a new item belongs to class A or class B.
•     Regression helps us to predict the value of a continuous variable.
•     Clustering helps us to find patterns in the data.
•     Reinforcement learning helps models make decisions based on external
      factors.
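Classification, as summarized in the recap, decides whether a new item belongs to class A or class B. The sketch below uses k-nearest neighbours, one common classification technique, to label an email as "spam" or "not spam". The two features (number of links and number of exclamation marks) and the labelled examples are hypothetical choices for illustration only.

```python
import math
from collections import Counter

# Hypothetical labelled examples: (links_in_email, exclamation_marks) -> label
training_data = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((6, 5), "spam"),
    ((8, 7), "spam"),
    ((7, 9), "spam"),
]

def classify(features, data, k=3):
    """Label a new item by majority vote of its k nearest labelled examples."""
    # Sort the known examples by straight-line distance to the new item.
    by_distance = sorted(data, key=lambda item: math.dist(features, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(classify((5, 6), training_data))  # spam
print(classify((1, 1), training_data))  # not spam
```

An email with many links and exclamation marks sits closest to the spam examples, so it is labelled spam; a plain email sits closest to the genuine ones.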
                                      Exercises
                           Objective Type Questions
    Please choose the correct option in the questions below.
    1. A school named ABC has recorded the total marks of every student in the class.
       This is an example of:
           a. Qualitative data
           b. Quantitative data
           c. Both qualitative and quantitative data
           d. None of the above
    2. A food delivery app has asked for your feedback on the quality of the food. You
       have written two paragraphs to describe the food. This is an example of:
           a. Qualitative Data
           b. Quantitative Data
           c. Both qualitative and quantitative data
           d. None of the above
    3. You need to predict what the temperature will be next Friday.
       Which algorithm will you use?
           a) Clustering
           b) Regression
           c) Anomaly detection
           d) Binary classification
    4. You need to predict whether your car tire will last for the next 1000 km. Which algorithm
       will you use?
            a) Clustering
            b) Regression
            c) Anomaly detection
            d) Binary classification
   5. You want to build a way to separate spam emails from genuine emails. Which
      algorithm will you use?
         a) Clustering
         b) Regression
         c) Anomaly detection
         d) Binary classification
                               Standard Questions
    Please answer the questions below in no less than 100 words.
    1. What are the common career paths in data science?
    2. What does a Data Architect do?
    3. What are the differences between classification and regression?
                                  Applied Project
Emails are a part of daily communication. Sometimes we receive unwanted emails called
spam. There are a few techniques that email providers use to identify spam emails:
CHAPTER
Data Visualization
   •   Charts
   •   Graphs
   •   Tables
   •   Maps
   •   Histograms

3. Examples of data visualization
Example 1: Using a pie chart that displays the data of the food preferred by
the students.

We have the food item preferences of 50 students. Let us now visualize the data
using a pie chart and find the most preferred and the least preferred food items:

[Pie chart: Food Preference. Pizza 50%, Dosa 30%, Pasta 20%]

The most preferred food item is pizza and the least preferred food item is
pasta.

Example 2: Using a line chart that displays the number of students present in
the class over one week.

Here is the data:

   Date      Number of students present
   06-Apr    49
   07-Apr    42
   08-Apr    37
   09-Apr    48
   10-Apr    43
   11-Apr    36
   12-Apr    50

[Line chart: Number of Students Present, by date, y-axis from 0 to 60]
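Before drawing either chart, the numbers behind them can be checked with a few lines of Python. This sketch assumes the pie chart percentages apply to all 50 students (so 50%, 30% and 20% mean 25, 15 and 10 students); the attendance figures are taken directly from the table.

```python
# Example 1: counts behind the pie chart. With 50 students, the shares
# 50%, 30% and 20% correspond to 25, 15 and 10 students.
food_counts = {"Pizza": 25, "Dosa": 15, "Pasta": 10}
total_students = sum(food_counts.values())

for food, count in food_counts.items():
    # Each slice of the pie is the item's share of all students.
    print(f"{food}: {100 * count / total_students:.0f}%")

most = max(food_counts, key=food_counts.get)
least = min(food_counts, key=food_counts.get)
print("Most preferred:", most, "| Least preferred:", least)

# Example 2: the attendance table that the line chart plots.
attendance = {
    "06-Apr": 49, "07-Apr": 42, "08-Apr": 37, "09-Apr": 48,
    "10-Apr": 43, "11-Apr": 36, "12-Apr": 50,
}
average = sum(attendance.values()) / len(attendance)
busiest = max(attendance, key=attendance.get)
quietest = min(attendance, key=attendance.get)
print(f"Average attendance: {average:.1f}")   # 305 / 7, about 43.6
print("Highest:", busiest, "| Lowest:", quietest)
```

A plotting library such as matplotlib could then draw the charts themselves (for example with its `pie` and `plot` functions); the summary above confirms that pizza and pasta are the extremes and that attendance was lowest on 11-Apr and highest on 12-Apr.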
We can also visualize the same data using a bar graph:

To make sure that we get the required outcome from the data, we must collect
the right and relevant data. It is essential to have correct, good-quality data
to perform an analysis or to build algorithms that can have an impact. Without
relevant data, your analyses will not only be irrelevant, but they can also be
misleading.

You cannot expect to find perfectly preprocessed raw data that can be used
directly for your needs. Hence, you need to understand how the data was
gathered and what sources it was collected from.

Therefore, it is essential to understand how to collect relevant data for analysis.

Let us understand what steps we need to take to make sure that we collect the
right set of data for analysis.
•   Format of data - The data collected for analysis should be in the right
    format: accessible and readable for analysis. If the collected data is not
    in the right format, we should convert it to the required format for analysis.

5. Asking the right question
Once we have the required data ready, the next step is to ask the right
questions of the data. It is important to understand that if we don't ask the
right questions, we will never get the right answers. To make sure we perform the
Below are specific questions that you need to ask of your data set to get the
right answer:

•   What do you wish to find?
    It is essential to consider what your goal is and what decision-making it
    will support. What outcome from the analysis would you consider a success?

    These initial analysis questions are important to guide you through the
    process and help you focus on valuable insights. You can start by
    brainstorming and preparing a draft list of the specific questions you want
    the data to answer. This will help you dive deeper into the more specific
    insights you want to achieve.

•   Which statistical techniques are applicable?
    There are several statistical analysis techniques that you can use for
    analyzing data. However, in real-life scenarios, three statistical
    techniques are most commonly used for analysis:

    a. Regression Analysis - a process for finding out the relationships and
       correlations among the different variables in the data.
    b. Cohort Analysis - it enables you to easily compare how different groups
       (cohorts) of customers behave over time.
       For example, you can create cohorts of customers based on the date when
       they made their first purchase. Subsequently, you can study the spending
       trends of cohorts from different periods to determine whether the quality
       of the average acquired customer is increasing or decreasing over time.
    c. Predictive Analysis - predictive analytics involves the analysis of
       historical datasets to predict future possibilities. It can also be used
       for generating alternative scenarios and risk assessments.

•   Who will be using the final results?
    An important aspect of your data analytics is the end-users of your
    analysis. Who are they and how will they be using the reports you create?
    You must get to know your final users, including:
    a. What do they expect to learn from the data?
    b. What do they need?
    c. How advanced are their technical skills?
    d. How much time do they have?

    If you know these answers, you can decide how detailed your data
    visualizations should be and which areas of the data your report should
    focus on.

be able to understand the insights from them.

It is essential to convince executives and decision-makers that the data that
you have gathered and analyzed are:
    a. Correct
    b. Important
    c. Urgent to act upon
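The cohort analysis example above (grouping customers by the date of their first purchase and comparing their spending) can be sketched in a few lines. The purchase records below are hypothetical, and cohorts are formed by month; a fuller cohort analysis would also break each cohort's spending down period by period rather than giving one average per cohort.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer, month of purchase, amount spent).
purchases = [
    ("Asha", "2021-01", 500), ("Asha", "2021-02", 300),
    ("Ravi", "2021-01", 200), ("Ravi", "2021-03", 400),
    ("Meena", "2021-02", 700), ("Meena", "2021-03", 100),
    ("John", "2021-02", 300),
]

def cohort_average_spend(records):
    """Average total spending per customer, grouped by month of first purchase."""
    # Step 1: a customer's cohort is the month of their first purchase.
    # "YYYY-MM" strings sort in date order, so min() finds the earliest month.
    first_month = {}
    for customer, month, _ in records:
        first_month[customer] = min(month, first_month.get(customer, month))

    # Step 2: add up spending and collect the customers in each cohort.
    spend = defaultdict(int)
    members = defaultdict(set)
    for customer, _, amount in records:
        cohort = first_month[customer]
        spend[cohort] += amount
        members[cohort].add(customer)

    # Step 3: divide to get the average spend per customer in each cohort.
    return {c: spend[c] / len(members[c]) for c in sorted(spend)}

print(cohort_average_spend(purchases))  # {'2021-01': 700.0, '2021-02': 550.0}
```

In this made-up data, customers acquired in February spend less on average than those acquired in January, which is the kind of trend the text says cohort analysis can reveal.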
Recap
                                     Exercises
                          Objective Type Questions
    Please choose the correct option in the questions below.
   4. Which format of data is easiest for analysis?
        a. Tabular data
        b. Text data in a PDF
        c. Data in an image
        d. Speech data
   5. Which visualization is best for representing a relation between two variables?
        a. Scatter plot
        b. Histogram
        c. Pie chart
        d. Gantt Chart
                              Standard Questions
   Please answer the questions below in no less than 100 words.
   1. What are the steps to make sure that the correct data is collected for analysis?
   2. Write a short note on the statistical techniques which can be used for data
      analysis.
   3. Is it important to assess the end-users for a visualization? Explain in your own
      words.
   4. If you find that the data collected has outliers, what steps can you take to ensure
      that your analysis is still accurate?
                                 Applied Project
Each student should write down the marks he/she received in the examination for each
subject studied in the previous grade. Use these marks to plot on paper:
   a. a bar graph to display the marks of each individual subject.
   b. a line graph to display the marks of each individual subject.
   c. a pie chart to show the percentage contribution of each subject's marks to the
      total marks obtained.
CHAPTER
various brands in your window. Have you ever wondered how this new application or
website knows that you are looking to buy a handbag? Well, the answer is data
science. Algorithms in data science track your searches and learn your
preferences from them.

Speech Recognition - Speech recognition is now part of our everyday lives. It has
become a part of phones, game consoles, and even smartwatches. Have you heard of
Microsoft's Cortana? It uses speech recognition behind the scenes to take inputs
from the user.

Speech recognition can also be found on many devices that are used to automate
our homes.

Speech recognition has been around for more than a decade. However, it is gaining
popularity now as machine learning is helping organizations make speech
recognition much more accurate.

3. Analytics on text data
Text analytics can be defined as the process of collecting unstructured text from
various sources and analyzing and extracting relevant information from it. It can
also be used to transform that text into structured information, which can then be
used in various other ways.

There are several ways to analyze unstructured text. Most of these techniques fall
under three technical areas: Natural Language Processing (NLP), data mining, and
information retrieval.

Typically, we use text analytics technologies for four basic tasks: querying data,
mining data, searching data, and analyzing data to get insights.

For example, if we have a database with customer data, an end-user could query the
database to find out how many customers have started using the company's services
in the last quarter and how many have stopped using them. They can do so just by
entering a query in plain English instead of a query language like SQL.

Chatbots are also an important area that uses text analytics for both querying and
searching data. Chatbots can use it to query a database and give a reply based on
the question. They can also use text-analytics-based search to retrieve a document
based on what end-users are looking for.

4. Analytics on image data
Image recognition can be described as a process by which we can process images to
identify people, patterns, logos, objects, or places.

Many machine learning tools can help users recognize faces and objects in a
picture. These tools can also scan the objects in the picture and attempt to
identify and name them based on a large database of images.

Mobile phones, for example, use computer vision technologies in combination with a
camera to achieve image recognition. This advanced technology has a variety of
applications, like accessibility for the visually impaired and interactive
advertising.
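The customer example above contrasts a plain-English question with a query language like SQL. To show what such a query looks like, the sketch below uses Python's built-in `sqlite3` module with a made-up `customers` table and assumes "last quarter" means January to March 2021. A text-analytics front end would translate the user's English question into something like these SQL statements.

```python
import sqlite3

# Hypothetical customer table held in memory for illustration.
# end_date is NULL (None) for customers who are still using the service.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Asha", "2021-01-15", None),
        ("Ravi", "2021-02-03", None),
        ("Meena", "2020-11-30", "2021-02-10"),
        ("John", "2020-06-01", None),
    ],
)

# Assumed "last quarter": January to March 2021 (ISO dates compare as text).
quarter_start, quarter_end = "2021-01-01", "2021-03-31"

# "How many customers started using our service last quarter?"
started = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE start_date BETWEEN ? AND ?",
    (quarter_start, quarter_end),
).fetchone()[0]

# "How many customers stopped using our service last quarter?"
stopped = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE end_date BETWEEN ? AND ?",
    (quarter_start, quarter_end),
).fetchone()[0]

print("Started:", started, "| Stopped:", stopped)  # Started: 2 | Stopped: 1
conn.close()
```

Writing `SELECT COUNT(*) ... WHERE ... BETWEEN ...` requires knowing SQL; the point of the text-analytics approach described above is that the end-user only needs to ask the question in English.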
c. Planning and Navigation: Make computers capable of traveling from Point X
   to Point Y. For example, a self-driving robot.

d. Natural Language Processing: Make computers capable of understanding and
   processing a language. For example, a web translator that translates one
   language into another.

e. Perception: Make computers capable of interacting with real-world objects
   through the senses of touch, hearing, smell, and sight.

f. Emergent Intelligence: Make computers capable of intelligence that is not
   explicitly programmed but is derived from AI capabilities. The basic vision
   for this goal is to enable machines to exhibit emotional intelligence, moral
   reasoning, and more.

Recap
•   Two important applications of data science are digital ads and speech
    recognition.
•   Text analytics can be defined as the process of collecting unstructured text
    from various sources and analyzing and extracting relevant information from it.
•   Chatbots are also an important area that uses text analytics for both
    querying and searching data.
•   Image recognition can be described as a process by which we can process
    images to identify people, patterns, logos, objects, or places.
•   Artificial Intelligence is defined as the science and engineering of making
    intelligent machines.
•   AI has many sub-goals, such as natural language processing and perception.
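The planning and navigation sub-goal (getting a computer from Point X to Point Y) can be illustrated with breadth-first search on a grid map, one simple path-planning technique among many. The floor map below is hypothetical: "." is open floor, "#" is a wall, and the robot may move one square up, down, left, or right at a time.

```python
from collections import deque

# Hypothetical floor map: "." is open floor, "#" is a wall.
GRID = [
    "....#",
    ".##.#",
    "...#.",
    "#....",
]

def shortest_path_length(grid, start, goal):
    """Breadth-first search: fewest moves from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])   # (position, moves taken so far)
    visited = {start}
    while queue:
        (r, c), moves = queue.popleft()
        if (r, c) == goal:
            return moves
        # Try moving up, down, left and right.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == "." and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append(((nr, nc), moves + 1))
    return None

print(shortest_path_length(GRID, (0, 0), (3, 4)))  # 7
```

Because breadth-first search explores all positions one move away, then two moves away, and so on, the first time it reaches Point Y it has found a route with the fewest moves.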
                                   Exercises
                        Objective Type Questions
Please choose the correct option in the questions below.
   3. Which of the following is a use case of data science?
        a. Facial recognition
        b. Text analytics
        c. Sentiment analysis
        d. All of the above
   4. What does natural language processing help us with?
        a. Text analytics
        b. Video analytics
        c. Image analytics
   5. What technologies are used by chatbots?
        a. Text analytics
        b. Speech recognition
        c. Both of the above
                               Standard Questions
    Please answer the questions below in no less than 100 words.
                                  Applied Project
Understanding the mood of the speaker can be very useful. Certain keywords can be
associated with different sentiments.
Example 1: “The news continues to be gloomy.” If you read this sentence, you will
understand that the sentiment of the speaker is sad.
Example 2: “I was infuriated by his arrogance.” This sentence tells you that the
sentiment of the speaker is angry.
Discuss with your classmates how text analytics can help us identify the sentiment of
the speaker, that is, whether the speaker is happy, angry, or sad. A sentence may have
more than one keyword that highlights the sentiment of the speaker. Provide two
examples of such scenarios for each of the sentiments discussed above.
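The keyword idea above can be sketched as a tiny keyword-matching sentiment detector: count how many words in a sentence appear in each sentiment's keyword list and pick the sentiment with the most matches. The keyword lists are hypothetical and very small; real text-analytics tools use much larger vocabularies and also handle negation ("not happy") and context.

```python
# Hypothetical keyword lists; a real system would use a much larger lexicon.
SENTIMENT_KEYWORDS = {
    "sad": {"gloomy", "unhappy", "miserable", "tears"},
    "angry": {"infuriated", "furious", "outraged"},
    "happy": {"delighted", "joyful", "thrilled", "wonderful"},
}

def detect_sentiment(sentence):
    """Return the sentiment whose keywords appear most often, or 'unknown'."""
    # Lower-case each word and strip surrounding punctuation.
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    scores = {
        label: sum(w in keywords for w in words)
        for label, keywords in SENTIMENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(detect_sentiment("The news continues to be gloomy."))    # sad
print(detect_sentiment("I was infuriated by his arrogance."))  # angry
```

A sentence such as "I was delighted and thrilled!" contains two happy keywords, which matches the text's point that one sentence can carry several keywords for the same sentiment.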