ARTIFICIAL INTELLIGENCE
CLASS X
HANDOUT-DATA SCIENCE
Data Science is a concept that unifies statistics, data analysis, machine learning and their related methods in order
to understand and analyse actual phenomena with data. It employs techniques and theories drawn from many
fields within the context of Mathematics, Statistics, Computer Science, and Information Science.
Applications of Data Sciences
Data Science is not a new field. It mainly revolves around analysing data, and when it comes to AI,
the analysis helps in making the machine intelligent enough to perform tasks by itself.
There exist various applications of Data Science in today’s world. Some of them are:
Fraud and Risk Detection: The earliest applications of data science were in Finance.
Companies were fed up with bad debts and losses every year.
Genetics & Genomics: Data Science applications also enable an advanced level of treatment personalization
through research in genetics and genomics.
Internet Search: When we talk about search engines, we think ‘Google’. Right? But there are many other search
engines like Yahoo, Bing, Ask, AOL, and so on. All these search engines (including Google) make use of data
science algorithms to deliver the best result for our searched query in a fraction of a second.
Targeted Advertising: If you thought Search would have been the biggest of all data science applications, here is
a challenger – the entire digital marketing spectrum.
Website Recommendations: Aren’t we all used to the suggestions about similar products on Amazon? They not
only help us find relevant products from billions of products available with them but also add a lot to the user
experience.
Airline Route Planning: The Airline Industry across the world is known to bear heavy losses. Except for a few
airline service providers, companies are struggling to maintain their occupancy ratio and operating profits.
Data Collection
Data collection is an exercise which does not require even a tiny bit of technological knowledge. But when it
comes to analysing the data, it becomes a tedious process for humans as it is all about numbers and
alphanumerical data. That is where Data Science comes into the picture. It not only gives us a clearer idea of the
dataset, but also adds value to it by providing deeper and clearer analyses of it. And as AI gets incorporated
in the process, the machine can also make predictions and suggestions based on the same data.
For data domain-based projects, the data used is mostly in numerical or alphanumerical format, and
such datasets are curated in the form of tables. Such databases are very commonly found in any institution for
record maintenance and other purposes. Some examples of datasets which you must already be aware of are:
BANKS: Databases of loans issued, account holders, locker owners, employee registrations, bank visitors, etc.
ATM MACHINES: Usage details per day, cash denominations transaction details, visitor details, etc.
MOVIE THEATRES: Movie details, tickets sold offline, tickets sold online, refreshment purchases, etc.
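To make the idea of a tabular dataset concrete, here is a minimal Python sketch of a movie theatre table. The movie names, column names and ticket counts are made up purely for illustration; they are not taken from any real dataset. Each dictionary is one row of the table and each key is one column.

```python
# Hypothetical records: each dictionary is one row of the table,
# and each key ("movie", "sold_offline", "sold_online") is a column.
ticket_sales = [
    {"movie": "Movie A", "sold_offline": 120, "sold_online": 340},
    {"movie": "Movie B", "sold_offline": 80, "sold_online": 210},
    {"movie": "Movie C", "sold_offline": 45, "sold_online": 95},
]


def total_tickets(rows):
    """Return the total tickets (offline + online) for each movie."""
    return {row["movie"]: row["sold_offline"] + row["sold_online"] for row in rows}


totals = total_tickets(ticket_sales)
for movie, total in totals.items():
    print(movie, total)
```

Even a simple total like this is a small piece of analysis: it turns raw rows of numbers into information a theatre could act on.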
Sources of Data
There are various sources from which we can collect the data we need, and the data collection process can
be categorised in two ways: Offline and Online.
While accessing data from any of the data sources, the following points should be kept in mind:
1. Only data which is available for public usage should be taken up.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone’s privacy to collect data.
4. Data should only be taken from reliable sources as the data collected from random sources
can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of data which helps in proper training of the
AI model.
Types of Data
For Data Science, usually the data is collected in the form of tables. These tabular datasets can be
stored in different formats. Some of the commonly used formats are:
1. CSV: CSV stands for comma separated values. It is a simple file format used to store tabular data. Each line of
this file is a data record and each record consists of one or more fields which are separated by commas. Since the
values of records are separated by commas, they are known as CSV files.
2. Spreadsheet: A Spreadsheet is a piece of paper or a computer program which is used for accounting and
recording data using rows and columns into which information can be entered. Microsoft Excel is a program
which helps in creating spreadsheets.
3. SQL: SQL stands for Structured Query Language. It is a domain-specific
language used in programming and is designed for managing data held in different kinds of DBMS (Database
Management Systems). It is particularly useful in handling structured data.
Many other database formats also exist.
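The CSV and SQL formats described above can be tried out with a short Python sketch. It uses only Python's built-in csv and sqlite3 modules. The account names, column names and balances are hypothetical, and SQLite is used here only as one convenient example of a DBMS; it is an assumption of this sketch, not something the handout prescribes.

```python
import csv
import io
import sqlite3

# Hypothetical CSV text: the first line holds the column names,
# and every following line is one record with comma-separated fields.
csv_text = """name,account_type,balance
Asha,savings,15000
Ravi,current,42000
Meena,savings,8000
"""


def read_records(text):
    """Parse CSV text into a list of dictionaries, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def savings_holders(records):
    """Load the records into an in-memory SQL table and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT, account_type TEXT, balance INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(r["name"], r["account_type"], int(r["balance"])) for r in records],
    )
    # SQL query: pick only the names of savings account holders.
    rows = conn.execute(
        "SELECT name FROM accounts WHERE account_type = 'savings' ORDER BY name"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


records = read_records(csv_text)
print(records[0])
print(savings_holders(records))
```

The same three records appear first as plain comma-separated text and then as rows of an SQL table, which shows that a format is only a way of storing the table; the data itself stays the same.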