
Lectures 1 - 2

The document provides an introduction to statistics, emphasizing its importance in data-driven decision-making across various fields such as business, finance, and marketing. It covers key concepts including descriptive and inferential statistics, types of data, scales of measurement, and the distinction between population and sample. Additionally, it discusses the role of big data and its applications in modern analytics.

Uploaded by

Pushkar Raj

Introduction to Statistics

Lectures 1 & 2

Lectures 1 & 2 Introduction to Statistics


Statistics In Broader Perspective Of Data Science
Data in the 21st century is what oil was in the Industrial Age:
those who manage data efficiently will be the ones who succeed.
Modern businesses are increasingly shaped by the power of
data-driven decision-making.
Business analytics is the scientific process of transforming data
into insight for making better decisions.

Applications in Business and Economics

Accounting
Public accounting firms use statistical sampling procedures when
conducting audits for their clients.
Usually, the large number of individual accounts receivable makes
reviewing and validating every account too time-consuming and
expensive.
As a common practice in such situations, the audit staff selects
a subset of the accounts, called a sample.
Finance
Financial analysts use a variety of statistical information to
guide their investment recommendations.
In the case of stocks, analysts review financial data such as
price/earnings ratios and dividend yields.
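
The audit sampling described under Accounting above can be sketched in a few lines. This is only an illustration under stated assumptions: the ledger, the account IDs, the balances, and the sample size of 100 are all hypothetical, and a real audit would follow the firm's own sampling procedure.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical ledger: 5,000 accounts receivable with invented balances.
accounts = {f"AR-{i:04d}": round(random.uniform(100, 10_000), 2)
            for i in range(5000)}

# Simple random sample of 100 account IDs, drawn without replacement.
sample_ids = random.sample(sorted(accounts), k=100)

# Only the sampled accounts would be reviewed and validated.
sample_total = sum(accounts[account_id] for account_id in sample_ids)
print(len(sample_ids), "of", len(accounts), "accounts selected for review")
```

Reviewing 100 accounts instead of 5,000 is what makes the audit affordable; the price is that conclusions about the full ledger become estimates.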

Applications in Business and Economics

Marketing
Electronic scanners at retail checkout counters collect data for
a variety of marketing research applications.
Data suppliers purchase point-of-sale scanner data from
grocery stores, process the data, and then sell
statistical summaries of the data to manufacturers.
Production
Emphasis on quality makes quality control an important
application of statistics in production. A variety of statistical
quality control charts are used to monitor the output of a
production process.

Big Data
Big data describes large and diverse datasets that are huge in
volume and that grow rapidly over time.
It is through technology that we have truly been thrust into the
Data Age.
Data can now be collected electronically, and the available
amounts of it are staggering.

Big Data (Contd.)
IBM describes big data through the four Vs: volume, velocity,
variety, and veracity.
Volume is the sheer amount of data, measured in terabytes and
petabytes, that leads a dataset to be classified as big data.
Velocity is the speed with which new data is generated and
moves around. For example, the New York Stock Exchange
collects 1 terabyte of data in a single trading session.
Variety means that big data includes a variety of data types,
such as text, images, videos, voice files, and other
unstructured data.
Veracity denotes the trustworthiness of big data, that is,
whether the data is accurate and of high quality. For example,
the data could have many missing values, which makes reliable
analysis a challenge.

Figure: Relationship between population and sample

Big Data (Contd.)

Netflix's content recommendation system recommends material to
its subscribers using big data analytics. Netflix predicts what
users would want to watch next based on their viewing histories,
ratings, and searches.
Amazon, the e-commerce behemoth, analyzes consumer behavior,
preferences, and purchase history using big data analytics.
General Electric uses big data analytics to predict when
equipment will break and schedules maintenance before problems
arise by collecting and analyzing data from sensors embedded
in machines.

Statistics, Population, and Sample.
Statistics: A branch of mathematics that deals with the
collection, organization, analysis, interpretation, and
presentation of data.
Population: The collection of all individuals or items under
consideration in a statistical study. A numerical result
computed from the entire population is called a parameter.
Sample: That part of the population from which information
is obtained. A numerical result computed from a sample is
called a statistic.

Statistics, Population, and Sample (Contd.)

Examples
All advertisements for IT jobs in the Netherlands are the
population, and the top 50 search results for advertisements
for IT jobs in the Netherlands are the sample.
All the students in the class are the population, whereas the
top 10 students in the class are the sample.
All the members of the parliament are the population, and the
female members present there are the sample.

Descriptive & Inferential Statistics

Statistics can be divided into two main categories: descriptive
and inferential statistics.
Descriptive statistics consists of methods for organizing and
summarizing information.
Common measures in descriptive statistics include measures of
central tendency (such as mean, median, and mode) and measures
of dispersion (such as range, variance, and standard deviation).
Inferential statistics involves making inferences, predictions,
or generalizations about a larger population based on data
collected from a sample of that population.
Inferential statistics allows researchers to draw conclusions,
test hypotheses, and make predictions about populations, even
when it is impractical or impossible to study the entire
population directly.
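
The measures of central tendency and dispersion named on this slide can be computed with Python's standard `statistics` module. The sketch below is a minimal illustration; the list of heights is hypothetical data, not taken from the lecture.

```python
import statistics

# Hypothetical heights in inches for eight students (invented data).
heights_in = [66, 68, 70, 68, 72, 65, 68, 71]

# Measures of central tendency
mean = statistics.mean(heights_in)
median = statistics.median(heights_in)
mode = statistics.mode(heights_in)

# Measures of dispersion
data_range = max(heights_in) - min(heights_in)
variance = statistics.variance(heights_in)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(heights_in)      # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.2f}, std dev={std_dev:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `statistics.pvariance` and `statistics.pstdev` are the population versions.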

Examples of Descriptive & Inferential Statistics

Descriptive Statistics
A class of 30 students has an average height of 5 feet 8 inches.
45% of survey respondents prefer chocolate ice cream.
Inferential Statistics
Based on a sample of voters, it is estimated that 52% of the
population will vote for candidate A.
A drug trial, involving 100 participants, finds that a new
medication is effective in reducing blood pressure.
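
To make the voter example concrete, the sketch below estimates a population proportion from a sample and attaches an approximate 95% interval using the normal approximation. The sample size of 1,000 and the count of 520 are hypothetical, and the interval formula is one common textbook choice rather than something stated in these slides.

```python
import math

# Hypothetical poll: 520 of 1,000 sampled voters favor candidate A.
n = 1000
votes_for_a = 520

p_hat = votes_for_a / n                            # sample proportion (a statistic)
std_error = math.sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error
z = 1.96                                           # normal quantile for ~95% coverage

lower = p_hat - z * std_error
upper = p_hat + z * std_error
print(f"estimate = {p_hat:.2f}, approx. 95% interval = ({lower:.3f}, {upper:.3f})")
```

The first two lines of arithmetic are descriptive (they summarize the sample); the interval is inferential, because it makes a statement about the whole population of voters.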

Elements, Variables, and Observations (Contd.)

Elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements. The
set of measurements obtained for a particular element is called
an observation. The data set in this example includes the
following variables:
WTO Status: The nation's membership status in the World
Trade Organization; this can be either as a member or an
observer.
Per Capita Gross Domestic Product (GDP) ($): The total market
value ($) of all goods and services produced by the nation
divided by the number of people in the nation.
Fitch Rating: The nation's sovereign credit rating as appraised
by the Fitch Group. The credit ratings range from a high of
AAA to a low of F.
Fitch Outlook: An indication of the direction in which the
credit rating is likely to move; the outlook can be negative,
stable, or positive.
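
As a rough sketch of how elements, variables, and observations fit together, the records below store one element (a nation) per dictionary. The nation names and every value are hypothetical, and `per_capita_gdp` is an illustrative helper, not part of the lecture's data set.

```python
# Each dict is one observation: the measurements for one element (a nation).
# Nation names and all values are invented for illustration only.
data_set = [
    {"nation": "Nation A", "wto_status": "Member", "gdp_usd": 500e9,
     "population": 25e6, "fitch_rating": "AA", "fitch_outlook": "Stable"},
    {"nation": "Nation B", "wto_status": "Observer", "gdp_usd": 36e9,
     "population": 12e6, "fitch_rating": "BB", "fitch_outlook": "Negative"},
]

def per_capita_gdp(gdp_usd, population):
    """Total market value of goods and services divided by number of people."""
    return gdp_usd / population

for row in data_set:
    gdp_pc = per_capita_gdp(row["gdp_usd"], row["population"])
    print(row["nation"], row["wto_status"], f"${gdp_pc:,.0f}",
          row["fitch_rating"], row["fitch_outlook"])
```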

Elements, Variables, and Observations

Qualitative & Quantitative variables
Data can be classified as either qualitative or quantitative.
Quantitative variables, also referred to as "numeric" variables,
are variables that represent a measurable quantity. Examples
include:
Number of students in a class
Number of square feet in a house
Population size of a city
Age of an individual
Height of an individual
Qualitative variables, or "categorical" variables, are variables
that take on names or labels and can be sorted into categories.
Examples include:
Eye color (e.g. “blue”, “green”, “brown”)
Gender (e.g. “male”, “female”)
Level of education (e.g. “high school”, “Associate’s degree”,
“Bachelor’s degree”)
Marital status (e.g. “married”, “single”, “divorced”)

Quantitative Variables

There are two types of quantitative variables: discrete and
continuous.
A discrete variable is a variable which can take on only a
countable number of distinct values, such as 0, 1, 2, 3, 4, · · · .
Some examples of discrete random variables include:
The number of times a coin lands on tails after being flipped
20 times.
The number of defective widgets in a box of 50 widgets.
A continuous variable is a variable which can take on any value
within an interval, so it has uncountably many possible values.
Some examples of continuous random variables include:
Weight of an animal
Height of a person
Time required to run a marathon
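
The difference between the two types shows up directly in how values are represented. In this small simulation the number of tails in 20 flips is a count (an integer between 0 and 20), while a marathon time can fall anywhere in a range; the 2 to 6 hour range is an arbitrary assumption for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Discrete: count of tails in 20 coin flips, always one of 0, 1, ..., 20.
tails = sum(random.choice("HT") == "T" for _ in range(20))

# Continuous: a marathon time in hours, any real value in the assumed range.
marathon_hours = random.uniform(2.0, 6.0)

print("tails in 20 flips:", tails)
print(f"marathon time: {marathon_hours:.4f} hours")
```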

Qualitative & Quantitative variables.

Scales of Measurement

Data collection requires one of the following scales of
measurement: nominal, ordinal, interval, or ratio.
When the data for a variable consist of labels or names used to
identify an attribute of the element, the scale of measurement
is considered a nominal scale.
The scale of measurement for the WTO Status variable is nominal
because the data "member" and "observer" are labels used
to identify the status category for the nation.
The scale of measurement for a variable is considered an ordinal
scale if the data have the properties of nominal data and the
order or rank of the data is meaningful.
Fitch Rating is ordinal because the rating labels, which range
from AAA to F, can be rank ordered from best credit rating
(AAA) to poorest credit rating (F).
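
A short sketch of the practical difference: nominal labels can only be counted or compared for equality, whereas ordinal labels can also be sorted. The list `rating_order` covers only an assumed subset of rating labels, and the sample values are hypothetical.

```python
from collections import Counter

# Nominal: labels can be tallied, but they have no meaningful order.
wto_status = ["Member", "Observer", "Member", "Member"]  # hypothetical values
print(Counter(wto_status))

# Ordinal: labels have a rank. This ordering lists only an assumed subset
# of rating labels, from best to poorest.
rating_order = ["AAA", "AA", "A", "BBB", "BB", "B"]
rank = {label: position for position, label in enumerate(rating_order)}

ratings = ["BBB", "AAA", "B", "A"]  # hypothetical values
ranked = sorted(ratings, key=rank.__getitem__)
print(ranked)  # best credit rating first
```

Sorting the nominal labels alphabetically would be possible but meaningless, which is exactly what separates the two scales.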

Scales of Measurement

The scale of measurement for a variable is an interval scale if
the data have all the properties of ordinal data and the interval
between values is expressed in terms of a fixed unit of measure.
College admission SAT scores are an example of interval-scaled
data. For example, three students with SAT math scores of
620, 550, and 470 can be ranked or ordered in terms of best
performance to poorest performance in math. The differences
between scores are also meaningful: the first student scored
620 - 550 = 70 points more than the second.
The scale of measurement for a variable is a ratio scale if the
data have all the properties of interval data and the ratio of
two values is meaningful.
If we compare the cost of $30,000 for one automobile to the cost
of $15,000 for a second automobile, the ratio property shows
that the first automobile is $30,000 / $15,000 = 2 times, or
twice, the cost of the second automobile.
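
The arithmetic behind the interval and ratio examples on this slide can be checked directly. The sketch uses only the SAT scores and automobile costs quoted above.

```python
# Interval scale: differences between values are meaningful.
sat_scores = [620, 550, 470]
diff_first_second = sat_scores[0] - sat_scores[1]   # 70 points
diff_second_third = sat_scores[1] - sat_scores[2]   # 80 points

# Ratio scale: ratios are meaningful as well, because zero means "none".
cost_first, cost_second = 30_000, 15_000
cost_ratio = cost_first / cost_second                # 2.0, i.e. twice the cost

print(diff_first_second, diff_second_third, cost_ratio)
```

A ratio such as 620 / 550 can be computed too, but it would not justify a claim like "1.13 times the ability", which is why SAT scores are treated as interval rather than ratio data.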

Example 1

Comparing Tablet Computers. Tablet PC Comparison provides a
wide variety of information about tablet computers. The company's
website enables consumers to easily compare different tablets using
factors such as cost, type of operating system, display size, battery
life, and CPU manufacturer. A sample of 10 tablet computers is
shown.
1 How many elements are in this data set?
2 How many variables are in this data set?
3 Which variables are categorical and which variables are
quantitative?
4 What type of measurement scale is used for each of the
variables?

Example 1 (Contd.)

Example 2

Comparing Phones: This example shows data for eight phones
(Consumer Reports). The Overall Score, a measure of the overall
quality for the phone, ranges from 0 to 100. Voice Quality has
possible ratings of poor, fair, good, very good, and excellent. Talk
Time is the manufacturer's claim of how long the phone can be
used when it is fully charged.
1 How many elements are in this data set?
2 For the variables Price, Overall Score, Voice Quality, and Talk
Time, which variables are categorical and which variables are
quantitative?
3 What scale of measurement is used for each variable?

Example 2 (Contd.)

Parameter and Statistic
Parameter: A descriptive measure for a population.
Statistic: A descriptive measure for a sample.
For example,
Suppose I am interested in the mean height of palm trees
in Karachi. There might be tens of thousands of palm trees
around the city, which means it would be virtually impossible
to measure the height of every single one. Instead, I select a
random sample of 100 palm trees and find the mean height of the
trees in just that sample.
The mean height of all the trees (the population) is a parameter.
The mean height of the trees in our sample is a statistic.
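
The palm tree example can be simulated to show the two quantities side by side. Everything numeric here is an assumption made for the sketch: the population of 50,000 trees, the 5 to 20 meter height range, and the uniform distribution. In practice the population mean would be unknown, which is the reason for sampling in the first place.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (in meters) of 50,000 palm trees.
population = [random.uniform(5.0, 20.0) for _ in range(50_000)]

# Parameter: a descriptive measure of the whole population.
parameter = statistics.mean(population)

# Statistic: the same measure computed from a random sample of 100 trees.
sample = random.sample(population, k=100)
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.2f} m")
print(f"sample mean (statistic):     {statistic:.2f} m")
```

Rerunning with a different seed changes the statistic from sample to sample, while the parameter stays fixed for a given population.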

Types of Data
Data can be divided into two types according to how it is
collected: primary data and secondary data.
Primary data is data that is collected for the first time
through personal experiences or evidence, particularly for
research. It is also described as raw data or first-hand
information.
Primary data is mostly collected through observations, physical
testing, mailed questionnaires, surveys, personal interviews,
telephone interviews, case studies, focus groups, etc.
Secondary data is second-hand data that has already been
collected and recorded by other researchers for their own
purposes, and not for the current research problem.
It is accessible from different sources such as government
publications, censuses, internal records of the organisation,
books, journal articles, websites, reports, etc.
Data Sources
Data can be obtained from existing sources, by conducting an
observational study, or by conducting an experiment.
Existing Sources: In some cases, data needed for a particular
application already exist. Companies maintain a variety of databases
about their employees, customers, and business operations. Data on
employee salaries, ages, and years of experience can usually be
obtained from internal personnel records.

Data Sources (Contd.)

Data Sources (Contd.)

Observational Study: In an observational study we simply
observe what is happening in a particular situation, record data
on one or more variables of interest, and conduct a statistical
analysis of the resulting data.
For example, researchers might observe a randomly selected
group of customers that enter a Walmart supercenter to
collect data on variables such as the length of time the
customer spends shopping, the gender of the customer, the
amount spent, and so on.

Data Sources (Contd.)

Experiment
The key difference between an observational study and an
experiment is that an experiment is conducted under
controlled conditions.
As a result, the data obtained from a well-designed
experiment can often provide more information as compared
to the data obtained from existing sources or by conducting
an observational study.
For example, suppose a pharmaceutical company would like to
learn about how a new drug it has developed affects blood
pressure.

Descriptive Statistics for Qualitative Variable

Most statistical information can be summarized and presented
in a form that is easy for the reader to understand.
Such summaries of data, which may be tabular, graphical, or
numerical, are referred to as descriptive statistics.
For example, consider the variable Fitch Outlook (Table 1.1),
which indicates the direction of the nation's credit rating and
can be negative, stable, or positive.
A tabular representation of Fitch Outlook is:

Descriptive Statistics for Qualitative Variable (Contd.)

A graphical summary of the same variable (Fitch Outlook) is
called a bar chart.
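
Both kinds of summary for a qualitative variable can be sketched with the standard library: a frequency table and a plain-text bar chart. The list of outlook values below is hypothetical and does not reproduce Table 1.1.

```python
from collections import Counter

# Hypothetical Fitch Outlook values for eight nations (invented data).
outlooks = ["Stable", "Negative", "Stable", "Positive",
            "Stable", "Negative", "Stable", "Stable"]

freq = Counter(outlooks)
total = len(outlooks)

# Tabular summary (frequency and percent) with a text bar per category.
print(f"{'Outlook':<10}{'Freq':>5}{'Pct':>7}")
for category in ("Negative", "Stable", "Positive"):
    count = freq[category]
    print(f"{category:<10}{count:>5}{100 * count / total:>6.1f}%  {'#' * count}")
```

The row of `#` characters plays the role of a bar: its length is proportional to the category's frequency, which is all a bar chart encodes.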
