Probability and Statistics
Introductory Class
Instructor: Aysha Azmat
Fall 2024
MATH-361 : Probability and Statistics
Credit Hrs : 3 hrs
Text Book :
• INTRODUCTORY STATISTICS, 7th Edition by PREM S. MANN
Reference Book :
• Probability & Statistics for Engineers & Scientists, 8th Edition by Walpole
Course Outline
Course Assessment
Quiz - 10%
Assignment - 5%
Presentation - 5%
Mids - 30%
Finals - 50%
Introduction to Statistics
What is Statistics?
Statistics refers to the branch of mathematics that involves the collection, analysis,
interpretation, presentation, and organization of data.
Its primary purpose is to provide tools and techniques for understanding and
making inferences about various aspects of the world based on data.
Statistics is widely used in various fields, including science, social sciences,
economics, business, medicine, engineering, and more.
It provides a systematic approach to dealing with uncertainty and variability in
data, enabling researchers and analysts to make informed decisions, identify
patterns, and gain insights from the information available.
Why study Statistics?
Statistics is the study of data collection, analysis, interpretation, and presentation. It plays
a crucial role in many fields and industries because it yields insights from data and supports
informed decisions.
Here are some reasons why we study statistics:
1. Data Analysis and Interpretation
2. Informed Decision-Making
3. Research and Scientific Studies
4. Predictive Modeling
5. Quality Control and Process Improvement
6. Epidemiology and Public Health
7. Economics and Finance
8. Social Sciences
9. Education Assessment
10. Political Analysis
11. Environmental Studies
12. Risk Assessment and Insurance
13. Market Research
Overall, statistics provides a systematic approach to dealing with data, enabling
us to make informed decisions, identify trends, and gain insights into various
aspects of the world around us. It's a powerful tool that contributes to
advancements in numerous fields and enhances our understanding of complex
phenomena.
Types of Statistics
There are two types of Statistics, namely,
Descriptive Statistics
Inferential Statistics
Descriptive Statistics
A group of methods used for organizing, displaying, and describing the main
features of a data set using tables, graphs, and summary measures.
Descriptive statistics include measures such as the mean (average), median
(middle value), mode (most frequent value), standard deviation (a measure of
data spread), and various graphical representations like histograms, bar charts,
and scatter plots.
Graphical Techniques : Bar Graph, Pie Chart, Histogram
Numerical Techniques : Measures of central location and dispersion
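The summary measures named above can be computed with Python's standard `statistics` module. This is a minimal sketch; the test scores are hypothetical values chosen only for illustration.

```python
import statistics

# Hypothetical test scores for 10 students (illustrative values only).
scores = [72, 85, 85, 90, 64, 78, 85, 92, 70, 79]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value
stdev = statistics.stdev(scores)    # sample standard deviation (n - 1 divisor)

print(f"mean={mean}, median={median}, mode={mode}, stdev={stdev:.2f}")
```

Note that `statistics.stdev` treats the data as a sample; `statistics.pstdev` would be the choice if the scores were the whole population.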
Inferential Statistics
These are techniques used to draw conclusions or make predictions about a larger
population based on a sample of data. Inferential statistics involve hypothesis testing,
confidence intervals, and regression analysis, among other methods.
These techniques help researchers make educated guesses about the characteristics
of a population based on the information gathered from a smaller subset.
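As a rough illustration of inference from a sample, the sketch below estimates a population mean and attaches a 95% confidence interval using the normal approximation. The population is simulated, not real data, and the income figures, seed, and sample size are assumptions made for the example.

```python
import random
import statistics
from statistics import NormalDist

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated household incomes (not real data).
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]

# Draw a sample of 100 elements without replacement.
sample = random.sample(population, 100)

sample_mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / len(sample) ** 0.5

# 95% confidence interval using the normal approximation (z is about 1.96).
z = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * std_error
ci_high = sample_mean + z * std_error

print(f"sample mean: {sample_mean:.0f}")
print(f"95% CI: ({ci_low:.0f}, {ci_high:.0f})")
print(f"population mean: {statistics.mean(population):.0f}")
```

In practice the population mean is unknown; it is printed here only because the population was simulated, which makes it possible to compare the estimate with the true value.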
Population and Sample
A population consists of all elements, individuals, items or objects, whose
characteristics are being studied.
The population being studied is called the target population.
A portion of the population selected for study is called a sample.
Survey and Census
A technique of collecting information from a sample is known as a sample survey.
A survey that includes every member of the population is called a census.
Example:-
If we collect information on all the families in the United States, it is referred to as a census;
if we collect information on only 100 families in the United States, it is called a sample
survey.
Types of Samples
Representative Sample : A sample that represents the characteristics of the population as
closely as possible.
Random Sample : A sample drawn in such a way that each element of the population has
an equal chance of being selected.
Examples : Lottery or lucky draw
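A lottery-style draw can be sketched with Python's `random` module. The population of 500 family ID numbers, the sample size of 20, and the seed are all hypothetical choices for this example.

```python
import random

# Hypothetical population: ID numbers of 500 families (illustrative only).
population = list(range(1, 501))

rng = random.Random(2024)  # seeded generator so the draw is repeatable

# Random sample of 20 elements, drawn without replacement;
# every element has the same chance of being picked.
sample = rng.sample(population, 20)

print(sorted(sample))
```

Removing the seed would give a different sample on every run, which is closer to how a real lucky draw behaves.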
Basic Terminologies
Element : A member of a population about which information is collected.
Examples : a person, firm, state, country etc.
Variable : A characteristic that assumes different values for different
elements.
Examples : No. of houses built per month, profit of companies
Constant : An entity whose value does not vary.
Data Set : A collection of observations on one or more variables.
Examples : List of prices of 25 recently sold homes, test scores of 15 students.
Observation : The value of a variable for an element, also called a measurement.
Example : No. of houses built in November = 250
Types of Variables
Quantitative Variable : A variable that can be measured numerically.
Examples : Incomes, Heights, Gross Sales etc.
Qualitative Variable : A variable that cannot assume a numerical value but can be
classified into two or more non-numeric categories.
Examples : Gender of a person, Brand of cell phone.
Types of Quantitative Variable
Discrete Variable : A variable that can assume only certain values and no
intermediate values. Its values are countable.
Examples : No. of cars sold, No. of students in a class
Continuous Variable : A variable that can assume any numerical value over a certain
interval.
Examples : Time taken to complete a paper, quantity of milk.
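The terminology can be tied together with a small data set in Python. The sketch below is only illustrative: the students, the variable names (`phone_brand`, `siblings`, `height_cm`), and all values are hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical data set: each dict is one element (a student),
# each key is a variable, and each value is an observation.
students = [
    {"name": "A", "phone_brand": "Brand X", "siblings": 2, "height_cm": 161.5},
    {"name": "B", "phone_brand": "Brand Y", "siblings": 0, "height_cm": 172.3},
    {"name": "C", "phone_brand": "Brand X", "siblings": 3, "height_cm": 158.0},
    {"name": "D", "phone_brand": "Brand Z", "siblings": 1, "height_cm": 180.2},
]

# Qualitative variable: values are categories, so we count them.
brand_counts = Counter(s["phone_brand"] for s in students)

# Discrete quantitative variable: countable whole numbers.
total_siblings = sum(s["siblings"] for s in students)

# Continuous quantitative variable: any value in an interval.
mean_height = statistics.mean(s["height_cm"] for s in students)

print(brand_counts)
print(total_siblings, round(mean_height, 1))
```

Counting makes sense for the qualitative variable, while totals and averages only make sense for the quantitative ones.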
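So, to sum this up: a variable is either quantitative or qualitative, and a quantitative
variable is either discrete or continuous.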
A Classification Based on Time
Data can also be classified as cross-sectional data or time series data.
Cross-Sectional Data: Cross-sectional data is collected at a specific point in time from
multiple individuals, entities, or observational units. These individuals or entities could
be people, companies, countries, or any other unit of analysis. The data provides a
snapshot of characteristics, behaviors, or attributes of these units at a single moment.
Cross-sectional data can be used to compare different units and identify relationships
between variables within that specific time frame.
For example, if you collect data on the income and education levels of people from
different cities in a single year, that would be cross-sectional data. You can compare the
average income and education levels across cities to analyze any potential
relationships.
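The city comparison described above might look like this in Python. The records are made-up values for a single year, and the city names are placeholders.

```python
from collections import defaultdict
import statistics

# Hypothetical cross-sectional records, all from the same year:
# (city, annual income, years of education). Values are made up.
records = [
    ("City A", 42_000, 12),
    ("City A", 58_000, 16),
    ("City B", 35_000, 10),
    ("City B", 47_000, 14),
    ("City C", 61_000, 16),
    ("City C", 75_000, 18),
]

# Group the observations by city.
by_city = defaultdict(list)
for city, income, education in records:
    by_city[city].append((income, education))

# Average income and education per city, for comparison across units.
summary = {}
for city, rows in by_city.items():
    avg_income = statistics.mean(income for income, _ in rows)
    avg_education = statistics.mean(edu for _, edu in rows)
    summary[city] = (avg_income, avg_education)
    print(f"{city}: income={avg_income}, education={avg_education}")
```

Every record belongs to the same time period, so any difference between cities reflects variation across units rather than change over time.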
Time Series Data: Time series data is collected over multiple time periods for a
single entity or observational unit. This type of data captures how a specific
variable changes over time. Time series data can reveal patterns, trends, and
fluctuations in the variable being measured. It is often used for forecasting future
values based on historical patterns.
For instance, if you collect data on the monthly sales of a product over the course
of several years, that would be time series data. You can analyze how sales have
changed over time, whether there are any seasonal patterns, and use this
information to predict future sales.
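A small sketch of the sales example follows. For brevity it assumes quarterly rather than monthly figures, and all sales values are hypothetical.

```python
import statistics

# Hypothetical quarterly sales of one product over three years (made-up values).
sales = {
    2021: [100, 120, 90, 150],
    2022: [110, 130, 95, 165],
    2023: [120, 145, 100, 180],
}

# Trend: total sales per year.
yearly_totals = {year: sum(quarters) for year, quarters in sales.items()}

# Seasonal pattern: average sales for each quarter across the years.
quarter_avg = [statistics.mean(values) for values in zip(*sales.values())]

# Quarter (1 to 4) with the highest average sales.
peak_quarter = quarter_avg.index(max(quarter_avg)) + 1

print(yearly_totals)
print([round(q, 1) for q in quarter_avg], "peak quarter:", peak_quarter)
```

Rising yearly totals suggest an upward trend, and a quarter that is consistently high suggests a seasonal pattern; both are inputs a forecasting method would use.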
In summary, the key difference between cross-sectional data and time series data
lies in the focus of analysis: cross-sectional data focuses on comparing different
units at a specific point in time, while time series data focuses on tracking the
changes in a single unit's variable(s) over multiple time periods. Both types of data
have their own uses and applications, and the choice between them depends on
the research or analysis goals.
Sources of Data
Primary Data: Primary data is original data collected directly from the source for a
specific research purpose. Researchers gather primary data to address their research
questions and objectives. Collecting primary data can be time-consuming and may
involve various methods such as surveys, interviews, observations, experiments, and
questionnaires. Since primary data is collected for a specific study, it is tailored to the
research needs and is highly relevant to the research question.
Examples of primary data collection methods:
• Conducting surveys to gather opinions or preferences.
• Interviewing individuals to obtain in-depth insights.
• Running experiments to test hypotheses.
• Observing behaviour in a controlled environment.
Advantages of Primary Data:
• Relevance: Primary data can be customized to address specific
research objectives.
• Control: Researchers have control over data collection methods and
quality.
• Freshness: Data is up-to-date and collected for the specific research
study.
Limitations of Primary Data:
• Time and Resources: Collecting primary data can be time-consuming
and costly.
• Bias: Researchers' biases or errors can affect data collection and
interpretation.
• Sample Size: Depending on the study, obtaining a representative
sample may be challenging.
Secondary Data: Secondary data refers to data that has been collected and compiled
by someone else for a purpose other than the current research study. Researchers
use secondary data when the data collected by others can answer their research
questions or support their analysis. Secondary data can come from various sources,
including government agencies, research studies, organizations, and databases. It can
save time and resources compared to primary data collection, but researchers need
to ensure its quality, relevance, and reliability for their specific research goals.
Examples of secondary data sources:
• Government reports and publications.
• Research studies and academic papers.
• Databases and repositories.
• Historical records and archives.
Advantages of Secondary Data:
• Time and Cost Efficiency: Using existing data saves time and resources.
• Historical Analysis: Secondary data can support historical and long-term analyses.
• Large-Scale Studies: Secondary data sources often contain large datasets.
Limitations of Secondary Data:
• Limited Customization: Secondary data may not fully match research needs.
• Quality Concerns: Data quality, accuracy, and reliability might vary.
• Context: The original purpose of data collection might differ from the research
context.
Researchers often decide between primary and secondary data based on their research
objectives, available resources, time constraints, and the relevance and quality of the
data sources. In some cases, a combination of both types can provide a comprehensive
understanding of the research topic.
Thank you!