
1 Introduction

The document provides an overview of statistics, emphasizing its importance in data analysis, decision-making, and its role in artificial intelligence. It distinguishes between descriptive and inferential statistics, detailing methods for data summarization, estimation, hypothesis testing, and types of data. Additionally, it covers scales of measurement, types of data, and the differences between population and sample in statistical studies.

Uploaded by

osama7abx
Copyright
© All Rights Reserved

Introduction

Statistics is the science of gathering data and making sense of it.

Reasons to study Statistics

To be an informed “information consumer”.
To make better decisions, including personal decisions.

Statistics is the branch of mathematics that involves collecting, analyzing, presenting, and organizing data.
It provides methodologies for making sense of numerical data and is crucial in various fields for decision-making
and predictions.

Importance of Statistics in AI
Statistics plays a vital role in AI for several reasons:

1. Data Analysis and Interpretation: AI models require vast amounts of data for training. Statistics helps in
analyzing this data to understand patterns, trends, and relationships.

2. Probability Theory: Many AI algorithms, such as Bayesian networks and Markov models, are grounded in
probability theory, which is a key component of statistics.

3. Model Evaluation: Statistical methods are used to evaluate the performance of AI models. Techniques such as
hypothesis testing, confidence intervals, and p-values help determine the significance of results.

4. Data Preprocessing: Statistics aids in data preprocessing tasks like handling missing values, outlier detection,
and normalization, which are crucial for building robust AI models.

5. Feature Selection: Statistical techniques help identify the most relevant features in a dataset, which
improves model performance and reduces complexity.

6. Predictive Modeling: Many AI applications involve predictive modeling, where statistical methods are used to
create models that can make accurate predictions based on historical data.

7. Uncertainty Quantification: In AI, it is often necessary to quantify the uncertainty in predictions.
Statistical methods provide tools to measure and manage this uncertainty.

Types of Statistics: Descriptive and Inferential


Descriptive Statistics

Descriptive statistics involve methods for summarizing, organizing, and analyzing data so that it can be
easily understood.

Purpose :

Descriptive statistics provide simple summaries of the sample and its measurements, making large amounts of
data understandable.

These methods include:

Measures of Central Tendency :

Mean: The average of a dataset.


Median: The middle value when the data is ordered.
Mode: The most frequently occurring value in the dataset.
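
As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The exam scores below are made-up values chosen for this sketch, not data from these notes.

```python
import statistics

# Hypothetical exam scores (illustrative values only)
scores = [70, 85, 85, 90, 100]

mean = statistics.mean(scores)      # sum of values divided by their count
median = statistics.median(scores)  # middle value once the data is sorted
mode = statistics.mode(scores)      # value that occurs most often

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

Here the mean (86) is pulled slightly above the median (85) by the single high score of 100, which is one reason the median is often preferred for skewed data.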

Measures of Dispersion :

Range: The difference between the highest and lowest values.


Variance: The average of the squared differences from the mean.
Standard Deviation: The square root of the variance, indicating the spread of the data around the mean.
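
The following sketch computes the three dispersion measures with the standard `statistics` module on a small made-up dataset. The definition of variance given above (the plain average of squared differences) corresponds to the population variance, `pvariance`; when the data is a sample, the usual convention is to divide by n - 1 instead, which is what `variance` does.

```python
import statistics

# Hypothetical dataset (illustrative values only); its mean is 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)       # highest value minus lowest value

# Population versions: divide the sum of squared differences by n
pop_variance = statistics.pvariance(data)
pop_std_dev = statistics.pstdev(data)    # square root of the population variance

# Sample version: divide by n - 1 instead of n
sample_variance = statistics.variance(data)

print("range:", data_range)
print("population variance:", pop_variance)
print("population std dev:", pop_std_dev)
print("sample variance:", sample_variance)
```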

Measures of Position :

Percentiles: Values below which a certain percentage of the data falls.


Quartiles: Values that divide the data into four equal parts.
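
A minimal sketch of quartiles, which are the 25th, 50th, and 75th percentiles, using `statistics.quantiles` from the standard library. The data is hypothetical. Several conventions exist for interpolating quartiles, so other tools may report slightly different cut points for the same data; this example uses the function's default ("exclusive") method.

```python
import statistics

# Hypothetical ages (illustrative values only)
ages = [12, 15, 17, 20, 22, 25, 28, 31, 35]

# n=4 asks for the three cut points that split the data into four equal parts
q1, q2, q3 = statistics.quantiles(ages, n=4)

iqr = q3 - q1  # interquartile range: spread of the middle half of the data

print("Q1:", q1)
print("Q2 (median):", q2)
print("Q3:", q3)
print("IQR:", iqr)
```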

Visualization Tools :

Histograms: Graphs showing the frequency distribution of a dataset.


Box Plots: Visual representations of the minimum, first quartile, median, third quartile, and maximum of a
dataset.
Scatter Plots: Graphs showing the relationship between two variables.

Inferential Statistics

Inferential statistics involve methods for making predictions or inferences about a population based
on a sample of data drawn from that population.

Purpose :
Inferential statistics allow us to make predictions, decisions, or inferences about a population based on
sample data, helping to generalize findings and understand relationships between variables.

These methods include:

Estimation :

Point Estimation: Providing a single value as an estimate of an unknown population parameter (e.g.,
sample mean as an estimate of population mean).
Interval Estimation: Providing a range of values within which the parameter is expected to lie (e.g., confidence
intervals).
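
To make the two kinds of estimate concrete, here is a sketch that computes a point estimate and an approximate 95% confidence interval for a population mean. The heights are invented, and the interval uses the normal approximation (mean ± z × standard error); for a sample this small a t-based interval would usually be somewhat wider, so treat this as an illustration of the idea rather than a recommended procedure.

```python
import math
import statistics

# Hypothetical sample of heights in cm (illustrative values only)
sample = [172, 168, 175, 181, 169, 177, 174, 170, 179, 173]

n = len(sample)
point_estimate = statistics.mean(sample)             # single best guess for the population mean
std_error = statistics.stdev(sample) / math.sqrt(n)  # estimated std dev of the sample mean

# z-value that leaves 2.5% in each tail of the standard normal (about 1.96)
z = statistics.NormalDist().inv_cdf(0.975)

lower = point_estimate - z * std_error
upper = point_estimate + z * std_error

print("point estimate:", point_estimate)
print("approx. 95% interval:", (round(lower, 2), round(upper, 2)))
```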

Hypothesis Testing :

Null Hypothesis (H0): A statement that there is no effect or no difference, which is tested for possible rejection.
Alternative Hypothesis (H1): A statement that indicates the presence of an effect or difference.
p-value: The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.
Significance Level (α): The threshold below which the p-value leads to rejecting the null hypothesis, commonly set at 0.05 (5%).
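
The four terms above can be seen working together in a small exact binomial test. The scenario is hypothetical: a coin is flipped 20 times and lands heads 15 times; H0 says the coin is fair, H1 says it is not. The helper name `two_sided_p_value` is an assumption made for this sketch.

```python
from math import comb

def two_sided_p_value(heads, flips):
    """Exact two-sided p-value for H0: the coin is fair (p = 0.5)."""
    # Under H0 the number of heads is Binomial(flips, 0.5), which is symmetric,
    # so "at least as extreme" means at least as far from flips/2 as observed.
    distance = abs(heads - flips / 2)
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    return extreme_outcomes / 2 ** flips

alpha = 0.05                         # significance level
p_value = two_sided_p_value(15, 20)  # hypothetical observation: 15 heads in 20 flips
reject_null = p_value < alpha

print("p-value:", round(p_value, 4))
print("reject H0:", reject_null)
```

With these numbers the p-value is about 0.041, just under α = 0.05, so H0 would be rejected at the 5% level. It would not be rejected at a stricter level such as 1%.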

Regression Analysis :

Simple Linear Regression: Analyzing the relationship between two variables by fitting a linear equation.
Multiple Regression: Analyzing the relationship between one dependent variable and multiple independent
variables.
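
As a sketch of simple linear regression, the function below fits a line y = intercept + slope × x by ordinary least squares, using the textbook closed-form formulas. The function name `fit_line` and the study-hours data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = covariance of x and y divided by variance of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data: hours studied vs. quiz score (illustrative values only)
hours = [1, 2, 3, 4, 5]
scores = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6  # prediction for 6 hours of study

print("slope:", round(slope, 3))
print("intercept:", round(intercept, 3))
print("prediction for x=6:", round(predicted, 2))
```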

Correlation :

Pearson Correlation Coefficient: Measures the linear relationship between two continuous variables.
Spearman Rank Correlation: Measures the strength and direction of association between
two ranked variables.
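
The difference between the two coefficients shows up on data that is monotonic but not linear. The sketch below implements Pearson's formula directly and obtains Spearman's coefficient by applying the same formula to ranks. The helper names are assumptions, and the simple `ranks` function assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value; assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y grows with x, but along a curve (y = x squared)
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)

print("Pearson:", round(r_pearson, 4))
print("Spearman:", round(r_spearman, 4))
```

Spearman's coefficient is exactly 1 because the ordering of y matches the ordering of x perfectly, while Pearson's is slightly below 1 because the points do not lie on a straight line.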

Data
Definition of Data
Data refers to the raw facts, figures, and information collected for reference, analysis, and processing. It can be
in various forms such as numbers, text, images, and audio.

Types of Data: Qualitative and Quantitative


1. Qualitative (categorical) Data

Qualitative data, also known as categorical data, describes characteristics that cannot be measured
numerically. It is typically non-numeric data, for example text data.

Types of Qualitative Data :

Nominal Data :

Definition: Data that can be categorized but not ranked or ordered.


Examples: Gender (male, female), eye color (blue, green, brown), types of cuisine (Italian, Chinese, Mexican).

Ordinal Data :

Definition: Data that can be categorized and ranked, but the intervals between ranks are not uniform.
Examples: Education levels (high school, bachelor's, master's, doctorate), customer satisfaction ratings
(satisfied, neutral, dissatisfied).

2. Quantitative (numerical) Data

Quantitative data, also known as numerical data, represents quantities and can be measured
and expressed numerically. It allows for mathematical operations and statistical analysis.

Types of Quantitative Data :

Discrete Data :

Definition: Data that can take only specific, distinct values that cannot be subdivided, typically
counts expressed as whole numbers (1, 2, 3, ...).
Examples: Number of students in a class, number of cars in a parking lot, number of books on a shelf.

Continuous Data :

Definition: Data that can take any value within a range and can, in principle, be measured to
arbitrary precision.
Examples: Distance, temperature, weight, time, velocity.

Continuous vs. Discrete Data


1. Discrete Data

Discrete data consists of unique, separate values.


These values are countable and can take only specific values, typically whole numbers such as {1, 2, 3, 4, ...}.
Discrete data often represent items that can be counted, and there are no intermediate values
between the numbers in the dataset: values such as {3.7, 5.5, 1.2, ...} do not occur.

Examples :

Number of students in a class: You can have 20 or 21 students, but not 20.5 students.
Number of cars in a parking lot: The count is a whole number (5, 10, 15).

Discrete Data Graph :

![[discrete_graph_example.jpg]]

2. Continuous Data

Continuous data consists of data that can take any value within a given range.
Unlike discrete data, continuous data represents measured quantities and can take an infinite number
of values within a given interval.
Continuous data can be divided into finer and finer levels, so in principle it has an infinite number of
possible values; in practice, precision is limited by the measuring instrument.

Examples :

Temperature: The temperature can vary continuously, such as 23.4°C, 23.45°C, etc.
Time: Time can be measured with great precision, such as 12.3 seconds, 12.35 seconds, and so on.

Continuous Data Graph

![[continuous_graph.jpg]]

Statistics vs. Parameters and Population vs. Sample


Population vs. Sample

Population :

Definition: A population is the entire set of individuals, items, or things about which a particular study collects data.
It includes every member of the defined group being studied.
Examples: All students in a university, all cats in a city, all products sold by a company.

Sample :
Definition: A sample is a subset of the population selected for measurement, observation, or questioning to
provide statistical information about the population.
Examples: 200 students selected from a university, 500 cats selected from a city.

Most common sampling methods :

Simple random sampling
Stratified sampling
Systematic sampling
Cluster sampling
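
The four approaches listed above can be sketched with Python's standard `random` module. The population of 200 students, with its `year` and `dorm` fields, is entirely hypothetical; the point is only to show how each method chooses its members.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 200 students, each with a year (1-4) and a dorm (0-9)
population = [{"id": i, "year": 1 + i % 4, "dorm": i % 10} for i in range(200)]

# Random: every member has the same chance of being picked
simple_random = rng.sample(population, 20)

# Systematic: pick a random start, then take every k-th member
k = len(population) // 20
start = rng.randrange(k)
systematic = population[start::k]

# Stratified: split into groups (strata) and sample from each group
stratified = []
for year in (1, 2, 3, 4):
    stratum = [p for p in population if p["year"] == year]
    stratified.extend(rng.sample(stratum, 5))

# Cluster: pick whole groups at random and keep everyone in them
chosen_dorms = rng.sample(range(10), 2)
cluster = [p for p in population if p["dorm"] in chosen_dorms]

print(len(simple_random), len(systematic), len(stratified), len(cluster))
```

Stratified sampling guarantees that every year is represented, whereas cluster sampling trades some representativeness for convenience by surveying entire dorms.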

Statistics vs. Parameters

Parameter :

Definition: A parameter is a numerical value that describes a characteristic of a population.


Examples: Population mean (μ), population standard deviation (σ), population proportion (P).

Statistic :

Definition: A statistic is a numerical value that describes a characteristic of a sample.


Examples: Sample mean (x̄), sample standard deviation (s), sample proportion (p̂ ).

| Measure | Sample Statistic | Population Parameter |
| --- | --- | --- |
| Mean | x̄ (Sample Mean) | μ (Population Mean) |
| Standard Deviation | s (Sample Std Dev) | σ (Population Std Dev) |
| Variance | s² (Sample Variance) | σ² (Population Variance) |
| Proportion | p̂ (Sample Proportion) | P (Population Proportion) |
| Size | n (Sample Size) | N (Population Size) |
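
The distinction can be illustrated with a toy population where the parameters are known exactly, which is rarely the case in practice. The population below (the integers 1 to 1000) and the sample size of 50 are assumptions made purely for this sketch.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: every value is known, so parameters can be computed
population = list(range(1, 1001))
N = len(population)                    # population size
mu = statistics.mean(population)       # population mean (parameter)
sigma = statistics.pstdev(population)  # population std dev (parameter)

# A sample drawn from that population: statistics estimate the parameters
sample = rng.sample(population, 50)
n = len(sample)                        # sample size
x_bar = statistics.mean(sample)        # sample mean (statistic)
s = statistics.stdev(sample)           # sample std dev (statistic, divides by n - 1)

print("N =", N, " mu =", mu, " sigma =", round(sigma, 2))
print("n =", n, " x_bar =", round(x_bar, 2), " s =", round(s, 2))
```

A parameter such as μ is a fixed number, while a statistic such as x̄ changes from sample to sample; rerunning with a different seed gives a different x̄ but the same μ.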

Scales of Measurement - Nominal, Ordinal, Interval, & Ratio


Scale Data
1. Nominal Scale

Definition :

The nominal scale is the most basic level of measurement.


It categorizes data without any order or quantitative value.

Characteristics :
Categories Only: Data is grouped into distinct categories or groups.
No Order: Categories do not have a meaningful order or ranking.
Qualitative Data: It represents qualitative attributes.

Examples :

Gender (Male, Female)


Colors (Red, Blue, Green)
Types of animals (Dog, Cat, Bird)

Mathematical Operations :

Mode (most frequent category) is meaningful.


No meaningful arithmetic operations (e.g., addition or subtraction).

2. Ordinal Scale

Definition :

The ordinal scale categorizes data into ordered categories, but the differences between the categories
cannot be measured.

Characteristics :

Order Matters: Data is ranked or ordered, but the distance between ranks is not defined.
Qualitative Data: Represents qualitative attributes with a meaningful order.

Examples :

Satisfaction Ratings (Very Unsatisfied, Unsatisfied, Neutral, Satisfied, Very Satisfied)


Education Level (High School, Bachelor’s, Master’s, PhD)

Mathematical Operations :

Median and mode are meaningful.


No meaningful arithmetic operations (e.g., addition or subtraction) due to non-uniform intervals.
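
A short sketch of why median and mode work on ordinal data while arithmetic does not: the categories only need an order, not numeric distances. The survey responses are invented, and mapping each label to its position in the list is just one convenient way to encode the order.

```python
import statistics

# Ordered categories, from lowest to highest
levels = ["Very Unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very Satisfied"]

# Hypothetical survey responses (illustrative values only)
responses = ["Satisfied", "Neutral", "Very Satisfied", "Satisfied", "Unsatisfied"]

# Positions encode order only; the gaps between them carry no meaning
position = {label: i for i, label in enumerate(levels)}

# median_low always returns a value that is actually present in the data,
# so the result maps back to a real category
median_position = statistics.median_low([position[r] for r in responses])
median_response = levels[median_position]

mode_response = statistics.mode(responses)  # most frequent category

print("median:", median_response)
print("mode:", mode_response)
```

Averaging the positions would produce a number such as 2.6, which corresponds to no category and assumes equal spacing between ranks that the scale does not provide.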

3. Interval Scale

Definition :
The interval scale measures data with both order and exact differences between values, but it does not have
a true zero point: zero is an arbitrary reference, so values can be negative as well as positive.

Characteristics :

Ordered Data: Data has a meaningful order.


Meaningful Differences: The differences between values are consistent and meaningful.
No True Zero: The zero point does not represent the absence of the attribute.

Examples :

Temperature in Celsius or Fahrenheit (e.g., 10°C, 20°C, 30°C)


IQ Scores

Mathematical Operations :

Mean, median, and mode are meaningful.


Addition and subtraction are meaningful, but multiplication and division are not (e.g., 20°C is not "twice as hot" as
10°C).
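
The "twice as hot" point can be checked numerically by converting the same two temperatures to Kelvin, a temperature scale that does have a true zero. The helper `celsius_to_kelvin` is written for this sketch; the difference between the two readings survives the conversion, but the ratio does not.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin (0 K is absolute zero)."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale: same gap in both units
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are not: the "2x" in Celsius depends on where zero was placed
ratio_c = warm_c / cold_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print("difference in C:", diff_c, " difference in K:", round(diff_k, 2))
print("ratio in C:", ratio_c, " ratio in K:", round(ratio_k, 4))
```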

4. Ratio Scale

Definition :

The ratio scale is the highest level of measurement.

It has all the properties of the interval scale, plus a true zero point that represents the complete
absence of the attribute.

Characteristics :

Order & Equal Intervals: Data has a meaningful order with consistent intervals.
True Zero: Zero represents the absence of the attribute, making comparisons of absolute magnitudes possible.

Examples :

Height (e.g., 160 cm, 180 cm)


Weight (e.g., 50 kg, 70 kg)

Mathematical Operations :

Mean, median, and mode are meaningful.


Addition, subtraction, multiplication, and division are all meaningful.
