
DSC551: Programming for Data Science (R Programming)
6. Basic Inferential Statistics

Lecturer, Department of Statistics

2024-10-01

Asmui Rahim, DSC551:R , Oct 2024

Introduction
Aims:

1. Discuss the assumptions of normality.

2. Introduce basic hypothesis testing (one sample t-test).


Assumptions of Normality
Plot the distribution of the data using:
1. Histogram
2. Boxplot
3. Density plot
4. QQ (Quantile-Quantile) plot
Normality test
1. Kolmogorov-Smirnov test.
2. Shapiro-Wilk test
Data preparation:

mydata1 <- read.csv("telco.csv",
                    stringsAsFactors = TRUE)
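
The histogram and boxplot listed above have no code on the slides, so here is a minimal sketch of how they might be drawn. Since telco.csv is not reproduced here, usage_gb is simulated stand-in data (an assumption); n = 45 matches the sample size used later, while the mean and standard deviation are arbitrary choices.

```r
# Simulated stand-in for mydata1$Usage_GB (assumption: the real values come from telco.csv)
set.seed(551)
usage_gb <- rnorm(45, mean = 18, sd = 6.5)

# Histogram: look for a single, roughly symmetric peak
h <- hist(usage_gb, main = "Histogram of usage", xlab = "Usage (GB)")

# Boxplot: look for a median near the centre of the box and few outliers
b <- boxplot(usage_gb, horizontal = TRUE,
             main = "Boxplot of usage", xlab = "Usage (GB)")
```

With the real data, replace usage_gb with mydata1$Usage_GB.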


Density plot
Visualize the distribution of a continuous variable.
See the “shape” of the data, highlighting where most of the values cluster and how spread
out they are.
Identify patterns:
Skewness: Is the distribution symmetric, or skewed to the left or right?
Multimodality: Are there multiple peaks in the distribution?

plot(density(mydata1$Usage_GB),
     main="Density estimate of data")
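
To make the skewness check less subjective, one option is to overlay a normal curve on the density estimate and compute a simple skewness measure. This is a sketch on simulated stand-in data (an assumption, since telco.csv is not available here); the moment-based skewness formula is one common choice, not something the slides prescribe.

```r
# Simulated stand-in for mydata1$Usage_GB (assumption)
set.seed(551)
usage_gb <- rnorm(45, mean = 18, sd = 6.5)

# Kernel density estimate with a fitted normal curve overlaid for comparison
d <- density(usage_gb)
plot(d, main = "Density estimate of data")
curve(dnorm(x, mean = mean(usage_gb), sd = sd(usage_gb)),
      add = TRUE, lty = 2, col = "blue")

# Moment-based sample skewness: near 0 for symmetric data,
# positive for right skew, negative for left skew
z <- (usage_gb - mean(usage_gb)) / sd(usage_gb)
skew <- mean(z^3)
skew
```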


QQ-plot
Assess whether a dataset follows a normal distribution.
If the points fall approximately along a straight diagonal line, this suggests that the data
follow a normal distribution.

qqnorm(mydata1$Usage_GB)
qqline(mydata1$Usage_GB,
       col="red", lwd=3)
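
It can help to see what qqnorm() actually plots: the sorted data against standard normal quantiles at the probability points given by ppoints(). The sketch below rebuilds those coordinates by hand on simulated stand-in data (an assumption) and checks them against qqnorm().

```r
# Simulated stand-in for mydata1$Usage_GB (assumption)
set.seed(551)
usage_gb <- rnorm(45, mean = 18, sd = 6.5)

n <- length(usage_gb)
theoretical <- qnorm(ppoints(n))  # standard normal quantiles
observed <- sort(usage_gb)        # sample quantiles

# Manual QQ plot
plot(theoretical, observed,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")

# Compare with the coordinates computed by qqnorm()
qq <- qqnorm(usage_gb, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
all.equal(sort(qq$y), observed)
```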


Normality test (Kolmogorov-Smirnov test)
ks.test(mydata1$Usage_GB, "pnorm",
        mean=mean(mydata1$Usage_GB),
        sd=sd(mydata1$Usage_GB))

Asymptotic one-sample Kolmogorov-Smirnov test

data: mydata1$Usage_GB
D = 0.076064, p-value = 0.957
alternative hypothesis: two-sided

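"pnorm" specifies the theoretical distribution (normal in this case).

The mean and standard deviation of that distribution must be supplied; here they are estimated from the sample. Because the parameters are estimated from the same data being tested, the reported p-value tends to be too large, so treat it as approximate.
Conclusion:
p-value = 0.957 > 0.05, so we fail to reject the null hypothesis of normality. The KS test gives no evidence that Usage_GB departs from a normal distribution.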
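
The statistic D is the largest vertical gap between the empirical CDF of the data and the hypothesised normal CDF. As a rough illustration, the sketch below computes D by hand on simulated stand-in data (an assumption) and compares it with the value from ks.test().

```r
# Simulated stand-in for mydata1$Usage_GB (assumption)
set.seed(551)
usage_gb <- rnorm(45, mean = 18, sd = 6.5)

m <- mean(usage_gb)
s <- sd(usage_gb)
x <- sort(usage_gb)
n <- length(x)

# Hypothesised normal CDF evaluated at each sorted observation
F0 <- pnorm(x, mean = m, sd = s)

# Largest gap between the empirical CDF and F0,
# checked just after and just before each jump of the empirical CDF
D_manual <- max(pmax((1:n) / n - F0, F0 - (0:(n - 1)) / n))

D_builtin <- unname(ks.test(usage_gb, "pnorm", mean = m, sd = s)$statistic)
all.equal(D_manual, D_builtin)
```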


Normality test (Shapiro-Wilk test)
shapiro.test(mydata1$Usage_GB)

Shapiro-Wilk normality test

data: mydata1$Usage_GB
W = 0.9878, p-value = 0.9119

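Conclusion:
p-value = 0.9119 > 0.05, so we fail to reject the null hypothesis of normality. The SW test gives no evidence that Usage_GB departs from a normal distribution.

How to make the normality conclusion?

Neither normality test detects a departure from the normal distribution in the internet usage data, which agrees with what the density plot and QQ plot suggest. Taken together, the plots and the KS and SW tests support treating Usage_GB as approximately normally distributed.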
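
One way to keep both test results side by side when drawing the overall conclusion is a small helper that collects them in a table. The function name normality_summary is hypothetical (it is not part of R or of the slides), and the data are simulated stand-ins (an assumption).

```r
# Hypothetical helper: run both normality tests and tabulate the results
normality_summary <- function(x, alpha = 0.05) {
  ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
  sw <- shapiro.test(x)
  p <- c(ks$p.value, sw$p.value)
  data.frame(
    test = c("Kolmogorov-Smirnov", "Shapiro-Wilk"),
    statistic = unname(c(ks$statistic, sw$statistic)),
    p_value = p,
    reject_normality = p < alpha
  )
}

# Simulated stand-in for mydata1$Usage_GB (assumption)
set.seed(551)
usage_gb <- rnorm(45, mean = 18, sd = 6.5)

res <- normality_summary(usage_gb)
res
```

The table complements the plots; it does not replace looking at them.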


Hypothesis testing

Steps in hypothesis testing


Hypothesis test with one sample
Question: It is claimed that the students' mean internet quota usage differs from 15 GB.
To investigate the claim, 45 students were selected at random. Test at the 5% level of
significance.
Step 1: State the hypothesis and identify the claim

H0 : μ = 15

H1 : μ ≠ 15

Step 2: State the level of significance

α = 0.05 (5% level of significance, confidence level at 95%)


Step 3: Find the p-value
t.test(mydata1$Usage_GB, mu=15)

One Sample t-test

data: mydata1$Usage_GB
t = 3.2358, df = 44, p-value = 0.002306
alternative hypothesis: true mean is not equal to 15
95 percent confidence interval:
16.18011 20.07766
sample estimates:
mean of x
18.12889

Step 4: Make the decision

Reject H0 if p-value ≤ α. Since p-value = 0.0023 < α = 0.05, reject H0.
Step 5: Summarize the result
At the 5% significance level, there is sufficient evidence that the students' mean internet usage is different from 15 GB.
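
The t statistic and p-value reported by t.test() can be reproduced by hand from t = (x̄ − μ0) / (s / √n) with n − 1 degrees of freedom. The sketch below does this on simulated stand-in data (an assumption, since telco.csv is not available here) and checks the result against t.test().

```r
# Simulated stand-in for mydata1$Usage_GB (assumption)
set.seed(551)
usage_gb <- rnorm(45, mean = 18, sd = 6.5)

mu0 <- 15                          # hypothesised mean under H0
n <- length(usage_gb)
se <- sd(usage_gb) / sqrt(n)       # standard error of the mean
t_stat <- (mean(usage_gb) - mu0) / se
df <- n - 1

# Two-sided p-value: probability in both tails beyond |t|
p_value <- 2 * pt(-abs(t_stat), df)

fit <- t.test(usage_gb, mu = mu0)
all.equal(t_stat, unname(fit$statistic))
all.equal(p_value, fit$p.value)

# Same formula applied to the t and df printed on the slide;
# this should agree with the reported p-value up to rounding
p_from_slide <- 2 * pt(-3.2358, df = 44)
p_from_slide
```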


Confidence interval
t.test(mydata1$Usage_GB, mu=15)

One Sample t-test

We are 95% confident that the students' mean internet usage is between 16.1801 GB and 20.0777 GB.
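
The interval printed by t.test() is x̄ ± t(1 − α/2, n − 1) × s / √n. A minimal sketch on simulated stand-in data (an assumption) that rebuilds it by hand:

```r
# Simulated stand-in for mydata1$Usage_GB (assumption)
set.seed(551)
usage_gb <- rnorm(45, mean = 18, sd = 6.5)

alpha <- 0.05
n <- length(usage_gb)
se <- sd(usage_gb) / sqrt(n)

# Critical value from the t distribution with n - 1 degrees of freedom
t_crit <- qt(1 - alpha / 2, df = n - 1)
ci_manual <- mean(usage_gb) + c(-1, 1) * t_crit * se

fit <- t.test(usage_gb, mu = 15)
all.equal(ci_manual, as.numeric(fit$conf.int))
```

The interval and the two-sided test agree: H0 is rejected at level α exactly when the hypothesised mean lies outside the (1 − α) confidence interval. On the slide, 15 lies outside the reported interval, consistent with rejecting H0.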

Note

To test a directional alternative hypothesis (H1: less than or greater than), specify the
argument alternative="less" or alternative="greater". The value can be abbreviated to
its initial letter, for example alternative="l".
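
A short sketch of the directional options, again on simulated stand-in data (an assumption). It also shows how the one-sided and two-sided p-values relate to each other.

```r
# Simulated stand-in for mydata1$Usage_GB (assumption)
set.seed(551)
usage_gb <- rnorm(45, mean = 18, sd = 6.5)

fit_two     <- t.test(usage_gb, mu = 15)                         # H1: mean != 15
fit_less    <- t.test(usage_gb, mu = 15, alternative = "less")   # H1: mean < 15
fit_greater <- t.test(usage_gb, mu = 15, alternative = "g")      # "g" is matched to "greater"

fit_greater$alternative

# The two one-sided p-values sum to 1, and the two-sided p-value
# is twice the smaller of them
p_less <- fit_less$p.value
p_greater <- fit_greater$p.value
all.equal(p_less + p_greater, 1)
all.equal(fit_two$p.value, 2 * min(p_less, p_greater))
```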


Exercise
Question 1: It is claimed that the students' mean internet quota usage is less than 20 GB.
To investigate the claim, 45 students were selected at random. Test at the 10% level of
significance.
Step 1: Hypothesis

H0 : The usage of internet quota by the students is equal to 20 GB

H1 : The usage of internet quota by the students is less than 20 GB
Step 2: Level of significance
α = 0.10
Step 3: Find the p-value
t.test(mydata1$Usage_GB, mu=20, alternative = "less", conf.level = 0.9)

One Sample t-test

data: mydata1$Usage_GB
t = -1.9351, df = 44, p-value = 0.02971
alternative hypothesis: true mean is less than 20
90 percent confidence interval:
-Inf 19.38699
sample estimates:
mean of x
18.12889
Step 4: Decision rule
Reject H0 if p-value < α. Since p-value = 0.02971 < α = 0.10, we reject H0.
Step 5: Conclusion
At the 10% level of significance, there is sufficient evidence that the students' mean internet quota usage is less than 20 GB.

Question 2: It is claimed that average daily mobile phone usage is more than 5 hours. Using
the dataset from Question 1, first assess the normality of the daily hours of mobile phone
usage. Then conduct a hypothesis test at the 5% significance level to determine whether there
is sufficient evidence to support the claim.

Note

The argument conf.level= sets the confidence level of the reported interval, which is 1 − α:
use 0.9 for α = 0.10 and 0.99 for α = 0.01. The default is 0.95, matching a 5% level of
significance. It does not change the p-value; the significance level enters only when you
compare the p-value with α.
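
A quick check of what conf.level affects, sketched on simulated stand-in data (an assumption): the p-value is the same whatever confidence level is requested, and only the reported interval changes.

```r
# Simulated stand-in for mydata1$Usage_GB (assumption)
set.seed(551)
usage_gb <- rnorm(45, mean = 18, sd = 6.5)

fit90 <- t.test(usage_gb, mu = 20, alternative = "less", conf.level = 0.90)
fit99 <- t.test(usage_gb, mu = 20, alternative = "less", conf.level = 0.99)

# Same test statistic and p-value in both calls
identical(fit90$p.value, fit99$p.value)  # TRUE

# Only the one-sided upper confidence bound moves
upper90 <- fit90$conf.int[2]
upper99 <- fit99$conf.int[2]
c(upper90, upper99)

# The decision is made by comparing the p-value with alpha directly
alpha <- 0.10
reject_h0 <- fit90$p.value < alpha
reject_h0
```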


End of Slides
