Term Project - Skittles Collection

This report summarizes a class project analyzing the distribution of colors in bags of Skittles candy. Students each collected data from one bag, recording the number of candies in each color. The combined class data included a total of 1108 candies across 19 bags. Graphs of the class data showed the distribution of each color was close to the expected 20% proportion. Confidence intervals were constructed for the population proportion of yellow candies and the population mean number of candies per bag.


Qili Huang

Math 1040

Term Project - Skittles Collection

Report introduction:

This project is part of a basic statistics course. Each student counted how many candies of each color were in one pack of Skittles. We then combined the data from the whole class and analyzed it, so that we could gain a better understanding of statistics. In this project, I used some charts to represent the data, which makes the whole project clearer.

This project consists of five parts, which are:

Part 1: Skittles Data Collection

Part 2: Organizing and Displaying Categorical Data: Colors

Organizing and Displaying Quantitative Data: The Number of Candies per Bag

Part 3: Summary Stats

Part 4: Confidence Interval Estimates and Hypothesis Tests

Part 5: Compile Term Project, Reflection and ePortfolio Posting


Part 1: Skittles Data Collection

Nineteen students participated in this part. Each student counted the number of Skittles of each color in their own bag.

No.  Student          Red  Orange  Yellow  Green  Purple  Total
1    Sadi M. B         14      12      11     13      10     60
2    Luis C            12      17      10     11      13     63
3    Sonia H. C         7      10      14     14      11     56
4    Isaac L. C        13      15      13     13      10     64
5    Riley T. C        13      13       7     11      15     59
6    Lynnell C.C        4      14      10     13      17     58
7    Stacey L. H        9      15       8      9      16     57
8    Qili H            14      11      11     12      10     58
9    Amme K. L         11       9      13     14       7     54
10   Juan C. L         13      15       9     11       9     57
11   Amanda M. M        8      16      15      9      11     59
12   Danna T. N        10      12      15      7      15     59
13   Tayler B.P        14       8       9     18      10     59
14   Melany R          17      10      13     12       8     60
15   Sierra J. R        6       6      17     11      16     56
16   Jacqueline S      17       8      17     12       4     58
17   Mckenzie R. T      9      11      13     12       9     54
18   Valerie S.V       12       8      12     10      14     56
19   Zane K. W         11      11       7     13      19     61
     Total            214     221     224    225     224   1108

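The totals row can be cross-checked with a short script. This is only a minimal sketch in Python, which was not part of the original project (the class used StatCrunch and a graphing calculator); the names `bags`, `color_totals`, `bag_totals`, and `grand_total` are my own.

```python
# Counts transcribed from the class table, one tuple per bag,
# in the order (red, orange, yellow, green, purple).
bags = [
    (14, 12, 11, 13, 10), (12, 17, 10, 11, 13), (7, 10, 14, 14, 11),
    (13, 15, 13, 13, 10), (13, 13, 7, 11, 15), (4, 14, 10, 13, 17),
    (9, 15, 8, 9, 16), (14, 11, 11, 12, 10), (11, 9, 13, 14, 7),
    (13, 15, 9, 11, 9), (8, 16, 15, 9, 11), (10, 12, 15, 7, 15),
    (14, 8, 9, 18, 10), (17, 10, 13, 12, 8), (6, 6, 17, 11, 16),
    (17, 8, 17, 12, 4), (9, 11, 13, 12, 9), (12, 8, 12, 10, 14),
    (11, 11, 7, 13, 19),
]
colors = ("red", "orange", "yellow", "green", "purple")

# Column sums: total candies of each color across all 19 bags.
color_totals = {c: sum(bag[i] for bag in bags) for i, c in enumerate(colors)}

# Row sums: total candies in each bag.
bag_totals = [sum(bag) for bag in bags]

# Grand total of candies in the class sample.
grand_total = sum(bag_totals)

print(color_totals)
print(bag_totals)
print(grand_total)
```

The row and column sums should agree with the Total column and the Total row of the table.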
Part 2: Organizing and Displaying Categorical Data: Colors

1. What proportion (or percentage) of the Skittles do you expect to see of each color? Why?

Before collecting the data, I expected each bag of Skittles to contain the same proportion of each color. With five colors, that is 20% for each color.

In my bag, I have 14 red, 11 orange, 11 yellow, 12 green, and 10 purple candies. The total is 58.

Red: 14/58=24%

Orange: 11/58=19%

Yellow: 11/58=19%

Green: 12/58=21%

Purple: 10/58=17%

                     Red  Orange  Yellow  Green  Purple
Expected proportion  20%  20%     20%     20%    20%
Observed proportion  24%  19%     19%     21%    17%

2. In StatCrunch, create a pie chart and a Pareto chart for the total number of candies of each

color in our class data set.


3. Does the class data represent a random sample? What would the population be?

I think the class data is a random sample. The data were collected by 19 students who bought bags of Skittles of the same size at different times and places. Also, because every bag is the same size, each bag contains roughly the same number of candies. So, I think these samples are all valid.

4. Create a table that displays the proportions by color and the total count from your own bag

of candies together with the proportions by color and total count for the entire class sample.

Color   My bag           Class totals
Red     14/58 = 24.14%   214/1108 = 19.31%
Orange  11/58 = 18.97%   221/1108 = 19.95%
Yellow  11/58 = 18.97%   224/1108 = 20.22%
Green   12/58 = 20.69%   225/1108 = 20.31%
Purple  10/58 = 17.24%   224/1108 = 20.22%
Total   58               1108

              Proportion  Proportion  Proportion  Proportion  Proportion  Total
              Red         Orange      Yellow      Green       Purple      Count
My Bag        24.14%      18.97%      18.97%      20.69%      17.24%      58
Class Totals  19.31%      19.95%      20.22%      20.31%      20.22%      1108

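The proportions for my bag and for the class can be recomputed from the raw counts. The following is a minimal Python sketch, assuming the counts given in this report; the helper name `proportions` is my own and is not something the project required.

```python
# Counts for my own bag and for the whole class, taken from the report.
my_bag = {"red": 14, "orange": 11, "yellow": 11, "green": 12, "purple": 10}
class_totals = {"red": 214, "orange": 221, "yellow": 224, "green": 225, "purple": 224}


def proportions(counts):
    """Return each color's share of the total count as a fraction."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


my_props = proportions(my_bag)
class_props = proportions(class_totals)

# Print one row per color: my bag next to the class sample.
for color in my_bag:
    print(f"{color:<7} {my_props[color]:7.2%} {class_props[color]:7.2%}")
```

Each column of proportions should add up to 100%, apart from rounding.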
5. Write a well-thought-out paragraph discussing your observations of this data. Respond to the
following prompts:

• Do the graphs reflect what you expected to see? Are there any surprises?

Yes. My expected proportion is 20% of the total for each color, and the class data shows that
the proportion of each color is close to 20%, so there is no big gap. I am pleasantly
surprised that the observed proportions are so close to my expected proportion.

• Are there any observations that appear to be outliers? If so, what impact might they have
on graphics and summary statistics?

From what I observed, there are no outliers.

• Does the distribution of colors in the total class data match with your own data from your
single bag of candies or are they different?

They do not match exactly. The proportions of some colors in my bag differ from those in the
class sample, but the class totals are closer to the expected 20% for each color.
COLLECTION DATA
Sample of Skittles: 19 bags

Total candies: 1108

Color   Count  Average per bag
Red     214    214/19 = 11.26
Orange  221    221/19 = 11.63
Yellow  224    224/19 = 11.79
Green   225    225/19 = 11.84
Purple  224    224/19 = 11.79

Five-number summary of the number of candies per bag:

Min: 54   Q1: 56   Median: 58   Q3: 60   Max: 64

Mean: 58.3   Standard Deviation: 2.6
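The summary statistics for the number of candies per bag can be reproduced from the 19 bag totals. This is a minimal sketch using Python's standard `statistics` module, not the StatCrunch or calculator steps used in class. It assumes the default "exclusive" quartile method of `statistics.quantiles`, which other tools may not share, so quartiles can differ slightly between programs.

```python
import statistics

# Total number of candies in each of the 19 bags, from the class table.
bag_totals = [60, 63, 56, 64, 59, 58, 57, 58, 54, 57,
              59, 59, 59, 60, 56, 58, 54, 56, 61]

minimum = min(bag_totals)
maximum = max(bag_totals)

# Cut points that split the data into four equal groups: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(bag_totals, n=4)

mean = statistics.mean(bag_totals)
stdev = statistics.stdev(bag_totals)  # sample standard deviation (n - 1)

print(f"Min {minimum}, Q1 {q1}, Median {median}, Q3 {q3}, Max {maximum}")
print(f"Mean {mean:.1f}, Standard deviation {stdev:.2f}")
```

The sample standard deviation comes out near 2.65, which is 2.6 when cut to one decimal place.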
Part 3: Summary Stats

Explain the difference between categorical and quantitative data. What types of graphs make

sense and what types of graphs do not make sense for categorical data? For quantitative data?

Explain why. What types of calculations make sense and what types of calculations do not

make sense for categorical data? For quantitative data? Explain why.

Categorical data sort observations into groups or ordered categories. Examples are the classes in a school, people's races, or the different airlines that operate a flight. In this project, we were asked to sort the candies by color, which is a typical way of collecting categorical data. In this section, we used a bar chart and a pie chart to show the color categories clearly. Bar charts and pie charts are good ways to present categorical data because they show each group or category clearly in the graph. However, I think a line graph or a dot plot does not suit categorical data because neither gives a clear presentation of each group.

In contrast, quantitative data consist of measurements and counts. Examples are height, weight, or student grades, for which we can calculate an average. In this project, we were asked to find the average number of candies per color and to generate a boxplot. This shows that a boxplot is a good way of displaying quantitative data. A bar chart or a pie chart does not work well for quantitative data because there are no specific groups inside the data.
Part 4: Confidence Interval Estimates and Hypothesis Tests

Explain in general the purpose and meaning of a confidence interval.

1. Construct a 99% confidence interval estimate for the population proportion of yellow candies.

x = 224 (number of yellow candies), n = 1108 (total number of candies), confidence level = 99%

STAT > TESTS > A:1-PropZInt gives (0.17109, 0.23324)

I am 99% confident that the interval from 0.17109 to 0.23324 contains the true value of the
population proportion of yellow candies.
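The calculator result for the yellow proportion can be checked by hand with the usual one-proportion z-interval, p̂ ± z·sqrt(p̂(1 − p̂)/n). The following is a minimal Python sketch of that formula using only the standard library; the variable names are my own, and it is meant as a cross-check, not as a replacement for the calculator steps.

```python
from math import sqrt
from statistics import NormalDist

x = 224            # number of yellow candies in the class sample
n = 1108           # total number of candies in the class sample
confidence = 0.99  # confidence level

p_hat = x / n  # sample proportion of yellow candies

# Critical z value that leaves (1 - confidence) / 2 in each tail.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error for a one-proportion z-interval.
margin = z * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(f"({lower:.5f}, {upper:.5f})")
```

The interval should agree with the calculator output to about five decimal places.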

2. Construct a 95% confidence interval estimate for the population mean number of candies per
bag.

x̄ = 58.3 (mean number of candies per bag), Sx = 2.65 (sample standard deviation of candies per bag), n = 19 (number of bags), confidence level = 95%

STAT > TESTS > 8:TInterval (Data) gives (57.04, 59.592)

I am 95% confident that the interval from 57.04 to 59.592 contains the true value of the
population mean number of candies per bag.
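The interval for the mean follows the t-interval formula x̄ ± t·s/sqrt(n) with n − 1 = 18 degrees of freedom. Below is a minimal Python sketch of that calculation. Python's standard library has no t distribution, so the critical value 2.1009 is typed in from a t-table (95% confidence, 18 degrees of freedom); that hardcoded number is an assumption of this sketch.

```python
from math import sqrt
import statistics

# Total number of candies in each of the 19 bags, from the class table.
bag_totals = [60, 63, 56, 64, 59, 58, 57, 58, 54, 57,
              59, 59, 59, 60, 56, 58, 54, 56, 61]

n = len(bag_totals)
mean = statistics.mean(bag_totals)
stdev = statistics.stdev(bag_totals)  # sample standard deviation (n - 1)

# Assumption: two-sided 95% critical value for 18 degrees of freedom,
# copied from a standard t-table rather than computed here.
t_critical = 2.1009

margin = t_critical * stdev / sqrt(n)
lower, upper = mean - margin, mean + margin
print(f"({lower:.3f}, {upper:.3f})")
```

Because the sketch starts from the raw bag totals and not the rounded summary values, it mirrors the calculator's Data option.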

Hypothesis Tests
Explain in general the purpose and meaning of a hypothesis test.
1. Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.
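x = 214 (number of red candies), n = 1108 (total number of candies), p̂ = 214/1108 = 0.1931
STAT > TESTS > 5:1-PropZTest
p0: 0.2
x: 214
n: 1108
prop ≠ p0
Result: p-value = 0.57
Since 0.57 > 0.05, we fail to reject the null hypothesis.
There is not sufficient evidence to reject the claim that 20% of all Skittles candies are red.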
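The p-value for the red-candy test can be checked with the one-proportion z-test statistic, z = (p̂ − p0)/sqrt(p0(1 − p0)/n). This is a minimal Python sketch of a two-sided test using only the standard library; the variable names are my own, and it is a cross-check of the calculator output, not the method used in class.

```python
from math import sqrt
from statistics import NormalDist

x = 214       # number of red candies in the class sample
n = 1108      # total number of candies in the class sample
p0 = 0.20     # claimed population proportion of red candies
alpha = 0.05  # significance level

p_hat = x / n

# Test statistic: distance of p_hat from p0 in standard errors,
# where the standard error is computed under the null hypothesis.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-sided p-value: probability of a statistic at least this extreme.
p_value = 2 * NormalDist().cdf(-abs(z))

reject_null = p_value < alpha
print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

A p-value above the significance level means the sample does not contradict the 20% claim.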
2. Use a 0.01 significance level to test the claim that the mean number of candies in a bag
of Skittles is 55.
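x̄ = 58.3 (mean number of candies per bag), s = 2.6, n = 19
STAT > TESTS > 2:T-Test
µ0: 55
x̄: 58.3
Sx: 2.6
n: 19
µ ≠ µ0
Result: t = 5.5324, p-value = 2.9769e-5
The p-value is less than the 0.01 significance level, so we reject the null hypothesis.
There is sufficient evidence to reject the claim that the mean number of candies in a bag of Skittles is 55.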
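The test statistic for the mean can be checked with the one-sample t formula, t = (x̄ − µ0)/(s/sqrt(n)). Below is a minimal Python sketch that uses the rounded summary values from this report. Python's standard library cannot compute a t-distribution p-value, so the decision is made by comparing |t| with the critical value 2.878, which is typed in from a t-table (two-sided, 0.01 significance, 18 degrees of freedom); that hardcoded value is an assumption of this sketch.

```python
from math import sqrt

x_bar = 58.3  # sample mean number of candies per bag (rounded, from the report)
s = 2.6       # sample standard deviation (rounded, from the report)
n = 19        # number of bags
mu0 = 55      # claimed population mean number of candies per bag

# Test statistic: distance of the sample mean from mu0 in standard errors.
t_stat = (x_bar - mu0) / (s / sqrt(n))

# Assumption: two-sided critical value for alpha = 0.01 and 18 degrees
# of freedom, copied from a standard t-table rather than computed here.
t_critical = 2.878

reject_null = abs(t_stat) > t_critical
print(f"t = {t_stat:.4f}, reject H0: {reject_null}")
```

Comparing |t| with the critical value leads to the same decision as comparing the p-value with the significance level.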

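Reflection:
These results assume that our class sample is a random sample and that the sampling distributions are approximately normal. The proportion calculations use n = 1108 candies, which is well above 30, while the calculations for the mean use only n = 19 bags, so they rely on the number of candies per bag being roughly normally distributed. Possible errors are counting and calculation errors, both in the counts for each color of candy and in the total numbers. Since our data comes from one class, I treated it as a simple random sample. I think the results could be improved by increasing the number of samples or by collecting samples from different places.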

Part 5: Compile Term Project, Reflection and ePortfolio Posting

This is a data collection task from my Math 1040 class. All I needed to do was buy a 2.17 oz bag of Skittles and record the number of candies of each color in the bag. When I finished, I saw that my bag was just one piece of data in the whole class data set. So, I learned that the first step in data analysis is to get a valid sample. In this project, we were asked to analyze the data by calculating the mean, standard deviation, and five-number summary, and to answer some confidence interval and hypothesis test questions. At the same time, I learned how to construct a pie chart, a Pareto chart, and confidence intervals. These are very useful tools in statistics.

Through this project, I think I have learned a lot. It will help me analyze a similar problem in the future and understand what the results mean. I also believe that group collaboration is important for getting a valid sample. Everyone needs to provide accurate data, and if one person does not, it affects the final result. I think statistics is the foundation of many future fields of work. For students, statistics also helps with their other courses. So I understand why I need to take this class.
