Qili Huang
Math 1040
                           Term Project: Skittles Collection
Report introduction:
        This project is part of a basic statistics course. The goal is to count how many candies of
each color are in a bag of Skittles. Combining the counts from the whole class gives a data set
that we can analyze and use in calculations, so that we gain a better understanding of statistics.
In this project, I used charts and tables to represent the data, which makes the results clearer.
This project consists of five parts, which are:
        Part 1: Skittles Data Collection
        Part 2: Organizing and Displaying Categorical Data: Colors
                Organizing and Displaying Quantitative Data: The Number of Candies per Bag
        Part 3: Summary Stats
        Part 4: Confidence Interval Estimates and Hypothesis Tests
        Part 5: Compile Term Project, Reflection and ePortfolio Posting
Part 1: Skittles Data Collection
         In this part, 19 students participated. Each student counted how many candies of each
color were in their own bag of Skittles.
           Student                   Red       Orange      Yellow      Green      Purple       Total
    1      Sadi M. B                  14         12          11          13         10          60
    2      Luis C                     12         17          10          11         13          63
    3      Sonia H. C                  7         10          14          14         11          56
    4      Isaac L. C                 13         15          13          13         10          64
    5      Riley T. C                 13         13           7          11         15          59
    6      Lynnell C.C                 4         14          10          13         17          58
    7      Stacey L. H                 9         15           8           9         16          57
    8      Qili H                     14         11          11          12         10          58
    9      Amme K. L                  11          9          13          14          7          54
    10     Juan C. L                  13         15           9          11          9          57
    11     Amanda M. M                 8         16          15           9         11          59
    12     Danna T. N                 10         12          15           7         15          59
    13     Tayler B.P                 14          8           9          18         10          59
    14     Melany R                   17         10          13          12          8          60
    15     Sierra J. R                 6          6          17          11         16          56
    16     Jacqueline S               17          8          17          12          4          58
    17     Mckenzie R. T               9         11          13          12          9          54
    18     Valerie S.V                12          8          12          10         14          56
    19     Zane K. W                  11         11           7          13         19          61
           Total                     214        221          224        225        224         1108
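The row and column totals in this table can be double-checked with a short script. This is only a sketch in Python, not the tool used for the report; the names `bags`, `COLORS`, `bag_totals`, and `color_totals` are labels chosen here for illustration, and the counts are copied from the table above.

```python
# Color counts per bag, copied from the class table
# (order: red, orange, yellow, green, purple; one tuple per student, rows 1-19).
COLORS = ("Red", "Orange", "Yellow", "Green", "Purple")
bags = [
    (14, 12, 11, 13, 10), (12, 17, 10, 11, 13), (7, 10, 14, 14, 11),
    (13, 15, 13, 13, 10), (13, 13, 7, 11, 15), (4, 14, 10, 13, 17),
    (9, 15, 8, 9, 16), (14, 11, 11, 12, 10), (11, 9, 13, 14, 7),
    (13, 15, 9, 11, 9), (8, 16, 15, 9, 11), (10, 12, 15, 7, 15),
    (14, 8, 9, 18, 10), (17, 10, 13, 12, 8), (6, 6, 17, 11, 16),
    (17, 8, 17, 12, 4), (9, 11, 13, 12, 9), (12, 8, 12, 10, 14),
    (11, 11, 7, 13, 19),
]

# Total candies in each bag (the "Total" column).
bag_totals = [sum(bag) for bag in bags]

# Total candies of each color across all bags (the bottom row).
color_totals = {
    color: sum(bag[i] for bag in bags) for i, color in enumerate(COLORS)
}

grand_total = sum(bag_totals)

print("Bag totals:", bag_totals)
print("Color totals:", color_totals)
print("Grand total:", grand_total)
```

If a count is mistyped, either a bag total or a color total will disagree with the table, which makes this a quick way to catch recording errors.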
Part 2: Organizing and Displaying Categorical Data: Colors
1. What proportion (or percentage) of the Skittles do you expect to see of each color? Why?
Before collecting the data, I expected the colors to appear in equal proportions in each bag of
Skittles. With five colors, that means about 20% for each color.
In my bag, I had 14 red, 11 orange, 11 yellow, 12 green, and 10 purple candies, for a total of 58.
Red:           14/58=24%
Orange:        11/58=19%
Yellow:        11/58=19%
Green:         12/58=21%
Purple:         10/58=17%
                       Red         Orange       Yellow       Green        Purple
 Expected Proportion   20%          20%          20%          20%          20%
 Observed Proportion   24%          19%          19%          21%          17%
2. In StatCrunch, create a pie chart and a Pareto chart for the total number of candies of each
color in our class data set.
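A Pareto chart is a bar chart with the categories sorted from the largest count to the smallest. The ordering and the cumulative percentages behind such a chart can be sketched in Python as follows. This is an illustration only (the charts for this project were made in StatCrunch), and the variable names are chosen here for the example.

```python
# Class totals by color, taken from the data table in Part 1.
class_counts = {"Red": 214, "Orange": 221, "Yellow": 224, "Green": 225, "Purple": 224}
total = sum(class_counts.values())

# Sort colors from the largest count to the smallest.
# Python's sort is stable, so tied colors keep their original table order.
pareto = sorted(class_counts.items(), key=lambda item: item[1], reverse=True)

# Build the running (cumulative) percentage for each bar.
rows = []
running = 0
for color, count in pareto:
    running += count
    share = round(100 * count / total, 2)
    cumulative = round(100 * running / total, 2)
    rows.append((color, count, share, cumulative))

for color, count, share, cumulative in rows:
    print(f"{color:<7} {count:>4} {share:>6}% {cumulative:>7}%")
```

The same `share` values are what a pie chart of the class data displays as slice sizes.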
3. Does the class data represent a random sample? What would the population be?
         I think the class data can be treated as a random sample. The data were collected by 19
students who each bought a bag of Skittles of the same size at different times and places. Also,
every bag contains the same kind of candy. So, I think these samples are all valid. The
population would be all Skittles candies sold in bags of this size.
4. Create a table that displays the proportions by color and the total count from your own bag
of candies together with the proportions by color and total count for the entire class sample.
                      My bag:                                      Class Totals:
Red:               14/58=24.14%                                   214/1108=19.31%
Orange:            11/58=18.97%                                   221/1108=19.95%
Yellow:             11/58=18.97%                                  224/1108=20.22%
Green:             12/58=20.69%                                   225/1108=20.31%
Purple:            10/58=17.24%                                  224/1108=20.22%
Total:                  58                                            1108
                 Proportion   Proportion   Proportion   Proportion   Proportion   Total
                 Red          Orange       Yellow       Green        Purple       Count
 My Bag          24.14%       18.97%       18.97%       20.69%       17.24%       58
 Class Totals    19.31%       19.95%       20.22%       20.31%       20.22%       1108
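The proportions in this table are each color count divided by the total count. A minimal sketch of that calculation in Python is shown below; the function name `proportions` is a hypothetical helper written for this example, and the counts come from my bag and the class totals.

```python
def proportions(counts):
    """Return each color's share of the total as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {color: round(100 * n / total, 2) for color, n in counts.items()}

# Counts from my own bag and from the whole class.
my_bag = {"Red": 14, "Orange": 11, "Yellow": 11, "Green": 12, "Purple": 10}
class_totals = {"Red": 214, "Orange": 221, "Yellow": 224, "Green": 225, "Purple": 224}

my_props = proportions(my_bag)
class_props = proportions(class_totals)

for color in my_bag:
    print(f"{color:<7} my bag: {my_props[color]:>6}%   class: {class_props[color]:>6}%")
```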
5. Write a well thought out paragraph discussing your observations of this data. Respond to the
following prompts:
   Do the graphs reflect what you expected to see? Are there any surprises?
       Yes. My expected proportion was 20% of the total for each color, and the class data show
    that the proportion of each color is close to 20% (between 19.31% and 20.31%), so there is
    no big gap. I am pleasantly surprised that the results are so close to my expected proportions.
   Are there any observations that appear to be outliers? If so, what impact might they have
    on graphics and summary statistics?
       According to my observations, there are no outliers.
   Does the distribution of colors in the total class data match with your own data from your
    single bag of candies or are they different?
       They do not match exactly. In my bag, the proportions of some colors differ from the
    class proportions; for example, red is 24.14% in my bag but 19.31% for the class. The class
    totals are closer to the expected 20% for each color.
                        COLLECTION DATA
Sample of skittles:                       19 bags
Total candies:                             1108
Red:                  214       Red candies average per bag:       214/19=11.26
Orange:               221       Orange candies average per bag:    221/19=11.63
Yellow:               224       Yellow candies average per bag:    224/19=11.79
Green:                225       Green candies average per bag:     225/19=11.84
Purple:               224       Purple candies average per bag:    224/19=11.79
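The average per bag for each color is the class total for that color divided by the 19 bags. As a small sketch in Python (the names `N_BAGS` and `average_per_bag` are chosen here for illustration):

```python
# Class totals by color and the number of bags in the sample.
class_totals = {"Red": 214, "Orange": 221, "Yellow": 224, "Green": 225, "Purple": 224}
N_BAGS = 19

# Average number of candies of each color per bag, rounded to 2 decimals.
average_per_bag = {color: round(n / N_BAGS, 2) for color, n in class_totals.items()}

for color, avg in average_per_bag.items():
    print(f"{color:<7} {avg}")
```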
5-number summary of the number of candies per bag:
Min:                  54
Q1:                   56
Median:               58
Q3:                   60
Max:                  64
Mean:                 58.3
Standard Deviation:   2.6
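These summary values can be reproduced from the 19 bag totals with Python's standard `statistics` module. This is a sketch under one assumption: quartiles are computed with the "exclusive" method, which agrees with the values reported here. Other software may use a different quartile rule and give slightly different Q1 and Q3.

```python
import statistics

# Total number of candies in each of the 19 bags (from the Part 1 table).
totals = [60, 63, 56, 64, 59, 58, 57, 58, 54, 57,
          59, 59, 59, 60, 56, 58, 54, 56, 61]

# Quartiles; the "exclusive" method is an assumption chosen for this sketch.
q1, median, q3 = statistics.quantiles(totals, n=4, method="exclusive")

five_number = (min(totals), q1, median, q3, max(totals))
mean = statistics.mean(totals)
stdev = statistics.stdev(totals)  # sample standard deviation (divides by n - 1)

print("5-number summary:", five_number)
print("Mean:", round(mean, 1))
print("Standard deviation:", round(stdev, 2))
```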
Part 3: Summary Stats
Explain the difference between categorical and quantitative data. What types of graphs make
sense and what types of graphs do not make sense for categorical data? For quantitative data?
Explain why. What types of calculations make sense and what types of calculations do not
make sense for categorical data? For quantitative data? Explain why.
       Categorical data place each observation into a group or category. Examples are the
classes in a school, different races, and the different airlines that operate a flight. In this
project, we were asked to count the candies by color, which is a typical way of getting
categorical data. In this section, we used a bar chart and a pie chart to show the count in each
color category. Bar charts and pie charts are a good way to present categorical data because
they show each group clearly. However, I think a line graph or dot plot does not make sense for
categorical data because it does not present each group clearly.
       In contrast, quantitative data consist of measurements and numbers. Examples are
average height, average weight, and average student grades. In this project, we were asked to
find the average number of candies per color and to create a boxplot. This shows that a boxplot
is a good way to present quantitative data. A bar chart or pie chart does not work well for
quantitative data because the values do not fall into specific groups.
Part 4: Confidence Interval Estimates and Hypothesis Tests
Explain in general the purpose and meaning of a confidence interval.
1. Construct a 99% confidence interval estimate for the population proportion of yellow candies.
   x=224 (number of yellow candies), n=1108 (total number of candies), confidence level 99%
   STAT-TESTS-A:1-PropZInt gives (0.17109, 0.23324)
   I am 99% confident that the true population proportion of yellow candies is between
   0.17109 and 0.23324.
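The calculator interval can be checked by hand with the one-proportion z-interval formula, p̂ ± z*·sqrt(p̂(1−p̂)/n). The following is a minimal sketch in Python using only the standard library; the function name `one_prop_z_interval` is a hypothetical helper written for this example.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence):
    """Return (low, high) for a one-proportion z-interval."""
    p_hat = x / n
    # Critical value z* that leaves (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 224 yellow candies out of 1108, 99% confidence level.
low, high = one_prop_z_interval(224, 1108, 0.99)
print(round(low, 5), round(high, 5))
```

The result agrees with the calculator output to about four decimal places; small differences in the last digit come from rounding.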
2. Construct a 95% confidence interval estimate for the population mean number of candies per
bag.
   x̄=58.3 (mean number of candies per bag), Sx=2.65 (sample standard deviation), n=19,
   confidence level 95%
   STAT-TESTS-8:TInterval (Data) gives (57.04, 59.592)
   I am 95% confident that the true population mean number of candies per bag is between
   57.04 and 59.592.
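The t-interval for the mean follows x̄ ± t*·s/sqrt(n) with n−1 = 18 degrees of freedom. Python's standard library has no t distribution, so the sketch below hard-codes the critical value t* = 2.101 from a t table (95% confidence, 18 degrees of freedom). That constant, and the variable names, are assumptions of this example rather than part of the calculator procedure.

```python
from math import sqrt
import statistics

# Total number of candies in each of the 19 bags (from the Part 1 table).
totals = [60, 63, 56, 64, 59, 58, 57, 58, 54, 57,
          59, 59, 59, 60, 56, 58, 54, 56, 61]

# Assumption: critical value read from a t table (95% confidence, df = 18).
T_STAR_95_DF18 = 2.101

n = len(totals)
x_bar = statistics.mean(totals)
s = statistics.stdev(totals)  # sample standard deviation

margin = T_STAR_95_DF18 * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(round(low, 2), round(high, 2))
```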
Hypothesis Tests
Explain in general the purpose and meaning of a hypothesis test.
   1. Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.
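      x=214 (number of red candies), n=1108 (total number of candies), p̂=214/1108=0.1931
      STAT-TESTS-5:1-PropZTest
      p0: 0.2
      x: 214
      n: 1108
      prop≠p0
      gives p=0.57
      Since 0.57>0.05, we fail to reject the null hypothesis. There is not sufficient evidence
      to reject the claim that 20% of all Skittles candies are red.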
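The test statistic for this test is z = (p̂ − p0)/sqrt(p0(1−p0)/n), and the two-sided P-value is twice the tail area beyond |z|. A minimal sketch in Python follows; the function name `one_prop_z_test` is a hypothetical helper written for this example.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    """Two-sided one-proportion z-test; returns (z, p_value)."""
    p_hat = x / n
    # The standard error uses the claimed proportion p0, not p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# 214 red candies out of 1108, claimed proportion 0.20.
z, p_value = one_prop_z_test(214, 1108, 0.20)
ALPHA = 0.05
print(round(z, 4), round(p_value, 2))
print("Reject H0" if p_value < ALPHA else "Fail to reject H0")
```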
   2. Use a 0.01 significance level to test the claim that the mean number of candies in a bag
      of Skittles is 55.
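      x̄=58.3 (mean number of candies per bag), s=2.6, n=19
      STAT-TESTS-2:T-Test
      µ0: 55
      x̄: 58.3
      Sx: 2.6
      n: 19
      µ≠µ0
      gives t=5.5324, p=2.9769e-5
      Since the P-value is less than the 0.01 significance level, we reject the null
      hypothesis. There is sufficient evidence to reject the claim that the mean number of
      candies in a bag of Skittles is 55.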
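The t statistic for the claim that the mean is 55 can be checked with t = (x̄ − µ0)/(s/sqrt(n)). Because Python's standard library has no t distribution, this sketch does not compute the P-value. Instead it compares |t| with the two-sided critical value 2.878 from a t table (significance level 0.01, 18 degrees of freedom), which leads to the same decision. The constant and the function name are assumptions of this example.

```python
from math import sqrt

# Assumption: two-sided critical value from a t table (alpha = 0.01, df = 18).
T_CRIT_01_DF18 = 2.878

def one_sample_t(x_bar, s, n, mu0):
    """Return the one-sample t statistic for H0: mu = mu0."""
    return (x_bar - mu0) / (s / sqrt(n))

# Summary statistics from the class data; the claimed mean is 55.
t = one_sample_t(x_bar=58.3, s=2.6, n=19, mu0=55)
reject = abs(t) > T_CRIT_01_DF18

print(round(t, 4))
print("Reject H0" if reject else "Fail to reject H0")
```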
Reflection:
        These calculations assume that our class sample is a random sample and that the number
of candies per bag is normally distributed, which matters here because the sample has only n=19
bags (fewer than 30). Possible errors are counting errors, including miscounting the candies of
each color and miscounting the total number in a bag. Since our data come from one class, we
treat them as a simple random sample. I think the results could be improved by increasing the
number of samples or by collecting samples from different places.
Part 5: Compile Term Project, Reflection and ePortfolio Posting
        This is a data collection project from my Math 1040 class. All I needed to do was buy a
2.17 oz bag of Skittles and record the number of candies of each color in the bag. When I
finished, I realized that my bag was one piece of data in the whole class data set. This taught
me that the first step in data analysis is to get a valid sample. In this project, we were asked
to analyze the data, including calculating the mean, standard deviation, and 5-number summary,
and answering some questions on confidence interval estimates and hypothesis tests. I also
learned how to construct a pie chart, a Pareto chart, and confidence intervals. These are very
useful tools in statistics.
        Through this project, I think I have learned a lot. It will help me analyze similar
problems in the future and understand what the results mean. I also believe that group
collaboration is important for getting a valid sample. Everyone needs to provide accurate data;
if one person does not, it will affect the final result. I think statistics is the foundation of
many future work areas. For students, statistics also helps with other courses. So I understand
why I need to take this class.