First Stage              Statistics and Probability               lecture (2)
Second Semester
Definition 3.1. A frequency distribution
      is the organization of raw data in table form, using classes and
frequencies.
Definition 3.2. Categorical Frequency Distributions
      The categorical frequency distribution is used for data that can be
placed in specific categories, such as nominal- or ordinal-level data. For
example, data such as political affiliation, religious affiliation, or major field
of study would use categorical frequency distributions.
Example 3.3. Distribution of Blood Types. Twenty-five army inductees were
given a blood test to determine their blood type. The data set is
Construct a frequency distribution for the data.
Ans/ Since the data are categorical, discrete classes can be used. There are
four blood types: A, B, O, and AB. These types will be used as the classes for
the distribution. The procedure for constructing a frequency distribution for
categorical data is given next.
Step.1 Make a table as shown.
Step.2 Tally the data and place the results in column B.
Step.3 Count the tallies and place the results in column C.
Step.4 Find the percentage of values in each class by using the formula
% = (f / n) × 100%, where f = frequency of the class and n = total number of
values. For example, in the class of type A blood, the percentage is
% = (5 / 25) × 100% = 20%.
Percentages are not normally part of a frequency distribution, but they can be
added since they are used in certain types of graphs such as pie graphs. Also,
the decimal equivalent of a percent is called a relative frequency.
Step.5 Find the totals for columns C (frequency) and D (percent). The
completed table is shown.
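Steps 2 to 5 can be sketched in a few lines of Python. This is only an illustration: the sample list below is hypothetical, chosen so that type A appears 5 times out of 25 as in the worked percentage above; the counts for the other blood types are assumptions, and the function name is made up for this sketch.

```python
from collections import Counter

def categorical_distribution(values):
    """Return rows of (class, frequency, relative frequency, percent) and n."""
    counts = Counter(values)          # Steps 2-3: tally and count each class
    n = len(values)                   # total number of values
    rows = [(cls, f, f / n, 100 * f / n) for cls, f in counts.items()]
    return rows, n

# Hypothetical sample (NOT the lecture's data set): 25 values, 5 of type A.
blood_types = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4

rows, n = categorical_distribution(blood_types)
for cls, f, rel, pct in rows:
    print(f"{cls:<3} f={f:<3} relative={rel:.2f} percent={pct:.0f}%")

# Step 5: totals for the frequency and percent columns.
total_f = sum(f for _, f, _, _ in rows)
total_pct = sum(pct for _, _, _, pct in rows)
print(f"Total f={total_f} percent={total_pct:.0f}%")
```

The frequency column should always total n and the percent column should total 100%, which is a quick check on the tally.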
3.2. Grouped Frequency Distributions. When the range of the data is large,
the data must be grouped into classes that are more than one unit in width, in
what is called a grouped frequency distribution.
Example 3.4. The data represent the record high temperatures in degrees
Fahrenheit (°F) for each of the 50 states.
The procedure for constructing a grouped frequency distribution for
numerical data follows.
    Determine the classes. Find the highest value and the lowest value:
      H = 134 and L = 100.
Find the range: R = highest value - lowest value = H - L, so R = 134 - 100 = 34.
Select the number of classes desired (usually between 5 and 20). In this case,
7 is arbitrarily chosen.
Find the class width by dividing the range by the number of classes:
width = R / number of classes = 34 / 7 ≈ 4.9.
Round the answer up to the nearest whole number if there is a remainder: 4.9
is rounded up to 5. (Rounding up is different from rounding off. A number is
rounded up if there is any decimal remainder when dividing. For example,
85 ÷ 6 = 14.167 and is rounded up to 15. Also, 53 ÷ 4 = 13.25 and is rounded
up to 14. Also, after dividing, if there is no remainder, you will need to add
an extra class to accommodate all the data.)
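The range and rounding-up rule can be checked with a short Python sketch. The function name is hypothetical, and whole-number data (one unit = 1) is assumed.

```python
import math

def class_width(high, low, num_classes):
    """Return (range, class width, number of classes actually needed)."""
    data_range = high - low
    # Round UP whenever the division leaves any remainder.
    width = math.ceil(data_range / num_classes)
    # If there is no remainder, one extra class is needed to hold the highest value.
    if data_range % num_classes == 0:
        num_classes += 1
    return data_range, width, num_classes

data_range, width, k = class_width(134, 100, 7)
print(data_range, width, k)

# The two rounding-up examples from the text.
print(math.ceil(85 / 6), math.ceil(53 / 4))
```

With H = 134, L = 100 and 7 classes, this gives a range of 34 and a width of 5, with the number of classes unchanged because 34 ÷ 7 leaves a remainder.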
Select a starting point for the lowest class limit. This can be the smallest data
value or any convenient number less than the smallest data value. In this case,
100 is used. Add the width to the lowest score taken as the starting point to
get the lower limit of the next class. Keep adding until there are 7 classes, as
shown, 100, 105, 110, etc.
Subtract one unit from the lower limit of the second class to get the upper limit
of the first class. Then add the width to each upper limit to get all the upper
limits. 105 - 1 = 104.
The first class is 100-104, the second class is 105-109, etc.
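The class limits described above can be generated mechanically. The following is a minimal sketch, assuming whole-number data so that "one unit" is 1; the function name and the `unit` parameter are illustrative only.

```python
def class_limits(start, width, num_classes, unit=1):
    """Return a list of (lower limit, upper limit) pairs."""
    # Keep adding the width to the starting point to get the lower limits.
    lowers = [start + i * width for i in range(num_classes)]
    # Each upper limit is one unit below the next class's lower limit.
    uppers = [lower + width - unit for lower in lowers]
    return list(zip(lowers, uppers))

limits = class_limits(start=100, width=5, num_classes=7)
for lower, upper in limits:
    print(f"{lower}-{upper}")
```

The first line printed is 100-104 and the last is 130-134, so the highest value, 134, falls in the seventh class.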
    Tally the data.
    Find the numerical frequencies from the tallies. The completed
      frequency distribution is
      Sometimes it is necessary to use a cumulative frequency distribution. A
cumulative frequency distribution is a distribution that shows the number of
data values less than or equal to a specific value (usually an upper boundary).
The values are found by adding the frequencies of the classes less than or
equal to the upper class boundary of a specific class. This gives an ascending
cumulative frequency. In this example, the cumulative frequency for the first
class is 0 + 2 = 2; for the second class it is 0 + 2 + 8 = 10; for the third class it
is 0 + 2 + 8 + 18 = 28.
      Naturally, a shorter way to do this would be to just add the cumulative
frequency of the class below to the frequency of the given class. For example,
the cumulative frequency for the number of data values less than 114.5 can be
found by adding 10 + 18 = 28.
The cumulative frequency distribution for the data in this example is as
follows:
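As an illustration, the running totals can be computed with Python's `itertools.accumulate`. The frequencies 2, 8, 18, 13, 7, 1, 1 and the class boundaries are the ones listed for this data in Example 4.2; the variable names are arbitrary.

```python
from itertools import accumulate

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Start at 0 (no values lie below the first lower boundary), then add each
# class frequency to the cumulative frequency of the class before it.
cumulative = [0] + list(accumulate(frequencies))

for boundary, cf in zip(boundaries, cumulative):
    print(f"Less than {boundary}: {cf}")
```

The last cumulative frequency equals the total number of data values, 50, which serves as a check on the arithmetic.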
Remark 3.5. Constructing a grouped frequency distribution
    Determine the classes.
    Find the highest and lowest values.
    Find the range.
    Select the number of classes desired.
    Find the width by dividing the range by the number of classes and
      rounding up.
    Select a starting point (usually the lowest value or any convenient
      number less than the lowest value), and add the width to get the lower
      limits.
    Find the upper class limits.
    Find the class boundaries.
    Tally the data.
    Find the numerical frequencies from the tallies.
    Find the cumulative frequencies.
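The boundary and tally steps can be sketched as follows. Two assumptions are made here: boundaries are taken half a unit below each lower limit and half a unit above each upper limit (consistent with the class 100-104 having boundaries 99.5-104.5), and the short sample list is hypothetical, not the lecture's data. Function names are illustrative.

```python
def class_boundaries(limits, unit=1):
    """Widen each (lower, upper) limit pair by half a unit on both sides."""
    half = unit / 2
    return [(lower - half, upper + half) for lower, upper in limits]

def tally(data, boundaries):
    """Count how many data values fall inside each pair of boundaries."""
    frequencies = [0] * len(boundaries)
    for value in data:
        for i, (lower, upper) in enumerate(boundaries):
            if lower <= value < upper:
                frequencies[i] += 1
                break
    return frequencies

limits = [(100, 104), (105, 109), (110, 114)]
sample = [100, 103, 105, 109, 110, 112, 114, 108]  # hypothetical values

boundaries = class_boundaries(limits)
frequencies = tally(sample, boundaries)
print(boundaries)
print(frequencies)
```

Because the boundaries end in .5 and the data are whole numbers, no value can fall exactly on a boundary, so every value lands in exactly one class.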
4. Histograms, Frequency Polygons, and Ogives
      The purpose of graphs in statistics is to convey the data to the viewers
in pictorial form. 1) It is easier for most people to comprehend the meaning
of data presented graphically than data presented numerically in tables or
frequency distributions.
      This is especially true if the users have little or no statistical knowledge.
Statistical graphs can be used to describe the data set or to analyze it. 2)
Graphs are also useful in getting the audience's attention in a publication or a
speaking presentation. 3) They can be used to discuss an issue, reinforce a
critical point, or summarize a data set. They can also be used to discover a
trend or pattern in a situation over a period of time.
The three most commonly used graphs in research are:
1. The histogram.
2. The frequency polygon.
3. The cumulative frequency graph, or ogive (pronounced o-jive).
An example of each type of graph is shown in the following figure. The data for
each graph are the distribution of the miles that 20 randomly selected runners
ran during a given week.
Definition 4.1. Histogram
      The histogram is a graph that displays the data by using contiguous
vertical bars (unless the frequency of a class is 0) of various heights to
represent the frequencies of the classes.
Example 4.2. Construct a histogram to represent the data shown for the
record high temperatures for each of the 50 states.
                        Class boundaries            Frequency
                       99.5 – 104.5                      2
                       104.5 – 109.5                     8
                       109.5 – 114.5                    18
                       114.5 – 119.5                    13
                       119.5 – 124.5                     7
                       124.5 – 129.5                     1
                       129.5 – 134.5                     1
Solution:
Step.1 Draw and label the x and y axes. The x axis is always the horizontal
axis, and the y axis is always the vertical axis.
Step.2 Represent the frequency on the y-axis and the class boundaries on the
x-axis.
Step.3 Using the frequencies as the heights, draw vertical bars for each class.
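A rough text rendering of this histogram can be produced with plain Python. This is only a sketch: the bars are drawn horizontally with `#` characters rather than vertically, and the function name is made up for the illustration. The boundaries and frequencies are those in the table of Example 4.2.

```python
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    lines = []
    # Pair each lower boundary with the next boundary (its upper boundary).
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        lines.append(f"{lower:>5} - {upper:<5} | {symbol * f} {f}")
    return lines

lines = text_histogram(boundaries, frequencies)
print("\n".join(lines))
```

The longest bar belongs to the class 109.5 - 114.5, which has the highest frequency, 18.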
Definition 4.3. The frequency polygon is a graph that displays the data by
using lines that connect points plotted for the frequencies at the midpoints of
the classes. The frequencies are represented by the heights of the points.
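Example 4.4. Construct a frequency polygon for the same data.
Step.1 Find the midpoint of each class by averaging its lower and upper
boundaries; for example, (99.5 + 104.5) / 2 = 102.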
          Class boundaries             Midpoint            Frequency
             99.5 – 104.5                  102                     2
            104.5 – 109.5                  107                     8
            109.5 – 114.5                  112                    18
            114.5 – 119.5                  117                    13
            119.5 – 124.5                  122                     7
            124.5 – 129.5                  127                     1
            129.5 – 134.5                  132                     1
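The midpoint column can be reproduced by averaging the two boundaries of each class, as in this short Python sketch (variable names are arbitrary):

```python
boundaries = [
    (99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
    (119.5, 124.5), (124.5, 129.5), (129.5, 134.5),
]

# Midpoint = (lower boundary + upper boundary) / 2
midpoints = [(lower + upper) / 2 for lower, upper in boundaries]
print(midpoints)
```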
Step.2 Draw the x and y axes. Label the x axis with the midpoint of each class,
and then use a suitable scale on the y axis for the frequencies.
Step.3 Using the midpoints for the x values and the frequencies as the y
values, plot the points.
Step.4 Connect adjacent points with line segments. Draw a line back to the x
axis at the beginning and end of the graph, at the same distance that the
previous and next midpoints would be located.
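The full list of points to connect, including the two closing points on the x axis, can be built as below. This is a sketch that assumes equal class widths, so that the "previous" and "next" midpoints lie one class width outside the first and last midpoints.

```python
midpoints = [102, 107, 112, 117, 122, 127, 132]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Distance between neighbouring midpoints (the class width).
width = midpoints[1] - midpoints[0]

# Close the polygon on the x axis one class width before the first
# midpoint and one class width after the last midpoint.
start = (midpoints[0] - width, 0)
end = (midpoints[-1] + width, 0)
polygon_points = [start] + list(zip(midpoints, frequencies)) + [end]

print(polygon_points)
```

For this data the polygon starts at (97, 0) and ends at (137, 0).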
Definition 4.5. The Ogive. The third type of graph represents the
cumulative frequencies for the classes. This type of graph is called the
cumulative frequency graph, or ogive. The cumulative frequency is the sum
of the frequencies accumulated up to the upper boundary of a class in the
distribution.
The ogive is a graph that represents the cumulative frequencies for the
classes in a frequency distribution.
Example 4.6. Construct an ogive for the same data.
Step.1 Find the cumulative frequency for each class.
Step.2 Draw the x and y axes. Label the x axis with the class boundaries. Use
an appropriate scale for the y axis to represent the cumulative frequencies.
(Depending on the numbers in the cumulative frequency columns, scales such
as 0, 1, 2, 3, . . . , or 5, 10, 15, 20, . . . , or 1000, 2000, 3000, . . . can be used.
Do not label the y axis with the numbers in the cumulative frequency column.)
In this example, a scale of 0, 5, 10, 15, . . . will be used.
Step.3 Plot the cumulative frequency at each upper class boundary, as shown
in the figure. Upper boundaries are used since the cumulative frequencies
represent the number of data values accumulated up to the upper boundary of
each class.
Step.4 Starting with the first upper class boundary, 104.5, connect adjacent
points with line segments, as shown in Figure 2. Then extend the graph to the
first lower class boundary, 99.5, on the x axis.
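The points of the ogive and the y-axis scale can be listed with a few lines of Python. This is a sketch using the class boundaries and frequencies from Example 4.2; the tick step of 5 follows the scale chosen in Step 2, and the variable names are arbitrary.

```python
from itertools import accumulate

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# The graph starts at 0 on the first lower boundary, then rises to the
# cumulative frequency at each upper class boundary.
cumulative = [0] + list(accumulate(frequencies))
ogive_points = list(zip(boundaries, cumulative))

# y-axis scale 0, 5, 10, ... up to the total number of values.
step = 5
y_ticks = list(range(0, cumulative[-1] + step, step))

print(ogive_points)
print(y_ticks)
```

The points run from (99.5, 0) up to (134.5, 50), and because cumulative frequencies never decrease, the ogive never slopes downward.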