NCSS Statistical Software NCSS.com
Chapter 143
Histograms
Introduction
The word histogram comes from the Greek histos, meaning pole or mast, and gram, which means chart or graph.
Hence, the direct definition of “histogram” is “pole chart.” Perhaps this word was chosen because a histogram
looks like several poles standing side-by-side.
A histogram is used to display the distribution of data values along the real number line. It competes with the
probability plot as a method of assessing normality. A histogram is created by dividing up the range of the data
into a small number of intervals or bins. The number of observations falling in each interval is counted. This gives
a frequency distribution.
A histogram is a graph of the frequency distribution in which the vertical axis represents the count (frequency)
and the horizontal axis represents the possible range of the data values.
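The counting step described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (equal-width bins spanning the minimum to the maximum, with the maximum placed in the last bin); the function name frequency_distribution is hypothetical and is not part of NCSS.

```python
def frequency_distribution(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the bin width would be zero")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value is assigned to the last bin instead of a new one.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    # Bin boundaries, one more than the number of bins.
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = frequency_distribution(data, 4)
print(edges)   # bin boundaries along the horizontal axis
print(counts)  # bar heights along the vertical axis
```

Plotting counts against the intervals defined by edges gives the histogram.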
Density Trace
The histogram is widely used and needs little explanation. However, it does have its drawbacks. First, the number
and width of the intervals are a subjective decision, yet they have a high impact on the appearance of the
histogram. Slightly different boundary values can sometimes give dramatically different looking histograms,
especially when the number of values used to create the histogram is small.
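A small Python sketch can make the boundary effect concrete. The data values and the helper bin_counts below are made up for illustration; the bin width is the same in both calls and only the starting boundary is shifted.

```python
import math
from collections import Counter


def bin_counts(values, origin, width):
    """Counts per bin for bins [origin + k*width, origin + (k+1)*width)."""
    counts = Counter(math.floor((v - origin) / width) for v in values)
    # List the counts from the lowest occupied bin to the highest.
    return [counts.get(i, 0) for i in range(min(counts), max(counts) + 1)]


data = [1.9, 2.0, 2.1, 3.9, 4.0, 4.1]
print(bin_counts(data, 0.0, 2.0))  # boundaries at 0, 2, 4, 6 -> [1, 3, 2]
print(bin_counts(data, 1.0, 2.0))  # boundaries at 1, 3, 5    -> [3, 3]
```

With only six values, moving the boundaries by one unit changes a peaked three-bar histogram into a flat two-bar one.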
Another problem with the histogram is that the rectangles make it appear that the data are spread uniformly
throughout the interval, which is rarely the case. Also, the “skyscraper” look of the histogram doesn’t resemble
the rather smooth nature of the data’s true distribution.
© NCSS, LLC. All Rights Reserved.
These issues with the histogram have prompted many innovations. One of the more popular display techniques
for showing the distribution of data is the density trace.
Density refers to the relative frequency (concentration) of data points along the data range. Mathematically, the
density at a value x is defined as the fraction of data values per unit of measurement that lie in an interval centered
at x. Once you choose a suitable interval width, you can calculate the density at any (and every) x value. If you
calculate the density at, say, 50 values and connect them, you’ll have a density trace.
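The definition above, before any weighting is applied, can be sketched in Python as follows. The function names density_at and density_trace are hypothetical, and the choice of 50 equally spaced x values between the minimum and maximum is an assumption made for illustration; it is not taken from the NCSS implementation.

```python
def density_at(values, x, width):
    """Fraction of values per unit of measurement in an interval centered at x."""
    half = width / 2
    inside = sum(1 for v in values if x - half <= v <= x + half)
    return inside / (len(values) * width)


def density_trace(values, width, num_points=50):
    """Evaluate the density at num_points equally spaced x values."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (num_points - 1)
    xs = [lo + i * step for i in range(num_points)]
    return [(x, density_at(values, x, width)) for x in xs]


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
trace = density_trace(data, width=2)
print(trace[:3])  # first few (x, density) pairs; connect them all to draw the trace
```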
In NCSS, the interval width is specified as a percentage. As you increase the percentage, you increase the amount
of data included in each density calculation. This increases the smoothness of the chart. The following four
density traces were made of the same data at increasing percentage smoothness.
As the interval width is increased, data points further and further from the center value are included. To
decrease the weight of points that are far removed from the center value, we use a weighting scheme in which a
point’s weight decreases as its distance from the center value increases. The weight function used is half the cosine function
with its peak at the center value. It decreases symmetrically to zero, after which a weight of zero is applied.
Hence, points have a smaller and smaller impact on the density trace as they are further and further from the
center.
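The weighting idea can be sketched in Python under two stated assumptions. First, "half the cosine function" is read here as a half-period cosine arch that equals one at the center and zero at the edge of the interval. Second, the percentage is taken to be a percentage of the data range. Neither reading is confirmed by this chapter, and the names cosine_weight and weighted_density are hypothetical.

```python
import math


def cosine_weight(distance, half_width):
    """Half-period cosine weight: 1 at the center, 0 at and beyond half_width."""
    u = abs(distance) / half_width
    if u >= 1:
        return 0.0
    return math.cos(math.pi * u / 2)


def weighted_density(values, x, percent):
    """Cosine-weighted density at x for an interval width given as a percentage."""
    data_range = max(values) - min(values)
    # Assumption: the percentage refers to the data range (must be nonzero).
    half_width = (percent / 100) * data_range / 2
    total = sum(cosine_weight(v - x, half_width) for v in values)
    # The factor pi/4 rescales the cosine arch so the trace integrates to one.
    return (math.pi / 4) * total / (len(values) * half_width)


print(cosine_weight(0.0, 1.0), cosine_weight(0.5, 1.0), cosine_weight(1.5, 1.0))
```

Larger percentages widen the arch, so more points contribute to each calculation and the resulting trace is smoother.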
Another way to think of the density trace is to imagine that you construct 1000 histograms of the same data using
slightly different boundary positions and take the average rectangle height at each of 50 values along the data
range. This would give you a smoothed histogram that has many of the same properties as the density trace.
Hence, the density trace should be thought of as a smoothed histogram in which interval width and number of
bins do not come into play.
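The thought experiment of averaging many shifted histograms can be sketched in Python. This is an illustration of the idea only, not the method NCSS uses to compute the density trace; the function names and the evenly spaced shifts are assumptions.

```python
import math


def bar_height(values, x, origin, width):
    """Count of values in the bin (for this origin and width) that contains x."""
    target = math.floor((x - origin) / width)
    return sum(1 for v in values if math.floor((v - origin) / width) == target)


def averaged_histogram(values, x, width, num_shifts=1000):
    """Average bar height at x over histograms with shifted bin boundaries."""
    lo = min(values)
    # Shift the first boundary left in num_shifts equal steps spanning one bin.
    shifts = [width * k / num_shifts for k in range(num_shifts)]
    heights = [bar_height(values, x, lo - s, width) for s in shifts]
    return sum(heights) / num_shifts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
smoothed = [averaged_histogram(data, x, width=2) for x in (1, 3, 5, 7, 9)]
print(smoothed)
```

Because the result is an average over every boundary position, it no longer depends on where any single set of boundaries happens to fall.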
Data Structure
A histogram is constructed from the values of a single column. A second variable may be used to divide the first
variable into groups (e.g., age group or gender). In this procedure, a separate histogram is produced for each
group.
Procedure Options
This section describes the options available in this procedure.
Variables Tab
This panel specifies which variables are used in the histogram.
Variables
Data Variable(s)
Select one or more columns. If more than one column is entered, the values may be combined into one histogram
or shown as separate histograms, depending on the Combine all Variables and Groups as One option.
Grouping Variable
This variable may be used to separate the observations into groups. A separate histogram is created for each
unique value of this variable unless the Combine all Variables and Groups as One option is checked.
Combine all Variables and Groups as One
When checked, the values for all selected variables and groups are combined into one histogram.
Format Options
Variable Names
This option specifies whether the column names or column labels are used on the chart.
Value Labels
This option specifies whether the actual values or the labels from the Data Label Variable are used to label the
points, and whether the values or the value labels are used for the group level labels of the plot.
Histogram Format
Format
Click the format button to change the plot settings (see Histogram Window Options below).
Edit During Run
Checking this option will cause the plot format window to appear when the procedure is run. This allows you to
modify the format of the graph with the actual data.
Histogram Window Options
This section describes the specific options available on the Histogram window, which is displayed when the
Histogram Format button is clicked. Common options, such as axes, labels, legends, and titles, are documented in the
Graphics Components chapter.
Histogram Tab
Format Section
You can modify the color of the histogram and its outline using the options in this section. The third example uses
a brown to yellow gradient fill.
Bins Section
You can specify the number of bins (bars) of the histogram in several ways.
Overlays Tab
Density Section
You can add a density trace line over the histogram. This serves as a smoothed histogram. Note that the third
example uses a blue to yellow gradient fill.
Frequency Section
You can add a frequency polygon line over the histogram. This line connects the midpoints of the tops of the bars.
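As a rough Python sketch, the points joined by a frequency polygon can be derived from the bin boundaries and counts. The function name frequency_polygon and the sample boundaries and counts are hypothetical.

```python
def frequency_polygon(edges, counts):
    """Return (bin midpoint, count) pairs, one for the top of each bar."""
    return [
        ((left + right) / 2, count)
        for left, right, count in zip(edges, edges[1:], counts)
    ]


points = frequency_polygon([1, 3, 5, 7, 9], [3, 5, 1, 1])
print(points)  # [(2.0, 3), (4.0, 5), (6.0, 1), (8.0, 1)]
```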
Normal Density Section
You can add a normal density line over the histogram. This line is based on the data’s mean and standard
deviation. Note the impact of the brown to orange to yellow gradient in the third example.
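A normal density line of this kind can be sketched in Python with the standard library. The sketch assumes the sample standard deviation is used and that the curve is scaled by the number of observations times the bin width so it is comparable to bar counts; the chapter does not state either detail, and the function name normal_overlay is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_overlay(values, bin_width, num_points=50):
    """Return (x, height) pairs for a normal curve scaled to histogram counts."""
    # Assumption: the sample standard deviation (n - 1 divisor) is used.
    fitted = NormalDist(mean(values), stdev(values))
    lo, hi = min(values), max(values)
    step = (hi - lo) / (num_points - 1)
    # Assumption: scale the density by n * bin width to match bar counts.
    scale = len(values) * bin_width
    xs = [lo + i * step for i in range(num_points)]
    return [(x, scale * fitted.pdf(x)) for x in xs]


curve = normal_overlay([1, 2, 3, 4, 5], bin_width=1, num_points=5)
print(curve)
```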
Border Plots Tab
X Axis Section
You can add a box plot and a dot plot underneath the histogram to give a very clear picture of the density of the
data.
Titles, Legend, X Axis, Y Axis, Grid Lines, and Background Tabs
Details on setting the options in these tabs are given in the Graphics Components chapter.
Example 1 – Creating a Histogram
This section presents an example of how to generate a histogram. The data used are from the Fisher dataset. We
will create a histogram of SepalLength.
You may follow along here by making the appropriate entries, or you may load the completed template Example 1 by
clicking on Open Example Template from the File menu of the Histograms window.
1 Open the Fisher dataset.
• From the File menu of the NCSS Data window, select Open Example Data.
• Select the Data subdirectory of your NCSS directory.
• Click Open.
2 Open the Histograms window.
• Using the Graphics menu or the Procedure Navigator, find and select the Histograms procedure.
• On the menus, select File, then New Template. This will fill the procedure with the default template.
3 Specify the variables.
• On the Histograms window, select the Variables tab.
• Double-click in the Data Variable(s) text box. This will bring up the variable selection window.
• Select SepalLength from the list of variables and then click Ok. “SepalLength” will appear in the Data
Variable(s) box.
4 Add density trace and border plots.
• On the Histograms window, click the Histogram Format button.
• On the Overlays tab, check Outline under Density.
• On the Border Plots tab, check Box Plot and Dot Plot.
5 Run the procedure.
• From the Run menu, select Run Procedure. Alternatively, just click the green Run button.
Histogram Output