College of Micronesia-FSM: Dana Lee Ling's Introduction to Statistics Using OpenOffice.org, LibreOffice.org Calc, 4th edition: "Section 2.2: Histograms and Frequency Distributions”
Read this section.
A distribution counts the number of elements of data in either a category or within a range of values. Plotting the count of the elements in each category or range as a column chart generates a chart called a histogram. The histogram shows the distribution of the data. The height of each column shows the frequency of an event. This distribution often provides insight into the data that the data itself does not reveal. In the histogram below, the distribution for male body fat among statistics students has two peaks. The two peaks suggest that there are two subgroups among the men in the statistics course, one subgroup that is at a healthy level of body fat and a second subgroup at a higher level of body fat.
The ranges into which values are gathered are called bins, classes, or intervals. This text tends to use classes or bins to describe the ranges into which the data values are grouped.
Nominal level of measurement
At the nominal level of measurement one can determine the frequency of elements in a category, such as students by state in a statistics course.
Ordinal level of measurement
Data classes into classes comprised of each unique data value
At the ordinal level, a frequency distribution can be done using the rank order, counting the number of elements in each rank order to obtain a frequency. When the frequency data is calculated in this way, the distribution is not grouped into a smaller number of classes.
Data gathered into a number of classes fewer than the number of unique data values
The ranks can be collected together, classed, to reduce the number of rank order categories. in the example below the age data in gathered into two-year cohorts.
2.22 Ratio level of measurement
At the ratio level data is always gathered into ranges. At the ratio level, classed histograms are used. Ratio level data is not necessarily in a finite number of ranks as was ordinal data.
The ranges into which data is gathered are defined by a class lower limit and a class upper limit. The width is the class upper limit minus the class lower limit. The frequency function in spreadsheets uses class upper limits. In this text histograms are also generated using the class upper limits.
To calculate the class lower and upper limits the minimum and maximum value in a data set must be determined. Spreadsheets include functions to calculate the minimum value MIN and maximum value MAX in a data set.
In LibreOffice the MIN and MAX function can take a list of comma separated numbers or a range of cells in a spreadsheet. In statistics a range of cells is the most common input for these functions. When a range of cells is the usual input, this text uses the word "data" to refer to the fact that the range of cells is usually your data! Ranges of cells use two cell addresses separated by a full colon. An example is shown below where the data is arranged in a vertical column from A2 to A42. Sort the original data from smallest to largest before you begin!
How to make a frequency table at the ratio level
- Find the minimum value of the data set using the MIN function
- Find the maximum value of the data set using the MAX function
- Calculate the range by subtracting the MIN from the MAX:
range = maximum value - minimum value
- Decide on the number of classes you are going to use (also called bins or intervals)
- Divide the range by the number of classes to calculate the class width (or bin width or interval width)
- Calculate the class upper limits
- Put the class upper limits into a column of cells
- Manually tally the data into the frequency column to determine the frequencies for each class. The class upper limit is included in each tally. As a check, the sum of the frequencies must be equal to the sample size.
- Create a column chart
|Class Upper Limits (CUL)||Frequency|
|=min + class width|
|+ class width|
|+ class width|
|+ class width|
|+ class width = max|
For the female height data:
58, 58, 59.5, 59.5, 60, 60, 60, 60, 60, 61, 61, 61.2, 61.5, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 63, 63, 63, 63.5, 64, 64, 64, 64, 65, 65, 66, 66
Five classes would produce the following results:
Min = 58
Max = 66
Range = 66 - 58 = 8
Width = 8/5 = 1.6
|58 + 1.6||59.6||4|
|59.6 + 1.6||61.2||8|
|61.2 + 1.6||62.8||13|
|62.8 + 1.6||64.4||8|
|64.4 + 1.6||66||4|
Note that 61.2 is INCLUDED in the class that ends at 61.2. The class includes values at the class upper limit. In other words, a class includes all values up to and including the class upper limit.
Note too that the frequencies add to the sample size.
After making the column chart, double click on the columns to open the data series dialog box. Find the Options tab and set the spacing (or gap width) to zero.
Note that the spacing or gap width on the columns has been set to zero.
Relative frequency is one way to determine a probability.
Divide each frequency by the sum (the sample size) to get the relative frequency
|Height CUL||Frequency||Relative Frequency f/n or P(x)|
The relative frequency always adds to one (rounding causes the above to add to 1.01, if all the decimal places were used the relative frequencies would add to one.
The area under the relative frequency columns is equal to one.
Another example using integers:
0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4.5, 5, 5, 5, 6, 6, 7, 8, 9, 10
min = 0
max = 10
range = 10
width = 10/5 = 2
|Class Num||Calculation||CUL||Frequency||Relative Frequency f/n or P(x)|
|1||min + width||2||4||0.20|
The above method produces equal width classes and to conforms the inclusion of the class upper limit by spreadsheet packages.
Checking frequency tables
The final class upper limit must be equal to the maximum value in the data set. The frequencies must sum to the sample size n. The relative frequencies must add to 1.00.
|CUL||Frequency||Relative Frequency f/n|
|min + width|
|+ width = MAX|
|Sum:||sample size n||1.00|
For more advanced spreadsheet users, frequency data can be obtained using the frequency function FREQUENCY. This function is also very useful when working with large data sets. The frequency function is:
DATA refers to the range of cells containing the data, CLASSES refers to the range of cells containing the class upper limits.
The data set seen below are the height measurements for 49 female students in statistics courses during two consecutive terms.
The frequency function built into spreadsheets works very differently from all other functions. The frequency function called an "array" function because the function places values into an array of cells. For the function to do this, you must first select the cells into which the function will place the frequency values.
With the cells still highlighted, start typing the frequency function.
After typing the opening parenthesis, drag and select the data to be classed. If the data is more than can be selected by dragging, type the data range in by hand.
The frequency function usually uses a comma, not a semi-colon as seen in the image below.
Drag and select the class upper limits.
Type the closing parenthesis.
Then press and hold down BOTH the CONTROL (Ctrl) key and the SHIFT key. With both the control and shift keys held down, press the Enter (or Return) key.
As noted above, the frequencies should add to the sample size. When working with spreadsheets, internal rounding errors can cause the maximum value in a data set to not get included in the final class. In the last class, use the value obtained by the MAX function and not the previous class + a width formula to generate that class upper limit.