How many classes should a histogram have




















Histogram Description

Histograms are graphs of a distribution of data designed to show centering, dispersion (spread), and shape (relative frequency) of the data. A histogram can be constructed to provide more usable information than a raw list of values: the histogram graph gives a quick visual summary of the data.

There are other distribution shapes that you may encounter.

How to Start

The first step in constructing a histogram is to decide how the process should be measured, that is, what data should be collected.

If you choose this route, use the following sequence:

1. Count the number of data points (50 in our height example).
2. Determine the range of the sample: the difference between the highest and lowest values.
3. Determine the number of class intervals.

You can use either of two methods as general guidelines in determining the number of intervals:

A. Use 10 intervals as a rule of thumb.
B. Calculate the square root of the number of data points and round to the nearest whole number. In the case of our height example, the square root of 50 is about 7.07, which rounds to 7.

You may wish to experiment with different interval numbers.
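As a rough illustration of the square-root guideline, the short Python sketch below computes the suggested number of intervals. The function name `sqrt_rule` is hypothetical and only used for this example.

```python
import math

def sqrt_rule(n_points: int) -> int:
    """Suggested number of intervals: square root of the count, rounded."""
    # Never return fewer than one interval, even for tiny data sets.
    return max(1, round(math.sqrt(n_points)))

# 50 data points, as in the height example: sqrt(50) is about 7.07.
print(sqrt_rule(50))  # prints 7
```

The result is only a starting point; the 10-interval rule of thumb or a nearby value may show the shape of the data better.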

If there are too many intervals, the distribution will spread out, and the histogram will look flat. When we have a relatively small set of data, we typically use only around five classes. If the data set is relatively large, then we use around 20 classes. Again, this is a rule of thumb, not an absolute statistical principle. There can be good reasons to have a different number of classes for data. We will see an example of this below. Before we consider a few examples, we will see how to determine what the classes actually are.

We begin this process by finding the range of our data. In other words, we subtract the lowest data value from the highest data value.

When the data set is relatively small, we divide the range by five. The quotient is the width of the classes for our histogram. We will probably need to do some rounding in this process, which means that the total number of classes may not end up being five. When the data set is relatively large, we divide the range by 20. Just as before, this division gives us the width of the classes for our histogram.

As before, our rounding may result in slightly more or slightly fewer than 20 classes. In both the large and small data set cases, we make the first class begin at a point slightly less than the smallest data value. We must do this in such a way that the first data value falls into the first class. Subsequent classes are determined by the width that was set when we divided the range. We know that we have reached the last class when it contains our highest data value.
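The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive method: the names `make_classes`, `count_per_class`, and `width_step` are hypothetical, the sample values are made up, and each class is treated as containing its lower edge but not its upper edge.

```python
import math

def make_classes(data, target_classes, width_step=1.0):
    """Return class edges that cover every value in data.

    width_step is a hypothetical rounding unit: the class width is rounded
    up to a multiple of it, and the first edge is rounded down to one.
    """
    lo, hi = min(data), max(data)
    raw_width = (hi - lo) / target_classes            # range / number of classes
    width = math.ceil(raw_width / width_step) * width_step
    if width == 0:                                    # all values identical
        width = width_step
    start = math.floor(lo / width_step) * width_step  # at or just below the minimum
    edges = [start]
    k = 0
    # Keep adding classes until the last one contains the highest value.
    while edges[-1] <= hi:
        k += 1
        edges.append(start + k * width)
    return edges

def count_per_class(data, edges):
    """Count values per class; each class is [lower edge, upper edge)."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for x in data:
        counts[int((x - edges[0]) // width)] += 1
    return counts

# Made-up small data set, so we aim for about five classes.
sample = [12, 15, 17, 21, 22, 24, 28, 30, 33, 35, 39, 41]
edges = make_classes(sample, target_classes=5)
counts = count_per_class(sample, edges)
print(edges)
print(counts)
```

Here the range is 29, so the raw width of 5.8 is rounded up to 6. Because of the rounding, other data sets may end up with one class more or fewer than the target.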

For an example we will determine an appropriate class width and classes for the data set: 1.

The range, or width, of the interval determines the number of histogram classes and influences the shape of the graph.

If the interval is too wide, significant information might be omitted by the classes being too inclusive. When the choice of interval width is too narrow, low class frequency might give undue importance to what is actually a random variation.

There are several methods for setting an appropriate number of histogram classes for a data set. According to Sturges' rule, the number of classes should be close to the base 2 log of the number of data points, plus one. Using Rice's rule, the number of classes should be twice the cube root of the number of data points. Frequency histograms should be labeled with either class boundaries as shown below or with class midpoints in the middle of each rectangle.
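Both rules are simple enough to compute directly. The sketch below is one possible reading: the function names are hypothetical, and rounding to the nearest whole number is an assumption, since the rules only say the count should be "close to" the computed value.

```python
import math

def sturges_classes(n_points: int) -> int:
    """Sturges' rule: base 2 log of the number of data points, plus one."""
    return round(math.log2(n_points) + 1)  # rounding choice is an assumption

def rice_classes(n_points: int) -> int:
    """Rice's rule: twice the cube root of the number of data points."""
    return round(2 * n_points ** (1 / 3))  # rounding choice is an assumption

for n in (50, 1000):
    print(n, sturges_classes(n), rice_classes(n))
```

For 50 data points both rules suggest about 7 classes, which agrees with the square-root guideline. For larger data sets the rules diverge, with Rice's rule suggesting more classes than Sturges' rule.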

The purpose of these graphs is to "see" the distribution of the data. When using a calculator or software to plot histograms, experiment with different choices for boundaries, subject to the above restrictions, to find out which graphical properties (modality, skewness or symmetry, outliers, etc.) persist. Then use the boundaries that best reveal these persistent properties.
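One way to experiment is to print a quick text histogram for several class counts and compare them. The sketch below assumes equal-width classes running from the smallest to the largest value; the function name `bin_counts` and the data are made up for illustration.

```python
def bin_counts(data, n_classes):
    """Count values in n_classes equal-width classes from min to max."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_classes
    if width == 0:                       # all values identical
        return [len(data)] + [0] * (n_classes - 1)
    counts = [0] * n_classes
    for x in data:
        # The largest value is placed in the last class.
        i = min(int((x - lo) / width), n_classes - 1)
        counts[i] += 1
    return counts

# Made-up data with two clusters of values.
values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9, 10, 10, 11, 11, 11, 12, 12, 13]

for n_classes in (2, 4, 8):
    print(f"{n_classes} classes")
    for count in bin_counts(values, n_classes):
        print("  " + "#" * count)
```

Features that show up across several reasonable class counts, such as two peaks or a long tail, are more likely to be real properties of the data than features that appear for only one choice.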


