Recently, while exploring the dispersion of some data, I came across a chart called the box plot (boxplot). It works well for showing how the values in a data set are spread out.
The box plot was introduced in 1977 by the American statistician John Tukey. It is built from five values: the minimum (min), the lower quartile (Q1), the median, the upper quartile (Q3), and the maximum (max). You can also add the mean to the plot. The lower quartile, the median, and the upper quartile form a "box with compartments". A line drawn from the upper quartile to the maximum is called a whisker, and a matching line runs from the lower quartile down to the minimum.
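To make the five values concrete, here is a minimal sketch in R. The vector x is illustrative data I made up for this purpose, and I assume the default method of quantile(); other quartile conventions exist and can give slightly different Q1 and Q3 on some data.

```r
# Illustrative data, not taken from the text above.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# The five values a box plot is built from.
five_numbers <- c(
  min    = min(x),
  Q1     = unname(quantile(x, 0.25)),  # lower quartile
  median = median(x),
  Q3     = unname(quantile(x, 0.75)),  # upper quartile
  max    = max(x)
)
print(five_numbers)

# The mean can optionally be marked on the plot as well.
x_mean <- mean(x)
print(x_mean)
```

For this small vector the box would span Q1 to Q3 with the median line inside it, and the whiskers would reach out to min and max.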
Real data always contains some "dirty data", also called "outliers". So that these few outliers do not distort the overall picture, they are plotted separately, and the two ends of the whiskers are changed to the minimum observation value and the maximum observation value. The rule of thumb is to place the maximum (minimum) observation value at a distance of 1.5 IQR (interquartile range) from the quartile. That is:
· IQR = Q3 - Q1, the difference between the upper quartile and the lower quartile, which is also the length of the box.
· The minimum observation value is min = Q1 - 1.5*IQR. If there are outliers smaller than the minimum observation value, the lower whisker ends at the minimum observation value and the outliers are plotted separately as points. If no value is smaller than the minimum observation value, the lower whisker ends at the minimum value of the data.
· The maximum observation value is max = Q3 + 1.5*IQR. If there are outliers greater than the maximum observation value, the upper whisker ends at the maximum observation value and the outliers are plotted separately as points. If no value is greater than the maximum observation value, the upper whisker ends at the maximum value of the data.
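The rule above can be sketched in a few lines of R. The data and the variable names (lower_fence, upper_fence, and so on) are my own, chosen for illustration, and I again assume the default quantile() method. Note that this follows the rule as described here; R's own boxplot() draws each whisker at the most extreme data point that still lies within 1.5*IQR of the box, so its whisker ends can differ slightly.

```r
# Illustrative data with one obviously large value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q1  <- unname(quantile(x, 0.25))
q3  <- unname(quantile(x, 0.75))
iqr <- q3 - q1                      # length of the box

# Minimum and maximum observation values (the 1.5 * IQR limits).
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the limits are outliers and are plotted as points.
outliers <- x[x < lower_fence | x > upper_fence]

# Whisker ends: the limit if something lies beyond it,
# otherwise the actual minimum or maximum of the data.
lower_whisker <- max(min(x), lower_fence)
upper_whisker <- min(max(x), upper_fence)

print(c(IQR = iqr, lower = lower_whisker, upper = upper_whisker))
print(outliers)
```

Here nothing falls below the lower limit, so the lower whisker stops at the smallest value, while the value 30 lies above the upper limit and is reported as an outlier.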
When analyzing data, the box plot helps us identify the characteristics of the data:
- Visually identify outliers in a data set (look at the points plotted beyond the whiskers).
- Judge the degree of dispersion and skew of the data (observe the length of the box, the shape of the compartments on either side of the median line, and the length of the whiskers).
An example in R:
> x = c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7, 0.4, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1)
> boxplot(x)
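If you want the numbers behind the picture rather than the picture itself, base R's boxplot.stats() returns them without drawing anything. The following is a small sketch on the same data; the variable names s, whisker_ends and outlier_values are my own.

```r
# Same data as in the example above.
x <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7, 0.4, 0.4,
       0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1)

# coef = 1.5 is the default multiplier of the box length.
s <- boxplot.stats(x, coef = 1.5)

# s$stats holds: lower whisker end, lower hinge, median,
# upper hinge, upper whisker end.
whisker_ends   <- s$stats[c(1, 5)]
outlier_values <- s$out   # points that would be drawn individually

print(s$stats)
print(outlier_values)
```

Because most of these values sit close to zero and a few are much larger, the box ends up near the bottom of the plot and the large values appear as separate points above the upper whisker.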