Box Plot (boxplot)

Source: Internet
Author: User

Recently, while working on data dispersion, I came across a chart called the box plot (boxplot). It is well suited to showing how spread out a set of data is.

The box plot was invented in 1977 by the American statistician John Tukey. It is built from five numbers: the minimum (min), the lower quartile (Q1), the median, the upper quartile (Q3), and the maximum (max). The mean can also be marked on the plot. The lower quartile, the median, and the upper quartile together form a "box with a compartment". A line drawn from the upper quartile to the maximum is called a whisker, and a second whisker runs from the lower quartile to the minimum.
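As a rough illustration of the five numbers, the sketch below uses R's built-in fivenum() on a made-up sample (the vector x is hypothetical, not data from this article). Note that fivenum() returns Tukey's hinges, which for some sample sizes differ slightly from the quartiles returned by quantile().

```r
# Hypothetical sample, chosen only for illustration
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

# fivenum() returns: minimum, lower hinge (Q1), median, upper hinge (Q3), maximum
five <- fivenum(x)
names(five) <- c("min", "Q1", "median", "Q3", "max")
print(five)

# The mean is not one of the five numbers, but it can be reported next to them
avg <- mean(x)
print(avg)
```

For this sample the five numbers are 2, 4, 7, 10 and 15.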

Real data always contain some "dirty data", also called "outliers". So that a handful of outliers does not distort the overall picture, the outliers are plotted separately, and the two ends of the whiskers are moved to the minimum observation and the maximum observation. As a rule of thumb, the maximum (minimum) observation is placed 1.5 IQR (interquartile range) beyond the corresponding quartile. That is:

· IQR = Q3 - Q1, the difference between the upper quartile and the lower quartile, which is the length of the box.

· The minimum observation is min = Q1 - 1.5*IQR. If there are outliers smaller than the minimum observation, the lower whisker ends at the minimum observation and the outliers are plotted separately as points. If no value is smaller than the minimum observation, the lower whisker ends at the smallest value in the data.

· The maximum observation is max = Q3 + 1.5*IQR. If there are outliers greater than the maximum observation, the upper whisker ends at the maximum observation and the outliers are plotted separately as points. If no value is greater than the maximum observation, the upper whisker ends at the largest value in the data.
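The three rules above can be sketched in a few lines of R. The sample vector and the variable names (q1, q3, lower, upper) are assumptions made for illustration only, and quantile() is used here with its default settings.

```r
# Hypothetical sample with one unusually large value
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 40)

q1 <- unname(quantile(x, 0.25))   # lower quartile: 4
q3 <- unname(quantile(x, 0.75))   # upper quartile: 10
iqr <- q3 - q1                    # length of the box: 6

lower <- q1 - 1.5 * iqr           # minimum observation: -5
upper <- q3 + 1.5 * iqr           # maximum observation: 19

# Values outside [lower, upper] are the ones plotted as separate points
outliers <- x[x < lower | x > upper]
print(outliers)                   # 40
```

No value falls below the lower limit in this sample, so under the rules above the lower whisker simply ends at the smallest value, 2.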

When analyzing data, the box plot helps us identify characteristics of the data:

    1. Visually identify outliers in a data set (look at the points plotted beyond the whiskers).
    2. Judge the dispersion and skewness of the data (look at the length of the box, the position of the median line inside it, and the length of the whiskers).

An example in R:

    > x = c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7, 0.4, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1)
    > boxplot(x)
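To read off the numbers behind such a plot without drawing it, one option is R's boxplot.stats(), which returns the values that boxplot() draws. The sketch below applies it to the same data. One detail worth knowing: R ends each whisker at the most extreme data point that still lies within 1.5 IQR of the box, so the whisker end can sit slightly inside Q3 + 1.5*IQR rather than exactly on it.

```r
x <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7, 0.4, 0.4,
       0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1)

s <- boxplot.stats(x)   # default coef = 1.5

# $stats: lower whisker end, lower hinge, median, upper hinge, upper whisker end
print(s$stats)

# $out: the values drawn as separate outlier points
print(s$out)
```

For this data the upper whisker ends at 7.8, and the five values from 13.0 upward are reported as outliers, which shows how strongly right-skewed the sample is.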
