Regression Model Performance Evaluation Series 1: QQ Plot

Last Update: 2018-03-01 Source: Internet

Author: User

(Erbqi) The QQ plot is the quantile-quantile plot. Put simply, it takes the values of two distributions at the same quantiles and plots them as points (x, y). If the two distributions are very close, the points (x, y) fall near the straight line y = x; otherwise they do not. The prediction results of a regression model can therefore be evaluated with a QQ plot.
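The idea can be sketched in a few lines. This is a minimal illustration on synthetic samples, not the article's data; the helper name `qq_points` and the variables `a`, `b`, and `c` are assumptions introduced here. It evaluates two samples at the same quantile levels and measures how far the resulting points lie from y = x.

```python
import numpy as np

def qq_points(a, b, num=50):
    """Return paired quantiles (x, y) of samples a and b at the same levels."""
    probs = np.linspace(0.01, 0.99, num)
    return np.quantile(a, probs), np.quantile(b, probs)

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 2000)
b = rng.normal(0.0, 1.0, 2000)   # same distribution as a
c = rng.normal(0.0, 3.0, 2000)   # same centre, three times the spread

x_same, y_same = qq_points(a, b)
x_diff, y_diff = qq_points(a, c)

# Mean vertical distance of the QQ points from the line y = x
dev_same = np.mean(np.abs(y_same - x_same))
dev_diff = np.mean(np.abs(y_diff - x_diff))
print(dev_same, dev_diff)
```

The pair drawn from the same distribution should stay much closer to y = x than the pair with different spreads.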

There are two types of QQ plot: the normal QQ plot and the general QQ plot. The difference is that in a normal QQ plot one of the two distributions is the normal distribution. Both types are described below.

Normal QQ plot

Filliben's estimate is used to determine the positions of the n points.
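For reference, Filliben's estimate of the uniform order statistic medians can be written as follows. The constants match the code used later in this article; the symbols m_i, x_i, and Φ are notation introduced here for illustration.

```latex
m_i =
\begin{cases}
1 - 0.5^{1/n}, & i = 1 \\[4pt]
\dfrac{i - 0.3175}{n + 0.365}, & i = 2, \dots, n - 1 \\[8pt]
0.5^{1/n}, & i = n
\end{cases}
\qquad
x_i = \Phi^{-1}(m_i)
```

Here Φ⁻¹ is the inverse CDF (ppf) of the standard normal distribution, so x_i is the theoretical quantile paired with the i-th smallest observation.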

Below we try to draw a normal QQ plot.

Using the built-in function of an open-source library is simple, but it hides some of the details.

    import numpy as np
    import matplotlib
    from matplotlib import pyplot as plt
    from scipy.stats import probplot

    matplotlib.style.use('ggplot')

    # Randomly generate 100 data points from a normal distribution
    x = np.round(np.random.normal(loc=0.0, scale=1.0, size=100), 2)

    f = plt.figure(figsize=(8, 6))
    ax = f.add_subplot(111)
    probplot(x, plot=ax)
    plt.show()

Below are some of those details, which pave the way for the general QQ plot.

    import numpy as np
    from scipy.stats import norm, linregress
    from matplotlib import pyplot as plt

    # Return the uniform order statistic medians for a sample of len(x) points
    def calc_uniform_order_statistic_medians(x):
        N = len(x)
        osm_uniform = np.zeros(N, dtype=np.float64)
        osm_uniform[-1] = 0.5 ** (1.0 / N)
        osm_uniform[0] = 1 - osm_uniform[-1]
        i = np.arange(2, N)
        osm_uniform[1:-1] = (i - 0.3175) / (N + 0.365)
        return osm_uniform

    # Randomly generate 100 data points from a normal distribution
    x = np.round(np.random.normal(loc=0.0, scale=1.0, size=100), 2)
    osm_uniform = calc_uniform_order_statistic_medians(x)

    # ppf (percent point function) is the inverse of the cdf (cumulative
    # distribution function): it returns the value at a given quantile
    osm = norm.ppf(osm_uniform)
    osr = np.sort(x)

    # Fit a straight line through the (osm, osr) pairs
    slope, intercept, rvalue, pvalue, stderr = linregress(osm, osr)

    plt.figure(figsize=(10, 8))
    plt.plot(osm, osr, 'bo', osm, slope * osm + intercept, 'r-')
    plt.show()
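As a sanity check, the manual computation can be compared with what scipy.stats.probplot returns. The sketch below assumes that probplot uses the same Filliben medians for its theoretical quantiles; the helper name `filliben_medians` is introduced here for illustration only.

```python
import numpy as np
from scipy.stats import norm, probplot

def filliben_medians(n):
    """Filliben's estimate of the uniform order statistic medians."""
    m = np.zeros(n, dtype=np.float64)
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    i = np.arange(2, n)
    m[1:-1] = (i - 0.3175) / (n + 0.365)
    return m

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=100)

# Manual theoretical quantiles and ordered sample values
manual_osm = norm.ppf(filliben_medians(len(x)))
manual_osr = np.sort(x)

# Library result: ((theoretical, ordered), (slope, intercept, r))
(lib_osm, lib_osr), (slope, intercept, r) = probplot(x, dist="norm")

matches = np.allclose(manual_osm, lib_osm) and np.allclose(manual_osr, lib_osr)
print(matches)
```

If the two agree, the hand-written version is a faithful view into what the built-in function hides.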

The figure on the left shows 100 sample points and the figure on the right shows 1000. The 1000 sample points lie closer to the straight line y = x, that is, they fit the normal distribution better.
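The effect of sample size can also be checked numerically rather than by eye. The following sketch is an assumption-laden illustration: it uses the correlation coefficient r returned by scipy.stats.probplot as a rough measure of straightness and averages it over 50 synthetic samples per size. The helper name `mean_qq_correlation` and the trial count are choices made here, not part of the original article.

```python
import numpy as np
from scipy.stats import probplot

rng = np.random.default_rng(1)

def mean_qq_correlation(n, trials=50):
    """Average probability-plot correlation r over several normal samples of size n."""
    rs = []
    for _ in range(trials):
        sample = rng.normal(size=n)
        _, (_, _, r) = probplot(sample, dist="norm")
        rs.append(r)
    return float(np.mean(rs))

r_100 = mean_qq_correlation(100)
r_1000 = mean_qq_correlation(1000)
print(r_100, r_1000)
```

On average the larger samples should give a correlation closer to 1, which matches the visual impression of the two figures.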

The difference between a general QQ plot and a normal QQ plot is that the reference distribution is not the normal distribution but can be a data set with any distribution, which is exactly what we need.

In this scenario, the dotted line is the actual network variation and the solid line is the result of a simple smoothing prediction. The goal is to assess how well the simple smoothing prediction fits by using a general QQ plot.

First, look at the CDF plots of the two curves, where F_X(x) = P(X ≤ x):

The points at which the cumulative distribution is evaluated in this plot are computed with np.linspace(min(X), max(X), len(X)), and the result looks a bit strange.
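One plausible reason for the odd look is that an evenly spaced grid does not line up with the data, so the empirical CDF stays flat across some grid intervals and jumps by several steps across others. The sketch below illustrates this on synthetic data; the hand-written `ecdf` helper is an assumption standing in for statsmodels' ECDF so that the example only needs NumPy.

```python
import numpy as np

def ecdf(sample, points):
    """Empirical CDF of `sample` at `points`: fraction of values <= each point."""
    s = np.sort(sample)
    return np.searchsorted(s, points, side="right") / len(s)

rng = np.random.default_rng(3)
data = rng.normal(size=30)   # synthetic stand-in, not the article's data
n = len(data)

# Evaluate the ECDF on an evenly spaced grid, as in the first CDF plot
grid = np.linspace(data.min(), data.max(), n)
y_grid = ecdf(data, grid)

# Number of observations absorbed between consecutive grid points
grid_steps = np.round(np.diff(y_grid) * n).astype(int)
print(grid_steps)
```

Zeros in `grid_steps` correspond to flat stretches of the plotted curve, and values above 1 to larger jumps.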

After recomputing the CDF plot with the raw data values as the evaluation points, something interesting appears.

When the two curves contain the same number of points, the CDF values at the same position are identical once both data sets are sorted in ascending order.

Therefore, when the two curves contain the same number of points, drawing the QQ plot only requires sorting both data sets in ascending order and pairing them up.
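This shortcut can be checked against an explicit quantile lookup. The sketch below uses synthetic stand-ins named `actual` and `predicted` (both hypothetical) and relies on the fact that np.quantile with its default linear interpolation, evaluated at n evenly spaced levels from 0 to 1, returns the sorted sample itself.

```python
import numpy as np

rng = np.random.default_rng(7)
actual = rng.normal(loc=1.8, scale=0.3, size=89)      # stand-in for measured values
predicted = rng.normal(loc=1.8, scale=0.2, size=89)   # stand-in for predictions
n = len(actual)

# QQ points obtained by sorting alone
qq_x = np.sort(actual)
qq_y = np.sort(predicted)

# The same points via quantiles at n shared, evenly spaced levels
levels = np.linspace(0.0, 1.0, n)
same_x = np.allclose(qq_x, np.quantile(actual, levels))
same_y = np.allclose(qq_y, np.quantile(predicted, levels))
print(same_x, same_y)
```

When the two samples differ in length, sorting alone no longer pairs the values up, and quantiles at a common set of levels would be needed instead.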

The QQ plot of the real network curve against the smoothed prediction curve has a slope of only 0.79, indicating that the distribution of the smoothed prediction differs considerably from the distribution of the source data.
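A slope below 1 is what one would expect when smoothing compresses the spread of the values. The following sketch illustrates this on synthetic data only; it does not reproduce the 0.79 figure, and the series, window length, and variable names are assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(11)
raw = rng.normal(loc=1.8, scale=0.3, size=200)   # synthetic series

# Centered 5-point moving average; "valid" drops the 2 points at each edge
kernel = np.ones(5) / 5
smoothed = np.convolve(raw, kernel, mode="valid")
raw_trimmed = raw[2:-2]                          # align lengths with the smoothed series

# General QQ plot for equal-length samples: sort both and fit a line
qq_x = np.sort(raw_trimmed)
qq_y = np.sort(smoothed)
slope, intercept, rvalue, pvalue, stderr = linregress(qq_x, qq_y)
print(slope, intercept)
```

Because averaging neighbouring points reduces the variance, the sorted smoothed values span a narrower range than the sorted raw values, which pulls the fitted slope below 1.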

Code

    import numpy as np
    import statsmodels.api as sm
    from matplotlib import pyplot as plt
    from scipy.stats import probplot, linregress

    httpspeedavg = np.array([
        1821000, 2264000, 2209000, 2203000, 2306000, 2005000, 2428000,
        2246000, 1642000,  721000, 1125000, 1335000, 1367000, 1760000,
        1807000, 1761000, 1767000, 1723000, 1883000, 1645000, 1548000,
        1608000, 1372000, 1532000, 1485000, 1527000, 1618000, 1640000,
        1199000, 1627000, 1620000, 1770000, 1741000, 1744000, 1986000,
        1931000, 2410000, 2293000, 2199000, 1982000, 2036000, 2462000,
        2246000, 2071000, 2220000, 2062000, 1741000, 1624000, 1872000,
        1621000, 1426000, 1723000, 1735000, 1443000, 1735000, 2053000,
        1811000, 1958000, 1828000, 1763000, 2185000, 2267000, 2134000,
        2253000, 1719000, 1669000, 1973000, 1615000, 1839000, 1957000,
        1809000, 1799000, 1706000, 1549000, 1546000, 1692000, 2335000,
        2611000, 1855000, 2092000, 2029000, 1695000, 1379000, 2400000,
        2522000, 2140000, 2614000, 2399000, 2376000])

    # Simple moving-average smoothing over the window [i - gap, i + gap)
    def smooth_(squences, period=5):
        res = []
        gap = period // 2  # integer division so the value can be used as a slice index
        right = len(squences)
        for i in range(right):
            res.append(np.mean(squences[i-gap if i-gap > 0 else 0:i+gap if i+gap < right else right]))
        return res

    # Convert to MB and round to two decimals
    httpavg = np.round((1.0*httpspeedavg/1024/1024).tolist(), 2)
    smooth = np.round(smooth_((1.0*httpspeedavg/1024/1024).tolist(), 5), 2)

    # Normal QQ plot of the smoothed prediction
    f = plt.figure(figsize=(8, 6))
    ax = f.add_subplot(111)
    probplot(smooth, plot=ax)
    # plt.show()

    # Normal QQ plot of the raw data
    f = plt.figure(figsize=(8, 6))
    ax = f.add_subplot(111)
    probplot(httpavg, plot=ax)
    # plt.show()

    # CDF plot evaluated on an evenly spaced grid
    plt.figure(figsize=(15, 8))
    ecdf = sm.distributions.ECDF(httpavg)
    x = np.linspace(min(httpavg), max(httpavg), len(httpavg))
    y = ecdf(x)
    plt.plot(x, y, label='httpavg', color='blue', marker='.')
    ecdf1 = sm.distributions.ECDF(smooth)
    x1 = np.linspace(min(smooth), max(smooth), len(smooth))
    y1 = ecdf1(x1)
    plt.plot(x1, y1, label='smooth', color='red', marker='.')
    plt.legend(loc='best')
    # plt.show()

    # CDF value of the i-th smallest observation: (i + 1) / n
    def cdf(l):
        res = []
        length = len(l)
        for i in range(length):
            res.append(1.0*(i+1)/length)
        return res

    # CDF plot evaluated at the raw (sorted) data values
    plt.figure(figsize=(15, 8))
    x = np.sort(httpavg)
    y = cdf(x)
    plt.plot(x, y, label='httpavg', color='blue', marker='.')
    x1 = np.sort(smooth)
    y1 = cdf(x1)
    plt.plot(x1, y1, label='smooth', color='red', marker='.')
    plt.legend(loc='best')
    # plt.show()

    # General QQ plot: sort both series and plot one against the other
    plt.figure(figsize=(10, 8))
    httpavg = np.sort(httpavg)
    smooth = np.sort(smooth)
    slope, intercept, rvalue, pvalue, stderr = linregress(httpavg, smooth)
    plt.plot(httpavg, smooth, 'bo', httpavg, slope*httpavg + intercept, 'r-')
    xmin = np.amin(httpavg)
    xmax = np.amax(httpavg)
    ymin = np.amin(smooth)
    ymax = np.amax(smooth)
    posx = xmin + 0.50 * (xmax - xmin)
    posy = ymin + 0.01 * (ymax - ymin)
    # linregress returns r, so it is squared here to match the R^2 label
    plt.text(posx, posy, "$R^2=%1.4f$ y = %.2f *x + %.2f" % (rvalue**2, slope, intercept))
    plt.plot(httpavg, httpavg, color='green', label='y=x')
    plt.legend(loc='best')
    # plt.show()

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
