Quantile-Quantile Plot

Purpose:
Check If Two Data Sets Can Be Fit With the Same Distribution

The quantile-quantile (Q-Q) plot is a graphical technique for determining if two data sets come from populations with a common distribution.

A Q-Q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the point below which a given fraction (or percent) of points lies. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.
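As a small illustration of this definition, the R sketch below computes the 0.3 quantile of ten made-up values with the base function `quantile()` and checks what fraction of the data falls on either side. The vector `x` is an assumption chosen for the example. `quantile()` interpolates between sorted values by default, so other quantile definitions can give slightly different results.

```r
# Illustrative data (an assumption, not from the source): the integers 1 to 10
x <- 1:10

# 0.3 quantile using R's default interpolation rule (type = 7)
q30 <- unname(quantile(x, probs = 0.3))

# Fraction of the data strictly below and strictly above that point
frac_below <- mean(x < q30)
frac_above <- mean(x > q30)

print(c(q30 = q30, below = frac_below, above = frac_above))
```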

A 45-degree reference line is also plotted. If the two sets come from a population with the same distribution, the points should fall approximately along this reference line. The greater the departure from this reference line, the greater the evidence for the conclusion that the two data sets have come from populations with different distributions.

The Q-Q plot has two advantages. First, the sample sizes do not need to be equal.

Second, many distributional aspects can be simultaneously tested. For example, shifts in location, shifts in scale, changes in symmetry, and the presence of outliers can all be detected from this plot. For example, if the two data sets come from populations whose distributions differ only by a shift in location, the points should lie along a straight line that is displaced either up or down from the 45-degree reference line.
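The location-shift case can be checked numerically. The R sketch below uses simulated data (the seed, sample size, and the shift of 5 units are all assumptions made for illustration) and shows that every Q-Q point sits the same distance above the 45-degree line.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
batch2 <- rnorm(50, mean = 100, sd = 10)
batch1 <- batch2 + 5                 # same shape, shifted up by 5 units

# With equal sample sizes the Q-Q points are the sorted values paired up
q1 <- sort(batch1)                   # vertical axis
q2 <- sort(batch2)                   # horizontal axis

# Vertical offset of each point from the 45-degree line y = x
offset <- q1 - q2
print(range(offset))

# To draw the plot: plot(q2, q1); abline(0, 1, lty = 2)
```

A shift in scale would instead change the slope of the line of points, so the offsets would no longer be constant.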

The Q-Q plot is similar to a probability plot. For a probability plot, the quantiles for one of the data samples are replaced with the quantiles of a theoretical distribution.

Sample Plot

This Q-Q plot shows that these 2 batches do not appear to have come from populations with a common distribution. The batch 1 values are significantly higher than the corresponding batch 2 values. The differences are increasing from values 525 to 625. Then the values for the 2 batches get closer again.

Definition:
Quantiles for Data Set 1 Versus Quantiles of Data Set 2

The Q-Q plot is formed by:
Vertical axis: Estimated quantiles from data set 1
Horizontal axis: Estimated quantiles from data set 2

Both axes are in units of their respective data sets. That is, the actual quantile level is not plotted. For a given point on the Q-Q plot, we know that the quantile level is the same for both points, but not what that quantile level actually is.

If the data sets have the same size, the Q-Q plot is essentially a plot of sorted data set 1 against sorted data set 2. If the data sets are not of the same size, the quantiles are usually picked to correspond to the sorted values from the smaller data set, and the quantiles for the larger data set are then interpolated.

Questions
The Q-Q plot is used to answer the following questions:
Do two data sets come from populations with a common distribution?
Do two data sets have common location and scale?
Do two data sets have similar distributional shapes?
Do two data sets have similar tail behavior?

Importance:
Check for Common Distribution

When there are two data samples, it is often desirable to know if the assumption of a common distribution is justified. If so, then location and scale estimators can pool both data sets to obtain estimates of the common location and scale. If the two samples do differ, it is also useful to gain some understanding of the differences. The Q-Q plot can provide more insight into the nature of the difference than analytical methods such as the chi-square and Kolmogorov-Smirnov 2-sample tests.

Related Techniques
Bihistogram
T Test
F Test
2-Sample Chi-Square Test
2-Sample Kolmogorov-Smirnov Test

Case Study
The quantile-quantile plot is demonstrated in the ceramic strength data case study.

Software
Q-Q plots are available in some general purpose statistical software programs. If the number of data points in the two samples is equal, it should be relatively easy to write a macro in statistical programs that do not support the Q-Q plot. If the number of points is not equal, writing a macro for a Q-Q plot may be difficult.
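As a rough sketch of such a macro, the R function below pairs up the quantiles of two samples: it sorts both, and when the sizes differ it interpolates the larger sample down to the size of the smaller one. The function name `qq_pairs`, its interface, and the sample values are assumptions made for illustration. Interpolating at evenly spaced positions with `approx()` is one reasonable choice among several, and base R also provides `qqplot()` for two-sample Q-Q plots.

```r
# Hypothetical helper (name and interface are assumptions, not from the source).
# Returns paired quantiles: set1 for the vertical axis, set2 for the horizontal.
qq_pairs <- function(set1, set2) {
  s1 <- sort(set1)
  s2 <- sort(set2)
  n1 <- length(s1)
  n2 <- length(s2)
  # Shrink the larger sample to the size of the smaller one by linear
  # interpolation between its sorted values at evenly spaced positions.
  if (n1 > n2) s1 <- approx(seq_len(n1), s1, n = n2)$y
  if (n2 > n1) s2 <- approx(seq_len(n2), s2, n = n1)$y
  list(set1 = s1, set2 = s2)
}

# Made-up samples of unequal size
qq <- qq_pairs(c(5, 1, 4, 2, 3), c(30, 10, 20))
print(qq)

# To draw the plot: plot(qq$set2, qq$set1); abline(0, 1, lty = 2)
```

With equal sample sizes neither `approx()` branch runs, and the result is simply the two sorted samples.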


http://www.itl.nist.gov/div898/handbook/eda/section3/qqplot.htm





A Q-Q plot is a quantile-quantile plot. It is used in many kinds of research, mainly to show visually the difference between observed and predicted values.

It is easy to produce in SPSS: Analyze > Descriptive Statistics > Q-Q Plots.

The Q-Q plot is mainly used to assess the difference between the observed and predicted values of quantitative traits. In general, the quantitative trait data we obtain are normally distributed. In a GWAS study, the x and y axes of the Q-Q plot represent the -log10 P values of each SNP. The predicted line is a 45° line through the origin. The actual observed values are the plotted solid points.

Main points of the Q-Q plot:

Why is the predicted dashed line at 45°? Because the prediction line is drawn in the first quadrant of the Q-Q chart. In theory, a point A on the plot should have predicted value = actual value, which in coordinates is A(x, y) with x = y. So the predicted line is a 45° line starting from the origin.

The coordinates of the observed points do not necessarily satisfy this. For the same point A(x, y), x is the predicted value and y is the actual observed value. The algorithm for the Q-Q plot in R looks like this:

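# Read the P values; the file needs a header row with a column named pval
pvals <- read.table("Dgi_chr3_pvals.txt", header = TRUE)

# Observed P values, sorted ascending, on the -log10 scale
observed <- sort(pvals$pval)
lobs <- -(log10(observed))

# Expected P values: rank i of n maps to i / (n + 1)
expected <- c(1:length(observed))
lexp <- -(log10(expected / (length(expected) + 1)))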

The explanation is as follows. First, the P values are sorted from smallest to largest. lobs is the vertical coordinate and lexp is the horizontal coordinate: the vertical coordinate is the -log10 of the observed P value, and the horizontal coordinate depends only on the number of P values. For example, when there are only 3 P values, p1=0.0001, p2=0.001, p3=0.01, then length(observed) = 3. For p1=0.0001, expected=1 and lexp=-log10(1/(3+1)); for p2=0.001, expected=2 and lexp=-log10(2/(3+1)); for p3=0.01, expected=3 and lexp=-log10(3/(3+1)). So a departure from the line indicates that the actual value deviates from the predicted value. When an SNP shows a large deviation in a GWAS study, the deviation of its observed value is attributed to the genetic effect of the SNP mutation.
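The three-value example can be run directly. The sketch below assumes the intended expected value for rank i of n is i / (n + 1), so that every expected P value lies strictly between 0 and 1. The variable names follow the snippet above and the P values are the ones from the example.

```r
# The three example P values, sorted from smallest to largest
observed <- sort(c(0.01, 0.0001, 0.001))
n <- length(observed)

# Vertical coordinate: -log10 of the observed P values
lobs <- -log10(observed)

# Horizontal coordinate: -log10 of the expected P values i / (n + 1)
expected <- seq_len(n)
lexp <- -log10(expected / (n + 1))

print(round(cbind(expected, lexp, lobs), 4))

# To draw the plot: plot(lexp, lobs); abline(0, 1, lty = 2)
```

All three points lie well above the 45° line here because the example P values are much smaller than the values 0.25, 0.5 and 0.75 that uniformly distributed P values would give on average.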

http://blog.csdn.net/likelet/article/details/7377664

