What Is a Quantile-Quantile (Q-Q) Plot?

Suman Prasad
Jun 3, 2022

Given a random variable X with observations x1, x2, x3, ..., x500:

Is X Gaussian distributed?

QQ plots help answer this question.

Formal statistical tests such as the KS (Kolmogorov-Smirnov) test and the AD (Anderson-Darling) test are also available, but a QQ plot lets us answer the question graphically.

How To Plot (Theoretically)?

  1. Sort the xi's and compute percentiles.

Start with the observations x1, x2, ..., x500 and sort them in ascending order to get

x'1, x'2, ..., x'500 (such that x'1 ≤ x'2 ≤ ... ≤ x'500)

Now compute the percentiles. With 500 sorted observations, every fifth value marks one percentile:

x'5 → 1st percentile

x'10 → 2nd percentile

...

x'500 → 100th percentile

  2. Consider a random variable Y that has a Gaussian distribution. Take 1000 samples of Y, sort them, and compute their percentiles in the same way as above.

  3. Plot the percentiles of X on the y-axis against the percentiles of Y on the x-axis. The resulting scatter of points is the Quantile-Quantile plot.
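The steps above can be sketched by hand with NumPy before reaching for a library routine. This is only an illustration under a few assumptions that are not in the original: the fixed seed, drawing X from a normal distribution with mean 20 and standard deviation 5, and using the 1st to 99th percentiles so that the noisy minimum and maximum are left out.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, only for reproducibility

# X: 500 observations whose distribution we want to check
# (assumed here to come from a normal with mean 20 and standard deviation 5).
x = rng.normal(loc=20, scale=5, size=500)
# Y: 1000 reference samples from the standard Gaussian N(0, 1).
y = rng.normal(loc=0, scale=1, size=1000)

# np.percentile sorts internally, so steps 1 and 2 reduce to one call each.
q = np.arange(1, 100)        # 1st to 99th percentiles
x_pct = np.percentile(x, q)  # goes on the y-axis
y_pct = np.percentile(y, q)  # goes on the x-axis

# The Q-Q plot is the set of points (y_pct[i], x_pct[i]).
# A correlation close to 1 means those points lie close to a straight line.
r = np.corrcoef(y_pct, x_pct)[0, 1]
print(f"correlation between percentiles: {r:.4f}")
```

Passing `y_pct` and `x_pct` to any scatter-plot function draws the Q-Q plot itself.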

Practical Implementation Using Python

import numpy as np
import pylab
import scipy.stats as stats

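# Y ~ N(0, 1): 1000 samples from the standard normal
std_normal = np.random.normal(loc=0, scale=1, size=1000)
# 0th to 100th percentiles of the standard normal samples
for i in range(0, 101):
    print(i, np.percentile(std_normal, i))

# Generate 100 samples from a normal with mean 20 and standard deviation 5
measurements = np.random.normal(loc=20, scale=5, size=100)
# Q-Q plot of the measurements against the theoretical normal quantiles
stats.probplot(measurements, dist="norm", plot=pylab)
pylab.show()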

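If the points (y_i, x_i), for i = 1 to 100, lie roughly on a straight line, then X and Y belong to the same family of distributions. The line does not have to be y = x: its slope and intercept reflect the scale and location of X relative to Y.

Note: As the number of samples increases, more and more of the points fall on this line.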
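One way to put a number on "how close to the line" is the least-squares fit that `stats.probplot` returns alongside the points. The sketch below is an illustration, not part of the original walkthrough; the helper name `qq_fit`, the seed and the sample sizes are assumptions. For samples drawn with mean 20 and standard deviation 5 and compared against the standard normal, a perfectly straight Q-Q plot would have a slope near 5 and an intercept near 20.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # fixed seed: an assumption, only for reproducibility

def qq_fit(sample):
    """Return (slope, intercept, r) of the least-squares line through the Q-Q points."""
    # Without plot=..., probplot only computes the points and the fitted line.
    (_theoretical, _ordered), (slope, intercept, r) = stats.probplot(sample, dist="norm")
    return slope, intercept, r

# Hypothetical sample sizes, chosen only to show the trend.
for n in (100, 1000, 10000):
    slope, intercept, r = qq_fit(rng.normal(loc=20, scale=5, size=n))
    print(f"n={n:>5}  slope={slope:.2f}  intercept={intercept:.2f}  r={r:.4f}")
```

The closer `r` is to 1, the closer the points sit to the fitted line.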

Next, we generate 100 samples from a uniform distribution and draw a QQ plot against Y, which is Gaussian.

# Generate 100 samples from the uniform distribution on [-1, 1)
measurements = np.random.uniform(low=-1, high=1, size=100)
stats.probplot(measurements, dist="norm", plot=pylab)
pylab.show()

In the figure above, the x-axis shows the Normal quantiles and the y-axis shows the Uniform sample values.

Conclusion: In the diagram above, the points do not lie on the line. They bend away from it, and the divergence is greatest at the two extreme ends of the graph, because the uniform values are bounded while the normal quantiles keep spreading outward.

The fun part is that if we increase the sample size, the departure from the line becomes much more pronounced.

# Generate 6000 samples from the uniform distribution on [-1, 1)
measurements = np.random.uniform(low=-1, high=1, size=6000)
stats.probplot(measurements, dist="norm", plot=pylab)
pylab.show()
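The claim that the uniform samples diverge most at the ends can also be checked numerically. The following is a rough sketch under assumptions of my own: the helper name `tail_vs_middle_deviation`, the choice of "middle half" and "outer 1% on each side" as the two regions, and the seed are all illustrative.

```python
import numpy as np
import scipy.stats as stats

def tail_vs_middle_deviation(sample):
    """Mean absolute distance from the fitted Q-Q line: (middle half, outer 1% on each side)."""
    # osm: theoretical normal quantiles, osr: sorted sample values.
    (osm, osr), (slope, intercept, _r) = stats.probplot(sample, dist="norm")
    residual = np.abs(osr - (slope * osm + intercept))
    n = len(residual)
    k = max(1, n // 100)  # number of points in each tail
    middle = residual[n // 4 : 3 * n // 4].mean()
    tails = np.concatenate([residual[:k], residual[-k:]]).mean()
    return middle, tails

rng = np.random.default_rng(0)  # fixed seed: an assumption, only for reproducibility
for n in (100, 6000):
    middle, tails = tail_vs_middle_deviation(rng.uniform(low=-1, high=1, size=n))
    print(f"n={n:>4}  middle deviation={middle:.3f}  tail deviation={tails:.3f}")
```

For uniform samples one would expect the tail deviation to be clearly larger than the middle deviation, which is the numerical counterpart of the points peeling away from the line at both ends.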

FINAL CONCLUSION

If most of the points lie on a straight line, the distributions on the x-axis and the y-axis belong to the same family. If they do not, the random variable X follows a different distribution from the one we are comparing it against.

Suman Prasad

Masters In Data Science @Central University Of Rajasthan. #datascientistenthusiast