One of the first things to do with a new dataset is to look at how the data is distributed. A distribution can be described in two forms: the probability density function (PDF) and the cumulative distribution function (CDF).
Here are three ways to generate a PDF, and from it a CDF, in Python:
- Use Matplotlib's plotting function hist() to draw the PDF directly as a histogram.
- Use NumPy's histogram() function to compute the PDF data (bin values and bin edges). Having the data as arrays makes further processing easy, such as deriving the CDF.
- Use Seaborn's distplot(), which can also fit a distribution to the data so that you can see what type of distribution it follows.
The figure shows the PDF plots produced by these three methods. Here is the source code:
    from scipy import stats
    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns

    arr = np.random.normal(size=100)

    # Plot histogram
    plt.subplot(221)
    plt.hist(arr)

    # Obtain histogram data
    plt.subplot(222)
    hist, bin_edges = np.histogram(arr)
    plt.plot(hist)

    # Fit histogram curve (note: distplot is deprecated since Seaborn 0.11)
    plt.subplot(223)
    sns.distplot(arr, kde=False, fit=stats.gamma, rug=True)

    plt.show()
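The second method mentions deriving a CDF from the histogram data, but the listing stops at the PDF. The following is a minimal sketch of that extra step, not code from the original post. The helper name histogram_cdf, the fixed seed, and the bin count of 10 are illustrative assumptions.

```python
import numpy as np


def histogram_cdf(data, bins=10):
    """Return (right_bin_edges, cdf) estimated from a histogram of data.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # density=True scales the bin heights so that the total area is 1.
    hist, bin_edges = np.histogram(data, bins=bins, density=True)
    # Height times width is the probability mass that falls in each bin.
    mass = hist * np.diff(bin_edges)
    # The running total of bin masses is the CDF at each bin's right edge.
    cdf = np.cumsum(mass)
    return bin_edges[1:], cdf


# Assumed sample data: 100 draws from a standard normal, seeded for repeatability.
rng = np.random.default_rng(0)
arr = rng.normal(size=100)

edges, cdf = histogram_cdf(arr)
```

To draw the result, something like plt.plot(edges, cdf) should work with the Matplotlib import used above. The curve rises monotonically and ends at 1, because every sample falls in some bin.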