Matplotlib for data analysis

Source: Internet
Author: User
Tags jupyter jupyter notebook install matplotlib
Conda environment installation

Official site: https://www.anaconda.com

Configure the environment:

Add the following directories to the PATH environment variable:

C:\ProgramData\Anaconda3

C:\ProgramData\Anaconda3\Library\bin

C:\ProgramData\Anaconda3\Scripts

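Create an environment:

conda create -n python3 python=3.6

Switch to the environment:

Windows: activate python3

Linux: source activate python3

Recent conda versions (4.4 and later) also provide conda activate python3 on both platforms.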

Using Jupyter with Conda

Install Jupyter:

conda install jupyter

Start Jupyter:

jupyter notebook

Matplotlib installation

conda install matplotlib
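A quick way to confirm that the installation worked is to import the package from inside the activated environment and print its version. This is only a minimal check and not part of the original walkthrough; the version reported depends on what conda resolved on your machine.

```python
import sys

import matplotlib

# Show which interpreter is running; it should live inside the python3 environment
print("Python executable:", sys.executable)

# If the import above succeeded, matplotlib is installed; report its version
version = matplotlib.__version__
print("matplotlib version:", version)
```

If the import fails, the notebook is probably running in a different environment from the one where matplotlib was installed.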

Draw a line chart

Run the following code:

from matplotlib import pyplot as plt

x = range(2, 26, 2)
y = [15, 13, 14.5, 17, 20, 25, 26, 26, 27, 22, 18, 15]

# Draw the line
plt.plot(x, y)

# Display the figure
plt.show()

Note: to save the image, call plt.savefig() before plt.show(); a figure saved after it has been shown may come out blank.
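The ordering in the note can be sketched as follows. This is a minimal example under stated assumptions: the output file name is hypothetical, the file is written to the system temporary directory, and the non-interactive Agg backend is selected so that the code also runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line inside Jupyter
from matplotlib import pyplot as plt

x = range(2, 26, 2)
y = [15, 13, 14.5, 17, 20, 25, 26, 26, 27, 22, 18, 15]

plt.figure()
plt.plot(x, y)

# Save first, while the figure still holds the drawing
out_path = os.path.join(tempfile.gettempdir(), "line_chart_demo.png")  # hypothetical name
plt.savefig(out_path)

# Only then display the figure; in an interactive session call plt.show() here
plt.close("all")

saved_ok = os.path.exists(out_path) and os.path.getsize(out_path) > 0
print("saved:", saved_ok)
```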

Improved version

from matplotlib import pyplot as plt

x = range(2, 26, 2)
y = [15, 13, 14.5, 17, 20, 25, 26, 26, 27, 22, 18, 15]

# Set the figure size; a larger dpi gives a sharper image
plt.figure(figsize=(28, 8), dpi=80)

# Set the x-axis ticks
li = [i / 2 for i in range(4, 52)]
plt.xticks(li[::2])

# Set the y-axis ticks
plt.yticks(range(min(y), max(y) + 1))

# Draw the line
plt.plot(x, y)

# Save the image
plt.savefig("./t1.png")

# Display the figure
plt.show()
Displaying Chinese text

Matplotlib does not display Chinese characters by default, because its default font has no glyphs for them.

To see which fonts are available on Linux/macOS:

fc-list (lists the installed fonts)

fc-list :lang=zh (lists the installed fonts that support Chinese)

The default font can be changed through matplotlib.rc.

Font file on Windows:

C:\Windows\Fonts\msyh.ttc
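As a rough sketch of the global approach, the font can be set once through rcParams instead of passing fontproperties to every call. The family name "Microsoft YaHei" is an assumption (it is the family provided by msyh.ttc on Windows) and may not exist on other systems, so a fallback family is listed after it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line inside Jupyter
from matplotlib import pyplot as plt
from matplotlib import font_manager

# List the font files matplotlib can find on this machine
font_files = font_manager.findSystemFonts()
print("font files found:", len(font_files))

# Prefer a Chinese-capable family, then fall back to a font bundled with matplotlib.
# Whether "Microsoft YaHei" is available depends on the system.
plt.rcParams["font.sans-serif"] = ["Microsoft YaHei", "DejaVu Sans"]
plt.rcParams["font.family"] = "sans-serif"

# Keep the minus sign renderable when a CJK font is active
plt.rcParams["axes.unicode_minus"] = False
```

With these settings in place, labels and titles pick up the font without a fontproperties argument; the per-call approach used in the examples of this article remains useful when only some text needs the Chinese font.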
Question 1

Suppose list a holds the temperature for each of 120 consecutive minutes. How can we draw a line chart to observe how the temperature changes from minute to minute?

a = [random.randint(20, 35) for i in range(120)]

import random

from matplotlib import pyplot as plt
from matplotlib import font_manager

# Load a font that can render Chinese characters
my_font = font_manager.FontProperties(fname=r"C:\Windows\Fonts\msyh.ttc")

x = range(0, 120)
y = [random.randint(20, 35) for i in range(0, 120)]

plt.figure(figsize=(28, 8), dpi=80)
plt.plot(x, y)

# Build one label per minute.
# The first hour is assumed to be 10, one before the 11 used for the second hour.
_xticks_labels = ["10:{:02d}".format(i) for i in range(60)]
_xticks_labels += ["11:{:02d}".format(i) for i in range(60)]

# Thin out the x-axis ticks by taking every third one;
# the positions and the labels must have the same length
plt.xticks(list(x)[::3], _xticks_labels[::3], rotation=45, fontproperties=my_font)  # rotation is in degrees
plt.yticks(range(min(y), max(y) + 1))

plt.xlabel("time", fontproperties=my_font)
plt.ylabel("temperature (°C)", fontproperties=my_font)
plt.title("Temperature change per minute", fontproperties=my_font)
plt.show()
Question 2

Suppose that at age 30 you tally, from your own experience, how many girlfriends (boyfriends) you had in each year from age 11 to age 30, giving list a below. Draw a line chart of this data to analyze how the number changes from year to year.

a = [1, 0, 1, 1, 2, 5, 3, 2, 3, 4, 4, 5, 6, 5, 4, 3, 3, 1, 1, 1]

Requirements:

The y axis shows the number.

The x axis shows the age, for example 11 years old, 12 years old.

from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r"C:\Windows\Fonts\msyh.ttc")

x = range(11, 31)
y = [1, 0, 1, 1, 2, 5, 3, 2, 3, 4, 4, 5, 6, 5, 4, 3, 3, 1, 1, 1]

plt.figure(figsize=(28, 8), dpi=80)
plt.plot(x, y)

# One label per age on the x axis
_xticks_labels = ["{} years old".format(i) for i in range(11, 31)]
plt.xticks(x, _xticks_labels, fontproperties=my_font)
plt.yticks(range(min(y), max(y) + 1))

plt.xlabel("age", fontproperties=my_font)
plt.ylabel("number of girlfriends (boyfriends)", fontproperties=my_font)
plt.title("Number of girlfriends (boyfriends) by age", fontproperties=my_font)
plt.show()
Improved version

Suppose the same tally from age 11 to age 30 gives list a for yourself, and list b holds the same figures for another person. Draw both on one line chart to compare the number of girlfriends (boyfriends) in each year.

a = [1, 0, 1, 1, 2, 5, 3, 2, 3, 4, 4, 5, 6, 5, 4, 3, 3, 1, 1, 1]

b = [1, 0, 3, 1, 2, 2, 3, 3, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Requirements:

The y axis shows the number.

The x axis shows the age, for example 11 years old, 12 years old.

from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r"C:\Windows\Fonts\msyh.ttc")

x = range(11, 31)
y1 = [1, 0, 1, 1, 2, 5, 3, 2, 3, 4, 4, 5, 6, 5, 4, 3, 3, 1, 1, 1]
y2 = [1, 0, 3, 1, 2, 2, 3, 3, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# Set the figure size
plt.figure(figsize=(28, 8), dpi=80)

# Draw one line per list, each with its own colour and line style
plt.plot(x, y1, label="yourself", color="red", linestyle="--", linewidth=5)
plt.plot(x, y2, label="list b", color="green", linestyle=":", linewidth=3)  # placeholder label

# Set the x-axis ticks
_xticks_labels = ["{} years old".format(i) for i in range(11, 31)]
plt.xticks(x, _xticks_labels, fontproperties=my_font)
# plt.yticks(range(0, 9))

plt.xlabel("age", fontproperties=my_font)
plt.ylabel("number of girlfriends (boyfriends)", fontproperties=my_font)
plt.title("Number of girlfriends (boyfriends) by age", fontproperties=my_font)

# Draw the grid
plt.grid(alpha=0.4)

# Add the legend
plt.legend(prop=my_font, loc="upper left")

# Display the figure
plt.show()
Summary

Create a scatter chart

Suppose you have the daily high temperatures of a certain place for March and October (lists a and b, respectively). How can you look for a pattern in how the temperature changes over time (day by day)?

a = [11, 17, 16, 11, 12, 6, 6, 7, 8, 9, 8, 12, 15, 14, 17, 18, 21, 15, 17, 20, 14, 15, 15, 15, 19, 21, 22, 22, 22, 23, 20]

b = [26, 26, 28, 19, 21, 20, 19, 17, 16, 19, 18, 20, 20, 19, 22, 23, 17, 20, 22, 15, 11, 15, 5, 13, 17, 10, 11, 13, 12, 13, 6]

from matplotlib import pyplot as plt
from matplotlib import font_manager

# March occupies x = 1..31; October is shifted right by 50 so the two months do not overlap
x_3 = range(1, 32)
x_10 = range(51, 82)
y_3 = [11, 17, 16, 11, 12, 6, 6, 7, 8, 9, 8, 12, 15, 14, 17, 18, 21, 15, 17, 20, 14, 15, 15, 15, 19, 21, 22, 22, 22, 23, 20]
y_10 = [26, 26, 28, 19, 21, 20, 19, 17, 16, 19, 18, 20, 20, 19, 22, 23, 17, 20, 22, 15, 11, 15, 5, 13, 17, 10, 11, 13, 12, 13, 6]

# Set the font
my_font = font_manager.FontProperties(fname=r"C:\Windows\Fonts\msyh.ttc")

# Set the figure size; a larger dpi gives a sharper image
plt.figure(figsize=(28, 8), dpi=80)

# Draw the scatter plots
plt.scatter(x_3, y_3, label="March")
plt.scatter(x_10, y_10, label="October")

# Adjust the x-axis ticks: undo the shift of 50 when labelling the October days
_x = list(x_3) + list(x_10)
_xtick_labels = ["March {}".format(i) for i in x_3]
_xtick_labels += ["October {}".format(i - 50) for i in x_10]
plt.xticks(_x[::3], _xtick_labels[::3], rotation=45, fontproperties=my_font)

# Show the legend
plt.legend(prop=my_font, loc="upper left")

# Add the axis labels and the title
plt.xlabel("date", fontproperties=my_font)
plt.ylabel("temperature", fontproperties=my_font)
plt.title("Temperature change over time (days)", fontproperties=my_font)

# Display the figure
plt.show()
Create a bar chart

Suppose you have a list of the highest-grossing movies in China as of 2018 (list a) and their box office figures (list b). How can you present this data more intuitively?

a = ["Wolf 2", "Red Ocean action", "mermaid", "Chinatown detective 2", "I am not a drug God", "speed and passion 8", "The richest man in xihong City", "speed and passion 7", "Catch the demon", "Avengers 3: unlimited war", "Catch the demon 2", "shame Iron Fist", "Transformers 4: Survival", "Predecessor 3: Goodbye Predecessor", "Kung Fu Yoga", "Jurassic World 2"]

b = [56.32, 36.22, 33.9, 33.71, 30.75, 26.46, 25.25, 24.26, 24.21, 23.7, 22.19, 21.9, 19.79, 19.26, 17.53, 16.79]

from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r"C:\Windows\Fonts\msyh.ttc")

a = ["Wolf 2", "Red Ocean action", "mermaid", "Chinatown detective 2", "I am not a drug God", "speed and passion 8", "The richest man in xihong City", "speed and passion 7", "Catch the demon", "Avengers 3: unlimited war", "Catch the demon 2", "shame Iron Fist", "Transformers 4: Survival", "Predecessor 3: Goodbye Predecessor", "Kung Fu Yoga", "Jurassic World 2"]
b = [56.32, 36.22, 33.9, 33.71, 30.75, 26.46, 25.25, 24.26, 24.21, 23.7, 22.19, 21.9, 19.79, 19.26, 17.53, 16.79]

# Set the figure size
plt.figure(figsize=(20, 15), dpi=80)

# Draw the bar chart
plt.bar(range(len(a)), b, width=0.3)

# Use the movie titles as x-axis tick labels
plt.xticks(range(len(a)), a, fontproperties=my_font, rotation=90)

# Save the image
plt.savefig("./Statistical Chart.png")

# Display the figure
plt.show()
Improved version: horizontal bars

from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r"C:\Windows\Fonts\msyh.ttc")

a = ["Wolf 2", "Red Ocean action", "mermaid", "Chinatown detective 2", "I am not a drug God", "speed and passion 8", "The richest man in xihong City", "speed and passion 7", "Catch the demon", "Avengers 3: unlimited war", "Catch the demon 2", "shame Iron Fist", "Transformers 4: Survival", "Predecessor 3: Goodbye Predecessor", "Kung Fu Yoga", "Jurassic World 2"]
b = [56.32, 36.22, 33.9, 33.71, 30.75, 26.46, 25.25, 24.26, 24.21, 23.7, 22.19, 21.9, 19.79, 19.26, 17.53, 16.79]

# Set the figure size
plt.figure(figsize=(20, 15), dpi=80)

# Draw the horizontal bar chart; barh takes height where bar takes width
plt.barh(range(len(a)), b, height=0.3, color="red")

# Use the movie titles as y-axis tick labels
plt.yticks(range(len(a)), a, fontproperties=my_font)

# Draw the grid before saving so that it appears in the saved image
plt.grid(alpha=0.5)

# Save the image
plt.savefig("./Statistical Chart 2.png")

# Display the figure
plt.show()

Question 5

Suppose the movies in list a took in the box office figures b_14, b_15 and b_16 on September 14, 15 and 16, respectively. How can we present each movie's box office clearly and compare it with that of the other movies?

a = ["Rise of the planet 3: The Battle of the ultimate", "Dunkirk", "Return of Hero", "Wolf 2"]

b_16 = [15746, 312, 4497, 319]

b_15 = [12357, 156, 2045, 168]

b_14 = [2358, 399, 2358, 362]

from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r"C:\Windows\Fonts\msyh.ttc")

a = ["Rise of the planet 3: The Battle of the ultimate", "Dunkirk", "Return of Hero", "Wolf 2"]
b_16 = [15746, 312, 4497, 319]
b_15 = [12357, 156, 2045, 168]
b_14 = [2358, 399, 2358, 362]

# Set the figure size
plt.figure(figsize=(20, 8), dpi=80)

# Shift each day's bars by one bar width so the three days sit side by side
bar_width = 0.2
x_14 = list(range(len(a)))
x_15 = [i + bar_width for i in x_14]
x_16 = [i + bar_width * 2 for i in x_14]

plt.bar(x_14, b_14, width=bar_width, label="September 14")
plt.bar(x_15, b_15, width=bar_width, label="September 15")
plt.bar(x_16, b_16, width=bar_width, label="September 16")

# Put the movie titles under the middle bar of each group
plt.xticks(x_15, a, fontproperties=my_font)

# Show the legend
plt.legend(prop=my_font)

# Title
plt.title("3-day box office comparison", fontproperties=my_font)

# Display the figure
plt.show()
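The side-by-side layout comes down to shifting each series by one bar width. The sketch below generalizes that arithmetic to any number of series; the helper names are hypothetical and are not part of matplotlib.

```python
def grouped_bar_positions(n_groups, n_series, bar_width):
    """Return one list of x positions per series, each shifted by bar_width."""
    if n_series * bar_width > 1:
        raise ValueError("bars of neighbouring groups would overlap")
    return [
        [group + series * bar_width for group in range(n_groups)]
        for series in range(n_series)
    ]


def group_centers(n_groups, n_series, bar_width):
    """Return the x position at the middle of each group, for the tick labels."""
    offset = (n_series - 1) * bar_width / 2
    return [group + offset for group in range(n_groups)]


# Four movies, three days, bars 0.2 wide, as in the example above
positions = grouped_bar_positions(4, 3, 0.2)
centers = group_centers(4, 3, 0.2)

for series in positions:
    print(series)
print(centers)
```

Each inner list of positions goes to one plt.bar call, and centers goes to plt.xticks together with the titles. With three series the centers coincide with the positions of the middle series, which is why the example uses x_15 for the ticks.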
Create a histogram

Question 6

Suppose you have the running times of 250 movies (list a) and want to see how they are distributed, for example how many movies run between 100 and 120 minutes and how frequent that is. How would you present this data?

a = [126, 129, 142, 120, 113, 135, 131, 129, 136, 129, 102, 120, 103, 90, 114, 135, 121, 119, 136, 119, 112, 120, 113, 100, 94, 115, 101,
    99, 126, 129, 142, 120, 133, 135, 161, 99, 126, 129, 142, 120, 143, 135, 141,
    99, 126, 129, 162, 120, 113, 135, 101, 99, 126, 129, 142, 120, 112, 135, 130,
    99, 126, 129, 142, 140, 113, 135, 136, 99, 126, 129, 162, 120, 113, 135, 134,
    99, 126, 129, 142, 120, 113, 120, 94, 135, 135, 99, 126, 129, 142, 120, 133, 135, 136,
    99, 126, 129, 142, 124, 113, 135, 137, 99, 126, 129, 142, 120, 113, 95, 135, 111, 138,
    99, 126, 129, 142, 126, 113, 135, 139, 99, 126, 129, 142, 128, 113, 135, 131,
    99, 126, 129, 142, 129, 113, 135, 133, 99, 126, 129, 142, 120, 113, 135, 121,
    99, 126, 129, 142, 130, 113, 135, 131, 99, 126, 129, 142, 120, 113, 135, 141,
    99, 126, 129, 142, 120, 113, 135, 151, 99, 126, 129, 142, 120, 113, 135, 131,
    99, 126, 129, 142, 140, 113, 135, 131, 99, 126, 129, 142, 120, 113, 135, 131,
    99, 126, 129, 142, 120, 113, 135, 131, 161, 99, 126, 129, 142, 120, 113, 90, 111]

from matplotlib import pyplot as plt

a = [126, 129, 142, 120, 113, 135, 131, 129, 136, 129, 102, 120, 103, 90, 114, 135, 121, 119, 136, 119, 112, 120, 113, 100, 94, 115, 101,
    99, 126, 129, 142, 120, 133, 135, 161, 99, 126, 129, 142, 120, 143, 135, 141,
    99, 126, 129, 162, 120, 113, 135, 101, 99, 126, 129, 142, 120, 112, 135, 130,
    99, 126, 129, 142, 140, 113, 135, 136, 99, 126, 129, 162, 120, 113, 135, 134,
    99, 126, 129, 142, 120, 113, 120, 94, 135, 135, 99, 126, 129, 142, 120, 133, 135, 136,
    99, 126, 129, 142, 124, 113, 135, 137, 99, 126, 129, 142, 120, 113, 95, 135, 111, 138,
    99, 126, 129, 142, 126, 113, 135, 139, 99, 126, 129, 142, 128, 113, 135, 131,
    99, 126, 129, 142, 129, 113, 135, 133, 99, 126, 129, 142, 120, 113, 135, 121,
    99, 126, 129, 142, 130, 113, 135, 131, 99, 126, 129, 142, 120, 113, 135, 141,
    99, 126, 129, 142, 120, 113, 135, 151, 99, 126, 129, 142, 120, 113, 135, 131,
    99, 126, 129, 142, 140, 113, 135, 131, 99, 126, 129, 142, 120, 113, 135, 131,
    99, 126, 129, 142, 120, 113, 135, 131, 161, 99, 126, 129, 142, 120, 113, 90, 111]

# Bin width
d = 3

# Number of bins: the data range divided by the bin width
num_bins = (max(a) - min(a)) // d

# Set the figure size
plt.figure(figsize=(20, 8), dpi=80)

# density=True plots frequency density instead of raw counts;
# it replaces the normed argument, which was removed in Matplotlib 3.1
plt.hist(a, num_bins, density=True)

# Set the x-axis ticks, one per bin edge
plt.xticks(range(min(a), max(a) + d, d))

plt.grid()
plt.show()
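The number of bins follows from the bin width, but an integer bin count only lines up with the tick marks when the data range is an exact multiple of the width. A minimal sketch of one way around this, using a small hypothetical sample rather than the full list, is to compute the bin edges explicitly and pass them to plt.hist; the helper names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line inside Jupyter
from matplotlib import pyplot as plt


def bin_edges(data, width):
    """Return equally spaced edges from min(data) that reach or pass max(data)."""
    return list(range(min(data), max(data) + width, width))


def count_per_bin(data, edges):
    """Count values per bin; a value on the last edge goes into the last bin."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for value in data:
        index = min((value - edges[0]) // width, len(counts) - 1)
        counts[index] += 1
    return counts


# Hypothetical sample of running times in minutes, not the full list above
sample = [90, 95, 101, 113, 120, 129, 135, 142, 162]
edges = bin_edges(sample, 10)
counts = count_per_bin(sample, edges)

# Passing the edges to plt.hist makes matplotlib use exactly these bins
mpl_counts, mpl_edges, _ = plt.hist(sample, bins=edges)
plt.close("all")

# The "100 to 120 minutes" question from the text, answered directly
between_100_and_120 = sum(1 for value in sample if 100 <= value < 120)

print(edges)
print(counts)
print(between_100_and_120)
```

The same edges can be reused for plt.xticks, so every tick sits on a bin boundary.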
Question 7

The table below was compiled by sampling, with the data grouped by the time required. Can this data be drawn as a histogram?

Data by absolute numbers

Interval  Width  Quantity  Quantity/width
0         5      4180      836
5         5      13687     2737
10        5      18618     3723
15        5      19634     3926
20        5      17981     3596
25        5      7190      1438
30        5      16369     3273
35        5      3212      642
40        5      4122      824
45        15     9200      613
60        30     6461      215
90        60     3435      57

interval = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 60, 90]

width = [5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 30, 60]

quantity = [4180, 13687, 18618, 19634, 17981, 7190, 16369, 3212, 4122, 9200, 6461, 3435]

from matplotlib import pyplot as plt

interval = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 60, 90]
width = [5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 30, 60]
quantity = [4180, 13687, 18618, 19634, 17981, 7190, 16369, 3212, 4122, 9200, 6461, 3435]

# Set the figure size
plt.figure(figsize=(20, 8), dpi=80)

# The data is already grouped, so draw it with bar rather than hist;
# width=1 makes neighbouring bars touch
plt.bar(range(len(quantity)), quantity, width=1)

# Put the ticks on the bar edges: one more tick than there are bars
_x = [i - 0.5 for i in range(len(quantity) + 1)]
_xtick_labels = interval + [150]
plt.xticks(_x, _xtick_labels)

plt.grid()
plt.show()
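The bar chart above gives every interval the same width on screen, although the last three intervals are 15, 30 and 60 units wide. One possible alternative, sketched here as an assumption rather than as the original author's method, is to draw each bar at its true width and use the Quantity/width column as the height, so that the area of a bar reflects its quantity.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line inside Jupyter
from matplotlib import pyplot as plt

interval = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 60, 90]
width = [5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 30, 60]
quantity = [4180, 13687, 18618, 19634, 17981, 7190, 16369, 3212, 4122, 9200, 6461, 3435]

# Height of each bar: quantity per unit of width (the last column of the table)
density = [q // w for q, w in zip(quantity, width)]

fig, ax = plt.subplots(figsize=(20, 8), dpi=80)

# align="edge" starts each bar at its interval's left edge and extends it by its width
bars = ax.bar(interval, density, width=width, align="edge")

# Ticks on every interval boundary, including the right end of the last interval
right_end = interval[-1] + width[-1]
ax.set_xticks(interval + [right_end])
ax.grid(alpha=0.4)

print(density)
plt.close(fig)
```

In an interactive session, call plt.show() instead of closing the figure.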

 
