Python Project Practice II (Downloading Data), Part Three

Source: Internet
Author: User
Tags: parse CSV file

Continuing from the previous section, in this chapter you will download data from the web and visualize it. The amount of data on the web is enormous, and most of it has never been examined closely. If you can analyze this data, you may find patterns and associations that nobody else has noticed. We will access and visualize data stored in two common formats: CSV and JSON. We will use Python's csv module to process weather data stored in CSV (comma-separated values) format and find the highest and lowest temperatures in two different locations over a period of time. We will then use Matplotlib to create a chart from the downloaded data that shows the temperature variations in two different locations: Sitka, Alaska, and Death Valley, California. Later in this chapter, we will use the json module to access population data stored in JSON format and use Pygal to draw a population map by country.

One: The CSV format

The simplest way to store data in a text file is to write the data as a series of comma-separated values (CSV) to the file. Such a file is called a CSV file. For example, here is a row of weather data in CSV format:
2014-1-5,61,44,26,18,7,-1,56,30,9,30.34,30.27,30.15,,,,10,4,,0.00,0,,195
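To see how such a row breaks down into fields, here is a minimal sketch that feeds the sample row to csv.reader from an in-memory list instead of a file. The variable names are illustrative only.

```python
import csv

# The sample weather row as a single string.
line = "2014-1-5,61,44,26,18,7,-1,56,30,9,30.34,30.27,30.15,,,,10,4,,0.00,0,,195"

# csv.reader accepts any iterable of strings, so a one-element list works.
fields = next(csv.reader([line]))

print(len(fields))   # number of columns in the row
print(fields[0])     # the date, still a string
print(fields[1])     # the maximum temperature, also a string
print(repr(fields[13]))  # a missing value comes back as an empty string
```

Note that every field is returned as a string, including the numbers, and that a missing value is an empty string rather than None.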

Two: Parse the CSV file header

The csv module is part of the Python standard library and can parse the rows of a CSV file, letting us quickly extract the values of interest. Let's look at the first line of this file, which contains a series of headers describing the data:

    import csv

    filename = 'sitka_weather_07-2014.csv'
    with open(filename) as f:
        # Create a reader object tied to the open file.
        reader = csv.reader(f)
        # Read the first line, which holds the column headers.
        header_row = next(reader)
        print(header_row)

(1) We call csv.reader() and pass it the file object stored earlier as an argument, creating a reader object associated with the file. We store this reader object in the variable reader.

(2) The built-in function next() returns the next line in the file when it is passed the reader object. In the code above we call next() only once, so we get the first line of the file, which contains the file headers.

The results are as follows:

['AKDT', 'Max TemperatureF', 'Mean TemperatureF', 'Min TemperatureF', 'Max Dew PointF', 'MeanDew PointF', 'Min DewpointF', 'Max Humidity', 'Mean Humidity', 'Min Humidity', 'Max Sea Level PressureIn', 'Mean Sea Level PressureIn', 'Min Sea Level PressureIn', 'Max VisibilityMiles', 'Mean VisibilityMiles', 'Min VisibilityMiles', 'Max Wind SpeedMPH', 'Mean Wind SpeedMPH', 'Max Gust SpeedMPH', 'PrecipitationIn', 'CloudCover', 'Events', 'WindDirDegrees']
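The behavior of next() can be checked without the weather file. The sketch below uses io.StringIO to stand in for an open file; the three-column sample data is shortened and partly made up for illustration.

```python
import csv
import io

# Hypothetical, shortened stand-in for the weather file.
sample = io.StringIO(
    "AKDT,Max TemperatureF,Min TemperatureF\n"
    "2014-7-1,64,50\n"
    "2014-7-2,71,55\n"
)

reader = csv.reader(sample)
header_row = next(reader)      # consumes the first line only
remaining_rows = list(reader)  # iteration continues from the second line

print(header_row)
print(remaining_rows)
```

Because the reader remembers its position, any loop over it after the call to next() starts at the first data row, not at the header.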


Three: Print the headers and their positions

To make the header data easier to understand, print each header and its position in the list:

    import csv

    filename = 'sitka_weather_07-2014.csv'
    with open(filename) as f:
        reader = csv.reader(f)
        header_row = next(reader)

        # Print the index of each header next to its name.
        for index, column_header in enumerate(header_row):
            print(index, column_header)

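The results are as follows:

    0 AKDT
    1 Max TemperatureF
    2 Mean TemperatureF
    3 Min TemperatureF
    4 Max Dew PointF
    5 MeanDew PointF
    6 Min DewpointF
    7 Max Humidity
    8 Mean Humidity
    9 Min Humidity
    10 Max Sea Level PressureIn
    11 Mean Sea Level PressureIn
    12 Min Sea Level PressureIn
    13 Max VisibilityMiles
    14 Mean VisibilityMiles
    15 Min VisibilityMiles
    16 Max Wind SpeedMPH
    17 Mean Wind SpeedMPH
    18 Max Gust SpeedMPH
    19 PrecipitationIn
    20 CloudCover
    21 Events
    22 WindDirDegrees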

Four: Extract and read data

Once we know which columns we need, we can read some data. First, read the maximum temperature for each day:

    import csv

    filename = 'sitka_weather_07-2014.csv'
    with open(filename) as f:
        reader = csv.reader(f)
        header_row = next(reader)

        # Column 1 holds the daily maximum temperature.
        highs = []
        for row in reader:
            high = int(row[1])  # convert the string to a number
            highs.append(high)

        print(highs)

The results are as follows:

[64, 71, 64, 59, 69, 62, 61, 55, 57, 61, 57, 59, 57, 61, 64, 61, 59, 63, 60, 57, 69, 63, 62, 59, 57, 57, 61, 59, 61, 61, 66]
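Hard-coding row[1] works for this file, but the index can also be looked up from the header row. The following is a sketch under assumptions: the helper name read_int_column is hypothetical, the sample data is invented, and skipping rows whose value is missing is only one possible policy, not something the original code does.

```python
import csv
import io


def read_int_column(csv_file, column_name):
    """Return the integer values of one named column, skipping blanks."""
    reader = csv.reader(csv_file)
    header_row = next(reader)
    index = header_row.index(column_name)  # raises ValueError if absent

    values = []
    for row in reader:
        try:
            values.append(int(row[index]))
        except ValueError:
            # An empty or non-numeric field cannot be converted; skip it.
            continue
    return values


# Invented sample data with one missing reading.
sample = io.StringIO(
    "AKDT,Max TemperatureF\n"
    "2014-7-1,64\n"
    "2014-7-2,\n"
    "2014-7-3,64\n"
)
highs = read_int_column(sample, "Max TemperatureF")
print(highs)
```

Looking the column up by name keeps the code working if the column order changes, at the cost of depending on the exact header text.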

Five: Plot the temperature chart

    import csv

    from matplotlib import pyplot as plt

    filename = 'sitka_weather_07-2014.csv'
    with open(filename) as f:
        reader = csv.reader(f)
        header_row = next(reader)

        # Collect the daily maximum temperatures.
        highs = []
        for row in reader:
            high = int(row[1])
            highs.append(high)

    print(highs)

    # Plot the data.
    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(highs, c="red")

    # Set the format of the figure.
    plt.title("Daily High Temperatures, July", fontsize=24)
    plt.xlabel("", fontsize=16)
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis="both", which="major", labelsize=16)

    plt.show()

Running this code displays a line chart of the daily high temperatures, drawn in red.

Six: The datetime module

First import the datetime class from the datetime module, then call the method strptime() with the string containing the date as the first argument. The second argument tells Python how the date is formatted. For example, with the date string '2014-1-5' and the format '%Y-%m-%d', '%Y-' tells Python to treat the part of the string before the first hyphen as a four-digit year, '%m-' tells Python to treat the part before the second hyphen as a number representing the month, and '%d' tells Python to treat the last part of the string as the day of the month (1 to 31).
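A short sketch of the call described above, using the date from the sample CSV row; the variable name first_date is illustrative.

```python
from datetime import datetime

# Parse a date string of the form year-month-day.
first_date = datetime.strptime("2014-1-5", "%Y-%m-%d")

print(first_date)  # a datetime object; the time defaults to midnight
print(first_date.year, first_date.month, first_date.day)

# strftime() goes the other way and formats a datetime as a string.
print(first_date.strftime("%m/%d/%Y"))
```

If the string does not match the format, strptime() raises a ValueError, so the format has to mirror the file's date layout exactly, including the separators.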

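The method strptime() accepts a variety of format arguments and uses them to decide how to interpret the date. Some of these arguments are listed here:

    Argument    Meaning
    %A          Weekday name, such as Monday
    %B          Month name, such as January
    %m          Month as a number (01 to 12)
    %d          Day of the month as a number (01 to 31)
    %Y          Four-digit year, such as 2015
    %y          Two-digit year, such as 15
    %H          Hour in 24-hour format (00 to 23)
    %I          Hour in 12-hour format (01 to 12)
    %p          AM or PM
    %M          Minutes (00 to 59)
    %S          Seconds (00 to 59)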

Seven: Add dates to the chart

Once you know how to work with the dates in a CSV file, you can improve the temperature chart by extracting the dates and the highest temperatures and passing them to plot() as follows:

    import csv
    from datetime import datetime

    from matplotlib import pyplot as plt

    filename = 'sitka_weather_07-2014.csv'
    with open(filename) as f:
        reader = csv.reader(f)
        header_row = next(reader)

        # Get the dates and high temperatures from the file.
        dates, highs = [], []
        for row in reader:
            # The format string must match the dates in the file; for dates
            # such as 2014-1-5, use "%Y-%m-%d" instead.
            current_date = datetime.strptime(row[0], "%m/%d/%y")
            dates.append(current_date)

            high = int(row[1])
            highs.append(high)

    print(highs)

    # Plot the data.
    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(dates, highs, c="red")

    # Set the format of the figure.
    plt.title("Daily High Temperatures, July", fontsize=24)
    plt.xlabel("", fontsize=16)
    fig.autofmt_xdate()
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis="both", which="major", labelsize=16)

    plt.show()

We create two empty lists to store the dates and maximum temperatures extracted from the file. We then convert the data containing the date information (row[0]) to a datetime object and append it to the end of the list dates. We pass the dates and maximum temperature values to plot(). We call fig.autofmt_xdate() to draw the date labels diagonally so that they don't overlap. The improved chart now shows dates along the x-axis.

Eight: Plot a second data series

The improved chart shows a lot of meaningful data, but we can make it more useful by adding the lowest temperatures. To do this, extract the lowest temperatures from the data file and add them to the chart as follows:

    import csv
    from datetime import datetime

    from matplotlib import pyplot as plt

    filename = 'sitka_weather_2014.csv'
    with open(filename) as f:
        reader = csv.reader(f)
        header_row = next(reader)

        # Get the dates, high temperatures, and low temperatures from the file.
        dates, highs, lows = [], [], []
        for row in reader:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            dates.append(current_date)

            low = int(row[3])
            lows.append(low)

            high = int(row[1])
            highs.append(high)

    # Plot the data.
    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(dates, highs, c="red")
    plt.plot(dates, lows, c="blue")

    # Set the format of the figure.
    plt.title("Daily High Temperatures - 2014", fontsize=24)
    plt.xlabel("", fontsize=16)
    fig.autofmt_xdate()
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis="both", which="major", labelsize=16)

    plt.show()

The chart now shows two data series: the high temperatures in red and the low temperatures in blue.
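The extraction step can be tried in isolation from the plotting. In this sketch the in-memory data and its values are invented, and the column positions (0, 1, 2) belong to this shortened sample rather than to the real file, where the low temperature is read from column 3.

```python
import csv
import io
from datetime import datetime

# Invented three-column sample: date, high, low.
sample = io.StringIO(
    "AKDT,Max TemperatureF,Min TemperatureF\n"
    "2014-1-1,46,37\n"
    "2014-1-2,41,35\n"
    "2014-1-3,39,34\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

dates, highs, lows = [], [], []
for row in reader:
    dates.append(datetime.strptime(row[0], "%Y-%m-%d"))
    highs.append(int(row[1]))
    lows.append(int(row[2]))

# The three lists stay aligned, so each date pairs with its high and low.
for date, high, low in zip(dates, highs, lows):
    print(date.date(), high, low)
```

Appending to all three lists inside the same loop iteration is what keeps them the same length, which plot() requires when it pairs x-values with y-values.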

Nine: Shade an area of the chart

With two data series on the chart, we can see the temperature range for each day. As a final modification, we shade the chart to show the daily temperature range. To do this, we use the method fill_between(), which accepts one series of x-values and two series of y-values and fills the space between the two y-value series:

    plt.plot(dates, highs, c="red", alpha=0.5)
    plt.plot(dates, lows, c="blue", alpha=0.5)
    plt.fill_between(dates, highs, lows, facecolor="blue", alpha=0.1)

(1) The argument alpha specifies the transparency of a color. An alpha value of 0 means fully transparent, and 1 (the default) means fully opaque. Setting alpha to 0.5 makes the red and blue lines appear lighter.

(2) We pass fill_between() one series of x-values, the list dates, and two series of y-values, highs and lows.

(3) The argument facecolor specifies the color of the filled area. We also set alpha to a small value, 0.1, so that the filled area connects the two data series without distracting the viewer.

(4) The resulting chart shades the area between the highest and lowest temperatures.
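To give a feel for what alpha does, the sketch below applies the standard alpha-compositing formula, result = alpha * foreground + (1 - alpha) * background, to a color drawn over a white background. This illustrates the general idea only and is not Matplotlib's rendering code; the function name blend is hypothetical.

```python
def blend(foreground, background, alpha):
    """Blend two RGB colors (components in 0..1) with the given alpha."""
    return tuple(
        alpha * fg + (1 - alpha) * bg
        for fg, bg in zip(foreground, background)
    )


RED = (1.0, 0.0, 0.0)
BLUE = (0.0, 0.0, 1.0)
WHITE = (1.0, 1.0, 1.0)

print(blend(RED, WHITE, 0.5))   # the red line at alpha=0.5
print(blend(BLUE, WHITE, 0.1))  # the fill at alpha=0.1
print(blend(RED, WHITE, 1.0))   # fully opaque: plain red
```

The lower the alpha, the closer the blended color is to the background, which is why the fill at 0.1 reads as a faint tint while the lines at 0.5 remain clearly visible.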

To be continued! The three-day New Year's holiday is here. Happy New Year, everyone!
