Data Analysis Example: Meteorological Data
I. Experiment Introduction
This experiment analyzes and visualizes meteorological data for the northern coast of Italy. We first use Python's Matplotlib library to plot the data, then use the SVM module of the scikit-learn library to run a regression analysis on it, and finally draw our conclusions with the support of the plots.
1.1 Course Source
This course is based on Chapter 2 of "Python Data Analysis in Action" from Turing Education, and is published on Shiyanlou with Turing Education's authorization. For a systematic study of the book, please purchase "Python Data Analysis in Action".
To make sure the experiment can be completed in the Shiyanlou environment, we have added a series of aids to the original book content, such as screenshots and code comments, to help you work through it.
If you have questions or suggestions about the experiment, you can raise them in the discussion area at any time and discuss them with your classmates.
1.2 Knowledge Points
Drawing plots with the Matplotlib library
Regression analysis of data with the scikit-learn library
Slicing data with the NumPy library
1.3 Experimental Environment
Python 2.7
Spyder
XFCE Terminal
1.4 Intended Audience
This course is of medium difficulty and suits users with a Python foundation; those who already know the Matplotlib module will get started more quickly.
1.5 Getting the Code
You can download the code to the lab environment with the following commands and use it as a reference while you study.
$ wget http://labfile.oss.aliyuncs.com/courses/780/SourceCode.zip
$ wget http://labfile.oss.aliyuncs.com/courses/780/weatherdata.zip
This experiment is done in an interactive environment; the code cannot be run directly from the command line because it draws plots. To see the results, unzip SourceCode.zip and Weatherdata.zip, open Spyder from the desktop, change to the weatherdata directory in the IPython console, and then paste the code into the console. The following GIF shows the steps.
We still urge you not to simply copy and paste the code, but to follow the course and implement each step yourself.
II. Experimental Principle
Meteorological data is easy to find on the Internet. Many websites provide measurements such as air pressure, temperature, humidity and rainfall, and a data file can be obtained simply by specifying a location and a date. These measurements are collected by weather stations. As a data source, meteorological data covers a wide range of information. The purpose of data analysis is to turn raw data into information and then turn that information into knowledge, so meteorological data is a suitable subject for explaining the whole process of data analysis.
2.1 Hypothesis to Test: The Influence of the Sea on Climate
As I write this chapter it is summer, and the heat is unbearable, especially for people living in big cities. Many people therefore head to mountain villages or seaside towns to relax and get away from the sultry weather of the inland cities. I have often wondered what effect the sea has on the climate. This question makes a good starting point for data analysis. I do not intend this chapter to be a scientific treatise; rather, it is a way for data analysis enthusiasts to apply what they have learned to the question of how the sea affects the climate of a region.
System under Study: The Adriatic Sea and the Po Valley
Now that the problem is well defined, you need to find a system that suits the data you want to study and provides the right setting for answering the question. First, you need a sea to study. I live in Italy, where there are many seas to choose from, because Italy is a peninsula surrounded by them. Why limit the choice to Italy? Because the question stems from a typically Italian habit: in summer we like to escape to the seaside to avoid the heat of the interior. I do not know whether this habit is common in other countries, so I study only Italy, the system I am familiar with. You still have to decide which part of Italy to look at. As noted above, Italy is a peninsula, so finding a sea to study is not the problem; the problem is how to measure the sea's influence at different distances from it. This is a real difficulty. Italy is largely mountainous, and few areas extend uniformly inland, far from the sea, where places at different distances can be compared with one another. To measure the sea's influence on the climate I exclude the mountains, because they may introduce many other factors, such as elevation.
The Po Valley in Italy is well suited to studying the sea's effect on climate. The plain starts at the Adriatic Sea in the east and stretches hundreds of kilometres inland (see Figure 9-1). It is surrounded by mountains, but its breadth weakens their influence. In addition, the region is densely populated, so it is easy to select a set of cities at different distances from the sea. Among the selected cities, the greatest distance between two cities is about 400 km.
The first step is to select 10 cities as the reference group. When choosing them, make sure they represent the whole plain (see Figure 9-2).
As Figure 9-2 shows, we selected 10 cities, whose weather data we will analyze. Five of them are within 100 km of the sea, and the other five are between 100 and 400 km from it.
Cities selected as the sample:
Ferrara
Torino (Turin)
Mantova (Mantua)
Milano (Milan)
Ravenna
Asti
Bologna
Piacenza
Cesena
Faenza
Next we need to work out how far these cities are from the sea. There are several ways to do this. Here we use the service provided by the TheTimeNow website, which supports multiple languages (see Figure 9-3).
With a service that calculates the distance between two cities, we can calculate each city's distance from the sea. Choose the seaside town of Comacchio as the reference point and compute its distance to each of the other cities (see Figure 9-2). The distances obtained with this service are shown in Table 9-1.
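If a web service is not at hand, a rough straight-line distance can also be computed directly from coordinates. The sketch below uses the haversine formula, which is a different technique from the website's service and may give somewhat different numbers than Table 9-1. The function name `haversine_km` and the coordinates are assumptions introduced here for illustration (approximate values, not taken from the book).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates, assumed for illustration only
comacchio = (44.69, 12.18)
milano = (45.46, 9.19)

distance = haversine_km(comacchio[0], comacchio[1], milano[0], milano[1])
print(round(distance))
```

The result is a distance "as the crow flies", so it should only be used to rank cities by how far they are from the sea, not as a replacement for the table's values.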
III. Development and Preparation
Having defined the system to study, we need a data source from which to capture the data required. Many sites on the Internet provide meteorological data from around the world, among them OpenWeatherMap, at http://openweathermap.org/ (see Figure 9-4).
The site provides the following capability: a city's meteorological data can be obtained by specifying the city in the request URL. The data for this experiment has already been prepared, so we do not need to call the site's API.
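For readers who do want to fetch fresh data, the request URL could be built roughly as follows. This is a sketch under assumptions: it uses the site's current-weather endpoint with the `q` and `appid` query parameters as I understand them (check the site's own documentation), the helper name `build_weather_url` is made up for illustration, and the API key is a placeholder. It assumes Python 3; in the Python 2.7 lab environment `urlencode` lives in `urllib` instead. The code only builds the URL and does not send a request.

```python
from urllib.parse import urlencode  # Python 2.7: from urllib import urlencode

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city, country, api_key):
    """Return the request URL for the current weather of one city."""
    query = urlencode({"q": "{},{}".format(city, country), "appid": api_key})
    return "{}?{}".format(BASE_URL, query)


# "YOUR_API_KEY" is a placeholder, not a real key
print(build_weather_url("Ferrara", "IT", "YOUR_API_KEY"))
```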
Start Spyder from the desktop. If there is no Spyder on your desktop, reopen the lab environment for this experiment (the lab platform creates a different environment for each experiment).
First, get the data files. Open XFCE Terminal and run:
$ cd Code
$ mkdir weatheranalysis
$ cd weatheranalysis
$ wget http://labfile.oss.aliyuncs.com/courses/780/Weatherdata.zip
$ unzip Weatherdata.zip
You should now see the weather data files for the 10 cities in the Weatherdata directory (files ending in .csv).
Double-click to open Spyder, then change to the target directory in the IPython console:
cd Code
cd weatheranalysis
cd Weatherdata
This experiment only needs the IPython console, so the other, unrelated panes can be closed.
import numpy as np
import pandas as pd
import datetime
To use the data from this chapter, load the 10 CSV files that were saved while the chapter was being written.
df_ferrara = pd.read_csv('ferrara_270615.csv')
df_milano = pd.read_csv('milano_270615.csv')
df_mantova = pd.read_csv('mantova_270615.csv')
df_ravenna = pd.read_csv('ravenna_270615.csv')
df_torino = pd.read_csv('torino_270615.csv')
df_asti = pd.read_csv('asti_270615.csv')
df_bologna = pd.read_csv('bologna_270615.csv')
df_piacenza = pd.read_csv('piacenza_270615.csv')
df_cesena = pd.read_csv('cesena_270615.csv')
df_faenza = pd.read_csv('faenza_270615.csv')
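The ten near-identical `read_csv` calls above can also be written as a loop. The following is a minimal sketch rather than the book's code: the helper name `load_city_frames` and the `CITIES` list are introduced here for illustration, and the `_270615.csv` suffix is taken from the file names above.

```python
import os

import pandas as pd

CITIES = ['ferrara', 'milano', 'mantova', 'ravenna', 'torino',
          'asti', 'bologna', 'piacenza', 'cesena', 'faenza']


def load_city_frames(cities, directory='.', suffix='_270615.csv'):
    """Read one CSV per city and return a dict mapping city name to DataFrame."""
    frames = {}
    for city in cities:
        path = os.path.join(directory, city + suffix)
        frames[city] = pd.read_csv(path)
    return frames


# Usage (run from the directory that holds the CSV files):
# frames = load_city_frames(CITIES)
# df_milano = frames['milano']
```

Keeping the frames in a dict makes it easy to loop over all cities later, while the individual `df_*` names used in the rest of this experiment can still be assigned from it.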
The data is now in memory, which completes the preparation for the experiment.
IV. Project File Structure
The Pic9-* files contain the code used for each analysis figure.
Weatherdata holds our data.
V. Experiment Steps
A common approach is to start analyzing collected data with data visualization. As mentioned earlier, the Matplotlib library provides a set of chart-generation tools that present data visually. During the analysis stage, visualization is very helpful for discovering features of the system under study.
Import the following necessary libraries:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil import parser
5.1 Analysis of temperature data
A very simple analysis, for example, is to look first at how the temperature changes over one day. Let's take the city of Milan as an example.
# Read Milan's weather data
df_milano = pd.read_csv('milano_270615.csv')
# Extract the temperature and date data we want to analyze
y1 = df_milano['temp']
x1 = df_milano['day']
# Convert the date strings to datetime objects
day_milano = [parser.parse(x) for x in x1]
# Call subplots(); fig is the figure object, ax is the axes object
fig, ax = plt.subplots()
# Rotate the x-axis tick labels by 70 degrees so they are easy to read
plt.xticks(rotation=70)
# Set the time format
hours = mdates.DateFormatter('%H:%M')
# Set the x-axis display format
ax.xaxis.set_major_formatter(hours)
# Draw the plot: day_milano is the x-axis data, y1 is the y-axis data, and 'r' stands for red
ax.plot(day_milano, y1, 'r')
# Display the figure
fig
Running the above code produces the image shown in Figure 9-8. As the figure shows, the temperature trend is close to a sine curve: the temperature rises gradually from the morning, peaks between two and six o'clock in the afternoon, then falls gradually and reaches its minimum at six o'clock the next morning.
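Reading the peak and trough off a plot can be double-checked numerically. The sketch below shows one way to find when the maximum and minimum occur. The readings are synthetic values made up for illustration (they are not taken from milano_270615.csv), the helper name `extremes` is hypothetical, and the timestamp format is an assumption about how the dates are stored.

```python
from datetime import datetime

# Synthetic (time, temperature) readings, made up for illustration;
# they are NOT taken from milano_270615.csv.
readings = [
    ('2015-06-27 09:00:00', 22.1),
    ('2015-06-27 12:00:00', 27.4),
    ('2015-06-27 15:00:00', 30.2),
    ('2015-06-27 18:00:00', 29.0),
    ('2015-06-27 21:00:00', 24.3),
    ('2015-06-28 00:00:00', 20.8),
    ('2015-06-28 03:00:00', 18.9),
    ('2015-06-28 06:00:00', 17.6),
]


def extremes(rows, fmt='%Y-%m-%d %H:%M:%S'):
    """Return ((time_of_max, max_temp), (time_of_min, min_temp))."""
    parsed = [(datetime.strptime(t, fmt), temp) for t, temp in rows]
    hottest = max(parsed, key=lambda row: row[1])
    coldest = min(parsed, key=lambda row: row[1])
    return hottest, coldest


hottest, coldest = extremes(readings)
print('max at', hottest[0].strftime('%H:%M'), 'min at', coldest[0].strftime('%H:%M'))
```

With the real data, the same check could be done on the DataFrame itself, for example with `df_milano['temp'].idxmax()` and `idxmin()`, which return the row labels of the maximum and minimum.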
The aim of our analysis is to try to explain whether the sea affects the temperature and its trend, so we look at the temperature trends of several cities at the same time. This is the only way to check whether the analysis is heading in the right direction. We therefore choose the three cities closest to the sea and the three farthest from it.
# Read the data files (if you have not loaded them earlier, you must do so here)
df_ravenna = pd.read_csv('ravenna_270615.csv')
df_faenza = pd.read_csv('faenza_270615.csv')
df_cesena = pd.read_csv('cesena_270615.csv')
df_asti = pd.read_csv('asti_270615.csv')
df_torino = pd.read_csv('torino_270615.csv')
df_milano = pd.read_csv('milano_270615.csv')
# Extract the temperature and date data
y1 = df_ravenna['temp']
x1 = df_ravenna['day']
y2 = df_faenza['temp']
x2 = df_faenza['day']
y3 = df_cesena['temp']
x3 = df_cesena['day']
y4 = df_milano['temp']
x4 = df_milano['day']
y5 = df_asti['temp']
x5 = df_asti['day']
y6 = df_torino['temp']
x6 = df_torino['day']
# Convert the dates from strings to standard datetime objects
day_ravenna = [parser.parse(x) for x in x1]
day_faenza = [parser.parse(x) for x in x2]
day_cesena = [parser.parse(x) for x in x3]
day_milano = [parser.parse(x) for x in x4]
day_asti = [parser.parse(x) for x in x5]
day_torino = [parser.parse(x) for x in x6]
# Call subplots() to redefine the fig and ax variables
fig, ax = plt.subplots()
plt.xticks(rotation=70)
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
# Each call draws three lines, so it takes three sets of arguments; 'r' stands for red and 'g' for green
ax.plot(day_ravenna, y1, 'r', day_faenza, y2, 'r', day_cesena, y3, 'r')
ax.plot(day_milano, y4, 'g', day_asti, y5, 'g', day_torino, y6, 'g')
fig
The above code produces the chart shown in Figure 9-9. The temperature curves of the three cities closest to the sea are drawn in red, and those of the three cities farthest from the sea in green.
As Figure 9-9 shows, the results look promising. The maximum temperatures of the three cities closest to the sea are considerably lower than those of the three farthest cities, while the minimum temperatures appear to differ much less.
We can investigate further in this direction: collect the maximum and minimum temperatures of all 10 cities and use a line chart to show the relationship between the maximum temperature and the distance from the sea.
# dist is a list of each city's distance from the sea
dist = [df_ravenna['dist'][0],
    df_cesena['dist'][0],
    df_faenza['dist'][0],
    df_ferrara['dist'][0],
    df_bologna['dist'][0],
    df_mantova['dist'][0],
    df_piacenza['dist'][0],
    df_milano['dist'][0],
    df_asti['dist'][0],
    df_torino['dist'][0]
]
# temp_max is a list of the maximum temperature in each city
temp_max = [df_ravenna[