Python Meteorological Data Analysis (from "Python Data Analysis in Practice")


Ten cities were selected for the weather data analysis. Five of them are within 100 km of the sea, and the other five are 100 to 400 km from it.

A list of cities selected for the sample is as follows:
Ferrara
Torino (Turin)
Mantova (Mantua)
Milano (Milan)
Ravenna
Asti
Bologna
Piacenza
Cesena
Faenza

Data Source: http://openweathermap.org/

1. Temperature Data Analysis
The goal of the analysis is to find out whether the sea affects temperature, and whether it also affects how temperature changes over the day. A good first step is to compare the temperature curves of several cities. This checks whether the analysis is heading in the right direction. For this comparison, take the three cities closest to the sea and the three farthest from it.

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil import parser
import pandas as pd
import numpy as np

# Read the weather data of each city
df_ferrara = pd.read_csv('ferrara_270615.csv')
df_milano = pd.read_csv('milano_270615.csv')
df_mantova = pd.read_csv('mantova_270615.csv')
df_ravenna = pd.read_csv('ravenna_270615.csv')
df_torino = pd.read_csv('torino_270615.csv')
df_asti = pd.read_csv('asti_270615.csv')
df_bologna = pd.read_csv('bologna_270615.csv')
df_piacenza = pd.read_csv('piacenza_270615.csv')
df_cesena = pd.read_csv('cesena_270615.csv')
df_faenza = pd.read_csv('faenza_270615.csv')

# Extract the temperature and date columns to analyze
y1 = df_ravenna['temp']
x1 = df_ravenna['day']
y2 = df_faenza['temp']
x2 = df_faenza['day']
y3 = df_cesena['temp']
x3 = df_cesena['day']
y4 = df_milano['temp']
x4 = df_milano['day']
y5 = df_asti['temp']
x5 = df_asti['day']
y6 = df_torino['temp']
x6 = df_torino['day']

# Convert the date strings to datetime objects
day_ravenna = [parser.parse(x) for x in x1]
day_faenza = [parser.parse(x) for x in x2]
day_cesena = [parser.parse(x) for x in x3]
day_milano = [parser.parse(x) for x in x4]
day_asti = [parser.parse(x) for x in x5]
day_torino = [parser.parse(x) for x in x6]

# fig is the Figure object, ax is the Axes object
fig, ax = plt.subplots()
# Show the times on the x axis as hours:minutes
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
# Three lines per call: coastal cities in red, inland cities in green
ax.plot(day_ravenna, y1, 'r', day_faenza, y2, 'r', day_cesena, y3, 'r')
ax.plot(day_milano, y4, 'g', day_asti, y5, 'g', day_torino, y6, 'g')
# Rotate the x-axis tick labels 70 degrees to make them easier to read
plt.xticks(rotation=70)
# Display the figure
plt.show()

Results:

The maximum temperatures of the three coastal cities are much lower than those of the three inland cities. The minimum temperatures differ much less.
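The plotting code above converts each 'day' column to datetime objects with a list comprehension over dateutil's parser.parse. As a minimal sketch, pandas can do the same conversion in one vectorized call with pd.to_datetime. The three timestamp strings below are made up, and their format is an assumption about what the 'day' column contains.

```python
import pandas as pd

# Made-up stand-in for one city's 'day' column (string format assumed)
days = pd.Series(['2015-06-27 09:46:24', '2015-06-27 12:46:24', '2015-06-27 15:46:24'])

# One vectorized call instead of [parser.parse(x) for x in days]
parsed = pd.to_datetime(days)

# Hour of day of each measurement
hours = parsed.dt.hour.tolist()
print(hours)
```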

This is worth a closer look. Collect the maximum and minimum temperatures of all 10 cities, then plot each city's maximum temperature against its distance from the sea.

# Maximum and minimum temperatures of the 10 cities; plot the maximum
# temperature against each city's distance from the sea
# dist: distance of each city from the sea (the first five are the coastal cities)
dist = [df_ravenna['dist'][0], df_cesena['dist'][0], df_faenza['dist'][0],
        df_ferrara['dist'][0], df_bologna['dist'][0], df_mantova['dist'][0],
        df_piacenza['dist'][0], df_milano['dist'][0], df_asti['dist'][0],
        df_torino['dist'][0]]
# temp_max: highest temperature of each city
# temp_min: lowest temperature of each city
temp_max = [df_ravenna['temp'].max(), df_cesena['temp'].max(), df_faenza['temp'].max(),
            df_ferrara['temp'].max(), df_bologna['temp'].max(), df_mantova['temp'].max(),
            df_piacenza['temp'].max(), df_milano['temp'].max(), df_asti['temp'].max(),
            df_torino['temp'].max()]
temp_min = [df_ravenna['temp'].min(), df_cesena['temp'].min(), df_faenza['temp'].min(),
            df_ferrara['temp'].min(), df_bologna['temp'].min(), df_mantova['temp'].min(),
            df_piacenza['temp'].min(), df_milano['temp'].min(), df_asti['temp'].min(),
            df_torino['temp'].min()]

# Plot the maximum temperatures first
fig, ax = plt.subplots()
ax.plot(dist, temp_max, 'ro')

# SVR (support vector regression) from scikit-learn
from sklearn.svm import SVR

# dist1: cities near the coast; dist2: cities far from the sea
dist1 = dist[0:5]
dist2 = dist[5:10]
# scikit-learn expects a 2-D input with one row per sample, so turn each list
# into a list of one-element lists (NumPy's reshape() has the same effect)
dist1 = [[x] for x in dist1]
dist2 = [[x] for x in dist2]
# temp_max1 / temp_max2: maximum temperatures of the cities in dist1 / dist2
temp_max1 = temp_max[0:5]
temp_max2 = temp_max[5:10]
# Linear kernel with C=1e3, so the fit follows the data as closely as possible
# (no precise prediction is needed here)
svr_lin1 = SVR(kernel='linear', C=1e3)
svr_lin2 = SVR(kernel='linear', C=1e3)
# Fit the two models
svr_lin1.fit(dist1, temp_max1)
svr_lin2.fit(dist2, temp_max2)
# Points at which to draw the fitted lines; reshape((n, 1)) turns the 1-D
# arange result into a column, the 2-D shape predict() expects
xp1 = np.arange(10, 100, 10).reshape((9, 1))
xp2 = np.arange(50, 400, 50).reshape((7, 1))
yp1 = svr_lin1.predict(xp1)
yp2 = svr_lin2.predict(xp2)
# Limit the range of the x axis
ax.set_xlim(0, 400)
# Draw the two fitted lines
ax.plot(xp1, yp1, c='b', label='Strong sea effect')
ax.plot(xp2, yp2, c='g', label='Light sea effect')
ax.legend()
plt.show()

print(svr_lin1.coef_)       # slope
print(svr_lin1.intercept_)  # intercept
print(svr_lin2.coef_)
print(svr_lin2.intercept_)

Results:

Within about 60 km of the sea, the maximum temperature rises quickly, from about 28 to 31 degrees. Beyond that distance it rises much more slowly, if at all. Each of these two trends can be represented by a straight line of the form y = ax + b, where a is the slope and b is the intercept.
The intersection of the two lines can be taken as the boundary between the area affected by the sea and the area not affected by it, or at least only weakly affected.
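Both trends are modeled as straight lines, y = a1*x + b1 and y = a2*x + b2. Their intersection therefore has a closed form, as long as a1 ≠ a2. Setting the two equal gives x* = (b2 - b1) / (a1 - a2) and y* = a1*x* + b1, which can serve as a check on the numerical fsolve() result that follows.

The sketch below fits each line by ordinary least squares with np.polyfit. This is a different technique from the SVR fit used in this article, because SVR minimizes an epsilon-insensitive loss. On real data the coefficients would therefore generally differ somewhat. The points are invented and chosen to lie exactly on two lines.

```python
import numpy as np

# Invented (distance in km, max temperature in degrees) points, NOT measured data;
# they lie exactly on y = 0.1x + 26 and y = 0.01x + 30.5
near_d = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
near_t = np.array([27.0, 28.0, 29.0, 30.0, 31.0])
far_d = np.array([100.0, 150.0, 200.0, 250.0, 300.0])
far_t = np.array([31.5, 32.0, 32.5, 33.0, 33.5])

# Least-squares straight lines: with deg=1, polyfit returns [slope, intercept]
a1, b1 = np.polyfit(near_d, near_t, 1)
a2, b2 = np.polyfit(far_d, far_t, 1)

# Solve a1*x + b1 = a2*x + b2 for x (requires a1 != a2)
x_star = (b2 - b1) / (a1 - a2)
y_star = a1 * x_star + b1
print(round(x_star, 2), round(y_star, 2))
```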

# Take the intersection of the two lines as the boundary between the area
# affected by the sea and the area not affected by it (or at least only weakly)
from scipy.optimize import fsolve

# First fitted line
def line1(x):
    a1 = svr_lin1.coef_[0][0]
    b1 = svr_lin1.intercept_[0]
    return a1 * x + b1

# Second fitted line
def line2(x):
    a2 = svr_lin2.coef_[0][0]
    b2 = svr_lin2.intercept_[0]
    return a2 * x + b2

# Find the x coordinate at which the two lines intersect
def find_intersection(fun1, fun2, x0):
    return fsolve(lambda x: fun1(x) - fun2(x), x0)

result = find_intersection(line1, line2, 0.0)
print("[x,y]=[%d,%d]" % (result[0], line1(result[0])))

fig, ax = plt.subplots()
x = np.linspace(0, 300, 31)
ax.plot(x, line1(x), x, line2(x), result, line1(result), 'ro')
plt.show()

Results:

Running the code above gives the intersection point [x, y] = [53, 30].
So, at least for the conditions on that day, the sea affects the maximum temperature up to an average distance of about 53 km.

Now, analyze the minimum temperature.

# Minimum temperatures
fig, ax = plt.subplots()
plt.axis((0, 400, 15, 25))
ax.plot(dist, temp_min, 'bo')
plt.show()

Results:

The minimum temperatures occur at night or in the early morning, around 6 a.m. They clearly do not depend on the distance from the sea.

2. Humidity Data Analysis
Next, look at how humidity changed over that day in three coastal cities and three inland cities.

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil import parser
import pandas as pd
import numpy as np

# Read the weather data of each city
df_ferrara = pd.read_csv('ferrara_270615.csv')
df_milano = pd.read_csv('milano_270615.csv')
df_mantova = pd.read_csv('mantova_270615.csv')
df_ravenna = pd.read_csv('ravenna_270615.csv')
df_torino = pd.read_csv('torino_270615.csv')
df_asti = pd.read_csv('asti_270615.csv')
df_bologna = pd.read_csv('bologna_270615.csv')
df_piacenza = pd.read_csv('piacenza_270615.csv')
df_cesena = pd.read_csv('cesena_270615.csv')
df_faenza = pd.read_csv('faenza_270615.csv')

# Extract the humidity and date columns to analyze
y1 = df_ravenna['humidity']
x1 = df_ravenna['day']
y2 = df_faenza['humidity']
x2 = df_faenza['day']
y3 = df_cesena['humidity']
x3 = df_cesena['day']
y4 = df_milano['humidity']
x4 = df_milano['day']
y5 = df_asti['humidity']
x5 = df_asti['day']
y6 = df_torino['humidity']
x6 = df_torino['day']

# Convert the date strings to datetime objects
day_ravenna = [parser.parse(x) for x in x1]
day_faenza = [parser.parse(x) for x in x2]
day_cesena = [parser.parse(x) for x in x3]
day_milano = [parser.parse(x) for x in x4]
day_asti = [parser.parse(x) for x in x5]
day_torino = [parser.parse(x) for x in x6]

# fig is the Figure object, ax is the Axes object
fig, ax = plt.subplots()
# Show the times on the x axis as hours:minutes
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
# Three lines per call: coastal cities in red, inland cities in green
ax.plot(day_ravenna, y1, 'r', day_faenza, y2, 'r', day_cesena, y3, 'r')
ax.plot(day_milano, y4, 'g', day_asti, y5, 'g', day_torino, y6, 'g')
# Rotate the x-axis tick labels 70 degrees to make them easier to read
plt.xticks(rotation=70)
# Display the figure
plt.show()

Results:

At first glance, humidity appears higher in the coastal cities than in the inland cities, with a gap of about 20% over the whole day. Next, let's see how the maximum and minimum humidity relate to the distance from the sea.

# dist: distance of each city from the sea
dist = [df_ravenna['dist'][0], df_cesena['dist'][0], df_faenza['dist'][0],
        df_ferrara['dist'][0], df_bologna['dist'][0], df_mantova['dist'][0],
        df_piacenza['dist'][0], df_milano['dist'][0], df_asti['dist'][0],
        df_torino['dist'][0]]
# Maximum humidity of each city
hum_max = [df_ravenna['humidity'].max(), df_cesena['humidity'].max(), df_faenza['humidity'].max(),
           df_ferrara['humidity'].max(), df_bologna['humidity'].max(), df_mantova['humidity'].max(),
           df_piacenza['humidity'].max(), df_milano['humidity'].max(), df_asti['humidity'].max(),
           df_torino['humidity'].max()]
fig, ax = plt.subplots()
plt.plot(dist, hum_max, 'bo')
# Minimum humidity of each city, drawn on the same axes
hum_min = [df_ravenna['humidity'].min(), df_cesena['humidity'].min(), df_faenza['humidity'].min(),
           df_ferrara['humidity'].min(), df_bologna['humidity'].min(), df_mantova['humidity'].min(),
           df_piacenza['humidity'].min(), df_milano['humidity'].min(), df_asti['humidity'].min(),
           df_torino['humidity'].min()]
plt.plot(dist, hum_min, 'ro')
plt.show()

Results:

Both the maximum and the minimum humidity are higher in the coastal cities than in the inland ones. However, this does not show a linear relationship between humidity and distance from the sea, or any other relationship that a curve could describe. Ten data points are too few to describe such a trend.
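Both this section and the temperature section build their per-city maximum and minimum lists by writing out one max() or min() call per city. A more compact alternative is sketched below. It assumes the city DataFrames are first stacked into one table (for example with pd.concat) that has a 'city' column. A single groupby then returns every city's distance, maximum and minimum at once. The two hypothetical cities and their values are invented for illustration.

```python
import pandas as pd

# Invented readings for two hypothetical cities (the article uses ten real ones)
df = pd.DataFrame({
    'city': ['CityA', 'CityA', 'CityA', 'CityB', 'CityB', 'CityB'],
    'dist': [8, 8, 8, 250, 250, 250],
    'temp': [24.1, 28.3, 26.0, 22.5, 31.2, 29.8],
})

# One row per city: distance from the sea plus maximum and minimum temperature
summary = df.groupby('city').agg(dist=('dist', 'first'),
                                 temp_max=('temp', 'max'),
                                 temp_min=('temp', 'min'))
# Order by distance so the coastal cities come first, as in the dist list
summary = summary.sort_values('dist')

dist = summary['dist'].tolist()
temp_max = summary['temp_max'].tolist()
temp_min = summary['temp_min'].tolist()
print(summary)
```

The same pattern works for humidity by aggregating the 'humidity' column instead of 'temp'.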

3. Wind Frequency Rose Chart
Two of the weather quantities collected for each city relate to the wind:
wind_deg (wind direction)
wind_speed (wind speed)

Each city's DataFrame shows that every wind speed value is tied to a time of day. It is also tied to a direction between 0 and 360 degrees, because each measurement records the direction the wind blows.
Visualizing this kind of data makes it easier to analyze. For wind data, though, a line chart in a Cartesian coordinate system is not the best choice.

For example, a scatter plot of the data points in a DataFrame is hard to read:

# A scatter plot is not very intuitive, for example:
fig, ax = plt.subplots()
plt.plot(df_ravenna['wind_deg'], df_ravenna['wind_speed'], 'ro')
plt.show()

Results:

Data points distributed over 360 degrees are better shown with a different kind of visualization: the polar chart.

First, build a histogram. Divide the 360 degrees into eight bins of 45 degrees each, and count how many data points fall into each bin.

# Data points distributed over 360 degrees: use a polar chart instead.
# Divide the 360 degrees into eight equal 45-degree bins and count
# how many data points fall into each bin
hist, bins = np.histogram(df_ravenna['wind_deg'], 8, [0, 360])
print(hist)
print(bins)

The hist array returned by histogram() contains the number of data points that fall into each bin:

[0 5 11 1 0 1 0 0]

The bins array contains the edges of the bins over the 360-degree range:

[  0.  45.  90. 135. 180. 225. 270. 315. 360.]
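np.histogram treats every bin as half-open, [0, 45), [45, 90), and so on. The exception is the last bin, [315, 360], which also includes its right edge. The check below runs on made-up wind directions (not measured data) and shows where values that sit exactly on an edge end up.

```python
import numpy as np

# Made-up wind directions in degrees, including values on the bin edges
wind_deg = np.array([0, 44.9, 45, 100, 200, 359, 360])

# Eight 45-degree bins over [0, 360], the same call as for Ravenna
hist, bins = np.histogram(wind_deg, 8, [0, 360])
print(hist)   # [2 1 1 0 1 0 0 2]: 45 opens the second bin, 360 joins 359 in the last
print(bins)
```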

These two arrays are needed to define the polar chart. Define a function, showrosewind(), that draws the chart. It takes three parameters:
values: the data to plot, that is, the hist array;
city_name: a string with the name of the city, used as the chart title;
max_value: an integer giving the value that maps to the deepest blue.

def showrosewind(values, city_name, max_value):
    N = 8

    # Angle of each of the N sectors: theta = [0, pi/4, pi/2, ..., 7*pi/4]
    theta = np.arange(0., 2 * np.pi, 2 * np.pi / N)
    # Sector lengths; NaN (for example the mean of an empty bin) is drawn as 0
    radii = np.nan_to_num(np.array(values, dtype=float))
    # Create the figure and the polar coordinate system
    fig = plt.figure()
    ax = fig.add_axes([0.025, 0.025, 0.95, 0.95], polar=True)

    # RGB value of each sector: the larger x, the closer the color is to blue
    colors = [(1 - x / max_value, 1 - x / max_value, 0.75) for x in radii]

    # Draw the sectors; align='edge' makes sector i start at theta[i] and span
    # one 45-degree bin, matching the histogram bin it represents
    ax.bar(theta, radii, width=(2 * np.pi / N), bottom=0.0, color=colors, align='edge')

    # Title of the polar chart
    ax.set_title(city_name, x=0.2, fontsize=20)
    plt.show()

The colors variable holds the color of each sector, and you can change it to suit your needs. Here, the closer a sector's color is to blue, the greater the value it represents. Once the function is defined, call it:

showrosewind(hist, 'Ravenna', max(hist))

Results:

The full 360 degrees are divided into eight regions (bins), each spanning an arc of 45 degrees, and a radial scale gives the values. In each region, a sector of variable radius represents a value: the longer the radius, the larger the value. To make the chart easier to read, the sector colors follow a color scale tied to the radius. The longer the radius, the closer the color is to dark blue.

The polar chart shows how the wind direction was distributed. On that day the wind blew mostly toward the southwest and the west.
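The angle and color rules inside showrosewind() can be checked without drawing anything. In this sketch, sector_colors() is a hypothetical helper that simply pulls out the colors list comprehension. The input is the Ravenna hist array printed earlier. A value equal to max_value maps to (0, 0, 0.75), a dark blue, and a value of 0 maps to (1, 1, 0.75), a pale yellow.

```python
import numpy as np

N = 8
# Same angle rule as in showrosewind(): one angle per 45-degree sector, in radians
theta = np.arange(0., 2 * np.pi, 2 * np.pi / N)

def sector_colors(values, max_value):
    """Hypothetical helper: the RGB rule used for `colors` in showrosewind()."""
    radii = np.asarray(values, dtype=float)
    # Red and green shrink as the value grows; blue stays at 0.75
    return [(float(1 - x / max_value), float(1 - x / max_value), 0.75) for x in radii]

hist = [0, 5, 11, 1, 0, 1, 0, 0]   # Ravenna bin counts shown earlier
colors = sector_colors(hist, max(hist))

print(np.degrees(theta))   # sector angles in degrees
print(colors[2])           # the largest count, (0.0, 0.0, 0.75)
```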

With showrosewind() defined, it is just as easy to look at the wind in other cities:

# Other cities
hist, bins = np.histogram(df_ferrara['wind_deg'], 8, [0, 360])
print(hist)
showrosewind(hist, 'Ferrara', max(hist))
hist, bins = np.histogram(df_milano['wind_deg'], 8, [0, 360])
print(hist)
showrosewind(hist, 'Milano', max(hist))

Results:


Distribution of the mean wind speed

Other wind data, such as wind speed, can also be represented with a polar chart.

Define a function, rosewind_speed(), that computes the mean wind speed in each of the eight 45-degree bins into which the 360 degrees are divided.

# Distribution of the mean wind speed
def rosewind_speed(df_city):
    degs = np.arange(45, 361, 45)
    tmp = []
    for deg in degs:
        # Select the measurements whose wind direction is at least deg - 45
        # and less than deg (the same 45-degree bins as the histogram)
        in_bin = (df_city['wind_deg'] >= deg - 45) & (df_city['wind_deg'] < deg)
        # Mean wind speed in this bin (NaN if the bin is empty)
        tmp.append(df_city.loc[in_bin, 'wind_speed'].mean())
    return np.array(tmp)

rosewind_speed() returns a NumPy array containing the eight mean wind speeds. Pass this array as the first argument of the showrosewind() function defined earlier to draw the polar chart:

hist, bins = np.histogram(df_ravenna['wind_deg'], 8, [0, 360])
print(hist)
print(bins)
showrosewind(rosewind_speed(df_ravenna), 'Ravenna', max(hist))
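For comparison, the same per-bin mean can be written with pandas' pd.cut and groupby. This minimal sketch runs on invented measurements for one hypothetical city. With right=False the bins are [0, 45), [45, 90), ..., [315, 360). Passing observed=False keeps the empty bins, whose mean comes out as NaN, just as .mean() on an empty selection does in rosewind_speed().

```python
import numpy as np
import pandas as pd

# Invented measurements for one hypothetical city (not real data)
df_city = pd.DataFrame({
    'wind_deg': [10, 30, 50, 80, 200, 350],
    'wind_speed': [2.0, 4.0, 1.0, 3.0, 5.0, 6.0],
})

edges = np.arange(0, 361, 45)   # 0, 45, ..., 360
# right=False gives half-open bins [0, 45), [45, 90), ..., [315, 360)
sector = pd.cut(df_city['wind_deg'], bins=edges, right=False)
# Mean speed per bin; observed=False keeps empty bins (their mean is NaN)
means = df_city.groupby(sector, observed=False)['wind_speed'].mean()
mean_speed = means.to_numpy()
print(mean_speed)
```

A direction of exactly 360 would fall outside these bins, and pd.cut returns NaN for it. np.histogram, by contrast, puts 360 in the last bin.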