Using Machine Learning to Predict the Weather: Part 3 (Neural Networks)

Overview

This is the final article in a series on using machine learning to predict the average temperature. For this last installment, I will use Google's open source machine learning framework, TensorFlow, to build a neural network regressor. For an introduction to TensorFlow and installation instructions, a quick web search will turn up plenty of material; I will not cover them here.

This article covers the following topics:
  • Understanding artificial neural network theory
  • TensorFlow's high-level API: Estimators
  • Building a DNN model to predict the weather

Artificial neural network theory

The previous article explained how to build a linear regression model, one of the most basic machine learning algorithms, to predict the daily average temperature in Lincoln, Nebraska. Linear regression models are very effective and can be used for numerical predictions (such as weather forecasting) and, in adapted forms, for categorical ones (classification). Their main limitation is that they assume a linear relationship between the features and the target.
Countless data mining and machine learning algorithms exist for scenarios where the relationship is non-linear. In recent years, the most popular of these has been the neural network, which can be applied to many problems across machine learning. Neural networks are able to learn both linear and non-linear relationships.
Neural networks are inspired by biological neurons in the brain, which work in a complex network of interactions to transmit, collect, and learn information based on a history of the information that has already been collected. The computational neural networks we are interested in are similar in that they are a collection of neurons (nodes) that receive input signals (numbers), process them, and send the processed signals on to other neurons downstream in the network. This processing of signals as numerical values passing through the network is a very powerful feature that is not limited to linear relationships.
In this series I have been focusing on a specific type of machine learning called supervised learning, in which the outcomes for the training data are known and the goal is to predict the output for future inputs from the known historical inputs and outputs. In addition, the predictions are real-valued numbers, which means we are working with a regression algorithm.
Graphically, a neural network like the one described in this article looks like the following diagram:

The neural network depicted above contains an input layer on the far left, X1 and X2 in the diagram, which are the features fed into the network. These two features are processed and passed along through two layers of neurons known as hidden layers. The depiction shows two hidden layers, each containing three neurons (nodes). The signal then exits the network and is aggregated at the output layer into a single numerical predicted value.
Let me take a moment to explain the meaning of the arrows, which represent data passing from node to node between layers. Each arrow represents a mathematical transformation of a value: the value at the base of the arrow is multiplied by a weight specific to that path. Each node in a layer receives a value in this way from every incoming arrow, and then sums all the values that converge on it. This is the linear aspect of the neural network that I mentioned earlier.

After the values are summed at each node, a special non-linear function is applied to the sum, depicted in the image above as Fn(...). This function, which introduces non-linear characteristics into the network, is called an activation function. It is the non-linearity of the activation functions that gives multi-layer neural networks their power. If no non-linearity were added, all the layers would effectively collapse into a single linear operation: multiplying the inputs by a fixed set of coefficients (i.e., a linear model).
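To make the arithmetic in the diagram concrete, here is a minimal sketch in plain Python of a forward pass through a network shaped like the one described above (two inputs, two hidden layers of three nodes, one output). The weights are arbitrary illustrative values, relu stands in for Fn(...), and bias terms are left out for brevity; none of this comes from the model trained later in the article. The second half checks the claim that, without an activation function, the layers collapse into a single set of linear coefficients.

```python
def relu(values):
    """Activation function: replace negative sums with zero."""
    return [max(0.0, v) for v in values]

def identity(values):
    """No activation: pass the weighted sums through unchanged."""
    return list(values)

def dense(inputs, weights):
    """Weighted sum at each node: one row of weights per node in the layer."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

# Arbitrary illustrative weights (assumed values, not learned ones).
W1 = [[0.5, -1.0], [1.5, 0.25], [-0.75, 1.0]]                   # 2 inputs -> 3 nodes
W2 = [[1.0, -0.5, 0.25], [0.5, 0.5, -1.0], [-1.0, 0.75, 0.5]]   # 3 nodes -> 3 nodes
W3 = [[0.5, -1.0, 2.0]]                                         # 3 nodes -> 1 output

def forward(x, activation):
    """Push the inputs through both hidden layers and the output layer."""
    h1 = activation(dense(x, W1))
    h2 = activation(dense(h1, W2))
    return dense(h2, W3)[0]

def matmul(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Without an activation, the three layers reduce to one row of coefficients.
collapsed = matmul(W3, matmul(W2, W1))[0]

x = [2.0, 1.0]
linear_out = forward(x, identity)
single_layer_out = sum(c * xi for c, xi in zip(collapsed, x))
nonlinear_out = forward(x, relu)

print(linear_out, single_layer_out, nonlinear_out)
```

The first two printed values agree, which is the collapse described above, while the ReLU version gives a different answer because the negative sums were zeroed along the way.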
That is all well and good, but how does this turn into a learning algorithm? The most direct answer is that the model's predictions (its output "y") are evaluated against the actual expected values (the targets), and a series of adjustments are made to the weights to improve the overall prediction accuracy.
In the world of regression algorithms, accuracy is evaluated with a cost (also called "loss" or "objective") function, such as the sum of squared errors (SSE). Note that this statement applies across the whole of machine learning, not just to neural networks. In the previous article, the ordinary least squares algorithm did exactly this: it found the combination of coefficients that minimized the sum of squared errors.
Our neural network regressor will do the same thing. It iterates over the training data, feeding in the feature values, calculating the cost function (using SSE), and adjusting the weights in a way that minimizes the cost. This process of iteratively pushing features through the algorithm and evaluating how to adjust the weights based on the cost function is, in essence, what is known as model optimization.
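As a rough illustration of that loop, the sketch below fits a one-feature linear model by repeatedly computing the SSE and nudging the weights against its gradient. The data, the learning rate, and the plain gradient-descent update are assumptions chosen for brevity; TensorFlow's optimizers are more sophisticated, but the cycle of feeding data, scoring, and adjusting is the same idea.

```python
# Toy data generated from y = 2x + 1 (assumed, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def sse(w, b):
    """Cost function: sum of squared errors over the training data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))

w, b = 0.0, 0.0          # initial weights
learning_rate = 0.01     # assumed step size
history = [sse(w, b)]    # cost before any adjustment

for _ in range(2000):
    # Gradient of the SSE with respect to each weight.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys))
    # Adjust the weights in the direction that lowers the cost.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    history.append(sse(w, b))

print("w = %.3f, b = %.3f, final SSE = %.6f" % (w, b, history[-1]))
```

Because the toy data is exactly linear, the weights settle on the slope and intercept that generated it and the cost falls toward zero.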
Model optimization algorithms are very important in building robust neural networks. As examples are fed through the network architecture (that is, its width and depth) and evaluated against the cost function, the weights are adjusted. The model is said to be "learning" as the optimizer works out which weight adjustments do and do not reduce the computed cost.

TensorFlow Estimator API

TensorFlow is made up of several parts, the most fundamental of which is the Core API. It provides a low-level set of symbolic operations with which users can define and train essentially any machine learning algorithm. This is TensorFlow's core functionality, and although the Core API can handle most scenarios, in this article I am focusing on the Estimator API.
The TensorFlow team developed the Estimator API to make the library more accessible to everyday developers. This high-level API provides a common interface for training models, evaluating models, and predicting outcomes for unseen data, similar to that of the Scikit-Learn library, and it achieves this by implementing the same interface across various algorithms. The high-level API also has machine learning best practices, abstractions, and scalability built in.
All of these benefits come as a set of tools implemented in the base Estimator class, together with a number of pre-packaged model types. This lowers the barrier to entry for using TensorFlow so that it can be applied to everyday problems. By abstracting away details such as writing training loops or managing sessions, it lets developers focus on more important things, such as rapidly experimenting with multiple models and model architectures to find the one that best suits their needs.
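The shape of that common interface can be sketched without TensorFlow at all. The toy class below is hypothetical (MeanBaselineRegressor, make_input_fn, and their internals are not part of any library); it only mirrors the train / evaluate / predict method names that the Estimator API exposes, with each method taking a function that supplies the data.

```python
class MeanBaselineRegressor:
    """Hypothetical toy estimator: always predicts the mean of the targets it saw."""

    def __init__(self):
        self.mean_ = 0.0

    def train(self, input_fn):
        """Learn the single parameter of this model: the mean target."""
        _, targets = input_fn()
        self.mean_ = sum(targets) / len(targets)
        return self  # allow chaining

    def predict(self, input_fn):
        """Return one prediction per feature row."""
        features, _ = input_fn()
        return [self.mean_ for _ in features]

    def evaluate(self, input_fn):
        """Score predictions against known targets."""
        _, targets = input_fn()
        predictions = self.predict(input_fn)
        total = sum((p - t) ** 2 for p, t in zip(predictions, targets))
        return {"loss": total, "average_loss": total / len(targets)}


def make_input_fn(features, targets=None):
    """Wrap in-memory data in a zero-argument function, as the methods expect."""
    return lambda: (features, targets)


train_fn = make_input_fn([[1.0], [2.0], [3.0]], [10.0, 12.0, 14.0])
model = MeanBaselineRegressor().train(train_fn)
metrics = model.evaluate(train_fn)
predictions = model.predict(make_input_fn([[4.0], [5.0]]))
print(metrics, predictions)
```

Any other model that offers the same three methods could be swapped in without changing the calling code, which is the point of a shared interface.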
In this article, I will describe how to use one of the very powerful deep neural network estimators, the DNNRegressor.

Build a DNNRegressor to predict the weather

Let's first import some of the libraries we need to use.

import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.metrics import explained_variance_score, \
    mean_absolute_error, \
    median_absolute_error
from sklearn.model_selection import train_test_split

Now let's load the data. All of it is available in my GitHub repository, which you can clone.

# read in the csv data into a pandas data frame and set the date as the index
df = pd.read_csv('end-part2_df.csv').set_index('date')

# execute the describe() function and transpose the output so that it doesn't overflow the width of the screen
df.describe().T


# execute the info() function
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 997 entries, 2015-01-04 to 2017-09-27
Data columns (total 39 columns):
meantempm          997 non-null int64
maxtempm           997 non-null int64
mintempm           997 non-null int64
meantempm_1        997 non-null float64
meantempm_2        997 non-null float64
meantempm_3        997 non-null float64
meandewptm_1       997 non-null float64
meandewptm_2       997 non-null float64
meandewptm_3       997 non-null float64
meanpressurem_1    997 non-null float64
meanpressurem_2    997 non-null float64
meanpressurem_3    997 non-null float64
maxhumidity_1      997 non-null float64
maxhumidity_2      997 non-null float64
maxhumidity_3      997 non-null float64
minhumidity_1      997 non-null float64
minhumidity_2      997 non-null float64
minhumidity_3      997 non-null float64
maxtempm_1         997 non-null float64
maxtempm_2         997 non-null float64
maxtempm_3         997 non-null float64
mintempm_1         997 non-null float64
mintempm_2         997 non-null float64
mintempm_3         997 non-null float64
maxdewptm_1        997 non-null float64
maxdewptm_2        997 non-null float64
maxdewptm_3        997 non-null float64
mindewptm_1        997 non-null float64
mindewptm_2        997 non-null float64
mindewptm_3        997 non-null float64
maxpressurem_1     997 non-null float64
maxpressurem_2     997 non-null float64
maxpressurem_3     997 non-null float64
minpressurem_1     997 non-null float64
minpressurem_2     997 non-null float64
minpressurem_3     997 non-null float64
precipm_1          997 non-null float64
precipm_2          997 non-null float64
precipm_3          997 non-null float64
dtypes: float64(36), int64(3)
memory usage: 311.6+ KB

Note that we have just under 1,000 records of meteorological data, and that all of the features are numerical. Also, thanks to the hard work done in the first article, all of the records are complete: none of them is missing a value (there are no null values).
Now I will remove the "mintempm" and "maxtempm" columns, as they do nothing to help us predict the average temperature. We are trying to predict the future, so we obviously cannot have data about the future. I will also separate the features (X) from the target (y).

# remove the maxtempm and mintempm columns from the dataframe
df = df.drop(['mintempm', 'maxtempm'], axis=1)

# X will be a pandas dataframe of all columns except meantempm
X = df[[col for col in df.columns if col != 'meantempm']]

# y will be a pandas series of the meantempm
y = df['meantempm']

As with all supervised machine learning applications, I will divide my dataset into a training set and a test set. However, to better explain the iterative process of training this neural network, I will use an additional dataset that I will refer to as the "validation set". The training set will use 80% of the data, and the test and validation sets will each use 10%. To split the data, I will again use the train_test_split() function from the Scikit-Learn library.

# split data into training set and a temporary set using sklearn.model_selection.train_test_split
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=23)

# take the remaining 20% of data in X_tmp, y_tmp and split them evenly
X_test, X_val, y_test, y_val = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=23)

X_train.shape, X_test.shape, X_val.shape
print("Training instances   {}, Training features   {}".format(X_train.shape[0], X_train.shape[1]))
print("Validation instances {}, Validation features {}".format(X_val.shape[0], X_val.shape[1]))
print("Testing instances    {}, Testing features    {}".format(X_test.shape[0], X_test.shape[1]))

Training instances   797, Training features   36
Validation instances 100, Validation features 36
Testing instances    100, Testing features    36
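To see where the 80/10/10 proportions come from, here is a small standard-library sketch of the same two-stage split applied to row indices. It is a stand-in for train_test_split(), not a reimplementation: the helper split_indices is hypothetical, it assumes the held-out share is rounded up, and its shuffle will not reproduce scikit-learn's row assignment for random_state=23.

```python
import math
import random

def split_indices(indices, test_fraction, seed):
    """Shuffle the indices and cut off a held-out share (rounded up)."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

all_rows = range(997)  # the dataset has 997 records

# Stage 1: hold out 20% of the rows as a temporary set.
train_rows, tmp_rows = split_indices(all_rows, 0.2, seed=23)
# Stage 2: split the temporary set evenly into test and validation rows.
test_rows, val_rows = split_indices(tmp_rows, 0.5, seed=23)

print(len(train_rows), len(val_rows), len(test_rows))
```

Each row ends up in exactly one of the three sets, so nothing the model is validated or tested on has been seen during training.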

The first step in building a neural network model is to instantiate the tf.estimator.DNNRegressor() class. The class constructor takes many parameters, but I will focus on the following:
  • feature_columns: a list-like structure containing definitions of the names and data types of the features being fed into the model
  • hidden_units: a list-like structure containing definitions of the width and depth of the neural network
  • optimizer: an instance of a tf.Optimizer subclass that optimizes the model's weights during training; its default is the Adagrad optimizer
  • activation_fn: the activation function used to introduce non-linearity into the network at each layer; the default is ReLU
  • model_dir: a directory to be created that will contain metadata and other checkpoint saves for the model
I will first define a list of numeric feature columns. To do this, I use the tf.feature_column.numeric_column() function, which returns a FeatureColumn instance, once for each column of X, and collect the results in a list named feature_cols.

With the feature columns defined, I can now instantiate the DNNRegressor class and store it in the regressor variable. I specify that I want a neural network that is two layers deep, with both layers 50 nodes wide. I also indicate that I want my model data stored in a directory called tf_wx_model.

regressor = tf.estimator.DNNRegressor(feature_columns=feature_cols,
                                      hidden_units=[50, 50],
                                      model_dir='tf_wx_model')

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_tf_random_seed': 1, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_model_dir': 'tf_wx_model', '_log_step_count_steps': 100, '_keep_checkpoint_every_n_hours': 10000, '_save_summary_steps': 100, '_keep_checkpoint_max': 5, '_session_config': None}

Next, I want to define a reusable function, commonly referred to as an "input function", which I will call wx_input_fn(). This function will be used to feed data into my neural network during the training and testing phases. There are many different ways to build input functions, but I will describe how to define and use one based on tf.estimator.inputs.pandas_input_fn(), since my data is in pandas data structures.

def wx_input_fn(X, y=None, num_epochs=None, shuffle=True, batch_size=400):
    return tf.estimator.inputs.pandas_input_fn(x=X,
                                               y=y,
                                               num_epochs=num_epochs,
                                               shuffle=shuffle,
                                               batch_size=batch_size)

Note that wx_input_fn() takes one required parameter and four optional ones, which are then handed to a TensorFlow input function designed specifically for pandas data, and that function is returned. This is a very powerful feature of the TensorFlow API. The parameters are defined as follows:
  • X: the input features to be fed into one of the three DNNRegressor interface methods (train, evaluate, and predict)
  • y: the target values for X, which are optional and will not be supplied to the predict call
  • num_epochs: an optional parameter. An epoch occurs each time the algorithm completes one pass over the entire dataset.
  • shuffle: an optional parameter that specifies whether to randomly select a batch (subset) of the dataset each time the algorithm executes
  • batch_size: the number of samples to include each time the algorithm executes
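The interplay of num_epochs, shuffle, and batch_size can be illustrated with a plain-Python generator. This is a simplified sketch of the idea, not how pandas_input_fn() is implemented; the helper name iter_batches and its seed parameter are made up for the example.

```python
import itertools
import random

def iter_batches(records, batch_size, num_epochs=None, shuffle=True, seed=0):
    """Yield batches of records; num_epochs=None repeats the data forever."""
    rng = random.Random(seed)
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        order = list(records)
        if shuffle:
            rng.shuffle(order)  # new random order on every pass
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]

records = list(range(797))  # same size as the training set

# One ordered pass, as used for evaluation and prediction.
one_pass = list(iter_batches(records, batch_size=400, num_epochs=1, shuffle=False))
print([len(batch) for batch in one_pass])

# num_epochs=None never runs out, so the caller decides how many steps to take.
endless = iter_batches(records, batch_size=400, num_epochs=None, shuffle=True)
first_five = list(itertools.islice(endless, 5))
print([len(batch) for batch in first_five])
```

With num_epochs=None the generator is endless, which is why a separate steps argument is needed to tell training when to stop.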

With the input function defined, we can now train our neural network on the training dataset. Readers who are familiar with the TensorFlow high-level API may notice that the way I train my model is somewhat unconventional, at least compared with the current tutorials on the TensorFlow website and elsewhere on the web. Typically, you will see something like the following.

regressor.train(input_fn=input_fn(training_data, num_epochs=None, shuffle=True), steps=some_large_number)

.....
lots of log info
.....

The author then jumps straight to demonstrating the evaluate() function, with barely a hint of what it does or why this line of code is there.

regressor.evaluate(input_fn=input_fn(eval_data, num_epochs=1, shuffle=False), steps=1)

.....
less log info
.....

After that, on the assumption that the trained model is perfect, they jump straight to calling the predict() function.

predictions = regressor.predict(input_fn=input_fn(pred_data, num_epochs=1, shuffle=False), steps=1)

I would like to offer a more reasoned explanation of how to train and evaluate this neural network in a way that minimizes the risk of underfitting or overfitting the model to the training data. So, without further delay, let me define a simple training loop that trains on the training data and periodically evaluates on the validation data.

evaluations = []
STEPS = 400
for i in range(100):
    regressor.train(input_fn=wx_input_fn(X_train, y=y_train), steps=STEPS)
    evaluations.append(regressor.evaluate(input_fn=wx_input_fn(X_val,
                                                               y_val,
                                                               num_epochs=1,
                                                               shuffle=False)))

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into tf_wx_model/model.ckpt.
INFO:tensorflow:step = 1, loss = 1.11335e+07
INFO:tensorflow:global_step/sec: 75.7886
INFO:tensorflow:step = 101, loss = 36981.3 (1.321 sec)
INFO:tensorflow:global_step/sec: 85.0322
... A WHOLE LOT OF LOG OUTPUT ...
INFO:tensorflow:step = 39901, loss = 5205.02 (1.233 sec)
INFO:tensorflow:Saving checkpoints for 40000 into tf_wx_model/model.ckpt.
INFO:tensorflow:Loss for final step: 4557.79.
INFO:tensorflow:Starting evaluation at 2017-12-05-13:48:43
INFO:tensorflow:Restoring parameters from tf_wx_model/model.ckpt-40000
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2017-12-05-13:48:43
INFO:tensorflow:Saving dict for global step 40000: average_loss = 10.2416, global_step = 40000, loss = 1024.16
INFO:tensorflow:Starting evaluation at 2017-12-05-13:48:43
INFO:tensorflow:Restoring parameters from tf_wx_model/model.ckpt-40000
INFO:tensorflow:Finished evaluation at 2017-12-05-13:48:43
INFO:tensorflow:Saving dict for global step 40000: average_loss = 10.2416, global_step = 40000, loss = 1024.16

The loop above iterates 100 times. In the body of the loop, I call the train() method of the regressor object, passing it my reusable wx_input_fn(), which in turn is passed my training feature set and targets. I purposely left the num_epochs parameter at its default of None, which basically says: "I don't care how many times you pass over the training set, just keep training the algorithm on each batch of the default batch_size of 400" (roughly half the size of the training set). I also left the shuffle parameter at its default value of True so that data is selected randomly during training, which avoids any sequential relationships in the data. The final parameter to the train() method is steps, which I set to 400, meaning that the training set will be batched 400 times per loop.
This gives me a good opportunity to explain the meaning of an epoch in more concrete numerical terms. Recall that an epoch occurs when all the records of a training set have been passed through the neural network exactly once. So, if our training set has about 800 (797 to be exact) records and each batch selects 400, then every two batches complete one epoch. Thus, if we iterate over the training set for 100 iterations of 400 steps each, with a batch size of 400 (half an epoch per batch), we get:

(100 x 400 / 2) = 20,000 epochs
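Here is the same arithmetic in code, alongside the figure you get from the exact training-set size of 797 rather than the rounded 800. The variable names are just for illustration.

```python
iterations = 100        # passes through the training loop
steps_per_call = 400    # steps argument given to train()
batch_size = 400        # records per step
training_records = 797  # exact size of the training set

total_steps = iterations * steps_per_call
approx_epochs = total_steps / 2  # rounding the set to 800 records: 2 batches per epoch
exact_epochs = total_steps * batch_size / training_records

print("total steps: %d" % total_steps)
print("approximate epochs: %.0f" % approx_epochs)
print("exact epochs: %.1f" % exact_epochs)
```

The total of 40,000 steps matches the final checkpoint number in the training log, and the exact epoch count is only slightly above the rounded 20,000.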

Now you may be wondering why I call the evaluate() method on each iteration of the loop and capture its output in a list. Let me first explain what happens each time the train() method is fired. It randomly selects a batch of training records and pushes them through the network until a prediction is made, and the loss function is calculated for each record. Based on the calculated loss, the optimizer then adjusts the weights in a direction that should reduce the overall loss on the next iteration. Generally speaking, as long as the learning rate is small enough, these loss values gradually decline over time.
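The caveat about the learning rate is easy to demonstrate on a toy problem. The sketch below minimizes a one-weight loss, (w - 3)^2, with plain gradient descent. The loss function, the starting point, and both learning rates are assumptions picked to make the contrast obvious, and the update rule is simpler than the Adagrad optimizer that DNNRegressor uses by default.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

def descend(learning_rate, steps, w=0.0):
    """Run gradient descent and return the loss recorded after every step."""
    losses = []
    for _ in range(steps):
        w -= learning_rate * gradient(w)
        losses.append(loss(w))
    return losses

small = descend(learning_rate=0.1, steps=50)
large = descend(learning_rate=1.1, steps=50)

print("small rate: first %.4f, last %.3g" % (small[0], small[-1]))
print("large rate: first %.4f, last %.3g" % (large[0], large[-1]))
```

With the small rate the loss shrinks on every step; with the large rate each update overshoots the minimum by more than it corrected, so the loss grows instead.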
However, after a certain number of these learning iterations, the weights start to be influenced not just by the overall trends in the data but also by the uninformative noise inherent in virtually all real data. At this point the network is unduly influenced by the idiosyncrasies of the training data and becomes unable to generalize its predictions to the overall population of data. This relates to the shortcoming I mentioned earlier in many other tutorials on the high-level TensorFlow API. It is important to interrupt training periodically and evaluate how well the model generalizes to an evaluation, or validation, dataset. Let's take a moment to look at what the evaluate() function returns by examining the evaluation output from the first loop iteration.

evaluations[0]

{'average_loss': 31.116383, 'global_step': 400, 'loss': 3111.6382}

As you can see, it outputs the average loss (mean squared error) and the total loss (sum of squared errors) for the step in training at which the evaluation was made, which here is step 400. In a well-trained network, you will normally see the training and evaluation losses decrease more or less in parallel. In an overfit model, however, there comes a point at which the evaluation loss on the validation set stops decreasing, and that is where overfitting begins. This is where you want to stop training the model, preferably right before that change occurs.
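The two loss figures are related through the number of records evaluated. Assuming the 100 validation records were scored as a single batch (which the "Evaluation [1/1]" line in the log suggests), the total loss should be the average loss multiplied by 100, and the square root of the average loss gives a rough typical error in degrees. A quick check with the numbers from the first evaluation:

```python
import math

first_eval = {'average_loss': 31.116383, 'global_step': 400, 'loss': 3111.6382}
validation_records = 100  # size of the validation set

# Total loss (SSE) should equal the per-record average (MSE) times the record count.
reconstructed_total = first_eval['average_loss'] * validation_records

# Root mean squared error, in the same units as the target (degrees Celsius).
rmse = math.sqrt(first_eval['average_loss'])

print("reconstructed total loss: %.4f" % reconstructed_total)
print("RMSE after 400 steps: %.2f" % rmse)
```

Because the total scales with the number of records, the average loss is the more convenient figure for comparing evaluations made on sets of different sizes.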
Now that we have an evaluation for each iteration, let's plot them as a function of training steps to make sure we have not overtrained our model. To do this, I will use a simple scatter plot from matplotlib's pyplot module.

import matplotlib.pyplot as plt
%matplotlib inline

# manually set the parameters of the figure to an appropriate size
# ([14, 10], width and height in inches, is an assumed value; adjust to taste)
plt.rcParams['figure.figsize'] = [14, 10]

loss_values = [ev['loss'] for ev in evaluations]
training_steps = [ev['global_step'] for ev in evaluations]

plt.scatter(x=training_steps, y=loss_values)
plt.xlabel('Training steps (Epochs = steps / 2)')
plt.ylabel('Loss (SSE)')
plt.show()

From the chart above, it appears that after all those iterations I have not overfit the model, because the evaluation losses never exhibit a significant change in direction toward increasing values. Now I can safely move on to making predictions on my remaining test dataset and assess how the model does at predicting average temperatures.
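The same judgment can be made programmatically rather than by eye. The helper below is a hypothetical sketch (first_overfit_step and its patience parameter are not part of TensorFlow): it scans a list of evaluation dictionaries shaped like the ones collected above and, if the loss fails to improve for patience evaluations in a row, reports the step at which the loss was lowest. The sample numbers are invented for the demonstration.

```python
def first_overfit_step(evaluations, patience=3):
    """Return the global_step of the best evaluation once loss has failed to
    improve for `patience` evaluations in a row, or None if that never happens."""
    best_loss = float("inf")
    best_step = None
    stale = 0
    for ev in evaluations:
        if ev["loss"] < best_loss:
            best_loss, best_step, stale = ev["loss"], ev["global_step"], 0
        else:
            stale += 1
            if stale >= patience:
                return best_step
    return None

# Invented example values, not results from the weather model.
steadily_improving = [{"global_step": 400 * (i + 1), "loss": 3000.0 / (i + 1)}
                      for i in range(10)]
turning_upward = [{"global_step": 400 * (i + 1), "loss": loss}
                  for i, loss in enumerate([3000, 2000, 1500, 1400, 1450, 1500, 1600])]

print(first_overfit_step(steadily_improving))
print(first_overfit_step(turning_upward))
```

A result of None corresponds to the situation in the chart, where the validation loss never turns upward; a step number marks roughly where training would have been better stopped.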
As with the other two regressor methods I have demonstrated, the predict() method requires an input_fn. I will pass one in using the reusable wx_input_fn(), handing it the test dataset and setting num_epochs to 1 and shuffle to False, so that it feeds all of the data to be tested sequentially, exactly once.
Next, I do some formatting of the iterable of dicts returned from the predict() method so that I have a numpy array of predictions. I then pass the array of predictions to the sklearn functions explained_variance_score(), mean_absolute_error(), and median_absolute_error() to measure how well the predictions fared against the known targets y_test.

pred = regressor.predict(input_fn=wx_input_fn(X_test,
                                              num_epochs=1,
                                              shuffle=False))
predictions = np.array([p['predictions'][0] for p in pred])

print("The Explained Variance: %.2f" % explained_variance_score(y_test, predictions))
print("The Mean Absolute Error: %.2f degrees Celsius" % mean_absolute_error(y_test, predictions))
print("The Median Absolute Error: %.2f degrees Celsius" % median_absolute_error(y_test, predictions))

INFO:tensorflow:Restoring parameters from tf_wx_model/model.ckpt-40000
The Explained Variance: 0.88
The Mean Absolute Error: 3.11 degrees Celsius
The Median Absolute Error: 2.51 degrees Celsius
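For readers curious about what the three scikit-learn metrics compute, here is a standard-library sketch of their definitions on a tiny made-up example. These are simplified re-implementations for single-output regression with hypothetical function names, not the scikit-learn code, and the sample values are invented.

```python
import statistics

def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(targets); 1.0 is a perfect score."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - statistics.pvariance(residuals) / statistics.pvariance(y_true)

def mean_abs_error(y_true, y_pred):
    """Average size of the prediction errors."""
    return statistics.mean(abs(t - p) for t, p in zip(y_true, y_pred))

def median_abs_error(y_true, y_pred):
    """Middle value of the error sizes; less sensitive to outliers."""
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))

# Invented values, not results from the weather model.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.5]

print("explained variance: %.4f" % explained_variance(y_true, y_pred))
print("mean absolute error: %.4f" % mean_abs_error(y_true, y_pred))
print("median absolute error: %.4f" % median_abs_error(y_true, y_pred))
```

The two error metrics are in the units of the target, which is why the article reports them in degrees, while explained variance is a unitless score.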

I have used the same metrics as in the previous article on linear regression so that we can not only evaluate this model but also compare the two. As you can see, the two models performed quite similarly, with the simpler linear regression model being slightly better. However, you may be able to improve this model by tuning parameters such as the learning rate, width, and depth.

Summary

This article has demonstrated how to use the TensorFlow high-level Estimator API subclass DNNRegressor. Along the way, I have described some neural network theory, how neural networks are trained, and the importance of being aware of the danger of overfitting a model.
To demonstrate this process of building a neural network, I built a model that predicts the average temperature for the next day based on the numerical features collected in the first article of this series. That said, the purpose of these articles was not to build a very good weather prediction model. My goals have been to:
  • Demonstrate the general process of an analytics (machine learning, data science, whatever...) project, from data collection, data processing, exploratory data analysis, and model selection through to model building and model evaluation.
  • Demonstrate how to select meaningful features that do not violate key assumptions of the linear regression technique, using two popular Python libraries, Statsmodels and Scikit-Learn.
  • Demonstrate how to use a high-level TensorFlow API and give some intuition about what is happening under all those layers of abstraction.
  • Discuss the issues associated with overfitting a model.
  • Explain the importance of experimenting with more than one model type to best solve a problem.

Related articles

Using Machine Learning to Predict the Weather: Part 2
Using Machine Learning to Predict the Weather: Part 1
English original: http://www.bugcode.cn/mlweatherpart03.html
