Python machine learning notes: Using Keras for multi-class classification

Source: Internet
Author: User

Keras is a Python library for deep learning that wraps the efficient numerical libraries Theano and TensorFlow.

This article shows how to load data from a CSV file and make it available to Keras, how to model multi-class classification data with a neural network, and how to use scikit-learn to evaluate Keras neural network models.

Preface, the concepts of binary and multi-class classification

(This preface organizes notes from another author's blog, 77774223.)

1, how can LR (logistic regression) do multi-class classification?

The LR model we usually meet is a binary classification model, but can LR be used for multi-class tasks? The answer is yes.

What we need to be aware of is that there are several ways to classify multiple classes with LR.

2, the idea of training multiple binary classifiers

Since LR is naturally a binary classifier, it is natural to split a multi-class task into several binary classification tasks.

There are three strategies in particular:

2.1 One-vs-one (OvO)

If there are N categories, we pair the N categories two by two, turning each pair into a binary classification problem. This gives N(N-1)/2 binary classifiers. (Put simply, it is the number of ways to choose 2 out of the N categories.)
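As a rough illustration of the pairing and voting idea, the sketch below enumerates every pair of classes with the standard library and tallies one vote per pair. The class names and the `toy_vote` rule are hypothetical stand-ins for trained binary classifiers; they are assumptions made for this example, not part of the original notes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical class labels; four classes give C(4, 2) = 6 pairs.
classes = ["c1", "c2", "c3", "c4"]
pairs = list(combinations(classes, 2))

def toy_vote(pair, favourite="c1"):
    """Stand-in for a trained binary classifier: it votes for
    `favourite` when that class is in the pair, else for the first class."""
    return favourite if favourite in pair else pair[0]

# Each pairwise classifier casts one vote; the most-voted class wins.
votes = Counter(toy_vote(pair) for pair in pairs)
prediction = votes.most_common(1)[0][0]

print(len(pairs))   # number of binary classifiers to train
print(prediction)   # class with the most votes
```

In a real setting each pair would have its own trained model, and ties between classes would need an explicit rule.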

Later, in the testing phase, we hand the new sample to all of these binary classifiers, which gives N(N-1)/2 classification results. The category predicted most often is used as the final prediction.

A concrete example makes this easier to understand. First pair the categories two by two (6 combinations, as for 4 categories). Within each pair, treat one category as the positive class and the other as the negative class (positive and negative are relative here; the only purpose is to convert the task to binary classification). Then train one binary classifier per pair, which gives 6 binary classifiers. The test sample is then predicted by all 6 classifiers. In the example, Category 1 is predicted most often, so the test sample belongs to Category 1.

2.2 One-vs-rest (OvR)

One-vs-rest is easier to understand: each time, one category is treated as the positive class and all remaining categories as the negative class. This gives N classifiers in total. At test time, if only one classifier predicts the positive class, its category is taken as the final classification result.

For example, with 4 categories, each category in turn is the positive class and the rest are the negative class, which gives 4 combinations; training a classifier for each combination gives 4 classifiers. The test sample is passed to all 4 classifiers. Only one classifier predicts the positive class, so its result is taken as the prediction: Classifier 2 predicts Category 2, so the sample belongs to Category 2.

Some readers may wonder whether the classifiers that predict the negative class are simply ignored. Yes: a negative prediction covers many possibilities, so the category can only be determined from a positive prediction. For example, Classifier 3 predicts the negative class, but its negative class contains Category 1, Category 2 and Category 4, so which one would it be?

2.3 Many-vs-many (MvM)

Many-vs-many treats several categories as the positive class and several categories as the negative class. This article does not cover the method; for details, see pp. 64-65 of Zhou Zhihua's "watermelon book".

3, a more direct way to do multi-class classification with LR

The methods above all train several binary classifiers. Is there a more direct way to extend LR to multiple classes? For binary LR, the probabilities of the positive and negative classes are, in their standard form:

$$P(y = 1 \mid x) = \frac{e^{w^\top x}}{1 + e^{w^\top x}}, \qquad P(y = 0 \mid x) = \frac{1}{1 + e^{w^\top x}}$$

For multi-class classification, only a simple modification is needed. Assuming the classification task has K categories, the probability of each category is as follows.

For class K:

$$P(y = K \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{w_j^\top x}}$$

For each of the remaining classes k = 1, ..., K-1:

$$P(y = k \mid x) = \frac{e^{w_k^\top x}}{1 + \sum_{j=1}^{K-1} e^{w_j^\top x}}$$

One, the problem description

In this tutorial, we will use the standard machine learning problem known as the Iris flower data set.

This data set is well studied and is a good problem for practicing neural networks, because all 4 input variables are numeric and share the same scale (centimeters). Each instance describes the measured properties of an observed flower, and the output variable is a specific iris species.

This is a multi-class classification problem, meaning that more than two classes need to be predicted; there are in fact three flower species. It is an important type of problem for practicing neural networks, because the three class values require special handling.

Because the iris data set is so well studied, we can expect a model to achieve an accuracy in the range of 95% to 97%, which provides a good target for developing our model.

You can download the iris data set from the UCI Machine Learning Repository and place it in the current working directory under the file name "iris.csv".

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica

Two, import classes and functions
We start by importing all the classes and functions needed in this article. This includes the functionality we need from Keras, as well as data loading from pandas and data preparation and model evaluation from scikit-learn.

import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

Three, initialize the random number generator
Next, we initialize the random number generator to a constant value (7).

This is important for making the results from this model exactly repeatable: it ensures that the random process of training the neural network model can be reproduced.

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
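To see why a fixed seed matters, here is a minimal sketch of the same principle using the standard library's `random` module instead of NumPy (that swap is my assumption, chosen so the example has no dependencies). The helper name `draw` is hypothetical.

```python
import random

def draw(seed, n=3):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

first = draw(7)
second = draw(7)
other = draw(8)

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```

The same idea applies to weight initialization and data shuffling: with the seed fixed, the random choices repeat from run to run.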

Four, load the data set
The data set can be loaded directly. Because the output variable contains strings, it is easiest to load the data with pandas. We can then split the attributes (columns) into input variables (X) and output variables (Y).

# load dataset
dataframe = pandas.read_csv("iris.csv", header=None)
dataset = dataframe.values
X = dataset[:, 0:4].astype(float)
Y = dataset[:, 4]
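To make the column split concrete, the following sketch performs the same separation on three rows taken from the data set, using only the standard library's `csv` module rather than pandas. The names `X_demo` and `Y_demo` are hypothetical and exist only in this example.

```python
import csv
import io

# Three rows in the same layout as iris.csv (four measurements, then the label).
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

rows = list(csv.reader(sample))
X_demo = [[float(value) for value in row[:4]] for row in rows]  # input variables
Y_demo = [row[4] for row in rows]                                # output labels

print(X_demo[0])
print(Y_demo)
```

The first four columns become numeric inputs, while the fifth stays a string until it is encoded.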

Five, encode the output variable
The output variable contains three different string values.

When modeling multi-class classification problems with neural networks, it is good practice to reshape the output attribute from a vector of class values into a matrix with one Boolean column per class value, indicating whether or not a given instance has that class value.

This is called one-hot encoding, or creating dummy variables from categorical variables.

For example, in this problem the three class values are Iris-setosa, Iris-versicolor and Iris-virginica.

In essence, a multi-class classification problem can be decomposed into several binary classification problems, and there are many ways to solve binary classification problems. Here we use an ANN (artificial neural network) in the Keras machine learning framework to solve the multi-class problem. The example we use is the iris flower data set from the well-known UCI Machine Learning Repository.
The multi-class problem is similar to the binary problem: the categorical output labels need to be converted into numerical variables. In binary classification, the labels are converted directly to (0, 1) (with a sigmoid output layer) or (-1, 1) (with a tanh output layer). Similarly, in multi-class classification we convert the labels to dummy variables: one-hot encoding turns the vector of output labels into a Boolean matrix that is 1 only in the column of the corresponding label and 0 everywhere else. Take the iris data as an example, and suppose we have the observations:

sample, label
1, Iris-setosa
2, Iris-versicolor
3, Iris-virginica
After one-hot encoding, the data looks as follows:

sample, Iris-setosa, Iris-versicolor, Iris-virginica
1, 1, 0, 0
2, 0, 1, 0
3, 0, 0, 1
Be careful not to convert the labels directly into numerical values such as 1, 2, 3. That turns the task into a regression problem rather than a classification problem, and the regression problem is the harder of the two. (With more categories, the range of the output value grows, and the activation function of the output layer can then only be linear.)

We can do this by first using the scikit-learn class LabelEncoder to encode the strings consistently as integers, and then using the Keras function to_categorical() to convert the integer vector to a one-hot encoding.

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
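The two encoding steps can be sketched in plain Python to show what they produce. This is an illustration under my own assumptions, not the library code: the hypothetical `one_hot` helper assigns integer codes in sorted label order (as LabelEncoder does) and emits integer 0/1 rows, whereas to_categorical returns a floating-point array.

```python
def one_hot(labels):
    """Map string labels to integer codes (sorted order) and then to 0/1 rows."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    codes = [index[name] for name in labels]
    rows = [[1 if i == code else 0 for i in range(len(classes))] for code in codes]
    return classes, codes, rows

classes, codes, rows = one_hot(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]
)
print(classes)
print(codes)
print(rows)
```

Each row contains exactly one 1, in the column of the sample's class.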
Six, define the neural network model
The Keras library provides wrapper classes that let you use neural network models developed with Keras in scikit-learn.

Keras provides the KerasClassifier class, which can be used as an Estimator in scikit-learn, the basic type of model in that library. KerasClassifier takes the name of a function as a parameter. This function must return the constructed neural network model, ready for training.

Below is a function that creates a baseline neural network for the iris classification problem. It creates a simple fully connected network with one hidden layer containing 8 neurons.

The hidden layer uses the rectifier activation function, which is good practice. Because we used one-hot encoding for the iris data set, the output layer must produce 3 output values, one for each class. The output with the largest value is taken as the class predicted by the model.

The network topology of this simple network with a single hidden layer can be summarized as:

4 inputs -> [8 hidden nodes] -> 3 outputs

Note that we use the "softmax" activation function in the output layer. This ensures that the output values lie between 0 and 1 and can be used as predicted probabilities.
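As a small worked example of what softmax does to the three output values, here is a dependency-free sketch. The raw scores are hypothetical numbers chosen for illustration, and the function is a plain-Python stand-in, not the Keras implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

raw_outputs = [2.0, 1.0, 0.1]  # hypothetical scores for the three classes
probs = softmax(raw_outputs)
predicted = probs.index(max(probs))  # index of the most probable class

print([round(p, 3) for p in probs])
print(predicted)
```

Every probability lies strictly between 0 and 1, and taking the index of the largest one gives the predicted class.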

Finally, the network uses an efficient Adam gradient descent optimization algorithm with a log loss function, which is called "categorical_crossentropy" in Keras.

# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
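To give a feel for the log loss that "categorical_crossentropy" refers to, the sketch below computes it for a single sample in plain Python. The probability vectors are made up for illustration, and the `eps` guard against log(0) is my own assumption rather than a statement about how Keras computes the loss internally.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Log loss for one sample: -sum(t * log(p)) over the classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

target = [0, 1, 0]               # one-hot row for the second class
confident = [0.05, 0.90, 0.05]   # hypothetical softmax outputs
unsure = [0.40, 0.30, 0.30]

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)

print(loss_confident)
print(loss_unsure)
```

Because the target row is one-hot, only the probability assigned to the true class contributes, so a confident correct prediction yields a smaller loss.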
We can now create our KerasClassifier for use in scikit-learn.

We can also pass arguments when constructing the KerasClassifier, and they are passed on to the internal fit() function used to train the neural network. Here, we set the number of epochs to 200 and the batch size to 5 for training the model. Setting verbose to 0 also turns off debug output during training.

estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)
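The pattern of handing over a build function plus training settings can be sketched with a toy class. Everything here (`TinyWrapper`, `EchoModel`) is hypothetical and only imitates the idea; it is not how KerasClassifier is actually written.

```python
class TinyWrapper:
    """Hypothetical stand-in for the wrapper pattern: store a build function
    and keyword arguments, and only build the model when fit() is called."""

    def __init__(self, build_fn, **fit_params):
        self.build_fn = build_fn
        self.fit_params = fit_params
        self.model = None

    def fit(self, X, y):
        self.model = self.build_fn()             # a fresh model for every fit
        self.model.fit(X, y, **self.fit_params)  # forward the stored settings
        return self


class EchoModel:
    """Toy model that just records how fit() was called."""

    def fit(self, X, y, **params):
        self.seen = params


wrapper = TinyWrapper(build_fn=EchoModel, epochs=200, batch_size=5, verbose=0)
wrapper.fit([[0.0]], [0])
print(wrapper.model.seen)
```

Deferring construction to fit() is what lets an evaluation routine train a fresh, untrained model on each data split.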
Seven, evaluate the model with k-fold cross-validation
Keras is a simple, modular neural network framework built on top of Theano or TensorFlow, so building a network with Keras is simpler than using TensorFlow directly. Here we use the KerasClassifier class provided by Keras. This class can be used as an Estimator (the basic type of model in the scikit-learn package), so with it we can easily call scikit-learn functions for data preprocessing and result evaluation.
For the network structure of the full implementation in section Eight, we use a 3-layer fully connected network with 4 nodes in the input layer, 10 nodes in the hidden layer, and 3 nodes in the output layer. The activation function of the hidden layer is relu (rectifier), and the activation function of the output layer is softmax. For the loss function we select categorical_crossentropy, which is computed by the Theano or TensorFlow backend. (For binary classification, one generally chooses activation = 'sigmoid' and loss = 'binary_crossentropy'.)
PS: For a multi-class network, enlarging the intermediate hidden layer can improve training accuracy, but the required computation time and memory also increase, so you need to experiment to choose an appropriate number of nodes; here we set it to 10. The dropout rate of each layer also needs to be tuned (too high and the model underfits, too low and it overfits); here we set it to 0.2.
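As a rough picture of what a dropout rate of 0.2 means during training, here is a plain-Python sketch of inverted dropout, where surviving values are scaled by 1 / (1 - rate) so their expected sum stays the same. The function name, the input values and the seed are assumptions for illustration.

```python
import random

def dropout(values, rate, rng):
    """Inverted dropout for one training step: zero each value with
    probability `rate` and scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10  # hypothetical hidden-layer outputs
dropped = dropout(activations, rate=0.2, rng=rng)
print(dropped)
```

At prediction time no values are dropped, which is why the scaling is applied during training instead.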

We can now evaluate the neural network model on the training data.

scikit-learn excels at evaluating models with a suite of techniques. The gold standard for evaluating machine learning models is k-fold cross-validation.

First, we define the model evaluation procedure. Here, we set the number of folds to 10 (a good default) and shuffle the data before partitioning.

kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
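To show what the partitioning amounts to, the sketch below builds shuffled k-fold index splits in plain Python. It is a simplified illustration under my own assumptions (the hypothetical `kfold_indices` helper will not reproduce the exact shuffle order that scikit-learn's KFold generates).

```python
import random

def kfold_indices(n_samples, n_splits, seed):
    """Shuffle 0..n_samples-1 and yield (train, test) index lists per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(kfold_indices(n_samples=150, n_splits=10, seed=7))
print(len(folds))
print(len(folds[0][0]), len(folds[0][1]))
```

With 150 samples and 10 folds, every sample is used for testing exactly once and for training in the other nine folds.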

Now we can evaluate our model (estimator) on our dataset (X and dummy_y) using a 10-fold cross-validation procedure (kfold).

Evaluating the model takes only about 10 seconds and returns an object that describes the evaluation of the 10 models built, one for each split of the data set.

results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean() * 100, results.std() * 100))

The results are summarized as the mean and standard deviation of the model accuracy on the data set. This is a reasonable estimate of the model's performance on unseen data. For this problem, it also falls within the range of the best known results.

Baseline: 97.33% (4.42%)
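For readers who want to see how such a summary line is derived from per-fold scores, here is a small standard-library sketch. The ten fold accuracies are made-up numbers for illustration only and are not the results reported in this article.

```python
import statistics

# Made-up per-fold accuracies for illustration only; not results from the article.
fold_scores = [1.0, 0.9, 1.0, 1.0, 0.9, 1.0, 1.0, 0.9, 1.0, 1.0]

mean = statistics.mean(fold_scores)
std = statistics.pstdev(fold_scores)  # population std, matching NumPy's default

print("Baseline: %.2f%% (%.2f%%)" % (mean * 100, std * 100))
```

A large standard deviation relative to the mean signals that accuracy varies noticeably from fold to fold.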
Eight, code implementation
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.preprocessing import LabelEncoder

# load dataset
dataframe = pd.read_csv("iris.csv", header=None)
dataset = dataframe.values
X = dataset[:, 0:4].astype(float)
Y = dataset[:, 4]

# encode class values as integers
encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(Y)
# convert integers to dummy variables (one hot encoding)
dummy_y = np_utils.to_categorical(encoded_Y)

# define model structure
def baseline_model():
    model = Sequential()
    # hidden layer: 10 units fed by the 4 input features
    model.add(Dense(10, input_dim=4, activation='relu'))
    model.add(Dropout(0.2))
    # output layer: one unit per class
    model.add(Dense(3, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# epochs replaces the older nb_epoch argument, matching the call in section Six
estimator = KerasClassifier(build_fn=baseline_model, epochs=40, batch_size=256)

# split the data into a training set and a test set; an integer random_state makes the split repeatable
X_train, X_test, Y_train, Y_test = train_test_split(X, dummy_y, test_size=0.3, random_state=0)
estimator.fit(X_train, Y_train)

# make predictions
pred = estimator.predict(X_test)

# convert the predicted class indices back to the original string labels
init_labels = encoder.inverse_transform(pred)

# k-fold cross-validate
seed = 42
np.random.seed(seed)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
Nine, summary
In this article, we learned how to use the Keras Python library to develop and evaluate neural networks for deep learning. We covered the following:

How to load data and make it available to Keras.
How to use one-hot encoding to prepare multi-class classification data for modeling.
How to use Keras neural network models with scikit-learn.
How to use Keras to define neural networks for multi-class classification.
How to use scikit-learn with k-fold cross-validation to evaluate Keras neural network models.