Trouble
As a data analyst, you have been working at this multinational bank for half a year.
This morning, the boss called you into the office with a grave expression.
Your heart pounds; you wonder whether you have messed something up. Fortunately, the boss's next words quickly dispel your worries.
Customers are mainly located in France, Germany and Spain.
The information in your hands includes their age, gender, credit score, card information, and so on. The last column (Exited) records whether the customer has churned.
Please select the Python 3.6 version on the left to download and install it.
Next, create a new folder named Demo-customer-churn-ann, download the data from this link, and place it in that folder.
Click the New button at the top right of the interface to create a new Python 3 Notebook named Customer-churn-ann.
With preparation complete, we can begin cleaning the data.
Clean
First, import pandas and NumPy, the packages most commonly used in data cleaning, and read in the data.
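A minimal sketch of this step, assuming the downloaded file is saved as customer_churn.csv in the project folder (the actual file name from the link may differ):

import pandas as pd
import numpy as np

# Read the raw data into a DataFrame; adjust the file name if yours differs.
df = pd.read_csv('customer_churn.csv')
df.head()  # peek at the first few rows to confirm the data loaded correctly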
As you can see, the data is read in completely and without errors. But not all of the columns play a role in predicting customer churn. Let's go through them one by one (a code sketch for dropping the unneeded columns follows the list):
- RowNumber: the row number. Definitely useless, delete.
- CustomerId: the customer ID, issued sequentially. Delete.
- Surname: the customer's surname, no impact on churn. Delete.
- CreditScore: credit score. Important, keep.
- Geography: the country the customer is in. May have an effect, keep.
- Gender: the customer's gender. May have an impact, keep.
- Age: age. Big influence, since younger people switch banks more readily. Keep.
- Tenure: how many years the customer has been with the bank. Important, keep.
- Balance: the customer's deposit and loan balance. Very important, keep.
- NumOfProducts: the number of bank products used. Very important, keep.
- HasCrCard: whether the customer holds our credit card. Important, keep.
- IsActiveMember: whether the customer is an active member. Very important, keep.
- EstimatedSalary: estimated salary. Very important, keep.
- Exited: whether the customer has churned. This will serve as our label data.
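Based on the list above, here is a minimal sketch of dropping the unneeded columns and separating the features from the label (column names follow the dataset described above; df is the DataFrame read earlier):

# Drop the columns that don't help, and set Exited aside as the label.
X = df.drop(['RowNumber', 'CustomerId', 'Surname', 'Exited'], axis=1).values
y = df['Exited'].values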
The scikit-learn toolkit provides a handy tool, LabelEncoder, that lets us easily turn categorical information into numerical values.
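A minimal sketch of how LabelEncoder might be applied here, assuming X is the feature array built above, with Geography in column 1 and Gender in column 2 (these indices are assumptions based on the column order listed earlier):

from sklearn.preprocessing import LabelEncoder

# Turn the Gender strings into integers.
labelencoder_gender = LabelEncoder()
X[:, 2] = labelencoder_gender.fit_transform(X[:, 2])  # e.g. Female -> 0, Male -> 1

# Turn the country names into integers.
labelencoder_geo = LabelEncoder()
X[:, 1] = labelencoder_geo.fit_transform(X[:, 1])  # France -> 0, Germany -> 1, Spain -> 2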
Is that all it takes?
Not quite. Gender, fortunately, takes only two values: male or female. We can define "male" as 1, so "female" becomes 0. The two values merely describe different categories, so there is no ambiguity.
Geography is different. There are three possible countries in the dataset, converted to 0 (France), 1 (Germany), and 2 (Spain). The question is: is there really an ordinal (size) relationship between the three?
The answer is naturally no. We only intend to use numbers to label categories. However, numbers inherently carry an order, and the machine does not know that the different values are merely country codes; it may bring this size relationship into the model's calculations and produce incorrect results.
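The usual fix is one-hot encoding, which the article also applies to the label later on. Here is a minimal sketch that expands the integer-coded Geography column into 3 binary columns and moves them to the front of the feature matrix (the column index 1 and this layout are assumptions, chosen to stay consistent with the column deletion that follows):

from sklearn.preprocessing import OneHotEncoder

# One-hot encode the Geography column and prepend the 3 dummy columns.
geo = X[:, 1].reshape(-1, 1)
geo_dummies = OneHotEncoder().fit_transform(geo).toarray()  # shape: (n_samples, 3)
X = np.concatenate([geo_dummies, np.delete(X, 1, axis=1)], axis=1).astype(float)

Now Geography is represented by 3 columns of 0s and 1s, one per country. So can we simply keep all 3 columns and move on?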
No.
Because in this case, the 3 columns produced by OneHotEncoder are not actually independent. Given any two of them, you can work out the value of the third yourself.
Say the first two values in a row are (0, 0); then the third is definitely 1, because that is how the conversion works: exactly one of the 3 columns is 1 and the rest are 0.
If you have studied multivariate linear regression, you will know that in this situation we need to drop one of the columns before continuing the analysis. Otherwise we fall into the "dummy variable trap".
So let's delete column 0 and stay out of the pit.
X = np.delete(X, [0], 1)
Print the first row again to confirm:
That way, in the training that follows, the labels can be operated on in one-to-one correspondence with the rows of the feature matrix.
Since the label represents categories, we also use OneHotEncoder to convert it, so that we can do classification learning later on.
onehotencoder = OneHotEncoder()
y = onehotencoder.fit_transform(y.reshape(-1, 1)).toarray()
At this point the label has become two columns of data: one column marks that the customer stayed, the other that the customer churned.
y
array([[0., 1.],
       [1., 0.],
       [0., 1.],
       ...,
       [0., 1.],
       [0., 1.],
       [1., 0.]])
The data is now all in order, but we can't use all of it for training. We need to hold part of it back as a test set for evaluating the model later.
We also standardize the features. After scaling, you will find that many columns have far smaller variance than before, which makes learning easier for the machine.
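The exact splitting and scaling code isn't shown above; here is a minimal sketch of both steps with scikit-learn, assuming X and y are the arrays prepared so far (the 20% test split and random_state are assumptions):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hold out 20% of the data as a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize the features: fit on the training set only, then apply to the test set.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)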
The data cleansing and transformation work is done.
Decision Tree
If you have read my article "Loan or no loan: how can Python and machine learning help you make the decision?", you should have a feeling: this problem looks just like a loan approval decision! The decision tree performed very well in that article, so should we keep using a decision tree here?
After testing, the decision tree performs quite well on our dataset: the overall accuracy is 0.81, the recall is 0.80, and the F1 score is 0.81, which is already quite high. Judging 10 customers by their likelihood of churning, it gets about 8 of them right.
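The decision tree experiment itself isn't reproduced in this section; here is a minimal sketch of how such a test might look, assuming X_train and X_test are the scaled feature matrices from above, and y_train_raw and y_test_raw are hypothetical names for the original 0/1 Exited labels (before one-hot encoding):

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Fit a plain decision tree and report precision, recall and F1 on the test set.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train_raw)
print(classification_report(y_test_raw, tree.predict(X_test)))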
But is that enough?
We could try tuning the decision tree's parameters to optimize it and improve the prediction results.
Or we can use deep learning.
Depth
Deep learning is typically used in scenarios where classic machine learning models are too simple to capture the complex patterns in the data.
I'm not going to throw a pile of math formulas at you; let's do an experiment instead.
Please open this website.
You will see a deep learning playground as shown:
In the graphic on the right, the inner circle is the blue data and the outer ring is the yellow data. Your task is to build a model that correctly classifies the two kinds of data.
You say that's easy? You can tell them apart at a glance.
But it doesn't count if only you can see it. It counts only when, with your settings, the machine can tell them apart correctly.
You can see many plus and minus signs in the picture. Let's build a model by manipulating them.
First, click the minus sign to the left of "2 HIDDEN LAYERS" in the middle of the page to reduce the number of hidden layers to 1.
Then click the minus sign above "2 neurons" to reduce the number of neurons to 1.
Open the Activation function drop-down box above the page and select "Sigmoid".
The model we have now is, in fact, classic logistic regression.
Click the Run button at the top left, and we'll see how it works.
Because the model is too simple, the machine racks its brains and can do no better than slice the plane with a straight line to separate the two kinds of points.
The loss stays high: both the training set and the test set losses hover around 0.4, which clearly does not meet our classification requirement.
Let's try increasing the number of layers and neurons. This time, click the plus signs to bring the hidden layers back to 2, with 2 neurons in each layer.
Click Run again.
After a while the result stabilizes, and you find that the computer has used two lines to cut the plane into 3 parts.
The loss of the test set drops to about 0.25, while the training set loss is reduced to less than 0.2.
The model is more complex, and the result seems better.
Next, we increase the number of neurons in the first hidden layer to 4.
Click Run, and something interesting happens.
The machine uses a nearly perfect curve to divide the plane into an inside and an outside. Both the test set and training set losses drop rapidly, and the training set loss even approaches 0.
This tells us that many problems caused by an overly simple model can be solved by increasing the model's complexity: adding hidden layers and adding neurons.
The current convention is to use the number of hidden layers to decide whether a network counts as "deep": when a neural network has more than 3 hidden layers, it is called a "deep neural network", and training it is "deep learning".
Huh, so deep learning turns out to be that simple.
If you have time, you are advised to play more in this playground. You will soon have a perceptual understanding of neural networks and deep learning.
Framework
The engine behind the playground is Google's deep learning framework, TensorFlow.
A so-called framework is basic software that someone else has already built for you. By calling it, you avoid reinventing the wheel, save time, and work more efficiently.
There are many deep learning frameworks that support Python. Besides TensorFlow, there are PyTorch, Theano, MXNet, and others.
My advice to you: find a framework you like, learn it deeply, and keep practicing to improve your skills. Never argue with others about which deep learning framework is better. To each their own; everyone has their preferences, and the waters of the deep learning world run deep. Say the wrong thing and the other camp may not take it kindly.
I prefer TensorFlow. But TensorFlow itself is a low-level library. Although its interface has become easier to use with each new version, for beginners many details are still too fiddly and hard to master.
A beginner's patience is limited, and too much frustration makes it easy to give up.
Fortunately, there are several highly abstract frameworks that are built on top of TensorFlow. If your task is to apply an out-of-the-box deep learning model, these frameworks will give you a lot of convenience.
These frameworks include Keras, TensorLayer, and so on. The one we are going to use today is called TFLearn.
Its distinguishing feature is that its interface looks a lot like scikit-learn's. If you are familiar with classic machine learning models, you will find it especially easy to pick up.
Hands-on
Enough chatter; let's get back to writing code.
Before writing the code, return to the terminal and run the following commands to install a couple of packages:
pip install tensorflow
pip install tflearn
After they finish executing, return to the Notebook.
We import the TFLearn framework.
import tflearn
Then we start assembling the neural network layer by layer, like stacking building blocks.
The first is the input layer.
net = tflearn.input_data(shape=[None, 11])
Note the notation here. The data we feed in is the feature matrix, which after our processing has 11 columns, so the second item of shape is 11.
The first item of shape, None, is the number of rows of the feature matrix we intend to feed in. Since we are only building the model now, the feature matrix may later be fed in all at once or in batches, and the batch size may be large or small and cannot be determined in advance. So we put None here; TFLearn will fill in this value from the dimensions of the feature matrix when training actually runs.
Next we build the hidden layers. Since this is deep learning, we will use 3 of them.
net = tflearn.fully_connected(net, 6, activation='relu')
net = tflearn.fully_connected(net, 6, activation='relu')
net = tflearn.fully_connected(net, 6, activation='relu')
We met activation in the deep learning playground just now: it specifies the activation function. Without it, every input and output would just be a linear combination.
The ReLU function is one such activation function. It looks roughly like this.
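For reference, ReLU simply passes positive values through unchanged and clips negative values to zero; a one-line NumPy sketch:

import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), applied element-wise
    return np.maximum(0, x)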
If you want to learn more about activation functions, refer to the Learning Resources section at the end of this article.
In each hidden layer we set 6 neurons. So far there is no formula for computing the optimal number of neurons. One engineering rule of thumb is to take the number of neurons in the input layer plus the number in the output layer and divide by 2. Here that gives (11 + 2) / 2 = 6.5, which we round down to 6.
With the 3 hidden layers in place, let's build the output layer.
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)
Here we use two neurons for the output, and we specify that training uses the regression method. The output layer's activation function is softmax, which is the more appropriate choice for classification tasks: it gives us the probability of each class, and the class with the highest probability can be taken as the classification result.
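To make the softmax idea concrete, here is a small NumPy sketch (not the code TFLearn runs internally) that turns raw scores into probabilities summing to 1:

import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, exponentiate, then normalize.
    e = np.exp(z - np.max(z))
    return e / e.sum()

softmax(np.array([2.0, 0.5]))  # -> array([0.81757448, 0.18242552])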
The building blocks are all in place, so we tell TFLearn to construct the model from the structure we just defined.
model = tflearn.DNN(net)
With the model in hand, we can use the fit function. Doesn't this look a lot like the way scikit-learn is used?
model.fit(X_train, y_train, n_epoch=30, batch_size=32, show_metric=True)
Note that there are a few extra parameters here; let's explain them:
- n_epoch: the number of complete passes over the training data (here, 30).
- batch_size: how many samples are fed to the model in each training step (here, 32).
- show_metric: whether to display the accuracy metric while training.
Below is the computer's output at the end of training. Watching the process unfold in the middle is actually more fun; you will have to try it yourself to see.
Training Step: 7499 | total loss: 0.39757 | time: 0.656s
| Adam | epoch: 030 | loss: 0.39757 - acc: 0.8493 -- iter: 7968/8000
Training Step: 7500 | total loss: 0.40385 | time: 0.659s
| Adam | epoch: 030 | loss: 0.40385 - acc: 0.8487 -- iter: 8000/8000
--
We can see that the training set loss ends up around 0.4.
Open the terminal and enter:
tensorboard --logdir=/tmp/tflearn_logs/
Then open http://localhost:6006/ in your browser.
You can see the following interface:
This is a visualization of the model training process, where you can watch the accuracy curve rise and the loss curve fall.
Open the Graphs tab and we can see the structure of the neural network.
The layer-by-layer building process we just went through is laid out here at a glance.
Evaluation
With the model trained, let's try making a prediction.
Look at the first row of the test set's feature matrix.
X_test[0]
array([ 1.75486502, -0.57369368, -0.55204276, -1.09168714, -0.36890377,
        1.04473698,  0.8793029 , -0.92159124,  0.64259497,  0.9687384 ,
        1.61085707])
We feed the test set to the model to predict the classification results.
y_pred = model.predict(X_test)
Print it out to see:
y_pred[0]
array([0.70956731, 0.29043278], dtype=float32)
The model judges that the probability this customer will stay (not churn) is about 0.71.
Let's look at the actual label data:
y_test[0]
array([1., 0.])
The customer did not churn; the prediction is correct.
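If you want hard 0/1 decisions rather than probabilities, one common approach (not shown in the original) is to take the argmax of each prediction row and compare it with the argmax of the one-hot labels:

import numpy as np

# Column 0 = stays, column 1 = churns, matching the one-hot label layout above.
predicted_class = np.argmax(np.array(y_pred), axis=1)
actual_class = np.argmax(y_test, axis=1)
manual_accuracy = (predicted_class == actual_class).mean()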
But one correct prediction on a single data point doesn't prove much. Below we run the whole test set through the model and use the evaluate function to score it.
score = model.evaluate(X_test, y_test)
print('Test accuracy: %0.4f%%' % (score[0] * 100))
Test accuracy: 84.1500%
On the test set, the accuracy reaches 84.15%. Not bad!
Hopefully, with your work, the machine's accurate judgments will help the bank effectively lock in the customers who are about to churn, reduce the churn rate, and keep the money coming in.
Discussion
You may feel that deep learning is nothing special. The plain old decision tree algorithm, simple as it is, already achieved over 80% accuracy, and after writing all these statements, deep learning only improved the result by a few percentage points.
First, once accuracy reaches a certain level, further improvement is not easy. It's like a student's exams: going from failing to passing doesn't take that much effort, but going from 95 points to 100 is a goal many people never reach in a lifetime.
Second, in some fields, a 1% improvement means millions of dollars in profit, or thousands of lives saved.
Third, the rise of deep learning rests on the big data environment. In many cases, the more data you have, the more pronounced the advantage of deep learning becomes. The 10,000 records in this example are nowhere near big data scale.
Learning Resources
If you are interested in deep learning, the following learning resources are recommended.
The first category is textbooks.
The first is Deep Learning, an absolute classic.
The second is Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, which is simple and easy to follow.