The cross-entropy cost function (its role and formula derivation)


The cross-entropy cost function is a way to measure the difference between the predicted values and the actual values of an artificial neural network (ANN). Compared with the quadratic cost function, it promotes the training of an ANN more effectively. Before introducing the cross-entropy cost function, this article briefly reviews the quadratic cost function and its shortcomings.


1. Shortcomings of the quadratic cost function

One of the aims of an ANN is to enable machines to learn knowledge the way humans do. When a person learning something new finds that they have made a mistake, the larger the mistake, the larger the correction. Take shooting a basketball: when an athlete finds that his shot went far off target, he makes a larger adjustment to the shooting angle, and the ball is more likely to go into the basket. Similarly, we hope that when an ANN is training, the larger the error between the predicted value and the actual value, the larger the parameter adjustments made during backpropagation, so that training converges faster. However, if the quadratic cost function is used to train the ANN, the actual effect is that the larger the error, the smaller the parameter adjustments may be, and the slower training proceeds.

Take the binary-classification training of a single neuron as an example, and run two experiments (the usual activation function of an ANN is the sigmoid function, which these experiments also use): input the same sample data x = 1.0 (the actual class of this sample is y = 0), and randomly initialize the parameters in each experiment, so that each experiment obtains a different output value after its first forward pass and therefore a different cost (error):

Experiment 1: The first output value is 0.82

Experiment 2: The first output value is 0.98

In experiment 1, the parameters are randomly initialized so that the first output value is 0.82 (the actual value of the sample is 0); after 300 iterations of training, the output value drops from 0.82 to 0.09, approaching the actual value. In experiment 2, the first output value is 0.98, and after 300 iterations the output value has only dropped to 0.20.
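These two experiments can be reproduced with a few lines of Python. The sketch below is a minimal reconstruction: the learning rate and the initial values of w and b are illustrative assumptions, chosen only so that the first outputs come out at roughly 0.82 and 0.98; they are not settings stated in the original experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_quadratic(w, b, x=1.0, y=0.0, eta=0.15, iters=300):
    """Train one sigmoid neuron on a single sample with the quadratic cost C = (a - y)^2 / 2."""
    for _ in range(iters):
        a = sigmoid(w * x + b)
        # Quadratic-cost gradients: dC/dw = (a - y) * sigma'(z) * x,  dC/db = (a - y) * sigma'(z),
        # where sigma'(z) = a * (1 - a).
        delta = (a - y) * a * (1 - a)
        w -= eta * delta * x
        b -= eta * delta
    return sigmoid(w * x + b)

# Experiment 1: initial parameters chosen (assumption) so the first output is about 0.82
print(train_quadratic(w=0.6, b=0.9))   # falls close to the actual value 0
# Experiment 2: initial parameters chosen (assumption) so the first output is about 0.98
print(train_quadratic(w=2.0, b=2.0))   # stays much higher after the same number of iterations
```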

From the cost curves of the two experiments, it can be seen that the cost of experiment 1 drops rapidly as training proceeds, while the cost of experiment 2 drops very slowly at first. Intuitively, the larger the initial error, the slower the convergence.

In fact, the reason training is slow when the error is large lies in the use of the quadratic cost function. The formula of the quadratic cost function is as follows:

C = \frac{1}{2n} \sum_x \| y(x) - a(x) \|^2
where C is the cost, x represents a sample, y represents the actual value, a represents the output value, and n represents the total number of samples. For simplicity, use a single sample as an example; the quadratic cost function then becomes:

C = \frac{(y - a)^2}{2}
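As a quick numeric illustration of the single-sample form, using experiment 1's first output from above:

```python
def quadratic_cost(a, y):
    """Single-sample quadratic cost: C = (y - a)^2 / 2."""
    return 0.5 * (y - a) ** 2

# First output of experiment 1: a = 0.82, actual value y = 0
print(quadratic_cost(0.82, 0.0))   # 0.3362
```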
At present, the most effective algorithm for training an ANN is the backpropagation algorithm. In short, training an ANN means propagating the cost backwards and adjusting the parameters so as to reduce the cost. The main parameters are the connection weights w between neurons and the bias b of each neuron. The parameters are adjusted with the gradient descent algorithm, which moves them along the gradient direction. The gradients of w and b are derived as follows:

\frac{\partial C}{\partial w} = (a - y)\,\sigma'(z)\,x
\frac{\partial C}{\partial b} = (a - y)\,\sigma'(z)
where z = wx + b represents the input of the neuron and σ(·) represents the activation function. As can be seen from the formulas above, the gradients of w and b are proportional to the derivative of the activation function: the larger the derivative of the activation function, the faster w and b are adjusted and the faster training converges. The usual activation function of a neural network is the sigmoid function, whose curve is as follows:

[Figure: the sigmoid curve, which is nearly flat (small slope) where the output approaches 0 or 1]
As shown in the figure, the gradient (slope) at experiment 2's initial output (0.98) is significantly smaller than at experiment 1's initial output (0.82), so the parameter gradients of experiment 2 are smaller and its parameters adjust more slowly. This is why a larger initial cost (error) makes training slower, which is contrary to our expectation: unlike a person, the network does not correct more strongly when the error is larger, and so does not learn faster.
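Since a = σ(z), the sigmoid's derivative can be written as σ'(z) = a(1 − a) (see the appendix), so the saturation effect can be checked directly at the two initial outputs; a small sketch:

```python
def sigmoid_prime_from_output(a):
    """Derivative of the sigmoid written in terms of its output: sigma'(z) = a * (1 - a)."""
    return a * (1 - a)

print(sigmoid_prime_from_output(0.82))   # ~0.148 (experiment 1)
print(sigmoid_prime_from_output(0.98))   # ~0.020 (experiment 2, roughly 7x smaller)
```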

One might say: just choose an activation function whose derivative does not shrink (or does not change), and the problem is solved. That would indeed solve it in a simple and crude way, but it may bring other, more troublesome problems. Moreover, functions like the sigmoid and tanh have many advantages and are well suited as activation functions; the details can easily be looked up on your own.



2. Cross-entropy cost function

In other words, instead of changing the activation function, we replace the quadratic cost function with the cross-entropy cost function:

C = -\frac{1}{n} \sum_x \left[\, y \ln a + (1 - y) \ln(1 - a) \,\right]
where x represents a sample and n represents the total number of samples. Now recompute the gradient of the parameter w with this cost:

\frac{\partial C}{\partial w_j} = -\frac{1}{n} \sum_x \left( \frac{y}{\sigma(z)} - \frac{1 - y}{1 - \sigma(z)} \right) \frac{\partial \sigma(z)}{\partial w_j} = \frac{1}{n} \sum_x \frac{\sigma'(z)\, x_j}{\sigma(z)\,(1 - \sigma(z))}\,\bigl(\sigma(z) - y\bigr)
where (see the appendix for the proof):

\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
Thus the σ'(z) factor in the gradient formula for w cancels out, leaving:

\frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_x x_j\,\bigl(\sigma(z) - y\bigr)

In addition, the gradient formula now contains (σ(z) − y), the error between the output value and the actual value. Therefore, the larger the error, the larger the gradient, the faster the parameter w adjusts, and the faster training proceeds. In the same vein, the gradient of b is:

\frac{\partial C}{\partial b} = \frac{1}{n} \sum_x \bigl(\sigma(z) - y\bigr)
Practice shows that the training effect of the cross-entropy cost function is better than that of the quadratic cost function.
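For comparison, experiment 2 can be repeated with the cross-entropy cost: only the gradient computation changes, because σ'(z) has cancelled out. A minimal sketch, using the same assumed initial parameters and learning rate as the earlier quadratic-cost sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_cross_entropy(w, b, x=1.0, y=0.0, eta=0.15, iters=300):
    """Train one sigmoid neuron on a single sample with the cross-entropy cost.

    Gradients: dC/dw = x * (a - y), dC/db = (a - y) -- the sigma'(z) factor is gone.
    """
    for _ in range(iters):
        a = sigmoid(w * x + b)
        w -= eta * (a - y) * x
        b -= eta * (a - y)
    return sigmoid(w * x + b)

# Experiment 2's saturated start (same assumed initial parameters, first output ~0.98)
print(train_cross_entropy(w=2.0, b=2.0))   # drops far below the 0.20 reached with the quadratic cost
```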



3. How the cross-entropy cost function is derived

Taking the gradient calculation for the bias b as an example, let us derive the cross-entropy cost function.



In section 1, the gradient formula for b derived from the quadratic cost function is:

\frac{\partial C}{\partial b} = (a - y)\,\sigma'(z)
To eliminate σ'(z) from this equation, we would like to find a cost function C such that:

\frac{\partial C}{\partial b} = a - y
That is, since \frac{\partial C}{\partial b} = \frac{\partial C}{\partial a}\,\sigma'(z) and \sigma'(z) = a\,(1 - a):

\frac{\partial C}{\partial a} = \frac{a - y}{a\,(1 - a)}
Integrating both sides with respect to a, we get:

C = -\left[\, y \ln a + (1 - y) \ln(1 - a) \,\right] + \text{constant}
And this is exactly the cross-entropy cost function described earlier (written for a single sample; averaging over the n samples gives the earlier formula).
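The derivation above can be verified symbolically, for instance with sympy (a verification sketch, not part of the original derivation):

```python
import sympy as sp

a, y = sp.symbols('a y', positive=True)

# Candidate single-sample cost: C = -[y*ln(a) + (1 - y)*ln(1 - a)]
C = -(y * sp.log(a) + (1 - y) * sp.log(1 - a))

dC_da = sp.simplify(sp.diff(C, a))
# dC/da should equal (a - y) / (a * (1 - a)) ...
print(sp.simplify(dC_da - (a - y) / (a * (1 - a))))   # expect 0
# ... so dC/db = dC/da * sigma'(z), with sigma'(z) = a*(1 - a), reduces to (a - y)
print(sp.simplify(dC_da * a * (1 - a)))               # expect a - y
```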




Appendix:

The sigmoid function is:

\sigma(z) = \frac{1}{1 + e^{-z}}
It can be shown that:

\sigma'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
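A quick numeric sanity check of this identity, comparing a central finite difference against σ(z)(1 − σ(z)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)   # central finite difference
analytic = sigmoid(z) * (1 - sigmoid(z))                # sigma(z) * (1 - sigma(z))
print(np.max(np.abs(numeric - analytic)))               # tiny (~1e-10): the two agree
```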