Cross-entropy cost function

Last Update:2015-03-13 Source: Internet

Author: User


This article is based on part of Chapter 3 of Neural Networks and Deep Learning and discusses the cross-entropy cost function used in machine learning algorithms.

1. Starting from the variance cost function

The variance cost function (that is, the mean squared error, MSE) is a common choice of cost function. For example, for a single neuron with one input, one output, and a sigmoid activation, the cost function is defined as:

$$C = \frac{(y - a)^2}{2}$$

where y is the desired output and a is the actual output of the neuron: a = σ(z), with z = wx + b.

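When training the neural network, we update w and b with the gradient descent algorithm, so we need the derivatives of the cost function with respect to w and b:

$$\frac{\partial C}{\partial w} = (a - y)\,\sigma'(z)\,x, \qquad \frac{\partial C}{\partial b} = (a - y)\,\sigma'(z)$$

For the input x = 1 and desired output y = 0, both derivatives reduce to a σ′(z). We then update w and b with learning rate η:

w ← w − η ∂C/∂w = w − η a σ′(z)

b ← b − η ∂C/∂b = b − η a σ′(z)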

Because of the shape of the sigmoid function, σ′(z) is very small for most values of z (the curve is almost flat at both ends). This makes the updates of w and b very slow, because η a σ′(z) is close to 0.
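The slowdown can be checked numerically. The following is a minimal sketch in plain Python, assuming a single-input neuron with x = 1 and desired output y = 0. The helper names and the two sets of starting values for w and b are illustrative choices, not part of the original article.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def quadratic_grad_w(w, b, x=1.0, y=0.0):
    """dC/dw for C = (y - a)^2 / 2 with a = sigmoid(w * x + b)."""
    z = w * x + b
    return (sigmoid(z) - y) * sigmoid_prime(z) * x


# Illustrative starting points (assumed values): the first is only mildly
# wrong, the second is badly wrong, with the output saturated close to 1.
grad_mild = quadratic_grad_w(w=0.6, b=0.9)
grad_saturated = quadratic_grad_w(w=2.0, b=2.0)

print(f"mildly wrong start: dC/dw = {grad_mild:.4f}")
print(f"badly wrong start:  dC/dw = {grad_saturated:.4f}")
```

Although the second neuron is further from the desired output, its gradient is several times smaller, so it learns more slowly exactly when it is most wrong.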

2. Cross-entropy cost function

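To overcome this shortcoming, the cross-entropy cost function is introduced (the formula below is for a single neuron with multiple inputs and one output):

$$C = -\frac{1}{n} \sum_x \left[ y \ln a + (1 - y) \ln(1 - a) \right]$$

where n is the number of training examples, the sum runs over all training inputs x, y is the desired output, and a is the actual output of the neuron: a = σ(z), with z = ∑_j w_j x_j + b.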

Like the variance cost function, the cross-entropy cost function has two properties:

It is non-negative, so our goal is to minimize the cost function.
It is close to 0 when the actual output a is close to the desired output y (for example, when y = 0 and a ≈ 0, or when y = 1 and a ≈ 1).
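Both properties can be observed on a few sample values. This is a small sketch for a single training example, using the per-example form −[y ln a + (1 − y) ln(1 − a)]. The function name and the sample outputs 0.01 and 0.99 are assumptions made for illustration.

```python
import math


def cross_entropy(y, a):
    """Cross-entropy cost for one example: -[y ln a + (1 - y) ln(1 - a)]."""
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))


# Output close to the desired value: the cost is near 0.
cost_good_0 = cross_entropy(y=0.0, a=0.01)
cost_good_1 = cross_entropy(y=1.0, a=0.99)

# Output far from the desired value: the cost is large, and still positive.
cost_bad = cross_entropy(y=0.0, a=0.99)

samples = [
    ("y=0, a=0.01", cost_good_0),
    ("y=1, a=0.99", cost_good_1),
    ("y=0, a=0.99", cost_bad),
]
for label, cost in samples:
    print(f"{label}: C = {cost:.4f}")
```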

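In addition, it overcomes the problem that the variance cost function updates the weights too slowly. Consider its derivatives:

$$\frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_x x_j \left( \sigma(z) - y \right), \qquad \frac{\partial C}{\partial b} = \frac{1}{n} \sum_x \left( \sigma(z) - y \right)$$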
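The derivative of the cross-entropy cost with respect to a weight w_j can be worked out with the chain rule. The steps below are a sketch for a single training example (so the average over examples is dropped), using a = σ(z) and the identity σ′(z) = σ(z)(1 − σ(z)) = a(1 − a):

$$
\begin{aligned}
\frac{\partial C}{\partial w_j}
&= -\left( \frac{y}{a} - \frac{1 - y}{1 - a} \right) \sigma'(z)\, x_j \\
&= \frac{a - y}{a (1 - a)}\, \sigma'(z)\, x_j \\
&= (a - y)\, x_j
\end{aligned}
$$

The factor a(1 − a) in the denominator cancels σ′(z) exactly. The same steps without the x_j factor give ∂C/∂b = a − y.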

As you can see, there is no σ′(z) term in the derivative. The weight update is governed by σ(z) − y, that is, by the error. So when the error is large the weights update quickly, and when the error is small they update slowly. This is a very desirable property.
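To see the effect on training, the sketch below runs plain gradient descent on one sigmoid neuron with each cost function. The setup is an assumption made for illustration: a single input x = 1, desired output y = 0, a badly wrong start at w = 2 and b = 2, learning rate η = 0.15, and 300 epochs. The function and argument names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(cost, w=2.0, b=2.0, x=1.0, y=0.0, eta=0.15, epochs=300):
    """Run plain gradient descent on one neuron and return its final output.

    cost is "quadratic" or "cross_entropy" (names chosen for this sketch).
    """
    for _ in range(epochs):
        a = sigmoid(w * x + b)
        if cost == "quadratic":
            # dC/dz for C = (y - a)^2 / 2 keeps the sigma'(z) = a (1 - a) factor.
            delta = (a - y) * a * (1.0 - a)
        elif cost == "cross_entropy":
            # dC/dz for cross-entropy: the sigma'(z) factor cancels.
            delta = a - y
        else:
            raise ValueError(f"unknown cost: {cost}")
        w -= eta * delta * x
        b -= eta * delta
    return sigmoid(w * x + b)


out_quadratic = train("quadratic")
out_cross_entropy = train("cross_entropy")

print(f"output after training, quadratic cost:     {out_quadratic:.4f}")
print(f"output after training, cross-entropy cost: {out_cross_entropy:.4f}")
```

With these settings the cross-entropy neuron ends closer to the desired output 0 than the quadratic-cost neuron, because each of its update steps is at least as large.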

3. Summary

When the sigmoid function is used as the activation function of the neurons, it is better to use the cross-entropy cost function than the variance cost function, so that training is not needlessly slow.

However, you may ask: why the cross-entropy function in particular? Countless functions have derivatives without σ′(z), so how would one arrive at cross-entropy? There is a story behind it, but a deeper discussion is beyond the scope of this article; interested readers are encouraged to look into it themselves.

In addition, the cross-entropy function has the form −[y ln a + (1 − y) ln(1 − a)] rather than −[a ln y + (1 − a) ln(1 − y)]. Why? Because ln y is undefined when the desired output is y = 0, and ln(1 − y) is undefined when the desired output is y = 1. Since a is the actual output of the sigmoid function, it never equals 0 or 1 exactly and only approaches them, so the first form has no such problem.
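The difference between the two forms shows up as soon as they are evaluated with a desired output of 0 or 1. The sketch below uses Python's math.log, which raises ValueError for an argument of 0. The function names and the sample output a = 0.3 are assumptions made for illustration.

```python
import math


def cross_entropy(y, a):
    """Standard form: logs are taken of the actual output a, with 0 < a < 1."""
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))


def swapped_form(y, a):
    """Swapped form: logs are taken of the desired output y, which may be 0 or 1."""
    return -(a * math.log(y) + (1.0 - a) * math.log(1.0 - y))


a = 0.3  # an example sigmoid output (assumed value)

standard_costs = [cross_entropy(y, a) for y in (0.0, 1.0)]

swapped_failures = []
for y in (0.0, 1.0):
    try:
        swapped_form(y, a)
    except ValueError:
        # math.log raises ValueError for an argument of 0.
        swapped_failures.append(y)

print("standard form costs:", standard_costs)
print("swapped form fails for y in:", swapped_failures)
```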

4. A further note: log-likelihood cost

The log-likelihood function is also often used as the cost function for softmax regression. In the discussion above, the last layer (that is, the output layer) uses the sigmoid function, so the cross-entropy cost function is adopted. A more common practice in deep learning is to use softmax as the last layer, in which case the cost function is the log-likelihood cost.

In fact, it's useful to think of a softmax output layer with log-likelihood cost as being quite similar to a sigmoid output layer with cross-entropy cost.

The two are in fact consistent: logistic regression uses the sigmoid function, and softmax regression is the multi-class extension of logistic regression. In the two-class case, the log-likelihood cost function reduces to the form of the cross-entropy cost function. Refer to the UFLDL tutorial for details.
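The two-class reduction can be checked numerically. The sketch below assumes a two-class softmax in which class 0 has score 0 and class 1 has score z, so that the softmax probability of class 1 equals σ(z). The function names and the sample score z = 0.8 are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Softmax over a list of scores, shifted by the max for numerical stability."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood_cost(probs, label):
    """Log-likelihood cost: -ln of the probability assigned to the true class."""
    return -math.log(probs[label])


def cross_entropy(y, a):
    """Cross-entropy cost for one example: -[y ln a + (1 - y) ln(1 - a)]."""
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))


z = 0.8  # example score for class 1; class 0 is given score 0 (assumed setup)
probs = softmax([0.0, z])  # probs[1] is the probability of class 1
a = sigmoid(z)

for label in (0, 1):
    ll = log_likelihood_cost(probs, label)
    ce = cross_entropy(float(label), a)
    print(f"label={label}: log-likelihood={ll:.6f}, cross-entropy={ce:.6f}")
```

For both labels the two costs agree up to floating-point rounding, which is the sense in which the log-likelihood cost reduces to cross-entropy for two classes.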

When reprinting, please cite the source: http://blog.csdn.net/u012162613/article/details/44239919


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
