Cross-entropy cost function

Source: Internet
Author: User

This article covers part of the third chapter of Neural Networks and Deep Learning and discusses the cross-entropy cost function used in machine learning algorithms.

1. Starting from the variance cost function

A commonly used cost function is the variance cost function (i.e., the mean squared error, MSE, also called the quadratic cost). For example, for a single neuron (single input, single output, sigmoid activation), the cost function is defined as:

$$C = \frac{(y - a)^2}{2}$$

where y is the desired output and a is the actual output of the neuron: a = σ(z), with z = wx + b.

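When training the neural network, we update w and b with the gradient descent algorithm, so we need the derivatives of the cost function with respect to w and b:

$$\frac{\partial C}{\partial w} = (a - y)\,\sigma'(z)\,x, \qquad \frac{\partial C}{\partial b} = (a - y)\,\sigma'(z)$$

In the special case of input x = 1 and desired output y = 0, both derivatives reduce to a * σ′(z). Then update w and b:

w <-- w - η * ∂C/∂w = w - η * a * σ′(z)

b <-- b - η * ∂C/∂b = b - η * a * σ′(z)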

Because the sigmoid curve is almost flat at both ends, σ′(z) is very small for most values of z (whenever |z| is large), which makes the updates to w and b very slow, because η * a * σ′(z) is close to 0.
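A minimal Python sketch of this slowdown, assuming the single-input neuron above with input x = 1 and desired output y = 0. The starting values w = 2, b = 2 and the learning rate η = 0.15 are illustrative assumptions, not values taken from this article:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# sigma'(z) peaks at 0.25 (at z = 0) and decays quickly as |z| grows.
for z in (0.0, 2.0, 4.0, 8.0):
    print(f"z = {z:4.1f}  sigma'(z) = {sigmoid_prime(z):.6f}")

# Illustrative values (assumptions): a badly initialised, saturated neuron.
x, y = 1.0, 0.0
w, b, eta = 2.0, 2.0, 0.15

z = w * x + b                             # weighted input
a = sigmoid(z)                            # output, close to 1 and far from y = 0
grad_w = (a - y) * sigmoid_prime(z) * x   # dC/dw for C = (y - a)^2 / 2
step_w = eta * grad_w                     # size of one gradient-descent step on w
print(f"a = {a:.4f}  dC/dw = {grad_w:.4f}  step = {step_w:.5f}")
```

Even though the output is almost as wrong as it can be, the step taken on w is only a few thousandths, because the factor σ′(z) is tiny at z = 4.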

2. Cross-entropy cost function

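To overcome this shortcoming, the cross-entropy cost function is introduced (the following formula is for a single neuron with multiple inputs and a single output):

$$C = -\frac{1}{n} \sum_x \left[ y \ln a + (1 - y) \ln(1 - a) \right]$$

where the sum runs over the n training inputs x, y is the desired output, and a is the actual output of the neuron: a = σ(z), with z = ∑ wj*xj + b.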

Like the variance cost function, the cross-entropy cost function has two properties:

    • It is non-negative. (So our goal is to minimize the cost function.)
    • It is close to 0 when the actual output a is close to the desired output y. (For example, the cost is close to 0 when y=0, a≈0 or when y=1, a≈1.)
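The two properties can be checked numerically. The sketch below assumes the single-sample form of the cost, −[y ln a + (1 − y) ln(1 − a)]; the function name cross_entropy is hypothetical:

```python
import math

def cross_entropy(y, a):
    """Single-sample cross-entropy: -[y ln a + (1 - y) ln(1 - a)], for 0 < a < 1."""
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

# Property 1: for y in {0, 1} and 0 < a < 1 both logarithms are negative,
# so the cost is never negative.
samples = [(y, a) for y in (0.0, 1.0) for a in (0.01, 0.3, 0.5, 0.7, 0.99)]
costs = [cross_entropy(y, a) for y, a in samples]
print(all(c >= 0.0 for c in costs))

# Property 2: the cost shrinks towards 0 as a approaches y ...
print(cross_entropy(0.0, 0.01), cross_entropy(1.0, 0.99))
# ... and grows when a is far from y.
print(cross_entropy(0.0, 0.99), cross_entropy(1.0, 0.01))
```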

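In addition, it avoids the problem of the variance cost function updating the weights too slowly. Consider its derivatives, averaged over the n training inputs x:

$$\frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_x x_j \left( \sigma(z) - y \right), \qquad \frac{\partial C}{\partial b} = \frac{1}{n} \sum_x \left( \sigma(z) - y \right)$$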
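The weight derivative can be worked out in a few lines. The sketch below is for a single training example, with C = −[y ln a + (1 − y) ln(1 − a)], a = σ(z) and z = ∑ wj*xj + b, and it uses the identity σ′(z) = σ(z)(1 − σ(z)); averaging over all training inputs then gives the 1/n sum:

```latex
\begin{align*}
\frac{\partial C}{\partial w_j}
  &= -\left( \frac{y}{a} - \frac{1-y}{1-a} \right) \sigma'(z)\, x_j \\
  &= \frac{a - y}{a(1-a)}\, \sigma'(z)\, x_j \\
  &= (a - y)\, x_j , \qquad \text{since } \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr) = a(1-a).
\end{align*}
```

The derivative with respect to b follows in the same way, without the factor x_j.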

As you can see, there is no σ′(z) in the derivative; the weight update is driven by σ(z) − y, that is, by the error. So when the error is large the weights update quickly, and when the error is small they update slowly. This is a very desirable property.
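As a rough numerical comparison, the sketch below trains the same single-input sigmoid neuron with both cost functions. The input x = 1, target y = 0, starting point w = b = 2, learning rate 0.15 and the helper name train are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(cost, w=2.0, b=2.0, x=1.0, y=0.0, eta=0.15, epochs=300):
    """Run plain gradient descent on one training example; return the final output a."""
    for _ in range(epochs):
        a = sigmoid(w * x + b)
        if cost == "quadratic":
            # dC/dz = (a - y) * sigma'(z), with sigma'(z) = a * (1 - a)
            delta = (a - y) * a * (1.0 - a)
        else:
            # cross-entropy: sigma'(z) cancels, so dC/dz = a - y
            delta = a - y
        w -= eta * delta * x
        b -= eta * delta
    return sigmoid(w * x + b)

for epochs in (10, 50, 300):
    a_quad = train("quadratic", epochs=epochs)
    a_xent = train("cross-entropy", epochs=epochs)
    print(f"epochs = {epochs:3d}  quadratic a = {a_quad:.4f}  cross-entropy a = {a_xent:.4f}")
```

With this saturated starting point, the cross-entropy run moves the output towards the target within a few dozen epochs, while the quadratic run barely moves at first.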

3. Summary
    • When the sigmoid function is used as the activation function of the neurons, it is better to use the cross-entropy cost function than the variance cost function, so that training is not needlessly slow.

    • However, you may ask: why the cross-entropy function in particular? Countless functions have no σ′(z) in their derivative, so how would one arrive at the cross-entropy function? There is a story behind this, but a deeper discussion is beyond the scope of this article; interested readers can look into it themselves.

    • In addition, the cross-entropy function has the form −[y ln a + (1 − y) ln(1 − a)] rather than −[a ln y + (1 − a) ln(1 − y)]. Why? Because ln y is undefined when the desired output is y=0, and ln(1 − y) is undefined when the desired output is y=1. Since a is the actual output of the sigmoid function, it never equals 0 or 1 exactly but only approaches them, so ln a and ln(1 − a) cause no such problem.
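A small sketch of the last point, with the two candidate forms written as hypothetical helper functions. It relies on the fact that Python's math.log raises ValueError for an argument of 0:

```python
import math

def cost_used(y, a):
    """-[y ln a + (1 - y) ln(1 - a)]: logs are taken of the sigmoid output a."""
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

def cost_swapped(y, a):
    """-[a ln y + (1 - a) ln(1 - y)]: logs are taken of the label y."""
    return -(a * math.log(y) + (1.0 - a) * math.log(1.0 - y))

a = 0.8  # a sigmoid output, strictly between 0 and 1
for y in (0.0, 1.0):
    print("y =", y, "cost used:", cost_used(y, a))
    try:
        print("swapped form:", cost_swapped(y, a))
    except ValueError as err:
        # ln(0) is undefined, so the swapped form fails for both labels
        print("swapped form undefined:", err)
```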

4. A further note: log-likelihood cost

The log-likelihood function is also often used as the cost function for softmax regression. In the discussion above, the last layer (i.e. the output layer) uses the sigmoid function, so the cross-entropy cost function is used. The more common practice in deep learning is to use softmax as the last layer, in which case the cost function is the log-likelihood cost.

In fact, it's useful to think of a softmax output layer with log-likelihood cost as being quite similar to a sigmoid output layer with cross-entropy cost.

The two are indeed consistent: logistic regression uses the sigmoid function, and softmax regression is the multi-class generalization of logistic regression. In the two-class case, the log-likelihood cost function reduces to the form of the cross-entropy cost function. Refer to the UFLDL tutorial for details.
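The two-class reduction can be checked numerically. The sketch below assumes a softmax layer over the two logits (0, z), so that the probability assigned to class 1 equals σ(z); the function names are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood_cost(logits, label):
    """Negative log of the probability softmax assigns to the true class."""
    return -math.log(softmax(logits)[label])

def cross_entropy(y, a):
    """Single-sample sigmoid cross-entropy: -[y ln a + (1 - y) ln(1 - a)]."""
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

for z in (-2.0, 0.5, 3.0):
    for y in (0, 1):
        # Class 0 gets logit 0 and class 1 gets logit z, so p(class 1) = sigma(z).
        ll = log_likelihood_cost([0.0, z], label=y)
        ce = cross_entropy(float(y), sigmoid(z))
        print(f"z = {z:4.1f}  y = {y}  log-likelihood = {ll:.6f}  cross-entropy = {ce:.6f}")
```

The two columns agree for every z and label, up to floating-point rounding.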
