Machine Learning Basics (VI) -- Cross-Entropy Cost Function (Cross-Entropy Error) | Machine Learning

Source: Internet
Author: User

Cross-Entropy Cost Function

1. Cross-entropy theory

Cross-entropy stands in relation to entropy much as covariance stands in relation to variance.

Entropy measures the expected information content of a single distribution:

H(p) = −∑_{i=1}^{N} p(x_i) log p(x_i)

Cross-entropy measures the expectation taken across two distributions:

H(p, q) = −∑_{i=1}^{N} p(x_i) log q(x_i)
For details, see the Wikipedia article on cross entropy.
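As a minimal sketch, the two definitions above can be computed directly with NumPy; the distributions p and q below are made-up toy values for illustration:

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_i p_i * log(p_i); zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(q[nz]))

p = np.array([0.5, 0.5])   # toy "true" distribution
q = np.array([0.9, 0.1])   # toy "model" distribution
# By Gibbs' inequality, H(p, q) >= H(p), with equality only when q = p.
```

Note that H(p, p) = H(p): cross-entropy collapses to plain entropy when the two distributions coincide.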

y = tf.placeholder(dtype=tf.float32, shape=[None, 10])  # TensorFlow 1.x style
...

scores = tf.matmul(h, W) + b
probs = tf.nn.softmax(scores)
loss = -tf.reduce_sum(y * tf.log(probs))
2. Cross-entropy cost function

L_H(x, z) = −∑_{k=1}^{d} [x_k log z_k + (1 − x_k) log(1 − z_k)]

Here x denotes the original signal and z the reconstructed signal, both vectors of length d; the expression is easily rewritten in vector inner-product form.
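As a sketch of that last remark, the reconstruction cross-entropy can be written either as an explicit sum or as two inner products; the vectors x and z below are hypothetical toy values:

```python
import numpy as np

def recon_cross_entropy(x, z):
    # Explicit elementwise sum: -sum_k [x_k log z_k + (1-x_k) log(1-z_k)]
    return -np.sum(x * np.log(z) + (1 - x) * np.log(1 - z))

def recon_cross_entropy_dot(x, z):
    # Same quantity as inner products: -(x . log z + (1-x) . log(1-z))
    return -(x @ np.log(z) + (1 - x) @ np.log(1 - z))

x = np.array([1.0, 0.0, 1.0, 1.0])   # original (binary) signal, length d = 4
z = np.array([0.9, 0.2, 0.8, 0.7])   # reconstructed signal, entries in (0, 1)
```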

3. Cross-entropy and KL divergence (also known as relative entropy)

Intuitively, why is cross-entropy a measure of the distance between two probability distributions? See: "Entropy, cross entropy, relative entropy (KL divergence) and their relationship"; "Machine Learning Foundation (58) -- Shannon entropy, relative entropy (KL divergence) and cross entropy".

The word "relative" naturally implies two random variables. Relative entropy is also known as mutual entropy or Kullback–Leibler divergence (KL divergence). If p(x) and q(x) are two probability distributions over the values of x, the relative entropy of p with respect to q is:

D_KL(p || q) = ∑_{i=1}^{N} p(x_i) log [p(x_i) / q(x_i)] = ∑_{i=1}^{N} p(x_i) log p(x_i) − ∑_{i=1}^{N} p(x_i) log q(x_i) = −H(p) + H(p, q)
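The identity D_KL(p || q) = H(p, q) − H(p) can be checked numerically; the distributions below are toy values chosen only for illustration:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])   # toy distribution p
q = np.array([0.1, 0.6, 0.3])   # toy distribution q

kl = np.sum(p * np.log(p / q))    # D_KL(p || q), direct definition
h_p = -np.sum(p * np.log(p))      # entropy H(p)
h_pq = -np.sum(p * np.log(q))     # cross-entropy H(p, q)
# kl should equal h_pq - h_p, and kl >= 0 by Gibbs' inequality
```

This also makes the "distance" intuition concrete: since H(p) is fixed for a given p, minimizing the cross-entropy H(p, q) over q is the same as minimizing the KL divergence from p to q.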

(In defining the loss function of a sparse autoencoder, a penalty term based on the KL divergence is often written in the following form:

H(ρ || ρ̂) = −∑_{j=1}^{m} [ρ_j log(ρ̂_j) + (1 − ρ_j) log(1 − ρ̂_j)]

where ρ̂ = (1/k) ∑_{i=1}^{k} h_i (the sum over i traverses all outputs within a layer, and ∑_{j=1}^{m} traverses all layers).)

4. Cross-entropy cost function in neural networks

The cross-entropy cost function is introduced into neural networks to compensate for a defect in the derivative of the sigmoid function: it saturates easily (saturation makes gradient updates slow).

First consider the squared-error (squared-loss) function. For a single-input, single-output neuron, define the cost function:

C = (a − y)² / 2

where a = σ(z) and z = wx + b. Differentiating with respect to the weight w and the bias b (to illustrate the problem, take x = 1, y = 0):

∂C/∂w = (a − y) σ′(z) x = a σ′(z)
∂C/∂b = (a − y) σ′(z) = a σ′(z)

The gradient-descent updates for the weight and bias are:

w ← w − η ∂C/∂w = w − η a σ′(z)
b ← b − η ∂C/∂b = b − η a σ′(z)

In every case, the sigmoid derivative σ′(z) lingers in the update, and σ′(z) saturates easily, which can severely reduce the efficiency of parameter updates.
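The saturation problem is easy to see numerically: σ′(z) = σ(z)(1 − σ(z)) peaks at 0.25 when z = 0 and collapses toward zero for large |z|, so the squared-error gradient (a − y) σ′(z) barely moves the parameters once the neuron saturates. A quick sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1 - s)

print(sigmoid_prime(0.0))    # 0.25: maximum slope, healthy updates
print(sigmoid_prime(10.0))   # ~4.54e-05: saturated, updates crawl
```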

To solve this loss of update efficiency, we replace the traditional squared-error function with the cross-entropy cost function.

Consider a multiple-input, single-output neuron structure (the accompanying figure is omitted here).


We define the loss function as:

C = −(1/n) ∑_x [y ln a + (1 − y) ln(1 − a)]

where a = σ(z) and z = ∑_j w_j x_j + b.

The final derivatives are:

∂C/∂w_j = (1/n) ∑_x x_j (σ(z) − y)
∂C/∂b = (1/n) ∑_x (σ(z) − y)

This avoids the problem of σ′(z) appearing in the parameter updates and throttling their efficiency.
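That σ′(z) really has cancelled out of the gradient can be confirmed with a finite-difference check; the data, weights, and bias below are hypothetical toy values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    # Cross-entropy cost: C = -(1/n) sum_x [y ln a + (1-y) ln(1-a)]
    a = sigmoid(X @ w + b)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                    # toy inputs, n = 8, 3 features
y = rng.integers(0, 2, size=8).astype(float)   # toy binary targets
w = rng.normal(size=3)
b = 0.1

# Analytic gradient: dC/dw_j = (1/n) sum_x x_j (sigma(z) - y).
# Note there is no sigma'(z) factor left.
a = sigmoid(X @ w + b)
grad_w = X.T @ (a - y) / len(y)

# Finite-difference check on w[0]
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0] += eps
w_minus[0] -= eps
numeric = (cost(w_plus, b, X, y) - cost(w_minus, b, X, y)) / (2 * eps)
```

The analytic and numeric gradients agree to high precision, confirming that the error signal (σ(z) − y) alone drives the update: the larger the error, the larger the step, regardless of how saturated the sigmoid is.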
