Cross entropy cost function
1. Cross-entropy theory
Cross entropy relates to entropy in the same way that covariance relates to variance.
Entropy measures the expected information content of a single distribution:
H(p) = -\sum_{i=1}^{N} p(x_i) \log p(x_i)
Cross entropy measures the expected information across two distributions:
H(p, q) = -\sum_{i=1}^{N} p(x_i) \log q(x_i)
For details, see the Wikipedia entry on cross entropy.
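As a quick numerical illustration, here is a minimal NumPy sketch (the distributions p and q below are made up for the example); the cross entropy H(p, q) is never smaller than the entropy H(p):

import numpy as np

p = np.array([0.1, 0.4, 0.3, 0.2])      # "true" distribution (hypothetical)
q = np.array([0.25, 0.25, 0.25, 0.25])  # model distribution (hypothetical)

entropy = -np.sum(p * np.log(p))        # H(p), about 1.28
cross_entropy = -np.sum(p * np.log(q))  # H(p, q), about 1.39

The TensorFlow snippet below applies the same cross-entropy formula, with p given by the one-hot labels y and q by the softmax over the network's scores: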
import tensorflow as tf

y = tf.placeholder(dtype=tf.float32, shape=[None, 10])  # one-hot labels
...
scores = tf.matmul(h, W) + b              # unnormalized scores (logits)
probs = tf.nn.softmax(scores)             # predicted class probabilities
loss = -tf.reduce_sum(y * tf.log(probs))  # cross-entropy loss
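Note that computing the softmax and the logarithm separately like this can be numerically unstable. TensorFlow 1.x provides a fused op that takes the unnormalized scores (logits) directly; a drop-in alternative for the last line would be:

loss = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=scores))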
2. Cross-entropy cost function
L_H(x, z) = -\sum_{k=1}^{d} \left[ x_k \log z_k + (1 - x_k) \log(1 - z_k) \right]
Here x denotes the original signal and z the reconstructed signal, both written as vectors of length d; the sum is easily rewritten as a vector inner product.
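A minimal NumPy sketch of this reconstruction loss, with made-up vectors x and z whose entries lie in (0, 1):

import numpy as np

x = np.array([0.9, 0.1, 0.8, 0.3])   # original signal (hypothetical values)
z = np.array([0.8, 0.2, 0.7, 0.4])   # reconstructed signal (hypothetical values)

# L_H(x, z) = -sum_k [ x_k log z_k + (1 - x_k) log(1 - z_k) ]
loss = -np.sum(x * np.log(z) + (1 - x) * np.log(1 - z))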
3. Cross-entropy and KL divergence (also known as relative entropy)
Intuitively, why is cross entropy a measure of the distance between two probability distributions? Related reading: Entropy, cross entropy, relative entropy (KL divergence) and its relationship; Machine Learning Foundation (58)--Shannon entropy, relative entropy (KL divergence) and cross entropy.
"Relative" naturally implies a relation between two random variables. Relative entropy is also known as mutual entropy or the Kullback-Leibler divergence (KL divergence). If p(x) and q(x) are two probability distributions over the values of x, the relative entropy of p with respect to q is:
D_{KL}(p \| q) = \sum_{i=1}^{N} p(x_i) \log \frac{p(x_i)}{q(x_i)} = \sum_{i=1}^{N} p(x_i) \log p(x_i) - \sum_{i=1}^{N} p(x_i) \log q(x_i) = -H(p) + H(p, q)
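The decomposition D_KL(p||q) = H(p, q) - H(p) is easy to verify numerically; reusing the made-up distributions from the sketch above:

import numpy as np

p = np.array([0.1, 0.4, 0.3, 0.2])
q = np.array([0.25, 0.25, 0.25, 0.25])

kl = np.sum(p * np.log(p / q))    # D_KL(p || q)
h_p = -np.sum(p * np.log(p))      # H(p)
h_pq = -np.sum(p * np.log(q))     # H(p, q)
assert np.isclose(kl, h_pq - h_p)

Since D_KL(p||q) >= 0, minimizing the cross entropy H(p, q) for a fixed p is equivalent to minimizing the KL divergence between the two distributions.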
(In the loss function of a sparse autoencoder, a KL-divergence-based penalty term is often defined in the following form:
H(\rho \| \hat{\rho}) = -\sum_{j=1}^{m} \left[ \rho_j \log(\hat{\rho}_j) + (1 - \rho_j) \log(1 - \hat{\rho}_j) \right]
where \hat{\rho} = \frac{1}{k} \sum_{i=1}^{k} h_i (the average traverses all outputs within the layer, and \sum_{j=1}^{m} traverses all the layers).)
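A minimal sketch of this penalty, assuming a scalar sparsity target ρ broadcast over the units and made-up average activations ρ̂:

import numpy as np

rho = 0.05                                    # sparsity target (hypothetical)
rho_hat = np.array([0.04, 0.07, 0.05, 0.10])  # average activations rho_hat_j (hypothetical)

penalty = -np.sum(rho * np.log(rho_hat) + (1 - rho) * np.log(1 - rho_hat))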
4. Cross-entropy cost function in neural networks
The cross-entropy cost function is introduced in neural networks to compensate for a defect of the sigmoid function's derivative, which saturates easily (saturated units make gradient updates slow).
First consider the squared-error loss function. For a single neuron (single input, single output), define its cost function as:
C = \frac{(a - y)^2}{2}
where a = \sigma(z) and z = wx + b. Differentiating with respect to the weight w and the bias b (to illustrate the problem, take x = 1, y = 0):
\frac{\partial C}{\partial w} = (a - y) \sigma'(z) x = a \sigma'(z)
\frac{\partial C}{\partial b} = (a - y) \sigma'(z) = a \sigma'(z)
The gradient-descent updates for the weight and bias are then:
w = w - \eta \frac{\partial C}{\partial w} = w - \eta a \sigma'(z)
b = b - \eta \frac{\partial C}{\partial b} = b - \eta a \sigma'(z)
Either way, the sigmoid derivative \sigma'(z) always appears, and \sigma'(z) saturates easily, which can severely reduce the efficiency of the parameter updates.
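The saturation is easy to see numerically: \sigma'(z) = \sigma(z)(1 - \sigma(z)) peaks at 0.25 for z = 0 and all but vanishes for large |z|, so the update term a\sigma'(z) stays tiny even when the output is completely wrong. A small sketch:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in (0.0, 2.0, 5.0, 10.0):
    a = sigmoid(z)
    print(z, a * (1 - a))   # sigma'(z): 0.25, 0.105, 0.0066, 0.000045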
To solve this problem of slow parameter updates, we replace the traditional squared-error function with the cross-entropy cost function.
For a multiple-input, single-output neuron, as shown in the figure below:
We define the loss function as:
C = -\frac{1}{n} \sum_x \left[ y \ln a + (1 - y) \ln(1 - a) \right]
where a = \sigma(z) and z = \sum_j w_j x_j + b.
The resulting derivatives are:
\frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_x x_j (\sigma(z) - y)
\frac{\partial C}{\partial b} = \frac{1}{n} \sum_x (\sigma(z) - y)
This avoids the problem of \sigma'(z) appearing in the parameter updates and hurting their efficiency.
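A short sketch contrasting the per-example gradients of the two cost functions for a single saturated neuron (x = 1, y = 0), using the expressions derived above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 1.0, 0.0
for z in (2.0, 5.0, 10.0):
    a = sigmoid(z)
    grad_squared = (a - y) * a * (1 - a) * x   # squared error: sigma'(z) remains
    grad_xent = (a - y) * x                    # cross entropy: sigma'(z) cancels
    print(z, grad_squared, grad_xent)

# The cross-entropy gradient stays close to 1, while the squared-error gradient
# collapses toward zero as the neuron saturates.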