Triplet loss principle and gradient derivation


"Understanding Triple"

A triplet is a group of three samples built as follows: a sample is randomly selected from the training set and called the anchor (denoted x_a); then one sample of the same class as the anchor and one sample of a different class are randomly selected, called the positive (denoted x_p) and the negative (denoted x_n) respectively. Together they form the triplet (anchor, positive, negative).
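As a minimal sketch of this sampling step (the dictionary layout and the function name below are illustrative assumptions, not part of the original article), one triplet could be drawn from a labeled dataset like this:

```python
import random

def sample_triplet(samples_by_class):
    """Draw one (anchor, positive, negative) triplet.

    samples_by_class maps each class label to a list of its samples;
    this sketch assumes at least two classes and at least two samples
    in the anchor's class.
    """
    # Pick the anchor's class, then two distinct samples from it:
    # the anchor x_a and the positive x_p.
    anchor_class = random.choice(list(samples_by_class))
    x_a, x_p = random.sample(samples_by_class[anchor_class], 2)
    # Pick a different class and take the negative x_n from it.
    other_classes = [c for c in samples_by_class if c != anchor_class]
    negative_class = random.choice(other_classes)
    x_n = random.choice(samples_by_class[negative_class])
    return x_a, x_p, x_n
```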

"Understanding triple loss"
With the concept of the triplet in place, triplet loss is easy to understand. Each element of the triplet is passed through a network (the three branches may or may not share parameters) to obtain its feature embedding; the three embeddings are denoted f(x_a), f(x_p), and f(x_n). The purpose of triplet loss is to learn features such that the distance between f(x_a) and f(x_p) is as small as possible, the distance between f(x_a) and f(x_n) is as large as possible, and the latter exceeds the former by at least a margin α. Formally, the desired constraint is:

||f(x_a) - f(x_p)||^2 + α < ||f(x_a) - f(x_n)||^2

The corresponding objective function is also clear:

L = Σ_i [ ||f(x_a^i) - f(x_p^i)||^2 - ||f(x_a^i) - f(x_n^i)||^2 + α ]_+

Here the distance is the Euclidean distance, and the subscript + denotes a hinge: when the value inside [ ] is greater than 0 it is taken as the loss, and when it is less than or equal to 0 the loss is zero.
It can be seen from the objective function that:

    • When the distance between x_a and x_n is smaller than the distance between x_a and x_p plus the margin α, the value inside [ ] is greater than 0 and a loss is incurred.
    • When the distance between x_a and x_n is greater than or equal to the distance between x_a and x_p plus the margin α, the loss is zero (see the sketch after this list).
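A minimal NumPy sketch of this objective for a single triplet (the function name, variable names, and default margin value are illustrative assumptions):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    """Hinge-style triplet loss for one triplet of embeddings f(x_a), f(x_p), f(x_n)."""
    d_ap = np.sum((f_a - f_p) ** 2)   # squared Euclidean distance to the positive
    d_an = np.sum((f_a - f_n) ** 2)   # squared Euclidean distance to the negative
    # Loss is incurred only when the negative is not at least `margin` farther away.
    return max(d_ap - d_an + margin, 0.0)
```

Summing this quantity over all triplets in a batch gives the objective L above.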

"triple loss gradient derivation"
Denote the objective above by L and the loss of the i-th triplet by L_i. When L_i is greater than 0, we only need to differentiate the formula above with respect to the three embeddings:

∂L_i/∂f(x_a) = 2(f(x_a) - f(x_p)) - 2(f(x_a) - f(x_n)) = 2(f(x_n) - f(x_p))
∂L_i/∂f(x_p) = -2(f(x_a) - f(x_p))
∂L_i/∂f(x_n) = 2(f(x_a) - f(x_n))

When L_i is 0, all three gradients are zero.

"Tips for Implementing the algorithm "
Notice that the gradients with respect to f(x_p) and f(x_n) are exactly the intermediate difference vectors already computed when evaluating the loss. The practical takeaway is that when implementing a triplet loss layer in a CNN, you can avoid repeated computation by storing these two intermediate results during forward propagation and reusing them in backward propagation. This is just a small trick for implementing the algorithm.
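Below is a toy sketch of that trick in plain NumPy rather than an actual Caffe layer (the class name and interface are illustrative assumptions): the forward pass caches the two difference vectors, and the backward pass reuses them instead of recomputing them.

```python
import numpy as np

class TripletLossLayer:
    """Toy triplet loss layer that caches intermediate results from the forward pass."""

    def __init__(self, margin=0.2):
        self.margin = margin

    def forward(self, f_a, f_p, f_n):
        # Cache the two difference vectors needed again for the gradients.
        self.diff_ap = f_a - f_p
        self.diff_an = f_a - f_n
        raw = np.sum(self.diff_ap ** 2) - np.sum(self.diff_an ** 2) + self.margin
        self.active = raw > 0            # hinge: gradients vanish when raw <= 0
        return max(raw, 0.0)

    def backward(self):
        if not self.active:
            zero = np.zeros_like(self.diff_ap)
            return zero, zero.copy(), zero.copy()
        # Gradients w.r.t. f(x_a), f(x_p), f(x_n), reusing the cached differences.
        grad_a = 2.0 * (self.diff_ap - self.diff_an)
        grad_p = -2.0 * self.diff_ap
        grad_n = 2.0 * self.diff_an
        return grad_a, grad_p, grad_n
```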

The following section gives the method and code for implementing triplet loss in Caffe.

