YOLO v2 loss function source (core training code): interpretation and how it works


Prerequisites:

1. For detailed explanations of YOLO and YOLO v2, see the two links below, or read the papers directly. (I had meant to write my own YOLO tutorial, but the articles at these links are already very good.)

YOLO: https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote

YOLO v2: https://zhuanlan.zhihu.com/p/25167153

2. This article only interprets the source code of the YOLO v2 loss function. To follow along, run:

git clone https://github.com/pjreddie/darknet

and then open src/region_layer.c.

3. YOLO's official website: https://pjreddie.com/darknet/yolo/

4. The command I use when debugging the code:

./darknet detector train cfg/voc.data cfg/yolo-voc.cfg darknet19_448.conv.23


Here is an interpretation of the YOLO v2 loss function source in the latest version of the code (the non-GPU path):

void forward_region_layer(const region_layer l, network_state state)
{
    int i,j,b,t,n;
    // size is the number of values each box predicts:
    // coords (tx, ty, tw, th) + classes + 1 objectness confidence.
    int size = l.coords + l.classes + 1;
    memcpy(l.output, state.input, l.outputs*l.batch*sizeof(float));
#ifndef GPU
    flatten(l.output, l.w*l.h, size*l.n, l.batch, 1);
#endif
    // Apply the logistic (sigmoid) activation to every box's objectness score.
    for (b = 0; b < l.batch; ++b){
        for (i = 0; i < l.h*l.w*l.n; ++i){
            int index = size*i + b*l.outputs;
            l.output[index + 4] = logistic_activate(l.output[index + 4]);
        }
    }
#ifndef GPU
    // Softmax over the class scores: hierarchical (softmax_tree) or flat.
    if (l.softmax_tree){
        for (b = 0; b < l.batch; ++b){
            for (i = 0; i < l.h*l.w*l.n; ++i){
                int index = size*i + b*l.outputs;
                softmax_tree(l.output + index + 5, 1, 0, 1, l.softmax_tree, l.output + index + 5);
            }
        }
    } else if (l.softmax){
        for (b = 0; b < l.batch; ++b){
            for (i = 0; i < l.h*l.w*l.n; ++i){
                int index = size*i + b*l.outputs;
                softmax(l.output + index + 5, l.classes, 1, l.output + index + 5);
            }
        }
    }
#endif
    if(!state.train) return;
    memset(l.delta, 0, l.outputs * l.batch * sizeof(float));
    float avg_iou = 0;
    float recall = 0;
    float avg_cat = 0;
    float avg_obj = 0;
    float avg_anyobj = 0;
    int count = 0;
    int class_count = 0;
    *(l.cost) = 0;
    // Forward pass over every image in the batch, accumulating the loss.
    for (b = 0; b < l.batch; ++b) {
        // This branch is only taken with a hierarchical softmax; with the
        // ordinary softmax classifier this part of the code is not entered.
        if(l.softmax_tree){
            int onlyclass = 0;
            for(t = 0; t < 30; ++t){
                box truth = float_to_box(state.truth + t*5 + b*l.truths);
                if(!truth.x) break;
                int class = state.truth[t*5 + b*l.truths + 4];
                float maxp = 0;
                int maxi = 0;
                // A huge x/y marks a classification-only label: find the box
                // with the highest class probability and backpropagate only
                // the classification loss for it.
                if(truth.x > 100000 && truth.y > 100000){
                    for(n = 0; n < l.n*l.w*l.h; ++n){
                        int index = size*n + b*l.outputs + 5;
                        float scale = l.output[index-1];
                        float p = scale*get_hierarchy_probability(l.output + index, l.softmax_tree, class);
                        if(p > maxp){
                            maxp = p;
                            maxi = n;
                        }
                    }
                    int index = size*maxi + b*l.outputs + 5;
                    delta_region_class(l.output, l.delta, index, class, l.classes, l.softmax_tree, l.class_scale, &avg_cat);
                    ++class_count;
                    onlyclass = 1;
                    break;
                }
            }
            if(onlyclass) continue;
        }
        /* l.h and l.w are the resolution of the feature map output by the last
           convolution. l.n is the number of anchor boxes (the num/anchors
           settings in the cfg file), a regression mechanism borrowed from
           Faster R-CNN. Unlike v1, which divided the image into 7x7 cells
           regardless of the output resolution, in v2 every feature-map point
           is a cell, which makes it possible to regress and recognize smaller
           objects. */
        for (j = 0; j < l.h; ++j) {
            for (i = 0; i < l.w; ++i) {
                // l.n boxes of different sizes are predicted at each feature
                // point; their widths and heights come from the anchors in
                // the configuration file.
                for (n = 0; n < l.n; ++n) {
                    int index = size*(j*l.w*l.n + i*l.n + n) + b*l.outputs;
                    box pred = get_region_box(l.output, l.biases, n, index, i, j, l.w, l.h);
                    float best_iou = 0;
                    int best_class = -1;
                    // At most 30 objects per image are assumed; in practice
                    // this threshold matters little because the loop exits as
                    // soon as truth.x is 0 (no more objects).
                    for(t = 0; t < 30; ++t){
                        // Get the truth box's x, y, w, h.
                        box truth = float_to_box(state.truth + t*5 + b*l.truths);
                        if(!truth.x) break;
                        float iou = box_iou(pred, truth);
                        // Remember the truth box with the highest IOU.
                        if (iou > best_iou) {
                            best_class = state.truth[t*5 + b*l.truths + 4];
                            best_iou = iou;
                        }
                    }
                    // No-object gradient: push the objectness score toward 0.
                    avg_anyobj += l.output[index + 4];
                    l.delta[index + 4] = l.noobject_scale * ((0 - l.output[index + 4]) * logistic_gradient(l.output[index + 4]));
                    if(l.classfix == -1) l.delta[index + 4] = l.noobject_scale * ((best_iou - l.output[index + 4]) * logistic_gradient(l.output[index + 4]));
                    else{
                        // If the best IOU exceeds the threshold, the box overlaps
                        // a real object well enough not to be penalized as background.
                        if (best_iou > l.thresh) {
                            l.delta[index + 4] = 0;
                            if(l.classfix > 0){
                                delta_region_class(l.output, l.delta, index + 5, best_class, l.classes, l.softmax_tree, l.class_scale*(l.classfix == 2 ? l.output[index + 4] : 1), &avg_cat);
                                ++class_count;
                            }
                        }
                    }
                    // Only entered while fewer than 12800 training images have
                    // been seen: regress every box toward its anchor prior,
                    // centered in its cell, with a small scale of .01.
                    if(*(state.net.seen) < 12800){
                        box truth = {0};
                        truth.x = (i + .5)/l.w;
                        truth.y = (j + .5)/l.h;
                        truth.w = l.biases[2*n];
                        truth.h = l.biases[2*n+1];
                        if(DOABS){
                            truth.w = l.biases[2*n]/l.w;
                            truth.h = l.biases[2*n+1]/l.h;
                        }
                        // Store the differences between the predicted tx, ty, tw, th
                        // and the tx', ty', tw', th' computed from the truth box in l.delta.
                        delta_region_box(truth, l.output, l.biases, n, index, i, j, l.w, l.h, l.delta, .01);
                    }
                }
            }
        }
        // By this point every box on the feature map has been given its
        // background (no-object) gradient. Now handle the boxes responsible
        // for each ground-truth object.
        for(t = 0; t < 30; ++t){
            // Get the truth box's x, y, w, h.
            box truth = float_to_box(state.truth + t*5 + b*l.truths);
            if(!truth.x) break;
            float best_iou = 0;
            int best_index = 0;
            int best_n = 0;
            // The cell responsible for this object.
            i = (truth.x * l.w);
            j = (truth.y * l.h);
            //printf("%d %f %d %f\n", i, truth.x*l.w, j, truth.y*l.h);
            // Shift the truth box to (0, 0), recorded as truth_shift.x and
            // truth_shift.y, so that only w and h matter when computing the
            // IOU against the anchors below.
            box truth_shift = truth;
            truth_shift.x = 0;
            truth_shift.y = 0;
            //printf("index %d %d\n", i, j);
            // Compute the match between the location of the real object and
            // each of the anchor boxes.
            for(n = 0; n < l.n; ++n){
                // size is the number of values per box, (j*l.w*l.n + i*l.n + n)
                // is the box's position on the grid, and b*l.outputs selects the
                // image in the batch, which together give the box's index.
                int index = size*(j*l.w*l.n + i*l.n + n) + b*l.outputs;
                // Get the box prediction: the coordinates x, y, w, h (the
                // confidence and classes sit after them).
                box pred = get_region_box(l.output, l.biases, n, index, i, j, l.w, l.h);
                // With bias_match, the box's w and h come from the anchors;
                // l.biases holds the anchors parameter in the configuration file.
                if(l.bias_match){
                    pred.w = l.biases[2*n];
                    pred.h = l.biases[2*n+1];
                    if(DOABS){
                        pred.w = l.biases[2*n]/l.w;
                        pred.h = l.biases[2*n+1]/l.h;
                    }
                }
                //printf("pred: (%f, %f) %f x %f\n", pred.x, pred.y, pred.w, pred.h);
                // Also move the prediction to (0, 0) before computing the IOU.
                pred.x = 0;
                pred.y = 0;
                float iou = box_iou(pred, truth_shift);
                if (iou > best_iou){
                    best_index = index;
                    best_iou = iou;
                    best_n = n;
                }
            }
            //printf("%d %f (%f, %f) %f x %f\n", best_n, best_iou, truth.x, truth.y, truth.w, truth.h);
            // Coordinate loss for the responsible box; returns its IOU with the truth box.
            float iou = delta_region_box(truth, l.output, l.biases, best_n, best_index, i, j, l.w, l.h, l.delta, l.coord_scale);
            // If the IOU exceeds the 0.5 threshold, count a hit toward recall.
            if(iou > .5) recall += 1;
            avg_iou += iou;
            // Localization is essentially done here; what follows is the
            // objectness and classification part.
            //l.delta[best_index + 4] = iou - l.output[best_index + 4];
            avg_obj += l.output[best_index + 4];
            // Objectness gradient for an area that does contain an object,
            // using the logistic gradient.
            l.delta[best_index + 4] = l.object_scale * (1 - l.output[best_index + 4]) * logistic_gradient(l.output[best_index + 4]);
            if (l.rescore) {
                // Replace the 1 above with the IOU (when debugging, l.rescore = 1,
                // so this branch is taken).
                l.delta[best_index + 4] = l.object_scale * (iou - l.output[best_index + 4]) * logistic_gradient(l.output[best_index + 4]);
            }
            // Get the real class.
            int class = state.truth[t*5 + b*l.truths + 4];
            if (l.map) class = l.map[class];
            // Store (0/1 target - predicted probability) * class_scale for every
            // class at this box's position in l.delta.
            delta_region_class(l.output, l.delta, best_index + 5, class, l.classes, l.softmax_tree, l.class_scale, &avg_cat);
            ++count;
            ++class_count;
        }
    }
    //printf("\n");
#ifndef GPU
    flatten(l.delta, l.w*l.h, size*l.n, l.batch, 0);
#endif
    // Each position in l.delta now holds the residual for class, confidence,
    // x, y, w, h. mag_array computes the square root of the sum of squares
    // over all positions, and pow squares it again, so the cost is the sum
    // of squared residuals.
    *(l.cost) = pow(mag_array(l.delta, l.outputs * l.batch), 2);
    printf("Region Avg IOU: %f, Class: %f, Obj: %f, No Obj: %f, Avg Recall: %f, count: %d\n",
           avg_iou/count, avg_cat/class_count, avg_obj/count,
           avg_anyobj/(l.w*l.h*l.n*l.batch), recall/count, count);
}
Note: the interpretation above reflects my own understanding after consulting material online. If anything is wrong, please point it out so it can be corrected for the benefit of more readers.

























