I recently started learning YOLO and am using this blog to record my progress. I will skip installation, background, and so on, and go straight to reading the source code:
1. First find the main function in darknet.c and look at how the arguments are interpreted; if the first argument is yolo, the run_yolo function is executed:
int main(int argc, char **argv)
{
    //test_resize("data/bad.jpg");
    //test_box();
    //test_convolutional_layer();
    if (argc < 2) {
        fprintf(stderr, "usage: %s <function>\n", argv[0]);
        return 0;
    }
    gpu_index = find_int_arg(argc, argv, "-i", 0);    // GPU to use, default 0
    if (find_arg(argc, argv, "-nogpu")) gpu_index = -1;
#ifndef GPU
    gpu_index = -1;                                   // built without GPU support
#else
    if (gpu_index >= 0) cuda_set_device(gpu_index);
#endif
    if (0 == strcmp(argv[1], "average")) {
        average(argc, argv);
    } else if (0 == strcmp(argv[1], "yolo")) {
        run_yolo(argc, argv);
    } else if (0 == strcmp(argv[1], "voxel")) {
        run_voxel(argc, argv);
    } else if (0 == strcmp(argv[1], "super")) {
        run_super(argc, argv);
    } else if (0 == strcmp(argv[1], "detector")) {
        run_detector(argc, argv);
    } else if (0 == strcmp(argv[1], "detect")) {
        float thresh = find_float_arg(argc, argv, "-thresh", .24);
        char *filename = (argc > 4) ? argv[4] : 0;
        test_detector("cfg/coco.data", argv[2], argv[3], filename, thresh, .5);
    } else if (0 == strcmp(argv[1], "cifar")) {
        run_cifar(argc, argv);
    } /* ... more else-if branches follow in darknet.c ... */
    return 0;
}
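To make the flag handling at the top of main easier to follow, here is a minimal sketch of how options such as -i and -nogpu could be looked up. The helpers has_flag, int_option and select_gpu are hypothetical names of my own, not darknet functions, and darknet's real find_arg / find_int_arg may behave differently in detail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if the flag `name` appears in argv, else 0. */
int has_flag(int argc, char **argv, const char *name)
{
    int i;
    assert(name != NULL);
    for (i = 1; i < argc; ++i) {
        if (argv[i] && strcmp(argv[i], name) == 0) return 1;
    }
    return 0;
}

/* Hypothetical helper: return the integer that follows `name`, or `def`
   when the option is absent. */
int int_option(int argc, char **argv, const char *name, int def)
{
    int i;
    assert(name != NULL);
    for (i = 1; i + 1 < argc; ++i) {
        if (argv[i] && strcmp(argv[i], name) == 0) return atoi(argv[i + 1]);
    }
    return def;
}

/* Mirrors the GPU selection in main: -i picks a device (default 0),
   and -nogpu overrides it with -1. */
int select_gpu(int argc, char **argv)
{
    int gpu = int_option(argc, argv, "-i", 0);
    if (has_flag(argc, argv, "-nogpu")) gpu = -1;
    return gpu;
}
```

With this sketch, the arguments yolo -i 2 would select device 2, and adding -nogpu anywhere on the command line would force -1.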
2. Go into run_yolo, which enters different functions according to the second argument; let's first follow test and look at the test_yolo function:
void test_yolo(char *cfgfile, char *weightfile, char *filename, float thresh)
{
    image **alphabet = load_alphabet();
    network net = parse_network_cfg(cfgfile);        // build the network from the cfg file
    if (weightfile) {
        load_weights(&net, weightfile);
    }
    detection_layer l = net.layers[net.n-1];         // the last layer is the detection layer
    set_batch_network(&net, 1);
    srand(2222222);
    clock_t time;
    char buff[256];
    char *input = buff;
    int j;
    float nms = .4;
    box *boxes = calloc(l.side*l.side*l.n, sizeof(box));
    float **probs = calloc(l.side*l.side*l.n, sizeof(float *));
    /* note: sizeof(float) would suffice below; sizeof(float *) at worst over-allocates */
    for (j = 0; j < l.side*l.side*l.n; ++j) probs[j] = calloc(l.classes, sizeof(float *));
    while (1) {
        if (filename) {
            strncpy(input, filename, 256);
        } else {
            printf("Enter Image Path: ");
            fflush(stdout);
            input = fgets(input, 256, stdin);
            if (!input) return;
            strtok(input, "\n");                     // cut off the trailing newline
        }
        image im = load_image_color(input, 0, 0);
        image sized = resize_image(im, net.w, net.h);  // resize to the network input size
        float *X = sized.data;
        time = clock();
        network_predict(net, X);
        printf("%s: Predicted in %f seconds.\n", input, sec(clock()-time));
        get_detection_boxes(l, 1, 1, thresh, probs, boxes, 0);
        if (nms) do_nms_sort(boxes, probs, l.side*l.side*l.n, l.classes, nms);
        draw_detections(im, l.side*l.side*l.n, thresh, boxes, probs, voc_names, alphabet, 20);
        save_image(im, "predictions");
        show_image(im, "predictions");
        free_image(im);
        free_image(sized);
#ifdef OPENCV
        cvWaitKey(0);
        cvDestroyAllWindows();
#endif
        if (filename) break;
    }
}
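The two calloc lines in test_yolo are easier to read once the sizes are spelled out: there is one candidate box per grid cell per predictor, and one row of class probabilities per candidate box. Below is a small sketch of that sizing. The function names are hypothetical, and the values side = 7, n = 2 and classes = 20 used afterwards are assumptions for illustration, not values read from a cfg file in this post.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of candidate boxes for a side x side grid with n boxes per cell. */
int candidate_count(int side, int n)
{
    assert(side > 0 && n > 0);
    return side * side * n;
}

/* Allocate one zeroed row of class probabilities per candidate box.
   Returns NULL if any allocation fails. */
float **alloc_probs(int count, int classes)
{
    int j;
    float **probs = calloc((size_t)count, sizeof(float *));
    if (!probs) return NULL;
    for (j = 0; j < count; ++j) {
        probs[j] = calloc((size_t)classes, sizeof(float));
        if (!probs[j]) {
            while (j-- > 0) free(probs[j]);   /* undo the rows allocated so far */
            free(probs);
            return NULL;
        }
    }
    return probs;
}

/* Release everything allocated by alloc_probs. */
void free_probs(float **probs, int count)
{
    int j;
    if (!probs) return;
    for (j = 0; j < count; ++j) free(probs[j]);
    free(probs);
}
```

Under the assumed values, candidate_count(7, 2) gives 98 candidate boxes, each with a row of 20 class probabilities.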
The first line, load_alphabet(), loads images; ignore it for now. The second function, judging by its name, builds a network from a cfg file and returns a network variable. So what is this network struct? Look at its declaration:
typedef struct network{
    float *workspace;
    int n;                          // number of layers in the network
    int batch;                      // samples per batch, used together with subdivisions
    int *seen;                      // number of samples processed so far
    float epoch;
    int subdivisions;
    float momentum;
    float decay;
    layer *layers;                  // the layers themselves
    int outputs;
    float *output;
    learning_rate_policy policy;    // learning-rate policy
    float learning_rate;            // learning rate
    float gamma;
    float scale;
    float power;
    int time_steps;
    int step;
    int max_batches;
    float *scales;
    int *steps;
    int num_steps;
    int burn_in;
    int adam;
    float B1;
    float B2;
    float eps;
    int inputs;
    int h, w, c;
    int max_crop;
    int min_crop;
    float angle;
    float aspect;
    float exposure;
    float saturation;
    float hue;
    int gpu_index;
    tree *hierarchy;
} network;
(I still don't fully understand the meaning of many of these fields; I'll come back to them later.)
Now enter the parse_network_cfg function in parser.c:
The first call is read_cfg. First, let's look at the contents of a cfg file:
The cfg file is organized into sections. The first section is [net], and each section header is followed by lines of parameters. In the code, read_cfg first creates a list variable (a basic linked-list structure); the file also declares a section type:
typedef struct{
    char *type;
    list *options;
}section;
A section consists of a string and a linked list, and corresponds to one section of the cfg file. The file is read line by line, and the strip function removes the whitespace from each line that is read. In other words, read_cfg reads the cfg contents into a list. Each element of that list is a section: a type string plus another list whose elements are kvp structures (a key and its value for some variable in the network, plus a used flag that records whether the option has been used). Each section in the cfg file describes one layer of the network, and each layer has multiple attributes (kvp). Note that the first section of the cfg file must be the net section; this is checked by is_network().
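To make the section / kvp structure concrete, here is a simplified, self-contained sketch of the same idea. It is not darknet's code: it uses fixed-size arrays instead of linked lists, and every demo_ name is hypothetical. The cfg text in the usage note is also made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OPTIONS 32
#define MAX_TEXT    64

typedef struct {
    char key[MAX_TEXT];
    char val[MAX_TEXT];
    int used;                        /* set once the option has been looked up */
} demo_kvp;

typedef struct {
    char type[MAX_TEXT];             /* section header, e.g. "[net]" */
    demo_kvp options[MAX_OPTIONS];
    int n_options;
} demo_section;

/* Remove spaces, tabs and line endings from s, in place. */
void demo_strip(char *s)
{
    char *out = s;
    for (; *s; ++s) {
        if (*s != ' ' && *s != '\t' && *s != '\n' && *s != '\r') *out++ = *s;
    }
    *out = '\0';
}

/* Parse cfg text into sections. Returns the number of sections, or -1 on
   error. Lines longer than MAX_TEXT - 1 characters are truncated. */
int demo_read_cfg(const char *text, demo_section *sections, int max_sections)
{
    int n = 0;
    char line[MAX_TEXT];
    assert(text && sections);
    while (*text) {
        size_t len = strcspn(text, "\n");
        size_t copy = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
        memcpy(line, text, copy);
        line[copy] = '\0';
        text += len;
        if (*text == '\n') ++text;
        demo_strip(line);
        if (line[0] == '\0' || line[0] == '#' || line[0] == ';') continue;
        if (line[0] == '[') {                    /* header: start a new section */
            if (n == max_sections) return -1;
            snprintf(sections[n].type, MAX_TEXT, "%s", line);
            sections[n].n_options = 0;
            ++n;
        } else {                                 /* key=value inside a section */
            char *eq = strchr(line, '=');
            demo_section *s;
            demo_kvp *o;
            if (!eq || n == 0) return -1;
            s = &sections[n - 1];
            if (s->n_options == MAX_OPTIONS) return -1;
            *eq = '\0';
            o = &s->options[s->n_options++];
            snprintf(o->key, MAX_TEXT, "%s", line);
            snprintf(o->val, MAX_TEXT, "%s", eq + 1);
            o->used = 0;
        }
    }
    return n;
}

/* Look up an integer option and mark it used; returns def if it is absent. */
int demo_find_int(demo_section *s, const char *key, int def)
{
    int i;
    for (i = 0; i < s->n_options; ++i) {
        if (strcmp(s->options[i].key, key) == 0) {
            s->options[i].used = 1;
            return atoi(s->options[i].val);
        }
    }
    return def;
}

/* The first section must describe the network as a whole. */
int demo_is_network(const demo_section *s)
{
    return strcmp(s->type, "[net]") == 0 || strcmp(s->type, "[network]") == 0;
}
```

For example, parsing a made-up cfg consisting of a [net] section with batch = 64 and subdivisions = 8, followed by a [convolutional] section, yields two sections; demo_find_int on the first one returns 64 for batch and flips that option's used flag.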
The network is then configured from the options in the [net] section (the top-level properties). Note that the batch value stored in the network equals batch / subdivisions. The [net] section is where the attributes that apply to the whole network structure are declared.
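The batch / subdivisions relationship is plain integer division. A tiny sketch, with a hypothetical function name and assumed example values (batch = 64, subdivisions = 8):

```c
#include <assert.h>

/* Samples handled per pass when a batch from the cfg file is split into
   `subdivisions` equal parts (C integer division). */
int per_pass_batch(int batch, int subdivisions)
{
    assert(subdivisions > 0);
    return batch / subdivisions;
}
```

With the assumed values, per_pass_batch(64, 8) is 8, so the network's batch field would hold 8 rather than 64.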
3. After that, the parameters of each layer are set according to its type; for a convolutional layer (convolutional), parsing enters the parse_convolutional() function.
After that comes understanding the network structure as a whole. YOLO uses a convolutional neural network, so understanding the code further (how each convolutional layer is set up from its parameters) requires some knowledge of the convolutional neural network model.
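As a first piece of that background, here is a sketch of how a convolutional layer's parameters determine its output size and weight count. It uses the common textbook formula, not code taken from darknet, whose implementation may differ in details; the function names are hypothetical.

```c
#include <assert.h>

/* Output length along one spatial dimension, using the common formula
   (in + 2*pad - size) / stride + 1 with integer division. */
int conv_out_dim(int in, int size, int stride, int pad)
{
    assert(stride > 0 && in + 2 * pad >= size);
    return (in + 2 * pad - size) / stride + 1;
}

/* Number of kernel weights for `filters` kernels of size x size applied
   over `channels` input channels (biases not counted). */
int conv_weight_count(int channels, int filters, int size)
{
    return channels * filters * size * size;
}
```

For instance, a 448-wide input with a 7x7 kernel, stride 2 and padding 3 gives an output width of 224, and 64 such kernels over 3 input channels hold 9408 weights.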