Reposted from the original address: http://www.infocool.net/kb/Python/201611/209696.html#

Step one, preparation
Start from train_faster_rcnn_alt_opt.py:

- Initialize the parameters: args = parse_args(), using Python's argparse. The options include --net_name, --gpu, --cfg and so on (only a few parameters are modified through the cfg file; most of the others live in config.py and affect the training of the whole network).
- cfg_from_file(args.cfg_file) calls the cfg_from_file function in config.py to read the parameters from the cfg file given above, which in turn calls _merge_a_into_b to merge those parameters into the defaults; since __C = edict() and cfg = __C, cfg is a dictionary (edict) data structure.
- Faster R-CNN training here is multi-process; mp_queue is the data structure used for communication between the processes.

```python
import multiprocessing as mp

mp_queue = mp.Queue()
```
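The _merge_a_into_b integration of the cfg file into the defaults, mentioned above, can be sketched minimally. Plain dicts stand in for the edict used in config.py, and the function name only mirrors the real one; this illustrates the idea rather than reproducing the repo's code:

```python
def merge_a_into_b(a, b):
    """Recursively merge config dict a into b (sketch of _merge_a_into_b).

    Keys in a must already exist in b; nested dicts are merged in place.
    """
    for k, v in a.items():
        if k not in b:
            raise KeyError('{} is not a valid config key'.format(k))
        if isinstance(v, dict) and isinstance(b[k], dict):
            merge_a_into_b(v, b[k])   # recurse into the sub-config
        else:
            b[k] = v                  # override the default value

# defaults (stand-in for the cfg built in config.py)
cfg = {'TRAIN': {'IMS_PER_BATCH': 2, 'HAS_RPN': False}, 'GPU_ID': 0}
# values loaded from the experiment's .yml file
yml_cfg = {'TRAIN': {'HAS_RPN': True}}
merge_a_into_b(yml_cfg, cfg)
print(cfg['TRAIN']['HAS_RPN'])  # True
```

Only the keys present in the yml file are overridden; everything else keeps its config.py default.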
At the same time, solvers, max_iters, rpn_test_prototxt = get_solvers(args.net_name) fetches the solver parameters.

The next step is to enter the various stages of training.
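Each stage below runs in a child process that hands its result back through mp_queue; a toy version of the pattern (the function name and paths are illustrative, not the repo's code — one plausible reason for the separate processes is that GPU/Caffe state is released when each stage's process exits):

```python
import multiprocessing as mp

def train_stage(queue, name, iters):
    """Stand-in for train_rpn / train_fast_rcnn: do some work, then
    hand the resulting model path back through the queue."""
    model_path = 'output/{}_iter_{}.caffemodel'.format(name, iters)
    queue.put({'model_path': model_path})

if __name__ == '__main__':
    mp_queue = mp.Queue()
    p = mp.Process(target=train_stage,
                   kwargs=dict(queue=mp_queue, name='stage1_rpn', iters=80000))
    p.start()
    stage1_out = mp_queue.get()   # blocks until the child puts its result
    p.join()
    print(stage1_out['model_path'])  # output/stage1_rpn_iter_80000.caffemodel
```

The real script repeats exactly this start / get / join sequence once per training stage.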
Step two, Stage 1 RPN, init from ImageNet model
```python
cfg.TRAIN.SNAPSHOT_INFIX = 'stage1'
mp_kwargs = dict(
        queue=mp_queue,
        imdb_name=args.imdb_name,
        init_model=args.pretrained_model,
        solver=solvers[0],
        max_iters=max_iters[0],
        cfg=cfg)
p = mp.Process(target=train_rpn, kwargs=mp_kwargs)
p.start()
rpn_stage1_out = mp_queue.get()
p.join()
```
You can see that this step uses the ImageNet model M0 to fine-tune the RPN network, producing model M1. The args parameters for training come from the script experiments/scripts/faster_rcnn_alt_opt.sh. The function of main interest is train_rpn.

For the train_rpn function, the main steps are:
1. Adjust the config parameters to fit the current task, mainly:

```python
cfg.TRAIN.HAS_RPN = True
cfg.TRAIN.BBOX_REG = False  # applies only to Fast R-CNN bbox regression
cfg.TRAIN.PROPOSAL_METHOD = 'gt'
```

The important point here is that the proposal method is set to 'gt', which means the gt_roidb function will be used later.
2. Initialize Caffe.

3. Prepare the imdb and roidb.

The main function involved is get_roidb.

In get_roidb, get_imdb in factory.py is called to fetch the pascal_voc class based on the key (a lambda expression) in __sets[name]. The pascal_voc class (a subclass of imdb) invokes the initialization method of its parent class when initializing itself, for example:
```
{
  year: '2007'
  image_set: 'trainval'
  devkit_path: 'data/VOCdevkit2007'
  data_path: 'data/VOCdevkit2007/VOC2007'
  classes: (...)            # modify here if you want to train on your own data
  class_to_ind: {...}       # a dict mapping class names to indices 0, 1, 2, ...
  image_ext: '.jpg'
  image_index: ['000001', '000003', ...]   # image indices read from trainval.txt
  roidb_handler: <method gt_roidb>
  salt: <object uuid>
  comp_id: 'comp4'
  config: {...}
}
```
Note that at this point no image data has been read; only the index of the images has been built.
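Building that index amounts to reading one image ID per line from the image-set file; a sketch modelled on pascal_voc._load_image_set_index (a hypothetical standalone function, not the repo's code):

```python
import os

def load_image_set_index(data_path, image_set):
    """Read image IDs (one per line) from e.g.
    VOC2007/ImageSets/Main/trainval.txt. Only indices are built here;
    no pixel data is touched."""
    path = os.path.join(data_path, 'ImageSets', 'Main', image_set + '.txt')
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```

The returned list is exactly the image_index field shown above.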
```python
imdb.set_proposal_method(cfg.TRAIN.PROPOSAL_METHOD)
```
This sets the proposal method; as noted above it is set to 'gt'. Only the generating method is set here; the first actual call happens in the next line, roidb = get_training_roidb(imdb), when the code "boxes = self.roidb[i]['boxes'].copy()" runs inside append_flipped_images(). get_training_roidb is located in train.py and mainly implements the horizontal flipping of the images, appending the flipped copies to the list. Concretely, it calls imdb.append_flipped_images; inside that function, gt_roidb in pascal_voc is called, which in turn calls _load_pascal_annotation in the same file. Based on the index of each image, that function goes to the Annotations folder to find the corresponding XML annotation data and loads all of the bounding-box objects. XML parsing ends there, followed by the assignment of several members of each roidb entry:
- boxes: a two-dimensional array; each row stores xmin, ymin, xmax, ymax
- gt_classes: stores the class index corresponding to each box (the classes tuple is declared in the initialization function)
- gt_overlaps: a two-dimensional array with one row per box and num_classes columns; in each row the entry at the box's class index is 1 and the rest are 0; it is later converted into a sparse matrix
- seg_areas: stores the area of each box
- flipped: False indicates the picture has not been flipped (later, in train.py, flipped pictures are appended, and this flag distinguishes them)
Finally, these member variables are assembled into a roidb entry and returned.
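The assembly of these fields can be sketched as follows (a simplified stand-in for _load_pascal_annotation; the real code additionally wraps gt_overlaps in a scipy sparse matrix):

```python
import numpy as np

def build_gt_roidb_entry(boxes, gt_class_inds, num_classes):
    """Simplified sketch of the roidb entry built by _load_pascal_annotation."""
    num_objs = len(gt_class_inds)
    # one row per box; 1.0 at the box's own class index, 0 elsewhere
    overlaps = np.zeros((num_objs, num_classes), dtype=np.float32)
    overlaps[np.arange(num_objs), gt_class_inds] = 1.0
    # pixel area of each box (inclusive coordinates)
    seg_areas = ((boxes[:, 2] - boxes[:, 0] + 1.0) *
                 (boxes[:, 3] - boxes[:, 1] + 1.0))
    return {'boxes': boxes,
            'gt_classes': np.array(gt_class_inds),
            'gt_overlaps': overlaps,   # real code: scipy.sparse.csr_matrix(overlaps)
            'seg_areas': seg_areas,
            'flipped': False}

entry = build_gt_roidb_entry(np.array([[0, 0, 15, 15]]), [3], num_classes=21)
print(entry['seg_areas'])  # [256.]
```

With 20 VOC classes plus background, num_classes is 21 and a ground-truth box has overlap 1.0 with exactly one class.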
The prepare_roidb function in roidb.py is also called from get_training_roidb; it prepares the roidb of the imdb, adding some attributes to each dictionary in the roidb, such as image (the image path), width and height, and, from the gt_overlaps attribute above, derives max_classes and max_overlaps.
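The max_classes / max_overlaps derivation is just a row-wise max and argmax over gt_overlaps; a minimal sketch (dense array instead of the sparse matrix used in the real code; the function name is illustrative):

```python
import numpy as np

def add_derived_fields(roidb_entry):
    """Mimic the part of prepare_roidb that derives max_overlaps /
    max_classes from the gt_overlaps matrix."""
    overlaps = roidb_entry['gt_overlaps']
    roidb_entry['max_overlaps'] = overlaps.max(axis=1)    # best overlap per box
    roidb_entry['max_classes'] = overlaps.argmax(axis=1)  # class achieving it
    return roidb_entry

entry = {'gt_overlaps': np.array([[0.0, 0.8, 0.1],
                                  [1.0, 0.0, 0.0]])}
add_derived_fields(entry)
print(entry['max_classes'])  # [1 0]
```

These two derived fields are what the later foreground/background filtering works on.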
So far:

```python
return roidb, imdb
```
4. Set the output path: output_dir = get_output_dir(imdb). The function is in config.py and is used to save the intermediate caffemodels and other artifacts.

5. Start training formally:

```python
model_paths = train_net(solver, roidb, output_dir,
                        pretrained_model=init_model,
                        max_iters=max_iters)
```
This calls the train_net function in train.py. First, filter_roidb determines whether each entry in the roidb is reasonable, where "reasonable" is defined as having at least one foreground or background box. When the roidb is all ground truth, the overlap between each box and its own class is obviously 1, so every entry has at least one labelled class. If the roidb also contains proposals, those whose overlaps fall in [BG_THRESH_LO, BG_THRESH_HI] are considered background and those above FG_THRESH are considered foreground; an entry must have at least one foreground or background RoI, otherwise it is filtered out. After the useless entries are removed, the filtered roidb is returned.

In the train.py file, the class to focus on is SolverWrapper (see train.py for details). This class introduces Caffe's SGDSolver, and its last line, self.solver.net.layers[0].set_roidb(roidb), hands the roidb to layer 0 (here, the RoIDataLayer): it calls the set_roidb method in layer.py, which sets the roidb for layer 0 and shuffles the order. Finally, train_model is called.

Here each layer needs to be instantiated; at this stage, the RoIDataLayer is set up first (see setup in layer.py for details). During training, the forward function of the RoIDataLayer, being the first layer, only needs to copy data; which data is copied at each stage depends on the network structure defined by the prototxt file. blobs = self._get_next_minibatch() reads the image data (it calls the get_minibatch function, located in minibatch.py, whose main job is the actual data preparation for Faster R-CNN); when the data is read, boxes, gt_boxes and im_info (the width/height scaling) are separated out.
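The foreground/background filtering rule described above can be sketched as follows (threshold values are the config.py defaults; a simplification of filter_roidb in train.py):

```python
import numpy as np

# thresholds as configured in cfg.TRAIN (default values from config.py)
FG_THRESH = 0.5
BG_THRESH_HI = 0.5
BG_THRESH_LO = 0.1

def is_valid(entry):
    """An entry is kept iff it has at least one foreground or one
    background RoI, mirroring filter_roidb in train.py."""
    overlaps = entry['max_overlaps']
    fg_inds = np.where(overlaps >= FG_THRESH)[0]
    bg_inds = np.where((overlaps < BG_THRESH_HI) &
                       (overlaps >= BG_THRESH_LO))[0]
    return len(fg_inds) > 0 or len(bg_inds) > 0

def filter_roidb(roidb):
    return [entry for entry in roidb if is_valid(entry)]

# a box overlapping nothing above 0.1 is neither fg nor bg
print(is_valid({'max_overlaps': np.array([0.05])}))  # False
```

Pure ground-truth entries always pass, since their max_overlaps contain a 1.0.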
For the first layer: in the stage1_rpn_train.pt file, the layer has only 3 top blobs: 'data', 'im_info', 'gt_boxes'.

In the stage1_fast_rcnn_train.pt file, the layer has 6 top blobs: 'data', 'rois', 'labels', 'bbox_targets', 'bbox_inside_weights', 'bbox_outside_weights'; all of these are prepared in minibatch.py. The data then flows through Caffe until the end of training.
Only part of the network structure is drawn here:
It is worth noting that the rpn-data layer uses the AnchorTargetLayer, which is implemented in Python and is introduced later.
6. Save the weight parameters obtained at the end:

```python
rpn_stage1_out = mp_queue.get()
```

At this point the first stage is complete. Subsequent tasks will, when necessary, look up the weight file at this output path.
Step three, Stage 1 RPN, generate proposals
This step calls the model M1 obtained from the previous training to generate proposals P1; only proposals are produced in this step. Parameters:

```python
mp_kwargs = dict(
        queue=mp_queue,
        imdb_name=args.imdb_name,
        rpn_model_path=str(rpn_stage1_out['model_path']),
        cfg=cfg,
        rpn_test_prototxt=rpn_test_prototxt)
p = mp.Process(target=rpn_generate, kwargs=mp_kwargs)
p.start()
rpn_stage1_out['proposal_path'] = mp_queue.get()['proposal_path']
p.join()
```
1. Follow the rpn_generate function.

The beginning is basically the same as train_rpn above, up to rpn_proposals = imdb_proposals(rpn_net, imdb). The imdb_proposals function is in rpn/generate.py; rpn_proposals is a list of lists, one sub-list per image. imdb_proposals uses im = cv2.imread(imdb.image_path_at(i)) to read the image data and calls im_proposals to generate the RPN proposals for the single image, together with their scores. im_proposals invokes the network's forward pass via blobs_out = net.forward(data, im_info), from which the desired boxes and scores are obtained; this requires a good understanding of the call relationship between the net's forward and each layer's forward.

Here a ProposalLayer, likewise implemented in Python, is also involved; it is also in the rpn folder and is covered further below.
```python
boxes = blobs_out['rois'][:, 1:].copy() / scale
scores = blobs_out['scores'].copy()
return boxes, scores
```
At this point, the proposals of the imdb have been obtained.

2. Save the proposal file:

```python
queue.put({'proposal_path': rpn_proposals_path})
rpn_stage1_out['proposal_path'] = mp_queue.get()['proposal_path']
```

So far, "Stage 1 RPN, generate proposals" is finished.
Step four, Stage 1 Fast R-CNN using RPN proposals, init from ImageNet model
Parameters:
```python
cfg.TRAIN.SNAPSHOT_INFIX = 'stage1'
mp_kwargs = dict(
        queue=mp_queue,
        imdb_name=args.imdb_name,
        init_model=args.pretrained_model,
        solver=solvers[1],
        max_iters=max_iters[1],
        cfg=cfg,
        rpn_file=rpn_stage1_out['proposal_path'])
p = mp.Process(target=train_fast_rcnn, kwargs=mp_kwargs)
p.start()
fast_rcnn_stage1_out = mp_queue.get()
p.join()
```
This step uses the proposals generated in the previous step, together with the ImageNet model M0, to train the Fast R-CNN model M2.

Follow train_fast_rcnn.

Similarly, the parameters are set. Note that here cfg.TRAIN.PROPOSAL_METHOD = 'rpn', different from before, so rpn_roidb will be called later. cfg.TRAIN.IMS_PER_BATCH = 2, i.e. each mini-batch contains two pictures together with the RoIs of their proposals. In this step there is also rpn_file (which relates to the later use of the rpn_roidb function). The rest is almost the same as before. One more point: here, train_net calls add_bbox_regression_targets, located in roidb.py, which mainly adds the bbox regression targets, i.e. the 'bbox_targets' attribute of the roidb, and, according to the parameters set in cfg, computes the means and stds of bbox_targets, because class-specific regressors are trained here. This involves the bbox_overlaps function, located in utils/bbox.
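The normalization of the regression targets by precomputed means and stds can be sketched as below. The values shown are the BBOX_NORMALIZE_MEANS / BBOX_NORMALIZE_STDS defaults from config.py; the real add_bbox_regression_targets computes the statistics per class over the whole roidb:

```python
import numpy as np

def normalize_targets(bbox_targets, means, stds):
    """Normalize regression targets (dx, dy, dw, dh) by per-dimension
    means and stds, the idea behind add_bbox_regression_targets."""
    return (bbox_targets - means) / stds

targets = np.array([[0.5, 0.0, 0.2, 0.1]])
means = np.zeros(4)
stds = np.array([0.1, 0.1, 0.2, 0.2])
normalized = normalize_targets(targets, means, stds)
print(normalized[0, 0])  # 5.0
```

Normalizing makes the regression outputs roughly unit-variance, which stabilizes training; at test time the predictions are un-normalized with the same statistics.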
Note that in this step's get_roidb, as stated earlier, rpn_roidb is used, which invokes imdb.create_roidb_from_box_list. This method reads the boxes of each image from box_list, where box_list is read from the proposal file saved in the previous step, and then does some processing (see the code for details). The key point is gt_overlaps, which is eventually returned in the roidb: in rpn_roidb, gt_overlaps is the result of computing the IoU between the boxes in rpn_file and the boxes from gt_roidb, unlike the roidb produced by the gt_roidb() method, where gt_overlaps is all 1.0. At the same time, imdb.merge_roidbs, a static method of the imdb class, merges rpn_roidb and gt_roidb into one roidb; it is worth understanding the basic principle of this merge.
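The IoU computation behind these gt_overlaps can be sketched in pure NumPy (the repo uses a Cython implementation in utils/bbox.pyx; this naive loop version is for illustration only):

```python
import numpy as np

def bbox_overlaps(boxes, gt_boxes):
    """IoU between each proposal box and each GT box, both [x1, y1, x2, y2]
    with inclusive integer coordinates."""
    ious = np.zeros((len(boxes), len(gt_boxes)), dtype=np.float32)
    for i, b in enumerate(boxes):
        for j, g in enumerate(gt_boxes):
            # intersection rectangle
            xx1, yy1 = max(b[0], g[0]), max(b[1], g[1])
            xx2, yy2 = min(b[2], g[2]), min(b[3], g[3])
            w = max(0.0, xx2 - xx1 + 1)
            h = max(0.0, yy2 - yy1 + 1)
            inter = w * h
            area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
            area_g = (g[2] - g[0] + 1) * (g[3] - g[1] + 1)
            ious[i, j] = inter / (area_b + area_g - inter)
    return ious

# a box has IoU 1.0 with itself and 0.0 with a disjoint box
ious = bbox_overlaps(np.array([[0, 0, 9, 9]]),
                     np.array([[0, 0, 9, 9], [20, 20, 29, 29]]))
```

For a proposal, the maximum over the GT columns of this matrix is exactly the max_overlaps value used by the filtering and sampling logic.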
Step five, Stage 2 RPN, init from Stage 1 Fast R-CNN model
Parameters:
```python
cfg.TRAIN.SNAPSHOT_INFIX = 'stage2'
mp_kwargs = dict(
        queue=mp_queue,
        imdb_name=args.imdb_name,
        init_model=str(fast_rcnn_stage1_out['model_path']),
        solver=solvers[2],
        max_iters=max_iters[2],
        cfg=cfg)
p = mp.Process(target=train_rpn, kwargs=mp_kwargs)
p.start()
rpn_stage2_out = mp_queue.get()
p.join()
```
This part uses model M2 to train the RPN network again. Unlike the stage 1 RPN training, this time the conv layer parameters are kept fixed and only forward computation is done through them; training yields model M3, a fine-tuned RPN network.
Step six, Stage 2 RPN, generate proposals
Parameters:
```python
mp_kwargs = dict(
        queue=mp_queue,
        imdb_name=args.imdb_name,
        rpn_model_path=str(rpn_stage2_out['model_path']),
        cfg=cfg,
        rpn_test_prototxt=rpn_test_prototxt)
p = mp.Process(target=rpn_generate, kwargs=mp_kwargs)
p.start()
rpn_stage2_out['proposal_path'] = mp_queue.get()['proposal_path']
p.join()
```
Based on the model M3 obtained in the previous step, this step produces proposals P2; the network structure is the same as when producing proposals P1.
Step seven, Stage 2 Fast R-CNN, init from Stage 2 RPN R-CNN model
Parameters:
```python
cfg.TRAIN.SNAPSHOT_INFIX = 'stage2'
mp_kwargs = dict(
        queue=mp_queue,
        imdb_name=args.imdb_name,
        init_model=str(rpn_stage2_out['model_path']),
        solver=solvers[3],
        max_iters=max_iters[3],
        cfg=cfg,
        rpn_file=rpn_stage2_out['proposal_path'])
p = mp.Process(target=train_fast_rcnn, kwargs=mp_kwargs)
p.start()
fast_rcnn_stage2_out = mp_queue.get()
p.join()
```
This step trains Fast R-CNN on the basis of model M3 and proposals P2 to get the final model M4. In this step the conv layers and the RPN are fixed, and only the RCNN layers (i.e. the fully connected layers) are trained; unlike stage 1, where only the RPN layers were fixed and the other layers were still trained. The model structure is the same as in stage 1:
Step eight, output the final model
```python
final_path = os.path.join(
        os.path.dirname(fast_rcnn_stage2_out['model_path']),
        args.net_name + '_faster_rcnn_final.caffemodel')
print 'cp {} -> {}'.format(
        fast_rcnn_stage2_out['model_path'], final_path)
shutil.copy(fast_rcnn_stage2_out['model_path'], final_path)
print 'Final model: {}'.format(final_path)
```
This just copies the model output in the previous step.

At this point, the entire Faster R-CNN training process is over.
AnchorTargetLayer and ProposalLayer

As mentioned before, there are two layers not yet described: one is the AnchorTargetLayer and the other is the ProposalLayer. A brief analysis of each follows.
```python
class AnchorTargetLayer(caffe.Layer)
```

The first thing is reading the parameters. In the prototxt, actually only param_str: "'feat_stride': 16" is read. This is a very important parameter; my current understanding is that it is the stride of the sliding window, and it matters for the size of the objects that can be recognized: to recognize small objects, for example, this parameter needs to be reduced, and so on.
First, the setup section:

```python
anchor_scales = layer_params.get('scales', (8, 16, 32))
self._anchors = generate_anchors(scales=np.array(anchor_scales))
```
This calls the generate_anchors method to generate the initial 9 anchors. The function is located in generate_anchors.py; its main job is to generate anchors of multiple scales and multiple aspect ratios. 8, 16, 32 are actually the scales [2^3, 2^4, 2^5], and base_size is 16; consult the source code for exactly how this is implemented. The _ratio_enum() part generates anchors of the three aspect ratios 1:2, 1:1, 2:1, as shown in the figure (refer to the blog cited below).

The _scale_enum() part generates three sizes of anchor for each output of _ratio_enum(); taking the anchor [0, 0, 15, 15] as an example, it is extended to the three scales 128*128, 256*256, 512*512, as shown.
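The two enumerations can be condensed into a short sketch that reproduces the 9 standard anchors (a compact reimplementation for illustration, not the repo's generate_anchors.py; the repo applies _ratio_enum first and then _scale_enum per ratio, which this loop nesting mirrors):

```python
import numpy as np

def mkanchors(ws, hs, x_ctr, y_ctr):
    """Turn widths/heights around a centre into [x1, y1, x2, y2] anchors."""
    return np.stack([x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                     x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)], axis=1)

def generate_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Enumerate 3 aspect ratios x 3 scales around the base 16x16 box."""
    x_ctr = y_ctr = 0.5 * (base_size - 1)        # 7.5 for base_size 16
    size = base_size * base_size                 # 256
    anchors = []
    for r in ratios:                             # _ratio_enum: keep area, change ratio
        ws = np.round(np.sqrt(size / r))
        hs = np.round(ws * r)
        for s in scales:                         # _scale_enum: multiply w and h
            anchors.append(mkanchors(np.array([ws * s]), np.array([hs * s]),
                                     x_ctr, y_ctr)[0])
    return np.array(anchors)

anchors = generate_anchors()
print(anchors.shape)  # (9, 4)
```

The first anchor comes out as [-84, -40, 99, 55], a wide 2:1 box of area roughly 128*128, matching the values the real generate_anchors.py prints in its self-test.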
The other function is forward().

Faster R-CNN gets a different feature map for each input image. height, width = bottom[0].data.shape[-2:] first gets the height and width of conv5's output, the GT boxes come from gt_boxes = bottom[1].data, and the image information from im_info = bottom[2].data[0, :]. Then the offsets are computed: shift_x = np.arange(0, width) * self._feat_stride. Here you will find that if, for example, the feature map you get has h=61 and w=36, then multiplying by 16 gives an image of roughly 1000*600; this 16 is essentially the overall subsampling stride of the network. The next step is to generate the anchors and to perform a certain selection on them; see the code.
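A toy version of that offset computation (a hypothetical 2x3 feature map with feat_stride 16), showing how one shift per feature-map cell is built and then broadcast against the 9 base anchors:

```python
import numpy as np

# Suppose conv5 produced a feature map of height 2 and width 3,
# with feat_stride 16 (the total conv subsampling of the network).
height, width, feat_stride = 2, 3, 16

shift_x = np.arange(0, width) * feat_stride    # x offset of each column
shift_y = np.arange(0, height) * feat_stride   # y offset of each row
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
# one (dx, dy, dx, dy) shift per feature-map cell; adding it to an
# [x1, y1, x2, y2] anchor moves the whole box to that cell
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()
print(shifts.shape)  # (6, 4)
```

Adding the K = height*width shifts to the A = 9 anchors (via broadcasting) yields the K*A candidate anchors covering the whole image.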
The other layer to understand is the ProposalLayer. This is only used at test time; much of it is similar to the AnchorTargetLayer and is not detailed here; you can consult the code. Look mainly at the forward function; the algorithm is described in great detail in its comment section:
```python
# Algorithm:
#
# for each (H, W) location i
#   generate A anchor boxes centered on cell i
#   apply predicted bbox deltas at cell i to each of the A anchors
# clip predicted boxes to image
# remove predicted boxes with either height or width < threshold
# sort all (proposal, score) pairs by score from highest to lowest
# take top pre_nms_topN proposals before NMS
# apply NMS with threshold 0.7 to remaining proposals
# take after_nms_topN proposals after NMS
# return the top proposals (-> RoIs top, scores top)
```
The NMS method is referenced in this function.
Code folder description

Tools

In the tools folder are the outermost wrapper files that we call directly. The main files included are:
- _init_paths.py: used to initialize the paths, i.e. subsequent paths will be join(path, *)
- compress_net.py: used to compress the parameters with SVD; you can see that the author compresses the fc6 and fc7 layers, i.e. the two fully connected layers.
- demo.py: normally we call this function directly, and if we want to test our own model and data, this is what we need to modify. The test, config, and nms_wrapper functions of fast_rcnn are called here; vis_detections is used for visualization, parse_args for parameter settings, plus the demo and main functions.
- eval_recall.py: evaluation function.
- reval.py: re-evaluation; functions in fast_rcnn and datasets are called here, where the from_mats function and the from_dets function load .mat files and .pkl files respectively.
- rpn_generate.py: this function calls the generate function in rpn (the RPN layer is introduced in detail later). It is primarily a wrapping of the call: the configured parameters are loaded, the RPN's test parameters are set, and the input and output are handled.
- test_net.py: tests the Fast R-CNN network; mainly parameter configuration.
- train_faster_rcnn_alt_opt.py: trains the Faster R-CNN network using alternating optimization; this follows the specific implementation described in the Faster R-CNN paper. In the main function you can see that it includes the following steps:
- RPN 1: initialize the parameters with the ImageNet model and generate proposals; the arguments are stored in mp_kwargs
- Fast RCNN 1: initialize the parameters with the ImageNet model and train Fast R-CNN using the proposals just generated
- RPN 2: initialize with the parameters from Fast RCNN 1 (note this!) and generate proposals
- Fast RCNN 2: initialize the parameters using the model from RPN 2
- It is worth noting that during training we can set the number of iterations via max_iters in get_solvers; reducing the iteration count shortens the testing time while the network tuning is not yet finalized.
- When we train a Faster R-CNN network, this is the file we call for training.
- train_net.py: trains a network model on your own dataset using Fast R-CNN
- train_svms.py: trains post-hoc SVMs as in the original R-CNN
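The truncated-SVD idea behind compress_net.py, mentioned in the list above, can be sketched as follows (an illustrative function, not the script's actual code):

```python
import numpy as np

def compress_fc(W, k):
    """Truncated-SVD compression of a fully connected weight matrix:
    W (m x n) is replaced by two smaller layers, Ur (m x k) and
    S_r @ Vt_r (k x n), so the parameter count drops from m*n to (m+n)*k."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    Ur = U[:, :k]
    L = np.diag(s[:k]).dot(Vt[:k, :])
    return Ur, L   # W is approximated by Ur @ L

W = np.random.randn(64, 32)
Ur, L = compress_fc(W, k=32)
# with the full rank kept, the factorization is exact up to float error
print(np.allclose(Ur.dot(L), W))  # True
```

In the network this corresponds to replacing one fc layer with two thinner fc layers in series, trading a small accuracy loss for fewer parameters and faster inference.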
RPN
Here we mainly look at the code under the lib/rpn folder. This part implements the RPN model and contains the following main files:

- generate_anchors.py: generates multi-scale, multi-aspect-ratio anchors. This is mainly done by the generate_anchors function; you can see that it uses 3 scales (128, 256, 512) as well as 3 aspect ratios (1:1, 1:2, 2:1). An anchor is fixed by w, h, x_ctr, y_ctr, i.e. width, height, x center, and y center.
- proposal_layer.py: converts the output of the RPN into object proposals. The author adds the ProposalLayer class, which has its own setup and forward functions. forward works by generating the anchor boxes, applying the predicted bbox deltas at each cell to each anchor, clipping the predicted boxes to the image, removing boxes whose width or height is below a threshold, sorting all (proposal, score) pairs, taking the pre_nms_topN proposals, applying NMS, and finally taking the after_nms_topN proposals. (Note: NMS, non-maximum suppression.)
- anchor_target_layer.py: generates the training targets and labels for each anchor, classifying each as 1 (object), 0 (not object), or -1 (ignore). When label > 0, i.e. the anchor is an object, box regression is performed. The forward function: in each cell, generate the 9 anchors, provide their parameters, filter out the anchors that cross the image boundary, and measure their overlap with the GT.
- proposal_target_layer.py: generates the training targets and labels for each object proposal; the class labels run from 0 to K, and box regression is done for boxes with label > 0. (Note that unlike anchor_target_layer.py, which works on the generated anchors, this one works on the proposals.)
- generate.py: generates object proposals using an RPN.

The author implements the RPN through these files.
NMS

The lib/nms folder implements non-maximum suppression, which should be familiar to everyone. The core function of the Python version is in py_cpu_nms.py; the implementation and comments are as follows:
```python
import numpy as np

def py_cpu_nms(dets, thresh):
    """Pure Python NMS baseline."""
    # x1, y1, x2, y2 and score assignment
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    # area of each box
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    # order: indices sorted by score, descending
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the current best box with the remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        # intersection area
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        # IoU = intersection / (area 1 + area 2 - intersection)
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # keep only the boxes whose IoU with the current box is <= thresh
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]

    return keep
```