Explanation of the SSD Principle and Source Code Interpretation (1): The Data Layer AnnotatedDataLayer


Over the past few months I have used my spare time, on and off, to read through the Caffe SSD source code. Although work interrupted me for a while in the middle, I finally finished. Reading the SSD source was one of the more important items in my annual plan this year, and completing it feels very fulfilling. After reading the code, my biggest takeaway is that many details of the paper that used to confuse me suddenly became clear; that feeling is really wonderful, haha.

Since starting work I have had two experiences of reading code that was difficult for me (the more difficult, the more interested I am): the first was reading the OpenCV HOG source, and the second was reading the SSD source, which is more complex than the HOG code. These two experiences gave me a new appreciation for how to read code. When you face code that is hard for you, do not be afraid. Calm down, split the hard code into N sub-blocks, then conquer each sub-block in turn; once all the sub-blocks are broken, concatenate them back into a whole, and suddenly everything makes sense. Of course this process is difficult at first, because at the beginning there is much you do not understand; the SSD code was hard for me at first too. But with patience and perseverance you slowly become familiar with the material and feel more and more relaxed, until finally it all clicks. That feeling is wonderful, and I believe you will get addicted to it.

On the first day of the May 1 holiday I tidied up my SSD source reading notes into this blog post, to share and discuss with everyone. Because the SSD source is complex, and my time and energy are limited, I cannot cover every detail in depth, so there will be shortcomings in this post; I hope you will offer your valuable comments.

While reading the SSD source I created a QT project around it to make reading easier. I have uploaded this QT project to CSDN; it can be opened directly with QT, and you can download it to improve your reading efficiency. Click to download.

SSD Principle

I do not intend to cover the basic principle of SSD in detail; there are many good articles on the net. Here are a few I think are good:
1. SSD (Single Shot MultiBox Detector) algorithm and Caffe code explanation
2. CNN Object Detection (3): SSD in detail
3. Deep learning basics: pedestrian detection with SSD

Here's my personal understanding of SSD.


Take SSD 300 as an example

SSD essentially takes all the prior boxes of 6 different feature maps and classifies and regresses them. The prior boxes of each feature map correspond to the receptive fields of the pixels on that feature map, and these receptive fields are effectively sliding windows. So SSD essentially classifies and regresses all the sliding windows over the input image.
Notes:
1. The receptive field of each feature map in SSD is not the actual receptive field, but is decided by the prior boxes; for example, in the official network the receptive field corresponding to conv4_3_norm is determined by conv4_3_norm_mbox_priorbox.
2. Each pixel of a feature map can correspond to several receptive fields with different aspect ratios but roughly the same area, as the prior-box sketch below shows.
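
To make this concrete, here is a minimal sketch (my own illustration, not code from the SSD repository) of the scale assignment the SSD paper uses for the m = 6 feature maps of SSD 300; s_min = 0.2 and s_max = 0.9 follow the paper (the officially released model uses slightly different hand-picked sizes):

#include <cmath>
#include <cstdio>

int main() {
  const int m = 6;                        // number of feature maps
  const double s_min = 0.2, s_max = 0.9;  // scales from the paper
  const double aspect_ratios[] = {1.0, 2.0, 0.5};
  for (int k = 1; k <= m; ++k) {
    // s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)
    const double s = s_min + (s_max - s_min) * (k - 1) / (m - 1);
    for (double ar : aspect_ratios) {
      // w = s * sqrt(ar), h = s / sqrt(ar): different shapes, same area s^2
      const double w = s * std::sqrt(ar);
      const double h = s / std::sqrt(ar);
      std::printf("map %d: ar %.1f -> %.0f x %.0f px on a 300 px input\n",
                  k, ar, 300 * w, 300 * h);
    }
  }
  return 0;
}

Note how, at a given scale, the widths and heights vary with the aspect ratio while the area stays the same, which is exactly note 2 above.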

I have also used MTCNN for object detection before, but its results were not as good as SSD's. I kept wondering: both classify and regress sliding windows, so why is the gap so big? Let's look at the difference between how MTCNN and SSD implement detection.

MTCNN and SSD implement two different detection strategies:
MTCNN: first build an image pyramid, then slide a fixed-size window (implemented by the fully convolutional network PNet) over each level of the pyramid, classifying and regressing each window. Because the image pyramid loses some of the original information (through resizing), and because feature extraction uses only one network, the features extracted for targets of different scales are not sufficient.
SSD: the image size stays the same, and sliding windows of different sizes (implemented by different feature maps) are classified and regressed. Different feature maps correspond to sliding windows of different sizes, which realizes feature extraction per window size; and because the different feature maps use different CNN structures, the extracted features are fuller. The sketch below contrasts the two strategies.
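
The following sketch contrasts the two strategies; the numbers are assumptions for illustration (the usual MTCNN pyramid factor of 0.709 and 12-px PNet window, and the min_sizes of the official SSD 300 prototxt). MTCNN must rescale the image many times so one fixed window can match every object size, while SSD keeps one input size and lets each feature map act as a different window size:

#include <cstdio>

int main() {
  // MTCNN-style: shrink a 300 px image by ~0.709 per level so a fixed
  // 12x12 PNet window can match objects from ~20 px upward.
  const double factor = 0.709;
  double scale = 12.0 / 20.0;
  int level = 0;
  while (300.0 * scale >= 12.0) {
    std::printf("pyramid level %d: scale %.3f, window covers ~%.0f px\n",
                level++, scale, 12.0 / scale);
    scale *= factor;
  }
  // SSD-style: one fixed 300x300 input; each feature map supplies prior
  // boxes of a different size (min_sizes of the official SSD 300 model).
  const int min_sizes[] = {30, 60, 111, 162, 213, 264};
  for (int k = 0; k < 6; ++k) {
    std::printf("feature map %d: prior box ~%d px on the input image\n",
                k, min_sizes[k]);
  }
  return 0;
}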

While examining the MTCNN and SSD network architectures, I noticed that SSD uses 3x3 convolution kernels for classification and regression, while MTCNN uses 1x1 convolution kernels.
What is the difference between the two?

SSD adds local information
A 3x3 convolution kernel covers the receptive field of a pixel and of its neighborhood, that is, it adds local information, whereas MTCNN classifies and regresses with 1x1 convolution kernels and only considers the receptive field of the pixel itself, without any local information (for MTCNN the input image is exactly the receptive field, so there is no neighborhood information at all). The SSD model is therefore more robust thanks to the added local information.
I once tried pasting cut-out targets onto random background images and then training an SSD detector on them, and found it did not work at all, probably because SSD training relies on the local information around the target, and the training samples in that experiment had lost the target's local information.
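
To quantify the difference between the two heads, here is a toy receptive-field calculation (the backbone numbers are assumed, not measured from the SSD net): stacking a k x k convolution on a feature map whose units have receptive field rf and cumulative stride j gives rf' = rf + (k - 1) * j, so the 3x3 head sees two extra strides of context:

#include <cstdio>

int main() {
  // Assumed numbers for one SSD-like feature map (not measured from the net):
  const int rf_backbone = 92;  // receptive field of one feature-map unit, px
  const int jump = 8;          // cumulative stride of that feature map, px
  // rf' = rf + (k - 1) * jump for a k x k convolution on top of the map.
  const int rf_1x1 = rf_backbone + (1 - 1) * jump;  // 1x1 head: unchanged
  const int rf_3x3 = rf_backbone + (3 - 1) * jump;  // 3x3 head: +2 strides
  std::printf("1x1 head sees %d px, 3x3 head sees %d px\n", rf_1x1, rf_3x3);
  return 0;
}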


In summary, the likely reasons for SSD's good results:
1. Feature extraction is fuller: different network structures extract the features of different-sized sliding windows.
2. Local information is added to the classification and regression of each prior box, making the model more robust.

The above is only my personal understanding of SSD; where it is inappropriate, criticism and corrections are welcome.

The following begins the interpretation of the SSD source, starting with the data layer.

AnnotatedDataLayer source interpretation

#ifdef USE_OPENCV
#include <opencv2/core/core.hpp>
#endif  // USE_OPENCV
#include <stdint.h>

#include <algorithm>
#include <map>
#include <vector>

#include "caffe/data_transformer.hpp"
#include "caffe/layers/annotated_data_layer.hpp"
#include "caffe/util/benchmark.hpp"
#include "caffe/util/sampler.hpp"

namespace caffe {

template <typename Dtype>
AnnotatedDataLayer<Dtype>::AnnotatedDataLayer(const LayerParameter& param)
  : BasePrefetchingDataLayer<Dtype>(param),
    reader_(param) {
}

template <typename Dtype>
AnnotatedDataLayer<Dtype>::~AnnotatedDataLayer() {
  this->StopInternalThread();
}

template <typename Dtype>
void AnnotatedDataLayer<Dtype>::DataLayerSetUp(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  const int batch_size = this->layer_param_.data_param().batch_size();
  const AnnotatedDataParameter& anno_data_param =
      this->layer_param_.annotated_data_param();
  // Read all the data-augmentation sampling parameters
  for (int i = 0; i < anno_data_param.batch_sampler_size(); ++i) {
    batch_samplers_.push_back(anno_data_param.batch_sampler(i));
  }
  label_map_file_ = anno_data_param.label_map_file();
  // Make sure dimension is consistent within batch.
  const TransformationParameter& transform_param =
      this->layer_param_.transform_param();
  if (transform_param.has_resize_param()) {
    if (transform_param.resize_param().resize_mode() ==
        ResizeParameter_Resize_mode_FIT_SMALL_SIZE) {
      CHECK_EQ(batch_size, 1)
          << "Only support batch size of 1 for FIT_SMALL_SIZE.";
    }
  }
  // Read one datum and use its shape to initialize the shapes of top and of
  // the prefetch blobs (e.g. a data size of 300x300). An AnnotatedDatum
  // contains the data and the annotations (each annotation contains a label
  // and a bounding box).
  // Read a data point, and use it to initialize the top blob.
  AnnotatedDatum& anno_datum = *(reader_.full().peek());
  // What reader_ holds is the input data (image data plus bounding-box
  // coordinates).
  // Use data_transformer to infer the expected blob shape from anno_datum.
  vector<int> top_shape =
      this->data_transformer_->InferBlobShape(anno_datum.datum());
  this->transformed_data_.Reshape(top_shape);
  // Reshape top[0] and prefetch_data according to the batch_size.
  top_shape[0] = batch_size;
  top[0]->Reshape(top_shape);
  // The image data blobs of the prefetch threads
  for (int i = 0; i < this->PREFETCH_COUNT; ++i) {
    this->prefetch_[i].data_.Reshape(top_shape);
  }
  LOG(INFO) << "output data size: " << top[0]->num() << ","
      << top[0]->channels() << "," << top[0]->height() << ","
      << top[0]->width();
  // label
  if (this->output_labels_) {
    // The type was already set when the data was generated:
    // anno_datum.set_type(AnnotatedDatum_AnnotationType_BBOX);
    has_anno_type_ = anno_datum.has_type() || anno_data_param.has_anno_type();
    vector<int> label_shape(4, 1);
    if (has_anno_type_) {
      anno_type_ = anno_datum.type();
      if (anno_data_param.has_anno_type()) {
        // If anno_type is provided in AnnotatedDataParameter, replace
        // the type stored in each individual AnnotatedDatum.
        LOG(WARNING) << "type stored in AnnotatedDatum is shadowed.";
        anno_type_ = anno_data_param.anno_type();
      }
      // Infer the label shape from anno_datum.AnnotationGroup().
      int num_bboxes = 0;
      // Count all the boxes of this image
      if (anno_type_ == AnnotatedDatum_AnnotationType_BBOX) {
        // Since the number of bboxes can be different for each image,
        // we store the bbox information in a specific format. In specific:
        // All bboxes are stored in one spatial plane (num and channels are 1)
        // and each row contains one and only one box in the following format:
        // [item_id, group_label, instance_id, xmin, ymin, xmax, ymax, diff]
        // Note: Refer to caffe.proto for details about group_label and
        // instance_id.
        for (int g = 0; g < anno_datum.annotation_group_size(); ++g) {
          num_bboxes += anno_datum.annotation_group(g).annotation_size();
        }
        label_shape[0] = 1;
        label_shape[1] = 1;
        // BasePrefetchingDataLayer<Dtype>::LayerSetUp() requires to call
        // cpu_data and gpu_data for consistent prefetch thread. Thus we make
        // sure there is at least one bbox.
        label_shape[2] = std::max(num_bboxes, 1);
        label_shape[3] = 8;
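        // Illustrative example (values assumed, not from the source): if the
        // first image of the batch has two ground-truth boxes of class 1,
        // the label blob has shape [1, 1, 2, 8] with rows such as
        //   [0, 1, 0, 0.10, 0.20, 0.55, 0.80, 0]
        //   [0, 1, 1, 0.40, 0.10, 0.90, 0.60, 0]
        // i.e. image 0, group_label 1, instance 0/1, normalized corners,
        // and difficult = 0.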
      } else {
        LOG(FATAL) << "Unknown annotation type.";
      }
    } else {
      label_shape[0] = batch_size;
    }
    top[1]->Reshape(label_shape);
    // The label blobs of the prefetch threads
    for (int i = 0; i < this->PREFETCH_COUNT; ++i) {
      this->prefetch_[i].label_.Reshape(label_shape);
    }
  }
}

// This function is called on prefetch thread
template<typename Dtype>
void AnnotatedDataLayer<Dtype>::load_batch(Batch<Dtype>* batch) {
  CPUTimer batch_timer;
  batch_timer.Start();
  double read_time = 0;
  double trans_time = 0;
  CPUTimer timer;
  CHECK(batch->data_.count());
  CHECK(this->transformed_data_.count());

  // Reshape according to the first anno_datum of each batch;
  // on single input batches this allows inputs of varying dimension.
  const int batch_size = this->layer_param_.data_param().batch_size();
  const AnnotatedDataParameter& anno_data_param =
      this->layer_param_.annotated_data_param();
  const TransformationParameter& transform_param =
      this->layer_param_.transform_param();
  // Initialize the sizes of transformed_data_ and batch->data_
  AnnotatedDatum& anno_datum = *(reader_.full().peek());
  vector<int> top_shape =
      this->data_transformer_->InferBlobShape(anno_datum.datum());  // 3x300x300
  // transformed_data_ stores one image; for SSD 300 its size is [1,3,300,300]
  this->transformed_data_.Reshape(top_shape);
  top_shape[0] = batch_size;
  // batch->data_ stores batch_size images; for SSD 300 its size is
  // [batch_size,3,300,300]
  batch->data_.Reshape(top_shape);

  Dtype* top_data = batch->data_.mutable_cpu_data();
  Dtype* top_label = NULL;  // suppress warnings about uninitialized variables
  if (this->output_labels_ && !has_anno_type_) {
    top_label = batch->label_.mutable_cpu_data();
  }

  // Store transformed annotation.
  map<int, vector<AnnotationGroup> > all_anno;

  // Each image of the batch and its corresponding annotations
  int num_bboxes = 0;
  for (int item_id = 0; item_id < batch_size; ++item_id) {
    timer.Start();
    // Get an image and apply the preprocessing (e.g. adding distortions)
    AnnotatedDatum& anno_datum = *(reader_.full().pop("Waiting for data"));
    read_time += timer.MicroSeconds();
    timer.Start();
    AnnotatedDatum distort_datum;
    AnnotatedDatum* expand_datum = NULL;
    if (transform_param.has_distort_param()) {
      distort_datum.CopyFrom(anno_datum);
      this->data_transformer_->DistortImage(anno_datum.datum(),
                                            distort_datum.mutable_datum());
      if (transform_param.has_expand_param()) {
        expand_datum = new AnnotatedDatum();
        this->data_transformer_->ExpandImage(distort_datum, expand_datum);
      } else {
        expand_datum = &distort_datum;
      }
    } else {
      if (transform_param.has_expand_param()) {
        expand_datum = new AnnotatedDatum();
        this->data_transformer_->ExpandImage(anno_datum, expand_datum);
      } else {
        expand_datum = &anno_datum;
      }
    }
    AnnotatedDatum* sampled_datum = NULL;
    bool has_sampled = false;
    if (batch_samplers_.size() > 0) {
      /* 1. Data augmentation (corresponds to section 2.2, "Training", of the
       *    paper). For each of the batch_size images, every sampler
       *    (batch_sampler) generates up to max_sample bounding boxes
       *    (candidate boxes). The boxes generated by the different samplers
       *    have minimum IoU with some target of 0.1, 0.3, 0.5, 0.7, 0.9,
       *    consistent with the description in the paper. Example:
       *      batch_sampler {
       *        sampler {
       *          min_scale: 0.3
       *          max_scale: 1.0
       *          min_aspect_ratio: 0.5
       *          max_aspect_ratio: 2.0
       *        }
       *        sample_constraint {
       *          min_jaccard_overlap: 0.7
       *        }
       *        max_sample: 1
       *        max_trials: 50
       *      }
       *    This sampler randomly generates bounding boxes that satisfy
       *    IoU > 0.7 with some target in the image.
       *    Notes:
       *    1. The generated bounding-box coordinates are normalized, so they
       *       are unaffected by resizing; bounding-box regression in object
       *       detection commonly uses this form (MTCNN does too).
       *    2. The randomly generated bounding boxes depend on each
       *       batch_sampler's parameters: scale and aspect ratio; each
       *       sampler tries at most max_trials times.
       */
      vector<NormalizedBBox> sampled_bboxes;  // generated boxes are in normalized coordinates
      GenerateBatchSamples(*expand_datum, batch_samplers_, &sampled_bboxes);
      /* 2. Randomly pick one of the generated bounding boxes, crop out the
       *    image it covers (its size in the original image is
       *    sampled_bboxes[rand_idx]), and recompute the coordinates and
       *    categories of all the targets inside this bounding box.
       *    Note: target coordinates inside the box
       *      = (ground-truth coordinates in the original image
       *         - bounding-box coordinates) / (bounding-box side length).
       *    Here both the ground truth and the bounding box are relative to
       *    the original image; MTCNN uses this calculation as well.
       */
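      // Worked example of the re-encoding above (values assumed): for a
      // sampled crop [xmin, ymin, xmax, ymax] = [0.3, 0.3, 0.8, 0.8]
      // (side length 0.5) and a ground-truth box [0.4, 0.4, 0.6, 0.6],
      // the box becomes ((0.4-0.3)/0.5, (0.4-0.3)/0.5, (0.6-0.3)/0.5,
      // (0.6-0.3)/0.5) = [0.2, 0.2, 0.6, 0.6] in crop coordinates.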
      if (sampled_bboxes.size() > 0) {
        int rand_idx = caffe_rng_rand() % sampled_bboxes.size();
        sampled_datum = new AnnotatedDatum();
        this->data_transformer_->CropImage(*expand_datum,
                                           sampled_bboxes[rand_idx],
                                           sampled_datum);
        has_sampled = true;
      } else {
        sampled_datum = expand_datum;
      }
    } else {
      sampled_datum = expand_datum;
    }
    CHECK(sampled_datum != NULL);
    timer.Start();
    vector<int> shape =
        this->data_transformer_->InferBlobShape(sampled_datum->datum());
    if (transform_param.has_resize_param()) {
      // This branch is not executed (in the usual SSD 300 configuration)
      if (transform_param.resize_param().resize_mode() ==
          ResizeParameter_Resize_mode_FIT_SMALL_SIZE) {
        this->transformed_data_.Reshape(shape);
        batch->data_.Reshape(shape);
        top_data = batch->data_.mutable_cpu_data();
      } else {
        CHECK(std::equal(top_shape.begin() + 1, top_shape.begin() + 4,
                         shape.begin() + 1));
      }
    } else {
      CHECK(std::equal(top_shape.begin() + 1, top_shape.begin() + 4,
                       shape.begin() + 1));
    }
    // Apply data transformations (mirror, scale, crop...)
    int offset = batch->data_.offset(item_id);
    this->transformed_data_.set_cpu_data(top_data + offset);
    vector<AnnotationGroup> transformed_anno_vec;
    if (this->output_labels_) {
      if (has_anno_type_) {
        // Make sure all data have same annotation type.
        CHECK(sampled_datum->has_type()) << "Some datum misses AnnotationType.";
        if (anno_data_param.has_anno_type()) {
          sampled_datum->set_type(anno_type_);
        } else {
          CHECK_EQ(anno_type_, sampled_datum->type())
              << "Different AnnotationType.";
        }
        // Transform datum and annotation_group at the same time
        transformed_anno_vec.clear();
        /* 3. Convert the cropped AnnotatedDatum into a data part and an
         *    annotation part. The data part is resized to the size configured
         *    in the data layer (e.g. 300x300) and saved to top[0]; the
         *    annotation part holds the coordinates of all targets in the
         *    image.
         *    Notes:
         *    1. The image here is not necessarily the cropped image from
         *       above: if transform_param has a crop_size parameter, the
         *       cropped image is cropped once more.
         *    2. Since the cropped image is resized, generating the lmdb with
         *       a resize makes the data layer resize the original image
         *       twice, which can distort the target aspect ratio. SFD
         *       (Single Shot Scale-invariant Face Detector) makes a small
         *       improvement here: when generating the bounding boxes in the
         *       first step, it ensures every bounding box is square, so that
         *       resizing to 300x300 does not change the target aspect ratio.
         */
        this->data_transformer_->Transform(*sampled_datum,
                                           &(this->transformed_data_),
                                           &transformed_anno_vec);
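        // Worked example for note 2 above (values assumed): resizing a
        // 200x100 crop to 300x300 scales widths by 1.5x and heights by 3.0x,
        // so a target's aspect ratio changes by a factor of 2; a square
        // 150x150 crop scales both sides by 2.0x and preserves the aspect
        // ratio, which is exactly the SFD improvement.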
        if (anno_type_ == AnnotatedDatum_AnnotationType_BBOX) {
          // Count the number of bboxes.
          // Count how many targets fall inside the randomly generated
          // bounding box
          for (int g = 0; g < transformed_anno_vec.size(); ++g) {
            num_bboxes += transformed_anno_vec[g].annotation_size();
          }
        } else {
          LOG(FATAL) << "Unknown annotation type.";
        }
        // Annotations of the item_id-th image of the batch
        all_anno[item_id] = transformed_anno_vec;
      } else {
        this->data_transformer_->Transform(sampled_datum->datum(),
                                           &(this->transformed_data_));
        // Otherwise, store the label from datum.
        CHECK(sampled_datum->datum().has_label()) << "Cannot find any label.";
        top_label[item_id] = sampled_datum->datum().label();
      }
    } else {
      this->data_transformer_->Transform(sampled_datum->datum(),
                                         &(this->transformed_data_));
    }
    // Clear memory
    if (has_sampled) {
      delete sampled_datum;
    }
    if (transform_param.has_expand_param()) {
      delete expand_datum;
    }
    trans_time += timer.MicroSeconds();

    // Put the consumed datum back
    reader_.free().push(const_cast<AnnotatedDatum*>(&anno_datum));
  }

  // Store "rich" annotation if needed.
  /* 4. Finally save the annotations to top[1]; top[1] has shape
   *    [1, 1, num_bboxes, 8], and each row has the format
   *    [item_id, group_label, instance_id, xmin, ymin, xmax, ymax, diff].
   *    Meaning of this 8-dimensional vector: in the item_id-th image of the
   *    batch, the instance_id-th box of category group_label has the
   *    coordinates [xmin, ymin, xmax, ymax].
   */
  if (this->output_labels_ && has_anno_type_) {
    vector<int> label_shape(4);
    if (anno_type_ == AnnotatedDatum_AnnotationType_BBOX) {
      label_shape[0] = 1;
      label_shape[1] = 1;
      label_shape[3] = 8;
      if (num_bboxes == 0) {
        // Store all -1 in the label.
        label_shape[2] = 1;
        batch->label_.Reshape(label_shape);
        caffe_set<Dtype>(8, -1, batch->label_.mutable_cpu_data());
      } else {
        // num_bboxes is the number of targets in all the images cropped above
        label_shape[2] = num_bboxes;
        batch->label_.Reshape(label_shape);
        top_label = batch->label_.mutable_cpu_data();
        int idx = 0;
        // Traverse the label information of each image in the batch
        for (int item_id = 0; item_id < batch_size; ++item_id) {
          // Label information of the item_id-th image
          const vector<AnnotationGroup>& anno_vec = all_anno[item_id];
          for (int g = 0; g < anno_vec.size(); ++g) {
            const AnnotationGroup& anno_group = anno_vec[g];
            for (int a = 0; a < anno_group.annotation_size(); ++a) {
              const Annotation& anno = anno_group.annotation(a);
              const NormalizedBBox& bbox = anno.bbox();
              top_label[idx++] = item_id;
              top_label[idx++] = anno_group.group_label();
              top_label[idx++] = anno.instance_id();
              top_label[idx++] = bbox.xmin();
              top_label[idx++] = bbox.ymin();
              top_label[idx++] = bbox.xmax();
              top_label[idx++] = bbox.ymax();
              top_label[idx++] = bbox.difficult();
            }
          }
        }
      }
    } else {
      LOG(FATAL) << "Unknown annotation type.";
    }
  }
  timer.Stop();
  batch_timer.Stop();
  DLOG(INFO) << "Prefetch batch: " << batch_timer.MilliSeconds() << " ms.";
  DLOG(INFO) << "     Read time: " << read_time / 1000 << " ms.";
  DLOG(INFO) << "Transform time: " << trans_time / 1000 << " ms.";
}

INSTANTIATE_CLASS(AnnotatedDataLayer);
REGISTER_LAYER_CLASS(AnnotatedData);

}  // namespace caffe

There are several important functions in the data layer: GenerateBatchSamples(), this->data_transformer_->CropImage(), and this->data_transformer_->Transform(). Let's read them in detail below, starting with GenerateBatchSamples.

void GenerateBatchSamples(const AnnotatedDatum& anno_datum,
                          const vector<BatchSampler>& batch_samplers,
                          vector<NormalizedBBox>* sampled_bboxes) {
  sampled_bboxes->clear();
  // Get the ground-truth boxes
  vector<NormalizedBBox> object_bboxes;
  GroupObjectBBoxes(anno_datum, &object_bboxes);
  // Generate several boxes per sampler
  for (int i = 0; i < batch_samplers.size(); ++i) {
    if (batch_samplers[i].use_original_image()) {
      // Use the original image as the source for sampling.
      NormalizedBBox unit_bbox;
      unit_bbox.set_xmin(0);
      unit_bbox.set_ymin(0);
      unit_bbox.set_xmax(1);
      unit_bbox.set_ymax(1);
      GenerateSamples(unit_bbox,          // unit box
                      object_bboxes,      // ground-truth boxes
                      batch_samplers[i],  // sampler
                      sampled_bboxes);
    }
  }
}

void GenerateSamples(const NormalizedBBox& source_bbox,  // unit box
                     // object_bboxes holds all the ground-truth boxes of the image
                     const vector<NormalizedBBox>& object_bboxes,
                     const BatchSampler& batch_sampler,  // sampler
                     vector<NormalizedBBox>* sampled_bboxes) {
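  // Body sketch (assumed, following the structure of the Caffe SSD sampler;
  // SampleBBox, LocateBBox, and SatisfySampleConstraint are taken to come
  // from caffe/util/sampler.hpp and caffe/util/bbox_util.hpp).
  int found = 0;
  for (int i = 0; i < batch_sampler.max_trials(); ++i) {
    if (batch_sampler.has_max_sample() &&
        found >= batch_sampler.max_sample()) {
      break;  // this sampler has already produced max_sample boxes
    }
    // Randomly draw a box whose scale and aspect ratio satisfy the sampler.
    NormalizedBBox sampled_bbox;
    SampleBBox(batch_sampler.sampler(), &sampled_bbox);
    // Project the sampled box into the coordinates of source_bbox (a no-op
    // here, since source_bbox is the unit box).
    LocateBBox(source_bbox, sampled_bbox, &sampled_bbox);
    // Keep the box only if it satisfies the IoU constraint with some target.
    if (SatisfySampleConstraint(sampled_bbox, object_bboxes,
                                batch_sampler.sample_constraint())) {
      ++found;
      sampled_bboxes->push_back(sampled_bbox);
    }
  }
}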
