YOLO v2 Algorithm Details: Taking Li Yu's Gluon Code as an Example


The YOLO family is an interesting branch of object detection. At CVPR 2017, YOLO v2 upgraded the original YOLO algorithm, and the paper itself contains many algorithmic details; you may first want to read the blog post "YOLO9000 algorithm detailed". Here the details of the YOLO v2 algorithm are explained with the help of the code from Li Yu's deep learning open course (implemented through the Gluon interface of the MXNet framework).
Reference Link: https://zh.gluon.ai/chapter_computer-vision/yolo.html

The reference link gives the code implementing the YOLO v2 algorithm, which consists of four parts: data reading, model loading, model training, and model testing. The most important pieces are the YOLO2Output class in the model loading part and the yolo2_forward and yolo2_target functions in the training part. The four parts are covered in order below.

1. Data reading. The data reading code lives mainly in the get_iterators function, which uses the mxnet.image.ImageDetIter interface; you need to prepare the train.rec and val.rec files in advance, and class_names is the list of object class names. The input image size is defined as 3*256*256, and the code that follows is based on this definition. Two ImageDetIter parameters are worth explaining: 1. min_object_covered (float, default=0.1): the cropped area of the image must contain at least this fraction of any bounding box supplied; the value should be non-negative, and in the case of 0 the cropped area does not need to overlap any of the supplied bounding boxes. 2. max_attempts (int, default=50): the number of attempts at generating a cropped/padded region of the image with the specified constraints; after max_attempts failures, the original image is returned.

from mxnet import image
from mxnet import nd

data_shape = 256
batch_size = 32                       # batch size 32 is used throughout this walkthrough
data_dir = '../data/pikachu/'         # assumed path to the .rec/.idx files; adjust to your setup
rgb_mean = nd.array([123, 117, 104])  # per-channel mean; third value assumed (ImageNet mean)
rgb_std = nd.array([58.395, 57.12, 57.375])

def get_iterators(data_shape, batch_size):
    class_names = ['pikachu', 'dummy']
    num_class = len(class_names)
    train_iter = image.ImageDetIter(
        batch_size=batch_size,
        data_shape=(3, data_shape, data_shape),
        path_imgrec=data_dir + 'train.rec',
        path_imgidx=data_dir + 'train.idx',
        shuffle=True,
        mean=True,
        std=True,
        rand_crop=1,
        min_object_covered=0.95,
        max_attempts=200)
    val_iter = image.ImageDetIter(
        batch_size=batch_size,
        data_shape=(3, data_shape, data_shape),
        path_imgrec=data_dir + 'val.rec',
        shuffle=False,
        mean=True,
        std=True)
    return train_iter, val_iter, class_names, num_class

train_data, test_data, class_names, num_class = get_iterators(data_shape, batch_size)
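To see the tensor layout the iterator produces, a minimal sketch (assuming the .rec files above exist) is to pull one batch and print its shapes; the data should be batch_size*3*256*256, and each label row holds a class id plus four corner coordinates:

batch = train_data.next()
print(batch.data[0].shape)   # (32, 3, 256, 256)
print(batch.label[0].shape)  # (32, 1, 5): class id + xmin, ymin, xmax, ymax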

2. Model loading. First, the pretrained model is imported through the mxnet.gluon.model_zoo.vision.get_model interface, which is similar to PyTorch's. Note the trailing .features: features is a variable in the initialization function of the ResNetV1 class, built through the mxnet.gluon.nn.HybridSequential interface. mxnet.gluon.nn.HybridSequential is a special case of mxnet.gluon.nn.Sequential, which stacks layers in sequential order; layers are added through the add method of the HybridSequential class (more abstractly, a layer or network is implemented through Block), and features finally contains all layers except the last fully connected layer of the network.

In the code below, net stores the pretrained network structure minus its last two layers and serves as the backbone: net contains the 7*7 convolution plus 3 residual blocks (the pretrained features contain the 7*7 convolution plus 4 blocks plus a pooling layer), so the feature map output by the last layer of net has size 256/16 = 16. The scales variable holds the anchors' size information; it is a two-dimensional list in which each row represents an anchor, the first column its width and the second its height. The values in scales are relative to the size of the feature map output by the backbone's last layer: since the last layer here outputs 16*16, values around 3 or 9 are reasonable.

The YOLO2Output class constructs the prediction layer, and the final net.add(predictor) connects the backbone to the prediction layer; the YOLO2Output class is described in detail below. predictor.initialize() calls the initialize method of the mxnet.gluon.Block class (YOLO2Output is implemented on top of HybridBlock, which in turn derives from the Block base class) and initializes the network parameters. This step is necessary; running a forward pass through a constructed network whose parameters are uninitialized raises an error. The two main parameters of the initialize method are the initializer (usually left at its default) and ctx (the CPU or GPU(s) to use, e.g. ctx=[mx.gpu(0), mx.gpu(1)]). In addition, calling initialize() on a block has the same result as calling collect_params().initialize(); the collect_params() method of the Block class is commonly used and returns the parameters of the block and its children. The official documentation gives an example of collect_params(): to initialize a dense1 layer with dense0's parameters, you can write dense0 = nn.Dense(m); dense1 = nn.Dense(m, params=dense0.collect_params()), as in the sketch below. As for why net does not run the initialize method: pretrained=True was set when the model was obtained, i.e. the pretrained weights already initialize its parameters.
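A minimal runnable sketch of that parameter-sharing idiom (the unit count 20 is arbitrary, chosen only for illustration):

from mxnet.gluon import nn

# dense1 reuses dense0's parameters instead of creating its own
dense0 = nn.Dense(20)
dense1 = nn.Dense(20, params=dense0.collect_params())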

from mxnet.gluon.model_zoo import vision
from mxnet.gluon import nn

pretrained = vision.get_model('resnet18_v1', pretrained=True).features
net = nn.HybridSequential()
for i in range(len(pretrained) - 2):
    net.add(pretrained[i])

# anchor scales; try adjusting them yourself
scales = [[3.3004, 3.59034],
          [9.84923, 8.23783]]

# use 2 classes, 1 as dummy class, otherwise softmax won't work
predictor = YOLO2Output(2, scales)
predictor.initialize()
net.add(predictor)
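As a quick sanity check (a sketch, assuming the network above was just built on CPU), feeding a dummy 256*256 batch through net should yield a 16*16 feature map with len(scales) * (num_class + 1 + 4) = 2 * 7 = 14 channels:

x = nd.zeros((1, 3, 256, 256))
print(net(x).shape)  # expected: (1, 14, 16, 16)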

The YOLO2Output class constructs the prediction layer, as follows. Several assert statements ensure that the input arguments are formatted as required. The most important line is out_channels = len(anchor_scales) * (num_class + 1 + 4): len(anchor_scales) is the number of anchors, and in (num_class + 1 + 4), num_class is the number of object classes, 1 is the objectness score, and 4 is the center coordinates plus width and height of the box; with 2 classes and 2 anchors this gives 2 * (2 + 1 + 4) = 14 output channels. The line self.output = nn.Conv2D(out_channels, 1, 1) implements the prediction layer with a 1*1 convolution.

from mxnet.gluon import HybridBlock, nn

class YOLO2Output(HybridBlock):
    def __init__(self, num_class, anchor_scales, **kwargs):
        super(YOLO2Output, self).__init__(**kwargs)
        assert num_class > 0, "number of classes should > 0, given {}".format(num_class)
        self._num_class = num_class
        assert isinstance(anchor_scales, (list, tuple)), "list or tuple of anchor scales required"
        assert len(anchor_scales) > 0, "at least one anchor scale required"
        for anchor in anchor_scales:
            assert len(anchor) == 2, "expected each anchor scale to be (width, height), provided {}".format(anchor)
        self._anchor_scales = anchor_scales
        out_channels = len(anchor_scales) * (num_class + 1 + 4)
        with self.name_scope():
            self.output = nn.Conv2D(out_channels, 1, 1)

    def hybrid_forward(self, F, x, *args):
        return self.output(x)

3. Training code. First a trainer is initialized through the gluon.Trainer interface, followed by the training loop (20 epochs in the example); the start of each epoch resets the values of several loss metrics. for i, batch in enumerate(train_data) reads one batch of data per iteration. x = net(x) maps the input to the network output: with 2 anchors, a 256*256 input image, and batch size 32, the input of this line is 32*3*256*256 and the output is 32*14*16*16, where 14 is 2*(2+1+4), the three values in parentheses being the number of classes, the objectness score, and the coordinate information respectively.

The line output, cls_pred, score, xywh = yolo2_forward(x, 2, scales) calls the yolo2_forward function to organize the network output; the function is described in detail later. The line tid, tscore, tbox, sample_weight = yolo2_target(score, xywh, y, scales, thresh=0.5) calls the yolo2_target function to obtain the model's training targets; that function too is described in detail later.

The line loss1 = sce_loss(cls_pred, tid, sample_weight * class_weight) computes the classification loss: cls_pred is the predicted probability of each class for each box, tid is the class label of the box with the largest IOU against the ground-truth box, and sample_weight is 1 only for the box with the largest IOU against the ground truth and 0 elsewhere. You can see from this that in the YOLO algorithm each object is predicted by one box in the grid cell containing the object's center. score_weight weights the positive and negative samples in the objectness loss: where a position of the first input of nd.where satisfies the inequality, the corresponding position takes the value of the second input, and otherwise the third (see the sketch below). The resulting score_weight is positive_weight for the box with the largest IOU against the ground truth and negative_weight everywhere else.

The line loss2 = l1_loss(score, tscore, score_weight) computes the loss of the objectness score, the familiar confidence displayed on a box, a decimal ranging from 0 to 1. The line loss3 = l1_loss(xywh, tbox, sample_weight * box_weight) computes the box regression loss; sample_weight * box_weight again regresses only the box with the largest IOU against the ground-truth frame, and the multiplication by box_weight increases loss3's weight within the total loss. The final loss is the sum of the three. trainer.step(batch_size) updates the network parameters; batch_size is passed in because the gradient is rescaled by 1/batch_size. cls_loss.update(loss1) updates the recorded loss value, and obj_loss and box_loss likewise; these three values exist only to print progress to the display and do not affect backpropagation. That is the whole flow; the individual functions are detailed next.
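A tiny sketch of the nd.where semantics used to build score_weight (the values here are illustrative):

from mxnet import nd

sw = nd.array([[0.0], [1.0]])  # sample_weight: 0 = background, 1 = matched box
w = nd.where(sw > 0, nd.ones_like(sw) * 5.0, nd.ones_like(sw) * 0.1)
print(w)  # [[0.1], [5.0]]: positives get 5.0, negatives 0.1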

import time
from mxnet import autograd, gluon, gpu, init, nd

positive_weight = 5.0
negative_weight = 0.1
class_weight = 1.0
box_weight = 5.0

ctx = gpu(0)
net.collect_params().reset_ctx(ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 1, 'wd': 5e-4})

for epoch in range(20):
    # reset data iterator and metrics
    train_data.reset()
    cls_loss.reset()
    obj_loss.reset()
    box_loss.reset()
    tic = time.time()
    for i, batch in enumerate(train_data):
        x = batch.data[0].as_in_context(ctx)
        y = batch.label[0].as_in_context(ctx)
        with autograd.record():
            x = net(x)
            output, cls_pred, score, xywh = yolo2_forward(x, 2, scales)
            with autograd.pause():
                tid, tscore, tbox, sample_weight = yolo2_target(
                    score, xywh, y, scales, thresh=0.5)
            # losses
            loss1 = sce_loss(cls_pred, tid, sample_weight * class_weight)
            score_weight = nd.where(sample_weight > 0,
                                    nd.ones_like(sample_weight) * positive_weight,
                                    nd.ones_like(sample_weight) * negative_weight)
            loss2 = l1_loss(score, tscore, score_weight)
            loss3 = l1_loss(xywh, tbox, sample_weight * box_weight)
            loss = loss1 + loss2 + loss3
        loss.backward()
        trainer.step(batch_size)
        # update metrics
        cls_loss.update(loss1)
        obj_loss.update(loss2)
        box_loss.update(loss3)
    print('epoch %2d, train %s %.5f, %s %.5f, %s %.5f time %.1f sec' % (
        epoch, *cls_loss.get(), *obj_loss.get(), *box_loss.get(), time.time() - tic))

The yolo2_forward function organizes and transforms the network output. For the network structure above, with input size 3*256*256 and batch_size=32, the input x of yolo2_forward is 32*14*16*16. In stride = num_class + 5, the 5 is one score plus four coordinate-related values. x = x.transpose((0, 2, 3, 1)) moves the channel dimension to the last position, after which x = x.reshape((0, 0, 0, -1, stride)) keeps the first 3 dimensions unchanged (batch size, height, width), makes the 4th dimension the number of anchors, and makes the 5th dimension the per-anchor predictions (2 classes + 1 score + 4 coordinates), so the resulting x is 32*16*16*2*7, as the shape walkthrough below shows.

cls_pred = x.slice_axis(begin=0, end=num_class, axis=-1) takes the first num_class entries (here 2) of x's last dimension as the class predictions. score_pred = x.slice_axis(begin=num_class, end=num_class + 1, axis=-1) takes the next 1 entry of the last dimension as the objectness score prediction. xy_pred = x.slice_axis(begin=num_class + 1, end=num_class + 3, axis=-1) takes the next 2 entries as the predicted box center coordinates. wh = x.slice_axis(begin=num_class + 3, end=num_class + 5, axis=-1) takes the remaining 2 entries as the predicted box width and height. The last dimension of length 7 is thus fully partitioned.

Here score = nd.sigmoid(score_pred) and xy = nd.sigmoid(xy_pred) squash the predictions: the score must lie between 0 and 1, and because coordinates relative to a grid cell are used, the center offsets must also lie in the 0-to-1 range (see the bx and by calculation in Figure 3 of the paper; the xy predicted by the model corresponds to tx and ty in Figure 3). transform_center converts the per-grid-cell relative coordinates into coordinates relative to the whole image. The transform_size function turns the model's width/height output into actual widths and heights. cid is the predicted class for each box. left, top, right, and bottom are the bounds of the predicted box.
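A sketch of the shape bookkeeping for batch size 32, 2 classes, and 2 anchors (standalone, using a dummy tensor):

from mxnet import nd

x = nd.zeros((32, 14, 16, 16))   # raw conv output: batch x channels x h x w
x = x.transpose((0, 2, 3, 1))    # -> (32, 16, 16, 14)
x = x.reshape((0, 0, 0, -1, 7))  # -> (32, 16, 16, 2, 7): 2 anchors x 7 values
print(x.shape)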

def yolo2_forward(x, num_class, anchor_scales):
    """Transpose/reshape/organize convolution outputs."""
    stride = num_class + 5
    # transpose and reshape, 4th dim is the number of anchors
    x = x.transpose((0, 2, 3, 1))
    x = x.reshape((0, 0, 0, -1, stride))
    # now x is (batch, m, n, stride), stride = num_class + 1 (object score) + 4 (coordinates)
    # class probs
    cls_pred = x.slice_axis(begin=0, end=num_class, axis=-1)
    # object score
    score_pred = x.slice_axis(begin=num_class, end=num_class + 1, axis=-1)
    score = nd.sigmoid(score_pred)
    # center prediction, in range (0, 1) for each grid
    xy_pred = x.slice_axis(begin=num_class + 1, end=num_class + 3, axis=-1)
    xy = nd.sigmoid(xy_pred)
    # width/height prediction
    wh = x.slice_axis(begin=num_class + 3, end=num_class + 5, axis=-1)
    # convert x, y to positions relative to image
    x, y = transform_center(xy)
    # convert w, h to width/height relative to image
    w, h = transform_size(wh, anchor_scales)
    # cid is the argmax channel
    cid = nd.argmax(cls_pred, axis=-1, keepdims=True)
    # convert to corner format boxes
    half_w = w / 2
    half_h = h / 2
    left = nd.clip(x - half_w, 0, 1)
    top = nd.clip(y - half_h, 0, 1)
    right = nd.clip(x + half_w, 0, 1)
    bottom = nd.clip(y + half_h, 0, 1)
    output = nd.concat(*[cid, score, left, top, right, bottom], dim=4)
    return output, cls_pred, score, nd.concat(*[xy, wh], dim=4)

The yolo2_forward function relies on two important helpers: the transform_center function and the transform_size function. These two functions perform the coordinate and width/height conversions. One of the highlights of the YOLO v2 algorithm is that the regression target is not a raw center coordinate or width/height but an offset relative to the grid cell and the anchor, which you can read about in detail in the paper.
The transform_center function converts the coordinates relative to each grid cell into coordinates relative to the whole image. Its input xy has size 32*16*16*2*2, so xy[0, 1, 1, 0, :] is the (x, y) offset predicted by anchor 0 at position (1, 1) of the 16*16 feature map of the first image in the batch; each point on the feature map represents a grid cell, and this x and y is the distance from that grid cell's top-left corner, so if x = y = 1 the point is the grid cell's bottom-right corner. offset_y has size 32*16*16*2*1, where the 16*16 part is a two-dimensional matrix whose first row is all 0, second row all 1, ..., and last row all 15, broadcast directly across the other dimensions; offset_x is the same along columns. So when performing x + offset_x, x[b, h, 2, n, 0] has 2 added and x[b, h, 4, n, 0] has 4 added. The final division by w or h normalizes, so the resulting x and y range from 0 to 1. This function therefore implements the bx = sigmoid(tx) + cx and by = sigmoid(ty) + cy step of Figure 3 in the paper, followed by the normalization to the grid size; a sketch of the offset construction follows.
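A small sketch of the y-offset construction for a 4*4 grid with one anchor:

from mxnet import nd

h, w, n = 4, 4, 1
offset_y = nd.arange(0, h, repeat=w * n).reshape((1, h, w, n, 1))
print(offset_y[0, :, :, 0, 0])  # row r is filled with the value r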

def transform_center(xy):
    """Given x, y prediction after sigmoid(), convert to relative coordinates (0, 1) on image."""
    b, h, w, n, s = xy.shape
    offset_y = nd.tile(nd.arange(0, h, repeat=(w * n * 1), ctx=xy.context).reshape((1, h, w, n, 1)),
                       (b, 1, 1, 1, 1))
    # print(offset_y[0].asnumpy()[:, :, 0, 0])
    offset_x = nd.tile(nd.arange(0, w, repeat=(n * 1), ctx=xy.context).reshape((1, 1, w, n, 1)),
                       (b, h, 1, 1, 1))
    # print(offset_x[0].asnumpy()[:, :, 0, 0])
    x, y = xy.split(num_outputs=2, axis=-1)
    x = (x + offset_x) / w
    y = (y + offset_y) / h
    return x, y

The transform_size function is similar to the transform_center function. It implements the bw = pw * exp(tw), bh = ph * exp(th) step of Figure 3 in the paper, again followed by normalization to the image. The input wh corresponds to tw and th, and aw and ah are the anchors' width and height information.
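A worked example (a sketch) of the conversion: with the first anchor (3.3004, 3.59034) on a 16*16 feature map and tw = th = 0, the predicted box covers exp(0) * 3.3004 / 16, roughly 0.206, of the image width:

import math

aw, ah, fw, fh = 3.3004, 3.59034, 16, 16
tw, th = 0.0, 0.0
print(math.exp(tw) * aw / fw, math.exp(th) * ah / fh)  # ~0.206, ~0.224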

def transform_size(wh, anchors):
    """Given w, h prediction after exp() and anchor sizes, convert to relative width/height (0, 1) on image."""
    b, h, w, n, s = wh.shape
    aw, ah = nd.tile(nd.array(anchors, ctx=wh.context).reshape((1, 1, 1, -1, 2)),
                     (b, h, w, 1, 1)).split(num_outputs=2, axis=-1)
    w_pred, h_pred = nd.exp(wh).split(num_outputs=2, axis=-1)
    w_out = w_pred * aw / w
    h_out = h_pred * ah / h
    return w_out, h_out

The yolo2_target function constructs the model's training targets. The input labels is the ground truth with size 32*1*5: the 1 means there is only 1 object, and the 5 comprises 1 class label and 4 coordinate values. The outer loop traverses each image in the batch: label is a k*5 NumPy array with k the number of objects, and since a real object's class label is not negative, valid_label filters out the padded, invalid labels. In the input scores, the size n is the number of anchors, and h and w are 16 and 16 respectively for a 256*256 input image.

for l in valid_label traverses all valid object annotations in one image. Because the annotated coordinates are the box's top-left and bottom-right corners (as relative coordinates, i.e. values from 0 to 1), simple additions and subtractions yield gx, gy, gw, and gh. ind_x and ind_y are the integer grid coordinates: with a 16*16 feature map, they locate the grid cell on that map that contains the object center. Now the key point: tx = gx * w - ind_x and ty = gy * h - ind_y, and tx and ty are the regression targets for the center. intersect computes the intersection area of each anchor with the ground truth, a 1*n NumPy array with n the number of anchors; ovps is the ratio of intersection to union, i.e. the IOU, also of size 1*n (see the sketch below). best_match is the index of the anchor with the largest IOU.

ind_x and ind_y are used in the next few assignment lines, which is why in the YOLO algorithm an object is predicted by a box in the grid cell containing the center of that object's ground-truth frame. The so-called box is in fact implicit: first the current object is matched against the anchor sizes to find the anchor with the largest IOU, and then the ground-truth information is written into that anchor's slots in the target ndarrays, including score, coordinates, and class label. target_id[b, ind_y, ind_x, best_match, :] = l[0] assigns the ground-truth class label to the anchor with the largest IOU; every anchor position that does not receive such an assignment keeps the ignore label, i.e. background. target_score[b, ind_y, ind_x, best_match, :] = 1.0 sets the best-matching box's score target to 1, i.e. confidence 1, while all others stay 0. The computation of tw and th is the inverse of the Figure 3 formulas in the paper, so the final target_box is the regression target of the model, with symbols corresponding one-to-one to the formulas in the paper. sample_weight marks which positions carry a target. Note that the tx and ty here correspond to the outputs of the sigmoid function, sigmoid(tx) and sigmoid(ty), in the paper.
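A sketch of the width/height-only IOU used for anchor matching, using the two anchor scales from above and a hypothetical ground-truth size (in feature-map units, centers assumed to coincide):

import numpy as np

anchors = np.array([[3.3004, 3.59034], [9.84923, 8.23783]])
gw, gh = 4.0, 4.0  # hypothetical ground-truth width/height in grid units
intersect = np.minimum(anchors[:, 0], gw) * np.minimum(anchors[:, 1], gh)
ovps = intersect / (gw * gh + anchors[:, 0] * anchors[:, 1] - intersect)
print(int(np.argmax(ovps)))  # 0: the smaller anchor matches best here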

import numpy as np
import mxnet as mx

def yolo2_target(scores, boxes, labels, anchors, ignore_label=-1, thresh=0.5):
    """Generate training targets, given predictions and labels."""
    b, h, w, n, _ = scores.shape
    anchors = np.reshape(np.array(anchors), (-1, 2))
    # scores = nd.slice_axis(outputs, begin=1, end=2, axis=-1)
    # boxes = nd.slice_axis(outputs, begin=2, end=6, axis=-1)
    gt_boxes = nd.slice_axis(labels, begin=1, end=5, axis=-1)
    target_score = nd.zeros((b, h, w, n, 1), ctx=scores.context)
    target_id = nd.ones_like(target_score, ctx=scores.context) * ignore_label
    target_box = nd.zeros((b, h, w, n, 4), ctx=scores.context)
    sample_weight = nd.zeros((b, h, w, n, 1), ctx=scores.context)
    for b in range(scores.shape[0]):  # iterate over the batch (originally output.shape[0], a stray global)
        # find the best match for each ground-truth
        label = labels[b].asnumpy()
        valid_label = label[np.where(label[:, 0] > -0.5)[0], :]
        # shuffle because multi gt could possibly match to one anchor,
        # we keep the last match randomly
        np.random.shuffle(valid_label)
        for l in valid_label:
            gx, gy, gw, gh = (l[1] + l[3]) / 2, (l[2] + l[4]) / 2, l[3] - l[1], l[4] - l[2]
            ind_x = int(gx * w)
            ind_y = int(gy * h)
            tx = gx * w - ind_x
            ty = gy * h - ind_y
            gw = gw * w
            gh = gh * h
            # find the best match using width and height only, assuming centers are identical
            intersect = np.minimum(anchors[:, 0], gw) * np.minimum(anchors[:, 1], gh)
            ovps = intersect / (gw * gh + anchors[:, 0] * anchors[:, 1] - intersect)
            best_match = int(np.argmax(ovps))
            target_id[b, ind_y, ind_x, best_match, :] = l[0]
            target_score[b, ind_y, ind_x, best_match, :] = 1.0
            tw = np.log(gw / anchors[best_match, 0])
            th = np.log(gh / anchors[best_match, 1])
            target_box[b, ind_y, ind_x, best_match, :] = mx.nd.array([tx, ty, tw, th])
            sample_weight[b, ind_y, ind_x, best_match, :] = 1.0
            # print('ind_y', ind_y, 'ind_x', ind_x, 'best_match', best_match,
            #       't', tx, ty, tw, th, 'ovp', ovps[best_match],
            #       'gt', gx, gy, gw/w, gh/h,
            #       'anchor', anchors[best_match, 0], anchors[best_match, 1])
    return target_id, target_score, target_box, sample_weight

About the loss functions: two loss functions are defined, for classification and regression respectively: the cross-entropy loss sce_loss for classification and the L1 loss l1_loss for regression. obj_loss, cls_loss, and box_loss are implemented by inheriting from the mx.metric.EvalMetric class, which was originally intended for evaluation metrics but can also be used to record loss; these three variables exist only to print output to the display so you can follow the training progress.

sce_loss = gluon.loss.SoftmaxCrossEntropyLoss(from_logits=False)
l1_loss = gluon.loss.L1Loss()

from mxnet import metric

class LossRecorder(mx.metric.EvalMetric):
    """LossRecorder is used to record raw loss so we can observe loss directly."""
    def __init__(self, name):
        super(LossRecorder, self).__init__(name)

    def update(self, labels, preds=0):
        """Update metric with pure loss."""
        for loss in labels:
            if isinstance(loss, mx.nd.NDArray):
                loss = loss.asnumpy()
            self.sum_metric += loss.sum()
            self.num_inst += 1

obj_loss = LossRecorder('objectness_loss')
cls_loss = LossRecorder('classification_loss')
box_loss = LossRecorder('box_refine_loss')

4. Testing. The test section consists of two steps: first read the data and preprocess it, then get the output from the model. Data reading and preprocessing are done through the process_image function, which calls the mxnet.image.imdecode interface to decode the opened image into a height*width*3 NDArray, resizes it to the specified size with the mxnet.image.imresize interface, converts the data to float32 and normalizes it, transposes it from height*width*3 to 3*height*width, and adds a dimension to obtain 1*3*height*width, simulating a batch. The predict function runs the preprocessed data through the trained model net and obtains the detailed results through the yolo2_forward function. The output.reshape((0, -1, 6)) operation reshapes the output to 3 dimensions representing the batch size, the number of boxes, and the 6 items of box information (class, score, and 4 coordinates); finally the NMS operation removes duplicate boxes.

def process_image(fname):
    with open(fname, 'rb') as f:
        im = image.imdecode(f.read())
    # resize to data_shape
    data = image.imresize(im, data_shape, data_shape)
    # minus rgb mean, divide std
    data = (data.astype('float32') - rgb_mean) / rgb_std
    # convert to batch x channel x height x width
    return data.transpose((2, 0, 1)).expand_dims(axis=0), im

def predict(x):
    x = net(x)
    output, cls_prob, score, xywh = yolo2_forward(x, 2, scales)
    return nd.contrib.box_nms(output.reshape((0, -1, 6)))

x, im = process_image('../img/pikachu.jpg')
out = predict(x.as_in_context(ctx))
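To see what predict returns, a small sketch (assuming the pipeline above ran): each row of out[0] is [class_id, score, xmin, ymin, xmax, ymax] in relative coordinates, and boxes suppressed by NMS carry class_id -1:

print(out.shape)  # (1, num_boxes, 6)
print(out[0][0])  # the highest-scoring detection after NMS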

Finally the test results are displayed; each call to display is passed the predicted result of one image (out[0]). Inside the display function, plt.imshow(im.asnumpy()) shows the image; the input can be a NumPy array, which for RGB should have shape (n, m, 3) with values of type uint8 or float. The for loop in the display function iterates over the NMS results, and the if class_id < 0 or score < threshold statement skips the background class and the boxes whose score is below the threshold. box = row[2:6] * np.array([im.shape[0], im.shape[1]] * 2) maps the predicted relative box boundaries to boundaries in the original image. rect is a rectangle built from the box boundary information and added to the plot through add_patch, where plt.gca() gets the current axes, whose add_patch and text methods then add new content such as the box or text labels.

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

mpl.rcParams['figure.figsize'] = (6, 6)

colors = ['blue', 'green', 'red', 'black', 'magenta']

def display(im, out, threshold=0.5):
    plt.imshow(im.asnumpy())
    for row in out:
        row = row.asnumpy()
        class_id, score = int(row[0]), row[1]
        if class_id < 0 or score < threshold:
            continue
        color = colors[class_id % len(colors)]
        box = row[2:6] * np.array([im.shape[0], im.shape[1]] * 2)
        # the original listing breaks off here; the rest is reconstructed
        # from the surrounding prose: draw the rectangle and a text label
        rect = plt.Rectangle((box[0], box[1]), box[2] - box[0], box[3] - box[1],
                             fill=False, edgecolor=color, linewidth=2)
        plt.gca().add_patch(rect)
        plt.gca().text(box[0], box[1], '%s %.2f' % (class_names[class_id], score),
                       bbox=dict(facecolor=color, alpha=0.5), fontsize=10, color='white')
    plt.show()

display(im, out[0], threshold=0.5)
