Writing RCNN with TensorFlow and TFLearn

Source: Internet
Author: User

After more than two weeks of work I have finally written the code for RCNN. It is quite interesting code, and writing it let me review several TensorFlow techniques along the way, so I am summarizing the experience here to share it. There are already many tutorials on the theory behind RCNN, so I will not go into it here; interested readers can read this blog post for an overview.

System Overview

RCNN is built on the AlexNet model. To improve the model's object-recognition rate, the image is first processed with a traditional algorithm, selective search (the algorithm used in the paper), to obtain about 2,000 candidate object boxes (region proposals). Each candidate box is then fed into the CNN, the features of the layer just before the output layer are extracted, and trained SVMs decide which object the box contains. The more interesting parts are fine-tuning AlexNet after ImageNet training, extracting the features of the last layer before the output layer of the fine-tuned network, and training the SVM classifiers. Below, let's look at how to implement this model.
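To make the data flow concrete, here is a minimal pure-Python sketch of the detection loop just described. It is an illustration under stated assumptions, not the author's code. `rcnn_detect` and its parameters are hypothetical names, and the proposal, cropping/resizing, feature-extraction and per-class scoring steps are passed in as plain callables. A box is kept when its best class score is positive, which follows the sign convention of a linear SVM's decision value.

```python
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), like selective search's 'rect'


def rcnn_detect(
    image: Any,
    propose: Callable[[Any], List[Box]],
    warp: Callable[[Any, Box], Any],
    extract: Callable[[Any], Sequence[float]],
    class_scorers: Sequence[Callable[[Sequence[float]], float]],
) -> List[Tuple[Box, int, float]]:
    """Return (box, class_index, score) for each proposal that some class accepts.

    class_scorers[k] plays the role of the linear SVM for class k and returns
    a signed decision value; a positive value means "this box is class k".
    """
    detections = []
    for box in propose(image):                 # region proposals (~2000 in the paper)
        feature = extract(warp(image, box))    # CNN features of the cropped, resized box
        scores = [score(feature) for score in class_scorers]
        best = max(range(len(scores)), key=lambda k: scores[k])
        if scores[best] > 0:                   # otherwise the box is treated as background
            detections.append((box, best, scores[best]))
    return detections
```

In the real system, `propose` would wrap selective search, `warp` the crop-and-resize to 224 * 224, `extract` the truncated AlexNet, and `class_scorers` the trained SVMs.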

Code walkthrough

For convenience, I use the TFLearn library, a high-level wrapper around TensorFlow, to write AlexNet. For details on TFLearn, see its official website.

Let's first look at the overall flow of the system:

The first step is to train AlexNet, for which we use the Tensorflow-alexnet project on GitHub. That project trains AlexNet on the Flower17 dataset, which simply means learning to distinguish 17 different kinds of flowers. The code on GitHub implements its features carefully, but it does not handle resuming training from a saved model, so I added that. Here is my code:

import os

import tflearn


def train(network, X, Y):
    # Training
    model = tflearn.DNN(network, checkpoint_path='model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2,
                        tensorboard_dir='output')
    # Resume from a saved model if one exists, then continue training.
    # TensorFlow's V2 checkpoint format may store the model as
    # model_save.model.index / .data-* files, so check for both forms.
    if os.path.isfile('model_save.model') or os.path.isfile('model_save.model.index'):
        model.load('model_save.model')
    model.fit(X, Y, n_epoch=100, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=64, snapshot_step=200,
              snapshot_epoch=False, run_id='alexnet_oxflowers17')
    # Save the trained model so that training can be resumed later
    model.save('model_save.model')

We also want to check that the trained model works properly. Here is the code for testing AlexNet:

import numpy as np
import tflearn
from PIL import Image
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
from tflearn.layers.normalization import local_response_normalization


# Image preprocessing functions
# ------------------------------------------------------------------
# Read an image file
def load_image(img_path):
    img = Image.open(img_path)
    return img


# Resize the image to new_width * new_height (224 * 224 here); the three RGB
# channels stay the same. Image.LANCZOS is the filter formerly named
# Image.ANTIALIAS, which newer Pillow versions no longer provide.
def resize_image(in_image, new_width, new_height, out_image=None,
                 resize_mode=Image.LANCZOS):
    img = in_image.resize((new_width, new_height), resize_mode)
    if out_image:
        img.save(out_image)
    return img


# Load the image data and convert it to a float32 array
def pil_to_nparray(pil_image):
    pil_image.load()
    return np.asarray(pil_image, dtype="float32")


# Network definition
# ------------------------------------------------------------------
def create_alexnet(num_classes):
    # Build AlexNet (standard AlexNet filter counts and kernel sizes)
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, num_classes, activation='softmax')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network


# Predict the class probabilities of the input images
def predict(network, modelfile, images):
    model = tflearn.DNN(network)
    model.load(modelfile)
    return model.predict(images)


if __name__ == '__main__':
    img_path = 'testimg7.jpg'
    imgs = []
    img = load_image(img_path)
    img = resize_image(img, 224, 224)
    imgs.append(pil_to_nparray(img))
    net = create_alexnet(17)  # Flower17 has 17 classes
    predicted = predict(net, 'model_save.model', imgs)
    print(predicted)

So far, none of this is directly related to RCNN. Note, however, that the model file model_save.model saved earlier is our pre-trained AlexNet. Next, we start building the RCNN system proper, beginning with the traditional region-proposal code.

The paper uses selective search to generate region proposals. I have little experience with this algorithm, and writing it from scratch would take a long time, so I took a shortcut and used the ready-made Python library selectivesearch. The other key concept in the preprocessing code is IoU (intersection over union). It matters here because the human-made labels in an image usually mark only some of the objects, and everything else counts as background. When the computer proposes many possible object boxes at once, how do we decide which box corresponds to an object? A box that does not overlap a labeled object at all is naturally treated as background, but how should we classify the boxes that do overlap? This is where IoU comes in. If a box's IoU with the labeled box exceeds a threshold, we label it with the object's category; otherwise, we label it as background. For a more detailed explanation, see here.

So how do we implement this IOU in code?

# IoU part 1: area of the intersection of box a and box b, or False if they do not overlap
def if_intersection(xmin_a, xmax_a, ymin_a, ymax_a, xmin_b, xmax_b, ymin_b, ymax_b):
    # Two boxes overlap exactly when their x-intervals overlap and their
    # y-intervals overlap (boxes that only touch at an edge do not count)
    if not (xmin_a < xmax_b and xmin_b < xmax_a and ymin_a < ymax_b and ymin_b < ymax_a):
        return False
    # Sort the four x and the four y coordinates; the two middle values on
    # each axis bound the intersection rectangle
    x_sorted_list = sorted([xmin_a, xmax_a, xmin_b, xmax_b])
    y_sorted_list = sorted([ymin_a, ymax_a, ymin_b, ymax_b])
    x_intersect_w = x_sorted_list[2] - x_sorted_list[1]
    y_intersect_h = y_sorted_list[2] - y_sorted_list[1]
    area_inter = x_intersect_w * y_intersect_h
    return area_inter


# IoU part 2
def IOU(ver1, vertice2):
    # ver1 is (x, y, w, h); vertice2 is [xmin, ymin, xmax, ymax, w, h]
    # Convert ver1 to the same vertex form
    vertice1 = [ver1[0], ver1[1], ver1[0] + ver1[2], ver1[1] + ver1[3]]
    area_inter = if_intersection(vertice1[0], vertice1[2], vertice1[1], vertice1[3],
                                 vertice2[0], vertice2[2], vertice2[1], vertice2[3])
    # If the boxes intersect, IoU = intersection / union
    if area_inter:
        area_1 = ver1[2] * ver1[3]
        area_2 = vertice2[4] * vertice2[5]
        iou = float(area_inter) / (area_1 + area_2 - area_inter)
        return iou
    return False
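An equivalent, more compact way to compute IoU clamps the overlap with `max`/`min` instead of testing cases. The sketch below is an illustrative alternative, not the author's code. `iou_xywh` is a hypothetical name. Unlike `IOU` above, whose second argument is the vertex list returned by `clip_pic`, it takes both boxes as `(x, y, w, h)`.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # The overlap runs from the larger left/top edge to the smaller right/bottom edge
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # no overlap (boxes that only touch count as no overlap)
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union
```

For example, two 10 * 10 boxes shifted by half their width share 50 of 150 pixels of union area, giving an IoU of 1/3.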

With IoU defined, we use an IoU threshold of 0.5 when fine-tuning AlexNet and 0.3 when training the SVMs. The function that implements this is as follows:

import pickle

import numpy as np
import selectivesearch
import skimage.io
from PIL import Image


# Read the training list, generate proposals, label them by IoU, and optionally
# save the data set, for AlexNet fine-tuning (svm=False) or SVM training (svm=True).
# clip_pic comes from the author's preprocessing code (not shown);
# resize_image, pil_to_nparray and IOU are the helpers defined earlier.
def load_train_proposals(datafile, num_clss, threshold=0.5, svm=False,
                         save=False, save_path='dataset.pkl'):
    labels = []
    images = []
    with open(datafile, 'r') as train_list:
        for line in train_list:
            # tmp[0] = image path, tmp[1] = label, tmp[2] = ground-truth box "x,y,w,h"
            tmp = line.strip().split()
            img = skimage.io.imread(tmp[0])
            # Python selective search
            img_lbl, regions = selectivesearch.selective_search(
                img, scale=500, sigma=0.9, min_size=10)
            candidates = set()
            for r in regions:
                # Skip duplicate boxes (the same rectangle from different segments)
                if r['rect'] in candidates:
                    continue
                # Skip boxes that are too small
                if r['size'] < 220:
                    continue
                # Crop the proposal from the image
                proposal_img, proposal_vertice = clip_pic(img, r['rect'])
                # Skip empty crops
                if len(proposal_img) == 0:
                    continue
                # Skip boxes with zero width or height
                x, y, w, h = r['rect']
                if w == 0 or h == 0:
                    continue
                # Skip crops in which any dimension of the image array is 0
                a, b, c = np.shape(proposal_img)
                if a == 0 or b == 0 or c == 0:
                    continue
                # Resize to 224 * 224 for the network input
                im = Image.fromarray(proposal_img)
                resized_proposal_img = resize_image(im, 224, 224)
                candidates.add(r['rect'])
                img_float = pil_to_nparray(resized_proposal_img)
                images.append(img_float)
                # Compute the IoU with the ground-truth box
                ref_rect = tmp[2].split(',')
                ref_rect_int = [int(i) for i in ref_rect]
                iou_val = IOU(ref_rect_int, proposal_vertice)
                # Label the proposal; class 0 is the background
                index = int(tmp[1])
                if not svm:
                    # Fine-tuning: one-hot label over num_clss + 1 classes
                    label = np.zeros(num_clss + 1)
                    if iou_val < threshold:
                        label[0] = 1
                    else:
                        label[index] = 1
                    labels.append(label)
                else:
                    # SVM training: plain integer label
                    if iou_val < threshold:
                        labels.append(0)
                    else:
                        labels.append(index)
    if save:
        with open(save_path, 'wb') as f:
            pickle.dump((images, labels), f)
    return images, labels

Note that when the svm argument is True, the label is stored as a plain class index rather than as a one-hot vector.
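The labeling rule can be isolated in the following sketch. `make_label` is a hypothetical helper that mirrors the branch at the end of `load_train_proposals`, using plain lists instead of `np.zeros`. Class 0 is the background. A proposal whose IoU reaches the threshold takes the ground-truth class, and the SVM variant returns a bare index instead of a one-hot vector.

```python
def make_label(iou_val, class_index, num_classes, threshold=0.5, svm=False):
    """Label one proposal from its IoU with the ground-truth box (0 = background)."""
    positive = iou_val >= threshold
    if svm:
        # SVM training: a plain integer class index
        return class_index if positive else 0
    # Fine-tuning: one-hot vector with num_classes + 1 slots (slot 0 = background)
    one_hot = [0] * (num_classes + 1)
    one_hot[class_index if positive else 0] = 1
    return one_hot
```

With the thresholds from the text, a proposal at IoU 0.4 counts as background for fine-tuning (threshold 0.5) but as the object for SVM training (threshold 0.3).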

After preprocessing the input images, we use the resulting set of proposal images to fine-tune AlexNet.

import os

import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
from tflearn.layers.normalization import local_response_normalization


# Build on an already trained AlexNet with the last layer redesigned.
# Following the paper, we discard AlexNet's original last layer (the softmax)
# and put on a new softmax layer sized for the new number of classes + 1
# (the extra class is the background). Setting restore=False on that layer
# means no weights are restored into it when a saved model is loaded.
def create_alexnet(num_classes, restore=False):
    # Build AlexNet (standard AlexNet filter counts and kernel sizes)
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, num_classes, activation='softmax',
                              restore=restore)
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network


# Fine-tuning starts from the trained AlexNet, i.e. it reads model_save.model.
# After training, the result is saved to fine_tune_model_save.model.
def fine_tune_alexnet(network, X, Y):
    # Training
    model = tflearn.DNN(network, checkpoint_path='rcnn_model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2,
                        tensorboard_dir='output_rcnn')

    # TensorFlow's V2 checkpoint format may store a saved model as
    # <name>.index / <name>.data-* files, so check for both forms
    def saved(path):
        return os.path.isfile(path) or os.path.isfile(path + '.index')

    if saved('fine_tune_model_save.model'):
        print("Loading the fine tuned model")
        model.load('fine_tune_model_save.model')
    elif saved('model_save.model'):
        print("Loading the alexnet")
        model.load('model_save.model')
    else:
        print("No file to load, error")
        return False
    model.fit(X, Y, n_epoch=10, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=64, snapshot_step=200,
              snapshot_epoch=False, run_id='alexnet_rcnnflowers2')
    # Save the model
    model.save('fine_tune_model_save.model')

These two functions complete the fine-tuning of AlexNet. So far, we have used AlexNet directly as a classifier. Next, we need to read out the features of its last layer before the output and use them to train the SVMs. How do we get these features? The method is simple: we remove the output layer. The code is as follows:

from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
from tflearn.layers.normalization import local_response_normalization


# AlexNet with the output layer removed: the network ends at the second
# 4096-unit fully connected layer, so model.predict() returns its features.
# num_classes and restore are unused and kept only for a matching signature.
def create_alexnet(num_classes, restore=False):
    # Build AlexNet (standard AlexNet filter counts and kernel sizes)
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network

After extracting the features, we need to train the SVMs. Why train SVMs at all, instead of using the CNN's softmax directly? The blog post mentioned earlier discusses this question. In short, SVMs work well with small training sets, so using them here can improve accuracy. The code for training the SVMs is as follows:

import os

import numpy as np
from sklearn import svm


# Build the cascade of SVMs, one per class
def train_svms(train_file_folder, model):
    # Each training set is in its own TXT file, and each file covers only one class
    listings = os.listdir(train_file_folder)
    svms = []
    for train_file in listings:
        if 'pkl' in train_file:
            continue
        # Get the training data for a single-class SVM
        # (generate_single_svm_train is the author's helper, not shown here)
        X, Y = generate_single_svm_train(os.path.join(train_file_folder, train_file))
        train_features = []
        for i in X:
            # model is the truncated network, so predict() returns features
            feats = model.predict([i])
            train_features.append(feats[0])
        print("Feature dimension")
        print(np.shape(train_features))
        # One linear SVM per class; together they form the cascade
        clf = svm.LinearSVC()
        print("Fit SVM")
        clf.fit(train_features, Y)
        svms.append(clf)
    return svms
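For intuition about what each `LinearSVC` in the cascade fits, here is a minimal from-scratch linear SVM. It minimizes the L2-regularized hinge loss with Pegasos-style subgradient descent. This is a swapped-in illustration, not how scikit-learn trains `LinearSVC` (which uses liblinear). `train_linear_svm`, `svm_score`, `lam` and `epochs` are hypothetical names, and labels are assumed to be +1/-1. The bias is handled as an extra constant feature, so it is regularized too, which is a simplification.

```python
import random


def train_linear_svm(features, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    features: list of equal-length float lists; labels: +1 or -1.
    A constant 1.0 is appended to each feature vector so the last weight
    acts as the bias. Returns the weight list (bias last).
    """
    data = [list(x) + [1.0] for x in features]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                       # decreasing step size
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]    # step for the L2 term
            if margin < 1.0:                            # hinge loss is active
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def svm_score(w, feature):
    """Signed decision value; > 0 means the SVM's class, < 0 means background."""
    return sum(wj * xj for wj, xj in zip(w, list(feature) + [1.0]))
```

Only samples inside the margin (`margin < 1`) move the weights. This is one way to see why a handful of hard examples near the boundary, rather than the whole data set, determines the classifier.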

How do we recognize objects in a new image? First, we obtain the candidate boxes of the input image with the following function:

import numpy as np
import selectivesearch
import skimage.io
from PIL import Image


# prep is the author's preprocessing module providing clip_pic (its import is
# not shown); resize_image and pil_to_nparray are the helpers defined earlier.
def image_proposal(img_path):
    img = skimage.io.imread(img_path)
    img_lbl, regions = selectivesearch.selective_search(
        img, scale=500, sigma=0.9, min_size=10)
    candidates = set()
    images = []
    vertices = []
    for r in regions:
        # Skip duplicate rectangles (the same box from different segments)
        if r['rect'] in candidates:
            continue
        # Skip boxes that are too small
        if r['size'] < 220:
            continue
        # Crop the proposal from the image
        proposal_img, proposal_vertice = prep.clip_pic(img, r['rect'])
        # Skip empty crops
        if len(proposal_img) == 0:
            continue
        # Skip boxes with zero width or height
        x, y, w, h = r['rect']
        if w == 0 or h == 0:
            continue
        # Skip crops in which any dimension of the image array is 0
        a, b, c = np.shape(proposal_img)
        if a == 0 or b == 0 or c == 0:
            continue
        # Resize to 224 * 224 for the network input
        im = Image.fromarray(proposal_img)
        resized_proposal_img = resize_image(im, 224, 224)
        candidates.add(r['rect'])
        img_float = pil_to_nparray(resized_proposal_img)
        images.append(img_float)
        vertices.append(r['rect'])
    return images, vertices
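The filtering rules in `image_proposal` (also used in `load_train_proposals`) can be isolated into a small pure-Python function, which makes them easy to test without images. `filter_regions` is a hypothetical helper. It assumes each region is a dict with `'rect'` as `(x, y, w, h)` and `'size'`, as the code above reads from selectivesearch's output. The cropping, empty-crop check, and resizing steps are left out.

```python
def filter_regions(regions, min_size=220):
    """Return unique, non-degenerate (x, y, w, h) boxes from selective-search regions."""
    seen = set()
    kept = []
    for r in regions:
        rect = tuple(r['rect'])
        if rect in seen:
            continue  # same rectangle produced by a different segment
        if r['size'] < min_size:
            continue  # region too small to be a useful proposal
        x, y, w, h = rect
        if w == 0 or h == 0:
            continue  # degenerate box
        seen.add(rect)
        kept.append(rect)
    return kept
```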

This function is similar to the one used in preprocessing, but simpler, because we do not need to produce labels. After that, we feed these proposal images through the network one at a time to get their features. They could be processed all together, but my machine kept killing the process, perhaps because of memory or some other problem. Finally, applying the cascaded SVMs gives the prediction results.
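Since feeding all proposals to the network at once kept getting the process killed, a middle ground between that and one-at-a-time prediction is to predict in small batches. The sketch below assumes only a `predict` callable that maps a list of inputs to a list of outputs. TFLearn's `DNN.predict` accepts a list of images in this way. `predict_in_batches` and `batch_size` are hypothetical names, and a suitable batch size depends on the available memory.

```python
def predict_in_batches(predict, items, batch_size=32):
    """Call predict on consecutive slices of items and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    outputs = []
    for start in range(0, len(items), batch_size):
        # Each call sees at most batch_size inputs, which bounds peak memory use
        outputs.extend(predict(items[start:start + batch_size]))
    return outputs
```

For example, `features = predict_in_batches(model.predict, images, batch_size=16)` would replace the per-image loop, with `model` being the truncated feature network.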

You are probably curious about the test results. The following compares the outputs of AlexNet and RCNN.

First, let's look at the test image:

For AlexNet, we get the following output:

AlexNet judges the image to be the 4th kind of flower. In the Flower17 dataset, the correct answer is the last class, i.e. class 17, which AlexNet ranks second only to class 4, at 34%. So what does RCNN produce? Let's see:

Clearly, RCNN's confidence in its prediction (class 1) is very high.
