The whole process of similar image search from training to service


I recently completed a search-by-image project that ran for about three months. This post records where machine learning was used in the project and the pitfalls encountered along the way. Overall, the project is divided into the following parts:

I. Training the objective function

1. Set up the base model

2. Add new layers

3. Freeze the base layers

4. Compile the model

5. Training

6. Save the model

II. Feature extraction

III. Creating the index

IV. Building the service

1. Flask development

2. Gunicorn asynchronous workers to increase service robustness

3. Supervisor to deploy and monitor the service

V. Summary

I. Training the objective function

The project fine-tunes the pre-trained VGG16 model and reduces the feature dimension from the original high-dimensional output to a compact dense-layer feature (128 dimensions, as used in the feature-extraction code below).

The fine-tuning of the model is divided into the following steps:

1. Set up the base model

This post uses the pre-trained VGG16 model as the base and uses its bottleneck features.

# Set up the base model
from keras.applications.vgg16 import VGG16

# weights: path to the pre-trained weights file
# include_top=False: do not load the three fully connected layers at the top
base_model = VGG16(weights='./model/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
                   include_top=False)

2. Add new layers

Do a simple classification of your own target images and count the number of categories (the number of classes must be specified when training the model).

# Add new layers
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

def add_new_last_layer(base_model, nb_classes):
    '''
    Add the final layers.
    :param base_model: pre-trained base model
    :param nb_classes: number of categories
    :return: the new model
    '''
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    # Feature dimension of the output (128, matching the extracted feature shape used later)
    x = Dense(128, activation='relu')(x)
    predictions = Dense(nb_classes, activation='softmax')(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    return model

3. Freeze the base layers

The earlier layers can reuse the pre-trained weights and do not need to be retrained, so they are frozen to keep their parameters from changing.

def freeze_base_layer(model, base_model):
    for layer in base_model.layers:
        layer.trainable = False

4. Compile the model

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# optimizer: the optimizer to use
# loss: the loss function; multi-class log loss (categorical cross-entropy) requires the
#       class labels to be converted to a binary matrix of shape (nb_samples, nb_classes)
# metrics: a list of metrics used to evaluate the model during training and testing

5. Training

# Data preparation
import glob
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.vgg16 import preprocess_input

im_width, im_height = 224, 224
train_dir = './refine_img_data/train'
val_dir = './refine_img_data/test'
nb_classes = 5
nb_epoch = 3
batch_size = 16
nb_train_samples = get_nb_files(train_dir)      # get_nb_files: helper that counts files
nb_classes = len(glob.glob(train_dir + '/*'))
nb_val_samples = get_nb_files(val_dir)

# Set up data generators (with augmentation) on top of the existing data
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                   rotation_range=30,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                  rotation_range=30,
                                  width_shift_range=0.2,
                                  height_shift_range=0.2,
                                  shear_range=0.2,
                                  zoom_range=0.2,
                                  horizontal_flip=True)

# Read the images from the folders
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size=(im_width, im_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(val_dir,
                                                        target_size=(im_width, im_height),
                                                        batch_size=batch_size,
                                                        class_mode='categorical')

# Training
history_t1 = model.fit_generator(train_generator,
                                 epochs=1,
                                 steps_per_epoch=10,
                                 validation_data=validation_generator,
                                 validation_steps=10,
                                 class_weight='auto')

6. Save the model

Save the model to a specified path, generally in .h5 format.

model.save('/model/test_model.h5')

II. Feature extraction

Load the trained model and extract features from the specified layer as needed.

# Use model.summary() to view the model structure
import os
import cv2
import logging
import traceback
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from keras.models import Model, load_model
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

# Extract image features with the model
target_size = (224, 224)

def my_feature(mod, path):
    img = image.load_img(path, target_size=target_size)
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = preprocess_input(img)
    return mod.predict(img)

# Build a model that outputs the features of the specified layer
model_path = './model/my_model.h5'
base_model = load_model(model_path)
model = Model(inputs=base_model.input, outputs=base_model.get_layer('dense_1').output)

# Extract features
img_path = './my_img/bus.jpg'
feat = my_feature(model, img_path)   # shape is (1, 128)
print(feat)
print(feat.shape)

# Note: when a large number of images (tens of thousands or more) needs to be processed,
# feature extraction takes a long time. Use multiple processes plus batching instead
# (because of the GIL, Python multithreading does not help here).
def pre_process_image(path):
    # material_id and flag come from the surrounding context (not shown here)
    if path is not None and os.path.exists(path) and len(path) > 10:
        try:
            img = cv2.imread(path, cv2.IMREAD_COLOR)
            img = cv2.resize(img, (224, 224))
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            img = img.transpose(2, 0, 1)
            return [material_id, img, flag]
        except Exception as err:
            traceback.print_exc()
            return None
    else:
        logging.error('could not find path: ' + path)
        return None

# CPU part: use a process pool and specify the number of workers
with ProcessPoolExecutor(max_workers=20) as executor:
    feat_paras = list(executor.map(pre_process_image, material_batch))

# GPU part with batch processing
# TODO
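The GPU part is left as a TODO above. As a minimal sketch (the array name imgs and the batch size are assumptions, not from the original project), batched prediction with the feature model could look like this:

# Minimal sketch of the batched GPU prediction left as TODO above.
# Assumption: imgs is a numpy array of shape (N, 224, 224, 3) that has already been
# resized and passed through preprocess_input; model is the feature model built above.
batch_size = 256
features = []
for start in range(0, imgs.shape[0], batch_size):
    batch = imgs[start:start + batch_size]
    features.append(model.predict(batch))   # each row is a 128-dimensional feature
features = np.concatenate(features, axis=0)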

III. Creating the index

Here we use Faiss, Facebook's open-source nearest-neighbor indexing framework.

# Create the index
import faiss

d = 128          # feature dimension
nlist = 100      # number of cells the index is partitioned into
nprobe = 8       # number of cells to visit per query

quantizer_img = faiss.IndexFlatL2(d)   # coarse quantizer based on Euclidean (L2) distance
image_index = None
model_index = None

# image_feat_array, img_feat_list and image_id_array come from the feature-extraction step
if image_feat_array is not None and len(img_feat_list) > 100:
    image_index = faiss.IndexIVFFlat(quantizer_img, d, nlist, faiss.METRIC_L2)
    image_index.train(image_feat_array)
    image_index.add_with_ids(image_feat_array, image_id_array)
    image_index.nprobe = nprobe
    image_index.dont_dealloc_me = quantizer_img   # keep a reference so the quantizer is not freed

# Save the current index to the specified path
faiss.write_index(image_index, path)

# Test the current index
temp_feat = img_feat_list[1]
res_2 = image_index.search(temp_feat, k=5)
logging.info('Image search result is: ' + str(res_2))

IV. Building the service

Flask is used as the web framework, Gunicorn as the WSGI container, and Supervisor manages the process.

1. Flask development

Reference document: http://docs.jinkan.org/docs/flask/quickstart.html#a-minimal-application
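The quickstart covers the basics. As a minimal sketch of how the pieces above could be wired into a search endpoint (the route name, index path, and request format are assumptions, not taken from the original project), the service might look like this:

# Minimal sketch of a Flask search service; names and paths are illustrative only.
import faiss
import numpy as np
from flask import Flask, jsonify, request
from keras.models import Model, load_model
from keras.applications.vgg16 import preprocess_input
from keras.preprocessing import image

app = Flask(__name__)

base_model = load_model('./model/my_model.h5')             # trained model from section I
feat_model = Model(inputs=base_model.input,
                   outputs=base_model.get_layer('dense_1').output)
index = faiss.read_index('./model/image.index')            # index written in section III (assumed path)

def extract_feature(path):
    img = image.load_img(path, target_size=(224, 224))
    arr = np.expand_dims(image.img_to_array(img), axis=0)
    return feat_model.predict(preprocess_input(arr))        # shape (1, 128)

@app.route('/search', methods=['POST'])
def search():
    # The client is assumed to send a local image path; a real service would
    # more likely accept an uploaded file instead.
    img_path = request.json['path']
    feat = extract_feature(img_path).astype('float32')
    distances, ids = index.search(feat, 5)                  # top-5 neighbors
    return jsonify({'ids': ids[0].tolist(), 'distances': distances[0].tolist()})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=9090)

In development it can be run directly with python; in production it is served through Gunicorn as described next.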

2. Gunicorn asynchronous workers to increase service robustness

Basic syntax:

gunicorn -w process_num -b ip:port -k gevent filename:app

# Note: if -k gevent is omitted, the workers run synchronously

Synchronous deployment:

gunicorn -b 0.0.0.0:9090 my_service:app

Asynchronous deployment:

gunicorn -b 0.0.0.0:9090 -k gevent my_service:app

After deploying the application with Gunicorn, QPS improved by a large factor compared with the bare Flask development server. In the original Flask setup, a request handled by one of my interfaces could block the thread while it waited on calls to other interfaces, which made the program prone to hanging. After the switch, stability improved greatly.

3. Supervisor to deploy and monitor the service

Refer to the following document: www.cnblogs.com/gjack/p/8076419.html
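As a sketch (the program name, directories, and log paths below are assumptions, not from the original deployment), a Supervisor entry that keeps the Gunicorn service running could look like this:

[program:image_search]
directory=/opt/image_search
command=gunicorn -b 0.0.0.0:9090 -k gevent my_service:app
autostart=true
autorestart=true
stdout_logfile=/var/log/image_search.out.log
stderr_logfile=/var/log/image_search.err.log

After adding the entry, reload the configuration with supervisorctl reread followed by supervisorctl update, so the service is restarted automatically if it exits.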

V. Summary

At this point the basic service framework is in place. Many parts are only sketched out here, but the overall structure is complete. Several of the tools used, such as Gunicorn's asynchronous workers, I do not yet understand deeply and still need to spend time studying. Because of the pressure to launch and the tight schedule, many parts were not thought through carefully, so there are surely flaws left to fix later.
