I recently finished an image-search project that took about three months. This post records where machine learning was used in the project and the pitfalls I ran into along the way. Overall, the project breaks down into several parts:
I. Training the model
1. Set up the base model
2. Add new layers
3. Freeze the base layers
4. Compile the model
5. Training
6. Save the model
II. Feature extraction
III. Building the index
IV. Building the service
1. Flask development
2. Gunicorn async workers for service robustness
3. Supervisor for process monitoring
V. Summary
I. Training the model
The project fine-tunes the pre-trained VGG16 model, reducing the feature dimension from the original 2048 dimensions down to 128 dimensions (the output size of the new Dense layer added below).
The fine-tuning of the model is divided into the following steps:
1. Set up the base model
This project uses the pre-trained VGG16 as the base model and takes its bottleneck features.
from keras.applications.vgg16 import VGG16

# Set up the base model; the weights argument specifies the local weights file path
base_model = VGG16(weights='./model/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
                   include_top=False)
# include_top=False does not load the three fully connected layers at the top
2. Add new layers
Do a simple classification of your own target images and count the number of categories (the number of classes must be specified when training the model).
# Add new layers on top of the base model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

def add_new_last_layer(base_model, nb_classes):
    '''
    Add the final layers.
    :param base_model: pre-trained base model
    :param nb_classes: number of categories
    :return: new model
    '''
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(128, activation='relu')(x)  # feature dimension of the output (128, matching the features extracted later)
    predictions = Dense(nb_classes, activation='softmax')(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    return model
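For instance, the new model can then be assembled on top of the base model loaded above (a short usage sketch; the class count of 5 matches the training setup below):

model = add_new_last_layer(base_model, nb_classes=5)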
3. Freeze the base layers
The pre-trained parameters can be reused as-is and do not need to be retrained, so the base layers are frozen to keep their weights from changing.

def freeze_base_layers(model, base_model):
    # Freeze every layer of the base model so its weights are not updated during training
    for layer in base_model.layers:
        layer.trainable = False
4. Compile the model

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# optimizer: the optimizer to use
# loss: loss function; multi-class log loss requires the class labels to be converted
#       to a binary matrix of shape (nb_samples, nb_classes)
# metrics: list of metrics used to evaluate the model during training and testing
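As a small illustration (not part of the original code), integer class labels can be converted to the (nb_samples, nb_classes) binary form that categorical_crossentropy expects with Keras's to_categorical:

from keras.utils import to_categorical

labels = [0, 2, 1]                               # integer class labels
one_hot = to_categorical(labels, num_classes=3)  # binary matrix of shape (3, 3)

In this project the conversion is handled automatically, since the generators below use class_mode='categorical'.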
5. Training
# Data preparation
im_width, im_height = 224, 224
train_dir = './refine_img_data/train'
val_dir = './refine_img_data/test'
nb_classes = 5
nb_epoch = 3
batch_size = 16
nb_train_samples = get_nb_files(train_dir)     # get_nb_files is a project helper
nb_classes = len(glob.glob(train_dir + '/*'))  # recount the classes from the folders
nb_val_samples = get_nb_files(val_dir)

# Set up the data-augmentation generators from the existing data
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                   rotation_range=30,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                  rotation_range=30,
                                  width_shift_range=0.2,
                                  height_shift_range=0.2,
                                  shear_range=0.2,
                                  zoom_range=0.2,
                                  horizontal_flip=True)

# Get the data from the folders
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size=(im_width, im_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(val_dir,
                                                        target_size=(im_width, im_height),
                                                        batch_size=batch_size,
                                                        class_mode='categorical')

# Training
history_t1 = model.fit_generator(train_generator,
                                 epochs=1,
                                 steps_per_epoch=10,
                                 validation_data=validation_generator,
                                 validation_steps=10,
                                 class_weight='auto')
6. Save the model
The model is saved to a specified path, usually in .h5 format.

model.save('/model/test_model.h5')
II. Feature extraction
Load the trained model and take the output of the desired layer as the image feature.
# Use model.summary() to view the model structure

# Extract image features with the model
target_size = (224, 224)

def my_feature(mod, path):
    img = image.load_img(path, target_size=target_size)
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = preprocess_input(img)
    return mod.predict(img)

# Create a model whose output is the specified layer's features
model_path = './model/my_model.h5'
base_model = load_model(model_path)
model = Model(inputs=base_model.input,
              outputs=base_model.get_layer('dense_1').output)

# Extract features
img_path = './my_img/bus.jpg'
feat = my_feature(model, img_path)  # shape is (1, 128)
print(feat)
print(feat.shape)

# Note: when extracting features for a large number of images (say, tens of thousands
# or more), this takes a long time, so use multiple processes plus batching
# (Python's GIL makes multithreading unhelpful here).
def pre_process_image(path):
    if path is not None and os.path.exists(path) and len(path) > 10:
        try:
            img = cv2.imread(path, cv2.IMREAD_COLOR)
            img = cv2.resize(img, (224, 224))
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            img = img.transpose(2, 0, 1)
            return [material_id, img, flag]  # material_id and flag come from the surrounding context
        except Exception as err:
            traceback.print_exc()
            return None
    else:
        logging.error('could not find path: ' + path)
        return None

# CPU part: call the multiprocessing pool with the desired number of workers
with ProcessPoolExecutor(max_workers=20) as executor:
    feat_paras = list(executor.map(pre_process_image, material_batch))

# GPU part with batch processing
# TODO
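The GPU batching part is left as a TODO above. A minimal sketch of what it might look like, assuming the preprocessed images are collected into a list and featurized with the model from earlier (function name and batch size are illustrative, not from the original):

import numpy as np

def batch_features(model, imgs, batch_size=64):
    # imgs: list of preprocessed image arrays, all with the same shape
    feats = []
    for i in range(0, len(imgs), batch_size):
        batch = np.stack(imgs[i:i + batch_size])  # one predict() call per batch, not per image
        feats.append(model.predict(batch))
    return np.vstack(feats)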
III. Building the index
Here we use Faiss, Facebook's open-source nearest-neighbor indexing framework.
# Create the index
d = 128
nlist = 100  # number of cells the index is split into
nprobe = 8   # number of cells visited per query
quantizer_img = faiss.IndexFlatL2(d)  # quantizer based on Euclidean distance

image_index = None
model_index = None
if image_feat_array is not None and len(img_feat_list) > 100:
    image_index = faiss.IndexIVFFlat(quantizer_img, d, nlist, faiss.METRIC_L2)
    image_index.train(image_feat_array)
    image_index.add_with_ids(image_feat_array, image_id_array)
    image_index.nprobe = nprobe
    image_index.dont_dealloc_me = quantizer_img  # keep a reference so the quantizer is not freed

# Save the current index to the specified path
faiss.write_index(image_index, path)

# Test the current index
temp_feat = img_feat_list[1]
res_2 = image_index.search(temp_feat, k=5)
logging.info('Image Search result is: ' + str(res_2))
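To serve queries, the saved index can be loaded back with faiss.read_index. A minimal sketch (path is whatever was passed to write_index above; the reshape to 128 matches the feature dimension d):

import faiss
import numpy as np

index = faiss.read_index(path)  # load the index written above
index.nprobe = 8                # restore the probe count used at build time

# Faiss expects queries as a 2-D float32 array of shape (n_queries, d)
query = np.asarray(temp_feat, dtype='float32').reshape(1, 128)
distances, ids = index.search(query, 5)  # top-5 nearest neighbours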
IV. Building the service
The service is developed with the Flask framework, uses Gunicorn as the WSGI container, and manages the process with Supervisor.
1. Flask development
Reference: http://docs.jinkan.org/docs/flask/quickstart.html#a-minimal-application
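The reference covers the basics; a minimal sketch of a search endpoint for this kind of project might look like the following (the my_service module name, /search route, and reuse of my_feature and image_index from the earlier sections are illustrative assumptions, not the author's actual service code):

# my_service.py (hypothetical sketch)
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/search', methods=['POST'])
def search():
    # Featurize the image at the posted path and query the Faiss index
    path = request.form['path']
    feat = my_feature(model, path)                # (1, 128) feature from section II
    distances, ids = image_index.search(feat, 5)  # top-5 neighbours from section III
    return jsonify({'ids': ids.tolist(), 'distances': distances.tolist()})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=9090)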
2. Gunicorn async workers for service robustness
Basic syntax:

gunicorn -w process_num -b ip:port -k 'gevent' filename:app
# Note: without -k 'gevent', the workers run synchronously

Synchronous deployment:
gunicorn -b 0.0.0.0:9090 my_service:app
Asynchronous deployment:
gunicorn -b 0.0.0.0:9090 -k gevent my_service:app
After deploying the application with Gunicorn, QPS improved severalfold compared with running Flask directly. Under bare Flask, a handler that called out to other interfaces would block its thread, which made the service prone to hanging; after the switch, stability improved greatly.
3. Supervisor for process monitoring
Reference: www.cnblogs.com/gjack/p/8076419.html
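For reference, a supervisord program section for this service might look like the following (program name, paths, and log locations are illustrative assumptions, not the author's actual config):

[program:my_service]
; run the gunicorn async worker from section 2
command=gunicorn -b 0.0.0.0:9090 -k gevent my_service:app
; working directory of the service (adjust to the project layout)
directory=/path/to/project
; start with supervisord and restart the process if it dies
autostart=true
autorestart=true
stdout_logfile=/var/log/my_service.log
stderr_logfile=/var/log/my_service_err.log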
V. Summary
At this point the basic framework of the service is in place. In many places I have only sketched the general idea, but the structure is complete. Some of the tools used here, such as Gunicorn's async mode, I applied without fully understanding the underlying principles, and I still need to put in the effort to learn them. Because of launch deadline pressure, many details were not carefully worked through, so there are surely flaws left to find and fix.