[AI refining] Machine Learning 051: Bag of Visual Words Model + Extremely Random Forest to Build an Image Classifier

(Python libraries and versions used in this article: Python 3.6, numpy 1.14, scikit-learn 0.19, matplotlib 2.2)

The bag of visual words (BoVW) model comes from the bag-of-words (BoW) model in natural language processing; for more information, see my blog post [AI refining] Machine Learning 038-NLP: creating a bag-of-words model. In NLP, the core idea of BoW is to treat a document as a bag containing a variety of words. The importance of a word is measured by its frequency or weight. An important property of BoW is that it ignores the order in which words appear, sentence syntax, and similar factors.

The bag of visual words model applies the core idea of BoW to image processing. To represent an image, we regard it as a document, that is, a set of "visual words"; as in BoW, the order in which these visual words appear is not considered. The disadvantage of BoVW is therefore that it ignores the spatial relationships between pixels (although many improvements have been proposed to address this). The core idea of BoVW is as follows.

Some people ask: there are already many methods for extracting image features, such as SIFT and Star feature extraction, so why do we need a BoVW model to represent an image? The reason is that the feature vectors produced by extractors such as SIFT and Star are high-dimensional. A SIFT vector, for example, has 128 dimensions, and an image usually contains hundreds of SIFT vectors, so downstream machine learning on these raw features is computationally expensive and inefficient. We therefore usually cluster these feature vectors and let each cluster center represent one visual word in BoVW. The SIFT vectors of an image are then mapped onto the visual words to build a visual code book, so that each image can be described by a single visual code vector. Subsequent computation becomes much more efficient, which is helpful for large-scale image retrieval.

For more details about BoVW, refer to the blog post "Visual bag-of-words model (BoW) learning notes and Matlab implementation".


1. Use BoVW to create an image dataset

BoVW mainly involves three key steps:

1. Image feature extraction: the extraction algorithm can be SIFT, Star, HOG, and so on. For example, when the SIFT feature extractor is applied to every image in the dataset, each SIFT feature is represented by a 128-dimensional descriptor vector. Suppose there are M images and a total of N SIFT feature vectors are extracted from them.

2. Obtain the visual words through clustering: K-means is the most commonly used algorithm, although other clustering algorithms can be used as well. K-means divides the N SIFT feature vectors into K clusters so that the vectors within a cluster are highly similar while the similarity between clusters is low. Clustering yields K cluster centers (in BoVW, a cluster center is called a visual word). For each image, compute the distance from each of its SIFT features to the K visual words and map the feature to the nearest one (that is, add 1 to the word frequency of that visual word). In this way, each image becomes a word-frequency vector over the visual words.

3. Construct the visual code: because the number of SIFT features differs from image to image, the word-frequency vectors need to be normalized, turning the raw counts of each image into frequencies. The result is the visual code.
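The three steps above can be sketched end to end on synthetic data. This is a minimal sketch rather than the article's implementation: random 128-dimensional vectors stand in for real SIFT descriptors, and `build_vocabulary` and `encode_image` are hypothetical helper names introduced only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans


def build_vocabulary(descriptor_sets, n_words=32, seed=0):
    """Step 2 (first half): cluster all descriptors; each center is a visual word."""
    all_descriptors = np.vstack(descriptor_sets)  # shape (N, 128)
    model = KMeans(n_clusters=n_words, n_init=10, random_state=seed)
    return model.fit(all_descriptors)


def encode_image(descriptors, vocabulary):
    """Step 2 (second half) and step 3: count nearest visual words, then normalize."""
    words = vocabulary.predict(descriptors)  # nearest cluster for each descriptor
    counts = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts


# Step 1 stand-in (assumption): random vectors instead of real SIFT descriptors.
rng = np.random.RandomState(0)
descriptor_sets = [rng.rand(n, 128) for n in (60, 45, 80)]  # three "images"

vocabulary = build_vocabulary(descriptor_sets, n_words=8)
codes = np.array([encode_image(d, vocabulary) for d in descriptor_sets])
print(codes.shape)        # one row per image, one column per visual word
print(codes.sum(axis=1))  # every row sums to 1 after normalization
```

Each image ends up as one fixed-length row regardless of how many descriptors it produced, which is what makes the result usable as input to an ordinary classifier.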

The entire process can be simply described as follows:

Prepare a dataset. First, pick three categories of images from Caltech256 and randomly select 20 images from each category to form a small dataset. Each category is placed in its own folder, and the folder name starts with a number followed by a hyphen (-); the number is the category label. This small dataset is used purely to verify that the algorithm runs correctly.

First, let's look at the code for the first step, extracting image features:

def _img_sift_features(self, image):
    '''
    Detect the Star keypoints in the image, then use the SIFT extractor to
    compute a descriptor for each keypoint, giving a matrix of N rows and
    128 columns. The number of Star keypoints differs from image to image,
    so N differs, but every SIFT descriptor has 128 dimensions.
    Returns the matrix of N rows and 128 columns.
    '''
    keypoints = xfeatures2d.StarDetector_create().detect(image)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, feature_vectors = xfeatures2d.SIFT_create().compute(gray, keypoints)
    return feature_vectors

Next, we stack the N-row, 128-column feature matrices of all the images into a single matrix of M rows and 128 columns and train a clustering model on it. The model contains 32 cluster centers (visual words), and every 128-dimensional feature is mapped to one of these 32 visual words. (I use 32 clusters in KMeans here, so 32 visual words are obtained. The more complex the project, the larger this value should be, ranging from several hundred to several thousand.) The bag-of-words model is then built by counting how often each visual word occurs, as shown in the following code:

def _map_feature_to_cluster(self, img_path):
    '''
    Extract the feature matrix (N rows, 128 columns) from a single image,
    then use the K-means model to map each row to one of the k clusters,
    giving a vector of N cluster labels. Count the number of feature
    vectors in each cluster, which is equivalent to the frequency of each
    word in a bag of words.
    '''
    img_feature_vectors = self._img_sift_features(self._get_image(img_path))
    cluster_labels = self.cluster_model.predict(img_feature_vectors)
    # cluster_labels holds N numbers, each in 0-31, telling which cluster
    # each Star feature belongs to,
    # e.g. [30 30 30 6 30 30 23 25 23 30 30 16 17 31 30 30 4 25]
    # Count the number of features in each cluster
    vector_nums = np.zeros(self.clusters_num)  # 32 elements
    for num in cluster_labels:
        vector_nums[num] += 1
    # Normalize the counts: use proportions instead of raw counts
    sum_ = sum(vector_nums)
    # One row of 32 columns: a list holding a single array of 32 elements
    return [vector_nums / sum_] if sum_ > 0 else [vector_nums]
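The counting loop above can also be expressed with NumPy's `np.bincount`. The sketch below is an alternative formulation, not the article's code; it reuses the example labels from the comment above, and `labels_to_histogram` is a hypothetical helper name.

```python
import numpy as np


def labels_to_histogram(cluster_labels, clusters_num=32):
    """Turn N cluster labels into a normalized word-frequency vector."""
    # minlength guarantees one slot per visual word, even for unused words
    counts = np.bincount(cluster_labels, minlength=clusters_num).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts


labels = np.array([30, 30, 30, 6, 30, 30, 23, 25, 23,
                   30, 30, 16, 17, 31, 30, 30, 4, 25])
hist = labels_to_histogram(labels)
print(hist[30])    # share of the 18 features that fall in cluster 30
print(hist.sum())  # the frequencies add up to 1
```

Here 9 of the 18 labels are 30, so that visual word receives half of the total frequency while unused words stay at zero.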

Note that only a subset of the images (max_samples per category) is used to compute the cluster centers rather than all of them, on the assumption that a subset is representative of the whole set.

Step 3: Obtain the visual codes of multiple images and stack them into a matrix of P rows and 32 columns.

def _calc_imgs_clusters(self, img_path_list):
    '''
    Get the visual code books of multiple images as a matrix of P rows and
    32 columns, where P is the number of images and 32 is the number of
    clusters. Returns this matrix.
    '''
    img_paths = list(itertools.chain(*img_path_list))  # flatten the nested list
    code_books = []
    for img_path in img_paths:
        code_books.extend(self._map_feature_to_cluster(img_path))
    return code_books

The complete code for preparing the dataset is fairly long and is shown below:

# Prepare the dataset
import cv2, itertools, pickle, os
import numpy as np
from cv2 import xfeatures2d  # requires the opencv-contrib build of OpenCV
from glob import glob
from sklearn.cluster import KMeans


class Dataset:

    def __init__(self, img_folder, cluster_model_path, img_ext='jpg',
                 max_samples=12, clusters_num=32):
        self.img_folder = img_folder
        self.cluster_model_path = cluster_model_path
        self.img_ext = img_ext
        self.max_samples = max_samples
        self.clusters_num = clusters_num
        self.img_paths = self._get_img_paths()
        self.all_img_paths = [list(item.values())[0] for item in self.img_paths]
        self.cluster_model = self._load_cluster_model()

    def _get_img_paths(self):
        # Category folder names start with a number followed by '-',
        # so they can be found with this pattern
        folders = glob(self.img_folder + '/*-*')
        img_paths = []
        for folder in folders:
            # os.path.basename is a portable replacement for folder.split('\\')[-1]
            class_label = os.path.basename(folder)
            # Each element is a dict: the key is the folder name and the value
            # is the list of paths of all the images in that folder
            img_paths.append({class_label: glob(folder + '/*.' + self.img_ext)})
        return img_paths

    def _get_image(self, img_path, new_size=200):
        def resize_img(image, new_size):
            '''Scale the image so that its shorter side equals new_size.'''
            h, w = image.shape[:2]
            ratio = new_size / min(h, w)
            return cv2.resize(image, (int(w * ratio), int(h * ratio)))

        image = cv2.imread(img_path)
        return resize_img(image, new_size)

    def _img_sift_features(self, image):
        '''
        Detect the Star keypoints in the image, then use the SIFT extractor to
        compute a descriptor for each keypoint, giving a matrix of N rows and
        128 columns. The number of Star keypoints differs from image to image,
        so N differs, but every SIFT descriptor has 128 dimensions.
        Returns the matrix of N rows and 128 columns.
        '''
        keypoints = xfeatures2d.StarDetector_create().detect(image)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        _, feature_vectors = xfeatures2d.SIFT_create().compute(gray, keypoints)
        return feature_vectors

    def _calc_imgs_features(self, img_path_list):
        '''
        Get the feature vectors of multiple images and merge them into one
        matrix of M rows and 128 columns, which is returned. M is the total
        number of feature vectors over all images, i.e. N1 + N2 + N3 + ...
        '''
        img_paths = list(itertools.chain(*img_path_list))  # flatten the nested list
        feature_vectors = []
        for img_path in img_paths:
            feature_vectors.extend(self._img_sift_features(self._get_image(img_path)))
        return feature_vectors

    def _create_save_cluster(self):
        '''
        Because the folders may contain a large number of images, only a small
        subset (max_samples per category) is used for K-means clustering.
        '''
        # Get the paths of the images used for clustering
        cluster_img_paths = [list(item.values())[0][:self.max_samples]
                             for item in self.img_paths]
        feature_vectors = self._calc_imgs_features(cluster_img_paths)
        cluster_model = KMeans(n_clusters=self.clusters_num,  # create the clustering model
                               n_init=10, max_iter=10, tol=1.0)
        cluster_model.fit(feature_vectors)  # train the clustering model
        # Save the clustering model so that it does not need to be trained again
        with open(self.cluster_model_path, 'wb+') as file:
            pickle.dump(cluster_model, file)
        print('Cluster model is saved to {}.'.format(self.cluster_model_path))
        return cluster_model

    def _map_feature_to_cluster(self, img_path):
        '''
        Extract the feature matrix (N rows, 128 columns) from a single image,
        then use the K-means model to map each row to one of the k clusters,
        giving a vector of N cluster labels. Count the number of feature
        vectors in each cluster, which is equivalent to the frequency of each
        word in a bag of words.
        '''
        img_feature_vectors = self._img_sift_features(self._get_image(img_path))
        cluster_labels = self.cluster_model.predict(img_feature_vectors)
        # cluster_labels holds N numbers, each in 0-31, telling which cluster
        # each Star feature belongs to,
        # e.g. [30 30 30 6 30 30 23 25 23 30 30 16 17 31 30 30 4 25]
        # Count the number of features in each cluster
        vector_nums = np.zeros(self.clusters_num)  # 32 elements
        for num in cluster_labels:
            vector_nums[num] += 1
        # Normalize the counts: use proportions instead of raw counts
        sum_ = sum(vector_nums)
        # One row of 32 columns: a list holding a single array of 32 elements
        return [vector_nums / sum_] if sum_ > 0 else [vector_nums]

    def _calc_imgs_clusters(self, img_path_list):
        '''
        Get the visual code books of multiple images as a matrix of P rows and
        32 columns, where P is the number of images and 32 is the number of
        clusters. Returns this matrix.
        '''
        img_paths = list(itertools.chain(*img_path_list))  # flatten the nested list
        code_books = []
        for img_path in img_paths:
            code_books.extend(self._map_feature_to_cluster(img_path))
        return code_books

    def _load_cluster_model(self):
        '''
        Load the clustering model from cluster_model_path and return it. If the
        file does not exist or loading fails, call the function that prepares
        the cluster model.
        '''
        cluster_model = None
        if os.path.exists(self.cluster_model_path):
            try:
                with open(self.cluster_model_path, 'rb') as f:
                    cluster_model = pickle.load(f)
            except Exception:
                pass  # fall through and rebuild the model below
        if cluster_model is None:
            print('No valid model found, start to prepare model...')
            cluster_model = self._create_save_cluster()
        return cluster_model

    def get_img_code_book(self, img_path):
        '''
        Get the visual code book of a single image: a list of one row and 32
        columns, where each element is the frequency of the corresponding
        visual word.
        '''
        return self._map_feature_to_cluster(img_path)

    def get_imgs_code_books(self, img_path_list):
        '''
        Get the visual code books of multiple images: a list of P rows and 32
        columns, where each element is the frequency of the corresponding
        visual word.
        '''
        return self._calc_imgs_clusters(img_path_list)

    def get_all_img_code_books(self):
        '''Get the visual code books of all images in img_folder.'''
        return self._calc_imgs_clusters(self.all_img_paths)

    def get_img_labels(self):
        '''
        Get the labels of all images in img_folder. The label is taken from
        the folder name.
        '''
        img_paths = list(itertools.chain(*self.all_img_paths))
        # The label is the part of the parent folder name before the '-'
        # (a portable replacement for splitting the path on '\\')
        return [os.path.basename(os.path.dirname(img_path)).partition('-')[0]
                for img_path in img_paths]

    def prepare_dataset(self):
        '''
        Get the visual codes and labels of all images in img_folder and
        combine them into the dataset.
        '''
        features = self.get_all_img_code_books()
        labels = self.get_img_labels()
        return np.c_[features, labels]


2. Create a model using an extremely random forest

The extremely random forest (ExtraTreesClassifier in scikit-learn) is a variant of the random forest algorithm that also chooses split thresholds at random. For more information on random forests, see my previous article [Fire ai] machine learning 007-using random forest to build a bike sharing demand prediction model. It is used in almost the same way as a random forest.

# Extremely random forest classifier
from sklearn.ensemble import ExtraTreesClassifier


class CLF_Model:

    def __init__(self, n_estimators=100, max_depth=16):
        self.model = ExtraTreesClassifier(n_estimators=n_estimators,
                                          max_depth=max_depth, random_state=12)

    def fit(self, train_X, train_y):
        self.model.fit(train_X, train_y)

    def predict(self, newSample_X):
        return self.model.predict(newSample_X)

In fact, this classifier is very simple and there is no need to write it as a class.
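Without the wrapper class, the same classifier could be used directly. The following is a minimal sketch on synthetic data: the random 32-column features and the three labels are placeholders (assumptions) standing in for real visual code vectors, so the predictions themselves carry no meaning.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Placeholder data (assumption): 60 "images", 32 word frequencies each.
rng = np.random.RandomState(12)
X = rng.rand(60, 32)
X /= X.sum(axis=1, keepdims=True)  # make every row sum to 1, like a visual code
y = np.repeat([0, 1, 2], 20)       # three categories, 20 samples each

# Same hyperparameters as the wrapper class above
model = ExtraTreesClassifier(n_estimators=100, max_depth=16, random_state=12)
model.fit(X, y)

predicted = model.predict(X[:3])   # predict expects a 2-D array of samples
print(predicted)
```

A wrapper class mainly pays off when the model needs to be swapped or configured in one place; for a single estimator the direct calls are just as readable.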

Train the classifier:

import pandas as pd

dataset_df = pd.read_csv('./prepared_set.txt', index_col=[0])
dataset_X, dataset_y = dataset_df.iloc[:, :-1].values, dataset_df.iloc[:, -1].values
model = CLF_Model()
model.fit(dataset_X, dataset_y)
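The article does not show how `prepared_set.txt` was written. One plausible way, sketched here purely as an assumption, is to wrap the array returned by `prepare_dataset()` in a pandas DataFrame and save it with `to_csv`, which also writes the row index that `index_col=[0]` later consumes. The random array below is a placeholder for the real dataset, and a temporary folder is used so the example leaves no files behind.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Placeholder (assumption) for the array returned by prepare_dataset():
# 6 images, 32 frequency columns, plus one label column at the end.
rng = np.random.RandomState(0)
features = rng.rand(6, 32)
labels = np.array([0, 0, 1, 1, 2, 2])
prepared = np.c_[features, labels]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'prepared_set.txt')
    pd.DataFrame(prepared).to_csv(csv_path)  # the row index becomes column 0
    dataset_df = pd.read_csv(csv_path, index_col=[0])

# Split back into features (all but the last column) and labels (last column)
dataset_X = dataset_df.iloc[:, :-1].values
dataset_y = dataset_df.iloc[:, -1].values.astype(int)
print(dataset_X.shape, dataset_y)
```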


3. Use the trained model to predict new samples

As shown below, I tested three randomly chosen images and the results were good.

# Use the trained model to predict which category each new image belongs to.
# `dataset` is an instance of the Dataset class above; `model` is the trained CLF_Model.
new_img1 = r'e:\pyprojects\dataset\fireai/test0.jpg'
img_code_book = dataset.get_img_code_book(new_img1)
predicted = model.predict(img_code_book)
print(predicted)

new_img2 = r'e:\pyprojects\dataset\fireai/test1.jpg'
img_code_book = dataset.get_img_code_book(new_img2)
predicted = model.predict(img_code_book)
print(predicted)

new_img3 = r'e:\pyprojects\dataset\fireai/test2.jpg'
img_code_book = dataset.get_img_code_book(new_img3)
predicted = model.predict(img_code_book)
print(predicted)

------------------------------------- Output --------------------------------

[0]
[1]
[2]

-------------------------------------

############################ Summary ############################

1. The difficulty of this project lies in understanding the bag of visual words model and in preparing the dataset. For that reason I wrote the dataset preparation as a class, which is fairly general and can be reused to prepare datasets for other projects.

2. This project shows the advantage of the BoVW model over the raw Star/SIFT features: with the raw features, one image yields a matrix of N rows and 128 columns, whereas the BoVW model maps this feature data to a single row of 32 columns. This greatly reduces the number of features, simplifies the model, and improves training and prediction efficiency.

3. Once the dataset is prepared, you can use any conventional machine learning classifier for classification, and you can evaluate the classifier in various ways, for example with a performance report, accuracy, and recall. This part is omitted because I have already covered it many times in previous articles.
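As a reminder of what such an evaluation might look like, here is a minimal sketch using scikit-learn's `train_test_split` and `classification_report`. The data is synthetic (an assumption made so the example is self-contained), so the printed scores say nothing about the real image classifier.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): three classes, each a noisy 32-column pattern.
rng = np.random.RandomState(0)
centers = rng.rand(3, 32)
X = np.vstack([center + 0.05 * rng.randn(20, 32) for center in centers])
y = np.repeat([0, 1, 2], 20)

# Hold out 25% of the samples, keeping the class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = ExtraTreesClassifier(n_estimators=100, max_depth=16, random_state=12)
clf.fit(X_train, y_train)

# Per-class precision, recall and F1 on the held-out samples
report = classification_report(y_test, clf.predict(X_test))
print(report)
```

Evaluating on held-out images rather than the training images is what makes the reported precision and recall meaningful.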

######################################## #########################


Note: All the code has been uploaded to my GitHub.

References:

1. Python Machine Learning Cookbook (Chinese edition), by Prateek Joshi, translated by Tao Junjie and Chen Xiaoli
