In machine learning, there are usually several candidate models available for most common problems. Each model has its own characteristics, is affected by different factors, and behaves differently.
The quality of each model is determined by evaluating its performance on a certain dataset, usually called the validation/test dataset. Performance is measured with statistics such as accuracy, precision, and recall, and we choose the statistic that fits the specific application scenario. For each application, finding a metric that objectively compares model quality is crucial.
In this article, we will discuss the most common evaluation metric for object detection: mean average precision (mAP).
In most cases, these metrics are easy to understand and compute. For example, precision and recall are simple, intuitive statistics for binary classification problems.
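As a quick illustration, here is a minimal sketch of how precision and recall could be computed for a binary classifier; the label lists below are invented purely for the example.

```python
# Minimal sketch: precision and recall for a binary classification problem.
# The example labels are invented for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # fraction of positive predictions that are correct
recall = tp / (tp + fn)     # fraction of actual positives that are found
print(f"precision={precision:.2f}, recall={recall:.2f}")  # 0.75, 0.75
```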
Object detection, on the other hand, is a rather different and interesting problem.
Even if your detector finds a cat in an image, the detection is of little use if it cannot also tell you where in the image the cat is.
Since you must predict both whether a target appears in an image and where it is located, the metric used to evaluate the model is also more interesting to compute.
First, let's define the target detection problem so that we can have a unified understanding of the problem.
Defining the object detection problem
For "target detection problems", I mean to give an image, locate all the targets in the image, locate them, and classify them.
Object detection models are usually trained on a fixed set of categories, so a model can only locate and classify those categories in an image.
In addition, a target's location is usually given in the form of a bounding rectangle, or bounding box.
Object detection therefore involves two tasks: finding the target's location in the image and classifying the target.
Figure 1: several well-known image processing problems. Slide from Stanford University's CS231n course (Lecture 1).
Mean average precision (mAP) measures how well a model predicts both target locations and categories. As Figure 1 suggests, mAP is useful for evaluating object localization models, object detection models, and instance segmentation models.
Evaluating object detection models
Why mAP?
In object detection, each image may contain several targets from different categories. As mentioned above, both the classification and the localization performance of the model must be evaluated.
Therefore, accuracy, the standard evaluation metric for image classification, cannot be used directly here. This is where mean average precision (mAP) comes in. I hope that after reading this article, you will understand what mAP means and how it is computed.
About the ground truth
For any algorithm, metrics are always computed by comparing predictions against ground-truth information. We only know the ground truth for the training, validation, and test datasets.
For object detection, the ground truth consists of the image, the categories of the targets in it, and the true bounding box of each target.
We are given the actual image (JPG, PNG, or another format) and annotations in text form (the bounding box coordinates (x, y, width, height) and the class of each target); the red boxes and text labels drawn on the image are only for our visualization.
Visualization of the ground truth
Therefore, for this particular example, what our model actually sees during training is this image:
Actual Image
along with three annotations that define the ground truth (let's assume the resolution of this image is 1000x800 pixels; all coordinates in the table are in pixels, and the coordinate values are approximate).
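Since the original table is not reproduced here, the following sketch shows one plausible way to hold such annotations in Python. The class names and pixel coordinates are invented placeholders, consistent with the assumed 1000x800 image and the (x, y, width, height) format.

```python
# Hypothetical ground-truth annotations for a single 1000x800 image.
# Each entry holds a class label and a bounding box as (x, y, width, height)
# in pixels, measured from the top-left corner. Values are invented.
ground_truth = [
    {"class": "dog",    "bbox": (100, 600, 150, 100)},
    {"class": "horse",  "bbox": (700, 300, 200, 250)},
    {"class": "person", "bbox": (400, 400, 100, 300)},
]
```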
Let's take a look at how mAP is computed.
I will cover various object detection algorithms, their approaches, and their performance in another article. For now, let's assume we have a trained model in hand and we are evaluating its results on the validation dataset.
Computing mAP
Let's assume the original image and its ground-truth annotations are as described above, and that all training and validation images are annotated in the same way.
A trained model returns many predictions, but most of them carry very low confidence scores, so we only consider predictions above a certain confidence threshold.
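A minimal sketch of this filtering step, assuming the annotation format above; the detection list, the scores, and the 0.5 threshold are all invented for illustration.

```python
# Hypothetical raw detections: class, confidence score, bounding box.
detections = [
    {"class": "horse", "score": 0.97, "bbox": (710, 310, 190, 240)},
    {"class": "dog",   "score": 0.85, "bbox": (105, 610, 140, 95)},
    {"class": "dog",   "score": 0.08, "bbox": (500, 100, 50, 40)},  # low confidence
]

CONFIDENCE_THRESHOLD = 0.5  # assumed value; tuned per application

# Keep only detections whose confidence exceeds the threshold.
kept = [d for d in detections if d["score"] > CONFIDENCE_THRESHOLD]
```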
We run the model on the original image. Below are the results returned by the detector after applying the confidence threshold.
The image with its predicted bounding boxes:
Results from our model
Intuitively, these detections look correct, but how do we quantify that?
First, we need to judge how correct each detection is. The measure that quantifies the correctness of a given bounding box is the Intersection over Union (IoU). It is a quantity that is very simple to visualize.
Some might say the name alone already explains the quantity, but we can do better. I will briefly explain IoU here; readers who want a deeper understanding can turn to a very well-written article by Adrian Rosebrock as a supplement to this one:
https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection
IoU
The Intersection over Union is the ratio between the intersection and the union of the predicted bounding box and the ground-truth bounding box. This statistic is also known as the Jaccard index, first proposed by Paul Jaccard in the early 20th century.
To obtain the intersection and union, we first overlay the predicted bounding box on the ground-truth bounding box.
For each class, the region where the predicted box and the ground-truth box overlap is the intersection, and the total area spanned by the two boxes is the union.
Let's use the horse as an example.
The intersection and union for the horse class look like this:
In this example, the intersection area is quite large.
The intersection is the overlapping area of the two boxes (the blue-green region), and the union is the orange and blue-green regions combined.
IoU can then be computed as: IoU = area of intersection / area of union.
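Below is a minimal sketch of this calculation for two axis-aligned boxes in the (x, y, width, height) format assumed earlier; the function name `iou` is ours, not from any library.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Coordinates of the intersection rectangle.
    ix1 = max(ax, bx)
    iy1 = max(ay, by)
    ix2 = min(ax + aw, bx + bw)
    iy2 = min(ay + ah, by + bh)

    # Intersection area is zero if the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    # Union = sum of both areas minus the overlap counted twice.
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

For example, `iou((100, 100, 50, 50), (120, 120, 50, 50))` evaluates to roughly 0.22.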
Identifying correct detections and calculating precision
With IoU in hand, we need to decide whether each detection is correct. The most commonly used threshold is 0.5: if IoU > 0.5, the detection is counted as correct; otherwise, it is counted as incorrect.
Now we compute the IoU of every detection box the model produces (after confidence thresholding) against the ground truth. Using these IoU values and our IoU threshold (for example, 0.5), we count the number of correct detections (A) for each class in the image.
From the ground-truth data, we also know the actual number of targets of a given class in each image (B), and we have just counted the number of correct predictions (A), the true positives. We can therefore compute the precision of that class for the image as A/B.
Precision of class C in a given image = number of true positives of class C in the image / total number of targets of class C in the image
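Putting the pieces together, the sketch below counts true positives for one class in one image and computes the per-image precision defined above. It reuses the `iou` function from the previous sketch; the greedy one-to-one matching and the 0.5 IoU threshold are simplifying assumptions.

```python
IOU_THRESHOLD = 0.5  # assumed threshold for counting a detection as correct

def precision_for_class(detections, ground_truth, cls):
    """Per-image precision for one class: true positives / ground-truth targets.

    `detections` and `ground_truth` use the dict format sketched earlier.
    Each ground-truth box may be matched at most once (greedy matching).
    """
    gt_boxes = [g["bbox"] for g in ground_truth if g["class"] == cls]
    det_boxes = [d["bbox"] for d in detections if d["class"] == cls]

    matched = [False] * len(gt_boxes)
    true_positives = 0
    for db in det_boxes:
        for i, gb in enumerate(gt_boxes):
            if not matched[i] and iou(db, gb) > IOU_THRESHOLD:
                matched[i] = True
                true_positives += 1
                break

    return true_positives / len(gt_boxes) if gt_boxes else 0.0
```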
For a given class, let's compute this precision for every image in the validation set. Suppose our validation set has 100 images, and each image contains targets from every class (according to the ground truth). That gives us 100 precision values per class (one per image). Averaging those 100 values gives the average precision of the class.
Average precision of class C = sum of the precision values for class C over the validation set / number of images containing targets of class C
Now suppose our dataset has 20 classes. For each class we repeat the same procedure: compute IoU -> precision -> average precision. This gives us 20 average precision values, from which we can easily judge the model's performance on any given class.
To describe the model's performance with a single number (one metric to settle everything), we take the mean of the average precision values over all classes. This new value is our mean average precision, mAP! (I have to say, the name is very creative.)
Mean average precision = sum of the average precision values of all classes / number of classes
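Finally, a sketch of the two averaging steps: per-class average precision over images, then the mean over classes. The dictionary of per-image precision values is an assumed intermediate result, with invented numbers.

```python
def mean_average_precision(per_image_precision):
    """mAP from a mapping {class_name: [precision in each image containing it]}.

    Average precision (AP) for a class is the mean of its per-image
    precision values; mAP is the mean of the per-class APs.
    """
    average_precisions = {
        cls: sum(vals) / len(vals)
        for cls, vals in per_image_precision.items() if vals
    }
    return sum(average_precisions.values()) / len(average_precisions)

# Invented example: precision values for two classes across a few images.
example = {"dog": [1.0, 0.5, 0.5], "horse": [1.0, 1.0]}
print(mean_average_precision(example))  # (0.6667 + 1.0) / 2 ≈ 0.83
```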
So the mean average precision is simply the mean, over all classes in the dataset, of the per-class average precision.
When comparing mAP values, keep the following important points in mind:
- mAP is not an absolute measure of model quality, but it is a good relative measure. When computed on a popular public dataset, it can easily be used to compare the performance of new and old object detection methods, so an absolute measure is not needed.
- Depending on how the classes are distributed in the training data, average precision may be very high for some classes (those with good training data) and very low for others (those with little or poor data). Your mAP can therefore look respectable while the model performs well only on certain classes and poorly on others. When analyzing your model's results, it is best to also compute the average precision of individual classes; if some of these values are too low, you may need to add more training samples for those classes.