I. Background Introduction
Skin detection is widely used in computer vision applications such as face recognition and tracking, gesture recognition, and image retrieval and classification. All of these tasks must first solve a fundamental problem: precisely dividing the image into skin and background regions. The accuracy of this division directly affects the precision and performance of the subsequent work, so skin detection has gradually become the first step and technical basis of such tasks.
Many algorithms exist for skin detection, including histogram statistics, Gaussian mixture models, color-based detection, texture-based detection, detection based on multiple features, detection based on wavelet transforms, differential-based detection, and spatial diffusion methods. Among them, M. J. Jones and J. M. Rehg proposed a histogram statistical model based on the color of skin pixels [1] with good detection performance; this is the algorithm used in this experiment.
The training data used in this experiment comprises about 160 million pixels collected from the Internet. Using this data, I built a statistical color-histogram model to distinguish skin pixels from non-skin pixels. The final experimental results show the effectiveness of this method.
II. Construction of the Histogram Model
Color space selection is an important part of skin detection, but the performance of any color space depends on the specific skin distribution model, and no universally optimal space exists. For distinguishing skin color from non-skin color, RGB performs best, and it is the space used in this experiment.
Besides the color space, the size of the histogram model also deserves consideration. If each pixel uses a 24-bit color value, then each of the R, G, and B components occupies 8 bits, so each dimension of the RGB histogram model is divided into 2^8 = 256 units, giving the model a total of 256^3 bins. This size yields the highest resolution, since it makes maximal use of the pixel color information and thus preserves the distinctions between colors.
However, this approach occupies a large amount of storage, and the experimental data set may be too small to populate it: relative to the 256^3 bins, the data set provides a comparatively sparse collection of pixels, which affects the experimental results. On the other hand, using fewer quantization levels reduces the storage cost without significantly reducing the resolution. According to previous experience, among the quantization levels 256, 128, 64, 32, and 16, an intermediate level is a relatively good compromise. To verify the influence of different histogram sizes on the results, this experiment first uses 256 as the quantization level and then compares the other levels against it.
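To make the bin arithmetic concrete, here is a minimal sketch of the quantization step; the helper name and its interface are illustrative, not code from the experiment:

```python
def rgb_to_bin(r, g, b, levels=256):
    """Map an 8-bit-per-channel RGB triple to a flat histogram bin index.

    With `levels` bins per channel, each channel value (0-255) is divided
    by the bin width 256 // levels, and the three quantized values are
    flattened into a single index in [0, levels**3).
    """
    width = 256 // levels
    return ((r // width) * levels + (g // width)) * levels + (b // width)

# At 256 levels, every distinct 24-bit color gets its own bin (256**3 bins,
# about 16.8 million); at 32 levels, neighboring colors share a bin and the
# table shrinks to 32**3 = 32768 entries.
```

This makes the storage trade-off visible: shrinking `levels` reduces the table size by the cube of the shrink factor, at the cost of merging nearby colors into one count.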
The image data set needed to build the color histogram models was mainly derived from randomly crawled web images. The full data set is divided into two categories: images containing human skin, and images containing no skin at all, each consisting of about 500 images. After collection, image-processing software was used to remove the tags, watermarks, and similar artifacts contained in these images, so that the data set meets the expected requirements.
To ensure the universality and unbiasedness of the samples, the data set covers all kinds of people (male and female, old and young) as well as different scenes and backgrounds. The final data set consists of 500 skin images containing regions of human skin and 500 non-skin images without any skin regions.
Once the data had been collected, the model was trained. For convenience, the images of the training data set were resized uniformly: skin images to 240*320 pixels, and non-skin images to 600*400 pixels.
Below is an example of skin-image processing: first the pre-processed image, then the image with the skin pixels manually extracted and filled in:
After resizing the images and extracting the skin-pixel regions, the next step is to build the histogram models for the skin and non-skin classes. The models are built separately on the two data sets. For the skin-class model, the R, G, B values of every pixel in each processed skin image are read row by row and column by column; each pixel corresponds to one (R, G, B) three-dimensional color value. After reading all the pixels of an image in turn, all the color information of the skin pixels in that image is obtained.
After counting the color values of the pixels in all the images, the color histogram model of the skin class is complete. In a 256-level model there are 256^3 bins in total, and each bin stores the number of times its corresponding color value appears across all the skin images, which is exactly what the counting process accumulates.
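The counting procedure above can be sketched as follows. This assumes each training image is available as an HxWx3 uint8 NumPy array, with a boolean mask marking the manually extracted skin pixels of each skin image; all names are illustrative, not code from the experiment:

```python
import numpy as np

def quantize_index(img, levels):
    """Flatten each pixel's quantized (R, G, B) into one bin index."""
    q = (img // (256 // levels)).astype(np.int64)
    return (q[..., 0] * levels + q[..., 1]) * levels + q[..., 2]

def build_histograms(skin_images, skin_masks, nonskin_images, levels=256):
    """Accumulate the two color histograms by counting pixels.

    skin_images: HxWx3 uint8 arrays, each paired with a boolean mask in
    skin_masks marking its skin pixels. nonskin_images: images containing
    no skin at all, every pixel of which counts toward the non-skin model.
    Returns flat count arrays of length levels**3.
    """
    skin = np.zeros(levels**3, dtype=np.int64)
    nonskin = np.zeros(levels**3, dtype=np.int64)
    for img, mask in zip(skin_images, skin_masks):
        skin += np.bincount(quantize_index(img, levels)[mask],
                            minlength=levels**3)
    for img in nonskin_images:
        nonskin += np.bincount(quantize_index(img, levels).ravel(),
                               minlength=levels**3)
    return skin, nonskin
```

Because every bin index is below `levels**3`, `np.bincount(..., minlength=levels**3)` returns an array of exactly that length, so the per-image counts can be summed directly.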
For the non-skin model, the construction method is the same.
III. Skin Detection Using the Histogram Models
Once the two histogram models have been built, the next step is to use them for skin detection. The detection process has two steps: (1) probability calculation, and (2) discrimination of skin pixels by Bayesian decision.
From counts to probabilities:
With the two models built, the relevant probabilities for any given pixel can be computed from its color value. If P(rgb|skin) denotes the probability that the pixel's color value appears in the skin histogram, and P(rgb|nonskin) the probability that it appears in the non-skin histogram, the following formulas follow immediately:
P(rgb|skin) = s[rgb] / Ts
P(rgb|nonskin) = n[rgb] / Tn
where s[rgb] is the count stored in the bin corresponding to the pixel's color value in the skin histogram model, n[rgb] is the count stored in the corresponding bin of the non-skin histogram model, and Ts and Tn are the total numbers of pixels contained in the skin and non-skin histogram models respectively.
Through these formulas we obtain the probability that any pixel's color value appears in each of the two color histogram models; that is, the conversion from counts to probabilities is complete.
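In code, this conversion is a single division per model. The sketch below mirrors the two formulas above; the function name is illustrative and the histograms are assumed to be NumPy count arrays:

```python
import numpy as np

def likelihoods(skin_hist, nonskin_hist, idx):
    """Return (P(rgb|skin), P(rgb|nonskin)) for a pixel's bin index idx,
    i.e. s[rgb]/Ts and n[rgb]/Tn from the formulas above."""
    Ts, Tn = skin_hist.sum(), nonskin_hist.sum()
    return skin_hist[idx] / Ts, nonskin_hist[idx] / Tn
```

In practice Ts and Tn would be computed once after training rather than on every lookup; they are recomputed here only to keep the sketch self-contained.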
Bayesian decision:
The core step of skin detection is to compute, for any given pixel, the probability that it belongs to the skin class. This is done by discriminating on the probabilities associated with its color value, and this experiment uses Bayesian decision for the discrimination.
In Bayesian decision-making, the prior probability P(skin) of the skin class and the prior probability P(nonskin) of the non-skin class must be specified in advance for any given pixel. Because the two sum to 1, only one of them needs to be specified. A reasonable choice for the skin prior is the ratio of the total number of skin pixels in the histograms to the total number of pixels; since the non-skin histogram model contains no skin pixels and the skin histogram model contains only skin pixels, it can be computed as:
P(skin) = Ts / (Ts + Tn)
P(nonskin) = 1 - P(skin)
Ts and Tn are, as described above, the total numbers of pixels contained in the skin and non-skin histogram models. With the priors specified, Bayes' rule gives the probability that a given pixel's color value belongs to the skin class:
P(skin|rgb) = P(rgb|skin) P(skin) / [P(rgb|skin) P(skin) + P(rgb|nonskin) P(nonskin)]
Once the probability that a given pixel belongs to the skin class is known, the pixel can be classified in either of the following ways:
1. Set a threshold x; if the computed probability is greater than x, the pixel is classified as a skin pixel, otherwise as a non-skin pixel;
2. Compute the probability that the pixel's color value belongs to the non-skin class:
P(nonskin|rgb) = P(rgb|nonskin) P(nonskin) / [P(rgb|skin) P(skin) + P(rgb|nonskin) P(nonskin)]
If P(skin|rgb) > P(nonskin|rgb) for the specified pixel, it is judged to be a skin pixel; otherwise it is a non-skin pixel.
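The posterior computation and the two decision rules above can be sketched in a few lines of plain Python (names are illustrative, not code from the experiment):

```python
def skin_posterior(p_rgb_skin, p_rgb_nonskin, p_skin):
    """P(skin|rgb) via Bayes' rule, as in the formula above."""
    num = p_rgb_skin * p_skin
    den = num + p_rgb_nonskin * (1.0 - p_skin)
    return num / den if den > 0 else 0.0

def is_skin_threshold(posterior, x=0.5):
    """Decision rule 1: compare the skin posterior with a threshold x."""
    return posterior > x

def is_skin_map(p_rgb_skin, p_rgb_nonskin, p_skin):
    """Decision rule 2: pick the class with the larger posterior.

    The two posteriors share a denominator, so comparing them reduces
    to comparing the numerators."""
    return p_rgb_skin * p_skin > p_rgb_nonskin * (1.0 - p_skin)
```

Note that when both posteriors are well-defined they sum to 1, so rule 2 coincides with rule 1 at x = 0.5; the two rules differ only once the threshold is varied.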
In the later experiments, these two decision methods are tested and compared, and their merits are judged by comparing the positive and false detection rates. For the first method, a series of comparisons is made by varying the threshold x to observe its effect on the results, again measured by the positive and false detection rates on the test data set.
IV. Experimental Results and Comparative Analysis
Selected experimental results:
Test criteria:
For skin detection, the best way to measure performance is the positive detection rate and the false detection rate. The positive detection rate is the number of pixels correctly classified as skin divided by the number of skin pixels the original image actually contains. The false detection rate is the number of pixels ultimately classified as skin that are not actually skin pixels, divided by the number of non-skin pixels in the original image. Using the 256-level histogram, the original training data set, decision scheme one, and threshold x = 0.5, the final experimental results are as follows:
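As a sketch, the two rates defined above can be computed from a predicted skin mask and a ground-truth mask; the array names are illustrative:

```python
import numpy as np

def detection_rates(pred, truth):
    """Positive and false detection rates for a skin detector.

    pred, truth: boolean arrays of the same shape, where truth marks
    the actual skin pixels.
    positive rate = correctly detected skin pixels / all skin pixels
    false rate    = non-skin pixels marked as skin / all non-skin pixels
    """
    positive = np.logical_and(pred, truth).sum() / truth.sum()
    false = np.logical_and(pred, ~truth).sum() / (~truth).sum()
    return positive, false
```

Both rates are needed together: a detector that labels every pixel as skin scores a perfect positive rate but the worst possible false rate.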
Different histogram levels:
As mentioned earlier, the size of the histogram model is another issue to consider besides the color space. To compare the effects of different histogram levels on detection performance, I measured the positive and false detection rates at the five quantization levels listed earlier. To isolate the relationship between the variables, the same test data set was used in all five tests, preventing differences in the test set from interfering with the results; the remaining settings follow the preliminary experiment, i.e., decision scheme one with threshold x = 0.5. The final experimental results are shown in the following table:
From the experimental results we can see that the positive detection rate first increases as the histogram level decreases, reaches a maximum at an intermediate quantization level, and then decreases as the level continues to shrink. The false detection rate first decreases as the level decreases, reaches its minimum at an intermediate level, and then rises sharply when the level drops to 8.
Analyzing these data confirms the earlier reasoning. Setting storage cost aside and considering detection performance alone, the full 256-level histogram should in theory give the highest resolution, because it makes maximal use of the pixel color information and preserves the distinctions between colors. In practice, however, we face the problem of insufficient training data: if the amount of training data were large enough, the full-scale histogram would be the best choice, but when it is not, the sparse distribution of data over the large histogram space clearly harms the results. The experimental results reflect this, so in comparison an intermediate-level histogram offers the better performance trade-off.
Below, the same image is shown detected with histogram models of different levels, to illustrate the experimental results more directly.
Different amounts of training data:
Besides the histogram scale, changes in the amount of training data also affect detection performance. When the training data shrinks, or when the two classes of training data are severely unbalanced, the positive and false detection rates change even at the same histogram level and on the same test set. With the 256-level histogram and decision method one, the specific experimental data are given in the following table:
From the experimental results we can clearly see that the amount of training data has a great impact. As the number of skin training images decreases, the positive detection rate falls, dropping sharply when the skin and non-skin image data are severely unbalanced. This shows that keeping the two classes of training data roughly balanced is extremely important: if one class has far less data than the other, its histogram is trained on very little information and stores sparse counts, so the sensitivity of recognition for that class naturally drops sharply.
These results show that maximizing the amount of training data, while keeping the two classes as balanced as possible, plays an important role in improving skin-detection performance.
Two different decision methods:
After the corresponding probabilities are computed, there are, as mentioned above, two different ways to decide whether a pixel belongs to the skin class:
1. Set a threshold x; if the probability that the pixel's RGB color value belongs to the skin class is greater than x, classify it as a skin pixel, otherwise as a non-skin pixel;
2. Alternatively, if the pixel's color value satisfies P(skin|rgb) > P(nonskin|rgb), classify it as a skin pixel, otherwise as a non-skin pixel.
To verify which of the two decision methods is more effective, we tested them separately. For decision method one, we ran the experiment with a range of thresholds x; the final experimental results are shown in the following table:
From the experimental data we can see that under decision scheme one, both the positive and the false detection rates fall sharply as the threshold increases, which matches expectations: a higher threshold raises the bar for judging a pixel to be skin, so it lowers both the false detection rate and the positive detection rate. In other words, as the threshold rises, fewer non-skin pixels are mistakenly classified as skin, but many pixels that really are skin end up misjudged as non-skin because of the high threshold. In practical applications, the threshold should therefore be chosen according to actual needs.
Application: hand-region extraction in color-depth images
The experimental results are as follows:
1. Original Image:
2. Skin-pixel recognition result (marked in red):
3. Result of combining depth information to obtain the hand region:
4. The final binarized result:
References:
[1] Michael J. Jones, James M. Rehg: Statistical Color Models with Application to Skin Detection. CRL 98/11, December 1998.
[2] B. Jedynak, H. Zheng, M. Daoudi: Statistical Models for Skin Detection. IEEE Workshop on Statistical Analysis in Computer Vision, in conjunction with CVPR 2003, Madison, Wisconsin, June 16–22, 2003.
Research and implementation of skin region detection algorithm in color image based on Bayesian decision