Detecting pornographic images in 10 lines of code: the "Python can do it too" series, part 2
Author: Lai Yonghao (http://blog.csdn.net/lanphaday)
To the editors: I have already pixelated the images, so please don't delete this post. It is purely technical!
Disclaimer: for research purposes, this article contains some pornographic images; it must not be taken as evidence of distributing pornographic material.
Today I first saw the story "College student invents software that filters pornographic images based on skin ratio" on CSDN (http://news.csdn.net/n/20081028/120298.html), and later found that the same story had also made it onto the NetEase news channel (http://news.163.com/08/1028/05/4PAORMQB00011229.html), which amazed me.
Image from NetEase News
According to the author: "The software works by computing the ratio of the area of the face and limbs to the total skin area in the image, together with their distribution, to decide whether a website contains pornographic images." My guess is that he used a common skin color model to detect and count skin pixels, and at most added some conditional filtering based on the distribution and shape of the color blocks. So I wrote a piece of code that analyzes an image and removes its non-skin pixels. The results are shown below (note: the program's results were computed on the unpixelated images; the images were pixelated before publication to meet the CSDN blog's requirements):
The corresponding processed images are:
As you can see, a simple skin color model already works quite well.
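The masking code itself is not included in this post. The following is only a minimal sketch of how such a step could look, written for Python 3 and assuming the Cb/Cr thresholds from the 10-line script in this post. The names is_skin, mask_non_skin and BLACK are hypothetical and chosen for illustration; the function works on plain (Y, Cb, Cr) tuples so that it does not depend on any imaging library.

```python
# Thresholds assumed from the 10-line script in this post.
CB_RANGE = (86, 117)
CR_RANGE = (140, 168)

# Black in YCbCr: zero luma, neutral chroma.
BLACK = (0, 128, 128)


def is_skin(cb, cr):
    """Return True when the chroma pair lies inside the skin range."""
    return (CB_RANGE[0] <= cb <= CB_RANGE[1]
            and CR_RANGE[0] <= cr <= CR_RANGE[1])


def mask_non_skin(pixels, fill=BLACK):
    """Keep skin-colored pixels and replace every other pixel with `fill`.

    `pixels` is an iterable of (Y, Cb, Cr) tuples, for example the
    sequence produced by getdata() on an image converted to 'YCbCr'.
    """
    return [p if is_skin(p[1], p[2]) else fill for p in pixels]


if __name__ == "__main__":
    # One skin-like pixel, one white pixel, one pixel with Cb above 117.
    sample = [(162, 105, 155), (255, 128, 128), (40, 120, 150)]
    print(mask_non_skin(sample))
```

Writing the masked list back into an image and saving it would produce pictures like the ones above, in which only the skin-colored regions remain visible.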
Next we can write the code that counts skin-colored pixels. It is very short, only 10 lines, which fully shows both the power of Python and how shallow Li Jimin's research is:
import sys, Image
img = Image.open(sys.argv[1]).convert('YCbCr')
w, h = img.size
data = img.getdata()
cnt = 0
for i, ycbcr in enumerate(data):
    y, cb, cr = ycbcr
    if 86 <= cb <= 117 and 140 <= cr <= 168:
        cnt += 1
print '%s %s a porn image.' % (sys.argv[1], 'is' if cnt > w * h * 0.3 else 'is not')
A brief explanation of the code above:
1) Image is the PIL module. I wrote an earlier article about it, "Image processing with Python" (http://blog.csdn.net/lanphaday/archive/2007/10/28/1852726.aspx); see it for basic usage.
2) img = Image.open(sys.argv[1]).convert('YCbCr') opens the file named on the command line and converts it to the YCbCr color space. For more information about YCbCr, see http://baike.baidu.com/view/564370.htm.
3) data = img.getdata() fetches the image data for fast per-pixel access.
4) if 86 <= cb <= 117 and 140 <= cr <= 168: is the most important line and the essence of this article. For the YCbCr skin color model, many papers recommend 86 <= Cb <= 127 and 130 <= Cr < 168. That range did not work well for me, so I lowered the Cb upper limit to 117 and raised the Cr lower limit to 140, which filters out the white and black regions.
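To make the effect of the narrowed range in point 4 concrete, here is a rough sketch in Python 3 that counts skin pixels under either range and applies the 30% rule from the script. The names PAPER_RANGE, TUNED_RANGE, rgb_to_ycbcr, skin_ratio and exceeds_skin_threshold are hypothetical. The conversion uses the full-range JFIF formula, which I assume matches what PIL's convert('YCbCr') computes; treat that as an assumption rather than a documented guarantee.

```python
# Ranges from point 4: the one attributed to the literature and the narrowed one.
# Bounds are inclusive; the quoted "Cr < 168" becomes 167 for integer values.
PAPER_RANGE = {"cb": (86, 127), "cr": (130, 167)}
TUNED_RANGE = {"cb": (86, 117), "cr": (140, 168)}


def rgb_to_ycbcr(r, g, b):
    """Full-range JFIF RGB -> YCbCr (assumed to match PIL's conversion)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each channel to the 0..255 byte range.
    return tuple(max(0, min(255, int(round(v)))) for v in (y, cb, cr))


def skin_ratio(pixels, rng=TUNED_RANGE):
    """Fraction of (Y, Cb, Cr) pixels whose chroma lies inside `rng`."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    (cb_lo, cb_hi), (cr_lo, cr_hi) = rng["cb"], rng["cr"]
    hits = sum(1 for _, cb, cr in pixels
               if cb_lo <= cb <= cb_hi and cr_lo <= cr <= cr_hi)
    return hits / len(pixels)


def exceeds_skin_threshold(pixels, rng=TUNED_RANGE, threshold=0.3):
    """Apply the script's rule: flag when more than 30% of pixels are skin."""
    return skin_ratio(pixels, rng) > threshold


if __name__ == "__main__":
    skin = rgb_to_ycbcr(200, 150, 120)   # a skin-like tone
    white = rgb_to_ycbcr(255, 255, 255)
    print(exceeds_skin_threshold([skin] * 4 + [white] * 6))
    print(exceeds_skin_threshold([skin] * 2 + [white] * 8))
```

A pixel such as (150, 120, 150) counts as skin under PAPER_RANGE but not under TUNED_RANGE, because its Cb of 120 is above the lowered upper limit of 117. That is the kind of pixel the narrowed range is meant to drop.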
Finally, running the program gives the following result:
E:/> C:/python25/python test_skin.py 114.jpeg
114.jpeg is a porn image.
114.jpeg is the third image in the example above.
To sum up, Li Jimin, a senior at Chongqing University of Posts and Telecommunications, merely applied a very mature theory (skin color detection is basic knowledge in computer vision fields such as face recognition) and wrote a small amount of code (he may have written more in C++ than I did in Python, but two or three hundred lines at most). There is no substantial research breakthrough, and the product is not mature either (in other words, it cannot correctly handle photos of girls in bikinis). By hyping him like this, reporters and websites are killing him with praise.
In addition, many CSDN users are curious about his claim that the software is "embedded in the browser and hard to remove". While I'm at it: Li most likely did this with a Browser Helper Object, also known as BHO. It is a very simple technique; look it up on MSDN and you can easily write one in VC/VB/C#. Of course, it can also be removed just as easily. Haha.