Suppose you have two black-and-white images, each 32x32 pixels. How do you calculate the similarity between them?
According to the article "The Beauty of Mathematics, Part 12: The Law of Cosines and News Classification", we only need to treat each picture as a 1024-dimensional vector (32x32=1024) and compute the cosine of the angle between the two vectors. The closer the result is to 1, the more similar the pictures are.
So we have a theoretical basis. Next, how should we store the vectors?
Because the pictures contain only two colors, 1 bit is enough to represent each pixel. Treat a white pixel as 0 and a black pixel as 1. Each picture then fits into 32 32-bit integers, one integer per row, which saves space and simplifies the computation.
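As a rough sketch of this storage scheme, the packing might look like the code below. The class name BitImage, the method name Pack, the bool[,] input (true meaning black), and the choice to store column x in bit x are all assumptions made for illustration; the article does not say how the pixels are read or which bit order is used.

```csharp
using System;

public static class BitImage
{
    // Hypothetical helper: packs a 32x32 grid of pixels (true = black = 1)
    // into 32 integers, one per row. Column x is stored in bit x (assumed order).
    public static int[] Pack(bool[,] pixels)
    {
        if (pixels.GetLength(0) != 32 || pixels.GetLength(1) != 32)
            throw new ArgumentException("Expected a 32x32 image.", nameof(pixels));

        var rows = new int[32];
        for (int y = 0; y < 32; ++y)
        {
            for (int x = 0; x < 32; ++x)
            {
                if (pixels[y, x])
                    rows[y] |= 1 << x; // set bit x of row y for a black pixel
            }
        }
        return rows;
    }
}
```

The bit order does not affect the similarity result, as long as both pictures are packed the same way.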
Next, how do we do the calculation?
If we applied the cosine formula literally, we would have to extract each bit from the integers and then multiply or square the bits one pair at a time, which is obviously a waste of time.
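For reference, the standard cosine formula for two 1024-dimensional vectors can be written as follows. The symbols a_i and b_i (the i-th pixel values of the first and second picture) are notation chosen here for illustration, not taken from the article.

```latex
\[
\cos\theta
  = \frac{\sum_{i=1}^{1024} a_i b_i}
         {\sqrt{\sum_{i=1}^{1024} a_i^{2}}\;\sqrt{\sum_{i=1}^{1024} b_i^{2}}}
\]
```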
Let's see whether there is an easier way.
Since each pixel has only two possible values, 0 or 1, multiplying corresponding bits in the numerator is the same as taking their bitwise AND. And because each row is stored as one integer, the work reduces further to a single bitwise AND of two integers per row.
In the denominator, each value is squared and the squares are summed. Since 0 squared is 0 and 1 squared is 1, we can skip the squaring and simply add up the bits.
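Written out, the simplification looks roughly like this. Here a_i and b_i are the pixel values (0 or 1) of the two pictures, A and B are their packed bit representations, and popcount means "number of bits set to 1"; all of these names are notation assumed for this sketch.

```latex
\[
a_i b_i = a_i \land b_i, \qquad a_i^{2} = a_i, \qquad b_i^{2} = b_i
\]
\[
\cos\theta
  = \frac{\operatorname{popcount}(A \mathbin{\&} B)}
         {\sqrt{\operatorname{popcount}(A)\,\operatorname{popcount}(B)}}
\]
```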
Therefore, the whole program can proceed as follows:
1. Store the pixels of each picture in an array of 32 32-bit integers, where each integer holds one row and each bit of the integer holds one pixel value (0 or 1);
2. To calculate the numerator, take the bitwise AND of the integers at the same index in the two arrays, then count the bits set to 1 across all the results;
3. To calculate the denominator, count the bits set to 1 in each array, take the square root of each count, and multiply the two roots;
4. Divide the numerator by the denominator to get the cosine.
The code for steps 2, 3, and 4 is as follows:
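public double GetCosine(int[] e1, int[] e2)
{
    int a = 0; // denominator term 1: number of 1 bits in e2
    int b = 0; // denominator term 2: number of 1 bits in e1
    int c = 0; // numerator: number of bits that are 1 in both arrays
    for (int y = 0; y < 32; ++y)
    {
        // Bitwise AND of the corresponding integers in the two arrays
        int i = e2[y] & e1[y];
        // Add up the bits one position at a time (bits 0 to 31)
        for (int x = 0; x < 32; ++x)
        {
            c += (i >> x) & 1;
            a += (e2[y] >> x) & 1;
            b += (e1[y] >> x) & 1;
        }
    }
    // Compute the denominator; sqrt(a) * sqrt(b) equals sqrt(a * b)
    int d = a * b;
    return d == 0 ? 0 : c / Math.Sqrt(d);
}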
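The inner loop above counts bits one position at a time. On .NET Core 3.0 or later, the same counts can probably be obtained with System.Numerics.BitOperations.PopCount instead. The following is only a sketch of that variant, not the original author's code; the class name BitCosine is made up for the example.

```csharp
using System;
using System.Numerics;

public static class BitCosine
{
    // Same calculation as GetCosine, but each row's bits are counted
    // with BitOperations.PopCount instead of a 32-step inner loop.
    public static double GetCosine(int[] e1, int[] e2)
    {
        int a = 0; // number of 1 bits in e2
        int b = 0; // number of 1 bits in e1
        int c = 0; // number of bits that are 1 in both
        for (int y = 0; y < 32; ++y)
        {
            // unchecked casts reinterpret the sign bit as an ordinary pixel bit
            a += BitOperations.PopCount(unchecked((uint)e2[y]));
            b += BitOperations.PopCount(unchecked((uint)e1[y]));
            c += BitOperations.PopCount(unchecked((uint)(e1[y] & e2[y])));
        }
        long d = (long)a * b;
        return d == 0 ? 0 : c / Math.Sqrt(d);
    }
}
```

If either picture is completely white, the denominator is 0 and the method returns 0, just as the original code does.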
If you have read this far, you now know how we calculate the similarity of two pictures. In my experience, when the result is greater than 0.8, you can treat the two pictures as the same.
If the pictures are CAPTCHA images and you already know the value of one of them, then a match tells you the value of the other.
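To make that idea concrete, here is a minimal sketch of matching an unknown picture against a set of pictures whose values are already known. Everything in it is an assumption for illustration: the class name CaptchaMatcher, the dictionary of known templates, and the choice to return the best-scoring label above the 0.8 threshold (or null when nothing qualifies).

```csharp
using System;
using System.Collections.Generic;

public static class CaptchaMatcher
{
    // Counts the bits set to 1 in a 32-bit value.
    static int CountBits(int v)
    {
        uint u = unchecked((uint)v);
        int n = 0;
        while (u != 0)
        {
            u &= u - 1; // clear the lowest set bit
            ++n;
        }
        return n;
    }

    // Cosine similarity of two pictures packed as 32 integers each.
    public static double Cosine(int[] e1, int[] e2)
    {
        int a = 0, b = 0, c = 0;
        for (int y = 0; y < 32; ++y)
        {
            a += CountBits(e2[y]);
            b += CountBits(e1[y]);
            c += CountBits(e1[y] & e2[y]);
        }
        long d = (long)a * b;
        return d == 0 ? 0 : c / Math.Sqrt(d);
    }

    // Returns the label of the most similar known picture whose cosine
    // exceeds the threshold, or null if no known picture is close enough.
    public static string Match(int[] unknown,
                               IDictionary<string, int[]> known,
                               double threshold = 0.8)
    {
        string best = null;
        double bestScore = threshold;
        foreach (var pair in known)
        {
            double score = Cosine(unknown, pair.Value);
            if (score > bestScore)
            {
                bestScore = score;
                best = pair.Key;
            }
        }
        return best;
    }
}
```

The 0.8 default simply reuses the rule of thumb given above; a different threshold may suit other image sets better.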