How does photo-based question search work? (Part 1)


Original address: http://mp.weixin.qq.com/s?__biz=MzA3MDExNzcyNA==&mid=402907292&idx=1&sn=889c4abcf576e24525ea6a705069c4de

2016-04-04 · Lao Wang (Simplemain)

/*
 * Two quick notes before we start:
 * 1. After finishing the previous article, "What is HTTPS?", Lao Wang wanted to write a few pieces of hands-on technical material before the next installment.
 * 2. If you want to read this article, and the N articles to come, you can follow: Simplemain
 */

Photo-search apps (with names that translate roughly as "Little Ape Search", "Homework Help", "Genius Jun", and so on) have been hugely popular lately: take a picture of a problem with your phone and the answer is found for you, no more typing it into Baidu by hand. Students are thrilled; what their parents feel about it is another question.

Out of a programmer's professional curiosity, and to gauge how hard and how costly this would be to build, Lao Wang spent two weeks on a quick study and wrote a demo program, which I'll share with everyone here. (Note: the study was short, so if anything is wrong, I ask the experts to correct me~)

Technically, photo search consists mainly of image recognition plus content retrieval: recognize the text (or figures) in the photo, then hand the result to a retrieval system that quickly searches the existing question bank for the most similar problems. Since I already had a decent understanding of retrieval (after all, I spent several years at the Wolf Factory, ^_^), this time I focused on recognizing the text in the image. Also, since this is just research, I only did English recognition to keep the difficulty down.

Let's look at the final result first. Here is an English-language article photographed with a phone:

After recognition, with no word correction applied yet, the output looks like this:
=======================================
Three months after [he government stopped issulng (&%) or renewing permits for in[ernet cafes because of security (% #) Concerns, some cafe owners is having Flnanclal (ff%%) concerns of their own.
=======================================

Later, my colleagues TT and YX took pity on me and wrote two word-matching algorithms for word correction, so that some misrecognized words get fixed, e.g. ISSULNG → issuing.

Okay, let's cut to the chase.


= = = Processing Pipeline = = =


The whole pipeline breaks down into two stages:
1. Image processing: clean up the photo we took (a bit like washing clothes), then slice the characters out of the image to prepare for recognition. This stage has several steps:
A. Grayscale conversion: turn the color photo into a grayscale image.
B. Binarization: turn the grayscale image into pure black and white.
C. De-noising: remove the noise from the black-and-white image so it looks cleaner.
D. Rotation: rotate the image clockwise or counterclockwise to find the best horizontal alignment.
E. Horizontal cutting: cut the leveled image into individual lines of text.
F. Vertical cutting: cut each line column by column, outputting single characters.

2. Character recognition: determine which character each of those single-character images actually is.

Feeling a little dizzy? Yes, so was I, hahaha~
Remember our motto? Take a complex problem and explain it simply!
Lao Wang won't drone on about dry theory; let's see how it's actually done.


= = = Image Processing = = =

Step one: take the photo and get the original image.

Look at it with the naked eye: would you say this picture is black and white?
To the classmates who answered "yes": I tell you responsibly, your eyes have deceived you. Open it in an image editor and inspect the pixels one by one; they are actually in color!
What's wrong with color? Nothing, for the human eye. But it's a problem for our program: it can't tell which colors carry the useful information. So our next job is to reduce the dimensionality of the information, shrinking the 256*256*256 possible RGB (red, green, blue) values down to just 2: white and black. Then the program can easily judge which pixels are meaningful.
OK, getting to white + black itself takes two steps: first grayscale, then black and white.

So Lao Wang will ask a few questions:

Q: What is black?
A: R = 0, G = 0, B = 0, i.e. #000000 in CSS.

Q: What is white?
A: R = 255, G = 255, B = 255, i.e. #ffffff.

Q: What is gray?
A: Er...

Lao Wang doesn't know a formal definition, but he does know this: R = G = B, i.e. the red, green, and blue components are equal. For example, values we often set in CSS: #e0e0e0, #9c9c9c, and so on.

Step two: turn the original photo above into a grayscale image.


Compare it carefully with the picture above: what's different? Time to test those pixel-perfect eyes! If you still can't see it, here is a side-by-side comparison:


See the difference? If not, open the top and bottom halves in an image editor and inspect the pixels one by one.

So how does the color image become a grayscale one?
We know each pixel's color can be represented with the RGB primaries, e.g. red = rgb(255, 0, 0), yellow = rgb(255, 255, 0). If we set a pixel's three components to the same value, it turns gray: color-pixel(x, y) = rgb(R, G, B) → rgb(T, T, T), where T = R*k + G*p + B*q and k + p + q = 1. That is how color becomes gray.

There are many, many possible values for k, p, q; commonly cited choices are 0.11, 0.59, 0.3 or simply 1/3, 1/3, 1/3. As for why, I didn't dig to the bottom of it~
My algorithm uses 1/3, 1/3, 1/3.
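With the 1/3 weights chosen above, the grayscale step can be sketched like this (a minimal sketch: the packed 0xRRGGBB pixel layout matches Java's BufferedImage.getRGB(), and the class and method names are mine, not from the original code):

```java
// Grayscale conversion with k = p = q = 1/3: every pixel rgb(R, G, B)
// becomes rgb(T, T, T) with T = (R + G + B) / 3.
class Grayscale {
    // Gray level of one packed 0xRRGGBB pixel.
    static int toGray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r + g + b) / 3;
    }

    // Convert a whole pixel array: each pixel is replaced by its gray version.
    static int[] toGrayImage(int[] pixels) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            int t = toGray(pixels[i]);
            out[i] = (t << 16) | (t << 8) | t; // rgb(T, T, T)
        }
        return out;
    }
}
```

Pure red 0xFF0000, for example, averages to gray level 85, i.e. the gray pixel 0x555555.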

Well, now that we have the grayscale image, how do we turn it black and white? (The professional term: binarization.)
There are many, many, many algorithms...

Step three: binarize the grayscale image.

I downloaded a tool that lists N different binarization methods:


Each method has its pros and cons, and there is no perfect solution. I implemented several algorithms myself, such as gray mean, percentage threshold, and the minimum-valley-between-two-peaks method; based on my experimental results I settled on the valley-minimum method.

How does this algorithm work? Let me take it slowly.
We already have the grayscale image, where every pixel satisfies T = R = G = B, so we can represent rgb(R, G, B) with the single value T. The final simplification is to store each pixel in 1 byte (8 bits), so every pixel value falls in the closed interval [0, 255], or [00, FF] in hexadecimal. If you magnified a 10*4-pixel image with a magnifying glass, it would look something like this:

00 00 00 00 00 00 00 00 00 00
EE 21 EE DD 4F 29 30 00 00 00
FF EE 40 32 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00

Now count how many times each value appears: C(00) = 29, C(EE) = 3, C(DD) = 1, C(4F) = 1, C(FF) = 1, C(40) = 1, C(29) = 1, C(30) = 1, C(21) = 1, C(32) = 1.
So what happens if we count, for the image we've been experimenting with, how many pixels take each value from 0 to 255?


Using the method above, I plotted the grayscale values of the processed image on a 2-D plane: the x-axis is the color value 0~255, the y-axis is the number of pixels with each value. Look carefully and you can easily see two peaks. Those two peaks are the two most important color values in this grayscale image: the foreground color and the background color.
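The counting itself is a single pass over the gray values (a sketch; the class name is mine):

```java
// Gray-value histogram: hist[t] = how many pixels have gray level t.
class Histogram {
    static int[] count(int[] gray) { // each entry must be in [0, 255]
        int[] hist = new int[256];
        for (int t : gray) {
            hist[t]++;
        }
        return hist;
    }
}
```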

Generally speaking, for a photo of text, we can assume there is far more background than foreground, so we take the taller peak as the background color and the shorter one as the foreground color. In many non-text cases there may be multiple peaks and multiple foreground tones; we won't discuss that case here.

With this projection of the values we can identify both the foreground and background colors, and therefore which pixels are text and which are backdrop. Then we simply mark foreground pixels as 1 and background pixels as 0, turning a w*h-pixel color image into a w*h bitmap, which is far easier to run algorithms on. The next task is to find the foreground and background colors algorithmically.

The naked eye easily spots the two peaks near x = 18 and x = 135, corresponding to our foreground and background. For our program, though, it's not that easy; we need an algorithm the program can apply mechanically.

How does a program decide that a point is a peak? Its two neighbors are both smaller than it, right? But is every point larger than its two neighbors necessarily a peak? Obviously not: the chart above has plenty of jitter points that are larger than their surroundings yet are not peaks. Is there a better algorithm to find the peaks quickly?

One simple but effective method is iterative smoothing: repeatedly set each point a[i] = (a[i-1] + a[i] + a[i+1]) / 3 (the two endpoints are handled separately) until only two points are larger than their neighbors (if several consecutive points share the same value, treat them as a single point), or until we have smoothed at most K times (say, 100).

Think about what this logic does: we keep correcting each point with its neighbors' values, so after enough rounds a point's value approaches the values around it. An abrupt spike, however tall, must end up roughly equal to its two neighbors after repeated averaging; in the limit, these points become equal, right?
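The smoothing and the peak count can be sketched as follows (my own sketch of the described scheme: endpoints are averaged with their single neighbor, and plateaus of equal values count as one point; names and the endpoint rule are my assumptions):

```java
// Iterative smoothing of a histogram until only two peaks remain.
class Smoother {
    // One smoothing pass: a[i] = (a[i-1] + a[i] + a[i+1]) / 3.
    // Requires length >= 2; endpoints average with their one neighbor.
    static double[] smoothOnce(double[] a) {
        int n = a.length;
        double[] out = new double[n];
        out[0] = (a[0] + a[1]) / 2;
        out[n - 1] = (a[n - 2] + a[n - 1]) / 2;
        for (int i = 1; i < n - 1; i++) {
            out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3;
        }
        return out;
    }

    // Count local maxima, treating a run of equal values as a single point.
    static int countPeaks(double[] a) {
        int peaks = 0;
        int i = 1;
        while (i < a.length - 1) {
            int j = i;
            while (j < a.length - 1 && a[j] == a[j + 1]) j++; // skip plateau
            if (a[i] > a[i - 1] && j < a.length - 1 && a[j] > a[j + 1]) peaks++;
            i = j + 1;
        }
        return peaks;
    }

    // Smooth until exactly two peaks remain, or at most k passes.
    static double[] smooth(double[] a, int k) {
        double[] cur = a;
        for (int i = 0; i < k && countPeaks(cur) != 2; i++) {
            cur = smoothOnce(cur);
        }
        return cur;
    }
}
```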


This is the result of our iterative smoothing; doesn't it look silky smooth, hahaha.
Of course, if we haven't found exactly two such points after K iterations, we fall back to a compatibility plan: find the highest point, then find the second-highest point at least P points away from it, and treat those two as the foreground and background. This doesn't prove that they are; it only says they are the most likely candidates. It isn't absolutely sound, just a technical compromise, and this is where experimental parameter tuning comes in.

Well, with the two peaks in hand, we still have to separate foreground from background: pick a split point, call the values close to the background peak background, and the values close to the foreground peak foreground. Here we choose the trough between the two peaks (some algorithms use the mean between them instead). In this particular case, the natural choice is near point 62.


This way, values below 62 are foreground and the rest are background. In other words, wherever the grayscale value gray = R = G = B is below 62, the pixel is foreground: we mark it 1 and treat it as black; everything else is 0 and treated as white. Let's look at the effect:
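The thresholding step itself is tiny (a sketch with my own names; 62 is just the trough found above, passed in as a parameter):

```java
// Binarization: gray values below the trough threshold become foreground
// (1, black); everything else becomes background (0, white).
class Binarizer {
    static int[][] binarize(int[][] gray, int threshold) {
        int h = gray.length, w = gray[0].length;
        int[][] bitmap = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                bitmap[y][x] = gray[y][x] < threshold ? 1 : 0;
            }
        }
        return bitmap;
    }
}
```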


What do you think, feels a bit fresher, doesn't it?
And with that, we have turned a color image into a black-and-white one.
Our grayscale pixel grid has now become something like this:

0 0 0 0 0 0 0 0 0 0
1 0 1 1 1 1 1 0 0 0
1 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

A thoroughly familiar binary matrix, known professionally as a bitmap. Hahaha~

OK, binarization is done. Next we need to deal with the interference points on the black-and-white image and remove them (commonly known as noise). This process is called de-noising.

Step four: remove the noise.
Why remove noise? Here is the binarized image from before, enlarged:

Look inside the red markings: there are lots of scattered black specks. Like little black moles on a beauty's face, they spoil the pursuit of beauty, and they will seriously interfere with our program's cutting and recognition of the image. So we'll do everything we can to remove these dark spots!!

There are many common ways to de-noise.
The simplest is the DFS or BFS (depth-first / breadth-first search) everyone learned in algorithms and data structures: search our w*h bitmap for all connected regions (pixels of value 1, which display as black, joined by adjacency). Compute the average pixel count over all connected regions; if some region's pixel count is far below the average, we call it noise and set its pixels back to 0.
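A sketch of that BFS scheme (my own code, not the article's: 4-connectivity and the RATIO cutoff for "far below average" are my assumptions):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// BFS de-noising: collect 4-connected black regions, then erase any region
// whose pixel count is far below the average region size. RATIO is my own
// tuning guess, not a value from the article.
class Denoiser {
    static final double RATIO = 0.1;

    static void denoise(int[][] bmp) {
        int h = bmp.length, w = bmp[0].length;
        boolean[][] seen = new boolean[h][w];
        List<List<int[]>> regions = new ArrayList<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (bmp[y][x] == 1 && !seen[y][x]) {
                    regions.add(bfs(bmp, seen, x, y));
                }
            }
        }
        if (regions.isEmpty()) return;
        long total = 0;
        for (List<int[]> r : regions) total += r.size();
        double avg = (double) total / regions.size();
        for (List<int[]> r : regions) {
            if (r.size() < avg * RATIO) {          // tiny region: noise
                for (int[] p : r) bmp[p[1]][p[0]] = 0;
            }
        }
    }

    // Standard BFS flood fill over 4-connected black pixels.
    static List<int[]> bfs(int[][] bmp, boolean[][] seen, int sx, int sy) {
        int h = bmp.length, w = bmp[0].length;
        List<int[]> region = new ArrayList<>();
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        queue.add(new int[]{sx, sy});
        seen[sy][sx] = true;
        int[] dx = {1, -1, 0, 0};
        int[] dy = {0, 0, 1, -1};
        while (!queue.isEmpty()) {
            int[] p = queue.poll();
            region.add(p);
            for (int d = 0; d < 4; d++) {
                int nx = p[0] + dx[d], ny = p[1] + dy[d];
                if (nx >= 0 && nx < w && ny >= 0 && ny < h
                        && bmp[ny][nx] == 1 && !seen[ny][nx]) {
                    seen[ny][nx] = true;
                    queue.add(new int[]{nx, ny});
                }
            }
        }
        return region;
    }
}
```

A lone speck next to a 20-pixel letter gets erased, while the letter survives.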

There are also more advanced algorithms, such as Gaussian denoising and salt-and-pepper denoising, which replace each point by the average, or by a threshold comparison on the sum, of the 8 points around it, iterating several times to remove the noise. The detailed algorithms are in the related papers and are not especially complex; for reasons of space I'll only mention them here.

If you de-noise too aggressively, it's easy to hurt the innocent and strip out normal points as noise (so-called erosion), so take care to strike a balance.

In my algorithm I experimented with all of the methods above and finally chose BFS denoising. The result looks like this:


The noise above is basically cleared away. Just like a whitening skin-care product: refreshing~

Well, that's about it for cleaning the image itself.
Next, we get ready to cut it. The duck has been washed; at last it's time to plate the dish~

Step five: rotate to level.
A user's photo will never be perfectly level, so we need the program to rotate the image and find the most-likely-level position; the cuts made from there have the best chance of coming out well.

So how do we evaluate whether a rotation angle gives a good result?

Suppose we rotate by some angle alpha and project all the foreground points onto the left edge; every time a row's projection is 0 (an empty row), we add one to a counter. After accumulating this statistic for each candidate angle, we pick the angle whose count is largest. To put it bluntly: if rotating by alpha makes the blank gaps between lines more pronounced, the adjustment is a good one.

However, rotation is a coordinate transform over every pixel, an O(n^2) operation, so it is expensive. It's best to adjust over a small range, say between -5° and +5° in steps of 0.1°. That's only about 100 rotations, and we can find a reasonably satisfying value.

As for the rotation itself, you can implement the coordinate transform yourself or use the library functions your language provides. I was lazy and used Java's standard library, hehe~
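The empty-row scoring described above can be sketched like this (my own simplification: instead of rotating the whole bitmap, it rotates each foreground point's y-coordinate about the image center and counts rows that stay empty; names and the padding scheme are mine):

```java
// Score a candidate angle by the number of empty rows after rotation;
// search -5..+5 degrees in 0.1-degree steps for the best-scoring angle.
class Leveler {
    static int score(int[][] bmp, double degrees) {
        int h = bmp.length, w = bmp[0].length;
        double rad = Math.toRadians(degrees);
        double cos = Math.cos(rad), sin = Math.sin(rad);
        double cx = w / 2.0, cy = h / 2.0;
        boolean[] occupied = new boolean[2 * h]; // padded row range
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (bmp[y][x] == 1) {
                    // Rotated y-coordinate of (x, y) about the image center.
                    double ry = sin * (x - cx) + cos * (y - cy) + cy;
                    int row = (int) Math.round(ry) + h / 2; // shift into range
                    if (row >= 0 && row < occupied.length) occupied[row] = true;
                }
            }
        }
        int empty = 0;
        for (boolean o : occupied) if (!o) empty++;
        return empty;
    }

    static double bestAngle(int[][] bmp) {
        double best = 0;
        int bestScore = -1;
        for (double a = -5.0; a <= 5.0; a += 0.1) {
            int s = score(bmp, a);
            if (s > bestScore) { bestScore = s; best = a; }
        }
        return best;
    }
}
```

A perfectly horizontal stripe of text scores better at 0° than at 3°, and the search lands near 0.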

Let's look at the leveled result:

Here it is next to the original for comparison; the angle change is small, so look carefully, oh~


Well, with the image leveled, we can cut it. Next we cut the image horizontally into lines, then cut each line vertically, finally outputting single characters.

Step six: horizontal cutting.
If you've ever kept a diary, you'll remember that every line of it sits between horizontal rules that keep your writing level, right~ Now we take scissors and cut along each of those horizontal gaps, separating the text into individual lines. That is horizontal cutting.

As usual, let's look at the results first:

This is my program's output on the image after binarization, de-noising, and rotation leveling. A solid line is drawn along each line's top edge and a dashed line along its bottom edge. In the program, the corresponding row numbers are recorded as the start and end of each horizontal chunk.

How is it done concretely? Come on, Lao Wang will walk you through it.

First, recall the data structure we now have: a w*h bitmap that looks like this:

0 0 0 0 0 0 0 0 0 0
1 0 1 1 1 1 1 0 0 0
1 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

Next, we project the bitmap of our processed black-and-white picture onto the y-axis, summing the values of each row, which gives us a curve (or histogram) along the y-axis.


You can feel it, right? The rows with large sums are where the foreground (the text) is; the rows with small sums are basically background. So we follow the earlier plan: smooth first, then find the troughs, and slice at the troughs.
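The projection-and-cut idea can be sketched as follows (my own simplified sketch: it cuts at completely empty rows rather than at smoothed troughs, which is the easy case; names are mine):

```java
import java.util.ArrayList;
import java.util.List;

// Horizontal cutting: project the bitmap onto the y-axis by summing each
// row, then split the projection into [start, end) line ranges.
class RowProjector {
    static int[] project(int[][] bmp) {
        int[] rows = new int[bmp.length];
        for (int y = 0; y < bmp.length; y++) {
            for (int v : bmp[y]) rows[y] += v;
        }
        return rows;
    }

    // Cut at zero (empty) rows: return [start, end) pairs for each text line.
    static List<int[]> cut(int[] rows) {
        List<int[]> lines = new ArrayList<>();
        int start = -1;
        for (int y = 0; y <= rows.length; y++) {
            boolean ink = y < rows.length && rows[y] > 0;
            if (ink && start < 0) start = y;
            if (!ink && start >= 0) {
                lines.add(new int[]{start, y});
                start = -1;
            }
        }
        return lines;
    }
}
```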


Above is the smoothed result; doesn't it look much better~ ^_^

Then we cut at the troughs the same way as before, giving us a stack of chunks (to save space, I won't draw them all). But each chunk still contains a lot of background; we need to shrink them so that each chunk's top and bottom edges hug the text.

How? It's very simple, actually. Take the average of each chunk's projected row values, and treat any row whose value is far below that average as background. There's a tunable parameter here; I used the following values:

final double THRESHOLD = 0.5;
final double avg = Math.min((total / lastTroughIds.length) * THRESHOLD, 10);

Any row whose projected value is below avg is treated as background. This way we strip the background rows off each chunk.

Is that the end of it? No no no...

But there's a case like this: I jump
As you can see, characters like i and j have a dot on top, and that dot can easily be separated out into its own tiny line. We need to merge these too-small chunks back together so they form a whole.

After this work, the lines are basically cut (in practice there are many more details to handle, which I won't repeat here. If you're interested, I may open-source my code later~)

With the horizontal cut finished, we've done 90% of the image processing. Last comes vertical segmentation, cutting the lines into separate character blocks.

Step seven: vertical cutting.
Continuing from above, we take the horizontally cut lines and cut them vertically. First, the effect:

What's different about vertical cutting versus horizontal cutting? One thing: two characters on the same line tend to sit tightly together, and sometimes their vertical projections overlap (some fonts overlap by design), so plain projection doesn't cut cleanly and produces cutting errors. That is exactly what makes vertical cutting harder than horizontal cutting.

So I did some special work here. Lao Wang computes the average width of all the characters in the line; in principle the characters in a line should be about equally wide (there are exceptions, but by the 80/20 rule we handle the majority case, which simplifies the problem). Then, when characters are stuck together over a span close to the average width, we can basically cut them apart at their weakest point.

So how does this special work actually work? Don't worry, let Lao Wang take it slowly.

First, observe that in most cases the characters are basically independent. So we first connect all the connectable points to form separate character blocks. If some characters are stuck together, never mind; we'll handle that later. There are many algorithms for finding connected chunks in a bitmap; here we use BFS once again (these classical algorithms really are tried and true) to extract the connected components. For example:

In this figure, BFS finds two large connected blocks: the R part on the left and the E part on the right.

Is this step enough? Of course not. Some characters also need their upper and lower connected regions merged. For example:

You can see that characters like i and j are split into two parts; if we want to recognize them, we absolutely cannot leave them split, and BFS alone has no way to join them. What to do?
We project all the blocks produced by BFS onto the x-axis, and whenever two blocks' projections overlap substantially, we merge them into a new combined block and treat them as connected.
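The x-projection merge can be sketched like this (a simplification of mine: each block is reduced to its [xMin, xMax] column span, and any overlap at all triggers a merge; the article merges on "more overlapping" overlap, a threshold I've dropped here):

```java
import java.util.ArrayList;
import java.util.List;

// Merge vertically split characters (the dots of i and j): blocks whose
// x-axis projections overlap are combined into one block.
class BlockMerger {
    // Merge overlapping [xMin, xMax] intervals; input need not be sorted.
    static List<int[]> merge(List<int[]> blocks) {
        List<int[]> sorted = new ArrayList<>(blocks);
        sorted.sort((a, b) -> Integer.compare(a[0], b[0]));
        List<int[]> out = new ArrayList<>();
        for (int[] b : sorted) {
            if (!out.isEmpty() && b[0] <= out.get(out.size() - 1)[1]) {
                int[] last = out.get(out.size() - 1);
                last[1] = Math.max(last[1], b[1]); // overlap: extend the span
            } else {
                out.add(new int[]{b[0], b[1]});    // no overlap: new block
            }
        }
        return out;
    }
}
```

The dot of an i, spanning columns 12–14 above a stem spanning 10–15, merges into the stem's block.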

With these blocks, we're in good shape. Next we compute the average width of the blocks. If some block is extremely wide, say 2-3 times the average, it's probably two or more characters stuck together (in some fonts, a pair like r-m photographs easily into one blob), so we cut it at its weakest place into two or more separate blocks. Conversely, if several adjacent blocks each have a very small width (only a fraction of the average width) and the gaps between them are especially small (a fraction of the average gap), they were probably one character originally that got separated during de-noising (the strokes of h and n are particularly thin in the middle, and when the photo isn't sharp, de-noising can break them), so we merge those blocks into one.

After this slicing & merging, each separate block is a character pattern ready for recognition. All the work before this was done to obtain them.

So, has Lao Wang made it clear?
The next task is to recognize these individual pattern blocks.

Well, this upper half is already nearly 7,000 words, and it has cost Lao Wang several late nights. If you want to keep reading the lower half, please run the following code:

void next()
{
    // You can follow Lao Wang's public account: Simplemain
    // Through it you can keep up with Lao Wang's latest technical write-ups, and ask questions for discussion.
    // Friends who don't want to type can long-press the QR code to follow, ha~
}
