Part V
Image feature extraction and description
29 Understanding Image Features
Goal
In this section I will try to help you understand what image features are, why they matter, and why corner points in particular are important.
29.1 Explanation
I am sure most of you have played jigsaw puzzles. You get a pile of small pieces of a picture, and you have to assemble them correctly to reconstruct the original image. The question is, how do you do it? And can we write the same principle into a computer program, so that the computer can play jigsaw puzzles too? If the computer can assemble puzzles, we can give it many natural images and let it stitch them into one large picture. And if the computer can automatically stitch natural images, could we give it many photographs of a building and let it build a 3D model for us?
Problems and ideas like this can go on without end, but they all rest on one fundamental question: how do we play a jigsaw puzzle? How do we assemble a pile of pieces into a whole? How do we stitch several photos of a natural scene into a single image? The
answer is: we look for unique features that are easy to track and easy to compare. If we try to define such a feature, we find that although we know intuitively what it is, it is hard to put into words. Yet if you are asked to find a good feature that can be compared across different pictures, you can do it; even children manage it when they play puzzles. We search for such features in one image, find them, find the same features in other images, and then align the images accordingly. (In jigsaw puzzles we rely more on the continuity between pieces.) We are all born with these abilities.
So our one question has now expanded into a few more precise ones. What exactly are these features? (And our answer must be something a computer can understand.)
It is hard to say how humans find these features; the ability is carved into our brains. But if we look closely at some images and search for different patterns, we find something interesting. As an example:
The image itself is simple. Six small patches are shown at the top of the image. Your task is to find the location of each patch in the original image. How many can you locate correctly?
A and B are flat surfaces, and patches like them occur in many places, so it is hard to pin down their exact location.
C and D are simpler. They are edges of the building. You can find their approximate location, but the exact location is still hard to determine, because every point along the edge looks the same. So an edge is a better feature than a flat region, but it is still not good enough (think of finding a straight, continuous edge in a jigsaw puzzle).
Finally, E and F are corners of the building, and they are easy to find: at a corner, no matter which direction you move the patch, it looks different. So corners can be considered good features. To understand the concept better, here is a simpler example.
As shown in the figure, the region in the blue box is a flat area that is hard to find and track: whichever direction you move the blue box, it looks the same. The region in the black box is an edge: if you move it vertically it changes, but if you move it horizontally it does not. The corner in the red box changes whichever direction you move it, which makes it unique.
So, basically, corners are good image features. (Not only corners; in some cases blobs are good image features too.)
Now we have finally answered the question "What are these features?" But the next question follows: how do we find them, or how do we find the corners? We have answered that intuitively as well: look for regions of the image where moving a small window in any direction produces a large change, as the sketch below illustrates. In the coming sections we will express this idea in computer language. The technique of finding image features is called feature detection.
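Here is a minimal NumPy sketch of that intuition, not any particular OpenCV algorithm: it compares a patch with shifted copies of itself and reports how little the patch changes in its least-changing direction. The file name and the coordinates of the flat, edge, and corner locations are made up for illustration.

import numpy as np
import cv2

img = cv2.imread('building.jpg', 0).astype(np.float32)  # hypothetical test image

def min_patch_change(img, y, x, size=10, shift=3):
    # Sum of squared differences between a patch and each of its shifted copies;
    # return the smallest value, i.e. the change in the "easiest" direction.
    ref = img[y:y + size, x:x + size]
    changes = []
    for dy in (-shift, 0, shift):
        for dx in (-shift, 0, shift):
            if dy == 0 and dx == 0:
                continue
            moved = img[y + dy:y + dy + size, x + dx:x + dx + size]
            changes.append(np.sum((moved - ref) ** 2))
    return min(changes)

# Made-up coordinates of a flat region, an edge and a corner in the test image.
for name, (y, x) in [('flat', (50, 50)), ('edge', (120, 200)), ('corner', (300, 310))]:
    print('%s %.0f' % (name, min_patch_change(img, y, x)))

A flat region gives a small value in every direction, an edge gives a small value along the edge, and only a corner gives a large value no matter how the patch is shifted.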
Now suppose we have found the image features (assume you have done it). After finding them, you should be able to find the same features in other images. How do we do that? We take a region around the feature and describe it in our own words, for example "blue sky above, a building below, lots of glass on the building", and then search for the same region in another image. Basically, you are describing the feature. In the same way, a computer must describe the region around a feature so that it can find the same feature in other images. Such a description is called a feature description. Once you have the feature descriptions, you can find the same features in all the images and do whatever you want with them.
In this chapter we are going to use the various algorithms in OpenCV to find the features of the images, then describe them, match them, and so on.
More Resources
Practice
30 Harris Corner Detection
Goal
• Understand the concept of Harris corner detection
• Learn the functions: cv2.cornerHarris(), cv2.cornerSubPix()
Principle
In the previous section we saw one defining property of a corner: it changes greatly when moved in any direction. Chris Harris and Mike Stephens turned this simple idea into a mathematical form in their 1988 paper "A Combined Corner and Edge Detector"; the method is known as Harris corner detection. The window is shifted in all directions by (u, v) and the sum of squared differences is computed:

E(u,v) = \sum_{x,y} w(x,y)\,\left[ I(x+u,\, y+v) - I(x,\, y) \right]^2
A window function can be a normal rectangular window or a Gaussian window that gives different weights to each pixel.
For corner detection we want to maximize E(u,v), which means the second term inside the brackets must be maximized. Applying a Taylor expansion to the equation above and going through a few steps of algebra (see a standard textbook for the details), we arrive at:

E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}

where

M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x I_x & I_x I_y \\ I_x I_y & I_y I_y \end{bmatrix}

Here I_x and I_y are the image derivatives in the x and y directions (they can be computed with cv2.Sobel()).
And then comes the main part. A score is computed for each window, and this score decides whether the window contains a corner:

R = \det(M) - k\,(\operatorname{trace}(M))^2

where

\det(M) = \lambda_1 \lambda_2
\operatorname{trace}(M) = \lambda_1 + \lambda_2

λ1 and λ2 are the eigenvalues of the matrix M. Their values determine whether a region is a corner, an edge, or a flat area:
• When λ1 and λ2 are both small, |R| is also small and the region is flat.
• When λ1 ≫ λ2 or λ1 ≪ λ2, R is less than 0 and the region is an edge.
• When λ1 and λ2 are both large and λ1 ≈ λ2 (that is, the smaller of the two exceeds a threshold), R is also large and the region is a corner.
This conclusion can be pictured in the λ1–λ2 plane:
So the result of Harris corner detection is a grayscale image in which each pixel value is the corner score. Thresholding this score image at a suitable value gives the corners in the image. Before using the OpenCV function on a simple test picture in the next section, the sketch below shows how the score itself could be computed.
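This is only a minimal sketch of the formulas above, written in plain NumPy under assumed parameters (Sobel aperture 3, a 5x5 Gaussian as the window function w, k = 0.04); the real work is done by cv2.cornerHarris() in the next section.

import numpy as np
import cv2

gray = np.float32(cv2.imread('chessboard.jpg', 0))   # same test image as the next section

Ix = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
Iy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)

# Elements of M, each summed over a window by the window function w (a Gaussian here)
Ixx = cv2.GaussianBlur(Ix * Ix, (5, 5), 1)
Iyy = cv2.GaussianBlur(Iy * Iy, (5, 5), 1)
Ixy = cv2.GaussianBlur(Ix * Iy, (5, 5), 1)

k = 0.04
R = (Ixx * Iyy - Ixy * Ixy) - k * (Ixx + Iyy) ** 2    # R = det(M) - k * trace(M)^2

corners = R > 0.01 * R.max()                          # threshold the score image
print('%d pixels marked as corners' % corners.sum())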
30.1 Harris Corner Detection in OpenCV
The function cv2.cornerHarris() in OpenCV is used to detect corners. Its parameters are:
• img - the input image; it should be grayscale with data type float32.
• blockSize - the size of the neighbourhood considered for corner detection.
• ksize - the aperture (window) size of the Sobel derivative used.
• k - the free parameter of the Harris detector equation; typical values are in [0.04, 0.06].
Examples are as follows:
import cv2
import numpy as np

filename = 'chessboard.jpg'
img = cv2.imread(filename)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

gray = np.float32(gray)
dst = cv2.cornerHarris(gray, 2, 3, 0.04)

# result is dilated for marking the corners, not important
dst = cv2.dilate(dst, None)

# Threshold for an optimal value, it may vary depending on the image.
img[dst > 0.01 * dst.max()] = [0, 0, 255]

cv2.imshow('dst', img)
if cv2.waitKey(0) & 0xff == 27:
    cv2.destroyAllWindows()
The results are as follows:
30.2 Corners with Sub-Pixel Accuracy
Sometimes we need to find corners with maximum accuracy. OpenCV provides the function cv2.cornerSubPix(), which refines corner locations to sub-pixel accuracy. Below is an example. First we find the Harris corners, then we pass the centroids of those corners to this function for refinement. The Harris corners are marked with red pixels and the refined corners with green pixels. To use this function we must define the criteria for stopping the iteration: it stops when the maximum number of iterations is reached or when the required accuracy is achieved. We also need to define the size of the neighbourhood in which to search for corners.
import cv2
import numpy as np

filename = 'chessboard2.jpg'
img = cv2.imread(filename)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# find Harris corners
gray = np.float32(gray)
dst = cv2.cornerHarris(gray, 2, 3, 0.04)
dst = cv2.dilate(dst, None)
ret, dst = cv2.threshold(dst, 0.01 * dst.max(), 255, 0)
dst = np.uint8(dst)

# find centroids
ret, labels, stats, centroids = cv2.connectedComponentsWithStats(dst)

# define the criteria to stop and refine the corners
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv2.cornerSubPix(gray, np.float32(centroids), (5, 5), (-1, -1), criteria)

# Now draw them
res = np.hstack((centroids, corners))
res = np.int0(res)
img[res[:, 1], res[:, 0]] = [0, 0, 255]
img[res[:, 3], res[:, 2]] = [0, 255, 0]

cv2.imwrite('subpixel5.png', img)
The result is shown below; some corner regions are enlarged to make the difference between the two sets of points easier to see:
31 Shi-Tomasi Corner Detection & Good Features to Track
Goal
In this section we will learn:
• Another corner detection technique: Shi-Tomasi corner detection
• The function: cv2.goodFeaturesToTrack()
Principle
In the previous section we studied Harris corner detection. In 1994, J. Shi and C. Tomasi made a small modification to it in their paper "Good Features to Track", which gives better results. The scoring function in Harris corner detection is:

R = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2

Shi and Tomasi instead proposed the scoring function:

R = \min(\lambda_1, \lambda_2)
If this score exceeds a threshold, the point is considered a corner. If we plot this condition in the λ1–λ2 plane, we get the figure below.
From the figure we can see that a point is accepted as a corner (the green region) only when both λ1 and λ2 are above a minimum value λ_min.
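As an aside, OpenCV exposes the minimum-eigenvalue map directly through cv2.cornerMinEigenVal(); the following sketch uses it to apply the Shi-Tomasi test by hand. The image name and the quality level of 0.01 are assumptions for illustration; normally you would just call cv2.goodFeaturesToTrack() as shown in the next section.

import cv2
import numpy as np

gray = cv2.imread('simple.jpg', 0)                    # hypothetical test image
min_eig = cv2.cornerMinEigenVal(np.float32(gray), 3)  # min(lambda1, lambda2) per pixel

# Shi-Tomasi: a pixel is a corner when the smaller eigenvalue exceeds a quality threshold
quality = 0.01
corners = min_eig > quality * min_eig.max()
print('%d pixels pass the Shi-Tomasi test' % corners.sum())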
31.1 Code
OpenCV provides the function cv2.goodFeaturesToTrack(). It finds the N strongest corners in the image using the Shi-Tomasi method (or the Harris method, if you prefer).
You can also switch to the Harris corner detector by changing the parameters. As usual, the input should be a grayscale image. Then you specify the number of corners you want to detect, and the quality level of the corners, a value between 0 and 1: it denotes the minimum accepted corner quality, and every corner below it is rejected. Finally, you set the minimum Euclidean distance allowed between two corners.
With this information, the function finds the corners in the image. All corners below the quality level are rejected, and the remaining corners are sorted by quality in descending order. The function then takes the strongest corner, throws away all nearby corners within the minimum distance, and repeats, finally returning the N strongest corners.
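For reference, here is a small sketch of the Harris variant mentioned above, using the useHarrisDetector flag of cv2.goodFeaturesToTrack(); the image name and parameter values mirror the example that follows and are only illustrative.

import cv2
import numpy as np

img = cv2.imread('simple.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Same call as the Shi-Tomasi example below, but scored with the Harris response instead
corners = cv2.goodFeaturesToTrack(gray, maxCorners=25, qualityLevel=0.01,
                                  minDistance=10, useHarrisDetector=True, k=0.04)
print(len(corners))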
In the following example, we try to find the 25 best corners:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('simple.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

corners = cv2.goodFeaturesToTrack(gray, 25, 0.01, 10)
corners = np.int0(corners)

for i in corners:
    x, y = i.ravel()
    cv2.circle(img, (x, y), 3, 255, -1)

plt.imshow(img), plt.show()
The results are as follows:
We'll find out later that this function is well suited for use in target tracking.
32 Introduction to SIFT (Scale-Invariant Feature Transform)
Goal
• Learn the concepts of the SIFT algorithm
• Learn to find SIFT keypoints and descriptors in an image
Principle
In the previous two sections we studied corner detectors such as Harris. They are rotation-invariant: even if the image is rotated, we can still find the same corners, because a corner remains a corner after rotation. But what if we scale the image? A corner may then no longer be a corner. For example, a corner in a small image can be detected with a small window, but when the image is enlarged, the same window only sees a smooth curve or edge.
So, in 2004, D. Lowe proposed a new algorithm, the Scale-Invariant Feature Transform (SIFT), which extracts keypoints from an image and computes their descriptors.
The SIFT algorithm mainly consists of four steps. We are going to study gradually.
Scale-Space Extrema Detection
As explained above, we clearly cannot use the same window to detect keypoints at different scales: a small window works for small corners, but large corners need a large window. To achieve this we use scale-space filtering. (A scale-space filter can be built from Gaussian kernels with different variances σ.) The image is convolved with the Laplacian of Gaussian (LoG) at different values of σ; because of the different σ values, LoG acts as a detector for blobs of different sizes (a blob is smoothed out completely when the σ of the LoG matches the blob's size). In short, σ acts as a scale factor: a Gaussian kernel with small σ responds well to small corners, while one with large σ responds well to large corners. So we can look for local maxima across both scale and the 2D image plane, i.e. points (x, y, σ) meaning that (x, y) is a potential keypoint at scale σ. (The Gaussian σ and the window size are proportional: the window size is roughly six times the variance plus one, so σ also determines the window size.) But LoG is costly to compute, so the SIFT algorithm approximates it with the Difference of Gaussians (DoG). An image pyramid is needed here: the image is downsampled (for example by dropping alternate rows and columns) to produce a set of sizes (1, 0.5, 0.25, and so on), and each image in this set is convolved with Gaussian kernels of different σ, giving an image pyramid at different resolutions (different scale spaces). The DoG is the difference between adjacent blur levels in this pyramid, as shown in the figure below; the sketch that follows builds one such octave by hand:
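A minimal sketch of one DoG octave, assuming an image file name, the initial σ = 1.6 quoted below, and a scale step of k = √2; the real SIFT implementation in OpenCV does all of this internally.

import cv2
import numpy as np

img = cv2.imread('home.jpg', 0).astype(np.float32)   # hypothetical image

sigma, k = 1.6, np.sqrt(2)
octave = [cv2.GaussianBlur(img, (0, 0), sigma * k ** i) for i in range(5)]  # 5 blur levels

# Difference of Gaussians: subtract neighbouring blur levels
dog = [octave[i + 1] - octave[i] for i in range(4)]

# Next octave: halve the image size and repeat
next_img = cv2.resize(octave[-1], None, fx=0.5, fy=0.5, interpolation=cv2.INTER_NEAREST)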
Once the DoG images are available, we search them for local extrema over scale and space. Each pixel is compared with its 8 neighbours in the same scale level and with the 9 pixels in each of the levels above and below (2 x 9 = 18 points in total). If it is a local extremum, it is a potential keypoint; such a keypoint is, roughly speaking, best represented at that scale. As shown below:
In the paper, the author gives empirical values for the SIFT parameters: number of octaves = 4 (the image pyramid obtained by repeated downsampling has 4 levels), number of scale levels = 5 (each octave is convolved with 5 Gaussian kernels of different σ), initial σ = 1.6, and so on.
Keypoint (Extremum) Localization
Once potential keypoints are found, they have to be refined to get more accurate results.
The author uses a Taylor series expansion of the scale space to obtain a more accurate location of the extremum; if the intensity at this extremum is less than a threshold value (0.03 in the paper), it is rejected. In OpenCV this threshold is called contrastThreshold.
The DoG responds strongly to edges as well, so edges must be removed. We saw above that the Harris method can detect edges in addition to corners, and the author uses the same idea here: the principal curvatures are computed from a 2x2 Hessian matrix. From Harris corner detection we know that an edge is characterized by one eigenvalue being much larger than the other, so a simple ratio test is used: if the ratio exceeds a threshold (called edgeThreshold in OpenCV), the keypoint is discarded. The value given in the paper is 10.
So low-contrast keypoints and edge keypoints are removed, and what remains are the strong interest points. The edge test can be written out in a few lines, as sketched below.
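A small sketch of that edge-rejection test, written from the description above; the 2x2 Hessian values here are made up, and a real implementation would compute them from the DoG image around each candidate keypoint.

import numpy as np

def is_edge(hessian_2x2, edge_threshold=10.0):
    # Reject an extremum whose principal-curvature ratio is too large (i.e. an edge).
    tr = hessian_2x2[0, 0] + hessian_2x2[1, 1]
    det = np.linalg.det(hessian_2x2)
    if det <= 0:                      # curvatures of opposite sign: not a usable extremum
        return True
    r = edge_threshold
    return tr * tr / det >= (r + 1) ** 2 / r

H = np.array([[12.0, 1.0], [1.0, 0.4]])   # made-up Hessian dominated by one direction
print(is_edge(H))                          # True: this candidate would be discarded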
Assigning an Orientation to Keypoints
Now an orientation is assigned to each keypoint so that the descriptor becomes rotation-invariant. A neighbourhood around the keypoint is taken (depending on its scale), and the gradient magnitude and direction are computed in that region. An orientation histogram with 36 bins (10 degrees per bin) is built from these values (the samples are weighted by gradient magnitude and by a circular Gaussian window with σ equal to 1.5 times the keypoint's scale). The highest peak of the histogram gives the main orientation, and any other peak above 80% of the highest one is taken as a secondary orientation. This creates keypoints at the same location and scale but with different orientations, which improves the stability of matching. A small sketch of such a histogram follows.
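A minimal NumPy sketch of a 36-bin orientation histogram around a hypothetical keypoint; it uses a random synthetic image, a square window, and omits the Gaussian weighting that SIFT applies, so it only illustrates the idea.

import numpy as np
import cv2

img = np.float32(np.random.rand(64, 64) * 255)     # synthetic grayscale image

gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
mag = np.sqrt(gx ** 2 + gy ** 2)
ang = np.rad2deg(np.arctan2(gy, gx)) % 360

# 36-bin histogram (10 degrees per bin) around a hypothetical keypoint at (32, 32)
y, x, r = 32, 32, 8
hist, _ = np.histogram(ang[y - r:y + r, x - r:x + r], bins=36, range=(0, 360),
                       weights=mag[y - r:y + r, x - r:x + r])

main_dir = hist.argmax() * 10
secondary = [i * 10 for i, v in enumerate(hist)
             if v >= 0.8 * hist.max() and i != hist.argmax()]
print(main_dir)
print(secondary)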
Keypoint Descriptor
Now the keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken and divided into 16 sub-blocks of 4x4 pixels. For each sub-block an orientation histogram with 8 bins is created, giving 128 bins in total. These values form a vector of length 128, which is the keypoint descriptor. In addition, several measures are taken to make it robust against illumination changes, rotation, and so on.
Keypoint Matching
The next step is matching: the similarity of keypoints in two images is measured by the Euclidean distance between their descriptors. Take a keypoint from the first image and find the keypoint in the second image with the smallest descriptor distance. In some cases, however, the second-closest match is very close to the closest one, for example because of noise. In that case the ratio of the closest distance to the second-closest distance is computed, and if it is greater than 0.8, the match is rejected. According to the paper, this eliminates about 90% of the false matches while discarding only about 5% of the correct ones.
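Matching itself is covered in a later chapter, but to make the ratio test concrete, here is a minimal sketch using a brute-force matcher; the image names are placeholders, and cv2.SIFT() may instead be exposed as cv2.xfeatures2d.SIFT_create() or cv2.SIFT_create() depending on the OpenCV build.

import cv2

img1 = cv2.imread('box.png', 0)            # hypothetical query image
img2 = cv2.imread('box_in_scene.png', 0)   # hypothetical train image

sift = cv2.SIFT()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)     # two nearest neighbours per descriptor

# Lowe's ratio test: keep a match only if it is clearly better than the runner-up
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(len(good))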
That is a summary of the SIFT algorithm. It is highly recommended to read the original paper, which will deepen your understanding of the algorithm. Keep in mind that this algorithm is protected by a patent, so in OpenCV it is included in the non-free module.
32.1 SIFT in OpenCV
Now let us look at the SIFT functionality available in OpenCV. We start with keypoint detection and drawing. First we have to construct a SIFT object. We can pass different parameters to it, but they are optional and are explained in the documentation.
import cv2
import numpy as np

img = cv2.imread('home.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

sift = cv2.SIFT()
kp = sift.detect(gray, None)

img = cv2.drawKeypoints(gray, kp)
cv2.imwrite('sift_keypoints.jpg', img)
The function sift.detect() finds keypoints in the image. You can pass a mask image if you want to search only a part of the image. Each returned keypoint is a special structure with many attributes, such as its (x, y) coordinates, the size of the meaningful neighbourhood, the angle specifying its orientation, and so on.
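For example, continuing from the code above, the attributes of a single keypoint can be inspected like this (a small illustrative snippet, not part of the original example):

p = kp[0]
print(p.pt)        # (x, y) coordinates
print(p.size)      # diameter of the meaningful neighbourhood
print(p.angle)     # orientation in degrees
print(p.response)  # strength of the keypoint
print(p.octave)    # pyramid octave in which it was detected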
OpenCV also provides a function to draw the keypoints, cv2.drawKeypoints(), which draws a small circle at each keypoint location. If you pass the flag cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS, it draws a circle whose size reflects the keypoint size and even draws its orientation.
img = cv2.drawKeypoints(gray, kp, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite('sift_keypoints.jpg', img)
The results are as follows:
Now, to compute the keypoint descriptors, OpenCV provides two methods.
1. Since we already have the keypoints, we can call sift.compute() to compute descriptors for them, for example: kp, des = sift.compute(gray, kp).
2. If the keypoints have not been found yet, we can use sift.detectAndCompute() to find the keypoints and compute their descriptors in a single step.
Here we take a look at the second method:
sift = cv2.SIFT()
kp, des = sift.detectAndCompute(gray, None)
Here kp is the list of keypoints and des is a NumPy array of shape (number of keypoints) x 128.
So we now have keypoints and descriptors. Matching keypoints between different images is what we will learn in the coming chapters.
33 Introduction to SURF (Speeded-Up Robust Features)
Goal
In this section we will learn:
• The basics of SURF
• SURF in OpenCV
Principle
In the previous section we learned to use the SIFT algorithm to detect and describe keypoints. But SIFT is comparatively slow, and people wanted a faster algorithm. In 2006, Bay, H., Tuytelaars, T. and Van Gool, L. proposed SURF (Speeded-Up Robust Features). As the name suggests, it is essentially a speeded-up version of SIFT.
In SIFT, Lowe approximates the Laplacian of Gaussian (LoG) with the Difference of Gaussians (DoG) when building the scale space. SURF goes a step further and approximates the LoG with a box filter (Box_Filter); the figure shows this approximation. One big advantage of the box filter is that its convolution can be computed with an integral image (the key property of an integral image is that the sum of all pixels in any window can be computed in a time independent of the window size), and the computation can be carried out in parallel at different scales, as the sketch below illustrates. Like SIFT, SURF relies on the determinant of the Hessian matrix for both the scale and the location of keypoints.
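A minimal sketch of the integral-image property mentioned above, using cv2.integral() on a synthetic image; the window coordinates are arbitrary.

import cv2
import numpy as np

img = np.uint8(np.random.rand(480, 640) * 255)    # any grayscale image

ii = cv2.integral(img)                            # integral image, shape (481, 641)

# Sum of all pixels in the window rows y1..y2-1, cols x1..x2-1, in constant time,
# regardless of how large the window is:
y1, y2, x1, x2 = 100, 160, 200, 260
box_sum = ii[y2, x2] - ii[y1, x2] - ii[y2, x1] + ii[y1, x1]
print(box_sum == img[y1:y2, x1:x2].sum())         # True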
To keep the descriptor rotation-invariant, a main orientation has to be assigned to each feature point. Haar wavelet responses are computed in the x and y directions within a circular region of radius 6s around the feature point (where s is the scale of the feature point). This is essentially a gradient computation, but using the integral image makes it very efficient. To find the dominant orientation, a sector-shaped sliding window with an angle of 60 degrees is rotated around the feature point in steps of about 0.2 radians, and the Haar wavelet responses inside the window are summed; the main orientation is the direction with the largest accumulated response. In many applications rotation invariance is not required, so there is no need to determine the orientation at all, and skipping it speeds up the algorithm further. SURF provides this functionality as U-SURF (Upright-SURF), which is faster while remaining stable for rotations up to about +/-15 degrees. OpenCV supports both modes through the parameter upright: when upright is 0 the orientation is computed, and when it is 1 it is not, which is faster.
For the feature descriptor, Haar wavelet responses of the image are used again. A 20s x 20s square region around the feature point, aligned with the main orientation, is divided into 4x4 sub-blocks. For each sub-block the Haar wavelet responses are computed with a template of size 2s, and the responses are summed to form a vector. The resulting descriptor has a length of 64. A lower-dimensional descriptor speeds up computation and matching while still keeping the features distinctive.
For greater distinctiveness, SURF also offers an extended 128-dimensional descriptor: the sums of d_x and |d_x| are computed separately for d_y < 0 and d_y >= 0, and likewise the sums of d_y and |d_y| are split according to the sign of d_x. This doubles the number of features without adding much computational complexity.
OpenCV also supports both versions through the parameter extended: set it to 1 for the 128-dimensional descriptor and 0 for 64 dimensions; the default is 128 dimensions.
Another improvement concerns the sign of the Laplacian. While detecting feature points, the determinant of the Hessian matrix is computed, and the trace of the Hessian (the sum of its diagonal elements) is computed at the same time.
The sign of this trace distinguishes two kinds of feature points: bright blobs on a darker background (positive trace) and dark blobs on a brighter background (negative trace). During matching, the traces of two feature points are compared first: if they have the same sign, the points have the same kind of contrast and the comparison proceeds; if the signs differ, the contrast is opposite and the pair is discarded before the similarity measurement.
The similarity of two feature descriptors is measured with the Euclidean distance.
In short, SURF optimizes every step to improve speed.
Analysis shows that SURF is about three times faster than SIFT with comparable performance. SURF is good at handling images with blurring and rotation, but not good at handling viewpoint changes and illumination changes.
33.1 SURF in OpenCV
Just like SIFT, OpenCV provides functions for SURF. First we initialize a SURF object and set the optional parameters: 64/128-dimensional descriptors, upright/normal mode, and so on. All the details are explained in the documentation. Then, just as with SIFT, we can use surf.detect(), surf.compute() and so on to find keypoints and compute descriptors.
First, let us find the keypoints and descriptors and draw them. As with SIFT, the examples are shown in a Python terminal.
>>> img = cv2.imread('fly.png', 0)

# Create SURF object. You can specify params here or later.
# Here I set Hessian Threshold to 400
>>> surf = cv2.SURF(400)

# Find keypoints and descriptors directly
>>> kp, des = surf.detectAndCompute(img, None)

>>> len(kp)
699
699 keypoints are too many to show in one picture. We reduce the number to about 50 and draw them on the image. We may need all of these features when matching, but not now, so we raise the Hessian threshold.
# Check present Hessian threshold
>>> print surf.hessianThreshold
400.0

# We set it to some 50000. Remember, it is just for representing in the picture.
# In actual cases, it is better to have a value 300-500
>>> surf.hessianThreshold = 50000

# Again compute keypoints and check its number.
>>> kp, des = surf.detectAndCompute(img, None)
>>> print len(kp)
47
Now it is below 50; let us draw them on the image.
>>> img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)
>>> plt.imshow(img2), plt.show()
The results are shown below. You will notice that SURF behaves much like a blob detector: it detects the white blobs on the wings of the butterfly. You can test it with other pictures.
Now let us try U-SURF, which does not compute the orientation of the keypoints.
# Check upright flag; if it is False, set it to True
>>> print surf.upright
False

>>> surf.upright = True

# Recompute the feature points and draw them
>>> kp = surf.detect(img, None)
>>> img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)
>>> plt.imshow(img2), plt.show()
The results are shown below. All the keypoints are drawn with the same orientation, and it is much faster than before. If your application has no special requirement on keypoint orientation (as in panorama stitching), this mode is preferable.
Finally, we check the descriptor size and, if it is 64-dimensional, change it to 128 dimensions.
# Find size of descriptor
>>> print surf.descriptorSize()
64

# That means the flag "extended" is False.
>>> surf.extended
False

# So we set it to True to get 128-dim descriptors.
>>> surf.extended = True
>>> kp, des = surf.detectAndCompute(img, None)
>>> print surf.descriptorSize()
128
>>> print des.shape
(47, 128)
What remains is matching, which we will discuss in a later chapter.