Detection and Identification of multiple objects in complex scenarios

Source: Internet
Author: User

In image tracking systems, real-time detection and recognition of image targets is crucial. Many application scenes are complex: in intelligent monitoring systems, computer vision, and augmented reality (AR) applications, it is often necessary to identify and track multiple target objects while meeting real-time requirements. When 3D sensors (electromagnetic or ultrasonic sensors) are used to identify and track target objects, the results are often unsatisfactory because of the influence of distance and electromagnetic interference. Detecting and recognizing target objects with computer vision is a technology that has emerged in recent years. For example, Bajura uses light-emitting diodes as identifiers to mark the target objects of an AR system, and Uenohara and Kanade use computer vision to register the target object by overlaying real images on video images.

In this paper, a real-time detection and recognition method for multiple target objects in complex scenes is proposed, based on computer vision principles. Matrix codes are used to identify different objects in different scenes. First, the outlines and corner points of the possible identifier areas are extracted from the pre-processed image. Next, these areas are normalized by perspective transformation. Finally, template matching on the normalized identifiers is used to identify multiple objects in real time. The experimental results show that, with the distance-based corner extraction algorithm and the improved template matching method, the approach identifies multi-object identifiers in real time and stably under rotation, scale change, and deformation.

I. Design of Two-dimensional matrix Codes

To detect and recognize multiple target objects in the same image, the shape of the matrix code used as the identifier must meet two conditions: A. every matrix code that identifies a target object can be distinguished from background interference objects; B. the identifiers of multiple objects in the same scene can be distinguished from each other. Based on these two requirements, the two-dimensional matrix logo image shown in Figure 1 is used. The black rectangular background of the logo separates the logo image from the scene, and different target objects are detected and recognized by changing the shape of the white pixel area inside the black background. Different logo images are pasted on different objects in the same scene. Using computer image processing and analysis, the logo images are extracted to quickly identify multiple target objects in the scene image.

Figure 1 two-dimensional matrix code

II. Extracting Candidate Regions

Given an original image containing one or more target objects, the detection and recognition process consists of three main stages: image segmentation, feature extraction, and object recognition. The process is shown in Figure 2.

Figure 2 Target Detection and Recognition Flowchart

(1) Image Segmentation

Figure 3 Single-target original image; Figure 4 Multi-target original image

The purpose of image segmentation is to extract the target area from the original image for analysis. Depending on the characteristics of the image, the target region can be segmented by amplitude, edge, shape, and other criteria. In common amplitude segmentation algorithms, determining the segmentation threshold is the key issue, because the choice of threshold directly affects the quality of the segmentation.

There are many algorithms for threshold selection, including global, adaptive, and dynamic thresholds. The segmentation algorithm should be chosen flexibly according to the objects being processed. In this study, the original images (Figures 3 and 4) are taken in natural scenes. Considering the changes in scene lighting and the adaptability of the algorithm, this paper uses a dynamic threshold algorithm whose threshold changes with the illumination of the scene. The segmentation results are the binarized images shown in Figures 5 and 6. The black image areas are treated as the sub-images to be analyzed.

Figure 5 Binarized image; Figure 6 Binarized image
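The text does not say which dynamic threshold algorithm is used. As one possible illustration, the sketch below derives the threshold from the image itself with an isodata-style iteration, so the threshold moves with the overall brightness of the scene. The function names, the `eps` parameter, and the list-of-lists image format are assumptions made for this example.

```python
def iterative_threshold(gray, eps=0.5):
    """Pick a threshold from the image itself (isodata-style iteration).

    `gray` is a list of rows of gray values (0-255). This is an assumed
    stand-in for the paper's unspecified dynamic threshold algorithm.
    """
    pixels = [p for row in gray for p in row]
    t = sum(pixels) / len(pixels)              # start from the mean brightness
    while True:
        dark = [p for p in pixels if p <= t]
        bright = [p for p in pixels if p > t]
        if not dark or not bright:             # flat image: nothing to separate
            return t
        # New threshold: midpoint between the means of the two classes.
        new_t = (sum(dark) / len(dark) + sum(bright) / len(bright)) / 2
        if abs(new_t - t) < eps:
            return new_t
        t = new_t


def binarize(gray, t):
    """Return 1 for dark (candidate) pixels and 0 for bright background."""
    return [[1 if p <= t else 0 for p in row] for row in gray]


scene = [[10, 12, 200], [11, 210, 205]]
brighter = [[p + 40 for p in row] for row in scene]   # same scene, more light
mask = binarize(scene, iterative_threshold(scene))
mask_brighter = binarize(brighter, iterative_threshold(brighter))
```

In this small example both lighting conditions yield the same mask, which is the property the dynamic threshold is meant to provide.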

(2) Target Extraction

As shown in Figures 5 and 6, after binarization the sub-images to be analyzed contain some background images in addition to the target images (identifier images). It is therefore necessary to analyze the black sub-images further in order to filter out the non-logo images.

Connected component search can be used to separate the sub-images. Common search methods include the chain code algorithm, the serial local search algorithm proposed by Suzuki et al., and the region-filling scanline algorithm. Based on an analysis of the classic algorithms, this paper improves the region-filling scanline algorithm. The improved algorithm modifies the stack structure of the original algorithm so that the stack depth is far smaller than in the seed filling algorithm, which avoids stack overflow and improves the stability of image processing. It also eliminates the repeated pixel color tests of the classic algorithm, which greatly improves the filling speed.
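The details of the improved algorithm are not given here, so the following is only a generic sketch of a span-based scanline fill used for connected component labeling. It pushes one seed per horizontal run rather than one per pixel, which keeps the explicit stack shallow. The function name and the 4-connectivity choice are assumptions for illustration.

```python
def label_regions(binary):
    """Label 4-connected regions of 1-pixels with a span-based scanline fill.

    Returns (labels, regions): a label image and, per region, its (x, y) pixels.
    Generic sketch only; not the paper's improved algorithm.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    regions = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] != 1 or labels[sy][sx]:
                continue
            label = len(regions) + 1
            pixels = []
            stack = [(sx, sy)]                    # explicit stack of span seeds
            while stack:
                x, y = stack.pop()
                if labels[y][x]:
                    continue                      # this run was already filled
                left = x
                while left > 0 and binary[y][left - 1] == 1:
                    left -= 1
                right = x
                while right < w - 1 and binary[y][right + 1] == 1:
                    right += 1
                for i in range(left, right + 1):  # fill the whole run at once
                    labels[y][i] = label
                    pixels.append((i, y))
                for ny in (y - 1, y + 1):         # seed each run above and below
                    if not 0 <= ny < h:
                        continue
                    i = left
                    while i <= right:
                        if binary[ny][i] == 1 and not labels[ny][i]:
                            stack.append((i, ny))
                            while i <= right and binary[ny][i] == 1:
                                i += 1
                        else:
                            i += 1
            regions.append(pixels)
    return labels, regions


demo = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
demo_labels, demo_regions = label_regions(demo)
```

Because every run is filled in one pass, each pixel's color is tested only a small, bounded number of times, which is in the spirit of the speed-up described above.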

Geometric features can be used to filter out non-target images (residual background images). Therefore, while the connected components are searched, the area and shape parameters of each connected region are calculated. The geometric features used here are the region width (AW), region height (AH), shape parameter (SP), density parameter (DP), and the number of pixels in the region (count). The first four are calculated with formulas (1)-(4):

AW = Maxx - Minx; (1)

AH = Maxy - Miny; (2)

SP = AW / AH; (3)

DP = count / (AW * AH); (4)

In these formulas, Minx, Miny, Maxx, and Maxy are the extreme coordinates of each sub-region, and count is the number of pixels in the region, obtained during the connected search. To filter out interference regions that are too small or extremely large, the pixel count of a connected region is required to lie within a certain range. Because the residual non-target areas are mostly irregular (see Figures 5 and 6), these geometric features filter them out effectively. The results after filtering are shown in Figures 7 and 8.
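Formulas (1)-(4) can be sketched as follows for a region given as a list of (x, y) pixels. The acceptance thresholds in `is_candidate` are hypothetical values chosen only for this example; the text does not state the actual ranges.

```python
def region_features(pixels):
    """Compute AW, AH, SP, DP and count for one region of (x, y) pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    aw = max(xs) - min(xs)                                    # formula (1)
    ah = max(ys) - min(ys)                                    # formula (2)
    count = len(pixels)
    sp = aw / ah if ah else float("inf")                      # formula (3)
    dp = count / (aw * ah) if aw and ah else float("inf")     # formula (4)
    return {"AW": aw, "AH": ah, "SP": sp, "DP": dp, "count": count}


def is_candidate(f, min_count=50, max_count=50000,
                 sp_range=(0.5, 2.0), min_dp=0.6):
    """Keep regions whose size, aspect ratio and density resemble a logo.

    All threshold values are assumptions for illustration.
    """
    return (min_count <= f["count"] <= max_count
            and sp_range[0] <= f["SP"] <= sp_range[1]
            and f["DP"] >= min_dp)


square = [(x, y) for x in range(10) for y in range(10)]   # compact block
streak = [(x, 0) for x in range(100)]                     # thin horizontal line
square_ok = is_candidate(region_features(square))
streak_ok = is_candidate(region_features(streak))
```

Note that with AW and AH defined as differences of extreme coordinates, DP can exceed 1 for small, solid regions; the sketch follows the formulas as written.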

Figure 7 Extracted candidate regions; Figure 8 Extracted candidate regions

III. Normalization of Target Regions

After the processing described above, the vast majority of the remaining sub-images are identifiers to be recognized (collectively referred to as candidate target regions), and they can be identified by template matching. However, depending on the shooting position of the camera, the captured logo images may be rotated, scaled, or deformed. Because the image formed on the CCD depends on the angle and distance of the target, the image must be corrected and normalized. A contour detector is used to extract the contour of each identifier and record the coordinates of all points on the contour. The four corner points of the identifier image area are then obtained. Finally, the image is corrected by perspective transformation, which normalizes the size and position of the identifier image.

(1) Corner extraction

Corner points are an important local feature of images and play a key role in image matching and in object detection and recognition. There are many corner extraction algorithms: some detect and locate corner points from a binary edge map, Kitchen uses a local quadratic surface approximation to extract corner points, and there is also the SUSAN corner detector. These algorithms are based on image edges and image gray levels. Based on the characteristics of the logo image, this paper proposes a corner detection method that finds the corner points by calculating the distance between each point on the contour and a straight line.

From the four extreme coordinate values Minx, Miny, Maxx, and Maxy and the geometry of a quadrilateral, it can be determined that at least two of the extreme points are corner points of the target area (Figure 9). Let these two corner points be M and N. There are two possibilities for the line MN: (1) MN is a diagonal of the logo image; (2) MN is a side of the logo image. After the contour detector has extracted the contour of each identifier and recorded the coordinates of all points on the contour, the contour points (including points on line MN) are checked to see whether they lie on both sides of line MN (Fig. 9a) or on one side only (Fig. 9b), and they are stored in two groups or in one group accordingly. The distance from each contour point to line MN is then calculated:

1) In the case of Fig. 9a, the contour points are stored in two groups. The distances from the points of the two groups to line MN, $d_i$ ($i = 1, 2, \ldots, m$) and $d_j$ ($j = 1, 2, \ldots, n$), are calculated. The contour points corresponding to the maximum distance in each group, $D_i^{\max} = \max(d_i \mid i = 1, 2, \ldots, m)$ and $D_j^{\max} = \max(d_j \mid j = 1, 2, \ldots, n)$, are the corner points Q and L.

2) In the case of Fig. 9b, the contour points are stored in one group. The distances from the contour points to line MN, $d_k$ ($k = 1, 2, \ldots, m$), are calculated, and the point corresponding to the maximum distance is taken as corner point Q (because all contour points lie on one side of line MN, there is only one maximum). Then, with line QN as the diagonal, step 1) is repeated to obtain the fourth corner point L.

3) In the case of Fig. 9c, where the sides of the logo area are parallel to the image axes, several equal maximum distances occur. In this case, the four extreme values Minx, Miny, Maxx, and Maxy recorded during the connected search are used directly: the minimum and maximum x and y values give the four corner points of the region, and there is no need to calculate the distances from the contour points to a straight line.

A large number of tests show that the accuracy of corner extraction with this algorithm is 97%, with good stability. Compared with other corner extraction algorithms, the computing time is greatly reduced.

Figure 9 Corner extraction
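The distance-based corner search in cases 1) and 2) can be sketched as below. The function names and the signed-distance formulation are assumptions for illustration. One safeguard goes beyond the text: if Q turns out to be adjacent to N, so that QN is a side rather than a diagonal, the sketch falls back to line QM.

```python
import math


def signed_distance(p, a, b):
    """Signed perpendicular distance from point p to the line through a and b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return cross / math.hypot(b[0] - a[0], b[1] - a[1])


def farthest(contour, a, b, side, tol=1e-9):
    """Farthest contour point from line ab on the given side (+1 or -1), or None."""
    best, best_d = None, tol
    for p in contour:
        d = side * signed_distance(p, a, b)
        if d > best_d:
            best, best_d = p, d
    return best


def find_corners(contour, m, n):
    """Return the corner points found from the two known corners m and n."""
    q = farthest(contour, m, n, +1)
    l = farthest(contour, m, n, -1)
    if q is not None and l is not None:      # case 1: mn is a diagonal
        return [m, q, n, l]
    q = q if q is not None else l            # case 2: mn is a side
    if q is None:
        return [m, n]                        # degenerate: all points on mn
    # Look for the fourth corner beyond line qn, on the side away from m.
    # If that side is empty, qn is itself a side, so try line qm instead
    # (a safeguard that is not part of the original description).
    for a, b, other in ((q, n, m), (q, m, n)):
        side = -1 if signed_distance(other, a, b) > 0 else +1
        l = farthest(contour, a, b, side)
        if l is not None:
            return [m, n, q, l]
    return [m, n, q]


# Case 1: a diamond whose leftmost and rightmost points lie on a diagonal.
diamond = [(0, 5), (2, 3), (5, 0), (8, 3), (10, 5), (8, 7), (5, 10), (2, 7)]
diamond_corners = find_corners(diamond, (0, 5), (10, 5))

# Case 2: a quadrilateral whose side MN lies along the bottom edge.
quad = [(0, 0), (5, 0), (10, 0), (9, 3), (8, 6), (4.5, 5), (1, 4), (0.5, 2)]
quad_corners = find_corners(quad, (0, 0), (10, 0))
```

Case 3) needs no distance computation, so it is not included: the corner points are read directly from the recorded extreme coordinates.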

(2) perspective transformation

To correct the deformation of the image in a candidate target region, a perspective transformation must be applied to the region before matching. A perspective transformation is a one-to-one correspondence between a quadrilateral with four known corner coordinates and a rectangle with known corner coordinates; it produces a smooth mapping that preserves continuity and connectivity. Figure 10 shows the quadrilateral mapping relationship.

Figure 10 Mapping between four corners

The forward mapping function of the perspective transformation can be expressed as:

[ui, vi] and [xi, yi] (i = 0, 1, 2, 3) are corresponding points in the distorted image and the target image. Together they form four pairs of control points.
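The equations themselves are not reproduced in this copy of the article. A commonly used eight-coefficient form of the perspective mapping, given here as an assumption rather than as the authors' exact notation, is:

```latex
x_i = \frac{a_{11} u_i + a_{12} v_i + a_{13}}{a_{31} u_i + a_{32} v_i + 1} \tag{5}

y_i = \frac{a_{21} u_i + a_{22} v_i + a_{23}}{a_{31} u_i + a_{32} v_i + 1} \tag{6}
```

Multiplying out the denominators makes both equations linear in the eight unknowns $a_{11}, \ldots, a_{32}$, so each control point pair contributes two linear equations and four pairs determine all eight coefficients.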

Substituting the four known pairs of control points into equations (5) and (6) yields the eight coefficients, and thus the projection transformation matrix between the distorted image and the target image. Then, for any point (xi, yi) in the target image, the corresponding point (ui, vi) in the distorted image is obtained from the projection transformation matrix between the target image and the distorted image. Finally, the gray value is taken from the distorted image by nearest-neighbor interpolation. In Figures 11 and 12, the gray area is the image before the perspective transformation and the black area is the image after it (the transformed image is superimposed on the image before transformation).
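The procedure above can be sketched in plain Python as follows. The sketch assumes the standard eight-coefficient perspective form x = (c0·u + c1·v + c2) / (c6·u + c7·v + 1), y = (c3·u + c4·v + c5) / (c6·u + c7·v + 1), and it solves directly for the mapping from the normalized square back to the distorted quadrilateral so that every output pixel is sampled exactly once. All function names are illustrative.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("control points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_coefficients(src, dst):
    """Eight coefficients mapping each src point (u, v) to its dst point (x, y)."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    return solve_linear(rows, rhs)


def apply_perspective(c, u, v):
    """Map one point with the assumed eight-coefficient perspective form."""
    w = c[6] * u + c[7] * v + 1.0
    return ((c[0] * u + c[1] * v + c[2]) / w,
            (c[3] * u + c[4] * v + c[5]) / w)


def normalize_region(image, corners, size=40):
    """Resample the quadrilateral `corners` of `image` into a size x size square."""
    square = [(0, 0), (size - 1, 0), (size - 1, size - 1), (0, size - 1)]
    c = perspective_coefficients(square, corners)    # target -> distorted
    out = []
    for y in range(size):
        row = []
        for x in range(size):
            u, v = apply_perspective(c, x, y)
            ui = min(max(int(round(u)), 0), len(image[0]) - 1)
            vi = min(max(int(round(v)), 0), len(image) - 1)
            row.append(image[vi][ui])                # nearest-neighbor sample
        out.append(row)
    return out
```

The corner order passed to `normalize_region` must match the order of the square's corners (top-left, top-right, bottom-right, bottom-left in this sketch); a different order yields a rotated or mirrored result.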

Figure 11 Perspective transformation; Figure 12 Perspective transformation

IV. Target Object Recognition

After this series of image processing and analysis steps, the candidate regions have been determined. It remains to confirm whether each one contains a specific identifier and to tell the different identifiers apart. The candidate region is therefore matched against the templates to decide whether to retain it.

In the traditional template matching method, the algorithm searches a given image area for sub-blocks that match the template image. For an image of size M × N and a template of size p × q, this generally requires O((M - p + 1) × (N - q + 1) × p × q) operations, which is very time-consuming. Although many improved algorithms exist, they do not reduce the computational complexity by orders of magnitude. Because the location and size of each candidate identifier region are already known after the connected component analysis and perspective transformation of the previous two steps, the template matching proposed in this article only needs to align the four corners of the template with those of the candidate region and then compare the difference values in four orientations (0°, 90°, 180°, and 270°). The orientation with the minimum difference gives the match to the target image. This avoids the sensitivity of traditional template matching to translation and rotation, and it also avoids the slow speed of searching the entire image, which greatly improves matching speed and accuracy.

Figure 13 Image matching; Figure 14 Image matching

The correlation coefficient R is as follows:


T is the template image, and s is the image to be matched.

In this work, both the template and the image to be matched are 40 × 40 pixels after perspective transformation. The results are shown in Figures 13 and 14: the gray area is the image to be matched (including identifiers and non-identifiers that were not filtered out), and the black image is the template image.
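Matching a normalized candidate against the templates in four orientations can be sketched as follows. The formula for R is not reproduced in this copy, so the zero-mean normalized correlation used here is an assumption, as are the function names and the hypothetical `min_r` acceptance threshold. The sketch keeps the orientation with the highest correlation, which plays the same role as the minimum difference in the text.

```python
import math


def correlation(t, s):
    """Zero-mean normalized correlation of two equally sized gray images.

    Assumed definition of R; the article's own formula is not available.
    """
    tv = [p for row in t for p in row]
    sv = [p for row in s for p in row]
    tm, sm = sum(tv) / len(tv), sum(sv) / len(sv)
    num = sum((a - tm) * (b - sm) for a, b in zip(tv, sv))
    den = math.sqrt(sum((a - tm) ** 2 for a in tv)
                    * sum((b - sm) ** 2 for b in sv))
    return num / den if den else 0.0


def rotate90(img):
    """Rotate a square image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def best_match(candidate, templates, min_r=0.8):
    """Return (name, clockwise_rotation_degrees, r) of the best template, or None."""
    best = None
    for name, t in templates.items():
        rotated = candidate
        for k in range(4):                   # 0, 90, 180 and 270 degrees
            r = correlation(t, rotated)
            if best is None or r > best[2]:
                best = (name, 90 * k, r)
            rotated = rotate90(rotated)
    return best if best is not None and best[2] >= min_r else None


# Tiny 3 x 3 stand-ins for the 40 x 40 normalized images.
templates = {
    "A": [[255, 0, 0], [0, 0, 0], [0, 0, 0]],
    "B": [[255, 255, 0], [0, 0, 0], [0, 0, 0]],
}
candidate = rotate90(templates["A"])         # template A seen rotated
match = best_match(candidate, templates)
```

Only four comparisons per template are needed because the corner extraction and perspective transformation have already fixed the position and scale of the candidate.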

V. Summary

In this paper, we propose a method to extract and recognize multiple objects in real time against a complex background. The method uses 2D matrix codes as identifiers and extracts candidate identifiers from the original image through binarization and connectivity analysis. After corner extraction and perspective transformation, the normalized identifiers are matched against templates. The method not only detects targets under translation, rotation, and scale changes against a complex background, but also adapts well to the distortion of the matrix code caused by the camera. With this algorithm, the average time required to recognize one logo image is 60 ms; in the example with four logo images, the average time is 100 ms, and the recognition accuracy is over 96%. Experiments show that the method performs stably and meets the real-time requirements of the system. It can be applied to vehicle tracking, robot control, dynamic identification, and video scene monitoring.

(Source: Micro-Computer Information Author: Ren Qing theory)


