Author: Wang Xianrong
This article is my attempt to translate the paper Nonparametric Background Generation, recommended in Learning OpenCV. Because my English is limited, the work took several days on and off, and there are surely many mistakes; corrections are welcome, and please forgive the errors. This translation is for study purposes only. If you want to use the work commercially, please contact the authors of the original paper.
Nonparametric Background Generation
Liu Ya, Yao Hongxun, Gao Wen, Chen Xilin, Zhao Debin
Harbin Institute of Technology
Institute of Computing Technology, Chinese Academy of Sciences
Abstract
This paper introduces a novel background generation method based on a non-parametric background model, which can be used for background subtraction. We introduce a new model, called the influencing factors description (MCM), to describe background changes. On this basis, the most reliable background mode (MRBM) is derived from the local maxima of the underlying distribution. The basic computational procedure is mean shift, a classic pattern recognition procedure that uses iterative computation to locate the nearest mode of the data density distribution. This method has three advantages: (1) it can extract the background from videos containing cluttered moving objects; (2) the generated background is very clear; (3) it is robust to noise and small camera vibration. Extensive experimental results demonstrate these advantages.
Key words: background subtraction, background generation, mean shift, influencing factors description, most reliable background mode, video surveillance
1 Introduction
In many computer vision and video analysis applications, moving object segmentation is a fundamental task, for example in video surveillance, multimedia indexing, people detection and tracking, perceptual human-machine interfaces, and "sprite" video coding. Accurate object segmentation can greatly improve the performance of object tracking, recognition, classification, and activity analysis. Common methods for detecting moving objects include optical flow, temporal differencing, and background subtraction, among which background subtraction is the most commonly used. A background model is computed and evolved frame by frame, and moving objects are then detected by comparing the current frame against the background model. The key of this kind of method is how to build and maintain the background model. Although many promising methods have been proposed [1-4], accurate detection of moving objects remains difficult. The first problem is that the background model must reflect the real background as accurately as possible, so that the system can accurately detect the shapes of moving objects. The second problem is that the background model must be sensitive to changes of the background scene, such as objects starting and stopping. If these problems are not properly solved, background subtraction detects false objects, often called "ghosts".
Currently, many background construction and maintenance methods are available for background subtraction. According to the background modeling procedure, they can be divided into parametric and non-parametric methods. Parametric background modeling methods generally assume that the underlying probability density function of a single pixel is a Gaussian or a mixture of Gaussians; see [5-7] for details. In [8], Stauffer and Grimson propose an adaptive background subtraction method for the motion segmentation problem. In their work, they model each pixel with a mixture-of-Gaussians probability density function, which is then updated with an online approximation. The literature [9, 10] proposes several improvements to the Gaussian mixture model. Toyama et al. propose the three-layer Wallflower scheme in [2], which attempts to solve many existing problems in background maintenance, such as lights being switched on and off and foreground holes. The W4 method proposed by Haritaoglu et al. in [1] builds a background model that keeps three values for each pixel: the minimum value (N), the maximum value (M), and the maximum inter-frame absolute difference (D). Kim et al. [11] quantize the background values into codebooks, which represent a compressed form of the background model over a long video.
Another frequently used class of background modeling methods is based on non-parametric techniques, such as [3, 12-16]. In [3], Elgammal et al. build a non-parametric background model via kernel density estimation. For each pixel, the observed intensity values are retained to estimate the underlying probability density function, and the probability of a new intensity value can then be computed from this function. This model is robust and can adapt to cluttered and not completely static backgrounds that contain small disturbances, such as swaying tree branches and bushes.
Compared with parametric background modeling methods, non-parametric methods have the following advantage: they do not need to specify the underlying model, nor to estimate its parameters explicitly [14]. Therefore they can adapt to arbitrary unknown data distributions. This feature makes non-parametric methods a powerful tool for many computer vision applications, in which many problems involve multimodal densities and the data in feature space is irregular and does not follow a standard parametric form. However, in terms of time and space complexity, non-parametric methods are not as efficient as parametric ones. Parametric methods produce a concise density description (such as a Gaussian or a mixture of Gaussians), which leads to efficient estimation. In contrast, non-parametric methods require almost no computation in the learning stage, but heavy computation in the evaluation stage. Therefore, the main drawback of non-parametric methods is their computational cost. Nevertheless, some innovative work has been proposed to speed up the evaluation of non-parametric methods, such as the fast Gauss transform (FGT) in [13], and the new ball-tree algorithm in [17] for kernel density estimation and k-nearest-neighbor (KNN) classification.
This paper focuses on non-parametric methods. Our method is closely related to that proposed by Elgammal et al. [3], but there are two essential differences. In principle, we use the influencing factors description (MCM) to model background changes, and the most reliable background mode (MRBM) gives a robust estimate of the background scene. In the computational procedure, by using mean shift we avoid the kernel density estimation needed to compute the probability of every new observed intensity value, which saves processing time. In our method, only a simple difference computation is needed to determine whether a pixel belongs to foreground or background. Therefore it improves both the robustness and the efficiency of background subtraction.
The rest of this paper is organized as follows: Section 2 describes the influencing factors used to model background changes; Section 3 describes the most reliable background mode in detail; Section 4 presents the experimental results; Section 5 concludes the paper and discusses future work.
2. Description of Influencing Factors
This section discusses the influencing factors description (MCM), which we use to model background changes effectively.
The key to background subtraction is how to build and maintain a good background model. In different applications, the camera type, the capture environment, and the objects are completely different, so the background model needs sufficient adaptability to handle different situations. To model the background effectively, we start from the simplest, ideal situation. Ideally, for each spatial position in the video, the intensity value along the timeline is a constant C; the constant C means that a fixed scene (with no moving objects or system noise) is recorded by a fixed camera. We call this scene the ideal background scene. However, this ideal situation is rarely encountered in practice. Therefore, background pixels can be regarded as a combination of the ideal background scene and other influences. We define this as the description of the influencing factors of the background, which includes the following:
System noise N-sys: caused by the image sensor and other hardware devices. Unless the environment is harsh, system noise does not fundamentally affect the constant C; it only causes a moderate deviation.
Moving objects M-OBJ: caused by real moving objects and their shadows. Most of the time they disturb C severely.
Motion background M-BGD: caused by moving background regions, such as tree branches swaying in the wind in an outdoor scene, or ripples on water.
Illumination change S-illum: gradual outdoor illumination change as the sun's position changes, or sudden indoor illumination change when lights are switched on or off.
Camera displacement D-cam: pixel intensity changes caused by small shifts of the camera.
The observed value of a scene (denoted V-obsv) is composed of the ideal background scene C and the influencing components, as shown in formula (1).
V-obsv = C + N-sys + M-OBJ + M-BGD + S-illum + D-cam (1)
Here, the symbol + indicates the cumulative effect of the influencing factors.
In fact, the above factors can be further divided according to the attributes shown in Table 1. The first attribute to emphasize is duration. By duration, the influencing factors can be divided into long-term and short-term effects. We split the video stream into equal-length blocks along the timeline, as shown in Figure 1. Long-term means that the influencing factor lasts for several blocks or exists all the time, such as N-sys, S-illum, and D-cam. M-OBJ and M-BGD happen only occasionally and do not last long, so we call them short-term effects.
Figure 1 Dividing the video stream into equal-length blocks
Another classification criterion is the type of deviation. We regard S-illum, D-cam, and M-BGD as time-invariant persistent deviations: over a long duration, these effects can be seen as a lasting increase (or decrease) of, or a substitution for, the ideal background value C. Take S-illum as an example: if the lights are switched on in an indoor scene, S-illum can be seen as a persistent increase of C in the subsequent frames. N-sys and M-OBJ take random values at different times, so they are called time-varying random deviations. The above analysis is summarized in Table 1.
Table 1 Categories of influencing factors
                      | Long term      | Short term
Persistent deviation  | S-illum, D-cam | M-BGD
Random deviation      | N-sys          | M-OBJ
Two points need to be clarified here: (1) the above classification is not absolute and depends on the block length we choose, but this does not affect the subsequent analysis; (2) one may point out that the classification of S-illum is not exact, for example the headlights of a passing car are not a long-term effect. In that case the illumination change is a short-term effect similar to M-OBJ, so we do not list it as a separate influencing factor.
Because S-illum and D-cam are long-term deviations from the ideal background C, we merge them into the ideal background: C' = C + S-illum + D-cam. The direct interpretation of this merging is: if the illumination changes or the camera changes its position, we have reason to assume that the ideal background has changed. Formula (1) can then be rewritten as:
V-obsv = C' + N-sys + M-OBJ + M-BGD (2)
Up to now, the observed value V-obsv is composed of the new ideal background value C' and the influencing factors (N-sys, M-OBJ, M-BGD). These factors influence C' in different ways, summarized as follows:
N-sys exists throughout the whole video stream and has only a moderate influence on C'. Therefore most observations do not deviate far from C'.
M-OBJ and M-BGD occur only occasionally, but cause large deviations from C'. Therefore only a small number of observations differ significantly from C'.
The following conclusion can be drawn: the pixel value at a spatial position remains stable most of the time, accompanied by small deviations (due to the long-lasting random deviation N-sys); only when a moving object passes through the pixel does a significant deviation occur (due to the short-term deviations M-OBJ and M-BGD). Therefore, within a period of time, the majority of observations concentrate around the background value and form an extremum (mode) of the intensity distribution, while only a few observations deviate significantly. This property holds most of the time, though not always. Figure 2 shows how the pixel value at the center of the white circle changes over time. Figure 2(a)~(c) are excerpted from the video, and Figure 2(d) depicts the pixel intensity change over time. From Figure 2(d) we can see that the small deviations caused by system noise occupy most of the time, and significant deviations occur only when the moving object (and its shadow) passes through. This is consistent with the description of the influencing factors.
Figure 2 An example illustrating the validity of the influencing factors description
Our task is to find the estimate Ĉ' of the ideal background from the observed value sequence {V-obsv,t} (t = 1, ..., T, where T is the length of the time period). From the above analysis, Ĉ' is located at the center of the majority of observations. In other words, Ĉ' lies at the densest point of the underlying distribution, i.e., at a mode where the density gradient vanishes. This task can be accomplished by mean shift. We call Ĉ' the most reliable background mode.
3. The Most Reliable Background Mode for Moving Object Detection
Based on the description of influencing factors in Section 2, we know that the center of the majority of observations is the estimate of the ideal background. We use the symbol Ĉ' to denote this estimate and call it the most reliable background mode (MRBM). The basic computational method for locating the MRBM is mean shift. On the one hand, using MRBM we can generate clear background images from videos containing cluttered moving objects. On the other hand, the mean shift procedure can find the local maxima of the intensity distribution, and this information can distinguish the moving background from real moving objects (for example, tree branches swaying in the wind outdoors, or ripples on water).
3.1 Mean Shift for MRBM
Mean shift is a concise method for locating density extrema, where the gradient of the density is zero. The theory was proposed by Fukunaga and Hostetler in [18], and the smoothness and convergence of mean shift were proved by Comaniciu and Meer in [19]. In recent years it has become a powerful tool in computer vision, and many promising results have been reported, for example mean-shift-based image segmentation [19-21] and tracking [22-26].
In our work, we use mean shift to locate the modes of the intensity distribution (note that there may be multiple local maxima). We define the mode with the highest density as the MRBM. The key steps of the algorithm, summarized in Figure 3, are as follows:
Sample selection: For each pixel, select a set of samples S = {x_i}, i = 1, ..., n, where x_i is an intensity value of the pixel along the timeline and n is the number of samples. The mean shift procedure is performed on these samples to locate the modes of the density.
Typical point selection: To reduce the computational cost, we select or compute a set of typical points (the number of typical points is m, m << n), denoted P = {p_i}, i = 1, ..., m. A typical point in P can be a subsampled point or a local average of the original sample points. In our experiments we use local averages.
Mean shift procedure: Starting from each typical point in P, the mean shift procedure yields a convergence point, giving the set of convergence points M. Note that the mean shift computation is still based on the whole sample set S, so the accuracy of the density gradient estimation is not reduced by using typical points.
Extract candidate background modes: Because some convergence points are very close or even identical, the convergence points in M can be clustered into q groups (q ≤ m). We obtain q weighted cluster centers C = {c_i, w_i}, i = 1, ..., q, where c_i is the intensity value of a cluster center and w_i is its weight. The number of points in each group is l_i, i = 1, ..., q, with l_1 + ... + l_q = m. The weight of each cluster center is defined as w_i = l_i / m, i = 1, ..., q.
Obtain the most reliable background mode: Ĉ' = c_{i*}, where i* = arg max_i {w_i}. Ĉ' is the most reliable background mode described in Section 2.
Figure 3 Key steps of the MRBM algorithm
For each typical point p_i, the mean shift procedure in step 3 is carried out as follows:
(1) Initialize the starting point of the mean shift procedure: y_1 = p_i.
(2) Iterate the mean shift update y_{t+1} = ( Σ_{i=1..n} x_i · g(||(y_t - x_i)/h||²) ) / ( Σ_{i=1..n} g(||(y_t - x_i)/h||²) ) until convergence. (Here we use the same mean shift procedure as [19]; g(·) is the profile derived from the kernel K(·), and h is the bandwidth.)
(3) Save the convergence point y_conv for subsequent analysis.
After applying the above steps to all pixels, we can use MRBM to generate the background scene B. From the above analysis, the time complexity of the background generation process is O(N · m) and the space complexity is O(n · N), where N is the length of the video.
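To make the procedure above concrete, here is a minimal per-pixel sketch in Python (the paper's own implementation is in C++). The function names, the bandwidth, the cluster-merging tolerance, and the use of a constant profile g(·) for the Epanechnikov kernel (which reduces each mean shift step to the mean of the samples inside the window) are my own illustrative assumptions, not part of the original paper.

```python
import numpy as np

def mean_shift_1d(samples, start, bandwidth=16.0, max_iter=50, eps=1e-3):
    """Mean shift on a 1-D intensity sample set.

    With an Epanechnikov kernel the profile g(.) is constant inside the window,
    so each update is simply the mean of the samples within the bandwidth.
    """
    y = float(start)
    for _ in range(max_iter):
        window = samples[np.abs(samples - y) <= bandwidth]
        if window.size == 0:
            break
        y_new = window.mean()
        if abs(y_new - y) < eps:
            y = y_new
            break
        y = y_new
    return y

def mrbm_for_pixel(samples, m=10, bandwidth=16.0, merge_tol=2.0):
    """Estimate the most reliable background mode (MRBM) for one pixel.

    samples : 1-D array of the pixel's intensity values along the timeline (n values).
    m       : number of typical points (m << n), taken here as local averages.
    Returns the weighted cluster centers C = {(c_i, w_i)} and the MRBM.
    """
    samples = np.asarray(samples, dtype=float)
    # Typical points: local averages of consecutive chunks of the samples.
    chunks = np.array_split(samples, m)
    typical = np.array([c.mean() for c in chunks])

    # Run mean shift from every typical point; the density is still estimated on all samples.
    conv = np.array([mean_shift_1d(samples, p, bandwidth) for p in typical])

    # Cluster convergence points that are (nearly) identical and weight each cluster.
    centers, weights = [], []
    for y in conv:
        for k, c in enumerate(centers):
            if abs(y - c) <= merge_tol:
                weights[k] += 1.0 / m        # w_i = l_i / m, accumulated incrementally
                break
        else:
            centers.append(y)
            weights.append(1.0 / m)

    mrbm = centers[int(np.argmax(weights))]  # C'^ = c_{i*}, i* = argmax_i w_i
    return list(zip(centers, weights)), mrbm
```

Applying mrbm_for_pixel independently to every spatial position over the first n frames would yield the background image B described above.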
3.2 Moving Object Detection and Background Model Maintenance
After the background model is generated, we can use it to detect the moving regions in the scene. To make the background model robust to motion in the background (for example, tree branches swaying in the wind or ripples on water), we select k cluster centers as possible background values. We define this set as C_b = {{c_i, w_i} | w_i ≥ θ}, i = 1, ..., k, where C_b ⊆ C and θ is a predefined threshold. For each new observed intensity value x_0, we compute the minimum difference D between x_0 and the elements of C_b: D = min {|x_0 - c_i| : {c_i, w_i} ∈ C_b}. If D is greater than a predefined threshold T, the new observed intensity value is considered foreground; otherwise it is background. A minimal sketch of this decision rule is given below.
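The decision rule just described translates almost directly into code. The helper name and the default values of θ and T (taken from Section 4) are illustrative, and the cluster list is assumed to come from the MRBM computation sketched in Section 3.1.

```python
def classify_pixel(x0, clusters, theta=0.3, T=10.0):
    """Return True if the new intensity x0 is foreground, False if background.

    clusters : list of (c_i, w_i) pairs produced for this pixel.
    theta    : weight threshold selecting the possible background centers C_b.
    T        : intensity-difference threshold.
    """
    background_centers = [c for c, w in clusters if w >= theta]   # C_b
    if not background_centers:
        return True  # no reliable background mode yet; treat as foreground
    d = min(abs(x0 - c) for c in background_centers)              # D = min |x0 - c_i|
    return d > T
```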
Background maintenance allows the background model to adapt to long-term background changes, such as a newly parked car or changing illumination. When a new pixel value is observed, the background model is updated as follows (a code sketch follows the list):
(1) Each new pixel value is regarded as a new typical sample point, so the number of typical sample points becomes m = m + 1.
(2) If the new pixel value belongs to the background region, let {c_i, w_i} be the cluster center whose intensity is closest to it; we update the weight of that center: w_i = (l_i + 1) / m.
(3) If the new pixel value belongs to the foreground region, a new mean shift procedure is started from this point to obtain a new convergence center {c_new, w_new}, where w_new is initialized to 1/m. The cluster center set C is expanded to C = C ∪ {{c_new, w_new}}.
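Below is a sketch of the three update rules above, reusing mean_shift_1d and classify_pixel from the earlier sketches. Keeping the per-center counts l_i explicitly, so that the weights w_i = l_i/m can be recomputed after m grows, is my own bookkeeping choice rather than something prescribed by the paper.

```python
import numpy as np  # mean_shift_1d and classify_pixel are defined in the earlier sketches

def update_background_model(x0, centers, counts, m, samples,
                            theta=0.3, T=10.0, bandwidth=16.0):
    """Update one pixel's model with a new observation x0.

    centers : list of cluster-center intensities c_i.
    counts  : list of per-center point counts l_i (so w_i = l_i / m).
    m       : current number of typical sample points.
    samples : the pixel's stored intensity samples (used by mean shift).
    Returns the updated (centers, counts, m).
    """
    m += 1                                   # (1) x0 becomes a new typical point
    weights = [l / m for l in counts]
    is_fg = classify_pixel(x0, list(zip(centers, weights)), theta, T)

    if not is_fg:
        # (2) background: strengthen the nearest cluster center
        k = min(range(len(centers)), key=lambda i: abs(x0 - centers[i]))
        counts[k] += 1                       # w_k becomes (l_k + 1) / m
    else:
        # (3) foreground: start a new mean shift from x0 and add the new center
        c_new = mean_shift_1d(np.asarray(samples, dtype=float), x0, bandwidth)
        centers.append(c_new)
        counts.append(1)                     # w_new = 1 / m
    return centers, counts, m
```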
The time complexity of background subtraction is O(n) and the time complexity of background maintenance is O(r), where n is the number of video frames and r is the number of moving objects.
4. Experiments
We focus on two applications of MRBM: background generation and background subtraction. We compared MRBM with other common methods on synthetic videos and the standard PETS databases. The algorithms were implemented in C++ and tested on a computer with a Pentium 1.6 GHz CPU and 512 MB of memory.
The videos we captured or synthesized are 320x240 pixels; the videos from the PETS databases are 384x288 or 360x288 pixels, at a frame rate of 25 fps. In all experiments we use the YUV color space as the feature space. The algorithm implementation follows Section 3. We use the Epanechnikov kernel, K(t) = 3/4 * (1 - t^2).
In theory, a larger training set produces a more stable background model, but at the cost of adaptability. Our experiments show that n = 100 gives background images with the best trade-off between visual quality and adaptability. The number of typical points m affects the training time and the reliability of the background model; in our experiments we select m = 10 typical points for the mean shift procedure, which makes the training time close to that of the Gaussian mixture model. The thresholds θ and T affect the detection accuracy, and different datasets may require different values; in our experiments, θ = 0.3 and T = 10 give the highest accuracy and the lowest error rate. Unless otherwise specified, all experiments use the above settings.
4.1 Background Generation
In many surveillance and tracking applications, it is desirable to generate a background image containing no moving objects, which can provide reference information for further analysis. However, in many cases it is not easy to obtain a video without moving objects. Our algorithm can extract a clear background image from videos containing cluttered moving objects. Figure 4 shows some background generation results. We use the first 100 frames of each video to generate the background; several selected frames (up to frame 99) are shown at the top of the figure, and the bottom row of Figure 4 shows the backgrounds generated by the algorithm. Take Figure 4(c) as an example: this video was captured on campus right after class, with about ten students walking through every frame. Looking at the background image at the bottom of Figure 4(c), we find that the background is very clear and all moving objects have been successfully erased.
Figure 4 Background images generated by MRBM (selected frames, up to frame 99, are displayed for each video segment)
The speed of moving objects is a key factor that significantly affects background models, including ours. We evaluated the algorithm with a 300-frame video in which a lady walks slowly. Selected frames, including frame 120, are displayed in Figure 5(a)~(e). The backgrounds generated from different numbers of sample frames are displayed in Figure 5(f)~(j). With 100 sample frames, some noise appears in the background, although the overall quality is preserved; the noisy region is marked with a white ellipse in Figure 5(f). When we increase the number of samples to 300, the background becomes very clear, as shown in Figure 5(j).
Figure 5 Background images generated from different numbers of samples (n = 100, 150, 200, 250, 300)
We also compared our background generation method with other baseline methods, such as Gaussian mixture models with different numbers of components. To make the comparison discriminative, we synthesized a video with a multimodal background distribution. Background pixels are generated from a Gaussian mixture distribution p_BG(x) = Σ_{i=1..2} α_i G_{μi,σi}(x), where α1 = α2 = 0.5, σ1 = σ2 = 6, μ1 = 128, μ2 = 240. The pixels of the foreground object are generated from a Gaussian distribution p_FG(x) = G_{μ,σ}(x), where μ = 10, σ = 6. In the above formulas, G_{μ,σ}(·) denotes the Gaussian distribution with mean μ and standard deviation σ. The intensity distributions of background and foreground pixels are shown in Figure 6.
Figure 6 Intensity distribution of background pixels (blue curve) and foreground pixels (red curve)
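For reference, here is a small sketch of how one pixel of such a synthetic sequence could be generated; the random seed, the default frame count, the clipping to [0, 255], and the fg_frames parameter are illustrative assumptions of mine, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_pixel_sequence(n_frames=120, fg_frames=()):
    """Generate one pixel's intensity sequence for the synthetic video.

    Background samples follow the two-mode mixture
    p_BG(x) = 0.5*G(128, 6) + 0.5*G(240, 6); frames listed in fg_frames
    are drawn instead from the foreground distribution p_FG(x) = G(10, 6).
    """
    values = np.empty(n_frames)
    for t in range(n_frames):
        if t in fg_frames:
            values[t] = rng.normal(10, 6)    # foreground object passes this pixel
        else:
            mu = rng.choice([128, 240])      # pick one of the two background modes
            values[t] = rng.normal(mu, 6)
    return np.clip(values, 0, 255)
```

Running the MRBM sketch from Section 3.1 on such a sequence should recover one of the two background modes (128 or 240) rather than a blended mean, which is exactly the behaviour the comparison below examines.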
The video contains 120 frames in total, and we use the first 100 frames to generate the background. Figure 7(a)~(e) show some selected frames; the generated background images are displayed in Figure 7(f)~(i), and the "ground truth" background sampled from the underlying distribution is displayed in Figure 7(j). For the single Gaussian model, the background pixel intensity is taken as the Gaussian mean, and the generated background image is shown in Figure 7(f). For Gaussian mixture models, we take the mean of the component with the maximum weight as the background value; Figure 7(g) shows the result of the Gaussian mixture model with two components, and Figure 7(h) the result with three components. The Gaussian mixture model used in the experiment is the implementation in OpenCV; see [27]. The result of the MRBM method is shown in Figure 7(i).
Figure 7 Background images generated from the synthetic video by different models. (a)~(e) display selected frames (up to frame 80). (f)~(i) display the background images generated by the single Gaussian model, the two-component Gaussian mixture model, the three-component Gaussian mixture model, and the most reliable background mode. (j) displays the ground-truth background image.
Comparing the ground-truth image with the generated background images, we find that the non-parametric MRBM is superior to the other methods. Intuitively, when dealing with multimodal distributions, MRBM behaves like a Gaussian mixture model. The key difference, however, is that Gaussian models depend on the mean and the variance, and these first- and second-order statistics are very sensitive to outliers (points far away from the data peaks). If an object moves slowly, there are enough foreground values to bias the mean, and an incorrect background value is obtained. In contrast, MRBM is distribution-free and takes the modes as the possible background values, which is more robust to outliers. Other parametric methods suffer from similar problems, which are even more obvious when the predefined model cannot describe the data distribution.
4.2 Background Subtraction
Figure 8 shows the background subtraction result of our algorithm. Figure 8(a) shows the observed current frame, and Figure 8(b) shows the background image generated with MRBM from 100 sample frames. Figure 8(c) shows the background subtraction result; the moving objects stand out clearly. We compared MRBM with other common baseline methods: the min-max method (W4) in [1], the median method in [28, 29], and the Gaussian mixture model in [6, 8]. The comparison results are shown in Figure 9. Since results of the original works on these sequences are not available, we implemented the baseline algorithms as follows: (1) for W4, we set the parameters as recommended in the original paper; (2) for the median method and the Gaussian mixture model, we tuned the parameters to achieve the best detection accuracy. In addition, to make the comparison as fair as possible, we perform only background subtraction, without noise reduction or morphological post-processing.
Figure 8 Background Subtraction result
The test video sequences are selected from the PETS databases [30-32], as shown in Figure 9(a). For all sequences, we use 100 frames to generate the background and the 40th frame for background subtraction. These sequences cover two main difficulties: slowly moving objects (as in PETS2000 and PETS2006) and multimodal backgrounds (such as the swaying trees in PETS2001); the two situations affect background subtraction differently. For slowly moving objects, the Gaussian model performs poorly because the Gaussian mean is sensitive to outliers, as shown in Figure 9(d), whereas MRBM depends on the modes of the background distribution and is little affected by outliers. Similarly, the median method and the min-max method cannot properly cope with multimodal backgrounds, and the swaying trees in PETS2001 are mistaken for foreground. As expected, MRBM performs better than the other three methods.
Figure 9 Background subtraction results obtained by different methods. (a) Frames from the standard PETS databases, (b) the min-max method (W4), (c) the median method, (d) the Gaussian mixture model, and (e) the most reliable background mode
4.3 Possible Deficiencies
Although MRBM works well in many applications, there are still situations it cannot handle. Figure 10 shows such a failure case. In this experiment the video contains 300 frames, and we use the first 120 frames to generate the background. Figure 10(a)~(g) display selected frames, including frames 80, 100, and 120. The generated background image is displayed in Figure 10(h); a large part of the foreground person is mistaken for background.
Figure 10 An example that MRBM cannot handle correctly; frames 80, 100, 120, and subsequent frames are displayed
Generally, the definition of foreground and background is not clear-cut in itself; it is contained in the semantics of the scene and may differ between applications. In our applications, we define moving objects as foreground and static (or almost static) things as background, consistent with most video surveillance applications. According to the analysis in Section 2, we use the MCM model to approximate the observed values. In the experiment shown in Figure 10, however, the person remains static for most of the time and then suddenly moves. In this case, most of the observed intensity values belong to the person rather than to the background. This is especially obvious for the shoulder region, whose color is similar across frames, so the motion there cannot be detected. As a result, a large part of the foreground person is mistaken for background.
In fact, this example reflects a fundamental trade-off of background models: stability versus adaptability. In theory, if we increase the number of frames used for training, we can obtain a clearer background image, but the adaptability of the background model is greatly reduced: when the background changes (for example, a newly parked car or a sudden change of illumination), the model needs a long time to adapt to the new situation, producing many errors.
To solve this problem, an effective direction is to extend the existing pixel-based method to a region-based or frame-based method. This can be achieved by segmenting the image or by improving the low-level classification of pixels. Furthermore, low-level object segmentation can be combined with high-level information (such as tracking or event description). Therefore, our next work will focus on how to combine spatial and high-level information.
5. Conclusion
This paper makes two main contributions: (1) the influencing factors description (MCM) can be used to model a changing background; (2) based on it, we develop a robust background generation method, the most reliable background mode (MRBM). MRBM can generate high-quality background images from video sequences containing cluttered moving objects. The examples demonstrate the effectiveness and robustness of the method.
However, some issues remain to be resolved. Currently only the temporal information of each pixel is taken into account; how to incorporate spatial information to improve the robustness of the method is the focus of future work. A direct extension is to turn the current pixel-based method into a region-based method that fuses neighborhood information. In addition, combining low-level segmentation with high-level tracking information would greatly improve the results.
6. Acknowledgments
The authors would like to thank Dr. Chen Xilin and Dr. Shi Guang for helpful discussions. This research is supported by the Natural Science Foundation of China, the Hundred Talents Program of the Chinese Academy of Sciences, and Shanghai Yinchen Intelligent Identification Technology Co., Ltd.
References
[1] I. Haritaoglu, D. Harwood, L. S. Davis, W4: real-time surveillance of people and their activities, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 809-830.
[2] K. Toyama, J. Krumm, B. Brumitt, B. Meyers, Wallflower: principles and practice of background maintenance, in: IEEE International Conference on Computer Vision, Corfu, Greece, 1999, pp. 255-261.
[3] A. Elgammal, D. Harwood, L. Davis, Non-parametric model for background subtraction, in: European Conference on Computer Vision, Dublin, Ireland, 2000, pp. 751-767.
[4] T. E. Boult, R. J. Micheals, X. Gao, M. Eckmann, Into the woods: visual surveillance of noncooperative and camouflaged targets in complex outdoor settings, Proceedings of the IEEE 89 (2001) 1382-1402.
[5] C. R. Wren, A. Azarbayejani, T. Darrell, A. P. Pentland, Pfinder: real-time tracking of the human body, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1998) 780-785.
[6] C. Stauffer, W. Grimson, Adaptive background mixture models for real-time tracking, in: IEEE Conference on Computer Vision and Pattern Recognition, Fort Collins, USA, 1999, pp. 246-252.
[7] S. Rowe, A. Blake, Statistical background modelling for tracking with a virtual camera, in: British Machine Vision Conference, Birmingham, UK, 1995, pp. 423-432.
[8] C. Stauffer, W. E. L. Grimson, Learning patterns of activity using real-time tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 747-757.
[9] L. Li, W. Huang, I. Y. Gu, Q. Tian, Foreground object detection in changing background based on color co-occurrence statistics, in: IEEE Workshop on Applications of Computer Vision, Orlando, Florida, 2002, pp. 269-274.
[10] P. KaewTraKulPong, R. Bowden, An improved adaptive background mixture model for real-time tracking with shadow detection, in: European Workshop on Advanced Video Based Surveillance Systems, Kluwer Academic, 2001.
[11] K. Kim, T. Chalidabhongse, D. Harwood, L. Davis, Real-time foreground-background segmentation using codebook model, Real-Time Imaging 11 (3) (2005) 172-185.
[12] A. Elgammal, R. Duraiswami, L. Davis, Efficient non-parametric adaptive color modeling using fast Gauss transform, in: IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, 2001, pp. 563-570.
[13] A. M. Elgammal, R. Duraiswami, L. S. Davis, Efficient kernel density estimation using the fast Gauss transform with applications to color modeling and tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (11) (2003) 1499-1504.
[14] A. Elgammal, Efficient nonparametric kernel density estimation for real-time computer vision, Ph.D. Thesis, Rutgers, The State University of New Jersey (2002).
[15] H. Askar, X. Li, Z. Li, Background clutter suppression and dim moving point targets detection using nonparametric method, in: International Conference on Communications, Circuits and Systems and West Sino Expositions, Vol. 2, 2002, pp. 982-986.
[16] D. Thirde, G. Jones, Hierarchical probabilistic models for video object segmentation and tracking, in: International Conference on Pattern Recognition, Vol. 1, 2004, pp. 636-639.
[17] T. Liu, A. W. Moore, A. Gray, Efficient exact k-NN and nonparametric classification in high dimensions, in: Neural Information Processing Systems, 2003, pp. 265-272.
[18] K. Fukunaga, L. Hostetler, The estimation of the gradient of a density function, with applications in pattern recognition, IEEE Transactions on Information Theory 21 (1975) 32-40.
[19] D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5) (2002) 603-619.
[20] I. Y.-H. Gu, V. Gui, Colour image segmentation using adaptive mean shift filters, in: International Conference on Image Processing, 2001, pp. 726-729.
[21] L. Yang, P. Meer, D. J. Foran, Unsupervised segmentation based on robust estimation and color active contour models, IEEE Transactions on Information Technology in Biomedicine 9 (3) (2005) 475-486.
[22] D. Comaniciu, V. Ramesh, P. Meer, Kernel-based object tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (5) (2003) 564-577.
[23] R. T. Collins, Y. Liu, On-line selection of discriminative tracking features, in: International Conference on Computer Vision, 2003, pp. 346-352.
[24] R. Collins, Y. Liu, M. Leordeanu, On-line selection of discriminative tracking features, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (10) (2005) 1631-1643.
[25] O. Debeir, P. V. Ham, R. Kiss, C. Decaestecker, Tracking of migrating cells under phase-contrast video microscopy with combined mean-shift processes, IEEE Transactions on Medical Imaging 24 (6) (2005) 697-711.
[26] C. Shen, M. J. Brooks, A. van den Hengel, Fast global kernel density mode seeking with application to localisation and tracking, in: International Conference on Computer Vision, 2005, pp. 1516-1523.
[27] Intel Open Source Computer Vision Library (2004).
URL http://www.intel.com/technology/computing/opencv/
[28] B. Lo, S. Velastin, Automatic congestion detection system for underground platforms, in: International Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong, China, 2001, pp. 158-161.
[29] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, Detecting moving objects, ghosts, and shadows in video streams, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (10) (2003) 1337-1342.
[30] IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (2000).
URL ftp://ftp.pets.rdg.ac.uk/pub/PETS2000/
[31] IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (2001).
URL ftp://ftp.pets.rdg.ac.uk/pub/PETS2001/
[32] IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (2006).
URL http://pets2006.net/data.html
Postscript
The method described in this paper represents a high point of pixel-level background modeling. Next, I will try to implement the algorithm according to my own understanding, and to improve the parts that the paper does not describe thoroughly. Coming soon~~
While translating this article, Dr. Zhao Debin gave me guidance, for which I am grateful.
Thank you as well for your patience in reading; I hope this is helpful to you.
To find out what happens next, stay tuned for the next installment.
The web page's text editor is not convenient for writing formulas, so the formulas in the text may be hard to read; I recommend downloading the Word version of this article.