Object Detection with Discriminatively Trained Part-Based Models


LSVM-MDPM Release 4 notes

These are my own translated notes on the release notes downloaded from the project home page. I am posting them here so they can still be found later, and as a reference for anyone who needs them. If you have questions, or better answers to the ones I raise below, please leave a comment. Thank you.

 

1 Introduction

This release is the latest set of improvements to the object detection system of [2]. Some of the improvements were used by the UoCTTI_LSVM-MDPM entry in [1], while others were added afterwards.

 

2 Models

In [2], each object class is represented by a mixture model with two deformable-part components, and each component is constrained to be bilaterally symmetric. Here we instead use a mixture of three asymmetric components. Dropping the bilateral-symmetry constraint lets each component specialize in left-facing or right-facing poses of the object. In practice the mixture therefore has six components, constrained in pairs to be mirror images of each other. The system learns to separate left and right poses automatically, without any extra pose labels.

 

2.1 Left-right pose clustering

The input is a set of images containing target objects, each instance annotated with a bounding box giving its location. First, the object instances are clustered by the aspect ratio of their bounding boxes; each aspect-ratio cluster is then further split into left and right views.

First, the image region inside each box is cropped and scaled to a fixed width and height. Then features are computed for each such sample and for the mirrored sample obtained by flipping it along the vertical axis. Finally, left-right clustering is performed on the feature descriptors of the samples and their flipped versions.

The clustering method is a variant of online K-means with the constraint that a sample and its flipped version may not land in the same cluster. First a sample and its flip are selected and their features used as the seeds of the two clusters. Then each new sample is assigned to the cluster whose center is closest in Euclidean distance, and its flipped version is placed in the other cluster.

After all samples have been processed, a local search is used to improve the clustering: repeatedly pick a sample, and check whether swapping it with its flip (exchanging their clusters) reduces the sum of squared distances (SSD) from the samples to their cluster centers.

The clustering process is repeated several times with different initial seed samples to avoid poor local minima, and the clustering with the smallest SSD is kept.
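The steps above can be sketched as follows. This is a minimal reading of the procedure, not the released code: the function name, the fixed-center local search, and the restart scheme are my assumptions.

```python
import numpy as np

def flip_constrained_kmeans(feats, flipped, n_restarts=5, seed=0):
    """Two-cluster online K-means with the constraint that a sample and
    its flipped copy never share a cluster (a sketch, not the release code).
    feats, flipped: (n, d) arrays of feature vectors."""
    rng = np.random.default_rng(seed)
    n = len(feats)
    best_assign, best_ssd = None, np.inf
    for _ in range(n_restarts):
        # seed the two clusters with one sample and its flip
        i = rng.integers(n)
        centers = np.stack([feats[i], flipped[i]]).astype(float)
        counts = np.ones(2)
        assign = np.zeros(n, dtype=int)  # cluster of each *unflipped* sample
        # online pass: sample goes to the nearer center, its flip to the other
        for j in rng.permutation(n):
            d0 = np.linalg.norm(feats[j] - centers[0])
            d1 = np.linalg.norm(feats[j] - centers[1])
            c = 0 if d0 <= d1 else 1
            assign[j] = c
            for k, x in ((c, feats[j]), (1 - c, flipped[j])):
                counts[k] += 1
                centers[k] += (x - centers[k]) / counts[k]  # running mean
        # local search: swap a sample with its flip when that lowers the SSD
        # (simplification: centers are held fixed during this phase)
        improved = True
        while improved:
            improved = False
            for j in range(n):
                c = assign[j]
                cur = (np.linalg.norm(feats[j] - centers[c]) ** 2
                       + np.linalg.norm(flipped[j] - centers[1 - c]) ** 2)
                swp = (np.linalg.norm(feats[j] - centers[1 - c]) ** 2
                       + np.linalg.norm(flipped[j] - centers[c]) ** 2)
                if swp < cur:
                    assign[j] = 1 - c
                    improved = True
        ssd = sum(np.linalg.norm(feats[j] - centers[assign[j]]) ** 2
                  + np.linalg.norm(flipped[j] - centers[1 - assign[j]]) ** 2
                  for j in range(n))
        if ssd < best_ssd:
            best_ssd, best_assign = ssd, assign.copy()
    return best_assign, best_ssd
```

The flip constraint is what makes the two clusters correspond to mirror-image poses: any assignment of a sample forces the opposite assignment of its flip.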

 

2.2 Part initialization

A root filter is given (is this a template, or the SVM classifier weights?). We then select K parts, each covering a D x D subwindow of the root filter at twice the root filter's resolution. By default K = 8 and D = 6 (6 pixels? that seems a little small).

Part locations are selected in two stages. The first stage uses a greedy method to pick the initial part locations.

The root filter is interpolated to twice its resolution, and an "energy" map is computed from the interpolated filter. For each cell of the root filter, the energy map stores the squared norm of the positive weights in that cell (what happens to negative weights?). The K parts greedily occupy the highest-energy positions in turn, and the energy of each occupied cell is set to 0.

After the greedy initialization of part locations, a local search randomly perturbs them (simply shifting the positions? how are direction and step size chosen?), moving one part at a time in random order so as to maximize the total energy covered by all parts. If coverage cannot be improved, the parts are re-initialized by picking greedy positions in a different order and the local search is run again (in a different order? the components are not clearly distinguished here?). This is repeated several times, and the part placement covering the most energy is kept.
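The greedy stage can be sketched like this. Assumptions of mine: the function name, nearest-neighbor upsampling in place of the interpolation used by the release, and treating D as a size in feature-map cells.

```python
import numpy as np

def init_parts(root_filter, K=8, D=6):
    """Greedy part initialization sketch: place K DxD parts on the
    positive-weight energy map of a root filter upsampled 2x.
    root_filter: (h, w, f) array of filter weights."""
    # upsample 2x (nearest-neighbor here; the release interpolates)
    up = root_filter.repeat(2, axis=0).repeat(2, axis=1)
    # per-cell energy: squared norm of the positive weights only
    energy = (np.maximum(up, 0.0) ** 2).sum(axis=2)
    H, W = energy.shape
    placements = []
    for _ in range(K):
        best, best_e = None, -1.0
        for y in range(H - D + 1):          # exhaustive scan of windows
            for x in range(W - D + 1):
                e = energy[y:y + D, x:x + D].sum()
                if e > best_e:
                    best, best_e = (y, x), e
        y, x = best
        placements.append(best)
        energy[y:y + D, x:x + D] = 0.0      # claim the covered cells
    return placements
```

The subsequent local-search refinement would then shift these placements one at a time, keeping a move only if total covered energy increases.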

 

2.3 Image boundary occlusion

Many objects in the PASCAL datasets extend partially beyond the image boundary. To handle these partially visible objects, a boundary region is added to the feature map of each image.

In [2], the feature vectors in the boundary region (outside the image) are simply set to zero. As a result, a filter lying entirely in the boundary region always scores exactly 0, which may fit the data worse than the responses the filter produces inside the image. Here, each feature vector is instead augmented with one extra dimension (an additional occlusion feature): its value is 0 for cells inside the image and 1 for cells outside. When a filter cell lies in the boundary region, this occlusion feature lets the model learn a bias parameter that becomes part of the filter response.
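A minimal sketch of this augmentation, assuming a HOG-style (h, w, f) feature map; the function name and padding convention are mine:

```python
import numpy as np

def pad_with_occlusion_flag(feat, pad):
    """Pad a feature map with a boundary region and append a 0/1
    occlusion dimension (0 inside the image, 1 in the padding).
    feat: (h, w, f) feature map; pad: padding in cells on each side."""
    h, w, f = feat.shape
    out = np.zeros((h + 2 * pad, w + 2 * pad, f + 1))
    out[pad:pad + h, pad:pad + w, :f] = feat   # copy original features
    out[..., f] = 1.0                          # mark everything as boundary...
    out[pad:pad + h, pad:pad + w, f] = 0.0     # ...then clear the interior
    return out
```

A filter cell placed over the padding then sees zero image features plus the occlusion flag, so its contribution reduces to the learned bias weight on that flag.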

The 0/1 occlusion feature is similar to the one proposed in [3], but our implementation differs in two ways. First, in [3] each filter uses a single occlusion feature (rather than one per filter cell) that counts the number of filter cells in the boundary region. Second, the training-data requirements differ: the training procedure of [3] requires manually extending the PASCAL bounding boxes of objects cropped by the image edge, to determine how far each object extends beyond it. Our method leaves the PASCAL annotations unchanged: in the latent-variable completion step during training, we first clip the hypothesized detection window to the image boundary, and then compute the overlap between the clipped detection window and the ground-truth bounding box.
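The clip-then-overlap step can be sketched as follows (function name and box convention are assumptions; overlap is measured as intersection-over-union, which is the standard PASCAL criterion):

```python
def clipped_overlap(det, gt, img_w, img_h):
    """Clip a detection window to the image, then compute its
    intersection-over-union with the ground-truth box.
    Boxes are (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = det
    x1, y1 = max(x1, 0), max(y1, 0)            # clip to image boundary
    x2, y2 = min(x2, img_w), min(y2, img_h)
    gx1, gy1, gx2, gy2 = gt
    iw = max(0, min(x2, gx2) - max(x1, gx1))   # intersection width
    ih = max(0, min(y2, gy2) - max(y1, gy1))   # intersection height
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (gx2 - gx1) * (gy2 - gy1) - inter
    return inter / union if union > 0 else 0.0
```

Clipping first means a window that hangs off the image edge is judged only by its visible portion, so no manual extension of the annotation is needed.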

 

3 Regularization

In [2], the model parameters are trained by optimizing a latent SVM objective function with L2 regularization.

Experiments show that regularizing only the mixture component with the largest norm, rather than the full parameter vector, gives better detection results.
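Under my reading, the latent SVM objective with regularization over only the maximum-norm mixture component can be sketched as follows (notation assumed from [2]: beta_c are the parameters of component c, Phi is the feature map over latent placements z):

```latex
\min_{\beta}\; \frac{1}{2}\,\max_{c}\,\lVert \beta_c \rVert^2
  \;+\; C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\, f_\beta(x_i)\bigr),
\qquad
f_\beta(x) \;=\; \max_{z}\; \beta \cdot \Phi(x, z)
```

Compared with penalizing $\lVert \beta \rVert^2$, this only discourages the single largest component, leaving the smaller ones free to grow.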

 

4 Results

The tables below summarize the current results on the PASCAL 2006, 2007, and 2009 datasets, following the comp3 protocol.

 

 

Class     Without context   With context
Aero           39.5              43.6
Bike           48.2              50.8
Bird           11.4              15.1
Boat           12.3              14.1
Bottle         28.6              30.2
Bus            42.3              45.6
Car            40.4              41.8
Cat            25.0              27.3
Chair          17.4              18.9
Cow            20.5              22.1
Table          15.3              15.8
Dog            14.5              18.2
Horse          42.1              45.7
Mbike          44.4              47.3
Person         41.9              43.8
Plant          12.7              14.3
Sheep          24.3              26.4
Sofa           16.5              18.2
Train          43.3              46.8
TV             32.2              33.7
Mean           28.6              31.0

Table 1: PASCAL VOC 2009 comp3 (average precision)

 

 

Class     Without context   With context
Aero           28.9              31.2
Bike           59.5              61.5
Bird           10.0              11.9
Boat           15.2              17.4
Bottle         25.5              27.0
Bus            49.6              49.1
Car            57.9              59.6
Cat            19.3              23.1
Chair          22.4              23.0
Cow            25.2              26.3
Table          23.3              24.9
Dog            11.1              12.9
Horse          56.8              60.1
Mbike          48.7              51.0
Person         41.9              43.2
Plant          12.2              13.4
Sheep          17.8              18.8
Sofa           33.6              36.2
Train          45.1              49.1
TV             41.6              43.0
Mean           32.3              34.1

Table 2: PASCAL VOC 2007 comp3 (average precision)

 

 

Class     Without context   With context
Bike           67.1              69.2
Bus            65.8              67.6
Car            70.7              71.5
Cat            26.8              29.0
Cow            47.7              51.4
Dog            15.8              19.4
Horse          48.3              54.0
Mbike          66.0              70.0
Person         41.0              44.3
Sheep          45.6              47.4
Mean           49.5              52.4

Table 3: PASCAL VOC 2006 comp3 (average precision)

 

We also trained and tested a model on the INRIA Person dataset. We scored the model using the PASCAL evaluation methodology on the complete test dataset, including images without people.

INRIA person average precision: 88.2

 

[1] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2009 (VOC2009) Results.

[2] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.

[3] A. Vedaldi and A. Zisserman. Structured output regression for detection with partial occlusion. In Advances in Neural Information Processing Systems, 2009.

 

Appendix:

UoCTTI_LSVM-MDPM:

Our submission is based on [1]. Each class is represented by a mixture of deformable part models (6 components with 6 parts per class). We also have a binary mask associated with each component of each class to generate pixel-level segmentations from detections. The models were trained from bounding boxes; the segmentation masks were trained from segmentations. [1] Felzenszwalb, Girshick, McAllester, Ramanan, "Object Detection with Discriminatively Trained Part Based Models", PAMI (preprint).

 

 
