Cascaded Pyramid Network for Multi-Person Pose Estimation
Paper Address: https://arxiv.org/abs/1711.07319
Winning paper of the human pose estimation (keypoint) track of the COCO Challenge 2017
The paper proposes a top-down method for estimating the keypoints of multiple people. It first detects each person with the Mask R-CNN detection structure (FPN + RoIAlign), then uses a GlobalNet + RefineNet structure to regress the body keypoints of each detected person.
The figure above shows the network architecture. GlobalNet is essentially an FPN-like architecture, except that in the upsampling path the authors add a 1x1 convolution before each element-wise add. An L2 loss is then computed between the keypoint heatmaps predicted from the feature maps at each scale and the ground-truth keypoint heatmaps (as in FPN, there is one loss per level, P2-P5).
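The top-down merge step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (upsample2x, conv1x1, top_down_merge), the nearest-neighbour upsampling, and the tiny weight matrix are all hypothetical, and feature maps are stored as nested lists fm[h][w][c].

```python
# Minimal sketch of one GlobalNet top-down merge step (hypothetical names).
# Assumption: nearest-neighbour upsampling is used here only for simplicity.

def upsample2x(fm):
    """Nearest-neighbour 2x upsampling of a feature map fm[h][w][c]."""
    out = []
    for row in fm:
        wide = [list(px) for px in row for _ in range(2)]  # repeat each pixel
        out.append(wide)
        out.append([list(px) for px in wide])              # repeat each row
    return out

def conv1x1(fm, weight):
    """1x1 convolution: every pixel's channel vector is multiplied by
    weight[out_channel][in_channel]; no spatial neighbourhood is used."""
    return [[[sum(w * x for w, x in zip(w_row, px)) for w_row in weight]
             for px in row] for row in fm]

def add(a, b):
    """Element-wise sum of two feature maps with identical shape."""
    return [[[x + y for x, y in zip(pa, pb)]
             for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def top_down_merge(coarse, lateral, weight):
    """Upsample the coarser level, apply the 1x1 convolution,
    then add the lateral feature map of the finer level."""
    return add(conv1x1(upsample2x(coarse), weight), lateral)

# Toy example: a 1x1x2 coarse map merged into a 2x2x2 lateral map.
coarse = [[[1.0, 2.0]]]
lateral = [[[10.0, 20.0], [10.0, 20.0]],
           [[10.0, 20.0], [10.0, 20.0]]]
weight = [[1, 0], [0, 2]]  # hypothetical 1x1 conv weights
merged = top_down_merge(coarse, lateral, weight)
print(merged[0][0])  # [11.0, 24.0]
```

In a real network each merged level would additionally feed a prediction head that outputs the keypoint heatmaps used for the per-level L2 loss.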
Next comes RefineNet. It takes the P2-P5 outputs of the preceding GlobalNet backbone (the paper is ambiguous here and says C2-C5, but the source code shows that they are closer to FPN's P2-P5), passes each level through a different number of bottleneck blocks, and concatenates the results. One more bottleneck after the concatenation regresses the keypoint heatmaps. Unlike the L2 loss above, this loss uses online hard keypoint mining: during training, only the keypoint channels with the largest loss are selected dynamically and back-propagated. One way to read this is that the first loss mainly fits the heatmaps of the clearly visible keypoints, while the second loss uses global information to regress the heatmaps of occluded keypoints.
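The hard-keypoint selection can be sketched as follows. This is a minimal illustration, assuming each keypoint channel is a flattened heatmap and the loss is a per-channel mean squared error; the function names and the top_k parameter are hypothetical and not taken from the source code.

```python
# Minimal sketch of an online hard keypoint mining loss (hypothetical names).

def per_keypoint_l2(pred, target):
    """Mean squared error for each keypoint channel.
    pred and target are lists of K heatmaps, each a flat list of floats."""
    losses = []
    for p_map, t_map in zip(pred, target):
        squared = [(p - t) ** 2 for p, t in zip(p_map, t_map)]
        losses.append(sum(squared) / len(squared))
    return losses

def ohkm_loss(pred, target, top_k):
    """Average the loss over only the top_k keypoints with the largest loss,
    so gradients would flow only through the hardest channels."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    losses = per_keypoint_l2(pred, target)
    hardest = sorted(losses, reverse=True)[:top_k]
    return sum(hardest) / len(hardest)

# Toy example with 3 keypoints and 2-pixel heatmaps.
pred = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
target = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(per_keypoint_l2(pred, target))      # [0.0, 1.0, 4.0]
print(ohkm_loss(pred, target, top_k=2))   # 2.5
```

Because the selection is recomputed for every sample, the set of "hard" keypoints changes as training progresses, which is what makes the mining online.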
With ResNet-50 as the backbone, the network trains for 1.5 days on 8 Titan GPUs. With a ResNet-Inception backbone it achieves excellent results on the COCO test-dev set, far exceeding CMU-Pose (i.e. OpenPose) and Mask R-CNN. It also won first place in the human pose estimation track of the COCO Challenge 2017.
Paper source code: https://github.com/chenyilun95/tf-cpn