The key part of the paper is the approximation introduced later on.
As described in the Rank Pooling paper, the parameter vector d obtained by training RankSVM can serve as the representation of a video frame sequence. The dynamic image paper observes that d has the same size as the input: if the mapping applied to each frame keeps the image size (that is, it works on the raw image rather than on extracted feature vectors), then d is itself an image, and it can be fed into a CNN. The paper shows several examples of images obtained by pooling with the parameter vector d.
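To make the RankSVM step concrete, here is a minimal NumPy sketch. It assumes the usual rank pooling objective (an L2 penalty plus pairwise hinge losses on the scores of the running frame averages) and uses plain subgradient descent, which is a stand-in chosen for brevity and not necessarily the solver used by the authors. The function names and the values of `lam`, `lr` and `steps` are illustrative assumptions.

```python
import numpy as np

def time_averages(frames):
    """V_t = mean of the first t frames, each flattened to a vector."""
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    flat = frames.reshape(T, -1)
    return np.cumsum(flat, axis=0) / np.arange(1, T + 1)[:, None]

def objective(d, V, lam):
    """RankSVM-style objective: L2 penalty plus pairwise hinge losses."""
    T = V.shape[0]
    s = V @ d  # s[t] is the score of the t-th time average
    hinge = sum(max(0.0, 1.0 - s[q] + s[t])
                for t in range(T) for q in range(t + 1, T))
    return 0.5 * lam * (d @ d) + 2.0 / (T * (T - 1)) * hinge

def subgradient(d, V, lam):
    """A subgradient of the objective at d."""
    T = V.shape[0]
    s = V @ d
    g = lam * d
    for t in range(T):
        for q in range(t + 1, T):
            if 1.0 - s[q] + s[t] > 0.0:  # later frame q is not yet ranked above t by a margin
                g += 2.0 / (T * (T - 1)) * (V[t] - V[q])
    return g

def rank_pool(frames, lam=1e-3, lr=0.1, steps=200):
    """Run subgradient descent from d = 0; return d reshaped like one frame."""
    frames = np.asarray(frames, dtype=np.float64)
    V = time_averages(frames)
    d = np.zeros(V.shape[1])
    for _ in range(steps):
        d -= lr * subgradient(d, V, lam)
    return d.reshape(frames.shape[1:])
```

Because the frames are used directly (no feature extraction), the returned d has the shape of a single frame, which is the observation the dynamic image paper builds on.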
Fast calculation of the parameter vector d
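Define a function $\rho$ for the process of computing d:

$$d^* = \rho(I_1, \dots, I_T; \psi) = \arg\min_d E(d), \qquad E(d) = \frac{\lambda}{2}\|d\|^2 + \frac{2}{T(T-1)} \sum_{q>t} \max\{0,\ 1 - S(q|d) + S(t|d)\},$$

where $S(t|d) = \langle d, V_t \rangle$ is the score of time $t$, $V_t = \frac{1}{t}\sum_{\tau=1}^{t} \psi_\tau$ is the average of the frame features up to time $t$, and $\psi_t = \psi(I_t)$. The approximation initializes $d = 0$ and takes only the first gradient descent step toward the optimum:

$$d^* = 0 - \eta \nabla E(d)\big|_{d=0} \propto -\sum_{q>t} \nabla \max\{0,\ 1 - S(q|d) + S(t|d)\}\big|_{d=0} = -\sum_{q>t} \nabla \langle d, V_t - V_q \rangle$$

for any $\eta > 0$, which eventually gives

$$d^* \propto \sum_{q>t} (V_q - V_t).$$

Expanding the expression above,

$$d^* \propto \sum_{q>t} \left[ \frac{1}{q}\sum_{i=1}^{q} \psi_i - \frac{1}{t}\sum_{j=1}^{t} \psi_j \right] = \sum_{t=1}^{T} \alpha_t \psi_t,$$

in which $\alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1})$. Here $H_t = \sum_{i=1}^{t} 1/i$ is the $t$-th harmonic number with $H_0 = 0$, so the result is

$$\hat{\rho}(I_1, \dots, I_T; \psi) = \sum_{t=1}^{T} \alpha_t \psi(I_t).$$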
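The approximation can be checked numerically with a short NumPy sketch. It computes the coefficients from harmonic numbers, forms the weighted sum of frames, and compares it with the brute-force sum over all pairs of running averages (the direction of the first gradient step from d = 0). The function names are illustrative, and the frames are assumed to be used directly as features.

```python
import numpy as np

def arp_coefficients(T):
    """alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}) for t = 1..T."""
    H = np.zeros(T + 1)                      # H[0] = 0, H[t] = sum_{i<=t} 1/i
    H[1:] = np.cumsum(1.0 / np.arange(1, T + 1))
    t = np.arange(1, T + 1)
    return 2.0 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])

def dynamic_image(frames):
    """Approximate rank pooling: weighted sum of frames with fixed weights."""
    frames = np.asarray(frames, dtype=np.float64)
    alpha = arp_coefficients(frames.shape[0])
    return np.tensordot(alpha, frames, axes=(0, 0))

def first_step_direction(frames):
    """Brute force: sum over all pairs q > t of (V_q - V_t)."""
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    flat = frames.reshape(T, -1)
    V = np.cumsum(flat, axis=0) / np.arange(1, T + 1)[:, None]
    d = np.zeros(flat.shape[1])
    for t in range(T):
        for q in range(t + 1, T):
            d += V[q] - V[t]
    return d.reshape(frames.shape[1:])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(5, 4, 4, 3))   # 5 random "frames" of size 4x4x3
    print(np.allclose(dynamic_image(frames), first_step_direction(frames)))
```

The coefficients sum to zero, so a video whose frames are all identical produces an all-zero result; negative weights go to early frames and positive weights to late ones.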
Dynamic Maps Network
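Rank pooling can be viewed as an operation that pools the information of multiple images into a single image. From the network structure, the rank pooling operation is applied either directly to the input images or to the feature maps extracted by several CNN layers, so the pooling layer can be defined as follows:

$$a^{(l)} = \rho(a_1^{(l-1)}, \dots, a_T^{(l-1)}),$$

where $a_t^{(l-1)}$ is the input of the layer for frame $t$ and $a^{(l)}$ is its output.
Since $V_t$ is a linear function of the inputs, the pooling layer can be expressed as a linear combination of them, so rewrite it as

$$a^{(l)} = \sum_{t=1}^{T} \alpha_t(a_1^{(l-1)}, \dots, a_T^{(l-1)})\, a_t^{(l-1)}.$$

The coefficient function $\alpha_t$ itself also depends on the inputs, which makes the derivatives needed by the BP algorithm difficult to compute.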
Using an approximate method
With the approximate calculation of the parameter vector d, the coefficients $\alpha_t$ are independent of the images. Replacing the linear combination directly with the approximate calculation of d, the partial derivative needed in the backward pass of the BP algorithm becomes

$$\frac{\partial \operatorname{vec} a^{(l)}}{\partial (\operatorname{vec} a_t^{(l-1)})^\top} = \alpha_t I,$$

where $a^{(l)}$ is the output of the pooling layer, $a_t^{(l-1)}$ is its input for frame $t$, and $I$ is the identity matrix. Clearly, $\alpha_t$ is a constant.
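Since the Jacobian with respect to each input is a constant multiple of the identity, the backward pass only rescales the upstream gradient. The NumPy sketch below illustrates this under the assumption that the layer is exactly the fixed-weight sum from the approximation; `arp_forward` and `arp_backward` are illustrative names, not functions from any library.

```python
import numpy as np

def arp_coefficients(T):
    """Fixed weights alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1})."""
    H = np.zeros(T + 1)
    H[1:] = np.cumsum(1.0 / np.arange(1, T + 1))
    t = np.arange(1, T + 1)
    return 2.0 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])

def arp_forward(a_prev):
    """a_prev has shape (T, ...); the output has the shape of one input."""
    alpha = arp_coefficients(a_prev.shape[0])
    return np.tensordot(alpha, a_prev, axes=(0, 0))

def arp_backward(grad_out, T):
    """Gradient w.r.t. every input: alpha_t times the upstream gradient."""
    alpha = arp_coefficients(T)
    return alpha.reshape((T,) + (1,) * grad_out.ndim) * grad_out[None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(4, 3, 2))     # T = 4 inputs of shape 3x2
    w = rng.normal(size=(3, 2))        # toy loss: sum(w * output)
    grad = arp_backward(w, a.shape[0])
    print(grad.shape)                  # one gradient slice per input frame
```

For the toy loss `sum(w * arp_forward(a))` the upstream gradient is `w`, so `arp_backward(w, T)` can be compared against finite differences to confirm the rule.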
Summary
Personally, I find the approximation very ingenious and the experimental results are very good, but the approximation itself does not seem to be very well justified ...
"CV paper read" Dynamic Image networks for action recognition