An ordinary CNN is not equivariant to rotation. The usual workaround is data augmentation, but augmentation demands extra model capacity and more training iterations to cover the rotated copies added to the training set, and even then there is no guarantee the learned invariance carries over to the test set.
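To make the lack of rotation equivariance concrete, here is a tiny NumPy check (a sketch with a hand-rolled correlation, not any library's actual conv layer): for a generic kernel, convolving and then rotating does not match rotating and then convolving, while rotating the kernel along with the input does commute, which is exactly the trick group convolutions will exploit.

```python
import numpy as np

def corr2d(x, k):
    """Plain 'valid' 2-D cross-correlation, as in an ordinary CNN layer."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # toy "image"
k = rng.standard_normal((3, 3))   # a generic learned kernel

a = np.rot90(corr2d(x, k))        # convolve, then rotate the output
b = corr2d(np.rot90(x), k)        # rotate the input, then convolve
c = corr2d(np.rot90(x), np.rot90(k))  # rotate input AND kernel together

print(np.allclose(a, b))  # False: ordinary conv is not rotation equivariant
print(np.allclose(a, c))  # True: rotating the kernel too restores commutation
```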
You may ask: what does building rotation equivariance into the network itself buy you over data augmentation? Intuitively, a dog is still a dog no matter which way it faces, and we would like the features after several convolution layers to reflect that directly, rather than forcing the network to learn each orientation separately. This is exactly what an ICML 2016 paper set out to do.
Links
Paper: Group Equivariant Convolutional Networks; talk video: https://archive.org/details/Redwood_Center_2016_06_27_Taco_Cohen; code: https://github.com/tscohen/GrouPy
There is also a PyTorch implementation on GitHub, well worth a look: https://github.com/adambielski/pytorch-gconv-experiments
In fact, a 2018 paper showed how to use group equivariant networks to classify pathology images; its title is Rotation Equivariant CNNs for Digital Pathology.
That paper contains a figure that explains group equivariant networks very clearly, so I reproduce it here to walk you through it. If your English is good, there are also two talks: https://www.youtube.com/watch?time_continue=1586&v=tlzryhbwep0
and one given by Taco Cohen himself: https://archive.org/details/Redwood_Center_2016_06_27_Taco_Cohen
The figure is as follows:
The figure above shows that when the input image is rotated, the output feature maps rotate correspondingly: this is what rotation equivariance means.
Stepping through the PyTorch code in a debugger, the concrete operations are as follows. The Z2-P4 convolution (the lifting layer) rotates the kernel four times (0°, 90°, 180°, 270°) and convolves each rotated copy with the input image, producing four orientation feature maps. The P4-P4 convolution takes those four feature maps as input, and its kernel likewise has four orientation channels. For each of the four output orientations, the kernel's orientation channels are cyclically shifted by one 90° step, and each channel is itself spatially rotated by 90°; the shifted-and-rotated channels are convolved with the corresponding input feature maps, and the four results are summed. Each orientation state produces one feature map, so the four states give the four output feature maps. Why does this learn rotation equivariance? Because the four orientation states are all rotated copies of one shared kernel, the responses across the four feature maps are tied together by the P4 constraint, so rotating the input can only permute and rotate the outputs in a predictable way.
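The procedure above can be sketched in NumPy. This is a minimal, loop-based sketch for a single-channel image, not the vectorized GrouPy code; the names `corr2d`, `z2_p4_conv`, and `p4_p4_conv` are my own. The final loop checks the equivariance law: rotating the input image by 90° cyclically shifts the output orientation channels and spatially rotates each map.

```python
import numpy as np

def corr2d(x, k):
    """Plain 'valid' 2-D cross-correlation."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def z2_p4_conv(x, k):
    """Z2 -> P4 lifting layer: correlate the image with the kernel
    rotated 0/90/180/270 degrees -> four orientation feature maps."""
    return np.stack([corr2d(x, np.rot90(k, r)) for r in range(4)])

def p4_p4_conv(f, psi):
    """P4 -> P4 layer: f is a stack of 4 orientation maps, psi a kernel
    with 4 orientation channels. For output orientation r, cyclically
    shift psi's orientation channels by r, spatially rotate each channel
    by r * 90 degrees, correlate with the matching input maps, and sum."""
    return np.stack([
        sum(corr2d(f[s], np.rot90(psi[(s - r) % 4], r)) for s in range(4))
        for r in range(4)
    ])

rng = np.random.default_rng(0)
x = rng.standard_normal((9, 9))        # toy image
k = rng.standard_normal((3, 3))        # Z2 -> P4 kernel
psi = rng.standard_normal((4, 3, 3))   # P4 -> P4 kernel (4 orientation channels)

g = p4_p4_conv(z2_p4_conv(x, k), psi)
g_rot = p4_p4_conv(z2_p4_conv(np.rot90(x), k), psi)

# Equivariance check: channel s of the rotated-input output equals the
# spatially rotated channel (s - 1) mod 4 of the original output.
for r in range(4):
    assert np.allclose(g_rot[r], np.rot90(g[(r - 1) % 4]))
```

Because all four orientation states come from rotated copies of one shared kernel, rotating the input can only permute and rotate the output stack, never produce an unrelated response; that weight sharing is the P4 constraint described above.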