Minimalist notes: Multi-task Self-Supervised Visual Learning
Paper: https://arxiv.org/abs/1708.07860
The core idea of this paper is to pretrain a model on self-supervised tasks, then transfer it to downstream tasks for finetuning. During evaluation the backbone parameters are frozen and only the task-specific heads are updated, so different backbones can be compared fairly. The hope is that performance comes close to a model pretrained with extra labels (i.e. ImageNet supervision).
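The frozen-backbone evaluation protocol can be illustrated with a minimal numpy sketch (my own toy illustration, not the paper's code): a fixed feature extractor stands in for the pretrained backbone, and only a logistic-regression head is trained on the downstream labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pretrained" backbone: a fixed random projection + ReLU.
# In the paper this would be a conv net; here the point is only that
# its weights are never updated during downstream training.
W_backbone = rng.normal(size=(10, 16))

def backbone(x):
    return np.maximum(x @ W_backbone, 0.0)  # frozen feature extractor

# Toy downstream task: label depends on the first input dimension.
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(np.float32)

# Only the head weights are updated (plain logistic regression).
W_head = np.zeros(16)
for _ in range(500):
    feats = backbone(X)                      # no gradient flows into backbone
    p = 1.0 / (1.0 + np.exp(-(feats @ W_head)))
    grad = feats.T @ (p - y) / len(X)        # logistic-loss gradient w.r.t. head
    W_head -= 0.5 * grad
```

Since the backbone is shared and frozen, any difference in downstream performance is attributable to the pretraining, which is exactly what the paper's comparisons rely on.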
The paper combines several self-supervised tasks whose labels can be derived directly from the data, without extra annotation: 1. Relative position: randomly crop two patches from an image and predict their relative position; 2. Colorization: predict color from a single-channel (grayscale) image; 3. Exemplar: generate pseudo-classes, where each image plus its augmentations forms one class; 4. Motion segmentation: predict which pixels will move in a video.
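As an illustration of how labels come for free, here is a minimal numpy sketch of data generation for the relative-position task (the function name and 3x3-grid sampling are my own simplification of the original formulation):

```python
import numpy as np

def sample_relative_position_pair(image, patch_size=64, rng=None):
    """Sample a centre patch and one of its 8 neighbours from a 3x3 grid.

    Returns (centre_patch, neighbour_patch, label), where label in 0..7
    encodes the neighbour's position relative to the centre -- the label
    is derived from the sampling itself, so no human annotation is needed.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    # Top-left corner of the 3x3 grid, chosen so the whole grid fits.
    gy = int(rng.integers(0, h - 3 * patch_size + 1))
    gx = int(rng.integers(0, w - 3 * patch_size + 1))

    def patch(row, col):
        y, x = gy + row * patch_size, gx + col * patch_size
        return image[y:y + patch_size, x:x + patch_size]

    centre = patch(1, 1)
    # The 8 neighbour cells, enumerated row by row, skipping the centre.
    cells = [(r, c) for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    label = int(rng.integers(0, 8))
    return centre, patch(*cells[label]), label
```

The network then sees the two patches and regresses (classifies) the 8-way relative-position label.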
The paper's main findings: 1. Deeper networks work better than shallow ones on self-supervised tasks; 2. Multi-task self-supervision outperforms any single self-supervised task; 3. The gap between the ImageNet-pretrained model and the multi-task self-supervised model varies across benchmarks but is relatively small (ImageNet is, of course, the upper bound in these experiments); 4. Harmonizing the inputs and applying a lasso constraint on the layer-combination weights brings little performance improvement; 5. Self-supervised tasks can be used to accelerate network training.
Regarding the network structure: in the multi-task setting, feature maps from several layers of the shared backbone are combined with learned coefficients, and the result is fed into a separate head per task. Input harmonization converts the image to Lab color space, discards the a and b channels, and replicates the L channel three times, so that a single input format satisfies every task (colorization needs a single-channel image, the other tasks take three channels).
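The harmonization step is simple to sketch. One caveat in this sketch: the paper uses the L channel of Lab, while here I approximate it with the standard luma formula to avoid depending on a color-science library, so treat the exact conversion as an assumption.

```python
import numpy as np

def harmonize(rgb):
    """Reduce an RGB image to one luminance channel replicated 3x.

    Approximates the paper's Lab-L channel with Rec.601 luma
    (an assumption; an exact Lab conversion would need e.g. skimage).
    """
    rgb = rgb.astype(np.float32) / 255.0
    luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Replicate the single channel three times so the same tensor can
    # feed both the colorization head (which conceptually needs one
    # channel) and the heads expecting three-channel input.
    return np.repeat(luma[..., None], 3, axis=-1)
```

After this step, no task can exploit color information the others lack, which is the point of harmonizing the inputs across tasks.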
The paper then lists a number of experimental comparisons; skim them as you like, since the conclusions are summarized above.
Using self-supervision to produce pretrained models is a good idea. Right now everyone initializes from an ImageNet-pretrained model, but its benefit varies by task; for some specific tasks, it may be better to design a task-related self-supervised pretraining scheme instead.