1. Multi-Task Learning Guide
Multi-task learning is a branch of machine learning. As defined in the 1997 review paper "Multitask Learning": multitask learning (MTL) is an inductive transfer mechanism whose principal goal is to improve generalization performance. MTL improves generalization by leveraging the domain-specific information contained in the training signals of related tasks, training the tasks in parallel while using a shared representation.
As the name implies, multi-task learning is a machine learning method that learns multiple tasks at the same time. Figure 1 shows a multi-task network that learns a human-vs-dog classifier together with a male-vs-female classifier.
Further, Figure 2 compares single-task learning with multi-task learning. In single-task learning, each task has a separate data source, and each task model is learned independently. In multi-task learning, multiple data sources use a shared representation to learn several sub-task models at the same time.
The basic assumption of multi-task learning is that there are correlations between the tasks, so the dependencies between tasks can be exploited to make them reinforce each other. For example, in attribute classification, lipstick and earrings have a certain relevance; training each attribute alone cannot use this information, while multi-task learning can combine task relevance to improve the accuracy of classifying multiple attributes. For details, see the University of Maryland paper by Hand et al., "Attributes for Improved Attributes: A Multi-Task Network for Attribute Classification".
2. Multi-Task Deep Learning
In recent years, the field of computer vision has made rapid progress driven by deep learning. In essence, a deep network is a multi-layered neural network that builds a hierarchical nonlinear representation of its input, and evidence from network visualization shows that this hierarchical representation evolves from low-level features at the bottom to high-level semantics at the top. The strong expressive power of deep networks gives multi-task deep learning room to shine. Figure 3 shows a multi-task deep network structure: input x represents the inputs of the different tasks, the green section represents the layers shared between tasks, the purple sections represent the task-specific layers, and task x represents the loss layer of each task. In a multi-task deep network, sharing the low-level layers helps reduce computation, the shared representation layers let related tasks combine their relevance information, and the task-specific layers model each task's own information separately, unifying shared information with task-specific information.
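To make the shared/task-specific split concrete, here is a minimal NumPy sketch of a network with one shared layer and two task-specific heads. It is purely illustrative (random weights, made-up task sizes), not part of the article's Caffe code:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Shared representation (the "green" layers): computed once per input.
W_shared = rng.normal(size=(8, 16))
# Task-specific heads (the "purple" layers): one small head per task.
W_task_a = rng.normal(size=(16, 3))   # e.g. a 3-way classification task
W_task_b = rng.normal(size=(16, 1))   # e.g. a scalar regression task

def forward(x):
    h = relu(x @ W_shared)             # shared features
    return h @ W_task_a, h @ W_task_b  # each task adds only its own head

x = rng.normal(size=(4, 8))            # a batch of 4 inputs
out_a, out_b = forward(x)
```

Note that the shared features `h` are computed once and reused by both heads, which is exactly why sharing low-level layers reduces computation.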
In a deep network, the semantic information of multiple tasks can also be output from different levels, as with the two auxiliary loss layers in GoogLeNet. Another example is a clothing image retrieval system: low-level information such as color can be predicted from shallow layers, while information closer to high-level semantics, such as clothing style, needs to be output from higher layers. Here "output" refers to the layer preceding each task's loss layer.
3. Multi-Task Deep Learning Application Cases
At present, multi-task deep learning is widely used in fields such as face recognition, fine-grained vehicle classification, facial keypoint localization, and attribute classification. The following are some representative papers.
3.1 Face Recognition Network DeepID2
The Tang group at the Chinese University of Hong Kong published the paper "Deep Learning Face Representation by Joint Identification-Verification" at NIPS 2014. It presents DeepID2, a multi-task face recognition network that combines a face identification (classification) loss with a face verification loss. The network structure is as follows:
DeepID2 has two loss functions. The first is the face identification (classification) loss, corresponding to SoftmaxWithLoss in Caffe:
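As a sketch, this classification loss can be computed in NumPy as follows. It is an illustrative approximation of what Caffe's SoftmaxWithLoss layer computes (mean cross-entropy over the batch); the scores and labels are made-up toy values:

```python
import numpy as np

def softmax_loss(scores, labels):
    """Mean softmax cross-entropy over a batch."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

scores = np.array([[2.0, 0.5, -1.0],   # 2 samples, 3 identity classes
                   [0.1, 3.0,  0.2]])
labels = np.array([0, 1])              # ground-truth identity per sample
loss = softmax_loss(scores, labels)
```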
The other is the face verification loss, corresponding to ContrastiveLoss in Caffe:
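The verification loss can be sketched similarly. This NumPy version follows the standard contrastive-loss form (pull same-identity pairs together, push different-identity pairs at least a margin apart); it is a sketch of what Caffe's ContrastiveLoss layer computes, with toy feature pairs:

```python
import numpy as np

def contrastive_loss(f1, f2, same, margin=1.0):
    """Contrastive loss over feature pairs."""
    d = np.linalg.norm(f1 - f2, axis=1)                  # Euclidean distance
    pos = same * d**2                                    # same identity
    neg = (1 - same) * np.maximum(margin - d, 0.0)**2    # different identity
    return 0.5 * (pos + neg).mean()

# Two pairs: the first shares an identity, the second does not.
f1 = np.array([[1.0, 0.0], [0.0, 0.0]])
f2 = np.array([[0.0, 0.0], [0.6, 0.8]])
same = np.array([1, 0])
loss = contrastive_loss(f1, f2, same)
```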
3.2 Fine-Grained Vehicle Classification Network
An interesting approach that combines SoftmaxLoss and TripletLoss in one network for multi-task training is "Embedding Label Structures for Fine-Grained Feature Representation", a paper published on arXiv. Using this network for fine-grained vehicle classification, the authors note that in order to compute the triplet loss, the features are first L2-normalized. The network structure is as follows:
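The triplet branch can be sketched in NumPy as below. This is a generic triplet loss on L2-normalized features, illustrating the normalization step the authors mention; the margin value and toy triplets are made up for the example:

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def triplet_loss(anchor, pos, neg, margin=0.2):
    # Features are L2-normalized first, as the paper requires before TripletLoss.
    a, p, n = map(l2_normalize, (anchor, pos, neg))
    d_ap = ((a - p)**2).sum(axis=1)   # squared distance anchor-positive
    d_an = ((a - n)**2).sum(axis=1)   # squared distance anchor-negative
    return np.maximum(d_ap - d_an + margin, 0.0).mean()

anchor = np.array([[1.0, 0.0], [1.0, 0.0]])
pos    = np.array([[2.0, 0.0], [0.0, 1.0]])  # 1st triplet satisfied, 2nd violated
neg    = np.array([[0.0, 1.0], [3.0, 0.0]])
loss = triplet_loss(anchor, pos, neg)
```

After normalization the first triplet contributes zero loss (positive closer than negative by more than the margin), while the violated second triplet dominates.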
3.3 Object Detection Network Faster R-CNN
Multi-task learning is also applied in the object detection network Faster R-CNN. As shown in Figure 6, the network consists of two tasks, window regression and window classification, where the convolution layers of the RPN module are shared between the two tasks. The latest version of Faster R-CNN supports full end-to-end training and can detect multiple object classes at the same time; it is currently one of the most representative object detection frameworks, and a typical application of multi-task deep learning.
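The two-task detection loss can be sketched in NumPy: softmax cross-entropy for window classification plus a smooth-L1 term for window regression, combined with a balancing weight. The toy anchors, targets, and λ = 1 are chosen for illustration; this is not the actual Faster R-CNN code:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 for box regression: quadratic inside [-1, 1], linear outside."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x**2, ax - 0.5)

def softmax_ce(scores, labels):
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy predictions: 2 anchors with object/background scores and 4 box offsets.
cls_scores = np.array([[2.0, -1.0], [0.5, 0.5]])
cls_labels = np.array([0, 1])
box_pred   = np.array([[0.2, -0.1, 0.0, 0.4], [2.0, 0.0, 0.0, 0.0]])
box_target = np.zeros((2, 4))

lam = 1.0  # balancing weight between the two tasks
loss = (softmax_ce(cls_scores, cls_labels)
        + lam * smooth_l1(box_pred - box_target).sum(axis=1).mean())
```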
3.4 Facial Keypoint Localization and Attribute Classification Network TCDCN
There is a close connection between facial keypoint estimation, head pose, and face attributes (wearing glasses, smiling, gender). The CUHK Tang group's ECCV 2014 work "Facial Landmark Detection by Deep Multi-task Learning" uses multi-task learning to combine facial keypoint localization with attribute prediction; the network structure is shown in Figure 7.
4. A Multi-Task Learning Example Based on Caffe
This section implements the multi-dimensional label input required by a multi-task deep learning algorithm, based on the widely used open-source deep learning framework Caffe. By default, the data layer in Caffe supports only single-dimensional labels. To support multi-dimensional labels, first modify convert_imageset.cpp in Caffe so that it accepts multiple labels per image:
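The input to such a modified tool is a text list with several labels per line. The following NumPy sketch parses that list format; the helper is hypothetical, written only to illustrate the expected input, and is not the actual C++ modification:

```python
import numpy as np

def parse_multilabel_list(lines, num_labels):
    """Parse 'filename label1 ... labelK' lines -- the text-list format a
    multi-label convert_imageset would consume (hypothetical helper)."""
    files, labels = [], []
    for line in lines:
        parts = line.split()
        assert len(parts) == 1 + num_labels, f"bad line: {line!r}"
        files.append(parts[0])
        labels.append([int(v) for v in parts[1:]])
    return files, np.array(labels)

# Two images, each with 5 task labels (toy values).
lines = ["a.jpg 1 0 2 1 0", "b.jpg 0 1 0 0 1"]
files, labels = parse_multilabel_list(lines, num_labels=5)
```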
This gives us the data-input part of multi-task deep learning. To stay compatible with the upstream Caffe framework, this article avoids the open-source implementations that add a label-dimension option to the data layer and modify its code; instead it uses two data layers directly, one reading the image data and one reading the multi-dimensional labels. Next we modify the corresponding network structure file (prototxt); note the comments marked in red.
In particular, the Slice layer splits the multi-dimensional labels, outputting a separate label for each task.
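The effect of the Slice layer can be illustrated in NumPy: a (batch, tasks) label blob is split along the label axis into one blob per task, each feeding its own loss layer. The shapes are toy values:

```python
import numpy as np

# A batch of 4 samples, each carrying 5 task labels along axis 1 --
# the shape the Slice layer receives from the label data layer.
labels = np.arange(20).reshape(4, 5)

# Slice along axis 1 into five per-task label blobs, one per loss layer.
per_task = np.split(labels, 5, axis=1)
```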
Another point worth discussing is the weight assigned to each task. In this article's practice, the five tasks are given equal weights, loss_weight: 0.2. In general, it is recommended that the weights of all tasks sum to 1. If no weights are set, the network may converge unstably, because in multi-task learning the gradients of the different tasks accumulate, making the overall gradient too large and possibly even causing a parameter overflow that makes training fail.
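The arithmetic behind this recommendation can be shown in a few lines of NumPy. With equal weights of 0.2 the combined loss is a convex average; with no weights (each implicitly 1.0) the shared layers receive the sum of all task gradients, five times larger in this toy case:

```python
import numpy as np

# Toy per-task losses from five heads, with equal weights summing to 1,
# mirroring loss_weight: 0.2 in the prototxt.
task_losses  = np.array([0.9, 1.2, 0.7, 1.0, 1.1])
loss_weights = np.full(5, 0.2)

total = float((loss_weights * task_losses).sum())

# Without explicit weights, the tasks simply add up -- 5x larger here,
# and the backpropagated gradient scales accordingly.
unweighted = float(task_losses.sum())
```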
The complete code for this article can be downloaded from the author's GitHub homepage:
codesnap/convert_multilabel.cpp at master · HolidayXue/codesnap · GitHub
The network structure around the multi-task loss layers is as follows:
5. Summary
This article reviewed the basic concepts of multi-task learning and discussed the basic ideas and application cases of multi-task deep learning. Finally, it described an implementation of multi-task deep learning based on the open-source deep learning platform Caffe, with the code released as open source.
Acknowledgements
After submission, this article went through three rounds of revision: one by the public account's editorial office, one round of major revisions from double-blind review, and one round of minor revisions from single-blind review. The two reviewers read the original text comprehensively and carefully, helped the author correct a number of theoretical statements, and offered suggestions that improved readability. The author thanks all the reviewers, and appreciates the patient and meticulous peer-review service of the Deep Learning Lecture Hall editorial office.
Author: Shewin (https://github.com/HolidayXue), mainly engaged in video and image algorithm research; works at Zhejiang Jieshang Vision Technology Co., Ltd. as a deep learning algorithm researcher. Jieshang is committed to video big data and intelligent video surveillance, and is currently recruiting algorithm and engineering staff; recruitment homepage: http://www.icarevision.cn/job.php, contact e-mail: [Email protected]on.cn
One Arrow, N Birds: Multi-Task Deep Learning in Practice