GAN-related: SRGAN, an application of GANs to super-resolution
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
Christian Ledig et al.
Abstract
GAN-based super-resolution addresses a shortcoming of conventional methods, including deep-learning ones: their results lack high-frequency information and fine detail. A traditional deep CNN can only mitigate this through the choice of objective function, whereas a GAN can solve the problem and produce perceptually satisfying results.
I have to say the details are really impressive... In the figure, the left image is the 4× SR result and the right one is the original high-resolution image.
Intro
The paper first introduces the SR problem. Traditional deep-learning methods optimize the MSE, which is easy to solve and directly optimizes PSNR (PSNR is computed directly from the MSE), so they perform well when PSNR is the metric. In practice, however, MSE places very limited constraints on high-frequency texture detail, so much of that detail is lost in the results, as shown below; a GAN avoids this problem. The GAN results have lower PSNR and SSIM, which suggests that the recovered images are not especially accurate: compared with the real image, the synthesized details may differ from the actual ones (for example, the pattern on the headwear and the collar at the neck of the figure in the image below have clearly defined textures that do not match the original). But the details are there, and they look visually better. This richer detail is the advantage of GANs for super-resolution.
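Because PSNR is a monotone function of MSE, minimizing MSE is the same as maximizing PSNR, which is why MSE-trained networks score well on PSNR. A minimal sketch, assuming 8-bit images (peak value 255) given as flat lists of pixel values; the function names are my own.

```python
import math

def mse(a, b):
    """Pixel-wise mean squared error between two equally sized flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr_from_mse(err, max_val=255.0):
    """PSNR in dB for a given MSE, assuming pixel values lie in [0, max_val]."""
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / err)

# Halving the MSE always adds 10 * log10(2), about 3.01 dB, of PSNR.
print(psnr_from_mse(100.0))  # about 28.13 dB
print(psnr_from_mse(50.0))   # about 31.14 dB
```

Nothing in this formula looks at texture or structure, only at the average squared difference per pixel.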
The model is a ResNet with skip connections, and it uses a perceptual loss, so high-level features also contribute to the loss; VGG is used to extract those high-level features.
Related Work
The first part covers traditional and CNN-based approaches to SR; I skip it here.
The second part covers CNN design. To learn a more complex mapping and improve accuracy we need deeper networks, so training deep networks efficiently becomes a problem to solve. Batch normalization (BN) layers can be used to counteract internal covariate shift. Another powerful design is the residual block, i.e. the skip connection.
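To make the residual-block idea concrete, here is a toy sketch of a block in the conv → BN → PReLU → conv → BN → elementwise-sum pattern, which is how I read the paper's generator block. It is a deliberately simplified assumption: one channel, one sample, a 1-D signal instead of an image, BN statistics taken over the positions of that single signal, and a fixed PReLU slope; all names are hypothetical.

```python
import math

def conv1d_same(x, w):
    """1-D cross-correlation with an odd-length kernel and zero 'same' padding."""
    r = len(w) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    return [sum(w[k] * padded[i + k] for k in range(len(w))) for i in range(len(x))]

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode BN for one channel of one sample: statistics over positions."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]

def prelu(x, a=0.25):
    """PReLU: identity for positive inputs, slope a otherwise (learned in a real net)."""
    return [v if v > 0 else a * v for v in x]

def residual_block(x, w1, w2):
    """Residual branch F(x) = BN(conv(PReLU(BN(conv(x))))); output is x + F(x)."""
    h = prelu(batch_norm(conv1d_same(x, w1)))
    h = batch_norm(conv1d_same(h, w2))
    return [xi + hi for xi, hi in zip(x, h)]

# With all-zero kernels the branch outputs zeros, so the block is the identity.
print(residual_block([1.0, 2.0, 3.0], [0, 0, 0], [0, 0, 0]))  # [1.0, 2.0, 3.0]
```

The usage line shows why skip connections help deep networks: a block only has to learn a correction on top of the identity, and "do nothing" is trivially representable.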
Next is the loss function. The traditional MSE loss averages over all possible solutions (the uncertainty about the missing detail). This can make the MSE small, and therefore the PSNR high, but visually the result is overly smooth; the following figure illustrates this phenomenon well.
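A toy illustration of the averaging effect, with made-up 1-D "patches" (not data from the paper): if two sharp textures that differ only by a one-pixel shift are equally plausible for the same low-resolution input, the prediction with the lowest expected MSE is their pixel-wise mean, which is a flat grey patch with no texture at all.

```python
def mse(a, b):
    """Pixel-wise mean squared error between two equally sized flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def expected_mse(prediction, candidates):
    """Average MSE of a prediction against equally likely ground-truth candidates."""
    return sum(mse(prediction, c) for c in candidates) / len(candidates)

# Two equally plausible high-frequency textures (hypothetical).
texture_a = [0, 255, 0, 255]
texture_b = [255, 0, 255, 0]
candidates = [texture_a, texture_b]

# The pixel-wise average of the candidates: all texture is gone.
blurry = [(x + y) / 2 for x, y in zip(texture_a, texture_b)]

print(blurry)                               # [127.5, 127.5, 127.5, 127.5]
print(expected_mse(blurry, candidates))     # 16256.25
print(expected_mse(texture_a, candidates))  # 32512.5
```

The sharp guess is a perfect match half of the time but is heavily penalized the other half, so under MSE the smooth average wins even though it looks like neither real texture.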
Dosovitskiy and Brox "use loss functions based on Euclidean distances computed in the feature space of neural networks in combination with adversarial training. It is shown that the proposed loss allows visually superior image generation." So besides the advantages of the GAN loss, the authors also bring in the idea mentioned above: measuring similarity in feature space. Apart from the authors of [12], others have also used VGG to extract features and measured Euclidean distance in that feature space; the idea is similar.
Method
The advantage of a GAN is that it can produce photo-realistic images. Network structure:
For this image task the generator uses PReLU activations (the discriminator uses LeakyReLU). Because the network is deep, BN layers are used to ease training, and there is no max pooling; instead, strided convolutions reduce the resolution each time the number of feature maps is doubled.
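To see how strided convolutions replace max pooling, the sketch below traces the feature-map shape through a VGG-style discriminator stack. The layer list (3×3 kernels, channels doubling from 64 to 512, stride 2 on every second layer), padding of 1, and the 96×96 input size are assumptions for illustration; the names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed discriminator conv stack as (channels, stride) pairs.
LAYERS = [(64, 1), (64, 2), (128, 1), (128, 2),
          (256, 1), (256, 2), (512, 1), (512, 2)]

def trace_shapes(size=96, layers=LAYERS):
    """Return (channels, height, width) after each conv layer for a square input."""
    shapes = []
    for channels, stride in layers:
        size = conv_out(size, stride=stride)
        shapes.append((channels, size, size))
    return shapes

for shape in trace_shapes():
    print(shape)
# (64, 96, 96) (64, 48, 48) (128, 48, 48) (128, 24, 24)
# (256, 24, 24) (256, 12, 12) (512, 12, 12) (512, 6, 6)
```

Each stride-2 layer halves the spatial size with learned weights rather than a fixed max operation, while the channel count grows to compensate.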
Here is the perceptual loss:
The paper's perceptual loss is composed of a content loss and an adversarial loss, combined with a weighting factor. The adversarial loss is the GAN loss. The content loss is the VGG loss: feature maps are taken from one layer of a pretrained VGG network, and the feature map of the generated image is compared with that of the real image, which makes the two similar in feature space. The adversarial loss here is the generator's term, so the discriminator's log-loss on the real image I^HR is not written out.
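A minimal sketch of how the two terms combine, assuming the VGG feature maps of the super-resolved and real images have already been computed and flattened into lists, and that the discriminator outputs probabilities in (0, 1). The content term is the mean squared distance in feature space; the generator's adversarial term is the sum of -log D(G(I^LR)) over the batch. The default weight of 1e-3 is the value I recall from the paper; all function names are hypothetical.

```python
import math

def vgg_content_loss(feat_sr, feat_hr):
    """Mean squared Euclidean distance between two (flattened) feature maps."""
    return sum((a - b) ** 2 for a, b in zip(feat_sr, feat_hr)) / len(feat_sr)

def generator_adversarial_loss(d_of_sr):
    """Sum over the batch of -log D(G(I_LR)); d_of_sr holds probabilities in (0, 1)."""
    return sum(-math.log(p) for p in d_of_sr)

def perceptual_loss(feat_sr, feat_hr, d_of_sr, adv_weight=1e-3):
    """Content loss plus a weighted adversarial loss (assumed weight 1e-3)."""
    return (vgg_content_loss(feat_sr, feat_hr)
            + adv_weight * generator_adversarial_loss(d_of_sr))

# Hypothetical numbers: features differ in one entry; D is fooled half the time.
print(perceptual_loss([1.0, 2.0], [1.0, 4.0], [0.5]))  # 2 + 1e-3 * ln 2
```

Using -log D(G(I^LR)) rather than log(1 - D(G(I^LR))) for the generator gives stronger gradients early in training, when the discriminator easily rejects generated images.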
The final evaluation uses MOS testing. MOS, the mean opinion score, is obtained by having 26 raters score the images and averaging their scores.
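The MOS itself is plain arithmetic: pool every score a method received from every rater and take the mean. A sketch with hypothetical ratings on an assumed 1 to 5 scale (not data from the paper); the function name is my own.

```python
from statistics import mean

def mean_opinion_score(ratings_by_rater):
    """MOS of one method: the average of all scores given by all raters."""
    all_scores = [s for rater in ratings_by_rater for s in rater]
    return mean(all_scores)

# Hypothetical: 3 raters, each scoring the same 2 images produced by one method.
ratings = [[4, 3], [5, 4], [3, 4]]
print(mean_opinion_score(ratings))  # about 3.83 (23 / 6)
```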
The results show that SRGAN with the VGG54 content loss performs best. The 54 is the index of φ (the φ in the content loss above): φi,j denotes the feature map obtained by the j-th convolution (after activation) before the i-th max-pooling layer of VGG19. So VGG54 uses a high-level layer and VGG22 a low-level one, and the high-level features turn out to be more effective.
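To locate φi,j in a concrete network, the sketch below counts modules in a VGG19 layer sequence where each convolution is followed by its own ReLU module and each "M" is a max-pooling module. This mirrors the ordering I believe torchvision's `vgg19().features` uses, but treat that correspondence as an assumption; the function name is hypothetical.

```python
# VGG19 conv configuration: numbers are output channels, "M" is max pooling.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def phi_layer_index(i, j, cfg=VGG19_CFG):
    """Index of the ReLU after the j-th conv before the i-th max-pool, in a flat
    module sequence of the form Conv, ReLU, ..., MaxPool, ..."""
    idx = 0           # index of the current module in the flat sequence
    block, conv = 1, 0
    for item in cfg:
        if item == "M":
            block += 1
            conv = 0
            idx += 1  # one MaxPool module
        else:
            conv += 1
            if block == i and conv == j:
                return idx + 1  # the ReLU right after this conv
            idx += 2  # Conv + ReLU modules
    raise ValueError(f"no conv {j} before max-pool {i} in this config")

print(phi_layer_index(2, 2))  # 8  -> VGG22
print(phi_layer_index(5, 4))  # 35 -> VGG54
```

Under that assumption, truncating the feature extractor after index 35 (e.g. `features[:36]`) would output φ5,4, and after index 8 would output φ2,2.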
In conclusion, the method mainly escapes the limitation of common approaches that use PSNR as the measure of quality, and it generates more photo-realistic images.
March 28, 2018 17:31:41