Perceptual Loss Functions: Paper Notes


Introduction

The recent image style transfer algorithm produces good results, but for every image to be generated it must start from a fresh initialization and then, keeping the CNN parameters fixed, update the image by backpropagation until the final result is reached. Its performance is therefore a serious concern.

However, the success of the image style transfer algorithm contributed a very important idea to the field of image generation: the features extracted by a convolutional neural network can be used as part of the objective function. By comparing the CNN features of the image being generated with the CNN features of the target image, the generated image is made semantically more similar to the target image (in contrast to a pixel-level loss function).

The image style transfer algorithm treats generation as an optimization process: for style transfer, it starts from a noise image (a blank canvas) and arrives at a result that has the content of image A and the style of image B. The perceptual-loss approach, by contrast, treats generation as a transformation problem: the result image is produced directly by transforming the content image.

Image style transfer takes derivatives with respect to the image being generated, and backpropagation through the CNN is very slow because of its many parameters. Using the same CNN-feature losses, one can instead train a neural network that takes the content image as input and directly outputs the stylized image. In the same way, a high-resolution image can be obtained from a low-resolution input. Because only a single forward pass through the network is needed, the method is very fast and achieves real-time results.


The following network diagram is the essence of the paper. It divides the system into a transform network and a loss network. During training, the transform network transforms the image and its parameters are updated, while the loss network keeps its parameters fixed; the result image, the style image, and the content image are all passed through the loss network to obtain the feature activations of each layer, from which the losses are computed.
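To make the setup concrete, here is a minimal PyTorch sketch of one training step under assumptions of my own: the content-loss layer index, the learning rate, and the stand-in `transform_net` are illustrative choices, the full objective also adds style and total variation terms, and VGG expects appropriately normalized inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

# Loss network phi: a pretrained VGG-16 whose weights stay frozen.
vgg = vgg16(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def vgg_features(x, layers):
    """Run x through vgg.features, collecting activations at the given indices."""
    feats, out = {}, x
    for i, layer in enumerate(vgg):
        out = layer(out)
        if i in layers:
            feats[i] = out
    return feats

CONTENT_LAYER = 8  # index of relu2_2 in vgg16().features (assumed choice)

# Stand-in for the image transform network (its real architecture is
# sketched in the "Network details" section below); only ITS parameters
# are updated during training.
transform_net = nn.Sequential(nn.Conv2d(3, 3, 9, padding=4))
optimizer = torch.optim.Adam(transform_net.parameters(), lr=1e-3)

def train_step(content):
    y = transform_net(content)            # generated image
    f_y = vgg_features(y, {CONTENT_LAYER})
    f_c = vgg_features(content, {CONTENT_LAYER})
    # Feature reconstruction (content) loss; the full objective adds
    # weighted style and total variation terms.
    loss = F.mse_loss(f_y[CONTENT_LAYER], f_c[CONTENT_LAYER])
    optimizer.zero_grad()
    loss.backward()                       # gradients flow only into transform_net
    optimizer.step()
    return loss.item()

loss = train_step(torch.rand(4, 3, 256, 256))  # one batch of content images
```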

For style transfer, the input x = y<sub>c</sub> is the content image. For super-resolution, x is a low-resolution image, the content image is the corresponding high-resolution image, and no style image is used.


Network details

The network design follows the ideas in DCGAN:
No pooling layers; strided and fractionally strided convolutions are used for downsampling and upsampling.
Five residual blocks are used.
All convolution layers except the output layer are followed by spatial batch normalization and a ReLU nonlinearity.
The output layer uses a scaled tanh to ensure that the output values lie in [0, 255].
The first and last convolution layers use 9×9 kernels; all other convolution layers use 3×3 kernels.

For exact network parameters, refer to reference 2; a rough sketch is given below.
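The following PyTorch sketch of the style-transfer transform network follows the description above. The channel counts (32/64/128), padding choices, and the use of ConvTranspose2d for the stride-1/2 layers are assumptions based on the supplementary material, not a verbatim reimplementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with batch norm; the input is added back."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c),
        )

    def forward(self, x):
        return x + self.body(x)

class TransformNet(nn.Module):
    """9x9 conv -> two stride-2 convs (downsample) -> five residual
    blocks -> two stride-1/2 convs (upsample) -> 9x9 output conv."""
    def __init__(self):
        super().__init__()
        def cbr(cin, cout, k, s):  # conv + batch norm + ReLU
            return [nn.Conv2d(cin, cout, k, s, padding=k // 2),
                    nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
        def up(cin, cout):         # fractionally strided ("stride 1/2") conv
            return [nn.ConvTranspose2d(cin, cout, 3, stride=2,
                                       padding=1, output_padding=1),
                    nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
        self.net = nn.Sequential(
            *cbr(3, 32, 9, 1), *cbr(32, 64, 3, 2), *cbr(64, 128, 3, 2),
            *[ResidualBlock(128) for _ in range(5)],
            *up(128, 64), *up(64, 32),
            nn.Conv2d(32, 3, 9, 1, padding=4),
        )

    def forward(self, x):
        # scaled tanh maps the raw output into [0, 255]
        return 255 * (torch.tanh(self.net(x)) + 1) / 2

out = TransformNet()(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 3, 256, 256])
```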

Input and output
For style transfer, the input and output sizes are both 256×256×3.
For super-resolution, the output is 288×288×3 and the input is (288/f)×(288/f)×3, where f is the upsampling factor. Because the transform net is fully convolutional, it can support any input size at test time, as the toy example below shows.
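A quick illustration of why a fully convolutional network is size-agnostic (a toy network, not the paper's):

```python
import torch
import torch.nn as nn

# No fully connected layers, so any input resolution produces a
# correspondingly sized output.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
for size in (256, 288, 512):
    print(net(torch.randn(1, 3, size, size)).shape)
# torch.Size([1, 3, 256, 256]), [1, 3, 288, 288], [1, 3, 512, 512]
```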

Downsampling and Upsampling
For super-resolution with upsampling factor f, the residual blocks are followed by log<sub>2</sub>f convolution layers with stride 1/2.
Fractionally strided convolution lets the network learn its own upsampling function, instead of relying on a fixed method such as bicubic interpolation (see the sketch after this list).
For style transfer, two stride-2 convolution layers downsample the input, followed by several residual blocks, and then two stride-1/2 convolution layers upsample back. Although the input and output have the same size, downsampling and then upsampling brings two advantages:
Better computational efficiency: after downsampling, the feature maps are smaller, so the same computation budget supports a larger network.
A larger receptive field: style transfer deforms objects, so the larger the region of the input image that each output pixel depends on, the better. Without downsampling, each 3×3 convolution grows the effective receptive field by 2; after downsampling by a factor of D, each 3×3 convolution grows it by 2D.
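A small example of a fractionally strided ("stride 1/2") convolution, using PyTorch's ConvTranspose2d as the stand-in implementation:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 72, 72)   # low-resolution feature map
# A learned upsampler: its kernel weights are trained, unlike fixed
# bicubic or nearest-neighbor interpolation.
up = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2,
                        padding=1, output_padding=1)
print(up(x).shape)  # torch.Size([1, 32, 144, 144]) -- spatial size doubled
```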

Residual connections

Residual connections help the network learn the identity function. Since the transformation model also requires the output image to share much of its structure with the input image, residual connections are a natural fit for this kind of generation model, as the short demonstration below illustrates.
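If the residual branch outputs zero, the block computes the identity exactly. The block structure and the zero-initialization trick here are illustrative assumptions, not the paper's initialization scheme:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-BN-ReLU-Conv-BN with an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return out + x  # identity skip

block = ResidualBlock(64).eval()
# Zeroing the last batch norm's affine parameters makes the residual
# branch output zero, so the block is exactly the identity function.
nn.init.zeros_(block.bn2.weight)
nn.init.zeros_(block.bn2.bias)
x = torch.randn(1, 64, 32, 32)
print(torch.allclose(block(x), x))  # True
```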

As in the image style transfer algorithm, two loss functions are defined in the paper. The loss network is a VGG net trained on ImageNet, denoted by φ.

Feature Reconstruction Loss

The feature reconstruction loss is the normalized squared Euclidean distance between feature representations:

ℓ<sup>φ,j</sup><sub>feat</sub>(ŷ, y) = (1 / C<sub>j</sub>H<sub>j</sub>W<sub>j</sub>) ‖φ<sub>j</sub>(ŷ) − φ<sub>j</sub>(y)‖²<sub>2</sub>

Here j denotes the j-th layer of the loss network, and C<sub>j</sub>×H<sub>j</sub>×W<sub>j</sub> is the size of the feature map at layer j.
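In code, the normalization in this formula is exactly what a mean-squared error gives. A sketch, where `phi_j_y` and `phi_j_target` stand for the layer-j activations of the generated and target images:

```python
import torch.nn.functional as F

def feature_reconstruction_loss(phi_j_y, phi_j_target):
    # F.mse_loss averages the squared error over all C_j*H_j*W_j
    # elements, matching the 1/(C_j H_j W_j) factor above
    # (it also averages over the batch dimension).
    return F.mse_loss(phi_j_y, phi_j_target)
```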

The effect of reconstructing from different layers is shown in the paper: reconstructions from lower layers are nearly indistinguishable from the original image, while reconstructions from higher layers preserve the content and overall spatial structure but not the exact color, texture, and shape.

Style Reconstruction Loss

For the style reconstruction loss, we first compute the Gram matrix:

G<sup>φ</sup><sub>j</sub>(x)<sub>c,c′</sub> = (1 / C<sub>j</sub>H<sub>j</sub>W<sub>j</sub>) Σ<sub>h,w</sub> φ<sub>j</sub>(x)<sub>h,w,c</sub> φ<sub>j</sub>(x)<sub>h,w,c′</sub>

The feature map at layer j has size C<sub>j</sub>×H<sub>j</sub>×W<sub>j</sub>; it can be viewed as a set of C<sub>j</sub> features, each of dimension H<sub>j</sub>×W<sub>j</sub>, and the Gram matrix above collects the pairwise inner products of these features.

For the two images, compute the Gram matrix at each chosen layer of the loss network, take the squared Frobenius (Euclidean) distance between the corresponding Gram matrices, and sum the distances over layers to obtain the final style loss.
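A minimal PyTorch sketch of this computation; the batched matrix-multiplication formulation is an implementation choice, and the 1/(C<sub>j</sub>H<sub>j</sub>W<sub>j</sub>) normalization follows the formula above:

```python
import torch

def gram_matrix(features):
    """features: (B, C, H, W) activations from one loss-network layer.
    Returns the (B, C, C) matrix of pairwise channel inner products,
    normalized by C*H*W."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)           # each channel as a vector
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(feats_y, feats_style):
    """Sum of squared Frobenius distances between Gram matrices over
    the chosen loss-network layers (two parallel lists of tensors)."""
    loss = 0.0
    for fy, fs in zip(feats_y, feats_style):
        g_y, g_s = gram_matrix(fy), gram_matrix(fs)
        loss = loss + ((g_y - g_s) ** 2).sum()
    return loss
```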

The effect of style reconstruction from different layers is also shown in the paper: higher layers transfer progressively larger-scale structure from the style image.

Two simpler loss functions are also used. Pixel loss is the pixel-level Euclidean distance between the output image and the target image.
Total variation regularization is a loss commonly used in prior work on feature inversion and super-resolution (references [6, 20, 48, 49] of the paper).
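A minimal sketch of both losses; the absolute-difference form of total variation used here is one common variant, and the paper's exact formulation may differ:

```python
import torch

def pixel_loss(y, target):
    # mean squared pixel-level Euclidean distance
    return ((y - target) ** 2).mean()

def total_variation(y):
    # penalizes differences between neighboring pixels,
    # encouraging spatial smoothness in the output image
    dh = (y[..., 1:, :] - y[..., :-1, :]).abs().sum()
    dw = (y[..., :, 1:] - y[..., :, :-1]).abs().sum()
    return dh + dw
```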

Loss comparison

For the style transfer task, the paper compares the loss achieved by the perceptual-loss network (ours) with that of the optimization-based image style transfer algorithm [10], on content images at several resolutions.

As the comparison shows, the perceptual-loss network reaches a loss comparable to running 50 to 100 iterations of the original algorithm.

In terms of running time:


The speedup is several hundred times: on a GPU, about 0.0015 s is enough to achieve a comparable result, and the method is also far more practical on a CPU.

Results

Although the style transfer networks are trained on 256×256 images, being fully convolutional they can also be applied at other sizes, such as 512×512.


Super-Resolution

4× super-resolution:


8× super-resolution:


Summary

The main contribution is to the application of image style transfer: a speedup of roughly three orders of magnitude.
Because the network is fully convolutional, it can be applied to images of many different sizes.

References

1. Perceptual Losses for Real-Time Style Transfer and Super-Resolution.

2. Perceptual Losses for Real-Time Style Transfer and Super-Resolution: Supplementary Material.











