Perceptual Loss Functions: Paper Notes


Introduction

The recent image style transfer algorithm produces good results, but for every image to be generated it must start from a fresh initialization and then, keeping the CNN parameters fixed, update the image by backpropagation until the final result is reached. Its performance is therefore a serious concern.

However, the success of the image style transfer algorithm contributed a very important idea to the field of image generation: the features extracted by a convolutional neural network can be used as part of the objective function. By comparing the CNN features of the image being generated with the CNN features of the target image, the generated image is made semantically more similar to the target image (in contrast to a pixel-level loss function).

The image style transfer algorithm treats generation as an optimization process: for style transfer, it starts from a noise image (a blank canvas) and arrives at a result that has the content of image A and the style of image B. The perceptual-loss approach, by contrast, treats generation as a transformation problem: the result image is produced directly by transforming the content image.

Image style transfer takes derivatives with respect to the image being generated, and backpropagation through the CNN is very slow because of its many parameters. Using the same CNN-feature losses, one can instead train a neural network that takes the content image as input and directly outputs the stylized image. In the same way, a high-resolution image can be obtained from a low-resolution input. Because only a single forward pass through the network is needed, the method is very fast and achieves real-time results.


The following network diagram is the essence of the paper. It divides the system into a transform network and a loss network. During training, the transform network transforms the image and its parameters are updated, while the loss network keeps its parameters fixed; the result image, the style image, and the content image are all passed through the loss network to obtain the feature activations of each layer, from which the losses are computed.
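To make the setup concrete, here is a minimal PyTorch sketch of one training step under assumptions of my own: the content-loss layer index, the learning rate, and the stand-in `transform_net` are illustrative choices, the full objective also adds style and total variation terms, and VGG expects appropriately normalized inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

# Loss network phi: a pretrained VGG-16 whose weights stay frozen.
vgg = vgg16(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def vgg_features(x, layers):
    """Run x through vgg.features, collecting activations at the given indices."""
    feats, out = {}, x
    for i, layer in enumerate(vgg):
        out = layer(out)
        if i in layers:
            feats[i] = out
    return feats

CONTENT_LAYER = 8  # index of relu2_2 in vgg16().features (assumed choice)

# Stand-in for the image transform network (its real architecture is
# sketched in the "Network details" section below); only ITS parameters
# are updated during training.
transform_net = nn.Sequential(nn.Conv2d(3, 3, 9, padding=4))
optimizer = torch.optim.Adam(transform_net.parameters(), lr=1e-3)

def train_step(content):
    y = transform_net(content)            # generated image
    f_y = vgg_features(y, {CONTENT_LAYER})
    f_c = vgg_features(content, {CONTENT_LAYER})
    # Feature reconstruction (content) loss; the full objective adds
    # weighted style and total variation terms.
    loss = F.mse_loss(f_y[CONTENT_LAYER], f_c[CONTENT_LAYER])
    optimizer.zero_grad()
    loss.backward()                       # gradients flow only into transform_net
    optimizer.step()
    return loss.item()

loss = train_step(torch.rand(4, 3, 256, 256))  # one batch of content images
```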

For style transfer, the input x = y<sub>c</sub> is the content image. For super-resolution, x is a low-resolution image, the content image is the corresponding high-resolution image, and no style image is used.


Network details

The network design follows the ideas in DCGAN:
No pooling layers; strided and fractionally strided convolutions are used for downsampling and upsampling.
Five residual blocks are used.
All convolution layers except the output layer are followed by spatial batch normalization and a ReLU nonlinearity.
The output layer uses a scaled tanh to ensure that the output values lie in [0, 255].
The first and last convolution layers use 9×9 kernels; all other convolution layers use 3×3 kernels.

For exact network parameters, refer to reference 2; a rough sketch is given below.
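The following PyTorch sketch of the style-transfer transform network follows the description above. The channel counts (32/64/128), padding choices, and the use of ConvTranspose2d for the stride-1/2 layers are assumptions based on the supplementary material, not a verbatim reimplementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with batch norm; the input is added back."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c),
        )

    def forward(self, x):
        return x + self.body(x)

class TransformNet(nn.Module):
    """9x9 conv -> two stride-2 convs (downsample) -> five residual
    blocks -> two stride-1/2 convs (upsample) -> 9x9 output conv."""
    def __init__(self):
        super().__init__()
        def cbr(cin, cout, k, s):  # conv + batch norm + ReLU
            return [nn.Conv2d(cin, cout, k, s, padding=k // 2),
                    nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
        def up(cin, cout):         # fractionally strided ("stride 1/2") conv
            return [nn.ConvTranspose2d(cin, cout, 3, stride=2,
                                       padding=1, output_padding=1),
                    nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
        self.net = nn.Sequential(
            *cbr(3, 32, 9, 1), *cbr(32, 64, 3, 2), *cbr(64, 128, 3, 2),
            *[ResidualBlock(128) for _ in range(5)],
            *up(128, 64), *up(64, 32),
            nn.Conv2d(32, 3, 9, 1, padding=4),
        )

    def forward(self, x):
        # scaled tanh maps the raw output into [0, 255]
        return 255 * (torch.tanh(self.net(x)) + 1) / 2

out = TransformNet()(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 3, 256, 256])
```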

Input and output
For style transfer, the input and output sizes are both 256×256×3.
For super-resolution, the output is 288×288×3 and the input is (288/f)×(288/f)×3, where f is the upsampling factor. Because the transform net is fully convolutional, it can support any input size at test time, as the toy example below shows.
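A quick illustration of why a fully convolutional network is size-agnostic (a toy network, not the paper's):

```python
import torch
import torch.nn as nn

# No fully connected layers, so any input resolution produces a
# correspondingly sized output.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
for size in (256, 288, 512):
    print(net(torch.randn(1, 3, size, size)).shape)
# torch.Size([1, 3, 256, 256]), [1, 3, 288, 288], [1, 3, 512, 512]
```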

Downsampling and Upsampling
For super-resolution with upsampling factor f, the residual blocks are followed by log<sub>2</sub>f convolution layers with stride 1/2.
Fractionally strided convolution lets the network learn its own upsampling function, instead of relying on a fixed method such as bicubic interpolation (see the sketch after this list).
For style transfer, two stride-2 convolution layers downsample the input, followed by several residual blocks, and then two stride-1/2 convolution layers upsample back. Although the input and output have the same size, downsampling and then upsampling brings two advantages:
Better computational efficiency: after downsampling, the feature maps are smaller, so the same computation budget supports a larger network.
A larger receptive field: style transfer deforms objects, so the larger the region of the input image that each output pixel depends on, the better. Without downsampling, each 3×3 convolution grows the effective receptive field by 2; after downsampling by a factor of D, each 3×3 convolution grows it by 2D.
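A small example of a fractionally strided ("stride 1/2") convolution, using PyTorch's ConvTranspose2d as the stand-in implementation:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 72, 72)   # low-resolution feature map
# A learned upsampler: its kernel weights are trained, unlike fixed
# bicubic or nearest-neighbor interpolation.
up = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2,
                        padding=1, output_padding=1)
print(up(x).shape)  # torch.Size([1, 32, 144, 144]) -- spatial size doubled
```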

Residual connections

Residual connections help the network learn the identity function. Since the transformation model also requires the output image to share much of its structure with the input image, residual connections are a natural fit for this kind of generation model, as the short demonstration below illustrates.
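If the residual branch outputs zero, the block computes the identity exactly. The block structure and the zero-initialization trick here are illustrative assumptions, not the paper's initialization scheme:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-BN-ReLU-Conv-BN with an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return out + x  # identity skip

block = ResidualBlock(64).eval()
# Zeroing the last batch norm's affine parameters makes the residual
# branch output zero, so the block is exactly the identity function.
nn.init.zeros_(block.bn2.weight)
nn.init.zeros_(block.bn2.bias)
x = torch.randn(1, 64, 32, 32)
print(torch.allclose(block(x), x))  # True
```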

As in the image style transfer algorithm, two loss functions are defined in the paper. The loss network is a VGG net trained on ImageNet, denoted by φ.

Feature Reconstruction Loss

The feature reconstruction loss is the normalized squared Euclidean distance between feature representations:

ℓ<sup>φ,j</sup><sub>feat</sub>(ŷ, y) = (1 / C<sub>j</sub>H<sub>j</sub>W<sub>j</sub>) ‖φ<sub>j</sub>(ŷ) − φ<sub>j</sub>(y)‖²<sub>2</sub>

Here j denotes the j-th layer of the loss network, and C<sub>j</sub>×H<sub>j</sub>×W<sub>j</sub> is the size of the feature map at layer j.
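In code, the normalization in this formula is exactly what a mean-squared error gives. A sketch, where `phi_j_y` and `phi_j_target` stand for the layer-j activations of the generated and target images:

```python
import torch.nn.functional as F

def feature_reconstruction_loss(phi_j_y, phi_j_target):
    # F.mse_loss averages the squared error over all C_j*H_j*W_j
    # elements, matching the 1/(C_j H_j W_j) factor above
    # (it also averages over the batch dimension).
    return F.mse_loss(phi_j_y, phi_j_target)
```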

The effect of reconstructing from different layers is shown in the paper: reconstructions from lower layers are nearly indistinguishable from the original image, while reconstructions from higher layers preserve the content and overall spatial structure but not the exact color, texture, and shape.

Style Reconstruction Loss

For the style reconstruction loss, we first compute the Gram matrix:

G<sup>φ</sup><sub>j</sub>(x)<sub>c,c′</sub> = (1 / C<sub>j</sub>H<sub>j</sub>W<sub>j</sub>) Σ<sub>h,w</sub> φ<sub>j</sub>(x)<sub>h,w,c</sub> φ<sub>j</sub>(x)<sub>h,w,c′</sub>

The feature map at layer j has size C<sub>j</sub>×H<sub>j</sub>×W<sub>j</sub>; it can be viewed as a set of C<sub>j</sub> features, each of dimension H<sub>j</sub>×W<sub>j</sub>, and the Gram matrix above collects the pairwise inner products of these features.

For the two images, compute the Gram matrix at each chosen layer of the loss network, take the squared Frobenius (Euclidean) distance between the corresponding Gram matrices, and sum the distances over layers to obtain the final style loss.
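A minimal PyTorch sketch of this computation; the batched matrix-multiplication formulation is an implementation choice, and the 1/(C<sub>j</sub>H<sub>j</sub>W<sub>j</sub>) normalization follows the formula above:

```python
import torch

def gram_matrix(features):
    """features: (B, C, H, W) activations from one loss-network layer.
    Returns the (B, C, C) matrix of pairwise channel inner products,
    normalized by C*H*W."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)           # each channel as a vector
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(feats_y, feats_style):
    """Sum of squared Frobenius distances between Gram matrices over
    the chosen loss-network layers (two parallel lists of tensors)."""
    loss = 0.0
    for fy, fs in zip(feats_y, feats_style):
        g_y, g_s = gram_matrix(fy), gram_matrix(fs)
        loss = loss + ((g_y - g_s) ** 2).sum()
    return loss
```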

The effect of style reconstruction from different layers is also shown in the paper: higher layers transfer progressively larger-scale structure from the style image.

Two simpler loss functions are also used. Pixel loss is the pixel-level Euclidean distance between the output image and the target image.
Total variation regularization is a loss commonly used in prior work on feature inversion and super-resolution (references [6, 20, 48, 49] of the paper).
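A minimal sketch of both losses; the absolute-difference form of total variation used here is one common variant, and the paper's exact formulation may differ:

```python
import torch

def pixel_loss(y, target):
    # mean squared pixel-level Euclidean distance
    return ((y - target) ** 2).mean()

def total_variation(y):
    # penalizes differences between neighboring pixels,
    # encouraging spatial smoothness in the output image
    dh = (y[..., 1:, :] - y[..., :-1, :]).abs().sum()
    dw = (y[..., :, 1:] - y[..., :, :-1]).abs().sum()
    return dh + dw
```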

Loss comparison

For the style transfer task, the paper compares the loss achieved by the perceptual-loss network (ours) with that of the optimization-based image style transfer algorithm [10], on content images at several resolutions.

As the comparison shows, the perceptual-loss network reaches a loss comparable to running 50 to 100 iterations of the original algorithm.

In terms of running time:


The speedup is several hundred times: on a GPU, about 0.0015 s is enough to achieve a comparable result, and the method is also far more practical on a CPU.

Results

Although the style transfer networks are trained on 256×256 images, being fully convolutional they can also be applied at other sizes, such as 512×512.


Super-Resolution

4× super-resolution:


8× super-resolution:


Summary

The main contribution is to the application of image style transfer: a speedup of roughly three orders of magnitude.
Because the network is fully convolutional, it can be applied to images of many different sizes.

References

1. Perceptual Losses for Real-Time Style Transfer and Super-Resolution.

2. Perceptual Losses for Real-Time Style Transfer and Super-Resolution: Supplementary Material.











