Image Style Transfer Using Convolutional Neural Networks (theoretical article)

It's been a while since my last post, which I'm a little ashamed of; things have been busy lately, but now I finally have time to write one.

Today's article is about neural art, i.e. the style transfer algorithm.
Article Source:
A Neural Algorithm of Artistic Style, arXiv 2015
Image Style Transfer Using Convolutional Neural Networks, CVPR 2016

Some time ago there was a very popular app called Prisma that lets you upload a photo and restyle it; its internal mechanism is reportedly the neural art / style transfer algorithm discussed today.

The two papers cover similar content; the second can be seen as an extension of the first, adding some further experiments, and it is the one this post discusses.

There is actually not much mystery to it: the goal is to learn the style of one image. For example, suppose you have a landscape photo that you shot yourself; how do you add the style of Van Gogh's The Starry Night to that landscape?
It looks roughly like the following:

Adding the style of one image to the content of another image: that is what style transfer does.

The idea of the algorithm is actually very simple: use a CNN to extract features, and then use those extracted features to reconstruct an image. We know that different CNN conv layers extract different kinds of features: lower layers are biased toward points, lines, and edges, while higher layers are biased toward texture information.
The intuition behind the algorithm is shown in the following illustration:

The authors use the VGG19 network as the feature extractor, finally settling on conv4_2 as the content layer and conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 as the style layers.

Concrete implementation

The algorithm takes a random white noise image as input, defines a content loss against the content image and a style loss against the style image, and then uses the standard back-propagation algorithm to adjust the input image (the white noise image) itself; the network weights stay fixed.

Note that it is the input image that gets adjusted: for a particular input image x, we minimise its loss (comprising the content loss and the style loss).

A layer with $N_l$ distinct filters has $N_l$ feature maps, each of size $M_l$, where $M_l$ is the height times the width of the feature map. The responses in a layer $l$ can therefore be stored in a matrix $F^l \in \mathbb{R}^{N_l \times M_l}$, where $F^l_{ij}$ is the activation of the $i$-th filter at position $j$ in layer $l$.

Assuming that $\vec{p}$ and $\vec{x}$ denote the original image and the generated image, and $P^l$ and $F^l$ their respective responses in layer $l$, the content loss between them is defined as:
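
$$\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$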

Style loss

Feature correlations are given by the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$, where $G^l_{ij}$ is the inner product between the vectorised feature maps $i$ and $j$ in layer $l$:
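
$$G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$$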

These correlations between features give a multi-scale representation of the original image at that layer, which captures its texture information.

Let $\vec{a}$ and $\vec{x}$ be the original image and the generated image, and $A^l$ and $G^l$ their respective style representations in layer $l$. The contribution of layer $l$ to the total style loss is then:
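
$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

and the total style loss is the weighted sum of these contributions over the style layers:

$$\mathcal{L}_{\text{style}}(\vec{a}, \vec{x}) = \sum_l w_l E_l$$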

The total loss is a linear combination of the content loss and the style loss:
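
$$\mathcal{L}_{\text{total}}(\vec{p}, \vec{a}, \vec{x}) = \alpha \mathcal{L}_{\text{content}}(\vec{p}, \vec{x}) + \beta \mathcal{L}_{\text{style}}(\vec{a}, \vec{x})$$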

Architecture

The overall structure is shown in the following illustration:
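
To make this concrete, below is a minimal PyTorch sketch of the whole procedure. This is not the authors' original implementation (the paper used L-BFGS on a Caffe VGG19); the loss weights, learning rate, and use of Adam here are my own assumptions for illustration, and the input images are assumed to be ImageNet-normalised (1, 3, H, W) tensors.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Layer choices from the paper, as indices into torchvision's VGG19
# `features` stack: conv4_2 for content, conv1_1..conv5_1 for style.
CONTENT_LAYERS = {21: 1.0}                                    # conv4_2
STYLE_LAYERS = {0: 0.2, 5: 0.2, 10: 0.2, 19: 0.2, 28: 0.2}    # convX_1

def gram(feat):
    """Gram matrix of a (1, C, H, W) feature map, normalised by N_l * M_l."""
    _, c, h, w = feat.shape
    f = feat.view(c, h * w)
    return f @ f.t() / (c * h * w)

def extract(cnn, img, layer_ids):
    """Run img through the conv stack, keeping activations at layer_ids."""
    feats, x = {}, img
    for i, layer in enumerate(cnn):
        x = layer(x)
        if i in layer_ids:
            feats[i] = x
    return feats

def style_transfer(content_img, style_img, steps=300, alpha=1.0, beta=1e3):
    cnn = vgg19(weights="IMAGENET1K_V1").features.eval()
    for p in cnn.parameters():
        p.requires_grad_(False)          # the network weights stay fixed

    content_feats = extract(cnn, content_img, CONTENT_LAYERS)
    style_grams = {i: gram(f)
                   for i, f in extract(cnn, style_img, STYLE_LAYERS).items()}

    # Optimise the input image itself, starting from white noise.
    x = torch.randn_like(content_img).requires_grad_(True)
    opt = torch.optim.Adam([x], lr=0.05)

    for _ in range(steps):
        opt.zero_grad()
        feats = extract(cnn, x, CONTENT_LAYERS.keys() | STYLE_LAYERS.keys())
        c_loss = sum(w * F.mse_loss(feats[i], content_feats[i])
                     for i, w in CONTENT_LAYERS.items())
        s_loss = sum(w * F.mse_loss(gram(feats[i]), style_grams[i])
                     for i, w in STYLE_LAYERS.items())
        (alpha * c_loss + beta * s_loss).backward()
        opt.step()
    return x.detach()
```

Since the target features and Gram matrices are fixed, only x receives gradients, which is exactly the "adjust the input, not the weights" idea described above; the normalisation constants differ from the paper's $E_l$ by fixed factors that $w_l$, $\alpha$, and $\beta$ can absorb.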

Supplement: the weighting of style and content

That is, the α/β ratio; its effect is shown in the following figure:

It can be seen that the smaller the α/β ratio, the more pronounced the effect of the style on the result.

Using different layers as the content feature extractor or the style feature extractor gives different effects.

We find that matching the style representations up to higher layers in the network preserves local image structures at an increasingly large scale, leading to a smoother and more continuous visual experience.

Accordingly, conv1_1 through conv5_1 were chosen as the style layers.

The following figure shows the different effects of using different conv layers as the content layer:

Different initialization methods

In the experiments a random white noise image is used as the input, but the content image or the style image can also be used directly. The authors conclude:

The different initialisations do not seem to have a strong effect on the outcome of the synthesis

But:

Only initialising with noise allows to generate an arbitrary number of new images. Initialising with a fixed image always deterministically leads to the same outcome (up to stochasticity in the gradient descent procedure)
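
In the sketch above, switching the initialisation is a one-line change (again just an illustration, reusing the hypothetical names from that sketch):

```python
x = torch.randn_like(content_img).requires_grad_(True)  # white noise
x = content_img.clone().requires_grad_(True)            # start from content
x = style_img.clone().requires_grad_(True)              # start from style
```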

Postscript

I ran my own experiment on a 140*480 image: 300 iterations took about 30 s on a Titan X, so the runtime really is quite long.

The original paper also gives a conclusion on the runtime:

The dimensionality of the optimisation problem as well as the number of units in the Convolutional Neural Network grows linearly with the number of pixels.

The images presented in this paper were synthesised in a resolution of about 512 x 512 pixels and the synthesis procedure could take up to an hour on a Nvidia K40 GPU (depending on the exact image size and the stopping criteria for the gradient descent).
