As a programmer who has risen above vulgar pursuits, idle over the Spring Festival holiday, I decided to do something interesting to kill time and happened to come across this paper: A Neural Algorithm of Artistic Style, which covers style transfer with convolutional neural networks. Isn't this the research direction of Kristen Stewart of "Twilight"?! Even a Hollywood actress has started working on artificial intelligence and publishing papers, which shows how hot the field is!
The paper describes how to use a deep convolutional neural network to turn an ordinary photo into a painting in an artistic style (such as Van Gogh's The Starry Night). It can be seen as a revolution brought by DL (deep learning) to the field of NPR (non-photorealistic rendering), and it is not hard to imagine that DL will drive more and more cross-domain revolutions like this one.
Paper: A Neural Algorithm of Artistic Style
Project: https://github.com/muyiguangda/neuralstyle
Algorithm analysis
(Readers who are not interested in the algorithm can skip this part and go straight to the experimental results at the end.)
"Overall process"
In the figure above, a corresponds to the layer named conv1_1, b to conv2_1, and c, d, e to conv3_1, conv4_1, conv5_1. The inputs are a style image and a content image, and the output is a synthesized image. Training is then guided by the synthesized image, but unlike a normal neural network, which trains the weights w and biases b, here it is the pixels of the synthesized image that are trained to reduce the loss function. The paper uses a random-noise image as the initial synthesized image, but starting from the original image is a bit faster.
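To make the "train the pixels, not the weights" idea concrete, here is a minimal NumPy sketch. It is a toy and not the paper's method: the loss is just the squared distance to a fixed target array (a stand-in for the real content and style losses), and `optimize_pixels`, `steps`, and `lr` are names made up for illustration.

```python
import numpy as np

def optimize_pixels(target, steps=200, lr=0.1, seed=0):
    """Gradient descent on the pixels of x to minimize 0.5 * sum((x - target)^2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=target.shape)   # start from a random-noise "image"
    for _ in range(steps):
        grad = x - target               # gradient of the toy loss with respect to the pixels
        x -= lr * grad                  # update the pixels; no weights or biases are touched
    return x

# Hypothetical 4x4 grayscale target used only for this demonstration.
target = np.linspace(0.0, 1.0, 16).reshape(4, 4)
result = optimize_pixels(target)
print(np.abs(result - target).max())    # tiny: the noise image has converged to the target
```

In the real algorithm the gradient comes from backpropagating the content and style losses through the network down to the input image, but the update loop has the same shape.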
First, the authors define two losses: the style loss between the final generated image x and the style image a, and the content loss between x and the content image p. α and β are the parameters that adjust the ratio between the two. The final loss function is the weighted sum of the two, and the final x is obtained by minimizing this total loss.
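Written out with the symbols used in this paragraph, the total loss in the paper has the following form (the subscript names are the paper's notation, reproduced here because the formula image did not survive):

```latex
\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha \, \mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta \, \mathcal{L}_{style}(\vec{a}, \vec{x})
```

A larger ratio of α to β keeps more of the photo's content, while a smaller ratio lets the style dominate.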
The CNN used is VGG19; its 16 convolutional layers and 5 pooling layers are used to generate the features. (A "convolutional layer" here actually means the Conv+ReLU combination.)
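As a quick sanity check on those counts, the sketch below walks through the standard published VGG19 layout and names each layer. The `pool1`-style names and the helper `layer_names` are my own naming assumptions; the convolution names follow the conv1_1 ... conv5_1 pattern used above.

```python
# Standard VGG19 feature-extractor layout: a number is the output channel count
# of one 3x3 convolution (followed by ReLU), "M" is one 2x2 pooling layer.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def layer_names(cfg):
    """Name layers block by block: conv<block>_<index> and pool<block>."""
    names, block, idx = [], 1, 1
    for item in cfg:
        if item == "M":
            names.append(f"pool{block}")
            block += 1      # a pooling layer closes the current block
            idx = 1
        else:
            names.append(f"conv{block}_{idx}")
            idx += 1
    return names

names = layer_names(VGG19_CFG)
n_conv = sum(n.startswith("conv") for n in names)
n_pool = sum(n.startswith("pool") for n in names)
print(n_conv, n_pool)   # 16 5
```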
Of course, other pretrained models can be used as well, such as GoogLeNet v2, ResNet, or VGG16 (the authors use VGG19 as their example).
"Content loss function"

 Here l denotes the features of layer l, p is the original image, and x is the generated image.
 Suppose a layer produces a response $F^l \in \mathbb{R}^{N_l \times M_l}$, where $N_l$ is the number of filters in layer l and $M_l$ is the size of each feature map. $F^l_{ij}$ is the output of the i-th filter at position j in layer l.

 The formula means that, for each layer, the feature maps produced by the original image and the feature maps produced by the generated image are compared element by element, and the squared differences are summed.
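That element-by-element squared difference can be sketched in a few lines of NumPy. The factor of 1/2 follows the paper's definition; the function name `content_loss` and the tiny example matrices are made up for illustration.

```python
import numpy as np

def content_loss(F, P):
    """0.5 * sum of squared differences between two (N_l, M_l) feature matrices.

    F: features of the generated image at one layer.
    P: features of the original (content) image at the same layer.
    """
    return 0.5 * np.sum((F - P) ** 2)

# Toy 2-filter, 2-position feature matrices (hypothetical values).
P = np.array([[1.0, 2.0], [3.0, 4.0]])
F = np.array([[1.0, 0.0], [3.0, 5.0]])
loss = content_loss(F, P)
print(loss)   # 0.5 * (0 + 4 + 0 + 1) = 2.5
```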
The gradient of the content loss function, which is used for gradient descent, is as follows:
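The formula images did not survive here. For reference, this is how the paper writes the content loss and its derivative with respect to the layer-l activations, where $P^l$ denotes the features of the original image p at layer l (a symbol taken from the paper, not from the text above):

```latex
\mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2

\frac{\partial \mathcal{L}_{content}}{\partial F^l_{ij}} =
\begin{cases}
  \left( F^l - P^l \right)_{ij} & \text{if } F^l_{ij} > 0 \\
  0 & \text{if } F^l_{ij} < 0
\end{cases}
```

The case split comes from the ReLU: where the activation is zero, no gradient flows back to the image.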
"Style loss function"
 F is the feature map of the generated image. The formula above means: the entry in row i, column j of the Gram matrix is obtained by flattening the i-th and the j-th feature maps of layer l into one dimension, multiplying them element by element, and summing the result.
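In matrix form this is just the feature matrix times its own transpose. A minimal sketch, assuming F is already flattened to shape (N_l, M_l); the function name `gram_matrix` and the example values are illustrative only:

```python
import numpy as np

def gram_matrix(F):
    """Gram matrix of a (N_l, M_l) feature matrix: G[i, j] = sum_k F[i, k] * F[j, k]."""
    return F @ F.T

# Two flattened feature maps with three positions each (hypothetical values).
F = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])
G = gram_matrix(F)
print(G)
# [[14.  2.]
#  [ 2.  1.]]
```

The result has shape (N_l, N_l) regardless of the image size, which is why the style image does not have to match the content image in size.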
 The above is the style loss function. $N_l$ is the number of feature maps and $M_l$ is the feature map width times height. a is the style image and x is the generated image. G is the Gram matrix of the generated image, A is the Gram matrix of the style image, and $w_l$ is the weight of layer l.
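Putting these symbols together, a sketch of the style loss might look like the following. The 1/(4 N_l² M_l²) normalization follows the paper; the function names, the single-layer example, and its values are assumptions made for illustration.

```python
import numpy as np

def gram_matrix(F):
    """G[i, j] = inner product of flattened feature maps i and j."""
    return F @ F.T

def layer_style_loss(F_generated, F_style):
    """Per-layer term: sum((G - A)^2) / (4 * N_l^2 * M_l^2)."""
    N_l, M_l = F_generated.shape
    G = gram_matrix(F_generated)   # Gram matrix of the generated image
    A = gram_matrix(F_style)       # Gram matrix of the style image
    return np.sum((G - A) ** 2) / (4.0 * N_l**2 * M_l**2)

def style_loss(generated_feats, style_feats, weights):
    """Weighted sum of the per-layer terms, one weight w_l per layer."""
    return sum(w * layer_style_loss(Fg, Fs)
               for w, Fg, Fs in zip(weights, generated_feats, style_feats))

# One layer with 2 feature maps and 2 positions (hypothetical values).
F_x = np.array([[1.0, 0.0], [0.0, 1.0]])
F_a = np.array([[1.0, 1.0], [0.0, 1.0]])
loss = style_loss([F_x], [F_a], weights=[1.0])
print(loss)   # 3 / 64 = 0.046875

# Swapping spatial positions leaves the Gram matrix, and so the loss, unchanged.
print(style_loss([F_x], [F_x[:, ::-1]], weights=[1.0]))   # 0.0
```

The second print shows why the Gram matrix captures style rather than layout: it ignores where in the image each feature fires.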
"Total Loss"
Experimental results
Below are the content image, the style image, and the results and analysis after 10, 100, 500, 1,000, 10,000, and 100,000 iterations:
Original
If the original image is too large, the input to the network becomes too large, which greatly increases the amount of computation (and so the computation time) and can easily make the program unstable, while the final result is not noticeably better. It is recommended to reduce the image size (as long as the image is not distorted); recommended value: 800 PPI x PPI.
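A small helper for choosing the reduced size without distorting the picture could look like this. It is only a sketch: `fit_within` is a hypothetical name, and the default `max_side=800` is my assumption about how to read the recommended value above (longest side of at most 800 pixels), which the text does not state clearly.

```python
def fit_within(width, height, max_side=800):
    """Return (new_width, new_height) with the longer side at most max_side.

    The aspect ratio is preserved, and images that are already small enough
    are left at their original size (no upscaling).
    """
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_within(4000, 3000))   # (800, 600)
print(fit_within(640, 480))     # (640, 480)
```

The returned size can then be passed to whatever image library does the actual resampling.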
"Style map"
The style image does not need to match the size of the content image. It can be cropped as appropriate to keep the part where the style is most prominent.
"Iteration 10 times"
Since the initial input is a white-noise image, the outline of the content image has not yet formed when the number of iterations is small.
"Iteration 100 Times"
The outline of Tiananmen Square starts to emerge.
"Iteration 500 Times"
The result is already close to the final effect: you can see both the shape of Tiananmen Square and the line style and color palette of Van Gogh's "The Starry Night".
"Iteration 1000 Times"
From 500 to 1,000 iterations, the composition of the image no longer changes drastically and has largely stabilized.
"Iterate 500 times, repeat three times"
The calculation was repeated three times with the same images, the same convolutional neural network model, and the same number of iterations (500), yet the three results are clearly different. This is a very interesting point!
(a) (b) (c)
I recently read a book called "Fooled by Randomness", which is mainly about the concept of randomness. Randomness hides unpredictable risk, but it also contains infinite possibilities. Without random mutation, biological evolution might still be stuck at the single-cell stage.
If the computer is just a tool that solves a set of equations, then once the known quantities and the calculation conditions are fixed, the result is the same no matter how many times the calculation is run.
In this example the results differ, which indicates that there must be a random component somewhere in the system.
The random parts of machine learning usually include: 1. shuffling the order of the training samples; 2. stochastic gradient descent; 3. random initialization of the model parameters.
In this example there is one more: the initial white-noise input image is randomly generated.
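The effect of that random starting image is easy to demonstrate. In the sketch below, `white_noise_image` is a hypothetical helper, not code from the project: without a seed, two "identical" runs start from different noise; with a fixed seed, the starting point is reproducible.

```python
import numpy as np

def white_noise_image(height, width, seed=None):
    """Random RGB image with values in [0, 1); seed=None gives fresh randomness each call."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(height, width, 3))

a = white_noise_image(4, 4)            # unseeded: differs from run to run
b = white_noise_image(4, 4)
c = white_noise_image(4, 4, seed=42)   # seeded: identical every time
d = white_noise_image(4, 4, seed=42)

print(np.array_equal(a, b))   # False (two independent noise images)
print(np.array_equal(c, d))   # True
```

Fixing the seed removes this one source of variation; whether the three runs above would then match exactly also depends on the other random components listed.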
"Iteration 10,000 Times"
In the upper-right part of the image, the content is gradually lost and the area turns gray.
Speculated reason: the convolutional neural network contains several pooling layers, which in effect process the image by averaging, and this leads to the loss of edge detail.
Pooling layer:
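To illustrate the speculation, here is a small sketch of 2x2 average pooling applied to a sharp edge. It assumes average pooling, as the text describes, and image dimensions that are even; `avg_pool_2x2` is an illustrative helper, not code from the project.

```python
import numpy as np

def avg_pool_2x2(img):
    """2x2 average pooling with stride 2 on a 2-D array with even height and width."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# A hard vertical edge: left half black (0), right half white (1).
img = np.zeros((4, 6))
img[:, 3:] = 1.0

pooled = avg_pool_2x2(img)
print(pooled)
# [[0.  0.5 1. ]
#  [0.  0.5 1. ]]
```

The 0.5 column is the blurred edge: the pooling window that straddles the boundary averages black and white into gray, and stacking several pooling layers repeats this smoothing at coarser scales.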
So what does the result look like after 100,000 iterations?
"Iteration 100,000 Times"
The image tends toward polarization: the gray area becomes dimmer, the colored area becomes brighter, the boundary between the two becomes sharper, and the transition between them is lost.
"Original" Van Gogh oil painting with deep convolutional neural network What is the effect of 100,000 iterations? A neural style of convolutional neural networks