Deep Convolutional Network Image Style Transfer (1): Requirements Analysis
Taylor Guo, May 5, 2017

Goal
Image Style Transfer
* Lighting changes over the course of a day, different weather conditions, changing seasons
* Style changes to objects that remain compatible with the background
This covers image style transfer only; it does not cover optimizing the matting or image segmentation results.
Comparison of Result Images
Local image distortion
Style "spillover"
Mismatch between the image and the style
Failure due to extreme mismatches: e.g., flowers vs. city scenes, sofas vs. flames
The input image's content differs from that of the reference image: the unrelated content falls outside the expected range
Manual scene segmentation can be used to solve this problem.

Analysis

The style transfer problem as an optimization problem: use a CNN to generate a new image whose content matches the input image and whose style matches the desired style reference image. This can also be interpreted mathematically as an optimization problem, as formulated below.
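A sketch of that formulation, following the objective popularized by Gatys et al. and extended in Luan et al.'s Deep Photo Style Transfer (the layer weights $\alpha_\ell$, $\beta_\ell$ and the global weights $\Gamma$, $\lambda$ are part of this reconstruction, not of the original text):

$$
\mathcal{L}_{\text{total}} = \sum_{\ell} \alpha_\ell \, \mathcal{L}_c^{\ell} + \Gamma \sum_{\ell} \beta_\ell \, \mathcal{L}_s^{\ell} + \lambda \, \mathcal{L}_m
$$

Here $\mathcal{L}_c^\ell$ and $\mathcal{L}_s^\ell$ are the per-layer content and style losses described next, and $\mathcal{L}_m$ is the regularization term against local distortion discussed further below.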
Optimizing the content difference between two images:
The content difference to be minimized: the mean of the squared differences between the two images' filter responses (feature maps) at a given layer of the convolutional neural network.
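Written out as an equation (a reconstruction in the style of Gatys et al.; $F_\ell[\cdot]$ is the layer-$\ell$ feature matrix with $N_\ell$ filters of $D_\ell$ elements each, $O$ the output image, $I$ the content input):

$$
\mathcal{L}_c^{\ell} = \frac{1}{2 N_\ell D_\ell} \sum_{i,j} \left( F_\ell[O] - F_\ell[I] \right)_{ij}^{2}
$$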
Optimizing the style difference between two images:
The style difference to be minimized: the mean of the squared differences between the Gram matrices of the two images' filter responses at a given layer of the convolutional neural network (see the sketch after this list). The Gram matrix is built from the inner products of the vectors representing the image features; geometrically, the inner product of two vectors measures how closely they agree in direction.

Color space affine transformation

1. The output image is locally distorted (exactly why is not fully understood; the distortion is introduced by the style transformation).
2. Since style transfer causes distortion, the transformation is constrained to color space. Global color transformations use a spatially invariant transfer function to handle global color shifts and tone-curve adjustment (raising or lowering contrast). Histogram matching driven by color probability distribution functions can also be used, but these methods cannot handle complex styles. The goal is to allow strong overall changes, such as lighting up the windows of skyscrapers at night, while never distorting local geometry.
3. Local style transfer: a locally affine color transformation constraint is imposed on the new image to prevent distortion. While building the new image, the content is taken from the input image and the style from the reference image, so as to minimize the content-to-content and style-to-style differences while preventing local distortion of the image. The paper introduces the Matting Laplacian cost function of a local linear model as a regularization (correction) term.
4. The Matting Laplacian cost function is a quadratic form: each (vectorized) color channel of the output image multiplied on both sides of the Matting Laplacian matrix computed from the input image.
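A minimal sketch of the Gram-matrix style loss (NumPy; the function names and the 1/(4 N² D²) normalization are common conventions, not taken from the paper's code):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of one layer's feature maps: inner products
    between every pair of flattened channels.

    features: array of shape (C, H, W).
    Returns an array of shape (C, C).
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)  # each row = one flattened channel
    return f @ f.T

def style_loss(feat_out, feat_style):
    """Mean squared difference between the two images' Gram matrices."""
    c, h, w = feat_out.shape
    diff = gram_matrix(feat_out) - gram_matrix(feat_style)
    return np.sum(diff ** 2) / (4.0 * c ** 2 * (h * w) ** 2)
```

The regularization term of item 4, written as a quadratic form (following Luan et al.; $V_c[O]$ is color channel $c$ of the output image flattened to a vector, $\mathcal{M}_I$ the Matting Laplacian of the input image):

$$
\mathcal{L}_m = \sum_{c=1}^{3} V_c[O]^{\top} \, \mathcal{M}_I \, V_c[O]
$$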
Laplacian matting allows any desired object to be seamlessly inserted into a specified background; it can, of course, also composite movie actors into computer-rendered virtual scenes. In practice, matting is the digital image processing technique of separating the part the user cares about (the foreground) from the rest of the image. The famous blue-screen technique, which requires shooting in front of a blue or green background, has serious limitations. Natural image matting methods fall into three categories: sampling-based methods, propagation-based methods, and methods combining sampling and propagation. Sampling-based methods require the user to supply a fairly accurate trimap, then approximate the matting parameters of pixels in the unknown region from nearby samples. Their advantage is speed; their disadvantages are that they need an accurate trimap and produce poor results when sampling is inaccurate, so their robustness is weak. Propagation-based methods typically only require the user to draw simple foreground and background scribbles, then propagate that information to nearby pixels in some way. Their advantage is that only a rough trimap is needed, and they produce good mattes for most images, so their robustness is strong. Their disadvantages are that some prior information is wasted, a good propagation scheme is hard to design, and computation is slow. Combining sampling and propagation is a current research hotspot: it can effectively combine the advantages of the two approaches, but when either the sampling or the propagation component performs badly, it inherits the corresponding disadvantage.
Methods combining sampling and propagation usually cast matting as the minimization of an energy function with two parts: a data energy term and a smoothness energy term. The general mathematical model for the foreground matte layer A of an image is as follows:
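The equation itself is missing here; a standard reconstruction consistent with the description above (the names $E_d$, $E_s$, and the weight $\lambda$ are assumptions) is:

$$
E(A) = E_d(A) + \lambda \, E_s(A)
$$

where $E_d$ is the data term measuring how well $A$ fits the sampled foreground/background colors, and $E_s$ is the smoothness term encouraging nearby pixels to have similar opacity.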
Depending on how the two terms Ed and Es are constructed, many different matting methods based on combined sampling and propagation arise.
The energy function is the loss function: the difference from the foreground and background colors is minimized, the opacity values are obtained by solving this minimization problem, and the foreground and background colors are then recovered. An image can be represented as a composite of multiple layers; each layer has a matte, the matte is sparse, each pixel of the image is affected by only a few layers, and only a few pixels are blends of several different layers. There is a linear transformation between the mattes and the eigenvectors of the Laplacian matrix; this linear transformation matrix describes connected blocks between the different image layers, each composed of a subset of pixels.
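For reference, the closed-form matting formulation (Levin et al.) solves for the matte by minimizing the quadratic form below, subject to the user's foreground/background constraints; this equation is reconstructed from the surrounding description ($L$ is the Matting Laplacian, $\alpha$ the vectorized matte):

$$
E(\alpha) = \alpha^{\top} L \, \alpha
$$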
The Laplacian matrix can be loosely understood as a matrix of correlations (affinities) between pixels.
Laplacian Matrix and the Local Smoothness Energy Function
The matting algorithm assumes each pixel is a linear blend of a foreground color and a background color, with the opacity value alpha as the mixing parameter. For hair, smoke, flames, and other objects that cannot be captured by a single per-pixel label or that are translucent, matting has clear advantages over image segmentation.
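That linear blending assumption is the standard compositing equation (reconstructed here from the description; $I_i$, $F_i$, $B_i$ are the observed, foreground, and background colors at pixel $i$):

$$
I_i = \alpha_i F_i + (1 - \alpha_i) \, B_i, \qquad \alpha_i \in [0, 1]
$$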
References:
A Survey of Natural Image Matting Techniques
Research and Application of Closed-Form Matting
Semantic-segmentation-based enhancement: because the Gram matrix encodes neuron response values over the whole image, it cannot restrict style changes to semantically matching content, which produces style "spillover".
Therefore, a semantic segmentation algorithm is used to generate segmentation masks for both the input image and the style reference image. The mask is added to the input image as an additional channel (see the sketch below), and the style loss of the CNN algorithm is augmented accordingly.
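A minimal sketch of attaching a segmentation mask as an extra input channel (NumPy; the shapes and the random placeholder data are illustrative assumptions):

```python
import numpy as np

# image: (H, W, 3) RGB floats; mask: (H, W) binary mask for one
# semantic class (placeholder data stands in for a real segmenter)
image = np.random.rand(224, 224, 3).astype(np.float32)
mask = (np.random.rand(224, 224) > 0.5).astype(np.float32)

# concatenate the mask as a fourth channel before feeding the network
augmented = np.concatenate([image, mask[..., None]], axis=-1)
print(augmented.shape)  # (224, 224, 4)
```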
Semantic segmentation-enhanced style loss function
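The loss formula itself did not survive extraction; a reconstruction following Luan et al.'s Deep Photo Style Transfer is:

$$
\mathcal{L}_{s+}^{\ell} = \sum_{c=1}^{C} \frac{1}{2 N_{\ell,c}^{2}} \sum_{i,j} \left( G_{\ell,c}[O] - G_{\ell,c}[S] \right)_{ij}^{2}
$$

where $G_{\ell,c}$ is the Gram matrix of the feature maps masked by channel $c$ of the segmentation mask, $F_{\ell,c}[\cdot] = F_{\ell}[\cdot] \, M_{\ell,c}[\cdot]$, with $O$ the output image and $S$ the style reference.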
C is the number of channels in the semantic segmentation mask, and M_{l,c} is channel c of the mask (downsampled to match layer l).

Scheme
VGG-19 Network Architecture
Reference Code Analysis
(Note: the breakdown below is the classic CS231n walkthrough and enumerates the 13-conv-layer VGG-16 configuration; VGG-19 adds one more conv layer to each of the last three blocks, with a similar memory/parameter profile.)

```
INPUT:     [224x224x3]   memory: 224*224*3   = 150K   weights: 0
CONV3-64:  [224x224x64]  memory: 224*224*64  = 3.2M   weights: (3*3*3)*64    = 1,728
CONV3-64:  [224x224x64]  memory: 224*224*64  = 3.2M   weights: (3*3*64)*64   = 36,864
POOL2:     [112x112x64]  memory: 112*112*64  = 800K   weights: 0
CONV3-128: [112x112x128] memory: 112*112*128 = 1.6M   weights: (3*3*64)*128  = 73,728
CONV3-128: [112x112x128] memory: 112*112*128 = 1.6M   weights: (3*3*128)*128 = 147,456
POOL2:     [56x56x128]   memory: 56*56*128   = 400K   weights: 0
CONV3-256: [56x56x256]   memory: 56*56*256   = 800K   weights: (3*3*128)*256 = 294,912
CONV3-256: [56x56x256]   memory: 56*56*256   = 800K   weights: (3*3*256)*256 = 589,824
CONV3-256: [56x56x256]   memory: 56*56*256   = 800K   weights: (3*3*256)*256 = 589,824
POOL2:     [28x28x256]   memory: 28*28*256   = 200K   weights: 0
CONV3-512: [28x28x512]   memory: 28*28*512   = 400K   weights: (3*3*256)*512 = 1,179,648
CONV3-512: [28x28x512]   memory: 28*28*512   = 400K   weights: (3*3*512)*512 = 2,359,296
CONV3-512: [28x28x512]   memory: 28*28*512   = 400K   weights: (3*3*512)*512 = 2,359,296
POOL2:     [14x14x512]   memory: 14*14*512   = 100K   weights: 0
CONV3-512: [14x14x512]   memory: 14*14*512   = 100K   weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512]   memory: 14*14*512   = 100K   weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512]   memory: 14*14*512   = 100K   weights: (3*3*512)*512 = 2,359,296
POOL2:     [7x7x512]     memory: 7*7*512     = 25K    weights: 0
FC:        [1x1x4096]    memory: 4096        weights: 7*7*512*4096 = 102,760,448
FC:        [1x1x4096]    memory: 4096        weights: 4096*4096    = 16,777,216
FC:        [1x1x1000]    memory: 1000        weights: 4096*1000    = 4,096,000

TOTAL memory: 24M * 4 bytes ~= 93MB / image (only forward! ~*2 for bwd)
TOTAL params: 138M parameters
```
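A small sketch that recomputes those per-layer parameter counts from the configuration (plain Python; the layer list is transcribed from the table above):

```python
# (channels_in, channels_out) for each 3x3 conv in the listed config
convs = [(3, 64), (64, 64), (64, 128), (128, 128),
         (128, 256), (256, 256), (256, 256),
         (256, 512), (512, 512), (512, 512),
         (512, 512), (512, 512), (512, 512)]
fcs = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

conv_params = sum(3 * 3 * cin * cout for cin, cout in convs)
fc_params = sum(cin * cout for cin, cout in fcs)
total = conv_params + fc_params

print(f"conv params: {conv_params:,}")  # 14,710,464
print(f"fc params:   {fc_params:,}")    # 123,633,664
print(f"total:       {total:,}")        # ~138M (biases excluded, as in the table)
```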