Earlier sections described some simple applications of PyTorch in deep learning; this section explains how to use PyTorch for style transfer.

Basic Knowledge
numpy.array()
Converts a nested sequence, or any object that exposes an __array__ method, into an array.
array.astype()
Casts an array to the specified data type.
tensor.squeeze()
If dim is not specified, removes every dimension of size 1 from the tensor; if dim is specified, removes that dimension only when its size is 1.
tensor.unsqueeze()
Inserts a dimension of size 1 at the specified position.
tensor.type()
Returns the data type of the tensor when called without arguments; otherwise casts the tensor to the specified data type.
tensor.mean()
If a dimension is specified, computes the mean along that dimension and returns a tensor; otherwise computes the mean over all elements and returns a float.
tensor.mm()
Computes the matrix product of two tensors and returns a tensor.
tensor.clamp()
Clamps every element of the tensor into the range [min, max].
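To make these concrete, here is a minimal sketch exercising each of the operations above (the shapes and values are illustrative, not from the article):

import numpy as np
import torch

a = np.array([[1, 2], [3, 4]])        # build an array from a nested sequence
a = a.astype(float)                   # cast to float64

t = torch.from_numpy(a)               # 2 x 2 tensor
t = t.unsqueeze(0)                    # insert a size-1 dim -> 1 x 2 x 2
t = t.squeeze()                       # remove all size-1 dims -> 2 x 2

print(t.type())                       # 'torch.DoubleTensor'
print(t.mean())                       # mean over all elements: 2.5
print(torch.mm(t, t))                 # 2 x 2 matrix product
print(t.clamp(1.5, 3.5))              # elements clamped into [1.5, 3.5]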
Style Transfer in Practice
The goal is actually very clear: we want to merge two images, so we need to define what "merging" means. The fused image should be similar to the content image in content, and similar to the style image in style. This tells us what to do: quantify the difference between the fused image and the content image and reduce it as much as possible, and likewise quantify the difference in style between the fused image and the style image and reduce that too. In this way the goal becomes something we can optimize.
How do we define the content difference? A very simple idea is to compare the two images pixel by pixel, i.e. take their difference. Since a plain difference can be positive or negative, we square it so that the difference is always non-negative. We could also take the absolute value, but the absolute value is not differentiable everywhere, which matters for gradient-based optimization; if that point is unclear, it does not matter, just remember that the usual practice is to use the square.
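In code this is just a mean squared error. A minimal sketch, where f_target and f_content stand for feature maps of the fused and content images (the names and shapes here are mine, for illustration):

import torch

f_target = torch.randn(1, 64, 32, 32)   # features of the fused image (illustrative)
f_content = torch.randn(1, 64, 32, 32)  # features of the content image (illustrative)

# mean of the squared element-wise difference
content_loss = torch.mean((f_target - f_content) ** 2)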
How do we define the style difference? This is the hard part, and it is the innovation proposed in the original paper: the Gram matrix is introduced to measure style difference. I will try to explain it in plain language rather than in mathematical terms.
How is the Gram matrix defined? First, its size is determined by the depth (number of channels) C of the feature map: the Gram matrix is C x C. Then, what is each element Gram(i, j)? Take the i-th and j-th channels of the feature map; each is an H x W matrix. Multiply the two matrices element-wise and sum the results, and that sum is Gram(i, j). Computing this for every pair of channels fills in the whole matrix. Each element of the Gram matrix therefore measures how strongly a pair of feature-map channels co-occur, and this set of correlations is what we define as the style.
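A minimal sketch of this computation, assuming a feature map of shape (C, H, W); flattening each channel into a row vector turns the pairwise multiply-and-sum into a single matrix product, which is also how the training loop below computes it:

import torch

C, H, W = 64, 32, 32
f = torch.randn(C, H, W)          # feature map with illustrative values

flat = f.view(C, H * W)           # each row is one flattened channel
gram = torch.mm(flat, flat.t())   # gram[i, j] = sum over pixels of channel i * channel j
print(gram.size())                # torch.Size([64, 64]), i.e. C x C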
Author: Sherlockliao. Link: https://www.jianshu.com/p/8f8fc2aa80b3. Source: Jianshu.
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.
import torch
import torch.nn as nn
from torch.autograd import Variable
import torchvision
from torchvision import transforms, models
from PIL import Image
import argparse
import numpy as np
import os
Global variables: check whether a GPU is available and choose the tensor type accordingly.
use_gpu = torch.cuda.is_available()
dtype = torch.cuda.FloatTensor if use_gpu else torch.FloatTensor
Define the image-loading function, which converts a PIL image into a tensor.
def load_image(image_path, transform=None, max_size=None, shape=None):
    image = Image.open(image_path)

    if max_size is not None:
        # image.size is a (width, height) sequence
        image_size = image.size
        # convert it to a float array so the scaling below keeps fractions
        size = np.array(image_size).astype(float)
        # scale so that the longer side equals max_size
        size = max_size / max(size) * size
        image = image.resize(size.astype(int), Image.ANTIALIAS)

    if shape is not None:
        image = image.resize(shape, Image.LANCZOS)

    # a transform including ToTensor() must be provided to get a 4D tensor
    if transform is not None:
        image = transform(image).unsqueeze(0)

    # copy to the GPU if one is available
    return image.type(dtype)
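A usage sketch (the transform matches the one defined in main() below; the output shape depends on the input image):

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])

img = load_image('/home/content.jpg', transform, max_size=400)
print(img.size())   # a 4D tensor, e.g. torch.Size([1, 3, 300, 400])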
Define the VGG19 model; in the forward pass it extracts the convolution features of layers 0, 5, 10, 19, and 28.
class VGGNet(nn.Module):
    def __init__(self):
        super(VGGNet, self).__init__()
        self.select = ['0', '5', '10', '19', '28']
        self.vgg19 = models.vgg19(pretrained=True).features

    def forward(self, x):
        features = []
        # name is a str; x is a Variable
        for name, layer in self.vgg19._modules.items():
            x = layer(x)
            if name in self.select:
                features.append(x)
        return features
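A quick check of what forward returns, using a random input (the input size here is illustrative; the channel counts follow from the VGG19 architecture):

vgg = VGGNet()
if use_gpu:
    vgg = vgg.cuda()

x = Variable(torch.randn(1, 3, 224, 224).type(dtype))
feats = vgg(x)
print(len(feats))                   # 5 feature maps, one per selected layer
print([f.size(1) for f in feats])   # channel counts: [64, 128, 256, 512, 512]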
Define the main function: extract the content and style features at the five convolution layers, and compute content_loss and style_loss respectively.
def main(config):
    # define the image transform; ToTensor() must be included
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])

    # load the content and style images; resize the style image to the content image's size
    content = load_image(config.content, transform, max_size=config.max_size)
    style = load_image(config.style, transform, shape=[content.size(2), content.size(3)])

    # clone the content image as the target; it is the final output and needs gradients
    target = Variable(content.clone(), requires_grad=True)
    optimizer = torch.optim.Adam([target], lr=config.lr, betas=[0.5, 0.999])

    vgg = VGGNet()
    if use_gpu:
        vgg = vgg.cuda()

    for step in range(config.total_step):
        # compute the 5 feature maps for each of the three images
        target_features = vgg(target)
        content_features = vgg(Variable(content))
        style_features = vgg(Variable(style))

        content_loss = 0.0
        style_loss = 0.0
        for f1, f2, f3 in zip(target_features, content_features, style_features):
            # compute content_loss
            content_loss += torch.mean((f1 - f2)**2)

            n, c, h, w = f1.size()
            # reshape the features into 2D matrices and multiply to get the Gram matrices
            f1 = f1.view(c, h * w)
            f3 = f3.view(c, h * w)
            f1 = torch.mm(f1, f1.t())
            f3 = torch.mm(f3, f3.t())

            # compute style_loss
            style_loss += torch.mean((f1 - f3)**2) / (c * h * w)

        # compute the total loss
        loss = content_loss + style_loss * config.style_weight

        # backpropagate and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (step+1) % config.log_step == 0:
            print('Step [%d/%d], Content Loss: %.4f, Style Loss: %.4f'
                  % (step+1, config.total_step, content_loss.data[0], style_loss.data[0]))

        if (step+1) % config.sample_step == 0:
            # save the generated image
            denorm = transforms.Normalize((-2.12, -2.04, -1.80), (4.37, 4.46, 4.44))
            img = target.clone().cpu().squeeze()
            img = denorm(img.data).clamp_(0, 1)
            torchvision.utils.save_image(img, 'output-%d.png' % (step+1))
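A note on the denormalization constants (my reading; the article does not derive them): they invert the earlier Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)), because undoing x' = (x - mean) / std is the same as normalizing again with mean' = -mean/std and std' = 1/std:

mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
print([round(-m / s, 2) for m, s in zip(mean, std)])  # [-2.12, -2.04, -1.8]
print([round(1 / s, 2) for s in std])                 # [4.37, 4.46, 4.44]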
Get the parameters from the command line.
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--content', type=str, default='/home/content.jpg')
    parser.add_argument('--style', type=str, default='/home/style.jpg')
    parser.add_argument('--max_size', type=int, default=400)  # the default value was truncated in the original; 400 is assumed
    parser.add_argument('--total_step', type=int, default=5000)
    parser.add_argument('--log_step', type=int, default=10)
    parser.add_argument('--sample_step', type=int, default=1000)
    parser.add_argument('--style_weight', type=float, default=100)
    parser.add_argument('--lr', type=float, default=0.003)
    config = parser.parse_args()
    print(config)
    main(config)
The experimental results are as follows: