Let there be Color!: Automatic Image Colorization with Simultaneous Classification (SIGGRAPH 2016)


Recently I came across a colorization paper on arXiv that I found very interesting:

Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification (SIGGRAPH 2016)


Let's look at the results first. As Fig. 1 shows, the method restores color images from grayscale inputs very well.



Motivation:

For textured regions such as sky, grass, leaves, streets, walls, and sea, how can the network exploit both global and local information so that it learns to distinguish them properly?

How can context information be used to tell apart images from different scenes, such as indoor vs. outdoor?

The core question this paper studies is how to make full use of global, local, and semantic context information to train a better network model. Its main points are:

1. It is user-intervention-free: no user input is needed, unlike graph-cut-style methods that rely on user operations;

2. It is an end-to-end network that learns global and local image features simultaneously;

3. It uses the class information of the image (e.g. indoor, outdoor) to guide the network in learning semantic context, so that the network can distinguish images from different scenes while also improving performance;

4. Based on the global features, style can be transferred between different images;

5. The model applies not only to images captured by today's devices, but also to photographs taken decades or even a century ago.


Next, let's look directly at how the paper's network achieves the points above.

Let the figures do most of the talking.


The network framework is shown in Fig. 2, and the network parameters are listed in Table 1.


The entire framework consists of four sub-networks (a minimal data-flow sketch follows the list):

1. the low-level feature network, abbreviated LNet;

2. the mid-level feature network, abbreviated MNet;

3. the global feature network, abbreviated GNet;

4. the colorization network, abbreviated CNet.
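
To make the data flow concrete, here is a minimal PyTorch-style sketch of how I read Fig. 2: the shared LNet processes both the original-resolution input and a fixed 224x224 copy, the MNet and GNet branches then diverge, and GNet also feeds the CLS layer. The layer counts, channel sizes, and the pooling-to-vector step are simplified stand-ins rather than the paper's exact Table 1 configuration; the fusion layer and the colorization decoder are sketched further below.

import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, stride=1):
    # basic block: 3x3 convolution + batch normalization + ReLU
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True))

low_net  = nn.Sequential(conv_bn_relu(1, 64, 2), conv_bn_relu(64, 128),
                         conv_bn_relu(128, 128, 2), conv_bn_relu(128, 256))   # LNet, shared by both branches
mid_net  = nn.Sequential(conv_bn_relu(256, 512), conv_bn_relu(512, 256))      # MNet, only two conv layers
glob_net = nn.Sequential(conv_bn_relu(256, 512, 2), conv_bn_relu(512, 512, 2),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(512, 256))                                  # GNet, ends in a 256-d global vector
cls_head = nn.Linear(256, 205)                                                 # CLS layer, 205 Places categories

gray_full  = torch.rand(1, 1, 256, 256)    # original-resolution grayscale input (the L* channel)
gray_fixed = torch.rand(1, 1, 224, 224)    # fixed-size copy fed to the global branch

local_feat   = mid_net(low_net(gray_full))     # spatial mid-level features for the colorization path
global_feat  = glob_net(low_net(gray_fixed))   # one global feature vector (same low_net weights reused)
class_logits = cls_head(global_feat)           # auxiliary scene classification
print(local_feat.shape, global_feat.shape, class_logits.shape)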


Each of the four sub-networks is described below.

LNet: the low-level convolutional layers form two branches, but the weights of the two branches are shared.

Why two branches? My guess is that the GNet and CNet that follow are different networks and need to be kept separate; the focus of the paper is CNet, and GNet is only auxiliary.

Why does one branch require a fixed input size (e.g. 224x224) while the other does not?

I think it is because GNet is a classification network, AlexNet-like, which classifies well, and because learning semantic context requires fully connected (FC) layers, which in turn force a fixed input size.

CNet, on the other hand, aims to handle images of any size, so it is a fully convolutional network (FCN).

In fact, I think GNet could also be turned into an FCN by applying SPP-net-style ROI pooling, treating the entire input image as the ROI (a small sketch of this idea follows).
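
Just to illustrate that speculation (it is not what the paper actually does): with SPP-style adaptive pooling, the global branch could produce a fixed-length vector from an input of any size, so its FC layers would no longer force a 224x224 input. The module below is a toy sketch under that assumption; the bin sizes and dimensions are arbitrary.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalBranchSPP(nn.Module):
    # pools the feature map into a few fixed grids, so the flattened length
    # does not depend on the input height and width
    def __init__(self, in_ch=256, out_dim=256, bins=(1, 2, 4)):
        super().__init__()
        self.bins = bins
        flat = in_ch * sum(b * b for b in bins)
        self.fc = nn.Linear(flat, out_dim)

    def forward(self, x):                         # x: (N, C, H, W), any H and W
        pooled = [F.adaptive_max_pool2d(x, b).flatten(1) for b in self.bins]
        return self.fc(torch.cat(pooled, dim=1))  # fixed-length global feature

branch = GlobalBranchSPP()
print(branch(torch.rand(1, 256, 56, 56)).shape)   # works for 56x56 feature maps ...
print(branch(torch.rand(1, 256, 80, 45)).shape)   # ... and for any other size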


MNet: calling this a separate network is a bit of a stretch, since it is only two conv layers; the paper's authors presumably name it this way just to describe the architecture clearly.


GNet: consists of several conv and FC layers. Its roles are the following.

It is followed by a CLS (classification) layer that classifies the fixed-size input images.

Why do this?

Because the paper's authors use the Places scene dataset, which comes with category labels (2,448,872 images in 205 categories).

Of course, if your dataset has no such category labels, you can simply remove the CLS layer from the framework and apply it to your own data.

It is also followed by the fusion layer, which lets CNet incorporate the global features.

I think the fusion layer is one of the key points of the framework: it combines features from the different networks so that they can learn complementary things (a small sketch follows below).

(Fig. 3)
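
Here is a minimal sketch of how I understand the fusion: the 256-d global vector is replicated at every spatial position of the mid-level feature map, concatenated with it, and mixed by a learned per-position (1x1) transform. The channel sizes are my reading of the paper and should be treated as assumptions.

import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    def __init__(self, mid_ch=256, glob_dim=256, out_ch=256):
        super().__init__()
        self.mix = nn.Conv2d(mid_ch + glob_dim, out_ch, kernel_size=1)   # per-position mixing

    def forward(self, mid_feat, glob_feat):
        n, _, h, w = mid_feat.shape
        g = glob_feat.view(n, -1, 1, 1).expand(n, glob_feat.shape[1], h, w)   # copy the vector to every pixel
        return torch.relu(self.mix(torch.cat([mid_feat, g], dim=1)))          # fuse local and global information

fuse = FusionLayer()
out = fuse(torch.rand(1, 256, 28, 28), torch.rand(1, 256))
print(out.shape)   # (1, 256, 28, 28): same spatial size as the mid-level features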

CNet: a deconvolution-style (upsampling) network, similar to the decoder of an auto-encoder, that restores the feature maps to the target image.

The target image here is not a color image such as RGB, but the a*b* channels of the CIE L*a*b* color space.

This is a nice touch of the paper. (In fact, the paper also ran comparison experiments against RGB and YUV.)

Learning a*b* instead of L* (the grayscale image itself is the L* channel) not only reduces the learning difficulty for the network, it also leaves the original L* untouched.

Through repeated upsampling, the final output ends up the same size as the target image.

(Deconvolution/upsampling networks are quite popular at the moment.)

Of course, the target image does not have to match the original size; it can be at a lower resolution than the original without hurting the network's performance.
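
A rough sketch of what such a decoder looks like, under my own assumptions about layer counts and channels (they are placeholders, not the paper's Table 1): convolutions interleaved with upsampling until the output reaches the target resolution, ending in two channels for a* and b*, which are then stacked with the untouched input L*.

import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, 2, 3, padding=1), nn.Sigmoid())   # normalized a*b* chrominance in (0, 1)

fused = torch.rand(1, 256, 56, 56)     # fused features for a 224x224 input (stride-4 feature map)
ab = decoder(fused)                    # (1, 2, 224, 224) after the two 2x upsamplings
L = torch.rand(1, 1, 224, 224)         # the grayscale input itself is the L* channel

# Rescale the network output to rough L*a*b* ranges and stack it with the untouched L*;
# the L*a*b* -> RGB conversion can then be done with e.g. skimage.color.lab2rgb.
lab = torch.cat([L * 100.0, ab * 255.0 - 128.0], dim=1)
print(ab.shape, lab.shape)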


At this point the whole framework is clear. So how is its objective function defined, and how is it trained?

(Fig. 4)

The objective function is composed of two losses: the Euclidean (MSE) distance between CNet's prediction and the target image, and a cross-entropy loss for GNet's classification, with a weight alpha controlling the balance between the two.

The authors believe that the CLS layer here not only helps the network learn global features and semantic context, but also, to some extent, eases gradient propagation back through such a large network, making it easier to train.

To speed up convergence (or to get the network to converge at all), the paper mentions optimizing with the currently popular batch normalization together with the Adadelta optimizer. A small sketch of the loss and optimizer setup follows.
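
This is the joint objective as I read it, together with the Adadelta setup. The exact weighting used in the paper is not reproduced here, so the alpha value below is only an assumption, and the tensors are random stand-ins for real network outputs.

import torch
import torch.nn as nn

alpha = 1.0 / 300.0                     # assumed small weight for the classification loss
mse, xent = nn.MSELoss(), nn.CrossEntropyLoss()

pred_ab   = torch.rand(4, 2, 112, 112, requires_grad=True)   # CNet output (stand-in)
target_ab = torch.rand(4, 2, 112, 112)                       # ground-truth a*b* maps
class_logits = torch.rand(4, 205, requires_grad=True)        # GNet CLS output (stand-in)
class_labels = torch.randint(0, 205, (4,))                   # Places scene labels

loss = mse(pred_ab, target_ab) + alpha * xent(class_logits, class_labels)
loss.backward()                         # both the colorization and classification paths receive gradients

# Optimizer mentioned in the post; batch normalization lives inside the conv blocks.
# With a real model this would be: optimizer = torch.optim.Adadelta(model.parameters())
print(float(loss))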


Obviously the network is very large and hard to train: at the very least it needs a large dataset, and your GPU has to be up to the task.

The paper reports that training the network takes three weeks, which is quite something.


Once the network is trained, testing is quite convenient.

So how is style transfer between different images done?

Since LNet has a two-branch structure, with one branch taking the original input and the other taking a fixed-size input,

when the two branches are fed different images, the style of the fixed-size input image gets grafted onto the original input.

This works because GNet learns the global features and semantic context of the fixed-size input image.

Of course, the two images should be at least semantically identical or similar, i.e. their category labels should be the same or close, for the style transfer to be meaningful (see the sketch below).
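
A toy sketch of that trick, with tiny stand-in modules rather than the real architecture: the original-resolution branch sees image A while the fixed-size global branch sees image B, so A gets colorized under B's global context.

import torch
import torch.nn as nn

low_net   = nn.Conv2d(1, 8, 3, padding=1)                      # stand-in for the shared low-level features
glob_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 8))   # stand-in for GNet
fuse      = nn.Conv2d(16, 2, 1)                                # stand-in for fusion + decoder

img_a = torch.rand(1, 1, 256, 256)    # grayscale image to colorize (original-resolution branch)
img_b = torch.rand(1, 1, 224, 224)    # a different image supplying the global "style"

local_a  = low_net(img_a)
global_b = glob_head(low_net(img_b))                           # global context comes from B, not A
g = global_b.view(1, -1, 1, 1).expand(-1, -1, 256, 256)
ab_for_a = fuse(torch.cat([local_a, g], dim=1))                # A colorized under B's context
print(ab_for_a.shape)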


Throughout all of this, the network requires no user interaction, so it can indeed be called user-intervention-free.

In addition, deep models are generally considered to generalize quite well, and grayscale photos from decades ago do not differ much from today's grayscale images,

so the network can certainly also perform well on grayscale photos taken decades ago.


Because the mapping from color to grayscale is not invertible, a given grayscale image can correspond to many different color images; the network in the paper cannot handle such ambiguous cases.



Well, if you have read this far, congratulations, because the author has nothing more to say.



