Basic introduction
ICCV 2017
Fast Image Processing with Fully-Convolutional Networks
Notes
The authors want to build a single neural network model that approximates image processing operators such as style transfer, pencil drawing, dehazing, color and tone adjustment, detail enhancement, and so on. Three aspects are considered: approximation accuracy, running time, and memory consumption.
- A common way to accelerate image processing today is the downsample-evaluate-upsample approach. The main problems with this approach are:
- The main operation is performed on the low-resolution image, which is not conducive to interactive use.
- Because the main computation is restricted to the low-resolution image, accuracy suffers.
The authors instead use a single model to process the image directly.
All operators are approximated using an identical architecture with no per-operator hyperparameter tuning.
There are many accelerated algorithms for specific image processing operators, such as fast median filtering, but the problem is that they are not generic. The more general approach is the downsampling scheme mentioned above.
The entire network is a context aggregation network (CAN); its core is:
\[ l_i^s = \phi\left( \psi^s\left( b_i^s + \sum_j l_j^{s-1} *_{r_s} k_{i,j}^s \right) \right) \]
where \(l_i^s\) is the \(i\)-th feature map of the \(s\)-th layer \(l^s\), \(*_{r_s}\) denotes dilated convolution with dilation rate \(r_s\), \(k_{i,j}^s\) is a 3x3 convolution kernel, \(b_i^s\) is a bias term, \(\psi^s\) is an adaptive normalization function, and \(\phi\) is a pointwise nonlinearity, the leaky ReLU \(\phi(x) = \max(\alpha x, x)\) with \(\alpha = 0.2\).
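As a concrete illustration (my own sketch in PyTorch, not the authors' code), one such layer is just a 3x3 dilated convolution followed by the normalization \(\psi^s\) and the leaky ReLU \(\phi\):

```python
import torch.nn.functional as F

def can_layer(x, weight, bias, dilation, psi):
    """One CAN layer: l^s = phi(psi^s(b^s + sum_j l_j^{s-1} *_{r_s} k_{i,j}^s)).

    x:      input feature maps l^{s-1}, shape (N, C_in, H, W)
    weight: 3x3 kernels k^s,            shape (C_out, C_in, 3, 3)
    bias:   b^s,                        shape (C_out,)
    psi:    the (adaptive) normalization function psi^s
    """
    # Dilated 3x3 convolution; padding = dilation keeps the spatial size fixed,
    # so the network stays fully convolutional at full resolution.
    y = F.conv2d(x, weight, bias, stride=1, padding=dilation, dilation=dilation)
    # Pointwise nonlinearity phi(x) = max(alpha*x, x) with alpha = 0.2 (leaky ReLU).
    return F.leaky_relu(psi(y), negative_slope=0.2)
```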
Regarding batch normalization: adding BN layers to the network helps for style transfer and pencil drawing, but the authors found it does not perform well on the other operators, so they instead propose an adaptation of BN, the adaptive normalization function:
\[ \psi^s(x) = \lambda_s x + \mu_s \, \mathrm{BN}(x) \]
where \(\lambda_s, \mu_s \in \mathbb{R}\) are parameters learned by backpropagation.
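A minimal PyTorch sketch of this adaptive normalization, assuming \(\lambda_s\) and \(\mu_s\) are scalar parameters (the initial values here are my guess):

```python
import torch
import torch.nn as nn

class AdaptiveNorm(nn.Module):
    """psi^s(x) = lambda_s * x + mu_s * BN(x), with lambda_s and mu_s learned scalars."""
    def __init__(self, num_channels):
        super().__init__()
        self.lam = nn.Parameter(torch.tensor(1.0))  # lambda_s; initial value is an assumption
        self.mu = nn.Parameter(torch.tensor(0.0))   # mu_s; initial value is an assumption
        self.bn = nn.BatchNorm2d(num_channels)

    def forward(self, x):
        # Backprop can interpolate between the identity and full batch normalization.
        return self.lam * x + self.mu * self.bn(x)
```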
Training is supervised on input-output image pairs. Several loss functions were tried, and in the end mean squared error turned out to be the best:
\[ \ell(\mathcal{K},\mathcal{B}) = \sum_i \frac{1}{N_i} \left\lVert \hat{f}(I_i;\mathcal{K},\mathcal{B}) - f(I_i) \right\rVert^2 \]
More complex losses did not improve accuracy in the experiments.
To improve the model's ability to handle different resolutions, training images are randomly sampled at resolutions between 320p and 1440p; these images are obtained by random cropping. Training uses Adam, runs for 500k iterations, and takes about a day.
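A rough sketch of a single training step under these settings; only "random resolutions from 320p to 1440p via random crops, Adam, pixel-averaged MSE" comes from the note, while the crop-size sampling and the learning rate are my assumptions:

```python
import random
import torch.nn.functional as F

def train_step(model, optimizer, image, target):
    """One supervised step: regress the network output toward the operator output f(I)."""
    # Random-resolution augmentation: crop both images of the pair to the same random
    # size (roughly 320..1440 px per side); assumes the source images are at least
    # 320 px per side. The exact sampling scheme is a guess.
    _, _, h, w = image.shape
    ch = random.randint(320, min(1440, h))
    cw = random.randint(320, min(1440, w))
    top = random.randint(0, h - ch)
    left = random.randint(0, w - cw)
    image = image[:, :, top:top + ch, left:left + cw]
    target = target[:, :, top:top + ch, left:left + cw]

    # Mean squared error averaged over pixels, i.e. the (1/N_i) ||f_hat - f||^2 term.
    loss = F.mse_loss(model(image), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr is a guess
# then call train_step(...) for ~500k iterations.
```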
- The 10 operators approximated in the experiments are as follows:
- Rudin-Osher-Fatemi: an image restoration model.
- TV-L1 image restoration: another model for restoring images.
- L0 smoothing: image smoothing based on the L0 norm.
- Relative total variation: an operator that extracts image structure by stripping away details.
- Image enhancement by multiscale tone manipulation: enhancement via multi-scale tone manipulation.
- Multiscale detail manipulation based on local Laplacian filtering: manipulates image detail at multiple scales using local Laplacian filters.
- Photographic style transfer from a reference image: image style transfer.
- Dark-channel dehazing: dehazing based on the dark channel prior. The dark channel prior is an observation of statistical significance: summarizing a large number of outdoor haze-free images, in most haze-free image regions that do not contain sky, at least one color channel has pixels with a very low intensity value, almost equal to 0. The dehazing algorithm in that paper is built on this prior (see the sketch after this list).
- Nonlocal dehazing: non-local dehazing.
- Pencil drawing: pencil-drawing stylization.
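To make the dark-channel prior concrete, here is a small sketch (my own illustration, not the paper's dehazing pipeline) that computes the dark channel: the per-pixel minimum over color channels followed by a local minimum filter; the prior says this map is close to zero for haze-free outdoor patches without sky.

```python
import torch.nn.functional as F

def dark_channel(img, patch=15):
    """Dark channel of an RGB image tensor of shape (N, 3, H, W), values in [0, 1].

    For each pixel: take the minimum over the color channels, then the minimum
    over a local patch (min-pooling implemented as -max_pool(-x)).
    """
    per_pixel_min = img.min(dim=1, keepdim=True).values          # (N, 1, H, W)
    pad = patch // 2
    # Minimum filter via negated max pooling; output keeps the input spatial size.
    return -F.max_pool2d(-per_pixel_min, kernel_size=patch, stride=1, padding=pad)
```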
Specific details of the network:
The figure is just a schematic; the actual network is deeper. A circle represents the nonlinearity (leaky ReLU). Except for the first and last layers, which have three channels, all layers have multiple channels; the penultimate layer is mapped to the last layer by a 1x1 convolution, a linear transformation with no nonlinearity.
The structure of CAN32 (d = 10 and w = 32) is shown in the paper.
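Since that table is not reproduced here, the following is a sketch of what CAN32 might look like, reusing the AdaptiveNorm module sketched above; the doubling dilation schedule and the exact layer-by-layer rates are my assumptions based on the context aggregation design and may differ from the paper's table.

```python
import torch.nn as nn

def can32(d=10, w=32):
    """Context aggregation network sketch: d layers, w channels, 3-channel in/out."""
    layers, in_ch = [], 3
    # Dilated 3x3 layers with (assumed) doubling dilation rates: 1, 2, 4, ...
    for s in range(d - 2):
        dilation = 2 ** s
        layers += [nn.Conv2d(in_ch, w, kernel_size=3, padding=dilation, dilation=dilation),
                   AdaptiveNorm(w),     # adaptive normalization, sketched earlier
                   nn.LeakyReLU(0.2)]
        in_ch = w
    # Penultimate layer: ordinary 3x3 convolution (dilation 1).
    layers += [nn.Conv2d(w, w, kernel_size=3, padding=1), AdaptiveNorm(w), nn.LeakyReLU(0.2)]
    # Last layer: 1x1 convolution to 3 channels, no nonlinearity (a linear projection).
    layers += [nn.Conv2d(w, 3, kernel_size=1)]
    return nn.Sequential(*layers)
```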
- About the side-by-side comparison experiments:
- A "plain" network baseline: the network above with all dilated convolutions replaced by ordinary convolutions. The authors say this is to keep the structure as similar as possible.
- Encoder-decoder network: an hourglass-shaped network (a nice adjective) built with reference to U-Net. The main modifications to U-Net are that, to reduce computation and memory, the number of convolution kernels is halved, and the final output is rescaled so that the output image is as large as the input. (I find this modification rather unscientific: if you are going to compare against it, why cripple it before the comparison?)
The paper's stated reason is:
"We found that this is sufficient to get high accuracy and it matches the configuration of the other baselines."
Even so, this baseline achieves almost the same accuracy and is even faster; its main drawback is that it has far more parameters, about two orders of magnitude more.
There is also an FCN-8s baseline. The problem with this model is that it has many parameters and lower accuracy.
- Training and testing of generalization ability (each model is trained twice and then tested on two datasets):
- MIT-Adobe training set and RAISE training set
- MIT-Adobe test set and RAISE test set
The authors also ran experiments on depth and width (width here meaning the number of channels).
Summary
The authors did a lot of work, and the result may also be tied to the particular tasks; in effect one model covers many tasks. The most important feature of the model is that it builds on the ICLR 2016 context aggregation architecture and uses dilated convolution extensively. I had thought about this kind of model before, but did not expect it to substitute for so many operators. Some parts of the experiments are not done very well, such as the U-Net baseline, but the paper is still somewhat enlightening. 2333.