Image Super-Resolution: DBPN


This article is a translation of the 2018 CVPR paper "Deep Back-Projection Networks for Super-Resolution".

Code: GitHub

Features: unlike purely feed-forward networks, it introduces back-projection (error feedback) into the network.

Result: state-of-the-art performance, especially at large scale factors such as 8x.

Summary:

The recently proposed feed-forward network architectures learn a representation of the low-resolution (LR) input and a nonlinear mapping from LR to high-resolution (HR). However, this approach does not fully address the mutual dependencies between LR and HR images. We present Deep Back-Projection Networks (DBPN), which exploit iterative up- and down-sampling layers to provide an error feedback mechanism at each stage. We build mutually connected up- and down-sampling modules, each representing different kinds of image degradation and HR components. We show that connecting the features from the up- and down-sampling stages improves the SR results, achieving the best results for large scale factors such as 8x.

1. Introduction

Single-image SR is an ill-posed inverse problem whose goal is to recover an HR image from an LR image. The typical way to build HR images today is to learn an LR-to-HR mapping implemented by a deep network. These networks compute a sequence of feature maps, with one or more upsampling layers at the end to increase the resolution and finally construct the HR image. In contrast to such purely feed-forward approaches, human vision uses feedback connections to guide the task at hand. Perhaps due to the lack of such feedback, current feed-forward-only SR networks have difficulty representing the LR-to-HR relation, especially for large scale factors.

On the other hand, feedback connections were used effectively in an earlier SR algorithm, iterative back-projection. It iteratively computes the reconstruction error and then uses it to refine the intensities of the HR image. Although it improves image quality, the results still suffer from ringing and checkerboard artifacts. In addition, the method is sensitive to parameter choices such as the number of iterations and the blur operator, so the results vary considerably.

Influenced by the paper "Improving resolution by image registration", we build an end-to-end network based on iterative up- and down-sampling: Deep Back-Projection Networks (DBPN). Our approach is demonstrated successfully on large scale factors (Figure 1). Our contributions are as follows:

(1) Error feedback. We propose an iterative error feedback mechanism that computes up- and down-projection errors to achieve better results. The projection errors are used to refine the features in the earlier layers; details are given in Section 3.

(2) Mutually connected up- and down-sampling stages. A feed-forward structure can be regarded as a mapping that only represents the transformation from the input features to the output space. This is insufficient for the LR-to-HR mapping, especially for large scale factors, because of the limited features available in the LR space. Our network therefore not only uses upsampling layers to generate a variety of HR features, but also maps them back to the LR space with downsampling layers. This connection is shown in Figure 2. The alternation between upsampling (blue boxes) and downsampling (gold boxes) represents the mutual relation between the LR and HR images.

Figure 2

Figure 2. Comparison of deep SR networks. (a) Predefined upsampling (e.g., SRCNN, VDSR, DRRN) commonly uses a conventional interpolation method, such as bicubic, to upscale the LR input image before it enters the network. (b) Single upsampling (e.g., FSRCNN, ESPCN) propagates features in LR space and builds the SR image in the final step. (c) Progressive upsampling (e.g., LapSRN) uses a Laplacian pyramid network to progressively predict SR images. (d) Our proposed iterative up- and down-sampling (DBPN) uses mutually connected up- and down-sampling modules at different depths to obtain HR features at various levels.

(3) Deep concatenation. Our network represents different types of image degradation and HR components. This allows the network to reconstruct the HR image from a deep concatenation of the HR feature maps. Unlike other networks, our reconstruction directly uses the different types of LR-to-HR features without propagating them through further sampling layers (red arrows in Figure 2).

(4) Improvement with dense connections. We add dense connections in each up- and down-sampling stage (as in DenseNet: Densely Connected Convolutional Networks) to encourage feature reuse and improve accuracy.

2. Related work

2.1 Image super-resolution using deep networks

As shown in Figure 2, deep SR networks can be divided into four types.

(a) Predefined upsampling mainly uses interpolation as the upsampling operator to produce a middle-resolution (MR) image. SRCNN was the first to use this strategy, learning a nonlinear mapping from the MR image to the HR image with a simple convolutional network. It was followed by architectures using residual learning and recursive layers. However, because this approach introduces an MR image, it may also introduce new noise.

(b) Single upsampling offers an efficient way to increase the spatial resolution; FSRCNN and ESPCN are the main examples. These methods have been shown to be effective at increasing spatial resolution and can replace the predefined operators. However, due to limited network capacity, they cannot learn the full mapping. The winner of the NTIRE 2017 challenge, EDSR, also belongs to this type; however, it requires a large number of filters in each layer and a long training time, around eight days. These issues motivate a lightweight network that can effectively preserve the HR components.

(c) Progressive upsampling was recently proposed in LapSRN. It progressively reconstructs multiple SR images at different scales in a feed-forward network. In short, this network is a stack of single-upsampling networks that relies only on limited LR features. Because of this, even our shallow networks outperform LapSRN at large scale factors (e.g., 8x).

(d) Iterative up- and down-sampling is proposed in this paper. We focus on increasing the sampling rate of the SR features at different depths and distributing the computation of reconstruction errors across the stages. This scheme enables the network to preserve the HR components by learning various up- and down-sampling operators while generating deeper features.

2.2 Feedback Networks

Rather than learning an input-to-output mapping in a single step, this approach makes the prediction over multiple steps, which gives the network a self-correcting procedure. Feedback procedures have been adopted in many computer vision tasks.

Some examples of feedback networks: for human pose estimation, Carreira et al. [3] proposed an iterative error feedback that iteratively estimates and applies a correction to the current estimate. PredNet is an unsupervised recurrent network that predictively codes future frames by recursively feeding the predictions back into the model. For image segmentation, Li et al. [29] learn implicit shape priors and use them to improve the prediction. However, to our knowledge, feedback procedures have not been applied to SR.

2.3. Adversarial training

For example, generative adversarial networks (GANs) trained with adversarial losses have been applied to a variety of reconstruction problems. For the SR task, Johnson et al. describe a perceptual loss based on high-level features extracted from a pre-trained network. SRGAN, which can be classified as a single-upsampling method, proposes to produce photo-realistic images close to the natural image manifold by combining an adversarial loss with a content loss based on the Euclidean distance between feature maps extracted from VGG19; SRResNet is its generator trained with MSE only.

Our network could also be extended with adversarial training. Here, however, we optimize the network only with an MSE objective, so we compare DBPN with the equally MSE-optimized SRResNet rather than with a DBPN trained with adversarial losses.

2.4 Back-projection

Back-projection is an efficient iterative procedure for minimizing the reconstruction error. It was originally designed for multiple LR input images; when only one LR image is available, the update can be performed by upsampling the LR image with multiple upsampling operators and iteratively computing the reconstruction error. Back-projection has been shown to improve the quality of SR images, and iterative projection procedures have been proposed to refine high-frequency texture details, but the initialization needed to reach the optimal solution is unclear. Moreover, most previous work requires constant, non-trainable predefined parameters, such as the blur operator and the number of iterations.
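
To make the classic procedure concrete, here is a minimal sketch of single-image iterative back-projection, with bicubic resampling standing in for the unknown blur and decimation operators; the function name and parameters are illustrative, not taken from any particular implementation.

```python
import torch
import torch.nn.functional as F

def iterative_back_projection(lr, scale=4, iters=10):
    """Classic single-image iterative back-projection (sketch).
    lr: LR image tensor of shape (1, C, h, w); bicubic resampling is an
    assumed stand-in for the true blur/decimation operator."""
    # Initial HR estimate: plain bicubic upscaling of the LR input.
    hr = F.interpolate(lr, scale_factor=scale, mode="bicubic", align_corners=False)
    for _ in range(iters):
        # Simulate the degradation of the current HR estimate.
        lr_sim = F.interpolate(hr, scale_factor=1 / scale, mode="bicubic", align_corners=False)
        # Reconstruction error in LR space.
        err = lr - lr_sim
        # Back-project the error into HR space and update the estimate.
        hr = hr + F.interpolate(err, scale_factor=scale, mode="bicubic", align_corners=False)
    return hr.clamp(0, 1)

# Usage: sr = iterative_back_projection(torch.rand(1, 3, 32, 32), scale=4)
```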

To extend this algorithm, we construct an end-to-end trainable architecture and focus on learning the nonlinear relation between LR and HR images using mutually connected up- and down-sampling modules. The relation between the HR and LR images is constructed by iterative up- and down-projection units: the up-projection generates HR features, and the down-projection projects them back into the LR space, as shown in Figure 2d. This scheme enables the network to preserve the HR components by learning various up- and down-sampling operators and to generate deeper features that capture numerous LR and HR components.

3. Deep Back-projection Networks

Let I_h and I_l denote the HR and LR images, with sizes (M_h x N_h) and (M_l x N_l) respectively, where M_l < M_h and N_l < N_h. The main building block of the DBPN architecture is the projection unit, which is trained (as part of the SR system) either to map an LR feature map to an HR feature map (up-projection) or to map an HR feature map to an LR feature map (down-projection).

3.1 Projection Units

The up-projection unit is defined as follows:

scale up:            H_0^t = (L^(t-1) * p_t) ↑_s
scale down:          L_0^t = (H_0^t * g_t) ↓_s
residual:            e_t^l = L_0^t - L^(t-1)
scale residual up:   H_1^t = (e_t^l * q_t) ↑_s
output feature map:  H^t = H_0^t + H_1^t

Here * denotes the spatial convolution operator, ↑_s and ↓_s denote up- and down-sampling by the scale factor s, and p_t, g_t, q_t are (de)convolutional layers at stage t.

This projection unit takes the previously computed LR feature map L^(t-1) as input, maps it to an HR feature map H_0^t, and then attempts to map it back to the LR space (this is the back-projection), giving L_0^t. The residual (difference) e_t^l between the reconstructed LR feature map and the original one is then mapped to HR space again, producing H_1^t. The final output of the unit is the sum of the two HR feature maps, H^t = H_0^t + H_1^t.

The down-projection unit is defined very similarly, except that its task is to map an HR feature map back to an LR feature map:

scale down:            L_0^t = (H^t * g'_t) ↓_s
scale up:              H_0^t = (L_0^t * p'_t) ↑_s
residual:              e_t^h = H_0^t - H^t
scale residual down:   L_1^t = (e_t^h * q'_t) ↓_s
output feature map:    L^t = L_0^t + L_1^t

We alternate up- and down-projection to organize a series of projection units. These units can be understood as a self-correcting procedure: the projection error is fed back to the sampling layers, and the reconstruction is refined by iteratively passing the projection error back and forth.

The projection units use large filters such as 8x8 and 12x12. In other networks, large filters are usually avoided because they slow convergence and may lead to sub-optimal solutions. However, the iterative use of our projection units lets the network overcome this limitation and perform well on large scaling factors even with shallow networks.
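
As an illustration of how such projection units could be implemented, here is a minimal PyTorch sketch of an up-projection unit and the corresponding down-projection unit, using the 4x setting described in Section 4.1 (8x8 kernels, stride 4, padding 2). The module and variable names are our own; this is a sketch of the idea, not the authors' released code.

```python
import torch
import torch.nn as nn

class UpProjection(nn.Module):
    """Up-projection unit: LR feature map -> HR feature map with error feedback."""
    def __init__(self, channels, kernel=8, stride=4, padding=2):
        super().__init__()
        self.up1 = nn.Sequential(nn.ConvTranspose2d(channels, channels, kernel, stride, padding), nn.PReLU())
        self.down = nn.Sequential(nn.Conv2d(channels, channels, kernel, stride, padding), nn.PReLU())
        self.up2 = nn.Sequential(nn.ConvTranspose2d(channels, channels, kernel, stride, padding), nn.PReLU())

    def forward(self, l):
        h0 = self.up1(l)    # tentative HR feature map
        l0 = self.down(h0)  # back-project to LR space
        e = l0 - l          # projection error (LR residual)
        h1 = self.up2(e)    # map the error back to HR space
        return h0 + h1      # corrected HR feature map

class DownProjection(nn.Module):
    """Down-projection unit: HR feature map -> LR feature map with error feedback."""
    def __init__(self, channels, kernel=8, stride=4, padding=2):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(channels, channels, kernel, stride, padding), nn.PReLU())
        self.up = nn.Sequential(nn.ConvTranspose2d(channels, channels, kernel, stride, padding), nn.PReLU())
        self.down2 = nn.Sequential(nn.Conv2d(channels, channels, kernel, stride, padding), nn.PReLU())

    def forward(self, h):
        l0 = self.down1(h)  # tentative LR feature map
        h0 = self.up(l0)    # back-project to HR space
        e = h0 - h          # projection error (HR residual)
        l1 = self.down2(e)  # map the error back to LR space
        return l0 + l1      # corrected LR feature map
```

With this 4x setting, a 32x32 feature map is mapped to 128x128 by the up-projection unit and back to 32x32 by the down-projection unit, so the residual subtractions line up.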

3.2 Dense projection units

DenseNet has shown that dense connections mitigate the vanishing-gradient problem, produce improved features, and encourage feature reuse. Here we improve DBPN by introducing dense connections into the projection units, and call the result dense DBPN (D-DBPN).

Unlike the original DenseNets, we avoid dropout and batch normalization, which are not suitable for SR because they remove the range flexibility of the features. Instead, we use a 1x1 convolution as feature pooling and dimension reduction before each projection unit.

In D-DBPN, the input to each unit is the concatenation of the outputs of all previous units: the dense inputs to the up- and down-projection units are generated by merging all earlier outputs at the corresponding resolution (Figure 4). This enhancement allows us to generate the feature maps effectively, as shown in the experimental results.
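
Under the same assumptions, a dense up-projection stage could be sketched as below: the LR feature maps of all previous units are concatenated and passed through a 1x1 bottleneck convolution before the hypothetical UpProjection module from the previous sketch; a dense down-projection stage would mirror this.

```python
import torch
import torch.nn as nn

class DenseUpProjection(nn.Module):
    """Dense up-projection: concatenate all previous LR feature maps,
    reduce channels with a 1x1 convolution, then apply an up-projection unit."""
    def __init__(self, in_channels, channels, kernel=8, stride=4, padding=2):
        super().__init__()
        # 1x1 convolution acts as feature pooling / dimension reduction.
        self.bottleneck = nn.Sequential(nn.Conv2d(in_channels, channels, 1), nn.PReLU())
        self.proj = UpProjection(channels, kernel, stride, padding)  # from the previous sketch

    def forward(self, lr_feats):
        # lr_feats: list of LR feature maps produced by all previous units.
        x = torch.cat(lr_feats, dim=1)
        return self.proj(self.bottleneck(x))
```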

3.3 Network Architecture

The D-DBPN structure is shown in Figure 5. It can be divided into three parts: initial feature extraction, projection, and reconstruction. Below, conv(f, n) denotes a convolutional layer, where f is the filter size and n is the number of filters.

1. Initial feature extraction. We construct the initial LR feature map from the input using conv(3, n_0). A conv(1, n_R) layer is then used to reduce the dimension before entering the projection units, where n_0 is the number of filters used in the initial LR feature extraction and n_R is the number of filters used in each projection unit.

2. Back-projection stages. The initial feature extraction is followed by a sequence of projection units that alternately construct LR and HR feature maps, L^t and H^t; in the dense variant, each unit has access to the outputs of all previous units.

3. Reconstruction. Finally, the target HR image is reconstructed as I_sr = f_Rec([H^1, H^2, ..., H^T]), where f_Rec is a conv(3, 3) layer used for reconstruction and [H^1, H^2, ..., H^T] is the concatenation of the HR feature maps produced by each up-projection unit.

Because of these block definitions, our network architecture is modular: it is easy to define and train networks with different numbers of stages, controlling the depth. For a network with T stages, we have the initial extraction (2 layers), then T up-projection units and T - 1 down-projection units, each with 3 layers, followed by the reconstruction (1 layer). For the dense variant, however, we add a conv(1, n_R) layer in each projection unit, except for the first three units.
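
Putting the pieces together, a schematic (non-dense) DBPN forward pass might look like the sketch below. The defaults n0 = 256, nR = 64 and T = 6 (the L setting) are only illustrative, and the layer layout is an assumed reading of the description above rather than the official implementation; UpProjection and DownProjection are the hypothetical modules sketched in Section 3.1.

```python
import torch
import torch.nn as nn

class DBPN(nn.Module):
    """Schematic DBPN: initial feature extraction, T up-projection and T - 1
    down-projection units, then reconstruction from the concatenated HR maps."""
    def __init__(self, n0=256, nr=64, T=6, kernel=8, stride=4, padding=2):
        super().__init__()
        self.feat0 = nn.Sequential(nn.Conv2d(3, n0, 3, padding=1), nn.PReLU())  # conv(3, n0)
        self.feat1 = nn.Sequential(nn.Conv2d(n0, nr, 1), nn.PReLU())            # conv(1, nR)
        self.ups = nn.ModuleList([UpProjection(nr, kernel, stride, padding) for _ in range(T)])
        self.downs = nn.ModuleList([DownProjection(nr, kernel, stride, padding) for _ in range(T - 1)])
        self.reconstruct = nn.Conv2d(T * nr, 3, 3, padding=1)  # 3x3 conv over the concatenated HR maps

    def forward(self, lr):
        l = self.feat1(self.feat0(lr))
        hr_maps = []
        for t, up in enumerate(self.ups):
            h = up(l)
            hr_maps.append(h)
            if t < len(self.downs):
                l = self.downs[t](h)
        # Deep concatenation of all HR feature maps, then reconstruction.
        return self.reconstruct(torch.cat(hr_maps, dim=1))
```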

4. Experimental Results

4.1 Implementation and Training Details

In the proposed network, the filter size in the projection units depends on the scale factor. For 2x enlargement, we use 6x6 (de)convolution layers with stride 2 and padding 2. For 4x enlargement, we use 8x8 (de)convolution layers with stride 4 and padding 2. Finally, for 8x enlargement, we use 12x12 (de)convolution layers with stride 8 and padding 2.
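
The correspondence between the scale factor and these hyper-parameters can be sanity-checked with the standard transposed-convolution output-size formula; the short snippet below (the names are ours) confirms that each setting listed above enlarges a feature map by exactly the stated factor.

```python
# (kernel, stride, padding) per scale factor, as listed above.
PROJECTION_PARAMS = {2: (6, 2, 2), 4: (8, 4, 2), 8: (12, 8, 2)}

def deconv_output_size(size, kernel, stride, padding):
    # Output size of a transposed convolution (no output_padding, dilation 1).
    return (size - 1) * stride - 2 * padding + kernel

for scale, (k, s, p) in PROJECTION_PARAMS.items():
    out = deconv_output_size(32, k, s, p)  # e.g. a 32-pixel-wide feature map
    assert out == 32 * scale, (scale, out)
    print(f"x{scale}: kernel {k}, stride {s}, padding {p} -> 32 -> {out}")
```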

We initialize the weights following He et al., "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification"; see that paper for the exact computation. All (de)convolution layers are followed by parametric rectified linear units (PReLUs).

We train all networks on the DIV2K, Flickr, and ImageNet datasets, without data augmentation. To produce the LR images, the HR images are downscaled with bicubic interpolation at the given scale factor. Each batch contains 20 LR patches of size 32x32; the HR patch size depends on the scale factor. The learning rate is 1e-4 for all layers and is divided by 10 every 5x10^5 iterations, for 10^6 iterations in total. We optimize with Adam using a momentum factor of 0.9 and weight decay of 1e-4. All experiments were implemented with Caffe and MATLAB R2017a on NVIDIA TITAN X GPUs.
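
As a rough illustration only, the training recipe above could be mirrored in PyTorch roughly as follows; the paper used Caffe, DBPN refers to the schematic model sketched in Section 3.3, and the random tensors merely stand in for real LR/HR patch batches.

```python
import torch

model = DBPN(T=2)  # small T just to keep this demo light
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), weight_decay=1e-4)
# Learning rate divided by 10 every 5e5 iterations, 1e6 iterations in total.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500_000, gamma=0.1)
criterion = torch.nn.MSELoss()

for it in range(1_000_000):
    # Stand-in batch: 20 random 32x32 LR patches with matching 4x HR patches.
    lr_patch = torch.rand(20, 3, 32, 32)
    hr_patch = torch.rand(20, 3, 128, 128)
    loss = criterion(model(lr_patch), hr_patch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```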

4.2. Model Analysis

Depth analysis. Based on the original DBPN we build several networks: S (T = 2), M (T = 4), and L (T = 6). See the figures for the effect of depth.

Number of parameters. Trade-off between performance and the number of model parameters. The SS network is a lightweight version of the S network (T = 2).

Deep concatenation. Each projection unit contributes to the reconstruction step by constructing different HR components.

Dense connection. D-DBPN-L adds dense connections to the L network to demonstrate how dense connections improve performance.

4.3 Comparison with the State of the Art

We compare with state-of-the-art methods: A+, SRCNN, FSRCNN, VDSR, DRCN, DRRN, LapSRN, and EDSR, on five test datasets: Set5, Set14, BSDS100, Urban100, and Manga109.

5. Conclusion

We presented Deep Back-Projection Networks for single image super-resolution (SISR). Unlike previous feed-forward methods that predict the SR image directly, our network uses multiple mutually connected up- and down-sampling stages to enrich the SR features, feeds projection errors from different depths back to refine the sampling results, and then accumulates the self-corrected features from each upsampling stage to create the SR image. We use the error feedback from the up- and down-scaling steps to guide the network towards a better result. The results demonstrate the effectiveness of the proposed network compared with state-of-the-art methods. Moreover, the proposed network outperforms the state of the art at large scale factors, such as 8x enlargement.

Note: This is my first contact with SR and only a rough translation; please point out any errors.
