Spatial Transformer Networks


Reference: Jaderberg M, Simonyan K, Zisserman A, et al. Spatial transformer networks[C]//Advances in Neural Information Processing Systems. 2015: 2017-2025.

Summary

Convolutional neural networks (CNNs) have been shown to train powerful classification models, but, like traditional pattern recognition methods, they are still affected by spatial variation in the data. This paper proposes the Spatial Transformer Network (STN), which needs no keypoint annotation and learns to spatially transform and align the data adaptively, driven by the classification task or whatever other task is being trained (the transformations include translation, scaling, rotation, and other geometric transformations). The module can be added to an existing convolutional network to improve classification accuracy when the input data has large spatial variation.

——————
Since my previous work involved face alignment, I was thrilled to see this paper. I keep feeling that I could do something with it.

Algorithm Introduction

    1. Overall algorithm flow

An STN can be divided into three parts: 1) the localisation network, 2) the grid generator, and 3) the sampler. (These terms have no precise Chinese translation, so the English names are kept.) The localisation network computes the parameters $\theta$ of the spatial transformation. The grid generator produces the correspondence $T_\theta$ between locations in the input map $U \in \mathbb{R}^{H \times W \times C}$ and locations in the output map $V \in \mathbb{R}^{H' \times W' \times C}$. The sampler generates the final output map from the input map $U$ and the correspondence $T_\theta$. Flow chart:

[Figure: flow chart of the STN]

1.1 Localisation Network

Its function is to produce the parameters $\theta$ of the spatial transformation through a sub-network (a fully connected or convolutional network followed by a regression layer). The form of $\theta$ depends on the transformation; for a 2D affine transformation, for example, $\theta$ is a 6-dimensional (2x3) output.
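As a rough illustration of the regression layer, here is a minimal sketch in plain Python. The names `localisation_head`, `features`, `weights`, and `bias` are hypothetical, and the zero-weight, identity-bias initialisation is an assumption of this sketch rather than something stated in the text above; it is simply a convenient way to make the module start out as a no-op.

```python
def localisation_head(features, weights, bias):
    """Final regression layer: maps a feature vector to a 2x3 affine theta.

    weights is a 6 x len(features) matrix and bias a length-6 vector
    (both hypothetical names used only in this sketch).
    """
    flat = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    # Reshape the 6-D output into the 2x3 matrix A_theta.
    return [flat[0:3], flat[3:6]]


# Assumed initialisation: zero weights and an identity-transform bias,
# so the layer outputs the identity transform for any input features.
features = [0.3, -1.2, 0.7, 2.0]
weights = [[0.0] * len(features) for _ in range(6)]
bias = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

theta = localisation_head(features, weights, bias)
print(theta)  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

In a real network the features would come from the preceding fully connected or convolutional layers, and the weights would be learned by backpropagation.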

1.2 Parameterised Sampling Grid

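Suppose each pixel of $U$ (which is not limited to the input image and can also be a feature map output by another layer) has coordinates $(x_i^s, y_i^s)$, each pixel of $V$ has coordinates $(x_i^t, y_i^t)$, and the spatial transformation $T_\theta$ is an affine transformation. The relationship between $(x_i^s, y_i^s)$ and $(x_i^t, y_i^t)$ can then be written as:

$$
\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix}
= T_\theta(G_i)
= A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
= \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix}
\begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
$$

Of course, $A_\theta$ can also take other forms, such as a 3D affine transformation or a projective transformation.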
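The affine grid can be sketched in a few lines of plain Python. This is only an illustration: the function name `affine_grid` is made up here, and normalising the target coordinates to the range [-1, 1] is an assumed convention, not something fixed by the text above.

```python
def affine_grid(theta, out_h, out_w):
    """Return the source coords (x_s, y_s) for every pixel of an out_h x out_w target grid.

    Target coordinates are normalised to [-1, 1] (an assumed convention).
    """
    def norm(k, size):
        # Map index 0..size-1 to -1..1; a single-pixel axis maps to 0.
        return 0.0 if size == 1 else 2.0 * k / (size - 1) - 1.0

    grid = []
    for i in range(out_h):
        y_t = norm(i, out_h)
        row = []
        for j in range(out_w):
            x_t = norm(j, out_w)
            # (x_s, y_s)^T = A_theta (x_t, y_t, 1)^T
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
zoom_in = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]  # samples only the central half of U

print(affine_grid(identity, 3, 3)[0])  # top row: x_s runs from -1 to 1, y_s stays at -1
print(affine_grid(zoom_in, 3, 3)[0])   # same row, shrunk towards the centre
```

With the identity parameters every target location maps to itself, while scaling the diagonal by 0.5 makes the output look at a smaller central region of the input, which is how the module can zoom in on an object.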

1.3 Differentiable Image Sampling

Once $T_\theta$ has been computed, $V$ can be obtained from $U$ with the following formula (the derivation is omitted and only the final form is given):

$$
V_i^c = \sum_n^H \sum_m^W U_{nm}^c \, \max(0, 1 - |x_i^s - m|) \, \max(0, 1 - |y_i^s - n|)
$$
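The sampling formula is bilinear interpolation written as a sum over all input pixels. A minimal plain-Python sketch for a single channel follows; the function name `bilinear_sample` is hypothetical, and the sketch assumes that $(x_i^s, y_i^s)$ are given in pixel units, with $m$ indexing columns and $n$ indexing rows.

```python
def bilinear_sample(U, x_s, y_s):
    """Evaluate sum_n sum_m U[n][m] * max(0, 1-|x_s-m|) * max(0, 1-|y_s-n|).

    U is one channel stored as a list of rows; (x_s, y_s) are in pixel
    units (an assumption of this sketch).
    """
    total = 0.0
    for n, row in enumerate(U):
        wy = max(0.0, 1.0 - abs(y_s - n))
        if wy == 0.0:
            continue  # this row is at least one pixel away and contributes nothing
        for m, value in enumerate(row):
            wx = max(0.0, 1.0 - abs(x_s - m))
            total += value * wx * wy
    return total


U = [[0.0, 10.0],
     [20.0, 30.0]]
print(bilinear_sample(U, 1.0, 0.0))  # 10.0, exactly on a pixel
print(bilinear_sample(U, 0.5, 0.5))  # 15.0, the mean of the four neighbours
print(bilinear_sample(U, 5.0, 5.0))  # 0.0, outside the input
```

Only the (at most) four pixels around the sampling location have non-zero weight, so a practical implementation would visit just those instead of looping over the whole map.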
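To let the loss gradients propagate back through the sampler, the formula above is differentiated with respect to $U$, $x^s$, and $y^s$:

$$
\frac{\partial V_i^c}{\partial U_{nm}^c} = \sum_n^H \sum_m^W \max(0, 1 - |x_i^s - m|) \, \max(0, 1 - |y_i^s - n|)
$$

$$
\frac{\partial V_i^c}{\partial x_i^s} = \sum_n^H \sum_m^W U_{nm}^c \, \max(0, 1 - |y_i^s - n|)
\begin{cases}
0 & \text{if } |m - x_i^s| \ge 1 \\
1 & \text{if } m \ge x_i^s \\
-1 & \text{if } m < x_i^s
\end{cases}
$$

$\partial V_i^c / \partial y_i^s$ is analogous to $\partial V_i^c / \partial x_i^s$. The derivative with respect to $\theta$ is:

$$
\frac{\partial V_i^c}{\partial \theta} =
\begin{pmatrix}
\dfrac{\partial V_i^c}{\partial x_i^s} \cdot \dfrac{\partial x_i^s}{\partial \theta} \\[2ex]
\dfrac{\partial V_i^c}{\partial y_i^s} \cdot \dfrac{\partial y_i^s}{\partial \theta}
\end{pmatrix}
$$

Here $\partial x_i^s / \partial \theta$ and $\partial y_i^s / \partial \theta$ follow from the specific transformation function. In the affine case, the first row of $\theta$ affects only $x_i^s$ and the second row only $y_i^s$, which is why the result can be stacked in this way.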
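As a sanity check on the derivative with respect to $x_i^s$, the piecewise formula can be compared with a finite-difference estimate. The plain-Python sketch below is illustrative only: the function names are hypothetical, coordinates are assumed to be in pixel units, and `grad_theta_row_x` assumes the affine case, where $x_i^s = \theta_{11} x_i^t + \theta_{12} y_i^t + \theta_{13}$ and therefore $\partial x_i^s / \partial(\theta_{11}, \theta_{12}, \theta_{13}) = (x_i^t, y_i^t, 1)$.

```python
def sample(U, x_s, y_s):
    """Bilinear sampling of one channel U at the pixel-unit location (x_s, y_s)."""
    return sum(
        U[n][m] * max(0.0, 1.0 - abs(x_s - m)) * max(0.0, 1.0 - abs(y_s - n))
        for n in range(len(U))
        for m in range(len(U[0]))
    )


def grad_x(U, x_s, y_s):
    """dV/dx_s computed from the piecewise (sub-)gradient formula."""
    total = 0.0
    for n in range(len(U)):
        wy = max(0.0, 1.0 - abs(y_s - n))
        for m in range(len(U[0])):
            if abs(m - x_s) >= 1.0:
                d = 0.0
            elif m >= x_s:
                d = 1.0
            else:
                d = -1.0
            total += U[n][m] * wy * d
    return total


def grad_theta_row_x(dV_dx, x_t, y_t):
    """Gradient w.r.t. (theta_11, theta_12, theta_13), affine case only."""
    return [dV_dx * x_t, dV_dx * y_t, dV_dx]


U = [[0.0, 10.0, 5.0],
     [20.0, 30.0, 7.0]]
x_s, y_s, eps = 0.3, 0.6, 1e-6

analytic = grad_x(U, x_s, y_s)
numeric = (sample(U, x_s + eps, y_s) - sample(U, x_s - eps, y_s)) / (2 * eps)
print(analytic, numeric)  # the two values should agree closely
print(grad_theta_row_x(analytic, 0.5, -0.25))
```

The agreement holds away from integer values of $x_i^s$; exactly on a pixel boundary the sampling kernel has a kink and the formula gives a sub-gradient.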

Through the combination of the above 3 parts, a complete STN is formed.
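To show how the parts fit together, here is a minimal single-channel forward pass in plain Python. It is a sketch under stated assumptions, not the paper's implementation: `stn_forward` is a hypothetical name, the grid uses coordinates normalised to [-1, 1], and $\theta$ is passed in directly, standing in for the output of a trained localisation network.

```python
def stn_forward(U, theta, out_h, out_w):
    """Warp one channel U (a list of rows) with a 2x3 affine theta."""
    in_h, in_w = len(U), len(U[0])

    def norm(k, size):
        # Pixel index -> normalised coordinate in [-1, 1] (assumed convention).
        return 0.0 if size == 1 else 2.0 * k / (size - 1) - 1.0

    def to_pixel(c, size):
        # Normalised coordinate -> pixel-unit coordinate.
        return (c + 1.0) * (size - 1) / 2.0

    V = []
    for i in range(out_h):
        y_t = norm(i, out_h)
        row = []
        for j in range(out_w):
            x_t = norm(j, out_w)
            # Grid generator: source location for this target pixel.
            x_s = to_pixel(theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2], in_w)
            y_s = to_pixel(theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2], in_h)
            # Sampler: bilinear kernel summed over the input map.
            v = 0.0
            for n in range(in_h):
                for m in range(in_w):
                    v += (U[n][m]
                          * max(0.0, 1.0 - abs(x_s - m))
                          * max(0.0, 1.0 - abs(y_s - n)))
            row.append(v)
        V.append(row)
    return V


U = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
flip_x = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(stn_forward(U, identity, 3, 3))  # reproduces U
print(stn_forward(U, flip_x, 3, 3))    # each row of U reversed
```

The output size is chosen independently of the input size, so the same function can also downsample or crop by changing `out_h`, `out_w`, and $\theta$.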

    2. Algorithm analysis

The STN is fast to compute and adds little to the training time of the original network model. Because it learns task-relevant spatial transformation parameters during training, it can further reduce the network's loss. An STN can be applied not only to the input image but also after a convolutional layer or any other layer.

    3. Experimental results

The paper reports experiments on handwritten digit recognition, Street View house number recognition, bird classification, and co-localisation. Here I list only the most representative part, the handwritten digit experiments.

The data is MNIST, distorted in different ways for the recognition experiments: rotation (R); rotation, scaling, and translation (RTS); projective transformation (P); and elastic deformation (E). The baselines use two network structures, a fully connected network (FCN) and a CNN; adding the STN to them gives ST-FCN and ST-CNN. The STN uses the following transformations: affine (Aff), projective (Proj), and thin plate spline (TPS). The following table compares the STN models with the baselines on MNIST; the values in the table are error rates:
[Table: error rates of FCN, CNN, ST-FCN, and ST-CNN on distorted MNIST]

For every type of distortion, the networks with an STN achieve better results than the baselines. The following figure shows the STN transforming digit images, where column (a) is the original data, column (b) shows the transformation parameters, and column (c) is the final transformed result:
[Figure: STN transformations of digit images, columns (a) to (c)]

Conclusion

An STN needs no keypoint annotation. It learns the spatial transformation parameters of the image or feature map from the task itself and spatially aligns the input image or learned features, which reduces the impact that rotation, translation, scale, distortion, and other geometric transformations of the object have on classification, localisation, and other tasks. Adding it to an existing CNN or FCN improves the network's learning ability.
