Multi-font handwriting recognition using SDNN (Space Displacement Neural Network)

Source: Internet
Author: User

Single-font handwriting recognition is easy to implement after working through the MNIST convolutional neural network example. How, then, do you achieve multi-font recognition at the same time?

The great LeCun uses an SDNN (Space Displacement Neural Network). What on earth is that?

After some searching, it turned out to be a sliding window + image pyramid + NMS (non-maximum suppression). A 2015 paper from Yahoo, "Multi-view Face Detection Using Deep Convolutional Neural Networks", uses the same approach.

Reference page: https://www.quora.com/What-is-a-space-displacement-neural-network-SDNN

Here are the answers from two people familiar with the topic:

Alessandro Ferrari, I have had a lot of fun playing with ConvNets.

A neural network that is slid as a detector across all the possible locations in the image. You have a network with an input layer of size NxN pixels, and then you have an image of size MxM pixels, with M>N. The object you want to detect is somewhere in the image, but you don't know where. Thus, you sweep your neural network all over the image. At the first position, in the top-left corner, you get certain classification scores for the objects you want to detect, and you update your score map at that position. Then you apply your NN to a position shifted by 1 or a few pixels horizontally, and you update the score map for that position as well. This process continues until the whole image is processed and the score map is complete.
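As an illustration (not part of the quoted answer), the sweep described above can be sketched in plain Python. The function names, the 6x6 toy image, and the `mean_brightness` scorer are all assumptions made for this sketch; a real SDNN would call a trained network where `score_fn` is used.

```python
def sliding_window_score_map(image, window, score_fn, stride=1):
    """Apply score_fn to every window x window patch of a 2-D image (list of rows)."""
    rows, cols = len(image), len(image[0])
    score_map = []
    for top in range(0, rows - window + 1, stride):
        score_row = []
        for left in range(0, cols - window + 1, stride):
            # Cut out the patch the "network" sees at this position.
            patch = [row[left:left + window] for row in image[top:top + window]]
            score_row.append(score_fn(patch))
        score_map.append(score_row)
    return score_map


def mean_brightness(patch):
    """Hypothetical stand-in for the network: average pixel value of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


# Toy 6x6 image (M = 6) with a bright 2x2 blob; the "network" input is 2x2 (N = 2).
image = [[0.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4):
        image[r][c] = 1.0

score_map = sliding_window_score_map(image, window=2, score_fn=mean_brightness)
```

With a stride of 1 the score map has (M - N + 1) entries per side, and its highest value sits at the window position that covers the blob exactly.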

The score map represents a detection map of your objects. A non-maxima suppression mechanism has to be implemented in order to avoid multiple matches of the same object.
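For illustration only, here is a minimal sketch of one common form of non-maxima suppression: the greedy, overlap-based variant applied to scored boxes. The answer does not say which variant is meant, so the box format `(x1, y1, x2, y2)`, the 0.5 threshold, and the sample detections are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_threshold]
    return kept


# Hypothetical detections as (box, score): two overlapping hits on one object
# plus one hit on a separate object.
detections = [
    ((10, 10, 38, 38), 0.95),
    ((12, 10, 40, 38), 0.90),
    ((60, 10, 88, 38), 0.80),
]
kept = non_max_suppression(detections)
```

The two overlapping boxes collapse into the higher-scoring one, while the distant box survives as a separate detection.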

It avoids the use of segmentation. However, in this case too there is no free lunch. To make it scale invariant, you need to create a scale space of your input image. This requires performing a number of classifications on the order of tens of thousands for a few scales in a 1MP image. Even if you can reuse a great part of the computation of the convolutional layers for nearby classifications, you have to recompute the fully connected layers all the time, making the process painfully slow.
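To get a feel for that order of magnitude, the rough count below adds up the window positions over a small image pyramid. It is only a back-of-the-envelope sketch: the 32-pixel window, the stride of 8, and the three scales are assumptions chosen for illustration, not figures from the answer.

```python
def windows_per_scale(side, window, stride):
    """Number of window positions in a square image with the given side length."""
    if side < window:
        return 0
    per_axis = (side - window) // stride + 1
    return per_axis * per_axis


def pyramid_window_count(side, window, stride, scales):
    """Total classifications needed over all levels of the image pyramid."""
    return sum(windows_per_scale(int(side * s), window, stride) for s in scales)


# A 1000x1000 (about 1MP) image, assumed 32x32 network input, stride 8, 3 scales.
total = pyramid_window_count(side=1000, window=32, stride=8, scales=(1.0, 0.5, 0.25))
print(total)  # 14884 + 3481 + 784 = 19149 windows
```

Under these assumptions the total is already close to twenty thousand classifications, and a smaller stride or more scales increases it quickly.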

That's why people started to study object proposal techniques. Maybe one day enough computational power will let us stop thinking about these problems.

Translated as follows:

A neural network slides like a detector over all possible positions in the image. Suppose you have a neural network with an input size of NxN pixels and an image of size MxM pixels, where M>N. The object you want to detect is somewhere in the image, but you don't know where. So you scan the whole image with the neural network. At the first position, in the upper-left corner, you obtain classification scores for the objects you want to detect, and you update your score map at that position. Then you shift the network horizontally by one or a few pixels and update the score map at that position as well. This process continues until the whole image has been processed and the score map is complete.
The score map represents a detection map of the objects. To avoid multiple matches of the same object, a non-maximum suppression mechanism must be implemented.
It avoids the need for segmentation. However, there is no free lunch here either. To make it scale invariant, you need to create a scale space of the input image. This requires on the order of tens of thousands of classifications for a few scales of a 1MP image. Even if you can reuse a large part of the convolutional computation for nearby classifications, you must recompute the fully connected layers every time, which makes the process slow.
This is why people began to study object proposal techniques (the term is "region proposal"). Perhaps one day enough computational power will let us stop worrying about these problems.

Barath Lakshmanan, works at TVS

CNNs extract features from the input and classify them. However, the input has to be size-normalized. In the case of a single composite object, each individual object within it has a variable size and it is difficult to segment them. One way to recognize such objects is to use a sliding window in the input layer, as mentioned by Alessandro Ferrari.

It is to be noted that when convolution is performed on inputs that are overlapping regions of an image, the same set of features gets extracted repeatedly. In order to avoid this redundant computation, convolution is performed on the entire input image up to the last conv layer. Finally, the classifier is used as a sliding window on the obtained feature map to produce the heat map.

The performance of such a network should improve drastically as the redundancy is removed. This design is called a Space Displacement Neural Network (SDNN).
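As a toy illustration (not part of the quoted answer), the sketch below compares the two strategies on a small integer image: running the convolution separately on every window, versus running it once on the whole image and sliding the classifier over the feature map. The single 3x3 kernel, the summing `classify` function, and all names are assumptions standing in for a real conv stack and classifier.

```python
def conv2d_valid(image, kernel):
    """Valid (no padding) 2-D cross-correlation of a list-of-rows image."""
    k = len(kernel)
    out_rows, out_cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_cols)]
            for r in range(out_rows)]


def classify(features):
    """Hypothetical linear 'classifier': the sum of all feature values."""
    return sum(sum(row) for row in features)


def crop(grid, top, left, size):
    """Square size x size region of grid starting at (top, left)."""
    return [row[left:left + size] for row in grid[top:top + size]]


def naive_heat_map(image, kernel, window):
    """Run conv + classifier from scratch on every window of the image."""
    rows, cols = len(image) - window + 1, len(image[0]) - window + 1
    return [[classify(conv2d_valid(crop(image, r, c, window), kernel))
             for c in range(cols)] for r in range(rows)]


def shared_heat_map(image, kernel, window):
    """Run conv once on the whole image, then slide the classifier over it."""
    features = conv2d_valid(image, kernel)
    fsize = window - len(kernel) + 1  # footprint of one window on the feature map
    rows, cols = len(features) - fsize + 1, len(features[0]) - fsize + 1
    return [[classify(crop(features, r, c, fsize))
             for c in range(cols)] for r in range(rows)]


# Toy 8x8 integer image, assumed 3x3 edge kernel, 5x5 network input window.
image = [[(3 * r + 5 * c) % 7 for c in range(8)] for r in range(8)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

naive = naive_heat_map(image, kernel, window=5)
shared = shared_heat_map(image, kernel, window=5)
```

Both functions produce the same 4x4 heat map, but the shared version computes each feature value only once instead of once per overlapping window.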

Translated as follows:

CNNs extract features from the input and classify them. However, the input must be size-normalized. In the case of a single composite object, each individual object within it has a variable size, which makes it difficult to segment. One way to recognize such objects is to use a sliding window in the input layer, as Alessandro Ferrari mentioned.
It is important to note that when convolution is performed on overlapping regions of the image, the same set of features is extracted repeatedly. To avoid this redundant work, convolution is performed on the entire input image up to the last convolutional layer. Finally, the classifier is used as a sliding window over the resulting feature map to produce the heat map.
The performance of such a network should improve significantly once the redundancy is removed. This design is called a Space Displacement Neural Network (SDNN).
