Vicarious Publishes a Science Paper: A Probabilistic Generative Model That Goes Beyond Neural Networks

The current rise of artificial intelligence is driven mainly by deep learning, but deep learning cannot learn from a small number of samples and generalize that knowledge to many kinds of problems the way humans can, which limits the range of applications such systems can address. Recently Vicarious, a well-known AI startup, published a new probabilistic generative model in Science. The model handles recognition, segmentation, and reasoning in a unified way, and it surpasses deep neural networks on scene text recognition tasks. The researchers say this approach may put us on the path toward general AI.


Paper: A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs




Paper Link: http://science.sciencemag.org/content/early/2017/10/25/science.aag2612


Abstract: Learning from a few examples and generalizing to radically different situations are hallmarks of human visual intelligence that advanced machine learning models have yet to match. Drawing inspiration from systems neuroscience, we introduce a probabilistic generative model for vision in which recognition, segmentation, and reasoning are handled in a unified way through message-passing inference. The model shows excellent generalization and occlusion-reasoning capabilities, and it outperforms deep neural networks on a challenging scene text recognition benchmark while being more than 300-fold more data efficient. In addition, the model fundamentally breaks the defense of modern text-based CAPTCHAs by segmenting characters without CAPTCHA-specific heuristics. Because it emphasizes data efficiency and compositional semantics, our model may be an important step on the path toward general artificial intelligence.



Figure 1: Human versatility with letterforms. (A) Humans are adept at interpreting unfamiliar CAPTCHAs. (B) The same letter can take a great many forms; all of the glyphs above are "a". (C) Shape perception helps humans resolve visually similar targets.



Figure 2: The structure of the RCN (Recursive Cortical Network).


In the figure above, (A) the hierarchy generates the contours of an object, and a conditional random field (CRF) generates its surface appearance. (B) Two subnetworks at the same contour level keep their connections separate by making parent-specific copies of child features and linking them to the parent's lateral connections (laterals); the nodes in the green rectangles are copies of the feature "e". (C) A three-level RCN representing the contour of a square: the second-level features represent the four corners, and each corner is represented as a conjunction of four line segments. (D) A four-level network representing the letter "a".
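The compositional structure in panel (C) can be sketched as a toy data structure: a "square" parent feature that pools four corner child features, with lateral connections between adjacent corners. This is an illustrative sketch only, not Vicarious's code; the names (`Feature`, `compose`) are our own.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    """A node in a toy compositional hierarchy (our own illustration)."""
    name: str
    children: list = field(default_factory=list)   # pooled child features
    laterals: list = field(default_factory=list)   # (child, child) pairs

def compose(name, children, lateral_pairs):
    """Build a parent feature from child features plus lateral constraints."""
    return Feature(name, children=children, laterals=lateral_pairs)

# Level 2: four corner features; Level 3: a square composed from them,
# with laterals tying each corner to the next one around the contour.
corners = [Feature(f"corner_{d}") for d in ("tl", "tr", "br", "bl")]
square = compose(
    "square", corners,
    lateral_pairs=[(a.name, b.name)
                   for a, b in zip(corners, corners[1:] + corners[:1])],
)
print(square.name, len(square.children), len(square.laterals))  # square 4 4
```

The laterals are what let the model insist that adjacent corners agree on the line segment they share, which is the role the caption ascribes to the lateral connections in the real RCN.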



Figure 4: Message-passing inference and feature learning.


In the figure above, (A)(I) shows the forward pass (including lateral propagation), which produces hypotheses for the multiple letters present in the input image. PreProc is a bank of Gabor-like filters that convert pixels into edge likelihoods. (II) is the segmentation mask created by the backward and lateral passes for a selected forward-pass hypothesis, here the mask for "a". (III) shows a false hypothesis, "v", that happens to fit the intersection of "a" and "k"; such spurious hypotheses must be resolved by parsing. (IV) Multiple hypotheses can be activated jointly to produce a combined explanation, handling cases where letters occlude one another. (B) Learning features at the second level: filled circles indicate feature activations, and dashed circles indicate proposed features. (C) Learning lateral connections from contour adjacency.
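The "PreProc" stage described above can be illustrated with a minimal Gabor-like filter bank. The sketch below, in plain NumPy, is our own toy stand-in for the paper's preprocessing (the kernel size, orientations, and parameters are arbitrary assumptions), showing how oriented filters turn raw pixels into per-pixel edge likelihoods.

```python
import numpy as np

def gabor_kernel(size=9, theta=0.0, sigma=2.0, wavelength=4.0):
    """Build one odd-phase Gabor-like kernel oriented at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.sin(2 * np.pi * xr / wavelength)
    k = envelope * carrier
    return k - k.mean()   # zero-mean, so flat regions give zero response

def edge_likelihoods(image, n_orientations=4):
    """Cross-correlate the image with Gabor kernels at several orientations
    and return the per-pixel maximum absolute response as a crude
    edge-likelihood map (toy version of the paper's PreProc stage)."""
    responses = []
    for i in range(n_orientations):
        k = gabor_kernel(theta=i * np.pi / n_orientations)
        s = k.shape[0]
        windows = np.lib.stride_tricks.sliding_window_view(image, (s, s))
        responses.append(np.abs((windows * k).sum(axis=(-1, -2))))
    return np.max(responses, axis=0)

# A vertical step edge should respond strongly; a flat region should not.
img = np.zeros((20, 20))
img[:, 10:] = 1.0
likelihood = edge_likelihoods(img)
print(likelihood.max() > likelihood[:, 0].max())   # edge vs. flat corner
```

Zero-mean kernels are the key design choice here: constant image patches produce exactly zero response, so only intensity transitions (edges) light up the likelihood map that the rest of the network consumes.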



Figure 5: Parsing CAPTCHAs with RCN.


In the figure above, (A) shows the top two parses produced by the RCN trained on reCAPTCHA, with segmentations and labels provided by two different human annotators for comparison. (B) RCN versus CNN on the restricted-CAPTCHA dataset: when the character spacing is modified, the CNN proves far less robust than the RCN. (C) Accuracy on different CAPTCHA styles. (D) Parsing and segmentation results (shown in different colors) for representative BotDetect examples.



Figure 6: MNIST classification results when training with a small number of samples.


In the figure above, (A) shows MNIST classification accuracy for RCN, CNN, and CPM. (B) Classification accuracy on corrupted MNIST test sets; the legend shows the total number of training samples. (C) MNIST classification accuracy for different RCN configurations.
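To make the small-sample regime concrete, here is a deliberately tiny baseline: a 1-nearest-neighbour classifier "trained" on a single example per class of synthetic 5x5 glyphs. This is our own toy illustration of few-shot classification, not the paper's RCN, CNN, or CPM models, and the glyphs are stand-ins for MNIST digits.

```python
import numpy as np

rng = np.random.default_rng(0)

# One clean template per class: a diagonal bar, a cross, and a box.
templates = {
    "bar":   np.eye(5),
    "cross": np.eye(5) + np.fliplr(np.eye(5)),
    "box":   np.pad(np.zeros((3, 3)), 1, constant_values=1.0),
}

def classify(sample, train):
    """Return the label of the nearest training example (L2 distance)."""
    return min(train, key=lambda lbl: np.linalg.norm(sample - train[lbl]))

# Train on one noiseless example per class; test on noisy copies.
trials, correct = 100, 0
for _ in range(trials):
    label = rng.choice(list(templates))
    noisy = templates[label] + rng.normal(0.0, 0.2, size=(5, 5))
    correct += classify(noisy, templates) == label
print(correct / trials)
```

Even this crude baseline does well when the classes are cleanly separated, which is why the interesting comparison in the figure is how accuracy degrades as the test distribution shifts (corruptions) while the training set stays tiny.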



Figure 7: RCN generation, occlusion reasoning, and scene text parsing.

