How does the human brain perceive images? Marr's computational theory of vision (one of the latest theories of pattern recognition)

Last Update: 2018-07-20 Source: Internet

Author: User


Since the 1970s, as cognitive psychology itself has developed, its approach to the study of pattern recognition has shifted in important ways. Some cognitive psychologists continue to study how computers and humans recognize patterns on the basis of the physical symbol system hypothesis, while others have turned to neural networks to study pattern recognition. The models described below are theoretical models that have had a significant influence over the past ten years or more.

The computational theory of vision was proposed by David Marr in the 1970s. His masterpiece, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, was published in 1982.

In Marr's view, vision is the process of producing an effective symbolic description of the outside world from images, and its central problem is to recover the structure of the outside world from the structure of the image. Vision begins with the image, passes through a series of processing steps and transformations, and finally arrives at an understanding of the real external world.

Three important concepts:

Representation: a formal system for making explicit certain entities or types of information, together with rules that specify how the system does this. The result of using a representation to describe a given entity is a description of that entity in that representation.

Process: an operation that transforms one thing into another. Getting from the received image to the recognition of a complete object arranged in space requires a series of representational stages, and each conversion from one representation to the next is carried out by some process.
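The two concepts can be made concrete with a small number-system sketch. One entity, the number thirty-seven, is described under three representations (decimal, binary, Roman numerals), and each conversion between descriptions is a separate process. The names `describe`, `to_roman` and `from_roman` are hypothetical illustrations, not part of Marr's theory.

```python
# Hypothetical illustration of "representation" and "process":
# one entity (the number 37) gets three descriptions, and each
# conversion between descriptions is a separate process.

ROMAN_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def to_roman(n: int) -> str:
    """Process: integer -> description in the Roman-numeral representation."""
    if not 0 < n < 4000:
        raise ValueError("this sketch only handles 1..3999")
    parts = []
    for value, symbol in ROMAN_NUMERALS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def from_roman(s: str) -> int:
    """Process: Roman-numeral description -> integer."""
    total = 0
    for i, ch in enumerate(s):
        value = ROMAN_VALUES[ch]
        # A smaller symbol written before a larger one is subtracted (IV = 4).
        if i + 1 < len(s) and value < ROMAN_VALUES[s[i + 1]]:
            total -= value
        else:
            total += value
    return total


def describe(n: int) -> dict[str, str]:
    """Descriptions of the same entity under three representations."""
    return {"decimal": str(n), "binary": format(n, "b"), "roman": to_roman(n)}


if __name__ == "__main__":
    d = describe(37)
    print(d)                       # {'decimal': '37', 'binary': '100101', 'roman': 'XXXVII'}
    print(from_roman(d["roman"]))  # 37
```

Each description is complete, but each makes different information explicit: whether the number is even is obvious from the last binary digit, while the decimal form makes the powers of ten explicit.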

Zero crossing: a zero crossing marks a discontinuity or abrupt change in image brightness, which is the basis for forming the contours of objects. An abrupt change in intensity produces a peak in the first derivative of the (smoothed) intensity and therefore a zero crossing in its second derivative. Detecting zero crossings is how the visual system detects two-dimensional contours or boundaries of surfaces.
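One standard way to locate zero crossings, following Marr and Hildreth's edge-detection work, is to look for sign changes in the second derivative of the Gaussian-smoothed intensity. The one-dimensional sketch below smooths a step in brightness, takes a discrete second difference, and reports where it changes sign. The smoothing width `sigma`, the border handling and the `min_slope` noise threshold are illustrative assumptions, not values from the article.

```python
import math


def gaussian_kernel(sigma: float) -> list[float]:
    """Sampled 1-D Gaussian, radius ceil(3*sigma), normalized to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal: list[float], sigma: float) -> list[float]:
    """Gaussian smoothing; indices past either end reuse the edge value."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    n = len(signal)
    return [
        sum(w * signal[min(max(i + j - r, 0), n - 1)] for j, w in enumerate(kernel))
        for i in range(n)
    ]


def second_difference(s: list[float]) -> list[float]:
    """Discrete second derivative s[i-1] - 2*s[i] + s[i+1]; 0 at both ends."""
    d2 = [0.0] * len(s)
    for i in range(1, len(s) - 1):
        d2[i] = s[i - 1] - 2 * s[i] + s[i + 1]
    return d2


def zero_crossings(signal: list[float], sigma: float = 1.5, min_slope: float = 1e-6) -> list[float]:
    """Sub-sample positions where the smoothed second derivative changes sign."""
    d2 = second_difference(smooth(signal, sigma))
    positions = []
    for i in range(len(d2) - 1):
        a, b = d2[i], d2[i + 1]
        # Require a strict sign change, and ignore near-zero rounding noise.
        if a * b < 0 and abs(a - b) > min_slope:
            positions.append(i + a / (a - b))  # linear interpolation between i and i+1
    return positions


if __name__ == "__main__":
    step = [0.0] * 10 + [10.0] * 10  # brightness jumps between samples 9 and 10
    print(zero_crossings(step))      # a single crossing close to 9.5
```

A uniform signal yields no crossings at all, which is the point: only changes in brightness produce the primitives from which contours are built.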

For both humans and machines, the ultimate goal is to understand the meaning of a scene or an image. Marr divides the formation of visual representations into three stages, as follows:

⑴ Primal sketch (2-D sketch): The first stage of the visual process, in which the primal sketch is obtained from the input image. This stage is also known as early vision. The primal sketch mainly describes the positions of intensity changes in the image together with their geometric distribution and organization; its primitives include blobs, terminations (endpoints), edge segments, line segments, groups of lines, curvilinear organization, boundaries and so on. These primitives are generated from the detected zero crossings.
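A two-dimensional version of zero-crossing detection gives a crude stand-in for the edge primitives of the primal sketch: smooth the image with a Gaussian, apply a discrete Laplacian, and mark pixels where the result changes sign between horizontal or vertical neighbors. This is only a sketch under simplifying assumptions (a 4-neighbor Laplacian, replicated borders, marking the top or left pixel of each crossing pair, and hypothetical function names); the full primal sketch also groups such edges into blobs, terminations and larger structures, which is not attempted here.

```python
import math


def gaussian_kernel(sigma: float) -> list[float]:
    """Sampled 1-D Gaussian, radius ceil(3*sigma), normalized to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(image: list[list[float]], kernel: list[float]) -> list[list[float]]:
    """Convolve every row with the kernel, replicating edge pixels."""
    r = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(w * row[min(max(x + j - r, 0), n - 1)] for j, w in enumerate(kernel))
            for x in range(n)
        ])
    return out


def transpose(image: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*image)]


def gaussian_smooth(image: list[list[float]], sigma: float) -> list[list[float]]:
    """Separable 2-D Gaussian: smooth along rows, then along columns."""
    k = gaussian_kernel(sigma)
    return transpose(smooth_rows(transpose(smooth_rows(image, k)), k))


def laplacian(image: list[list[float]]) -> list[list[float]]:
    """4-neighbor discrete Laplacian; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (image[y - 1][x] + image[y + 1][x]
                         + image[y][x - 1] + image[y][x + 1] - 4 * image[y][x])
    return out


def zero_crossing_edges(image, sigma=1.0, min_slope=1e-6):
    """Pixels (y, x) where the Laplacian of the smoothed image changes sign
    against the right or lower neighbor (the top/left pixel is marked)."""
    lap = laplacian(gaussian_smooth(image, sigma))
    h, w = len(lap), len(lap[0])
    edges = set()
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w:
                    a, b = lap[y][x], lap[ny][nx]
                    if a * b < 0 and abs(a - b) > min_slope:
                        edges.add((y, x))
    return edges


if __name__ == "__main__":
    # A bright 8x8 square (rows/columns 4..11) on a dark 16x16 background.
    img = [[10.0 if 4 <= y <= 11 and 4 <= x <= 11 else 0.0 for x in range(16)]
           for y in range(16)]
    edges = zero_crossing_edges(img)
    for y in range(16):
        print("".join("#" if (y, x) in edges else "." for x in range(16)))
```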

The purpose of this stage is to make the important information in the two-dimensional image explicit.

⑵ 2.5-D sketch: The second stage of the visual process, in which lines, points and blobs are organized in various ways through symbolic processing to obtain the 2.5-dimensional sketch. This stage is also called intermediate vision. The 2.5-D sketch describes, in an observer-centered coordinate system, the orientation (surface normal) of the visible surfaces, their approximate depth, and the contours along which these change discontinuously. Its primitives include the surface normal at each point of a visible surface, the distance of each point from the observer (depth), points of depth discontinuity, and points where the surface normal changes discontinuously. Because the 2.5-D sketch contains depth information, it carries more than a two-dimensional representation, but it is not a true three-dimensional representation; hence the name 2.5-D sketch.
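A minimal data-structure sketch of a 2.5-D sketch, assuming orthographic viewing and a depth map already given in observer-centered coordinates (recovering that depth is the job of separate processing modules). From the depth map it derives per-pixel unit surface normals by finite differences and marks depth discontinuities wherever neighboring depths differ by more than a threshold. The names `Sketch25D`, `surface_normal`, `build_sketch` and the `threshold` parameter are hypothetical.

```python
import math
from dataclasses import dataclass

Vector = tuple[float, float, float]


@dataclass
class Sketch25D:
    """Observer-centered description: depth, surface orientation, discontinuities."""
    depth: list[list[float]]
    normals: list[list[Vector]]
    discontinuities: set[tuple[int, int]]


def surface_normal(depth: list[list[float]], y: int, x: int) -> Vector:
    """Unit normal of the surface z = depth[y][x], facing the viewer (-z).

    Uses central differences inside the map and one-sided ones at the border.
    Normals next to a depth discontinuity are unreliable by construction.
    """
    h, w = len(depth), len(depth[0])
    x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
    y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
    zx = (depth[y][x1] - depth[y][x0]) / (x1 - x0) if x1 != x0 else 0.0
    zy = (depth[y1][x] - depth[y0][x]) / (y1 - y0) if y1 != y0 else 0.0
    # Cross product of the surface tangents (0, 1, zy) and (1, 0, zx).
    n = (zx, zy, -1.0)
    length = math.sqrt(sum(c * c for c in n))
    return (n[0] / length, n[1] / length, n[2] / length)


def build_sketch(depth: list[list[float]], threshold: float) -> Sketch25D:
    h, w = len(depth), len(depth[0])
    normals = [[surface_normal(depth, y, x) for x in range(w)] for y in range(h)]
    jumps = set()
    for y in range(h):
        for x in range(w):
            # Mark a pixel if depth jumps toward its right or lower neighbor.
            if x + 1 < w and abs(depth[y][x + 1] - depth[y][x]) > threshold:
                jumps.add((y, x))
            if y + 1 < h and abs(depth[y + 1][x] - depth[y][x]) > threshold:
                jumps.add((y, x))
    return Sketch25D(depth, normals, jumps)


if __name__ == "__main__":
    tilted = [[2.0 + 0.5 * x for x in range(4)] for _ in range(3)]  # plane receding to the right
    print(build_sketch(tilted, threshold=1.0).normals[1][1])      # about (0.447, 0.0, -0.894)
    step = [[1.0, 1.0, 5.0, 5.0] for _ in range(3)]                # a depth edge between columns 1 and 2
    print(sorted(build_sketch(step, threshold=1.0).discontinuities))  # [(0, 1), (1, 1), (2, 1)]
```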

According to Marr's theory, this stage is carried out by a series of relatively independent processing modules, including stereopsis, motion, shape from shading, shape from surface contours, shape from texture and so on. Their role is to recover the properties of the surfaces in the image. Marr held that the goal of early visual processing is to construct the 2.5-D sketch, which is the final step before surfaces are interpreted as particular objects or groups of objects.
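To see why each module needs its own constraints, consider shape from shading under a common textbook assumption that the article itself does not state: a Lambertian (matte) surface lit by a single distant source, so that brightness equals albedo times the cosine of the angle between the surface normal and the light direction. The sketch shows that one intensity value fixes only that angle, not the normal itself, because two differently oriented surfaces give the same brightness. The reflectance model and the function names are assumptions for illustration.

```python
import math

Vector = tuple[float, float, float]


def normalize(v: Vector) -> Vector:
    length = math.sqrt(sum(c * c for c in v))
    return (v[0] / length, v[1] / length, v[2] / length)


def lambertian_intensity(normal: Vector, light: Vector, albedo: float = 1.0) -> float:
    """Brightness of a matte surface: albedo * cos(angle between normal and light).

    `light` points from the surface toward a distant source; surfaces facing
    away from the light are dark (clamped to 0).
    """
    n_hat, l_hat = normalize(normal), normalize(light)
    return albedo * max(0.0, sum(a * b for a, b in zip(n_hat, l_hat)))


def angle_to_light(intensity: float, albedo: float = 1.0) -> float:
    """The only orientation information a single pixel gives: the normal-light angle."""
    c = min(max(intensity / albedo, 0.0), 1.0)
    return math.acos(c)


if __name__ == "__main__":
    light = (0.0, 0.0, -1.0)  # light arriving from the viewer's side
    t = 0.5                   # tilt of 0.5 rad away from the viewing direction
    tilted_right = (math.sin(t), 0.0, -math.cos(t))
    tilted_up = (0.0, math.sin(t), -math.cos(t))
    i1 = lambertian_intensity(tilted_right, light)
    i2 = lambertian_intensity(tilted_up, light)
    print(i1, i2)              # equal brightness for two different normals
    print(angle_to_light(i1))  # recovers the 0.5 rad angle, not the tilt direction
```

Resolving the remaining ambiguity requires extra assumptions, such as surface smoothness or known boundary orientations, which is one reason the modules are treated as separate problems.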

⑶ 3-D model representation: The third stage of the visual process, in which a three-dimensional representation of the object is obtained from the input image, the primal sketch and the 2.5-D sketch. This stage is also known as late vision. The 3-D model representation describes shapes and their spatial organization in an object-centered coordinate system, using a modular, hierarchical representation built from volumetric primitives (primitives that represent the volume occupied by a shape) and surface primitives; it captures volume, size and shape.
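The modular, hierarchical, object-centered organization can be sketched as a tree of parts, each described by a volumetric primitive. This example assumes straight cylinders as the primitives and a human-figure hierarchy in the spirit of Marr and Nishihara's well-known illustration; the class names, dimensions and offsets are hypothetical, and a fuller model would also record each part's orientation relative to its parent.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Cylinder:
    """Volumetric primitive: a straight circular cylinder along the part's axis."""
    length: float
    radius: float

    def volume(self) -> float:
        return math.pi * self.radius ** 2 * self.length


@dataclass
class Part:
    """One node of a modular, hierarchical 3-D model.

    `offset` places the part's axis origin in its parent's object-centered
    frame, so the description does not change when the viewer moves.
    """
    name: str
    shape: Cylinder
    offset: tuple[float, float, float] = (0.0, 0.0, 0.0)
    parts: list[Part] = field(default_factory=list)

    def levels(self) -> int:
        """Depth of the hierarchy below and including this part."""
        return 1 + max((p.levels() for p in self.parts), default=0)

    def total_volume(self) -> float:
        """Sum of primitive volumes (overlaps between parts are ignored)."""
        return self.shape.volume() + sum(p.total_volume() for p in self.parts)

    def outline(self, indent: int = 0) -> list[str]:
        lines = ["  " * indent + self.name]
        for p in self.parts:
            lines.extend(p.outline(indent + 1))
        return lines


if __name__ == "__main__":
    hand = Part("hand", Cylinder(0.18, 0.04), offset=(0.0, 0.0, 0.27))
    forearm = Part("forearm", Cylinder(0.27, 0.04), offset=(0.0, 0.0, 0.30), parts=[hand])
    upper_arm = Part("upper arm", Cylinder(0.30, 0.05))
    arm = Part("arm", Cylinder(0.75, 0.05), offset=(0.20, 0.0, 0.45),
               parts=[upper_arm, forearm])
    figure = Part("human figure", Cylinder(1.70, 0.20), parts=[arm])
    print("\n".join(figure.outline()))
    print(figure.levels())  # 4: figure -> arm -> forearm -> hand
```

Each level can be consulted on its own: a coarse match uses only the top cylinder, and finer parts are examined only when needed.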

Once the 3-D model representation has been established, the end result is a unique description that allows us to tell objects apart.

Comments:

⑴ Marr's theory of vision raised the study of vision from a descriptive level to the level of rigorous mathematical science, and it has therefore been highly regarded by neuroscientists, AI researchers and cognitive psychologists.

⑵ Criticism: Marr's account of vision concentrates mainly on the early stages of visual processing, and apart from the primal sketch, the various representations he proposed have not been confirmed by neurophysiology. His restriction of the role of knowledge to the late stage of visual processing has also raised doubts. Others have argued that perception begins with the extraction of global topological properties rather than the analysis of individual features: on this view the human visual system is fundamentally topological, attending to global properties and ignoring local ones, which poses a serious challenge to the computational account of vision.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership, or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
