[Repost] A doubt about the current methodology of natural language processing

Last Update:2018-02-15 Source: Internet

Author: User

Tags html interpreter


Zhou Xi Order
2001-11-8

In most cases, we are used to studying objects that are far removed from our subjective world. The typical example is celestial bodies. In such research the method is to build a model, and progress shows up mainly as the gradual refinement of that model.

For example, a model of a system of two celestial bodies is studied first. Its results are basically consistent with the observed data, but with slight differences. We then take into account the perturbation caused by a third, more distant body, so that the model comes closer and closer to the actual situation.

Mechanics shows a similar pattern. Newtonian mechanics was established first and agrees with the everyday world. But as the speed of an object approaches the speed of light, many phenomena can no longer be explained, and so the corrections of the theory of relativity were introduced.

In natural language processing, we seem to be using a similar approach. We have created one grammar model after another, hoping to cover as much of the language as possible. However, compared with fields such as mechanics and electricity, the results are always unsatisfactory. We always attribute this to a single cause: natural language is too complex!

We seem to have overlooked a very important fact: objects such as "celestial bodies" and "integrated circuits" are completely independent of our subjective world, and the brains we use to study them are entirely separate from the objects being studied. When studying them, there is no need to model the working process of our own brains. Natural language is different from these objective objects: by itself it does not seem to constitute a complete object of study. A worthwhile and complete object of study should include the main aspects involved in its operating mechanism.

For example, consider the following communication system:

An in-car computer sends messages to a receiving device via a radio signal. The signal is often disturbed for a variety of reasons. An error correction code is added to the signal when it is sent, and the receiving device has a facility that corrects errors based on that code. When we study this system, we always consider all of its aspects: generating the signal, transmitting it, interference, receiving, and error correction. If we set aside the error correction step at the receiving end and studied only the format and statistical regularities of the signal, we would get strange results of little value.
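The point can be made concrete with a small sketch. The article does not say which error correction code the system uses, so the example below assumes the simplest one, a three-fold repetition code with majority voting; the function names `encode`, `add_noise`, and `decode` are hypothetical. Looking only at the transmitted signal, the tripled bits appear to be pure redundancy. Their purpose becomes clear only once the receiver's correction step is part of the picture.

```python
def encode(bits, repeat=3):
    """Sender side: repeat every bit `repeat` times (assumed repetition code)."""
    return [b for b in bits for _ in range(repeat)]


def add_noise(signal, flip_positions):
    """Channel: flip the bits at the given positions to simulate interference."""
    noisy = list(signal)
    for pos in flip_positions:
        noisy[pos] ^= 1
    return noisy


def decode(signal, repeat=3):
    """Receiver side: majority vote over each group of `repeat` bits."""
    decoded = []
    for start in range(0, len(signal), repeat):
        group = signal[start:start + repeat]
        decoded.append(1 if sum(group) * 2 > len(group) else 0)
    return decoded


message = [1, 0, 1, 1]
sent = encode(message)
received = add_noise(sent, flip_positions=[1, 5])  # one error in each of two groups
recovered = decode(received)

print("sent:     ", sent)
print("received: ", received)
print("recovered:", recovered)
```

Here the received signal differs from the sent one, yet the recovered message equals the original, because each group of three bits contains at most one error.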

However, in the study of natural language, we have adopted exactly this strange way of doing research. Natural language arose, and continues to develop, among people for the purpose of exchanging ideas. The mechanism by which language conveys thought or information is embodied in the internal structure of language, and also in the human brain's process of interpreting the symbol sequences that language contains. But we study only the language itself!

Therefore, the mechanism by which natural language conveys information cannot be understood by studying language alone. In principle, the brain's processing of language must also be included in the system under study if we are to obtain results.

Of course, studying how the brain interprets language is difficult. But if we abandon this very important aspect entirely for that reason, and instead drill ever deeper into the formal structure of language, might we never obtain results?

For now, of course, we cannot propose a complete model of the brain. But we can single out for study the part that matters most when the brain understands natural language: supplying the parts omitted from a statement and correcting the parts whose structure is inverted. To elaborate:

The basic model of "the brain's process of interpreting language" can be described as follows: treat the input as a series of symbols that "contains many ambiguities" and "admits multiple interpretations", then use "knowledge" to choose the most reasonable interpretation according to a "semantic rationality criterion".

Is this not precisely where "grammar" and "semantics" first come together?
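A minimal sketch of the model described above, under loud assumptions: the grammar is taken to have already produced several candidate interpretations of an ambiguous sentence, and "knowledge" is reduced to a tiny hand-written table of plausibility scores. The names `Interpretation`, `KNOWLEDGE`, `plausibility`, and `interpret`, and every score in the table, are hypothetical illustrations and not part of any system the article mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    description: str
    relation: tuple  # (head, role, dependent), as proposed by the grammar


# Hypothetical "knowledge": how plausible each relation is in the world.
KNOWLEDGE = {
    ("eat", "instrument", "fork"): 0.9,
    ("noodles", "has", "fork"): 0.05,
    ("eat", "instrument", "sauce"): 0.05,
    ("noodles", "has", "sauce"): 0.9,
}


def plausibility(interp):
    """Semantic rationality criterion: look the relation up in the knowledge table."""
    return KNOWLEDGE.get(interp.relation, 0.0)


def interpret(candidates):
    """Pick the candidate reading that knowledge rates as most reasonable."""
    return max(candidates, key=plausibility)


# "I ate the noodles with a fork": the grammar alone allows both readings.
fork_candidates = [
    Interpretation("ate using a fork", ("eat", "instrument", "fork")),
    Interpretation("noodles that come with a fork", ("noodles", "has", "fork")),
]

# "I ate the noodles with sauce": same structure, different best reading.
sauce_candidates = [
    Interpretation("ate using sauce", ("eat", "instrument", "sauce")),
    Interpretation("noodles that come with sauce", ("noodles", "has", "sauce")),
]

best_fork = interpret(fork_candidates)
best_sauce = interpret(sauce_candidates)
print(best_fork.description)
print(best_sauce.description)
```

The two sentences have the same grammatical form, yet the selected readings differ. The grammar supplies the candidates and the knowledge table makes the choice, which is the division of labor the model proposes.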

As far as I understand at this stage, the HNC team has been working in this direction. Of course, this task cannot be easy. Because this path involves representing and applying the knowledge accumulated by all of humanity throughout history, work along it seems to face problems of a mathematically daunting, almost infinite scale. Therefore, to make progress in this direction and gain recognition from society, it is crucial to be fully aware of the boundaries of any project: the problems that any technical means can solve are limited, so the limited objectives must be stated clearly and the project must be divided sensibly into stages.

Finally, we might as well draw a comparison with computer programming languages. In developing such languages, the computer's ability to process the language is always considered as part of the study. Because computers at this stage basically lack the ability to resolve ambiguity, such languages basically do not allow it, and a program cannot contain the slightest grammatical error. Everything has to be stated explicitly, with no room for grammatical errors or ambiguity. I say "basically" because modern computers are not entirely without this ability. For example, many people who write HTML do not strictly follow its grammar rules. In such cases a real HTML interpreter often still "understands correctly", automatically supplementing or correcting the input to some degree. HTML interpreters developed by different companies also differ in this ability.
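The contrast can be sketched with Python's standard library, as an illustration only and not a model of any particular browser. `html.parser.HTMLParser` accepts markup with unclosed tags without raising an error, while `ast.parse` rejects Python source with a single missing parenthesis. The helper names `TagCollector`, `parse_sloppy_html`, and `is_valid_python` are hypothetical. Note that `HTMLParser` only tolerates the errors; it does not repair the document the way a full browser would.

```python
import ast
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the start tags, end tags, and text the parser reports."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def parse_sloppy_html(markup):
    """Feed possibly malformed markup to the lenient HTML parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector


def is_valid_python(source):
    """Return True only if the source is grammatically correct Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# None of the three tags is ever closed, yet parsing succeeds.
result = parse_sloppy_html("<p>Hello <b>world<p>Second paragraph")
print(result.opened, result.closed, result.text)

# A programming language tolerates no such sloppiness.
print(is_valid_python("total = (1 + 2)"))
print(is_valid_python("total = (1 + 2"))
```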


This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
