Patterns in Natural Language Processing (Pattern 1: The Probabilistic Pattern)



/* Copyright notice: This article may be reproduced freely. When reprinting, please indicate the original source and author. */


Author: Zhang Junlin
Timestamp: August 2010

Is the world deterministic? At first, people thought so, which reflected an optimistic attitude toward understanding and transforming the world. Gradually we came to realize that an uncertain world is a more real one. We live in a reality where probability is everywhere, and this uncertain world is harder to grasp and to transform; accepting it reflects a more realistic understanding of the world.
Moving from certainty to uncertainty is the direction in which the vast majority of disciplines develop, and it reflects a deeper understanding of the world. Physics itself follows this trend: from Newton's mechanical world view to the uncertainty principle of quantum physics, we have basically come to see that this is a world of possibilities, not a world of certainties.
Personally, I think that a discipline's move from qualitative analysis to quantitative analysis marks whether it is scientific, and that the move from fully deterministic quantification to quantification under uncertainty is one criterion for whether the discipline has entered its mature stage.
Natural language processing also has a long history. In terms of macro research approaches, it has gone through two major stages: the early rule-based methods gradually gave way to statistical methods as the dominant paradigm. This great turn closely resembles the development path of physics. If you think about it carefully, this shift in research paradigm, which took off in the 1980s, had its own practical support, namely the rapid growth of computer storage and computing power. The appearance of the PC matches the timing of the shift rather well, and the rapid spread and growth of such real computing resources paved the way for the feasibility of statistical methods.

There is an obvious truth: there are two windows during which masters can emerge relatively easily in a research field. One is at the birth of a discipline. Because the field is still a wasteland, someone has to lay a solid foundation before a magnificent building can be erected, and this is when the foundation-laying masters appear. The other is the transition period between research paradigms, because at that point the old house is being torn down and rebuilt, and architects for the new building are needed. These two periods are excellent times to become a master.
Take physics as an example. We can see this truth in the founders Galileo and Newton, in Einstein with relativity, and in Heisenberg, Bohr, and Dirac with quantum mechanics. In modern times such masters no longer appear. Is it a question of researchers' ability? I don't think so: the proportion of talented people should be similar in every age. Why, then, do talents no longer seem so towering? Because the foundation of the discipline's building has been laid and the framework above ground is basically complete. Of course there are still chances to rebuild this framework, but there is a question of timing; for example, the rise of the statistical methods mentioned above could hardly have happened before the mass popularity of the PC. So apart from intelligence and diligence, the opportunities and environment that a researcher cannot control also matter a great deal in determining your position in the field.

Back to natural language processing. Because this field is not so mainstream, its masters are not so well known. Take a look at the recipients of the ACL Lifetime Achievement Award in computational linguistics (http://aclweb.org/aclwiki/index.php?title=ACL_Lifetime_Achievement_Award_Recipients).
Judging from these award winners, they are basically the founders of computational linguistics, and they should all be over 60. Their fundamental contributions are in grammar, semantics, formal linguistics, machine translation, and information retrieval theory. Only this year's winner, Jelinek, belongs to the masters who made major contributions during the field's probabilistic turn; the others are founders of the field. It can be expected that the proportion of masters from the probabilistic transition period will rise in the future. Another obvious fact, however, is that for researchers in natural language processing under 40, it is almost impossible to become a field master by relying on statistical methods: the statistical transformation is basically mature, and the framework of that building is already in place. If you want to become a field master, you have to consider rebuilding it, but it is hard to judge whether the time is ripe, because the more mature a field is, the harder it is to rebuild. Of course, the road to machines truly understanding human language (and I believe that day will come) is still very long, and before then there will inevitably be quite a few more masters; it is a matter of timing.

The discussion above is a bit abstract, so let us come back to the probabilistic transformation in natural language processing. A change of research paradigm must have an internal reason: naturally, the new paradigm can solve many problems that the old one cannot. The probabilistic transformation in natural language processing has many advantages over the rule-based method, and of course many disadvantages as well, which I will not go into here. Below is one of the more intuitive advantages.

In natural language processing, ambiguity is a pervasive problem, whether in word segmentation, syntax, or semantics. Ambiguity means that a single input has several possible outputs, so deciding which one to take as the correct output becomes a problem. Introducing probability provides an intuitive solution: select the output with the largest probability value as the correct result. We can see this in the examples below.

The development from CFG to PCFG in syntactic analysis can serve as an example of the probabilistic tendency in natural language processing.
Case 1: From CFG to PCFG
The goal of syntactic analysis is very clear: determine the relationships between the words of a natural language sentence. The usual practice is to map a sentence to a syntax tree through some algorithm; the syntax tree can then be used to read off the relationships between the sentence's constituents.
For example, consider the syntax analysis tree corresponding to the sentence "John called Mary from Denver".

CFG (context-free grammar) is a very classic syntactic analysis tool. The basic idea is to define a set of grammar rules and to parse a sentence into a syntax tree according to those rules.
For the preceding example, a syntax tree can be built step by step from the following rules (a small parsing sketch follows the rule list):
S -> NP VP
VP -> V NP
NP -> NP PP
VP -> VP PP
PP -> P NP
NP -> John | Mary | Denver
V -> called
P -> from
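
To make this concrete, here is a minimal sketch, assuming NLTK is installed, that encodes the rule set above and parses the example sentence; the ambiguity discussed next shows up as two different parse trees.

# Minimal sketch: encode the CFG above with NLTK (assumed available) and parse
# the example sentence; the PP-attachment ambiguity yields two parse trees.
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
VP -> V NP | VP PP
NP -> NP PP | 'John' | 'Mary' | 'Denver'
PP -> P NP
V  -> 'called'
P  -> 'from'
""")

parser = nltk.ChartParser(grammar)
tokens = "John called Mary from Denver".split()
for tree in parser.parse(tokens):
    print(tree)   # one tree attaches 'from Denver' to the VP, the other to 'Mary'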

As you can see, CFG is a non-probabilistic analysis model. In syntactic analysis, ambiguity is possible: given a sentence, several different syntax trees may be constructible, and all of them are legal. A natural idea at this point is to add a probability factor, so that even when there are several candidate trees, you can choose among them by the probability of each tree and take the most probable one as the analysis result.
Transforming CFG into PCFG is also very simple and direct: each CFG rule is assigned a probability representing how likely that rule is. For example, consider the following transformed set of syntax rules:
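An illustrative version of that rule set might look like the following, with each rule annotated with a probability (the 0.7 and 0.3 values for VP are the ones discussed below; the remaining numbers are made up purely for illustration):

S -> NP VP [1.0]
VP -> V NP [0.7]
VP -> VP PP [0.3]
NP -> NP PP [0.4]
NP -> John | Mary | Denver [0.2 each]
PP -> P NP [1.0]
V -> called [1.0]
P -> from [1.0]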

For the non-terminal VP on the left-hand side there are two rules, VP -> V NP and VP -> VP PP. Because VP -> V NP is more common, its statistical value is 0.7 and the other is 1 - 0.7 = 0.3; that is, the probabilities of all rules sharing the same left-hand non-terminal sum to 1.
With the probability information above, it is easy to solve a problem that CFG cannot: compute the overall probability of every possible syntax tree and select the tree with the highest probability as the analysis result. For example:
For the sentence "astronomers saw stars with ears" there is an obvious attachment ambiguity, and two valid syntax trees can be built (call them T1 and T2):


 
According to this calculation, T1 is chosen as the result of the syntactic analysis.
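
As a hedged sketch of this kind of calculation (again assuming NLTK), the snippet below defines a PCFG for the sentence and lets the Viterbi parser pick the most probable tree; all rule probabilities here are illustrative assumptions in the spirit of the common textbook version of this example, not values from the original article.

# Sketch: probability-based disambiguation with a PCFG. The rule probabilities
# are illustrative assumptions; rules sharing a left-hand side must sum to 1.
import nltk

pcfg = nltk.PCFG.fromstring("""
S  -> NP VP           [1.0]
PP -> P NP            [1.0]
VP -> V NP            [0.7]
VP -> VP PP           [0.3]
NP -> NP PP           [0.4]
NP -> 'astronomers'   [0.1]
NP -> 'ears'          [0.18]
NP -> 'saw'           [0.04]
NP -> 'stars'         [0.18]
NP -> 'telescopes'    [0.1]
P  -> 'with'          [1.0]
V  -> 'saw'           [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
tokens = "astronomers saw stars with ears".split()
for tree in parser.parse(tokens):        # yields the highest-probability tree
    print(tree)
    print("P(tree) =", tree.prob())      # product of the probabilities of its rules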

Case 2: Introducing probability information into Chinese word segmentation
The two major problems in Chinese word segmentation are ambiguity resolution and out-of-vocabulary (new word) recognition. The earliest idea for Chinese word segmentation was very simple: look words up in a dictionary and cut the string by forward or backward maximum matching (a sketch of forward matching follows). To resolve ambiguity, the most direct idea is to introduce word probability information; the basic approach is similar to the development from CFG to PCFG described above.
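
As a minimal sketch of the dictionary-lookup approach (the dictionary contents here are made-up illustrative entries), forward maximum matching greedily takes the longest dictionary word at each position:

def forward_max_match(text, dictionary, max_word_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if nothing matches."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_word_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

# Toy dictionary (illustrative assumption).
dictionary = {"化妆品", "和服", "和", "服装", "装"}
print(forward_max_match("化妆品和服装", dictionary))
# -> ['化妆品', '和服', '装']: the greedy cut picks the wrong reading here,
#    which is exactly the ambiguity problem discussed next.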
For example, suppose the ambiguous string to be segmented is 化妆品和服装 ("cosmetics and clothing").
There are two possible segmentation results:
化妆品 / 和服 / 装 (cosmetics / kimono / wear)
化妆品 / 和 / 服装 (cosmetics / and / clothing)
Which one should be output? Adding probability information helps solve this problem. Suppose we already know the probability of each word, estimated from some training corpus; then the segmentation whose words yield the higher overall probability can be taken as the output.
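
Here is a minimal sketch of that selection step, with made-up unigram probabilities standing in for counts from a real training corpus:

import math

# Toy unigram probabilities (illustrative assumptions, not from a real corpus).
word_prob = {
    "化妆品": 2e-4,   # cosmetics
    "和":     3e-2,   # and
    "和服":   1e-5,   # kimono
    "服装":   5e-4,   # clothing
    "装":     8e-4,   # wear / install
}

def log_prob(segmentation):
    """Log-probability of a segmentation under a unigram independence model."""
    return sum(math.log(word_prob[w]) for w in segmentation)

candidates = [
    ["化妆品", "和服", "装"],   # cosmetics / kimono / wear
    ["化妆品", "和", "服装"],   # cosmetics / and / clothing
]
best = max(candidates, key=log_prob)
print(" / ".join(best))   # the split with the higher overall probability wins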

Both examples above introduce probability to resolve ambiguity. The basic idea is very intuitive: choose the most common combination as the correct result. Obviously this idea cannot solve every case, but it can solve a large portion of them, which is both the value and the limit of probability.

The two examples above are relatively fine-grained ones. In fact, as a conversion of research paradigm, many sub-fields can be examined through this pattern. Seen from the probabilistic research paradigm, this transformation has been under way for nearly 30 years, so almost every sub-field of NLP has gone through it, and there are not many opportunities left to use this pattern for research innovation; apart from semantics and pragmatics, which may not yet be very mature, the other sub-fields are already quite mature. Although this pattern is of little practical use for finding new research topics, grasping the essence of the field's current mainstream research approach is of great significance for deepening one's overall understanding of the field.
