The Beauty of Mathematics, Part 15: Complexity and Simplicity, Several Elites of Natural Language Processing


Posted by: Wu Jun, Google researcher

One good principle I have emphasized throughout the Beauty of Mathematics series is simplicity. In natural language processing, however, there are also exceptional researchers who study a problem to its limit and persistently pursue completeness, even perfection. Their work is of great reference value to their peers, so scientific research needs scholars like them. Michael Collins, a top figure of the new generation in natural language processing, is one such person.


Collins: the pursuit of perfection

Collins received his doctorate from the University of Pennsylvania under Mitch Marcus, a master of natural language processing (we will mention Marcus many times later), and is now an associate professor at the Massachusetts Institute of Technology (MIT). Although he is only an associate professor, he is one of the very best in natural language processing today. Collins wrote a natural language parser (sentence parser), named after him, that can accurately analyze the sentences of written language. Parsing is the foundation of many natural language applications. Although Collins's senior labmates Eric Brill and Adwait Ratnaparkhi, as well as his junior labmate Jason Eisner, all built quite good parsers, Collins pushed his to the limit, and for a long time it was the best parser in the world. The key to Collins's success is that he examined every detail of parsing carefully. The mathematical model Collins uses is also very elegant, and the work as a whole can only be called perfect. Because my research required it, I once asked Collins for the source code of his parser, and he gave it to me very cheerfully. I tried to modify his program for my specific application, but I found that it handled so many details that it was hard to optimize any further. Collins's doctoral thesis is a model for the natural language processing field. Like a good novel, it lays out the ins and outs of everything clearly, so that anyone with a little knowledge of computing and natural language processing can follow his complicated methods.

After graduating, Collins spent three years at AT&T Labs, where he completed a great deal of world-class research, such as discriminative training methods for hidden Markov models and the application of convolution kernels to natural language processing. Three years later, AT&T stopped its natural language processing research, and Collins was fortunate to find a faculty position at MIT. In just a few years at MIT, Collins won best paper awards at many international conferences, a record none of his peers can match. Collins's hallmark is taking things to the extreme. If anyone can be said to favor a "philosophy of elaborateness", Collins is one.


Brill: simple and elegant

In terms of research methods, the typical opposites of Collins are his senior labmates Eric Brill and Ratnaparkhi; we have already introduced Ratnaparkhi and will not repeat that here. Unlike Collins, who moved from industry to academia, Brill moved from academia to industry. And unlike Collins's research style, Brill always tries to find methods that could not be any simpler. The work that made Brill's name is a machine learning method based on transformation rules (transformation rule based machine learning). Although the name sounds complicated, the method is actually very simple. Let's use pinyin-to-character conversion as an example:

The first step is to convert each pinyin syllable into its most common Chinese character, as a first-pass result. Of course, this result contains many errors. For example, changshi (常识, "common sense") may be converted into 长识 ("long knowledge");

The second step can be called "discarding the false and keeping the true". We use a computer to list all the rules for replacing homophones based on context. For example, if chang has been converted to 长 ("long") and the following character is 识, change 长 to 常 ("regular");

The third step is to apply all these rules to a pre-annotated corpus, keep the useful ones, and delete the useless ones. Steps 2 and 3 are repeated until no more useful rules are found.
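The three steps above can be sketched in Python. This is a minimal, hypothetical illustration, not Brill's actual implementation: it assumes a single rule template (change character A to B when the previous or next character is C), scores each candidate rule by the net number of errors it removes on a tiny hand-made annotated corpus, and greedily keeps the best rule each round, stopping when no rule helps. The names `Rule`, `learn_rules`, and `convert`, the `max_rules` cap, and the three-word corpus are all invented for illustration.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule template: change `src` to `dst` when the
    character at relative position `offset` (-1 = previous, +1 = next) is `ctx`."""
    src: str
    dst: str
    offset: int
    ctx: str


def train_baseline(corpus):
    """Step 1: map each pinyin syllable to its most frequent character."""
    counts = defaultdict(Counter)
    for pinyin, chars in corpus:
        for p, c in zip(pinyin, chars):
            counts[p][c] += 1
    return {p: cnt.most_common(1)[0][0] for p, cnt in counts.items()}


def apply_rule(rule, seq):
    """Apply `rule` at every matching position; contexts are read from the
    unmodified input, so all matches are rewritten simultaneously."""
    out = list(seq)
    for i, c in enumerate(seq):
        j = i + rule.offset
        if c == rule.src and 0 <= j < len(seq) and seq[j] == rule.ctx:
            out[i] = rule.dst
    return out


def count_errors(guesses, golds):
    """Number of characters that differ from the annotated corpus."""
    return sum(g != t for gs, ts in zip(guesses, golds) for g, t in zip(gs, ts))


def candidate_rules(guesses, golds):
    """Step 2: list every rule that would fix at least one current error."""
    rules = set()
    for gs, ts in zip(guesses, golds):
        for i, (g, t) in enumerate(zip(gs, ts)):
            if g == t:
                continue
            for off in (-1, 1):
                j = i + off
                if 0 <= j < len(gs):
                    rules.add(Rule(g, t, off, gs[j]))
    return rules


def learn_rules(corpus, max_rules=50):
    """Steps 1-3: baseline, then greedily keep the rule with the largest net
    error reduction; stop when no candidate rule reduces errors."""
    baseline = train_baseline(corpus)
    golds = [chars for _, chars in corpus]
    guesses = [[baseline[p] for p in pinyin] for pinyin, _ in corpus]
    learned = []
    for _ in range(max_rules):
        current = count_errors(guesses, golds)
        best, best_gain = None, 0
        for rule in sorted(candidate_rules(guesses, golds)):  # sorted: deterministic ties
            trial = [apply_rule(rule, g) for g in guesses]
            gain = current - count_errors(trial, golds)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break  # step 3 stop condition: no useful rule left
        learned.append(best)
        guesses = [apply_rule(best, g) for g in guesses]
    return baseline, learned


def convert(pinyin, baseline, rules):
    """Convert a pinyin sequence (every syllable must have been seen in training)."""
    seq = [baseline[p] for p in pinyin]
    for rule in rules:
        seq = apply_rule(rule, seq)
    return seq


# Toy annotated corpus (illustrative only): 长城, 长度, 常识
corpus = [
    (["chang", "cheng"], ["长", "城"]),
    (["chang", "du"], ["长", "度"]),
    (["chang", "shi"], ["常", "识"]),
]
baseline, rules = learn_rules(corpus)
print(rules)
print("".join(convert(["chang", "shi"], baseline, rules)))  # 常识
```

On this toy corpus, step 1 maps chang to 长 (it appears twice as 长 and once as 常), so 常识 first comes out as 长识; step 2 proposes the single rule "change 长 to 常 when the next character is 识", and step 3 keeps it because it removes the only error without breaking 长城 or 长度. A real system would use far larger corpora and richer rule templates.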

Using this simple method, Brill achieved nearly the best results in many areas of natural language research. Because his method is so simple, many people have adopted it. Brill could be considered my first mentor in the United States. We used this simple method for part-of-speech tagging (labeling the words in a sentence as nouns, verbs, and so on), and for many years no one surpassed our results. (The one who eventually surpassed us was a Dutch engineer who later joined Google; he used the same method but worked out many more details.) After leaving academia, Brill went to Microsoft Research. In his first year there, he accomplished more than all the others in his group had in many years. Later, Brill joined a new group and remained a highly productive scientist. Reportedly, he owes his current prominence at Microsoft to Google: because of Google, Microsoft gave him tremendous support in staff and resources, which made Brill one of the leading figures in Microsoft's search research. In research, Brill may not always know right away what to do, but he can immediately rule out an approach that cannot work. This is related to his pursuit of simple methods: he can judge how good each method is in a short time.

Brill is always looking for simple, effective methods and never hides them, so he is easily caught up with and surpassed by many people, including me. Fortunately, Brill likes others to catch up with him, because by the time people surpass him in one research direction, he has already moved on to another. Once, Eric told me that there was one thing I could never catch up with him on: he had a second child before me :)

In the next installment, we will introduce an example that combines complexity with simplicity.
