Beauty of Mathematics series, part 20: Mitch Marcus, godfather of natural language processing

Source: Internet
Author: User
In earlier installments of this series we introduced or mentioned several young and promising scientists, including Michael Collins, Eric Brill, David Yarowsky, and Ratnaparkhi. All of them studied under Mitch Marcus in the computer science department at the University of Pennsylvania. As in many martial arts novels, when the disciples have all become heads of their own schools, the master must be extraordinary. Indeed, although Marcus has not published many papers as first author, in many respects he can be called the godfather of natural language processing.

Professor Marcus served for many years as chair of the Computer Science Department at the University of Pennsylvania, until a few years ago he recruited Fernando Pereira from AT&T to succeed him. As an administrator, Marcus showed foresight in both natural language processing and computer science at large. While advising doctoral students, he recognized how important corpora are to natural language processing, and he spent more than a decade building a series of standard corpora for scholars around the world. These are known as the LDC corpora (LDC being the Linguistic Data Consortium), and today they are a tool used by natural language processing researchers everywhere.

As we mentioned earlier in this series, natural language processing today relies almost entirely on statistical methods. Statistics require a large amount of representative data, and the process of building a natural language processing system from that data is collectively referred to as training. For example, to train a Chinese word segmentation system we need Chinese sentences that have already been correctly segmented into words, and these sentences must be representative. To measure the accuracy of a segmentation system we also need manually segmented sentences for testing. Such manually processed text databases are called corpora. If every laboratory built its own corpora by hand, the work would waste time and effort, and results published by different groups would not be comparable. Marcus therefore set out to build a series of standard corpora for scholars all over the world. He used his influence to persuade the US National Science Foundation (NSF) and DARPA to fund the project, brought together many universities and research institutions, and built hundreds of standard corpora. The best known is the Penn Treebank. It covers multiple languages (including Chinese). For each language it contains representative sentences totaling hundreds of thousands to millions of words, and each sentence is annotated with part-of-speech tags, a syntactic parse tree, and so on. The LDC has become a database shared by natural language processing scientists around the world. Today almost every paper published in natural language processing is expected to report test results on LDC corpora.
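To make the idea of testing against a manually segmented corpus concrete, here is a minimal sketch in Python. It is not taken from the LDC or from any particular segmentation system. The function names and the sample sentence are hypothetical, and the scoring scheme (word-level precision, recall, and F1 over character spans) is one common convention assumed here for illustration.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def score_segmentation(gold_sentences, predicted_sentences):
    """Compare system output with a hand-segmented (gold) test corpus.

    Both arguments are lists of sentences, each sentence a list of words.
    A predicted word counts as correct only if a gold word covers exactly
    the same characters. Returns (precision, recall, f1).
    """
    correct = gold_total = predicted_total = 0
    for gold, predicted in zip(gold_sentences, predicted_sentences):
        if "".join(gold) != "".join(predicted):
            raise ValueError("gold and predicted sentences differ in text")
        gold_spans = to_spans(gold)
        predicted_spans = to_spans(predicted)
        correct += len(gold_spans & predicted_spans)
        gold_total += len(gold_spans)
        predicted_total += len(predicted_spans)
    precision = correct / predicted_total if predicted_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical one-sentence test set: the manual segmentation, and the
# output of an imaginary system that merges two words into one.
gold = [["中国", "航天", "官员", "应邀", "到", "美国"]]
predicted = [["中国", "航天官员", "应邀", "到", "美国"]]
precision, recall, f1 = score_segmentation(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy example the system gets four of its five words right (precision 0.80) and recovers four of the six gold words (recall about 0.67). Because every laboratory scores against the same gold sentences, numbers like these can be compared across papers, which is the point of a shared standard corpus.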

Marcus gave his doctoral students the freedom to pursue topics of their own choosing, which is why his former students can now be found throughout the field. He has insight into almost every area of natural language processing. Unlike many professors, who have their doctoral students work on whatever projects they have funding for, Marcus let his students propose topics that interested them, and then either supported them from his existing funds or applied for new funding in the area of their project. He could quickly judge whether a research direction was sound, which saved his students a great deal of trial and error. As a result, some of his students earned their doctorates very quickly.

As department chair, Marcus showed remarkable foresight in deciding which programs to establish. I was fortunate to serve with him on the same university advisory committee, which discusses the research directions of a computer science department. A few years ago, when the Internet was booming and many universities were rushing into Internet research, Marcus saw the importance of bioinformatics. He set up a program in it at the University of Pennsylvania and began recruiting professors in the field before other universities had noticed it. He also encouraged professors in related fields, including the later department chair, to devote part of their energy to bioinformatics, and he gave the same advice to other universities where he served as an advisor. After the dot-com bubble burst, many university computer science departments began to turn to bioinformatics, but by then good professors in the field were hard to find. I think what Chinese universities need most today is visionary administrators like Marcus.

In a few days I will attend another advisory committee meeting together with Marcus. I wonder what new ideas he has about the development of computer science.
