Columbia University natural language processing open course lecture translation (1)

Source: Internet
Author: User

I took an open course on natural language processing taught by Michael Collins. I thought it was good, so I decided to translate the lectures into Chinese. On the one hand, I hope the translation process helps me better understand the material and exercises my translation skills. On the other hand, well, it benefits mankind. The content in parentheses is my own supplementary understanding.

I hope you can correct the translation.

Course address:

Okay, so welcome to natural language processing. My name is Michael Collins, I'm a professor in computer science at Columbia University. I've taught this course for several years now, most recently at Columbia, and before that at MIT. Natural language processing is, I think, a tremendously exciting field. It builds on insights from computer science, from linguistics, and as we'll see, increasingly from probability and statistics. It's also having a huge impact in our daily lives. Many applications and technologies are now making use of basic ideas from natural language processing. So, in this introductory lecture, we're going to cover a few basic points.

Welcome to the natural language processing course. I am Michael Collins, a professor in the computer science department at Columbia University (personal homepage: ). I have taught this course for several years, most recently at Columbia and before that at MIT. In my opinion, natural language processing is a very exciting field. It builds on insights from computer science and linguistics and, as we will see, increasingly on probability and statistics. It also has a great influence on our daily lives: many technologies and applications now use the basic ideas of natural language processing. In this introductory lecture, we will cover a few basic points.

The first question we're going to ask is, what is natural language processing? So, we'll discuss a few key applications in NLP and also a few key problems that are solved in natural language processing. The second question we'll consider is, why is NLP hard? So, we'll consider some key challenges that we find in natural language processing. Finally, I'll talk a little bit about what this course will be about, what kind of material we'll cover in this course, and what in general you should get from taking this course. So, at a high level, natural language processing concerns the use of computers in processing human or natural languages. So, on one side of this problem, we have what is often referred to as natural language understanding, where we take text as input to the computer and it then processes that text and does something useful with it. On the other side, we have what is often referred to as natural language generation, where a computer, in some sense, produces language in communicating with a human or user.

First, what is natural language processing? Here we will discuss some key applications in the field and some key problems to be solved. Second, why is natural language processing so difficult? We will consider some of the major challenges we encounter in natural language processing. Finally, I will talk about what this course is about, what material it will cover, and what in general you should get from taking it. At a high level, natural language processing is concerned with using computers to process human, or natural, languages. On one side, we have what is generally called natural language understanding: we give text to a computer as input and the computer processes it and does something useful with it. On the other side, we have what is generally called natural language generation: in a sense, the machine produces language to communicate with a human user.

One of the oldest applications and a problem of great importance is machine translation. This is the problem of mapping sentences in one language to sentences in another language. And this is a very, very challenging task. But remarkable progress has been made in the last 10 or 20 years in this area. So here, I have an example translation from Google Translate, which many of you will be familiar with. This is a translation from Arabic into English. And, while these translations aren't perfect, you can still understand a great deal of what was said in the original language. So, later in this course, we'll actually go through all of the key steps in building a model in a machine translation system.

One of the oldest applications is machine translation, and it is also a very important problem. The problem is to map sentences in one language to sentences in another language. This is a very, very challenging task. However, in the past 10 to 20 years, this field has made great progress. Here I show an example translation from Google Translate, which many of you will be familiar with. This is a translation from Arabic into English. Although the translation is not perfect, you can still understand much of the meaning of the original (Arabic) text from the automatically translated sentences. Later in this course, we will go through all of the key steps in building a machine translation system.


So, a second example application is what is often referred to as information extraction. So, the problem in this case is to take some text as input and to produce some structured, basically a database representation of some key content in this text. So, in this particular example, we have input which is a job posting, and the output captures various important aspects of this posting. For example, the industry involved, the position involved, the location, the company, the salary, and so on. And you'll see that this information is pulled out from this document. So, the salary in this case comes from this portion here. So this is a critical example of a natural language understanding problem, where the problem is to, in some sense, understand this input unstructured text and to turn it into a structured database kind of representation. So there's some clear motivation for this particular problem, information extraction. Once we've performed this step, we can, for example, perform complex searches. So say I want to find all jobs in the advertising sector paying at least a certain salary in a particular location. This would be a search that is very difficult to formulate using a regular search engine, but if I first run my information extraction system over all of the job postings that I find on the web, I can then perform a database query and perform much more complex searches such as this one. In addition, we might be able to perform statistical queries. So we might be able to ask, you know, how has the number of jobs in accounting changed over the years, or what is the number of jobs in software engineering in the Boston area posted during the last year.

The second application is information extraction. The problem here is to take some text as input and produce a structured, database-like representation of the key content in that text. In this example, the input is a job posting, and the output captures the important aspects of the posting, such as the industry, the position, the location, the company, and the salary. You can see that the output is extracted from the posting; the salary, for instance, comes from this portion here (on screen, Collins points to the relevant part of the posting). This is an important example of a natural language understanding problem: we want the computer to understand the unstructured input text to some extent and convert it into a structured database representation. The motivation for information extraction is clear: once we have performed this step, we can run complex searches. Say I want to find all jobs in the advertising sector paying at least a certain salary in a particular location. This search is very difficult to formulate with a regular search engine (it requires understanding the meaning of the request rather than simple literal matching). However, if I first run my information extraction system over all of the job postings I can find on the web, I can then query the database and answer complex searches like this one. In addition, we can run statistical queries, such as how the number of accounting jobs has changed over the years, or how many software engineering jobs in the Boston area were posted during the last year.
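
To make the motivation concrete, here is a minimal Python sketch of the two kinds of query mentioned above, run over records that an information extraction system might have produced. The records, field names, and the functions `find_jobs` and `jobs_per_year` are all invented for illustration; the lecture does not describe a specific schema, and the extraction step itself is not shown.

```python
from collections import Counter

# Hypothetical records standing in for the output of an information
# extraction system run over job postings. All values are invented.
JOBS = [
    {"industry": "advertising", "position": "account manager",
     "location": "Boston", "salary": 70000, "year": 2011},
    {"industry": "advertising", "position": "copywriter",
     "location": "New York", "salary": 55000, "year": 2012},
    {"industry": "software", "position": "engineer",
     "location": "Boston", "salary": 95000, "year": 2012},
    {"industry": "accounting", "position": "auditor",
     "location": "Boston", "salary": 60000, "year": 2011},
    {"industry": "accounting", "position": "controller",
     "location": "Chicago", "salary": 80000, "year": 2012},
    {"industry": "accounting", "position": "bookkeeper",
     "location": "Boston", "salary": 45000, "year": 2012},
]


def find_jobs(records, industry, min_salary, location):
    """Complex search: jobs in one sector, above a salary, in one place."""
    return [
        r for r in records
        if r["industry"] == industry
        and r["salary"] >= min_salary
        and r["location"] == location
    ]


def jobs_per_year(records, industry):
    """Statistical query: number of postings per year for one sector."""
    return Counter(r["year"] for r in records if r["industry"] == industry)


matches = find_jobs(JOBS, "advertising", 60000, "Boston")
print([r["position"] for r in matches])
print(dict(jobs_per_year(JOBS, "accounting")))
```

Once the text has been turned into records like these, both queries are ordinary filtering and counting; the hard part is the extraction that produces the records.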

Another key application in natural language processing is text summarization. And the problem in this case is to take a single document or, potentially, a group of several documents and to try to condense them down to a summary which, in some sense, preserves the main information in those documents. So here, I actually have an example screenshot from a system developed at Columbia, which is called Newsblaster. And this is actually a multi-document system. It will take multiple documents on the same news story, and produce a condensed summary of the main content of those documents. So in this particular example, we have a large group of documents all about a vaccination program. And here is a summary which attempts to capture the main information in all of these documents. So summarization again has clear motivation: in making sense of the vast amount of data or text available on the web, in news sources, and so on, it's very useful to be able to summarize that data.

Another natural language processing application is text summarization. The problem here is to take one or more documents as input and condense them into a summary that preserves their main content. Here I have a screenshot of the Newsblaster system from Columbia University. This is actually a multi-document summarization system: it processes multiple documents about the same news story and generates a summary of their main content. In this particular example, we have a large number of documents about a vaccination program, and this is the summary of that batch of documents (some text is displayed on the screen), which tries to capture their main content. Summarization also has a clear motivation: there is a vast amount of text and news data on the Internet, and being able to summarize it is very useful.

Another key application is what are called dialogue systems. And these are systems where a human can actually interact with a computer to achieve some task. So, the example I've shown here is from a flight domain, where a user is attempting to book a flight. And so the user might come with some query to the system. And the system then goes and processes this query; in some sense it understands that query. And in this particular case it realizes that there's a piece of missing information, namely the day of the flight, and so the system then responds with a query: what day are you flying on? The user provides this information and the system returns a list of flights. So in dialogue systems the basic problem is to build a system where the user can interact with a computer using natural language. And notice that this type of system involves both natural language understanding components, in that we have to understand what the user is saying, and also, importantly, a natural language generation component, in that we're going to have to generate text in some cases, for example clarification questions as we've shown here.



Another key application is dialogue systems, in which a human interacts with a computer to complete some task. Here I show an example of a human-computer conversation from the flight domain, where the user wants to book a flight. The user makes a request, and the system processes it and, in some sense, understands it. In this case the system discovers that a piece of information is missing from the request, namely the day of the flight, so it replies with a question: "What day are you flying on?" The user provides the day, and the system returns a list of flights that meet the user's needs. From this example we can see that the basic problem in dialogue systems is to build a system where humans can interact with a computer using natural language. Note that such a system involves natural language understanding, since the computer needs to understand what the user says, and also natural language generation, since the computer needs to generate text for the user, such as the clarification question "What day are you flying on?".
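
The missing-information step in the flight example can be sketched in Python as follows. This is only an illustration under assumed names: the slot list `REQUIRED_SLOTS`, the question wording for origin and destination, and the function `next_system_turn` are hypothetical, and the sketch takes an already-understood request as input rather than performing any language understanding itself.

```python
# Hypothetical slots a flight-booking request must fill before searching.
REQUIRED_SLOTS = ["origin", "destination", "day"]

# Clarification questions the system can generate, one per slot.
# Only the "day" question appears in the lecture; the others are invented.
QUESTIONS = {
    "origin": "Where are you flying from?",
    "destination": "Where are you flying to?",
    "day": "What day are you flying on?",
}


def next_system_turn(frame):
    """Ask about the first missing slot, or search once all are filled."""
    for slot in REQUIRED_SLOTS:
        if frame.get(slot) is None:
            return ("ask", QUESTIONS[slot])
    return ("search", dict(frame))


# The user's first request, as understood by the system: no day given.
frame = {"origin": "Boston", "destination": "Denver"}
print(next_system_turn(frame))

# The user answers the clarification question.
frame["day"] = "Tuesday"
print(next_system_turn(frame))
```

Filling the frame from the user's words is the understanding side of the system, and producing the question is the generation side; here both are reduced to a dictionary lookup.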

So in addition to the applications I've just described, we'll also consider some very basic natural language processing problems which underlie many of these applications. And the first we'll talk about is something called the tagging problem. So, abstractly, tagging problems take the following form. As input we have some sequence, in this case a sequence of letters, and as output we are going to have a tagged sequence where each letter in the input now has an associated tag. This is probably best illustrated through a couple of examples. The first one is part-of-speech tagging. So the problem in this case is to take a sentence as input, for example profits, soared, at, Boeing, Co and so on, and to tag each word in the input with its part of speech. So N stands for noun, V stands for verb, P stands for preposition, ADV stands for adverb, and so on. So, this is one of the, sort of, most basic problems in natural language processing. If you can perform this mapping with high accuracy, it's actually useful across a very wide range of applications.


In addition to the applications I mentioned earlier, we will also look at some basic natural language processing problems that many of these applications rely on. First, let's talk about tagging. Abstractly, a tagging problem looks like this: the input is some sequence (here, a sequence of letters), and the output is a tagged sequence in which each item of the input has an associated tag. Let's take a few examples. The first is part-of-speech tagging. The problem here is to take a sentence as input, such as profits, soared, at, Boeing, Co and so on, and to tag each word in the sentence with its part of speech. N stands for noun, V stands for verb, P stands for preposition, and ADV stands for adverb. This is one of the most basic problems in NLP. If you can perform this mapping with high accuracy, it is of great help to a very wide range of applications.
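
The input and output of a tagging problem can be shown with a small Python sketch. The tagger below is a toy dictionary lookup, chosen only to illustrate the shape of the mapping; it is not the method taught in the course and would not reach high accuracy. The names `LEXICON` and `tag`, and the choice of N as the fallback tag, are assumptions made for this example.

```python
# Toy lexicon mapping lowercase words to part-of-speech tags.
# The tag set (N, V, P, ADV) follows the lecture; the entries are illustrative.
LEXICON = {
    "profits": "N",
    "soared": "V",
    "at": "P",
    "boeing": "N",
    "co": "N",
    "easily": "ADV",
}


def tag(words, lexicon=LEXICON, default="N"):
    """Return one (word, tag) pair per input word.

    Unknown words fall back to `default`, an assumption of this sketch.
    """
    return [(w, lexicon.get(w.lower().rstrip("."), default)) for w in words]


sentence = ["Profits", "soared", "at", "Boeing", "Co."]
print(tag(sentence))
# [('Profits', 'N'), ('soared', 'V'), ('at', 'P'), ('Boeing', 'N'), ('Co.', 'N')]
```

The output has exactly one tag per input word, which is what distinguishes tagging from tasks such as summarization, where the output length is not tied to the input length.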
