We still have a long way to go on the road to artificial intelligence, but we look forward to a future in which people and machines coexist harmoniously and life becomes more convenient. We can already feel some of that intelligence today. If your mobile phone is an iPhone, you are no stranger to Siri, the human-computer interaction tool that can read text messages aloud, recommend restaurants, check the weather, and set alarms. None of that is especially advanced by itself; what really surprises users is that Siri keeps learning new voices and tones and can respond conversationally. Still, the intelligence Siri shows is only the tip of the iceberg of what AI may achieve. With luck, we may yet contribute to machine translation and human-computer interaction technologies and see even better intelligent applications. (If you want a deeper understanding of the technology behind Siri, we recommend this article: http://www.infoq.com/cn/articles/zjl-siri)
We know that early machine translation dealt mostly with well-formed text, and most of its corpus came from news, so it never felt very down-to-earth. Even so, Google Translate and Youdao Translate are now used all the time. (If you are interested, you can also try the Zidong interpreter developed by the Institute of Automation, Chinese Academy of Sciences.) As the translations get closer and closer to how people actually speak, the technology becomes more and more approachable. That closeness will not stop at news reports and the orthodox language of patents; the next target is everyday spoken communication. As a popular-science blog, this post looks at the past, the present, and the key technologies of spoken language translation.
Generally speaking, a speech translation system consists of three modules: automatic speech recognition (ASR), a machine translation engine, and a speech synthesizer. Clearly, traditional machine translation cannot simply be dropped into this setting; only by treating recognition and translation as a whole can we build a system that really serves spoken translation. When building the translation module, we must therefore account for recognition errors from the ASR stage and make the translation system tolerate or even correct them; at the same time, it is best to consider the linguistics of spoken language. Unlike written language, spoken syntax is relatively loose: repetition, redundancy, omission, inversion, and many other "irregular" syntactic phenomena are common, and all of them make the research harder. (A minimal pipeline sketch follows the examples below.)
Here are a few typical spoken-language phenomena:
1. Ah, setting off on, on the second or third day of next week (repetition)
2. Do you have a room now (inversion)
3. Can I make a reservation? (redundancy)
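To make the three-module structure concrete, here is a minimal sketch of such a cascade in Python. Everything in it is a stand-in: recognize, translate, and synthesize are hypothetical stubs I made up for illustration, not a real API, and a real system would decode audio and score hypotheses properly. The point is the shape of the pipeline, in particular passing an n-best list from ASR to MT so that recognition errors can still be recovered.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hypothesis:
    text: str
    score: float  # recognition confidence

def recognize(audio: bytes, n_best: int = 5) -> List[Hypothesis]:
    """Stand-in ASR: a real decoder would return n-best hypotheses for the audio."""
    return [Hypothesis("i want to reserve a single room tomorrow", 0.91),
            Hypothesis("i want to reserve a single root tomorrow", 0.42)]

def translate(hypotheses: List[Hypothesis]) -> str:
    """Stand-in MT engine: choosing across the n-best list is one simple way
    to let translation 'accept or correct' recognition errors."""
    best = max(hypotheses, key=lambda h: h.score)
    return "<target-language text for: %s>" % best.text

def synthesize(text: str) -> bytes:
    """Stand-in speech synthesizer."""
    return text.encode("utf-8")

def speech_translate(audio: bytes) -> bytes:
    # ASR -> MT -> TTS, treating recognition and translation as one system
    return synthesize(translate(recognize(audio)))

print(speech_translate(b"<raw audio>"))
```

Keeping several hypotheses instead of only the single best one is exactly the kind of coupling between recognition and translation argued for above.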
Let's take a look at what people have done before. In 1989, SpeechTrans from CMU became the first experimental speech translation system to be demonstrated. Over the following twenty-plus years, speech translation systems targeting different domains were released one after another, and today Siri, the voice edition of Google Translate, and iTranslate Voice have gradually become familiar to us; exploring the underlying technology has become a real focus. The 23-nation universal speech translation consortium U-STAR has also successfully demonstrated its VoiceTra4U-M application, so speech translation does seem to be slowly reaching ordinary users. Although speech translation is already in use, however, most applications are confined to limited domains. To become truly general-purpose, there is still a longer road ahead: we hope that future systems can expand their knowledge bases automatically, and that they can let people communicate across borders without language barriers. Of course, none of this can be achieved without the underlying technologies.
For a long time, the dominant approach to spoken-language parsing was oriented toward an intermediate representation. The IF (Interchange Format) is one such representation, grounded in speech-act theory: language is not used merely to state facts, it also carries the speaker's intention. One advantage of this approach is that rule-based parsing is easy to apply and can deliver good accuracy. The disadvantage is that the domain is limited, and building and using the IF format is very expensive. Here is an example of what the IF format means (using a hotel-reservation corpus):
I want to reserve a single room tomorrow
IF: c:give-information+reservation+room (room-spec=(room-type=single, quantity=1), reservation-spec=(time=(relative-time=tomorrow)))
Meaning: the speaker is C (the client); the intention of the sentence is to give information; the topic is reserving a room; the room type is single; the quantity is 1; and the reservation time is tomorrow.
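To show how a program might hold this structure, here is a small sketch that encodes the same hotel-reservation example as a nested Python dictionary. The key names mirror the IF string above; they are illustrative only and not part of any standard.

```python
# The IF example above as a nested Python dict: speaker, speech act,
# concept chain, and arguments are each explicit fields.
if_representation = {
    "speaker": "c",                       # C = the client side of the dialogue
    "speech_act": "give-information",     # the speaker's intention
    "concepts": ["reservation", "room"],  # the topic chain
    "arguments": {
        "room-spec": {"room-type": "single", "quantity": 1},
        "reservation-spec": {"time": {"relative-time": "tomorrow"}},
    },
}

# Because intention and arguments are explicit, generating the target-language
# sentence from this structure can be rule-driven and language-independent.
print(if_representation["speech_act"], if_representation["arguments"])
```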
In addition to this method, there are instance-based (example-based) approaches, and at present the better-performing approach is the statistical one. Fundamentally, though, the cores of today's translation systems are broadly similar: differences in translation results usually come down to the model the corpus is built on (phrase-based, hierarchical phrase-based, or syntax-based) and to the quality of the input. The author therefore believes that if the core translation process cannot easily be improved, it is better to think about how to prepare the corpus, which would help spoken translation a great deal. O(∩_∩)O
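As a tiny illustration of "preparing the corpus", here is a toy normalization step that removes filler words and collapses immediate repetitions, two of the spoken-language phenomena listed earlier. The filler list and the rules are assumptions for this example; a real system would need far more careful, learned handling of disfluencies.

```python
import re

# Illustrative fillers only; a real system would learn such patterns from data.
FILLERS = {"ah", "uh", "um", "well"}

def normalize_utterance(utterance: str) -> str:
    """Toy cleanup of a spoken utterance before it reaches the MT engine."""
    # lowercase and tokenize, dropping punctuation
    tokens = re.findall(r"[a-z']+", utterance.lower())
    # remove filler words
    tokens = [t for t in tokens if t not in FILLERS]
    # collapse immediate word repetitions ("on, on" -> "on")
    deduped = []
    for t in tokens:
        if not deduped or deduped[-1] != t:
            deduped.append(t)
    return " ".join(deduped)

print(normalize_utterance("Ah, setting off on, on the second day of next week"))
# -> "setting off on the second day of next week"
```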
And with one last picture, I'll stop here. I hope machine translation gets better and better ~
Spoken Language Translation: a road AI must travel