A quick start to Chinese natural language processing

Source: Internet
Author: User

What is NLP?
In computer science, NLP (natural language processing) is the study of how to make computers understand human language. This covers both understanding the meaning of natural language text and producing natural language text that expresses a given intention or idea. The field is often regarded as one of the most demanding goals of artificial intelligence: on this view, a machine achieves real intelligence only when it can understand natural language. Because Chinese words are built from a large and highly variable set of Chinese characters, natural language processing for Chinese is particularly complex. The research deals with natural language, that is, the language people use every day, so it is closely related to linguistics, but with an important difference: NLP is not the general study of natural language but the development of computer systems, especially software systems, that can communicate effectively in natural language. It is therefore part of computer science, sitting at the intersection of computer science, artificial intelligence, and linguistics, and it focuses on the interaction between computers and human (natural) languages.
Over many years of work in this field, NLP technology and services have developed rapidly and improved steadily. Tasks such as machine translation, information retrieval, automatic indexing, automatic summarization, and automatic story writing can be handled with our tool class DKNLPBase. NLP is no longer purely conceptual; it is gradually being applied, successfully, across many fields.

Why NLP is needed
For example, in daily life we often come across an unfamiliar character that we do not know how to read, and we turn to a search engine with a query such as "4 and read what", which describes the character and asks how it is read. The results show how the character being described is read, along with its pinyin and notes, rather than surface matches on the isolated words of the query.

This is NLP at work. With this technology, people do not have to spend a lot of effort learning difficult computer languages. They can use the computer in the language they are most used to, and the computer works out the meaning behind their words.
What can NLP be used for?
The Dakuai (大快) NLP module is a component of the Dakuai big data integration platform. By referencing this component, users can carry out natural language processing work such as summarizing articles, discriminating semantics, and improving the accuracy and effectiveness of content retrieval.
Natural language processing is now not only a core topic of AI research but is also studied as a core subject for the next generation of computers. In the knowledge industry, expert systems, databases, knowledge bases, computer-aided design (CAD) systems, computer-aided instruction (CAI) systems, computer-aided decision-making systems, office automation and management systems, intelligent robots, and so on all need natural language processing. Natural language understanding systems capable of discourse comprehension can be applied to machine translation, information retrieval, automatic indexing, automatic summarization, and automatic story writing, all of which can be handled with our tool class DKNLPBase.
Standard word segmentation
Method signature: List<Term> StandardTokenizer.segment(String txt);
Returns: the list of segmented terms.
Parameters: txt: the sentence to segment.
Example: the following test verifies that the term at index 5 of the second sentence is "Afado".
public void testSegment() throws Exception
{
    // String literals are shown in English translation; the original inputs are Chinese text.
    String text = "Goods and services";
    List<Term> termList = DKNLPBase.segment(text);
    assertEquals("Goods", termList.get(0).word);
    assertEquals("and", termList.get(1).word);
    assertEquals("services", termList.get(2).word);
    text = "Cathay Associates Commentary \"Li Shishi vs Afado second inning\" The end is like this";
    termList = DKNLPBase.segment(text);
    assertEquals("Afado", termList.get(5).word); // able to identify "Afado"
}
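The source does not say how the segmenter finds word boundaries. As a rough illustration of the general idea only, here is a minimal sketch of forward maximum matching, a classic dictionary-based segmentation technique. It is an assumption chosen for illustration, not the DKNLPBase implementation, and the class name `MaxMatchSegmenter`, the tiny dictionary, and the `maxWordLength` parameter are all hypothetical. The example uses an unspaced Latin string to stand in for Chinese text, which likewise has no spaces between words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of DKNLPBase.
class MaxMatchSegmenter {
    /** Splits text by greedily taking the longest dictionary word at each position. */
    static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        List<String> words = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            words.add(text.substring(pos, end));
            pos = end;
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("goods", "and", "services");
        // prints [goods, and, services]
        System.out.println(segment("goodsandservices", dictionary, 8));
    }
}
```

Characters that match no dictionary entry fall out as single-character tokens. Production segmenters typically add statistical models on top of a dictionary to recognize new words such as names.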
Keyword extraction
Method signature: List<String> extractKeyword(String txt, int keySum);
Returns: the list of keywords.
Parameters: txt: the text to extract keywords from; keySum: the number of keywords to extract.
Example: given a passage, extract one keyword, which is "programmer".
public void testExtractKeyword() throws Exception
{
    String content = "Programmer (English programmer) is a professional who engages in program development and maintenance. " +
            "Programmers are generally divided into program designers and program coding staff, " +
            "but the boundaries are not very clear, especially in China. " +
            "Software practitioners are divided into junior programmers, advanced programmers, systems " +
            "analysts and project managers, four categories in all.";
    // Ask for a single keyword and check that it is the expected one.
    List<String> keyword = DKNLPBase.extractKeyword(content, 1);
    assertEquals(1, keyword.size());
    assertEquals("Programmer", keyword.get(0));
}
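The source does not describe how keywords are ranked. As a simple stand-in, the sketch below ranks already-segmented tokens by raw frequency after dropping stop words. This is an assumption for illustration, not the method used by `extractKeyword`; the class `FrequencyKeywords` and the stop-word set are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of DKNLPBase.
class FrequencyKeywords {
    /** Returns up to keySum tokens, most frequent first, ignoring stop words. */
    static List<String> extract(List<String> tokens, Set<String> stopWords, int keySum) {
        // LinkedHashMap keeps first-appearance order, which breaks ties deterministically.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equally frequent tokens stay in first-appearance order.
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> keywords = new ArrayList<>();
        for (int i = 0; i < Math.min(keySum, ranked.size()); i++) {
            keywords.add(ranked.get(i).getKey());
        }
        return keywords;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("programmer", "is", "a", "professional",
                "programmer", "and", "analyst", "programmer");
        // prints [programmer]
        System.out.println(extract(tokens, Set.of("is", "a", "and"), 1));
    }
}
```

Raw frequency favors common words, which is why the stop-word filter is needed. Graph-based or TF-IDF-style scoring are common alternatives.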
Phrase extraction
Method signature: List<String> extractPhrase(String txt, int phSum);
Returns: the list of phrases.
Parameters: txt: the text to extract phrases from; phSum: the number of phrases to extract.
Example: given a passage, extract the five phrases that best represent it; the first phrase is "algorithm engineer".
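How `extractPhrase` selects phrases is not described in the source. One simple way to approximate the idea is to count adjacent token pairs and keep the most frequent ones. The sketch below does that under this assumption; the class `BigramPhrases` is hypothetical and is not the library's algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of DKNLPBase.
class BigramPhrases {
    /** Returns the phSum most frequent adjacent token pairs, each joined with a space. */
    static List<String> extract(List<String> tokens, int phSum) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            counts.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Stable sort: equally frequent pairs keep their first-appearance order.
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> phrases = new ArrayList<>();
        for (int i = 0; i < Math.min(phSum, ranked.size()); i++) {
            phrases.add(ranked.get(i).getKey());
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("algorithm", "engineer", "writes", "code",
                "algorithm", "engineer", "reviews", "code");
        // prints [algorithm engineer]
        System.out.println(extract(tokens, 1));
    }
}
```

For Chinese text the tokens would come from the segmenter and the pair would be joined without a space. Practical phrase extractors usually score pairs with association measures such as mutual information rather than raw counts.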

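NLP has made good progress in recent years, but many hard problems remain to be solved, so Dakuai is actively experimenting. It is precisely such challenging problems that can draw more talented people to Dakuai to drive its development.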
