Natural Language Processing (NLP) is the study of how computers process human languages. Its main areas include:
1. Syntactic and semantic analysis: For a given sentence, perform word segmentation, part-of-speech tagging, named entity recognition and linking, syntactic parsing, semantic role labeling, and word sense disambiguation.
2. Information extraction: Extract important information from a given text, such as times, places, people, events, causes, results, numbers, dates, currencies, and proper nouns. Put simply, the goal is to find out who did what to whom, when, why, and with what result. Key technologies include entity recognition, time extraction, and causal relation extraction.
3. Text mining (or text data mining): Includes text clustering, classification, information extraction, summarization, sentiment analysis, and visual, interactive interfaces for presenting the mined information and knowledge. The current mainstream techniques are based on statistical machine learning.
4. Machine translation: Automatically translate input text in a source language into text in another language. Depending on the input medium, it can be subdivided into text translation, speech translation, sign language translation, image translation, and so on. Machine translation has moved from the earliest rule-based methods, to the statistical methods of about 20 years ago, to today's neural network (encoder-decoder) methods, gradually forming a fairly rigorous body of methods.
5. Information retrieval: Index large collections of documents. A simple index assigns different weights to the words in each document; deeper indexes can be built with analysis techniques such as those in the items above. At query time, the input query, such as a search term or a sentence, is analyzed, matching candidate documents are looked up in the index, the candidates are sorted by a ranking mechanism, and the highest-ranked documents are returned.
6. Question answering: Given a question expressed in natural language, the question answering system returns a precise answer. This requires some degree of semantic analysis of the natural language query, including entity linking, relation recognition, and forming a logical expression, followed by finding candidate answers in a knowledge base and selecting the best one through a ranking mechanism.
7. Dialogue systems: Through a series of dialogue turns, the system chats with the user, answers questions, and completes tasks. This involves user intent understanding, a general chat engine, a question answering engine, dialogue management, and other technologies. To reflect context, the system also needs multi-turn dialogue ability. To show personality, it should build user profiles and generate personalized replies based on them.
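The information retrieval pipeline in item 5 (weight the words, build an index, match the query, rank the candidates) can be illustrated with a minimal sketch. It assumes TF-IDF as the weighting scheme and whitespace splitting as the tokenizer; neither is prescribed by the text, and the function names `tokenize`, `build_index`, and `search` are hypothetical names chosen for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real word segmentation)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index mapping each term to {doc_id: TF-IDF weight}."""
    term_counts = [Counter(tokenize(doc)) for doc in docs]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, counts in enumerate(term_counts):
        for term, tf in counts.items():
            # Assumed weighting: raw term frequency times a smoothed IDF.
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            index[term][doc_id] = tf * idf
    return index


def search(index, query, top_k=3):
    """Analyze the query, collect matching documents, and rank them by score."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties are broken by document id.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]


docs = [
    "machine translation with neural networks",
    "statistical machine learning for text mining",
    "dialogue systems answer user questions",
]
index = build_index(docs)
results = search(index, "neural machine translation")
print(results)
```

Here the first document matches all three query terms and ranks first, while the second matches only "machine". A real system would replace the tokenizer with proper word segmentation and could add the deeper analysis mentioned above.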
With the progress of deep learning in image recognition and speech recognition, people have high hopes for the value of deep learning in NLP. Together with the success of AlphaGo, this has made artificial intelligence research and applications extremely popular. Natural language processing, as the part of artificial intelligence concerned with cognitive intelligence, is now a focus of attention. Many graduate students are entering the field and looking forward to a future in AI. However, newcomers often run into problems. As the saying goes, the beginning is always the hardest. If the first attempt succeeds, the student builds confidence, finds the knack, and does better afterwards. Otherwise, the student may become discouraged and even leave the field. Here is my personal advice; I hope these modest views prompt a deeper discussion.
Recommendation 1: How do you quickly learn your first skill in NLP?
My advice: find an open source project, for example a machine translation or deep learning project. Understand the task the project addresses, build and run the model code it releases, and obtain results consistent with the project's demo. Then study the algorithm behind the demo until you understand it thoroughly, and implement it yourself. Test your implementation on the standard test set provided by the project. If your results differ from the project's, check your code carefully and revise it until the results are essentially the same as the demo's. If you still cannot match them, do not hesitate to write to the project's authors for help. Once the results match, see whether you can improve the algorithm or the implementation to get better results than the demo.
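As a rough illustration of checking a reimplementation against a project's demo, the sketch below scores both outputs on the same test set and lists the examples where they disagree. The labels, the accuracy metric, and the 0.01 tolerance are all assumptions made for this example; a real project will define its own metric (for instance BLEU for machine translation) and its own test data.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that equal the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold:
        raise ValueError("test set is empty")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def matches_reference(own_score, reference_score, tolerance=0.01):
    """True if the two scores differ by at most the (assumed) tolerance."""
    return abs(own_score - reference_score) <= tolerance


# Hypothetical toy data: gold tags, the demo's output, and our own output.
gold = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN"]
reference_output = ["NOUN", "VERB", "NOUN", "ADJ", "VERB"]
own_output = ["NOUN", "VERB", "ADJ", "ADJ", "VERB"]

reference_score = accuracy(reference_output, gold)
own_score = accuracy(own_output, gold)
consistent = matches_reference(own_score, reference_score)

# Positions where our output differs from the demo's are the first place to look.
disagreements = [
    i for i, (own, ref) in enumerate(zip(own_output, reference_output)) if own != ref
]
print(reference_score, own_score, consistent, disagreements)
```

In this toy case the scores differ by more than the tolerance, so the list of disagreeing positions shows where to start debugging.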
Recommendation 2: How do you choose a good first topic?
For engineering-track graduate students, the topic is often assigned by the advisor. These projects call for a practical, hands-on approach. They may not require much theoretical innovation, but they do require strong implementation skills and the ability to combine existing ideas in new ways. Academic-track graduate students need to produce first-class research results, so the topic must be innovative. Here are a few suggestions.
- First find a field you enjoy. Look through the proceedings of a recent ACL conference and pick an area that interests you. When choosing a topic, favor "blue ocean" areas: they are relatively new, so it is easier to produce results.
- Thoroughly survey the current state of the area. Check three things: methods (is there a reasonably clear mathematical and machine learning framework?), data (are there recognized standard training and test sets?), and research teams (are well-known groups and researchers involved?). If the answers are unclear, the area may be hard for a beginner to enter.
- Once you have settled on an area, find open source projects or tools in it, as suggested in Recommendation 1, and study the major existing schools of thought and methods carefully before getting started.
- Read the latest papers in the field, especially those by its leading researchers. With a deep understanding of existing work, look for places where it can be overturned, improved, combined, or transferred. When running experiments, do not try to do too much at once; each experiment only needs to verify one idea. After each experiment, analyze the errors and find out why they occurred.
- For experiments that succeed, explore how to improve the algorithm further. Note that the experimental data must be data accepted by the research community.
- Compare with existing algorithms and see whether the experiments support a more general conclusion. If they do, write a paper; otherwise, switch to a new topic.
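The advice above to analyze the errors after each experiment can start with something as simple as counting which mistakes occur most often. The sketch below is one possible approach, not a prescribed method; the entity labels and the helper name `error_breakdown` are hypothetical.

```python
from collections import Counter


def error_breakdown(predictions, gold):
    """Count each (gold label, predicted label) pair where the prediction is wrong."""
    errors = Counter((g, p) for p, g in zip(predictions, gold) if p != g)
    # Most frequent confusions first.
    return errors.most_common()


# Hypothetical toy data from a named entity classification experiment.
gold = ["PER", "LOC", "ORG", "LOC", "PER", "ORG"]
predictions = ["PER", "ORG", "ORG", "ORG", "LOC", "ORG"]

breakdown = error_breakdown(predictions, gold)
for (gold_label, predicted_label), count in breakdown:
    print(f"gold={gold_label} predicted={predicted_label}: {count}")
```

In this toy data the most frequent confusion is locations predicted as organizations, which suggests where to look first for the cause.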
Recommendation 3: How do you write your first paper?
- Following on from the previous question: if the idea is good and the experiments support it, you can start writing your first paper.
- Decide on the title. Generally avoid titles of the form "... System" or "... Research and Practice", and avoid overly long titles, because they do not convey the main point well. The title should be specific, show depth, and highlight the algorithm.
- Write the abstract. State what important problem the paper addresses, what method it proposes, and what advantages it has over existing work. Then state what level the experimental results reach and what problems they solve.
- Write the introduction. Start with the background of the work, the definition of the problem, and why it matters. Then describe the existing methods for the problem and their merits, and point out that they still have flaws or face challenges, giving a concrete example of such a problem. Next, explain which method (and whose work) inspired this paper, what new method it proposes, and which aspects it studies; describe each aspect in turn and then state the conclusions of the experiments. List the paper's contributions; three are generally enough. Finally, outline how the paper is organized and what it focuses on. When there is too much material for the available space, present only the most important parts; there is no need to be exhaustive.
- Related work. Organize the related work by school of thought and briefly introduce the main ones (typically about three). For each, explain its principles and then its limitations.
- Next, you can devote two sections to your own work. The first describes the algorithm: the problem definition, the mathematical notation, and the algorithm itself. Most of the paper's main formulas appear here, sometimes with a concise derivation. If you draw on other people's theories or algorithms, cite them clearly. Since the work is usually based on machine learning or deep learning, also describe how your model is trained and how decoding is done. The second section covers the experiments. State the purpose of each experiment, what is being tested, the experimental method, where the data comes from, and how large it is. Public evaluation data is best, because it makes it easier for others to reproduce your work. Give the technical parameters needed for each experiment and report the results. To compare with existing work, cite its published results and, where necessary, reproduce important work yourself and report those results. Use the experimental data to show where your method outperforms others. Analyze the results of your work and of others' work, including their respective strengths and weaknesses, and explain the reasons. For aspects that do not yet work well, analyze the problem and list it as future work.
- Conclusion. Summarize the paper's contributions once more. Distill the theory and method, and state the contributions and conclusions drawn from the experiments. The conclusion should convince the reader and point out directions for future research.
- References. Cite the papers for all important related work. Remember: if you omit an important reference (or the work of a leading researcher), the paper has little chance of being accepted.
- Finish the first draft, then revise it three times.
- Give the paper to colleagues in your own project group and ask them to review it for algorithmic novelty and innovation, experimental scale, and conclusions. Address the weaknesses they identify, concentrating on strengthening the depth of the algorithm and the novelty of the work.
- Then ask people from other project groups to review it. If they cannot follow it, the paper is not readable enough. Revise its structure and polish the text to improve readability.
- For international conferences such as ACL, it is best to ask an English major or a native speaker to polish the text.