What is the application of syntactic analysis (syntactic parsing) in the field of NLP?

Source: Internet
Author: User
I have a question about NLP. The question is: "To what extent would syntactic parsing be useful in an opinion extraction system and an information retrieval system?"
How do opinion extraction systems and information retrieval systems actually use syntactic parsing in practice? I hope the NLP experts can explain the details and the areas involved.
What is the right answer to this question?

Reply content:

Thank you for the invitation. There are two questions here: 1. How syntactic analysis is used in opinion extraction and IR; 2. To what extent syntactic analysis helps these two tasks (the original question).

Since I mainly work on syntactic analysis itself and rarely on downstream applications, I will only briefly describe my understanding of the applications.

1. How syntactic analysis is used in opinion extraction and IR.
Here are a few examples.
In opinion extraction, we often need to extract the object being evaluated (the aspect):
Example: "The quality of the content is very good"
Here "good" describes "content quality". Dependency parsing lets us extract the corresponding collocation, for example:
(A quick plug: the analysis results come from our lab's Language Cloud: Online Demo | Language Cloud (Language Technology Platform Cloud, LTP-Cloud))
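As a minimal sketch of how such a collocation could be pulled out of a dependency parse, the snippet below works on a hand-written parse of the example sentence. The parse, the Universal Dependencies-style labels (nsubj, nmod), and the function name extract_pairs are assumptions for illustration; they are not the actual output or API of LTP-Cloud.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    form: str    # surface word
    pos: str     # coarse part-of-speech tag
    head: int    # index of the head token, 0 for the root
    deprel: str  # dependency label


# Hand-written parse (an assumption, not real parser output) of
# "The quality of the content is very good".
PARSE = [
    Token(1, "The", "DET", 2, "det"),
    Token(2, "quality", "NOUN", 8, "nsubj"),
    Token(3, "of", "ADP", 5, "case"),
    Token(4, "the", "DET", 5, "det"),
    Token(5, "content", "NOUN", 2, "nmod"),
    Token(6, "is", "AUX", 8, "cop"),
    Token(7, "very", "ADV", 8, "advmod"),
    Token(8, "good", "ADJ", 0, "root"),
]


def extract_pairs(tokens):
    """Return (aspect, opinion) pairs for adjectives that have a nominal subject."""
    by_idx = {t.idx: t for t in tokens}
    pairs = []
    for tok in tokens:
        head = by_idx.get(tok.head)
        if tok.deprel == "nsubj" and head is not None and head.pos == "ADJ":
            # Noun modifiers of the subject make the aspect a full phrase.
            mods = [t.form for t in tokens if t.head == tok.idx and t.deprel == "nmod"]
            pairs.append((" ".join(mods + [tok.form]), head.form))
    return pairs


print(extract_pairs(PARSE))
```

The rule here is deliberately narrow (subject of an adjectival predicate); a real system would need many more patterns.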

Now for IR, take Baidu Box Computing as an example. Consider the following two queries:
Query 1: Who is Nicholas Tse's son?
Query 2: Whose son is Nicholas Tse?
The two queries have exactly the same bag of words (in the original Chinese), so it is difficult to return the correct result to the user without considering grammatical structure. There are many similar examples. In such cases, parsing the query tells us what the user is really asking about.
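A small sketch can make the point concrete. The token lists below are word-by-word English glosses of the two queries ("who is Nicholas Tse's son" and "whose son is Nicholas Tse"), and the possession arcs are written by hand; both are assumptions for illustration rather than the output of any real tokenizer or parser.

```python
from collections import Counter

# Word-by-word glosses of the two queries (assumed for illustration).
query1 = ["Nicholas Tse", "'s", "son", "is", "who"]  # Who is Nicholas Tse's son?
query2 = ["Nicholas Tse", "is", "who", "'s", "son"]  # Whose son is Nicholas Tse?

# A bag-of-words model only sees word counts, so the queries look identical.
same_bag = Counter(query1) == Counter(query2)

# Hand-written (head, dependent) arcs for the possessive relation.
arcs1 = {("son", "Nicholas Tse")}  # the son of Nicholas Tse is being asked for
arcs2 = {("son", "who")}           # the parent of Nicholas Tse is being asked for

print("same bag of words:", same_bag)
print("same dependency arcs:", arcs1 == arcs2)
```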



In general, broader analysis of query intent cannot do without extracting the object being described. The syntactic structure is often critical here, and it is also a prerequisite for the next step, semantic analysis.

2. To what extent syntactic analysis helps these two tasks (the original question).

This is a very good question that invites a lot of thought. Before deep learning ("alchemy") took off, we could perhaps have given a very optimistic answer, such as 60%. Now we need to be more careful. The main reason is that powerful sequence models such as RNNs and LSTMs can capture the implicit grammatical structure of a sentence to some extent. Although we cannot yet explain this clearly, these models perform very well on many tasks.
I recommend a short survey written by Professor Che Wanxiang: "HIT's Che Wanxiang: Do deep learning models in natural language processing depend on tree structures?"

A performance comparison illustrates the point: Tree-LSTM is an LSTM built on the syntactic structure, while Bi-LSTM is a simple bidirectional (left<->right) LSTM. On many tasks, Bi-LSTM performs better than Tree-LSTM.

However, this does not mean that syntactic structure is useless. For a detailed analysis, please refer to the survey mentioned above.

It is worth noting that the current accuracy of syntactic parsing is a key factor limiting its practical use, especially in the open domain. At present, parsing accuracy on the English WSJ data can reach 94%, but across domains it can fall below 80%, which does not meet the standard for practical applications. For Chinese, parsing accuracy is even lower.

I started working on NLP only recently, mainly on text sentiment analysis, and I have not used syntactic analysis in my research. After reading the question and the existing answers, I spent the past few days consulting the related literature. Below I organize my thoughts on the question and add to the existing answers.

You are welcome to discuss ^_^


Application of syntactic analysis in opinion extraction:


1. What to do

First, let us analyze the research goals.

The main problem opinion extraction tries to solve is determining the sentiment contained in a sentence or a comment. If judging sentiment polarity is the only goal, it is a classification problem: the result can be one of three classes (positive, negative, neutral), or intensity can be added for a finer-grained division. Judging sentiment polarity is a worthwhile research direction, but the information that opinion extraction can mine is not limited to it.

This difference in what is mined leads to two broad research directions in opinion extraction. One is sentiment classification, whose goal is high accuracy in judging sentiment polarity. The other is sentiment-related information extraction, which focuses more on analyzing the constituent elements of sentiment-bearing text. What are the constituent elements? For a comment, they are the reviewer, the subject being commented on, the words that reflect the reviewer's sentiment, and so on.

For example (taken from [1]):

I highly recommend the Canon SD500 to anybody looking for a compact camera that can take good pictures.

In the original, bold words mark the subject of the comment and italic words mark the sentiment words.

For the sentiment classification problem, it is enough to judge that the sentiment in the sentence above is positive.

For the sentiment-related information extraction problem, we need to determine from the sentence above that the Canon SD500 performs well, and, if performance is divided into categories, that it belongs in a category such as "photo performance".


2. How to do it and why syntactic parsing is important

Having stated the problem, let us turn to the solution and analyze the key role syntactic analysis plays in it.

The above is a classification of the research directions of opinion extraction. The following covers where syntactic analysis applies.

Syntactic analysis matters little for the first class of problems, which is why I have not come across it in my own research. For the first class, statistical methods are more efficient and work better. One only needs to segment the text into words (for English, splitting on word boundaries is enough), look up the sentiment polarity of every word that appears in the text, and predict the sentiment polarity of the text from them. This is only the general idea, of course; to improve accuracy, one can also take the domain of the text into account, and so on.
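The general idea in the paragraph above can be sketched in a few lines. The two word lists are tiny, hand-picked assumptions, and counting lexicon hits is a simplification of the statistical methods mentioned; a real system would learn word polarities from labeled data.

```python
# Hypothetical mini-lexicons, chosen by hand for this sketch only.
POSITIVE = {"good", "great", "recommend", "excellent"}
NEGATIVE = {"bad", "poor", "terrible", "broken"}


def classify(text: str) -> str:
    """Classify a text as positive, negative, or neutral by counting lexicon hits."""
    # Whitespace tokenization is enough for English in this sketch.
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = ("I highly recommend the Canon SD500 to anybody looking for "
          "a compact camera that can take good pictures.")
print(classify(review))
```

Note that every word is treated the same way regardless of its role in the sentence, which is exactly why no syntactic analysis is needed here.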

For the second class of problems, syntactic analysis plays a major role. The first class can treat all words equally and simply use labeled data to estimate the probability of a word appearing in positive and negative comments. But if you want to know which word is the subject of the comment and which words modify it, you need enough knowledge of the rules of the language, and nouns and adjectives are treated in fundamentally different ways. On the other hand, because a sentence may contain clauses, an adjective and the noun it modifies may be far apart; without an adequate analysis of sentence structure, the result may be a mess.

A clear example of syntactic analysis has already been given in the answer above, so I will not repeat it here. Here is an example of rule-based sentiment word extraction (from [2]):


Notation:

M: modifier
NP: noun phrase
S: subject
P: predicate
O: object
F: feature (the subject being evaluated)

These are not all of the rules. As can be seen, using syntactic structure for opinion extraction is cumbersome, and high accuracy is hard to reach because it is difficult to specify precise rules for every situation. How to apply syntactic analysis in opinion extraction is therefore itself a research topic.


Application of syntactic analysis in information retrieval:


1. What to do

I had no prior exposure to information retrieval. From my reading over this period, the main problem it solves is retrieving relevant information within some scope for a given query. An important application is question answering systems.


2. How to do it and why syntactic parsing is important

The input query is analyzed in order to find content in the data set (such as the Internet) that is related to it, for example sentences related to the query, or a short direct answer to a question.

Reference [6] broadly divides the task of a question answering system into three steps, which I think are a useful reference:

(1) Locating the relevant documents

(2) Retrieving passages that may contain the answer

(3) Pinpointing the exact answer from candidate passages

As the steps show, information retrieval does not reach the answer in one step but keeps narrowing the scope. This requires that no step deviates too much. In step (2), for example, if the retrieved passages do not contain the answer, the correct answer can no longer be found.

Because a query can be viewed as a matching operation, how the input sentence is matched and parsed has a decisive impact on the results. Here is an example (taken from [7]):

Q: What did George Washington call his house?

The traditional approach matches documents or passages in the database (the search scope) that contain the keywords. The keywords in this question, highlighted in green in the original, are clearly George Washington and house. The following keyword matches were then found:


Some related words were indeed found, but there is a big problem: matching keywords directly, without analyzing the relationships between them, fails to find answers that involve multiple keywords together. That is, each matched passage relates to only one or a few of the keywords. The question is about George Washington's house, so it is hard to get a precise answer from passages that match only George Washington or only house.


This is where syntactic analysis comes in: results are ranked by analyzing the relationships between the keywords. Although many passages match individual words as above, this approach moves results associated with both keywords toward the top, such as "George Washington's house, Mount Vernon." This result occurs very rarely, so the traditional method would rank it relatively late; but it is clearly associated with both keywords, so the syntactic method ranks it near the top.
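The difference between the two rankings can be sketched with toy data. The passages, their keyword hits, and their (head, dependent) arcs below are all written by hand, and the scoring is a simplified stand-in; it is not the actual method of [6] or [7].

```python
# Keywords from the question "What did George Washington call his house?"
KEYWORDS = {"George Washington", "house"}
# Relation expressed by the query: "house" is possessed by "George Washington".
QUERY_RELATION = ("house", "George Washington")

# Toy passages with hand-written keyword hits and dependency arcs (assumptions).
PASSAGES = [
    {
        "text": "George Washington visited a friend's house and liked the house.",
        "terms": ["George Washington", "house", "house"],
        "arcs": {("visited", "George Washington"), ("visited", "house"), ("house", "friend")},
    },
    {
        "text": "George Washington was the first president.",
        "terms": ["George Washington"],
        "arcs": {("president", "George Washington")},
    },
    {
        "text": "George Washington's house, Mount Vernon.",
        "terms": ["George Washington", "house"],
        "arcs": {("house", "George Washington"), ("house", "Mount Vernon")},
    },
]


def keyword_score(passage):
    """Traditional score: number of keyword occurrences in the passage."""
    return sum(term in KEYWORDS for term in passage["terms"])


def relation_score(passage):
    """1 if the passage links the keywords the same way the query does, else 0."""
    return int(QUERY_RELATION in passage["arcs"])


by_keywords = sorted(PASSAGES, key=keyword_score, reverse=True)
by_relations = sorted(PASSAGES, key=lambda p: (relation_score(p), keyword_score(p)), reverse=True)

print("keyword ranking top hit: ", by_keywords[0]["text"])
print("relation ranking top hit:", by_relations[0]["text"])
```

In this sketch the relation match is used as the primary sort key and the keyword count only breaks ties, which is one simple way to let structure outweigh frequency.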

Reference [7] gives more detailed examples of how to build a parse tree through syntactic analysis, match the grammatical sequences of the query and of the search results (the words of each sentence are related to one another, so they can be treated as a grammatical sequence), and select the highest-scoring result as the answer. The details are lengthy, so I will not repeat them here.


Summary

The above describes, in somewhat scattered detail, how parsing helps with opinion extraction and information retrieval. Overall, my view is that because the words of natural language are related to one another, syntactic analysis can uncover more information than purely statistical methods. Statistical methods can of course also uncover, along some dimensions, information that syntactic analysis does not capture. So to put it more precisely: syntactic analysis can use the rules of grammatical structure to filter out useful results more directly, and in this way improve accuracy.

I only researched this question recently, so the answer may be imperfect, but the examples and claims in it are based on the literature. You are welcome to discuss ^_^


References and PDF links

Opinion Extraction Part

[1] Wu Y, Zhang Q, Huang X, et al. Phrase dependency parsing for opinion mining[C]//Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3. Association for Computational Linguistics, 2009: 1533-1541.

Link: http://www.aclweb.org/anthology/d/d09/d09-1159.pdf

[2] Popescu A M, Etzioni O. Extracting product features and opinions from reviews[M]//Natural Language Processing and Text Mining. Springer London, 2007: 9-28.

Link: http://www.aclweb.org/old_anthology/h/h05/h05-1.pdf#page=375

[3] Poria S, Cambria E, Ku L W, et al. A rule-based approach to aspect extraction from product reviews[C]//Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP). 2014: 28-37.

Link: http://www.anthology.aclweb.org/w/w14/w14-59.pdf#page=38

[4] Choi Y, Breck E, Cardie C. Joint extraction of entities and relations for opinion recognition[C]//Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2006: 431-439.

Link: http://www.aclweb.org/anthology/w/w06/w06-16.pdf#page=453

[5] Dave K, Lawrence S, Pennock D M. Mining the peanut gallery: opinion extraction and semantic classification of product reviews[C]//Proceedings of the 12th International Conference on World Wide Web. ACM, 2003: 519-528.

Link: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.2424&rep=rep1&type=pdf

Information Retrieval Part

[6] Cui H, Sun R, Li K, et al. Question answering passage retrieval using dependency relations[C]//Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2005: 400-407.

Link: http://www.comp.nus.edu.sg/~kanmy/papers/f66-cui.pdf

[7] Sun R, Ong C H, Chua T S. Mining dependency relations for query expansion in passage retrieval[C]//Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2006: 382-389.

Link: http://lms.comp.nus.edu.sg/sites/default/files/publication-attachments/Sigir06-sunrenxu.pdf

[8] Carpineto C, Romano G. A survey of automatic query expansion in information retrieval[J]. ACM Computing Surveys (CSUR), 2012, 44(1): 1.

Link: https://www.researchgate.net/profile/claudio_carpineto/publication/220566113_a_survey_of_automatic_query_expansion_in_information_retrieval/links/00b7d515aa3ac40767000000.pdf




I'll answer too, even though no one invited me _(:3」∠)_

Jiwei Li has a paper at this year's ACL, "When Are Tree Structures Necessary for Deep Learning of Representations?", which is essentially tree structure vs. sequence (recursive vs. recurrent).
He used four tasks to examine the question:
    1. Sentiment classification at the sentence level and phrase level
    2. Matching questions to answer phrases
    3. Discourse parsing
    4. Semantic relation classification
For each task he built a tree-based RNN and a sequence-based RNN, and enhanced both with LSTM. I won't go through the results of each task. The final conclusion is that, apart from task 4 (semantic relation classification), where the tree has a clear advantage, Bi-LSTM on sequences does better on the other tasks.
The concrete conclusions are as follows:
    1. Longer text does better with trees (if the corpus is sufficient)
    2. Bi-LSTM can close the gap between sequences and trees
    3. Long text can be split into short texts (at punctuation) and then modeled hierarchically with sequence models
Personally, I don't think comparing Bi-LSTM with a tree-recursive model is fair, because each word sees a different context. If the trees were also made bidirectional, the results should improve as well, as in the global belief RNN.
I believe in the validity of trees; maybe we just haven't come up with a better model yet.

I work on text sentiment analysis myself and often use dependency parsing, so I have some understanding of these topics. Here are my personal views for now; please point out anything that is wrong so that we can learn and improve together.
1. Introduction to dependency parsing
Dependency parsing mainly analyzes the relationships between the components of a sentence. The general goals are to identify syntactic components (subject, predicate, object, etc.) and to determine syntactic relations (SBV, VOB, ATT, ADV, etc.).
Dependency parsing is mainly implemented with graph-based methods and transition-based methods. Graph-based methods train on the dependency relations of the whole sentence and parse with a maximum spanning tree algorithm. They are global, but have a serious drawback: no intermediate parse is available until the search finishes, so intermediate results cannot be used for subsequent parsing. Transition-based methods train on each transition step and search for the locally best action step by step until the parse is complete. This is somewhat like a greedy algorithm (my personal impression) and is local; its advantage is that intermediate results can be used in later steps. The two methods are opposed yet complementary, and many people combine them, reportedly with better results than either method alone.
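To make the transition-based idea concrete, here is a minimal sketch of an arc-standard style parser loop. In a real parser a trained classifier picks each action from the current state; here the action sequence is supplied by hand, and the sentence, the action names, and the function name parse are assumptions for illustration.

```python
def parse(words, actions):
    """Apply a sequence of arc-standard actions and return (head, dependent) arcs."""
    stack, buffer, arcs = [], list(words), []
    for action in actions:
        if action == "SHIFT":
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The second-from-top word becomes a dependent of the top word.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC":
            # The top word becomes a dependent of the word below it.
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown action: {action}")
        # The partial arcs are available here after every step, which is the
        # "intermediate results" advantage described above.
    return arcs


words = ["She", "reads", "books"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
print(parse(words, actions))
```

Each action is decided locally from the current stack and buffer, which is where the greedy flavor comes from.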
2. What dependency parsing is good for
2.1 Text understanding
"Who is Nicholas Tse's son?" and "Whose son is Nicholas Tse?" are completely different questions, but a traditional search returns almost the same results for both (because it is basically keyword matching). With dependency parsing added, the results are completely different.
2.2 Semantic disambiguation (this is used a lot; the most common use is query rewriting)
Take "Go to the hospital to see cancer patients": "see" here could mean "treat" or "visit", so the sentence is easily ambiguous. With dependency parsing, it must mean "visit", because the object of "see" is "patients".

2.3 Main-component extraction
For "Recommend a restaurant where you can hear classical music", the main components of the sentence can easily be obtained by syntactic analysis.

2.4 Summary extraction
Much the same as above, so I won't elaborate.
2.5 Sentiment analysis
Because of domain problems ("the sound at my home is very loud" vs. "the washing machine at my home is very loud") and irony ("you're awesome"), sentiment analysis at present is mostly based on rules rather than statistics. Rule-based approaches generally use syntactic analysis to determine and verify the syntactic structure around sentiment words and subjective words, then determine the sentiment polarity of the sentence, derive sentiment tags, and so on.
2.6 Machine translation
The simplest method: determine the sentence structure by syntactic analysis, translate word by word, then reorder and revise the translation according to the syntactic structure.
2.7 Unique-answer search (question answering systems)
For example, when a user searches "Chen Daoming height" or "Chen Daoming age", adding dependency parsing lets the search give the answer directly.
I won't list further similar applications. In addition, dependency parsing is the basis of semantic role labeling.
3. When is dependency parsing needed?
A. When the problem is complex but training data is scarce
B. When semantically dependent words are far apart (e.g. "those silly little female fans attacked me on the Internet")
For details, see Professor Che's article: "HIT's Che Wanxiang: Do deep learning models in natural language processing depend on tree structures?"
4. How to use syntactic analysis
A brief walk-through for subjective sentence extraction: basic processing (word segmentation, part-of-speech tagging, etc.) --> match aspects --> match aspect-verbs --> match sentiment words --> match general sentiment words --> granularity analysis (merging results, re-tagging parts of speech, etc.; for example, 给 "give"/v + 力 "force"/n --> 给力 "awesome"/a) --> dependency parsing --> validate the structure of the subjective sentence (for example, a sentiment verb must have a subject and an object) --> sentence pattern matching (the match positions and rules differ for each sentence pattern) --> sentence pattern filtering (the filter positions and rules differ for each sentence pattern) --> whole-sentence parsing --> whole-sentence filtering --> output the subjective sentence.

A rule-based syntactic "opinion extraction system" is a very academic notion; I don't know how to express it in Chinese.
Information retrieval system refers to search engines.

First, when searching by text relevance, in NLP scenarios the score of the core keywords is easily diluted by filler words. Stopwords can address this, but extracting the main subject through syntactic analysis is much more accurate.

Second, and more importantly, when we build applications on top of search, there are often special rule-based treatments beyond text relevance. This part depends heavily on one's own business, so domestic tools such as LTP or ICTCLAS do not provide a general-purpose solution built on syntactic analysis.

For example, a video search that finds "new" or "hot" in the query may need to add special time-related rules to the search. Such rules are formulated by combining business understanding with syntactic analysis.

The video search example is not a very good one, because it can also be solved by setting keywords, without the sledgehammer of syntactic analysis. But for "I want to book a flight from Guangzhou to Beijing tomorrow", syntactic analysis is required. First determine that this is a ticket-booking request, then extract the time and places from the sentence, fill them into the corresponding business interface, and finally present the result to the user.
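The flight-booking steps can be sketched as slot filling over a dependency parse. The parse below is written by hand with Universal Dependencies-style labels, and the slot names (intent, origin, destination, date) and the function fill_slots are hypothetical; a production system would take the parse from a real parser and map slots onto its own business interface.

```python
# (index, word, head index, dependency label); a hand-written parse (assumption) of
# "I want to book a flight from Guangzhou to Beijing tomorrow".
PARSE = [
    (1, "I", 2, "nsubj"),
    (2, "want", 0, "root"),
    (3, "to", 4, "mark"),
    (4, "book", 2, "xcomp"),
    (5, "a", 6, "det"),
    (6, "flight", 4, "obj"),
    (7, "from", 8, "case"),
    (8, "Guangzhou", 6, "nmod"),
    (9, "to", 10, "case"),
    (10, "Beijing", 6, "nmod"),
    (11, "tomorrow", 4, "obl:tmod"),
]


def fill_slots(tokens):
    """Map dependency relations to the slots of a hypothetical booking interface."""
    word_at = {idx: word for idx, word, _, _ in tokens}
    slots = {}
    for idx, word, head, deprel in tokens:
        if deprel == "obj" and word_at.get(head) == "book":
            # The object of "book" tells us what kind of request this is.
            slots["intent"] = f"book_{word}"
        elif deprel == "case" and word == "from":
            slots["origin"] = word_at[head]
        elif deprel == "case" and word == "to":
            # Only the preposition "to" counts; the infinitive marker has label "mark".
            slots["destination"] = word_at[head]
        elif deprel == "obl:tmod":
            slots["date"] = word
    return slots


print(fill_slots(PARSE))
```

The sentence contains "to" twice, and only the dependency label separates the destination marker from the infinitive marker, which a plain keyword match could not do.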

The difficulties lie in how to abstract the rules, in runtime efficiency, and in the countless unexpected pitfalls on both the business and engineering sides.

As for the extent to which syntactic parsing would be useful,
this should help the asker:
https://open.weixin.qq.com/zh_cn/htmledition/res/assets/smart_lang_protocol.pdf Basically, it has stagnated.