Parsing English in 500 lines of Python: a tutorial

A parser describes the syntactic structure of a sentence, so that other applications can reason about it. Natural language introduces a great deal of unexpected ambiguity, which our knowledge of the world lets us resolve almost instantly. Here's an example I really like:

They ate the pizza with anchovies

The correct parse attaches "with" to "pizza", while the incorrect parse attaches "with" to "ate".

In the past few years, the Natural Language Processing (NLP) community has made great strides in parsing. It is now possible for a tiny Python implementation to perform better than the widely used Stanford parser.

The rest of this article first sets up the problem, and then takes you through a concise implementation prepared for it. The first 200 lines of parser.py implement the part-of-speech tagger and its learner (described here). Unless you are very familiar with NLP research, you should at least skim that code before studying this article.



The Cython system, Redshift, was written for my current research. I plan to improve it for general use in June, after my contract with Macquarie University expires. The current version is hosted on GitHub.
Problem description



It would be very handy if you could give your phone an instruction like this:



Set volume to zero when I'm in a meeting, unless John's school calls.



and have it set the appropriate policy. On Android you can use Tasker for this kind of thing, but a natural-language interface would be much better. It would be especially nice to get back a semantic representation you could edit, so you could see what the system thinks you meant and correct it.



There are many problems to solve to make this work, but some kind of syntactic representation is absolutely necessary. We need to know that:



Unless John's school calls, when I'm in a meeting, set volume to zero



is just another way of phrasing the same instruction, while



Unless John's school, call when I'm in a meeting



means something completely different.



A dependency parser returns a graph of word-to-word relationships that makes this kind of reasoning easier. The graph is a tree structure with directed edges, in which every node (word) has exactly one incoming arc (one head), except the root.



Usage examples:

>>> parser = parser.Parser()
>>> tokens = "Set the volume to zero when I 'm in a meeting unless John 's school calls".split()
>>> tags, heads = parser.parse(tokens)
>>> heads
[-1, 2, 0, 0, 3, 0, 7, 5, 7, 10, 8, 0, 13, 15, 15, 11]
>>> for i, h in enumerate(heads):
...     head = tokens[h] if h >= 0 else 'None'
...     print(tokens[i] + ' <- ' + head)
Set <- None
the <- volume
volume <- Set
to <- Set
zero <- to
when <- Set
I <- 'm
'm <- when
in <- 'm
a <- meeting
meeting <- in
unless <- Set
John <- 's
's <- calls
school <- calls
calls <- unless
The idea is that it should be slightly easier to reason from the parse than from the raw string. The parse-to-meaning mapping is hopefully simpler than the string-to-meaning mapping.

The most confusing part of this problem is that correctness is defined by convention, by the annotation guidelines. If you haven't read the guidelines and you aren't a linguist, you can't judge whether a parse is correct, which makes the whole task feel strange and artificial.

For example, there is an error in the parse above: according to Stanford's annotation guidelines, the structure of "John's school calls" is wrong. That part of the sentence should have the structure the guidelines prescribe for an example like "John's school clothes".

This point deserves further thought. In principle, we could have written the guidelines the other way round, so that the "correct" parse was the reverse. There is good reason to believe the parsing task would then become harder, because consistency with the rest of the grammar would decrease. [2] But we could test that empirically, and we would be happy to take the advantage if reversing the convention turned out to help.

We definitely do want the distinction encoded in the conventions: we don't want both constructions to receive the same structure, or the output will be less useful. The annotation guidelines strike a balance between which distinctions downstream applications find useful and which parses parsers can predict easily.
Projective trees

In deciding what kind of graph we want to build, we can make a particularly useful simplification: we restrict the structure of the graphs we will deal with. This has advantages not only for learnability, but also for keeping the algorithms easy to understand. In most English parsing work, the dependency graphs we constrain ourselves to are projective trees:

Tree. Every word has exactly one head, except the root.
Projective. For every pair of dependencies (a1, a2) and (b1, b2), if a1 < b1 then a2 >= b2. In other words, dependencies cannot cross: there cannot be a pair of dependencies of the form a1 b1 a2 b2 or b1 a1 b2 a2. (A small check of this property is sketched just below.)
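
To make the constraint concrete, here is a small sketch of a projectivity check. It is not part of parser.py, just an illustration, and it assumes a heads array that uses -1 for the root:

def is_projective(heads):
    """Return True if no two dependency arcs cross (illustrative sketch)."""
    arcs = [(min(child, head), max(child, head))
            for child, head in enumerate(heads) if head >= 0]
    for a1, a2 in arcs:
        for b1, b2 in arcs:
            # A crossing reads a1 b1 a2 b2 from left to right.
            if a1 < b1 < a2 < b2:
                return False
    return True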

There is a rich literature on parsing non-projective trees, and a relatively smaller literature on parsing directed acyclic graphs. The parsing algorithm I will explain applies to projective trees.
Greedy transition-based parsing

Our parser takes a list of string tokens as input and outputs a list of head indices representing the edges of the graph: if the i-th element of the heads array is j, the dependency graph contains an edge (j, i). A transition-based parser is a finite-state transducer: it maps an array of N words onto an output array of N head indices.
For example, a heads array can say that the head of MSNBC is reported: if MSNBC is word 1 and reported is word 2, then heads[1] == 2. You can already see why the tree structure is so convenient: if we had to output a DAG, a word could have multiple heads, and a single array would no longer work.
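
As a minimal illustration (not part of parser.py), this is all a heads array encodes:

def edges_from_heads(heads):
    # Turn a heads array into the (head, child) edges it encodes (sketch).
    return [(head, child) for child, head in enumerate(heads) if head >= 0]

# With "MSNBC" at index 1 and "reported" at index 2, heads[1] == 2
# contributes the edge (2, 1): "reported" is the head of "MSNBC".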

Although the heads can be represented as an array, we'd like to maintain some alternative views of the parse as well, so that features can be extracted efficiently. This is the Parse class:

class Parse(object):
    def __init__(self, n):
        self.n = n
        self.heads = [None] * (n - 1)
        self.lefts = []
        self.rights = []
        for i in range(n + 1):
            self.lefts.append(DefaultList(0))
            self.rights.append(DefaultList(0))

    def add_arc(self, head, child):
        self.heads[child] = head
        if child < head:
            self.lefts[head].append(child)
        else:
            self.rights[head].append(child)
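
The Parse class relies on a DefaultList helper from parser.py: a list that returns a default value for out-of-range indices instead of raising IndexError. Here is a minimal sketch of such a class; the exact implementation in parser.py may differ slightly:

class DefaultList(list):
    """A list that returns a default value for out-of-range indices."""
    def __init__(self, default=None):
        self.default = default
        list.__init__(self)

    def __getitem__(self, index):
        try:
            return list.__getitem__(self, index)
        except IndexError:
            return self.default
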
As well as the parse itself, we need to keep track of where we are in the sentence. We do this by keeping an index into the words array and introducing a stack: words can be pushed onto the stack, and popped off once their head has been set. So our state data structure is fundamentally:

An index i into the list of tokens
The dependencies added so far
A stack, containing words that occurred before index i, for which we have not yet assigned a head

Each step of the parsing process applies one of three actions to the state:

SHIFT = 0; RIGHT = 1; LEFT = 2
MOVES = [SHIFT, RIGHT, LEFT]
 
def transition(move, i, stack, parse):
    if move == SHIFT:
        stack.append(i)
        return i + 1
    elif move == RIGHT:
        parse.add_arc(stack[-2], stack.pop())
        return i
    elif move == LEFT:
        parse.add_arc(i, stack.pop())
        return i
    raise ValueError("Unknown move: %d" % move)

The LEFT and RIGHT actions add dependencies and pop the stack, while SHIFT pushes the stack and advances the buffer index i.

So the parser starts with an empty stack, the buffer index at 0, and no dependencies recorded. It chooses one of the valid actions and applies it to the state, and keeps choosing and applying actions until the stack is empty and the buffer index has reached the end of the input array. (It is hard to follow this algorithm without stepping through it. Take a sentence, draw a projective parse tree over it, and then try to reach that tree by choosing the right sequence of transitions.)
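
For example, here is one possible move sequence for a toy three-word sentence, using the Parse class and transition function above. The sentence, the trace, and the extra slot passed to Parse are my own illustration, not from parser.py:

# Toy sentence: "eat fresh pizza", where "fresh" should attach to "pizza"
# and "pizza" should attach to "eat" (the root).
words = ['eat', 'fresh', 'pizza']
# Pass len(words) + 1 so Parse's heads list (length n - 1) covers all words.
parse = Parse(len(words) + 1)
stack = []
i = 0

i = transition(SHIFT, i, stack, parse)   # stack: [0],    buffer at 1
i = transition(SHIFT, i, stack, parse)   # stack: [0, 1], buffer at 2
i = transition(LEFT,  i, stack, parse)   # head(fresh) = pizza; stack: [0]
i = transition(SHIFT, i, stack, parse)   # stack: [0, 2], buffer exhausted
i = transition(RIGHT, i, stack, parse)   # head(pizza) = eat;   stack: [0]

print(parse.heads)   # [None, 2, 0]; "eat" is never given a head: it is the root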

Here is the parsing loop in the code:

class Parser(object):
    ...
    def parse(self, words):
        tags = self.tagger(words)
        n = len(words)
        idx = 1
        stack = [0]
        parse = Parse(n)
        while stack or idx < n:
            features = extract_features(words, tags, idx, n, stack, parse)
            scores = self.model.score(features)
            valid_moves = get_valid_moves(idx, n, len(stack))
            next_move = max(valid_moves, key=lambda move: scores[move])
            idx = transition(next_move, idx, stack, parse)
        return tags, parse

def get_valid_moves(i, n, stack_depth):
    moves = []
    if i < n:
        moves.append(SHIFT)
    if stack_depth >= 2:
        moves.append(RIGHT)
    if stack_depth >= 1:
        moves.append(LEFT)
    return moves
We start by tagging the sentence and initialising the state. The state is then mapped to a set of features, which are scored by a linear model. We then find the highest-scoring valid action and apply it to the state.

The scoring model here works exactly as it did for part-of-speech tagging. If the idea of extracting features and scoring them with a linear model is confusing, you should review that article. Here is a reminder of how the scoring model works:

class Perceptron(object):
    ...
    def score(self, features):
        all_weights = self.weights
        scores = dict((clas, 0) for clas in self.classes)
        for feat, value in features.items():
            if value == 0:
                continue
            if feat not in all_weights:
                continue
            weights = all_weights[feat]
            for clas, weight in weights.items():
                scores[clas] += value * weight
        return scores
It just sums the class weights for each active feature. This is often expressed as a dot product, but I find that formulation awkward when dealing with many classes.
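
For example, with a made-up weights table (the numbers and feature names here are purely illustrative), the sum works out like this:

# Illustrative only: two known features, plus one the model has never seen.
weights = {
    'w=pizza': {SHIFT: 0.5, LEFT: 1.0},
    't=NN': {SHIFT: 0.25, RIGHT: -0.5, LEFT: 2.0},
}
features = {'w=pizza': 1, 't=NN': 1, 'w=anchovies': 1}

scores = dict((clas, 0.0) for clas in (SHIFT, RIGHT, LEFT))
for feat, value in features.items():
    for clas, weight in weights.get(feat, {}).items():
        scores[clas] += value * weight
# scores: {SHIFT: 0.75, RIGHT: -0.5, LEFT: 3.0}, so LEFT would win here.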

The beam parser (RedShift) tracks multiple candidate analyses and only commits to the best one at the end. We are going to focus on efficiency and simplicity at the expense of some accuracy: we perform only a single analysis. Our search strategy is entirely greedy, just as it was for part-of-speech tagging, and we lock in our choice at every step.

If you read the part-of-speech tagging post carefully, you may notice the underlying similarity. What we have done is map the parsing problem onto a sequence-labelling problem, which we solve with a "flat", or unstructured, learning algorithm (using greedy search).
Feature set

Feature extraction code is always ugly. The features of the parser refer to a few tokens from the context:

The first three words of the buffer (n0, n1, n2)
The top three words of the stack (s0, s1, s2)
The two leftmost children of s0 (s0b1, s0b2)
The two rightmost children of s0 (s0f1, s0f2)
The two leftmost children of n0 (n0b1, n0b2)

For each of these 12 tokens, we use its word form, its part-of-speech tag, and the number of left and right children attached to it.

Because we are using a linear model, the features are pairs and triples built from these atomic properties.

def extract_features(words, tags, n0, n, stack, parse):
    def get_stack_context(depth, stack, data):
        if depth >= 3:
            return data[stack[-1]], data[stack[-2]], data[stack[-3]]
        elif depth >= 2:
            return data[stack[-1]], data[stack[-2]], ''
        elif depth == 1:
            return data[stack[-1]], '', ''
        else:
            return '', '', ''

    def get_buffer_context(i, n, data):
        if i + 1 >= n:
            return data[i], '', ''
        elif i + 2 >= n:
            return data[i], data[i + 1], ''
        else:
            return data[i], data[i + 1], data[i + 2]

    def get_parse_context(word, deps, data):
        if word == -1:
            return 0, '', ''
        deps = deps[word]
        valency = len(deps)
        if not valency:
            return 0, '', ''
        elif valency == 1:
            return 1, data[deps[-1]], ''
        else:
            return valency, data[deps[-1]], data[deps[-2]]

    features = {}
    # Set up the context pieces --- the word, W, and tag, T, of:
    # S0-2: Top three words on the stack
    # N0-2: First three words of the buffer
    # n0b1, n0b2: Two leftmost children of the first word of the buffer
    # s0b1, s0b2: Two leftmost children of the top word of the stack
    # s0f1, s0f2: Two rightmost children of the top word of the stack

    depth = len(stack)
    s0 = stack[-1] if depth else -1

    Ws0, Ws1, Ws2 = get_stack_context(depth, stack, words)
    Ts0, Ts1, Ts2 = get_stack_context(depth, stack, tags)

    Wn0, Wn1, Wn2 = get_buffer_context(n0, n, words)
    Tn0, Tn1, Tn2 = get_buffer_context(n0, n, tags)

    Vn0b, Wn0b1, Wn0b2 = get_parse_context(n0, parse.lefts, words)
    _, Tn0b1, Tn0b2 = get_parse_context(n0, parse.lefts, tags)

    Vn0f, Wn0f1, Wn0f2 = get_parse_context(n0, parse.rights, words)
    _, Tn0f1, Tn0f2 = get_parse_context(n0, parse.rights, tags)

    Vs0b, Ws0b1, Ws0b2 = get_parse_context(s0, parse.lefts, words)
    _, Ts0b1, Ts0b2 = get_parse_context(s0, parse.lefts, tags)

    Vs0f, Ws0f1, Ws0f2 = get_parse_context(s0, parse.rights, words)
    _, Ts0f1, Ts0f2 = get_parse_context(s0, parse.rights, tags)

    # Cap numeric features at 5
    # String-distance
    Ds0n0 = min((n0 - s0, 5)) if s0 != 0 else 0

    features['bias'] = 1
    # Add word and tag unigrams
    for w in (Wn0, Wn1, Wn2, Ws0, Ws1, Ws2, Wn0b1, Wn0b2, Ws0b1, Ws0b2, Ws0f1, Ws0f2):
        if w:
            features['w=%s' % w] = 1
    for t in (Tn0, Tn1, Tn2, Ts0, Ts1, Ts2, Tn0b1, Tn0b2, Ts0b1, Ts0b2, Ts0f1, Ts0f2):
        if t:
            features['t=%s' % t] = 1

    # Add word/tag pairs
    for i, (w, t) in enumerate(((Wn0, Tn0), (Wn1, Tn1), (Wn2, Tn2), (Ws0, Ts0))):
        if w or t:
            features['%d w=%s, t=%s' % (i, w, t)] = 1

    # Add some bigrams
    features['s0w=%s, n0w=%s' % (Ws0, Wn0)] = 1
    features['wn0tn0-ws0 %s/%s %s' % (Wn0, Tn0, Ws0)] = 1
    features['wn0tn0-ts0 %s/%s %s' % (Wn0, Tn0, Ts0)] = 1
    features['ws0ts0-wn0 %s/%s %s' % (Ws0, Ts0, Wn0)] = 1
    features['ws0-ts0 tn0 %s/%s %s' % (Ws0, Ts0, Tn0)] = 1
    features['wt-wt %s/%s %s/%s' % (Ws0, Ts0, Wn0, Tn0)] = 1
    features['tt s0=%s n0=%s' % (Ts0, Tn0)] = 1
    features['tt n0=%s n1=%s' % (Tn0, Tn1)] = 1

    # Add some tag trigrams
    trigrams = ((Tn0, Tn1, Tn2), (Ts0, Tn0, Tn1), (Ts0, Ts1, Tn0),
                (Ts0, Ts0f1, Tn0), (Ts0, Ts0f1, Tn0), (Ts0, Tn0, Tn0b1),
                (Ts0, Ts0b1, Ts0b2), (Ts0, Ts0f1, Ts0f2), (Tn0, Tn0b1, Tn0b2),
                (Ts0, Ts1, Ts1))
    for i, (t1, t2, t3) in enumerate(trigrams):
        if t1 or t2 or t3:
            features['ttt-%d %s %s %s' % (i, t1, t2, t3)] = 1

    # Add some valency and distance features
    vw = ((Ws0, Vs0f), (Ws0, Vs0b), (Wn0, Vn0b))
    vt = ((Ts0, Vs0f), (Ts0, Vs0b), (Tn0, Vn0b))
    d = ((Ws0, Ds0n0), (Wn0, Ds0n0), (Ts0, Ds0n0), (Tn0, Ds0n0),
         ('t' + Tn0 + Ts0, Ds0n0), ('w' + Wn0 + Ws0, Ds0n0))
    for i, (w_t, v_d) in enumerate(vw + vt + d):
        if w_t or v_d:
            features['val/d-%d %s %d' % (i, w_t, v_d)] = 1
    return features
Training

We learn the weights using the same algorithm as for part-of-speech tagging: the averaged perceptron. Its main strength is that it is an online learning algorithm: examples stream in one by one, we make a prediction, check the true answer, and adjust our beliefs (the weights) if the prediction was wrong.
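
As a reminder of what that error-driven update looks like, here is a minimal sketch of an update method for the Perceptron class shown earlier. The real class also keeps running totals so the weights can be averaged at the end; the details here are illustrative:

class Perceptron(object):
    ...
    def update(self, truth, guess, features):
        # Nothing to learn from a correct prediction.
        if truth == guess:
            return
        for feat in features:
            weights = self.weights.setdefault(feat, {})
            # Nudge the weights towards the correct class, away from the guess.
            weights[truth] = weights.get(truth, 0.0) + 1.0
            weights[guess] = weights.get(guess, 0.0) - 1.0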

The training loop looks like this:

class Parser(object):
    ...
    def train_one(self, itn, words, gold_tags, gold_heads):
        n = len(words)
        i = 2; stack = [1]; parse = Parse(n)
        tags = self.tagger.tag(words)
        while stack or (i + 1) < n:
            features = extract_features(words, tags, i, n, stack, parse)
            scores = self.model.score(features)
            valid_moves = get_valid_moves(i, n, len(stack))
            guess = max(valid_moves, key=lambda move: scores[move])
            gold_moves = get_gold_moves(i, n, stack, parse.heads, gold_heads)
            best = max(gold_moves, key=lambda move: scores[move])
            self.model.update(best, guess, features)
            i = transition(guess, i, stack, parse)
        # Return the number of correct heads
        return len([i for i in range(n - 1) if parse.heads[i] == gold_heads[i]])
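
train_one handles a single sentence. The outer loop that drives it is not shown in this post; roughly, it shuffles the training sentences each iteration and accumulates the per-sentence scores. A sketch, with an illustrative data layout and an averaging step assumed at the end, as in the tagger:

import random

def train(parser, sentences, nr_iter=15):
    # sentences: assumed here to be (words, gold_tags, gold_heads) triples.
    for itn in range(nr_iter):
        random.shuffle(sentences)
        correct = 0; total = 0
        for words, gold_tags, gold_heads in sentences:
            correct += parser.train_one(itn, words, gold_tags, gold_heads)
            total += len(words)
        print(itn, '%.3f' % (float(correct) / total))
    parser.model.average_weights()   # assumed averaging step, as in the tagger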

The most interesting part of the training process is get_gold_moves. The performance of our parser is made possible by an advance due to Goldberg and Nivre (2012), who showed that we had been doing this wrong for years.

In the part-of-speech tagging article, I pointed out that during training you need to make sure you pass in the last two predicted tags as features for the current tag, not the last two gold tags. At test time you only have predicted tags, so if the features were based on the gold sequence during training, the training contexts would not resemble the test contexts, and you would learn the wrong weights.

The problem we face in parsing is that we don't know how to pass in a predicted sequence! Training used to work by taking the gold-standard tree and finding a transition sequence that leads to it: you get back a sequence of moves, with the guarantee that if you perform those moves, you end up with the gold-standard dependencies.

The problem is that if the parser ends up in any state that is not on that gold-standard sequence, we have no way to teach it the "correct" move. Once the parser has made a mistake, we don't know how to train from the example.

This is a big problem, because it means that once the parser starts making mistakes, it ends up in states unlike anything in its training data, which causes it to make even more mistakes.

The problem is specific to greedy parsers: once you use a beam, there is a natural way to do structured prediction.

Like all the best breakthroughs, the solution seems obvious once you understand it. All we have to do is define a function that asks "how many gold-standard dependencies can still be recovered from this state?". If you can define that function, you can apply each move in turn and ask the same question of the resulting state. If the move you applied means fewer gold-standard dependencies can be reached, it is sub-optimal.

There is a lot to learn here.

So we have the function Oracle(state):

Oracle(state) = | gold_arcs ∩ reachable_arcs(state) |

We also have a set of moves, each of which returns a new state. We want to know:

  shift_cost = Oracle(state) - Oracle(shift(state))
  right_cost = Oracle(state) - Oracle(right(state))
  left_cost  = Oracle(state) - Oracle(left(state))

At least one of these costs will be zero: Oracle(state) asks "what is the cost of the best path forward?", and the first move of that best path has to be shift, right, or left.
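
In other words, if we had a literal Oracle function, picking the gold moves would look like the sketch below. This is purely illustrative: the oracle and apply_move arguments are hypothetical helpers, and apply_move would have to copy the state, which is exactly the expense the real implementation avoids:

def gold_moves_via_oracle(state, valid_moves, oracle, apply_move):
    # oracle(state): number of gold arcs still reachable from the state.
    # apply_move(move, state): return a copy of the state with the move applied.
    best = oracle(state)
    return [m for m in valid_moves if oracle(apply_move(m, state)) == best]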

It turns out that we can derive the Oracle fairly simply for many transition systems. The derivation for the transition system we are using, arc-hybrid, was given by Goldberg and Nivre (2013).

We implement the oracle as a method that returns the zero-cost moves, rather than implementing a function Oracle(state). This saves us from doing a lot of expensive copy operations. I hope the reasoning in the code is not too hard to follow; if you are confused and want to get to the bottom of it, consult Goldberg and Nivre's papers.

def get_gold_moves(n0, n, stack, heads, gold):
    def deps_between(target, others, gold):
        for word in others:
            if gold[word] == target or gold[target] == word:
                return True
        return False

    valid = get_valid_moves(n0, n, len(stack))
    if not stack or (SHIFT in valid and gold[n0] == stack[-1]):
        return [SHIFT]
    if gold[stack[-1]] == n0:
        return [LEFT]
    costly = set([m for m in MOVES if m not in valid])
    # If the word behind s0 is its gold head, Left is incorrect
    if len(stack) >= 2 and gold[stack[-1]] == stack[-2]:
        costly.add(LEFT)
    # If there are any dependencies between n0 and the stack,
    # pushing n0 will lose them.
    if SHIFT not in costly and deps_between(n0, stack, gold):
        costly.add(SHIFT)
    # If there are any dependencies between s0 and the buffer, popping
    # s0 will lose them.
    if deps_between(stack[-1], range(n0 + 1, n - 1), gold):
        costly.add(LEFT)
        costly.add(RIGHT)
    return [m for m in MOVES if m not in costly]

Performing a "dynamic oracle" training process will produce a large difference in accuracy-usually 1-2%, and is no different from the runtime approach. The old "static oracle" greedy training process is completely outdated; there is no reason to do that.
Summary

Language technologies, particularly those concerned with grammar, can seem mysterious: it is hard to imagine what kind of program could do this at all.

I think it is natural for people to assume that the best solutions must be enormously complicated. A 200,000-line Java package feels about right.

But algorithm code is often short when only a single algorithm is implemented. When you implement only one algorithm, you know exactly what to write before you write it, and you don't need to pay for any unnecessary abstractions, which can have a large performance impact.
Notes

[1] I'm really not sure how to count the lines of code in the Stanford parser. Its jar file weighs in at around 200k and includes a large number of different models. It doesn't matter much, but somewhere around 50k lines seems a safe estimate.

[2] For example, how would you parse "John's school of music calls"? You want to make sure the phrase "John's school" is analysed consistently in both "John's school calls" and "John's school of music calls". Reasoning about the different "slots" a phrase can fit into is a key way we reason about syntactic analyses. You can think of each phrase as a connector with a particular shape that needs to plug into different slots, and each phrase also has a certain number of slots, each with its own shape. We are trying to figure out what kind of connector is where, so we can figure out how the sentence fits together.

[3] There is now an updated version of the Stanford parser that uses "deep learning", and it is more accurate. However, the accuracy of the final model still lags behind the best shift-reduce parsers. It is a great article, and since the idea is independent of the particular parser it is implemented on, it does not really matter that that parser is not state-of-the-art.

[4] A point of detail: the Stanford dependencies are actually generated automatically from gold-standard phrase-structure trees. See the Stanford dependency converter page here: http://nlp.stanford.edu/software/stanford-dependencies.shtml.
Idle speculation

For a long time, incremental language processing algorithms were primarily of scientific interest. If you want to write a parser to test a theory about how the human sentence processor works, that parser needs to build partial interpretations. There is ample evidence, including common-sense introspection, that we do not buffer the whole input and only analyse it once the speaker has finished.

But now algorithms with that neat scientific property are winning! As far as I can tell, the secret to their success is that they are:

Incremental. Early text restricts the search space.
Error-driven. Training maintains a working hypothesis, which is updated as errors are made.

The connection to human sentence processing looks tempting. I look forward to seeing whether these engineering breakthroughs lead to any advances in psycholinguistics.
Bibliography

The NLP literature is almost completely open access. All of the relevant papers can be found here: http://aclweb.org/anthology/.

The parser I describe is an implementation of the dynamic oracle arc-hybrid system:

Goldberg, Yoav; Nivre, Joakim

Training Deterministic Parsers with Non-Deterministic Oracles

TACL 2013

However, I wrote my own features for it. The arc-hybrid system was originally described here:

Kuhlmann, Marco; Gomez-Rodriguez, Carlos; Satta, Giorgio

Dynamic programming algorithms for transition-based dependency parsers

ACL 2011

The dynamic oracle training method was first described here:

A Dynamic Oracle for Arc-Eager Dependency Parsing

Goldberg, Yoav; Nivre, Joakim

COLING 2012

The major breakthrough in accuracy for transition-based parsers came when Zhang and Clark investigated beam search. They have published many papers, but the preferred citation is:

Zhang, Yue; Clark, Stephen

Syntactic Processing Using the Generalized Perceptron and Beam Search

Computational Linguistics 2011 (1)

Another important article is this short feature-engineering paper, which further improved accuracy:

Zhang, Yue; Nivre, Joakim

Transition-based Dependency Parsing with Rich Non-local Features

ACL 2011

The generalized perceptron, the learning framework used for these beam parsers, comes from this paper:

Collins, Michael

Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms

EMNLP 2002

Experimental details

The results at the beginning of the article refer to Section 22 of the Wall Street Journal corpus. The Stanford parser was run as follows:

java -mx10000m -cp "$scriptdir/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser \
  -outputFormat "penn" edu/stanford/nlp/models/lexparser/englishFactored.ser.gz $*
A small post-process was applied to undo the special tokenisation the Stanford parser applies to numbers, bringing them back in line with the PTB tokenisation:

"" "Stanford parser retokenises numbers. Split them." ""
import sys
import re
 
qp_re = re.compile ('\ xc2 \ xa0')
for line in sys.stdin:
  line = line.rstrip ()
  if qp_re.search (line):
    line = line.replace ('(CD', '(QP (CD', 1) + ')'
    line = line.replace ('\ xc2 \ xa0', ') (CD')
  print line
The resulting PTB-format files were then converted into dependencies using the Stanford converter:

for f in $1/*.mrg; do
  echo $f
  grep -v CODE $f > "$f.2"
  out="$f.dep"
  java -mx800m -cp "$scriptdir/*:" edu.stanford.nlp.trees.EnglishGrammaticalStructure \
    -treeFile "$f.2" -basic -makeCopulaHead -conllx > $out
done
I can't easily read that any more, but it should just convert every .mrg file in a directory into a CoNLL-format Stanford basic dependencies file, using the settings common in the dependency literature.

I then converted the gold-standard trees from Section 22 of the Wall Street Journal corpus for the evaluation. Accuracy scores refer to the unlabelled attachment score (i.e. the head index) of all non-punctuation tokens.
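
As a sketch of what that evaluation amounts to (not the evaluation script that was actually used), given predicted and gold head indices and a flag marking punctuation tokens:

def uas(pred_heads, gold_heads, is_punct):
    # Unlabelled attachment score over non-punctuation tokens (sketch).
    correct = 0
    total = 0
    for pred, gold, punct in zip(pred_heads, gold_heads, is_punct):
        if punct:
            continue
        total += 1
        correct += (pred == gold)
    return float(correct) / total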

To train parser.py, I fed the gold-standard PTB trees for Sections 02-21 of the Wall Street Journal corpus through the same conversion script.

In a nutshell, the Stanford model and parser.py are trained on the same set of sentences, and each makes its predictions on a held-out test set for which we know the answers. Accuracy refers to how many of the words' heads we got right.

Speeds were tested on a 2.4GHz Xeon processor. I ran the experiments on a server to give the Stanford parser more memory; the parser.py system runs fine on my MacBook Air. I used PyPy for the parser.py experiments; CPython was about half as fast in an early benchmark.

One reason parser.py runs so fast is that it does unlabelled parsing. Based on previous experiments, a labelled parser would likely be about 400 times slower and about 1% more accurate. Adapting the program to labelled parsing would be a great exercise for the reader, if you have access to the data.

The RedShift parser results are from version b6b624c9900f3bf, run as follows:

./scripts/train.py -x zhang+stack -k 8 -p ~/data/stanford/train.conll ~/data/parsers/tmp
./scripts/parse.py ~/data/parsers/tmp ~/data/stanford/devi.txt /tmp/parse/
./scripts/evaluate.py /tmp/parse/parses ~/data/stanford/dev.conll

