Introduction to Parsers

Source: Internet
Author: User

I started reading "Language Implementation Patterns" and learned the basics of parsers along the way. The English passages below are from Wikipedia; the translation that follows each passage is my own.

Wiki LALR parser
http://en.wikipedia.org/wiki/LALR_parser

In 1965, Donald Knuth invented the LR parser (Left to Right, Rightmost derivation). The LR parser can recognize any deterministic context-free language in linear-bounded time. However, rightmost derivation has very large memory requirements, and implementing an LR parser was impractical due to the limited memory of computers at that time. To address this shortcoming, in 1969 Frank DeRemer proposed two simplified versions of the LR parser, namely the Look-Ahead LR (LALR) and the Simple LR (SLR) parser, which had much lower memory requirements at the cost of less language recognition power, with the LALR parser being the most powerful alternative. Later, in 1977, memory optimizations for the LR parser were invented, but the LR parser was still less memory efficient than the simplified alternatives.

In 1979, Frank DeRemer and Tom Pennello announced a series of optimizations for the LALR parser that would further improve its memory efficiency. The formal presentation of these optimizations was made in 1982.

In 1965, Donald Knuth invented the LR parser (left to right, rightmost derivation), which can recognize any deterministic context-free language in linear time. However, rightmost derivation consumes a large amount of memory, and with the limited computer memory of that time it was not practical to implement an LR parser. To solve this problem, Frank DeRemer proposed two simplified LR parsers in 1969, Look-Ahead LR (LALR) and Simple LR (SLR). Both versions use less memory at the cost of reduced language recognition power, and LALR is the more powerful of the two. Later, in 1977, memory optimizations for the LR parser were invented, but the LR parser was still less memory efficient than the simplified versions.
In 1979, Frank DeRemer and Tom Pennello announced a series of optimizations for LALR that further improved its memory efficiency. These optimizations were formally presented in 1982.

Overview
Like other types of LR parsers, an LALR parser scans the input string from left to right and is very efficient at finding the single correct bottom-up parse, because it never needs to backtrack. It is a lookahead parser by definition, and LALR(1), with one token of lookahead, is the most common case.

Implementation problems
Because the LALR parser performs a rightmost derivation rather than the more intuitive leftmost derivation, it is hard to understand how it works. This makes finding a correct and efficient LALR grammar demanding and time-consuming. For the same reason, error reporting is difficult, because LALR parser errors cannot always be turned into messages that are meaningful to the end user. A recursive descent parser is therefore sometimes a better choice than an LALR parser. It needs more handwritten code because its language recognition power is lower, but it avoids the difficulties specific to LALR parsing because it performs a leftmost derivation.
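To make the comparison concrete, here is a minimal sketch of a handwritten recursive descent parser in Python. The grammar (integer expressions with +, -, * and parentheses), the class name Parser and its helper methods are my own choices for illustration and do not come from the Wikipedia article. Each grammar rule becomes one method, which is why the control flow follows a leftmost derivation and why error messages are easy to produce.

```python
import re

# One token per match: an integer, or any single non-space character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Split text into ("NUM", value) and (char, char) tokens, then EOF."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUM", int(number)))
        elif op.strip():
            tokens.append((op, op))
    tokens.append(("EOF", None))
    return tokens


class Parser:
    """Grammar (illustrative):
    expr   -> term (("+" | "-") term)*
    term   -> factor ("*" factor)*
    factor -> NUM | "(" expr ")"
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(
                f"expected {kind!r}, found {self.peek()!r} at token {self.pos}"
            )
        return self.advance()

    def parse(self):
        value = self.expr()
        self.expect("EOF")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, _ = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):
        if self.peek() == "NUM":
            return self.advance()[1]
        self.expect("(")
        value = self.expr()
        self.expect(")")
        return value
```

For example, Parser("1 + 2 * (3 - 1)").parse() evaluates to 5, and Parser("1 +").parse() raises a SyntaxError that names the token the parser expected.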

A good LALR(1) reference
http://web.cs.dal.ca/~sjackson/lalr1.html

Wiki LR parser
http://en.wikipedia.org/wiki/LR_parser

The name LR is an acronym. The L means that the parser reads input text in one direction without backing up; that direction is typically Left to right within each line, and top to bottom across the lines of the full input file. (This is true for most parsers.) The R means that the parser produces a reversed Rightmost derivation; it does a bottom-up parse, not a top-down LL parse or ad hoc parse. The name LR is often followed by a numeric qualifier, as in LR(1) or sometimes LR(k). To avoid backtracking or guessing, the LR parser is allowed to peek ahead at k lookahead input symbols before deciding how to parse earlier symbols. Typically k is 1 and is not mentioned. The name LR is often preceded by other qualifiers, as in SLR and LALR.

LR parsers are deterministic; they produce a single correct parse without guesswork or backtracking, in linear time. This is ideal for computer languages. But LR parsers are not suited for human languages, which need more flexible but slower methods. Other parser methods that backtrack or yield multiple parses may take N^2 or N^3 time when they guess badly.

The above properties of L, R, and k are actually shared by all shift-reduce parsers, including precedence parsers. But by convention, the LR name stands for the form of parsing invented by Donald Knuth, and excludes the earlier, less powerful precedence methods. [1] LR parsers can handle a larger range of languages and grammars than precedence parsers or top-down LL parsing. [2] This is because the LR parser waits until it has seen an entire instance of some grammar pattern before committing to what it has found. An LL parser has to decide or guess what it is seeing much sooner, when it has only seen the leftmost input symbol of that pattern. LR is also better at error reporting. It detects syntax errors as early in the input stream as possible.

In LR, the L (Left to right) means that the parser reads the input in one direction without backing up, usually from left to right within each line and from top to bottom across the lines of the whole input file. The R (Rightmost derivation) means that the parser produces a reversed rightmost derivation, working bottom-up, rather than doing a top-down LL parse or an ad hoc parse. LR is usually followed by a number, as in LR(1) or LR(k). To avoid backtracking or guessing, the LR parser may look ahead at k input symbols before deciding how to parse the earlier symbols. Typically k is 1 and is left out. The name LR is also often preceded by other qualifiers, as in SLR and LALR.

LR parsers are deterministic: they produce a single correct parse in linear time, with no guessing or backtracking. This is ideal for computer languages. However, LR parsers are not suited to human languages, which need more flexible but slower methods. Other parsing methods that backtrack or produce multiple parses may take N^2 or N^3 time when they guess badly.

The L, R, and k properties above are actually shared by all shift-reduce parsers, including precedence parsers. By convention, however, LR refers to the form of parsing invented by Donald Knuth and excludes the earlier, less powerful precedence methods. LR parsers can handle more languages and grammars than precedence parsers or top-down LL parsers. This is because an LR parser does not commit to what it has found until it has seen a complete instance of a grammar pattern. An LL parser has to decide or guess what it is seeing much earlier, when it has only seen the leftmost input symbol of that pattern. LR parsers are also better at error reporting: they detect syntax errors as early in the input as possible.

LR Tutorial

When using an LR parser within some larger program, you can usually ignore all the mathematical details about states, tables, and generators. All of the parsing actions and outputs and their timing can be simply understood by viewing the LR parser as just a shift-reduce parser with some nifty decision method. If the generator tool complains about some parts of your grammar, you may need some understanding of states and the difference between LR and LALR in order to tweak your grammar into an acceptable form. Full understanding of grammar and state analysis algorithms is needed only by the tool implementer and by students of parsing theory courses.

When you use an LR parser inside a larger program, you can usually ignore the mathematical details about states, tables, and generators. All the parsing actions and outputs, and their timing, can be understood by treating the LR parser as a shift-reduce parser with a clever decision method. If the generator complains about parts of your grammar, you may need some understanding of states and of the difference between LR and LALR in order to rework the grammar into an acceptable form. Only tool implementers and students of parsing theory need a full understanding of the grammar and state analysis algorithms.
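The "shift-reduce parser with a decision method" view can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the grammar, the names GRAMMAR, choose_reduction and shift_reduce_parse, and the handwritten decision rule are invented here, and a real LR parser would take its decisions from generated state tables instead.

```python
# Illustrative grammar; rule order matters, longer right-hand sides of the
# same nonterminal come before its unit rule.
GRAMMAR = [
    ("F", ("n",)),
    ("F", ("(", "E", ")")),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
]
TERMINALS = {"n", "+", "*", "(", ")"}


def choose_reduction(stack, lookahead):
    """Decision method: return the rule to reduce by, or None to shift."""
    for lhs, rhs in GRAMMAR:
        if tuple(stack[-len(rhs):]) != rhs:
            continue
        # With '*' as lookahead, keep T on the stack so that T * F can
        # still be built; '*' binds tighter than '+'.
        if lhs == "E" and lookahead == "*":
            continue
        return lhs, rhs
    return None


def shift_reduce_parse(tokens):
    """Return the reductions performed, in order. Raises SyntaxError."""
    for token in tokens:
        if token not in TERMINALS:
            raise SyntaxError(f"unknown token {token!r}")
    stack, reductions = [], []
    rest = list(tokens) + ["$"]  # "$" marks the end of input
    while True:
        lookahead = rest[0]
        rule = choose_reduction(stack, lookahead)
        if rule is not None:
            lhs, rhs = rule
            del stack[-len(rhs):]  # pop the handle ...
            stack.append(lhs)      # ... and push the nonterminal
            reductions.append(f"{lhs} -> {' '.join(rhs)}")
        elif lookahead != "$":
            stack.append(rest.pop(0))  # shift
        elif stack == ["E"]:
            return reductions          # accept
        else:
            raise SyntaxError(f"cannot parse, stack is {stack}")
```

Calling shift_reduce_parse("n + n * n".split()) ends with the reductions T -> T * F and then E -> E + T. Read from last to first, the list of reductions is a rightmost derivation of the input, which is where the R in LR comes from.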

Wiki GLR parser
http://en.wikipedia.org/wiki/GLR_parser

In computer science, a GLR parser ("Generalized Left-to-right Rightmost derivation parser") is an extension of an LR parser algorithm to handle nondeterministic and ambiguous grammars. First described in a 1984 paper by Masaru Tomita, it has also been referred to as a "parallel parser". Tomita presented five stages in his original work, [1] though, in practice, it is the second stage that is recognized as the GLR parser.

Though the algorithm has evolved since its original form, the principles have remained intact: Tomita's goal was to parse natural language text thoroughly and efficiently. Standard LR parsers cannot accommodate the nondeterministic and ambiguous nature of natural language, and the GLR algorithm can.

In computer science, the GLR (Generalized LR) parser is an extension of the LR parsing algorithm that lets it handle nondeterministic and ambiguous grammars. It was first described by Masaru Tomita in 1984, and it has also been called a "parallel parser".

Although the algorithm has evolved from its original form, its principles remain intact. Tomita's goal was to parse natural language text thoroughly and efficiently. A standard LR parser cannot handle the nondeterminism and ambiguity inherent in natural language, but the GLR algorithm can.
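A small Python sketch can show what "ambiguous" means here. It is not the GLR algorithm, only a brute-force count of parse trees for a toy grammar that I chose for illustration, E -> E + E | n; the function name count_parses is likewise made up. A deterministic LR parser must settle on one tree, whereas a generalized parser has to keep all of them.

```python
from functools import lru_cache


def count_parses(tokens):
    """Count parse trees of tokens under the grammar E -> E + E | n."""
    tokens = tuple(tokens)
    if not tokens:
        return 0

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "n" else 0
        total = 0
        # Try every '+' strictly inside the span as the top-level operator.
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

With this grammar, "n + n" has a single parse, but "n + n + n" already has two (grouping to the left or to the right), and the count keeps growing with each extra operand.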

Wiki LL parser
http://en.wikipedia.org/wiki/LL_parser

In computer science, an LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence (hence LL, compared with LR parser). The class of grammars which are parsable in this way is known as the LL grammars.

The remainder of this article describes the table-based kind of parser, the alternative being a recursive descent parser which is usually coded by hand (although not always; see e.g. ANTLR for an LL(*) recursive-descent parser generator).

An LL parser is called an LL(k) parser if it uses k tokens of lookahead when parsing a sentence. If such a parser exists for a certain grammar and it can parse sentences of this grammar without backtracking, then it is called an LL(k) grammar. A language that has an LL(k) grammar is known as an LL(k) language. There are LL(k+n) languages that are not LL(k) languages. [1] A corollary of this is that not all context-free languages are LL(k) languages.

LL(1) grammars are very popular because the corresponding LL parsers only need to look at the next token to make their parsing decisions. Languages based on grammars with a high value of k have traditionally been considered to be difficult to parse, although this is less true now given the availability and widespread use of parser generators supporting LL(k) grammars for arbitrary k.

An LL parser is called an LL(*) parser if it is not restricted to a finite k tokens of lookahead, but can make parsing decisions by recognizing whether the following tokens belong to a regular language (for example by use of a Deterministic Finite Automaton).

In computer science, an LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from left to right and constructs a leftmost derivation of the input sentence (hence LL, as opposed to LR). The class of grammars that can be parsed this way is called the LL grammars.

An LL parser is called an LL(k) parser if it uses k tokens of lookahead when parsing a sentence. If such a parser exists for a grammar and can parse its sentences without backtracking, the grammar is called an LL(k) grammar. Not all context-free languages are LL(k) languages.

LL(1) grammars are very popular, because the corresponding parser only needs to look at the next token to make its parsing decision. Languages whose grammars need a high value of k have traditionally been considered difficult to parse. This is less true today, since widely used parser generators support LL(k) grammars for arbitrary k.

An LL parser is called an LL(*) parser if it is not limited to a finite number k of lookahead tokens, but can instead make parsing decisions by checking whether the upcoming tokens belong to a regular language (for example, by using a deterministic finite automaton).
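As an illustration of the table-based LL(1) approach described above, here is a minimal Python sketch. The grammar, the hand-built TABLE and the function name ll1_parse are my own assumptions for the example; a parser generator would normally compute the table from the grammar. The key point is that every decision is a single table lookup on the pair (nonterminal on top of the stack, next token).

```python
# Illustrative LL(1) grammar without left recursion:
#   E  -> T E'        E' -> + T E' | (empty)
#   T  -> F T'        T' -> * F T' | (empty)
#   F  -> n | ( E )
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Parse table: (nonterminal, lookahead) -> right-hand side to expand to.
TABLE = {
    ("E", "n"): ["T", "E'"],
    ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", ")"): [],
    ("E'", "$"): [],
    ("T", "n"): ["F", "T'"],
    ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [],
    ("T'", ")"): [],
    ("T'", "$"): [],
    ("F", "n"): ["n"],
    ("F", "("): ["(", "E", ")"],
}


def ll1_parse(tokens):
    """Return the productions used, in leftmost-derivation order."""
    rest = list(tokens) + ["$"]  # "$" marks the end of input
    stack = ["$", "E"]           # start symbol on top of the end marker
    used = []
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = rest[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                raise SyntaxError(
                    f"unexpected {lookahead!r} while expanding {top}"
                )
            used.append(f"{top} -> {' '.join(rhs) or 'epsilon'}")
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1                     # terminal matched, consume it
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    return used
```

For the input "n + n", the first productions returned are E -> T E', T -> F T' and F -> n: the parser expands from the start symbol downwards and always rewrites the leftmost nonterminal first, the opposite order from a bottom-up LR parse.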

I have now read the first part of "Language Implementation Patterns", the introduction to parsing, and I think it is quite good. I also wrote a simple LL(1) parser. Should I try to implement everything in the book?

Now that I have started, I have to stick with it. A good start, a good finish.
