Original: http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html
Translator's note: This is a translation of the official LPeg documentation. While studying LPeg recently, I found very few articles about it in China, so I decided to translate the manual. The translation is not complete yet and covers only the commonly used parts. I will keep translating gradually, and I would be very grateful to anyone who can help me finish it.
Description: LPeg is a new pattern-matching library for Lua, based on Parsing Expression Grammars (PEGs). This article is a reference manual for the LPeg library. For a more formal treatment, including a more detailed discussion of the implementation, see "A Text Pattern-Matching Tool based on Parsing Expression Grammars".
Following the Snobol tradition, LPeg defines patterns as first-class objects. That is, patterns are regular Lua values (represented by userdata). The library offers several functions to create and compose patterns. Through metamethods, several of these functions are provided as infix or prefix operators. On the one hand, the resulting code is usually more verbose than the same pattern written as a typical regular expression. On the other hand, first-class patterns are easier to document and to extend, because we can define new functions that create and compose patterns.
| Operator | Description |
|---|---|
| lpeg.P(string) | Matches string literally |
| lpeg.P(n) | Matches exactly n characters |
| lpeg.S(string) | Matches any character in string (Set) |
| lpeg.R("xy") | Matches any character between x and y (Range) |
| patt^n | Matches at least n repetitions of patt |
| patt^-n | Matches at most n repetitions of patt |
| patt1 * patt2 | Matches patt1 followed by patt2 |
| patt1 + patt2 | Matches patt1 or patt2 (ordered choice) |
| patt1 - patt2 | Matches patt1 if patt2 does not match |
| -patt | Equivalent to ("" - patt) |
| #patt | Matches patt but consumes no input |
| lpeg.B(patt) | Matches patt behind the current position, consuming no input |
As a very simple example, lpeg.R("09")^1 creates a pattern that matches a non-empty sequence of digits. As a slightly more complicated example, -lpeg.P(1) matches the empty string only if it cannot match a single character, so it succeeds only at the end of the subject. It is commonly placed at the end of a pattern to require that the whole subject was consumed.
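The two patterns above can be tried directly. This is a minimal sketch, assuming LPeg is installed and can be loaded with require("lpeg"); the variable names digits and eos are illustrative only.

```lua
local lpeg = require("lpeg")

local digits = lpeg.R("09")^1   -- one or more digits
local eos    = -lpeg.P(1)       -- succeeds only at the end of the subject

print(lpeg.match(digits, "2024abc"))        --> 5   (position after the digits)
print(lpeg.match(digits * eos, "2024abc"))  --> nil ("abc" is left over)
print(lpeg.match(digits * eos, "2024"))     --> 5
```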
Functions
lpeg.match (pattern, subject [, init])
The matching function. It attempts to match the given pattern against the subject string. If the match succeeds, it returns the index in the subject of the first character after the match, or the captured values (if the pattern captured any values).
An optional numeric argument init makes the match start at that position in the subject string. As usual in Lua libraries, a negative value counts from the end of the subject.
Unlike typical pattern-matching functions, match works only in anchored mode. That is, it tries to match the pattern with a prefix of the subject (starting at position init), not with an arbitrary substring. So, to find a pattern anywhere in a string, we must either write a loop in Lua that tries each position of the subject as a starting point, or write a pattern that itself matches at any position. The second approach is easy and quite efficient.
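The following sketch illustrates anchored matching, the init argument, and one way to write a pattern that matches anywhere. It assumes LPeg is loadable with require("lpeg"); the names word and anywhere are hypothetical. The search pattern is a small grammar whose single rule either matches word at the current position or skips one character and tries again.

```lua
local lpeg = require("lpeg")

local word = lpeg.P("peg")

-- Anchored: the pattern must match exactly at the start position.
print(lpeg.match(word, "lpeg"))       --> nil
print(lpeg.match(word, "lpeg", 2))    --> 5 (matching starts at position 2)
print(lpeg.match(word, "lpeg", -3))   --> 5 (negative init counts from the end)

-- Either match `word` here, or consume one character and try again.
local anywhere = lpeg.P{ word + 1 * lpeg.V(1) }
print(lpeg.match(anywhere, "an lpeg example"))   --> 8
```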
lpeg.type (value)
If value is a pattern, returns the string "pattern". Otherwise returns nil.
lpeg.version ()
Returns a string with the running version of LPeg.
lpeg.setmaxstack (max)
Sets the maximum size of the backtrack stack that LPeg uses to track calls and choices. The default limit is 400.
Basic Constructions
lpeg.P (value)
Converts the given value into a proper pattern, according to the following rules:
- If the argument is a pattern, it is returned unmodified.
- If the argument is a string, the result is a pattern that matches that string literally.
- If the argument is a non-negative integer n, the result is a pattern that matches exactly n characters.
- If the argument is a negative integer -n, the result is a pattern that succeeds only if the input string has fewer than n characters left. lpeg.P(-n) is equivalent to -lpeg.P(n) (see the unary minus operation).
- If the argument is a boolean, the result is a pattern that always succeeds or always fails (according to the boolean value), without consuming any input.
- If the argument is a table, it is interpreted as a grammar (see Grammars).
- If the argument is a function, the result is a pattern equivalent to a match-time capture over the empty string.
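The string, number, and boolean rules above can be checked with a few calls. This is a minimal sketch, assuming LPeg is loadable with require("lpeg"); the local aliases P and match are only for brevity.

```lua
local lpeg = require("lpeg")
local P, match = lpeg.P, lpeg.match

print(match(P("ab"), "abc"))    --> 3   string: matches "ab" literally
print(match(P(2), "abc"))       --> 3   number: matches any 2 characters
print(match(P(4), "abc"))       --> nil fewer than 4 characters available
print(match(P(-4), "abc"))      --> 1   succeeds: fewer than 4 characters left
print(match(P(true), "abc"))    --> 1   always succeeds, consumes nothing
print(match(P(false), "abc"))   --> nil always fails
```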
lpeg.B (patt)
Returns a pattern that matches only if the input string at the current position is preceded by patt. The pattern patt must match only strings of some fixed length, and it cannot contain captures. Like the and predicate, this pattern never consumes any input, independently of success or failure.
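A small look-behind sketch, assuming LPeg is loadable with require("lpeg") and that the installed version provides lpeg.B. The name c_after_b is hypothetical.

```lua
local lpeg = require("lpeg")

-- Matches "c" only when the character just before it is "b".
local c_after_b = lpeg.B(lpeg.P("b")) * lpeg.P("c")

print(lpeg.match(lpeg.P("ab") * c_after_b, "abc"))   --> 4
print(lpeg.match(lpeg.P("ax") * c_after_b, "axc"))   --> nil
```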
lpeg.R ({range})
Returns a pattern that matches any single character belonging to one of the given ranges. Each range is a string xy of length 2, representing all characters whose codes lie between the codes of x and y (both inclusive). As an example, the pattern lpeg.R("09") matches any digit, and lpeg.R("az", "AZ") matches any ASCII letter.
lpeg.S (string)
Returns a pattern that matches any single character that appears in the given string. (The S stands for Set.) As an example, the pattern lpeg.S("+-*/") matches any arithmetic operator. Note that if s is a single character, then lpeg.P(s) is equivalent to lpeg.S(s).
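The range and set constructors can be exercised as follows. This is a minimal sketch, assuming LPeg is loadable with require("lpeg"); the names digit, letter, and operator are illustrative.

```lua
local lpeg = require("lpeg")

local digit    = lpeg.R("09")         -- one range
local letter   = lpeg.R("az", "AZ")   -- two ranges
local operator = lpeg.S("+-*/")       -- a set of four characters

print(lpeg.match(digit, "7"))      --> 2
print(lpeg.match(letter, "Q"))     --> 2
print(lpeg.match(letter, "7"))     --> nil
print(lpeg.match(operator, "*"))   --> 2

-- For a one-character string, P and S accept the same input.
print(lpeg.match(lpeg.P("+"), "+") == lpeg.match(lpeg.S("+"), "+"))   --> true
```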
lpeg.V (v)
This operation creates a non-terminal (a variable) for a grammar. The created non-terminal refers to the rule indexed by v in the enclosing grammar. (See Grammars for details.)
lpeg.locale ([table])
Returns a table with patterns for matching some character classes according to the current locale. The table has fields named alnum, alpha, cntrl, digit, graph, lower, print, punct, space, upper, and xdigit, each one containing the corresponding pattern. Each pattern matches any single character that belongs to its class. If called with a table argument, it creates those fields inside the given table and returns that table.
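A short sketch of both calling forms, assuming LPeg is loadable with require("lpeg") and that the program runs under the default C locale (other locales may classify characters differently). The names loc and classes are hypothetical.

```lua
local lpeg = require("lpeg")

local loc = lpeg.locale()   -- a new table with the class patterns
print(lpeg.match(loc.digit^1, "123abc"))               --> 4
print(lpeg.match(loc.space^0 * loc.alpha^1, "  hi"))   --> 5

-- With a table argument, the fields are added to that table.
local classes = {}
print(lpeg.locale(classes) == classes)   --> true
print(lpeg.type(classes.upper))          --> pattern
```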
#patt
Returns a pattern that matches only if the input string matches patt, but without consuming any input, independently of success or failure. (This pattern is called an and predicate, and it is equivalent to &patt in the original PEG notation.) This pattern never produces any capture.
-patt
Returns a pattern that matches only if the input string does not match patt. It does not consume any input, independently of success or failure. (This pattern is equivalent to !patt in the original PEG notation.) As an example, the pattern -lpeg.P(1) matches only the end of the string. This pattern never produces any capture, because either patt fails or -patt fails. (A failing pattern never produces captures.)
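Both predicates can be seen in a few calls. This is a minimal sketch, assuming LPeg is loadable with require("lpeg"); in Lua the unary operators # and - bind tighter than *, so no extra parentheses are needed.

```lua
local lpeg = require("lpeg")
local P, R = lpeg.P, lpeg.R

-- And predicate: require a digit next, without consuming it.
print(lpeg.match(#R("09"), "42"))          --> 1 (position unchanged)
print(lpeg.match(#R("09") * P(2), "42"))   --> 3
print(lpeg.match(#R("09"), "x"))           --> nil

-- Not predicate: succeed only if the pattern does not match here.
print(lpeg.match(-P("x"), "42"))           --> 1
print(lpeg.match(P("42") * -P(1), "42"))   --> 3 (nothing may follow)
print(lpeg.match(P("42") * -P(1), "42!"))  --> nil
```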
patt1 + patt2
Returns a pattern equivalent to an ordered choice of patt1 and patt2: it matches patt1 or, if patt1 fails, patt2. If both patt1 and patt2 are character sets, the operation is equivalent to set union.

    lower = lpeg.R("az")
    upper = lpeg.R("AZ")
    letter = lower + upper
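The order of the alternatives matters in a PEG choice. The sketch below, assuming LPeg is loadable with require("lpeg"), shows a set union and then the same two alternatives in both orders; the names alnum and eos are illustrative.

```lua
local lpeg = require("lpeg")
local P, R = lpeg.P, lpeg.R

local alnum = R("09") + R("az")   -- union of two character sets
print(lpeg.match(alnum^1, "a1b2-c"))   --> 5

-- Once the first alternative matches, the second one is not tried,
-- even if the rest of the pattern then fails.
local eos = -P(1)
print(lpeg.match((P("a") + P("ab")) * eos, "ab"))   --> nil
print(lpeg.match((P("ab") + P("a")) * eos, "ab"))   --> 3
```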
patt1 - patt2
Returns a pattern equivalent to !patt2 patt1 in PEG notation. This pattern asserts that the input does not match patt2 and then matches patt1. When it succeeds, it produces all captures from patt1. It never produces any capture from patt2 (as either patt2 fails or patt1 - patt2 fails). If both patt1 and patt2 are character sets, the operation is equivalent to set difference. Note that -patt is equivalent to "" - patt (or 0 - patt). If patt is a character set, 1 - patt is its complement.
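Two common uses of the difference operator are "anything except X" and the complement of a set. A minimal sketch, assuming LPeg is loadable with require("lpeg"); field and nondigit are hypothetical names.

```lua
local lpeg = require("lpeg")
local P, R, C = lpeg.P, lpeg.R, lpeg.C

local field    = C((P(1) - P(","))^0)   -- any characters except a comma
local nondigit = 1 - R("09")            -- complement of the digit set

print(lpeg.match(field, "abc,def"))   --> abc
print(lpeg.match(nondigit, "x"))      --> 2
print(lpeg.match(nondigit, "7"))      --> nil
```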
patt1 * patt2
Returns a pattern that matches patt1 and then matches patt2, starting where patt1 finished. The identity element for this operation is the pattern lpeg.P(true), which always succeeds. (LPeg uses the * operator [instead of the more obvious ..] both because it has the right priority and because in formal languages it is common to use a dot for denoting concatenation.)
patt^n
If n is non-negative, this pattern is equivalent to patt repeated n times followed by patt* (in PEG notation): it matches n or more occurrences of patt. Otherwise, when n is negative, this pattern is equivalent to (patt?) repeated -n times: it matches at most |n| occurrences of patt. In particular, patt^0 is equivalent to patt*, patt^1 is equivalent to patt+, and patt^-1 is equivalent to patt? in the original PEG notation. In all cases, the resulting pattern is greedy with no backtracking (also called a possessive repetition). That is, patt^n matches only the longest possible sequence of matches for patt.
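The repetition forms and the possessive behavior can be observed with a single-character pattern. A minimal sketch, assuming LPeg is loadable with require("lpeg").

```lua
local lpeg = require("lpeg")
local a = lpeg.P("a")

print(lpeg.match(a^0, "bbb"))     --> 1   zero or more: may match nothing
print(lpeg.match(a^1, "bbb"))     --> nil one or more
print(lpeg.match(a^2, "aaab"))    --> 4   at least two
print(lpeg.match(a^-2, "aaab"))   --> 3   at most two
print(lpeg.match(a^-1, "bbb"))    --> 1   optional

-- Possessive: a^0 takes every "a" and never gives one back, so the
-- trailing "a" can no longer match.
print(lpeg.match(a^0 * a, "aaa")) --> nil
```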
Grammars
With Lua variables, it is possible to define patterns incrementally, with each new pattern using previously defined ones. However, this technique does not allow the definition of recursive patterns. For recursive patterns, we need real grammars. LPeg represents a grammar as a table, where each entry is a rule.
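As an illustration of a recursive rule, here is a sketch of a grammar for balanced parentheses. It assumes LPeg is loadable with require("lpeg") and relies on the LPeg convention that a string at index 1 of the grammar table names the initial rule; the rule name group is hypothetical.

```lua
local lpeg = require("lpeg")
local P, S, V = lpeg.P, lpeg.S, lpeg.V

local balanced = P{
  "group",   -- name of the initial rule
  -- "(" then any mix of non-parenthesis characters and nested groups, then ")"
  group = "(" * ((1 - S("()")) + V("group"))^0 * ")",
}

print(lpeg.match(balanced, "(a(b)c)"))   --> 8
print(lpeg.match(balanced, "(a(b)c"))    --> nil
```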
Captures
A capture is a pattern that produces values according to what it matches. LPeg offers several kinds of captures, which produce different values depending on what the pattern matched and how the captures are combined. The following table gives a basic overview:
| Operation | What it produces |
|---|---|
| lpeg.C(patt) | The match for patt plus all captures made by patt |
| lpeg.Carg(n) | The value of the nth extra argument to lpeg.match (matches the empty string) |
| lpeg.Cb(name) | The values produced by the previous group capture named name (matches the empty string) |
| lpeg.Cc(values) | The given values (matches the empty string) |
| lpeg.Cf(patt, func) | A folding of the captures from patt, applying func to them in turn |
| lpeg.Cg(patt [, name]) | The values produced by patt, grouped into one capture and optionally tagged with name |
| lpeg.Cp() | The current position (matches the empty string) |
| lpeg.Cs(patt) | The match for patt with the values from nested captures replacing their matches |
| lpeg.Ct(patt) | A table with all captures from patt |
| patt / string | string, with some marks replaced by captures of patt |
| patt / number | The n-th value captured by patt, or no value when number is zero |
| patt / table | table[c], where c is the (first) capture of patt |
| patt / function | The returns of function applied to the captures of patt |
| lpeg.Cmt(patt, function) | The returns of function applied to the captures of patt; the application is done at match time |
lpeg.C (patt)
Creates a simple capture, which captures the substring of the subject that matches patt. If patt contains other captures, their values are returned after this one.
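A short sketch of a simple capture and of the order in which nested captures are returned, assuming LPeg is loadable with require("lpeg"); word and pair are illustrative names.

```lua
local lpeg = require("lpeg")
local C, R = lpeg.C, lpeg.R

local word = C(R("az")^1)
print(lpeg.match(word, "hello world"))   --> hello

-- Nested captures are returned after the outer capture.
local pair = C(C(R("az")^1) * "=" * C(R("09")^1))
print(lpeg.match(pair, "x=42"))          --> x=42  x  42
```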
lpeg.Carg (n)
Creates an argument capture. This pattern matches the empty string and produces the value given as the nth extra argument in the call to lpeg.match.
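In a call to lpeg.match the extra arguments follow the init position. A minimal sketch, assuming LPeg is loadable with require("lpeg"); the pattern name p is hypothetical.

```lua
local lpeg = require("lpeg")

-- Extra arguments come after the init position in lpeg.match.
local p = lpeg.P("a") * lpeg.Carg(1) * lpeg.Carg(2)
print(lpeg.match(p, "abc", 1, "first", 42))   --> first  42
```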
lpeg.Cb (name)
Creates a back capture. This pattern matches the empty string and produces the values produced by the most recent group capture named name (where name can be any Lua value).
Most recent means the last complete outermost group capture with the given name. A complete capture means that the entire pattern corresponding to the capture has matched. An outermost capture means that the capture is not inside another complete capture.
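The sketch below stores a word in a named group and retrieves it later with a back capture. It assumes LPeg is loadable with require("lpeg"); the group name "word" is an arbitrary choice. A named group produces no value by itself at the top level, so the only value returned comes from lpeg.Cb.

```lua
local lpeg = require("lpeg")
local C, Cb, Cg, R = lpeg.C, lpeg.Cb, lpeg.Cg, lpeg.R

-- Store the first word in a group named "word", then retrieve it.
local p = Cg(C(R("az")^1), "word") * ":" * Cb("word")
print(lpeg.match(p, "key:"))   --> key
```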
lpeg.Cc ([value, ...])
Creates a constant capture. This pattern matches the empty string and produces all given values as its captured values.
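Constant captures are handy for attaching a fixed value to an alternative. A minimal sketch, assuming LPeg is loadable with require("lpeg"); the name bool is hypothetical.

```lua
local lpeg = require("lpeg")

-- Map each keyword to a constant value.
local bool = lpeg.P("yes") * lpeg.Cc(true) + lpeg.P("no") * lpeg.Cc(false)
print(lpeg.match(bool, "yes"))   --> true
print(lpeg.match(bool, "no"))    --> false

-- Several values can be produced at once, even on an empty subject.
print(lpeg.match(lpeg.Cc("a", 1), ""))   --> a  1
```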
lpeg.Cf (patt, func)
Creates a fold capture. If patt produces a list of captures C1, C2, ..., Cn, this capture produces the value func(...func(func(C1, C2), C3)..., Cn). For example, the following pattern adds up a comma-separated list of numbers:

    local lpeg = require("lpeg")
    -- matches a numeral and captures its numerical value
    number = lpeg.R("09")^1 / tonumber
    -- matches a list of numbers, capturing their values
    list = number * ("," * number)^0
    -- auxiliary function to add two numbers
    function add (acc, newvalue) return acc + newvalue end
    -- folds the list of numbers, adding them
    sum = lpeg.Cf(list, add)
    -- example of use
    print(sum:match("10,30,43"))   --> 83
lpeg.Cg (patt [, name])
Creates a group capture. It groups all values returned by patt into a single capture. The group may be anonymous (if no name is given) or named with the given name (which can be any non-nil Lua value).
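One typical use of named groups is to fill string keys of a table built by lpeg.Ct. A sketch under the assumption that LPeg is loadable with require("lpeg"); the group names "key" and "value" and the pattern name entry are hypothetical.

```lua
local lpeg = require("lpeg")
local C, Cg, Ct, R = lpeg.C, lpeg.Cg, lpeg.Ct, lpeg.R

-- Named groups become string keys when collected by a table capture.
local entry = Ct(Cg(C(R("az")^1), "key") * "=" * Cg(R("09")^1 / tonumber, "value"))

local t = lpeg.match(entry, "width=80")
print(t.key, t.value)   --> width  80
```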
lpeg.Cp ()
Creates a position capture. It matches the empty string and captures the position in the subject where the match occurs. The captured value is a number.
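Position captures can mark where a piece of the subject starts and ends. A minimal sketch, assuming LPeg is loadable with require("lpeg").

```lua
local lpeg = require("lpeg")
local P, Cp = lpeg.P, lpeg.Cp

-- Record where "lo" starts and where it ends inside "hello".
local p = P("hel") * Cp() * P("lo") * Cp()
print(lpeg.match(p, "hello"))   --> 4  6
```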
lpeg.Cs (patt)
Creates a substitution capture, which captures the substring of the subject that matches patt, with substitutions. For any capture inside patt with a value, the substring that matched the capture is replaced by the capture value (which should be a string). The final captured value is the string resulting from all replacements.
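A global replacement can be sketched with a substitution capture around a loop. This assumes LPeg is loadable with require("lpeg"); the name swap is hypothetical.

```lua
local lpeg = require("lpeg")

-- Replace every "cat" with "dog"; any other character is copied as is.
local swap = lpeg.Cs((lpeg.P("cat") / "dog" + 1)^0)
print(lpeg.match(swap, "a cat and a cat"))   --> a dog and a dog
```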
lpeg.Ct (patt)
Creates a table capture. This capture returns a table that holds the values of all anonymous captures made by patt in successive integer keys, starting at 1. For each named group capture made by patt, the group name is used as the key.
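Collecting a comma-separated list into an array is a common use of table captures. A minimal sketch, assuming LPeg is loadable with require("lpeg"); number and list are illustrative names.

```lua
local lpeg = require("lpeg")
local R, Ct = lpeg.R, lpeg.Ct

local number = R("09")^1 / tonumber           -- whole match converted to a number
local list   = Ct(number * ("," * number)^0)  -- all numbers collected in one table

local t = lpeg.match(list, "10,30,43")
print(#t, t[1], t[2], t[3])   --> 3  10  30  43
```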
patt / string
Creates a string capture. The captured value is a copy of string, except that the character % works as an escape character: any sequence in string of the form %n, with n between 1 and 9, stands for the match of the n-th capture in patt. The sequence %0 stands for the whole match. The sequence %% stands for a single %.
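The escape sequences can be tried on a small key=value pattern. A minimal sketch, assuming LPeg is loadable with require("lpeg"); pair is a hypothetical name.

```lua
local lpeg = require("lpeg")
local C, R = lpeg.C, lpeg.R

local pair = C(R("az")^1) * "=" * C(R("09")^1)

print(lpeg.match(pair / "%2 is the value of %1", "x=42"))   --> 42 is the value of x
print(lpeg.match(pair / "[%0]", "x=42"))                    --> [x=42]
print(lpeg.match(pair / "100%%", "x=42"))                   --> 100%
```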
patt / number
Creates a numbered capture. For a non-zero number, the captured value is the n-th value captured by patt. When number is zero, there are no captured values.
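A sketch of selecting one capture and of discarding all of them, assuming LPeg is loadable with require("lpeg") and a version that supports numbered captures. When a match produces no captured values, lpeg.match returns the position after the match.

```lua
local lpeg = require("lpeg")
local C = lpeg.C

local ab = C("a") * C("b")
print(lpeg.match(ab / 2, "ab"))   --> b
print(lpeg.match(ab / 0, "ab"))   --> 3 (no captures, so the end position)
```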
patt / table
Creates a query capture. It indexes the given table using as key the first value captured by patt, or the whole match if patt produced no value. The value at that index is the final value of the capture. If the table does not have that key, there is no captured value.
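A lookup table can translate matched words into values. A minimal sketch, assuming LPeg is loadable with require("lpeg"); the table value and pattern p are hypothetical. Since patt has no captures here, the whole match is the key.

```lua
local lpeg = require("lpeg")

local value = { one = 1, two = 2 }
local p = lpeg.R("az")^1 / value   -- the whole match is used as the key

print(lpeg.match(p, "two"))     --> 2
print(lpeg.match(p, "three"))   --> 6 (key absent: no capture, end position)
```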
patt / function
Creates a function capture. It calls the given function, passing all captures made by patt as arguments, or the whole match if patt made no capture. The values returned by the function are the final values of the capture. In particular, if function returns no value, there is no captured value.
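The sketch below shows a function capture receiving the whole match, one receiving two captures, and one returning nothing. It assumes LPeg is loadable with require("lpeg"); num, sum, and silent are illustrative names.

```lua
local lpeg = require("lpeg")
local C, R = lpeg.C, lpeg.R

local num = R("09")^1 / tonumber   -- no inner capture: whole match is passed
local sum = (num * "+" * num) / function(a, b) return a + b end
print(lpeg.match(sum, "2+3"))      --> 5

-- A function that returns nothing yields no captured value.
local silent = C(R("az")^1) / function() end
print(lpeg.match(silent, "abc"))   --> 4
```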
lpeg.Cmt (patt, function)
Creates a match-time capture. Unlike all other captures, this one is evaluated immediately when a match occurs. It forces the immediate evaluation of all its nested captures and then calls function.
The given function gets as arguments the entire subject, the current position (after the match of patt), plus any capture values produced by patt.
The first value returned by function defines how the match happens. If the call returns a number, the match succeeds and the returned number becomes the new current position. (Assuming a subject s and current position i, the returned number must be in the range [i, len(s) + 1].) If the call returns true, the match succeeds without consuming any input. (So, to return true is equivalent to return i.) If the call returns false, nil, or no value, the match fails.
Any extra values returned by the function become the values produced by the capture.
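A match-time capture can reject input based on a computed value. The sketch below accepts a run of digits only when it denotes a number up to 255. It assumes LPeg is loadable with require("lpeg"); the name byte and the limit 255 are illustrative choices, and the inner lpeg.C makes the digits an explicit argument of the function.

```lua
local lpeg = require("lpeg")

local byte = lpeg.Cmt(lpeg.C(lpeg.R("09")^1), function(subject, pos, digits)
  local n = tonumber(digits)
  if n <= 255 then
    return true, n   -- succeed at the current position; n is the captured value
  end
  return false       -- fail the match
end)

print(lpeg.match(byte, "200"))   --> 200
print(lpeg.match(byte, "300"))   --> nil
```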
Some Examples
Note: The examples section is not translated here because Open Source China (OSChina) has already translated it: http://www.oschina.net/translate/lpeg-syntax
"Translation" Lpeg Programming Guide