Lua pattern matching


Pattern Matching Function

The most powerful functions in the string library are:

string.find (string search)
string.gsub (global substitution)
string.gfind (global search; the Lua 5.0 name, renamed string.gmatch in Lua 5.1)
string.gmatch (returns an iterator over all matches)

These functions are all pattern-based. Unlike several other scripting languages, Lua does not use POSIX regular expressions [4] (also written regexp) for pattern matching. The main reason is program size: a typical POSIX-compliant regexp implementation takes about 4,000 lines of code, which is larger than all the Lua standard libraries together. By comparison, the pattern-matching implementation in Lua takes only about 500 lines. Of course, this means that it cannot do everything a full POSIX implementation does. Nevertheless, pattern matching in Lua is powerful and includes some features that are difficult to reproduce with standard POSIX pattern matching.

string.gmatch(str, pattern)

This function returns an iterator that, each time it is called, yields the next match of pattern in str. A typical use:

s = "hello world from Lua"
for w in string.gmatch(s, "%a+") do
  print(w)
end

The following example captures key/value pairs and stores them in a table:

t = {}
s = "from=world, to=Lua"
for k, v in string.gmatch(s, "(%w+)=(%w+)") do
  t[k] = v
end
for k, v in pairs(t) do
  print(k, v)
end

string.gsub(str, pattern, repl, n)

string.gsub() returns a copy of the source string str in which every match of the pattern has been replaced. As a second value, it returns the number of matches found. How each match is replaced depends on the type of repl:

When repl is a string, each match is replaced with it. Inside repl, %1 through %9 stand for the corresponding captures and %0 stands for the whole match.
When repl is a table, the table is queried for every match, using the first capture as the key. If the pattern has no captures, the whole match is used as the key.
When repl is a function, it is called for every match with the captures as arguments. If the pattern has no captures, the whole match is passed instead.
When repl is a table or a function and the resulting value is a string or a number, that value replaces the match. If the value is nil or false, no replacement is made and the original match is kept.

The n parameter is optional. When it is given, string.gsub() replaces only the first n matches in the source string.

The following are examples:

> print(string.gsub("hello world", "(%w+)", "%1 %1"))
hello hello world world 2

> print(string.gsub("hello Lua", "(%w+)%s*(%w+)", "%2 %1"))
Lua hello 1

> print(string.gsub("hello world", "%w+", print))
hello
world
hello world 2

> lookuptable = {["hello"] = "hola", ["world"] = "mundo"}
> print(string.gsub("hello world", "(%w+)", lookuptable))
hola mundo 2
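
As a further illustration of the function form of repl and of the n parameter, here is a minimal sketch. The values table and the expand helper are hypothetical names introduced only for this example.

```lua
-- Hypothetical data: known placeholder values.
local values = { user = "alice", home = "/home/alice" }

-- Replace $name placeholders; unknown names are left untouched.
local function expand(s)
  -- The callback receives the first capture. Returning nil keeps the match.
  return (string.gsub(s, "%$(%w+)", function(name)
    return values[name]
  end))
end

print(expand("$user lives in $home, shell is $shell"))
--> alice lives in /home/alice, shell is $shell

-- With n = 1, only the first match is replaced.
local result, count = string.gsub("hello world", "%w+", "X", 1)
print(result, count)  --> X world   1
```

The extra parentheses around the gsub call in expand discard the second return value, so the helper returns only the string.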

string.match(str, pattern, init)

string.match() looks only for the first match of the pattern in the source string str. The optional init parameter specifies where the search starts; its default value is 1.

On success, the function returns all the captures of the pattern. If the pattern specifies no captures, the whole match is returned. If there is no match, nil is returned.

string.match("abcdaef", "")
-> (the empty string: an empty pattern matches at the start position)
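
The three return behaviours described above, and the effect of init, can be sketched as follows. The sample string is made up for the example.

```lua
local s = "width = 10, height = 20"

-- With captures, each capture is returned as a separate value.
local k, v = string.match(s, "(%w+) = (%w+)")
print(k, v)  --> width   10

-- Without captures, the whole match is returned.
print(string.match(s, "%d+"))  --> 10

-- init = 12 starts the search just after the comma.
local k2, v2 = string.match(s, "(%w+) = (%w+)", 12)
print(k2, v2)  --> height  20

-- No match yields nil.
print(string.match(s, "%u+"))  --> nil
```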

string.reverse(str)
Returns the string with its characters in reverse order.
string.reverse("abcde")
-> edcba

string.dump(function)
Returns a string containing a binary representation of the given function. In Lua 5.1, the function must be a Lua function without upvalues.
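
A minimal sketch of a round trip through string.dump. It assumes the dumped chunk is loaded back in the same Lua build that produced it; the `loadstring or load` line is an assumption made here to cover both Lua 5.1 (loadstring) and later versions (load).

```lua
-- A Lua function without upvalues: it uses only its parameter.
local function square(x)
  return x * x
end

-- string.dump returns the precompiled function as a string.
local chunk = string.dump(square)

-- loadstring exists in Lua 5.1; from Lua 5.2 on, load accepts strings.
local loader = loadstring or load
local copy = loader(chunk)

print(type(chunk))  --> string
print(copy(7))      --> 49
```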

string.find(str, pattern, init, plain)
The basic use of string.find is to search the target string (the subject string) for a substring that matches the given pattern. If it finds a match, it returns its position; otherwise, it returns nil. The simplest pattern is a word, which matches only a copy of itself. For example, the pattern 'hello' matches only "hello" in the target string. When a match is found, the function returns two values: the index where the match starts and the index where it ends.
s = "hello world"
string.find(s, "hello") --> 1 5
string.find(s, "world") --> 7 11
string.find(s, "l") --> 3 3
string.find(s, "lll") --> nil
The third parameter of string.find is optional: it is the position in the target string where the search starts. This is useful when we want to find every match in the target string: we search repeatedly, each time starting just after the position of the previous match. The optional fourth parameter, plain, turns off pattern matching when it is true, so that the pattern is treated as literal text. The following code builds a table with the positions of all newlines in a string:
local t = {} -- table to store the positions
local i = 0
while true do
  i = string.find(s, "\n", i + 1) -- find the next newline
  if i == nil then break end
  table.insert(t, i)
end
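
The same loop can be wrapped in a function, and the plain parameter from the signature above can be shown at the same time. This is a sketch; newline_positions is a hypothetical helper name.

```lua
-- Collect the index of every newline in a string.
local function newline_positions(s)
  local t = {}
  local i = 0
  while true do
    -- plain = true: "\n" is searched for literally, with no pattern handling.
    i = string.find(s, "\n", i + 1, true)
    if i == nil then break end
    table.insert(t, i)
  end
  return t
end

local pos = newline_positions("one\ntwo\nthree")
print(pos[1], pos[2])  --> 4   8

-- plain matters when the search text contains magic characters.
print(string.find("a.b", "."))           --> 1   1   ('.' matches any character)
print(string.find("a.b", ".", 1, true))  --> 2   2   (literal dot)
```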

string.sub(str, spos, epos)
string.sub extracts the substring of str that runs from position spos to position epos. It can take the values returned by string.find to extract the matched substring.
For a simple pattern, the match is the pattern itself.
s = "hello world"
local i, j = string.find(s, "hello") --> 1 5
string.sub(s, i, j) --> hello

string.gsub(str, sourcestr, desstr)
The basic use of string.gsub is to find every match of a pattern and replace it with the replacement string.
string.gsub takes three parameters: the target string, the pattern, and the replacement string.
s = string.gsub("Lua is cute", "cute", "great")
print(s) --> Lua is great
s = string.gsub("all lii", "l", "x")
print(s) --> axx xii
s = string.gsub("Lua is great", "perl", "tcl")
print(s) --> Lua is great
An optional fourth parameter limits the number of replacements:
s = string.gsub("all lii", "l", "x", 1)
print(s) --> axl lii
s = string.gsub("all lii", "l", "x", 2)
print(s) --> axx lii
The second value returned by string.gsub is the number of replacements made. For example, the following code counts the spaces in a string:
_, count = string.gsub(str, " ", " ")
(Note: _ is just a dummy variable name.)
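
The counting idiom generalizes to any pattern. A small sketch, where count_matches is a hypothetical helper name:

```lua
-- Count how many times a pattern matches in a string.
local function count_matches(s, pattern)
  -- "%0" re-inserts each match unchanged; select(2, ...) picks the count.
  return (select(2, string.gsub(s, pattern, "%0")))
end

print(count_matches("a b  c", " "))           --> 3
print(count_matches("Lua is great", "%a+"))   --> 3
print(count_matches("Lua is great", "perl"))  --> 0
```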

Patterns

You can also use character classes in patterns. A character class is a pattern item that matches any character in a specific set. For example, the class %d matches any digit. Therefore, you can use the pattern '%d%d/%d%d/%d%d%d%d' to search for a date in dd/mm/yyyy format:
s = "deadline is 30/05/1999, firm"
date = "%d%d/%d%d/%d%d%d%d"
print(string.sub(s, string.find(s, date))) --> 30/05/1999
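
If the individual fields of the date are needed, the same pattern can be written with captures and passed to string.match. A sketch under that assumption:

```lua
local s = "deadline is 30/05/1999, firm"

-- Each parenthesized group is returned as a separate string.
local day, month, year = string.match(s, "(%d%d)/(%d%d)/(%d%d%d%d)")
print(day, month, year)  --> 30   05   1999

-- Captures are strings; convert them when numbers are needed.
print(tonumber(month))   --> 5
```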
The following table lists all the character classes supported by Lua:

a single character (except ^ $ ( ) % . [ ] * + - ?): matches the character itself
. (a dot): matches any character
%a: matches any letter
%c: matches any control character (for example, \n)
%d: matches any digit
%l: matches any lowercase letter
%p: matches any punctuation character
%s: matches any whitespace character
%u: matches any uppercase letter
%w: matches any alphanumeric character (letter or digit)
%x: matches any hexadecimal digit
%z: matches the character with representation 0
%x (where x is any non-alphanumeric character): matches the character x. This is the standard way to escape the magic characters ^ $ ( ) % . [ ] * + - ? For example, %. matches a literal dot and %% matches a literal percent sign.
[set]: matches any character in the set. For example, [%w_] matches any alphanumeric character or an underscore (_).
[^set]: matches any character not in the set. For example, [^%s] matches any non-whitespace character.
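
To see a few of these classes in action, here is a minimal sketch that classifies a single character. The classify function is a hypothetical helper, and the results assume the default "C" locale.

```lua
-- Return a rough description of a one-character string.
local function classify(ch)
  if string.find(ch, "%d") then return "digit" end
  if string.find(ch, "%u") then return "uppercase letter" end
  if string.find(ch, "%l") then return "lowercase letter" end
  if string.find(ch, "%s") then return "whitespace" end
  if string.find(ch, "%p") then return "punctuation" end
  return "other"
end

print(classify("7"))  --> digit
print(classify("Q"))  --> uppercase letter
print(classify("q"))  --> lowercase letter
print(classify(" "))  --> whitespace
print(classify("!"))  --> punctuation
```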

When a class letter is written in uppercase, it represents the complement of the class. For example, '%S' matches any non-whitespace character and '%A' matches any non-letter character:
print(string.gsub("hello, up-down!", "%A", "."))
--> hello..up.down. 4
(The 4 is not part of the result string. It is the second result of gsub, the total number of substitutions. Other examples that print the result of gsub will omit this count.)
Some characters, called magic characters, have special meanings when used in a pattern. The magic characters are:
( ) . % + - * ? [ ^ $
The character '%' works as an escape for these magic characters, so '%.' matches a dot and '%%' matches the character '%' itself. You can use the escape '%' not only for the magic characters but also for any other non-alphanumeric character. When in doubt, play safe and put an escape.
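
Because escaping any non-alphanumeric character is allowed, arbitrary text can be turned into a pattern that matches it literally by escaping every punctuation character. A sketch of that idea, where escape_pattern is a hypothetical helper name:

```lua
-- Turn arbitrary text into a pattern that matches it literally.
local function escape_pattern(text)
  -- In the replacement, "%%" is a literal '%' and "%0" is the whole match.
  return (string.gsub(text, "%p", "%%%0"))
end

local literal = escape_pattern("1.5%")
print(literal)  --> 1%.5%%

print(string.find("rate: 1.5% per year", literal))  --> 7   10
print(string.find("rate: 1x5% per year", literal))  --> nil
```

Without the escaping step, the unescaped dot would also match the 'x' in the second string, and the trailing '%' would make the pattern malformed.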
For Lua, patterns are regular strings. They receive no special treatment and follow the same rules as other strings. Only inside the functions that interpret them as patterns is '%' an escape. Therefore, if you need to put a quote inside a pattern, you must use the same technique that you use to put a quote inside any other string: escape it with '\', which is the escape character for Lua.
A char-set lets you create your own character classes by combining classes and single characters between square brackets. For example, the char-set '[%w_]' matches both alphanumeric characters and underscores, '[01]' matches binary digits, and '[%[%]]' matches square brackets. The following example counts the number of vowels in a text:

_, nvow = string.gsub(text, "[AEIOUaeiou]", "")

In a char-set, you can also write a range of characters: the first and the last character of the range, separated by a hyphen. You will seldom need this, because most useful ranges are already predefined; for example, '%d' is equivalent to '[0-9]' and '%x' to '[0-9a-fA-F]'. However, if you need to find an octal digit, you may prefer '[0-7]' to an explicit enumeration such as '[01234567]'. You can get the complement of a char-set by starting it with '^': '[^0-7]' matches any character that is not an octal digit, and '[^\n]' matches any character other than a newline. Remember that you can negate simple classes with their uppercase version: '%S' is shorter than '[^%s]'.
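
A short sketch of ranges and complemented sets, using made-up sample strings:

```lua
-- '[0-7]+' matches a run of octal digits.
local octal = string.match("mode 0755 set", "[0-7]+")
print(octal)  --> 0755

-- '[^\n]*' matches everything up to, but not including, the first newline.
local first = string.match("first line\nsecond line", "[^\n]*")
print(first)  --> first line

-- '%S+' and '[^%s]+' select the same text.
local a = string.match("  padded", "%S+")
local b = string.match("  padded", "[^%s]+")
print(a == b)  --> true
```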
Character classes follow the current locale set for Lua, so the class '[a-z]' can differ from '%l'. In a suitable locale, the latter includes letters such as 'ç' and 'ã', while the former does not. Prefer the latter unless you have a strong reason to do otherwise: it is simpler, more convenient, and more efficient.
You can make patterns more expressive with modifiers. Patterns in Lua offer four modifiers:

+ matches 1 or more repetitions of the previous character class (longest match)
* matches 0 or more repetitions of the previous character class (longest match)
- matches 0 or more repetitions of the previous character class (shortest match)
? matches 0 or 1 occurrence of the previous character class

The '+' modifier matches one or more characters of the original class. It always takes the longest sequence that matches the pattern. For example, the pattern '%a+' means one or more letters, that is, a word:

print(string.gsub("one, and two; and three", "%a+", "word"))
--> word, word word; word word

The pattern '%d+' matches one or more digits (an integer):

i, j = string.find("the number 1298 is even", "%d+")
print(i, j) --> 12 15

The '*' modifier is similar to '+', but it also accepts zero occurrences of characters of the class. A typical use is to match optional spaces between parts of a pattern. For example, to match an empty pair of parentheses, such as "()" or "( )", use the pattern '%(%s*%)'. (The '%s*' matches zero or more spaces. Parentheses have a special meaning in a pattern, so we must escape them with '%'.) As another example, the pattern '[_%a][_%w]*' matches identifiers in a Lua program: a sequence that starts with a letter or an underscore, followed by zero or more underscores or alphanumeric characters.
Like '*', the '-' modifier matches zero or more occurrences of characters of the original class. However, instead of the longest sequence, it matches the shortest one. Sometimes there is no difference between the two, but often the results are quite different. For example, if you try to find an identifier with the pattern '[_%a][_%w]-', you will find only the first character, because '[_%w]-' always matches the empty sequence. On the other hand, suppose you want to find comments in a C program. Many people would first try '/%*.*%*/' (that is, a "/*" followed by any sequence of characters followed by "*/", with the asterisks escaped). However, because '.*' expands as far as it can, the first "/*" in the program would close only with the last "*/":

test = "int x; /* x */  int y; /* y */"
print(string.gsub(test, "/%*.*%*/", "<COMMENT>"))
--> int x; <COMMENT>

The pattern '.-', by contrast, expands only as far as needed to reach the first "*/", which gives the desired result:

test = "int x; /* x */  int y; /* y */"
print(string.gsub(test, "/%*.-%*/", "<COMMENT>"))
--> int x; <COMMENT>  int y; <COMMENT>
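
The identifier case mentioned earlier shows the same contrast on a smaller scale. A sketch with a made-up input string:

```lua
local s = "count_1 = 42"

-- Longest match: the whole identifier.
local greedy = string.match(s, "[_%a][_%w]*")
print(greedy)  --> count_1

-- Shortest match: '-' is satisfied with zero repetitions.
local lazy = string.match(s, "[_%a][_%w]-")
print(lazy)    --> c

-- When something must follow, '-' expands until the rest can match.
local name = string.match(s, "([_%a][_%w]-)%s*=")
print(name)    --> count_1
```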

The last modifier, '?', matches an optional character. For example, suppose we want to find an integer in a text, where the number may carry an optional sign. The pattern '[+-]?%d+' does the job, matching numerals such as "-12", "23", and "+1009". The '[+-]' is a character class that matches either a '+' or a '-' sign; the following '?' makes that sign optional.
Unlike some other systems, in Lua a modifier can be applied only to a character class; there is no way to group patterns under a modifier. For example, no pattern matches an optional word (unless the word has only one letter). You can usually work around this limitation with more advanced techniques.
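
One simple way around the missing group modifier is to try two patterns in turn. The sketch below is only an illustration of that idea, not a technique taken from the text; find_optional_word is a hypothetical helper, and it assumes that word and rest contain no magic characters.

```lua
-- Find `rest`, optionally preceded by `word` and some whitespace.
local function find_optional_word(s, word, rest)
  -- Try the longer form first, then fall back to the form without the word.
  local i, j = string.find(s, word .. "%s+" .. rest)
  if i then return i, j end
  return string.find(s, rest)
end

print(find_optional_word("a very good day", "very", "good"))  --> 3   11
print(find_optional_word("a good day", "very", "good"))       --> 3   6
print(find_optional_word("a bad day", "very", "good"))        --> nil
```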
If a pattern begins with '^', it matches only at the beginning of the target string. Similarly, if it ends with '$', it matches only at the end of the target string. These anchors can be used both to restrict the matches that you find and to tie a pattern to a fixed position. For example, the test

if string.find(s, "^%d") then ...

checks whether the string s starts with a digit, and the test

if string.find(s, "^[+-]?%d+$") then ...

checks whether the string s represents an integer, with no other leading or trailing characters.
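
The integer test can be packaged as a small predicate. A sketch, where is_integer is a hypothetical helper name:

```lua
-- True when s consists of an optional sign followed only by digits.
local function is_integer(s)
  return string.find(s, "^[+-]?%d+$") ~= nil
end

print(is_integer("-12"))    --> true
print(is_integer("+1009"))  --> true
print(is_integer("12a"))    --> false
print(is_integer(" 12"))    --> false

-- Without the anchors, the pattern matches anywhere inside the string.
print(string.find("abc 12a", "[+-]?%d+"))  --> 5   6
```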
Another pattern item is '%b', which matches balanced strings. It is written as '%bxy', where x and y are any two distinct characters; x acts as the opening character and y as the closing one. For example, the pattern '%b()' matches the part of the string that starts with a '(' and ends at the matching ')':

print(string.gsub("a (enclosed (in) parentheses) line", "%b()", ""))
--> a  line

Typically, this pattern is used as '%b()', '%b[]', '%b{}', or '%b<>', but you can use any two characters as delimiters.
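
Combined with string.gmatch, '%b' can pull out each top-level balanced group in turn. A minimal sketch with made-up input strings:

```lua
local s = "f(a, g(b, c)) + h(d)"

-- Each iteration yields one top-level parenthesized group.
local groups = {}
for group in string.gmatch(s, "%b()") do
  table.insert(groups, group)
  print(group)
end
--> (a, g(b, c))
--> (d)

-- The two delimiter characters are written directly after %b.
local braces = string.match("x = { a = { 1, 2 } }", "%b{}")
print(braces)  --> { a = { 1, 2 } }
```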
