Lua String Pattern-Matching Functions: A Summary

Pattern-matching functions

The most powerful functions in the string library are:

string.find (searches for a pattern in a string)
string.gsub (global substitution)
string.gfind (global search; this is the Lua 5.0 name, renamed string.gmatch in Lua 5.1)
string.gmatch (returns an iterator over all matches of a pattern)

These functions are all based on pattern matching. Unlike several other scripting languages, Lua does not use POSIX regular expressions [4] (also written regexp) for pattern matching. The main reason is size: a typical implementation of POSIX regexp takes roughly 4,000 lines of code, which is more than the entire Lua standard library. By comparison, the implementation of pattern matching in Lua takes only about 500 lines. Naturally, it cannot do everything that a full POSIX implementation does. Even so, pattern matching in Lua is powerful, and it includes some features that are hard to reproduce with standard POSIX regular expressions.

string.gmatch(str, pattern)

This function returns an iterator that, each time it is called, yields the next match of pattern in str. A typical use is as follows:

s = "hello world from Lua"
for w in string.gmatch(s, "%a+") do
  print(w)
end

Here is an example that captures key=value pairs and stores them in a table:

t = {}
s = "from=world, to=Lua"
for k, v in string.gmatch(s, "(%w+)=(%w+)") do
  t[k] = v
end
for k, v in pairs(t) do
  print(k, v)
end
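The same iterator can also collect matches into an array. The following is a minimal sketch, assuming comma-separated input; `split_csv` is a hypothetical helper name, not part of the string library.

```lua
-- Collect every run of non-comma characters into an array.
-- `split_csv` is a hypothetical helper, not a standard function.
local function split_csv(line)
  local fields = {}
  for field in string.gmatch(line, "[^,]+") do
    fields[#fields + 1] = field
  end
  return fields
end

local fields = split_csv("red,green,blue")
print(#fields, fields[1], fields[3])  --> 3 red blue
```

Because "+" requires at least one character, empty fields (as in "a,,b") are skipped by this sketch rather than returned as empty strings.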

string.gsub(str, pattern, repl, n)

string.gsub() matches the source string str against the given pattern and returns a copy of str in which every matched substring has been replaced. It also returns, as a second value, the number of matches that occurred. How each match is replaced depends on the type of the repl parameter:

When repl is a string, every matched substring is replaced with it. Inside repl, %1 to %9 stand for the corresponding captures, %0 stands for the whole match, and %% stands for a literal percent sign.

When repl is a table, the table is queried for every match, using the first capture as the key; if the pattern specifies no captures, the whole match is used as the key.

When repl is a function, it is called for every match, with the captured substrings passed as arguments in order; if the pattern specifies no captures, the whole match is passed as the only argument.

When repl is a table or a function and the value obtained from the table lookup or the function call is a string or a number, that value replaces the matched substring. If the value is nil or false, no substitution takes place and the original match is kept.

The n parameter is optional. When it is given, string.gsub() replaces only the first n matches in the source string.

Here are a few examples:

> print(string.gsub("hello world", "(%w+)", "%1 %1"))
hello hello world world 2

> print(string.gsub("hello Lua", "(%w+)%s*(%w+)", "%2 %1"))
Lua hello 1

> -- print returns nothing, so nothing is replaced; the call itself returns "hello world" and 2
> string.gsub("hello world", "%w+", print)
hello
world

> lookupTable = {["hello"] = "hola", ["world"] = "mundo"}
> print(string.gsub("hello world", "(%w+)", lookupTable))
hola mundo 2
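The rule that a nil result leaves the match untouched is easy to see with a function replacement. The sketch below is illustrative only: the `vars` table, the `expand` helper, and the "$name" syntax are assumptions made for this example.

```lua
-- Hypothetical lookup data for the sketch.
local vars = { name = "Lua", version = "5.1" }

-- Function replacement: the capture (the word after "$") is passed in.
-- Returning nil keeps the original text, so unknown names stay as-is.
local function expand(text)
  -- The outer parentheses discard gsub's second result (the count).
  return (string.gsub(text, "%$(%w+)", function(key)
    return vars[key]
  end))
end

print(expand("$name $version by $author"))  --> Lua 5.1 by $author
```

Passing the table itself as repl, as in string.gsub(text, "%$(%w+)", vars), should behave the same way here, since a table is looked up with the first capture as the key.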

string.match(str, pattern, init)

string.match() looks only for the first match of pattern in the source string str. The optional parameter init specifies where the search starts; it defaults to 1.

On a successful match, the function returns the captures from the pattern; if the pattern specifies no captures, the whole match is returned. It returns nil when there is no match.

string.match("abcdaef", "a")
--> a
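A short sketch of the cases described above: captures returned as separate values, the effect of init, and a failed match. The sample string is made up for illustration.

```lua
local s = "key = value; other = thing"

-- Two captures come back as two return values.
local k, v = string.match(s, "(%w+)%s*=%s*(%w+)")
print(k, v)    --> key value

-- init = 13 starts the search after the first pair.
local k2, v2 = string.match(s, "(%w+)%s*=%s*(%w+)", 13)
print(k2, v2)  --> other thing

-- No match yields nil.
print(string.match(s, "%d+"))  --> nil
```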

string.reverse(str)

Returns the string with its characters in reverse order.

string.reverse("abcde")
--> edcba

string.dump(function)

Returns a string containing the binary representation (the compiled bytecode) of the given function. In Lua 5.1, the function must be a Lua function without upvalues.
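A minimal round-trip sketch, assuming the dumped string is loaded back by the same Lua build that produced it. The `square` function and the `loader` alias are illustrative names; the alias is there because loadstring exists in Lua 5.1 while later versions use load for the same job.

```lua
-- A function with no upvalues.
local function square(x)
  return x * x
end

-- loadstring exists in Lua 5.1; later versions fold it into load.
local loader = loadstring or load

local bytecode = string.dump(square)
local copy = loader(bytecode)
print(type(bytecode), copy(7))  --> string 49
```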

string.find(str, pattern, init, plain)

The basic use of string.find is to search the target string (the subject string) for a match of the given pattern. The function returns the position of the match if it finds one, and nil otherwise. The simplest form of a pattern is a word, which matches only a copy of itself. For example, the pattern 'hello' matches only the substring "hello" in the target string. When a match is found, the function returns two values: the index where the match begins and the index where it ends.

s = "hello world"
string.find(s, "hello") --> 1 5
string.find(s, "world") --> 7 11
string.find(s, "l") --> 3 3
string.find(s, "lll") --> nil

The third parameter of string.find is optional and gives the position in the target string where the search starts. It is useful when we want to find every match in the target string: we search repeatedly, each time starting just after the position where the previous match ended. The fourth parameter, plain, is also optional; when it is true, pattern matching is turned off and the function does a plain substring search. As an example of the third parameter, the following code builds a table with the positions of all newlines in a string:

local t = {}   -- table to store the newline positions
local i = 0
while true do
  i = string.find(s, "\n", i + 1)   -- find the next newline
  if i == nil then break end
  table.insert(t, i)
end
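The plain parameter in the signature above changes how the second argument is read. A small sketch with a made-up string shows the difference for a dot, which as a pattern matches any character:

```lua
local s = "price: 3.50 (approx.)"

-- As a pattern, "." matches any character, so the first hit is position 1.
print(string.find(s, "."))           --> 1 1

-- With plain = true, "." is an ordinary character.
print(string.find(s, ".", 1, true))  --> 9 9
```

Note that init has to be supplied (here 1) whenever plain is given, because the parameters are positional.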

string.sub(str, spos, epos)

string.sub extracts a substring: it returns the part of str that runs from position spos to position epos, inclusive. It can take the values returned by string.find to extract the part of the string that matched.
For a simple pattern, that part is the pattern itself.

s = "hello world"
local i, j = string.find(s, "hello")
print(i, j) --> 1 5
print(string.sub(s, i, j)) --> hello

string.gsub(str, sourcestr, desstr)

The basic use of string.gsub is to find every occurrence of a pattern in a string and replace it with a replacement string.

In this basic form the function takes three parameters: the target string, the pattern, and the replacement string.

s = string.gsub("Lua is cute", "cute", "great")
print(s) --> Lua is great
s = string.gsub("all lii", "l", "x")
print(s) --> axx xii
s = string.gsub("Lua is great", "perl", "tcl")
print(s) --> Lua is great

The fourth parameter is optional and limits the number of substitutions:

s = string.gsub("all lii", "l", "x", 1)
print(s) --> axl lii
s = string.gsub("all lii", "l", "x", 2)
print(s) --> axx lii

The second value returned by string.gsub is the number of substitutions it made. For example, the following code is a simple way to count the number of spaces in a string:

_, count = string.gsub(str, " ", " ")

(Note that _ is just a dummy variable name.)
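The counting idiom generalizes to any pattern. The helper below is a sketch; `count_matches` is a hypothetical name, and it simply discards the first result of gsub and keeps the second.

```lua
-- `count_matches` is a hypothetical helper: it returns only the second
-- result of gsub, i.e. how many times the pattern matched.
local function count_matches(str, pattern)
  local _, count = string.gsub(str, pattern, "")
  return count
end

print(count_matches("a b  c", " "))           --> 3
print(count_matches("one two three", "%a+"))  --> 3
```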

Patterns

You can also use character classes in a pattern string. A character class is a pattern item that matches any character in a specific set. For example, the class %d matches any digit, so the pattern '%d%d/%d%d/%d%d%d%d' finds a date in dd/mm/yyyy format:

s = "Deadline is 30/05/1999, firm"
date = "%d%d/%d%d/%d%d%d%d"
print(string.sub(s, string.find(s, date))) --> 30/05/1999

The following table lists all the character classes that Lua supports:

a single character x (other than the magic characters ^$()%.[]*+-?): matches the character x itself

. (a dot): matches any character
%a: matches any letter
%c: matches any control character (for example \n)
%d: matches any digit
%l: matches any lowercase letter
%p: matches any punctuation character
%s: matches any whitespace character
%u: matches any uppercase letter
%w: matches any alphanumeric character (letter or digit)
%x: matches any hexadecimal digit
%z: matches the character with representation 0 (deprecated in Lua 5.2, where a pattern may contain "\0" directly)
%x (where x is any non-alphanumeric character): matches the character x. This is the standard way to escape the magic characters (^$()%.[]*+-?); for example, %% matches a literal percent sign
[set]: matches any character that belongs to one of the classes or characters listed between the brackets. For example, [%w_] matches any letter, digit, or underscore (_)
[^set]: matches any character that does not belong to the set. For example, [^%s] matches any non-whitespace character
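A few of the classes above, applied to one made-up string with string.match. This is only an illustration and assumes the default "C" locale:

```lua
local s = "Item_42: 0xFF, ok!"

-- A set combining a class and a single character.
print(string.match(s, "[%w_]+"))  --> Item_42

-- The first run of digits.
print(string.match(s, "%d+"))     --> 42

-- Two hexadecimal digits followed by a comma.
print(string.match(s, "%x%x,"))   --> FF,

-- A complemented set: everything up to the first whitespace.
print(string.match(s, "[^%s]+"))  --> Item_42:
```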

The uppercase version of any of these classes matches the complement of the class. For example, %S matches any non-whitespace character, and '%A' matches any non-letter character:

print(string.gsub("hello, up-down!", "%A", "."))
--> hello..up.down. 4

(The 4 is not part of the result string. It is the second result returned by gsub: the number of substitutions. The other examples below that print the result of gsub omit this count.)

Some characters, called magic characters, have special meanings when they appear in a pattern. The magic characters in Lua are:

( ) . % + - * ? [ ^ $

The character '%' works as an escape for these magic characters, so '%.' matches a dot and '%%' matches the character '%' itself. The escape '%' works not only for the magic characters but also for every other non-alphanumeric character. When in doubt about a character, play it safe and escape it.
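Because '%' may precede any non-alphanumeric character, a literal string can be made safe for use as a pattern by escaping all of its punctuation. The following is a sketch under that assumption; `escape_pattern` is a hypothetical helper name.

```lua
-- `escape_pattern` is a hypothetical helper name.
-- "%p" matches each punctuation character; "%%%0" replaces it with
-- a literal "%" followed by the matched character itself.
local function escape_pattern(text)
  return (string.gsub(text, "%p", "%%%0"))
end

local literal = "3.5%"
print(escape_pattern(literal))  --> 3%.5%%

-- The escaped form matches only the literal text.
print(string.find("rate 3.5% now", escape_pattern(literal)))  --> 6 9
print(string.find("rate 3x5% now", escape_pattern(literal)))  --> nil
```

When only a literal search is needed, passing true as the plain argument of string.find avoids the escaping step altogether.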

For Lua, a pattern is an ordinary string. It follows the same rules as any other string and gets no special treatment; only when a function uses it as a pattern does '%' act as an escape. So, to put a quote inside a pattern, use the same technique as for a quote inside any other string: escape it with '\', which is the escape character for Lua strings.

A char-set lets you create your own character class by combining classes and single characters between square brackets (char-set is the Lua name for what traditional regular expressions call a bracket expression). For example, '[%w_]' matches alphanumeric characters and underscores, '[01]' matches binary digits, and '[%[%]]' matches square brackets. The following example counts the number of vowels in a text:

_, nvow = string.gsub(text, "[AEIOUaeiou]", "")

A char-set can also contain character ranges, written as the first and last characters of the range separated by a hyphen. Most of the useful ranges are already predefined, so you will seldom need to write one yourself. For example, '%d' means '[0-9]' and '%x' means '[0-9a-fA-F]'. However, if you need to find an octal digit, you may prefer '[0-7]' to an explicit enumeration such as '[01234567]'. You can get the complement of a char-set by starting it with '^': '[^0-7]' matches any character that is not an octal digit, and '[^\n]' matches any character other than a newline. Remember that the uppercase classes already give you the complement of the simple classes: '%S' is shorter than '[^%s]'.
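The three forms discussed above (a range, a complemented set, and an uppercase class) can be compared on one made-up string:

```lua
local s = "mode 0755\nowner root"

-- A range inside a char-set: one or more octal digits.
print(string.match(s, "[0-7]+"))  --> 0755

-- A complemented set: everything up to the first newline.
print(string.match(s, "[^\n]+"))  --> mode 0755

-- An uppercase class is the complement of the lowercase one.
print(string.match(s, "%S+"))     --> mode
```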

Character classes in Lua follow the current locale, so '[a-z]' may denote a different set of characters than '%l'. In a suitable locale, the latter includes letters such as 'ç' and 'ã', while the former does not. Prefer the latter form for letters unless you have a specific reason not to: it is simpler, more convenient, and more efficient.

Modifiers make patterns more expressive by adding repetition and optional parts. Patterns in Lua offer four modifiers:

+   matches the previous character class 1 or more times
*   matches the previous character class 0 or more times (longest match)
-   matches the previous character class 0 or more times (shortest match)
?   matches the previous character class 0 or 1 times

'+' matches one or more characters of the class and always takes the longest sequence that matches. For example, the pattern '%a+' matches one or more letters, that is, a word:

print(string.gsub("one, and two; and three", "%a+", "word"))
--> word, word word; word word

'%d+' matches one or more digits (an integer):

i, j = string.find("the number 1298 is even", "%d+")
print(i, j) --> 12 15

'*' is similar to '+', but it also accepts zero occurrences of the class. A typical use is to match optional whitespace between parts of a pattern. For example, to match an empty pair of parentheses, such as () or ( ), use the pattern '%(%s*%)'. (The '%s*' matches zero or more whitespace characters. Because parentheses have a special meaning in a pattern, we must escape them with '%'.) As another example, '[_%a][_%w]*' matches identifiers in a Lua program: a letter or underscore followed by zero or more underscores or alphanumeric characters.

Like '*', the modifier '-' matches zero or more occurrences of the class, but it takes the shortest sequence that matches instead of the longest. Sometimes the two give the same result, but often they differ. For example, if you look for an identifier with the pattern '[_%a][_%w]-', you will find only its first letter, because '[_%w]-' always matches the empty sequence. On the other hand, suppose you want to find comments in a C program. Many people would first try '/%*.*%*/' (that is, "/*" followed by any sequence of characters, followed by "*/"). However, because '.*' takes the longest match, this pattern matches everything between the first "/*" and the last "*/" in the program:

test = "int x; /* x */ int y; /* y */"
print(string.gsub(test, "/%*.*%*/", "<COMMENT>"))
--> int x; <COMMENT>

The pattern '.-', in contrast, takes the shortest match, so it matches from each "/*" only up to the first "*/" that follows it:

test = "int x; /* x */ int y; /* y */"
print(string.gsub(test, "/%*.-%*/", "<COMMENT>"))
--> int x; <COMMENT> int y; <COMMENT>

'?' matches zero or one occurrence of the class. For example, suppose we want to find an integer in a text, where the number may carry an optional sign. The pattern '[+-]?%d+' does the job: it matches numerals such as "-12", "23" and "+1009". Here '[+-]' is a character class that matches either '+' or '-', and the '?' that follows makes that sign optional.
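A short sketch of the optional-sign pattern, using string.gmatch to collect every signed or unsigned integer from a made-up sentence:

```lua
local text = "readings: -12, 23 and +1009"

-- Each match is an optional sign followed by one or more digits.
local numbers = {}
for n in string.gmatch(text, "[+-]?%d+") do
  numbers[#numbers + 1] = n
end

print(table.concat(numbers, " "))  --> -12 23 +1009
```

The matches are still strings; tonumber would be needed to use them as numbers.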

Unlike some other systems, in Lua a modifier can be applied only to a character class; there is no way to group patterns and apply a modifier to the group. For example, no pattern matches an optional word (unless the word has only one letter). This limitation can usually be circumvented with more advanced techniques.

A pattern that begins with '^' matches only at the beginning of the target string; similarly, a pattern that ends with '$' matches only at the end. These marks can be used both to restrict the patterns you find and to anchor patterns. For example:

if string.find(s, "^%d") then ...

checks whether the string s starts with a digit, and

if string.find(s, "^[+-]?%d+$") then ...

checks whether the string s represents an integer, with no other leading or trailing characters.
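The anchored test can be wrapped in a small predicate. This is a sketch; `is_integer` is a hypothetical helper name.

```lua
-- `is_integer` is a hypothetical helper name.
-- Both anchors are needed: "^" rejects leading text, "$" rejects trailing text.
local function is_integer(s)
  return string.find(s, "^[+-]?%d+$") ~= nil
end

print(is_integer("-42"))    --> true
print(is_integer("42abc"))  --> false
print(is_integer(" 42"))    --> false
```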
'%b' matches balanced strings. It is written as '%bxy', where x and y are any two distinct characters: x acts as the opening character and y as the closing one. For example, '%b()' matches the part of the string that starts with '(' and ends at the matching ')':

print(string.gsub("a (enclosed (in) parentheses) line", "%b()", ""))
--> a  line

Typical uses of this pattern are '%b()', '%b[]', '%b{}' and '%b<>', but any two characters can serve as delimiters.
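To see why '%b' is useful, the sketch below compares it with the greedy and lazy modifiers on a made-up expression that contains nested parentheses:

```lua
local call = "f(a, g(b, c)) + h(d)"

-- %b() stops at the parenthesis that balances the first "(".
print(string.match(call, "%b()"))    --> (a, g(b, c))

-- A greedy ".*" runs to the last ")" instead.
print(string.match(call, "%(.*%)"))  --> (a, g(b, c)) + h(d)

-- A lazy ".-" stops at the first ")" and cuts the nested call short.
print(string.match(call, "%(.-%)"))  --> (a, g(b, c)
```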
