Use of Regular Expressions in Lua

Source: Internet
Author: User

Today I needed to write a small tool that converts CSV data into a Lua table. It relies on string.gsub() from the Lua string library, a function I had never been very familiar with. Today I finally tried it out properly, and it turned out to be powerful and convenient.

The source CSV file is:

Level, level 1 required experience, level 2 required experience, level 3 required experience, level 4 required experience, level 5 required experience, gold coins required per slot
1,75,90,100,110,120,63
2,98,117,130,143,156,66
3,130,156,173,190,208,69
4,173,207,230,253,276,72

The format required after conversion is:

local lvlexp = {
    {75,90,100,110,120, coin = 63}, --1
    {98,117,130,143,156, coin = 66}, --2
    {130,156,173,190,208, coin = 69}, --3
    {173,207,230,253,276, coin = 72}, --4
}

return lvlexp

The tool I wrote is:

--[[--
- switchcsvtolua: convert a CSV file into a Lua table file
-   (this converter is written for the upgrade table shown above)
- @param ... - path of the CSV file
]]
local string_gsub = string.gsub
local string_find = string.find
local string_sub = string.sub
local string_len = string.len

function switchcsvtolua(...)
    -- e.g. frompath = "/users/GUY/desktop/creepupgrade/lvlexp.csv"
    local arg = {...}
    local frompath = arg[1]
    local filef = assert(io.open(frompath, "r"))
    -- e.g. topath = "/users/GUY/desktop/creepupgrade/lvlexp.lua"
    local topath = string_gsub(frompath, "%.csv$", ".lua")  -- replace .csv with .lua
    local filet = assert(io.open(topath, "w"))
    filef:read()  -- discard the first row (the column titles)
    local filetitle = "local lvlexp = {\n"
    local filefinish = "}\n\nreturn lvlexp"
    filet:write(filetitle)
    local lineindex = 1
    local nextline = filef:read()
    while nextline do  -- keep writing while a next row exists
        -- trim whitespace (string.trim is not part of standard Lua)
        local aline = string_gsub(nextline, "^%s*(.-)%s*$", "%1")
        local beginindex = string_find(aline, ",") + 1  -- position after the first comma
        local len = string_len(aline)
        aline = string_sub(aline, beginindex, len)  -- delete the first (index) element
        aline = "    {" .. aline .. "}, --" .. lineindex .. "\n"  -- add the Lua table format
        aline = string_gsub(aline, ",%d+}", function(h)  -- match the last number of each row
            return (string_gsub(h, ",", ", coin = "))  -- add a key for this number
        end)
        filet:write(aline)
        nextline = filef:read()
        lineindex = lineindex + 1
    end
    filet:write(filefinish)
    filef:close()
    filet:close()
end

switchcsvtolua(...)

Note that string_gsub() in the code above is declared at the top of the file as local string_gsub = string.gsub. Storing a Lua library function in a local variable like this improves efficiency.
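The per-row transformation at the heart of the tool can be tried in isolation, without touching any files. The following is only a sketch under the assumption that every row looks like the sample CSV above; the helper name convert_row is made up for illustration and is not part of the original tool.

```lua
-- Cache the library function in a local, as the tool above does.
local string_gsub = string.gsub

-- Hypothetical helper: turn one CSV data row into one Lua table row.
local function convert_row(line, index)
    -- Trim surrounding whitespace (this also drops a trailing "\r").
    local row = string_gsub(line, "^%s*(.-)%s*$", "%1")
    -- Drop the first field (the level index) together with its comma.
    row = string_gsub(row, "^[^,]*,", "")
    -- Label the last field as coin.
    row = string_gsub(row, ",(%d+)$", ", coin = %1")
    return "    {" .. row .. "}, --" .. index
end

print(convert_row("1,75,90,100,110,120,63", 1))
--> "    {75,90,100,110,120, coin = 63}, --1" (printed without the quotes)
```

Each assignment keeps only the first result of string.gsub; the second result (the substitution count) is discarded.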

 

Below is the reference post on Lua patterns that I consulted while using string.gsub.

Reposted from http://blog.csdn.net/liuyukuan/article/details/5489623. Thanks to the original author.

Pattern-Matching Functions
The most powerful functions in the string library are string.find (string search), string.gsub (global substitution), and string.gfind (global search; named string.gmatch in Lua 5.1 and later). All of them are based on patterns.
Unlike several other scripting languages, Lua does not use POSIX regular expressions (regexp) for pattern matching. The main reason is program size: a typical POSIX-compliant regexp implementation takes about 4000 lines of code, which is more than all the Lua standard libraries together. By comparison, the implementation of pattern matching in Lua takes only about 500 lines. Of course, this means it cannot do everything a full POSIX implementation does. Nevertheless, pattern matching in Lua is powerful and includes some features that are hard to reproduce with standard POSIX regular expressions.
The basic use of string.find is to search for a pattern inside a given string, called the subject string. The function returns the position where it found the pattern, or nil if it could not find it. The simplest pattern is a word, which matches only a copy of itself. For example, the pattern 'hello' searches for the substring "hello" inside the subject string. When the pattern is found, the function returns two values: the index where the match begins and the index where it ends.

    s = "hello world"
    i, j = string.find(s, "hello")
    print(i, j)                      --> 1 5
    print(string.sub(s, i, j))       --> hello
    print(string.find(s, "world"))   --> 7 11
    i, j = string.find(s, "l")
    print(i, j)                      --> 3 3
    print(string.find(s, "lll"))     --> nil
In this example, when the match succeeds, string.sub uses the values returned by string.find to extract the matched substring. (For a simple pattern, that substring is the pattern itself.)
The third parameter of string.find is optional: it is the index in the subject string where the search starts. This parameter is useful when we want to find every occurrence of a pattern: we search repeatedly, each time starting just after the position of the previous match. The following code builds a table with the positions of all newlines in a string:
    local t = {}      -- table to store the indices
    local i = 0
    while true do
      i = string.find(s, "\n", i + 1)    -- find the next newline
      if i == nil then break end
      table.insert(t, i)
    end
Later we will see that the string.gfind iterator offers a simpler way to write such loops.
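As a rough sketch of that simplification, assuming Lua 5.1 or later (where string.gfind is named string.gmatch), the same table can be built with an iterator. The empty capture "()" is a position capture, a feature not covered in this section; it yields the index where it occurs in the match. The function name newline_positions is made up for illustration.

```lua
-- Hypothetical helper: return the indices of all newlines in s.
local function newline_positions(s)
    local t = {}
    -- "()" captures the current position, here the index of each "\n".
    for pos in string.gmatch(s, "()\n") do
        t[#t + 1] = pos
    end
    return t
end

local t = newline_positions("one\ntwo\nthree")
print(t[1], t[2])   --> 4 8
```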
The string.gsub function has three parameters: a subject string, a pattern, and a replacement string. Its basic use is to replace every occurrence of the pattern in the subject string with the replacement string:
    s = string.gsub("Lua is cute", "cute", "great")
    print(s)         --> Lua is great
    s = string.gsub("all lii", "l", "x")
    print(s)         --> axx xii
    s = string.gsub("Lua is great", "perl", "tcl")
    print(s)         --> Lua is great
The fourth parameter is optional and limits the number of substitutions to be made:
    s = string.gsub("all lii", "l", "x", 1)
    print(s)          --> axl lii
    s = string.gsub("all lii", "l", "x", 2)
    print(s)          --> axx lii
The string.gsub function also returns, as a second result, the number of substitutions it made. For example, the following code counts the number of spaces in a string:
    _, count = string.gsub(str, " ", " ")

(Note: _ is just a dummy variable name.)
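The same counting trick can be wrapped in a small function. This is a minimal sketch; the name count_matches is an assumption made for illustration. Replacing each match with "%0" (the matched text itself) leaves the string unchanged, so only the count matters.

```lua
-- Hypothetical helper: count non-overlapping matches of a pattern in s.
local function count_matches(s, pattern)
    -- The first result (the rewritten string) is ignored;
    -- the second result is the number of substitutions made.
    local _, n = string.gsub(s, pattern, "%0")
    return n
end

print(count_matches("one two  three", " "))   --> 3
print(count_matches("hello world", "l"))      --> 3
```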
20.2 Patterns
You can make patterns more useful with character classes. A character class is a pattern item that matches any character in a specific set. For example, the class %d matches any digit. Therefore, you can search for a date in the format dd/mm/yyyy with the pattern '%d%d/%d%d/%d%d%d%d':
    s = "Deadline is 30/05/1999, firm"
    date = "%d%d/%d%d/%d%d%d%d"
    print(string.sub(s, string.find(s, date)))   --> 30/05/1999
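Combining this date pattern with the optional start index of string.find gives a way to collect every date-shaped substring. The sketch below is illustrative only: find_dates is a made-up name, and the pattern checks the shape dd/mm/yyyy, not whether the date is valid.

```lua
-- Hypothetical helper: collect every dd/mm/yyyy-shaped substring of s.
local function find_dates(s)
    local date = "%d%d/%d%d/%d%d%d%d"
    local found = {}
    local init = 1
    while true do
        local i, j = string.find(s, date, init)
        if i == nil then break end
        found[#found + 1] = string.sub(s, i, j)
        init = j + 1   -- continue after the previous match
    end
    return found
end

local dates = find_dates("from 01/02/2003 to 30/05/1999, firm")
print(dates[1], dates[2])   --> 01/02/2003 30/05/1999
```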
The following table lists all the character classes supported by Lua:
    .     any character
    %a    letters
    %c    control characters
    %d    digits
    %l    lowercase letters
    %p    punctuation characters
    %s    space characters
    %u    uppercase letters
    %w    alphanumeric characters (letters and digits)
    %x    hexadecimal digits
    %z    the character with representation 0
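To get a feel for these classes, one can count how many characters of a sample string fall into each of them. This is a sketch under the assumption that the default "C" locale is in effect; count_class is a hypothetical helper name.

```lua
-- Hypothetical helper: count the characters of s that belong to a class.
local function count_class(s, class)
    local _, n = string.gsub(s, class, "")
    return n
end

local sample = "Lua 5.1, Ok!"
print(count_class(sample, "%a"))   --> 5   (L u a O k)
print(count_class(sample, "%d"))   --> 2   (5 1)
print(count_class(sample, "%l"))   --> 3   (u a k)
print(count_class(sample, "%u"))   --> 2   (L O)
print(count_class(sample, "%p"))   --> 3   (. , !)
print(count_class(sample, "%s"))   --> 2
print(count_class(sample, "%w"))   --> 7
print(count_class(sample, "."))    --> 12
```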
An uppercase version of any of those classes represents the complement of the class. For example, '%A' matches any non-letter character:
    print(string.gsub("hello, up-down!", "%A", "."))
      --> hello..up.down. 4
(The 4 is not part of the result string. It is the second result of gsub, the number of substitutions made. Other examples that print the result of gsub will omit this count.) Some characters, called magic characters, have special meanings when used in a pattern. The magic characters in Lua are:
    ( ) . % + - * ? [ ^ $
The character '%' works as the escape for these magic characters. So '%.' matches a dot and '%%' matches the character '%' itself. The escape '%' can be used not only for the magic characters but also for any other non-alphanumeric character. When in doubt about a character, play safe and escape it.
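Because '%' may escape any non-alphanumeric character, a string can be turned into a literal pattern by escaping all of its punctuation. The following is a minimal sketch of that idea; escape_pattern is a made-up name, not a library function.

```lua
-- Hypothetical helper: escape every punctuation character in s so that
-- the result matches s literally when used as a pattern.
local function escape_pattern(s)
    -- In the replacement, "%%" is a literal "%" and "%0" is the matched text.
    -- The extra parentheses keep only the first result of gsub.
    return (string.gsub(s, "%p", "%%%0"))
end

print(escape_pattern("1+1=2 (100%)"))                      --> 1%+1%=2 %(100%%%)
print(string.find("cost: 3.50", escape_pattern("3.50")))   --> 7 10
print(string.find("cost: 3x50", escape_pattern("3.50")))   --> nil
```

Without the escaping, the pattern "3.50" would also match "3x50", because an unescaped dot matches any character.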
For Lua, a pattern is a regular string. Patterns get no special treatment and follow the same rules as other strings. Only when a string is passed as a pattern to these functions is '%' interpreted as an escape. Therefore, if you need to put a quote inside a pattern, you must use the same technique you use to put a quote inside any other string; for instance, you can escape the quote with a '\', which is the escape character of Lua strings.
A char-set lets you create your own character classes by combining classes and single characters between square brackets. For example, '[%w_]' matches letters, digits, and underscores, '[01]' matches binary digits, and '[%[%]]' matches square brackets. The following example counts the number of vowels in a text:
    _, nvow = string.gsub(text, "[AEIOUaeiou]", "")
A char-set can also contain ranges, written as the first and the last character of the range separated by a hyphen. You will seldom need this facility, because most useful ranges are already predefined: for example, '%d' is the same as '[0-9]' and '%x' is the same as '[0-9a-fA-F]'. However, if you need to find an octal digit, you may prefer '[0-7]' to an explicit enumeration ('[01234567]'). You can get the complement of a char-set by starting it with '^': '[^0-7]' matches any character that is not an octal digit, and '[^\n]' matches any character other than a newline. Remember that simple classes can be negated with their uppercase version: '%S' is shorter than '[^%s]'.
Character classes follow the current locale set for Lua, so '[a-z]' may differ from '%l'. In a proper locale, the latter includes letters such as 'ç' and 'ã', while the former does not. Prefer the latter form unless you have a strong reason to do otherwise: it is simpler, more portable, and slightly more efficient.
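The char-set forms described above (enumeration, range, and complement) can be exercised with a few small helpers. The names count_vowels and is_octal are assumptions made for this sketch.

```lua
-- Count vowels by deleting them and keeping only the substitution count.
local function count_vowels(text)
    local _, n = string.gsub(text, "[AEIOUaeiou]", "")
    return n
end

-- True when s is non-empty and contains no character outside 0-7.
local function is_octal(s)
    return s ~= "" and string.find(s, "[^0-7]") == nil
end

print(count_vowels("Lua is cute"))              --> 5
print(is_octal("0755"), is_octal("0958"))       --> true false
-- A complemented set: delete everything that is not a hexadecimal digit.
print((string.gsub("ff-00-A1", "[^%x]", "")))   --> ff00A1
```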
Modifiers for repetitions and optional parts make patterns more expressive. Lua patterns offer four modifiers:
    +    match the previous character class 1 or more times
    *    match the previous character class 0 or more times
    -    match the previous character class 0 or more times (shortest match)
    ?    match the previous character class 0 times or 1 time
'+' matches one or more characters of the class, and it always takes the longest sequence that matches. For example, the pattern '%a+' matches one or more letters, that is, a word:
    print(string.gsub("one, and two; and three", "%a+", "word"))
      --> word, word word; word word
The pattern '%d+' matches one or more digits (an integer):
    i, j = string.find("the number 1298 is even", "%d+")
    print(i, j)   --> 12 15
'*' is similar to '+', but it also accepts zero occurrences of characters of the class. A typical use is to match optional spaces between parts of a pattern. For example, the pattern '%(%s*%)' matches an empty pair of parentheses such as "()" or "( )". (The pattern '%s*' matches zero or more spaces. Because parentheses have a special meaning in a pattern, we must escape them with '%'.) As another example, the pattern '[_%a][_%w]*' matches identifiers in a Lua program: a letter or underscore followed by zero or more letters, digits, or underscores.
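Both patterns from this paragraph can be tried directly. The sketch below assumes Lua 5.1 or later, where the iterator is called string.gmatch; the helper name identifiers is made up, and the pattern does not distinguish keywords such as local from ordinary names.

```lua
-- Hypothetical helper: collect identifier-shaped words from s.
local function identifiers(s)
    local found = {}
    for id in string.gmatch(s, "[_%a][_%w]*") do
        found[#found + 1] = id
    end
    return found
end

local ids = identifiers("local _x1 = foo + bar2")
print(table.concat(ids, " "))           --> local _x1 foo bar2

-- An empty pair of parentheses with optional spaces inside:
print(string.find("f( )", "%(%s*%)"))   --> 2 4
print(string.find("f()", "%(%s*%)"))    --> 2 3
```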
Like '*', the modifier '-' matches zero or more occurrences of characters of the class, but it matches the shortest sequence instead of the longest. Sometimes there is no difference between the two, but often the results are completely different. For example, if you use the pattern '[_%a][_%w]-' to find an identifier, you will find only its first letter, because '[_%w]-' always matches the empty sequence. On the other hand, suppose you want to find comments in a C program. Many people would first try '/%*.*%*/' (that is, "/*" followed by any sequence of characters followed by "*/", with the asterisks escaped). However, because '.*' expands as far as it can, this pattern matches everything from the first "/*" to the last "*/" in the program:
    test = "int x; /* x */ int y; /* y */"
    print(string.gsub(test, "/%*.*%*/", "<COMMENT>"))
      --> int x; <COMMENT>
The pattern '.-', in contrast, expands only as far as needed to reach the first "*/", which gives the desired result:
    test = "int x; /* x */ int y; /* y */"
    print(string.gsub(test, "/%*.-%*/", "<COMMENT>"))
      --> int x; <COMMENT> int y; <COMMENT>
'?' matches an optional character (zero or one occurrence). For example, suppose we want to find an integer in a text, where the number may carry an optional sign. The pattern '[+-]?%d+' does the job: it matches numerals such as "-12", "23", and "+1009". '[+-]' is a character class that matches either '+' or '-'; the '?' that follows makes the sign optional.
Unlike some other systems, in Lua a modifier can be applied only to a character class; there is no way to group patterns under a modifier. For example, no pattern matches an optional word (unless the word has only one letter). As we will see later, this limitation can usually be worked around with some more advanced techniques.
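As a small illustration of the '[+-]?%d+' pattern, the following sketch extracts optionally signed integers from a text and converts them to numbers. It assumes Lua 5.1 or later for string.gmatch, and signed_integers is a hypothetical name.

```lua
-- Hypothetical helper: extract optionally signed integers as numbers.
local function signed_integers(s)
    local found = {}
    for num in string.gmatch(s, "[+-]?%d+") do
        found[#found + 1] = tonumber(num)
    end
    return found
end

local nums = signed_integers("from -12 to +1009, step 23")
print(nums[1], nums[2], nums[3])   --> -12 1009 23
```

Note that the pattern treats any '+' or '-' directly before a digit as a sign, so a text like "3-4" would yield 3 and -4.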
A pattern that begins with '^' matches only at the beginning of the subject string. Similarly, a pattern that ends with '$' matches only at the end of the subject string. These marks can be used both to restrict the matches you find and to anchor patterns. For example, the test
    if string.find(s, "^%d") then ...
checks whether the string s starts with a digit, and the test
    if string.find(s, "^[+-]?%d+$") then ...
checks whether the string s represents an integer, with no other leading or trailing characters.
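These two anchored tests are easy to wrap as predicates. The function names below are assumptions made for this sketch.

```lua
-- True when s begins with a digit.
local function starts_with_digit(s)
    return string.find(s, "^%d") ~= nil
end

-- True when the whole of s is an optionally signed integer.
local function is_integer(s)
    return string.find(s, "^[+-]?%d+$") ~= nil
end

print(starts_with_digit("42nd street"))   --> true
print(is_integer("-12"), is_integer("+1009"), is_integer("12a"), is_integer(" 7"))
--> true true false false
```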
'%b' matches balanced strings. It is written as '%bxy', where x and y are two distinct characters: x acts as the opening character and y as the closing one. For example, the pattern '%b()' matches the parts of the string that start with '(' and end at the matching ')':
    print(string.gsub("a (enclosed (in) parentheses) line", "%b()", ""))
      --> a  line
Typical uses are '%b()', '%b[]', '%b{}', and '%b<>', but any two characters can be used as delimiters.
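A short sketch of '%b' in use: the hypothetical helper first_group returns the first balanced parenthesized group in a string, including nested groups.

```lua
-- Hypothetical helper: return the first balanced (...) group in s, or nil.
local function first_group(s)
    local i, j = string.find(s, "%b()")
    if i then
        return string.sub(s, i, j)
    end
    return nil
end

print(first_group("f(a, g(b)) + h(c)"))   --> (a, g(b))
print(first_group("no parens"))           --> nil
-- Other delimiters work the same way:
print((string.gsub("x[1][y[2]] z", "%b[]", "")))   --> x z
```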

 
