Q: Under what circumstances does a pattern match the empty string?
A: Be careful with * and -, because both can match zero occurrences.
-- If you try to match words with "%a*", you will find "words" everywhere.
print(string.find(";$%  **#$hello13", "%a*"))    --> 1   0
print(string.find(";$%  **#$hello13", "%a*", 6)) --> 6   5
-- "%a+" is what gets the job done properly.
print(string.find(";$%  **#$hello13", "%a+"))    --> 10  14
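The same pitfall applies to "-" and shows up in string.gsub as well. The following is a minimal sketch; the sample strings and variable names are chosen only for illustration.

```lua
-- "-" is lazy: with nothing after it in the pattern, it settles for zero characters.
local s1, e1 = string.find("hello", "%a-")
print(s1, e1) --> 1   0

-- Anchoring the pattern on both sides forces "-" to expand over the whole word.
local s2, e2 = string.find("hello", "^%a-$")
print(s2, e2) --> 1   5

-- A pattern that can match empty succeeds at every position,
-- so gsub inserts the replacement everywhere.
local replaced, count = string.gsub("hello", "x*", "-")
print(replaced, count) --> -h-e-l-l-o-   6
```

When a pattern has to consume at least one character, "+" is usually the safer choice.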
Q: How do I use Lua to generate a pattern?
A: Lua code can build patterns that would be tedious to write by hand.
--[[ Find lines of 70 or more characters in a text, that is, lines with
     70 non-newline characters followed by zero or more non-newline characters.
     string.rep repeats the single-character class 70 times, and the same
     class followed by "*" matches the rest of the line. ]]
pattern = string.rep("[^\n]", 70) .. "[^\n]*"

-- Case-insensitive search for a word.
function nocase(s)
  -- Convert every letter found into the form "[xX]".
  s = string.gsub(s, "%a", function(c)
    return string.format("[%s%s]", string.lower(c), string.upper(c))
  end)
  return s
end
pattern = nocase("Hi there!")
print(pattern) --> [hH][iI] [tT][hH][eE][rR][eE]!
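As a rough usage sketch, generated patterns are passed to string.find or string.match like any hand-written pattern. The helper at_least and the sample strings are hypothetical; nocase is redefined locally only so the snippet runs on its own.

```lua
-- Local copy of the case-insensitive pattern builder.
local function nocase(s)
  return (string.gsub(s, "%a", function(c)
    return "[" .. string.lower(c) .. string.upper(c) .. "]"
  end))
end

local text = "She said HI THERE! and left."
local first, last = string.find(text, nocase("hi there"))
print(first, last) --> 10  17

-- Hypothetical helper: the long-line pattern with a configurable width.
local function at_least(n)
  return string.rep("[^\n]", n) .. "[^\n]*"
end

local sample = "short\nthis line is long\nok"
local long_line = string.match(sample, at_least(10))
print(long_line) --> this line is long
```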
Q: How do I preprocess the target string?
A: The point of preprocessing is to keep special characters from interfering with the match.
In the first example, text enclosed in double quotes inside a string is converted to uppercase, and the quoted text may itself contain escaped double quotes (\").
-- Convert each escaped character (here, the escaped double quote) to the form "\ddd",
-- where "ddd" is the decimal ASCII code of that character.
function code(s)
  return (string.gsub(s, "\\(.)", function(x)
    return string.format("\\%03d", string.byte(x))
  end))
end
-- Restore "\ddd" to the escaped character.
function decode(s)
  return (string.gsub(s, "\\(%d%d%d)", function(d)
    return "\\" .. string.char(d)
  end))
end
s = [[follows a typical string: "This is \"great\"!".]]
--[[ Without this preprocessing, "great" would not be converted to uppercase,
     because "This is \" would be the first match and "!" the second. ]]
s = code(s)
-- Use string.upper to convert the quoted text to uppercase.
s = string.gsub(s, '(".-")', string.upper)
s = decode(s) -- Restore the preprocessed parts.
print(s) --> follows a typical string: "THIS IS \"GREAT\"!".
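To see why the encoding step matters, this small sketch runs the same substitution directly on the raw string. The variable names raw and result are only illustrative.

```lua
local raw = [[follows a typical string: "This is \"great\"!".]]
-- Without encoding, ".-" stops at the first double quote it meets,
-- even when that quote is escaped with a backslash.
local result = string.gsub(raw, '(".-")', string.upper)
print(result) --> follows a typical string: "THIS IS \"great\"!".
```

The first match ends at the escaped quote, so only "This is \" is uppercased and "great" falls between two matches.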
For the second example, we extend the LaTeX-to-XML conversion from "Quick mastery of Lua 5.3--string Library (2)". This time the LaTeX input may contain the escape character \, which means that \\, \{ and \} stand for a literal \, { and } respectively.
--[[ To avoid confusing "\emph" and "\command" with "\{a\\b\}", we first convert
     "\{", "\\" and "\}" to a special encoding, while leaving "\emph" and
     "\command" untouched. The conversion therefore happens only when "\" is
     followed by a non-letter. As in the previous example, the characters are
     converted to the decimal form "\ddd". ]]
function code(s)
  return (string.gsub(s, '\\(%A)', function(x)
    return string.format("\\%03d", string.byte(x))
  end))
end
-- Unlike the previous example, decoding does not need to put the "\" back,
-- so string.char can be used directly as the replacement.
function decode(s)
  return (string.gsub(s, '\\(%d%d%d)', string.char))
end
s = [[A \emph{command} is written as \command{text\{a\\b\}}.]]
s = code(s)
s = string.gsub(s, "\\(%a+){(.-)}", "<%1>%2</%1>")
print(decode(s)) --> A <emph>command</emph> is written as <command>text{a\b}</command>.
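It can help to look at the intermediate, encoded form before the substitution runs. The sketch below repeats the encoder locally so it is self-contained; the name encoded is only illustrative.

```lua
-- Local copy of the encoder: a backslash followed by a non-letter becomes "\ddd".
local function code(s)
  return (string.gsub(s, "\\(%A)", function(x)
    return string.format("\\%03d", string.byte(x))
  end))
end

local encoded = code([[\command{text\{a\\b\}}]])
-- "\command" is left alone because "c" is a letter;
-- "{" is byte 123, "\" is byte 92 and "}" is byte 125.
print(encoded) --> \command{text\123a\092b\125}
```

After encoding, the only braces left in the string are the ones that delimit command arguments, which is what lets the "{(.-)}" part of the pattern work.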
The third example is slightly more complex: encoding and decoding a record of a CSV file.
1. Each line of a CSV file represents one record; each record consists of multiple fields, and the fields are separated by ,.
2. Any whitespace inside a field is significant and must not be ignored.
3. If a field contains ,, the whole field must be enclosed in ""; if the field also contains ", each " is written as "".
4. A field that does not contain , may optionally be enclosed in "" as well. But whenever a field is enclosed in "" and contains ", each " must be written as "".
According to the rules above, given the table
t = {'a b', 'a,b', ' a,"b"c', '', 'hello "world"!'}
its elements converted to CSV format should be
a b,"a,b"," a,""b""c",,hello "world"!
print("Encode:")
function escapecsv(s)
  if string.find(s, ',') then
    -- The field contains a comma, so the whole field must be enclosed in double quotes.
    -- Any double quote inside the field is replaced with two consecutive double quotes.
    s = '"' .. string.gsub(s, '"', '""') .. '"'
  end
  return s
end
function tocsv(t)
  local s = ""
  -- ipairs keeps the fields in array order.
  for _, p in ipairs(t) do
    s = s .. "," .. escapecsv(p) -- Fields are separated by ",".
  end
  return string.sub(s, 2) -- Remove the leading comma.
end
t = {'a b', 'a,b', ' a,"b"c', '', 'hello "world"!'}
s = tocsv(t)
print(s) --> a b,"a,b"," a,""b""c",,hello "world"!
print()

print("Decode:")
function fromcsv(s)
  s = s .. ',' -- Append "," so the last field is found the same way as the others.
  local t = {} -- Table that collects the fields.
  local fieldstart = 1 -- Start index of the current field in the string.
  -- Loop over every field of the record.
  repeat
    -- If the field starts with a double quote, the whole field is quoted.
    if string.find(s, '^"', fieldstart) then
      local a, c
      local i = fieldstart
      repeat
        --[[ Find the double quote that closes the field. Inside a quoted field,
             each double quote is written as two consecutive double quotes. When
             two consecutive quotes are found, "c" captures the second quote and
             "i" is its index, so the search continues from "i + 1", skipping the
             pair. When a quote followed by a non-quote character is found, it is
             the closing quote of the field: "c" is the empty string and "i" is
             the index of that quote, and the search is complete. ]]
        a, i, c = string.find(s, '"("?)', i + 1)
      until c ~= '"' -- Quote not followed by quote?
      if not i then error('unmatched "') end
      -- Extract the field, dropping its enclosing double quotes.
      local f = string.sub(s, fieldstart + 1, i - 1)
      --[[ The enclosing quotes are gone, so every pair of consecutive double
           quotes in the field is turned back into a single double quote.
           The field content is then stored in the table. ]]
      table.insert(t, (string.gsub(f, '""', '"')))
      --[[ "i" is the index of the closing quote. From there, find the comma
           before the next field and update the start index of the next field.
           Because a comma was appended after the last field, the character
           after the closing quote should always be a comma, so this could be
           written as "fieldstart = i + 2". The form below also tolerates
           sloppy input with whitespace between the closing quote and the comma. ]]
      fieldstart = string.find(s, ',', i) + 1
    else
      -- The field does not start with a double quote, so it is not quoted.
      -- Search directly for the comma before the next field.
      local nexti = string.find(s, ',', fieldstart)
      -- Store the field content in the table.
      table.insert(t, string.sub(s, fieldstart, nexti - 1))
      fieldstart = nexti + 1 -- Update the start index of the next field.
    end
  until fieldstart > string.len(s)
  return t
end
t = fromcsv(s)
for i, s in ipairs(t) do print(i, s) end
--[[ Result:
1   a b
2   a,b
3    a,"b"c
4
5   hello "world"!
]]
print()

--[[ Test the different effects of two consecutive double quotes.
     The fields are:
     1. hello, space, double quote, space, hello.
     2. space, double quote, double quote.
     3. empty. ]]
t = fromcsv('"hello "" hello", "",""')
for i, s in ipairs(t) do print(i, s) end
--[[ Result:
1   hello " hello
2    ""
3
]]
Quick mastery of Lua 5.3--string Library (3)