Lua: Chapter 19, The String Library (Part 1)

Source: Internet
Author: User


The Lua interpreter itself has limited support for strings. A program can create strings and concatenate them, but it cannot extract substrings, check the size of a string, or examine a string's contents. Most of the power to manipulate strings in Lua comes from the string library.

Some functions in the string library are quite simple: string.len(s) returns the length of the string s, and string.rep(s, n) returns the string s repeated n times. For example, string.rep("a", 2^20) creates a 1 MB string (useful for testing). string.lower(s) returns a copy of s with uppercase letters converted to lowercase, and string.upper converts lowercase to uppercase. If you want to sort an array of strings without regard to case, you can write:

    table.sort(a, function (a, b) return string.lower(a) < string.lower(b) end)
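As a small illustration of the functions above, the following sketch sorts a list of words without regard to case. The word list and the helper name sorted_ignoring_case are made up for this example; the helper copies its input so that the original table is left unchanged.

```lua
-- Hypothetical helper: return a sorted copy of `words`, ignoring case.
local function sorted_ignoring_case(words)
  local copy = {}
  for i, w in ipairs(words) do copy[i] = w end
  table.sort(copy, function (a, b)
    return string.lower(a) < string.lower(b)
  end)
  return copy
end

local result = sorted_ignoring_case({"pear", "Apple", "banana"})
print(table.concat(result, " "))   --> Apple banana pear

print(string.len("hello"))         --> 5
print(string.rep("ab", 3))         --> ababab
print(string.upper("hello"))       --> HELLO
```

A plain table.sort on the same list would put "Apple" first only because uppercase letters happen to sort before lowercase ones in ASCII; the comparison function makes the ordering independent of case.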

Both string.upper and string.lower follow the current locale. So, if you are working in the European Latin-1 locale, the expression

    string.upper("ação")

results in "AÇÃO".

The call string.sub(s, i, j) extracts the piece of the string s from the i-th to the j-th character, inclusive. In Lua, the first character of a string has index 1. You can also use negative indexes, which count backward from the end of the string: -1 refers to the last character, -2 to the one before it, and so on. Therefore, string.sub(s, 1, j) returns the prefix of s with length j, and string.sub(s, j, -1) returns the suffix of s that starts at the j-th character. If you do not provide a third argument, it defaults to -1, so the last call can be written as string.sub(s, j). The call string.sub(s, 2, -2) returns a copy of s with the first and last characters removed:

    s = "[in brackets]"
    print(string.sub(s, 2, -2))   --> in brackets

Remember that strings in Lua are immutable. string.sub, like every other string function in Lua, does not change the value of a string; it returns a new string. A common mistake is to write

    string.sub(s, 2, -2)

and to assume that the call modifies the value of s. If you want to modify the value of a variable, you must assign the new value to it:

    s = string.sub(s, 2, -2)
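The indexing rules can be checked with a short sketch. The sample string and the variable names are illustrative only.

```lua
local s = "hello world"

local prefix = string.sub(s, 1, 5)    -- first five characters
local suffix = string.sub(s, 7)       -- from the 7th character to the end
local last3  = string.sub(s, -3)      -- last three characters
local inner  = string.sub(s, 2, -2)   -- drop the first and last characters

print(prefix)   --> hello
print(suffix)   --> world
print(last3)    --> rld
print(inner)    --> ello worl

-- string.sub returned new strings; s itself is unchanged.
print(s)        --> hello world
```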

The string.char and string.byte functions convert between characters and their internal numeric representations. string.char takes zero or more integers, converts each one to a character, and returns a string concatenating all those characters. string.byte(s, i) returns the numeric code of the i-th character of the string s; the second argument is optional and defaults to 1. In the following examples, we assume that characters are represented in ASCII:

    print(string.char(97))                    --> a
    i = 99; print(string.char(i, i+1, i+2))   --> cde
    print(string.byte("abc"))                 --> 97
    print(string.byte("abc", 2))              --> 98
    print(string.byte("abc", -1))             --> 99

In the last line, we use a negative index to access the last character of the string.
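To see the two functions working together, here is a minimal sketch that turns a string into a list of character codes and back. The helper names to_codes and from_codes are hypothetical, and the printed codes assume ASCII.

```lua
-- Hypothetical helper: list the numeric code of every character in s.
local function to_codes(s)
  local codes = {}
  for i = 1, string.len(s) do
    codes[i] = string.byte(s, i)
  end
  return codes
end

-- Hypothetical helper: rebuild a string from a list of codes.
local function from_codes(codes)
  local chars = {}
  for i, c in ipairs(codes) do
    chars[i] = string.char(c)
  end
  return table.concat(chars)
end

local codes = to_codes("Lua")
print(table.concat(codes, " "))   --> 76 117 97
print(from_codes(codes))          --> Lua
```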

The function string.format is a powerful tool for formatting strings, typically for output. It returns a formatted version of its variable number of arguments, following the description given by its first argument, the format string. The format string follows rules similar to those of the printf function in C: it is made of regular text and directives, which control where and how each argument is placed in the result. A simple directive is the character '%' followed by a letter that tells how to format the argument: 'd' for a decimal number, 'x' for hexadecimal, 'o' for octal, 'f' for a floating-point number, and 's' for a string. Between the '%' and the letter, a directive can include other options that control the details of the format, such as the number of decimal digits of a floating-point number:

    print(string.format("pi = %.4f", math.pi))             --> pi = 3.1416
    d = 5; m = 11; y = 1990
    print(string.format("%02d/%02d/%04d", d, m, y))        --> 05/11/1990
    tag, title = "h1", "a title"
    print(string.format("<%s>%s</%s>", tag, title, tag))   --> <h1>a title</h1>

In the first example, %.4f means a floating-point number with four digits after the decimal point. In the second example, %02d means a decimal number with at least two digits, padded with zeros on the left; the directive %2d, without the 0, would pad with spaces instead. For a complete description of the directives, see the Lua reference manual or a C manual, because Lua calls the standard C library to do the actual formatting.
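The difference between the padding options is easy to see side by side. This sketch uses only directives that come from C's printf; the values are arbitrary.

```lua
local padded_spaces = string.format("[%5d]", 42)       -- width 5, padded with spaces
local padded_zeros  = string.format("[%05d]", 42)      -- width 5, padded with zeros
local hex           = string.format("%x", 255)         -- hexadecimal
local octal         = string.format("%o", 8)           -- octal
local rounded       = string.format("%.2f", 3.14159)   -- two decimal digits

print(padded_spaces)   --> [   42]
print(padded_zeros)    --> [00042]
print(hex)             --> ff
print(octal)           --> 10
print(rounded)         --> 3.14
```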

19.1 Pattern-Matching Functions

The most powerful functions in the string library are string.find (string find), string.gsub (global substitution), and string.gfind (global find; renamed string.gmatch in Lua 5.1 and later). All of them are based on patterns.

Unlike several other scripting languages, Lua does not use POSIX regular expressions (regexp) for pattern matching. (POSIX is the industry standard for UNIX; regular expressions originated on UNIX, and POSIX standardizes them.) The main reason is size: a typical implementation of POSIX regexp takes about 4,000 lines of code, which is bigger than all the Lua standard libraries together. By comparison, the implementation of pattern matching in Lua takes only about 500 lines. Naturally, it cannot do everything a full POSIX implementation does. Nevertheless, pattern matching in Lua is a powerful tool, and it includes some features that are difficult to reproduce with standard POSIX implementations.

The basic use of string.find is to search for a pattern inside a given string, called the subject string. The function returns the position where it found the pattern, or nil if it could not find it. The simplest form of a pattern is a word, which matches only a copy of itself. For example, the pattern 'hello' matches the substring "hello" inside the subject string. When find locates its pattern, it returns two values: the index where the match begins and the index where the match ends.

    s = "hello world"
    i, j = string.find(s, "hello")
    print(i, j)                      --> 1    5
    print(string.sub(s, i, j))       --> hello
    print(string.find(s, "world"))   --> 7    11
    i, j = string.find(s, "l")
    print(i, j)                      --> 3    3
    print(string.find(s, "lll"))     --> nil

In the example, when the match succeeds, string.sub applied to the values returned by string.find extracts the part of the subject string that matched the pattern. (For simple patterns, this is the pattern itself.)

string.find has an optional third parameter: an index that tells where in the subject string to start the search. This parameter is useful when we want to process all the positions where a given pattern appears: we search repeatedly, each time starting after the position where we found the previous match. As an example, the following code builds a table with the positions of all newlines in a string s:

    local t = {}       -- table to store the indices
    local i = 0
    while true do
      i = string.find(s, "\n", i+1)    -- find 'next' newline
      if i == nil then break end
      table.insert(t, i)
    end

We will see later that the string.gfind iterator offers a simpler way to write such loops.
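The same loop can be wrapped in a reusable function. The sketch below is one way to do it; find_all is a hypothetical name, and because the search restarts right after the end of each match, it assumes that the pattern never matches the empty string.

```lua
-- Hypothetical helper: start index of every match of `pattern` in `s`.
-- Assumes `pattern` cannot match the empty string.
local function find_all(s, pattern)
  local positions = {}
  local init = 1
  while true do
    local first, last = string.find(s, pattern, init)
    if first == nil then break end
    table.insert(positions, first)
    init = last + 1          -- continue after the end of this match
  end
  return positions
end

local text = "line one\nline two\nline three"
print(table.concat(find_all(text, "\n"), " "))     --> 9 18
print(table.concat(find_all(text, "line"), " "))   --> 1 10 19
```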

The string.gsub function has three parameters: a subject string, a pattern, and a replacement string. Its basic use is to replace every occurrence of the pattern in the subject string with the replacement string:

    s = string.gsub("Lua is cute", "cute", "great")
    print(s)      --> Lua is great
    s = string.gsub("all lii", "l", "x")
    print(s)      --> axx xii
    s = string.gsub("Lua is great", "perl", "tcl")
    print(s)      --> Lua is great

An optional fourth parameter limits the number of substitutions to be made:

    s = string.gsub("all lii", "l", "x", 1)
    print(s)      --> axl lii
    s = string.gsub("all lii", "l", "x", 2)
    print(s)      --> axx lii

string.gsub also returns a second result: the number of substitutions it made. For example, an easy way to count the number of spaces in a string is

    _, count = string.gsub(str, " ", " ")

(Remember that _ is just a dummy variable name.)
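The same trick counts occurrences of any pattern. The following sketch is illustrative; count_matches is a hypothetical helper that discards the modified string and keeps only the count.

```lua
-- Hypothetical helper: number of matches of `pattern` in `s`.
local function count_matches(s, pattern)
  -- The modified string is discarded; only the count is kept.
  local _, n = string.gsub(s, pattern, "")
  return n
end

print(count_matches("a b  c", " "))          --> 3
print(count_matches("hello world", "o"))     --> 2
print(count_matches("hello world", "xyz"))   --> 0
```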

19.2 Patterns

You can make patterns more useful with character classes. A character class is an item in a pattern that can match any character in a specific set. For example, the class %d matches any digit. Therefore, you can search for a date in the format dd/mm/yyyy with the pattern '%d%d/%d%d/%d%d%d%d':

    s = "Deadline is 30/05/1999, firm"
    date = "%d%d/%d%d/%d%d%d%d"
    print(string.sub(s, string.find(s, date)))   --> 30/05/1999

The following table lists all the character classes that Lua supports:

    .     all characters
    %a    letters
    %c    control characters
    %d    digits
    %l    lowercase letters
    %p    punctuation characters
    %s    space characters
    %u    uppercase letters
    %w    alphanumeric characters (letters and digits)
    %x    hexadecimal digits
    %z    the character with representation 0

An uppercase version of any of these classes represents the complement of the class. For example, '%A' represents all non-letter characters:

    print(string.gsub("hello, up-down!", "%A", "."))   --> hello..up.down. 4

(The 4 is not part of the result string. It is the second result of gsub, the number of substitutions made. Other examples that print the result of gsub will omit this count.)

Some characters, called magic characters, have special meanings when used in a pattern. The magic characters are:
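A quick way to get a feel for the classes is to strip or count characters with gsub. The sample text below is arbitrary, and the results assume the default "C" locale.

```lua
local text = "Room 101, Floor 3!"

-- Remove everything that is not a digit.
local digits = string.gsub(text, "%D", "")
print(digits)                        --> 1013

-- Count letters, spaces, and punctuation characters.
local _, letters = string.gsub(text, "%a", "")
local _, spaces  = string.gsub(text, "%s", "")
local _, punct   = string.gsub(text, "%p", "")
print(letters, spaces, punct)        --> 9    3    2
```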

( ) . % + - * ? [ ^ $

The character '%' works as an escape for these magic characters. So '%.' matches a dot, and '%%' matches the character '%' itself. You can use the escape '%' not only for the magic characters but for all other non-alphanumeric characters as well. When in doubt, play it safe and add the escape.
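Because '%' can escape any non-alphanumeric character, a literal string can be made safe for use as a pattern by escaping all of its punctuation. The sketch below shows one way to do this in Lua 5.1 or later; escape_pattern is a hypothetical helper, and it relies on '%0' in a gsub replacement string standing for the whole match.

```lua
-- Hypothetical helper: escape every punctuation character in `s`
-- so that the result matches `s` literally when used as a pattern.
local function escape_pattern(s)
  -- In the replacement, "%%" is a literal '%' and "%0" is the whole match.
  -- The extra parentheses drop gsub's second result (the count).
  return (string.gsub(s, "%p", "%%%0"))
end

local literal = "1.5%"
local pattern = escape_pattern(literal)
print(pattern)                               --> 1%.5%%
print(string.find("cost: 1.5%", pattern))    --> 7    10
```

Without the escape, the '.' in "1.5%" would match any character and the trailing '%' would be read as an incomplete escape.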

For Lua, patterns are regular strings. They have no special treatment and follow the same rules as other strings. Only inside the pattern-matching functions are they interpreted as patterns, and only then does '%' work as an escape. Therefore, if you need to put a quote inside a pattern, you must use the same technique that you use to put a quote inside any other string: escape the quote with '\', which is the escape character for Lua.

A char-set lets you create your own character class by combining different classes and single characters between square brackets. (Char-set is Lua's term for what traditional regular expressions call a bracket expression.) For example, the char-set '[%w_]' matches both alphanumeric characters and underscores, the char-set '[01]' matches binary digits, and the char-set '[%[%]]' matches square brackets. The following example counts the number of vowels in a text:

    _, nvow = string.gsub(text, "[AEIOUaeiou]", "")

You can also include character ranges in a char-set by writing the first and last characters of the range separated by a hyphen. You will seldom need this facility, because most useful ranges are already predefined: for example, '[0-9]' is simpler when written as '%d', and '[0-9a-fA-F]' is the same as '%x'. However, if you need to find an octal digit, you may prefer '[0-7]' to an explicit enumeration such as '[01234567]'. You can get the complement of a char-set by starting it with '^': '[^0-7]' matches any character that is not an octal digit, and '[^\n]' matches any character other than a newline. Remember that you can negate simple classes with their uppercase form: '%S' is simpler than '[^%s]'.
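As a sketch of char-sets in use, the following checks whether a string is an octal numeral by looking for any character outside '[0-7]', and then counts characters with an explicit set and with a range-based set. The function name is_octal and the sample strings are made up for this example.

```lua
-- Hypothetical helper: true if `s` is non-empty and contains only octal digits.
local function is_octal(s)
  return s ~= "" and string.find(s, "[^0-7]") == nil
end

print(is_octal("0755"))   --> true
print(is_octal("0897"))   --> false
print(is_octal(""))       --> false

-- Count vowels with an explicit set, and hex digits with a range-based set.
local _, vowels = string.gsub("Programming in Lua", "[AEIOUaeiou]", "")
local _, hex    = string.gsub("0xCAFE", "[0-9a-fA-F]", "")
print(vowels, hex)        --> 6    5
```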

Character classes follow the current locale set for Lua. Therefore, the class '[a-z]' can be different from '%l'. In a proper locale, the latter includes letters such as 'ç' and 'ã', while the former does not. You should use the latter form unless you have a strong reason to do otherwise: it is simpler, more convenient, and more efficient.

You can make patterns still more useful with modifiers for repetitions and optional parts. Patterns in Lua offer four modifiers:

    +    1 or more repetitions of the previous character class
    *    0 or more repetitions (longest match)
    -    0 or more repetitions (shortest match)
    ?    0 or 1 occurrence (optional)

The '+' modifier matches one or more characters of the original class. It always gets the longest sequence that matches the pattern. For example, the pattern '%a+' means one or more letters, that is, a word:

    print(string.gsub("one, and two; and three", "%a+", "word"))   --> word, word word; word word

The pattern '%d+' matches one or more digits (an integer):

    i, j = string.find("the number 1298 is even", "%d+")
    print(i, j)   --> 12  15

The modifier '*' is similar to '+', but it also accepts zero occurrences of characters of the class. A typical use is to match optional spaces between parts of a pattern. For example, to match an empty pair of parentheses, such as () or ( ), you can use the pattern '%(%s*%)'. (The '%s*' part matches zero or more spaces. Parentheses have a special meaning in a pattern, so we must escape them with '%'.) As another example, the pattern '[_%a][_%w]*' matches identifiers in a Lua program: a sequence that starts with a letter or an underscore, followed by zero or more underscores or alphanumeric characters.
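Both patterns can be tried out directly. In this sketch the sample strings are arbitrary; string.gmatch (the Lua 5.1+ name of the iterator this chapter calls string.gfind) is used to visit every match.

```lua
-- Collect every identifier-like word in a line of code.
local line = "local _count2 = x1 + 42"
local names = {}
for name in string.gmatch(line, "[_%a][_%w]*") do
  table.insert(names, name)
end
print(table.concat(names, " "))            --> local _count2 x1

-- An empty pair of parentheses, with or without spaces inside.
local a1, a2 = string.find("f(   )", "%(%s*%)")
local b1, b2 = string.find("f()", "%(%s*%)")
print(a1, a2)                              --> 2    6
print(b1, b2)                              --> 2    3
```

Note that "42" is not collected: the first item of the pattern requires a letter or an underscore, so a sequence of digits alone never starts a match.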

Like '*', the modifier '-' matches zero or more occurrences of characters of the class. However, instead of matching the longest sequence, it matches the shortest one. Sometimes there is no difference between '*' and '-', but usually they give quite different results. For example, if you try to find an identifier with the pattern '[_%a][_%w]-', you will find only the first letter, because '[_%w]-' always matches the empty sequence. On the other hand, suppose you want to find comments in a C program. Many people would first try '/%*.*%*/' (that is, "/*" followed by any sequence of characters, followed by "*/", with the asterisks escaped). However, because '.*' expands as far as it can, the first "/*" in the program would close only with the last "*/":

    test = "int x; /* x */  int y; /* y */"
    print(string.gsub(test, "/%*.*%*/", "<COMMENT>"))   --> int x; <COMMENT>

The pattern '.-', in contrast, expands only as far as needed to find the first "*/", so you get the desired result:

    test = "int x; /* x */  int y; /* y */"
    print(string.gsub(test, "/%*.-%*/", "<COMMENT>"))   --> int x; <COMMENT>  int y; <COMMENT>
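The same difference shows up with any pair of delimiters. The sketch below uses double-quoted text as an arbitrary example.

```lua
local s = 'say "yes" or "no"'

-- Longest match: from the first quote to the last one.
local greedy = string.gsub(s, '".*"', "<Q>")
-- Shortest match: each quoted part is replaced separately.
local lazy, count = string.gsub(s, '".-"', "<Q>")

print(greedy)        --> say <Q>
print(lazy, count)   --> say <Q> or <Q>    2
```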

The last modifier, '?', matches an optional character: zero or one occurrence. For example, suppose we want to find an integer in a text, where the number may contain an optional sign. The pattern '[+-]?%d+' does the job, matching numerals such as "-12", "23", and "+1009". The '[+-]' is a character class that matches either '+' or '-', and the following '?' makes that sign optional.

Unlike some other systems, in Lua a modifier can be applied only to a character class; there is no way to group patterns under a modifier. For example, there is no pattern that matches an optional word (unless the word has only one letter). As we will see later, you can usually work around this limitation with some more advanced techniques.

If a pattern begins with '^', it matches only at the beginning of the subject string. Similarly, if it ends with '$', it matches only at the end of the subject string. These marks can be used both to restrict the patterns that you find and to anchor patterns. For example, the test

    if string.find(s, "^%d") then ...

checks whether the string s starts with a digit, and the test

    if string.find(s, "^[+-]?%d+$") then ...

checks whether the string s represents an integer, with no other leading or trailing characters.
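The anchored test is easy to wrap in a predicate. In this minimal sketch, is_integer is a hypothetical helper name and the inputs are arbitrary.

```lua
-- Hypothetical helper: true if `s` is an optionally signed integer numeral.
local function is_integer(s)
  return string.find(s, "^[+-]?%d+$") ~= nil
end

print(is_integer("-12"))     --> true
print(is_integer("+1009"))   --> true
print(is_integer("23"))      --> true
print(is_integer("12a"))     --> false
print(is_integer(" 12"))     --> false
print(is_integer("1.5"))     --> false
```

Without the '^' and '$' anchors, the last three inputs would also succeed, because the pattern would match the digits inside them.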

Another item in a pattern is '%b', which matches balanced strings. Such an item is written as '%bxy', where x and y are any two distinct characters; x acts as the opening character and y as the closing one. For example, the pattern '%b()' matches parts of the string that start with '(' and finish at the matching ')':

    print(string.gsub("a (enclosed (in) parentheses) line", "%b()", ""))   --> a  line

Typically, this pattern is used as '%b()', '%b[]', '%b{}', or '%b<>', but you can use any two characters as delimiters. The two characters after '%b' are taken literally, so they must not be escaped.
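A short sketch shows how '%b' handles nesting. The expression and the bracketed text are arbitrary examples, and string.gmatch is the Lua 5.1+ name of the iterator this chapter calls string.gfind.

```lua
local expr = "f(a, g(b, c)) + h(d)"

-- Collect each balanced group: from an opening parenthesis to its matching close.
local groups = {}
for group in string.gmatch(expr, "%b()") do
  table.insert(groups, group)
end
print(table.concat(groups, "  "))   --> (a, g(b, c))  (d)

-- Other delimiter pairs work the same way.
local stripped = string.gsub("keep [drop [this]] too", "%b[]", "")
print(stripped)                     --> keep  too
```

The inner group "(b, c)" is not reported separately, because the search continues after the end of the outer match that already contains it.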
