A regular expression is a text pattern made up of ordinary characters (such as the letters a to z) and special characters (known as metacharacters). The pattern describes one or more strings to match when searching a body of text. A regular expression serves as a template that matches a character pattern against the string being searched.

\  Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline. The sequence '\\' matches "\" and '\(' matches "(".
^  Matches the position at the start of the input string.
$  Matches the position at the end of the input string.
*  Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". '*' is equivalent to '{0,}'.
+  Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". '+' is equivalent to '{1,}'.
?  Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". '?' is equivalent to '{0,1}'.
{n}  n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the "o" in "Bob", but matches the two o's in "food".
{n,}  n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m}  m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?  When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of it as possible.
For example, for the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.  Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' (inside brackets a period is a literal period, so '[.\n]' would match only a period or a newline).
(pattern)  Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)  Matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)  Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match. For example, 'Windows(?=95|98|NT|2000)' matches "Windows" in "Windows2000", but not "Windows" in "Windows3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)  Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match. For example, 'Windows(?!95|98|NT|2000)' matches "Windows" in "Windows3.1", but not "Windows" in "Windows2000". As with positive lookahead, no characters are consumed.
x|y  Matches either x or y. For example, 'z|food' matches "z" or "food".
'(z|f)ood' matches "zood" or "food".
[xyz]  A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the "a" in "plain".
[^xyz]  A negated character set. Matches any character not enclosed. For example, '[^abc]' matches the "p" in "plain".
[a-z]  A character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from "a" to "z".
[^a-z]  A negated character range. Matches any character not in the specified range. For example, '[^a-z]' matches any character that is not in the range "a" to "z".
\b  Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the "er" in "never", but not the "er" in "verb".
\B  Matches a non-word boundary. 'er\B' matches the "er" in "verb", but not the "er" in "never".
\cx  Matches the control character indicated by x. For example, '\cM' matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z; otherwise c is treated as a literal "c".
\d  Matches a digit character. Equivalent to [0-9].
\D  Matches a non-digit character. Equivalent to [^0-9].
\f  Matches a form feed. Equivalent to \x0c and \cL.
\n  Matches a newline. Equivalent to \x0a and \cJ.
\r  Matches a carriage return. Equivalent to \x0d and \cM.
\s  Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \f\n\r\t\v].
\S  Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t  Matches a tab. Equivalent to \x09 and \cI.
\v  Matches a vertical tab. Equivalent to \x0b and \cK.
\w  Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W  Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn  Matches n, where n is a hexadecimal escape value.
The hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num  Matches num, where num is a positive integer. It is a reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n  Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm  Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml  Matches the octal escape value nml when n is an octal digit (0-3) and m and l are both octal digits (0-7).
\un  Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, '\u00A9' matches the copyright symbol (©).

The grep tool

Regular expressions are put to use through a tool such as grep. We will not explain all of grep's features here; a few examples are enough to illustrate how to write a regular expression.

$ ls -l | grep '^a'  Filters the output of ls -l through a pipe and shows only the lines that start with a.
$ grep 'test' d*  Shows all lines containing test in the files whose names start with d.
$ grep 'test' aa bb cc  Shows the lines matching test in the files aa, bb, and cc.
$ grep '[a-z]\{5\}' aa  Shows all lines in aa containing a string of at least five consecutive lowercase letters.
$ grep 'w\(es\)t.*\1' aa  If west is matched, es is stored in memory and labeled 1; the pattern then matches any number of arbitrary characters (.*).
These characters must be followed by another es (\1). If that is found, the line is displayed. With egrep or grep -E, the parentheses do not need to be escaped with "\", so the pattern is written directly as 'w(es)t.*\1'.

grep regular expression metacharacters (basic set)

^  Anchors the start of a line. For example, '^grep' matches all lines that start with grep.
$  Anchors the end of a line. For example, 'grep$' matches all lines that end with grep.
.  Matches any single character other than a newline. For example, 'gr.p' matches gr followed by any one character and then p.
*  Matches zero or more of the preceding character. For example, ' *grep' matches all lines in which zero or more spaces are followed by grep. Together, '.*' stands for any sequence of characters.
[]  Matches one character from the specified set. For example, '[Gg]rep' matches Grep and grep.
[^]  Matches one character that is not in the specified set. For example, '[^A-FH-Z]rep' matches a line containing a character outside the ranges A-F and H-Z, followed by rep.
\(..\)  Marks the matched characters. For example, in '\(love\)' the text love is labeled 1.
\<  Anchors the start of a word. For example, '\<grep' matches lines containing a word that starts with grep.
\>  Anchors the end of a word. For example, 'grep\>' matches lines containing a word that ends with grep.
x\{m\}  Repeats the character x exactly m times. For example, 'o\{5\}' matches lines containing 5 consecutive o's.
x\{m,\}  Repeats the character x at least m times. For example, 'o\{5,\}' matches lines containing at least 5 consecutive o's.
x\{m,n\}  Repeats the character x at least m times and no more than n times. For example, 'o\{5,10\}' matches lines containing 5 to 10 consecutive o's.
\w  Matches a word character, that is, a letter, digit, or underscore ([A-Za-z0-9_]). For example, 'G\w*p' matches G followed by zero or more word characters and then p.
\W  The inverse of \w. Matches a non-word character, such as a period.
\b  Word boundary. For example, '\bgrep\b' matches only the whole word grep.
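The grep metacharacters above can be tried out with a short script. This is only a sketch: the sample file and its contents are hypothetical, and it assumes a grep that supports the \< and \> word anchors and back references under -E (GNU grep does; other implementations may differ).

```shell
#!/bin/sh
# Hypothetical sample input, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
grep is a tool
this line ends with grep
west of the nest
west of the north
egrep is related
foooood
EOF

# ^ anchors the start of a line, $ anchors the end.
starts=$(grep -c '^grep' "$sample")
ends=$(grep -c 'grep$' "$sample")

# \< and \> anchor the start and end of a word, so "egrep" is not counted.
words=$(grep -c '\<grep\>' "$sample")

# x\{m\} repeats the preceding character m times (basic syntax).
five_o=$(grep -c 'o\{5\}' "$sample")

# Back reference: "es" is remembered as \1 and must occur again later.
backref_bre=$(grep 'w\(es\)t.*\1' "$sample")
# Same pattern in extended syntax: the parentheses are not escaped.
backref_ere=$(grep -E 'w(es)t.*\1' "$sample")

echo "starts=$starts ends=$ends words=$words five_o=$five_o"
echo "bre: $backref_bre"
echo "ere: $backref_ere"

rm -f "$sample"
```

Only "west of the nest" satisfies the back reference, because "north" contains no second "es".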
Shell string processing

Construction

    str_zero=hello
    str_first="I am a string"
    str_second='success'

Repeating a string

    # Repeat the first parameter ($1) $2 times.
    strrepeat() {
        local x=$2
        if [ "$x" = "" ]; then
            x=0
        fi
        local str_temp=""
        while [ "$x" -ge 1 ]; do
            str_temp=$(printf "%s%s" "$str_temp" "$1")
            x=$((x - 1))
        done
        echo "$str_temp"
    }

Example:

    str_repeat=$(strrepeat "$user_name" 3)
    echo "repeat = $str_repeat"

Assignment and copying

Assign a value directly, the same way a string is constructed:

    user_name=Terry

Assign a value from another variable:

    aliase_name=$user_name

Joining

Join two strings directly:

    str_temp=$(printf "%s%s" "$str_zero" "$user_name")

With printf, more complex joins are also possible.

Evaluation

Length in characters (printf is used instead of echo so that the trailing newline is not counted):

    count_char=$(printf "%s" "$str_first" | wc -m)
    echo $count_char

Number of bytes:

    count_byte=$(printf "%s" "$str_first" | wc -c)
    echo $count_byte

Number of words:

    count_word=$(echo "$str_first" | wc -w)
    echo $count_word

Comparison

Equal: str1 = str2
Not equal: str1 != str2

Example:

    if [ "$user_name" = "Terry" ]; then
        echo "I am Terry"
    fi

Less than:

    # Print 0 if the two strings are equal, 1 if $1 < $2, otherwise 2.
    strcompare() {
        local x=0
        if [ "$1" != "$2" ]; then
            x=2
            # If sorting leaves the pair in its original order, $1 sorts first.
            local temp=$(printf "%s\n%s" "$1" "$2")
            local temp2=$( (echo "$1"; echo "$2") | sort )
            if [ "$temp" = "$temp2" ]; then
                x=1
            fi
        fi
        echo $x
    }

Testing

Empty: -z str
Non-empty: -n str

Is a number:

    # Print 0 if the string consists only of digits, otherwise 1.
    strisnum() {
        local ret=1
        if [ -n "$1" ]; then
            # Delete every digit; nothing is left if the string is numeric.
            local str_temp=$(echo "$1" | sed 's/[0-9]//g')
            if [ -z "$str_temp" ]; then
                ret=0
            fi
        fi
        echo $ret
    }

Example:

    if [ -n "$user_name" ]; then
        echo "my name is not empty"
    fi
    echo $(strisnum "9980")

Splitting

When a string is separated by the symbol +, sed can split it into its left and right parts. The command date --rfc-3339=seconds prints a date, a time such as 15:09:47, and a UTC offset introduced by +.

Get the part to the left of +:

    date --rfc-3339=seconds | sed 's/+[0-9][0-9]:[0-9][0-9]//g'

The output ends with the time, for example 15:09:47.

Get the part to the right of +:

    date --rfc-3339=seconds | sed 's/.*+//g'

The output is the UTC offset.

A string separated by spaces can be split with awk. For example:

    str_fruit="banana 0.89 100"

Get the 3rd field:

    echo $str_fruit | awk '{ print $3; }'

Substring

Check whether string 1 is a substring of string 2:

    # Print 0 if $1 is a substring of $2, otherwise 1.
    strissubstring() {
        local x=1
        # $1 is quoted so that it is compared literally, not as a glob pattern.
        case "$2" in
            *"$1"*) x=0 ;;
        esac
        echo $x
    }
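The splitting commands can be checked against a fixed string rather than the live clock. The sketch below is not from the original: the timestamp is a hypothetical value in the layout that date --rfc-3339=seconds prints, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical fixed timestamp (date, time, and a "+" UTC offset).
stamp='2024-01-02 15:09:47+08:00'

# Left part: delete the trailing "+HH:MM".
left=$(printf '%s\n' "$stamp" | sed 's/+[0-9][0-9]:[0-9][0-9]$//')

# Right part: delete everything up to and including the "+".
right=$(printf '%s\n' "$stamp" | sed 's/.*+//')

# Space-separated fields: awk splits on whitespace, $3 is the third field.
str_fruit='banana 0.89 100'
third=$(printf '%s\n' "$str_fruit" | awk '{ print $3 }')

echo "left=$left"
echo "right=$right"
echo "third=$third"
```

A timestamp with a negative offset uses - instead of +, so these sed expressions would leave such a value unchanged.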