Shell Regular Expressions and String Processing

Source: Internet
Author: User
Tags: control characters, grep, regular expression, rfc
A regular expression is a text pattern made up of ordinary characters (such as the letters a to z) and special characters, called metacharacters. The pattern describes one or more strings to match when searching a body of text. A regular expression acts as a template that matches a character pattern against the searched string.

\  Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, 'n' matches the character "n" and '\n' matches a newline. The sequence '\\' matches "\" and '\(' matches "(".

^  Matches the start of the input string.

$  Matches the end of the input string.

*  Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.

+  Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

?  Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". ? is equivalent to {0,1}.

{n}  n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".

{n,}  n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.

{n,m}  m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.

?  When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching mode is non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much of it as possible.
For example, for the string "oooo", 'o+?' matches a single "o", while 'o+' matches all of the o's.

.  Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.

(pattern)  Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters themselves, use '\(' or '\)'.

(?:pattern)  Matches pattern but does not capture the match. That is, it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.

(?=pattern)  Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.

(?!pattern)  Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.

x|y  Matches either x or y. For example, 'z|food' matches "z" or "food".
'(z|f)ood' matches "zood" or "food".

[xyz]  A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".

[^xyz]  A negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain".

[a-z]  A character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'.

[^a-z]  A negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.

\b  Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never", but not the 'er' in "verb".

\B  Matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".

\cx  Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character.

\d  Matches a digit character. Equivalent to [0-9].

\D  Matches a non-digit character. Equivalent to [^0-9].

\f  Matches a form feed. Equivalent to \x0c and \cL.

\n  Matches a newline. Equivalent to \x0a and \cJ.

\r  Matches a carriage return. Equivalent to \x0d and \cM.

\s  Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \f\n\r\t\v].

\S  Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

\t  Matches a tab. Equivalent to \x09 and \cI.

\v  Matches a vertical tab. Equivalent to \x0b and \cK.

\w  Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.

\W  Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.

\xn  Matches n, where n is a hexadecimal escape value.
The hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.

\num  Matches num, where num is a positive integer. It is a reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.

\n  Identifies either an octal escape value or a back reference. If \n is preceded by at least n captured subexpressions, n is a back reference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.

\nm  Identifies either an octal escape value or a back reference. If \nm is preceded by at least nm captured subexpressions, nm is a back reference. If \nm is preceded by at least n captured subexpressions, n is a back reference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.

\nml  Matches the octal escape value nml when n is an octal digit (0-3) and m and l are both octal digits (0-7).

\un  Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).

Using grep inevitably means writing regular expressions, so rather than explain every feature of grep, this section lists a few examples that illustrate how to write one.

$ ls -l | grep '^a'
Filters the output of ls -l through a pipe and displays only the lines that begin with a.

$ grep 'test' d*
Displays all lines containing test in the files whose names begin with d.

$ grep 'test' aa bb cc
Displays the lines matching test in the files aa, bb, and cc.

$ grep '[a-z]\{5\}' aa
Displays all lines containing a string of at least five consecutive lowercase characters.

$ grep 'w\(es\)t.*\1' aa
If west is matched, es is stored in memory and marked as 1, and any number of characters (.*) may then follow.
These characters must be followed by another es (\1); if one is found, the line is displayed. With egrep or grep -E, the parentheses do not need to be escaped with "\", and the pattern can be written directly as 'w(es)t.*\1'.

grep regular expression metacharacters (basic set)

^  Anchors the start of a line. For example, '^grep' matches all lines that begin with grep.

$  Anchors the end of a line. For example, 'grep$' matches all lines that end with grep.

.  Matches any one character other than a newline. For example, 'gr.p' matches gr followed by any one character and then p.

*  Matches zero or more of the preceding character. For example, ' *grep' matches all lines containing zero or more spaces followed by grep. Used together, .* stands for any run of characters.

[]  Matches one character from the specified set. For example, '[Gg]rep' matches Grep and grep.

[^]  Matches one character that is not in the specified set. For example, '[^A-FH-Z]rep' matches one character outside the ranges A-F and H-Z, followed by rep.

\(..\)  Marks the matched characters. For example, in '\(love\)', love is marked as 1.

\<  Anchors the start of a word. For example, '\<grep' matches lines containing a word that begins with grep.

\>  Anchors the end of a word. For example, 'grep\>' matches lines containing a word that ends with grep.

x\{m\}  Repeats the character x m times. For example, 'o\{5\}' matches lines containing 5 consecutive o's.

x\{m,\}  Repeats the character x at least m times. For example, 'o\{5,\}' matches lines containing at least 5 consecutive o's.

x\{m,n\}  Repeats the character x at least m times and at most n times. For example, 'o\{5,10\}' matches lines containing 5 to 10 consecutive o's.

\w  Matches a word character (a letter, digit, or underscore), that is, [A-Za-z0-9_]. For example, 'G\w*p' matches G followed by zero or more word characters and then p.

\W  The inverse of \w. Matches a non-word character, such as a period or other punctuation.

\b  Word boundary. For example, '\bgrep\b' matches only the word grep.
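The metacharacters described above can be tried out directly from the command line. The following is a minimal sketch, assuming a grep that supports the -E, -o, and -q options (GNU grep does). The helper names ere_matches and bre_line_matches and all sample strings are made up for illustration and do not come from the original article.

```shell
#!/bin/sh
# Hypothetical helpers for experimenting with the metacharacters above.

# ere_matches TEXT PATTERN
# Print every substring of TEXT that matches the extended regular
# expression PATTERN, one match per line (grep -o).
ere_matches() {
    printf '%s\n' "$1" | grep -E -o -- "$2"
}

# bre_line_matches TEXT PATTERN
# Succeed (exit status 0) if TEXT contains a match for the basic
# regular expression PATTERN; grep -q prints nothing.
bre_line_matches() {
    printf '%s\n' "$1" | grep -q -- "$2"
}

# Greedy interval: each match takes as many o's as allowed, up to three.
ere_matches 'fooooood' 'o{1,3}'

# Alternation inside a group, extended syntax (no backslashes needed).
ere_matches 'zood food' '(z|f)ood'

# Negated character set: prints every character of "plain" except a.
ere_matches 'plain' '[^abc]'

# Back reference in basic syntax: \(es\) is remembered as \1.
if bre_line_matches 'west of the trees' 'w\(es\)t.*\1'; then
    echo 'second "es" found after west'
fi
```

Keeping the pattern in single quotes stops the shell from interpreting characters such as *, $, and \ before grep sees them.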
Shell string processing

Construction

    str_zero=hello
    str_first="I am a string"
    str_second='success'

Repeating a string

    # Repeat the first parameter ($1) $2 times
    strrepeat() {
        local x=$2
        if [ "$x" = "" ]; then
            x=0
        fi
        local str_temp=""
        while [ "$x" -ge 1 ]; do
            str_temp=`printf "%s%s" "$str_temp" "$1"`
            x=`expr $x - 1`
        done
        echo "$str_temp"
    }

Example:

    str_repeat=`strrepeat "$user_name" 3`
    echo "Repeat = $str_repeat"

Assignment and copying

Direct assignment works the same way as constructing a string:

    user_name=Terry

Assignment from another variable:

    aliase_name=$user_name

Joining

Join two strings directly:

    str_temp=`printf "%s%s" "$str_zero" "$user_name"`

printf can also be used for more complex joins.

Length

Number of characters (char):

    count_char=`echo "$str_first" | wc -m`
    echo $count_char

Number of bytes (byte):

    count_byte=`echo "$str_first" | wc -c`
    echo $count_byte

Number of words (word):

    count_word=`echo "$str_first" | wc -w`
    echo $count_word

Note that echo appends a newline, which wc -m and wc -c include in the count.

Comparison

Equality: str1 = str2

Inequality: str1 != str2

Example:

    if [ "$user_name" = "Terry" ]; then
        echo "I am Terry"
    fi

Less-than comparison:

    # Print 0 if the two strings are equal, 1 if $1 < $2, otherwise 2
    strcompare() {
        local x=0
        if [ "$1" != "$2" ]; then
            x=2
            # $1 and $2 in their given order
            local temp=`printf "%s\n%s" "$1" "$2"`
            # $1 and $2 in sorted order
            local temp2=`(echo "$1"; echo "$2") | sort`
            if [ "$temp" = "$temp2" ]; then
                x=1
            fi
        fi
        echo $x
    }

Testing

Empty: -z str

Non-empty: -n str

Whether a string is a number:

    # Print 0 if the string consists only of digits, otherwise 1
    strisnum() {
        local ret=1
        if [ -n "$1" ]; then
            # Delete every digit; nothing is left if the string was all digits
            local str_temp=`echo "$1" | sed 's/[0-9]//g'`
            if [ -z "$str_temp" ]; then
                ret=0
            fi
        fi
        echo $ret
    }

Example:

    if [ -n "$user_name" ]; then
        echo "my name is not empty"
    fi
    echo `strisnum "9980"`

Splitting

Split a string into left and right parts at the + sign, using sed. For example, the command date --rfc-3339 seconds prints the date and a time such as 15:09:47, followed by + and the UTC offset. This assumes a positive UTC offset; with a negative offset the output contains - instead of +.

Get the part to the left of the +:

    date --rfc-3339 seconds | sed 's/+[0-9][0-9]:[0-9][0-9]//g'

The output ends at the time, 15:09:47 in this example.

Get the part to the right of the +:

    date --rfc-3339 seconds | sed 's/.*+//g'

The output is the UTC offset alone.

Split a space-separated string, using awk. Example:

    str_fruit="banana 0.89 100"

Get the 3rd field:

    echo $str_fruit | awk '{ print $3; }'

Substrings

Whether string 1 is a substring of string 2:

    # Print 0 if $1 is a substring of $2, otherwise 1
    strissubstring() {
        local x=1
        case "$2" in
            *"$1"*) x=0 ;;
        esac
        echo $x
    }
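The length and splitting operations above call external tools (wc and sed). As an alternative sketch, not taken from the original article, the same results can be obtained with POSIX shell parameter expansion, which avoids starting extra processes. The function names str_length, split_left, and split_right and the sample timestamp are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: length and split operations using parameter expansion only.
# All names and the sample value below are illustrative assumptions.

# str_length STRING
# Print the length of STRING. No trailing newline is counted. Some shells
# count bytes rather than characters for multibyte text.
str_length() {
    printf '%s\n' "${#1}"
}

# split_left STRING SEP
# Print the part of STRING before the first occurrence of SEP
# (%% removes the longest suffix matching SEP followed by anything).
split_left() {
    printf '%s\n' "${1%%"$2"*}"
}

# split_right STRING SEP
# Print the part of STRING after the last occurrence of SEP
# (## removes the longest prefix matching anything followed by SEP).
split_right() {
    printf '%s\n' "${1##*"$2"}"
}

stamp='2024-01-02 15:09:47+08:00'   # made-up sample in RFC 3339 style

str_length 'I am a string'
split_left "$stamp" '+'
split_right "$stamp" '+'
```

Quoting "$2" inside the expansion makes the separator match literally, so characters such as * in the separator are not treated as wildcards.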

  
