Learning regular expressions (grep, sed, awk) under Linux

Last Update:2017-04-28 Source: Internet

Author: User

I. Regular expressions

A regular expression (regex or RE for short) is a text pattern made up of ordinary characters (such as the letters a through z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text; it serves as a template that is compared against the text being searched. Simply put, a regular expression is a way of processing strings line by line: with the help of a few special symbols, it lets the user easily search for, delete, or replace particular strings. Regular expressions are supported by commands such as vim, grep, find, awk, and sed.

Common regular expressions:

1. . represents any single character. For example, /l..e/ matches lines containing an l, followed by any two characters, followed by an e.
   ? matches zero or one occurrence of the preceding character (extended syntax; the basic form \? is item 6). For example, 'gr?p' matches lines containing gp or grp.
2. ^ represents the beginning of a line. For example, ^love matches all lines that begin with love.
3. $ represents the end of a line. For example, love$ matches all lines that end with love, and '^$' matches empty lines.
4. [...] matches any one of the characters inside the brackets.
   [abc] matches a single character a, b, or c
   [123] matches a single character 1, 2, or 3
   [a-z] matches one lowercase letter from a to z
   [a-zA-Z] matches any one English letter
   [0-9a-zA-Z] matches any one English letter or digit
   Note: however complex the bracket expression is, it always matches exactly one character.
   A ^ placed first inside [] negates the set, so the expression matches any character other than those listed. For example, to find lines containing oo that is not preceded by g, use '[^g]oo' as the search string. The ^ negates only when it appears at the start of []; in any other position inside [] it is an ordinary character.
   [^ab^c] matches any single character other than a, b, ^, or c.
5. * modifies the preceding character, indicating that it appears 0 or more times. For example, 'a*grep' matches lines in which zero or more a characters are immediately followed by grep. ".*" represents an arbitrary string.
6. \? modifies the preceding character, indicating that it appears 0 or 1 times. a\? matches 0 or 1 a.
7. \+ modifies the preceding character, indicating that it appears 1 or more times. a\+ matches 1 or more a.
8. \{n,m\} modifies the preceding character, indicating that it appears n to m times (n and m are integers, and n <= m). a\{3,5\} matches 3 to 5 consecutive a. \{n,m\} has two other forms:
   \{n\} exactly n consecutive occurrences of the preceding character
   \{n,\} at least n consecutive occurrences of the preceding character
9. \ escapes the single special character immediately following it, so that the character is treated literally. For example, ^\.[0-9][0-9] matches lines that start with a period followed by two digits.
   Examples:
   a* matches any number of consecutive a (including none)
   a\? matches 0 or 1 a
   a\+ matches 1 or more a
   a\{3,5\} matches 3 to 5 consecutive a
   \.* matches 0 or more consecutive periods
   \. matches a literal period
10. | means "or" (extended syntax, as in grep -E). For example, a|b|c matches a or b or c, and grep|sed matches grep or sed.
11. () groups several parts into one unit (extended syntax). For example, to search for glad or good, use 'g(la|oo)d'.

Comprehensive example 1:

1 Christian Scott lives here and would put on a Christmas party.
2 There are around 30 to 35 people invited.
3 They are:
4  Tom
5 Dan
6 Rhonda Savage
7 Nicky and Kimberly.
8 Steve, Suzanne, Ginger and Larry.

^[A-Z]..$ searches for lines that begin with an uppercase letter A to Z, followed by any two characters, followed by the end of the line. Line 5 is found (line 4, Tom, is indented and so does not start with a letter).
^[A-Z][a-z]*3[0-5] searches for lines that begin with an uppercase letter, followed by 0 or more lowercase letters, then the digit 3, then a digit from 0 to 5.
No matching line is found (changing the pattern to ^[A-Z][a-z]*.*3[0-5] finds line 2).
^ *[A-Z][a-z][a-z]$ searches for lines that begin with 0 or more spaces, followed by an uppercase letter, two lowercase letters, and the end of the line. Line 4 (Tom, where the whole line matches) and line 5 are found. Note that there is a space before the *.
^[A-Za-z]*[^,][A-Za-z]*$ searches for lines that consist of 0 or more uppercase or lowercase letters, followed by one character that is not a comma, followed by 0 or more uppercase or lowercase letters, up to the end of the line. Lines 4 and 5 are found.

Comprehensive example 2:

# ls -l /bin | grep '^...s'
The command above finds setuid files.
# ls -lR /usr | grep '^...s..s'
The command above finds files that are both setuid and setgid.

II. Using the grep command

grep (global search regular expression (RE) and print out the line) is a powerful text-search tool that uses regular expressions to search text and prints the matching lines.

Parameters:

1. -A NUM, --after-context=NUM prints, in addition to each matching line, the NUM lines that follow it. For example, the following searches file for lines containing panda and also shows the 1 line after each match:
   $ grep -A 1 panda file
2. -B NUM, --before-context=NUM is the counterpart of -A NUM: in addition to each matching line, it prints the NUM lines that precede it. For example, the following searches file for lines containing panda and also shows the 1 line before each match:
   $ grep -B 1 panda file
3. -C [NUM], -NUM, --context[=NUM] prints, in addition to each matching line, the NUM lines before and after it. NUM defaults to 2 in older grep versions; current GNU grep requires NUM to be given explicitly. For example, the following lists the lines of file containing panda together with the 2 lines above and below each (change the number to show more or fewer lines):
   $ grep -C 2 panda file
4. -c, --count does not print the matching lines and prints only the total number of lines that match.
   If -v (--invert-match) is added, it prints the total number of lines that do not match.
5. -i, --ignore-case ignores case differences.
6. -n, --line-number prints the line number in front of each matching line.
7. -v, --invert-match inverts the search, showing only the lines that do not match.
8. Exact match: when searching for the string "48", for example, the result also includes lines with other strings that contain "48", such as 484 and 483, when only lines with exactly 48 are wanted. An efficient way to get closer to an exact match with grep is to append \> to the string, which anchors the match at the end of a word:
   # grep '48\>' filename
   Since '48\>' still matches a word such as 148, add \< in front ('\<48\>') to anchor the start of the word as well.
9. -s suppresses error messages about files that do not exist or cannot be read. For example, grep "root" /etc/password prints an error message on the screen because the file password does not exist; with the -s switch, the error message is suppressed.

Using grep well is really a matter of writing good regular expressions, so not all of grep's features are covered here; the few examples below illustrate how regular expressions are written.

$ ls -l | grep '^d'
Filters the output of ls -l through a pipe, showing only the lines that begin with d.
$ grep 'test' d*
Displays all lines that contain test in the files whose names begin with d.
$ grep 'test' aa bb cc
Displays the lines that match test in the files aa, bb, and cc.
$ grep '[a-z]\{5,\}' aa
Displays all lines in aa that contain a run of at least 5 consecutive lowercase letters.
$ grep 't[ae]st' filename
Displays all lines that contain test or tast.
$ grep '\.$' filename
Displays all lines that end with a period.

III. Using the sed command

sed is a stream editor that processes one line of content at a time. The line currently being processed is stored in a temporary buffer called the pattern space; the sed commands are then applied to the contents of the buffer, and when processing is complete the contents of the buffer are sent to the screen.
Then the next line is processed, and this repeats until the end of the file. The file's contents do not change unless you use redirection to store the output.

Basic sed commands:

1. Replace: s command

1.1 Basic usage:
   sed 's/day/night/' <old >new
This example replaces the first occurrence of day on each line of the file old with night and writes the result to the file new. Its parts are:
   s           the substitute command
   /../../     the delimiters
   day         the search string
   night       the replacement string
The delimiter "/" can in fact be replaced with another symbol, such as ",", "|", or "_". For example,
   sed 's/\/usr\/local\/bin/\/common\/bin/' <old >new
is equivalent to
   sed 's_/usr/local/bin_/common/bin_' <old >new
Here "_" is clearly a much better delimiter than "/".

1.2 Using & to represent the matched string
Sometimes you want to add characters around or near the matched string. For example,
   sed 's/abc/(abc)/' <old >new
adds parentheses around the abc that is found. It can also be written as
   sed 's/abc/(&)/' <old >new
A more complex example:
   sed 's/[a-z]*/(&)/' <old >new
By default sed replaces only the first occurrence of the search string on each line; the g flag replaces all occurrences.
   $ sed 's/test/mytest/g' example
Replaces test with mytest throughout each line. Without the g flag, only the first matching test on each line is replaced with mytest.
   $ sed 's/^192.168.0.1/&localhost/' example
The & symbol stands for the matched part in the replacement string. Every line that starts with 192.168.0.1 has localhost appended to that match, giving 192.168.0.1localhost.
   $ sed 's#10#100#g' example
Whatever character follows the s command is taken as the delimiter, so here "#" is the delimiter instead of the default "/". The command replaces every 10 with 100.
If you need to make several modifications to the same file or line, use the -e option.
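
The substitution forms described above can be tried on a small throwaway file. This is a minimal sketch: the temporary file and its two lines are made up for illustration and stand in for the files old and example.

```shell
# Sample input; the file contents are made up for illustration.
sample=$(mktemp)
printf '%s\n' 'test one test two' '/usr/local/bin/tool' > "$sample"

# Without the g flag only the first "test" on each line is replaced.
first_only=$(sed 's/test/mytest/' "$sample")

# With the g flag every "test" on the line is replaced.
all_matches=$(sed 's/test/mytest/g' "$sample")

# "_" as the delimiter avoids escaping the slashes in the path.
new_path=$(sed 's_/usr/local/bin_/common/bin_' "$sample")

# & stands for the matched text, wrapped here in parentheses.
wrapped=$(sed 's/two/(&)/' "$sample")

printf '%s\n' "$first_only" "$all_matches" "$new_path" "$wrapped"
rm -f "$sample"
```

Each result is captured in a variable only so that the four variants can be compared side by side; in everyday use the sed command is run directly.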

Get eth0 network card IP address:
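
One possible way to do this with two -e expressions is sketched below. The ifconfig-style text is a hypothetical sample (real output differs between systems and versions), and the sketch parses that sample rather than calling ifconfig itself.

```shell
# Hypothetical ifconfig-style output; real output varies by system and version.
ifconfig_sample='eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:192.168.0.10  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::1/64 Scope:Link'

# The first -e strips everything up to "addr:"; the second strips from
# " Bcast" onward and prints the line only if that substitution succeeded.
ip_addr=$(printf '%s\n' "$ifconfig_sample" |
  sed -n -e 's/^.*addr://' -e 's/ *Bcast.*$//p')

echo "$ip_addr"
```

On a system whose ifconfig prints this format, the printf could be replaced with ifconfig eth0; other formats would need different patterns.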

2. Delete lines: d command

Delete all lines that contain "how" from a file
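
A minimal sketch of this deletion, using a made-up four-line file:

```shell
# Sample input; the lines are made up for illustration.
sample=$(mktemp)
printf '%s\n' 'hello world' 'how are you' 'fine thanks' 'show me' > "$sample"

# /how/d deletes every line that contains "how" ("show me" included).
kept=$(sed '/how/d' "$sample")

echo "$kept"
rm -f "$sample"
```

Because the pattern is not anchored, any line with "how" anywhere in it is removed; the file itself is left unchanged, since sed only writes to standard output here.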

Display the contents of /etc/passwd with line numbers while deleting lines 2 through 5

Note: the nl command numbers the lines of a file on Linux; it automatically adds a line number to each line of its output.

To delete only line 2, use nl /etc/passwd | sed '2d'; to delete from line 3 through the last line, use nl /etc/passwd | sed '3,$d'.
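
The line-address forms above can be sketched as follows. A seven-line temporary file with made-up contents stands in for /etc/passwd so that the result does not depend on the local system.

```shell
# Stand-in for /etc/passwd: seven made-up lines.
sample=$(mktemp)
printf '%s\n' root daemon bin sys sync games man > "$sample"

# Number the lines, then delete lines 2 through 5.
without_2_to_5=$(nl "$sample" | sed '2,5d')

# Delete only line 2.
without_2=$(nl "$sample" | sed '2d')

# Delete from line 3 through the last line ($ means the last line).
first_two=$(nl "$sample" | sed '3,$d')

printf '%s\n\n' "$without_2_to_5" "$without_2" "$first_two"
rm -f "$sample"
```

The numbers printed by nl make it easy to see which lines survived each command.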

3. Add lines: a command (adds after the specified line) or i command (adds before the specified line)

a can be followed by a string, which appears as a new line

Add a new line containing "XXXXX" after the second line of /etc/passwd

Add a new line containing "XXXXX" before the second line of /etc/passwd

To add more than one line at a time, end each added line except the last with a backslash \
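
A minimal sketch of the a and i commands on a made-up three-line file (a stand-in for /etc/passwd). It uses the portable form in which the command is followed by a backslash and the new text starts on the next line; GNU sed also accepts the one-line form sed '2a XXXXX'.

```shell
# Sample input; the lines are made up for illustration.
sample=$(mktemp)
printf '%s\n' 'line1' 'line2' 'line3' > "$sample"

# a: append a new line after line 2.
appended=$(sed '2a\
XXXXX' "$sample")

# i: insert a new line before line 2.
inserted=$(sed '2i\
XXXXX' "$sample")

# Several lines: every added line except the last ends with a backslash.
multi=$(sed '2a\
first added\
second added' "$sample")

printf '%s\n\n' "$appended" "$inserted" "$multi"
rm -f "$sample"
```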

4. Replace lines: c command

c can be followed by a string, which replaces the lines in the range n1,n2
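
As a small sketch of the c command, the following replaces lines 2 and 3 of a made-up four-line file with a single line. The replacement text is an arbitrary example.

```shell
# Sample input; the lines are made up for illustration.
sample=$(mktemp)
printf '%s\n' 'line1' 'line2' 'line3' 'line4' > "$sample"

# Replace the whole range 2,3 with one new line.
changed=$(sed '2,3c\
lines 2-3 replaced' "$sample")

echo "$changed"
rm -f "$sample"
```

The range is replaced as a unit: the new text appears once, not once per deleted line.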

5. Print: p command

sed '/north/p' datafile prints all lines by default, and lines containing north are printed twice

sed -n '/north/p' datafile suppresses the default output and prints only the lines containing north

nl /etc/passwd | sed -n '5,7p' lists only lines 5 through 7 of the /etc/passwd file
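
The difference that -n makes can be seen on a small made-up datafile; the contents below are illustrative only.

```shell
# Made-up datafile for illustration.
datafile=$(mktemp)
printf '%s\n' 'northwest NW' 'southeast SE' 'north NO' 'central CT' > "$datafile"

# Without -n every line is printed, and matching lines are printed twice.
with_default=$(sed '/north/p' "$datafile")

# With -n only the lines selected by p are printed.
only_matches=$(sed -n '/north/p' "$datafile")

# A line range works the same way: print lines 2 through 3 only.
line_range=$(sed -n '2,3p' "$datafile")

printf '%s\n\n' "$with_default" "$only_matches" "$line_range"
rm -f "$datafile"
```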

Note: the -i option of sed modifies the contents of the file directly (in place)

6. Extended:

There are three ways of calling sed:

1) Type the commands at the command line

2) Put the sed commands into a script file and then call sed

3) Put the sed commands into a script file and make the sed script executable.

A. The sed command-line format is:

sed [options] 'sed-commands' input-file

Remember to put single quotes around the actual commands when using sed at the command line. sed also allows double quotes.

B. The format for using a sed script file is:

sed [options] -f sed-script-file input-file

C. To use a sed script file whose first line names the sed command interpreter, the format is:

sed-script-file [options] input-file

Whether you use the shell command line or a script file, if no input file is specified sed reads from standard input, typically the keyboard or redirected input.

The sed options are as follows:

-f, --file=script-file names the sed script file to read commands from
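
Form B can be sketched as follows. The script contents and input lines are made up for illustration; the two temporary files stand in for a real script file and input file.

```shell
script=$(mktemp)
input=$(mktemp)

# A sed script file holds one command per line; no shell quoting is needed.
cat > "$script" <<'EOF'
s/day/night/g
/^#/d
EOF

printf '%s\n' 'good day' '# a comment line' 'day by day' > "$input"

# Form B from the text: sed [options] -f sed-script-file input-file
from_file=$(sed -f "$script" "$input")

# With no input file, sed reads standard input instead.
from_stdin=$(printf '%s\n' 'one day' | sed -f "$script")

printf '%s\n' "$from_file" "$from_stdin"
rm -f "$script" "$input"
```

Keeping the commands in a file avoids quoting problems on the command line and makes a longer edit sequence reusable.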

IV. The awk command

awk is also a data-processing tool. Whereas sed usually operates on a whole line at a time, awk prefers to split a line into several fields and process those.

The most basic function of the awk language is to break down and extract information from a file or string according to specified rules, or to output data according to specified rules.

There are three ways of calling awk.

1. Command-line mode

awk [-F field-separator] 'commands' input-files

[-F field-separator] is optional, because awk uses spaces or tabs as the default field separator. To process text whose fields are separated by spaces you do not need this option, but to process a file such as passwd, whose fields are separated by colons, you must give the -F option, for example: awk -F: 'commands' input-file.

commands are the actual awk commands, and input-files are the files to be processed.

Example: split each line into fields and print the first field of each line.

input-files can be a list of more than one file; awk processes each file in the list in turn.

In awk, each piece of a line in a file, as divided by the field separator, is called a field. Unless -F names a field separator, the default field separator is a space or tab.
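
A minimal sketch of field splitting with and without -F. The passwd-style records are made up; the real /etc/passwd differs from system to system.

```shell
# Made-up passwd-style records; the real /etc/passwd differs per system.
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$passwd_sample"

# -F: makes the colon the field separator; $1 is the first field.
first_fields=$(awk -F: '{print $1}' "$passwd_sample")

# Without -F, runs of spaces or tabs separate the fields.
second_word=$(printf 'alpha   beta\tgamma\n' | awk '{print $2}')

printf '%s\n' "$first_fields" "$second_word"
rm -f "$passwd_sample"
```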

2. Shell script mode

Put all the awk commands into a file, make the awk program executable, and name the awk command interpreter on the first line of the script so that the script can be invoked by typing its name.

This is like the first line of a shell script: #!/bin/sh is replaced by #!/bin/awk -f (the -f is needed so that awk reads the script file; the path to awk may differ between systems)

3. Put all the awk commands into a separate file, and then call:

awk -f awk-script-file input-files

The -f option loads the awk script in awk-script-file; input-files is the same as above.
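
The third calling method can be sketched like this. The script body and the passwd-style input are made up for illustration.

```shell
awk_script=$(mktemp)
input=$(mktemp)

# The script file contains only awk statements, with no shell quoting.
cat > "$awk_script" <<'EOF'
# print the first and the last field of every record
{ print $1, $NF }
EOF

printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$input"

# Method 3 from the text: awk -f awk-script-file input-files
result=$(awk -F: -f "$awk_script" "$input")

echo "$result"
rm -f "$awk_script" "$input"
```

Here -F and -f are combined: -F sets the field separator and -f names the file holding the program.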

awk patterns and actions

Every awk statement is made up of a pattern and an action (awk_pattern {actions}).
An awk script may contain many statements.

The pattern determines when the action is triggered; the action is the processing carried out on the data. If the pattern is omitted, the action is always executed, that is, it is performed for every input record without any matching.

A pattern can be any conditional statement, regular expression, and so on. The awk_pattern can be of several types:

1) A regular expression used as awk_pattern: /regexp/

For example: awk '/^[a-z]/' input_file

2) A Boolean expression used as awk_pattern; when the expression is true, the corresponding actions are executed.

① Expressions can use variables (such as the field variables $1, $2, and so on) and /regexp/.

② Operators in Boolean expressions:

Relational operators: < > <= >= == !=
Match operators: value ~ /regexp/ returns true if value matches /regexp/
value !~ /regexp/ returns true if value does not match /regexp/
For example: awk '$n > value {print "OK"}' input_file
awk '$n ~ /^d/ {print "OK"}' input_file
(here $n stands for a field variable such as $1, and value for a constant)

③ && (and) and || (or) can join two /regexp/ or Boolean expressions into a compound expression. ! (not) can be used in Boolean expressions or before /regexp/.

For example: awk '($1 < value1) && ($n > value2) {print "OK"}' input_file
awk '/^d/ || /x$/ {print "OK"}' input_file
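
The pattern types above can be tried on a small data file. This is only a sketch: the three records, the field numbers, and the comparison values are assumptions chosen for illustration, not values from the original examples.

```shell
# Made-up records: a name followed by two numbers.
data=$(mktemp)
printf '%s\n' 'd1 5 30' 'x2 15 10' 'd3 8 25' > "$data"

# Regular expression as the pattern; with no action the record is printed.
starts_with_d=$(awk '/^d/' "$data")

# Boolean expression with a relational operator.
big_second=$(awk '$2 > 10 {print $1}' "$data")

# Match operator applied to a single field.
field_match=$(awk '$1 ~ /^d/ {print "OK"}' "$data")

# && joins two Boolean expressions; ! negates a regular expression.
both=$(awk '($2 < 10) && ($3 > 26) {print $1}' "$data")
not_d=$(awk '!/^d/ {print $1}' "$data")

# || joins two regular expressions.
either=$(awk '/^x/ || /5$/ {print $1}' "$data")

printf '%s\n' "$starts_with_d" "$big_second" "$field_match" "$both" "$not_d" "$either"
rm -f "$data"
```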

The pattern includes two special patterns, BEGIN and END. Use the BEGIN statement to set up counters and print headers. The BEGIN statement runs before any text-processing action; the text-processing actions are then executed against the input text. The END statement runs after awk has finished processing the text and is used to print totals and end-of-output markers.
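
A short sketch of BEGIN and END around a per-record action. The item names and quantities are made up for illustration.

```shell
# Made-up records: an item name and a quantity.
data=$(mktemp)
printf '%s\n' 'apple 3' 'pear 5' 'plum 4' > "$data"

report=$(awk '
  BEGIN { print "name qty"; total = 0 }   # runs once, before any input is read
  { print $1, $2; total += $2 }           # runs for every input record
  END { print "total", total }            # runs once, after the last record
' "$data")

echo "$report"
rm -f "$data"
```

The header comes from BEGIN, the running sum is updated for each record, and the total is printed by END.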

The actual actions are given inside the curly braces {}. Most actions are for printing, but there are also longer constructs such as if statements, loop statements, and loop exits. If no action is specified, awk prints every record that matches the pattern.

When awk executes, the fields it reads are labeled $1, $2 ... $n. These are called field identifiers, and using them makes further processing of the fields easier.

Use $1, $3 to refer to the 1st and 3rd fields; note that the identifiers are separated by a comma. To print all the fields of a record that has 5 fields, you do not have to specify $1, $2, $3, $4, $5; you can use $0, which means all fields.

To print a field or all fields, use the print command. This is an awk action.
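
The field identifiers can be illustrated with a single five-field record; the record text is made up.

```shell
# A made-up record with five space-separated fields.
record='one two three four five'

# $1 and $3 select the 1st and 3rd fields; the comma in print inserts a space.
first_and_third=$(printf '%s\n' "$record" | awk '{print $1, $3}')

# $0 is the whole record, so all five fields are printed.
whole_record=$(printf '%s\n' "$record" | awk '{print $0}')

printf '%s\n' "$first_and_third" "$whole_record"
```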

How awk runs:

① If a BEGIN block exists, awk executes the actions it specifies.

② awk reads one line from the input file, called an input record. (If the input file is omitted, it reads from standard input.)

③ awk splits the record it has read into fields, putting the 1st field into the variable $1, the 2nd field into $2, and so on. $0 represents the entire record.

④ The current input record is compared in turn with the awk_pattern of each awk_cmd to see whether it matches. If it matches, the corresponding actions are executed; if not, they are skipped. This continues until all awk_cmd have been compared.

⑤ When the input record has been compared with all awk_cmd, awk reads the next line of input and repeats steps ③ and ④ until it reaches the end of the file.

⑥ When awk has read all the input lines, if an END block exists, its actions are executed.

Examples:

Example 1: Display the user name and login shell from the /etc/passwd file

Display each account in /etc/passwd and the shell for that account, with a tab between the account and the shell

Display the user name and login shell from the /etc/passwd file, with a comma between the account and the shell
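
Example 1 might be written along these lines. The sketch assumes the usual passwd layout, in which field 1 is the account name and field 7 the login shell, and it runs on made-up records rather than the real /etc/passwd.

```shell
# Made-up passwd-style records standing in for /etc/passwd.
passwd_sample=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$passwd_sample"

# Account and shell separated by a tab.
with_tab=$(awk -F: '{print $1 "\t" $7}' "$passwd_sample")

# Account and shell separated by a comma.
with_comma=$(awk -F: '{print $1 "," $7}' "$passwd_sample")

printf '%s\n' "$with_tab" "$with_comma"
rm -f "$passwd_sample"
```

To run it against the real file, pass /etc/passwd to awk in place of the temporary file.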

Note:

1. awk is followed by a pair of single quotes enclosing curly braces {}, which set the processing action to apply to the data.

2. The awk workflow is as follows: first the BEGIN block is executed; then the file is read one record at a time, with records split at newline characters; each record is divided into fields by the specified field separator and the fields are populated, with $0 representing all fields, $1 the first field, and $n the nth field.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.