Using the text processing tools sed and awk

Source: Internet
Author: User

I. The text processing trio

1. grep: a text filtering tool. For details, see the blog article.

2. sed (stream editor): a line editor that works well with regular expressions.

During processing, sed stores the current line in a temporary buffer called the pattern space. It applies the editing commands to the contents of the buffer, prints the buffer to the screen, and then moves on to the next line. This repeats until the end of the input.

Usage:

    sed [option]... 'address command' FILE...

For example:

    sed -i '/^[[:upper:]]/d' /etc/fstab    # delete the lines in /etc/fstab that start with an uppercase letter

① Addresses:

    start_line,end_line: a range of line numbers
    /pattern1/,/pattern2/: all lines from the first line matched by pattern1 through the next line matched by pattern2
    /pattern/ or a single line number: specific lines
    no address: the whole text

② Editing commands:

    p: print. By default sed already prints every line, so a line matched by the address is printed a second time.
    d: delete
    i \text: insert text above the matched line
    a \text: append text below the matched line
    r /path/to/somefile: read the specified file and append its contents below the matched line
    w /path/to/some_file: save the matched lines to the specified file
    =: print the line number
    s/pattern/replacement/[gi]: substitute. Another delimiter can be used, for example s@pattern@replacement@.
        g: global, replace every match on the line instead of only the first
        i: ignore case
    !COMMAND: apply the command to the lines that do not match the address

Options:

    -n: do not print the pattern space automatically
    -r: use extended regular expressions
    -i: edit the original file in place
    -e: run several editing operations in one invocation:
        sed -e 'address1 COMMAND1' -e 'address2 COMMAND2' ...

Notes:

① By default sed prints the pattern space. A line matched by the address is printed after it has been processed; a line that is not matched is printed unchanged rather than suppressed. Use the -n option to turn off this automatic printing.
② sed does not modify the original file by default. Use the -i option to edit the original file.

sed examples:

① Print only lines 2 to 5 of /etc/fstab.

    sed -n '2,5p' /etc/fstab

    [root@node2 ~]# sed -n '2,5p' /etc/fstab
    #
    # /etc/fstab
    # Created by anaconda on Thu Aug  6 04:30:38 2015
    #
    [root@node2 ~]# sed '2,5p' /etc/fstab

    #
    #
    # /etc/fstab
    # /etc/fstab
    # Created by anaconda on Thu Aug  6 04:30:38 2015
    # Created by anaconda on Thu Aug  6 04:30:38 2015
    #
    #
    # Accessible filesystems, by reference, are maintained under '/dev/disk'
    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
    #
    UUID=aa0330af-3681-428c-98e2-ccf2e6f0f686 /     ext4   defaults 1 1
    UUID=cc46ae66-0720-4c17-89de-ef0d5b72da60 /boot ext4   defaults 1 2
    UUID=24de94c2-8c13-4f50-aeb9-82d0b35d654f swap  swap   defaults 0 0
    tmpfs  /dev/shm tmpfs  defaults 0 0
    devpts /dev/pts devpts gid=5,mode=620 0 0
    sysfs  /sys     sysfs  defaults 0 0
    proc   /proc    proc   defaults 0 0
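The difference between the two runs above can be reproduced without touching /etc/fstab. The following is a minimal sketch, assuming a scratch file with six made-up lines (the file contents and the variable names `demo`, `quiet` and `noisy` are illustrative, not from the original article):

```shell
# Scratch file standing in for /etc/fstab (hypothetical contents).
demo=$(mktemp)
printf 'line1\nline2\nline3\nline4\nline5\nline6\n' > "$demo"

# With -n, automatic printing is off, so only the addressed lines 2..5 appear.
quiet=$(sed -n '2,5p' "$demo")

# Without -n, every line is printed once by default and lines 2..5 a second time.
noisy=$(sed '2,5p' "$demo")

printf 'with -n:\n%s\n\nwithout -n:\n%s\n' "$quiet" "$noisy"
rm -f "$demo"
```

Counting the lines of each result is a quick way to check whether -n was forgotten: the second run prints the four addressed lines twice.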

 

② Delete the lines starting with an uppercase letter from the original /etc/fstab file and insert "# modified" before line 3.

    sed -i -e '/^[[:upper:]]/d' -e '3i \# modified' /etc/fstab

    [root@node2 ~]# sed -i -e '/^[[:upper:]]/d' -e '3i \# modified' /etc/fstab
    [root@node2 ~]# cat /etc/fstab    # the original file has been modified

    #
    # modified
    # /etc/fstab
    # Created by anaconda on Thu Aug  6 04:30:38 2015
    #
    # Accessible filesystems, by reference, are maintained under '/dev/disk'
    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
    #
    tmpfs  /dev/shm tmpfs  defaults 0 0
    devpts /dev/pts devpts gid=5,mode=620 0 0
    sysfs  /sys     sysfs  defaults 0 0
    proc   /proc    proc   defaults 0 0
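To try the combination of two -e expressions safely, the same edit can be run against a scratch file and written to standard output instead of using -i. This is a sketch under assumptions: the five-line input is invented, and the insert uses the portable multi-line form of the i command rather than the one-line form shown above.

```shell
# Scratch input (hypothetical); line 4 starts with an uppercase letter.
demo=$(mktemp)
printf '#\n# demo\ntmpfs /dev/shm tmpfs defaults 0 0\nUUID=abc / ext4 defaults 1 1\nproc /proc proc defaults 0 0\n' > "$demo"

# First expression: delete lines that begin with an uppercase letter.
# Second expression: insert a comment before input line 3.
# Line numbers refer to the input, not to what is left after deletions.
result=$(sed -e '/^[[:upper:]]/d' -e '3i\
# modified' "$demo")

printf '%s\n' "$result"
rm -f "$demo"
```

Because -i is not used here, the scratch file itself is left unchanged; only the printed result shows the edit.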

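The s command and its g flag, described in the list of editing commands, are easiest to see on a single string. A small sketch, assuming an arbitrary sample path (the variable names are illustrative):

```shell
path='/etc/sysconfig/network'

# Without a flag, only the first match on the line is replaced.
first=$(echo "$path" | sed 's@/@:@')

# With g, every match on the line is replaced.
all=$(echo "$path" | sed 's@/@:@g')

# Using @ as the delimiter avoids escaping the slashes in the pattern:
# here the last path component is removed.
trimmed=$(echo "$path" | sed 's@/[^/]*$@@')

printf '%s\n%s\n%s\n' "$first" "$all" "$trimmed"
```

Any character that does not occur in the pattern or the replacement can serve as the delimiter; @ is simply the one the article uses.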
 

③ Pipe a path to sed and extract its parent directory. For example, the following command prints /etc/ (the parent directory /etc, with its trailing slash):

    echo '/etc/sysconfig/' | sed 's@[^/]\{1,\}/\?$@@'

3. awk

awk is a powerful programming tool for processing text on Linux/Unix. It goes further than grep and sed: it can format and display each field of each line separately, and it supports variables (built-in and user-defined), loops, conditions, arrays, and so on. Because of the way awk works, however, if grep or sed can do the job, prefer them over awk.

On many Linux distributions awk is actually gawk (GNU awk); to match user habits, /bin/awk is provided as a symbolic link to /bin/gawk.

Working mechanism: awk processes its input line by line. It assigns the content of the line to the variable $0 and splits it on the default or specified separator (stored in the variable FS). Each piece is stored in a variable, starting from $1, and can be referenced or formatted for output.

Usage:

    awk [options] 'program' file...
    awk [option]... 'PATTERN {action1; action2; ...}' FILE...

Common options:

    -F fs: specify the input field separator
    -v var_name=VALUE: define a custom variable
    -f scriptfile: load the awk program from the specified file:
        awk -f scriptfile FILE...

    [root@node2 ~]# cat awk.txt
    $3>=500{print $1,$3}
    $7~/nologin$/{print $1}
    [root@node2 ~]# awk -F: -f awk.txt /etc/passwd
    ...
    wittgenstein 500
    dhcpd
    apache
    tesla 501

1. PATTERN

    Line range: NR==start_line,NR==end_line (in awk a line number has to be compared against NR)
    /pattern1/,/pattern2/
    Specific lines: /pattern/
    Expressions: >, >=, ==, <, <=, !=, ~ (pattern match), !~
    BEGIN pattern: the action runs before the input stream is read. For example, in
        awk 'BEGIN{action2} PATTERN{action1}'
    action2 is executed before action1.
    END pattern: the action runs after all lines have been read from the input stream:
        awk 'PATTERN{action1} END{action2}'
    Empty PATTERN: the action applies to every line.

2. Common actions

    ① Expressions
    ② Control statements
    ③ Compound statements
    ④ Input statements
    ⑤ Output statements

3. Variables

① Built-in variables:

    NF: Number of Fields, the number of fields in the current line
    NR: Number of Records, the line number; lines of all files are counted together
    FNR: the line number, counted separately for each file
    FS: Field Separator, the field separator for input; whitespace by default. It can be set in two ways:
        awk -F: 'PATTERN{action}'
        awk 'BEGIN{FS=":"} PATTERN{action}'
    RS: Record Separator, the line separator for input
    OFS: Output Field Separator, the field separator for output; a space by default
    ORS: Output Record Separator, the line separator for output
    ARGV: an array that holds the awk command itself and its arguments. For example, with
        awk '{print $0}' 1.txt 2.txt
    ARGV[0] holds "awk", ARGV[1] holds "1.txt", ARGV[2] holds "2.txt", and ARGC is 3.
    ARGC: the total number of elements in ARGV (the awk command plus its arguments)
    FILENAME: the name of the file awk is currently processing

② Custom variables: -v var_name=VALUE

Variable names are case sensitive.
(1) Variables can be defined inside the program.
(2) Variables can be defined on the command line with the -v option.

Examples:

    awk 'BEGIN{a="hello"; print a}'
    awk -v a="hello" 'BEGIN{print a}'

★ To reference the value of a variable in awk, do not prefix it with $. A name that starts with $ refers to a field.

    [root@node2 ~]# awk 'BEGIN{FS=":";OFS=":"}{print $1,$7}' /etc/passwd
    root:/bin/bash
    bin:/sbin/nologin
    daemon:/sbin/nologin
    adm:/sbin/nologin
    lp:/sbin/nologin
    sync:/bin/sync
    shutdown:/sbin/shutdown
    halt:/sbin/halt
    ...

4. awk output

    print item1[, item2, ...]

Key points:
① Items are separated by commas; in the output they are separated by the output field separator.
② Each item can be a string or a number, a field of the current record, a variable, or an awk expression. Numbers are implicitly converted to strings before being output.
③ If the items after print are omitted, it is equivalent to print $0. To print an empty line, use print "".

5. The awk printf command

    printf format, item1, item2, ...

Key points:
① The format must be given; it specifies the output format of each item.
② No newline is added automatically; to end the line, put \n in the format.
③ Format specifiers start with % followed by one character:

    %c: display a character by its ASCII code
    %d, %i: decimal integer
    %e, %E: number in scientific notation
    %f: floating-point number; six decimal places by default, rounded
    %g, %G: number in scientific notation or floating-point format
    %s: string
    %u: unsigned integer
    %%: a literal %

Modifiers:

    #: display width (# stands for a number)
    -: left-align; the default is right-aligned
    +: display the sign of the number
    .#: precision

Examples:

    awk -F: '{printf "%15s,%-20s\n",$1,$7}' /etc/passwd

The input separator is a colon; the first and seventh fields are displayed with widths of 15 and 20, and the seventh field is left-aligned.

    awk 'BEGIN{printf "%15.2f\n",3.1415926}'

    [root@node2 ~]# awk -F: '{printf "%15s,%-20s\n",$1,$7}' /etc/passwd
               root,/bin/bash
                bin,/sbin/nologin
             daemon,/sbin/nologin
                adm,/sbin/nologin
                 lp,/sbin/nologin
               sync,/bin/sync
           shutdown,/sbin/shutdown
               halt,/sbin/halt
    ...
    [root@node2 ~]# awk 'BEGIN{printf "%15.2f\n",3.1415926}'
               3.14

6. awk output redirection

    print items > output-file
    print items | command

Special files:

    /dev/stdin: standard input
    /dev/stdout: standard output
    /dev/stderr: standard error

7. awk operators

Arithmetic operators:

    x+y
    x-y
    x*y
    x/y
    x**y, x^y (x to the power of y)
    x%y
    -x: negation
    +x: convert to a number

String operator: concatenation (written by placing the values next to each other)

Assignment operators: =, +=, -=, *=, /=, %=, ^=, **=, ++, -- (if a pattern itself is an = sign, write it as /[=]/)

Comparison operators: <, <=, >, >=, ==, !=, ~ (pattern match: true if the string on the left is matched by the pattern on the right, otherwise false), !~

Logical operators: &&, ||

■ Conditional expression:

    selector ? if-true-expression : if-false-expression

Example:

    # awk -F: '{$3>=500?utype="common user":utype="admin or system user";print $1,"is",utype}' /etc/passwd

■ Function call:

    function_name(argu1, argu2)

    [root@node2 ~]# awk -F: '{$3>=500?utype="common user":utype="admin or system user";print $1,"is",utype}' /etc/passwd
    ...
    tcpdump is admin or system user
    wittgenstein is common user
    mysql is admin or system user
    dhcpd is admin or system user
    apache is admin or system user
    tesla is common user

8. Control statements

① if-else

Format: if (condition) {then body} else {else body}

    # awk -F: '{if($3>=500){print $1,"is a common user"}else{print $1,"is an admin or system user"}}' /etc/passwd
    # awk '{if(NF>=8){print}}' /etc/inittab

② while

Format: while (condition) {while body}

    # awk '{i=1;while(i<=NF){printf "%s ",$i;i+=2};print ""}' /etc/inittab
    # awk '{i=1;while(i<=NF){if(length($i)>=6){print $i};i++}}' /etc/inittab

The length() function returns the length of a string.

③ do-while

Format: do {do-while body} while (condition)

④ for

Format: for (variable assignment; condition; iteration) {for body}

    # awk '{for(i=1;i<=NF;i+=2){printf "%s ",$i};print ""}' /etc/inittab    # display the odd-numbered fields of each line

Note: print ends the line automatically, while printf does not.

    # awk '{for(i=1;i<=NF;i++){if(length($i)>=6){print $i}}}' /etc/inittab

A for loop can also traverse the elements of an array:

    for (i in array) {for body}

⑤ case

Format: switch (expression) {case VALUE or /REGEXP/: statement1; ... default: statementN}

The switch statement is a gawk extension.

⑥ Loop control

    break
    continue

⑦ next: stop processing the current line early and move on to the next line.

    # awk -F: '{if($3%2==0){next};print $1,$3}' /etc/passwd    # display the users with an odd UID
    # awk -F: '{if(NR%2==0){next};print NR,$1}' /etc/passwd    # display the users on odd-numbered lines

    [root@node2 ~]# awk '{if(NF>=8){print}}' /etc/inittab    # display the lines with at least 8 fields
    # inittab is only used by upstart for the default runlevel.
    # adding other configuration here will have no effect on your system.
    # Terminal gettys are handled by /etc/init/tty.conf and /etc/init/serial.conf,
    # For information on how to write upstart event handlers, or how
    # upstart works, see init(5), init(8), and initctl(8).
    ...
    [root@node2 ~]# awk '{for(i=1;i<=NF;i+=2){printf "%s ",$i};print ""}' /etc/inittab
    # is used upstart the runlevel.
    # other here have effect your
    # initialization started /etc/init/rcS.conf
    #
    # runlevels started /etc/init/rc.conf
    ...
    [root@node2 ~]# awk -F: '{if($3%2==0){next};print $1,$3}' /etc/passwd
    bin 1
    adm 3
    sync 5
    halt 7
    operator 11
    gopher 13
    nobody 99
    limit 81
    ...
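The way BEGIN, a main rule with a pattern, and END fit together with FS, OFS and NR can be tried on a small passwd-style file. The following is only a sketch: the three sample records, the " -> " output separator and the variable names are assumptions made for illustration.

```shell
# Three made-up records in /etc/passwd format.
demo=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\nalice:x:500:500::/home/alice:/bin/bash\n' > "$demo"

# BEGIN runs once before any input is read and sets the separators.
# The middle rule runs for each line whose third field is at least 500.
# END runs once after the last line; NR then holds the total line count.
report=$(awk 'BEGIN { FS = ":"; OFS = " -> " }
              $3 >= 500 { print $1, $7 }
              END { print NR " lines read" }' "$demo")

# The same separator set with -F, combined with a regular expression match.
nologin=$(awk -F: '$7 ~ /nologin$/ { print $1 }' "$demo")

printf '%s\n%s\n' "$report" "$nologin"
rm -f "$demo"
```

Setting FS in BEGIN and passing -F on the command line are interchangeable for this purpose; the BEGIN form keeps everything inside the program text.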

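The width, alignment and precision modifiers from section 5 are hard to see when the padding is invisible. As a sketch, the formats below are wrapped in square brackets (an addition for illustration only) so that the padding becomes visible:

```shell
# %15s right-aligns in 15 columns; %-20s left-aligns in 20 columns.
padded=$(awk 'BEGIN { printf "[%15s][%-20s]\n", "root", "/bin/bash" }')

# %15.2f: total width 15, two digits after the decimal point.
rounded=$(awk 'BEGIN { printf "[%15.2f]\n", 3.1415926 }')

# printf adds no newline of its own, so consecutive calls stay on one line.
joined=$(awk 'BEGIN { printf "a"; printf "b"; print "" }')

printf '%s\n%s\n%s\n' "$padded" "$rounded" "$joined"
```

The final print "" in the last command supplies the newline that printf leaves out.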
 

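The control statements from section 8 can be exercised on a few lines of made-up data instead of /etc/passwd and /etc/inittab. A minimal sketch, assuming three invented passwd-style records and a five-word sample line:

```shell
demo=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\nalice:x:500:500::/home/alice:/bin/bash\n' > "$demo"

# if-else: classify each user by UID.
kinds=$(awk -F: '{ if ($3 >= 500) { print $1, "is a common user" }
                   else { print $1, "is an admin or system user" } }' "$demo")

# next: skip lines with an even UID, so only odd UIDs reach print.
odd=$(awk -F: '{ if ($3 % 2 == 0) { next }; print $1, $3 }' "$demo")

# for: print every second field; printf leaves a trailing space,
# and print "" ends the line.
fields=$(echo 'a b c d e' | awk '{ for (i = 1; i <= NF; i += 2) { printf "%s ", $i }; print "" }')

printf '%s\n%s\n%s\n' "$kinds" "$odd" "$fields"
rm -f "$demo"
```

Note that next abandons the current line entirely, so any rules written after it are skipped for that line as well.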
9. Arrays

    array[index-expression]

index-expression: any string can be used. If a referenced array element does not exist yet, awk creates it automatically and initializes it to an empty string. Therefore, to test whether an element exists in an array, the "index in array" form must be used.

    A["first"]="hello awk"
    print A["second"]

To traverse every element of an array, use this special form:

    for (var in array) {for body}

Note: var iterates over the indexes of the array, not over the values of its elements.

Deleting an array element:

    delete array[index]

Examples:

    # netstat -tan | awk '/^tcp/{state[$NF]++}END{for(s in state){print s,state[s]}}'    # count the current TCP connections by state
    # awk '{ip[$1]++}END{for(i in ip){print i,ip[i]}}' /var/log/httpd/access_log

    [root@node2 ~]# netstat -tan | awk '/^tcp/{state[$NF]++}END{for(s in state){print s,state[s]}}'
    ESTABLISHED 3
    LISTEN 14

10. awk built-in functions

① split(string, array[, fieldsep[, seps]])

Splits string using fieldsep as the separator and stores the pieces in the array named array. Array subscripts start from 1. The function returns the number of elements produced by the split.

Examples:

    # awk 'BEGIN{split("root:x:0:0",user,":");print user[1]}'
    # netstat -tn | awk '/^tcp/{lens=split($5,client,":");ip[client[lens-1]]++}END{for(i in ip)print i,ip[i]}'    # display the IP addresses of remote clients connected over TCP and their connection counts

② length(string)

Returns the length of the given string.

③ substr(string, start[, length])

Returns the substring of string that begins at position start and is length characters long.

    [root@node2 ~]# awk 'BEGIN{split("root:x:0:0",user,":");print user[1]}'
    root
    [root@node2 ~]# netstat -tn | awk '/^tcp/{lens=split($5,client,":");ip[client[lens-1]]++}END{for(i in ip)print i,ip[i]}'
    192.168.30.1 3

awk examples:

① Display the names of the groups whose GID is at least 500:

    awk -F: '$3>=500{print $1}' /etc/group

② Display the users whose shell ends in nologin:

    awk -F: '$7~/nologin$/{print $1}' /etc/passwd

③ Display the settings in the configuration file of the eth0 NIC, values only:

    awk -F= '{print $2}' /etc/sysconfig/network-scripts/ifcfg-eth0

④ Display the names of the kernel parameters defined in /etc/sysctl.conf:

    awk -F= '/^[^#]/{print $1}' /etc/sysctl.conf

⑤ Display the IP address of the eth0 NIC:

    ifconfig eth0 | awk -F: '/inet addr/{print $2}' | awk '{print $1}'
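The counting idiom used with netstat and the access log in section 9 can be tried without either. This sketch assumes three invented log lines and pipes the result through sort, because the order in which for (var in array) visits the indexes is not defined:

```shell
# Count how often each first field occurs (hypothetical client addresses).
counts=$(printf '10.0.0.1 GET /\n10.0.0.2 GET /a\n10.0.0.1 GET /b\n' |
  awk '{ hits[$1]++ } END { for (ip in hits) print ip, hits[ip] }' | sort)

# "index in array" tests for existence without creating the element,
# so the array still has exactly one element afterwards.
exists=$(awk 'BEGIN { a["first"] = "hello awk"
                      if ("second" in a) print "yes"; else print "no"
                      n = 0; for (k in a) n++
                      print n }')

printf '%s\n%s\n' "$counts" "$exists"
```

Had the test been written as a comparison on a["second"] instead, the reference itself would have created an empty element and the count would be 2.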

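The three built-in functions from section 10 can be checked one at a time. A short sketch, assuming a made-up address:port string in place of real netstat output:

```shell
# split returns the number of pieces; subscripts start at 1.
parts=$(awk 'BEGIN { n = split("root:x:0:0", user, ":"); print n, user[1] }')

# Taking the piece before the last one yields the address part
# of an address:port pair (hypothetical value).
host=$(echo '10.0.0.5:52311' | awk '{ n = split($1, client, ":"); print client[n-1] }')

# length counts characters; substr(s, 7, 3) takes 3 characters from position 7.
text=$(awk 'BEGIN { s = "hello awk"; print length(s), substr(s, 7, 3) }')

printf '%s\n%s\n%s\n' "$parts" "$host" "$text"
```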