[Text Processing] Using awk and sed (being updated)

Source: Internet
Author: User

I. Regular Expression Introduction

Basic metacharacters (basic regular expressions):

Character matching:
    .        matches any single character except a line break
    []       character group; metacharacters inside [] lose their special meaning and do not need escaping
    [^]      matches any character that is not in the character group

Repetition:
    *        matches the preceding character zero or more times
    \?       zero or one time
    \{m,n\}  at least m times, at most n times
    \{m,\}   at least m times

Anchors:
    \<, \b   start of a word
    \>, \b   end of a word
    ^        start of line
    $        end of line
    ^$       empty line
    .*       any string

Grouping:
    \(\)     group
    \1, \2   back-references; \1 is the content matched by the first pair of brackets

Extended metacharacters (sed -r, extended regular expressions):
    ? {m,n} () need no \ escape; \1, \2 are written the same way
    +        matches the preceding character one or more times
    |        or
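As a small illustration of the difference between the basic and extended syntax listed above, the sketch below extracts the user name from a passwd-style line twice. The sample line and the variable names are assumptions made for the example, and -E is used as the more portable spelling of the -r option mentioned above.

```shell
# Sample passwd-style line (illustrative data).
line='oracle:x:500:500::/home/oracle:/bin/bash'

# Basic syntax: grouping and intervals need backslashes.
bre=$(printf '%s\n' "$line" | sed 's/^\([a-z]\{1,\}\):.*$/\1/')

# Extended syntax: the same pattern without backslashes, using +.
ere=$(printf '%s\n' "$line" | sed -E 's/^([a-z]+):.*$/\1/')

echo "$bre"
echo "$ere"
```

Both commands keep only the text captured by the first group, so the two results should be identical.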

In testing, sed does not appear to support lazy (non-greedy) matching.

For details, see the 30-minute regular expression tutorial:

http://www.cnblogs.com/deerchao/archive/2006/08/24/zhengzhe30fengzhongjiaocheng.html
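Without lazy matching, a common workaround is a negated character group that cannot run past the first closing delimiter. A minimal sketch, using an assumed sample string:

```shell
s='<a><b>'

# Greedy: .* runs to the last '>' so the whole string is replaced.
greedy=$(printf '%s\n' "$s" | sed 's/<.*>/X/')

# Bounded: [^>]* cannot cross a '>', so only the first tag is replaced.
bounded=$(printf '%s\n' "$s" | sed 's/<[^>]*>/X/')

echo "$greedy"    # X
echo "$bounded"   # X<b>
```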



II. awk usage


1. Reference external variables

Format: awk -v variable="$variable" 'BEGIN{print variable}'. A variable passed this way can be referenced in all three places: the BEGIN block, the main body, and the END block.

    $ time=`date +%s`
    $ echo $time
    1404716831
    $ awk -v d=$time 'BEGIN{print d}'
    1404716831
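To see that a -v variable is visible in all three parts of a program, the following sketch passes one shell value and prints it from BEGIN, from the main body, and from END. The names greeting and g, and the two input lines, are assumptions for the example.

```shell
greeting="hello"

# g is set before BEGIN runs, so every block can read it.
out=$(printf 'a\nb\n' | awk -v g="$greeting" '
    BEGIN { print "begin:" g }
          { print $0 ":" g }
    END   { print "end:" g }')

echo "$out"
```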


2. Internal Variables

FS: the input field separator. The default is a single space, which makes awk split on runs of spaces and tabs. Regular expressions are supported; the example below uses a [] character group as the separator.

NF: the number of fields in the current record. $0 is the entire record, $1 is the first field, and $NF is the last field.

    $ cat b
    oracle:x:500:500::/home/oracle:/bin/bash
    $ awk 'BEGIN{FS="[:/]"}{for (i=1;i<=NF;i++){print $i}}' b
    oracle
    x
    500
    500
    
    
    home
    oracle
    
    bin
    bash

The three empty lines are the empty fields between adjacent separators (::, :/ and :/).
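NF and $NF can also be checked on the same sample line with a single-character separator. A minimal sketch; the variable names are assumptions:

```shell
line='oracle:x:500:500::/home/oracle:/bin/bash'

# With ":" as the separator the line has 7 fields;
# the empty field between "::" is counted as well.
summary=$(printf '%s\n' "$line" | awk -F: '{print NF, $1, $NF}')

echo "$summary"   # 7 oracle /bin/bash
```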


NR: the number of the current record, counted across all input files.

FNR: the number of the current record within the current file.

    $ cat c d
    c1
    c2
    c3
    c4
    d5
    d6
    d7
    d8
    $ awk 'BEGIN{printf("%5s%5s%5s\n","data"," NR"," FNR")}{printf("%5s%5s%5s\n",$0,NR,FNR)}' c d
     data   NR  FNR
       c1    1    1
       c2    2    2
       c3    3    3
       c4    4    4
       d5    5    1
       d6    6    2
       d7    7    3
       d8    8    4

NR and FNR are often used when processing multiple files. For example, the condition NR==FNR is true only while the first file is being read, so combined with next it lets one block handle the first file and a later block handle the second.

next: compare the two commands below. Without next, after a record is read and the first {} has run, the second {} runs as well, so every record of the first file is printed twice.

With next, the first {} ends by skipping all remaining statements and reading a new record, which is then matched again from the top of the program.



    $ awk 'NR==FNR{print $0;next}{print $0}' c d
    c1
    c2
    c3
    c4
    d5
    d6
    d7
    d8
    $ awk 'NR==FNR{print $0}{print $0}' c d
    c1
    c1
    c2
    c2
    c3
    c3
    c4
    c4
    d5
    d6
    d7
    d8
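A typical use of NR==FNR with next is a two-file lookup: the first file fills an array, and the second file is filtered against it. The sketch below is one way to write that; the file names keep and data, their contents, and the array name seen are assumptions for the example.

```shell
tmp=$(mktemp -d)
printf 'c2\nc4\n'         > "$tmp/keep"   # keys to look for
printf 'c1\nc2\nc3\nc4\n' > "$tmp/data"   # records to filter

# First file: remember each key, then skip to the next record.
# Second file: print only records whose first field was remembered.
matched=$(awk 'NR==FNR { seen[$1]; next } $1 in seen' "$tmp/keep" "$tmp/data")

echo "$matched"
rm -rf "$tmp"
```

Note that if the first file is empty, NR==FNR is also true while the second file is read, so this idiom assumes a non-empty first file.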


OFS: the output field separator. The default is a single space.

ORS: the output record separator. The default is a line break.

Here -v is used to pass the FS and OFS values:

    $ cat b
    oracle:x:500:500::/home/oracle:/bin/bash
    $ awk -v FS=":" -v OFS="=" '{print $1,$2,$3}' b
    oracle=x=500
    $ awk -v FS=":" '{print $1,$2,$3}' b
    oracle x 500

Setting ORS replaces the line break after each printed record:

    $ awk '1' b
    oracle:x:500:500::/home/oracle:/bin/bash
    root:x:500:500::/home/oracle:/bin/bash
    hxw168:x:500:500::/home/oracle:/bin/bash
    $ awk 'BEGIN{ORS="***";}{print $0}' b
    oracle:x:500:500::/home/oracle:/bin/bash***root:x:500:500::/home/oracle:/bin/bash***hxw168:x:500:500::/home/oracle:/bin/bash***
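One detail worth keeping in mind is that OFS only appears when awk joins fields itself, either in print $1,$2 or after a field has been assigned. A minimal sketch with an assumed sample line:

```shell
line='oracle:x:500'

# print with no arguments emits $0 unchanged, so OFS has no effect.
plain=$(printf '%s\n' "$line" | awk -F: -v OFS='-' '{print}')

# Assigning to a field makes awk rebuild $0 with OFS between fields.
rebuilt=$(printf '%s\n' "$line" | awk -F: -v OFS='-' '{$1=$1; print}')

echo "$plain"     # oracle:x:500
echo "$rebuilt"   # oracle-x-500
```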


RS: the input record separator. The default is a line break.

In the example below, RS is set to ***, which does not occur in the file, so the entire file content is read into $0 as a single record.

    $ cat e
    [a]
    name=1
    sex=2
    age=3
    [b]
    address=gd sd
    $ awk -v RS="***" '{print $0,"-----"NR}' e
    [a]
    name=1
    sex=2
    age=3
    [b]
    address=gd sd
     -----1
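For contrast, when RS is a character that does occur in the input, each piece between separators becomes its own record. A minimal sketch, assuming a comma-separated input string:

```shell
# Input has no trailing line break, so the last record is exactly "c".
records=$(printf 'a,b,c' | awk -v RS=',' '{print NR ": " $0}')

echo "$records"
```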


3. Array

Arrays can be used directly without being declared in advance. An element that has never been assigned evaluates to 0 or to the empty string, depending on context.


split function: splits a string into pieces using the specified separator, stores the pieces in an array, and returns the number of pieces.
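The following sketch shows the implicit initialization: the same kind of unset element acts as 0 in arithmetic and as an empty string in concatenation. The array names count and label are assumptions for the example.

```shell
out=$(awk 'BEGIN {
    n = count["missing"] + 1          # unset element used as a number: 0 + 1
    s = "[" label["missing"] "]"      # unset element used as a string: ""
    print n, s
}')

echo "$out"   # 1 []
```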


Common array loop:

for (subscript in array)    # elements are visited in no particular order
{
    print subscript, array[subscript];
}

For example:

    $ cat b
    oracle:500:home:oracle:/bin/bash
    $ awk '{i=split($0,a,":");for(x in a){print x,a[x]};}' b
    4 oracle
    5 /bin/bash
    1 oracle
    2 500
    3 home
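The for-in loop above visits the subscripts in no particular order. Where order matters, the count returned by split can drive a numeric loop instead. A minimal sketch using the same sample string:

```shell
fields=$(awk 'BEGIN {
    # split returns the number of pieces; subscripts run from 1 to n.
    n = split("oracle:500:home:oracle:/bin/bash", a, ":")
    for (i = 1; i <= n; i++) print i, a[i]
}')

echo "$fields"
```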


Common example:

Count the number of SSH logins per IP address

    $ cat secure.3
    Jun 16 10:04:47 localhost sshd[5360]: Accepted password for root from 10.9.11.44 port 7520 ssh2
    Jun 16 10:04:47 localhost sshd[5360]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 16 12:21:27 localhost sshd[5360]: pam_unix(sshd:session): session closed for user root
    Jun 17 16:23:53 localhost sshd[9174]: Accepted password for root from 10.9.11.44 port 6651 ssh2
    Jun 17 16:23:53 localhost sshd[9174]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 17 19:00:09 localhost sshd[9174]: pam_unix(sshd:session): session closed for user root
    Jun 18 09:22:33 localhost sshd[11487]: Accepted password for root from 10.9.11.44 port 58455 ssh2
    Jun 18 09:22:33 localhost sshd[11487]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 18 18:30:56 localhost sshd[11487]: pam_unix(sshd:session): session closed for user root
    Jun 19 15:23:23 localhost sshd[16970]: Accepted password for root from 10.9.11.44 port 48345 ssh2
    Jun 19 15:23:23 localhost sshd[16970]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 19 18:59:00 localhost sshd[16970]: pam_unix(sshd:session): session closed for user root
    Jun 20 09:24:57 localhost sshd[19425]: Accepted password for root from 10.9.11.44 port 5519 ssh2
    Jun 20 09:24:57 localhost sshd[19425]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 20 19:14:30 localhost sshd[19425]: pam_unix(sshd:session): session closed for user root
    Jun 21 08:59:13 localhost sshd[22674]: Accepted password for root from 10.9.11.44 port 4640 ssh2
    Jun 21 08:59:13 localhost sshd[22674]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 21 15:23:28 localhost sshd[22674]: subsystem request for sftp
    Jun 21 18:51:04 localhost sshd[22674]: pam_unix(sshd:session): session closed for user root
    $ cat secure.3 | awk '/Accepted/{a[$(NF-3)]++}END{for(x in a){print x,a[x]}}'
    10.9.11.44 6
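The same counting pattern can be tried on a few inline lines with more than one address. In the sketch below the second address 10.9.11.45 is hypothetical data added for illustration, and the result is piped through sort because for-in does not guarantee an order.

```shell
# Four sample log lines; 10.9.11.45 is made up for the example.
log='Jun 16 10:04:47 localhost sshd[5360]: Accepted password for root from 10.9.11.44 port 7520 ssh2
Jun 16 10:04:47 localhost sshd[5360]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 17 16:23:53 localhost sshd[9174]: Accepted password for root from 10.9.11.45 port 6651 ssh2
Jun 18 09:22:33 localhost sshd[11487]: Accepted password for root from 10.9.11.44 port 58455 ssh2'

# $(NF-3) is the fourth field from the end, which holds the address
# on "Accepted" lines; the array counts one login per occurrence.
counts=$(printf '%s\n' "$log" |
    awk '/Accepted/ { a[$(NF-3)]++ } END { for (ip in a) print ip, a[ip] }' |
    sort)

echo "$counts"
```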


This article comes from the "even if it is wrong, let me make a mistake to death!" blog. Please keep this source link: http://hxw168.blog.51cto.com/8718136/1435310
