Shell Text Filtering Programming (1): grep and Regular Expressions

Last Update: 2014-11-07 Source: Internet

Author: User

Copyright notice: when reprinting, please keep the source: blog.csdn.net/gentleliu. Mail: shallnew at 163 dot com

A Linux system contains many files: configuration files, log files, user files, and so on. These files hold a great deal of information, and commands such as cat make it easy to print a file to the screen, but parsing a file or extracting specific data from it requires other tools. Linux provides them: grep, awk, sed, and others. These tools can greatly improve your productivity. They help system administrators analyze data, and they save Linux developers time during development, testing, and everyday work. This series describes how to use them for text filtering and analysis.

The commonly used grep options are:
-c outputs only the count of matching lines.
-i ignores the distinction between uppercase and lowercase.
-h suppresses the file name prefix in the output when searching multiple files.
-l outputs only the names of the files that contain a match when searching multiple files.
-n displays matching lines together with their line numbers.
-s suppresses error messages about files that do not exist or cannot be read.
-v displays all lines that do not contain a match.
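The option list above can be tried on a pair of throwaway files. This is a minimal bash sketch, assuming bash, mktemp, and a grep with these standard options (GNU grep or a BSD grep); the files a.txt and b.txt and their contents are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: exercise the common grep options on two hypothetical files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'alpha user\nbeta\nGamma User\n' > "$dir/a.txt"
printf 'delta\nepsilon\n' > "$dir/b.txt"

count=$(grep -c 'user' "$dir/a.txt")                # -c: case-sensitive count
icount=$(grep -ci 'user' "$dir/a.txt")              # -i: also counts "User"
files=$(grep -l 'user' "$dir/a.txt" "$dir/b.txt")   # -l: only matching file names
plain=$(grep -h 'beta' "$dir/a.txt" "$dir/b.txt")   # -h: no "file:" prefix
numbered=$(grep -n 'Gamma' "$dir/a.txt")            # -n: "line:content"
inverted=$(grep -vi 'user' "$dir/a.txt")            # -v: lines without user/User

status=0
grep -s 'x' "$dir/missing.txt" || status=$?         # -s: message hidden, status still 2

echo "count=$count icount=$icount"
echo "files=$files"
echo "plain=$plain numbered=$numbered inverted=$inverted"
echo "exit status for a missing file: $status"
```

Note that -s only silences the message: grep still exits with status 2, so a script can still detect the missing file.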
Most of the examples in this article use the file /etc/passwd as the filter target.
1. Matching lines
The simplest (and most common) use is to find a string in one or more files. For example, to find the lines in /etc/passwd that contain the string "user":

    # grep "user" /etc/passwd
    usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
    oprofile:x:16:16:Special user account to be used by OProfile:/var/lib/oprofile:/sbin/nologin
    qemu:x:107:107:qemu user:/:/sbin/nologin
    radvd:x:75:75:radvd user:/:/sbin/nologin
    tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
    saslauth:x:994:76:"Saslauthd user":/run/saslauthd:/sbin/nologin
    rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
    nm-openconnect:x:992:991:NetworkManager user for OpenConnect:/:/sbin/nologin

In general, the search string is enclosed in double quotes, for two reasons: first, so that the shell does not misinterpret it, and second, so that a string containing several words is passed to grep as a single pattern.
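To see why the quotes matter, the sketch below runs the same two-word search with and without them. It assumes bash and mktemp; the file name sample and its contents are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: why a multi-word search string must be quoted.
# The file "sample" and its contents are hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'special user account\nuser\n' > "$dir/sample"
cd "$dir" || exit 1

# Quoted: "special user" reaches grep as ONE pattern.
quoted=$(grep "special user" sample)

# Unquoted: the shell splits it into two words, so grep searches for
# "special" in two files: "user" (which does not exist) and "sample".
status=0
unquoted=$(grep special user sample 2>/dev/null) || status=$?

echo "quoted:   $quoted"
echo "unquoted: $unquoted (exit status $status)"
```

Without quotes, grep prefixes its match with sample: (it now believes it is searching two files) and exits with status 2 because the file user does not exist.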
Use the -c option to output the number of matching lines:

    # grep -c "user" /etc/passwd
    8

Use the -n option to output matching lines together with their line numbers:

    # grep -n "user" /etc/passwd
    18:usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
    19:oprofile:x:16:16:Special user account to be used by OProfile:/var/lib/oprofile:/sbin/nologin
    22:qemu:x:107:107:qemu user:/:/sbin/nologin
    25:radvd:x:75:75:radvd user:/:/sbin/nologin
    27:tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
    30:saslauth:x:994:76:"Saslauthd user":/run/saslauthd:/sbin/nologin
    34:rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
    38:nm-openconnect:x:992:991:NetworkManager user for OpenConnect:/:/sbin/nologin

The first column is the line number, followed by the content of the line.
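The -c and -n options can be checked against a small passwd-style file. A minimal sketch, assuming bash, mktemp and paste; the entries are hypothetical, not taken from a real system.

```shell
#!/usr/bin/env bash
# Sketch: -c and -n on a small passwd-style file (hypothetical entries).
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
root:x:0:0:root:/root:/bin/bash
qemu:x:107:107:qemu user:/:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
tss:x:59:59:Account used by the trousers package:/dev/null:/sbin/nologin
allen:x:1000:1000:allen:/home/allen:/bin/bash
EOF

count=$(grep -c 'user' "$f")     # number of matching lines, not of matches
# -n prefixes each line with "N:"; cut keeps only the number.
numbers=$(grep -n 'user' "$f" | cut -d: -f1 | paste -sd ' ' -)

echo "count: $count"
echo "matching line numbers: $numbers"
```

Because -c counts matching lines, a line that contains the string several times is still counted once.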
Notice that the matches for "user" above also include the lines containing "rpcuser" and "trousers". To match only the whole word "user", use:

    # grep -n "\<user\>" /etc/passwd
    18:usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
    19:oprofile:x:16:16:Special user account to be used by OProfile:/var/lib/oprofile:/sbin/nologin
    22:qemu:x:107:107:qemu user:/:/sbin/nologin
    25:radvd:x:75:75:radvd user:/:/sbin/nologin
    30:saslauth:x:994:76:"Saslauthd user":/run/saslauthd:/sbin/nologin
    38:nm-openconnect:x:992:991:NetworkManager user for OpenConnect:/:/sbin/nologin

Lines 27 and 34 have now dropped out of the results.
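The difference between a substring match and a whole-word match can be reproduced on three of the lines above. This sketch assumes GNU grep, where \< and \> are word-boundary anchors and -w is the option form of the same idea; the sample lines are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: substring match vs whole-word match (hypothetical passwd-style lines).
# Assumes GNU grep: \< and \> anchor word boundaries; -w does the same as an option.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
qemu:x:107:107:qemu user:/:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
tss:x:59:59:Account used by the trousers package:/dev/null:/sbin/nologin
EOF

substring=$(grep -c 'user' "$f")      # also matches inside rpcuser and trousers
anchored=$(grep -c '\<user\>' "$f")   # only "user" as a separate word
wordopt=$(grep -cw 'user' "$f")       # same result with -w

echo "substring=$substring anchored=$anchored wordopt=$wordopt"
```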
If you want to ignore case, use the -i option, for example: grep -i "user" /etc/passwd

This returns a few more lines, because lines where the match is capitalized (such as "User") are now included as well.
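The effect of -i can be illustrated on a hypothetical three-line file. A minimal sketch, assuming bash, mktemp and grep:

```shell
#!/usr/bin/env bash
# Sketch: effect of -i on a hypothetical passwd-style file.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
qemu:x:107:107:qemu user:/:/sbin/nologin
polkitd:x:999:999:User for polkitd:/:/sbin/nologin
root:x:0:0:root:/root:/bin/bash
EOF

exact=$(grep -c 'user' "$f")                     # case-sensitive: only "user"
ignore=$(grep -ci 'user' "$f")                   # also matches "User"
extra=$(grep -i 'user' "$f" | grep -v 'user')    # lines found only thanks to -i

echo "without -i: $exact, with -i: $ignore"
echo "added by -i: $extra"
```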
Use the -v option to select the lines that do not contain a specified string:

    # grep -v "a" /etc/passwd
    bin:x:1:1:bin:/bin:/sbin/nologin
    sync:x:5:0:sync:/sbin:/bin/sync
    shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
    nobody:x:99:99:Nobody:/:/sbin/nologin
    polkitd:x:999:999:User for polkitd:/:/sbin/nologin
    usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
    unbound:x:997:997:Unbound DNS resolver:/etc/unbound:/sbin/nologin
    qemu:x:107:107:qemu user:/:/sbin/nologin
    openvpn:x:996:995:OpenVPN:/etc/openvpn:/sbin/nologin
    tcpdump:x:72:72::/:/sbin/nologin

None of the lines listed above contain "a".
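-v also combines with -c, which gives a quick check that the matching and non-matching counts add up to the total. A sketch on hypothetical passwd-style lines, assuming bash and mktemp:

```shell
#!/usr/bin/env bash
# Sketch: -v selects the complement of a match (hypothetical lines).
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
EOF

total=$(( $(wc -l < "$f") ))          # arithmetic strips any padding from wc
with_a=$(grep -c 'a' "$f")
without_a=$(grep -vc 'a' "$f")        # -v with -c counts the non-matching lines
names=$(grep -v 'a' "$f" | cut -d: -f1 | paste -sd ' ' -)

echo "total=$total with_a=$with_a without_a=$without_a"
echo "users without an 'a': $names"
```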
grep is often combined with the ps command to check whether a program we need is currently running, for example:

    # ps x | grep vsftpd
     1020 ?        Ss     0:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf
    10264 pts/1    S+     0:00 grep --color=auto vsftpd

However, the output also includes the grep process itself (PID 10264 above). We can filter that line out with the -v option:

    # ps x | grep "vsftpd" | grep -v "grep"
     1020 ?        Ss     0:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf

This technique is used very often.
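The ps example depends on what is running, so the sketch below simulates two ps listings as text (the PIDs and lines are hypothetical). It shows the -v approach from the text and a common alternative, the bracket trick grep '[v]sftpd', which keeps grep from matching its own command line. bash is assumed.

```shell
#!/usr/bin/env bash
# Sketch: filtering grep's own process out of "ps" output. Real ps output
# varies, so two hypothetical listings are simulated here (PIDs invented).
vs_line=' 1020 ?        Ss     0:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf'

# What "ps x" might show while "grep vsftpd" is running.
ps_plain() {
  printf '%s\n' "$vs_line" '10264 pts/1    S+     0:00 grep --color=auto vsftpd'
}
# What it might show while "grep '[v]sftpd'" is running.
ps_bracket() {
  printf '%s\n' "$vs_line" '10270 pts/1    S+     0:00 grep --color=auto [v]sftpd'
}

# Approach from the text: keep vsftpd lines, then drop any line containing "grep".
filtered=$(ps_plain | grep 'vsftpd' | grep -v 'grep')

# Alternative: [v] still matches the letter v, so the pattern finds "vsftpd",
# but grep's own command line now reads "[v]sftpd", which the pattern does not match.
bracket=$(ps_bracket | grep '[v]sftpd')

echo "$filtered"
echo "$bracket"
```

grep -v 'grep' also discards any genuine process whose command line happens to contain "grep"; the bracket form avoids that side effect.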
2. Using grep with regular expressions
grep can be combined with regular expressions to add rules to a match, which makes it much more flexible. When using a regular expression, it is best to enclose it in single quotes; this prevents the special characters used in grep patterns from being confused with characters that have a special meaning to the shell.
Here is a brief overview of regular expressions. The basic metacharacters and their meanings are:
^ matches only at the beginning of a line.
$ matches only at the end of a line.
* matches zero or more occurrences of the single character immediately before it.
[] matches any one of the characters inside the brackets. This can be a single character or a set of characters; use - to denote a range, for example [1-5] instead of [12345].
\ removes the special meaning of a metacharacter (for example, \. matches a literal dot). Some metacharacters also have a special meaning to the shell, and \ makes them lose it.
. matches any single character.
pattern\{n\} matches exactly n occurrences of the preceding pattern, where n is the number of times.
pattern\{n,\} has the same meaning, but n is the minimum number of occurrences.
pattern\{n,m\} has the same meaning, but the pattern must occur between n and m times.
The period . matches any single character: a letter, a digit, or any other ASCII character.
Example:

    # grep '.mm..' /etc/passwd
    smmsp:x:51:51::/var/spool/mqueue:/sbin/nologin
    # grep 'm..l' /etc/passwd
    mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
    mailnull:x:47:47::/var/spool/mqueue:/sbin/nologin
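The . examples can be reproduced on a hypothetical copy of the matching lines. This bash sketch uses -o (supported by GNU and BSD grep) to print only the text a pattern matched, and adds \. to show how to match a literal dot:

```shell
#!/usr/bin/env bash
# Sketch: "." matches any single character (hypothetical passwd-style lines).
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
mailnull:x:47:47::/var/spool/mqueue:/sbin/nologin
smmsp:x:51:51::/var/spool/mqueue:/sbin/nologin
EOF

# -o prints only the matched text, which shows what each "." consumed.
mm=$(grep -o '.mm..' "$f")                         # one char, "mm", two chars
m2l=$(grep -c 'm..l' "$f")                         # "m", any two chars, "l"
literal=$(printf 'a.c\nabc\n' | grep -c 'a\.c')    # \. matches only a real dot

echo "matched text: $mm"
echo "lines matching m..l: $m2l"
echo "lines matching a\\.c: $literal"
```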

To match a character or string at the beginning of a line, use ^, which anchors the match to the start of the line.
For example, to match lines that start with ma:

    # grep '^ma' /etc/passwd
    mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
    mailnull:x:47:47::/var/spool/mqueue:/sbin/nologin

Regular expression elements can be combined freely:

    # grep '^ma....ll' /etc/passwd
    mailnull:x:47:47::/var/spool/mqueue:/sbin/nologin
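The effect of ^ is easiest to see next to the unanchored pattern. A sketch on hypothetical lines (bash and mktemp assumed), where amanda contains ma but does not start with it:

```shell
#!/usr/bin/env bash
# Sketch: ^ anchors a pattern to the start of the line (hypothetical lines).
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
mailnull:x:47:47::/var/spool/mqueue:/sbin/nologin
amanda:x:33:6:Amanda user:/var/lib/amanda:/bin/bash
EOF

anywhere=$(grep -c 'ma' "$f")                       # "ma" anywhere, including in "amanda"
at_start=$(grep -c '^ma' "$f")                      # only lines that begin with "ma"
combined=$(grep '^ma....ll' "$f" | cut -d: -f1)     # ^, "ma", four dots, then "ll"

echo "anywhere=$anywhere at_start=$at_start combined=$combined"
```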

^ is used frequently in regular expressions, because many filtering operations concern the beginning of a line.
To match a character or string at the end of a line, use $, placed after the pattern. For example, to match lines ending with bash:

    # grep 'bash$' /etc/passwd
    root:x:0:0:root:/root:/bin/bash
    allen:x:1000:1000:allen:/home/allen:/bin/bash
    aln:x:1001:1001::/home/aln:/bin/bash
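Likewise for $: the hypothetical user bashfan below contains bash, but its line ends with /bin/sh. bash and mktemp are assumed.

```shell
#!/usr/bin/env bash
# Sketch: $ anchors a pattern to the end of the line (hypothetical lines).
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
bashfan:x:1002:1002::/home/bashfan:/bin/sh
EOF

anywhere=$(grep -c 'bash' "$f")             # also hits the user name "bashfan"
at_end=$(grep 'bash$' "$f" | cut -d: -f1)   # login shell field ends in "bash"

echo "anywhere=$anywhere at_end=$at_end"
```

A file with Windows (CRLF) line endings carries an invisible carriage return before each newline, so a pattern such as bash$ will not match there until the carriage returns are removed.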

To match all empty lines, use:

^$

This matches the beginning of a line immediately followed by the end of the line, with nothing in between, so only blank lines match.
To return only the lines that consist of a single character, use:

^.$

Unlike the blank-line pattern, there is a . between the beginning and the end of the line, representing exactly one character.
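The two patterns above, plus the common grep -v '^$' idiom for skipping empty lines, can be checked on a small hypothetical file (bash and mktemp assumed):

```shell
#!/usr/bin/env bash
# Sketch: counting blank and single-character lines in hypothetical text.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'a\n\nbb\nc\n\nlast\n' > "$f"    # six lines, two of them empty

blank=$(grep -c '^$' "$f")        # nothing between start and end of line
single=$(grep -c '^.$' "$f")      # exactly one character
nonblank=$(grep -vc '^$' "$f")    # common use: skip empty lines

echo "blank=$blank single=$single nonblank=$nonblank"
```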
Use * to match zero or more repetitions of the single character before it; combined with ., as .*, it matches any string, including an empty one.
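A short sketch of * on its own and in the .* idiom; the four input lines are hypothetical and bash is assumed:

```shell
#!/usr/bin/env bash
# Sketch: * repeats the single preceding character zero or more times.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'ac\nabc\nabbbc\naxc\n' > "$f"

b_star=$(grep 'ab*c' "$f" | paste -sd ' ' -)   # "a", any number of "b", then "c"
dot_star=$(grep -c 'a.*c' "$f")                # "a", anything at all, then "c"

echo "ab*c matches: $b_star"
echo "a.*c matches $dot_star lines"
```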
Use [] to match a set or range of characters. List the characters to match directly inside the brackets, and use - to denote a range that starts at the character to the left of the - and ends at the character to its right. Note that a comma inside the brackets is not a separator; it is simply another character to match.
For example, to match lines that contain aln or all, you can write the following (here [ln] alone would be enough):

    # grep 'al[l,n]' /etc/passwd
    allen:x:1000:1000:allen:/home/allen:/bin/bash
    aln:x:1001:1001::/home/aln:/bin/bash
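The sketch below contrasts a plain bracket set, the same set with a comma, and a range. It assumes bash and sets LC_ALL=C so that [a-m] follows plain ASCII order, since ranges can behave differently in other locales; the words are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: bracket expressions and ranges on hypothetical words.
export LC_ALL=C          # make ranges such as [a-m] follow plain ASCII order
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'all\naln\nali\nal,\n' > "$f"

set_ln=$(grep -c 'al[ln]' "$f")                 # "all" and "aln"
with_comma=$(grep -c 'al[l,n]' "$f")            # comma matched literally, so "al," too
range=$(grep 'al[a-m]' "$f" | paste -sd ' ' -)  # l and i fall in a..m; n and , do not

echo "[ln]: $set_ln, [l,n]: $with_comma, [a-m]: $range"
```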

Brackets also provide a second way to ignore case:

    # grep 'System' /etc/passwd
    dbus:x:81:81:System message bus:/:/sbin/nologin
    pulse:x:995:994:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
    # grep '[Ss]ystem' /etc/passwd
    dbus:x:81:81:System message bus:/:/sbin/nologin
    systemd-journal-gateway:x:191:191:Journal Gateway:/var/log/journal:/usr/sbin/nologin
    pulse:x:995:994:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
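The difference between [Ss] and -i shows up once a line uses a different capitalization. A sketch on hypothetical lines (legacy is an invented account), assuming bash and mktemp:

```shell
#!/usr/bin/env bash
# Sketch: [Ss] covers one letter position; -i covers the whole pattern.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
dbus:x:81:81:System message bus:/:/sbin/nologin
systemd-journal-gateway:x:191:191:Journal Gateway:/var/log/journal:/usr/sbin/nologin
pulse:x:995:994:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
legacy:x:990:990:SYSTEM ACCOUNT:/:/sbin/nologin
EOF

exact=$(grep -c 'System' "$f")        # capital S only
bracket=$(grep -c '[Ss]ystem' "$f")   # S or s, but "ystem" must be lowercase
ignore=$(grep -ci 'system' "$f")      # any capitalization, including SYSTEM

echo "System=$exact [Ss]ystem=$bracket -i=$ignore"
```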

Use \{\} to match a specific number of occurrences. * matches any number of occurrences, but if you want to specify how many times a pattern may occur, use \{\}. It has three forms:
pattern\{n\} matches the pattern occurring exactly n times.
pattern\{n,\} matches the pattern occurring at least n times.
pattern\{n,m\} matches the pattern occurring between n and m times, where n and m are integers from 0 to 255.
Filter the lines in which the character m appears at least 2 times in a row:

    # grep 'm\{2,\}' /etc/passwd
    smmsp:x:51:51::/var/spool/mqueue:/sbin/nologin

Filter the lines in which 9 appears at least 2 times in a row, immediately followed by a 4:

    # grep '9\{2,\}4' /etc/passwd
    pulse:x:995:994:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
    saslauth:x:994:76:"Saslauthd user":/run/saslauthd:/sbin/nologin
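The interval forms can be checked on a few hypothetical passwd-style lines. One subtlety worth seeing: because grep looks for the pattern anywhere in the line, 9\{2\} also matches inside 999. bash and mktemp are assumed.

```shell
#!/usr/bin/env bash
# Sketch: interval expressions \{n\} and \{n,\} in basic regular expressions.
# The passwd-style lines below are hypothetical sample data.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
smmsp:x:51:51::/var/spool/mqueue:/sbin/nologin
pulse:x:995:994:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
saslauth:x:994:76:"Saslauthd user":/run/saslauthd:/sbin/nologin
polkitd:x:999:999:User for polkitd:/:/sbin/nologin
root:x:0:0:root:/root:/bin/bash
EOF

two_m=$(grep -c 'm\{2,\}' "$f")       # "mm" or a longer run of m
nines_4=$(grep -c '9\{2,\}4' "$f")    # two or more 9s immediately followed by 4
two_9=$(grep -c '9\{2\}' "$f")        # also matches inside "999"
# To require a run of exactly two 9s, bound it with non-9 characters
# (this simple form assumes the run is not at the very start or end of a line).
only_two_9=$(grep -c '[^9]99[^9]' "$f")

echo "m{2,}=$two_m 9{2,}4=$nines_4 9{2}=$two_9 exactly-99=$only_two_9"
```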

The -E option makes grep use extended regular expressions, which support additional operators such as | (alternation). For example, to get the lines that contain allen or aln:

    # grep -E 'allen|aln' /etc/passwd
    allen:x:1000:1000:allen:/home/allen:/bin/bash
    aln:x:1001:1001::/home/aln:/bin/bash
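With -E, | and braces are special without backslashes. This sketch (bash and mktemp assumed; the entries are hypothetical) contrasts it with the basic syntax, where | is an ordinary character in POSIX basic regular expressions (GNU grep offers \| as an extension):

```shell
#!/usr/bin/env bash
# Sketch: extended (-E) vs basic regular expressions (hypothetical entries).
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
allen:x:1000:1000:allen:/home/allen:/bin/bash
aln:x:1001:1001::/home/aln:/bin/bash
alice:x:1003:1003::/home/alice:/bin/bash
EOF

ere=$(grep -E 'allen|aln' "$f" | cut -d: -f1 | paste -sd ' ' -)  # alternation
# Without -E, "|" is literal, so grep looks for the text "allen|aln";
# it prints 0 and exits 1, hence "|| true".
bre=$(grep -c 'allen|aln' "$f" || true)
ere_interval=$(grep -cE 'l{2}' "$f")    # ERE braces need no backslashes

echo "ERE: $ere; BRE count: $bre; l{2}: $ere_interval"
```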

This article is an English version of an article originally published in Chinese on aliyun.com.
