Shell text filtering programming (1): grep and regular expressions

[Copyright notice: when reprinting, please retain the source: blog.csdn.net/gentleliu. Mail: shallnew at 163 dot com]

A Linux system contains many files, such as configuration files, log files, and user files, and these files hold a large amount of information. Commands such as cat make it easy to print that information to the screen, but analyzing or extracting data from a file requires other tools. Linux provides grep, awk, and sed for this purpose. Used well, they can greatly improve efficiency: they help system administrators analyze data, and they save Linux developers time during development, testing, and everyday use. This series describes these tools and how to use them for text filtering and analysis.


Common grep options include:
-c  Output only the count of matching lines.
-i  Ignore case when matching.
-h  When searching multiple files, do not prefix output lines with the file name.
-l  When searching multiple files, output only the names of the files that contain a match.
-n  Display each matching line together with its line number.
-s  Suppress error messages about nonexistent or unreadable files.
-v  Display all lines that do not contain a match.
Most examples in this section use the file /etc/passwd as the text to filter.
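The options -c, -l, -h, and -s can be tried on throwaway files, so that the result does not depend on the contents of the local /etc/passwd. The following is a minimal sketch; the file names and their contents are hypothetical and exist only for illustration.

```shell
# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf 'alpha user\nbeta\n' > "$dir/one.txt"
printf 'gamma\ndelta user\nuser again\n' > "$dir/two.txt"
printf 'nothing here\n' > "$dir/three.txt"

# -c: number of matching lines in one file.
count=$(grep -c 'user' "$dir/two.txt")

# -l: only the names of the files that contain a match.
names=$(cd "$dir" && grep -l 'user' one.txt two.txt three.txt)

# -h: matching lines from several files, without the file-name prefix.
lines=$(grep -h 'user' "$dir/one.txt" "$dir/two.txt")

# -s: no error message for the missing file (grep still exits non-zero).
quiet=$(grep -s 'user' "$dir/missing.txt" 2>&1) || true

printf 'count: %s\nfiles:\n%s\nlines:\n%s\n' "$count" "$names" "$lines"
rm -r "$dir"
```

Here three.txt is not listed by -l because it contains no match, and the -s call produces no output at all.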
I. Matching lines
The simplest (and most common) use is to find a string in a file (or in several files). For example, to find the lines in /etc/passwd that contain the string "user":
# grep "user" /etc/passwd
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
oprofile:x:16:16:Special user account to be used by OProfile:/var/lib/oprofile:/sbin/nologin
qemu:x:107:107:qemu user:/:/sbin/nologin
radvd:x:75:75:radvd user:/:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
saslauth:x:994:76:"Saslauthd user":/run/saslauthd:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nm-openconnect:x:992:991:NetworkManager user for OpenConnect:/:/sbin/nologin
The search string is generally enclosed in double quotation marks, for two reasons: first, so that the shell does not misinterpret it; second, so that a string made up of several words is searched for as a single pattern.
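The second reason is easy to see with a two-word pattern. The sketch below uses a hypothetical two-line sample file rather than the real /etc/passwd; without quotes, grep takes only the first word as the pattern and treats the second word as a file name.

```shell
# Hypothetical sample file in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/passwd.sample" <<'EOF'
oprofile:x:16:16:Special user account to be used by OProfile:/var/lib/oprofile:/sbin/nologin
qemu:x:107:107:qemu user:/:/sbin/nologin
EOF

# Quoted: both words form one pattern, so only the oprofile line matches.
quoted=$(grep -c "user account" "$dir/passwd.sample")

# Unquoted: the pattern is just "user", and "account" is taken as a file name.
# -s hides the "No such file" error and -h drops the file-name prefix;
# grep still exits non-zero because one of its files is missing.
unquoted=$(cd "$dir" && grep -sch user account passwd.sample) || true

printf 'quoted: %s line(s), unquoted: %s line(s)\n' "$quoted" "$unquoted"
rm -r "$dir"
```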
Use the -c option to output the number of matching lines:
# grep -c "user" /etc/passwd
8
Use the -n option to output matching lines together with their line numbers:
# grep -n "user" /etc/passwd
18:usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
19:oprofile:x:16:16:Special user account to be used by OProfile:/var/lib/oprofile:/sbin/nologin
22:qemu:x:107:107:qemu user:/:/sbin/nologin
25:radvd:x:75:75:radvd user:/:/sbin/nologin
27:tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
30:saslauth:x:994:76:"Saslauthd user":/run/saslauthd:/sbin/nologin
34:rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
38:nm-openconnect:x:992:991:NetworkManager user for OpenConnect:/:/sbin/nologin
The first field of each output line is the line number, followed by the content of the line.
Some of the lines matched by "user" above contain it only as part of another word, such as "rpcuser" or "trousers". To match "user" only as a whole word, enclose it in \< and \>, which match the beginning and the end of a word:
# grep -n "\<user\>" /etc/passwd
18:usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
19:oprofile:x:16:16:Special user account to be used by OProfile:/var/lib/oprofile:/sbin/nologin
22:qemu:x:107:107:qemu user:/:/sbin/nologin
25:radvd:x:75:75:radvd user:/:/sbin/nologin
30:saslauth:x:994:76:"Saslauthd user":/run/saslauthd:/sbin/nologin
38:nm-openconnect:x:992:991:NetworkManager user for OpenConnect:/:/sbin/nologin
Lines 27 and 34 are no longer in the output.
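The effect of the word anchors can be reproduced on a small sample. This sketch assumes a grep that supports \< and \> as well as the -w option (GNU grep does; neither is required by POSIX), and the three sample lines are copied from the output above into a hypothetical temporary file.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
qemu:x:107:107:qemu user:/:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
EOF

plain=$(grep -c 'user' "$sample")          # substring match: all three lines
anchored=$(grep -c '\<user\>' "$sample")   # whole word only: the qemu line
with_w=$(grep -cw 'user' "$sample")        # -w gives the same result here

printf '%s %s %s\n' "$plain" "$anchored" "$with_w"   # prints "3 1 1"
rm "$sample"
```

The -w option is a shorthand for the same idea when the whole pattern should be treated as a word.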
To ignore case, use the -i option as follows:
[root@localhost shell_text_filter]# grep -ni "\<user\>" /etc/passwd
12:ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
17:polkitd:x:999:999:User for polkitd:/:/sbin/nologin
18:usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
19:oprofile:x:16:16:Special user account to be used by OProfile:/var/lib/oprofile:/sbin/nologin
20:colord:x:998:998:User for colord:/var/lib/colord:/sbin/nologin
22:qemu:x:107:107:qemu user:/:/sbin/nologin
25:radvd:x:75:75:radvd user:/:/sbin/nologin
30:saslauth:x:994:76:"Saslauthd user":/run/saslauthd:/sbin/nologin
34:rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
35:nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
38:nm-openconnect:x:992:991:NetworkManager user for OpenConnect:/:/sbin/nologin
Several more lines now match, because "User" is matched as well.
Use the -v option to output the lines that do not contain the specified string:
# grep -v "a" /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
nobody:x:99:99:Nobody:/:/sbin/nologin
polkitd:x:999:999:User for polkitd:/:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
unbound:x:997:997:Unbound DNS resolver:/etc/unbound:/sbin/nologin
qemu:x:107:107:qemu user:/:/sbin/nologin
openvpn:x:996:995:OpenVPN:/etc/openvpn:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
None of the lines listed above contains "a".
grep can also be combined with the ps command to check whether a program we need is currently running on the system:
# ps x | grep vsftpd
1020 ?      Ss    0:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf
10264 pts/1    S+    0:00 grep --color=auto vsftpd
However, the output also includes the grep process itself. We can filter it out with the -v option, as shown below:
# ps x | grep "vsftpd" | grep -v "grep"
1020 ?      Ss    0:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf
This method is quite common.
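A widely used alternative, not taken from the original text, is to wrap one character of the pattern in brackets so that grep's own command line no longer matches. The sketch below simulates the ps output with a hypothetical helper function, fake_ps, so that it behaves the same on any machine; on a real system the input would come from ps x.

```shell
# fake_ps is a hypothetical stand-in for `ps x`: it prints the vsftpd process
# plus a line for the grep process, whose command line contains the pattern.
fake_ps() {
    printf '%s\n' \
        '1020 ?      Ss    0:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf' \
        "10264 pts/1    S+    0:00 grep $1"
}

# Approach from the text: remove the grep line with a second grep -v.
two_step=$(fake_ps 'vsftpd' | grep 'vsftpd' | grep -v 'grep')

# Alternative: [v]sftpd still matches the text "vsftpd", but the grep
# command line now reads "[v]sftpd", which the pattern does not match.
bracket=$(fake_ps '[v]sftpd' | grep '[v]sftpd')

printf '%s\n' "$bracket"
```

Both variables end up holding only the vsftpd line. The bracket form saves one process, while the grep -v form is arguably easier to read.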
II. Using regular expressions
grep can be combined with regular expressions to express richer matching rules, which makes it much more flexible. It is best to enclose the regular expression in single quotes, so that characters with a special meaning in grep patterns are not confused with characters that are special to the shell.
First, a few words about regular expressions. Below are the basic metacharacters and their meanings:
^               Matches only at the beginning of a line.
$               Matches only at the end of a line.
*               Follows a single character and matches zero or more occurrences of it.
[]              Matches any one of the characters inside the brackets, which can be listed individually or given as a range with "-". For example, [1-5] stands for [12345].
\               Removes the special meaning of a metacharacter. Some metacharacters also have a special meaning in the shell; a preceding \ makes the character literal.
.               Matches any single character.
pattern\{n\}    Matches pattern repeated exactly n times.
pattern\{n,\}   Same as above, but pattern is repeated at least n times.
pattern\{n,m\}  Same as above, but pattern is repeated between n and m times.
The period "." matches any single character: any character in the ASCII set, whether a letter or a digit.
For example:
# grep '.mm..' /etc/passwd
smmsp:x:51:51::/var/spool/mqueue:/sbin/nologin
# grep 'm..l' /etc/passwd
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
mailnull:x:47:47::/var/spool/mqueue:/sbin/nologin
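Because "." matches any character, matching a literal period requires the \ escape from the metacharacter list. A minimal sketch on hypothetical sample strings:

```shell
sample=$(mktemp)
printf '%s\n' 'vsftpd.conf' 'vsftpd-conf' 'vsftpdXconf' > "$sample"

any_char=$(grep -c 'vsftpd.conf' "$sample")    # . matches ".", "-" and "X"
literal=$(grep -c 'vsftpd\.conf' "$sample")    # \. matches only a real period

printf 'unescaped: %s, escaped: %s\n' "$any_char" "$literal"
rm "$sample"
```

The unescaped pattern counts all three lines, while the escaped one counts only the line that really contains "vsftpd.conf".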
To match a string or character sequence at the beginning of a line, use ^, which matches only at the start of a line.
For example, match lines starting with ma:
# grep '^ma' /etc/passwd
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
mailnull:x:47:47::/var/spool/mqueue:/sbin/nologin
Metacharacters can be combined in a single pattern:
# grep '^ma....ll' /etc/passwd
mailnull:x:47:47::/var/spool/mqueue:/sbin/nologin
Patterns anchored with ^ are used frequently, because many extraction operations key on the beginning of a line.
To match a string or character at the end of a line, use $, placed after the text to be matched. For example, match lines ending with bash:
# grep 'bash$' /etc/passwd
root:x:0:0:root:/root:/bin/bash
allen:x:1000:1000:allen:/home/allen:/bin/bash
aln:x:1001:1001::/home/aln:/bin/bash
To match all empty lines, use:
^$
This pattern matches the beginning of a line immediately followed by the end of the line; with nothing in between, the line must be empty.
To match lines that contain exactly one character, use:
^.$
Unlike the empty-line pattern, this one has a period between the beginning and the end of the line, standing for any single character.
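The two patterns can be checked with grep -c on a small input. The five-line sample below is hypothetical; combining ^$ with -v is a common way to drop empty lines.

```shell
sample=$(mktemp)
printf 'first\n\nx\n\nlast line\n' > "$sample"

empty=$(grep -c '^$' "$sample")        # the two empty lines
single=$(grep -c '^.$' "$sample")      # only the line "x"
nonempty=$(grep -vc '^$' "$sample")    # everything except the empty lines

printf 'empty=%s single=%s nonempty=%s\n' "$empty" "$single" "$nonempty"
rm "$sample"
```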
Use * to match a single character, or a run of repetitions of it, within a string. The * applies to the character immediately before it and matches that character zero or more times.
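A short sketch of how * behaves, using four hypothetical sample words. Note that "zero or more" means the preceding character may be absent altogether.

```shell
sample=$(mktemp)
printf '%s\n' 'rt' 'rot' 'root' 'rat' > "$sample"

# o* allows zero, one, or two "o" characters between r and t; "rat" fails
# because "a" is not "o".
star=$(grep 'ro*t' "$sample")

# .* is the usual "anything" idiom: any character, repeated zero or more times.
dotstar=$(grep -c '^r.*t$' "$sample")

printf '%s\n' "$star"
printf 'lines matching ^r.*t$: %s\n' "$dotstar"
rm "$sample"
```

The first pattern prints rt, rot, and root; the second counts all four lines.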
Use [] to match a set or range of characters. List the characters to be matched inside the brackets; no separator is needed, and a comma placed inside the brackets is itself matched literally. Use "-" to indicate a range, which runs from the character to the left of "-" to the character on its right.
For example, to match lines that contain aln or all, write:
# grep 'al[ln]' /etc/passwd
allen:x:1000:1000:allen:/home/allen:/bin/bash
aln:x:1001:1001::/home/aln:/bin/bash
Brackets can also be used to make a single character case-insensitive:
# grep 'System' /etc/passwd
dbus:x:81:81:System message bus:/:/sbin/nologin
pulse:x:995:994:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
[root@localhost shell_text_filter]# grep '[Ss]ystem' /etc/passwd
dbus:x:81:81:System message bus:/:/sbin/nologin
systemd-journal-gateway:x:191:191:Journal Gateway:/var/log/journal:/usr/sbin/nologin
pulse:x:995:994:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
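The range form with "-" can be tried in the same way. In this sketch the sample names are hypothetical, and a digit range is used because letter ranges can behave differently depending on the locale.

```shell
sample=$(mktemp)
printf '%s\n' 'tty1' 'tty5' 'tty7' 'ttyS0' > "$sample"

range=$(grep 'tty[1-5]$' "$sample")    # a digit from 1 to 5: tty1 and tty5
listed=$(grep 'tty[17]$' "$sample")    # exactly 1 or 7: tty1 and tty7

printf 'range:\n%s\nlisted:\n%s\n' "$range" "$listed"
rm "$sample"
```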
Use \{\} to match a specific number of occurrences of the preceding pattern. * matches any number of occurrences; to specify the number, use \{\}, which has three forms:
pattern\{n\}    The pattern occurs exactly n times.
pattern\{n,\}   The pattern occurs at least n times.
pattern\{n,m\}  The pattern occurs between n and m times, where n and m are integers from 0 to 255.
To filter lines in which the character m appears at least twice in a row:
# grep 'm\{2,\}' /etc/passwd
smmsp:x:51:51::/var/spool/mqueue:/sbin/nologin
Filter lines that contain at least two consecutive 9s followed by a 4:
# grep '9\{2,\}4' /etc/passwd
pulse:x:995:994:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
saslauth:x:994:76:"Saslauthd user":/run/saslauthd:/sbin/nologin
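The three interval forms can be compared side by side. In this sketch the sample values are hypothetical, and the patterns are anchored with "=" and $ because an unanchored 9\{2\} would also match inside a longer run of 9s.

```shell
sample=$(mktemp)
printf '%s\n' 'uid=9' 'uid=99' 'uid=999' 'uid=9999' > "$sample"

exactly=$(grep -c '=9\{2\}$' "$sample")      # exactly two 9s: uid=99
at_least=$(grep -c '=9\{2,\}$' "$sample")    # two or more: 99, 999, 9999
between=$(grep -c '=9\{2,3\}$' "$sample")    # two or three: 99, 999

printf 'exactly=%s at_least=%s between=%s\n' "$exactly" "$at_least" "$between"
rm "$sample"
```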
With the -E option, grep accepts extended regular expressions. For example, this lets you match lines containing either allen or aln:
# grep -E 'allen|aln' /etc/passwd
allen:x:1000:1000:allen:/home/allen:/bin/bash
aln:x:1001:1001::/home/aln:/bin/bash
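Besides alternation with |, extended mode allows grouping with ( ) and writes intervals without backslashes. The following sketch uses shortened, hypothetical sample lines rather than real /etc/passwd entries.

```shell
sample=$(mktemp)
printf '%s\n' 'allen:/bin/bash' 'aln:/bin/bash' 'alice:/bin/sh' 'smmsp:/sbin/nologin' > "$sample"

either=$(grep -Ec 'allen|aln' "$sample")       # alternation: allen or aln
grouped=$(grep -Ec '^al(len|n):' "$sample")    # same two lines, via grouping
repeat=$(grep -Ec 'm{2,}' "$sample")           # interval without backslashes
basic=$(grep -c 'm\{2,\}' "$sample")           # basic-syntax equivalent

printf 'either=%s grouped=%s repeat=%s basic=%s\n' "$either" "$grouped" "$repeat" "$basic"
rm "$sample"
```

The last two commands count the same line (smmsp), which shows that -E changes the notation rather than the meaning of the interval.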



