Analysis of grep and regular expressions on Linux systems


The grep command searches for exact (literal) or fuzzy (pattern) matches in files, in a character stream sent through a pipe, or in standard input when "-" is given as the file name. The grep family has three commands: grep, egrep, and fgrep.

grep supports basic regular expression matching

egrep supports extended regular expression matching (equivalent to grep -E)

fgrep does not support regular expressions; it matches fixed strings only (equivalent to grep -F)

The format of the grep command:

grep [OPT] 'PATTERN' FILE

PATTERN is the text you want to match

Quoting for exact matches: single and double quotes have the same meaning

If PATTERN is an exact match (that is, it contains only ordinary characters and no metacharacters) and contains no spaces, the quotation marks can be omitted

If PATTERN is an exact match but contains spaces, it must be quoted

When to use single or double quotes for a fuzzy match

If PATTERN is a fuzzy match (that is, it contains metacharacters), it must also be quoted:

With single quotes, the shell interprets none of the characters in PATTERN; grep receives them exactly as written

With double quotes, the shell still interprets three characters, $, ` (backquote) and \, so variables and command substitutions are expanded before grep sees the pattern
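A minimal sketch of the quoting rules above, assuming a POSIX shell and GNU grep. The temporary file, its contents, and the variable name word are made up for this illustration.

```shell
# Hypothetical sample input in a temporary file.
f=$(mktemp)
printf 'hello world\nhello\ngoodbye\n' > "$f"

# A pattern containing a space must be quoted; unquoted, "world" would be
# taken as a file name.
grep 'hello world' "$f"        # prints: hello world

# Double quotes: the shell expands $word to "hello" before grep runs.
word=hello
grep -c "$word" "$f"           # prints: 2

# Single quotes: grep receives the literal text $word, which matches no line.
grep -c '$word' "$f" || true   # prints: 0 (grep exits with status 1)
```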

OPT is one or more of the options supported by grep

General information options

-V shows grep version information


Options that select the pattern syntax

-E interprets PATTERN as an extended regular expression instead of a basic regular expression; grep -E is equivalent to egrep. Likewise, -F treats PATTERN as a fixed string, and grep -F is equivalent to fgrep
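A short sketch contrasting the default basic syntax with -E and -F, assuming GNU grep; the sample lines are invented for the illustration.

```shell
# Hypothetical sample input.
f=$(mktemp)
printf 'cat\ndog\ncat|dog\nabc\na.c\n' > "$f"

grep -c 'cat|dog' "$f"      # basic syntax: | is an ordinary character -> 1 (the line cat|dog)
grep -c -E 'cat|dog' "$f"   # extended syntax: | means "or" -> 3
grep -c 'a.c' "$f"          # . matches any character -> 2 (abc, a.c)
grep -c -F 'a.c' "$f"       # fixed string: . is literal -> 1 (a.c)
```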

Match control options

-e PATTERN specifies one pattern. The option can be given several times, once per string; a line is selected if it matches any of them (logical OR)

The -e option is also how to match a pattern that begins with "-". Without it, grep treats a leading - in PATTERN as the start of an option rather than as part of the string

Example: grep -e "STRING1" -e "STRING2" ... -e "STRINGN" FILE

-f FILE reads the patterns from FILE, one per line, and uses them in place of PATTERN; as with several -e options, a line is selected if it matches any of them

Example: grep -f file_pattern FILE
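The -e and -f behavior described above can be sketched as follows, assuming GNU grep. Both temporary files and their contents are hypothetical; the second one plays the role of file_pattern.

```shell
# Hypothetical input file and pattern file.
f=$(mktemp); p=$(mktemp)
printf 'apple\nbanana\ncherry\n-v flag\n' > "$f"

# Two -e options: a line matching either pattern is selected (OR).
grep -e apple -e cherry "$f"   # prints: apple, cherry

# -e lets the pattern start with "-" without being read as an option.
grep -e '-v' "$f"              # prints: -v flag

# -f: the same OR matching, with one pattern per line in a file.
printf 'apple\nbanana\n' > "$p"
grep -f "$p" "$f"              # prints: apple, banana
```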

-i ignores case distinctions in PATTERN and in the input

-v selects lines that do not match the pattern

-w matches whole words only:

The matched text must be bounded on both sides by non-word characters (anything other than letters, digits, and underscore) or by the beginning or end of the line

In effect, the pattern is anchored at both the start and the end of a word

-x matches only when the entire line matches, so any whitespace at the beginning or end of the line (even if invisible) must also be covered by the pattern
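A sketch of -i, -v, -w and -x on the same input, assuming GNU grep; the four sample lines are invented, and the last one starts with two spaces on purpose.

```shell
# Hypothetical sample input; the last line has two leading spaces.
f=$(mktemp)
printf 'Linux\nlinux kernel\nlinuxes\n  linux\n' > "$f"

grep -c linux "$f"             # 3 (case-sensitive)
grep -ci linux "$f"            # 4 (ignore case)
grep -cv linux "$f"            # 1 (only "Linux" lacks lowercase linux)
grep -cw linux "$f"            # 2 ("linuxes" is not the whole word)
grep -cx linux "$f" || true    # 0 (leading spaces stop "  linux" matching)
grep -cix linux "$f"           # 1 ("Linux" as a whole line, ignoring case)
```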

General output control options

-c prints only the number of matching lines instead of the matching content

--color highlights the matched text, using the colors defined in the GREP_COLORS environment variable

--color={auto|always|never}: never disables highlighting, always forces it, and auto highlights only when the output goes to a terminal

-L when searching multiple files, prints only the names of files that contain no match

-l when searching multiple files, prints only the names of files that contain a match

-m NUM stops reading a file after NUM matching lines and outputs the matches found up to that point

-o prints only the matched part of each matching line, each part on its own line; the rest of the line is not output

-q quiet mode: nothing is printed; the exit status is 0 if a match is found and 1 if none is found

-s suppresses error messages about nonexistent or unreadable files

For portability to the grep of other systems, scripts should avoid -q and -s and instead redirect standard output and standard error (for example to /dev/null) to achieve the same effect
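A sketch of the output control options on two small log files, assuming GNU grep. The directory, the file names a.log and b.log, and their contents are hypothetical.

```shell
# Hypothetical files: a.log contains "error", b.log does not.
dir=$(mktemp -d)
printf 'error: disk\nok\nerror: net\n' > "$dir/a.log"
printf 'all good\n' > "$dir/b.log"

grep -c error "$dir/a.log"             # 2
grep -l error "$dir"/*.log             # the path of a.log
grep -L error "$dir"/*.log || true     # the path of b.log
grep -m 1 error "$dir/a.log"           # error: disk (stops after 1 match)
grep -oE 'disk|net' "$dir/a.log"       # disk, net (only the matched parts)

# -q: no output, only the exit status matters.
if grep -q error "$dir/a.log"; then echo found; fi
# The portable form: plain grep with both output streams redirected.
if grep error "$dir/a.log" >/dev/null 2>&1; then echo found; fi
```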

Output line prefix control

-b prints the byte offset (the number of bytes from the start of the file to the matching line, or to the matched string with -o) before each output line

-H prints the file name before each matching line; this is already the default when searching several files, so -H is mainly useful for a single file

-h never prints file names before matching content, no matter how many files are searched

-n prints the line number before each matching line

-T inserts a tab between the prefix information (file name, line number, and so on) and the matching content so the output lines up neatly

-Z outputs a zero byte (ASCII NUL) after each file name instead of the character that normally follows it, such as the colon

-A NUM prints each matching line plus the NUM lines after it

-B NUM prints each matching line plus the NUM lines before it

-C NUM prints each matching line plus NUM lines before and after it
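A sketch of the prefix and context options, assuming GNU grep; the five-line sample file is hypothetical.

```shell
# Hypothetical sample file.
f=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$f"

grep -n three "$f"      # 3:three
grep -b three "$f"      # 8:three ("one\n" and "two\n" are 4 bytes each)
grep -H three "$f"      # <temporary file name>:three
grep -A 1 three "$f"    # three, four
grep -B 1 three "$f"    # two, three
grep -C 1 three "$f"    # two, three, four
```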

File and directory selection

-a processes a binary file as if it were text

--binary-files=TYPE sets how binary files are handled: binary (the default), text (equivalent to -a), or without-match (equivalent to the -I option)

-D ACTION: if an input file is a device, FIFO, or socket, use ACTION to process it. The default ACTION is read, which reads it like an ordinary file; the alternative is skip, which silently skips it

-d ACTION: if an input file is a directory, use ACTION to process it

ACTION:

read reads the directory as if it were an ordinary file

skip skips the directory

recurse reads all files under the directory and its subdirectories recursively, equivalent to the -r option

--exclude=GLOB skips files whose names match the GLOB wildcard and searches the remaining files

How to use:

grep -h --exclude='c*' "old" ./*

c* is a wildcard pattern for file names (quoting it keeps the shell from expanding it)

./* specifies the files to search. The * is needed; with only ./ (a directory), nothing is searched unless you also add the -r option

--include=GLOB searches only files whose names match the GLOB wildcard

--exclude-from=FILE reads wildcard patterns from FILE, and grep skips files whose names match any of them

--exclude-dir=DIR skips matching subdirectories when searching a directory tree recursively, so some subdirectories can be left out of the search

grep -h -R --exclude-dir='2*' "old" ./*

-I processes a binary file as if it contained no matches

-r, -R search directories recursively (in GNU grep, -R also follows all symbolic links, while -r follows only those given on the command line)
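A sketch of recursive searching with --include, --exclude and --exclude-dir, assuming GNU grep. The directory layout (src/ and 2019/) and file names are hypothetical, chosen to mirror the '2*' example above.

```shell
# Hypothetical tree: src/a.c, src/a.h, 2019/b.c, all containing "old".
dir=$(mktemp -d)
mkdir "$dir/src" "$dir/2019"
printf 'old value\n' > "$dir/src/a.c"
printf 'old value\n' > "$dir/src/a.h"
printf 'old value\n' > "$dir/2019/b.c"

# Output order follows the directory traversal and may vary.
grep -rl --include='*.c' old "$dir"      # src/a.c and 2019/b.c
grep -rl --exclude='*.c' old "$dir"      # src/a.h only
grep -rl --exclude-dir='2*' old "$dir"   # src/a.c and src/a.h
```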

Other options

--mmap uses the mmap system call instead of the read system call (obsolete: newer GNU grep releases ignore or reject this option)

-U: by default, when grep reads files on MS-DOS/Windows systems, it guesses whether a file is text or binary by reading its first 32 KB; if it decides the file is text, it strips the CR characters from the content

The -U option overrides this guessing: every file is read and passed to the matcher exactly as it is, so CR characters are kept

egrep and fgrep use the same command format as grep; only the kind of pattern they accept differs


Regular expressions

A regular expression is a combination of ordinary characters and metacharacters that describes the text to be matched

Metacharacters: characters that do not stand for themselves but provide an additional matching function

. matches any single character

[] matches any single character from the set listed inside the brackets

Example: [ABC] matches a single A, a single B, or a single C

[^] matches any single character not in the listed set

Example: [^ABC] matches a single character that is not A, not B, and not C

* matches the single character before it zero or more times

* is greedy by default: it matches the longest string it can

Example: when matching a.*b against the string eraxbuicdbdir, the result is axbuicdb rather than axb; this is greedy matching
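The metacharacters above, including the greedy example, can be checked with grep -o and grep -c, assuming GNU grep; the sample lines are invented apart from eraxbuicdbdir.

```shell
# Greedy *: -o prints only the matched part.
echo eraxbuicdbdir | grep -o 'a.*b'          # axbuicdb

# Bracket expressions match one character from (or outside) a set.
printf 'A\nB\nD\n' | grep -c '[ABC]'         # 2 (A, B)
printf 'A\nB\nD\n' | grep '[^ABC]'           # D

# b* allows zero or more b characters between a and c.
printf 'ac\nabc\nabbc\n' | grep -c 'ab*c'    # 3
```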

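? is similar to *, but matches the single character before it zero or one time. In basic regular expressions it must be written \? (a GNU extension); in extended regular expressions it is written ?

\{ \} specifies how many times the single character before it is repeated (the backslashes are required in basic regular expressions; without them { and } are ordinary characters)

\{m,n\} matches at least m and at most n times
\{m,\} matches at least m times
\{0,n\} matches at most n times
\{m\} matches exactly m times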
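A sketch of \? and the interval forms, assuming GNU grep (\? in a basic expression is a GNU extension). -x is used so that each count reflects whole-line matches; the sample lines are hypothetical.

```shell
# Zero or one b: \? in basic syntax, ? in extended syntax.
printf 'ac\nabc\nabbc\n' | grep -c 'ab\?c'    # 2 (ac, abc)
printf 'ac\nabc\nabbc\n' | grep -cE 'ab?c'    # 2 (ac, abc)

# Interval expressions on whole lines.
printf 'ab\nabb\nabbb\nabbbb\n' | grep -cx 'ab\{2,3\}'   # 2 (abb, abbb)
printf 'ab\nabb\nabbb\nabbbb\n' | grep -cx 'ab\{2,\}'    # 3 (abb, abbb, abbbb)
printf 'ab\nabb\nabbb\nabbbb\n' | grep -cx 'ab\{0,1\}'   # 1 (ab)
printf 'ab\nabb\nabbb\nabbbb\n' | grep -cx 'ab\{2\}'     # 1 (abb)
```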

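Anchors

\< anchors the start of a word: \<word (GNU grep also provides \b, which matches a word boundary at either end of a word)

\> anchors the end of a word: word\>

^ anchors the start of a line and must be placed at the beginning of the pattern

$ anchors the end of a line and must be placed at the end of the pattern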
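A sketch of line and word anchors, assuming GNU grep; the four sample lines are invented so that each anchor selects a different subset.

```shell
# Hypothetical sample input.
f=$(mktemp)
printf 'root:x:0\nuser root\nrooted\nchroot\n' > "$f"

grep -c '^root' "$f"      # 2 (root:x:0, rooted)
grep -c 'root$' "$f"      # 2 (user root, chroot)
grep -c '\<root' "$f"     # 3 (root:x:0, user root, rooted)
grep -c '\<root\>' "$f"   # 2 (root:x:0, user root)
```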

Grouping and back-references

\(\) groups: \(string\) treats string as a single unit

\(string\) ... \1: \1 matches the same text that the first group matched. A pattern can contain several groups, and \NUM refers to the NUM-th group

\ escapes a metacharacter: the single character after \ is taken literally

Example:

ab\* matches the literal string ab*; here * no longer means "repeat the preceding character b any number of times"
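A sketch of grouping, a back-reference, and escaping in basic syntax, assuming GNU grep; the sample lines are hypothetical.

```shell
# Hypothetical sample input.
f=$(mktemp)
printf 'abab\nab*\nabbb\nhello hello\nhello world\n' > "$f"

# A group repeated with an interval: "ab" twice in a row.
grep -c '\(ab\)\{2\}' "$f"          # 1 (abab)

# Back-reference: a word followed by a space and the same word again.
grep '\([a-z][a-z]*\) \1' "$f"      # hello hello

# Escaped * is literal; unescaped, b* also matches zero b's.
grep 'ab\*' "$f"                    # ab*
grep -c 'ab*' "$f"                  # 3 (abab, ab*, abbb)
```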

Extended regular expressions

Extended regular expressions support a slightly different set of metacharacters from basic regular expressions:

+ matches the character before it one or more times
| alternation: str1|str2|str3 matches any one of the strings, each taken as a whole
() groups, for example (ab)*

Back-references work the same way as in basic regular expressions
()+ matches a group repeated one or more times
{} specifies the repeat count, with the same syntax as in basic regular expressions but without backslashes

Note: in extended regular expressions ? + | () {} are already metacharacters, so do not put \ in front of them; a backslash would make them match as literal characters instead.
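A sketch of the extended metacharacters with grep -E, assuming GNU grep (back-references in extended syntax are a GNU extension, not POSIX). The sample lines are hypothetical.

```shell
# Hypothetical sample input.
f=$(mktemp)
printf 'go\ngoo\ng\ncat\ndog\nababx\n' > "$f"

grep -cE 'go+' "$f"           # 2 (go, goo)
grep -cE '^(cat|dog)$' "$f"   # 2 (cat, dog)
grep -cE '(ab)+x' "$f"        # 1 (ababx)
grep -cE 'o{2}' "$f"          # 1 (goo)
grep -cE '(ab)\1' "$f"        # 1 (ababx)
```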

More detailed coverage of regular expressions is left for a follow-up post!

This article is from the "Wesley Linux First" blog; please keep this source: http://rick008.blog.51cto.com/3349394/1410588
