grep Command Manual (translated from 'info grep')

Source: Internet
Author: User

1. This article is a translation of the 'info grep' manual and follows the original text closely. A few options that are unavailable on most systems are not translated, but their original English text is kept for completeness.
2. The "(Note: ...)" remarks were added by the translator to aid understanding. They are not part of the original manual.

Contents:

1 Introduction
2 Invoking 'grep'

2.1 Command-line Options

2.1.1 Generic Program Information
2.1.2 Matching Control
2.1.3 General Output Control
2.1.4 Output Line Prefix Control
2.1.5 Context Line Control
2.1.6 File and Directory Selection
2.1.7 Other Options

2.2 Exit Status
2.3 'grep' Programs

3 Regular Expressions

3.1 Fundamental Structure
3.2 Character Classes and Bracket Expressions
3.3 The Backslash Character and Special Expressions
3.4 Anchoring
3.5 Back-references and Subexpressions
3.6 Basic vs Extended Regular Expressions

4 Usage
5 Known Bugs

1 Introduction

'grep' searches the given input files for lines that match any of the given patterns. By default, each matching line is copied to standard output. The options you specify can make grep produce other kinds of output instead.

Although 'grep' expects to match against text, it places no limit on input line length other than available memory, and it can match arbitrary characters within a line. If the last byte of an input file is not a newline, 'grep' silently supplies one. Because the newline is also the separator for the list of patterns, there is no way to match newline characters in a text.
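The automatically supplied newline is easy to observe by counting bytes. This is a small sketch assuming GNU grep and a POSIX shell; the input string is made up.

```shell
# Input whose last byte is not a newline: "no newline!" is 11 bytes.
printf 'no newline!' | wc -c                   # 11
# grep still selects the line and prints it with a newline appended.
printf 'no newline!' | grep newline | wc -c    # 12
```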

2 Invoking 'grep'

The general syntax of the 'grep' command line is:

grep OPTIONS PATTERN INPUT_FILE_NAMES

There can be zero or more OPTIONS. PATTERN appears as a separate argument only if no pattern was already given with '-e PATTERN' or '-f FILE'. There can be zero or more INPUT_FILE_NAMES.

2.1 Command-line Options

'grep' comes with a rich set of options. Some come from POSIX.2 and some are GNU extensions. Long option names are always GNU extensions, even for options that come from POSIX. Options that POSIX specifies under their short names are explicitly marked as such to help with portable programming. A few option names are provided for compatibility with older grep implementations. Several additional options control which variant of the 'grep' matching engine is used. (Note: this corresponds to grep/egrep/fgrep.)

2.1.1 Generic Program Information

'--help'
Print a short usage message summarizing the command-line options, then exit.

'-V'
'--version'
Print the version number of 'grep'.

2.1.2 Matching Control

'-e PATTERN'
'--regexp=PATTERN'
Use PATTERN as the pattern to match. This option can be given several times to search for several patterns. It also protects a pattern that begins with '-'. ('-e' is specified by POSIX.)

'-f FILE'
'--file=FILE'
Obtain patterns from FILE, one per line. An empty FILE contains no patterns and therefore matches nothing. ('-f' is specified by POSIX.)
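The following sketch shows '-e' and '-f' in action. It assumes GNU grep and a POSIX shell. The sample lines and the temporary pattern file are illustrative only, and the trailing comments list the lines that are printed.

```shell
# Three sample lines; the second one begins with '-'.
input='alpha
-beta
gamma'

# Several -e options: a line is selected if it matches any of them.
printf '%s\n' "$input" | grep -e alpha -e gamma     # alpha, gamma

# -e also keeps a pattern that starts with '-' from being read as an option.
printf '%s\n' "$input" | grep -e '-beta'            # -beta

# -f reads one pattern per line from a file.
patterns=$(mktemp)
printf 'alpha\ngamma\n' > "$patterns"
printf '%s\n' "$input" | grep -f "$patterns"        # alpha, gamma

# An empty pattern file holds no patterns, so nothing matches (exit status 1).
: > "$patterns"
printf '%s\n' "$input" | grep -f "$patterns" || echo "no match"
rm -f "$patterns"
```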

'-i'
'-y'
'--ignore-case'
Ignore case distinctions in both PATTERN and the input files. '-y' is an obsolete synonym kept for compatibility with earlier versions. ('-i' is specified by POSIX.)

'-v'
'--invert-match'
Invert the sense of matching, that is, select the lines that do not match. ('-v' is specified by POSIX.)

'-w'
'--word-regexp'
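Select only those lines containing matches that form whole words. The matching substring must either be at the beginning of the line or be preceded by a non-word character, and it must either be at the end of the line or be followed by a non-word character. Word characters are letters, digits, and the underscore; every other character acts as a word boundary. (Note: for example, given the line "fstab(5)", grep -w 'fstab' and grep -w 'fsta.' both match the word "fstab", but grep -w 'fsta' matches nothing.)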

'-x'
'--line-regexp'
Select only those matches that cover the whole line. ('-x' is specified by POSIX.)
(Note: for example, for the line "abcde", grep -x 'abc' does not match, but grep -x 'abcd.' does.)
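The matching-control options above, including the examples from the translator's notes, can be tried directly. This is a sketch assuming GNU grep and a POSIX shell. The sample lines are made up, and the trailing comments list the lines that are printed.

```shell
# -i: case-insensitive matching.
printf 'Hello\nhello\nHELP\n' | grep -i hello        # Hello, hello
# -v: select the lines that do NOT match.
printf 'Hello\nhello\nHELP\n' | grep -v hello        # Hello, HELP

# -w: whole-word matching on the "fstab(5)" example.
printf 'fstab(5)\n' | grep -w 'fstab'                # fstab(5)
printf 'fstab(5)\n' | grep -w 'fsta.'                # fstab(5)
printf 'fstab(5)\n' | grep -w 'fsta' || echo "no match"

# -x: the match must cover the whole line.
printf 'abcde\n' | grep -x 'abc' || echo "no match"
printf 'abcde\n' | grep -x 'abcd.'                   # abcde
```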

2.1.3 General Output Control

'-c'
'--count'
Suppress normal output. Instead, print a count of matching lines for each input file. With the '-v' option, count non-matching lines instead. ('-c' is specified by POSIX.)

'--color[=WHEN]'
'--colour[=WHEN]'
Surround the matched strings with escape sequences so that they are displayed in color on the terminal. WHEN is 'never', 'always', or 'auto'.

'-L'
'--files-without-match'
Suppress normal output. Instead, print the name of each input file that contains no match. Scanning of each file stops at the first match.
(Note: this is the opposite of the file names printed by '-l'.)

'-l'
'--files-with-matches'
Suppress normal output. Instead, print the name of each input file that contains a match. Scanning of each file stops at the first match. ('-l' is specified by POSIX.)
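A sketch contrasting '-c', '-l', and '-L', assuming GNU grep and a POSIX shell. The two log files are hypothetical and are created in a scratch directory.

```shell
# Two hypothetical log files in a scratch directory.
dir=$(mktemp -d)
printf 'error: disk\nok\nerror: net\n' > "$dir/a.log"
printf 'ok\nok\n' > "$dir/b.log"
cd "$dir" || exit 1

grep -c error a.log           # 2      (number of matching lines)
grep -vc error a.log          # 1      (with -v: number of non-matching lines)
grep -l error a.log b.log     # a.log  (files that contain a match)
grep -L error a.log b.log     # b.log  (files that contain no match)

cd - >/dev/null && rm -rf "$dir"
```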

'-m NUM'
'--max-count=NUM'
Stop reading a file after NUM matching lines. If the input is standard input from a regular file and NUM matching lines have been output, grep leaves standard input positioned just after the last matching line before exiting, so that a calling process can resume the search. For example, the following shell script makes use of this:

    while grep -m 1 PATTERN
    do
      echo xxxx
    done < FILE

The following script, by contrast, probably will not work as intended, because its input is a pipe rather than a regular file:

    cat FILE |
    while grep -m 1 PATTERN
    do
      echo xxxx
    done

(Note: if these two scripts are unclear, see the article on the pitfalls of while loops, which explains the difference between piping into a loop and redirecting a file into it.)
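The resumable search can be checked with a regular file on standard input. This sketch assumes GNU grep and a POSIX shell. The file contents and the echoed marker are arbitrary.

```shell
f=$(mktemp)
printf 'x1\nskip\nx2\nx3\n' > "$f"

# -m 1 stops after the first matching line.
grep -m 1 x "$f"                     # x1

# Standard input is a regular file here, so each grep call resumes
# just after the line where the previous call stopped.
while grep -m 1 x; do
  echo "-- resumed --"
done < "$f"
# x1
# -- resumed --
# x2
# -- resumed --
# x3
# -- resumed --

rm -f "$f"
```

The loop ends when a grep call finds no further match and exits with status 1.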

'-o'
'--only-matching'
Print only the matched (non-empty) parts of matching lines instead of whole lines, each part on a separate output line.
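A one-line illustration of '-o', assuming GNU grep. The input line is made up. Both matches come from the same input line, but each is printed on its own line.

```shell
printf 'id=7 name=x id=42\n' | grep -o 'id=[0-9]*'
# id=7
# id=42
```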

'-q'
'--quiet'
'--silent'
Quiet mode: write nothing to standard output. Exit immediately with status 0 as soon as any match is found, even if an error was detected. ('-q' is specified by POSIX.)

'-s'
'--no-messages'
Suppress error messages about nonexistent or unreadable files. ('-s' is specified by POSIX.)

(Note: older grep implementations differ from POSIX and GNU grep in how they treat '-q' and '-s', so portable scripts should avoid both options and instead redirect standard output and standard error to /dev/null.)
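A sketch of '-q', the portable redirection alternative from the note, and '-s'. It assumes GNU grep and a POSIX shell. '/nonexistent/file' is a hypothetical path assumed not to exist.

```shell
# -q: no output; only the exit status reports whether a line matched.
if printf 'ready\n' | grep -q ready; then
  echo "found"
fi

# Portable alternative: discard both output streams instead of using -q/-s.
if printf 'ready\n' | grep ready >/dev/null 2>&1; then
  echo "found"
fi

# -s hides the error message for the missing file, but the exit status is still 2.
grep -s ready /nonexistent/file || echo "exit status $?"   # exit status 2
```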

2.1.4 Output Line Prefix Control

When several prefix fields are output, their order is always file name, line number, and byte offset, regardless of the order in which the prefix options were given.

'-b'
'--byte-offset'
Print the 0-based byte offset within the input file before each line of output. If '-o' ('--only-matching') is specified, print the offset of the matching part itself. When 'grep' runs on MS-DOS or MS-Windows, the printed byte offsets depend on whether the '-u' ('--unix-byte-offsets') option is used; see below.

'-H'
'--with-filename'
Print the file name for each match. This is the default when there is more than one input file.

'-h'
'--no-filename'
Do not print file names. This is the default when there is only one input file.

'--label=LABEL'
Display input that actually comes from standard input as if it came from the file LABEL. This is especially useful when implementing tools like 'zgrep', e.g.:

gzip -cd foo.gz | grep --label=foo -H something

'-n'
'--line-number'
Prefix each line of output with its line number within the input file, counting from 1 in each file. ('-n' is specified by POSIX.)

'-T'
'--initial-tab'
Make sure that the first character of actual line content lies on a tab stop, so that the alignment of tabs looks normal. This is useful with options that prefix their output to the actual content: '-H', '-n', and '-b'. To improve the probability that lines from a single file will all start at the same column, this also causes the line number and byte offset (if present) to be printed in a minimum-size field width.

'-u'
'--unix-byte-offsets'
Report Unix-style byte offsets. This option causes 'grep' to report byte offsets as if the file were a Unix-style text file, i.e., the byte offsets ignore the CR characters that were stripped. This produces results identical to running 'grep' on a Unix machine. This option has no effect unless the '-b' option is also used; it has no effect on platforms other than MS-DOS and MS-Windows.

'-Z'
'--null'
Output a zero byte (the ASCII NUL character, "\0") instead of the character that normally follows a file name, such as a newline or a colon. For example, 'grep -lZ' ends each file name with a zero byte instead of a newline, so the names are no longer printed on separate lines, and 'grep -HZ' prints each file name followed by a zero byte instead of a colon.
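The following sketch runs the prefix options on one small, made-up file, assuming GNU grep and a POSIX shell. The byte offsets follow from the file contents: "one\n" occupies bytes 0 to 3, so the second line starts at offset 4 and the third at offset 8. In the comments, " / " separates output lines.

```shell
dir=$(mktemp -d)
printf 'one\ntwo\nthree two\n' > "$dir/f.txt"
cd "$dir" || exit 1

grep -n two f.txt        # 2:two / 3:three two
grep -b two f.txt        # 4:two / 8:three two        (offset of each line)
grep -ob two f.txt       # 4:two / 14:two             (offset of each match)
grep -Hnb two f.txt      # f.txt:2:4:two / f.txt:3:8:three two
grep -h two f.txt f.txt  # two / three two / two / three two   (no file names)

# -Z ends each file name with a NUL byte; tr makes it visible as '|'.
grep -lZ two f.txt | tr '\0' '|'; echo     # f.txt|

cd - >/dev/null && rm -rf "$dir"
```

The '-Hnb' line shows the fixed prefix order: file name, line number, byte offset.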

2.1.5 Context Line Control

Regardless of how the options below are set, grep never outputs the same line more than once. If the '-o' option is specified, these options have no effect and a warning is given.

'-A NUM'
'--after-context=NUM'
Print NUM lines of trailing context after each matching line.

'-B NUM'
'--before-context=NUM'
Print NUM lines of leading context before each matching line.

'-C NUM'
'-NUM'
'--context=NUM'
Print NUM lines of leading context and NUM lines of trailing context around each matching line.

'--group-separator=STRING'
When '-A', '-B', or '-C' is in use, print STRING instead of the default separator between groups of lines.

(Note: a group is a matching line together with its context lines. For example, with '-A 2', a matching line and the two lines after it form one group. If the next group does not immediately follow it, the two groups are separated by "--" by default.)

'--no-group-separator'
When '-A', '-B', or '-C' is in use, print no separator between groups of lines.
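A sketch of the context options and group separators, assuming GNU grep and a POSIX shell. 'lines' is a hypothetical helper that prints nine sample lines, with MATCH on lines 3 and 8. The comments show the printed lines.

```shell
# Hypothetical helper: nine sample lines, MATCH on lines 3 and 8.
lines() { printf '%s\n' 1 2 MATCH 4 5 6 7 MATCH 9; }

lines | grep -A 1 MATCH
# MATCH
# 4
# --
# MATCH
# 9

lines | grep -C 1 --group-separator='***' MATCH
# 2
# MATCH
# 4
# ***
# 7
# MATCH
# 9

lines | grep -B 1 --no-group-separator MATCH
# 2
# MATCH
# 7
# MATCH
```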

2.1.6 File and Directory Selection

'-a'
'--text'
Process a binary file as if it were text; this is equivalent to the '--binary-files=text' option.

'--binary-files=TYPE'
If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is 'binary', and 'grep' normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is 'without-match', 'grep' assumes that a binary file does not match; this is equivalent to the '-I' option. If TYPE is 'text', 'grep' processes a binary file as if it were text; this is equivalent to the '-a' option. Warning: '--binary-files=text' might output binary garbage, which can have nasty side effects if the output is a terminal and if the terminal driver interprets some of it as commands.
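A sketch of the three binary-file treatments, assuming GNU grep and a POSIX shell. The sample input contains a NUL byte, so grep classifies it as binary data. The wording of the default "binary file matches" message differs between grep versions, so it is not reproduced here.

```shell
# 'abc', a NUL byte, then 'def': grep treats this input as binary data.
printf 'abc\0def\n' | grep abc                          # default: only reports that the binary input matches
printf 'abc\0def\n' | grep -a abc | tr '\0' '@'         # -a: prints the line itself, shown as abc@def
printf 'abc\0def\n' | grep -I abc || echo "no match"    # -I: binary input is treated as non-matching
```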

'-D ACTION'
'--devices=ACTION'
If an input file is a device, FIFO, or socket, use ACTION to process it. By default, ACTION is 'read', which means that devices are read just as if they were ordinary files. If ACTION is 'skip', devices, FIFOs, and sockets are silently skipped.

'-d ACTION'
'--directories=ACTION'
If an input file is a directory, use ACTION to process it. By default, ACTION is 'read', which means that directories are read just as if they were ordinary files (some operating systems and file systems disallow this, and will cause 'grep' to print error messages for every directory or silently skip them). If ACTION is 'skip', directories are silently skipped. If ACTION is 'recurse', 'grep' reads all files under each directory, recursively; this is equivalent to the '-r' option.

'--exclude=GLOB'
Skip files whose base name matches GLOB, using wildcard matching. GLOB can use '*', '?', and '[...]' as wildcards.

'--exclude-from=FILE'
Skip files whose base name matches any of the globs read from FILE.

'--exclude-dir=DIR'
Exclude directories whose name matches the pattern DIR from recursive directory searches.

'-I'
Process a binary file as if it did not contain matching data; this is equivalent to the '--binary-files=without-match' option.

'--include=GLOB'
Search only files whose base name matches GLOB.

'-r'
'-R'
'--recursive'
For each directory given on the command line, read and process all files in that directory, recursively.
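A sketch of recursive search with file and directory filters, assuming GNU grep and a POSIX shell. The source tree is hypothetical. In the comments, <dir> stands for the scratch directory created by mktemp.

```shell
# A small hypothetical source tree.
dir=$(mktemp -d)
mkdir -p "$dir/src" "$dir/build"
printf 'TODO: fix\n' > "$dir/src/main.c"
printf 'TODO: doc\n' > "$dir/src/notes.txt"
printf 'TODO: gen\n' > "$dir/build/out.c"

# -r searches the whole tree; --include keeps only *.c files,
# and --exclude-dir skips any directory named build.
grep -r --include='*.c' --exclude-dir=build TODO "$dir"
# <dir>/src/main.c:TODO: fix

# --exclude works the other way round: every *.c file is skipped.
grep -r --exclude='*.c' TODO "$dir"
# <dir>/src/notes.txt:TODO: doc

rm -rf "$dir"
```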

2.1.7 Other Options

'--line-buffered'
Use line buffering on output. This can cause a performance penalty.

'--mmap'
This option is ignored for backwards compatibility. It used to read input with the 'mmap' system call instead of the default 'read' system call. On modern systems, '--mmap' rarely if ever yields better performance.

'-U'
'--binary'
Treat the file(s) as binary. By default, under MS-DOS and MS-Windows, 'grep' guesses the file type by looking at the contents of the first 32 kB read from the file. If 'grep' decides the file is a text file, it strips the CR characters from the original file contents (to make regular expressions with '^' and '$' work correctly). Specifying '-U' overrules this guesswork, causing all files to be read and passed to the matching mechanism verbatim; if the file is a text file with CR/LF pairs at the end of each line, this will cause some regular expressions to fail. This option has no effect on platforms other than MS-DOS and MS-Windows.

'-z'
'--null-data'
Treat input lines as terminated by a zero byte (the ASCII NUL character, "\0") instead of a newline.
(Note: this gives grep a simple way to match across line boundaries, because newlines in the input become ordinary characters.)
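A sketch of '-z', assuming GNU grep and a POSIX shell. The first command reads NUL-separated records, as produced for example by 'find -print0'. The second shows that input containing no NUL byte is a single record, so the whole multi-line text counts as one "line".

```shell
# NUL-separated records; tr turns the NULs back into newlines for display.
printf 'alpha\0beta\0alphabet\0' | grep -z alpha | tr '\0' '\n'
# alpha
# alphabet

# No NUL byte in the input: the three text lines form one record.
printf 'first\nsecond\nthird\n' | grep -zc second      # 1
```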

2.2 Exit Status

Normally the exit status is 0 if a matching line is found and 1 otherwise. However, if an error occurred, the exit status is 2, unless the '-q' ('--quiet' or '--silent') option was used and a matching line was found.
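The exit status rules can be checked with a small wrapper. This sketch assumes GNU grep and a POSIX shell. 'grep_status' is a hypothetical helper, and '/nonexistent/file' is a path assumed not to exist. In the last two commands, standard input ('-') is searched before the missing file.

```shell
# Hypothetical helper: run grep with the given arguments and print only its exit status.
grep_status() {
  rc=0
  grep "$@" >/dev/null 2>&1 || rc=$?
  echo "$rc"
}

printf 'yes\n' | grep_status yes                           # 0: a line matched
printf 'yes\n' | grep_status no                            # 1: no line matched
printf 'yes\n' | grep_status yes - /nonexistent/file       # 2: a match, but also an error
printf 'yes\n' | grep_status -q yes - /nonexistent/file    # 0: -q and a match was found
```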

2.3 'grep' Programs

'grep' can search with one of four matching engines. Use the following options to choose which one interprets PATTERN.

'-G'
'--basic-regexp'
Interpret PATTERN as a basic regular expression (BRE). This is the default.

'-E'
'--extended-regexp'
Interpret PATTERN as an extended regular expression (ERE). ('-E' is specified by POSIX.)

'-F'
'--fixed-strings'
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which may match. No character has a special meaning, so every character matches itself literally. ('-F' is specified by POSIX.)

'-P'
'--perl-regexp'
Interpret PATTERN as a Perl regular expression. This is experimental, and 'grep -P' may warn of unimplemented features.

In addition, 'grep -E' and 'grep -F' can be invoked as 'egrep' and 'fgrep', respectively. These two program names are traditional and deprecated. They are still supported only so that older programs that rely on them keep working.
(Note: zgrep and pgrep also exist, but they are not part of the grep family. zgrep is provided by gzip, and pgrep looks up processes by name and reports their PIDs.)
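A sketch comparing the BRE, ERE, and fixed-string engines on the same input, assuming GNU grep and a POSIX shell. 'sample' is a hypothetical helper that prints four test lines. '-P' is left out because it is not compiled into every grep build.

```shell
# Hypothetical helper printing four sample lines.
sample() { printf '%s\n' 'a+b' 'aab' 'a.b' 'axb'; }

sample | grep 'a+b'       # a+b   (-G, BRE: '+' is an ordinary character)
sample | grep -E 'a+b'    # aab   (ERE: '+' means "one or more")
sample | grep -F 'a.b'    # a.b   (fixed string: '.' is literal)
sample | grep 'a.b'       # all four lines ('.' matches any character)
```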

3 Regular Expressions

A regular expression is a pattern that describes a set of strings. Regular expressions are built like arithmetic expressions, using various operators to combine smaller expressions. grep understands three versions of regular expression syntax: basic regular expressions (BRE), extended regular expressions (ERE), and Perl regular expressions. The following sections describe extended regular expressions, and the differences between BRE and ERE are compared afterwards. Perl regular expressions provide additional functionality and are documented in pcresyntax(3) and pcrepattern(3), but they may not be available on every system.

3.1 Fundamental Structure

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. For example, the regular expression "a" matches the letter a. Any metacharacter with a special meaning can be matched literally by preceding it with a backslash.

A regular expression may be followed by one of several repetition operators:

'.'
The period '.' matches any single character.

'?'
The preceding item is optional and is matched at most once. For example, "ca?b" matches "cb" or "cab" but not "caab". With a group, "c(ca)?b" matches "cb" or "ccab".

'*'
The preceding item is matched zero or more times.

'+'
The preceding item is matched one or more times.

'{N}'
The preceding item is matched exactly N times.

'{N,}'
The preceding item is matched N or more times, that is, at least N times.

'{,M}'
The preceding item is matched at most M times, that is, 0 to M times.

'{N,M}'
The preceding item is matched at least N times but not more than M times.

Two regular expressions may be concatenated. The result matches any string formed by concatenating two substrings that match the two expressions respectively. For example, the regular expression "ab" is the concatenation of "a" and "b".

Two regular expressions may also be joined with the infix operator "|". The result matches any string that matches either alternative. For example, the strings "acx", "bx", and "accb" all match the regular expression "ac|b", and "accb" is matched by both alternatives.

Repetition takes precedence over concatenation, which in turn takes precedence over alternation with "|". Parentheses can be used to override these precedence rules.
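The examples in this section can be checked with 'grep -E', since the section describes extended syntax. This is a sketch assuming GNU grep and a POSIX shell. '-x' makes each pattern match whole lines so the results are easy to read, and the comments list the lines printed.

```shell
printf '%s\n' cb cab caab | grep -xE 'ca?b'            # cb, cab
printf '%s\n' cb ccab cab | grep -xE 'c(ca)?b'         # cb, ccab
printf '%s\n' ab aab aaab aaaab | grep -xE 'a{2,3}b'   # aab, aaab
printf '%s\n' acx bx accb xyz | grep -E 'ac|b'         # acx, bx, accb

# Precedence: alternation binds loosest, so 'ab|cd' means "ab" or "cd";
# parentheses are needed to alternate only the middle character.
printf '%s\n' ab abd acd | grep -xE 'ab|cd'            # ab
printf '%s\n' ab abd acd | grep -xE 'a(b|c)d'          # abd, acd
```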

3.2 Character Classes and Bracket Expressions

A bracket expression is a list of characters enclosed by "[" and "]". It matches any single character in that list. If the first character of the list is the caret "^", it matches any single character not in the list. For example, '[0123456789]' matches any single digit.

Within a bracket expression, a range expression consists of two characters separated by a hyphen "-". It matches any single character that sorts between the two, inclusive. For example, in the C locale "[a-d]" is equivalent to "[abcd]". Many locales sort characters in dictionary order, and in those locales "[a-d]" is typically not equivalent to "[abcd]"; it might be equivalent to "[aBbCcDd]", for example. To get the traditional C interpretation, set the environment variable LC_ALL to C.
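A sketch of plain, negated, and range bracket expressions, assuming GNU grep and a POSIX shell. The range example sets LC_ALL=C for that one command so that "[a-d]" has its C-locale meaning.

```shell
printf '%s\n' x1 xy | grep 'x[0123456789]'     # x1
printf '%s\n' x1 xy | grep 'x[^0-9]'           # xy
# In the C locale, [a-d] is exactly [abcd], so the uppercase B is not matched.
printf '%s\n' b B | LC_ALL=C grep '[a-d]'      # b
```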

Finally, certain named classes of characters are predefined within bracket expressions, as follows:

'[:alnum:]'
Alphanumeric characters: uppercase and lowercase letters and digits. Equivalent to '[:alpha:]' together with '[:digit:]'.

'[:alpha:]'
Alphabetic characters: uppercase and lowercase letters. Equivalent to '[:lower:]' together with '[:upper:]'.

'[:blank:]'
Blank characters: space and tab.

'[:cntrl:]'
Control characters. In ASCII, these have octal codes 000 through 037, plus 177 (DEL).

'[:digit:]'
Digits: '0 1 2 3 4 5 6 7 8 9'.

'[:graph:]'
Graphical characters: uppercase and lowercase letters, digits, and punctuation. Equivalent to '[:alnum:]' together with '[:punct:]'.

'[:lower:]'
Lowercase letters: 'a b c d e f g h i j k l m n o p q r s t u v w x y z'.

'[:print:]'
Printable characters: uppercase and lowercase letters, digits, punctuation, and space. Equivalent to '[:alnum:]', '[:punct:]', and space.

'[:punct:]'
Punctuation characters: '! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~'.

'[:space:]'
Space characters: space, tab, vertical tab, newline, carriage return, and form feed.

'[:upper:]'
Uppercase letters: 'A B C D E F G H I J K L M N O P Q R S T U V W X Y Z'.

'[:xdigit:]'
Hexadecimal digits: '0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f'.

For example, "[[:alnum:]]" means "[0-9A-Za-z]" in the C locale, "[^[:digit:]]" means "[^0123456789]", and "[ABC[:digit:]]" means "[ABC0-9]". Note that the brackets in a class name are part of the name, so the class must be placed inside an additional pair of brackets that delimit the bracket expression.
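The class examples above can be tried with '-x', so that every character of a line must belong to the class. This is a sketch assuming GNU grep and a POSIX shell. LC_ALL=C is set per command so that the classes match the ASCII lists above.

```shell
printf '%s\n' abc 123 ab_1 '!?' | LC_ALL=C grep -x '[[:alnum:]]*'     # abc, 123
printf '%s\n' abc 123 ab_1 '!?' | LC_ALL=C grep -x '[^[:digit:]]*'    # abc, !?
printf '%s\n' A b 7 | LC_ALL=C grep -x '[AB[:digit:]]'                # A, 7
```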

Most metacharacters lose their special meaning inside bracket expressions.

']'
Ends the bracket expression. To include a literal ']' in the list, make it the first character of the list, as in "[]...]".

'[.'
Begins a collating symbol.
(Note: a collating symbol must be defined by the current locale before it can be used. For example, [.ab.] would treat "ab" as a single unit rather than matching a or b, but by default no "ab" collating element is defined, so it cannot be used.)

'.]'
Ends a collating symbol.

'[='
Begins an equivalence class.
(Note: for example, [=e=] treats e and its accented forms, such as ē and ě, as the same character in locales that define them as equivalent.)

'=]'
Ends an equivalence class.

'[:'
Begins a character class name.

':]'
Ends a character class name.

'-'
Forms a range. To match a literal '-', put it first or last in the list, or make it the end point of a range.

'^'
Negates the list when it comes first. To match a literal '^', put it anywhere except first.
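A sketch of the placement rules for ']', '-', and '^' inside a bracket expression, assuming GNU grep and a POSIX shell. The four sample lines are made up, and the comments list the lines printed.

```shell
# ']' first, '^' not first, '-' last: all three are literal here.
printf '%s\n' 'a]b' 'a-b' 'a^b' 'axb' | grep 'a[]^-]b'    # a]b, a-b, a^b
# A leading '^' negates the list; a ']' right after it is still literal.
printf '%s\n' 'a]b' 'a-b' 'a^b' 'axb' | grep 'a[^]-]b'    # a^b, axb
```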

3.3 The Backslash Character and Special Expressions

A backslash '\' followed by certain ordinary characters takes on a special meaning:

'\b'
Matches the empty string at the edge of a word. (Note: in grep, words consist of letters, digits, and underscores; every other character separates words.)

'\B'
The opposite of '\b': matches the empty string provided it is not at the edge of a word.

'\<'
Matches the empty string at the beginning of a word.

'\>'
Matches the empty string at the end of a word.

(Note: for a word that begins and ends with word characters, \bWORD\b is equivalent to \<WORD\>. The grep option '-w' also restricts matches to whole words.)

'\w'
Matches a word character. It is a synonym for '[_[:alnum:]]'.

'\W'
Matches a non-word character. It is a synonym for '[^_[:alnum:]]'.

'\s'
Matches a whitespace character. It is a synonym for '[[:space:]]'.

'\S'
Matches a non-whitespace character. It is a synonym for '[^[:space:]]'.

For example, '\brat\b' matches the separate word "rat", while '\Brat\B' matches "crate" but not "furry rat".
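The rat/crate example and the other backslash expressions can be checked as follows. This is a sketch assuming GNU grep, which supports these backslash sequences, and a POSIX shell. The comments list the lines printed.

```shell
printf '%s\n' rat crate 'furry rat' | grep '\brat\b'    # rat, furry rat
printf '%s\n' rat crate 'furry rat' | grep '\Brat\B'    # crate
printf '%s\n' cat concat | grep '\<cat'                 # cat
printf '%s\n' cat concat | grep 'cat\>'                 # cat, concat
printf 'foo_bar baz\n' | grep -oE '\w+'                 # foo_bar, baz
printf '%s\n' 'a b' ab | grep 'a\sb'                    # a b
```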

3.4 Anchoring

The caret '^' and the dollar sign '$' are metacharacters that match the empty string at the beginning and at the end of a line, respectively.
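A short sketch of the two anchors, assuming GNU grep and a POSIX shell. The third sample line is empty, which '^$' matches.

```shell
printf '%s\n' 'start here' 'not start' '' end | grep '^start'    # start here
printf '%s\n' 'start here' 'not start' '' end | grep 'start$'    # not start
printf '%s\n' 'start here' 'not start' '' end | grep -c '^$'     # 1 (the empty line)
```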

3.5 Back-references and Subexpressions

The back-reference '\N', where N is a single digit, matches the substring previously matched by the Nth parenthesized subexpression of the regular expression. For example, '(a)\1' matches "aa". When used with alternation, if a group does not participate in the match, a back-reference to it makes the match fail. For example, 'a(.)|b\1' will not match "ba". When several patterns are given with '-e' or in a file with '-f FILE', back-references are local to each pattern.

(Note: for example, '([ac])e\1|b([xyz])\2t' matches aea or cec but not cea or aec, and it also matches bxxt, byyt, or bzzt. However, if '\2' is replaced with '\1', giving '([ac])e\1|b([xyz])\1t', then strings of the form b[xyz]at or b[xyz]ct cannot be matched, because the first group belongs to the left alternative and cannot take part in a match of the right one.)

(Note: a back-reference is also called a backward reference.)
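A sketch of back-references with 'grep -E', assuming GNU grep and a POSIX shell. The last command is the manual's own example of a back-reference to a group that did not take part in the match.

```shell
printf '%s\n' aa ab | grep -E '(a)\1'                                              # aa
printf '%s\n' aea cec cea aec bxxt bxyt | grep -E '([ac])e\1|b([xyz])\2t'          # aea, cec, bxxt
# Per the manual, \1 refers to a group that did not participate, so the match fails:
printf 'ba\n' | grep -E 'a(.)|b\1' || echo "no match"
```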

3.6 Basic vs Extended Regular Expressions

In basic regular expressions, the metacharacters '?', '+', '{', '|', '(', and ')' lose their special meaning and match themselves literally. Use the backslashed versions '\?', '\+', '\{', '\|', '\(', and '\)' instead.
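The same patterns written in both syntaxes, as a sketch assuming GNU grep, which accepts '\+' as a BRE extension, and a POSIX shell. The comments list the lines printed.

```shell
printf '%s\n' ab aab abab | grep -x 'a\{2\}b'     # aab        (BRE interval)
printf '%s\n' ab aab abab | grep -xE 'a{2}b'      # aab        (ERE interval)
printf '%s\n' ab aab abab | grep -x '\(ab\)\+'    # ab, abab   (BRE group and \+)
printf '%s\n' ab aab abab | grep -xE '(ab)+'      # ab, abab   (ERE)
# In BRE, unescaped ( ) and + are literal, so this looks for the text "(ab)+".
printf '%s\n' ab aab abab | grep -x '(ab)+' || echo "no match"
```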

4 Usage

Here is an example of invoking GNU grep:

grep -i 'hello.*world' menu.h main.c

This lists all lines in the files menu.h and main.c that contain the string "hello" followed by the string "world". Any number of characters may appear between them, because '.*' matches zero or more characters within a line. The '-i' option makes grep ignore case, so it also matches the line "Hello, world!".
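To try the command above, two stand-in files are enough. This is a sketch assuming GNU grep and a POSIX shell, and the file contents are made up. Because two files are searched, each output line is prefixed with its file name.

```shell
# Two made-up files standing in for menu.h and main.c.
dir=$(mktemp -d)
printf '/* Hello, world! */\n#define MENU 1\n' > "$dir/menu.h"
printf 'int main(void) { return 0; }\n/* hello there, WORLD */\n' > "$dir/main.c"
cd "$dir" || exit 1

grep -i 'hello.*world' menu.h main.c
# menu.h:/* Hello, world! */
# main.c:/* hello there, WORLD */

cd - >/dev/null && rm -rf "$dir"
```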

The following are some frequently asked questions and answers when using grep.

5 Known Bugs

Large repetition counts in the "{n,m}" construct may cause grep to use a lot of memory. In addition, certain other obscure regular expressions require exponential time and space and may cause grep to run out of memory. Back-references are very slow and may require exponential time. (Note: a recursive search over a large directory tree can also consume a lot of memory, which may cause grep to run out of memory and exit early.)

Back to series article outline: http://www.cnblogs.com/f-ck-need-u/p/7048359.html

