Basics of Linux (vi): Learning Notes on Regular Expressions

Source: Internet
Author: User

These are my study notes from learning Linux basics on Shiyanlou ("experiment building"), an online lab platform.
For copyright questions, please contact: 874870841@qq.com

Master the basic commands: usage of sed, grep, and awk
Master regular expression symbols and syntax

Syntax

Character  Description
\          Marks the next character as a special character or a literal character. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".
^          Matches the start position of the input string.
$          Matches the end position of the input string.
{n}        n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob" but matches the two o's in "food".
{n,}       n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
{n,m}      m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the two numbers.
*          Matches the preceding subexpression zero or more times. For example, "zo*" matches "z", "zo", and "zoo". * is equivalent to {0,}.
+          Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo" but not "z". + is equivalent to {1,}.
?          Matches the preceding subexpression zero times or once. For example, "do(es)?" matches "do" and "does". ? is equivalent to {0,1}.
?          When ? immediately follows another quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all of the o's. Non-greedy quantifiers are a Perl-style feature (grep -P); POSIX BRE and ERE do not support them.
.          Matches any single character except "\n". To match any character including "\n", use a pattern such as "(.|\n)".
(pattern)  Matches pattern and captures the matched substring for use in backreferences. To match a literal parenthesis, use "\(" or "\)". (In POSIX BRE, which grep and sed use by default, it is the other way round: \( \) groups and bare parentheses are literal.)
x|y        Matches x or y. For example, "z|food" matches "z" or "food", while "(z|f)ood" matches "zood" or "food".
[xyz]      Character set (character class). Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain". Inside the brackets, special characters such as the asterisk, the plus sign, and parentheses are ordinary characters; in Perl-style engines only the backslash \ keeps its special meaning as an escape character (in POSIX bracket expressions even the backslash is literal). A caret ^ in the first position negates the set; anywhere else it is an ordinary character. A hyphen - between two characters denotes a range; in the first (or last) position it is an ordinary character.
[^xyz]     Negated character set. Matches any character not listed. For example, "[^abc]" matches the "p", "l", "i", and "n" in "plain".
[a-z]      Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".
[^a-z]     Negated character range. Matches any character outside the specified range. For example, "[^a-z]" matches any character that is not a lowercase letter from "a" to "z".
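To see several rows of the table in action, here is a minimal sketch that runs grep with -E (extended syntax, where {}, +, ?, | and () work without backslashes) over made-up sample strings. It assumes GNU grep and a POSIX shell; the variable names are only for illustration, and the backreference line relies on GNU grep accepting \1 in ERE, which POSIX does not require.

```shell
#!/bin/sh
# Check a few rows of the syntax table with GNU grep -E (POSIX ERE).
# -o prints only the matched text (one match per line); -c counts matching lines.
export LC_ALL=C   # fixed locale so ranges and classes behave predictably

# {n}: exactly two o's. "Bob" has one o, "food" has two.
exact=$(printf '%s\n' Bob food | grep -oE 'o{2}')

# {n,m}: the first match in "fooooood" is the first three o's.
upto3=$(printf '%s\n' fooooood | grep -oE 'o{1,3}' | head -n 1)

# *, + and ?: count how many whole lines match.
star=$(printf '%s\n' z zo zoo | grep -cE '^zo*$')     # z, zo, zoo
plus=$(printf '%s\n' z zo zoo | grep -cE '^zo+$')     # zo, zoo
opt=$(printf '%s\n' do does | grep -cE '^do(es)?$')   # do, does

# x|y inside a group: (z|f)ood matches zood and food, not good.
alt=$(printf '%s\n' zood food good | grep -cE '^(z|f)ood$')

# [abc] and [^abc] against "plain".
inset=$(printf '%s\n' plain | grep -oE '[abc]')
notset=$(printf '%s\n' plain | grep -oE '[^abc]' | tr -d '\n')

# Inside brackets + is ordinary, and a leading - is literal.
literal=$(printf '%s\n' 'a+b' 'x-y' | grep -oE '[-+]' | tr -d '\n')

# (pattern) captures; \1 refers back to the captured text (GNU extension in ERE).
backref=$(printf '%s\n' abab abcd | grep -E '^(ab)\1$')

echo "$exact $upto3 $star $plus $opt $alt $inset $notset $literal $backref"
```

On a GNU system this should print `oo ooo 3 2 2 2 a plin +- abab`.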

Operator precedence
Operators are listed from highest precedence (top) to lowest (bottom); operators with the same precedence are evaluated from left to right:

Operator                               Description
\                                      Escape character
(), (?:), (?=), []                     Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}              Quantifiers
^, $, \metacharacter, any character    Anchors and sequences
|                                      Alternation ("or")
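Because | has the lowest precedence, an unparenthesized alternation splits the whole pattern in two, and a quantifier binds only to the single item before it. A small sketch with GNU grep -E on arbitrary, made-up lines shows both effects:

```shell
#!/bin/sh
# Precedence demo with grep -E; the sample lines are arbitrary.
export LC_ALL=C

# '^ab|cd$' is (^ab) OR (cd$): it matches all four lines.
loose=$(printf '%s\n' ab xcd abx cd | grep -cE '^ab|cd$')

# '^(ab|cd)$' anchors both alternatives: only "ab" and "cd" match.
tight=$(printf '%s\n' ab xcd abx cd | grep -cE '^(ab|cd)$')

# + applies only to the preceding b, unless ab is grouped.
single=$(printf '%s\n' abab | grep -oE 'ab+' | head -n 1)
group=$(printf '%s\n' abab | grep -oE '(ab)+')

echo "$loose $tight $single $group"   # 4 2 ab abab
```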

grep: pattern-matching command
The grep command prints the lines of its input that match a pattern, using regular expressions as the matching criteria. grep supports three regular expression engines, each selected with its own option:

Option  Description
-E      POSIX extended regular expressions (ERE)
-G      POSIX basic regular expressions (BRE); this is the default
-P      Perl-compatible regular expressions (PCRE)
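The practical difference between the engines is which characters are special without a backslash. A minimal sketch, assuming GNU grep (in GNU BRE, + is literal and \{ \} is the interval syntax); the two sample lines are invented:

```shell
#!/bin/sh
# BRE (-G, the default) versus ERE (-E) on the same input.
export LC_ALL=C

# BRE: + is an ordinary character, so 'a+b' only finds the literal text "a+b".
bre=$(printf '%s\n' 'a+b' aab | grep -G 'a+b')
# ERE: + means "one or more", so 'a+b' finds "aab" (and not "a+b").
ere=$(printf '%s\n' 'a+b' aab | grep -E 'a+b')

# Intervals: BRE needs \{ \}, ERE uses bare { }.
bre_iv=$(printf '%s\n' Bob food | grep -G 'o\{2\}')
ere_iv=$(printf '%s\n' Bob food | grep -E 'o{2}')

echo "$bre | $ere | $bre_iv | $ere_iv"   # a+b | aab | food | food

# Non-greedy quantifiers need PCRE; this only works if grep was built with -P:
#   printf 'oooo\n' | grep -oP 'o+?'    # prints "o" on four separate lines
```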

Before using regular expressions with grep, here are its common options:

Option        Description
-a            Process a binary file as if it were text
-c            Count the number of matching lines
-i            Ignore case
-n            Show the line number of each matching line
-v            Invert the match: print the lines that do not match
-r            Search directories recursively
-A n          n is a positive integer ("after"): besides each matching line, also print the n lines that follow it
-B n          n is a positive integer ("before"): besides each matching line, also print the n lines that precede it
--color=auto  Highlight the matches in the output automatically
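The common options are easiest to compare on one small file. This sketch writes four invented lines to a temporary file created with mktemp (an assumption: mktemp is available, as on typical Linux systems) and runs GNU grep with several of the options above:

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp)
printf '%s\n' 'Hello world' 'hello shell' 'goodbye' 'HELLO again' > "$tmp"

count=$(grep -c hello "$tmp")       # case-sensitive: only "hello shell"
icount=$(grep -ic hello "$tmp")     # -i: Hello, hello and HELLO all count
numbered=$(grep -n goodbye "$tmp")  # -n: prefix each match with its line number
inverted=$(grep -vi hello "$tmp")   # -v: lines that do NOT match
# -A 1: the match plus one line of trailing context, joined here with |
after=$(grep -A 1 shell "$tmp" | paste -sd '|' -)

rm -f "$tmp"
echo "$count / $icount / $numbered / $inverted / $after"
# 1 / 3 / 3:goodbye / goodbye / hello shell|goodbye
```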

The following is the complete set of special symbols (POSIX character classes) and their descriptions:

Special symbol  Description
[:alnum:]       Uppercase and lowercase English letters and digits, i.e. 0-9, A-Z, a-z
[:alpha:]       Any uppercase or lowercase English letter, i.e. A-Z, a-z
[:blank:]       The space key and the [Tab] key
[:cntrl:]       Control characters on the keyboard, such as CR, LF, Tab, Del, etc.
[:digit:]       Digits, i.e. 0-9
[:graph:]       All characters except whitespace (the space and [Tab] keys)
[:lower:]       Lowercase letters, i.e. a-z
[:print:]       Any printable character
[:punct:]       Punctuation marks, i.e. " ' ? ! ; : # $ ...
[:upper:]       Uppercase letters, i.e. A-Z
[:space:]       Any character that produces whitespace, including space, [Tab], CR, etc.
[:xdigit:]      Hexadecimal digits, i.e. 0-9, A-F, a-f

Note: these special symbols exist because [a-z] does not behave the same everywhere. Its meaning depends on the host's current locale, set in the LANG environment variable. Under zh_CN.UTF-8, [a-z] means all lowercase letters, but in other locales the collation order may interleave cases, e.g. "a A b B ... z Z", so [a-z] may also include uppercase letters. When using [a-z], take the current locale into account; [:lower:] (written [[:lower:]] inside a pattern) does not have this problem.
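A class name is only valid inside a bracket expression, so the digit class is written [[:digit:]], not [:digit:]. A short sketch with GNU grep, pinning the locale with LC_ALL=C so the result does not depend on the host's LANG setting (the sample strings are made up):

```shell
#!/bin/sh
# POSIX character classes inside bracket expressions.
export LC_ALL=C   # C locale: classes and ranges are plain ASCII

first_num=$(printf '%s\n' 'abc123def45' | grep -oE '[[:digit:]]+' | head -n 1)
caps=$(printf '%s\n' 'Linux Grep SED' | grep -oE '[[:upper:]]+' | tr -d '\n')
lower_lines=$(printf '%s\n' a B c | grep -c '^[[:lower:]]$')
alnum=$(printf '%s\n' 'ab_12' | grep -oE '[[:alnum:]]+' | tr -d '\n')

echo "$first_num $caps $lower_lines $alnum"   # 123 LGSED 2 ab12
```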

sed: stream editor

sed -i '1s/sad/happy/' test   # replace the first "sad" on line 1 of the file test with "happy"
Option       Description
-n           Quiet mode: only lines explicitly printed (for example with p) are output; by default sed prints every input line
-e           Add a command to the script; used to give several commands in one run (on the command line, several commands can usually be separated with ; instead)
-f filename  Execute the commands in the file filename
-r           Use extended regular expressions; the default is basic regular expressions
-i           Edit the input file in place instead of printing to standard output
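A sketch of these options with GNU sed, run on a throwaway file created with mktemp. The file contents are invented; -i without a backup suffix and -r are GNU sed behavior, and BSD sed differs:

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp)
printf '%s\n' 'I am sad' 'so sad, so sad' 'fine' > "$tmp"

# Default: every line is printed, changed or not (output joined with | here).
all=$(sed 's/sad/happy/' "$tmp" | paste -sd '|' -)
# -n with the p flag: print only lines where a substitution happened.
changed=$(sed -n 's/sad/happy/p' "$tmp" | paste -sd '|' -)
# -e twice: two commands in a single invocation.
both=$(sed -e 's/sad/happy/g' -e 's/fine/great/' "$tmp" | paste -sd '|' -)
# -r: extended regex, so o+ needs no backslash (GNU sed also accepts -E).
ere=$(printf '%s\n' 'sooo sad' | sed -r 's/o+/o/')
# -i: modify the file itself instead of printing it.
sed -i '1s/sad/happy/' "$tmp"
first=$(head -n 1 "$tmp")

rm -f "$tmp"
printf '%s\n' "$all" "$changed" "$both" "$ere" "$first"
```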

sed command format:

[n1][,n2]command
[n1][~step]command

Some commands take flags that set their scope, for example:
$ sed -i 's/sad/happy/g' test   # g: replace every match on each line (global)
$ sed -i 's/sad/happy/4' test   # 4: replace only the fourth match on each line
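To see what the flags change without touching a file, this sketch drops -i and pipes one made-up line through sed; the g and numeric flags are standard POSIX sed, so it should behave the same in GNU and BSD sed:

```shell
#!/bin/sh
# Substitution flags on one line containing five "sad"s.
export LC_ALL=C
line='sad sad sad sad sad'

plain=$(echo "$line" | sed 's/sad/happy/')     # no flag: first match only
every=$(echo "$line" | sed 's/sad/happy/g')    # g: all matches
fourth=$(echo "$line" | sed 's/sad/happy/4')   # 4: only the fourth match

printf '%s\n' "$plain" "$every" "$fourth"
# happy sad sad sad sad
# happy happy happy happy happy
# sad sad sad happy sad
```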
Here n1 and n2 are input line numbers: n1,n2 selects lines n1 through n2, and n1~step selects every step-th line starting from line n1 (a GNU sed extension). command is the action to perform; the common commands are:

Command  Description
s        Substitute text within a line
c        Replace whole lines
a        Append text after the specified line
i        Insert text before the specified line
p        Print the specified line, usually together with -n
d        Delete the specified line
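Here each command from the table is applied to the same three input lines, together with the two address forms. This sketch assumes GNU sed: the one-line forms such as 2c TEXT and 1a TEXT and the n1~step address are GNU extensions (POSIX sed wants a backslash and a newline after a, i and c). demo is a hypothetical helper that only feeds the input and joins the output with spaces:

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical helper: feed three sample lines to sed, join the output with spaces.
demo() { printf '%s\n' one two three | sed "$@" | paste -sd ' ' -; }

sub=$(demo '2s/w/W/')          # s: substitute within line 2
chg=$(demo '2c TWO')           # c: replace line 2 entirely
app=$(demo '1a after-one')     # a: append a line after line 1
ins=$(demo '3i before-three')  # i: insert a line before line 3
prn=$(demo -n '2p')            # p with -n: print only line 2
del=$(demo '2d')               # d: delete line 2

# Address forms n1,n2 and n1~step, on lines 1..6 generated by seq.
range=$(seq 1 6 | sed -n '2,4p' | paste -sd, -)
step=$(seq 1 6 | sed -n '1~2p' | paste -sd, -)

printf '%s\n' "$sub" "$chg" "$app" "$ins" "$prn" "$del" "$range" "$step"
```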

A Concise sed Tutorial
The Complete sed Handbook

awk: text-processing language
awk is very powerful, but I don't need it yet, so for now here is just a table of its common built-in variables:

Variable  Description
FILENAME  Name of the current input file; with several files it changes as each file is read. Empty string if input comes from standard input
$0        The entire current record
$n        The nth field of the current record; fields run from $1 to $NF
FS        Input field separator, which may be a regular expression; the default is a single space " ", meaning fields are split on runs of spaces and tabs
RS        Input record separator; the default is "\n", i.e. each line is one record
NF        Number of fields in the current record
NR        Number of records read so far (counted across all input files)
FNR       Record number within the current input file; unlike NR, it restarts at 1 for each file
OFS       Output field separator; the default is " " (a space)
ORS       Output record separator; the default is "\n"
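To see how NR, FNR, NF, FILENAME, FS and OFS relate, here is a small sketch run over two invented files in a temporary directory. It should work with any POSIX awk (gawk, mawk); the file names and contents are made up for illustration:

```shell
#!/bin/sh
export LC_ALL=C
dir=$(mktemp -d)
printf '%s\n' 'alice 30' 'bob 25' > "$dir/a.txt"
printf '%s\n' 'carol:41' > "$dir/b.txt"

# NR keeps counting across files, FNR restarts at 1 for each file.
# "carol:41" has no blanks, so with the default FS it is a single field.
counts=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF }' a.txt b.txt | paste -sd ';' -)

# $0 is the whole record, $NF the last field.
whole=$(awk 'NR == 2 { print $0 }' "$dir/a.txt")
last=$(awk '{ print $NF }' "$dir/a.txt" | paste -sd, -)

# -F sets FS; OFS goes between the fields that print separates with commas.
joined=$(awk -F: -v OFS='-' '{ print $1, $2 }' "$dir/b.txt")

# The default FS " " splits on runs of blanks and ignores leading blanks.
fields=$(printf '  x   y\n' | awk '{ print NF, $1 }')

rm -rf "$dir"
printf '%s\n' "$counts" "$whole" "$last" "$joined" "$fields"
```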

An Introduction to awk
A Concise awk Tutorial
User Guide

Vim Adventures
