Regular Expression Basics

Last Update:2015-08-06 Source: Internet

Author: User

Tags perl regular expression egrep


A regular expression (often abbreviated as regex, regexp, or RE) uses a single string to describe and match a set of strings that conform to a certain syntactic rule. In many text editors, regular expressions are used to search for and replace text that matches a pattern.

Basic syntax: a regular expression is often referred to as a pattern, which is used to describe or match a series of strings that conform to a certain syntactic rule.

Alternation: | separates alternatives. For example, boy|girl matches boy or girl.

Quantifiers: *, + and ?. A pattern element without a quantifier must appear exactly once.

+ means the preceding character must appear one or more times. For example, goo+gle matches google, goooogle and gooooooooooogle.

? means the preceding character appears zero times or once. For example, colou?r matches color and colour.

* means the preceding character may be absent or may appear any number of times. For example, 0*42 matches 42, 042, 0042, and so on.

Scope and precedence: () defines the scope and precedence of part of a pattern. Put simply, whatever is inside the parentheses is treated as a single unit. For example, gr(e|a)y is equivalent to gray|grey, and (grand)?father matches father and grandfather.
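
The basic operators above can be tried out in Python's re module, which handles these particular constructs the same way. This is only a small sketch: the helper name matches is made up for illustration, and (grand)?father assumes the optional-group reading of the grandfather example.

```python
import re

def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Alternation: either branch may match.
print(matches(r"boy|girl", "girl"))
# +: the preceding character appears one or more times.
print(matches(r"goo+gle", "goooogle"))
print(matches(r"goo+gle", "gogle"))
# ?: the preceding character appears zero times or once.
print(matches(r"colou?r", "color"), matches(r"colou?r", "colour"))
# *: the preceding character appears zero or more times.
print(matches(r"0*42", "42"), matches(r"0*42", "0042"))
# Parentheses group part of the pattern into one unit.
print(matches(r"gr(e|a)y", "grey"))
print(matches(r"(grand)?father", "father"))
```

re.fullmatch is used so that the whole string has to match. grep, by contrast, reports a line as soon as any part of it matches.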

Regular expressions come in a number of different flavors. The rules below are a commonly used subset of PCRE-style syntax, shared by scripting languages such as Perl and Python and by grep or egrep.

PCRE (Perl Compatible Regular Expressions) is a regular expression library written in C by Philip Hazel. PCRE is a lightweight function library, much smaller than regular expression libraries such as the one in Boost.

\ marks the next character as a special character or as a literal character

^ matches the starting position of the input string

$ matches the end position of the input string

{n} n is a non-negative integer. Matches exactly n times. For example, o{2} does not match the o in Bob but matches the two o characters in food

{n,} n is a non-negative integer. Matches at least n times. For example, o{2,} does not match the o in Bob but matches all the o characters in fooood. o{1,} is equivalent to o+, and o{0,} is equivalent to o*

{n,m} m and n are non-negative integers with n<=m. Matches at least n times and at most m times. For example, o{1,3} matches the first three o characters in foooood. o{0,1} is equivalent to o?

* Matches the preceding expression 0 or more times

+ Matches the preceding expression 1 or more times

? Matches the preceding expression 0 or 1 times

? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. For the string oooo, o+? matches a single o, while o+ matches all of the o characters

. Matches any single character except \n. To include \n, you can use (.|\n)

(pattern) Matches the pattern and captures the matching substring, which can then be used for backreferences. To match a literal parenthesis, use \( or \)

x|y Matches x or y. For example, z|food matches z or food, and (z|f)oo matches zoo or foo

[xyz] Character set (character class). Matches any one of the characters it contains. For example, [abc] matches the a in plain. Inside a character set, only the backslash \ keeps its special meaning as the escape character. Other special characters such as the asterisk, the plus sign and the various brackets are all ordinary characters. A caret ^ in the first position negates the set; anywhere else it is an ordinary character. A hyphen - in the middle of the set describes a character range; in the first position it is an ordinary character.

[^xyz] Negated character set. Matches any character that is not listed. For example, [^abc] matches the p, l, i and n in plain

[a-z] Character range. Matches any character within the specified range.

[^a-z] Negated character range. Matches any character that is not in the specified range.
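
Several of the rules in this list are easy to check interactively. The following is a sketch using Python's re module, which treats these constructs the same way. The variable names are illustrative only, and the (o)\1 backreference line is an added example rather than one taken from the list.

```python
import re

# Counted quantifiers: o{2} needs two consecutive o characters.
no_match = re.search(r"o{2}", "Bob")                    # Bob has a single o
two = re.search(r"o{2}", "food").group()
up_to_three = re.search(r"o{1,3}", "foooood").group()

# Greedy versus non-greedy: +? stops as early as it can.
greedy = re.search(r"o+", "oooo").group()
lazy = re.search(r"o+?", "oooo").group()

# The dot does not match a newline, but (.|\n) does.
dot_only = re.search(r"a.b", "a\nb")
dot_or_newline = re.search(r"a(.|\n)b", "a\nb")

# A captured group can be referenced again with \1.
doubled = re.search(r"(o)\1", "food").group()

# Character sets and negated character sets.
in_set = re.findall(r"[abc]", "plain")
not_in_set = re.findall(r"[^abc]", "plain")

print(no_match, two, up_to_three)
print(greedy, lazy)
print(dot_only, dot_or_newline is not None)
print(doubled, in_set, not_in_set)
```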

Precedence: the operators below are listed from highest precedence to lowest. Operators of equal precedence are evaluated from left to right.

\ Escape character

(), (?:), (?=), [] Parentheses and brackets

*, +, ?, {n}, {n,}, {n,m} Quantifiers

^, $, \any-metacharacter Anchors and sequences

| Alternation
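
The practical effect of this ordering is easiest to see with alternation, which binds most loosely. The patterns below are illustrative examples, not taken from the list above, and the sketch uses Python's re module on the assumption that it follows the same precedence for these operators.

```python
import re

# | has the lowest precedence, so ^ab|cd$ reads as (^ab)|(cd$).
loose = re.compile(r"^ab|cd$")
# Parentheses are needed to anchor both alternatives at both ends.
tight = re.compile(r"^(ab|cd)$")

print(loose.search("abxyz") is not None)   # starts with ab
print(loose.search("xyzcd") is not None)   # ends with cd
print(tight.search("abxyz") is not None)   # the whole string is neither ab nor cd

# A quantifier binds tighter than concatenation: ab+ is a followed by b+.
print(re.fullmatch(r"ab+", "abbb") is not None)
print(re.fullmatch(r"ab+", "abab") is not None)
print(re.fullmatch(r"(ab)+", "abab") is not None)
```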

grep Pattern Matching command

grep prints the lines of its input that match a pattern, using regular expressions as the matching criterion. grep supports three regular expression engines, selected with three options: -E for POSIX extended regular expressions (ERE), -G for POSIX basic regular expressions (BRE), and -P for Perl-compatible regular expressions (PCRE).

Common grep options:

-a Treat binary files as text

-c Count the lines that match the pattern

-i Ignore case

-n Show the line number of each line that contains matching text

-v Invert the match and output the lines that do not match

-r Search directories recursively

-A n n is a positive integer. A stands for after. In addition to each matching line, the n lines that follow it are printed

-B n n is a positive integer. B stands for before. In addition to each matching line, the n lines that precede it are printed

--color=auto Highlight the matches in the output, with color set to auto

$ touch test

$ vim test

Type a few lines of text, then in command-line mode enter :wq! to save and exit

$ cat test

$ grep -c hello test  # count the lines in test that contain hello

$ grep -i -n c test  # find c in test, ignoring case, and show the number of each line it appears on

$ grep -v shell test  # print the lines in test that do not contain shell

$ grep hello test  # print the lines in test that contain hello
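
To make the effect of -c, -i, -n and -v concrete, the selection logic can be sketched in a few lines of Python. This is not how grep is implemented. The grep function, its parameters and the sample lines are all hypothetical, and Python's re syntax is not identical to grep's basic regular expressions.

```python
import re

def grep(pattern, lines, ignore_case=False, invert=False):
    """Yield (line_number, line) pairs, roughly the lines grep -n would show."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    for number, line in enumerate(lines, start=1):
        found = regex.search(line) is not None
        if found != invert:          # invert=True keeps the non-matching lines
            yield number, line

sample = ["hello world", "Shell script", "say Hello", "C language"]

count = sum(1 for _ in grep("hello", sample))           # like grep -c hello
numbered = list(grep("c", sample, ignore_case=True))    # like grep -i -n c
inverted = [line for _, line in grep("Shell", sample, invert=True)]  # like grep -v Shell

print(count)
print(numbered)
print(inverted)
```

Note that the count is a count of matching lines, not of individual matches, which is also what grep -c reports.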

Using regular expressions

$ grep 'shiyanlou' /etc/group  # find lines containing shiyanlou in /etc/group

$ grep '^shiyanlou' /etc/group  # find lines starting with shiyanlou in /etc/group

$ echo -e 'zero\nzo\nzzo' | grep 'z.*o'  # among zero, zo and zzo, match strings that start with z and end with o; \n is a newline

$ echo -e 'zero\nzo\nzzo' | grep 'z.o'  # among zero, zo and zzo, match z followed by any single character and then o; \n is a newline

$ echo -e 'zero\nzo\nzzo' | grep 'zo*'  # match z followed by any number of o characters; \n is a newline

$ echo -e '1234\nabcd' | grep '[a-z]'  # grep is case-sensitive by default, so this matches all lowercase letters

$ echo -e '1234\nabcd' | grep '[0-9]'  # match all digits

$ echo -e '1234\nabcd' | grep '[[:digit:]]'  # match all digits

$ echo -e '1234\nabcd' | grep '[[:lower:]]'  # match all lowercase letters

$ echo -e '1234\nabcd' | grep '[[:upper:]]'  # match all uppercase letters

$ echo -e '1234\nabcd' | grep '[[:alnum:]]'  # match all letters and digits, that is 0-9, a-z and A-Z

$ echo -e '1234\nabcd' | grep '[[:alpha:]]'  # match all letters

$ echo -e '1234\nabcd' | grep '[[:blank:]]'  # match blanks: the space and the tab

$ echo -e '1234\nabcd' | grep '[[:cntrl:]]'  # match control characters, including CR, LF, Tab, Del, etc.

$ echo -e '1234\nabcd' | grep '[[:graph:]]'  # match all visible characters, that is everything printable except whitespace

$ echo -e '1234\nabcd' | grep '[[:print:]]'  # match any printable character

$ echo -e '1234\nabcd' | grep '[[:punct:]]'  # match all punctuation

$ echo -e '1234\nabcd' | grep '[[:xdigit:]]'  # match all hexadecimal digits

The reason for using these special classes is that a range such as [a-z] does not behave the same way in all cases. Its meaning depends on the current locale of the host, which is the value of the LANG environment variable. With zh_CN.UTF-8, [a-z] is all lowercase letters, while in other locales the range may interleave uppercase and lowercase letters.

$ echo 'geek|good' | grep '[^o]'  # match every character except o
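
The three z patterns above differ only in what may sit between or after the z, and checking them side by side helps. The sketch below reruns them with Python's re module on the same three words. The helper name matching is made up, and re.search is assumed to be a fair stand-in for grep's line selection for these simple patterns.

```python
import re

lines = "zero\nzo\nzzo".split("\n")

def matching(pattern):
    """Return the lines that contain a match for pattern, as grep would select them."""
    return [line for line in lines if re.search(pattern, line)]

any_between = matching(r"z.*o")   # z, then anything (possibly nothing), then o
one_between = matching(r"z.o")    # z, exactly one character, then o
z_then_os = matching(r"zo*")      # z followed by zero or more o characters

# A negated set matches every character that is not listed.
not_o = re.findall(r"[^o]", "geek|good")

print(any_between, one_between, z_then_os)
print(not_o)
```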

Using extended regular expressions: grep -E or egrep

$ echo -e 'zero\nzo\nzoo' | grep -E 'zo{1}'  # match z followed by exactly one o, that is only zo

$ echo -e 'zero\nzo\nzoo' | grep -E 'zo{1,}'  # match all words that contain zo followed by any further o characters

$ echo -e 'www.baidu.com\nwww.shiyanlou.com\nwww.google.com' | grep -E 'www\.(baidu|shiyanlou)\.com'  # match www.shiyanlou.com and www.baidu.com

$ echo -e 'www.shiyanlou.com\nwww.baidu.com\nwww.google.com' | grep -Ev 'www\.baidu\.com'  # match the lines that do not contain www.baidu.com
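
The extended-syntax examples carry over almost unchanged to Python's re module, which makes them easy to verify. In this sketch the list comprehensions stand in for grep -E and grep -Ev, and the variable names are illustrative only.

```python
import re

hosts = ["www.baidu.com", "www.shiyanlou.com", "www.google.com"]

# Like grep -E: keep the lines that match the alternation.
site = re.compile(r"www\.(baidu|shiyanlou)\.com")
kept = [host for host in hosts if site.search(host)]

# Like grep -Ev: keep the lines that do NOT match.
dropped_baidu = [host for host in hosts if not re.search(r"www\.baidu\.com", host)]

# {1,} means one or more of the preceding character.
zo_words = [word for word in ["zero", "zo", "zoo"] if re.search(r"zo{1,}", word)]

print(kept)
print(dropped_baidu)
print(zo_words)
```

The backslash before each dot matters: an unescaped . would match any character, not just a literal dot.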

sed Stream Editor

The man page gives the full name of sed as "sed - stream editor for filtering and transforming text". sed is a non-interactive editor. The basic command format is sed [options]... [command] [input file]...

$ sed -i '1s/sad/happy/' test  # replace sad with happy on the first line of the file test

Common sed options:

-n Quiet mode. By default sed prints the full contents of the input; with -n only the lines affected by a printing command are output

-e Adds a command to the script, so that several commands can be given at once. When only one command is given on the command line, this option is usually not needed

-f filename Runs the commands stored in the file filename

-r Uses extended regular expressions; the default is basic regular expressions

-i Modifies the input file in place instead of printing to standard output

A sed command has the form [n1][,n2]command or [n1][~step]command. n1 and n2 are input line numbers. With a comma, the command applies to the lines from n1 to n2. With ~, it applies to every step-th line starting from n1. command is the action to perform.
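
The two address forms can be modeled as a simple line filter. The following Python sketch is purely illustrative: the select function and its parameters are hypothetical, and it reflects the n1~step behavior as GNU sed provides it (the ~ form is a GNU extension).

```python
def select(lines, first, last=None, step=None):
    """Return the lines that an address would select; line numbers are 1-based."""
    picked = []
    for number, line in enumerate(lines, start=1):
        if step is not None:
            # first~step: line `first`, then every step-th line after it
            hit = number >= first and (number - first) % step == 0
        elif last is not None:
            # first,last: an inclusive range of lines
            hit = first <= number <= last
        else:
            # a single line number
            hit = number == first
        if hit:
            picked.append(line)
    return picked

lines = [f"line{n}" for n in range(1, 8)]

range_pick = select(lines, 2, last=5)   # like sed -n '2,5p'
odd_pick = select(lines, 1, step=2)     # like sed -n '1~2p'

print(range_pick)
print(odd_pick)
```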

$ sed -i 's/sad/happy/g' test  # g means global: every match on each line is replaced

$ sed -i 's/sad/happy/4' test  # 4 means only the fourth match on each line is replaced

Common sed commands:

s Substitute within a line

c Replace the whole line

a Append after the specified line

i Insert before the specified line

p Print the specified line, usually together with the -n option

d Delete the specified line
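
The difference between no flag, a numeric flag and g in the s command comes down to which matches on a line are replaced. A rough Python model follows; the substitute function and its parameters are hypothetical names, and re.sub is only an approximation of sed's substitution.

```python
import re

def substitute(line, pattern, replacement, nth=1, replace_all=False):
    """Replace the nth match on a line, or every match when replace_all is True."""
    if replace_all:
        return re.sub(pattern, replacement, line)    # like s/.../.../g
    seen = 0

    def swap(match):
        nonlocal seen
        seen += 1
        # Only the nth match is replaced; the others are put back unchanged.
        return replacement if seen == nth else match.group(0)

    return re.sub(pattern, swap, line)

line = "sad sad sad sad sad"

first_only = substitute(line, "sad", "happy")                   # like s/sad/happy/
every = substitute(line, "sad", "happy", replace_all=True)      # like s/sad/happy/g
fourth = substitute(line, "sad", "happy", nth=4)                # like s/sad/happy/4

print(first_only)
print(every)
print(fourth)
```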

$ cp /etc/passwd ~

$ nl passwd | sed -n '2,5p'  # print lines 2 to 5

$ nl passwd | sed -n '1~2p'  # print the odd-numbered lines

$ sed -n 's/shiyanlou/hehe/gp' passwd  # replace every shiyanlou in the input with hehe and print only the lines where a replacement was made

$ nl passwd | grep 'shiyanlou'

$ sed -n '21c\www.shiyanlou.com' passwd  # replace line 21 with www.shiyanlou.com

awk Text Processing language

awk is an excellent tool for text processing and one of the most powerful data processing engines available in Linux and UNIX environments. Its name comes from the surname initials of its three creators: Alfred Aho, Peter Jay Weinberger and Brian Wilson Kernighan. The three creators formally defined it as a "pattern scanning and processing language". It allows you to create short programs that read input files, sort data, manipulate data, perform calculations on the input, and generate reports.

$ ll /usr/bin/awk

All awk operations are based on pattern-action pairs of the form pattern {action}, where the action is always enclosed in a pair of {}. The pattern is usually a relational or regular expression that selects input text, and the action is what is performed once the pattern matches. In a complete awk operation, either of the two may be omitted. If there is no pattern, every line of input matches. If there is no action, the matching lines are printed to the screen.

awk handles text by dividing it into fields and then processing the fields. By default, awk uses whitespace as the field delimiter.

The basic awk command format is awk [-F fs] [-v var=value] [-f prog-file | 'program text'] [file...]. The -F option specifies the field separator mentioned earlier. -v pre-sets variables for the awk program. -f names the program file that awk should execute; without -f, the program text is written directly on the command line. The remaining arguments are the text inputs for awk to process, and several files can be given at the same time.

$ vim test

$ awk '{print}' test

$ awk '{if(NR==1){print $1 "\n" $2 "\n" $3} else {print}}' test  # print each field of the first line of test on its own line

$ awk '{if(NR==1){OFS="\n"; print $1, $2, $3} else {print}}' test  # print each field of the first line of test on its own line

NR and OFS are variables built into awk. NR is the number of records read so far, that is, the number of the line currently being processed. OFS is the field delimiter used on output, and the default is a space. In $N, N is a field number, and $N refers to the corresponding field. $0 refers to the entire contents of the current record.
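
The roles of NR, OFS and the $N fields can be mimicked in Python to see what each contributes. This is only a rough model: awk_records is a hypothetical helper, str.split merely approximates awk's default field splitting, and the sample text is invented.

```python
def awk_records(text, fs=None):
    """Yield (nr, record, fields) for each line of text.

    fs=None splits on runs of whitespace, which approximates awk's default.
    """
    for nr, record in enumerate(text.splitlines(), start=1):
        yield nr, record, record.split(fs)

text = "one two three\nfour five six"
output = []
for nr, record, fields in awk_records(text):
    if nr == 1:                                # NR==1
        ofs = "\n"                             # OFS="\n"
        output.append(ofs.join(fields[:3]))    # print $1, $2, $3
    else:
        output.append(record)                  # print, which outputs the whole record ($0)

print("\n".join(output))

# With an explicit separator, like -F'.'
dotted = next(awk_records("a.b.c", fs="."))[2]
print(dotted)
```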

$ awk -F'.' '{if(NR==2){print $1 "\t" $2 "\t" $3}}' test  # split the second line of test on dots and print its fields separated by tabs

$ awk 'BEGIN{FS="."; OFS="\t"}{if(NR==2){print $1, $2, $3}}' test  # any literal (non-variable) text given to print must be enclosed in a pair of double quotes ""

Built-in variables used by awk:

FILENAME The name of the current input file. If the input comes from standard input, it is an empty string

$0 The contents of the current record

$N N is a field number; its maximum value is the value of the NF variable

FS The field separator, given as a regular expression; the default is a space

RS The input record separator; the default is \n, so that one line is one record

NF The number of fields in the current record

NR The number of records read so far

FNR The number of records read so far from the current input file

OFS The output field separator; the default is a space

ORS The output record separator; the default is \n
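
The difference between NR and FNR only shows up when more than one input file is given. The sketch below models that bookkeeping in Python; the records function, the file names a.txt and b.txt and their contents are all made up for illustration.

```python
def records(files):
    """Yield (filename, nr, fnr, nf) for every record in every file.

    files maps a hypothetical file name to its text content.
    """
    nr = 0
    for name, text in files.items():
        # FNR restarts at 1 for each new file, NR keeps counting.
        for fnr, record in enumerate(text.splitlines(), start=1):
            nr += 1
            nf = len(record.split())    # NF: fields in the current record
            yield name, nr, fnr, nf

files = {"a.txt": "x y\nz", "b.txt": "p q r"}
rows = list(records(files))

for row in rows:
    print(row)
```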


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
