Regular Expression Basics

Source: Internet
Author: User

A regular expression is made up of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters have special meanings, which are explained below.

In the simplest case, a regular expression looks like an ordinary search string. For example, the regular expression "testing" contains no metacharacters. It matches strings such as "testing" and "123testing", but does not match "Testing", because matching is case sensitive.
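The literal match described above can be tried out with a small sketch. Python's `re` module is used here only as a convenient test bench (the article itself works with grep and vi), and the three candidate strings are illustrative choices.

```python
import re

pattern = "testing"  # no metacharacters: every character matches itself

# Illustrative inputs: an exact match, a match inside a longer string,
# and a string that differs only in case.
candidates = ["testing", "123testing", "Testing"]
results = {text: re.search(pattern, text) is not None for text in candidates}

for text, matched in results.items():
    print(f"{text!r}: {'match' if matched else 'no match'}")
```

`re.search` looks for the pattern anywhere in the string, which is why "123testing" counts as a match.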

To make good use of regular expressions, the most important thing is a correct understanding of metacharacters. The following table lists the metacharacters with a brief description of each.

 

Metacharacter    Description
.    Matches any single character. For example, the regular expression r.t matches the strings rat, rut, and r t, but does not match root.
$    Matches the end of a line. For example, the regular expression weasel$ matches the end of the string "He's a weasel" but does not match the string "They are a bunch of weasels."
^    Matches the beginning of a line. For example, the regular expression ^When in matches the beginning of the string "When in the course of human events", but does not match "What and When in the".
*    Matches zero or more occurrences of the character immediately preceding it. For example, the regular expression .* matches any number of any characters.
\    The quoting (escape) character, used to treat the metacharacters listed here as ordinary characters. For example, the regular expression \$ matches a dollar sign rather than the end of a line. Similarly, the regular expression \. matches a dot character rather than any single character.
[ ] [c1-c2] [^c1-c2]    Matches any one of the characters in the brackets. For example, the regular expression r[aou]t matches rat, rot, and rut but does not match ret. A hyphen (-) inside the brackets specifies a range of characters. For example, the regular expression [0-9] matches any digit. Several ranges can be given together: the regular expression [A-Za-z] matches any uppercase or lowercase letter. Another important use is exclusion. To match every character except those listed (the complement set), put a ^ immediately after the left bracket. For example, the regular expression [^269A-Z] matches any character except 2, 6, 9, and the uppercase letters.
\< \>    Match the beginning (\<) and the end (\>) of a word. For example, the regular expression \<the\> matches the "the" in the string "for the wise", but does not match the "the" in the string "otherwise". Note: this metacharacter is not supported by all software.
\( \)    Treat the expression between \( and \) as a group, and save the characters matched by it in a temporary area (up to 9 groups can be saved per regular expression). The saved groups can be referenced as \1 to \9.
|    Logical OR of two matching conditions. For example, the regular expression (him|her) matches "it belongs to him" and "it belongs to her", but does not match "it belongs to them.". Note: this metacharacter is not supported by all software.
+    Matches one or more occurrences of the character immediately preceding it. For example, the regular expression 9+ matches 9, 99, and 999. Note: this metacharacter is not supported by all software.
?    Matches zero or one occurrence of the character immediately preceding it. Note: this metacharacter is not supported by all software.
\{i\} \{i,j\}    Matches a specific number of occurrences of the preceding character. For example, the regular expression A[0-9]\{3\} matches the character "A" followed by exactly three digits, such as A123 and A348. It does not match A1234 as a whole, although without anchors it still finds the A123 inside it. The regular expression [0-9]\{4,6\} matches any run of four, five, or six digits. Note: this metacharacter is not supported by all software.
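Several of the table's examples can be checked mechanically. The sketch below uses Python's `re` module as a stand-in, which is an assumption of this example and not the tool the article describes. Python writes groups and counts as `( )` and `{ }` without backslashes, and it has no `\<` or `\>`, so `\b` (word boundary) is used instead. The string "costs $5" is a made-up input.

```python
import re

# (pattern, text, expected) triples based on the table's examples,
# rewritten in Python's syntax.
checks = [
    (r"r.t", "rut", True),
    (r"r.t", "root", False),
    (r"weasel$", "He's a weasel", True),
    (r"weasel$", "They are a bunch of weasels.", False),
    (r"^When in", "When in the course of human events", True),
    (r"^When in", "What and When in the", False),
    (r"\$", "costs $5", True),            # escaped: a literal dollar sign
    (r"r[aou]t", "ret", False),
    (r"[^269A-Z]", "269ABC", False),      # only excluded characters present
    (r"\bthe\b", "for the wise", True),   # \b stands in for \< and \>
    (r"\bthe\b", "otherwise", False),
    (r"(him|her)", "it belongs to them.", False),
    (r"9+", "999", True),
    (r"^A[0-9]{3}$", "A1234", False),     # anchored to enforce "exactly three"
]

outcomes = [re.search(p, t) is not None for p, t, _ in checks]
for (p, t, expected), got in zip(checks, outcomes):
    print(f"{p!r:16} {t!r:40} {got}")
```

The last check is anchored with `^` and `$` on purpose: without anchors, a pattern for "A plus three digits" would still find A123 inside A1234.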

 

 

The simplest metacharacter is the dot, which matches any single character (not including the newline character). Assume that a file named test.txt contains the following lines:

he is a rat

he is in a rut

the food is Rotten

I like root beer

We can use the grep command to test our regular expressions. grep tries to match a regular expression against each line of the specified file, and prints every line that contains at least one match. The command

  grep r.t test.txt

searches each line of test.txt for the regular expression r.t and prints the matching lines. The regular expression r.t matches an r followed by any character followed by a t. It therefore matches the "rat" and "rut" in the file, but not the "Rot" in "Rotten", because regular expressions are case sensitive. To match both uppercase and lowercase letters, use the character-range metacharacter (square brackets). The regular expression [Rr] matches both R and r. So, to match an uppercase or lowercase r followed by any character followed by a t, use the expression [Rr].t.
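The grep run above can be imitated with a few lines of Python. This is a rough sketch, not a reimplementation of grep: the helper name `grep` is hypothetical, and the sample lines are kept in memory with the capitalization the explanation relies on (lowercase "he" and "the", capital R in "Rotten").

```python
import re

# The sample lines of test.txt, held in a list for this sketch.
lines = [
    "he is a rat",
    "he is in a rut",
    "the food is Rotten",
    "I like root beer",
]

def grep(pattern, lines):
    """Return the lines that contain at least one match for pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

case_sensitive = grep(r"r.t", lines)     # lowercase r only
either_case = grep(r"[Rr].t", lines)     # R or r
print(case_sensitive)
print(either_case)
```

Note that "root" never matches either pattern: it has two characters between r and t, and the dot stands for exactly one.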

To match characters at the beginning of a line, use the caret (^), sometimes called the circumflex. For example, to find the lines in test.txt that begin with "he", you might first try the simple expression he, but this also matches the "the" in the third line. Use the regular expression ^he instead, which matches "he" only at the beginning of a line.
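The difference between the unanchored and the anchored pattern can be seen in a short sketch, again using Python's `re` module as an assumed test bench and the sample lines with lowercase "he" and "the".

```python
import re

lines = [
    "he is a rat",
    "he is in a rut",
    "the food is Rotten",
    "I like root beer",
]

# Without an anchor, "he" is also found inside "the".
unanchored = [line for line in lines if re.search(r"he", line)]

# With ^, the match must start at the beginning of the line.
anchored = [line for line in lines if re.search(r"^he", line)]

print(unanchored)
print(anchored)
```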

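Sometimes it is easier to specify "match everything except ...". When the caret (^) is the first character inside square brackets, it means "exclude". For example, to match he except where it is preceded by t or s (that is, in the and she), use [^st]he. Note that this expression requires some character before he, so it does not match an he at the very beginning of a line.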
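A small sketch of the exclusion set, using Python's `re` module as an assumed stand-in for grep. The word list is made up for illustration.

```python
import re

# Illustrative words: two excluded by the set, two accepted,
# and a bare "he" with nothing in front of it.
words = ["the", "she", "ache", "where", "he"]

matches = [word for word in words if re.search(r"[^st]he", word)]
print(matches)
```

The bare "he" is rejected for a different reason than "the" and "she": `[^st]` must consume one character, and there is none before the h.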

You can specify several character ranges inside square brackets. For example, the regular expression [A-Za-z] matches any letter, uppercase or lowercase, and the regular expression [A-Za-z][A-Za-z]* matches a letter followed by zero or more letters (uppercase or lowercase). We can also use the metacharacter + to do the same thing: [A-Za-z]+ is equivalent to [A-Za-z][A-Za-z]*. Note, however, that the metacharacter + is not supported by every program that supports regular expressions. For details, check the regular expression syntax supported by the program you are using.
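The equivalence of the two forms can be checked on a sample string. This is only a spot check under the assumption that Python's `re` module behaves like the tools discussed here for these two patterns; the sample text is made up.

```python
import re

plus_form = re.compile(r"[A-Za-z]+")
star_form = re.compile(r"[A-Za-z][A-Za-z]*")

text = "Regex101 rocks, 42 times"

# findall returns every non-overlapping run of letters, left to right.
plus_words = plus_form.findall(text)
star_words = star_form.findall(text)

print(plus_words)
print(star_words)
```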

To specify an exact number of occurrences, use braces (note that they must be escaped with a backslash). To match 100 and 1000 but not 10, use 10\{2,3\}. This regular expression matches the digit 1 followed by two or three 0s. Because the expression is not anchored, it also finds the 1000 inside 10000; to exclude that, add word anchors such as \< and \> where they are supported. A useful variation is to omit the second number. For example, the regular expression 0\{3,\} matches three or more consecutive 0s.
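The counted repetition can be sketched as follows. Python's `re` module is assumed as the test bench; it writes the braces without backslashes and uses `\b` for word boundaries in place of `\<` and `\>`.

```python
import re

numbers = ["10", "100", "1000", "10000"]

# Unanchored: the pattern finds "1000" inside "10000".
inside_longer = re.search(r"10{2,3}", "10000").group()

# With word boundaries, only whole numbers of the right length match.
whole_number = re.compile(r"\b10{2,3}\b")
matched = [n for n in numbers if whole_number.search(n)]

# Open-ended count: three or more consecutive zeros.
zero_runs = re.findall(r"0{3,}", "10 100 1000 100000")

print(inside_longer, matched, zero_runs)
```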

Simple Examples

Here are some representative and simple examples.

  

vi command    Function
:%s/  */ /g    Replace one or more spaces with a single space (the pattern is two spaces followed by *).
:%s/ *$//    Remove all spaces at the end of each line.
:%s/^/ /    Insert a space at the beginning of each line.
:%s/^[0-9][0-9]*//    Remove all digits at the beginning of each line.
:%s/b[aeio]g/bug/g    Change all occurrences of bag, beg, big, and bog to bug.
:%s/t\([aou]\)g/h\1t/g    Change all occurrences of tag, tog, and tug to hat, hot, and hut respectively (note the use of a group, and of \1 to reference the matched character).
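The substitutions in the table can be approximated outside vi. The sketch below uses Python's `re.sub` on single-line strings, which is an assumption of this example: vi's :%s applies to every line of a buffer, while each call here handles one string, and Python writes the group as `( )` without backslashes. The input strings are made up.

```python
import re

# One or more spaces -> a single space (two spaces followed by *).
squeezed = re.sub(r"  *", " ", "a   b  c")

# Strip trailing spaces.
trimmed = re.sub(r" *$", "", "trailing   ")

# Insert a space at the start of the line.
indented = re.sub(r"^", " ", "line")

# Remove leading digits.
no_digits = re.sub(r"^[0-9][0-9]*", "", "42abc")

# Character class in the pattern, fixed replacement.
bugs = re.sub(r"b[aeio]g", "bug", "bag beg big bog")

# Group and back-reference: the captured vowel is reused as \1.
swapped = re.sub(r"t([aou])g", r"h\1t", "tag tog tug")

print(squeezed, trimmed, indented, no_digits, bugs, swapped, sep="|")
```

The replacement `h\1t` keeps the captured vowel and swaps the surrounding consonants, so tug becomes hut.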
 
