An Introduction to Regular Expressions

Last Update:2017-02-28 Source: Internet

Author: User



In short, regular expressions are a powerful tool for pattern matching and substitution. They appear in almost all Unix-based tools, such as the vi editor, the scripting languages Perl and PHP, and the command-line programs awk and sed. Client-side scripting languages such as JavaScript also support them. Regular expressions have thus outgrown any single language or system and become a widely accepted concept and feature.

Regular expressions let the user build a matching pattern from a series of special characters, compare that pattern against a target object such as a data file, program input, or a form field on a Web page, and then run the appropriate code depending on whether the target contains a match.

For example, one of the most common applications of regular expressions is checking that an e-mail address entered online is correctly formatted. If the address matches the regular expression, the form the user filled out is processed normally; if it does not, a prompt asks the user to re-enter a valid address. Regular expressions thus play an important role in the validation logic of Web applications.
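The e-mail check described above can be sketched in a few lines. This is a minimal illustration in Python, assuming its standard re module; the pattern and the function name looks_like_email are hypothetical and deliberately simple, and real-world address validation is considerably more involved.

```python
import re

# Hypothetical, deliberately loose pattern: some characters that are neither
# "@" nor whitespace, an "@", a domain part, a dot, and a final label.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(text: str) -> bool:
    """Return True if the whole string fits the simple pattern above."""
    return EMAIL_PATTERN.fullmatch(text) is not None

print(looks_like_email("user@example.com"))  # True
print(looks_like_email("user@example"))      # False
print(looks_like_email("not an email"))      # False
```

A form handler would process the input when the function returns True and ask the user to re-enter the address otherwise.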

Basic syntax

With a preliminary understanding of what regular expressions are for, let's look at their syntax.

A regular expression generally looks like this:

/love/

The part between the "/" delimiters is the pattern to be matched in the target object. The user simply places the pattern to search for between the two "/" delimiters. To make patterns more flexible, regular expressions provide special "metacharacters". Metacharacters are characters that have a special meaning in a regular expression; among other things, they can specify how the preceding character (the character immediately before the metacharacter) may occur in the target object.

The most commonly used metacharacters include "+", "*", and "?". The "+" metacharacter specifies that the preceding character must appear one or more times in a row in the target object, the "*" metacharacter specifies that it must appear zero or more times in a row, and the "?" metacharacter specifies that it must appear zero times or once.

Next, let's look at the specific application of regular expression meta characters.

/fo+/

Because the regular expression above contains the "+" metacharacter, it matches the letter f followed by one or more letters o, as found in "fool", "fo", or "football" in the target object.

/eg*/

Because the regular expression above contains the "*" metacharacter, it matches the letter e followed by zero or more letters g, as found in "easy", "ego", or "egg" in the target object.

/wil?/

Because the regular expression above contains the "?" metacharacter, it matches the letters wi followed by zero or one letter l, as found in "win" or "wilson" in the target object.

Besides these metacharacters, users can specify exactly how many times a pattern may occur in the matching object. For example:

/jim{2,6}/

The regular expression above specifies that the character m must appear 2 to 6 times in a row in the matching object, so it matches strings such as "jimmy" or "jimmmmmy".
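The quantifier examples above can be tried out directly. The following is a minimal sketch in Python, assuming its standard re module; the helper name first_match is hypothetical and exists only for this illustration. Python writes the pattern as a raw string rather than placing it between "/" delimiters.

```python
import re
from typing import Optional

def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first substring of text that matches pattern, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None

print(first_match(r"fo+", "football"))    # foo
print(first_match(r"eg*", "easy"))        # e
print(first_match(r"wil?", "wilson"))     # wil
print(first_match(r"jim{2,6}", "jimmy"))  # jimm
print(first_match(r"jim{2,6}", "jim"))    # None
```

Note that each result is only the part of the word that the pattern covers, not the whole word.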

After a preliminary understanding of how to use regular expressions, let's look at how other important metacharacters are used.

\s: matches a single whitespace character, including tabs and line breaks;

\S: matches any single character that is not whitespace;

\d: matches a digit from 0 to 9;

\w: matches a letter, digit, or underscore;

\W: matches any character that \w does not match;

. : matches any character except a line break.

(Note: \s and \S, like \w and \W, can be thought of as inverses of each other.)
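To see what each of these metacharacters picks out, here is a small sketch in Python, assuming its standard re module and writing the metacharacters with a leading backslash (\s, \S, \d, \w, \W). The sample string is made up for the illustration.

```python
import re

# Sample text: letter, digit, underscore, space, letter, line break.
sample = "a1_ b\n"

whitespace = re.findall(r"\s", sample)
non_whitespace = re.findall(r"\S", sample)
digits = re.findall(r"\d", sample)
word_chars = re.findall(r"\w", sample)
non_word_chars = re.findall(r"\W", sample)
any_but_newline = re.findall(r".", sample)

print(whitespace)       # [' ', '\n']
print(non_whitespace)   # ['a', '1', '_', 'b']
print(digits)           # ['1']
print(word_chars)       # ['a', '1', '_', 'b']
print(non_word_chars)   # [' ', '\n']
print(any_but_newline)  # ['a', '1', '_', ' ', 'b']
```

The uppercase form of each pair returns exactly the characters that the lowercase form leaves out.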

Below, let's take a look at how to use the above metacharacters in regular expressions.

/\s+/

The preceding regular expression can be used to match one or more whitespace characters in the target object.

/\d000/

If we have a complex financial statement in hand, the regular expression above makes it easy to find every amount in whole thousands, such as 3000 or 8000, because it matches any digit followed by three zeros.
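The two patterns above, written with their backslashes as \s+ and \d000, can be sketched as follows in Python, assuming its standard re module. The sample line is invented; note that the digit pattern also matches inside longer numbers.

```python
import re

line = "rent   3000\tsalary 15000 coffee 12"

# \s+ matches each run of whitespace, so splitting on it separates the fields
# no matter how many spaces or tabs sit between them.
fields = re.split(r"\s+", line)
print(fields)     # ['rent', '3000', 'salary', '15000', 'coffee', '12']

# \d000 matches any digit followed by three zeros, including the tail of 15000.
thousands = re.findall(r"\d000", line)
print(thousands)  # ['3000', '5000']
```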

Besides the metacharacters described above, regular expressions have another special kind of character: the locator (often called an anchor). A locator specifies where in the target object the matching pattern must appear.

The most commonly used locators are "^", "$", "\b", and "\B". The "^" locator requires the pattern to appear at the beginning of the target string, and the "$" locator requires it to appear at the end. The "\b" locator requires the match to sit at a word boundary, that is, at the beginning or end of a word, while the "\B" locator requires the match to sit away from a word boundary, so that it is neither the beginning nor the end of a word. "\b" and "\B" can therefore be thought of as inverses of each other, and "^" and "$" as a complementary pair. For example:

/^hell/

Because the regular expression above contains the "^" locator, it matches target strings that begin with "hell", such as "hell", "hello", or "hellhound".

/ar$/

Because the regular expression above contains the "$" locator, it matches target strings that end with "ar", such as "car", "bar", or "ar".

/\bbom/

Because the regular expression above starts with the "\b" locator, it matches words in the target object that begin with "bom", such as "bomb" or "bom".

/man\b/

Because the regular expression above ends with the "\b" locator, it matches words in the target object that end with "man", such as "human", "woman", or "man".
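The locators can be sketched in Python as follows, assuming its standard re module and the backslash spellings \b and \B. The sample strings are invented, and \w* is added to two patterns only so that the whole word is returned.

```python
import re

starts = bool(re.search(r"^hell", "hello world"))
not_at_start = bool(re.search(r"^hell", "say hello"))
ends = bool(re.search(r"ar$", "a fast car"))
print(starts, not_at_start, ends)  # True False True

# \b: "bom" must begin a word and "man" must end a word.
bom_words = re.findall(r"\bbom\w*", "bomb tombom")
man_words = re.findall(r"\w*man\b", "human woman manual")
print(bom_words)  # ['bomb']
print(man_words)  # ['human', 'woman']

# \B: "man" must not begin a word, so only the one inside "human" is found.
inner_positions = [m.start() for m in re.finditer(r"\Bman", "human man")]
print(inner_positions)  # [2]
```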

To make patterns easier to write, regular expressions let the user specify a range of characters rather than listing specific characters. For example:

/[A-Z]/

The regular expression above matches any uppercase letter from A to Z.

/[a-z]/

The regular expression above matches any lowercase letter from a to z.

/[0-9]/

The regular expression above matches any digit from 0 to 9.

/([a-z][A-Z][0-9])+/

The regular expression above matches one or more repetitions of a lowercase letter, an uppercase letter, and a digit, such as "aB0". Note that "()" groups part of a pattern together, so the "+" applies to the whole group, and everything inside the "()" must appear in the target object. The regular expression therefore does not match a string such as "aBc", because the last character of "aBc" is a letter rather than a digit.
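Ranges and grouping can be sketched in Python as follows, assuming its standard re module. The grouped pattern is assumed here to mean a lowercase letter, an uppercase letter, and a digit, which is what the "aB0" example suggests; the name triple is hypothetical.

```python
import re

uppercase = re.findall(r"[A-Z]", "Hello World")
digits = re.findall(r"[0-9]", "room 42")
print(uppercase)  # ['H', 'W']
print(digits)     # ['4', '2']

# The parentheses group three character ranges; "+" repeats the whole group.
triple = re.compile(r"([a-z][A-Z][0-9])+")
one_group = bool(triple.fullmatch("aB0"))
two_groups = bool(triple.fullmatch("aB0xY9"))
letter_last = bool(triple.fullmatch("aBc"))
print(one_group, two_groups, letter_last)  # True True False
```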

To express an "or" operation, similar to the one in programming logic, use the pipe character "|" to choose among several alternative patterns. For example:

/to|too|2/

The regular expression above matches "to", "too", or "2" in the target object.
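One practical detail of the "|" operator is worth a sketch. In Python's standard re module, which is assumed here, alternatives are tried from left to right, so a shorter alternative listed first can win over a longer one. Other engines may behave differently, and the sample text is invented.

```python
import re

text = "me too, 2 go"

# Alternatives are tried left to right, so "to" wins before "too" is tried.
as_written = re.findall(r"to|too|2", text)
print(as_written)     # ['to', '2']

# Listing the longer alternative first lets "too" match as a whole.
longest_first = re.findall(r"too|to|2", text)
print(longest_first)  # ['too', '2']
```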

Another commonly used operator in regular expressions is the negation operator "[^]". Unlike the "^" locator described earlier, the negation operator "[^]" specifies that the characters listed inside the brackets must not be matched in the target object. For example:

/[^a-c]/

The regular expression above matches any character in the target object except a, b, and c. In general, when "^" appears first inside "[]" it is a negation operator, and when it appears outside "[]" it is a locator.
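The two roles of "^" can be compared side by side. This is a minimal sketch in Python, assuming its standard re module; the sample strings are invented.

```python
import re

# Inside brackets, "^" negates: every character except a, b, and c.
not_abc = re.findall(r"[^a-c]", "abcdef")
print(not_abc)      # ['d', 'e', 'f']

# Outside brackets, "^" is a locator: one of a, b, c at the very start.
leading = re.findall(r"^[a-c]", "cab")
print(leading)      # ['c']

# A common use of negation: remove everything that is not a digit.
digits_only = re.sub(r"[^0-9]", "", "tel: 555-0100")
print(digits_only)  # 5550100
```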

Finally, the escape character "\" is used when a metacharacter itself must appear literally in the pattern so that it can be found in the target object. For example:

/th\*/

The regular expression above matches "th*" in the target object, not "the".
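The effect of escaping can be sketched in Python, assuming its standard re module and a backslash before the "*". The sample text is invented; re.escape is a standard helper that adds the backslashes automatically.

```python
import re

text = "the th* thhh"

# Escaped: the "*" is a literal character.
literal_star = re.findall(r"th\*", text)
print(literal_star)  # ['th*']

# Unescaped: "*" means zero or more of the preceding "h".
unescaped = re.findall(r"th*", text)
print(unescaped)     # ['th', 'th', 'thhh']

# re.escape builds the escaped pattern from a plain string.
escaped_pattern = re.escape("th*")
print(escaped_pattern)  # th\*
```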

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
