Regular expressions: a detailed explanation and usage guide

If we ask Unix enthusiasts what they like best about the system, then besides its stability and the ability to boot remotely, most of them will mention regular expressions. If we ask what gives them the most headaches, the answer, besides complicated process control and installation, is again likely to be regular expressions. So what are regular expressions, and how can we really master them and use them correctly and flexibly? This article introduces them in the hope of helping readers who are eager to understand and master regular expressions.

Getting started
In short, regular expressions are a powerful tool for pattern matching and substitution. They are found in almost every Unix-based tool, such as the vi editor, the Perl and PHP scripting languages, and the awk and sed programs. Client-side scripting languages such as JavaScript support them as well. Regular expressions have thus outgrown any single language or system and become a widely accepted concept and feature.
Regular expressions let the user build a matching pattern from a series of special characters, compare that pattern against a target object such as a data file, program input, or a Web form field, and then run the appropriate code depending on whether the target object contains the pattern.
For example, one of the most common uses of regular expressions is to check that an e-mail address entered online is correctly formatted. If the address passes the regular expression check, the form the user filled out is processed normally; if it does not match the regular expression, a prompt asks the user to enter a correct e-mail address. Regular expressions therefore play an important role in the decision logic of Web applications.

Basic syntax
Now that we have a rough idea of what regular expressions are for, let's look at their syntax.
A regular expression generally takes the following form:
/love/
The part between the "/" delimiters is the pattern to be matched in the target object. The user simply places the pattern to search for between the "/" delimiters. To let users build patterns more flexibly, regular expressions provide special "metacharacters". Metacharacters are characters that carry a special meaning in a regular expression; they can be used to specify how the preceding character (the character immediately before the metacharacter) may occur in the target object.
The most commonly used metacharacters include "+", "*", and "?". The "+" metacharacter specifies that the preceding character must appear one or more times in a row in the target object, the "*" metacharacter specifies that it must appear zero or more times in a row, and the "?" metacharacter specifies that it must appear zero times or once.
Next, let's look at how these metacharacters are applied.
/fo+/
Because the regular expression above contains the "+" metacharacter, it matches strings in the target object such as "fool", "fo", or "football", in which the letter f is followed by one or more letters o.
/eg*/
Because the regular expression above contains the "*" metacharacter, it matches strings in the target object such as "easy", "ego", or "egg", in which the letter e is followed by zero or more letters g.
/wil?/
Because the regular expression above contains the "?" metacharacter, it matches strings in the target object such as "win" or "wilson", in which the letters wi are followed by zero or one letter l.
In addition to metacharacters, users can specify exactly how many times a pattern may appear in the matching object. For example:
/jim{2,6}/
The regular expression above specifies that the character m must appear 2 to 6 times in a row, so it matches strings such as "jimmy" or "jimmmmmy".
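The quantifier examples above can be checked with a short script. This is only a sketch in Python's `re` module, which the article itself does not use; the `cases` table, the `check` helper, and the non-matching words are illustrative assumptions added here.

```python
import re

# Each pattern from the text, paired with words it should match
# and a few extra words (assumed for illustration) it should not match.
cases = {
    r"fo+": (["fool", "fo", "football"], ["f", "far"]),
    r"eg*": (["easy", "ego", "egg"], ["dog"]),
    r"wil?": (["win", "wilson"], ["wall"]),
    r"jim{2,6}": (["jimmy", "jimmmmmy"], ["jim", "jimy"]),
}

def check(pattern, word):
    """Return True when the pattern occurs somewhere in the word."""
    return re.search(pattern, word) is not None

for pattern, (hits, misses) in cases.items():
    for word in hits + misses:
        print(f"/{pattern}/ on {word!r}: {check(pattern, word)}")
```

Note that `re.search` looks for the pattern anywhere in the word, which mirrors how the "/.../" patterns in this article are described.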
After this first look at how regular expressions are used, let's see how several other important metacharacters work.
\s: matches a single whitespace character, including tabs and line breaks;
\S: matches any single character that is not whitespace;
\d: matches a digit from 0 to 9;
\w: matches a letter, a digit, or an underscore;
\W: matches any character that \w does not match;
. : matches any character except a line break.
(Note: \s and \S, like \w and \W, can be regarded as inverses of each other.)
Next, let's see how to use these metacharacters in regular expressions.
/\s+/
The regular expression above matches one or more whitespace characters in the target object.
/\d000/
If we have a complex financial statement in hand, the regular expression above makes it easy to find every amount in the thousands, that is, a digit followed by three zeros.
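As a rough illustration of these character classes, the following Python sketch runs them against a made-up statement line. The sample `text` and the variable names are assumptions for demonstration only.

```python
import re

# Hypothetical statement text mixing spaces, a tab, and a line break.
text = "rent 2000\tfood 350\nbonus 7000 tax 12000"

# \s+ : split on runs of whitespace (spaces, tabs, line breaks)
fields = re.split(r"\s+", text)
print(fields)

# \d000 : any digit followed by three zeros
thousands = re.findall(r"\d000", text)
print(thousands)  # ['2000', '7000', '2000']

# \w+ and \W+ : runs of word characters versus everything else
words = re.findall(r"\w+", "user_1@example.com")
others = re.findall(r"\W+", "user_1@example.com")
print(words, others)
```

The last "2000" in the result comes from inside "12000": without an anchor, the pattern is free to match part of a longer number.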
In addition to the metacharacters described above, regular expressions have another kind of special character: the anchor (also called a locator). An anchor specifies where in the target object the matching pattern must appear.
The most commonly used anchors are "^", "$", "\b", and "\B". The "^" anchor requires the pattern to appear at the beginning of the target string, and the "$" anchor requires it to appear at the end. The "\b" anchor requires the match to sit at a word boundary, that is, at the beginning or end of a word, while the "\B" anchor requires the opposite: the position must not be a word boundary, so that side of the match falls inside a word. We can therefore regard "^" and "$" as a pair of counterparts (start versus end) and "\b" and "\B" as a pair of opposites. For example:
/^hell/
Because the regular expression above begins with the "^" anchor, it matches strings in the target object that start with "hell", such as "hell", "hello", or "hellhound".
/ar$/
Because the regular expression above ends with the "$" anchor, it matches strings in the target object that end with "ar", such as "car", "bar", or "ar".
/\bbom/
Because the regular expression above starts with the "\b" anchor, it matches words in the target object that start with "bom", such as "bomb" or "bom".
/man\b/
Because the regular expression above ends with the "\b" anchor, it matches words in the target object that end with "man", such as "human", "woman", or "man".
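The effect of each anchor is easiest to see by pairing a string that matches with one that does not. The sketch below uses Python's `re` module as a stand-in; the `found` helper and the counter-examples are assumptions added for illustration.

```python
import re

def found(pattern, text):
    """Return True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

print(found(r"^hell", "hello"))       # True: text starts with "hell"
print(found(r"^hell", "shell"))       # False: "hell" is not at the start
print(found(r"ar$", "car"))           # True: text ends with "ar"
print(found(r"ar$", "art"))           # False: "ar" is not at the end
print(found(r"\bbom", "a bomb"))      # True: a word starts with "bom"
print(found(r"\bbom", "abomb"))       # False: "bom" is inside a word
print(found(r"man\b", "woman said"))  # True: a word ends with "man"
print(found(r"man\b", "many"))        # False: "man" is followed by "y"
print(found(r"\Bman", "woman"))       # True: "man" starts inside a word
print(found(r"\Bman", "man"))         # False: "man" starts at a boundary
```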
To make patterns easier to write, regular expressions let the user specify a range of characters instead of listing specific ones. For example:
/[A-Z]/
The regular expression above will match any uppercase letter in the range A to Z.
/[a-z]/
The regular expression above will match any lowercase letter in the range a to z.
/[0-9]/
The regular expression above will match any digit in the range 0 to 9.
/([a-z][A-Z][0-9])+/
The regular expression above will match any string made up of a lowercase letter, an uppercase letter, and a digit, in that order, such as "aB0". Note that "()" can be used to group parts of a pattern together. Everything inside the "()" must appear together in the target object. The regular expression above therefore will not match a string such as "abc", because the last character of "abc" is a letter rather than a digit.
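A small sketch of the grouped range pattern follows, again in Python as an assumed test bed. It uses `fullmatch`, which requires the whole string to fit the pattern, so that repeated groups are visible; the sample strings other than "aB0" and "abc" are illustrative.

```python
import re

# One lowercase letter, one uppercase letter, one digit, repeated as a group.
pattern = re.compile(r"([a-z][A-Z][0-9])+")

samples = ["aB0", "aB0cD1", "abc", "AB0"]
results = {s: pattern.fullmatch(s) is not None for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

"aB0cD1" passes because the group repeats twice, while "AB0" fails because its first character is not lowercase.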
To implement something like the "or" operation of programming logic in a regular expression, that is, to match any one of several alternative patterns, use the pipe character "|". For example:
/to|too|2/
The regular expression above will match the "to", "too", or "2" in the target object.
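One detail worth testing is the order of the alternatives. In Python's `re` module (used here only as an assumed test bed; Perl behaves the same way in this respect), alternatives are tried from left to right and the first one that succeeds is used, so "to" wins before "too" is ever tried.

```python
import re

text = "too 2 to"

# Alternatives are tried left to right: "to" matches first inside "too".
as_written = re.findall(r"to|too|2", text)
print(as_written)  # ['to', '2', 'to']

# Putting the longer alternative first lets "too" match as a whole.
reordered = re.findall(r"too|to|2", text)
print(reordered)   # ['too', '2', 'to']
```

Both patterns find a match in the same strings; the ordering only changes which piece of text is reported or replaced.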
Another commonly used operator in regular expressions is the negation operator "[^]". Unlike the "^" anchor described earlier, "[^]" specifies that the character at that position must not be any of the characters listed inside the brackets. For example:
/[^a-c]/
The regular expression above will match any character in the target object except a, b, and c. In general, "^" acts as a negation operator when it appears as the first character inside "[]", and as an anchor when it appears outside "[]" or when there is no "[]" at all.
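The two roles of "^" can be compared side by side. This is a minimal Python sketch with an assumed sample string.

```python
import re

text = "abcdef"

# "^" inside brackets negates the class: every character except a, b, c.
negated = re.findall(r"[^a-c]", text)
print(negated)          # ['d', 'e', 'f']

# "^" outside brackets anchors the match to the start of the string.
anchored = re.findall(r"^[a-c]", text)
print(anchored)         # ['a']

# Both at once: a character other than a, b, c at the very start.
anchored_negated = re.findall(r"^[^a-c]", text)
print(anchored_negated) # []
```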
Finally, when a metacharacter itself has to appear literally in the pattern, the escape character "\" can be used. For example:
/th\*/
The regular expression above will match "th*" in the target object, not "the".
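The difference between the escaped and unescaped "*" shows up clearly on a sample string. The sketch below is in Python; `re.escape` is a Python helper that adds the backslashes automatically, and the sample text is an assumption.

```python
import re

text = "the th* them"

# Escaped: "*" is a literal asterisk.
literal = re.findall(r"th\*", text)
print(literal)     # ['th*']

# Unescaped: "*" is a quantifier, so this means "t followed by zero or more h".
quantified = re.findall(r"th*", text)
print(quantified)  # ['th', 'th', 'th']

# re.escape builds the escaped pattern from a plain string.
escaped = re.findall(re.escape("th*"), text)
print(escaped)     # ['th*']
```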

Usage examples

With a fuller understanding of regular expressions, let's look at how they are used in Perl, PHP, and JavaScript.

In general, a Perl regular expression takes the following form:

operator/regular-expression/replacement-string/modifiers

The operator is either m or s, for a match operation or a substitution operation respectively.

The regular expression is the pattern to be matched or replaced and can consist of any characters, metacharacters, or anchors. The replacement string is what replaces each match found when the s operator is used. The trailing modifiers control how the match or substitution is carried out. For example:

s/geed/good/

This finds the first occurrence of the string geed in the target object and replaces it with good. To replace every occurrence throughout the target object, add the "g" modifier, as in s/love/lust/g.

In addition, the "i" modifier can be used when the match should not be case-sensitive. For example:

m/jewel/i

The regular expression above will match jewel, Jewel, or JEWEL in the target object.

In Perl, a dedicated operator, "=~", binds a regular expression to the object it should be matched against. For example:

$flag =~ s/abc/ABC/

The statement above replaces the string abc in the variable $flag with ABC.
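
For readers who want to experiment with these operations outside Perl, the following sketch shows rough Python equivalents of the m and s operators and the "g" and "i" modifiers. The mapping is an approximation offered for comparison, not part of the original article, and the sample strings are made up.

```python
import re

flag = "geed deeds, geed days"

# Perl: s/geed/good/   -> replace only the first occurrence
first_only = re.sub(r"geed", "good", flag, count=1)
print(first_only)   # good deeds, geed days

# Perl: s/geed/good/g  -> replace every occurrence
everywhere = re.sub(r"geed", "good", flag)
print(everywhere)   # good deeds, good days

# Perl: m/jewel/i      -> case-insensitive match
candidates = ["jewel", "Jewel", "JEWEL", "jowl"]
variants = [w for w in candidates
            if re.search(r"jewel", w, flags=re.IGNORECASE)]
print(variants)     # ['jewel', 'Jewel', 'JEWEL']
```

Note the reversed default: Perl replaces one occurrence unless "g" is given, while Python's `re.sub` replaces all occurrences unless `count` is set.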

Next, we add a regular expression to a Perl program to check the format of a user's email address. The code is as follows:

#!/usr/bin/perl
use strict;
use warnings;

# Get input
print "What's your email address?\n";
my $email = <STDIN>;
chomp($email);

# Match and display result
if ($email =~ /^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+/)
{
    print("Your email address is correct!\n");
}
else
{
    print("Please try again!\n");
}
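
To get a feel for what this pattern accepts and rejects, it can be run against a few sample addresses. The sketch below transcribes the pattern the Perl program is meant to use into Python's `re` module; the `looks_valid` helper and the sample addresses are assumptions for illustration.

```python
import re

# The email pattern from the Perl program, transcribed for Python.
EMAIL = re.compile(r"^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+")

def looks_valid(address):
    """Return True when the address passes the pattern check."""
    return EMAIL.search(address) is not None

samples = [
    "user@example.com",         # accepted
    "user_1@mail.example.org",  # accepted
    "first.last@example.com",   # rejected: no dot allowed before the "@"
    "user@localhost",           # rejected: no dot after the "@"
    "user@example.com!!!",      # accepted: the pattern has no "$" anchor
]
verdicts = {address: looks_valid(address) for address in samples}

for address, ok in verdicts.items():
    print(f"{address!r}: {ok}")
```

The results suggest the pattern is a loose format check rather than a strict validator: it rejects dotted local parts and, because nothing anchors the end of the string, it ignores trailing characters.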


If you prefer PHP, you can use the ereg() function to perform pattern matching. The ereg() function is used in the following format:
  
