Regular Expression Tutorial: Matching a Set of Characters in Detail

Source: Internet
Author: User
Tags alphabetic character ranges
This article describes how to match one character out of a set of characters in a regular expression. It is shared here for your reference.

Note: In all examples, the text matched by the expression is enclosed in double quotes in the results. Some examples are implemented in Java; where Java's own use of regular expressions differs, this is described in the appropriate place. All Java examples were tested under JDK 1.6.0_13.

I. Matching one of several characters

In the earlier tutorial on matching a single character, the example that matched text files beginning with Na or Sa used the regular expression .a.\.txt. If there were another file named Cal.txt, it would be matched as well. What if you only want to match files that begin with Na or Sa?

Since we now want to find only N or S, using ., which matches any character, obviously will not do. In regular expressions, the metacharacters [ and ] define a character set. All characters between the two metacharacters are members of the set, and the set matches any single character that is one of its members.

Consider an example similar to the one in the previous article:

Text:

Sales.txt

Na1.txt

Na2.txt

Sa1.txt

Sanatxt.txt

Cal.txt

Regular expression: [NS]a.\.txt

Results:

Sales.txt

"Na1.txt"

"Na2.txt"

"Sa1.txt"

Sanatxt.txt

Cal.txt

Analysis: The regular expression used here begins with [NS], which matches the character N or S and no other character. [ and ] themselves do not match anything; they only delimit the character set. Next, a matches the character a, . matches any single character, \. matches a literal ., and txt matches the literal characters txt. The result is consistent with what we expected.

However, if the list contained a file whose name merely contains such a sequence, for example USa1.txt, the Sa1.txt part of that name would also be matched. This is a position-matching issue that will be discussed later.
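The example above can be reproduced with a minimal Java sketch using java.util.regex. The class name CharacterSetDemo and the helper findAll are illustrative names, not part of the original article, and the file names are joined into one space-separated string for convenience. The set is written as [NS] so that it matches the capitalized file names exactly as they are listed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterSetDemo {
    // Returns every substring of text matched by regex, in order of appearance.
    // (Hypothetical helper introduced for this illustration.)
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Sales.txt Na1.txt Na2.txt Sa1.txt Sanatxt.txt Cal.txt";
        // In Java source the backslash is doubled: "\\." is the regex \.
        System.out.println(findAll("[NS]a.\\.txt", text));
        // The pattern is not anchored, so it also matches inside a longer name.
        System.out.println(findAll("[NS]a.\\.txt", "USa1.txt"));
    }
}
```

The first call should report Na1.txt, Na2.txt and Sa1.txt; the second shows the position-matching issue, since only the Sa1.txt part of USa1.txt is matched.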

II. Using character ranges

In the example above, what if we only want to match files that start with Na or Sa followed by a digit? In the regular expression [NS]a.\.txt, the . matches any single character, not only digits. This problem can be solved by using a second character set:

Text:

Sales.txt

Na1.txt

Na2.txt

Sa1.txt

San.txt

Sanatxt.txt

Cal.txt

Regular expression: [NS]a[0123456789]\.txt

Results:

Sales.txt

"Na1.txt"

"Na2.txt"

"Sa1.txt"

San.txt

Sanatxt.txt

Cal.txt

Analysis: The results show that only files starting with Na or Sa followed by a digit are matched. San.txt is not matched, because the character set [0123456789] restricts the third character to a digit.

In regular expressions, certain character ranges such as 0-9 and a-z are used very frequently. To simplify their definition, regular expressions provide a special metacharacter, -, that defines a character range. For the example above we can use the regular expression [NS]a[0-9]\.txt, and the result is exactly the same.
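To check that the enumerated set and the range behave the same way, the following Java sketch runs both patterns over the file names from the example. The class name DigitRangeDemo and the helper found are illustrative assumptions, and the set is written as [NS] to match the capitalized file names.

```java
import java.util.regex.Pattern;

class DigitRangeDemo {
    // The digit set written out character by character.
    static final Pattern ENUMERATED = Pattern.compile("[NS]a[0123456789]\\.txt");
    // The same set written as a range.
    static final Pattern RANGE = Pattern.compile("[NS]a[0-9]\\.txt");

    // True if the pattern matches anywhere in the given file name.
    static boolean found(Pattern p, String name) {
        return p.matcher(name).find();
    }

    public static void main(String[] args) {
        String[] names = {
            "Sales.txt", "Na1.txt", "Na2.txt", "Sa1.txt",
            "San.txt", "Sanatxt.txt", "Cal.txt"
        };
        for (String name : names) {
            System.out.println(name
                + " enumerated=" + found(ENUMERATED, name)
                + " range=" + found(RANGE, name));
        }
    }
}
```

Both columns should agree for every file name, with only Na1.txt, Na2.txt and Sa1.txt reported as matches.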

Character ranges are not limited to digits. The following are all valid character ranges:

[A-F]: matches all uppercase letters from A to F.

[A-Z]: matches all uppercase letters from A to Z.

[A-z]: matches all characters from the ASCII character A to the ASCII character z. This range is generally not used and is shown only for illustration, because it also contains characters such as [ and ^ that lie between Z and a in ASCII.
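The extra characters picked up by a range such as [A-z] can be listed with a short Java sketch. The class name AsciiRangeDemo and the helper nonLettersMatchedBy are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class AsciiRangeDemo {
    // Collects the ASCII characters that the given character set matches
    // even though they are not letters.
    static String nonLettersMatchedBy(String regex) {
        Pattern p = Pattern.compile(regex);
        StringBuilder extras = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            boolean matched = p.matcher(String.valueOf(c)).matches();
            if (matched && !Character.isLetter(c)) {
                extras.append(c);
            }
        }
        return extras.toString();
    }

    public static void main(String[] args) {
        // The six characters between Z and a in ASCII: [ \ ] ^ _ `
        System.out.println(nonLettersMatchedBy("[A-z]"));
        // Two separate ranges match letters only, so nothing is printed here.
        System.out.println(nonLettersMatchedBy("[A-Za-z]"));
    }
}
```

Writing the two letter ranges separately, as in [A-Za-z], avoids the unwanted punctuation characters.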

The first and last characters of a range can be any characters in the ASCII table. In practice, however, digit and letter ranges are the most commonly used.

Note: When defining a character range, the last character of the range must not be less than the first (for example, [9-0] is not allowed). The - acts as a metacharacter only between [ and ]. Anywhere outside [ and ] it is an ordinary character and matches only the - itself.
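Both rules in the note can be observed in Java, where a reversed range is rejected when the pattern is compiled. This is a minimal sketch; the class name RangeRulesDemo and the helper compiles are illustrative, and other regex engines may report the error differently.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RangeRulesDemo {
    // True if Java accepts the regular expression, false if it is rejected.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("[0-9]")); // a valid range
        System.out.println(compiles("[9-0]")); // reversed range, rejected
        // Outside [ and ], the - is an ordinary character:
        // the pattern 0-9 matches only the three-character text "0-9".
        System.out.println(Pattern.matches("0-9", "0-9"));
        System.out.println(Pattern.matches("0-9", "5"));
    }
}
```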

Multiple character ranges can be given in the same character set. For example, [0-9a-zA-Z] matches any uppercase letter, lowercase letter, or digit.

Take a look at the example of matching colors in a Web page:

Text:

<span style="background-color: #3636FF; height:30px; width:60px;">Testing</span>

Regular expression: #[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]

Results: <span style="background-color: "#3636FF"; height:30px; width:60px;">Testing</span>

Analysis: In a web page, a color is generally written as an RGB value starting with #, where R stands for red, G for green, and B for blue. Any color can be produced by combining different amounts of red, green, and blue. The RGB value is written in hexadecimal, for example #000000 for black, #FFFFFF for white, and #FF0000 for red. The regular expression that matches a color in a web page therefore starts with #, followed by six identical [0-9A-Fa-f] character sets. This can be abbreviated to #[0-9A-Fa-f]{6}, which is discussed in a later article on repeated matching.
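The color example can be tried in Java with the sketch below, which runs both the long form and the {6} shorthand against the sample markup. The class name HexColorDemo, the constants and the helper firstMatch are illustrative names, not taken from the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexColorDemo {
    // One character set per hexadecimal digit, written out six times.
    static final String LONG_FORM =
        "#[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]";
    // The same pattern using the {6} repetition shorthand.
    static final String SHORT_FORM = "#[0-9A-Fa-f]{6}";

    // Returns the first substring of text matched by regex, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<span style=\"background-color: #3636FF; "
            + "height:30px; width:60px;\">Testing</span>";
        System.out.println(firstMatch(LONG_FORM, html));
        System.out.println(firstMatch(SHORT_FORM, html));
    }
}
```

Both calls should find #3636FF. Note that the character sets are written directly next to each other; a space inside the pattern would be treated as a literal space to match.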

III. Negating a character set

Character sets are typically used to specify a set of characters, one of which must match. In some cases, however, we need the reverse: we give a set of unwanted characters, and any character that is not in that set matches.

For example, to match files that start with Na or Sa and are not followed by a digit:

Text:

Sales.txt

Na1.txt

Na2.txt

Sa1.txt

Sanatxt.txt

San.txt

Regular expression: [NS]a[^0-9]\.txt

Results:

Sales.txt

Na1.txt

Na2.txt

Sa1.txt

Sanatxt.txt

"San.txt"

Analysis: This example uses a pattern that is exactly the opposite of the previous one. [0-9] matches only digits, whereas [^0-9] matches any character that is not a digit.

Note: A ^ placed directly after the opening [ negates the character set. At the beginning of a regular expression it instead denotes a position match, which will be discussed later. The ^ applies to all characters and ranges in the given character set, not only to the character or range immediately following it. For example, [^0-9a-z] matches any character that is neither a digit nor a lowercase letter.
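The negated set can be checked in Java with the following sketch. The class name NegatedSetDemo and the helper found are illustrative assumptions, and the leading set is written as [NS] to match the capitalized file names.

```java
import java.util.regex.Pattern;

class NegatedSetDemo {
    // True if regex matches anywhere in text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String[] names = {
            "Sales.txt", "Na1.txt", "Na2.txt", "Sa1.txt", "Sanatxt.txt", "San.txt"
        };
        for (String name : names) {
            // Only names with a non-digit third character followed by .txt match.
            System.out.println(name + " -> " + found("[NS]a[^0-9]\\.txt", name));
        }
        // The ^ covers every member of the set, both ranges included.
        System.out.println(found("[^0-9a-z]", "5")); // a digit is excluded
        System.out.println(found("[^0-9a-z]", "x")); // a lowercase letter is excluded
        System.out.println(found("[^0-9a-z]", "A")); // an uppercase letter matches
    }
}
```

Of the six file names, only San.txt should be reported as a match.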

IV. Summary

The metacharacters [ and ] define a character set, meaning that one of the characters in the set must match. There are two ways to define a character set: enumerate all of its characters, or give a character range using the metacharacter -. A character set can be negated with the metacharacter ^, which excludes the given characters from the match: any character other than those in the set will match.

In the next article, we will discuss the use of some other metacharacters in regular expressions.

We hope this article is helpful to everyone learning regular expressions.
