Regular expression matching rules and examples

Source: Internet
Author: User

Regular Expressions: Matching Rules

Basic Pattern Matching

Let's start with the basics. A pattern is the most basic element of a regular expression: a sequence of characters that describes the features of a string. A pattern can be simple, consisting of ordinary characters, or very complex, often using special characters to represent a range of characters, repetition, or context. For example:

^once

This pattern contains the special character ^, which indicates that the pattern matches only strings that begin with once. For example, the pattern matches the string "once upon a time" but does not match "There once was a man from New York". Just as the ^ symbol marks the beginning, the $ symbol matches strings that end with a given pattern.

bucket$

This pattern matches "who kept all of his cash in a bucket" but does not match "buckets". Used together, ^ and $ indicate an exact match (the string is identical to the pattern). For example:

^bucket$

This matches only the string "bucket". If a pattern includes neither ^ nor $, it matches any string that contains the pattern. For example, the pattern

once

and the string

There once was a man from New York who kept all of his cash in a bucket.

are a match.
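The anchoring rules above can be checked with a short sketch. Python's re module is used here as an assumption (the article names no language for these patterns), and the helper name matches is hypothetical. Matching is case-sensitive by default, so the sample strings are lowercase.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

limerick = "There once was a man from New York who kept all of his cash in a bucket."

starts_with_once = matches(r"^once", "once upon a time")      # anchored at the start
not_at_start = matches(r"^once", "there once was a man")      # "once" is not at the start
ends_with_bucket = matches(r"bucket$", "who kept all of his cash in a bucket")
plural_fails = matches(r"bucket$", "buckets")                 # the string ends in "s"
exact_only = matches(r"^bucket$", "bucket")                   # whole string must be "bucket"
unanchored = matches(r"once", limerick)                       # found in the middle
```

Only not_at_start and plural_fails come out False; the other four are True.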

The letters in the pattern (o-n-c-e) are literal characters, that is, each represents the letter itself; digits work the same way. Some slightly more complex characters, such as punctuation and whitespace (spaces, tabs, and so on), require escape sequences. All escape sequences begin with a backslash (\). The escape sequence for a tab is \t. So to check whether a string starts with a tab, you can use this pattern:

^\t

Similarly, \n stands for "new line" and \r for a carriage return. Other special symbols can be matched by putting a backslash in front of them: the backslash itself is written \\, a period is written \., and so on.
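As a rough illustration of escape sequences, the following Python sketch (Python is an assumption, and the sample strings are made up) tests for a leading tab, a newline, and a literal backslash. Raw strings (r"...") are used so that the backslashes reach the regex engine unchanged.

```python
import re

# A tab at the very start of the string.
starts_with_tab = re.search(r"^\t", "\tindented line") is not None
no_leading_tab = re.search(r"^\t", "plain line") is not None

# A newline anywhere in the string.
has_newline = re.search(r"\n", "line one\nline two") is not None

# A literal backslash is written as an escaped backslash in the pattern.
has_backslash = re.search(r"\\", "C:\\temp") is not None
```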

Character clusters

In Internet applications, regular expressions are often used to validate user input. When a user submits a form, checking whether the phone number, address, email address, credit card number, and so on are valid takes more than ordinary literal characters.

So we need a looser way of describing the pattern we want: the character cluster. To create a character cluster that represents all vowels, place all the vowels in square brackets:

[AaEeIiOoUu]

This pattern matches any vowel, but it stands for only one character. A hyphen represents a range of characters, for example:

[a-z] //Match all lowercase letters
[A-Z] //Match all uppercase letters
[a-zA-Z] //Match all letters
[0-9] //Match all digits
[0-9\.\-] //Match all digits, periods, and minus signs
[ \f\r\t\n] //Match all whitespace characters

Again, each of these stands for only one character, which is an important point. To match a string consisting of one lowercase letter followed by one digit, such as "z2", "t6", or "g7", but not "ab2", "r2d3", or "b52", use this pattern:

^[a-z][0-9]$

Although [a-z] represents a range of 26 letters, here it can match only a string whose first character is a lowercase letter.
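To see that each cluster stands for exactly one character, here is a small Python sketch (Python is an assumption; the sample list is illustrative) that applies ^[a-z][0-9]$ to several strings.

```python
import re

pattern = re.compile(r"^[a-z][0-9]$")

samples = ["z2", "t6", "g7", "ab2", "r2d3", "B52"]

# Map each sample to True or False depending on whether it matches.
results = {s: bool(pattern.search(s)) for s in samples}
accepted = [s for s in samples if results[s]]
```

Only the two-character strings made of a lowercase letter and a digit end up in accepted.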

We saw earlier that ^ represents the beginning of a string, but it has another meaning. When ^ appears as the first character inside square brackets, it means "not" or "exclude", and it is often used to rule out certain characters. Continuing the previous example, suppose we require that the first character not be a digit:

^[^0-9][0-9]$

This pattern matches "&5", "g7", and "-2", but does not match "12" or "66". Here are a few more examples of excluding specific characters:

[^a-z] //All characters except lowercase letters
[^\\\/\^] //All characters except (\), (/), and (^)
[^\"\'] //All characters except double quotation marks (") and single quotation marks (')
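A negated cluster can be tried out the same way. This sketch assumes Python's re module and uses made-up sample strings; it checks ^[^0-9][0-9]$ and then lists the characters that [^a-z] picks out of a short string.

```python
import re

not_digit_then_digit = re.compile(r"^[^0-9][0-9]$")

samples = ["&5", "g7", "-2", "12", "66"]
accepted = [s for s in samples if not_digit_then_digit.search(s)]

# [^a-z] matches every single character that is not a lowercase letter.
non_lowercase = re.findall(r"[^a-z]", "abC1d")
```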

The special character "." (dot, period) denotes any character except a newline. So the pattern "^.5$" matches any two-character string that ends with the digit 5 and begins with any character other than a newline. The pattern "." matches any string except the empty string and strings that consist only of newlines.

PHP's regular expressions have some built-in general-purpose character clusters, listed below:

Character cluster   Description
[[:alpha:]]         Any letter
[[:digit:]]         Any digit
[[:alnum:]]         Any letter or digit
[[:space:]]         Any whitespace character
[[:upper:]]         Any uppercase letter
[[:lower:]]         Any lowercase letter
[[:punct:]]         Any punctuation character
[[:xdigit:]]        Any hexadecimal digit, equivalent to [0-9a-fA-F]

Specifying repetition

So far, you know how to match a single letter or digit, but more often you will want to match a word or a number. A word consists of several letters, and a number consists of several digits. Curly braces ({}) following a character or character cluster specify how many times the preceding item must occur.

Pattern             Description
^[a-zA-Z_]$         Any single letter or underscore
^[[:alpha:]]{3}$    All 3-letter words
^a$                 The letter a
^a{4}$              aaaa
^a{2,4}$            aa, aaa, or aaaa
^a{1,3}$            a, aa, or aaa
^a{2,}$             A string of two or more a's
^a{2,}              Such as aardvark and aaab, but not apple
a{2,}               Such as baad and aaa, but not Nantucket
\t{2}               Two tab characters
.{2}                Any two characters

These examples show the three different uses of curly braces. A single number, {x}, means "the preceding character or character cluster occurs exactly x times"; a number followed by a comma, {x,}, means "the preceding item occurs x or more times"; and two comma-separated numbers, {x,y}, mean "the preceding item occurs at least x times but no more than y times". We can extend these patterns to more words and numbers:

^[a-zA-Z0-9_]{1,}$ //All strings containing one or more letters, digits, or underscores
^[0-9]{1,}$ //All non-negative integers (digits only)
^\-{0,1}[0-9]{1,}$ //All integers
^[-]?[0-9]+\.?[0-9]+$ //All decimals

The last example is not easy to read, is it? Look at it this way: it starts (^) with an optional minus sign ([-]?), followed by one or more digits ([0-9]+), an optional decimal point (\.?), then one or more digits ([0-9]+), and nothing else ($). A simpler notation is introduced below.
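The brace forms described above can be exercised with a few sample strings. This is a minimal Python sketch (Python is an assumption, and the helper name found is hypothetical); the POSIX cluster [[:alpha:]] is left out because Python's re module does not support that syntax.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

# {x}: exactly x occurrences.
exactly_four = [s for s in ["aaa", "aaaa", "aaaaa"] if found(r"^a{4}$", s)]

# {x,y}: between x and y occurrences.
two_to_four = [s for s in ["a", "aa", "aaa", "aaaa", "aaaaa"] if found(r"^a{2,4}$", s)]

# {x,}: x or more occurrences, anchored at the start or anywhere in the string.
prefix = [s for s in ["aardvark", "aaab", "apple"] if found(r"^a{2,}", s)]
anywhere = [s for s in ["baad", "aaa", "nantucket"] if found(r"a{2,}", s)]

# The integer pattern: an optional minus sign, then one or more digits.
integers = [s for s in ["42", "-7", "4.2", "--1"] if found(r"^\-{0,1}[0-9]{1,}$", s)]
```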

The special character "?" is equivalent to {0,1}; both mean "zero or one of the preceding item", or "the preceding item is optional". With "?", a looser form of the decimal pattern, in which the digits on either side of the point may also be omitted, can be written as:

^\-?[0-9]{0,}\.?[0-9]{0,}$

The special character "*" is equivalent to {0,}; both mean "zero or more of the preceding item". Finally, the character "+" is equivalent to {1,}, meaning "one or more of the preceding item". So the four examples above can be written as:

^[a-zA-Z0-9_]+$ //All strings containing one or more letters, digits, or underscores
^[0-9]+$ //All non-negative integers (digits only)
^\-?[0-9]+$ //All integers
^\-?[0-9]*\.?[0-9]*$ //All decimals

Of course, this does not technically reduce the complexity of the regular expressions, but it does make them easier to read.
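One way to convince yourself that ?, *, and + are only shorthand for {0,1}, {0,}, and {1,} is to run both spellings over the same inputs. The sketch below assumes Python's re module; the sample list and the names pairs and verdicts are illustrative. It also shows a caveat of the loose decimal pattern: because every part is optional, it accepts the empty string.

```python
import re

# Each pair holds a brace-style pattern and its shorthand spelling.
pairs = [
    (r"^[a-zA-Z0-9_]{1,}$", r"^[a-zA-Z0-9_]+$"),
    (r"^[0-9]{1,}$", r"^[0-9]+$"),
    (r"^\-{0,1}[0-9]{1,}$", r"^\-?[0-9]+$"),
    (r"^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$", r"^\-?[0-9]*\.?[0-9]*$"),
]

samples = ["abc_1", "42", "-42", "3.14", "-0.5", ".5", "", "1.2.3", "x-1"]

def verdicts(pattern: str) -> list:
    """Return one True/False verdict per sample string."""
    return [re.search(pattern, s) is not None for s in samples]

all_equivalent = all(verdicts(braced) == verdicts(short) for braced, short in pairs)

# Every element of the loose decimal pattern is optional, so "" matches too.
loose_matches_empty = re.search(r"^\-?[0-9]*\.?[0-9]*$", "") is not None
```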

Regular Expressions: Examples

Simple expressions

The simplest form of a regular expression is a single ordinary character that matches itself in the search string. For example, a single-character pattern such as a always matches the letter a wherever it appears in the search string. Here are some examples of single-character regular expression patterns:

/a/
/7/
/M/

You can combine several single characters to form a larger expression. For example, the following regular expression combines the single-character expressions a, 7, and M:

/a7M/

Note that there is no concatenation operator. You simply type one character after another.

Character matching

A period (.) matches any single printing or nonprinting character in a string, with one exception: the newline character (\n). The following regular expression matches aac, abc, acc, adc, and so on, as well as a1c, a2c, a-c, and a#c:

/a.c/

To match a string that contains a file name, where a period (.) is part of the input string, precede the period in the regular expression with a backslash (\). For example, the following regular expression matches filename.ext:

/filename\.ext/
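The difference between a bare period and an escaped one can be sketched as follows. Python's re module is assumed here, so the patterns are written without the surrounding slashes, and the sample strings are illustrative.

```python
import re

candidates = ["aac", "abc", "a1c", "a-c", "a#c", "a\nc", "ac"]

# The period matches any one character except a newline.
dot_matches = [s for s in candidates if re.search(r"a.c", s)]

# An escaped period matches only a literal ".".
escaped_hits_real_name = re.search(r"filename\.ext", "filename.ext") is not None
escaped_hits_lookalike = re.search(r"filename\.ext", "filenameXext") is not None

# Without the backslash, the lookalike is accepted as well.
unescaped_hits_lookalike = re.search(r"filename.ext", "filenameXext") is not None
```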

These expressions only let you match "any" single character. You may instead need to match specific characters from a list. For example, you might want to find chapter headings that are numbered (Chapter 1, Chapter 2, and so on).

Bracket expression

To create a list of matching characters, place one or more individual characters within square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. As elsewhere, an ordinary character inside the brackets represents itself, that is, it matches one occurrence of itself in the input text. Most special characters lose their meaning inside a bracket expression. However, there are some exceptions:

    • The ] character ends a list unless it is the first item. To match the ] character in a list, place it first, immediately after the opening [.
    • The \ character remains an escape character. To match the \ character, use \\.

A bracket expression matches only a single character at that position in the regular expression. The following regular expression matches Chapter 1, Chapter 2, Chapter 3, Chapter 4, and Chapter 5:

/Chapter [12345]/

Note that the word Chapter and the space after it are fixed in position relative to the characters in brackets. The bracket expression specifies only the set of characters that can match the single character position immediately following the word Chapter and the space. That is the ninth character position.

To represent the matching characters with a range instead of listing them individually, use a hyphen (-) to separate the start and end characters of the range. The character value of each character determines its relative order within the range. The following regular expression contains a range expression that is equivalent to the bracketed list shown above:

/Chapter [1-5]/

When a range is specified in this manner, both the start and end values are included in the range. Note also that, in Unicode sort order, the start value must precede the end value.
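The equivalence of a bracketed list and a range, and the rule that the start of a range must precede its end, can be checked with a short Python sketch (Python is an assumption). The patterns are written with a capital C so that they match the sample headings, which are generated for illustration.

```python
import re

headings = [f"Chapter {n}" for n in range(1, 10)]

by_list = [h for h in headings if re.search(r"Chapter [12345]", h)]
by_range = [h for h in headings if re.search(r"Chapter [1-5]", h)]

# A range whose start value follows its end value is rejected at compile time.
try:
    re.compile(r"[5-1]")
    reversed_range_accepted = True
except re.error:
    reversed_range_accepted = False
```

Both lists contain Chapter 1 through Chapter 5, and the reversed range [5-1] raises an error instead of compiling.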

To include hyphens in bracket expressions, use one of the following methods:

    • Escape it with a backslash:
      [\-]
    • Place the hyphen at the beginning or end of the bracketed list. The following expressions match all lowercase letters and the hyphen:
      [-a-z]
      [a-z-]
    • Create a range in which the starting character value is lower than the hyphen and the ending character value is equal to or greater than the hyphen. The following two regular expressions satisfy this requirement:
      [!--]
      [!-~]
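The three ways of including a hyphen can be compared in a few lines of Python (an assumption; the sample text is made up). The [!--] form is left out of the sketch because Python's re module emits a warning for a doubled hyphen inside brackets.

```python
import re

# Method 1: escape the hyphen.
escaped = re.findall(r"[\-]", "well-known a-z")

# Method 2: put the hyphen first or last in the list.
leading = re.findall(r"[-a-z]+", "well-known")
trailing = re.findall(r"[a-z-]+", "well-known")

# Method 3: use a range that spans the hyphen ("!" is 33, "-" is 45, "~" is 126).
in_range = re.findall(r"[!-~]", "-")
```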

To find all characters that are not in the list or range, place a caret (^) at the beginning of the list. If the caret appears anywhere else in the list, it matches itself. The following regular expression matches the word Chapter followed by any digit or other character except 1, 2, 3, 4, or 5:

/Chapter [^12345]/

In the example above, the expression matches any digit or other character except 1, 2, 3, 4, or 5 in the ninth position. So, for example, Chapter 7 is a match, and so is Chapter 9.

The expression above can also be written with a hyphen (-):

/Chapter [^1-5]/

A typical use of a bracket expression is to match any uppercase or lowercase letter or any digit. The following expression specifies such a match:

/[A-Za-z0-9]/

Alternation and grouping

Alternation uses the | character to let you choose between two or more alternatives. For example, you can extend the chapter heading regular expression to match more than just chapter headings. However, this is not as simple as you might think. Alternation matches the largest possible expression on either side of the | character.

You might think that the following expression matches either Chapter or Section at the beginning of a line, followed by one or two digits at the end of the line:

/^Chapter|Section [1-9][0-9]{0,1}$/

Unfortunately, the regular expression above matches either the word Chapter at the beginning of a line, or the word Section followed by a number at the end of a line. If the input string is Chapter 22, the expression matches only the word Chapter. If the input string is Section 22, the expression matches Section 22.

To make the regular expression behave as intended, you can use parentheses to limit the scope of the alternation, that is, to make sure it applies only to the two words Chapter and Section. However, parentheses also create subexpressions, whose matches may be captured for later use, as described in the section on backreferences. By adding parentheses at the appropriate places in the regular expression above, you can make it match either Chapter 1 or Section 3.

The following regular expression uses parentheses to group Chapter and Section so that the expression works correctly:

/^(Chapter|Section) [1-9][0-9]{0,1}$/
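The effect of the parentheses on alternation can be seen by comparing what each version actually matches. This is a minimal Python sketch (Python is an assumption; the names loose and grouped are illustrative), with the patterns written with capital letters so that they match the sample headings.

```python
import re

loose = re.compile(r"^Chapter|Section [1-9][0-9]{0,1}$")
grouped = re.compile(r"^(Chapter|Section) [1-9][0-9]{0,1}$")

# Without parentheses, the alternation splits the whole pattern in two.
loose_chapter = loose.search("Chapter 22").group(0)
loose_section = loose.search("Section 22").group(0)
loose_false_positive = loose.search("Chapter and verse") is not None

# With parentheses, the anchors and the number apply to both words.
grouped_chapter = grouped.search("Chapter 22").group(0)
grouped_rejects = grouped.search("Chapter and verse") is None
```

The ungrouped pattern stops after the word Chapter and also accepts a line with no number at all, while the grouped pattern matches the full heading.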

Although this expression works correctly, the parentheses around Chapter|Section also capture whichever of the two words matched, for later use. Because there is only one set of parentheses in the expression above, only one "submatch" is captured.

In the example above, you only want the parentheses to group the choice between the words Chapter and Section. To prevent the match from being saved for later use, place ?: before the regular expression pattern inside the parentheses. The following modification provides the same capability without saving the submatch:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
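The difference between a capturing and a non-capturing group shows up in the submatches that a match object reports. The sketch below assumes Python's re module, where groups() returns the captured submatches as a tuple.

```python
import re

capturing = re.compile(r"^(Chapter|Section) [1-9][0-9]{0,1}$")
non_capturing = re.compile(r"^(?:Chapter|Section) [1-9][0-9]{0,1}$")

# Both versions match the same text...
same_match = (capturing.search("Section 3").group(0)
              == non_capturing.search("Section 3").group(0))

# ...but only the capturing version saves a submatch.
captured = capturing.search("Section 3").groups()
not_captured = non_capturing.search("Section 3").groups()
```

Here captured holds the single word Section, while not_captured is an empty tuple.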

In addition to the ?: metacharacter, two other non-capturing metacharacters create what are known as "lookahead" matches. A positive lookahead, specified with ?=, matches the search string at any point where a string matching the parenthesized pattern begins. A negative lookahead, specified with ?!, matches the search string at any point where a string not matching the parenthesized pattern begins.

For example, suppose you have a document that contains references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further that you need to update the document by changing all references to Windows 95, Windows 98, and Windows NT to Windows 2000. The following regular expression (an example of a positive lookahead) matches Windows 95, Windows 98, and Windows NT:

/Windows (?=95|98|NT)/

Once a match is found, the search for the next match begins immediately after the matched text (not including the characters in the lookahead). For example, if the expression above matches Windows 98, the search continues after Windows, not after 98.
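That a lookahead checks the following text without consuming it can be demonstrated with a substitution. This is a hedged Python sketch: Python's re module is an assumption, the pattern spacing (Windows, one space, then the lookahead) is one reading of the expression in the text, and the replacement text "Win " is purely illustrative.

```python
import re

text = "Windows 3.1, Windows 95, Windows 98 and Windows NT"

positive = re.compile(r"Windows (?=95|98|NT)")

# The matched text is only "Windows " (with the space); the version is not consumed.
matched = [m.group(0) for m in positive.finditer(text)]

# Replacing the match therefore leaves 95, 98, and NT in place.
shortened = positive.sub("Win ", text)

# A negative lookahead picks out the reference that is NOT 95, 98, or NT.
older = re.findall(r"Windows (?!95|98|NT)[0-9.]+", text)
```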

Other examples

Some examples of regular expressions are listed below:

Regular expression / Description
/\b([a-z]+) \1\b/gi   Matches a word that occurs twice in succession.
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/   Parses a URL into protocol, domain, port, and relative path.
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/   Matches a chapter or section heading.
/[-a-z]/   Matches the 26 letters a to z plus the hyphen.
/ter\b/   Matches chapter but not terminal.
/\Bapt/   Matches chapter but not aptitude.
/Windows (?=95|98|NT)/   Matches Windows when it is followed by 95, 98, or NT; after a match is found, the next search starts after Windows.
/^\s*$/   Matches a blank line.
/\d{2}-\d{5}/   Validates an ID number consisting of two digits, a hyphen, and five digits.
/<\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*\/\1\s*>/   Matches an HTML tag.
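Two of the examples above, the repeated-word pattern and the URL pattern, can be tried out as follows. This sketch assumes Python's re module, so the /gi flags become re.IGNORECASE plus findall, and the slashes in the URL pattern need no escaping. The sample sentence and the example.com URL are made up for illustration.

```python
import re

# A word followed by a space and the same word again, ignoring case.
repeated_word = re.compile(r"\b([a-z]+) \1\b", re.IGNORECASE)
doubled = repeated_word.findall("Is is the cost of of gasoline going up up?")

# Protocol, domain, optional port, and relative path of a URL.
url_pattern = re.compile(r"(\w+)://([^/:]+)(:\d*)?([^# ]*)")
parts = url_pattern.search("http://www.example.com:80/docs/index.htm").groups()
```

findall returns the first word of each doubled pair, and groups() returns the four captured parts of the URL in order.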
