Notes I made while teaching myself regular expressions. Regular expressions are actually not difficult.

Last Update: 2018-12-04 Source: Internet

Author: User

As the title says, regular expressions can handle a lot of tasks.

I. Regular Expression Syntax
1. Anchor characters
1) Start anchor "^": for example, ^0754 matches only strings that start with 0754.
2) End anchor "$": for example, 0754$ matches only strings that end with 0754.
3) Exact match: combine ^ and $, as in ^0754$, to match only the string 0754 itself.
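The three anchor cases can be tried in a few lines. This is a minimal sketch that uses Python's `re` module as a stand-in for PHP's regex functions (an assumption: for simple patterns like these, `^` and `$` behave the same way); `check` is a hypothetical helper that, like ereg(), searches anywhere in the string.

```python
import re

def check(pattern: str, text: str) -> bool:
    """Return True if pattern matches somewhere in text (an unanchored search)."""
    return re.search(pattern, text) is not None

# ^0754: the string must start with 0754
print(check(r"^0754", "07541234"), check(r"^0754", "12340754"))  # True False
# 0754$: the string must end with 0754
print(check(r"0754$", "12340754"), check(r"0754$", "07541234"))  # True False
# ^0754$: the string must be exactly 0754
print(check(r"^0754$", "0754"), check(r"^0754$", "07540754"))    # True False
```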
2. Escape characters
1) Whitespace (non-printing) characters:
Line feed \n
Carriage return \r
Tab \t
2) Special characters that must be escaped to be matched literally:
"$" \$
"^" \^
"+" \+
"\" \\
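Escaping works the same way at the regex level in Python's `re` module, used here only as a stand-in for PHP: a backslash turns a special character back into a literal one. The sketch uses raw strings (`r"..."`) so the backslashes reach the regex engine untouched, roughly as single quotes do in PHP.

```python
import re

# Escaped, $ and + lose their special meanings (end anchor, "one or more").
print(re.findall(r"\$\d+", "cost: $15 or $20"))   # ['$15', '$20']
print(re.findall(r"1\+1", "1+1=2, 11=11"))         # ['1+1']
# \\ in the pattern matches one literal backslash
print(re.findall(r"a\\b", r"a\b and ab"))          # ['a\\b']
# Whitespace escapes inside a character class
print(re.split(r"[\n\t]", "one\ttwo\nthree"))      # ['one', 'two', 'three']
```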
3. Repetition characters (quantifiers)
1) *: the preceding character occurs zero or more times.
Example 1: 'abc*' matches every string that contains ab (followed by any number of c's, including none).
2) +: the preceding character occurs one or more times.
Example 2: 'abc+' matches every string that contains abc.
3) ?: the preceding character occurs zero times or once.
Example 3: 'abc?' matches strings that contain ab, optionally followed by one c, such as abca, aabc, and aaab. Because the pattern is not anchored, abcc matches too (it contains abc); ? only limits how many c's the match itself consumes. To reject a second c, anchor the pattern, for example ^abc?$, which accepts only ab and abc.
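The three examples can be checked with a short sketch. It uses Python's `re` module as a stand-in for PHP's functions (its `*`, `+` and `?` mean the same thing here); `contains` is a hypothetical helper. Note in particular that abcc does match 'abc?'.

```python
import re

def contains(pattern: str, text: str) -> bool:
    """True if pattern matches anywhere in text (an unanchored search)."""
    return re.search(pattern, text) is not None

print(contains(r"abc*", "xaby"))     # True: ab followed by zero c's
print(contains(r"abc+", "xaby"))     # False: + needs at least one c
print(contains(r"abc+", "abccc"))    # True
for s in ["abca", "aabc", "aaab", "abcc"]:
    print(s, contains(r"abc?", s))   # True for all four, including abcc
print(re.search(r"abc?", "abcc").group())  # abc: ? consumes at most one c
print(contains(r"^abc?$", "abcc"))   # False: anchoring rejects the second c
```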
4. The escape \$ and double vs. single quotation marks (PHP 4 environment)
1) In PHP, a regular expression is itself just a string.
2) When the string contains $, single-quoted and double-quoted strings behave differently:
(1) In a single-quoted string, every character between the quotes, including $, is stored in the string literally.
(2) In a double-quoted string, the interpreter treats $ together with the valid name characters that follow it (letters, digits, and underscores) as a variable and substitutes its value. The variable name ends at the first character that cannot be part of a name; that character and everything after it are treated as ordinary characters until the next $.
(3) Note: a single $ at the very end of a double-quoted string, with nothing after it, is not interpreted as a variable, so it does not need to be escaped.
(4) If the text to be matched contains a literal $, this difference matters, because the escape \$ means different things in single and double quotes:
<1> In double quotes, \$ always produces the single character "$", exactly like an unescaped $ that cannot start a variable name. echo "c\$" prints c$, so the regular expression engine receives c$ and reads $ as the end anchor. With a single backslash, a double-quoted pattern therefore cannot express a literal "$" that is not the end anchor (the backslash itself would have to be escaped as well, as in "c\\\$"). This is why regular expressions that must match a literal $ are usually written in single quotes.
<2> In single quotes, \$ is really two characters, a backslash followed by $; echo 'c\$' still prints c\$. Outside a regular expression the pair has no special meaning, but when the string is used as a pattern, the engine reads \$ as the literal character "$", while an unescaped $ remains the end anchor, whether or not valid variable-name characters follow it.
3) How the end anchor "$" interacts with variable interpolation:
Example 1: to define the regular expression ^ab$, write $pattern = "^ab\$";. In double quotes, \$ stands for the character $, so the resulting string is ^ab$.
Example 2: writing $pattern = "^ab$"; for the same expression looks wrong, but because the $ is at the very end with nothing after it, it is not interpolated and the pattern still works.
Example 3: a regular expression that matches the literal character sequence c$: $pattern = 'c\$';
Example 4: if the same pattern is written in double quotes, $pattern = "c\$";, the string becomes c$, so the engine treats $ as the end anchor and the pattern only matches strings ending in c.
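At the regex level, the difference between Examples 3 and 4 is only whether the engine receives `c\$` or `c$`. This sketch passes both strings straight to Python's `re` module (a stand-in for PHP's engine, assumed to treat `\$` and `$` the same way), using raw strings so no language-level quoting gets in the way.

```python
import re

literal = r"c\$"   # what 'c\$' in PHP single quotes hands to the engine
anchor = r"c$"     # what "c\$" in PHP double quotes hands to the engine

for text in ["abc$", "abc", "c$d"]:
    print(text, bool(re.search(literal, text)), bool(re.search(anchor, text)))
# abc$ True False
# abc False True
# c$d True False
```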

5. Square brackets "[]" (character clusters, also called character classes)
1) [] matches exactly one character from the set it contains. If ^ is the first character inside [], the set is negated: the class matches any single character that is not listed.
Example 1: [a-zA-Z0-9] matches any uppercase or lowercase letter or digit.
Example 2: [\n\t\r\f] matches a line feed, tab, carriage return, or form feed.
Example 3: [^A-Z] matches any character that is not an uppercase letter.
Example 4: ^[^0-9] matches a string that does not start with a digit.
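A quick sketch of the four bracket examples, using Python's `re` module as a stand-in for PHP (bracket classes and `^` negation work the same way there for these inputs):

```python
import re

print(re.findall(r"[a-zA-Z0-9]", "a-B_7"))     # ['a', 'B', '7']
print(re.findall(r"[\n\t\r\f]", "x\ty\nz"))    # ['\t', '\n']
print(re.findall(r"[^A-Z]", "AbC1"))           # ['b', '1']
print(bool(re.search(r"^[^0-9]", "x123")))     # True
print(bool(re.search(r"^[^0-9]", "123x")))     # False
```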
2) The special character "." (period) matches any character except a newline. The pattern ^.abc$ matches any four-character string made of one character (other than a newline) followed by abc; it does not match abc itself. The pattern "." matches any string except the empty string and strings that consist only of newline characters.
Example 1: '^.abc$' matches strings such as xabc or 1abc, but not abc, and not a newline followed by abc.
Example 2: '.' matches any string containing at least one non-newline character; it does not match the empty string.
Example 3: '.abc' matches any string in which abc is preceded by at least one character (other than a newline); abc on its own does not match.
Example 4: '.abc$' matches any string that ends with abc and has at least one character before it; abc on its own does not match.
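The four period examples can be verified with the sketch below. It uses Python's `re` module as a stand-in for PHP (its `.` also excludes newlines by default); `matches` is a hypothetical helper doing an unanchored search.

```python
import re

def matches(pattern: str, text: str) -> bool:
    return re.search(pattern, text) is not None

# Example 1: ^.abc$ -> exactly one non-newline character, then abc
print(matches(r"^.abc$", "xabc"), matches(r"^.abc$", "abc"), matches(r"^.abc$", "xxabc"))
# True False False
# Example 2: . -> at least one non-newline character
print(matches(r".", "a"), matches(r".", ""), matches(r".", "\n"))
# True False False
# Example 3: .abc -> abc preceded by at least one character
print(matches(r".abc", "xabcy"), matches(r".abc", "abc"))
# True False
# Example 4: .abc$ -> ends with abc, with at least one character before it
print(matches(r".abc$", "123abc"), matches(r".abc$", "abc"))
# True False
```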
3) PHP provides these predefined character clusters (POSIX character classes):
[[:alpha:]] any letter
[[:digit:]] any digit
[[:alnum:]] any letter or digit
[[:space:]] any whitespace character
[[:upper:]] any uppercase letter
[[:lower:]] any lowercase letter
[[:punct:]] any punctuation character
[[:xdigit:]] any hexadecimal digit
[[:cntrl:]] any control character (ASCII value below 32, plus DEL, 127)
Note: like any unanchored pattern, these classes match as long as the target string contains at least one such character somewhere, regardless of what the rest of the string looks like.
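Python's `re` module, used here as a stand-in, has no `[[:alpha:]]` syntax, so this sketch spells out ASCII equivalents of several of the classes above. It assumes the C/POSIX locale (no locale-specific letters); `POSIX_CLASSES` and `has_class` are hypothetical names. It also illustrates the note: one matching character anywhere is enough.

```python
import re

# ASCII equivalents of some POSIX classes, assuming the C locale
POSIX_CLASSES = {
    "alpha": r"[A-Za-z]",
    "digit": r"[0-9]",
    "alnum": r"[A-Za-z0-9]",
    "space": r"[ \t\n\r\f\v]",
    "upper": r"[A-Z]",
    "lower": r"[a-z]",
    "xdigit": r"[0-9A-Fa-f]",
    "cntrl": r"[\x00-\x1f\x7f]",
}

def has_class(name: str, text: str) -> bool:
    """True if text contains at least one character of the named class."""
    return re.search(POSIX_CLASSES[name], text) is not None

print(has_class("digit", "abc1def"))    # True: one digit anywhere is enough
print(has_class("upper", "all lower"))  # False
print(has_class("cntrl", "tab\there"))  # True: \t is a control character
```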
6. Usage of braces "{}"
1) Square brackets match only one character; to match repetitions you use {}, which sets how many times the preceding item must occur. {n} means exactly n times; {m,n} means from m to n times, inclusive; {n,} means n or more times.
Example 1: ^a{10}$ matches aaaaaaaaaa (exactly ten a's).
Example 2: [0-9]{1,}$ matches any string ending in at least one digit; with ^ added, ^[0-9]{1,}$ accepts only strings made entirely of digits, that is, non-negative integers.
2) Relationship between "{}" and the repetition characters:
? is equivalent to {0,1}: zero times or once
* is equivalent to {0,}: zero or more times
+ is equivalent to {1,}: one or more times
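A sketch of the brace examples and the shorthand equivalences, using Python's `re` module as a stand-in for PHP. One caution that also applies there: write `{1,}` without spaces, since Python treats `{1, }` as literal text.

```python
import re

print(bool(re.search(r"^a{10}$", "a" * 10)), bool(re.search(r"^a{10}$", "a" * 11)))  # True False
print(bool(re.search(r"[0-9]{1,}$", "abc123")), bool(re.search(r"^[0-9]{1,}$", "abc123")))  # True False
print(re.findall(r"[0-9]{2,3}", "1 12 1234"))  # ['12', '123']

# ?, * and + behave like {0,1}, {0,} and {1,}
pairs = [(r"ab?c", r"ab{0,1}c"), (r"ab*c", r"ab{0,}c"), (r"ab+c", r"ab{1,}c")]
for text in ["ac", "abc", "abbc"]:
    same = all(bool(re.fullmatch(short, text)) == bool(re.fullmatch(braces, text))
               for short, braces in pairs)
    print(text, same)  # True for every text
```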
7. Usage of parentheses "()"
A pattern enclosed in parentheses "()" forms a subpattern, for example $pattern = '([1-9]{1}[0-9]{3})-([0-1]{1}[1-2]{1})-([0-3]{1}([0-9]|))'; each parenthesized part is a subpattern. The parentheses separate the parts so that each one (here the year, month, and day) is matched and recorded on its own without interfering with the others.
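To see what the subpatterns capture, this sketch runs the same pattern through Python's `re` module (a stand-in for PHP; the syntax used here is compatible) and prints the captured groups in order.

```python
import re

pattern = r"([1-9]{1}[0-9]{3})-([0-1]{1}[1-2]{1})-([0-3]{1}([0-9]|))"
m = re.search(pattern, "Last update: 2018-12-04")
print(m.group(0))   # 2018-12-04 (the whole match)
print(m.groups())   # ('2018', '12', '04', '4')
```

The nested ([0-9]|) is a fourth subpattern, counted by its opening parenthesis. As written, the month subpattern [0-1]{1}[1-2]{1} accepts only 01, 02, 11 and 12, so a date such as 2018-03-04 does not match.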
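II. POSIX-style regular expression functions
Note: the ereg family, including split(), was deprecated in PHP 5.3.0 and removed in PHP 7.0.0; the Perl-style preg_* functions in part III replace it.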
1. ereg and eregi
ereg(pattern, string [, array $regs]);
eregi(pattern, string [, array $regs]);
ereg() searches string for text that matches pattern. It returns a true value if a match is found and false otherwise. If the third argument $regs is given, the matched text is stored in $regs[0], and the text matched by each parenthesized subpattern is stored after it in order: $regs[1] holds the match of the first subpattern, $regs[2] the second, and so on, counting from left to right. If no match is found, $regs is left unchanged.
Note: when a match is found, ereg() always sets exactly the first ten elements of $regs, whether the pattern has more or fewer than nine subpatterns. This does not affect whether the pattern as a whole matches: ereg() decides that first, returning false if there is no match and true if there is one. If there are subpatterns, it then records their matches in order until $regs holds 10 elements or all subpatterns are recorded; if there are fewer than nine subpatterns, the remaining elements are set to empty values. In short, the match result and the contents of $regs are separate things, and $regs never holds more than 10 values.
eregi() works the same way as ereg(), except that it is case-insensitive.
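Since ereg() no longer exists in current PHP, this sketch imitates its `$regs` behaviour with Python's `re` module as a stand-in. `ereg_like` is a hypothetical helper: it returns the match result plus a list trimmed or padded to exactly 10 elements, using `None` where PHP would leave an empty value, and `ignore_case=True` mimics eregi().

```python
import re
from typing import List, Optional, Tuple

def ereg_like(pattern: str, text: str,
              ignore_case: bool = False) -> Tuple[bool, Optional[List[Optional[str]]]]:
    """Hypothetical emulation of ereg()/eregi() with the $regs argument."""
    m = re.search(pattern, text, re.IGNORECASE if ignore_case else 0)
    if m is None:
        return False, None                 # ereg() leaves $regs untouched on failure
    regs = [m.group(0)] + list(m.groups())
    regs = regs[:10]                       # never more than 10 elements...
    regs += [None] * (10 - len(regs))      # ...and never fewer
    return True, regs

found, regs = ereg_like(r"([0-9]{4})-([0-9]{2})", "date: 2018-12")
print(found, regs[:3], len(regs))   # True ['2018-12', '2018', '12'] 10
print(ereg_like(r"ABC", "xabcx", ignore_case=True)[0])  # True (eregi-style)
```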
2. ereg_replace and eregi_replace
ereg_replace(pattern, replacement, string)
eregi_replace(pattern, replacement, string)
Every piece of text in string that matches pattern is replaced with replacement. If string contains text matching pattern, the modified string is returned; otherwise the original string is returned unchanged. eregi_replace() does the same but ignores case.
If pattern contains subpatterns, the text they matched can be kept instead of being discarded by the replacement.
Example 1: to keep the text matched by the second subpattern, write the replacement as "replacement\\2". Every match of pattern is then replaced with replacement followed by the text that the second subpattern matched. "\\0" refers to the entire matched text, so using it keeps the whole match; this makes it possible to insert text after a specific string.
replacement must be a string; a value of another type is converted to a string during the replacement.
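The back-reference technique carries over to `re.sub` in Python's `re` module, used here as a stand-in, with one difference worth knowing: Python writes the whole match as `\g<0>`, because `\0` in a Python replacement string is an octal escape for the NUL character.

```python
import re

# Keep the second subpattern: each match becomes "X" + group 2
print(re.sub(r"(ab)(cd)", r"X\2", "abcd abcd"))    # Xcd Xcd
# Keep the whole match and insert text after it (PHP's \\0, Python's \g<0>)
print(re.sub(r"foo", r"\g<0>bar", "foo baz"))      # foobar baz
# No match: the original string comes back unchanged
print(re.sub(r"zzz", "y", "foo baz"))              # foo baz
# Case-insensitive, like eregi_replace()
print(re.sub(r"FOO", "x", "foo Foo", flags=re.IGNORECASE))  # x x
```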
3. The split() and spliti() functions
split(pattern, string [, int limit]);
spliti(pattern, string [, int limit]);
split() breaks string into parts, using text that matches the regular expression pattern as the separator. On success it returns an array of the parts; on failure it returns false. The optional limit sets the maximum number of parts: with a limit of 5, the string is split into at most five parts even if pattern matches more than four times, and the last part holds the rest of the string after the first four parts, so the returned array has only five elements. spliti() is the case-insensitive version of split().
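Python's `re.split`, used here as a stand-in, shows the same limit behaviour, but counts differently: PHP's limit is the maximum number of parts, while Python's `maxsplit` is the maximum number of splits, so a PHP limit of 5 corresponds to `maxsplit=4`.

```python
import re

text = "a,b,c,d,e,f,g"
print(re.split(r",", text))              # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(re.split(r",", text, maxsplit=4))  # ['a', 'b', 'c', 'd', 'e,f,g'] (five parts)
# spliti()-style case-insensitive separator
print(re.split(r"x", "1X2x3", flags=re.IGNORECASE))  # ['1', '2', '3']
```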
III. Perl-style regular expressions and related functions
1. Perl regular expression syntax
A Perl-style pattern is enclosed in delimiters, which can be "/", "!", or "{}".
Example 1: /^[^0-9]/, !^[^0-9]! and {^[^0-9]} are all the same.
Inside the pattern, the delimiter character has a special meaning and must be escaped. If you use "/" as the delimiter and the regular expression itself contains "/", you must write it as "\/". If you use "!" as the delimiter instead, "/" needs no escaping.
Example 2: /\/$/ and !/$! are the same.
Example 3: !^\!\![0-9]$! and /^!![0-9]$/ are the same.
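Python's `re` module has no delimiters, so this sketch shows the idea by stripping them off. `compile_perl_style` is a hypothetical helper, not a PHP or Python API: it takes the last occurrence of the closing delimiter as the end of the pattern (a simplification of how PHP parses it), maps the i, m, s and x modifiers to Python flags, and compiles the rest with `re`.

```python
import re

# Bracket-style openers close with their partner; any other delimiter closes itself.
_PAIRS = {"{": "}", "(": ")", "[": "]", "<": ">"}
# Only the modifiers with a direct Python equivalent are handled here.
_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}

def compile_perl_style(spec: str) -> "re.Pattern[str]":
    """Hypothetical helper: split '/body/modifiers' into body and flags, then compile."""
    closing = _PAIRS.get(spec[0], spec[0])
    end = spec.rfind(closing)
    if end <= 0:
        raise ValueError(f"no closing delimiter in {spec!r}")
    flags = 0
    for ch in spec[end + 1:]:
        flags |= _FLAGS[ch]          # KeyError for modifiers not handled here
    return re.compile(spec[1:end], flags)

same = [compile_perl_style(s) for s in ("/^[^0-9]/", "!^[^0-9]!", "{^[^0-9]}")]
print([bool(p.search("abc")) for p in same])   # [True, True, True]
print(bool(compile_perl_style(r"/\/$/").search("a/")),
      bool(compile_perl_style("!/$!").search("a/")))  # True True
```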
2. Special characters in Perl-style patterns
\a the alarm (bell) character, ASCII value 7
\b a word boundary
\e the escape character (ASCII 27)
\B a non-word boundary
\cx a control character (Ctrl+x)
\d a single digit
\D a single non-digit
\s a single whitespace character
\S a single non-whitespace character
\w a single word character (letter, digit, or underscore)
\W a single non-word character (neither a letter, a digit, nor an underscore)
\Z matches at the end of the target string (or just before a final newline)
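Most of these escapes behave the same in Python's `re` module, used in this sketch as a stand-in. Two caveats, both about Python rather than PHP: it has no `\e` or `\cx`, and its `\Z` matches only at the very end of the string (PHP spells that `\z`). Python's `\w` is also Unicode-aware, which makes no difference for the ASCII inputs here.

```python
import re

print(re.findall(r"\bcat\b", "cat concat cat."))   # ['cat', 'cat']
print(re.findall(r"\Bcat", "cat concat"))          # ['cat'] (the one inside "concat")
print(re.findall(r"\d+", "a1b22c333"))             # ['1', '22', '333']
print(re.findall(r"\D+", "a1b22"))                 # ['a', 'b']
print(re.findall(r"\w+", "foo_bar-baz 42"))        # ['foo_bar', 'baz', '42']
print(re.sub(r"\s+", " ", "a \t b\n c"))           # a b c
print(bool(re.search(r"\a", "bell\x07")))          # True: \a is BEL, ASCII 7
```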
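3. Advanced features
1) Alternation (the "or" operator) "|":
For example, !^ex|em! matches strings that start with ex or that contain em anywhere, because | has the lowest precedence and splits the whole pattern into ^ex and em. To match only strings that start with ex or em, group the alternatives: !^e(x|m)!.
Note: the content in () is a subpattern.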
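The precedence difference is easy to see with a few words. This sketch uses Python's `re` module as a stand-in for PHP (alternation has the same lowest precedence there):

```python
import re

loose = r"^ex|em"      # (^ex) OR (em anywhere)
grouped = r"^e(x|m)"   # starts with ex OR em
for word in ["extra", "emit", "them"]:
    print(word, bool(re.search(loose, word)), bool(re.search(grouped, word)))
# extra True True
# emit True True
# them True False
print(re.search(grouped, "emit").group(1))   # m (what the subpattern captured)
```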
2) Pattern modifiers after the closing delimiter
!regular expression!modifiers
A: the match is anchored: it must start at the beginning of the target string.
D: $ matches only at the very end of the target string (by default it also matches just before a final newline). This modifier is ignored if m is set.
U: ungreedy. By default a quantifier matches the longest possible string; for example, /a+/ matches "aaaaa" in "caaaaab". With this modifier, /a+/U matches only "a".
S: spends extra time studying the pattern to speed up matching.
i: matching is case-insensitive.
m: a string containing line breaks is treated as multiple lines rather than one, so "^" and "$" match at the start and end of each line.
s: the period "." also matches line breaks.
x: tells PHP to ignore unescaped whitespace in the pattern, so spaces can be used to make the regular expression easier to read; a literal space must then be escaped.
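Several modifiers have close counterparts in Python's `re` module, used here as a stand-in: `re.IGNORECASE` (i), `re.MULTILINE` (m), `re.DOTALL` (s) and `re.VERBOSE` (x). Python has no U flag, so the sketch uses the lazy quantifier `+?` instead; `re.match` stands in for A, and `\Z` for the D-style end-only anchor. S has no Python equivalent.

```python
import re

print(re.search(r"a+", "caaaaab").group())            # aaaaa (greedy, the default)
print(re.search(r"a+?", "caaaaab").group())           # a (lazy, what U gives)
print(bool(re.match(r"a", "caaaaab")))                # False: re.match is anchored, like A
print(bool(re.search(r"abc", "ABC", re.IGNORECASE)))  # True (i)
print(re.findall(r"^\w+", "one\ntwo", re.MULTILINE))  # ['one', 'two'] (m)
print(bool(re.search(r"a.b", "a\nb", re.DOTALL)))     # True (s)
print(bool(re.search(r"a.b", "a\nb")))                # False (without s)
print(bool(re.search(r"abc$", "abc\n")), bool(re.search(r"abc\Z", "abc\n")))  # True False
verbose = re.compile(r"""
    \d{4}   # year
    -       # separator
    \d{2}   # month
""", re.VERBOSE)
print(bool(verbose.fullmatch("2018-12")))             # True (x)
```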
3) Extended pattern syntax
(?#comment) adds a comment to make the regular expression easier to read.
(?=pattern) positive lookahead: what has been matched so far must be followed by text matching pattern, which is not consumed.
(?!pattern) negative lookahead: what has been matched so far must not be followed by text matching pattern.
(?n) sets option n (for example i) inside the pattern rather than after the closing delimiter.
(?:pattern) non-capturing group: the characters are matched and consumed, but the group is not captured as a subpattern.
Example: echo ereg("?:^A$", "A"); // no output: ereg() is POSIX-based and does not accept this syntax, so the call fails and prints nothing.
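These constructs also exist in Python's `re` module, used in this sketch as a stand-in for PHP's preg functions. One Python-specific caveat: since Python 3.11, a global inline flag such as `(?i)` must appear at the very start of the pattern.

```python
import re

print(bool(re.search(r"a(?#any comment)b", "ab")))               # True: the comment is ignored
print(re.findall(r"\b\d+(?= dollars)", "10 dollars 20 euros"))   # ['10']
print(re.findall(r"\b\d+\b(?! dollars)", "10 dollars 20 euros")) # ['20']
print(bool(re.search(r"(?i)hello", "HELLO")))                    # True: inline i option
m = re.search(r"(?:ab)+(c)", "ababc")
print(m.group(0), m.groups())                                    # ababc ('c',)
```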

4. Perl-style regular expression functions
1) preg_grep
preg_grep(pattern, array input);
Searches the array input for elements that match pattern and returns an array containing all the matching elements.
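In Python, the same filtering is a list comprehension over `re.search`. `grep_like` is a hypothetical helper sketching preg_grep() with Python's `re` module as a stand-in; unlike PHP, it returns a plain list and does not preserve the original array keys.

```python
import re
from typing import List

def grep_like(pattern: str, items: List[str]) -> List[str]:
    """Hypothetical stand-in for preg_grep(): keep the elements that match."""
    regex = re.compile(pattern)
    return [s for s in items if regex.search(s)]

print(grep_like(r"^\d+$", ["12", "ab", "7", "3c"]))   # ['12', '7']
```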
2) preg_match
preg_match(pattern, string subject [, array matches])
Searches subject for a match to pattern. It returns 1 if a match is found and 0 otherwise. If matches is given, the text that matched the whole pattern is stored in the first element, $matches[0], and the results of the parenthesized subpatterns follow in order: the first in $matches[1], the second in $matches[2], and so on.
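The layout of `$matches` can be imitated with a match object from Python's `re` module, used as a stand-in. `match_like` is a hypothetical helper: it returns 1 or 0 and fills a caller-supplied list, turning unmatched groups into empty strings (an approximation of PHP's behaviour).

```python
import re
from typing import List

def match_like(pattern: str, subject: str, matches: List[str]) -> int:
    """Hypothetical stand-in for preg_match(): returns 1 or 0 and fills `matches`."""
    m = re.search(pattern, subject)
    matches.clear()
    if m is None:
        return 0
    matches.append(m.group(0))                                      # $matches[0]: whole match
    matches.extend(g if g is not None else "" for g in m.groups())  # $matches[1], [2], ...
    return 1

found: List[str] = []
print(match_like(r"(\w+)@(\w+)\.com", "mail bob@example.com now", found), found)
# 1 ['bob@example.com', 'bob', 'example']
```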
3) preg_match_all
preg_match_all(pattern, subject, array matches [, int order])
Searches subject for all non-overlapping matches of pattern. It returns the number of matches found, or 0 if there are none. The matches are stored in the two-dimensional array matches: by default, matches[0] holds all the full-pattern matches, and matches[1] to matches[n] hold the matches of each subpattern in turn.
The order parameter is optional; its values are PREG_PATTERN_ORDER (the default, which groups the results by subpattern as just described) and PREG_SET_ORDER (which groups them by match, so matches[0] holds the full match and the subpattern matches of the first match, and so on).
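The two orderings can be reproduced from Python's `re.finditer`, used here as a stand-in. Both functions below are hypothetical helpers that build nested lists shaped like PHP's `$matches` for each order, with unmatched groups shown as empty strings.

```python
import re
from typing import List

def match_all_pattern_order(pattern: str, subject: str) -> List[List[str]]:
    """PREG_PATTERN_ORDER-style result: one list per group, group 0 first."""
    found = list(re.finditer(pattern, subject))
    n_groups = re.compile(pattern).groups
    return [[m.group(i) or "" for m in found] for i in range(n_groups + 1)]

def match_all_set_order(pattern: str, subject: str) -> List[List[str]]:
    """PREG_SET_ORDER-style result: one list per match."""
    return [[m.group(0)] + [g or "" for g in m.groups()]
            for m in re.finditer(pattern, subject)]

text = "x=1, y=22"
print(match_all_pattern_order(r"(\w)=(\d+)", text))  # [['x=1', 'y=22'], ['x', 'y'], ['1', '22']]
print(match_all_set_order(r"(\w)=(\d+)", text))      # [['x=1', 'x', '1'], ['y=22', 'y', '22']]
print(len(re.findall(r"(\w)=(\d+)", text)))          # 2, the count preg_match_all() would return
```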
4) preg_replace
preg_replace(pattern, replacement, subject [, int limit])
Replaces the parts of subject that match pattern with replacement. The return value has the same type as subject: if any replacements were made, the modified value is returned; otherwise the original value is returned. The optional limit caps the number of replacements made for each pattern (the default, -1, means no limit).
Each of the first three arguments can be either an array or a single value:
<1> If subject is an array, the replacement is performed on every element, and an array is returned;
<2> If pattern is an array and replacement is a single string, each pattern in turn is replaced with that string;
<3> If both pattern and replacement are arrays, each pattern is replaced with the element of replacement at the same position;
<4> If replacement has fewer elements than pattern, the patterns without a counterpart are replaced with an empty string.
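The four array rules can be sketched on top of Python's `re.sub`, used here as a stand-in. `replace_like` is a hypothetical helper: its patterns are bare Python regexes without PHP delimiters, it applies the patterns one after another to the running result, and an array subject is handled by calling it once per element.

```python
import re
from typing import List, Union

def replace_like(patterns: Union[str, List[str]],
                 replacements: Union[str, List[str]],
                 subject: str) -> str:
    """Hypothetical sketch of preg_replace()'s array rules for one subject string."""
    if isinstance(patterns, str):
        patterns = [patterns]
    if isinstance(replacements, str):
        # A single replacement string is used for every pattern.
        replacements = [replacements] * len(patterns)
    # Patterns without a corresponding replacement get the empty string.
    padded = list(replacements) + [""] * (len(patterns) - len(replacements))
    for pattern, replacement in zip(patterns, padded):
        subject = re.sub(pattern, replacement, subject)  # applied one after another
    return subject

print(replace_like(r"\d", "#", "a1b2"))                                  # a#b#
print(replace_like([r"cat", r"dog"], ["tiger", "wolf"], "cat and dog"))  # tiger and wolf
print(repr(replace_like([r"cat", r"dog"], ["tiger"], "cat and dog")))    # 'tiger and '
print([replace_like(r"o", "0", s) for s in ["foo", "bar"]])              # ['f00', 'bar']
```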
5) preg_split
preg_split(pattern, subject [, int limit [, int flags]])
Splits subject into parts, using text that matches pattern as the separator, and returns an array of the parts. limit caps the number of parts returned; -1 (or 0) means no limit. flags is also optional; two of its values are PREG_SPLIT_NO_EMPTY, which makes the function omit empty parts, and PREG_SPLIT_DELIM_CAPTURE, which makes it also return the text captured by parenthesized subpatterns in the separator.
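Both flags have rough equivalents with Python's `re.split`, used here as a stand-in. Python has no flags for this: empty parts are filtered out by hand, and text captured by groups in the separator is always included, which is what PREG_SPLIT_DELIM_CAPTURE turns on in PHP.

```python
import re

text = " a, b ,c "
print(re.split(r"[\s,]+", text))                    # ['', 'a', 'b', 'c', '']
print([p for p in re.split(r"[\s,]+", text) if p])  # ['a', 'b', 'c'] (like PREG_SPLIT_NO_EMPTY)
print(re.split(r"(,)", "a,b,c"))                    # ['a', ',', 'b', ',', 'c'] (like DELIM_CAPTURE)
print(re.split(r",", "a,b,c", maxsplit=1))          # ['a', 'b,c'] (PHP limit 2)
```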

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
