A practical encyclopedia of regular expression matching and replacing

Source: Internet
Author: User

Regular expressions are useful for searching, matching, and processing strings, for replacing and converting them, and for validating input and output. Most languages and platforms support them, including the .NET regex library, the JDK regex package, Perl, JavaScript, and various scripting languages. Some commonly used regular expressions are collected below.

Character    Description

\    Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^    Matches the start position of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$    Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero times or once. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the "o" in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob" but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?    When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, for the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.    Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' (inside a character class, '.' is a literal dot, so '[.\n]' does not work).
(pattern)    Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)    Matches pattern but does not capture the match; it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more compact expression than 'industry|industries'.
(?=pattern)    Positive lookahead. Matches the search string at any position where a string matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows(?=95|98|NT|2000)' matches "Windows" in "Windows2000", but not "Windows" in "Windows3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)    Negative lookahead. Matches the search string at any position where a string not matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows(?!95|98|NT|2000)' matches "Windows" in "Windows3.1", but not "Windows" in "Windows2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y    Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]    A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the "a" in "plain".
[^xyz]    A negative character set. Matches any character that is not enclosed. For example, '[^abc]' matches the "p" in "plain".
[a-z]    A range of characters. Matches any character within the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range "a" to "z".
[^a-z]    A negative character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range "a" to "z".
\b    Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the "er" in "never", but not the "er" in "verb".
\B    Matches a non-word boundary. 'er\B' matches the "er" in "verb", but not the "er" in "never".
\cx    Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal "c" character.
\d    Matches a digit character. Equivalent to [0-9].
\D    Matches a non-digit character. Equivalent to [^0-9].
\f    Matches a form feed character. Equivalent to \x0c and \cL.
\n    Matches a line feed (newline) character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
\w    Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W    Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn    Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num    Matches num, where num is a positive integer. A backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n    Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm    Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml    Matches the octal escape value nml when n is an octal digit (0-3) and both m and l are octal digits (0-7).
/i    Makes the regular expression case-insensitive. Inline, (?i) turns case-insensitivity on and (?-i) turns it off. For example, (?i)te(?-i)st matches "TEst", but not "teST" or "TEST".
/s    Enables "single-line mode", in which the dot "." also matches the newline character.
/m    Enables "multi-line mode", in which "^" and "$" also match the positions just after and just before a newline character.
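
Several rows of the table above can be checked directly. The following is a minimal JavaScript sketch; JavaScript is chosen here only because its RegExp dialect is close to the JScript one the table describes, and other engines may differ in details. All variable names are illustrative.

```javascript
// Greedy vs. non-greedy: o+ takes as many o's as possible, o+? as few as possible.
const greedy = "oooo".match(/o+/)[0];
const lazy = "oooo".match(/o+?/)[0];

// Lookahead inspects what follows without consuming it.
const followedByVersion = /Windows(?=95|98|NT|2000)/;
const notFollowedByVersion = /Windows(?!95|98|NT|2000)/;

// \b is a word boundary, \B a non-word boundary.
const erAtWordEnd = /er\b/;
const erInsideWord = /er\B/;

// "." skips "\n", while [\s\S] matches any character at all.
const dotSkipsNewline = !/a.b/.test("a\nb");
const classMatchesNewline = /a[\s\S]b/.test("a\nb");

console.log(greedy, lazy); // oooo o
console.log(followedByVersion.test("Windows2000"), notFollowedByVersion.test("Windows3.1")); // true true
```

Because a lookahead does not consume characters, the text matched by followedByVersion is just "Windows"; the version number stays available to whatever pattern comes next.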
^[0-9]*$    Only digits can be entered
^\d{n}$    Only n-digit numbers can be entered
^\d{n,}$    Only numbers with at least n digits can be entered
^\d{m,n}$    Only numbers with m to n digits can be entered
^(0|[1-9][0-9]*)$    Only zero or a number that does not begin with 0 can be entered
^[0-9]+(\.[0-9]{2})?$    Only positive real numbers with two decimal places can be entered
^[0-9]+(\.[0-9]{1,3})?$    Only positive real numbers with 1 to 3 decimal places can be entered
^\+?[1-9][0-9]*$    Only non-zero positive integers can be entered
^\-[1-9][0-9]*$    Only non-zero negative integers can be entered
^.{3}$    Only strings of length 3 can be entered
^[A-Za-z]+$    Only strings made up of the 26 English letters can be entered
^[A-Za-z0-9]+$    Only strings made up of digits and the 26 English letters can be entered
^\w+$    Only strings made up of digits, the 26 English letters, or underscores can be entered
^[a-zA-Z]\w{5,17}$    Validates a user password: begins with a letter, is 6 to 18 characters long, and contains only letters, digits, and underscores
[^%&',;=?$\x22]+    Checks whether the input contains characters such as ^%&',;=?$"
^[\u4e00-\u9fa5]{0,}$    Only Chinese characters can be entered
^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$    Validates an email address
^http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$    Validates an Internet URL
^(\d{15}|\d{18})$    Validates an ID number (15 or 18 digits); the alternation is grouped so that both anchors apply to both branches
^((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)$    Validates an IP address
(\w)\1    Matches two consecutive occurrences of the same character

For example, "aabbc11asd" yields three matches: aa, bb, and 11.
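
As a rough illustration, a few of these validation patterns can be wrapped in a small JavaScript helper. This is a sketch under assumptions: the names patterns and isValid are hypothetical, the decimal point is written escaped (\.), and the ID-number alternation is grouped so that ^ and $ apply to both branches.

```javascript
// A few of the validation patterns, keyed by a descriptive (hypothetical) name.
const patterns = {
  nonNegativeInteger: /^(0|[1-9][0-9]*)$/,
  twoDecimals: /^[0-9]+(\.[0-9]{2})?$/,
  idNumber: /^(\d{15}|\d{18})$/,
  ipv4: /^((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)$/,
};

// Returns true when the whole text satisfies the named pattern.
function isValid(kind, text) {
  return patterns[kind].test(text);
}

// Backreference: \1 repeats whatever group 1 captured.
const doubled = "aabbc11asd".match(/(\w)\1/g);

console.log(isValid("ipv4", "192.168.0.1")); // true
console.log(isValid("ipv4", "256.1.1.1"));   // false
console.log(doubled);                        // [ 'aa', 'bb', '11' ]
```

Without the grouping, a pattern of the form ^\d{15}|\d{18}$ would accept a 16-digit string, because the first branch only requires 15 digits at the start of the input.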

<(?<tag>[^\s>]+)[^>]*>.*</\k<tag>>    Matches paired HTML tags
(?!pattern)    Negative lookahead: asserts that pattern does not appear at this position
The following example shows how to get the entire contents of an <a> tag pair, even if it contains other HTML tags.

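// The variable names, the <img> tag, and the tail of the sample string were lost in the source;
// they are reconstructed here for illustration only.
string input = @"Url:<a href=""1.html""><img src=""1.gif"">test<span style=""color:red;"">Regex</span></a>.";
// ([^<]|<(?!/a))* consumes any character that is not "<", or a "<" that does not start "</a"
Regex regex = new Regex(@"<\s*a[^>]*>([^<]|<(?!/a))*<\s*/a\s*>");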
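
The same idea can be tried in JavaScript. The sketch below is only an illustration: the sample HTML strings are hypothetical, and the named-group syntax (?<tag>...) with \k<tag> assumes an ES2018-capable engine.

```javascript
// Hypothetical sample input containing nested tags inside the <a> element.
const html = 'url:<a href="1.html"><img src="1.gif">test<span style="color:red;">Regex</span></a>.';

// Any character that is not "<", or a "<" not followed by "/a", may appear inside the link.
const anchorPattern = /<\s*a[^>]*>(?:[^<]|<(?!\/a))*<\s*\/a\s*>/;
const anchor = html.match(anchorPattern)[0];

// Paired tags: \k<tag> must repeat the tag name captured by the named group.
const pairedTag = /<(?<tag>[^\s>]+)[^>]*>.*<\/\k<tag>>/;
const paired = '<b class="x">bold</b> and <i>it</i>'.match(pairedTag);

console.log(anchor);            // <a href="1.html"><img src="1.gif">test<span style="color:red;">Regex</span></a>
console.log(paired.groups.tag); // b
```

Regular expressions like these handle simple, well-formed fragments; for arbitrary HTML a real parser is usually the safer choice.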

Purpose: match declarations of the form "string name = value;", for example to capture the names keyword and value together with the values abc and test that follow the equals sign

Expression: string (?<x>[^=]*?) *= *(?<y>[^;]*?);

Code:

private void ParseKeywords(string input)
{
    System.Text.RegularExpressions.MatchCollection mc =
        System.Text.RegularExpressions.Regex.Matches(input, @"string (?<x>[^=]*?) *= *(?<y>[^;]*?);");

    if (mc != null && mc.Count > 0)
    {
        foreach (System.Text.RegularExpressions.Match m in mc)
        {
            string keyword = m.Groups["x"].Value; // text to the left of "="
            string value = m.Groups["y"].Value;   // text to the right of "="
        }
    }
}
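
To see what the two named groups capture, here is a small JavaScript sketch of the same expression. The function name parseKeywords and the sample input are assumptions made for illustration; they are not taken from the original article.

```javascript
// Collects every "string name = value;" declaration as a { keyword, value } pair.
function parseKeywords(input) {
  const pattern = /string (?<x>[^=]*?) *= *(?<y>[^;]*?);/g;
  const pairs = [];
  for (const m of input.matchAll(pattern)) {
    pairs.push({ keyword: m.groups.x, value: m.groups.y });
  }
  return pairs;
}

// Hypothetical input in the shape the expression expects.
const pairs = parseKeywords("string keyword = abc; string value = test;");
console.log(pairs);
// [ { keyword: 'keyword', value: 'abc' }, { keyword: 'value', value: 'test' } ]
```

The lazy quantifiers (*?) together with the surrounding " *" keep the spaces around the equals sign out of both captures.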

2. Match and replace

Input: Public <%=classname%>extension:iext

Objective: match the classname between <%= and %> and replace it

Expression: <%=.*%>

Code:

private string Replace(string input)
{
    // Each match is passed to RefineCodeTag, whose return value becomes the replacement text
    return Regex.Replace(input, @"<%=.*%>", new MatchEvaluator(RefineCodeTag), RegexOptions.Singleline);
}

string RefineCodeTag(Match m)
{
    string x = m.ToString();

    // Strip the opening and closing delimiters
    x = Regex.Replace(x, "<%=", "");
    x = Regex.Replace(x, "%>", "");

    return x.Trim() + ",";
}

Regular expression options (RegexOptions):

ExplicitCapture    n    Only explicitly named or numbered groups are captured
IgnoreCase    i    Case-insensitive matching
IgnorePatternWhitespace    x    Ignores unescaped whitespace in the pattern and enables comments marked with #
Multiline    m    Multi-line mode; changes the meaning of ^ and $ so that they match at the start and end of each line
Singleline    s    Single-line mode, the counterpart of Multiline; the dot also matches the newline character
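
The effect of three of these options can be observed with the corresponding JavaScript flags. This is only an approximation of the .NET behavior: JavaScript has i, m, and s flags, but no flags corresponding to ExplicitCapture (n) or IgnorePatternWhitespace (x), so those are not shown. The variable names are illustrative.

```javascript
// i: letters match regardless of case.
const ignoreCase = /test/i.test("TEST");

// m: ^ and $ also match at line breaks inside the input.
const withMultiline = /^b$/m.test("a\nb");
const withoutMultiline = /^b$/.test("a\nb");

// s: the dot also matches the newline character.
const withSingleline = /a.b/s.test("a\nb");
const withoutSingleline = /a.b/.test("a\nb");

console.log(ignoreCase, withMultiline, withoutMultiline, withSingleline, withoutSingleline);
// true true false true false
```

Multiline and Singleline are independent of each other: one changes the anchors, the other changes the dot, and both can be enabled at once.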

Other features of regular expression replacement:

$number    Substitutes the text captured by group number into the replacement string

In the code below, every match is replaced with the expression "0$1", where "$1" stands for the text captured by group 1. The code prints "01 012 03 05".

public static void Main()
{
    string s = "1 12 3 5";
    // (?#...) is an inline comment and does not affect matching
    s = Regex.Replace(s, @"(\d+)(?#this is a comment)", "0$1", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    Console.WriteLine(s);
    Console.ReadLine();
}

${name}    Substitutes the text captured by the group named "name"

Changing the pattern in the previous example to @"(?<name>\d+)(?#this is a comment)" and the replacement to "0${name}" produces the same result.

$$    Inserts a literal $. With the pattern @"(?<name>\d+)(?#this is a comment)" and the replacement "$$${name}", each number in the input is prefixed with a dollar sign (for example, 12 becomes $12).

$&    Substitutes the entire match
$`    Substitutes the text of the input before the match
$'    Substitutes the text of the input after the match
$+    Substitutes the last captured group
$_    Substitutes the entire input string
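
Most of these substitutions can be tried in JavaScript as well, with two differences worth stating as assumptions of this sketch: JavaScript writes a named group as $<name> rather than ${name}, and it has no equivalent of $+ or $_, so those two are omitted. The sample strings are illustrative.

```javascript
const numbers = "1 12 3 5";

// $1 and $<name> insert the captured group; $$ inserts a literal dollar sign.
const byNumber = numbers.replace(/(\d+)/g, "0$1");
const byName = numbers.replace(/(?<name>\d+)/g, "0$<name>");
const literalDollar = numbers.replace(/(?<name>\d+)/g, "$$$<name>");

// $& is the whole match, $` the text before it, $' the text after it.
const wholeMatch = "abc".replace(/b/, "[$&]");
const beforeMatch = "abc".replace(/b/, "[$`]");
const afterMatch = "abc".replace(/b/, "[$']");

console.log(byNumber);      // 01 012 03 05
console.log(literalDollar); // $1 $12 $3 $5
console.log(wholeMatch, beforeMatch, afterMatch); // a[b]c a[a]c a[c]c
```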

3. Match the filename in the URL

Input: http://www.jb51.net/page1.htm

Objective: To extract the filename from the URL address

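Expression: s = s.replace(/(.*\/){0,}([^\.]+).*/ig, "$2");

Code:

var s = "http://www.jb51.net/page1.htm";
// Group 1 consumes everything up to the last "/", group 2 captures the name before the first "."
s = s.replace(/(.*\/){0,}([^\.]+).*/ig, "$2"); // s is now "page1"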

