Regular expression matching and replacement

Source: Internet
Author: User
Tags regex expression

Regular expressions are useful for searching, matching, and processing strings, for replacing and converting text, and for validating input and output. Most languages and platforms support them, including the .NET regular expression library, the JDK regular expression package, Perl, JavaScript, and other scripting languages. The tables below list the regular expression syntax and some commonly used expressions.

Character    Description

\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^    Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?    When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.    Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)    Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)    Matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)    Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)    Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y    Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]    A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]    A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]    A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' through 'z'.
[^a-z]    A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' through 'z'.
\b    Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never", but not the 'er' in "verb".
\B    Matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".
\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d    Matches a digit character. Equivalent to [0-9].
\D    Matches a non-digit character. Equivalent to [^0-9].
\f    Matches a form feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any white space character, including space, tab, and form feed. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-white-space character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
\w    Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W    Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn    Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num    Matches num, where num is a positive integer. A backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n    Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm    Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml    Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
/i    Makes the regular expression case-insensitive. The inline form (?i) turns case-insensitivity on and (?-i) turns it off.
    For example, (?i)te(?-i)st matches "TEst", but not "teST" or "TEST".
/s    Enables "single-line mode", in which the dot (.) also matches the newline character.
/m    Enables "multiline mode", in which ^ and $ also match the positions just after and just before a newline character.
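
A few of the entries above are easy to check directly. The following is a minimal JavaScript sketch (assuming Node.js or any modern browser); the variable names are only illustrative, and the sample strings are the ones used in the table.

```javascript
// Greedy vs. non-greedy: 'o+' takes every "o", 'o+?' stops after one.
const greedy = "oooo".match(/o+/)[0];
const lazy = "oooo".match(/o+?/)[0];

// {n,m}: 'o{1,3}' matches at most the first three o's.
const bounded = "fooooood".match(/o{1,3}/)[0];

// Lookahead checks what follows without consuming it.
const positive = /Windows (?=95|98|NT|2000)/.test("Windows 2000");
const negative = /Windows (?!95|98|NT|2000)/.test("Windows 2000");
const consumed = "Windows 2000".match(/Windows (?=95|98|NT|2000)/)[0];

// \b: 'er' followed by a word boundary.
const atBoundary = /er\b/.test("never");
const insideWord = /er\b/.test("verb");

// Backreference: '(.)\1' finds two consecutive identical characters.
const doubled = "food".match(/(.)\1/)[0];

console.log({ greedy, lazy, bounded, positive, negative, consumed, atBoundary, insideWord, doubled });
```

Note that `consumed` is "Windows " with its trailing space and without "2000": the lookahead only inspects the following characters, so they are not part of the match.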
^[0-9]*$    Digits only.
^\d{n}$    Exactly n digits.
^\d{n,}$    At least n digits.
^\d{m,n}$    Between m and n digits.
^(0|[1-9][0-9]*)$    Zero, or a number that does not start with zero.
^[0-9]+(\.[0-9]{2})?$    A positive number with two decimal places.
^[0-9]+(\.[0-9]{1,3})?$    A positive number with one to three decimal places.
^\+?[1-9][0-9]*$    A non-zero positive integer.
^-[1-9][0-9]*$    A non-zero negative integer.
^.{3}$    Exactly 3 characters.
^[A-Za-z]+$    A string consisting only of the 26 English letters.
^[A-Za-z0-9]+$    A string consisting only of digits and the 26 English letters.
^\w+$    A string consisting only of digits, the 26 English letters, and underscores.
^[a-zA-Z]\w{5,17}$    Validates a user password: it starts with a letter, is 6 to 18 characters long, and contains only letters, digits, and underscores.
[^%&',;=?$\x22]+    Matches a run of characters that contains none of % & ' , ; = ? $ or the double quote (\x22); used to check input for these characters.
^[\u4e00-\u9fa5]{0,}$    Chinese characters only.
^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$    Validates an email address.
^http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$    Validates an Internet URL.
^(\d{15}|\d{18})$    Validates an ID number (15 or 18 digits). The parentheses are required; without them, ^ applies only to the first alternative and $ only to the second.
^((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)$    Validates an IP address.
(\w)\1    Matches two consecutive identical characters. For example, in "aabbc11asd" the results are aa, bb, and 11, three matches in total.
<(?<tag>[^\s>]+)[^>]*>.*</\k<tag>>    Matches a paired HTML tag.
(?!pattern)    Negative assertion: the pattern must not appear at this position.
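
To see how these validation patterns behave, here is a small JavaScript sketch that runs a handful of them against one accepted and one rejected input each. The sample inputs and the `checks` structure are made up for illustration and are not from the original article.

```javascript
// Each entry pairs a pattern from the table with one accepted and one rejected input.
const checks = [
  { name: "digits only",        re: /^[0-9]*$/,              ok: "12345",  bad: "12a45" },
  { name: "two decimal places", re: /^[0-9]+(\.[0-9]{2})?$/, ok: "12.50",  bad: "12.5" },
  { name: "non-zero positive",  re: /^\+?[1-9][0-9]*$/,      ok: "42",     bad: "042" },
  { name: "password",           re: /^[a-zA-Z]\w{5,17}$/,    ok: "a1_b2c", bad: "1abcde" },
  {
    name: "IP address",
    re: /^((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)$/,
    ok: "192.168.0.1",
    bad: "256.1.1.1",
  },
];

const results = checks.map(({ name, re, ok, bad }) => ({
  name,
  acceptsGood: re.test(ok),
  rejectsBad: !re.test(bad),
}));
console.log(results);

// Why the ID-number pattern needs grouping: without parentheses, ^ binds only
// to the first alternative and $ only to the second.
const ungrouped = /^\d{15}|\d{18}$/;
const grouped = /^(\d{15}|\d{18})$/;
const sixteenDigits = "1234567890123456";
console.log(ungrouped.test(sixteenDigits), grouped.test(sixteenDigits));
```

The last line shows the ungrouped pattern accepting a 16-digit string, because its first 15 digits already satisfy `^\d{15}`; the grouped pattern rejects it.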
The following example matches an <a> tag pair together with everything inside it, even when the content contains other HTML tags.

string newsContent = @"url:<a href=""1.html"">test<span style=""color:red;"">
 
Regex</span></a>."; 
Regex regEnd = new Regex(@"<\s*a[^>]*>([^<]|<(?!/a))*<\s*/a\s*>",RegexOptions.Multiline); 
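
The same expression can be tried in JavaScript. This is only a sketch: it assumes the sample string contains a single line break inside the link, and the names `newsContent` and `anchorPattern` are illustrative.

```javascript
// Same sample text as the C# snippet, with a line break inside the link.
const newsContent = 'url:<a href="1.html">test<span style="color:red;">\nRegex</span></a>.';

// ([^<]|<(?!\/a))* walks forward one character at a time: any character
// other than "<", or a "<" that does not begin a closing </a> tag.
const anchorPattern = /<\s*a[^>]*>([^<]|<(?!\/a))*<\s*\/a\s*>/;

const found = newsContent.match(anchorPattern);
console.log(found ? found[0] : "no match");
```

The negative lookahead is what lets the match run past the nested `<span>` and `</span>` tags and stop only at `</a>`.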

Purpose: match declarations of the form string keyword = "value"; and capture both the keyword (the name to the left of the equals sign) and the value to its right, for example abc and test

Expression: string (?<x>[^=]*?) *= *(?<y>[^;]*?);

Code:

private void ParseKeywords(string input)
{
    // Find every declaration of the form: string name = value;
    System.Text.RegularExpressions.MatchCollection mc =
        System.Text.RegularExpressions.Regex.Matches(input, @"string (?<x>[^=]*?) *= *(?<y>[^;]*?);");
    if (mc != null && mc.Count > 0)
    {
        foreach (System.Text.RegularExpressions.Match m in mc)
        {
            string keyword = m.Groups["x"].Value; // name to the left of "="
            string value = m.Groups["y"].Value;   // text between "=" and ";"
        }
    }
}
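
For comparison, a JavaScript sketch of the same named-group extraction. The input string is hypothetical (the article does not show one), and `pairs` is just an illustrative name.

```javascript
// Hypothetical input: two C#-style string declarations.
const input = 'string keyword = "abc"; string value = "test";';

// Same expression as above; JavaScript also uses the (?<name>...) syntax.
const declaration = /string (?<x>[^=]*?) *= *(?<y>[^;]*?);/g;

const pairs = [];
for (const m of input.matchAll(declaration)) {
  // m.groups holds the named captures: x is the variable name, y the value.
  pairs.push({ keyword: m.groups.x, value: m.groups.y });
}
console.log(pairs);
```

The captured values keep their surrounding double quotes, because `[^;]*?` takes everything between the equals sign and the semicolon.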

2. Match and replace

Input: public <%=classname%>Extension : IExt

Purpose: match the <%=classname%> tag and replace it with the class name inside it

Expression: <%=.*%>

Code:

private string Replace(string input)
{
    // Singleline lets "." match line breaks, so a tag may span several lines.
    return Regex.Replace(input, @"<%=.*%>", new MatchEvaluator(RefineCodeTag), RegexOptions.Singleline);
}

string RefineCodeTag(Match m)
{
    string x = m.ToString();
    x = Regex.Replace(x, "<%=", ""); // strip the opening marker
    x = Regex.Replace(x, "%>", "");  // strip the closing marker
    return x.Trim() + ",";
}
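
A rough JavaScript counterpart of this match-and-replace, where a callback function plays the role of the MatchEvaluator. The function names are illustrative, and the input assumes the tag is written without inner spaces, as the expression requires.

```javascript
// Strips the <%= and %> markers from one matched tag, mirroring RefineCodeTag.
function refineCodeTag(match) {
  const inner = match.replace("<%=", "").replace("%>", "");
  return inner.trim() + ",";
}

function replaceCodeTags(input) {
  // The s flag lets "." match line breaks, like RegexOptions.Singleline;
  // the g flag replaces every match, as Regex.Replace does.
  return input.replace(/<%=.*%>/gs, refineCodeTag);
}

const output = replaceCodeTags("public <%=classname%>Extension : IExt");
console.log(output);
```

Because `.*` is greedy, a line holding two tags would be matched as one span from the first `<%=` to the last `%>`; writing `.*?` instead would match each tag separately.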

RegexOptions:

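ExplicitCapture    n    Only explicitly named or numbered groups are captured.
IgnoreCase    i    Case-insensitive matching.
IgnorePatternWhitespace    x    Ignores unescaped white space in the pattern and enables comments marked with #.
Multiline    m    Multiline mode: changes the meaning of ^ and $ so that they match at the start and end of every line.
Singleline    s    Single-line mode: changes the meaning of the dot (.) so that it matches every character, including \n. It is independent of Multiline, and the two can be combined.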
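
Three of these options have direct JavaScript flag counterparts (i, m, and s); the standard JavaScript flags have no equivalent of the x and n options. A short sketch with a made-up two-line string:

```javascript
const text = "First line\nsecond LINE";

// i (IgnoreCase): letters match regardless of case.
const caseHits = text.match(/line/gi);

// m (Multiline): ^ and $ match at every line break, not only at the ends.
const withoutM = /^second/.test(text);
const withM = /^second/m.test(text);

// s (Singleline): "." also matches the line break.
const withoutS = /line.second/.test(text);
const withS = /line.second/s.test(text);

// m and s are independent and can be combined.
const both = /^First.*LINE$/ms.test(text);

console.log({ caseHits, withoutM, withM, withoutS, withS, both });
```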

Other features of regular expression replacement:

$number    Substitutes the text matched by the group with that number into the replacement string.

The following code returns "01 012 03 05".

That is, every match is replaced with the expression "0$1", and the "$1" in "0$1" is substituted with the text captured by group 1.


public static void Main()
{
    string s = "1 12 3 5";
    s = Regex.Replace(s, @"(\d+)(?#Note)", "0$1", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    Console.WriteLine(s);
    Console.ReadLine();
}
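
The numbered substitution works the same way in JavaScript. A minimal sketch; the inline comment group (?#...) is left out because JavaScript regular expressions do not support it.

```javascript
const s = "1 12 3 5";

// The g flag replaces every match; "$1" inserts what group 1 captured.
const padded = s.replace(/(\d+)/g, "0$1");

console.log(padded); // 01 012 03 05
```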

${name}    Substitutes the text matched by the group named "name" into the replacement string.

If the regular expression in the previous example is changed to @"(?<name>\d+)(?#This is a comment)" and the replacement to "0${name}", the result is the same.

$$    Escapes the $ character. If the expressions in the example above are changed to @"(?<name>\d+)(?#Note)" and "$$${name}", the result is "$1 $12 $3 $5".
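
In JavaScript the named substitution is spelled $<name> rather than ${name}, while $$ behaves the same way. A small sketch under that assumption, reusing the sample string from the example above:

```javascript
const s = "1 12 3 5";
const number = /(?<name>\d+)/g;

// JavaScript writes the named substitution as $<name> (.NET uses ${name}).
const padded = s.replace(number, "0$<name>");

// "$$" produces one literal dollar sign in front of the captured digits.
const dollars = s.replace(number, "$$$<name>");

console.log(padded);  // 01 012 03 05
console.log(dollars); // $1 $12 $3 $5
```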

$&    Substitutes the entire match.
$`    Substitutes the text of the input string before the match.
$'    Substitutes the text of the input string after the match.
$+    Substitutes the last captured group.
$_    Substitutes the entire input string.
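
JavaScript replacement strings understand $&, $` and $' as well, but not $+ or $_; a replacer function can supply the whole input instead. A sketch with a made-up input string:

```javascript
const input = "abc-123-xyz";
const digits = /\d+/;

const whole = input.replace(digits, "[$&]");   // the match itself
const before = input.replace(digits, "[$`]");  // text before the match
const after = input.replace(digits, "[$']");   // text after the match

// No $_ token in JavaScript: the replacer function receives the whole
// input string as its last argument.
const wholeInput = input.replace(digits, (match, offset, str) => "<" + str + ">");

console.log(whole);      // abc-[123]-xyz
console.log(before);     // abc-[abc-]-xyz
console.log(after);      // abc-[-xyz]-xyz
console.log(wholeInput); // abc-<abc-123-xyz>-xyz
```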

3. Match the file name in the URL

Input: http://www.bkjia.com/page1.htm

Objective: To extract a file name from a URL

Expression: s = s.replace(/(.*\/){0,}([^\.]+).*/ig, "$2");

Code:

var s = "http://www.bkjia.com/page1.htm";
s = s.replace(/(.*\/){0,}([^\.]+).*/ig, "$2");
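
To try the expression on more than one address, it can be wrapped in a small function. This is a sketch only: `fileNameFromUrl` is an illustrative name, and the second URL is made up.

```javascript
// Wraps the expression above so it can be applied to several URLs.
function fileNameFromUrl(url) {
  // (.*\/){0,} consumes everything up to the last slash, ([^\.]+) captures
  // the name up to the first dot, and .* swallows the extension.
  return url.replace(/(.*\/){0,}([^\.]+).*/ig, "$2");
}

const first = fileNameFromUrl("http://www.bkjia.com/page1.htm");
const second = fileNameFromUrl("http://www.bkjia.com/news/2016/list.html");

console.log(first);  // page1
console.log(second); // list
```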

That is all for this article. I hope it helps with your study of regular expressions.
