Getting started with regular expressions in C#

Source: Internet
Author: User

Suppose someone hands you a string and asks whether it consists entirely of digits, without converting it to a number. How would you check? Split the string into a char array and test each char in a loop? Then how would you decide whether a mobile phone number is valid, or whether an IP address is legal? Splitting a string into a char array is not a good solution. Is there a better one? Yes: regular expressions. What is a regular expression? It is a string syntax that describes the format of other strings. This article is an introduction to regular expressions. As I have said in other posts, I do not like listing every rule; what I want to show is what a regular expression can do and how to use it.

Wikipedia describes a regular expression (regex or regexp, abbreviated RE) as a concept in computer science: a single string used to describe or match a set of strings that conform to a certain syntax rule. So a regular expression is itself a string, but one that can be used to decide whether another string follows certain rules, such as whether it consists entirely of digits, whether it is a legal mobile phone number, or whether it is a legal IP address. Once you have mastered regular expressions, you have mastered a syntax for describing rules. This syntax is not for talking to people but for talking to the machine: give it ^(\w)(\w)(\1\2)+$ and it will tell you whether a string looks like ababab... I will not list the whole syntax below, because there are plenty of references online.
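To make the ababab... example concrete, here is a minimal sketch. It assumes the pattern is read as ^(\w)(\w)(\1\2)+$ (two captured characters followed by one or more repeats of the same pair); the class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepeatedPairDemo
{
    // Two captured word characters, then one or more repeats of that same pair.
    public static bool IsRepeatedPair(string input)
    {
        return Regex.IsMatch(input, @"^(\w)(\w)(\1\2)+$");
    }

    public static void Main()
    {
        foreach (var input in new[] { "ababab", "abab", "abba", "ab" })
        {
            Console.WriteLine("{0}: {1}", input, IsRepeatedPair(input));
        }
    }
}
```

Under this reading "ababab" and "abab" match, while "abba" and a lone "ab" do not, because the pair has to be repeated at least once.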

I will present the material as lessons, each built around an example that is then analyzed in detail. Regular expression rules are hard to remember, but results are memorable: once you have seen someone use a regular expression to check whether a string consists entirely of digits, you will not forget it. The next time a similar requirement comes up, you will at least know that a regular expression makes it easy, and all that remains is to pull out a regular expression manual. After studying it for a while the answer comes naturally, because although the rules are hard to remember, they are easy to understand.

  • Lesson1 determines whether a string is composed entirely of digits.

The idea is simple: from head to tail, everything must be a digit. Open the manual and you will find that the start anchor is ^, the end anchor is $, and a digit is \d. To match more than one character there are two options, + and *: + means at least once, * means zero or more times. That gives ^\d+$, and we are done. Let's try it. In C#, the class for regular expressions is System.Text.RegularExpressions.Regex. The methods I use are Match, Replace, and IsMatch, which match, replace, and test for a match respectively. They are simple to use, as the example shows.

// Requires: using System; using System.Text; using System.Text.RegularExpressions;
internal static class RegexExtension
{
    // Convert match information to a readable string
    public static string Convert2String(this Match match, string enter)
    {
        StringBuilder builder = new StringBuilder();
        builder.AppendFormat("input string: {0}, match succeeded: {1}, matched string: {2}, match position: {3}, match length: {4}, number of groups: {5}",
            enter, match.Success, match.Value, match.Index, match.Length, match.Groups.Count);
        return builder.ToString();
    }
}

public static void Lesson1()
{
    Console.WriteLine("All-digits check:");
    string[] enters = { "123123123", "123a123", "a123123", "", "0 " };
    foreach (var enter in enters)
    {
        Console.WriteLine(
            Regex.Match(enter, @"^\d+$").Convert2String(enter));
    }
}

Regex.Match matches enter against the rule, and the extension method converts the result to a string. In this example only "123123123" matches; all the others fail, even "0 ", because it contains a space and \d matches only digits. Easy, isn't it? On to the second lesson.
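When only a yes/no answer is needed, IsMatch is enough. The sketch below also contrasts + with *; the class and method names are illustrative, not part of the article's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitCheckDemo
{
    // '+' requires at least one digit, so the empty string is rejected.
    public static bool IsAllDigits(string input)
    {
        return Regex.IsMatch(input, @"^\d+$");
    }

    // '*' allows zero digits, so the empty string is accepted.
    public static bool IsAllDigitsOrEmpty(string input)
    {
        return Regex.IsMatch(input, @"^\d*$");
    }

    public static void Main()
    {
        foreach (var input in new[] { "123123123", "123a123", "", "0 " })
        {
            Console.WriteLine("'{0}': +={1}, *={2}",
                input, IsAllDigits(input), IsAllDigitsOrEmpty(input));
        }
    }
}
```

Two details of .NET worth knowing here: $ also matches just before a final newline (use \z to insist on the very end of the string), and \d matches Unicode decimal digits, so [0-9] is the stricter choice when only ASCII digits should pass.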

  • Lesson2 determines whether the number is a valid landline or mobile phone number

Decide whether a number is a valid landline or mobile number. A landline number has 7 to 8 digits and a mobile number has 11 digits, so two cases must be matched. Start with the landline: it is the ^\d+$ from before plus a limit on the count. The manual shows the quantifier {m,n}, where m is the minimum number of occurrences and n the maximum. + is the same as {1,}: when n is omitted there is no upper limit. (In .NET m cannot be omitted; for no lower limit, write {0,n}.) That gives ^\d{7,8}$. A mobile number is 11 digits, all numeric, starting with 1, which is easy: ^1\d{10}$. How do we combine the two? With the | character. The code follows.

public static void Lesson2()
{
    Console.WriteLine("Is it a valid phone number? The rules have two parts:");
    Console.WriteLine("landline: 7-8 digits; mobile: 11 digits, starting with 1; digits only.");
    string landPhoneRule = @"^\d{7,8}$";
    string handPhoneRule = @"^1\d{10}$";
    // Merge the rules
    string rule = string.Format("{0}|{1}", landPhoneRule, handPhoneRule);
    string[] enters =
    {
        "1234567",     // 7 digits, valid
        "12345678",    // 8 digits, valid
        "13888888888", // 11 digits starting with 1, valid
        "23888888888", // 11 digits starting with 2, invalid
        "0123456789",  // 10 digits, invalid
        "1388888888a", // contains a letter, invalid
        "10111111111"  // 11 digits starting with 1, valid under the first rule
    };
    foreach (var enter in enters)
    {
        // Prints the matched text, or an empty line when there is no match
        Console.WriteLine(Regex.Match(enter, rule));
    }

    Console.WriteLine("The mobile rule now changes: the number must start with 1, the second and third digits cannot be 0, and the other digits are unchanged.");
    handPhoneRule = @"^1[1-9]{2}\d{8}$";
    rule = string.Format("{0}|{1}", landPhoneRule, handPhoneRule);
    foreach (var enter in enters)
    {
        Console.WriteLine(Regex.Match(enter, rule));
    }
}

As you can see, 10111111111 also matches, although it is obviously not a mobile number. So I then changed the rule and added the restriction that the second and third digits are not 0. \d no longer fits here. The manual offers the [] construct, which lists candidate characters: [123] matches any one of 1, 2, or 3. A range can be given with -, so [0-9] covers the same ASCII digits as \d. The changed mobile rule becomes ^1[1-9]{2}\d{8}$. Now for the third lesson.
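The two phone rules can also share a single pair of anchors by putting the alternatives in a non-capturing group, (?:...). This is a minimal sketch assuming the revised mobile rule; the class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneRuleDemo
{
    // Landline: 7-8 digits. Mobile: 1, then two digits from 1-9, then eight more digits.
    private static readonly Regex PhoneRule =
        new Regex(@"^(?:\d{7,8}|1[1-9]{2}\d{8})$");

    public static bool IsValidPhone(string input)
    {
        return PhoneRule.IsMatch(input);
    }

    public static void Main()
    {
        string[] inputs = { "1234567", "13888888888", "10111111111", "0123456789" };
        foreach (var input in inputs)
        {
            Console.WriteLine("{0}: {1}", input, IsValidPhone(input));
        }
    }
}
```

Grouping the alternatives keeps the anchors in one place, which makes it harder to forget a ^ or $ when another alternative is added later.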

  • Lesson3 determines whether the IP address is valid

For this lesson I have simplified the IP address rules. The address must have the form ***.***.***.***, where each item has at least one and at most three digits and a value with 255 >= *** >= 0, and the first item cannot be 0, 00, or 000. This is much more complicated to check, and it also exposes a weakness of regular expressions: they cannot see the meaning of the characters. I will explain that shortly. With the basic syntax described earlier, the rule splits into two parts: the first item and the last three.

Take the first item: 255 >= *** >= 0, and not 0, 00, or 000. My first thought was the "positive" approach, that is, describing what satisfies the condition. There are many cases: 01, 001, 011, 1-249, 250-255. A regular expression has no way to ignore the leading zeros. With numbers, if (001 == 1) is true, but a regular expression can only say (0{2}1)|1 to state that both 1 and 001 qualify. This is what I meant by not seeing the meaning: a regular expression can only describe a string character by character (what the first character is, what the second is) and cannot describe it by value. Listing every case the positive way gives (0{2}[1-9]|0[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]|0[1-9]|[1-9]\d|[1-9]), which enumerates 01, 001, 011, 1-249, 250-255 and so on. There is nothing wrong with it, but it is too long. Is there a simpler way?

If the positive approach is awkward, reverse it: the first item is 1 to 3 digits, not all zeros, and not greater than 255. In other words it must not be ^0{1,3}, 2[6-9]\d, 25[6-9], or [3-9]\d{2}; anything else is fine. How do we describe a condition that must not hold? [^...] negates, but only for a single character.

The manual lists several constructs, such as (?!...) and (?<!...), that test whether a condition holds next to the current position. For example, (?!...) is a negative lookahead: Windows(?!95|98|NT|2000) matches the "Windows" in "Windows3.1" but not the "Windows" in "Windows2000". There are others. Here I use (?<!...), a negative lookbehind: (?<!123)456 matches the 456 in 23456 but not a 456 preceded by 123, so 123456 does not qualify. Lookahead and lookbehind refer to what follows and what precedes; a positive assertion must match and a negative one must not.

My first attempt was ^(\d{1,3})(?<!^0{1,3}|2[6-9]\d|25[6-9]|[3-9]\d{2}). There the ^ belongs only to the first alternative, so I wrapped all the alternatives in a group. Then come the last three items, each of the form .*** with 255 >= *** >= 0, written as (\.([01]?\d?\d|2[0-4]\d|25[0-5])){3}$. Note that the dot is escaped, because an unescaped . matches any character, and that the alternatives sit in their own group so the \. applies to every one of them. Put together: ^(\d{1,3})(?<!^(0{1,3}|2[6-9]\d|25[6-9]|[3-9]\d{2}))(\.([01]?\d?\d|2[0-4]\d|25[0-5])){3}$. The code is as follows:

public static void Lesson3()
{
    Console.WriteLine("The task is to decide whether an IP address is valid. The rules are:");
    Console.WriteLine("the format must be ***.***.***.***");
    Console.WriteLine("the first group must be at least 1, and every group must be less than or equal to 255");
    // Complex version. The dot is escaped and the alternatives for the last three
    // items are grouped, so that \. applies to each alternative.
    string rule = @"^(\d{1,3})(?<!^(0{1,3}|2[6-9]\d|25[6-9]|[3-9]\d{2}))(\.([01]?\d?\d|2[0-4]\d|25[0-5])){3}$";
    string[] enters =
    {
        "255.255.255.255", // valid
        "21.1.1.1",        // valid
        "256.0.0.0",       // invalid
        "300.2.2.250",     // invalid
        "10.1.1.99",       // valid
        "00.1.1.009",      // invalid
        "100.1.1.1"        // valid
    };
    foreach (var enter in enters)
    {
        Console.WriteLine(Regex.Match(enter, rule).Convert2String(enter));
    }
}
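The two lookaround examples quoted in this lesson can be checked on their own. This is a small sketch; the class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Negative lookahead: "Windows" not followed by 95, 98, NT or 2000.
    public static bool IsWindowsNotListed(string input)
    {
        return Regex.IsMatch(input, @"Windows(?!95|98|NT|2000)");
    }

    // Negative lookbehind: "456" not preceded by "123".
    public static bool Has456NotAfter123(string input)
    {
        return Regex.IsMatch(input, @"(?<!123)456");
    }

    public static void Main()
    {
        Console.WriteLine(IsWindowsNotListed("Windows3.1"));  // True
        Console.WriteLine(IsWindowsNotListed("Windows2000")); // False
        Console.WriteLine(Has456NotAfter123("23456"));        // True
        Console.WriteLine(Has456NotAfter123("123456"));       // False
    }
}
```

Lookarounds consume no characters: they only test the text next to the current position, which is why the lesson can place one directly after (\d{1,3}) to veto what was just matched.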

If the IP address could be converted to numbers, validity would be easy to check: extract each number and test whether it is >= 0 and <= 255. There may also be a simpler regular expression that I have not thought of yet; if you know one, please leave me a comment.
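For comparison, here is a sketch of that numeric approach without any regular expression. It assumes the same simplified rules as this lesson (four items of 1 to 3 digits, each at most 255, first item not zero); the class and method names are illustrative.

```csharp
using System;
using System.Globalization;

public static class IpCheckDemo
{
    public static bool IsValidIp(string input)
    {
        string[] parts = input.Split('.');
        if (parts.Length != 4) return false;

        for (int i = 0; i < parts.Length; i++)
        {
            string part = parts[i];
            // Each item has 1 to 3 characters, all of them ASCII digits.
            if (part.Length < 1 || part.Length > 3) return false;
            foreach (char c in part)
            {
                if (c < '0' || c > '9') return false;
            }

            int value = int.Parse(part, CultureInfo.InvariantCulture);
            if (value > 255) return false;
            // The first item may not be 0, 00 or 000.
            if (i == 0 && value == 0) return false;
        }
        return true;
    }

    public static void Main()
    {
        foreach (var input in new[] { "255.255.255.255", "256.0.0.0", "00.1.1.009" })
        {
            Console.WriteLine("{0}: {1}", input, IsValidIp(input));
        }
    }
}
```

Once the items are numbers, the range check is a plain comparison, which is exactly the part the regular expression had to spell out digit by digit.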

  • Lesson4 function replacement

This problem is why I learned regular expressions in the first place. I was refactoring code and found a redundant method. Let's call the redundant method MA; every call was to be changed to MB, another method. MA takes two arguments and MB takes one. VS can rename a function automatically, but it cannot change the argument list. There were nearly 50 calls to MA in the project, all to be replaced with MB, and deleting the extra argument by hand would have been tedious. The VS find dialog offers replace, and among the search options I noticed regular expressions. Could a regular expression do the replacement automatically? So I studied regular expressions for a day or two and then tried it. For fear of a wrong match I clicked replace one call at a time and adjusted the expression several times along the way. In the end every replacement succeeded with no manual edits. I was delighted; regular expressions seemed amazing. That scenario became Lesson 4. Everything so far has been about matching; this lesson shows how to replace. The requirements are below: the first set of Console.WriteLine calls are the original statements, and the four Console.WriteLine calls after them are the same statements after replacement. The task is to replace MA with MB, keep MA's first argument for MB, and discard the second. The statements themselves are meaningless and serve only as replacement targets.

// The following statements have no practical meaning. They only simulate the calls to be replaced.
Console.WriteLine(MA("a", "B"));
Console.WriteLine("a" + MA("a", GetType().ToString()));
Console.WriteLine(MA("a", GetType().ToString()));
Console.WriteLine("a" + MA("a", "a".Substring(1)) + "B");

// After replacement they become:
Console.WriteLine(MB("a"));
Console.WriteLine("a" + MB("a"));
Console.WriteLine(MB("a"));
Console.WriteLine("a" + MB("a") + "B");

// These two methods are meaningless too: MA simulates the original function and MB the new one.
public string MA(string a, string b) { return null; }
public string MB(string a) { return null; }

First, analyze how to match the four statements to be replaced. Each starts with the function name MA, followed by its contents in parentheses, so finding the right '(' and ')' would do. The catch is deciding which ')' is the right one. At the end of the calls there is sometimes one ')', sometimes two, sometimes three. The parentheses must pair up exactly, no more and no fewer, or the resulting statement is broken. Where the call ends in a single ')', it is preceded by a '"'; all the others end in at least two ')'. So there are two cases: the first is '")', and the second is the first two of several closing parentheses. That raises a question: how do we match only the first two? Regular expressions have two matching modes, greedy and non-greedy. Greedy means matching as much as possible: a+ against aaaaaab matches every a, giving aaaaaa. In a+?, the ? would normally mean zero or one, but after a quantifier it means match as few as possible, and the result is a. Another problem is that ( and ) have special meaning in regular expressions. To drop that meaning and match a literal parenthesis, use the escape character \, which anyone who has met escape characters before will recognize. The expression that matches the MA call is then MA\((.+?),\s?.+?(\)\)|"\)). Note that inside a C# verbatim string, which starts with @", a double quote is written as two "". Reading other people's regular expressions can be hard, so I suggest not looking at my result first. Once you understand the problem, consult the manual and try it yourself; you will get there quickly. Then compare your expression with mine. Yours may well be simpler!
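The difference between greedy and non-greedy matching, and the effect of escaping a parenthesis, can be seen in a few lines. This is a minimal sketch with illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyLazyDemo
{
    // Returns the text of the first match, or an empty string when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatch("aaaaaab", "a+"));    // aaaaaa (greedy)
        Console.WriteLine(FirstMatch("aaaaaab", "a+?"));   // a (non-greedy)
        // \( and \) match literal parentheses; .+? stops at the first ')'.
        Console.WriteLine(FirstMatch("f(x)(y)", @"\(.+?\)")); // (x)
        // The greedy version runs on to the last ')'.
        Console.WriteLine(FirstMatch("f(x)(y)", @"\(.+\)"));  // (x)(y)
    }
}
```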

You can see that after \s? I use .+? followed by )) or "). With a plain .+ too many parentheses would be matched. Matching is not enough, though; we also have to replace. The difficulty is keeping the first argument. Regular expressions provide a mechanism for extracting the value matched inside parentheses. For example, (\w)\1(\w)\2 matches aabb: whatever the first \w matches is recorded because it is in parentheses, \1 refers to the contents of the first pair of parentheses, and so on. In a replacement, C# (I do not know about other languages) uses $1 for the contents matched by the first pair of parentheses. The replacement string is MB($1). Note that the replacement string needs no escaping, because it is not matched against anything; its characters are inserted as they are. The code is as follows:

public static void Lesson4()
{
    Console.WriteLine("Lesson 4: replacing functions.");
    // Group 1 captures the first argument. The call ends either at "))"
    // or at a double quote followed by ")".
    string rule = @"MA\((.+?),\s?.+?(\)\)|""\))";
    string[] enters =
    {
        @"Console.WriteLine(MA(""a"", ""B""))",
        @"Console.WriteLine(""a"" + MA(""a"", GetType().ToString()));",
        @"Console.WriteLine(MA(""a"", GetType().ToString()));",
        @"Console.WriteLine(""a"" + MA(""a"", ""a"".Substring(1)) + ""B"");"
    };
    string replacement = @"MB($1)";
    foreach (var enter in enters)
    {
        Console.WriteLine(Regex.Match(enter, rule).Convert2String(enter));
        Console.WriteLine("changed from {0} to {1} after replacement", enter, Regex.Replace(enter, rule, replacement));
    }
}
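Backreferences in a pattern and $1 in a replacement are easier to see on a smaller input. The sketch below uses a made-up key=value string; the named-group variant is an addition for readability and is not used in the article. All class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // \1 and \2 refer back to what the first and second groups matched.
    public static bool IsDoubledPairs(string input)
    {
        return Regex.IsMatch(input, @"^(\w)\1(\w)\2$");
    }

    // $1 and $2 insert the captured text into the replacement.
    public static string SwapSides(string input)
    {
        return Regex.Replace(input, @"(\w+)=(\w+)", "$2=$1");
    }

    // The same replacement with named groups, written (?<name>...) and ${name}.
    public static string SwapSidesNamed(string input)
    {
        return Regex.Replace(input, @"(?<key>\w+)=(?<value>\w+)", "${value}=${key}");
    }

    public static void Main()
    {
        Console.WriteLine(IsDoubledPairs("aabb"));       // True
        Console.WriteLine(IsDoubledPairs("abab"));       // False
        Console.WriteLine(SwapSides("width=100"));       // 100=width
        Console.WriteLine(SwapSidesNamed("width=100"));  // 100=width
    }
}
```

Named groups cost a few more characters, but they keep a replacement readable when a pattern has more than two or three groups.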

Regular expressions look difficult, but if you work through these four lessons you will have a basic command of them, enough for everyday use. The syntax is hard to remember; my advice is not to memorize it but to keep the manual at hand and look things up whenever needed. You are welcome to discuss with me in the comments.

Permanent link to this article: https://www.bkjia.com/Linux/2018-03/151495.htm
