Recently I have been studying how regular expressions are used in .NET, and I have come across a few good articles. Rather than keep them to myself, I am sharing one here for everyone to read!
Regular expressions were first described by the mathematician Stephen Kleene in 1956, in the course of his work on formal models of nerve nets and finite automata. Regular expressions with a complete syntax were first used for matching character patterns and were later applied throughout information technology. Since then, regular expressions have evolved over several periods, and the current standards have been approved and recognized by ISO (the International Organization for Standardization).
Regular expressions are not a dedicated programming language, but they can be used to search for and replace text in a file or string. There are two standards: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the features of BRE plus some additional concepts.
Regular expressions are implemented by xsh, egrep, sed, vi, and other programs on Unix platforms. They have been adopted by many languages, such as HTML and XML, which usually support only a subset of the full standard. As regular expressions have been ported to cross-platform programming languages, their features have become more complete and more widely used.
2. Related Expressions
I can only say so much about regular expressions: they are a large body of knowledge and cannot be explained in a few words. Here I will only introduce the patterns needed for C# syntax analysis. For more information, see the Regular Expression Specification [The Open Group] on this blog site. If you already know regular expressions well, you can skip the explanations below and finish the article sooner.
i> String: "(\\?.)*?"
In a regular expression, every character except . $ ^ { [ ( | ) * + ? \ matches itself. In the pattern above, the quotation marks at both ends match the quotation marks that delimit the string. "\\" denotes one literal "\" character, and the "?" after it means that the backslash may occur zero or one time. "." matches any character except a newline.
"()" captures the matched substring. Captures made with () are numbered automatically from 1, in the order of their opening parentheses. Capture number zero is the text matched by the entire regular expression pattern. The "*" after the parentheses means that the substring may occur zero or more times; that is, "*" applies to "(\\?.)".
The final "?" makes the preceding "*" lazy: it matches as few repetitions as possible, so the match stops at the first closing quotation mark that is not consumed as part of a "\" pair, rather than running on to the last quotation mark on the line. The empty string "" is matched as well, because "*" already allows zero repetitions.
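ii> Verbatim string: @"(""|.)*?"
Intended to match literals such as @"Hello ""World""!". Note that because "*?" is lazy, the match can end at the first quotation mark after the opening @", so a doubled "" inside the literal cuts the match short.
"|" matches any one of the terms it separates; for example, cat|dog|tiger. The leftmost successful alternative is used.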
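To make the two string patterns concrete, here is a minimal sketch. It assumes the patterns read "(\\?.)*?" and @"(""|.)*?" once their backslashes and spacing are restored. The class name, the sample lines, and the third pattern are illustrative assumptions, not part of the original article; the third pattern is only one possible way to keep a doubled "" inside the match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringLiteralDemo
{
    // Pattern text: "(\\?.)*?"
    public static readonly Regex DoubleQuoted = new Regex("\"(\\\\?.)*?\"");

    // Pattern text as read from the article: @"(""|.)*?"
    public static readonly Regex VerbatimLazy = new Regex("@\"(\"\"|.)*?\"");

    // Hypothetical variant, not from the article: @"(""|[^"])*"
    public static readonly Regex VerbatimGreedy = new Regex("@\"(\"\"|[^\"])*\"");

    public static void Main()
    {
        // Source line being scanned: string s = "a\"b" + "c";
        Console.WriteLine(DoubleQuoted.Match("string s = \"a\\\"b\" + \"c\";").Value);

        // Source line being scanned: string v = @"Hello ""World""!";
        string verbatim = "string v = @\"Hello \"\"World\"\"!\";";
        Console.WriteLine(VerbatimLazy.Match(verbatim).Value);
        Console.WriteLine(VerbatimGreedy.Match(verbatim).Value);
    }
}
```

The first call prints "a\"b", so the escaped quotation mark stays inside the match. The lazy verbatim pattern prints only @"Hello ", while the variant prints the whole literal @"Hello ""World""!".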
iii> XML element in C# documentation comments: ///\s*<.*>
Matches the XML in C# auto-generated documentation comments. "\s" matches any whitespace character. Take care not to change the case of such escapes: regular expressions are case-sensitive, and the upper-case form of a character class often means the opposite of the lower-case one. For example, "\S" matches any non-whitespace character. (The case of "\Z" below matters as well.)
iv> Text in C# documentation comments: ///\s?.*
v> Blank line: ^\s*\Z
"^" specifies that the match must begin at the start of the string or line. "\Z" specifies that the match must occur at the end of the string, or before a newline at the end of the string.
vi> C# comment: //.*
vii> C# keywords: (abstract|where|while|yield){1}(\.|(\s)+|;|,|\(|\[){1}
For reasons of space, only a few keywords are listed here (C# has at least 80 keywords ^_^). Note that the regex engine takes the first alternative from the left that succeeds, so the order of words that contain one another matters: the longer word must come before the word it contains. For example, (in|int) on its own matches "in" and never reaches "int", so it should be written (int|in).
In addition, all brackets: (\{|\[|\(|\}|\]|\)).
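The line-oriented patterns in this section can be tried out with a short sketch like the one below. It assumes the patterns read ^\s*\Z, ///\s*<.*>, //.* and (abstract|where|while|yield){1}(\.|(\s)+|;|,|\(|\[){1} once their backslashes are restored. The class name and the sample lines are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinePatternsDemo
{
    public static readonly Regex BlankLine = new Regex(@"^\s*\Z");
    public static readonly Regex DocXmlElement = new Regex(@"///\s*<.*>");
    public static readonly Regex LineComment = new Regex("//.*");

    // Abbreviated keyword list, followed by exactly one delimiter.
    public static readonly Regex Keyword =
        new Regex(@"(abstract|where|while|yield){1}(\.|(\s)+|;|,|\(|\[){1}");

    public static void Main()
    {
        Console.WriteLine(BlankLine.IsMatch("   \t"));                      // True
        Console.WriteLine(BlankLine.IsMatch("  x"));                        // False
        Console.WriteLine(DocXmlElement.Match("    /// <summary>").Value);  // /// <summary>
        Console.WriteLine(LineComment.Match("i++; // next").Value);         // // next
        Console.WriteLine(Keyword.Match("while (x > 0)").Groups[1].Value);  // while

        // Alternation order: the leftmost alternative that succeeds wins.
        Console.WriteLine(Regex.Match("int", "(in|int)").Value);            // in
        Console.WriteLine(Regex.Match("int", "(int|in)").Value);            // int
    }
}
```

Group 1 of the keyword pattern holds the keyword itself, and group 2 holds the delimiter that follows it, so a colorizer would normally paint only the span of group 1.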
3. Related classes and their members [3]
[Serializable]
public class Regex : ISerializable
// Represents an immutable regular expression.
The Regex class contains several static methods so that you can use regular expressions without explicitly creating a Regex object. Calling a static method is equivalent to constructing a Regex object, using it once, and then discarding it.
The Regex class is immutable (read-only) and is inherently thread-safe. You can create a Regex object on any thread and share it between threads.
The above is taken from Microsoft's development documentation. We also need to use several of its members:
// Searches the specified input string for the first occurrence of the regular expression specified in the Regex constructor.
public Match Match(
    string input
)
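The two ways of calling Regex described above, a static helper versus one shared instance, can be compared in a minimal sketch. The class name, the comment pattern, and the sample lines are assumptions chosen for illustration. Parallel.For is used only to show one instance being shared across threads.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class SharedRegexDemo
{
    // One immutable instance, created once and shared by every caller.
    private static readonly Regex LineComment = new Regex("//.*");

    public static string CommentOf(string line)
    {
        Match m = LineComment.Match(line);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        // Static helper: no Regex object is created explicitly by the caller.
        Console.WriteLine(Regex.IsMatch("int x; // counter", "//.*")); // True

        // Shared instance: safe to use from several threads at once.
        string[] lines = { "a(); // first", "b();", "c(); // third" };
        string[] comments = new string[lines.Length];
        Parallel.For(0, lines.Length, i => comments[i] = CommentOf(lines[i]));
        Console.WriteLine(string.Join("|", comments)); // // first||// third
    }
}
```

Each parallel iteration writes to its own array slot, so the only object shared between threads is the Regex instance.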
For the Match class:
[Serializable]
public class Match : Group
// Represents the result of a single regular expression match. For more information about Group, see the Microsoft development documentation.
We will use the following members.
// The zero-based position in the original string where the captured substring starts.
public int Index { get; }
// The length of the captured substring.
public int Length { get; }
// The actual substring captured by the match.
public string Value { get; }
// Gets a value indicating whether the match succeeded.
public bool Success { get; }
// Gets the collection of groups matched by the regular expression.
public virtual GroupCollection Groups { get; }
// Starting at the position where the last match ended (the character after the last matched character),
// returns a new Match that contains the next matching result.
public Match NextMatch();
And the corresponding members of the Group class (the first four properties of Match listed above are inherited through the Group class, so they are not listed again one by one).
The pattern string must be supplied when a Regex instance is initialized. You can create an instance with the constructor, use it, and then discard it, or call a static method directly, which is equivalent to creating an instance. However, in my tests the static methods were slightly slower than a compiled Regex object. See the following test data:
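As a quick illustration of the Match members listed above, the following sketch walks every match in a string and records Index, Length, Value, the first capture group, and the group count. The pattern, the class name, and the sample input are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchMembersDemo
{
    // Hypothetical pattern: first character of a word in group 1, the rest in group 2.
    private static readonly Regex Word = new Regex(@"(\w)(\w*)");

    public static List<string> Describe(string input)
    {
        var rows = new List<string>();
        for (Match m = Word.Match(input); m.Success; m = m.NextMatch())
        {
            // Groups[0] is the whole match; Groups[1] and Groups[2] are the captures.
            rows.Add($"{m.Index},{m.Length},{m.Value},{m.Groups[1].Value},{m.Groups.Count}");
        }
        return rows;
    }

    public static void Main()
    {
        foreach (string row in Describe("int count"))
        {
            Console.WriteLine(row); // prints 0,3,int,i,3 then 4,5,count,c,3
        }
    }
}
```

Note that NextMatch returns a new Match rather than advancing the current one, so its result has to be assigned back to the loop variable.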
4. Writing the code
Now we need to analyze the C# language elements listed in section 2. I use line-by-line analysis (for multi-line analysis, the related expressions need to be modified [4]).
using System.Text.RegularExpressions;
// Some other code ...
// First create a Regex instance (taking string parsing as an example).
// The C# literal below yields the pattern text "(\\?.)*?"
Regex DoubleQuotedString = new Regex("\"(\\\\?.)*?\"");
// Then match the string.
Match m;
// NextMatch returns a new Match, so assign it back to m to advance the loop.
for (m = DoubleQuotedString.Match(strSomeCodes); m.Success; m = m.NextMatch()) {
    foreach (Group g in m.Groups) {
        // Do some drawing
    }
}
The rest is to write the coloring code.
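As a starting point for that coloring code, one possible approach is to collect a span (start, length, kind) for every match on a line and hand the list to whatever does the drawing. This is only a sketch under assumptions: the Span type, the kind names, and the sample line are hypothetical, and only the string and comment patterns are wired in.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpanCollectorDemo
{
    // Hypothetical record: where a token starts, how long it is, and what kind it is.
    public struct Span
    {
        public int Index;
        public int Length;
        public string Kind;
    }

    private static readonly Regex DoubleQuotedString = new Regex("\"(\\\\?.)*?\"");
    private static readonly Regex LineComment = new Regex("//.*");

    public static List<Span> Collect(string line)
    {
        var spans = new List<Span>();
        AddMatches(spans, DoubleQuotedString, "string", line);
        AddMatches(spans, LineComment, "comment", line);
        return spans;
    }

    private static void AddMatches(List<Span> spans, Regex regex, string kind, string line)
    {
        for (Match m = regex.Match(line); m.Success; m = m.NextMatch())
        {
            spans.Add(new Span { Index = m.Index, Length = m.Length, Kind = kind });
        }
    }

    public static void Main()
    {
        // Source line being scanned: Print("hi"); // greet
        foreach (Span s in Collect("Print(\"hi\"); // greet"))
        {
            Console.WriteLine($"{s.Kind} at {s.Index}, length {s.Length}");
        }
    }
}
```

The sketch does not resolve overlaps: a "//" inside a string literal is also reported as a comment, so a real colorizer would need to decide which span wins.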
5. Source http://www.pscode.com/vb/scripts/ShowZip.asp?lngWId=10&lngCodeId=2611&strZipAccessCode=tp%2FS26112472
Note:
[1] The "is ...... text pattern" passage is quoted from Regular Expression Language Elements in the .NET Framework General Reference.
[2] The introduction to regular expressions given here draws on related content from ZDNet China, Technology and Development.
[3] The signatures and comments of the classes and functions in this section are taken from the Microsoft documentation.
[4] For details about multi-line analysis, see Regular Expression Language Elements in the .NET Framework General Reference.
For more information, see http://203.191.151.199/19/category.aspx.