Parse C# files using regular expressions

From: http://blog.csdn.net/matrix2003b/archive/2004/07/29/55022.aspx

 

Use regular expressions to parse C# files (Updated)
Jack H Hansen [2004-07-28]

Keywords: C#, regular expression, syntax highlighting
Many readers have probably written a program that colors source code according to its syntax. Not long ago this was a difficult task: you had to write a great deal of code to analyze the syntax, and that was often the hardest part. Regular expressions free us from most of this heavy work. They provide a powerful, flexible, and efficient way to process text: you can create, compare, and modify strings, and quickly parse large amounts of text and data to search for, remove, and replace text patterns [1]. The .NET Framework provides the System.Text.RegularExpressions namespace to implement these features.

1. Regular Expression [2]

First, I would like to briefly introduce regular expressions.

Regular expressions were first proposed by the mathematician Stephen Kleene in 1956, building on his earlier research. Regular expressions with a complete syntax were used for character pattern matching and were later applied to information technology. Since then they have evolved through several stages, and the current standard has been approved and recognized by ISO (the International Organization for Standardization).

Regular expressions are not a dedicated programming language, but they can be used to find and replace text in a file or a string. There are two standard flavors: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the features of BRE plus some additional concepts.

Regular expressions are implemented by xsh, egrep, sed, vi, and other programs on UNIX platforms. They have been adopted by many languages, such as HTML and XML, usually as only a subset of the full standard. As regular expressions were ported to cross-platform programming languages, their features became more complete and they came into wider use.

2. Related Expressions

That is all I will say about regular expressions in general: the subject is a large body of knowledge and cannot be explained in a few words. Here I only introduce the patterns needed for C# syntax analysis. For more information, see the Regular Expression specification [The Open Group] on this blog. If you already know regular expressions well, you can skip the explanations below and finish the article sooner.

i> String literal "(\\?.)*?"

In a regular expression, every character except . $ ^ { [ ( | ) * + ? \ matches itself. In the expression above, the quotation marks at both ends match the quotation marks that delimit the string. "\\" matches a single "\" character. The "?" that follows it means that this character occurs zero or one time. "." matches any character except \n.

"()" captures the matched substring. Captures made with () are numbered automatically, starting from 1, in the order of their opening parentheses. Capture number zero is the text matched by the entire regular expression pattern. The "*" after the parentheses means that zero or more of these substrings may occur; that is, "*" applies to "(\\?.)".

The final "?" makes the preceding "*" lazy: it matches as few repetitions as possible, so the match stops at the first quotation mark that can close the string. Because "*" also allows zero repetitions, the empty string "" is matched as well.

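As a rough illustration of expression i>, the sketch below applies it to a single line of C# source. The class name StringLiteralDemo, the helper FindStringLiterals, and the sample line are made up for this example; only Regex, Match, and NextMatch come from System.Text.RegularExpressions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StringLiteralDemo
{
    // Expression i>: a quote, then lazily any run of
    // "optional backslash + one character", then a closing quote.
    private static readonly Regex DoubleQuotedString = new Regex(@"""(\\?.)*?""");

    // Hypothetical helper: returns the text of every string literal found in one line.
    public static List<string> FindStringLiterals(string line)
    {
        var found = new List<string>();
        for (Match m = DoubleQuotedString.Match(line); m.Success; m = m.NextMatch())
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // The sample line contains one literal with escaped quotes and one empty literal.
        string line = "Console.WriteLine(\"a \\\"quoted\\\" word\", \"\");";
        foreach (string literal in FindStringLiterals(line))
        {
            Console.WriteLine(literal);
        }
    }
}
```

The escaped quotation marks stay inside the first match because the optional backslash and the character after it are consumed together.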
ii> Verbatim string @"(""|.)*?"

This is intended to match verbatim strings such as @"Hello ""World""!".

"|" matches any one of the terms separated by the | (vertical bar) character, for example cat|dog|tiger. The leftmost alternative that succeeds is used.
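One behavior of expression ii> is worth checking before relying on it: because "*?" is lazy, the pattern as written can stop at the first quotation mark of a doubled pair. The sketch below compares it with a greedy variant that excludes a lone quotation mark from the loop. The variant, the class name VerbatimStringDemo, and the sample line are assumptions made for this illustration and are not part of the original article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimStringDemo
{
    // Expression ii> as given in the text: @"(""|.)*?"
    public static readonly Regex AsWritten = new Regex(@"@""(""""|.)*?""");

    // Assumed variant (not from the article): @"(""|[^"])*"
    // Greedy, and a single quote can only close the string.
    public static readonly Regex Variant = new Regex(@"@""(""""|[^""])*""");

    public static void Main()
    {
        string line = "string s = @\"Hello \"\"World\"\"!\";";
        Console.WriteLine(AsWritten.Match(line).Value); // stops after "Hello "
        Console.WriteLine(Variant.Match(line).Value);   // covers the whole literal
    }
}
```

If the highlighter only needs to color simple verbatim strings without doubled quotation marks, both patterns give the same result.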

iii> XML elements in C# documentation comments ///\s*<.*>

Matches the XML tags in C# auto-generated documentation comments. "\s" matches any whitespace character. Do not change the case of these escapes casually: regular expressions are case-sensitive, and for character-class escapes the uppercase form usually means the opposite of the lowercase form. For example, "\S" matches any non-whitespace character. (The same caution about case applies to "\Z" below.)

iv> Text content of C# documentation comments ///\s?.*

v> Blank lines ^\s*\Z

"^" specifies that the match must occur at the beginning of the string or line. "\Z" specifies that the match must occur at the end of the string, or before \n at the end of the string.

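The three line-oriented expressions (documentation XML elements, documentation text, and blank lines) can be tried together on individual lines. This is only a sketch: it assumes the documentation-comment prefix is ///, and the class name LineKindDemo, the helper Classify, and the labels it returns are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineKindDemo
{
    private static readonly Regex BlankLine = new Regex(@"^\s*\Z");
    // Assumption: documentation comments start with three slashes.
    private static readonly Regex DocXmlElement = new Regex(@"///\s*<.*>");
    private static readonly Regex DocContent = new Regex(@"///\s?.*");

    // Hypothetical helper: names the first of the three expressions that matches the line.
    public static string Classify(string line)
    {
        if (BlankLine.IsMatch(line)) return "blank";
        if (DocXmlElement.IsMatch(line)) return "doc-xml";
        if (DocContent.IsMatch(line)) return "doc-text";
        return "other";
    }

    public static void Main()
    {
        string[] lines = { "   ", "/// <summary>", "/// Adds two numbers.", "int x = 1;" };
        foreach (string line in lines)
        {
            Console.WriteLine(Classify(line));
        }
    }
}
```

The XML-element expression is tested before the plain-text one because every line that matches the former also matches the latter.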
vi> C# comments //.*

vii> C# keywords (abstract|where|while|yield){1}(\.|(\s)+|;|,|\(|\[){1}

For reasons of space, only a few keywords are listed here (C# has at least 80 keywords ^_^). Note that the regular expression engine takes the first alternative, counting from the left, that succeeds. Pay attention, therefore, to the order of words where one contains another: the longer word must come before the shorter word it contains. For example, (in|int) by itself matches only the "in" of "int", so it should be written (int|in).

In addition, all brackets: (\{|\[|\(|\}|\]|\)).
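The ordering rule for keyword alternatives can be checked with a few calls. In this sketch the class name KeywordOrderDemo and the helper FirstMatch are hypothetical, and the keyword list is the shortened one used in the text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordOrderDemo
{
    // Hypothetical helper: returns the text of the first match of pattern in input.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Shorter word listed first: only "in" is matched.
        Console.WriteLine(FirstMatch("(in|int)", "int"));
        // Longer word listed first: the whole "int" is matched.
        Console.WriteLine(FirstMatch("(int|in)", "int"));
        // Keyword expression with the shortened list: a keyword followed by one delimiter.
        Console.WriteLine(FirstMatch(
            @"(abstract|where|while|yield){1}(\.|(\s)+|;|,|\(|\[){1}", "while (true)"));
    }
}
```

Note that the keyword expression also consumes the delimiter after the keyword, so a highlighter would probably color only the first capture group rather than the whole match.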

3. Related classes and their members [3]

[Serializable]
public class Regex : ISerializable
// Represents an immutable regular expression.

The Regex class contains several static methods that let you use a regular expression without explicitly creating a Regex object. Using a static method is equivalent to constructing a Regex object, using it once, and then destroying it.

The Regex class is immutable (read-only) and is inherently thread safe. Regex objects can be created on any thread and shared between threads.

The above is taken from Microsoft's development documentation. We also need to use several of its members:

// Searches the specified input string for an occurrence of the regular expression specified in the Regex constructor.
public Match Match(
    string input
)

As for the Match class:

[Serializable]
public class Match : Group
// Represents the results from a single regular expression match. For more information about Group, see the Microsoft development documentation.

We will use the following members.

// The zero-based position in the original string where the captured substring starts.
public int Index { get; }

// The length of the captured substring.
public int Length { get; }

// The actual substring captured by the match.
public string Value { get; }

// Gets a value indicating whether the match was successful.
public bool Success { get; }

// Gets the collection of groups matched by the regular expression.
public virtual GroupCollection Groups { get; }

// Starting at the position where the last match ended (at the character after the last matched character),
// returns a new Match with the results for the next match.
public Match NextMatch();

We also use the corresponding members of the Group class (the first four properties of Match listed above are inherited from Group, so they are not listed again).
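A small sketch of how these members fit together, using the comment expression from section 2. The class name MatchMembersDemo, the helper Describe, and the sample line are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchMembersDemo
{
    // Hypothetical helper: formats one match as "index:length:value".
    public static string Describe(Match m)
    {
        return m.Index + ":" + m.Length + ":" + m.Value;
    }

    public static void Main()
    {
        Regex comment = new Regex("//.*");        // comment expression from section 2
        Match m = comment.Match("int x; // counter");
        Console.WriteLine(m.Success);             // whether anything was found
        Console.WriteLine(Describe(m));           // position, length, and text of the match
        Console.WriteLine(m.Groups.Count);        // group 0 is the whole match
        Console.WriteLine(m.NextMatch().Success); // whether the line holds a second comment
    }
}
```

Index and Length are what a coloring routine needs in order to select the matched range in the displayed text.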

The pattern must be specified when a Regex instance is initialized. You can use the constructor to create an instance, use it, and then destroy it, or call a static method directly, which is equivalent to creating an instance. In my tests, however, the static methods were slightly slower than a compiled Regex object. See the following test data:

4. Writing the code

Now we need to analyze the C# language elements listed in section 2. I use line-by-line analysis (if you want to analyze several lines at once, the expressions need to be modified [4]).
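The two calling styles compared above might look like the following. This is a sketch only: the class and method names are invented, and it makes no claim about timings. To measure the difference yourself, you could wrap each call in a System.Diagnostics.Stopwatch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticVersusInstanceDemo
{
    private const string CommentPattern = "//.*";

    // Instance created once with RegexOptions.Compiled and reused for every line.
    private static readonly Regex CompiledComment =
        new Regex(CommentPattern, RegexOptions.Compiled);

    // Counts lines containing a comment, using the shared instance.
    public static int CountWithInstance(string[] lines)
    {
        int count = 0;
        foreach (string line in lines)
        {
            if (CompiledComment.IsMatch(line)) count++;
        }
        return count;
    }

    // Same count, but the pattern is passed to the static method on every call.
    public static int CountWithStatic(string[] lines)
    {
        int count = 0;
        foreach (string line in lines)
        {
            if (Regex.IsMatch(line, CommentPattern)) count++;
        }
        return count;
    }

    public static void Main()
    {
        string[] lines = { "int x; // counter", "x++;", "// done" };
        Console.WriteLine(CountWithInstance(lines));
        Console.WriteLine(CountWithStatic(lines));
    }
}
```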

using System.Text.RegularExpressions;

// Some other code ...

// First create a Regex instance (taking string parsing as an example).
Regex DoubleQuotedString = new Regex("\"(\\\\?.)*?\"");

// Then match the string.
Match m;
// NextMatch returns a new Match, so its result must be assigned back to m.
for (m = DoubleQuotedString.Match(strSomeCodes); m.Success; m = m.NextMatch()) {
    foreach (Group g in m.Groups) {
        // Do some drawing
    }
}

The rest is to write the coloring code.
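Before any drawing, a coloring routine needs to know which ranges of a line to color and how. One possible way to collect them is sketched below. It is not the author's implementation: combining the expressions into a single pattern with named groups, using \b word boundaries instead of the delimiter group, and the ColorSpanDemo and Span types are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ColorSpanDemo
{
    // One colored region of a line. The type and its fields are made up for this sketch.
    public sealed class Span
    {
        public readonly int Index;
        public readonly int Length;
        public readonly string Kind;
        public Span(int index, int length, string kind)
        {
            Index = index; Length = length; Kind = kind;
        }
    }

    // String and comment expressions come from section 2; the keyword list is shortened.
    // Scanning left to right means a "//" inside a string stays part of the string.
    private static readonly Regex Token = new Regex(
        @"(?<string>""(\\?.)*?"")|(?<comment>//.*)|(?<keyword>\b(abstract|where|while|yield)\b)");

    // Returns the regions of one line that should be colored, in order of appearance.
    public static List<Span> Scan(string line)
    {
        var spans = new List<Span>();
        for (Match m = Token.Match(line); m.Success; m = m.NextMatch())
        {
            string kind = m.Groups["string"].Success ? "string"
                        : m.Groups["comment"].Success ? "comment"
                        : "keyword";
            spans.Add(new Span(m.Index, m.Length, kind));
        }
        return spans;
    }

    public static void Main()
    {
        foreach (Span s in Scan("while (s != \"// no\") yield return s; // loop"))
        {
            Console.WriteLine(s.Kind + " " + s.Index + " " + s.Length);
        }
    }
}
```

Each Span could then be handed to whatever control displays the text, for example by selecting the range and setting its color.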

5. Source Code

 

Notes:

[1] The passage ending in "text patterns" is taken from Regular Expression Language Elements in the .NET Framework General Reference.

[2] For the introduction to regular expressions, see the related content in ZDNet China Technology and Development.

[3] The signatures and comments of the classes and functions in this section are from the Microsoft documentation.

[4] For details about multiline analysis, see Regular Expression Language Elements in the .NET Framework General Reference.
