Syntax highlighting for JavaScript with regular expressions

Source: Internet
Author: User

After studying regular expressions for a few days, it is time to summarize and put what I learned into practice. I had wanted to write a syntax highlighter before, but my skills were not up to it, and I could not understand the examples at all.

So let's analyze the syntax highlighting implementations of two experts, Cobalt Carbonate and Barret Lee.

Let's start with Barret Lee's article "how to implement regular expression highlight highlighting with a few small examples".

When I first read it, I could only marvel at it. The example that performs the matching one step at a time is especially impressive. However, the author also says that the steps are separated only for demonstration, so that you can see intuitively what each step matches. If everything were matched in a single step, the processing would finish without you seeing what happened.
Let's look at his regular expressions.

The code is as follows:
(/^\s+|\s+$/) // match leading and trailing whitespace
(/(["'])(?:\\.|[^\\\n])*?\1/) // match strings
(/\/(?!\*|span).+\/(?!span)[gim]*/) // match regular expressions; the span checks skip the span tags he added in the previous step. I don't think they should appear here.
(/(\/\/.*|\/\*[\S\s]+?\*\/)/) // match comments
(/(\*\s*)(@\w+)(?=\s*)/) // match the tags inside comments
(/\b(break|continue|do|for|in|function|if|else|return|switch|throw|try|catch|finally|var|while|with|case|new|typeof|instanceof|delete|void|Object|Array|String|Number|Boolean|Function|RegExp|Date|Math|window|document|navigator|location|true|false|null|undefined|NaN)\b/) // match keywords

Barret Lee, the bearded brother, probably did not want to reinvent the wheel; he just wanted to figure out how a wheel is made. So his code stops once the point is made and does not go into detail, which leaves it rather rough.
I am not criticizing him, just making a brief comment. After all, there are many excellent syntax highlighting plug-ins, so you do not have to write your own. It is enough to learn the principle.
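
One reason step-by-step replacement needs guards such as (?!span) is that every later pass also sees the markup inserted by the earlier passes. The sketch below is a hypothetical two-pass highlighter (the name highlightInTwoPasses is made up for illustration, and this is not Barret Lee's code). Its string pass ends up wrapping the class attribute written by its comment pass.

```javascript
// Hypothetical two-pass highlighter, for illustration only.
function highlightInTwoPasses(src) {
    // Pass 1: wrap single-line comments.
    var out = src.replace(/\/\/.*/g, function (m) {
        return '<span class="com">' + m + '</span>';
    });
    // Pass 2: wrap strings. This pass also sees the "com" attribute from pass 1.
    return out.replace(/(["'])(?:\\.|[^\\\n])*?\1/g, function (m) {
        return '<span class="str">' + m + '</span>';
    });
}

var broken = highlightInTwoPasses('// hi');
console.log(broken); // the class attribute of the first span has been wrapped as a string
```

Matching everything in a single pass avoids this, because the markup is never fed back into a regular expression.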

Next, let's analyze Cobalt Carbonate's article "how to implement regular expression JavaScript code highlighting".
That article's own analysis is already very detailed, so I will only explain it briefly.
Cobalt Carbonate's thinking has always been rigorous. I once spent more than an hour reading this article and still had to come back to it. This time I re-analyzed it and then implemented it myself, which took me half a day.
But it was really worth it. I learned a lot.

Let's take a look at the general logic.

The code is as follows:
(\/\/.*|\/\*[\S\s]+?\*\/) // match comments
((["'])(?:\\.|[^\\\n])*?\3) // match strings
\b(break|continue|do|for|in|function|if|else|return|switch|this|throw|try|catch|finally|var|while|with|case|new|typeof|instanceof|delete|void)\b // match keywords
\b(Object|Array|String|Number|Boolean|Function|RegExp|Date|Math|window|document|navigator|location)\b // match built-in objects
\b(true|false)\b // match Boolean values
\b(null|undefined|NaN)\b // match the various empty values. I think these could share a group with the Boolean values.
(?:[^\W\d]|\$)[\$\w]* // match ordinary variable names
(0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?(?:[eE]\d+)?) // match numbers (not every format is covered, so there is a problem here)
(?:[^\)\]\}]|^)(\/(?!\*)(?:\\.|[^\\\/\n])+?\/[gim]*) // match regular expressions
[\S\s] // anything else that was not matched

The original article explains the final [\S\s] as follows: we must match every character, because every character needs to be HTML-escaped.
The detailed code is shown below.
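
The role of the catch-all branch can be sketched with a much smaller expression. In this simplified example (escapeHtml and highlightComments are names assumed here, not taken from either article), there is one capture group for comments plus a final [\S\s]. Because every character is consumed by some branch, every character passes through the callback and gets escaped exactly once.

```javascript
// Minimal HTML escaping, assumed for this sketch.
function escapeHtml(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// One capture group for comments, plus a catch-all branch for everything else.
var re = /(\/\/.*)|[\S\s]/g;

function highlightComments(src) {
    return src.replace(re, function (all, comment) {
        if (comment) {
            // The comment branch matched: escape it, then wrap it.
            return '<span class="com">' + escapeHtml(comment) + '</span>';
        }
        // The catch-all branch matched a single character: just escape it.
        return escapeHtml(all);
    });
}

var highlighted = highlightComments('a < b // x > y');
console.log(highlighted);
```

Without the catch-all branch, the "<" outside the comment would never reach the callback and would be left unescaped.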

This is a very good article. I have read it at least 10 times, and only in the last two days have I mostly understood it.

However, this code also has some small flaws. For example, the string pattern cannot match a string that continues across lines, and the string matching can be optimized.
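
The multi-line limitation can be checked directly. The sketch below compares a string pattern of the earlier style with one where the escape branch accepts any character after the backslash, including a newline. The variable names are assumptions made for this example.

```javascript
// Earlier style: "." does not match a newline, so backslash + newline is rejected.
var oldString = /(["'])(?:\\.|[^\\\n])*?\1/;
// Revised style: [\s\S] after the backslash also accepts a newline.
var newString = /"(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*'/;

// Source text of a string literal with a line continuation: "123\<newline>456"
var continued = '"123\\\n456"';
// Source text of a string literal with an escaped quote: "123\"456"
var escaped = '"123\\"456"';

var oldMatchesContinued = oldString.test(continued);
var newMatchContinued = newString.exec(continued);
var oldMatchEscaped = oldString.exec(escaped);
var newMatchEscaped = newString.exec(escaped);

console.log(oldMatchesContinued);                 // false
console.log(newMatchContinued[0] === continued);  // true
console.log(oldMatchEscaped[0] === escaped);      // true
console.log(newMatchEscaped[0] === escaped);      // true
```

Both patterns handle the escaped quote; only the revised one accepts the continuation line.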

In addition, numbers are not fully matched. Only forms like 0xff, 12.34, and 1e3 are matched; forms like .123 and 12.3e+3 are not.
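
A quick way to see which number formats are covered is to anchor the number pattern and test sample literals one by one. In the sketch below, the ^ and $ anchors and the variable names are additions for this demonstration; the second pattern adds an optional exponent sign and a leading-dot alternative, which is the kind of fix the number pattern needs.

```javascript
// Number pattern without sign or leading-dot support (anchored for the test).
var oldNum = /^(?:0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?(?:[eE]\d+)?)$/;
// Same pattern with [+-]? in the exponent and a \.\d+ alternative.
var newNum = /^(?:0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|\.\d+(?:[eE][+-]?\d+)?)$/;

var samples = ['0xff', '12.34', '1e3', '.123', '12.3e+3'];

var oldResults = samples.map(function (s) { return oldNum.test(s); });
var newResults = samples.map(function (s) { return newNum.test(s); });

console.log(oldResults); // [ true, true, true, false, false ]
console.log(newResults); // [ true, true, true, true, true ]
```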
I think the keyword order can be slightly optimized.
A traditional NFA engine tries the branches of an alternation from left to right and stops trying further branches as soon as one matches.
Therefore, putting the most common keywords first can improve performance a little.
Finally, when there is a large amount of code to process, it is best to use new RegExp to improve performance.
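
The left-to-right behavior is easy to observe. In this small sketch (variable names assumed), the first pattern has no word boundaries, so the earlier branch wins even though a longer one would also match. With \b on both sides, the engine backtracks into the later branch, so the result stays correct and the branch order only affects how much work is done.

```javascript
// Without boundaries, the first branch that matches wins.
var firstBranch = 'instanceof'.match(/in|instanceof/)[0];

// With \b, "in" is tried first, fails at the trailing boundary,
// and the engine falls through to "instanceof".
var wholeWord = /\b(?:in|instanceof)\b/.exec('a instanceof b')[0];

console.log(firstBranch); // "in"
console.log(wholeWord);   // "instanceof"
```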

Here are my regular expressions and a simple demo. (In fact, they are only an optimization of Cobalt Carbonate's source code.)
First, let's look at the regular expressions:

The code is as follows:
(\/\/.*|\/\*[\s\S]*?\*\/) // match comments; not modified.
("(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*') // match strings; optimized.
\b(true|false|null|undefined|NaN)\b // match Boolean and empty values.
\b(var|for|if|else|return|this|while|new|function|switch|case|typeof|do|in|throw|try|catch|finally|with|instanceof|delete|void|break|continue)\b // match keywords. The keyword order is changed.
\b(document|Date|Math|window|Object|location|navigator|Array|String|Number|Boolean|Function|RegExp)\b // match built-in objects; word order changed.
(?:[^\W\d]|\$)[\$\w]* // match ordinary variable names; not changed.
(0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|\.\d+(?:[eE][+-]?\d+)?) // match numbers; the matching is fixed.
(?:^|[^\)\]\}])(\/(?!\*)(?:\\.|[^\\\/\n])+?\/[gim]*) // match regular expressions. This is the most complex case and the one that goes wrong most often. I have no energy to modify it for now.
[\s\S] // match everything else

The Boolean values and the empty values are merged into one group, and the grouping of the string pattern is optimized, so there are two fewer capture groups than in Cobalt Carbonate's version.
In his version, groups 2 and 3 both belong to the string pattern, because (["']) captures the quotation mark, while my regular expression does not capture it.
If you do not like putting (true|false|null|undefined|NaN) in one group, you can split it up.
Whether they share a group only matters for distinguishing the coloring.
Sublime Text gives true|false|null|undefined|NaN a single color, while Notepad++ only colors true|false.
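
The group arithmetic can be checked with a small helper. The countGroups function below is an assumed utility, not part of either article: it appends an empty alternative so that the pattern always matches the empty string, then reads the number of capture groups from the length of the match array. The fragments are the string and value parts in isolation, so the backreference in the older string fragment is \2 here rather than \3.

```javascript
// Count capture groups by forcing a match on '' and inspecting the result array.
function countGroups(re) {
    return new RegExp(re.source + '|').exec('').length - 1;
}

// Older style: an outer group for the string plus a group for the quote.
var oldString = /((["'])(?:\\.|[^\\\n])*?\2)/;
// Revised style: one group, quotes are not captured.
var newString = /("(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*')/;

// Older style: separate groups for Boolean and empty values.
var oldValues = /\b(true|false)\b|\b(null|undefined|NaN)\b/;
// Revised style: one merged group.
var newValues = /\b(true|false|null|undefined|NaN)\b/;

var saved = (countGroups(oldString) + countGroups(oldValues)) -
            (countGroups(newString) + countGroups(newValues));

console.log(saved); // 2
```

Fewer groups also means fewer indexes to check in the replace callback, which is why the loop in the demo only runs from 1 to 7.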

Well, it is time for an example.
I expect many readers closed the page before getting this far, or just dragged the scroll bar down and then closed it.
However, I wrote this article for the readers who read it carefully. As long as someone reads it, I do not think it was written in vain.
Example:

The code is as follows:
// Single line comment
/**
 * Multi-line comment
 * @date 2014-05-12 22:24:37
 * @name Test
 */
var str1 = "123\"456";
var str2 = '123\'456';
var str3 = "123\
456";

var num = 123;
var arr = [12, 12.34, .12, 1e3, 1e+3, 1e-3, 12.34e3, 12.34e+3, 12.34e-3, .1234e3];
var arr = ["12", "12.34", '.12, 1e3', '1e+3, 1e-3', '12.34e3, 12.34e+3, 12.34e-3', ".1234e3"];
var arr = [/12", "12.34/, /"12\/34"/];

for (var i = 0; i < 1e3; i++) {
    var node = document.getElementById("a" + i);
    arr.push(node);
}

function test() {
    return true;
}
test();
 

(function (window, undefined) {
    // All branches joined into one expression; groups 1 to 7 are the colored token types.
    var _re_js = new RegExp('(\\/\\/.*|\\/\\*[\\s\\S]*?\\*\\/)|("(?:[^"\\\\]|\\\\[\\s\\S])*"|\'(?:[^\'\\\\]|\\\\[\\s\\S])*\')|\\b(true|false|null|undefined|NaN)\\b|\\b(var|for|if|else|return|this|while|new|function|switch|case|typeof|do|in|throw|try|catch|finally|with|instanceof|delete|void|break|continue)\\b|\\b(document|Date|Math|window|Object|location|navigator|Array|String|Number|Boolean|Function|RegExp)\\b|(?:[^\\W\\d]|\\$)[\\$\\w]*|(0[xX][0-9a-fA-F]+|\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?|\\.\\d+(?:[eE][+-]?\\d+)?)|(?:^|[^\\)\\]\\}])(\\/(?!\\*)(?:\\\\.|[^\\\\\\/\\n])+?\\/[gim]*)|[\\s\\S]', 'g');

    function prettify(node) {
        // Normalize line endings and trim leading and trailing whitespace.
        var code = node.innerHTML.replace(/\r\n|[\r\n]/g, "\n").replace(/^\s+|\s+$/g, "");
        code = code.replace(_re_js, function () {
            var s, a = arguments;
            for (var i = 1; i <= 7; i++) {
                if (s = a[i]) {
                    s = htmlEncode(s);
                    switch (i) {
                        case 1: // comment: com
                            return '<span class="com">' + s + '</span>';
                        case 2: // string: str
                            return '<span class="str">' + s + '</span>';
                        case 3: // true|false|null|undefined|NaN: val
                            return '<span class="val">' + s + '</span>';
                        case 4: // keyword: kwd
                            return '<span class="kwd">' + s + '</span>';
                        case 5: // built-in object: obj
                            return '<span class="obj">' + s + '</span>';
                        case 6: // number: num
                            return '<span class="num">' + s + '</span>';
                        case 7: // regular expression: reg
                            return htmlEncode(a[0]).replace(s, '<span class="reg">' + s + '</span>');
                    }
                }
            }
            // No capture group matched: escape the text and leave it uncolored.
            return htmlEncode(a[0]);
        });
        code = code.replace(/(?:\s*\*\s*|(?:&nbsp;)*\*(?:&nbsp;)*)(@\w+)\b/g, '* <span class="comkey">$1</span>') // match the tags in comments
            // match functions; $1$3 and $2$4 cover both branches (with only $1 and $2 the second branch would drop its text)
            .replace(/(\w+)(\s*\(|(?:&nbsp;)*\()|(\w+)(\s*=\s*function|(?:&nbsp;)*=(?:&nbsp;)*function)/g, '<span class="func">$1$3</span>$2$4');
        return code;
    }

    function htmlEncode(str) {
        var i, s = {
            // "&amp;": /&/g,
            "&quot;": /"/g,
            "&#39;": /'/g,
            "&lt;": /</g,
            "&gt;": />/g,
            "<br>": /\n/g,
            "&nbsp;": / /g,
            "&nbsp;&nbsp;&nbsp;&nbsp;": /\t/g // tab shown as four spaces (width assumed)
        };
        for (i in s) {
            str = str.replace(s[i], i);
        }
        return str;
    }

    window.prettify = prettify;
})(window);

You can use the following code for testing.

Code:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>test</title>
<style>
/* highlight style */
* { font-size: 12px; }
code { word-break: break-all; }

.com { color: #008000; } /* comment */
.comkey { color: #FFA500; } /* comment tag */
.str { color: #808080; } /* string */
.val { color: #000080; } /* true|false|null|undefined|NaN */
.kwd { color: #000080; font: bold 12px 'comic sans ms', sans-serif; } /* keyword */
.obj { color: #000080; } /* built-in object */
.num { color: #FF0000; } /* number */
.reg { color: #8000FF; } /* regular expression */
.func { color: #A355B9; } /* function */
</style>
</head>
<body>
<code id="regdemon">
// Single line comment
/**
 * Multi-line comment
 * @date 2014-05-12 22:24:37
 * @name test
 */
var str1 = "123\"456";
var str2 = '123\'456';
var str3 = "123\
456";

var num = 123;
var arr = [12, 12.34, .12, 1e3, 1e+3, 1e-3, 12.34e3, 12.34e+3, 12.34e-3, .1234e3];
var arr = ["12", "12.34", '.12, 1e3', '1e+3, 1e-3', '12.34e3, 12.34e+3, 12.34e-3', ".1234e3"];
var arr = [/12", "12.34/, /"12\/34"/];

for (var i = 0; i &lt; 1e3; i++) {
    var node = document.getElementById("a" + i);
    arr.push(node);
}

function test() {
    return true;
}
test();
</code>
</body>
</html>

The result mostly combines the ideas of Barret Lee and Cobalt Carbonate, and is now relatively complete.
I have not tested compatibility, and I do not feel the need to. Writing out all the different syntax cases by myself is too tiring.
