High-Performance Web Development: Crazy HTML Compression

Source: Internet
Author: User

Gzip is rarely enabled for HTML, because most HTML today is generated dynamically and is not served from the browser cache. With gzip enabled, the server has to compress the response on every request, which consumes server resources. Gzip pays off better for JS and CSS, because those files are cached. Personally, I think the biggest benefit of the HTML compression described here is that once it is written, every future project can reuse it without any additional development work.
In the article "JavaScript and CSS merging, compression, and Cache Management", we described a self-written component that automatically merges and compresses JS and CSS and appends a version number. This time, HTML compression is added to that component. The process is simple: when the application starts (contextInitialized in Java, or Application_Start in ASP.NET), it scans all html and jsp (aspx) files and compresses them.
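The startup scan can be sketched roughly as follows. This is only an illustration, assuming the pages live under a single directory on disk; the class name `StartupPageScanner`, the methods `isPage` and `findPages`, and the extension list are hypothetical and not part of the original component. In a servlet container the call would presumably be made from `contextInitialized`, which is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects the page files that a startup hook would compress.
class StartupPageScanner {

    // True for file names the compressor should process (case-insensitive).
    static boolean isPage(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        return name.endsWith(".html") || name.endsWith(".jsp") || name.endsWith(".aspx");
    }

    // Returns every page file below webRoot, in sorted order.
    static List<Path> findPages(Path webRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(webRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> isPage(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path webRoot = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path page : findPages(webRoot)) {
            // A real hook would read the file, compress it and write it back here.
            System.out.println("would compress: " + page);
        }
    }
}
```

Keeping the file-name check in its own method makes it easy to adjust which extensions are picked up without touching the directory walk.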
Compression considerations:
The implementation mainly uses regular expressions to find and replace. When compressing HTML, pay attention to the following points:
1. The formatting of content inside pre and textarea tags must be preserved, so it cannot be compressed.
2. When HTML comments are removed, some comments must be kept, for example conditional comments: <!--[if IE 6]> ... <![endif]-->
3. Take care when stripping comments in embedded JS, because comment markers can appear inside strings, for example: var url = "http://www.cnblogs.com"; // the // inside the string is not a comment
When JS line breaks are removed, the next line cannot be joined directly to the previous one; a space must be kept. Consider the following code:
else
return;
Without the space, it becomes elsereturn.
4. jsp (aspx) pages may use <% %> to embed server-side code, which also needs to be handled separately. Comments inside it are handled the same way as in JS.
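Points 1 and 2 can be illustrated with a small sketch: pre blocks are swapped for a placeholder before whitespace is collapsed, and the comment pattern skips comments that start with `[`. The class name `PreserveDemo`, the method `squeeze` and the marker string are hypothetical, and the sketch handles only pre blocks and comments, not textarea, script, style or jsp blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of placeholder protection and conditional-comment handling.
class PreserveDemo {
    private static final String PRE_MARK = "%DEMO~PRE%";
    private static final Pattern PRE =
            Pattern.compile("<pre[^>]*>.*?</pre>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    // Does not match comments that open with "<!--[", i.e. IE conditional comments.
    private static final Pattern COMMENT =
            Pattern.compile("<!--\\s*[^\\[].*?-->", Pattern.DOTALL);

    static String squeeze(String html) {
        // 1. Park every <pre> block and leave a marker in its place.
        List<String> saved = new ArrayList<>();
        Matcher m = PRE.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            saved.add(m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(PRE_MARK));
        }
        m.appendTail(sb);

        // 2. Strip ordinary comments, whitespace between tags and repeated whitespace.
        String result = COMMENT.matcher(sb.toString()).replaceAll("");
        result = result.replaceAll(">\\s+<", "><").replaceAll("\\s{2,}", " ");

        // 3. Put the parked blocks back in their original order.
        for (String block : saved) {
            result = result.replaceFirst(Pattern.quote(PRE_MARK), Matcher.quoteReplacement(block));
        }
        return result.trim();
    }

    public static void main(String[] args) {
        String html = "<div>\n  <!-- note -->\n  <!--[if IE 6]><p>old</p><![endif]-->\n"
                + "  <pre>a\n   b</pre>\n</div>";
        // The ordinary comment disappears; the conditional comment and the
        // line break inside <pre> survive.
        System.out.println(squeeze(html));
    }
}
```

Note that a comment written with a space before the bracket, such as `<!-- [if IE 6]>`, would still be matched by this pattern and removed, so conditional comments need to be written without that space.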
Source code:
The following is the Java source code. I believe everyone can follow it and easily port it to .NET:

import java.io.StringReader;
import java.io.StringWriter;
import java.util.*;
import java.util.regex.*;

/*******************************************
 * Compresses jsp and html code by removing whitespace and line breaks.
 * @author bearrui (AK-47)
 * @version 0.1
 * @date 2010-5-13
 *******************************************/
public class HtmlCompressor {
    // Placeholders that stand in for blocks which must not be touched
    private static String tempPreBlock = "%HTMLCOMPRESS~PRE&&&";
    private static String tempTextAreaBlock = "%HTMLCOMPRESS~TEXTAREA&&&";
    private static String tempScriptBlock = "%HTMLCOMPRESS~SCRIPT&&&";
    private static String tempStyleBlock = "%HTMLCOMPRESS~STYLE&&&";
    private static String tempJspBlock = "%HTMLCOMPRESS~JSP&&&";
    // HTML comments, except those that start with "[" (conditional comments)
    private static Pattern commentPattern = Pattern.compile("<!--\\s*[^\\[].*?-->", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    // Whitespace between tags
    private static Pattern itsPattern = Pattern.compile(">\\s+?<", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static Pattern prePattern = Pattern.compile("<pre[^>]*?>.*?</pre>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static Pattern taPattern = Pattern.compile("<textarea[^>]*?>.*?</textarea>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static Pattern jspPattern = Pattern.compile("<%([^-@][\\w\\W]*?)%>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    // <script></script>
    private static Pattern scriptPattern = Pattern.compile("(?:<script\\s*>|<script type=['\"]text/javascript['\"]\\s*>)(.*?)</script>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static Pattern stylePattern = Pattern.compile("<style[^>()]*?>(.+)</style>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    // Single-line comment
    private static Pattern signleCommentPattern = Pattern.compile("//.*");
    // String literal
    private static Pattern stringPattern = Pattern.compile("(\"[^\"\\n]*?\"|'[^'\\n]*?')");
    // Trim: line breaks and the whitespace around them
    private static Pattern trimPattern = Pattern.compile("\\n\\s*", Pattern.MULTILINE);
    private static Pattern trimPattern2 = Pattern.compile("\\s*\\r", Pattern.MULTILINE);
    // Multi-line comment
    private static Pattern multiCommentPattern = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static String tempSingleCommentBlock = "%HTMLCOMPRESS~SINGLECOMMENT&"; // placeholder for //
    private static String tempMulitCommentBlock1 = "%HTMLCOMPRESS~MULITCOMMENT1&"; // placeholder for /*
    private static String tempMulitCommentBlock2 = "%HTMLCOMPRESS~MULITCOMMENT2&"; // placeholder for */

    public static String compress(String html) throws Exception {
        if (html == null || html.length() == 0) {
            return html;
        }
        List<String> preBlocks = new ArrayList<String>();
        List<String> taBlocks = new ArrayList<String>();
        List<String> scriptBlocks = new ArrayList<String>();
        List<String> styleBlocks = new ArrayList<String>();
        List<String> jspBlocks = new ArrayList<String>();
        String result = html;
        // Preserve inline java code
        Matcher jspMatcher = jspPattern.matcher(result);
        while (jspMatcher.find()) {
            jspBlocks.add(jspMatcher.group(0));
        }
        result = jspMatcher.replaceAll(tempJspBlock);
        // Preserve PRE tags
        Matcher preMatcher = prePattern.matcher(result);
        while (preMatcher.find()) {
            preBlocks.add(preMatcher.group(0));
        }
        result = preMatcher.replaceAll(tempPreBlock);
        // Preserve TEXTAREA tags
        Matcher taMatcher = taPattern.matcher(result);
        while (taMatcher.find()) {
            taBlocks.add(taMatcher.group(0));
        }
        result = taMatcher.replaceAll(tempTextAreaBlock);
        // Preserve SCRIPT tags
        Matcher scriptMatcher = scriptPattern.matcher(result);
        while (scriptMatcher.find()) {
            scriptBlocks.add(scriptMatcher.group(0));
        }
        result = scriptMatcher.replaceAll(tempScriptBlock);
        // Don't process inline css
        Matcher styleMatcher = stylePattern.matcher(result);
        while (styleMatcher.find()) {
            styleBlocks.add(styleMatcher.group(0));
        }
        result = styleMatcher.replaceAll(tempStyleBlock);
        // Process pure html
        result = processHtml(result);
        // Process preserved blocks
        result = processPreBlocks(result, preBlocks);
        result = processTextareaBlocks(result, taBlocks);
        result = processScriptBlocks(result, scriptBlocks);
        result = processStyleBlocks(result, styleBlocks);
        result = processJspBlocks(result, jspBlocks);
        preBlocks = taBlocks = scriptBlocks = styleBlocks = jspBlocks = null;
        return result.trim();
    }

    private static String processHtml(String html) {
        String result = html;
        // Remove comments
        // if (removeComments) {
        result = commentPattern.matcher(result).replaceAll("");
        // }
        // Remove inter-tag spaces
        // if (removeIntertagSpaces) {
        result = itsPattern.matcher(result).replaceAll("><");
        // }
        // Collapse runs of whitespace into a single space
        // (replacing them with nothing would glue adjacent words together)
        // if (removeMultiSpaces) {
        result = result.replaceAll("\\s{2,}", " ");
        // }
        return result;
    }

    private static String processJspBlocks(String html, List<String> blocks) {
        String result = html;
        for (int i = 0; i < blocks.size(); i++) {
            blocks.set(i, compressJsp(blocks.get(i)));
        }
        // Put preserved blocks back
        while (result.contains(tempJspBlock)) {
            result = result.replaceFirst(tempJspBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
        return result;
    }

    private static String processPreBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
        // Put preserved blocks back
        while (result.contains(tempPreBlock)) {
            result = result.replaceFirst(tempPreBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
        return result;
    }

    private static String processTextareaBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
        // Put preserved blocks back
        while (result.contains(tempTextAreaBlock)) {
            result = result.replaceFirst(tempTextAreaBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
        return result;
    }

    private static String processScriptBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
        // if (compressJavaScript) {
        for (int i = 0; i < blocks.size(); i++) {
            blocks.set(i, compressJavaScript(blocks.get(i)));
        }
        // }
        // Put preserved blocks back
        while (result.contains(tempScriptBlock)) {
            result = result.replaceFirst(tempScriptBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
        return result;
    }

    private static String processStyleBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
        // if (compressCss) {
        for (int i = 0; i < blocks.size(); i++) {
            blocks.set(i, compressCssStyles(blocks.get(i)));
        }
        // }
        // Put preserved blocks back
        while (result.contains(tempStyleBlock)) {
            result = result.replaceFirst(tempStyleBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
        return result;
    }

    private static String compressJsp(String source) {
        // Check if block is not empty
        Matcher jspMatcher = jspPattern.matcher(source);
        if (jspMatcher.find()) {
            String result = compressJspJs(jspMatcher.group(1));
            return (new StringBuilder(source.substring(0, jspMatcher.start(1))).append(result).append(source.substring(jspMatcher.end(1)))).toString();
        } else {
            return source;
        }
    }

    private static String compressJavaScript(String source) {
        // Check if block is not empty
        Matcher scriptMatcher = scriptPattern.matcher(source);
        if (scriptMatcher.find()) {
            String result = compressJspJs(scriptMatcher.group(1));
            return (new StringBuilder(source.substring(0, scriptMatcher.start(1))).append(result).append(source.substring(scriptMatcher.end(1)))).toString();
        } else {
            return source;
        }
    }

    private static String compressCssStyles(String source) {
        // Check if block is not empty
        Matcher styleMatcher = stylePattern.matcher(source);
        if (styleMatcher.find()) {
            // Remove comments and line feeds
            String result = multiCommentPattern.matcher(styleMatcher.group(1)).replaceAll("");
            result = trimPattern.matcher(result).replaceAll("");
            result = trimPattern2.matcher(result).replaceAll("");
            return (new StringBuilder(source.substring(0, styleMatcher.start(1))).append(result).append(source.substring(styleMatcher.end(1)))).toString();
        } else {
            return source;
        }
    }

    private static String compressJspJs(String source) {
        String result = source;
        // Comment markers may appear inside strings, so swap them for placeholders first
        Matcher stringMatcher = stringPattern.matcher(result);
        while (stringMatcher.find()) {
            String tmpStr = stringMatcher.group(0);
            if (tmpStr.indexOf("//") != -1 || tmpStr.indexOf("/*") != -1 || tmpStr.indexOf("*/") != -1) {
                String blockStr = tmpStr.replaceAll("//", tempSingleCommentBlock).replaceAll("/\\*", tempMulitCommentBlock1)
                        .replaceAll("\\*/", tempMulitCommentBlock2);
                result = result.replace(tmpStr, blockStr);
            }
        }
        // Remove comments
        result = signleCommentPattern.matcher(result).replaceAll("");
        result = multiCommentPattern.matcher(result).replaceAll("");
        result = trimPattern2.matcher(result).replaceAll("");
        // Replace each line break with a space so that "else\nreturn;" does not become "elsereturn;"
        result = trimPattern.matcher(result).replaceAll(" ");
        // Restore the comment markers inside strings
        result = result.replaceAll(tempSingleCommentBlock, "//").replaceAll(tempMulitCommentBlock1, "/*")
                .replaceAll(tempMulitCommentBlock2, "*/");
        return result;
    }
}
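
The string-protection step used for embedded JS can be seen in isolation in a sketch like the one below. It is a simplified illustration, not the component's code: the names `CommentStripDemo`, `stripLineComments` and the marker string are hypothetical, only `//` comments are handled, and string literals are assumed to sit on one line without escaped quotes. A comment that itself contains a quote character could still confuse the string pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: strip // comments without damaging "//" inside strings.
class CommentStripDemo {
    private static final String SLASHES = "%DEMO~SLASHES%";
    private static final Pattern STRING = Pattern.compile("\"[^\"\\n]*\"|'[^'\\n]*'");
    private static final Pattern LINE_COMMENT = Pattern.compile("//.*");

    static String stripLineComments(String js) {
        // 1. Hide every "//" that sits inside a string literal behind a marker.
        Matcher m = STRING.matcher(js);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String literal = m.group().replace("//", SLASHES);
            m.appendReplacement(sb, Matcher.quoteReplacement(literal));
        }
        m.appendTail(sb);

        // 2. Any "//" still visible now starts a real comment; drop it to end of line.
        String result = LINE_COMMENT.matcher(sb.toString()).replaceAll("");

        // 3. Restore the hidden slashes.
        return result.replace(SLASHES, "//");
    }

    public static void main(String[] args) {
        String js = "var url = \"http://www.cnblogs.com\"; // the site";
        // The URL keeps its slashes; only the trailing comment is removed.
        System.out.println(stripLineComments(js));
    }
}
```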

Note:

After applying the method above and restarting the application, view the source of any page: everything is collapsed onto a single line, which looks great. But there are some problems to be aware of when using it:
1. I originally wanted to call yuicompressor to compress the embedded JS. However, yuicompressor parses JavaScript before compressing it to check that it is valid, and much embedded JS contains server-side code, for example var now = <%= DateTime.Now %>. Such code does not parse, so yuicompressor cannot be used.
In the end I had to write the JS compression myself, and it is fairly crude. One problem remains: if a developer omits the semicolon at the end of a JavaScript statement, compressing the code onto one line is likely to break it. You must therefore make sure that every statement ends with a semicolon.
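
The semicolon problem is easy to reproduce with a sketch of the line-joining step, assuming each line break is replaced by a single space as the compression considerations require. The class name `LineJoinDemo` and the method `joinLines` are hypothetical.

```java
// Hypothetical demo of what joining lines does to JS with and without semicolons.
class LineJoinDemo {

    // Drops carriage returns, then turns each line break plus the
    // indentation that follows it into one space.
    static String joinLines(String js) {
        return js.replaceAll("\\s*\\r", "").replaceAll("\\n\\s*", " ");
    }

    public static void main(String[] args) {
        // Safe: the space keeps "else" and "return" apart.
        System.out.println(joinLines("else\n    return;"));      // prints "else return;"
        // Unsafe: without semicolons the two statements run together,
        // which is no longer valid JavaScript.
        System.out.println(joinLines("var a = 1\nvar b = 2"));  // prints "var a = 1 var b = 2"
    }
}
```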

2. Because all jsp (aspx) files are compressed when the application starts, HTML that is generated dynamically while handling a user request cannot be compressed.
