Origin of Automatic Static Web Page Generator and HTML file parsing (redhorse World)

Source: Internet
Author: User
Tags: xsl

Many large websites already serve static Web pages, and in terms of performance this is clearly the best choice for sites of that kind. We had always wanted to implement this ourselves but were never in a hurry to do so, so it kept being put off. Now my current project has finally decided to adopt static Web page generation.

I considered a number of solutions and rejected them one by one. The first is XML: the approach used by CSDN is to store the data in an XML file and define an XSL stylesheet that transforms it on the client. Its biggest drawback is that it cannot cope with complicated page layouts. For a complicated page it is hard to write a suitable XSL stylesheet, and the processing cost on the client may be unacceptable. The second is to store the data in JS files and render it with JavaScript, but not every client supports JavaScript, and in my opinion this structure is poor and hard to manage. The last is to use a template. Many people define special marker strings in the template and then replace them one at a time, which is inefficient and error-prone, and it copes badly with changes to page layout and content.
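To make the weakness of the marker-replacement approach concrete, here is a minimal sketch in Python. The `{{name}}` marker syntax and the function name are assumptions chosen for illustration; the article does not say what the special strings look like.

```python
def render_by_replacement(template, values):
    """Replace each {{key}} marker with its value, one pass per key."""
    result = template
    for key, value in values.items():
        # Every replace() rescans the whole page, so the cost grows with the
        # number of markers, and a misspelled marker is silently left behind.
        result = result.replace("{{" + key + "}}", value)
    return result


page = render_by_replacement(
    "<h1>{{title}}</h1><p>{{bdy}}</p>",  # "bdy" is a deliberate typo
    {"title": "News", "body": "Hello"},
)
```

Here `page` still contains the unreplaced `{{bdy}}` marker and nothing reports the mistake, which is the kind of error this approach invites.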

Having weighed these issues, I still chose a template, but not the kind described above. I want to implement custom data controls that can be placed in the template. When a static page is generated, the original template file is parsed, the controls are identified automatically, and custom data structures are bound to them automatically. Experienced readers will see at a glance that this resembles ASP.NET, which is indeed where my inspiration came from, although at my level I cannot hope to match it. With this approach the design can be changed freely within certain limits without touching the program. Even if the layout changes a lot, as long as the data stays the same or changes only slightly, the program needs no change or only a small one.

My experience is limited, so there are bound to be flaws. I hope you can suggest better ways to generate static Web pages, and criticism and corrections are welcome.

Enough preamble; let's get to how the idea is implemented. As mentioned above, I split the process into two steps: parse the template first, then bind the data. At this stage I have implemented only template parsing. First, the design of a custom data control. I use an HTML-style format: <flag name = value> body </flag>. This lets me parse custom controls in the same way as ordinary HTML.
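The two-step split can be sketched roughly as follows in Python. This is not the author's code: the `ctl:` tag prefix, the `Control` class, and the rule that the control body serves as a fallback value are all assumptions made for illustration, and a regular expression stands in for the character scanner.

```python
import re
from dataclasses import dataclass


@dataclass
class Control:
    tag: str     # the "flag" part of <flag name = value>
    attrs: dict  # attribute pairs such as {"name": "title"}
    body: str    # text between the opening and closing tags


# Hypothetical convention: controls carry a "ctl:" prefix to stand out from HTML.
CONTROL_RE = re.compile(r"<(ctl:\w+)((?:\s+\w+\s*=\s*\w+)*)\s*>(.*?)</\1>", re.S)
ATTR_RE = re.compile(r"(\w+)\s*=\s*(\w+)")


def parse(template):
    """Step 1: find every control in the template."""
    return [
        Control(m.group(1), dict(ATTR_RE.findall(m.group(2))), m.group(3).strip())
        for m in CONTROL_RE.finditer(template)
    ]


def bind(template, data):
    """Step 2: replace each control with the data registered under its name."""
    def render(m):
        name = dict(ATTR_RE.findall(m.group(2))).get("name")
        # Assumed fallback: keep the control body when no data is supplied.
        return str(data.get(name, m.group(3).strip()))
    return CONTROL_RE.sub(render, template)


template = "<h1><ctl:text name = title> Untitled </ctl:text></h1>"
controls = parse(template)
page = bind(template, {"title": "Site news"})
```

Because the program refers to a control only by its name, the surrounding HTML can be rearranged without touching the binding code, which is the benefit the approach is after.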

Before implementing it I read some books on compiler principles and skimmed the chapters on lexical analysis. The material is complicated and I did not fully understand it, but I still got something out of it; in particular, the state transitions of a finite automaton gave me the idea. Special HTML syntax is not considered. In general, an HTML tag has the form <flag name = value> body </flag>. I defined five states for character scanning: NULL, looking for a control, looking for a control header, looking for control content, and looking for the end of a control. (Note: while writing this article I realized that "looking for a control" is redundant, but that is how the code stands for now.) Transitions between these states depend on five kinds of boundary character that I defined: non-boundary, start boundary, end boundary, closing boundary, and brief end boundary. All of these are defined as enumerations in the source code.
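A scanner along these lines might look like the Python sketch below. The five states and five boundary kinds follow the lists above, but the mapping of boundary kinds to `<`, `>`, `</` and `/>` is my guess, the names are hypothetical, and the sketch handles only non-nested controls without checking that the closing tag matches the opening one.

```python
from enum import Enum, auto


class State(Enum):
    NULL = auto()          # nothing scanned yet
    FIND_CONTROL = auto()  # skipping plain text until a control starts
    FIND_HEADER = auto()   # inside "<flag name = value"
    FIND_CONTENT = auto()  # collecting the body after ">"
    FIND_END = auto()      # inside the closing "</flag"


class Boundary(Enum):
    NONE = auto()       # ordinary character
    START = auto()      # "<"  (assumed)
    END = auto()        # ">"  (assumed)
    CLOSE = auto()      # "</" (assumed)
    BRIEF_END = auto()  # "/>" (assumed)


def classify(text, i):
    """Return the boundary kind at position i and how many characters it spans."""
    two = text[i:i + 2]
    if two == "</":
        return Boundary.CLOSE, 2
    if two == "/>":
        return Boundary.BRIEF_END, 2
    if text[i] == "<":
        return Boundary.START, 1
    if text[i] == ">":
        return Boundary.END, 1
    return Boundary.NONE, 1


def scan(text):
    """Return a (header, body) pair for each non-nested control in text."""
    controls, header, body = [], [], []
    state, i = State.NULL, 0
    while i < len(text):
        kind, width = classify(text, i)
        chunk = text[i:i + width]
        if state in (State.NULL, State.FIND_CONTROL):
            if kind is Boundary.START:
                state, header, body = State.FIND_HEADER, [], []
            else:
                state = State.FIND_CONTROL
        elif state is State.FIND_HEADER:
            if kind is Boundary.END:
                state = State.FIND_CONTENT
            elif kind is Boundary.BRIEF_END:
                controls.append(("".join(header).strip(), ""))
                state = State.FIND_CONTROL
            else:
                header.append(chunk)
        elif state is State.FIND_CONTENT:
            if kind is Boundary.CLOSE:
                state = State.FIND_END
            else:
                body.append(chunk)
        elif state is State.FIND_END and kind is Boundary.END:
            controls.append(("".join(header).strip(), "".join(body).strip()))
            state = State.FIND_CONTROL
        i += width
    return controls


found = scan("Hi <title name = headline> Default </title> and <hr/>")
```

In this sketch NULL and FIND_CONTROL behave identically, which matches the remark above that one of the two states is redundant.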

It's getting late and I want to go home, so I will stop here for today. I think the source code is clearly structured and well commented, so anyone interested in the implementation details can simply read it. If that still leaves questions, or if I find the time, I will write up the rest of the idea.

BTW: the parsing is now complete. The overall structure is in place, but there are many bugs and debugging is tedious. I hope you will send feedback.

Good night, bye!
