Refactoring a Markdown -> HTML engine in C++

Source: Internet
Author: User

I recently wrote a simple Scheme interpreter in Python, and before that I had also used Python for the previous version of this conversion engine, which relied on pattern matching and string processing.

I was quite dissatisfied with the structure of that code. My grasp of Python was enough to get the features working once I got started, but the implementation never felt natural.

All the problems trace back to one decision: at the very start, each string was split on spaces into a list. That step is unsound, because it throws away much of the information in the string. It brought some convenience at the time, but the trade was not worth it. As a result, some context-sensitive structures had to be processed back and forth, revisiting earlier and later parts of the list whenever a suitable moment came up.

So I recently rewrote the engine in C++, and the result seems better structured than before.

A few things I learned:

1. C++ string handling really is weak, but it is just about usable, and still more comfortable than writing everything yourself;

2. The regex library in C++11 is not powerful, but it is fairly convenient to use: it is fully adequate for simple matches and it supports capture groups, which helps a lot. You can of course use the regex library in Boost instead;

3. While writing, you have to keep adjusting the structure of the code. For example, I first wrote separate handlers for italic, bold and strikethrough, then noticed that all three follow the same pattern, so I extracted a common function that takes a parameter to tell the cases apart. Then the process repeats;

4. As with the earlier Python program, the key is to distinguish what works at the whole-line level, what works at the token level, and what is context-sensitive, because each needs a different approach.
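The extraction described in point 3 could look roughly like the following. This is a minimal sketch rather than the engine's actual code: wrapInline and handleEmphasis are hypothetical names, and the marker-to-tag pairs (** to strong, ~~ to del, * to em) are assumed from common Markdown usage.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shared helper: replaces each pair of `marker` delimiters in
// `line` with <tag>...</tag>. An unpaired trailing marker is left untouched.
std::string wrapInline(const std::string& line,
                       const std::string& marker,
                       const std::string& tag) {
    assert(!marker.empty());
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t open = line.find(marker, pos);
        if (open == std::string::npos) break;
        std::size_t close = line.find(marker, open + marker.size());
        if (close == std::string::npos) break;
        out.append(line, pos, open - pos);              // text before the marker
        out += "<" + tag + ">";
        out.append(line, open + marker.size(),
                   close - open - marker.size());       // text between markers
        out += "</" + tag + ">";
        pos = close + marker.size();
    }
    out.append(line, pos, std::string::npos);           // remaining text
    return out;
}

// The three cases differ only in the parameters passed to the helper.
// Bold runs before italic so that "**" is not consumed as two "*".
std::string handleEmphasis(std::string line) {
    line = wrapInline(line, "**", "strong");
    line = wrapInline(line, "~~", "del");
    line = wrapInline(line, "*", "em");
    return line;
}
```

Passing the marker and the tag as parameters keeps a single scanning loop, so a fix to the pairing logic applies to all three styles at once.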

First come several auxiliary functions. Most of them are predicates that decide whether the current line is a particular syntactic structure; the last one, regex, uses regular expressions to replace the more complex lexical units in the string: links, images and superscripts.

#include <cctype>
#include <regex>
#include <string>

// CodeType is declared in the project's auxiliary header.

// "* ", "+ " or "- " starts an unordered list item.
bool isUnorderedList(const std::string& input) {
    return (input[0] == '*' || input[0] == '+' || input[0] == '-') && input[1] == ' ';
}

// "> " starts a quote.
bool isQuote(const std::string& input) {
    return input[0] == '>' && input[1] == ' ';
}

// A single digit followed by ". " starts an ordered list item.
bool isOrderedList(const std::string& input) {
    return std::isdigit(static_cast<unsigned char>(input[0])) &&
           input[1] == '.' && input[2] == ' ';
}

// A horizontal rule is a line of three or more '-' and nothing else.
bool isHerLine(const std::string& input) {
    return input.size() >= 3 &&
           input.find_first_not_of('-') == std::string::npos;
}

// Reports which language a code fence line opens, if any.
CodeType isCodeBlock(const std::string& input) {
    if (input == "```python") return CodeType::PYTHON;
    if (input == "```c++") return CodeType::CPP;
    return CodeType::WRONG;
}

// Handle link, sup and img. Modifies input in place and returns a copy.
std::string regex(std::string& input) {
    // ![alt](src), image; handled first so the link rule cannot consume it.
    // Replacement string assumed: the published listing is garbled here.
    std::regex re_img("!\\[([^\\]]*)\\]\\(([^)]*)\\)");
    input = std::regex_replace(input, re_img, "<img src=\"$2\" alt=\"$1\" />");

    // [text](url), link
    std::regex re_link("\\[([^\\]]*)\\]\\(([^)]*)\\)");
    input = std::regex_replace(input, re_link, "<a href=\"$2\" target=\"_blank\">$1</a>");

    // [text][note], superscript
    std::regex re_sup("\\[([^\\]]*)\\]\\[([^\\]]*)\\]");
    input = std::regex_replace(input, re_sup, "$1<sup>$2</sup>");

    return input;
}

CodeType is defined in an auxiliary header file:

enum class StateBlock {
    BEGIN,
    IN,
    END,
};

enum class CodeType {
    PYTHON,
    CPP,
    WRONG
};

extern CodeType code;

These two enum class types control the current state of block structures and of code blocks. Block structures affect the context: once a block has been entered, no token-level processing is done and all whitespace characters must be preserved. Code blocks additionally need some special syntax coloring.

Here is the core of the whole program. The input is one line of the Markdown file and the output is the HTML for that line:
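To illustrate how a block changes the treatment of every line inside it, here is a minimal sketch. It is not the engine's code: BlockTracker and escapeBlockLine are hypothetical names, the state is reduced to a single inside/outside flag instead of the three StateBlock values, and whitespace is preserved with a pre element rather than by substituting entities for spaces.

```cpp
#include <string>

// Escapes the characters that would otherwise be read as HTML markup.
std::string escapeBlockLine(const std::string& line) {
    std::string out;
    for (char c : line) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Hypothetical tracker: a fence line flips the state, and the state decides
// whether a line is copied verbatim or treated as ordinary text.
struct BlockTracker {
    bool inside = false;

    std::string feed(const std::string& line) {
        if (line.compare(0, 3, "```") == 0) {      // opening or closing fence
            inside = !inside;
            return inside ? "<pre><code>" : "</code></pre>";
        }
        if (inside) {
            return escapeBlockLine(line);          // no token-level processing
        }
        return "<p>" + line + "</p>";              // placeholder for normal parsing
    }
};
```

The caller is assumed to join the returned lines with newlines, which the pre element then displays as written.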

std::string parse(std::string& input) {
    std::string temp = input;
    if (input.empty()) {
        return "";
    }
    if (state == StateBlock::IN) {
        // Inside a block: preserve whitespace and escape '<'.
        std::regex re_blank("\\s");
        std::regex re_lt("<");
        temp = std::regex_replace(temp, re_blank, "&nbsp;");
        temp = std::regex_replace(temp, re_lt, "&lt;");
    }
    // In a block: color the line and skip all token-level processing.
    if (preHandle(temp) == 0) {
        temp = setColor(temp);
        temp.insert(0, "<p>");
        temp += "</p>";
        return temp;
    }
    currentLineOrder = false;
    // Token-level processing.
    temp = token(temp);
    temp = regex(temp);
    // Line-level processing.
    if (input[0] == '#' && input[1] == ' ') {
        temp = handleTitle(input, 1);
    } else if (input[0] == '#' && input[1] == '#' && input[2] == ' ') {
        temp = handleTitle(input, 2);
    } else if (input[0] == '#' && input[1] == '#' && input[2] == '#' && input[3] == ' ') {
        temp = handleTitle(input, 3);
    } else if (isUnorderedList(input)) {
        temp = handleUnorderedList(temp);
    } else if (isQuote(input)) {
        temp = handleQuote(temp);
    } else if (isOrderedList(input)) {
        temp = handleOrderedList(temp);
    } else if (isHerLine(input)) {
        temp = handleHerLine();
    } else {
        temp += "<br/>";
    }
    closeOrderList(temp);
    return temp;
}

Each of these handle functions needs its own implementation; some of them call a common function and differ only in the type they pass to it.
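The handler bodies are not shown, so the following is only a guess at their shape. The names handleTitle, handleQuote and handleHerLine come from the listing above, but the bodies, the shared helper wrapTag and the chosen HTML tags are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical common function: wraps `content` in <tag>...</tag>.
std::string wrapTag(const std::string& content, const std::string& tag) {
    return "<" + tag + ">" + content + "</" + tag + ">";
}

// Drops the first `n` characters of `line`, or everything if it is shorter.
std::string dropPrefix(const std::string& line, std::size_t n) {
    return n < line.size() ? line.substr(n) : std::string();
}

// A title line is `level` '#' characters, one space, then the title text.
std::string handleTitle(const std::string& input, int level) {
    std::size_t prefix = static_cast<std::size_t>(level) + 1;
    return wrapTag(dropPrefix(input, prefix), "h" + std::to_string(level));
}

// A quote line is "> " followed by the quoted text.
std::string handleQuote(const std::string& input) {
    return wrapTag(dropPrefix(input, 2), "blockquote");
}

// A horizontal rule carries no content of its own.
std::string handleHerLine() {
    return "<hr/>";
}
```

With this layout, handleTitle("## Hello", 2) yields an h2 element containing "Hello", and each handler only decides how much prefix to strip and which tag to pass on.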

Finally, there was one more reason for writing this in C++. The earlier Python version could only be integrated into Qt by calling an external program, so every bug fix meant rebuilding and repackaging the Python program and then integrating it again, which was very tedious. The C++ rewrite integrates into Qt easily. I also implemented simple synchronized scrolling. For now the text input on the left and the web view on the right are simply kept at the same relative scroll position, which is fairly coarse. For finer control, you could record during parsing how many lines on the right each Markdown line corresponds to and store these values in a vector, so that the right side can be scrolled according to the position on the left; alternatively, you could insert anchors at suitable places and jump to them.

This round of work was fairly fiddly. Along the way I also built a GUI, which made for a nice change of pace. When I have time, I will go back and fill in function application and tail-recursion optimization in the earlier Scheme interpreter.

By the way, I recommend the book "Python Source Analysis". You can read it alongside the Python 2.7 source, which is heavily commented and fairly easy to follow; with the book laying out the overall framework for you, it is a great help in understanding the memory model of some of the data structures.
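The vector-based idea could be sketched as follows. This is an illustration only: buildOffsets and previewLineFor are hypothetical names, the per-line counts are assumed to be collected during parsing, and actually scrolling the preview to the resulting line is left to the UI code.

```cpp
#include <cstddef>
#include <vector>

// counts[i] is the number of preview lines produced by Markdown line i.
// The result holds, for each Markdown line, the index of the first preview
// line generated from it (a running sum of the counts before it).
std::vector<std::size_t> buildOffsets(const std::vector<std::size_t>& counts) {
    std::vector<std::size_t> offsets;
    offsets.reserve(counts.size());
    std::size_t running = 0;
    for (std::size_t n : counts) {
        offsets.push_back(running);
        running += n;
    }
    return offsets;
}

// Looks up the preview line for an editor line, clamping out-of-range
// input to the last known line.
std::size_t previewLineFor(const std::vector<std::size_t>& offsets,
                           std::size_t sourceLine) {
    if (offsets.empty()) return 0;
    if (sourceLine >= offsets.size()) sourceLine = offsets.size() - 1;
    return offsets[sourceLine];
}
```

Storing running offsets instead of raw counts makes each lookup a single index operation, at the cost of rebuilding the vector whenever the document is parsed again.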
