Parsing String Tokens in the Jcsql Lexical Analyzer


After finishing the lexical analyzer last week, I ran into a problem that was trivial but a real headache. This was my first time writing a complete compiler for a language (strictly speaking, an interpreter), and for lack of experience I got stuck on string parsing for two days before solving it, so I am recording it here for later reference. By the way, I chose to hand-write the analyzer not because I don't know that tools can generate one automatically, but because I'd rather not use them. Yes, call it aloof.

A lexical analyzer simply splits source text into individual lexical units (words, or tokens) and assigns each one a type. (If its role is unfamiliar, an introduction to lexical analysis is a good place to start.)
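As a deliberately tiny illustration of splitting text into typed words, here is a sketch of a tokenizer that recognizes only identifiers, integers, '=' and ';'. The tag names ID, ASSIGN, NUM and SEMI and the function tokenize are hypothetical; they are not the post's own tag set.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tags for illustration only (not the author's Tag enum).
enum class Tag { ID, ASSIGN, NUM, SEMI };

struct Token {
    Tag tag;
    std::string lexeme;
};

// Split a tiny statement such as "a = 3;" into <lexeme, tag> pairs.
// Only identifiers, integers, '=' and ';' are recognized; other characters are skipped.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char ch = static_cast<unsigned char>(src[i]);
        if (std::isspace(ch)) {
            ++i;  // whitespace only separates words
        } else if (std::isalpha(ch) || ch == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            out.push_back({Tag::ID, src.substr(start, i - start)});
        } else if (std::isdigit(ch)) {
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            out.push_back({Tag::NUM, src.substr(start, i - start)});
        } else if (ch == '=') {
            out.push_back({Tag::ASSIGN, "="});
            ++i;
        } else if (ch == ';') {
            out.push_back({Tag::SEMI, ";"});
            ++i;
        } else {
            ++i;  // unknown character: ignored in this sketch
        }
    }
    return out;
}
```

Calling tokenize("a = 3;") yields four tokens: a as ID, = as ASSIGN, 3 as NUM and ; as SEMI.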

For example:

a = 3;

We can then split it into meaningful units and give each one a type:

<=,NE>

<3,NUM>

<;,SEMI>

This is the resulting token sequence. Sooner or later, any language's lexical analyzer has to handle string literals, as in the expression

val = "xxx" or val = 'xxx'

I did not look into how other languages handle this; my guess was that it should be tokenized into the following form:

<=,NE>

<'/", singlequote/doublequote>

That is, each quote character becomes its own quote token, and the text between the quotes becomes a STRING token. This is not necessarily the best way to assign token types, but at least it is not wrong: so far the syntax analysis phase works well with it. There may be a better approach, and I would have to see how others before me have done it.
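For comparison, another option is to consume the whole literal in a single call: when the lexer reaches a quote, it scans forward to the matching quote and emits the quote, STRING and quote tokens together, so no state has to survive between calls. The sketch below assumes literals contain no escape sequences; lex_quoted is a hypothetical helper, and only the tag names SINGLEQUOTE, DOUBLEQUOTE and STRING are taken from the post.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// SINGLEQUOTE, DOUBLEQUOTE and STRING mirror the post's tag names; the rest is assumed.
enum class Tag { SINGLEQUOTE, DOUBLEQUOTE, STRING };

struct Token {
    Tag tag;
    std::string lexeme;
};

// Lex one quoted literal starting at src[pos], which must be ' or ".
// Appends <quote>, <contents, STRING>, <quote> to `out` and returns the index
// just past the closing quote. An empty literal produces no STRING token.
std::size_t lex_quoted(const std::string& src, std::size_t pos, std::vector<Token>& out) {
    const char quote = src.at(pos);
    if (quote != '\'' && quote != '"') {
        throw std::invalid_argument("lex_quoted: not positioned at a quote");
    }
    const Tag qtag = (quote == '\'') ? Tag::SINGLEQUOTE : Tag::DOUBLEQUOTE;
    out.push_back({qtag, std::string(1, quote)});  // opening quote

    // Find the matching quote of the same kind (no escape handling).
    const std::size_t close = src.find(quote, pos + 1);
    if (close == std::string::npos) {
        throw std::runtime_error("lex_quoted: unterminated string literal");
    }
    if (close > pos + 1) {
        out.push_back({Tag::STRING, src.substr(pos + 1, close - pos - 1)});
    }
    out.push_back({qtag, std::string(1, quote)});  // closing quote
    return close + 1;
}
```

Because the closing quote is found in the same call, the caller simply resumes scanning at the returned index, and a later literal on the same line is handled the same way.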

The problem appears when you encounter a statement like this:

a = "xxxxxxx" b= "ssssssss"

A SQL statement cannot be expected to contain only one string literal, so working out how to tokenize this took some thought. If you don't handle it carefully, it gets tokenized as

<=,NE>

< ",double_quote>

< ",double_quote>

< ",double_quote>

< ",double_quote>

Left unchecked, this is clearly wrong, because "b=" here is two words, not a STRING. So I started looking for a fix. My first idea was a bool flag that tracks whether an odd or even number of quotes has been seen: if even, the quote is the closing one, and the problem is solved. However, a path that looks straight in your head tends to fork once you start implementing it, and sure enough a problem appeared. See the following pseudocode, where FLAG1 and FLAG2 are the flags for single and double quotes respectively (the fragment shows only one of them, flag).

```cpp
get_next_token() {
    while (p != val.size()) {
        if (flag) {
            std::string v = get_string();
            continue;
        }
        switch (c) {
        case '\'':
            if (flag == true) {  // the begin quote
                flag = false;
            } else if (flag == false) {
                flag = true;
            }
            consume();
            break;
        default:
            consume();
        }
    }
}
```

The logic inside the switch statement is fine. The problem lies in how a string word is handled: once it has been read, the continue is executed and flag never gets changed, so the next call to get a word enters the if branch at the top again. I made many changes to this approach at the time, and all of them failed, so I had to look for another solution. At moments like this I always resent my brain for not being clever enough to find an elegant fix. Limited conditions could also be to blame, and since I couldn't think of an elegant solution, I prefer to blame the latter :p. Later I also tried a counter, and that failed too. So I gave up and got drunk: everyone else sober, only me drunk, and I fell asleep with a smile next to my wife. (Huh?! Wasn't I in the dorm?)
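To make the failure concrete, here is a deliberately buggy sketch of the flag approach. It is a simplified reconstruction under assumptions (one lexeme returned per call, double quotes only, hypothetical names FlagLexer and next), not the author's actual code. After the string contents are read the flag is still set, so the following call reads an empty string in front of the closing quote and never advances.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, intentionally buggy reconstruction of the flag-based lexer.
class FlagLexer {
public:
    explicit FlagLexer(std::string src) : src_(std::move(src)) {}

    // Returns the next lexeme, or "" at end of input. Because of the bug it
    // also returns "" forever once it is stuck in front of a closing quote.
    std::string next() {
        while (p_ < src_.size()) {
            if (in_string_) {
                // Like get_string(): read up to, but not including, the next quote.
                std::size_t start = p_;
                while (p_ < src_.size() && src_[p_] != '"') ++p_;
                // BUG: in_string_ is never reset here, so every later call lands
                // in this branch again, sees the quote at once, and returns "".
                return src_.substr(start, p_ - start);
            }
            char c = src_[p_];
            if (c == '"') {
                in_string_ = !in_string_;  // toggle on each quote, as in the switch
                ++p_;
                return "\"";
            }
            if (c == ' ') {
                ++p_;  // skip spaces between words
                continue;
            }
            // Any other run of characters up to a space or quote is one lexeme.
            std::size_t start = p_;
            while (p_ < src_.size() && src_[p_] != ' ' && src_[p_] != '"') ++p_;
            return src_.substr(start, p_ - start);
        }
        return "";
    }

private:
    std::string src_;
    std::size_t p_ = 0;
    bool in_string_ = false;  // plays the role of flag in the pseudocode
};
```

On the input a = "xx" b, successive calls return a, =, the opening quote, xx, and then an empty lexeme every time after that; the closing quote and b are never reached.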

Fortunately, after trying various bad methods, I arrived at a final solution that uses a stack. In each quote case of the switch, the current quote character is pushed onto the stack (as shown in the code below). While the stack holds a quote, the word to be read in this round is a string. When the current character is a quote, we check whether the stack holds a quote; if it does, this is the closing quote, and the stack is emptied.

```cpp
// Because the get_token function is too long, only some fragments are pasted here.
if (!quote_stack.empty())  // string_identifier.first stores the quotes
{
    if (quote_stack.top() == c) {
        consume();
        char temp = quote_stack.top();
        quote_stack.pop();
        if (temp == '\'') {
            return Token(Tag::SINGLEQUOTE, "'");
        }
        return Token(Tag::DOUBLEQUOTE, "\"");
    } else {
        std::string id = strings_with_termination(quote_stack.top()).c_str();
        Token tk(Tag::STRING, id.c_str());
        if (!id.empty()) {
            return tk;
        }
    }
}

// Inside the switch:
case '\'':
    consume();
    quote_stack.push('\'');
    return Token(Tag::SINGLEQUOTE, "'");
case '"':
    consume();
    quote_stack.push('"');
    return Token(Tag::DOUBLEQUOTE, "\"");
```
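Putting those fragments together, the following is a self-contained sketch of the stack-based approach that returns one token per call. The post's helpers consume and strings_with_termination are replaced by inline loops, and the class name StackLexer as well as the ID, ASSIGN, OTHER and END tags are assumptions made for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <stack>
#include <string>
#include <utility>

// SINGLEQUOTE, DOUBLEQUOTE and STRING follow the post; the other tags are assumed.
enum class Tag { ID, ASSIGN, SINGLEQUOTE, DOUBLEQUOTE, STRING, OTHER, END };

struct Token {
    Tag tag;
    std::string lexeme;
};

class StackLexer {
public:
    explicit StackLexer(std::string src) : src_(std::move(src)) {}

    Token next() {
        // A quote on the stack means we are inside a string literal.
        if (!quote_stack_.empty() && p_ < src_.size()) {
            char c = src_[p_];
            if (quote_stack_.top() == c) {
                ++p_;  // consume the closing quote
                char q = quote_stack_.top();
                quote_stack_.pop();
                return q == '\'' ? Token{Tag::SINGLEQUOTE, "'"} : Token{Tag::DOUBLEQUOTE, "\""};
            }
            // Read up to (not including) the matching quote; spaces are kept.
            std::size_t start = p_;
            while (p_ < src_.size() && src_[p_] != quote_stack_.top()) ++p_;
            return {Tag::STRING, src_.substr(start, p_ - start)};
        }
        // Outside a string: skip whitespace between words.
        while (p_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[p_]))) ++p_;
        if (p_ >= src_.size()) return {Tag::END, ""};

        char c = src_[p_];
        switch (c) {
        case '\'':
            ++p_;
            quote_stack_.push('\'');  // opening single quote
            return {Tag::SINGLEQUOTE, "'"};
        case '"':
            ++p_;
            quote_stack_.push('"');  // opening double quote
            return {Tag::DOUBLEQUOTE, "\""};
        case '=':
            ++p_;
            return {Tag::ASSIGN, "="};
        default: {
            std::size_t start = p_;
            while (p_ < src_.size() && is_word_char(src_[p_])) ++p_;
            if (p_ > start) return {Tag::ID, src_.substr(start, p_ - start)};
            ++p_;  // any other single character
            return {Tag::OTHER, std::string(1, c)};
        }
        }
    }

private:
    static bool is_word_char(char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    }

    std::string src_;
    std::size_t p_ = 0;
    std::stack<char> quote_stack_;
};
```

Fed a = "xxxxxxx" b= "ssssssss", successive calls yield a, =, ", xxxxxxx, ", b, =, ", ssssssss, " and then END, so "b=" comes out as two words rather than as a STRING.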

This method currently works well. Given the nature of the task, the stack holds at most two characters. Because std::stack is built on std::deque by default in the C++ STL, a little space is wasted, but the approach keeps the task simple and is easy to understand. Compared with the flag approach, it also lowers the risk of global-variable pollution, where a flag is unintentionally reassigned in some other function. Of course, you could replace the stack with a two-byte array wrapped in a class; I'm not going to optimize that for now.
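One possible shape for such a class is sketched below. Instead of a two-byte array it keeps the pending opening quote in a std::optional<char> (C++17), assuming, as in the fragment above, that only one opening quote is pending at a time. QuoteState and its member functions are hypothetical names.

```cpp
#include <optional>

// Hypothetical replacement for quote_stack: remembers at most one pending
// opening quote. Requires C++17 for std::optional.
class QuoteState {
public:
    // True while a string literal is open.
    bool in_string() const { return pending_.has_value(); }

    // True if `c` is the quote that closes the currently open literal.
    bool closes(char c) const { return pending_ && *pending_ == c; }

    // Record an opening quote (' or ").
    void open(char quote) { pending_ = quote; }

    // Clear the pending quote and return it; call only when in_string().
    char close() {
        char q = *pending_;
        pending_.reset();
        return q;
    }

private:
    std::optional<char> pending_;  // empty when outside any string
};
```

With this class, quote_stack.top() == c in the fragment becomes closes(c), push becomes open, and the pop becomes close. Keeping the state private to the lexer also addresses the concern about a global flag being reassigned elsewhere, since only open and close can change it.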

In fact, this is only a stopgap. I'm sure there is a more elegant and efficient design, and I look forward to learning it.
