Interesting questions in Lex

Source: Internet
Author: User

Lex and Yacc are the classic lexical-analyzer and parser generators on UNIX. On Linux their counterparts are flex and bison. They are often used with C and C++ to build text-analysis programs.

This is not an introductory article, so it assumes you already know the basic syntax of Lex and Yacc.
For an introduction, see IBM's "Yacc and Lex QuickStart".

Here we discuss some interesting uses of Lex and a few points to watch out for.

Recognition of strings

Ordinary regular-expression matching problems rarely cause trouble, so here is a harder question: how do you recognize a string literal in C?

A string usually looks like this:

"some \"string\" problem.\n"

Notice that it contains escape characters and quotation marks. Suppose we simply write the regular expression as follows:

\"[^"]*\"

Then a string can never contain a quotation mark, because the match stops at the first " inside it. That does not meet the requirements of the C language.

So consider the part between the quotes separately. It still cannot contain a bare quotation mark, but it may contain \" , so we change the regular expression as follows:

\"(\\\"|[^"])*\"

Now \" can be used to escape a quotation mark. Stopping here would be too hasty, though, because an important case remains: [^"] also matches a lone \ . A backslash is an escape character and must always be paired with the character that follows it; on its own it is not valid. For example, in "a\\" + "b" the pattern above can read the first \ as an ordinary character and the second as part of \" , so the match runs past the end of the first string. We therefore add a restriction so that \ cannot appear on its own, and the regular expression becomes:

\"(\\.|[^"\\])*\"

This is our regular expression for recognizing C string literals.
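
To make the pairing rule concrete, here is a minimal hand-written sketch in C of what the final pattern accepts. It is not how Lex implements the rule (Lex generates a table-driven automaton), and the function name scan_c_string is a hypothetical helper chosen for illustration.

```c
#include <stddef.h>

/*
 * Hand-written sketch of the final string pattern.
 * Returns the length of the string literal that starts at s[0],
 * or 0 if no literal matches there.
 * scan_c_string is a hypothetical name, not part of Lex.
 */
size_t scan_c_string(const char *s)
{
    size_t i = 0;

    if (s[i] != '"')
        return 0;                 /* must open with a quote */
    i++;

    for (;;) {
        char c = s[i];

        if (c == '\0')
            return 0;             /* input ended before the closing quote */
        if (c == '"')
            return i + 1;         /* closing quote is part of the match */
        if (c == '\\') {
            /* A backslash always consumes the next character as well.
               Like '.' in Lex, that character may not be a newline. */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else {
            i++;                  /* any character except quote and backslash */
        }
    }
}
```

Called on the text "a\\" + "b" the function returns 5, the length of the first literal only, because the two backslashes are consumed as one pair and the following quote closes the string.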

Recognition of comments

With the string problem solved, we run into another one: C has two kinds of comments. How do we recognize them correctly?

// hello world
/**
 * hello world
 */

The first kind is easy to implement, in much the same way as above: the comment simply must not contain a line break:

"//"[^\n]*
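
As a rough illustration, the line-comment rule above corresponds to the following hand-written C sketch. The name scan_line_comment is hypothetical, and the sketch ignores backslash-newline line continuation, just as the pattern does.

```c
#include <stddef.h>

/*
 * Hand-written sketch of the line-comment rule: two slashes followed
 * by everything up to, but not including, the next newline.
 * Returns the length of the comment starting at s[0], or 0 if there is none.
 * scan_line_comment is a hypothetical name, not part of Lex.
 */
size_t scan_line_comment(const char *s)
{
    size_t i;

    if (s[0] != '/' || s[1] != '/')
        return 0;                 /* does not start with two slashes */

    i = 2;
    while (s[i] != '\0' && s[i] != '\n')
        i++;                      /* any character except newline */

    return i;                     /* the newline itself is left unread */
}
```

The newline is deliberately left for the next rule, so line counting elsewhere in a scanner is not disturbed.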

The second kind is more complicated, though it can still be written as a single regular expression:

"/*"([^\*]|(\*)*[^\*/])*(\*)*"*/"

This regular expression is quite complicated, so let's break it down:

"/*"   ( [^\*]  |  (\*)*  [^\*/] )*  (\*)*   "*/"

( [^\*] | (\*)* [^\*/] )* This part matches the body of the comment: either a character that is not * , or a run of * followed by a character that is neither * nor / . Someone may ask: why can't the character after the run be another * ?

Because if it could, that * would be consumed without checking what follows it, and a / right after it could then be matched as an ordinary character, so the match would run past the real end of the comment. The restriction prevents this. However, a comment may also end with several consecutive * , as in **/ , so a trailing (\*)* is added just before the closing "*/".

With other regular-expression engines there are simpler solutions. For details, see the English blog post "Finding Comments in Source Code Using Regular Expressions".

In practice Lex also offers a convenient alternative: match only the opening /* with a rule and let a fixed piece of C code consume the rest of the comment, as follows:
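
The breakdown above amounts to a two-state machine: either the previous character was a * or it was not. The following C sketch spells that out by hand. It is an illustration of the language the pattern accepts, not the code Lex generates, and scan_block_comment is a hypothetical name.

```c
#include <stddef.h>

/*
 * Hand-written sketch of the block-comment rule.
 * Returns the length of the comment starting at s[0], up to and
 * including the first closing marker, or 0 if there is no complete comment.
 * scan_block_comment is a hypothetical name, not part of Lex.
 */
size_t scan_block_comment(const char *s)
{
    size_t i;
    int seen_star = 0;            /* 1 while the previous character was a star */

    if (s[0] != '/' || s[1] != '*')
        return 0;                 /* no opening marker */

    for (i = 2; s[i] != '\0'; i++) {
        if (seen_star && s[i] == '/')
            return i + 1;         /* star followed by slash ends the comment */
        /* A star keeps the flag set, so a run of stars before the
           slash is handled; any other character clears it. */
        seen_star = (s[i] == '*');
    }
    return 0;                     /* input ended inside the comment */
}
```

The scan starts after the opening marker, so the star of the opener cannot double as the star of the closer: the three characters slash, star, slash do not form a complete comment.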

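"/*"                    comment();
%%
void comment(void)
{
    int c, c1;

loop:
    /* read up to the next '*' or end of input; putchar() echoes the
       comment text, so drop the putchar calls to discard it instead */
    while ((c = input()) != '*' && c != 0)
        putchar(c);

    /* a '*' that is not followed by '/' does not end the comment */
    if ((c1 = input()) != '/' && c != 0)
    {
        unput(c1);
        goto loop;
    }

    if (c != 0)
        putchar(c1);
}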

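The same idea can be tried outside Lex. The sketch below is an assumption-laden stand-in, not the article's code: getc() and ungetc() from the C standard library play the roles of Lex's input() and unput(), the comment text is discarded rather than echoed, and skip_block_comment is a hypothetical name.

```c
#include <stdio.h>

/*
 * Sketch of the comment-skipping loop on a FILE stream.
 * Call it right after the opening marker of a block comment has been read.
 * Returns 0 once the closing marker has been consumed, -1 if the input
 * ends first. skip_block_comment is a hypothetical name.
 */
int skip_block_comment(FILE *in)
{
    int c, c1;

    for (;;) {
        /* discard everything up to the next star */
        while ((c = getc(in)) != '*' && c != EOF)
            ;
        if (c == EOF)
            return -1;            /* unterminated comment */

        c1 = getc(in);
        if (c1 == '/')
            return 0;             /* star followed by slash: done */
        if (c1 == EOF)
            return -1;

        /* Not the end yet. Push the character back, since it may
           itself be a star that starts the real closing marker. */
        ungetc(c1, in);
    }
}
```

Pushing the character back is what lets a run of stars before the final slash close the comment correctly, which is the same case the trailing star run handles in the regular expression.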