Build Your Own Compiler (5): A Lexical Analyzer for the miniSharp Language

Source: Internet
Author: User

The identifier and integer-constant tokens above differ slightly from the earlier ones: their calls to lex.DefineToken pass one extra argument. This argument is an optional description; if it is omitted, the string form of the regular expression is used instead. The regular expression for identifiers is more than 40,000 characters long and unreadable, so we supply a separate string to describe it. The description will be used later when generating compile error messages.
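The fallback described above can be pictured with a small stand-alone model. This is only a sketch, not VBF's implementation: the local functions `DefineToken` and `Expected` here are hypothetical helpers invented for illustration, a plain pattern string stands in for the regular-expression object, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;

// Hypothetical stand-in for a token definition: a pattern plus a description.
// When no description is given, the pattern text itself is used.
static (string Pattern, string Description) DefineToken(string pattern, string description = "")
    => (pattern, description.Length > 0 ? description : pattern);

// Hypothetical error formatter that names the expected token by its description.
static string Expected((string Pattern, string Description) token)
    => $"Expected {token.Description}";

var identifier = DefineToken(@"[\p{L}_][\p{L}\p{Nd}_]*", "identifier");
var number = DefineToken("[0-9]+");

Console.WriteLine(Expected(identifier)); // Expected identifier
Console.WriteLine(Expected(number));     // Expected [0-9]+
```

With a 40,000-character pattern, the second form of message would be useless to a reader, which is the motivation for passing a description.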

Finally, we write regular expressions for whitespace, line breaks, and comments. All three follow the C# specification. There are two kinds of comments: single-line comments, which start with // and run to the end of the line, and multi-line comments, which start with /* and end with */. Their regular expressions are written as follows:

    // Whitespace: any Unicode space separator, plus tab, vertical tab and form feed
    var RE_SpaceChar = RE.CharsOf(c => Char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator);

    WHITESPACE = lex.DefineToken(RE_SpaceChar | RE.CharSet("\u0009\u000B\u000C"));

    // Line break: a single line-break character, or the pair "\r\n"
    LINE_BREAKER = lex.DefineToken(
        RE.CharSet("\u000D\u000A\u0085\u2028\u2029") |
        RE.Literal("\r\n")
    );

    // Helper expressions for comments
    var RE_InputChar = RE.CharsOf(c => !"\u000D\u000A\u0085\u2028\u2029".Contains(c));
    var RE_NotSlashOrAsterisk = RE.CharsOf(c => !"/*".Contains(c));
    var RE_DelimitedCommentSection = RE.Symbol('/') | (RE.Symbol('*').Many() >> RE_NotSlashOrAsterisk);

    // Comment: "//" up to the end of the line, or "/*" ... "*/"
    COMMENT = lex.DefineToken(
        (RE.Literal("//") >> RE_InputChar.Many()) |
        (RE.Literal("/*") >> RE_DelimitedCommentSection.Many() >> RE.Symbol('*').Many1() >> RE.Symbol('/'))
    );
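For readers more used to conventional regex syntax, the three definitions above can be approximated with System.Text.RegularExpressions. This is a sketch under that assumption, not VBF code: the variable names are made up for the example, the patterns are my own transcription of the combinators, and each pattern is anchored with \A and \z so that it must match a whole string. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// The five line-break characters, written for use inside a character class.
const string LineBreakChars = @"\r\n\u0085\u2028\u2029";

// Whitespace: one Unicode space separator (category Zs), tab, vertical tab or form feed.
var whitespace = new Regex(@"\A[\p{Zs}\t\v\f]\z");

// Line break: "\r\n" as one unit, or any single line-break character.
var lineBreaker = new Regex(@"\A(?:\r\n|[" + LineBreakChars + @"])\z");

// Comment: "//" followed by non-line-break characters, or a delimited comment.
// The delimited body repeats "/" or "zero or more '*' followed by a character
// that is neither '/' nor '*'", and must be closed by one or more '*' and a '/'.
var comment = new Regex(
    @"\A(?:" +
    @"//[^" + LineBreakChars + @"]*" +
    @"|" +
    @"/\*(?:/|\**[^/*])*\*+/" +
    @")\z");

Console.WriteLine(comment.IsMatch("// to end of line")); // True
Console.WriteLine(comment.IsMatch("/* a * b **/"));      // True
Console.WriteLine(comment.IsMatch("/*/"));               // False: never closed
```

The body section never consumes a '*' that is immediately followed by '/', which is what keeps the match from running past the first closing */.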

Finally, a little more code generates a ScannerInfo from the Lexicon object and creates the Scanner:

    ScannerInfo info = lexicon.CreateScannerInfo();
    Scanner scanner = new Scanner(info);

    string source = "// arbitrary miniSharp source code";
    StringReader sr = new StringReader(source);

    scanner.SetSource(new SourceReader(sr));
    // Skip whitespace, line breaks and comments while scanning
    scanner.SetSkipTokens(WHITESPACE.Index, LINE_BREAKER.Index, COMMENT.Index);
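To make the effect of the skip list concrete, here is a minimal stand-alone scanner loop built on System.Text.RegularExpressions. It is an illustration only and does not reflect how VBF's Scanner is implemented: the rule table, the `Scan` function, and the simplified ASCII-only IDENTIFIER, INTEGER and SYMBOL rules are all assumptions made for the example. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Each rule is anchored with \G so it can only match at the current position.
var rules = new (string Name, Regex Pattern)[]
{
    ("WHITESPACE",   new Regex(@"\G[\p{Zs}\t\v\f]")),
    ("LINE_BREAKER", new Regex(@"\G(?:\r\n|[\r\n\u0085\u2028\u2029])")),
    ("COMMENT",      new Regex(@"\G(?://[^\r\n\u0085\u2028\u2029]*|/\*(?:/|\**[^/*])*\*+/)")),
    ("IDENTIFIER",   new Regex(@"\G[A-Za-z_][A-Za-z0-9_]*")),   // simplified, ASCII only
    ("INTEGER",      new Regex(@"\G[0-9]+")),
    ("SYMBOL",       new Regex(@"\G[{}()\[\];=+\-*/<>!.,&|]")),
};

// Token kinds that are recognized but never handed to the caller.
var skip = new HashSet<string> { "WHITESPACE", "LINE_BREAKER", "COMMENT" };

List<(string Name, string Text)> Scan(string source)
{
    var result = new List<(string Name, string Text)>();
    int pos = 0;
    while (pos < source.Length)
    {
        // Longest match wins; on a tie the earlier rule is kept.
        string bestName = "";
        string bestText = "";
        foreach (var (name, pattern) in rules)
        {
            Match m = pattern.Match(source, pos);
            if (m.Success && m.Length > bestText.Length)
            {
                bestName = name;
                bestText = m.Value;
            }
        }
        if (bestText.Length == 0)
            throw new FormatException($"Unexpected character at position {pos}");

        if (!skip.Contains(bestName))
            result.Add((bestName, bestText));
        pos += bestText.Length;
    }
    return result;
}

foreach (var token in Scan("int x = 42; // answer\n/* block */ y"))
    Console.WriteLine($"{token.Name} {token.Text}");
```

The loop still has to recognize the skipped tokens in order to step over them; they are simply not reported, so a parser consuming the result never sees whitespace, line breaks, or comments.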

And that's it! We now have a complete lexical analyzer for miniSharp that can scan any miniSharp source code. Note that we configured the scanner to skip all whitespace, line breaks, and comments, which makes the later syntax analysis easier. Readers can try extending the lexical analyzer on their own, for example by adding new kinds of constants, more keywords and operators, or even entirely new lexical elements. Happy practicing! Starting with the next installment, we move on to another important stage: syntax analysis.

Also, don't forget to follow my VBF project (https://github.com/Ninputer/VBF) and my Weibo (http://weibo.com/ninputer). Thank you for your support!
