Above are the lexical definitions for integer constants and identifiers. Note that the calls to lex.DefineToken pass one extra argument. This argument is an optional description; if it is omitted, the string form of the regular expression is used instead. The regular expression for an identifier is more than 40,000 characters long and is unreadable, so we supply a separate string to describe it. This description will later be used when generating compilation error messages.
Finally, we write the regular expressions for whitespace, line breaks, and comments. All three follow the C# specification. There are two kinds of comments: a single-line comment starts with // and runs to the end of the line, and a multi-line comment starts with /* and ends with */. Here is how their regular expressions are written:
    // Needs System.Globalization (UnicodeCategory) and System.Linq (Contains).
    var RE_SpaceChar = RE.CharsOf(c => Char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator);

    WHITESPACE = lex.DefineToken(RE_SpaceChar | RE.CharSet("\u0009\u000B\u000C"));

    LINE_BREAKER = lex.DefineToken(
        RE.CharSet("\u000D\u000A\u0085\u2028\u2029") |
        RE.Literal("\r\n")
    );

    // Any character that is not a line terminator.
    var RE_InputChar = RE.CharsOf(c => !"\u000D\u000A\u0085\u2028\u2029".Contains(c));
    var RE_NotSlashOrAsterisk = RE.CharsOf(c => !"/*".Contains(c));
    var RE_DelimitedCommentSection = RE.Symbol('/') | (RE.Symbol('*').Many() >> RE_NotSlashOrAsterisk);

    COMMENT = lex.DefineToken(
        (RE.Literal("//") >> RE_InputChar.Many()) |
        (RE.Literal("/*") >> RE_DelimitedCommentSection.Many() >> RE.Symbol('*').Many1() >> RE.Symbol('/'))
    );
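To make the shape of these three patterns easier to check, here is a minimal sketch that expresses the same rules with the standard System.Text.RegularExpressions engine instead of the combinators above. The class TriviaPatterns and its MatchLength helper are hypothetical names introduced only for this illustration; they are not part of VBF, and this is not how VBF compiles its expressions internally.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch only: standard .NET regex equivalents of the trivia tokens.
// TriviaPatterns and MatchLength are hypothetical names, not VBF APIs.
public static class TriviaPatterns
{
    // Unicode category Zs plus horizontal tab, vertical tab and form feed.
    public static readonly Regex WhiteSpace =
        new Regex(@"\G[\p{Zs}\u0009\u000B\u000C]");

    // "\r\n" is tried first so that it is consumed as a single line break.
    public static readonly Regex LineBreaker =
        new Regex(@"\G(?:\r\n|[\r\n\u0085\u2028\u2029])");

    // Single-line: "//" followed by any characters except line terminators.
    // Delimited: "/*", then sections that are either "/" or "zero or more
    // asterisks followed by a character that is neither '/' nor '*'",
    // then one or more '*' and a closing "/".
    public static readonly Regex Comment =
        new Regex(@"\G(?://[^\r\n\u0085\u2028\u2029]*|/\*(?:/|\**[^/*])*\*+/)");

    // Length of the match at the very start of text, or 0 if there is none.
    public static int MatchLength(Regex pattern, string text)
    {
        Match m = pattern.Match(text);
        return m.Success ? m.Length : 0;
    }
}
```

For example, TriviaPatterns.MatchLength(TriviaPatterns.Comment, "/* a */ b */") stops at the first closing */, because the section rule never lets an asterisk be followed directly by a slash inside the comment body.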
Finally, a little more code generates a ScannerInfo from the Lexicon object and then creates the Scanner:
    ScannerInfo info = lexicon.CreateScannerInfo();
    Scanner scanner = new Scanner(info);

    string source = "// arbitrary MiniSharp source code";
    StringReader sr = new StringReader(source);

    scanner.SetSource(new SourceReader(sr));
    scanner.SetSkipTokens(WHITESPACE.Index, LINE_BREAKER.Index, COMMENT.Index);
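The idea behind the skip-token call above can be sketched without VBF. The following is a small illustration, under the assumption that a scanner picks the longest match at each position and silently drops any token whose index is in the skip set. SkipTokenDemo, its token table, and Scan are hypothetical names for this example only, and the identifier pattern is deliberately simplified; VBF's Scanner works from a generated automaton rather than a loop over regexes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch only: illustrates "skip tokens"; this is not VBF's Scanner.
public static class SkipTokenDemo
{
    // Hypothetical token table; the array position plays the role of Token.Index.
    static readonly Regex[] Tokens =
    {
        new Regex(@"\G[A-Za-z_][A-Za-z0-9_]*"),             // 0: identifier (simplified)
        new Regex(@"\G[0-9]+"),                             // 1: integer literal
        new Regex(@"\G[\p{Zs}\u0009\u000B\u000C]+"),        // 2: whitespace
        new Regex(@"\G(?:\r\n|[\r\n\u0085\u2028\u2029])"),  // 3: line break
        new Regex(@"\G//[^\r\n\u0085\u2028\u2029]*"),       // 4: single-line comment
    };

    // Returns the lexemes of all tokens whose index is not in skipIndices.
    public static List<string> Scan(string source, params int[] skipIndices)
    {
        var skip = new HashSet<int>(skipIndices);
        var result = new List<string>();
        int pos = 0;
        while (pos < source.Length)
        {
            // Longest match wins; ties go to the earlier table entry.
            int best = -1, bestLen = 0;
            for (int i = 0; i < Tokens.Length; i++)
            {
                Match m = Tokens[i].Match(source, pos);
                if (m.Success && m.Length > bestLen) { best = i; bestLen = m.Length; }
            }
            if (best < 0) throw new FormatException("Unexpected character at position " + pos);
            if (!skip.Contains(best)) result.Add(source.Substring(pos, bestLen));
            pos += bestLen;
        }
        return result;
    }
}
```

Calling SkipTokenDemo.Scan("x1 42 // note\ny", 2, 3, 4) yields only the three meaningful lexemes, while calling it with no skip indices also returns the whitespace, line break, and comment lexemes. A later parsing stage therefore never has to deal with trivia.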
And we are done! We have built a complete lexical analyzer for MiniSharp, and it can now scan any MiniSharp source code. Note that we told the scanner to skip all whitespace, line breaks, and comments; this is purely to make the later syntax analysis easier. Readers may want to extend the lexical analyzer on their own, for example by adding more kinds of constants, more keywords and operators, or even entirely new lexical elements. Happy practicing! Starting with the next installment, we move on to another important stage: syntax analysis. Stay tuned.
Also, don't forget to follow my VBF project at https://github.com/Ninputer/VBF and my Weibo at http://weibo.com/ninputer. Thank you for your support!
DIY Compiler Development (5): A Lexical Analyzer for the MiniSharp Language