Compilation Principles in Depth (3): The Lexical Analyzer


Introduction
A compiler begins its work by reading the source code (preprocessing is not considered here). It then discards whitespace characters, recognizes the lexemes, and pairs each lexeme with its attribute (optional) to form a token (lexical unit). The tokens are strung together into a token sequence, and with that the lexical analyzer's job is done. It is that simple. Clearly, recognizing a lexeme is the core work of the lexical analyzer. How a lexeme is recognized is the main topic of this section.

3.1 Workflow
The workflow of the lexical analyzer was briefly described above. The following figure shows the process more intuitively:
Take a simple C program block as an example of what the lexical analyzer does:
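
The workflow can also be sketched in code. The following is a minimal Python sketch, not the author's implementation: the token names, the tiny C subset, and the sample input are all assumptions made for illustration. It reads source text, drops whitespace, and returns a sequence of (token name, lexeme) pairs.

```python
import re

# Hypothetical token patterns for a tiny C subset. Order matters:
# earlier alternatives win, so keywords are tried before identifiers.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:int|if|else|while|return)\b"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("NUMBER",  r"\d+"),
    ("OP",      r"==|!=|<=|>=|[+\-*/=<>]"),
    ("PUNCT",   r"[(){};,]"),
    ("SKIP",    r"\s+"),   # whitespace, discarded below
    ("ERROR",   r"."),     # any other character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token_name, lexeme) pairs, dropping whitespace."""
    tokens = []
    for m in MASTER.finditer(source):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r}")
        tokens.append((kind, m.group()))
    return tokens


if __name__ == "__main__":
    # Illustrative input only; it is not the block from the original figure.
    for token in tokenize("int x = 10;\nif (x >= 3) x = x + 1;"):
        print(token)
```

Here the regular expression engine does the pattern matching; the next section looks at what such matching rests on.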

 

3.2 Internal Mechanism
Obtaining a normalized lexeme that conforms to the language description is not as simple as it might seem. "I just want to get a string!" Well, that is where we go wrong. It is in fact an arduous task.

The following figure shows the internal mechanism of the lexical analyzer:


1. First, to recognize a lexeme, read a string and match it against a pattern.
2. The pattern is described by a DFA (deterministic finite automaton).
3. A DFA can also be drawn as a state transition diagram.
4. The state transition diagram can be generated from a regular expression.
5. Regular expressions are obtained by converting grammar productions.
6. The grammar productions are defined by the programming language itself.
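
To make steps 1 to 4 concrete, here is a minimal Python sketch of a table-driven DFA. It assumes two illustrative patterns, an identifier `[A-Za-z_][A-Za-z0-9_]*` and an integer `[0-9]+`. The state names, token names, and helper functions are hypothetical and chosen only for this example.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# The state transition diagram written as a table: (state, input) -> state.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_num", "digit"): "in_num",
}

# Accepting states and the token name each one reports.
ACCEPTING = {"in_id": "ID", "in_num": "NUMBER"}


def longest_match(text, pos=0):
    """Run the DFA from pos; return (token_name, lexeme), or None if no match."""
    state, i = "start", pos
    last_accept = None
    while i < len(text):
        nxt = TRANSITIONS.get((state, char_class(text[i])))
        if nxt is None:      # no edge for this input: the DFA stops here
            break
        state, i = nxt, i + 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], text[pos:i])
    return last_accept


if __name__ == "__main__":
    print(longest_match("count1 = 42"))
    print(longest_match("count1 = 42", 9))
```

Each entry in `TRANSITIONS` is one edge of the state transition diagram, and remembering the last accepting state gives the usual longest-match behavior.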

It has to be said that compilation principles are complicated. To understand DFAs, we need to learn discrete mathematics. To understand state transition diagrams, we need to know about graphs, which means learning data structures. Regular expressions, programming languages, and more are also involved. This is why compilation principles are somewhat difficult to learn: whenever you meet a new concept, you have to learn the background knowledge behind it. And this is only the lexical analyzer; the syntax analyzer, code optimization algorithms, and computer architecture are still to come. But don't be discouraged. Just calm down and take it slowly. After learning all this, we may still not be able to write a compiler on our own, but I think it will certainly help with code optimization and IC design.

3.3 Conclusion
The core of the lexical analyzer is to find the lexemes that match a pattern. To describe the patterns, we start from a general description of the specific programming language, express it as a grammar, then go from the grammar productions to regular expressions, then to state transition diagrams, and finally to DFAs.
After finding a lexeme, we also need to create a symbol table and combine the lexeme with its attribute to form a token. The token sequence is then sent to the syntax analyzer to generate an abstract syntax tree, which is what we will talk about in the next section.
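
As a rough illustration of the symbol table step, the Python sketch below pairs each lexeme with an attribute. It is only one possible arrangement, under stated assumptions: identifiers carry their symbol table index, numbers carry their value, and other symbols carry no attribute. The class and function names are hypothetical.

```python
import re

# Illustrative lexeme pattern: identifiers, integers, or any single symbol.
LEXEME = re.compile(
    r"(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<sym>\S)", re.ASCII
)


class SymbolTable:
    """Maps each distinct identifier to a stable index."""

    def __init__(self):
        self.names = []   # index -> identifier
        self.index = {}   # identifier -> index

    def intern(self, name):
        """Return the index of name, adding it on first sight."""
        if name not in self.index:
            self.index[name] = len(self.names)
            self.names.append(name)
        return self.index[name]


def to_tokens(source, table):
    """Turn lexemes into (token_name, attribute) pairs."""
    tokens = []
    for m in LEXEME.finditer(source):
        lexeme = m.group()
        if m.lastgroup == "id":
            tokens.append(("id", table.intern(lexeme)))
        elif m.lastgroup == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None))   # operators and punctuation
    return tokens


if __name__ == "__main__":
    table = SymbolTable()
    print(to_tokens("x = x + y1 * 60", table))
    print(table.names)
```

Both occurrences of `x` share one symbol table entry, so later phases can attach type or scope information to the identifier in a single place.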

Author: rill_zhen
