C# Lexical Analyzer (II): Input Buffering and Code Positioning

Source: Internet
Author: User

1. Input Buffer

Before describing how to perform lexical analysis, we need to address a question that has not been mentioned so far: how to read a character stream from a source file. Why does this matter? Because the character stream used in lexical analysis must support a rollback operation, that is, putting several characters back into the stream so that they can be read again later.

To see why rollback is needed, consider a simple example in which two patterns have to be matched:

Figure 1 The rollback process of the stream

The figure above shows a simple matching process, intended only to illustrate rollback. How lexemes are actually matched will be explained in detail later, in the implementation of the DFA simulator.
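The need for rollback can be sketched in a few lines of C#. This is only an illustration under stated assumptions: PushbackReader and LongestMatchDemo are hypothetical names, the patterns in the usage note are made up and are not the ones from Figure 1, and a real lexer would drive the matching with a DFA rather than with string comparisons.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: wraps a TextReader and adds a push-back stack.
public sealed class PushbackReader
{
    private readonly TextReader reader;
    private readonly Stack<char> pushedBack = new Stack<char>();

    public PushbackReader(TextReader reader) { this.reader = reader; }

    // Returns the next character, or -1 at the end of the input.
    public int Read()
    {
        return pushedBack.Count > 0 ? pushedBack.Pop() : reader.Read();
    }

    // Puts a character back so that the next Read returns it.
    public void Unread(char ch) { pushedBack.Push(ch); }
}

public static class LongestMatchDemo
{
    // Returns the longest pattern that is a prefix of the remaining input,
    // or null if none matches. Characters read beyond the match are rolled back.
    public static string Match(PushbackReader input, string[] patterns)
    {
        var read = new List<char>();
        string best = null;
        while (true)
        {
            int next = input.Read();
            if (next < 0) break;
            read.Add((char)next);
            string text = new string(read.ToArray());
            bool canExtend = false;
            foreach (string pattern in patterns)
            {
                if (pattern == text) best = pattern;
                if (pattern.Length > text.Length &&
                    pattern.StartsWith(text, StringComparison.Ordinal))
                {
                    canExtend = true;
                }
            }
            if (!canExtend) break;
        }
        int keep = best == null ? 0 : best.Length;
        // Roll back in reverse order so the characters come out in source order.
        for (int i = read.Count - 1; i >= keep; i--) input.Unread(read[i]);
        return best;
    }
}
```

With the assumed patterns "ab" and "abcd" and the input "abcx", the matcher has to read as far as "x" before it knows that "abcd" cannot match. It then returns "ab" and pushes "c" and "x" back, so the next match starts at "c".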

Now consider the input-related classes in C#. Stream supports seeking, but only in units of bytes; BinaryReader and TextReader can read characters, but they do not support rollback. We therefore have to write the input buffer class ourselves. The general idea is to use a TextReader as the underlying character input and to implement rollback in our own class on top of it.

The textbook "Principles of Compilers" describes a method called the buffer pair. Put simply, two buffers are allocated, each N characters in size. N characters are read into a buffer at a time, and all character operations are performed on that buffer. Once the data in the current buffer has been processed, the next N characters are read into the other buffer, which then becomes the current one.
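A minimal sketch of the buffer-pair idea follows. It is not the textbook's code: the class name BufferPairReader, the Unget method and the use of absolute positions instead of two pointers are assumptions made to keep the example short.

```csharp
using System;
using System.IO;

// Hypothetical sketch of a buffer pair: two fixed buffers of 'size' characters.
public sealed class BufferPairReader
{
    private readonly TextReader reader;
    private readonly int size;
    private readonly char[][] buffers;
    private int loadedBlocks; // number of blocks read from the underlying reader
    private int end;          // total number of characters loaded so far
    private int position;     // absolute index of the next character to return
    private bool endOfInput;

    public BufferPairReader(TextReader reader, int size)
    {
        this.reader = reader;
        this.size = size;
        buffers = new[] { new char[size], new char[size] };
    }

    // Returns the next character, or -1 at the end of the input.
    public int Read()
    {
        if (position == end && !Fill()) return -1;
        // Block k of the input lives in buffer k % 2.
        char ch = buffers[(position / size) % 2][position % size];
        position++;
        return ch;
    }

    // Rolls back 'count' characters; fails if they have been overwritten.
    public void Unget(int count)
    {
        // Only the two most recently loaded blocks are still in memory.
        int oldest = Math.Max(0, (loadedBlocks - 2) * size);
        if (count < 0 || position - count < oldest)
        {
            throw new InvalidOperationException("Cannot roll back past the two buffers.");
        }
        position -= count;
    }

    // Loads the next block into the buffer that is not currently in use.
    private bool Fill()
    {
        if (endOfInput) return false;
        int n = reader.ReadBlock(buffers[loadedBlocks % 2], 0, size);
        if (n < size) endOfInput = true;
        if (n == 0) return false;
        loadedBlocks++;
        end += n;
        return true;
    }
}
```

The exception thrown by Unget marks the point at which the scheme breaks down: once a third block has been loaded, the first one is gone and can no longer be rolled back to.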

This data structure is very efficient, and as long as the pointers are maintained properly, rollback is easy to implement. However, the buffer size is fixed, and newly read characters overwrite old ones. If too many characters need to be rolled back (for example, when analyzing a very long string), errors can easily occur. I solved the overwriting problem by using multiple buffers: when the existing buffers are not enough, a new buffer is allocated instead of overwriting old data.

If buffers were only ever added, memory consumption would keep growing, which makes no sense, so I defined three operations that release buffers: Drop, Accept and AcceptToken. Drop marks all data before the current position as invalid (discarded); the buffers occupied by invalid data are freed and can be reused. Accept returns the data being marked invalid as a string instead of simply discarding it. AcceptToken returns that data in the form of a Token, which is convenient for lexical analysis.
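The semantics of the three operations can be illustrated with a deliberately simplified reader that keeps the unreleased characters in a single StringBuilder rather than in multiple buffers. SimpleSourceReader, SimpleToken, the id parameter of AcceptToken and the Unget method are all assumptions made for this sketch; the actual signatures of the author's class are not given in the text.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical token type, for illustration only.
public sealed class SimpleToken
{
    public SimpleToken(string id, string text, int start)
    {
        Id = id; Text = text; Start = start;
    }
    public string Id { get; }
    public string Text { get; }
    public int Start { get; }   // absolute index of the first character
}

public sealed class SimpleSourceReader
{
    private readonly TextReader reader;
    private readonly StringBuilder pending = new StringBuilder(); // not yet released
    private int index;       // read position inside 'pending'
    private int startIndex;  // absolute source index of pending[0]

    public SimpleSourceReader(TextReader reader) { this.reader = reader; }

    // Returns the next character, or -1 at the end of the input.
    public int Read()
    {
        if (index == pending.Length)
        {
            int next = reader.Read();
            if (next < 0) return -1;
            pending.Append((char)next);
        }
        return pending[index++];
    }

    // Rolls back 'count' characters that have not been released yet.
    public void Unget(int count)
    {
        if (count < 0 || count > index) throw new ArgumentOutOfRangeException(nameof(count));
        index -= count;
    }

    // Discards everything before the current position.
    public void Drop()
    {
        pending.Remove(0, index);
        startIndex += index;
        index = 0;
    }

    // Like Drop, but returns the discarded characters as a string.
    public string Accept()
    {
        string text = pending.ToString(0, index);
        Drop();
        return text;
    }

    // Like Accept, but wraps the text in a token that records where it started.
    public SimpleToken AcceptToken(string id)
    {
        int start = startIndex;
        return new SimpleToken(id, Accept(), start);
    }
}
```

After any of the three calls, the characters before the current position can no longer be reached by Unget, which is what allows their storage to be reclaimed.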

This data structure is similar to the deque in the STL, but it needs no random access and no insertion or deletion in the middle; data is only manipulated at the head and the tail. I therefore link the buffers into a ring with a doubly linked list and use three pointers, current, first and last, to point to the buffers in the list that hold data, as shown in the following illustration:

Figure 2 A linked list of multiple buffers; the red part contains data, the white part contains none
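The ring of buffers might be sketched as follows. This is an assumption-laden illustration rather than the author's implementation: the names BufferRing, Node, Fill and BufferCount are hypothetical, Unget is assumed to return the number of characters actually rolled back, and AcceptToken is omitted for brevity.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class BufferRing
{
    private sealed class Node
    {
        public readonly char[] Data;
        public int Length;            // number of valid characters in Data
        public Node Next, Prev;
        public Node(int size) { Data = new char[size]; Next = Prev = this; }
    }

    private readonly TextReader reader;
    private readonly int size;
    private Node first, current, last; // oldest live, being read, newest live
    private int firstIndex;            // first live character inside 'first'
    private int index;                 // next character to read inside 'current'
    private bool endOfInput;

    public int BufferCount { get; private set; } = 1;

    public BufferRing(TextReader reader, int size)
    {
        this.reader = reader;
        this.size = size;
        first = current = last = new Node(size);
    }

    // Returns the next character, or -1 at the end of the input.
    public int Read()
    {
        if (index == current.Length)
        {
            if (current != last) { current = current.Next; index = 0; }
            else if (!Fill()) return -1;
        }
        return current.Data[index++];
    }

    // Rolls back up to 'count' characters; returns how many were rolled back.
    public int Unget(int count)
    {
        int done = 0;
        while (done < count)
        {
            int floor = current == first ? firstIndex : 0;
            if (index == floor)
            {
                if (current == first) break;   // nothing older is kept
                current = current.Prev;
                index = current.Length;
                continue;
            }
            int step = Math.Min(count - done, index - floor);
            index -= step;
            done += step;
        }
        return done;
    }

    // Releases everything before the current position for reuse.
    public void Drop() { first = current; firstIndex = index; }

    // Returns the released characters as a string.
    public string Accept()
    {
        var text = new StringBuilder();
        for (Node node = first; ; node = node.Next)
        {
            int from = node == first ? firstIndex : 0;
            int to = node == current ? index : node.Length;
            text.Append(node.Data, from, to - from);
            if (node == current) break;
        }
        Drop();
        return text.ToString();
    }

    // Loads the next block into a free buffer, adding one to the ring if needed.
    private bool Fill()
    {
        if (endOfInput) return false;
        Node target;
        if (last.Length == 0) target = last;              // initial empty buffer
        else if (last.Next != first) target = last.Next;  // reuse a released buffer
        else
        {
            target = new Node(size);                      // ring is full: grow it
            target.Prev = last; target.Next = first;
            last.Next = target; first.Prev = target;
            BufferCount++;
        }
        int n = reader.ReadBlock(target.Data, 0, size);
        if (n < size) endOfInput = true;
        if (n == 0) return false;
        target.Length = n;
        last = current = target;
        index = 0;
        return true;
    }
}
```

In this sketch the buffers from first to last (following Next) hold live data, and the buffers from last.Next up to first.Prev are free. Drop and Accept only move first forward, and Fill reuses a free buffer before allocating a new one, so memory grows only while unreleased data is accumulating.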
