Today we cover the first part of the calculator: the scanner, also known as the lexer or lexical analyzer. To understand what the scanner does, let's start with the whole process. The calculator receives its input as a string of characters, such as "1 + 2". Without any compiler theory, how would you compute the result? You might use stacks, say one for numbers and one for operators. That is fine for simple expressions, but once numbers have more than one digit, or you add decimals or parentheses, things get messy quickly, and a programming language is far more complex than arithmetic.
The usual compiler approach is to split the string into tokens. The Chinese translations of "token" are not very intuitive; you can think of a token as the smallest typed unit of an expression (Language Implementation Patterns compares tokens to the elements of a sentence in natural language, which is also apt). For example, "1 + 2" above decomposes into "number + operator + number". Once we have the tokens, the next stage, the parser, processes them according to the grammar; that is the topic of the next post. Lexical analysis can be generated automatically with tools such as Lex, but since the point here is learning, we will write it by hand.
This brings us to the concept of a state machine, also called an automaton. As the name implies, it is a machine with a set of states that transitions from one state to another as it reads input. (It looks like a graph; it occurred to me that a state machine could be implemented directly as a graph data structure, which I will try when I have time.) Start with a simple example that recognizes a single-digit integer: state 1 is the start state; feeding the machine a digit 0-9 moves it to state 2, the end (accepting) state. If the input is exhausted and the machine is in the accepting state, the match succeeds. How, then, should a multi-digit integer be represented?
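To make the two-state example concrete, here is a minimal C++ sketch of the single-digit machine. The names `DigitState`, `stepDigit` and `matchSingleDigit` are hypothetical, chosen only for illustration; `Start` plays the role of state 1 and `End` of state 2, and an extra `Dead` state stands for "no transition exists".

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Start = state 1, End = state 2 (accepting), Dead = input rejected.
enum class DigitState { Start, End, Dead };

// Transition function: only Start has an outgoing edge, labelled 0-9.
DigitState stepDigit(DigitState s, char c) {
    if (s == DigitState::Start && std::isdigit(static_cast<unsigned char>(c))) {
        return DigitState::End;
    }
    return DigitState::Dead;
}

// Feed every character; the match succeeds only if the input ends in End.
bool matchSingleDigit(const std::string& input) {
    DigitState s = DigitState::Start;
    for (char c : input) {
        s = stepDigit(s, c);
        if (s == DigitState::Dead) return false;
    }
    return s == DigitState::End;
}
```

With this machine, `"7"` is accepted, while `""` (never leaves Start) and `"42"` (End has no outgoing edge) are rejected.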
You can add a loop on the end state so that it accepts repeated digits; this is equivalent to "+" in a regular expression. The lower edge in the diagram represents a standalone 0. PS: state machines can represent much more than this, binary arithmetic expressions for instance, but we won't go into that here; the goal is just to show what tokens are for. With the concept of a state machine in hand, let's see how to implement it. As mentioned, the scanner's job is to build tokens, so first we create a Token class.
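Before moving on to the Token class, the multi-digit machine can be sketched the same way. This assumes our reading of the diagram: the separate lower edge accepts a lone `0`, and the looping path must start with 1-9, so multi-digit numbers have no leading zero. `IntState`, `stepInt` and `matchInteger` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Zero accepts a lone "0"; Int accepts 1-9 followed by any digits.
enum class IntState { Start, Zero, Int, Dead };

IntState stepInt(IntState s, char c) {
    bool isDigit = c >= '0' && c <= '9';
    switch (s) {
    case IntState::Start:
        if (c == '0') return IntState::Zero;   // assumed: the separate edge for 0
        if (isDigit) return IntState::Int;     // first digit is 1-9
        return IntState::Dead;
    case IntState::Int:
        // The loop on the accepting state, like "+" in a regex.
        return isDigit ? IntState::Int : IntState::Dead;
    default:
        return IntState::Dead;                 // Zero and Dead have no outgoing edges
    }
}

bool matchInteger(const std::string& input) {
    IntState s = IntState::Start;
    for (char c : input) {
        s = stepInt(s, c);
        if (s == IntState::Dead) return false;
    }
    return s == IntState::Zero || s == IntState::Int;  // both are accepting
}
```

Under that assumption, the machine accepts the same strings as the regular expression `0|[1-9][0-9]*`.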
class Token {
    // ... (other members omitted)
private:
    TokenType type_;
    int intValue_;
};
type_ is the token's type. Tokens fall into four categories: numbers (only integers are supported for now, although the enum already reserves FLOAT), operators (+ - * /), left and right parentheses, and an invalid type (INVALID). intValue_ holds the value of an integer token.
enum class TokenType {
    INT,
    FLOAT,
    ADD,        // +
    SUB,        // -
    MUL,        // *
    DIV,        // /
    LEFT_PAR,   // (
    RIGHT_PAR,  // )
    INVALID,    // invalid type
};
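The class above omits its constructors, but the scanner code calls `Token(TokenType::INVALID)` and `Token(TokenType::INT, value)`. One minimal way to support both calls is a single constructor with a defaulted value; the accessors `type()` and `intValue()` are hypothetical additions, not taken from the original code.

```cpp
#include <cassert>

enum class TokenType {
    INT, FLOAT,
    ADD,        // +
    SUB,        // -
    MUL,        // *
    DIV,        // /
    LEFT_PAR,   // (
    RIGHT_PAR,  // )
    INVALID     // invalid type
};

class Token {
public:
    // Covers both Token(type) and Token(TokenType::INT, value).
    explicit Token(TokenType type, int intValue = 0)
        : type_(type), intValue_(intValue) {}

    TokenType type() const { return type_; }    // hypothetical accessor
    int intValue() const { return intValue_; }  // meaningful only for INT tokens

private:
    TokenType type_;
    int intValue_;
};
```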
Next, since there are many different operators, we can store them in a dictionary as key-value pairs, with convenience methods such as hasToken and findToken (too long to list here; see the code). With tokens and the dictionary in place, we can build the function that extracts tokens. The key function is getNextToken, the core of the whole state machine. Its while loop has two parts. The first part handles the current state, choosing how to process the input based on that state; the second part performs the transition: it classifies the current char and determines the next state. The initial state is Start, which goes straight to the transition part, so the state is effectively determined by the previous char (or several), and the actual processing is delegated to a handle function.
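The dictionary code is not shown, so here is a sketch of what it might look like, assuming it maps single-character operator strings to token types with `std::unordered_map`. The class name `TokenDict` and the simplified `Token` struct are hypothetical; only `hasToken` and `findToken` come from the text. Parentheses are included because getNextToken routes every character the dictionary knows into the operator state.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

enum class TokenType { INT, FLOAT, ADD, SUB, MUL, DIV, LEFT_PAR, RIGHT_PAR, INVALID };

// Simplified Token so this sketch compiles on its own.
struct Token {
    TokenType type;
    int intValue;
};

// Hypothetical dictionary class: operator spelling -> token type.
class TokenDict {
public:
    TokenDict()
        : table_{{"+", TokenType::ADD}, {"-", TokenType::SUB},
                 {"*", TokenType::MUL}, {"/", TokenType::DIV},
                 {"(", TokenType::LEFT_PAR}, {")", TokenType::RIGHT_PAR}} {}

    bool hasToken(const std::string& key) const {
        return table_.count(key) != 0;
    }

    // Unknown keys yield an INVALID token instead of throwing.
    Token findToken(const std::string& key) const {
        auto it = table_.find(key);
        return Token{it != table_.end() ? it->second : TokenType::INVALID, 0};
    }

private:
    std::unordered_map<std::string, TokenType> table_;
};
```

Keying on `std::string` rather than `char` leaves room for multi-character operators later, at the cost of building a one-character string for each lookup.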
Token Scanner::getNextToken(std::stringstream &expression) {
    // First char
    auto currectChar = getNextChar(expression);
    // State loop
    while (!expression.eof()) {
        // Part 1: process according to the current state
        switch (state_) {
        case State::Start:
            break;
        case State::Number:
            return handleNumberState(expression, currectChar);
        case State::Operator:
            return handleOperatorState(expression, currectChar);
        case State::Error:
            errorToken("Error Input");
            state_ = State::Start;  // reset so the next call starts fresh
            return Token(TokenType::INVALID);
        }
        std::string currectStr;
        currectStr.push_back(currectChar);
        // Part 2: classify the char and choose the next state
        if (std::isdigit(static_cast<unsigned char>(currectChar))) {
            state_ = State::Number;
        } else if (dict_.hasToken(currectStr)) {
            state_ = State::Operator;
        } else {
            state_ = State::Error;
        }
    }
    return Token(TokenType::INVALID);
}
Both handle functions append the matching chars to a buffer and return a token. handleNumberState takes two parameters: the character stream and currectChar (the digit that just passed the digit check). After building the token it resets the state to Start, which marks the end of the match. The code is as follows:
Token Scanner::handleNumberState(std::stringstream &expression, char currectChar) {
    // First char
    std::string buffer;
    buffer.push_back(currectChar);
    // Keep consuming while the next char is a digit
    while (!expression.eof() && std::isdigit(expression.peek())) {
        buffer += getNextChar(expression);
    }
    // Reset state
    state_ = State::Start;
    // Convert the digit string to an int
    int value = 0;
    std::stringstream stream(buffer);
    stream >> value;
    return Token(TokenType::INT, value);
}
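The same digit-collecting loop can be tried in isolation. Below is a sketch of it as a free function; the name `readNumber` is hypothetical, and it swaps the stringstream conversion for `std::stoi`, which is a design choice of this sketch rather than the original code.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// `first` is the digit that was already consumed; keep consuming
// while the next character in the stream is also a digit.
int readNumber(std::istream& in, char first) {
    std::string buffer(1, first);
    while (in.peek() != std::char_traits<char>::eof() &&
           std::isdigit(in.peek())) {
        buffer.push_back(static_cast<char>(in.get()));
    }
    return std::stoi(buffer);  // throws std::out_of_range if the value overflows int
}
```

For input `"123+4"`, after reading `'1'` the function consumes `"23"` and stops in front of `'+'`, leaving it for the next token. `std::stoi` reports overflow by throwing, whereas `stream >> value` only sets the stream's failbit, which the original code does not check.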
handleOperatorState: the only difference from the number handler is that the token is built by looking it up in our own dictionary class.
Token Scanner::handleOperatorState(std::stringstream &expression, char currectChar) {
    // First char
    std::string buffer;
    buffer.push_back(currectChar);
    Token token = dict_.findToken(buffer);
    // Reset state
    state_ = State::Start;
    return token;
}
getTokenList builds on getNextToken to collect all the tokens in a line.
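A minimal, self-contained sketch of such a loop follows. It uses a simplified stand-in for getNextToken (direct character checks instead of the state machine and dictionary) and assumes blanks between tokens should be skipped; the `bool` return signalling end of input and the helper `operatorType` are hypothetical choices for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

enum class TokenType { INT, ADD, SUB, MUL, DIV, LEFT_PAR, RIGHT_PAR, INVALID };

struct Token {
    TokenType type;
    int intValue;
};

// Maps one operator character to its type; INVALID if unknown.
TokenType operatorType(char c) {
    switch (c) {
    case '+': return TokenType::ADD;
    case '-': return TokenType::SUB;
    case '*': return TokenType::MUL;
    case '/': return TokenType::DIV;
    case '(': return TokenType::LEFT_PAR;
    case ')': return TokenType::RIGHT_PAR;
    default:  return TokenType::INVALID;
    }
}

// Simplified stand-in for getNextToken; returns false once the input is used up.
bool getNextToken(std::istream& in, Token& out) {
    const int kEof = std::char_traits<char>::eof();
    int c = in.get();
    while (c != kEof && std::isspace(c)) c = in.get();  // assumption: skip blanks
    if (c == kEof) return false;
    if (std::isdigit(c)) {
        int value = c - '0';  // no overflow check in this sketch
        while (std::isdigit(in.peek())) value = value * 10 + (in.get() - '0');
        out = Token{TokenType::INT, value};
    } else {
        out = Token{operatorType(static_cast<char>(c)), 0};
    }
    return true;
}

// Collect every token in one line by calling getNextToken until it reports the end.
std::vector<Token> getTokenList(const std::string& line) {
    std::istringstream in(line);
    std::vector<Token> tokens;
    Token tok{TokenType::INVALID, 0};
    while (getNextToken(in, tok)) tokens.push_back(tok);
    return tokens;
}
```

For `"12 + (3*4)"` this yields seven tokens: `12`, `+`, `(`, `3`, `*`, `4`, `)`.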
That wraps up the scanner; the parser comes next. Also, writing really is hard: I stumbled over my words, and many ideas didn't come out the way I wanted (╯-╰)
Homemade Calculator (1): Scanner