Compiler principles lab: a lexical analyzer for the C language
Let's start with the overall framework:
Base class: Base encapsulates the basic character-classification functions:
int charkind(char c);        // classify a character
int spaces(char c);          // whether the current space can be removed
int characters(char c);      // is it a letter
int keyword(char str[]);     // is it a keyword
int signwords(char str[]);   // is it an identifier
int numbers(char c);         // is it a digit
int integers(char str[]);    // is it an integer
int floats(char str[]);      // is it a floating-point number
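To make the role of these helpers concrete, here is a minimal sketch of a character classifier. The class codes follow the ones listed in the state machine section of this post (letter 1, digit 2, $ or _ 3, backslash 4, = 5, everything else 0). The function bodies are assumptions written for illustration and may differ from the project's real code.

```cpp
#include <cctype>

// Character classes, following the codes used by the state machine:
// 1 letter, 2 digit, 3 '$' or '_', 4 backslash (escape), 5 '=', 0 anything else.
int charkind(char c)
{
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isalpha(u)) return 1;
    if (std::isdigit(u)) return 2;
    if (c == '$' || c == '_') return 3;
    if (c == '\\') return 4;
    if (c == '=') return 5;
    return 0;
}

// Assumed behaviour of spaces(): returns 1 when a blank next to c can be
// dropped, i.e. c is a symbol rather than part of a word or a blank itself.
int spaces(char c)
{
    int kind = charkind(c);
    if (kind == 1 || kind == 2 || kind == 3) return 0;  // word characters
    if (c == ' ' || c == '\t' || c == '\0') return 0;   // blanks and end of line
    return 1;
}
```

With this sketch, spaces('(') returns 1 while spaces('a') returns 0, which is the distinction the space-removal step relies on.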
The derived class LexAn inherits from Base and encapsulates the functions that process lines and words:
void scanwords();            // handle each line
void clearnotes();           // clear comments and extra spaces
void getwords(int state);    // split the line into words
void wordkind(char str[]);   // determine the type of a word and output it
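The last step, deciding what kind of word was found, can be sketched as follows. This is only an illustration under stated assumptions: the enum WordKind, the function name classify and the short keyword list are hypothetical, and only decimal integers and simple floats are recognized.

```cpp
#include <cctype>
#include <string>

// Hypothetical category names; the real project prints its own labels.
enum WordKind { KEYWORD, IDENTIFIER, INTEGER, FLOAT, OTHER };

// Deliberately short keyword list, for illustration only.
static const char *const KEYWORDS[] = {
    "int", "char", "float", "if", "else", "for", "while", "return"
};

WordKind classify(const std::string &w)
{
    if (w.empty()) return OTHER;
    for (const char *kw : KEYWORDS)
        if (w == kw) return KEYWORD;

    unsigned char first = static_cast<unsigned char>(w[0]);
    if (std::isalpha(first) || w[0] == '_' || w[0] == '$') {
        // Identifier: letters, digits, '_' and '$' only.
        for (char c : w) {
            unsigned char u = static_cast<unsigned char>(c);
            if (!std::isalnum(u) && c != '_' && c != '$') return OTHER;
        }
        return IDENTIFIER;
    }
    if (std::isdigit(first)) {
        // Decimal number: digits with at most one '.'.
        int dots = 0;
        for (char c : w) {
            if (c == '.') ++dots;
            else if (!std::isdigit(static_cast<unsigned char>(c))) return OTHER;
        }
        if (dots == 0) return INTEGER;
        if (dots == 1) return FLOAT;
    }
    return OTHER;
}
```

Checking the keyword table before the identifier rule matters, because every keyword also satisfies the identifier pattern.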
The call relationship between functions is as follows:
That covers the overall framework. Now for the concrete implementation:
(i) Clear comments and extra spaces
(1) C has two comment forms, // and /* */, so when the current character is / we only need to check the next character:
If it is /, everything from // to the end of the line is a comment; save the comment and update the current line.
If it is *, search forward for the closing */, save the comment, update the current line, and keep scanning (a line may contain several /* */ comments).
Limitation: comments that span multiple lines are not handled.
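The same idea can be written more compactly with std::string. The sketch below is an illustration, not the project's code: the name strip_comments and the notes vector are assumptions, and an unterminated /* is simply treated as running to the end of the line, which matches the single-line limitation above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Removes // and /* */ comments from one line, skipping anything inside a
// string literal. Each removed comment is appended to `notes`.
std::string strip_comments(const std::string &line, std::vector<std::string> &notes)
{
    std::string out;
    bool in_string = false;
    std::size_t i = 0;
    while (i < line.size()) {
        char c = line[i];
        bool has_next = i + 1 < line.size();
        if (c == '"') {
            in_string = !in_string;          // toggle on every double quote
            out += c;
            ++i;
        } else if (!in_string && c == '/' && has_next && line[i + 1] == '/') {
            notes.push_back(line.substr(i)); // rest of the line is a comment
            break;
        } else if (!in_string && c == '/' && has_next && line[i + 1] == '*') {
            std::size_t end = line.find("*/", i + 2);
            if (end == std::string::npos) {  // no closing */ on this line
                notes.push_back(line.substr(i));
                break;
            }
            notes.push_back(line.substr(i, end + 2 - i));
            i = end + 2;                     // continue right after the comment
        } else {
            out += c;
            ++i;
        }
    }
    return out;
}
```

Building a new string avoids shifting characters in place, so several /* */ comments on one line are handled by the same loop without any index bookkeeping.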
(2) Handling extra spaces. This part is rather rough: it only handles cases such as if (a >= b), that is, spaces between a special symbol and a letter (or digit). As long as a run of spaces has a special symbol on at least one side, removing the spaces cannot cause an error.
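As a rough sketch of this rule, the function below drops a run of spaces whenever one of its neighbours is a special symbol. The names squeeze_spaces and is_word_char are hypothetical. Unlike the original routine, this version also drops leading spaces and collapses a run between two words into a single space.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True for characters that can form words (letters, digits, '$', '_').
static bool is_word_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '$' || c == '_';
}

// Drops each run of spaces that touches a special symbol (or the start or end
// of the line) and collapses a run between two words to one space.
// Spaces inside string literals are kept as they are.
std::string squeeze_spaces(const std::string &line)
{
    std::string out;
    bool in_string = false;
    std::size_t i = 0;
    while (i < line.size()) {
        char c = line[i];
        if (c == '"') in_string = !in_string;
        if (c != ' ' || in_string) {
            out += c;
            ++i;
            continue;
        }
        std::size_t j = i;
        while (j < line.size() && line[j] == ' ') ++j;  // j: first non-space after the run
        if (j == line.size()) break;                     // trailing spaces: drop them
        bool left_word = !out.empty() && is_word_char(out.back());
        bool right_word = is_word_char(line[j]);
        if (left_word && right_word) out += ' ';         // "int a" still needs a separator
        i = j;
    }
    return out;
}
```

For example, squeeze_spaces("if (a >= b)") gives "if(a>=b)", while the space in "int a" survives because both neighbours are word characters.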
void LexAn::clearnotes()
{
    int i, j, k;
    int notecount = 0;
    int flag = 0;      // 1 while inside a string literal
    char note[100];    // assumes a comment is shorter than 100 characters

    /* comments */
    for (i = 0; bufferin[buffernum][i] != '\0'; i++)
    {
        if (bufferin[buffernum][i] == '"')
        {
            flag = 1 - flag;
            continue;
        }
        if (bufferin[buffernum][i] == '/' && flag == 0)
        {
            if (bufferin[buffernum][i + 1] == '/')
            {
                // line comment: save it and cut the line at the first '/'
                for (j = i; bufferin[buffernum][j] != '\0'; j++)
                {
                    note[notecount++] = bufferin[buffernum][j];
                }
                note[notecount] = '\0';
                notecount = 0;
                fprintf(fout, "[%s]----[note]\n", note);
                bufferin[buffernum][i] = '\0';
                break;
            }
            if (bufferin[buffernum][i + 1] == '*')
            {
                // block comment: save everything up to and including the closing */
                note[notecount++] = '/';
                note[notecount++] = '*';
                for (j = i + 2; bufferin[buffernum][j] != '\0'; j++)
                {
                    note[notecount++] = bufferin[buffernum][j];
                    if (bufferin[buffernum][j] == '*' && bufferin[buffernum][j + 1] == '/')
                    {
                        note[notecount++] = '/';
                        note[notecount] = '\0';
                        notecount = 0;
                        fprintf(fout, "[%s]----[note]\n", note);
                        j += 2;
                        break;
                    }
                }
                // move the rest of the line over the comment; if no */ was found,
                // j is already at the end and the line is simply cut at i
                for (k = i; bufferin[buffernum][j] != '\0'; j++, k++)
                {
                    bufferin[buffernum][k] = bufferin[buffernum][j];
                }
                bufferin[buffernum][k] = '\0';
                i--;   // look at position i again: another comment may start here
            }
        }
    }

    // spaces
    for (i = 0, flag = 0; bufferin[buffernum][i] != '\0'; i++)
    {
        if (bufferin[buffernum][i] == '"')
        {
            flag = 1 - flag;
            continue;
        }
        if (bufferin[buffernum][i] == ' ' && flag == 0)
        {
            // j: first non-space character after the run of spaces
            for (j = i + 1; bufferin[buffernum][j] != '\0' && bufferin[buffernum][j] == ' '; j++)
            {
            }
            if (bufferin[buffernum][j] == '\0')
            {
                // trailing spaces: cut the line here
                bufferin[buffernum][i] = '\0';
                break;
            }
            if (bufferin[buffernum][j] != ' ' && ((spaces(bufferin[buffernum][j]) == 1) || (i > 0 && spaces(bufferin[buffernum][i - 1]) == 1)))
            {
                // a special symbol on either side: remove the whole run
                for (k = i; bufferin[buffernum][j] != '\0'; j++, k++)
                {
                    bufferin[buffernum][k] = bufferin[buffernum][j];
                }
                bufferin[buffernum][k] = '\0';
                i--;
            }
        }
    }

    // tabs
    for (i = 0, flag = 0; bufferin[buffernum][i] != '\0'; i++)
    {
        if (bufferin[buffernum][i] == '\t')
        {
            // delete the tab by shifting the rest of the line left, then rescan
            for (j = i; bufferin[buffernum][j] != '\0'; j++)
            {
                bufferin[buffernum][j] = bufferin[buffernum][j + 1];
            }
            i = -1;
        }
    }
}
(ii) The most important part: the state machine transitions
I am not good at drawing diagrams, so I will try to describe the transitions clearly in words; it is best to read this alongside the source code:
Characters are divided into the following classes (character, class code): <letter, 1> <digit, 2> <$ or _, 3> <\ (escape character), 4> <=, 5> <anything else, 0>
The initial state is 0:
(1) If the first character is a letter, the word can only be an identifier or a keyword. The word ends at the first character that is not a letter, digit, $ or _, and is then extracted.
(2) If the first character is a digit, the word can only be a number (octal, hexadecimal and floating-point included), so it may continue with digits, letters, ., $ or _. Any other character ends the word, which is then extracted.
(3) If the first character is $ or _, the word can only be an identifier, so it may continue with letters, digits, $ or _. Any other character ends the word, which is then extracted.
(4) If the first character is a special character (" ' ( ) = and so on), it is handled separately. The process is the same as above: the word ends as soon as an impossible combination is encountered. See the code for this part.
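The four rules above can be condensed into a small state machine. The sketch below is an illustration written for this explanation, not the project's getwords: the state names, split_words and the helper functions are hypothetical, it returns the words in a vector instead of printing them, and it only treats a backslash directly before a quote as an escape.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum State { START, WORD, NUMBER, OPERATOR, STRING };

static bool is_word_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '$' || c == '_';
}

// Symbols that always form a word on their own.
static bool is_single(char c)
{
    return std::string("(){}[];,.").find(c) != std::string::npos;
}

std::vector<std::string> split_words(const std::string &line)
{
    std::vector<std::string> words;
    std::string word;
    State state = START;
    std::size_t i = 0;
    while (i < line.size()) {
        char c = line[i];
        bool consume = true;   // false: the word ended and c is re-read in START
        switch (state) {
        case START:
            if (c == ' ') break;                       // skip separators
            word += c;
            if (std::isdigit(static_cast<unsigned char>(c))) state = NUMBER;
            else if (is_word_char(c)) state = WORD;
            else if (c == '"') state = STRING;
            else if (is_single(c)) { words.push_back(word); word.clear(); }
            else state = OPERATOR;
            break;
        case WORD:                                     // identifier or keyword
            if (is_word_char(c)) word += c;
            else consume = false;
            break;
        case NUMBER:                                   // digits, letters (hex), '.'
            if (is_word_char(c) || c == '.') word += c;
            else consume = false;
            break;
        case OPERATOR:                                 // e.g. >=, ==, !=
            if (c != ' ' && c != '"' && !is_word_char(c) && !is_single(c)) word += c;
            else consume = false;
            break;
        case STRING:                                   // runs to the closing quote
            word += c;
            if (c == '"' && line[i - 1] != '\\') {
                words.push_back(word);
                word.clear();
                state = START;
            }
            break;
        }
        if (consume) {
            ++i;
        } else {
            words.push_back(word);
            word.clear();
            state = START;
        }
    }
    if (!word.empty()) words.push_back(word);          // word that ends with the line
    return words;
}
```

For instance, split_words("3.14+x") yields the three words 3.14, + and x. Re-reading the terminating character in the START state plays the same role as the i-- in the original code: the character that ends one word is also the first character of the next.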
// State machine
void LexAn::getwords(int state)
{
    char word[100];
    int charcount = 0;
    int finish = 0;
    int num;
    int i, j, k;
    for (i = 0; bufferscan[i] != '\0'; i++)
    {
        switch (state / 10)
        {
        case 0:   // start of a word
            switch (charkind(bufferscan[i]))
            {
            case 1: word[charcount++] = bufferscan[i]; state = 10; break;
            case 2: word[charcount++] = bufferscan[i]; state = 20; break;
            case 3: word[charcount++] = bufferscan[i]; state = 30; break;
            case 0:
            case 5:
                word[charcount++] = bufferscan[i];
                switch (bufferscan[i])
                {
                case '"': state = 41; break;
                case '\'': state = 42; break;
                case '(': case ')': case '{': case '}': case '[': case ']':
                case ';': case ',': case '.':
                    // these symbols are complete words on their own
                    state = 50; word[charcount] = '\0'; finish = 1; break;
                case '=': state = 43; break;
                default: state = 40; break;
                }
                break;
            default: word[charcount++] = bufferscan[i]; break;
            }
            break;
        case 1:   // started with a letter: identifier or keyword
            switch (charkind(bufferscan[i]))
            {
            case 1: word[charcount++] = bufferscan[i]; state = 10; break;
            case 2: word[charcount++] = bufferscan[i]; state = 20; break;
            case 3: word[charcount++] = bufferscan[i]; state = 30; break;
            case 0:
            case 5:
                word[charcount] = '\0';
                num = 0;
                while (word[num] != '\0') num++;
                // length handling: truncate identifiers to 7 characters (lab requirement)
                if (num > 7) word[7] = '\0';
                i--;   // the terminating character starts the next word
                finish = 1;
                state = 50;
                break;
            default: word[charcount++] = bufferscan[i]; break;
            }
            break;
        case 2:   // contains a digit: number
            switch (charkind(bufferscan[i]))
            {
            case 1: word[charcount++] = bufferscan[i]; state = 20; break;
            case 2: word[charcount++] = bufferscan[i]; state = 20; break;
            case 3: word[charcount++] = bufferscan[i]; state = 30; break;
            case 0:
            case 5:   // '=' ends the word just like any other symbol
                if (bufferscan[i] == '.')
                {
                    word[charcount++] = bufferscan[i]; state = 20; break;
                }
                word[charcount] = '\0'; i--; finish = 1; state = 50; break;
            default: word[charcount++] = bufferscan[i]; break;
            }
            break;
        case 3:   // contains $ or _: identifier
            switch (charkind(bufferscan[i]))
            {
            case 1: word[charcount++] = bufferscan[i]; state = 30; break;
            case 2: word[charcount++] = bufferscan[i]; state = 30; break;
            case 3: word[charcount++] = bufferscan[i]; state = 30; break;
            case 0:
            case 5:   // '=' ends the word just like any other symbol
                word[charcount] = '\0'; i--; finish = 1; state = 50; break;
            default: word[charcount++] = bufferscan[i]; break;
            }
            break;
        case 4:   // special characters
            switch (state)
            {
            case 40:   // operator made of several symbols
                switch (charkind(bufferscan[i]))
                {
                case 1:
                case 2:
                case 3:
                    word[charcount] = '\0'; i--; finish = 1; state = 50; break;
                case 0: word[charcount++] = bufferscan[i]; state = 40; break;
                default: word[charcount++] = bufferscan[i]; break;
                }
                break;
            case 41:   // string literal: ends at an unescaped "
                word[charcount++] = bufferscan[i];
                if (bufferscan[i] == '"')
                {
                    if (charkind(bufferscan[i - 1]) == 4) {}
                    else { word[charcount] = '\0'; finish = 1; state = 50; }
                }
                break;
            case 42:   // character literal: ends at the closing '
                word[charcount++] = bufferscan[i];
                if (bufferscan[i] == '\'')
                {
                    word[charcount] = '\0'; finish = 1; state = 50;
                }
                break;
            case 43:   // = or ==
                if (bufferscan[i] == '=')
                {
                    word[charcount++] = bufferscan[i]; state = 43;
                }
                else
                {
                    word[charcount] = '\0'; finish = 1; i--; state = 50;
                }
                break;
            default: word[charcount++] = bufferscan[i]; break;
            }
            break;
        case 5:   // a word is complete: output it and start over
            finish = 0; state = 0; charcount = 0; i--;
            wordkind(word);
            break;
        default: break;
        }
        if (bufferscan[i + 1] == '\0')
        {
            // end of line: output whatever has been collected
            word[charcount] = '\0';
            wordkind(word);
        }
    }
}
Note: the lab assignment requires identifiers longer than 7 characters to be truncated. If you want normal handling instead, delete the num > 7 length check in getwords.
(iii) Result:
The full source of this project is on my personal GitHub. Stars and forks are welcome.