Compiler Principles: A Lexical Analyzer for C

Source: Internet
Author: User

A compiler principles lab exercise: implementing lexical analysis for the C language.


First, the overall framework:

Base class: Base encapsulates some basic character-classification functions, as follows:

    int charkind(char c);        // classify a character
    int spaces(char c);          // can a space next to this character be removed?
    int characters(char c);      // is it a letter?
    int keyword(char str[]);     // is it a keyword?
    int signwords(char str[]);   // is it an identifier?
    int numbers(char c);         // is it a digit?
    int integers(char str[]);    // is it an integer?
    int floats(char str[]);      // is it a floating-point number?
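As a rough illustration of what the string predicates in this list might look like, here is a minimal sketch. It is not the author's implementation: the names is_keyword, is_integer and is_float are hypothetical stand-ins for keyword, integers and floats, the keyword table is deliberately abbreviated, and only decimal numbers are recognized.

```cpp
#include <cctype>
#include <cstring>

// Hypothetical stand-ins for Base::keyword, Base::integers and Base::floats.
// The keyword table is abbreviated on purpose.
static const char* const kKeywords[] = {
    "int", "char", "float", "double", "void", "if", "else",
    "for", "while", "do", "return", "break", "continue"
};

// Returns 1 if str is one of the keywords above, otherwise 0.
int is_keyword(const char str[]) {
    for (const char* kw : kKeywords) {
        if (std::strcmp(str, kw) == 0) return 1;
    }
    return 0;
}

// Returns 1 if str is a non-empty run of decimal digits, otherwise 0.
int is_integer(const char str[]) {
    if (str[0] == '\0') return 0;
    for (int i = 0; str[i] != '\0'; i++) {
        if (!std::isdigit(static_cast<unsigned char>(str[i]))) return 0;
    }
    return 1;
}

// Returns 1 if str consists of digits with exactly one '.' and at least
// one digit (for example "3.14", "3." or ".5"), otherwise 0.
int is_float(const char str[]) {
    int dots = 0, digits = 0;
    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] == '.') dots++;
        else if (std::isdigit(static_cast<unsigned char>(str[i]))) digits++;
        else return 0;
    }
    return dots == 1 && digits > 0;
}
```

A word would typically be tested against the keyword table first, since every keyword also looks like an identifier.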


The derived class LexAn inherits from Base and encapsulates the functions that process lines and words, as follows:

    void scanwords();            // process each line
    void clearnotes();           // remove comments and extra spaces
    void getwords(int state);    // extract the words
    void wordkind(char str[]);   // determine the word's type and print it

The call relationship between functions is as follows:




That covers the overall framework; now for the concrete implementation:


(i) Removing comments and extra spaces


(1) C has two comment forms, // and /* */, so when the character just read is /, only the next character needs to be checked:

If it is /, everything from // to the end of the line is a comment; save the comment and update the current line.

If it is *, search for the closing */, save the comment, update the current line, and continue processing (a line may contain several /* */ comments).

Limitation: comments that span multiple lines are not handled.
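The two cases above can be sketched as a standalone function. This is an illustrative rewrite using std::string, not the author's code: strip_comments and its notes parameter are hypothetical names, and escaped quotes inside string literals are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Removes // and /* */ comments from one line and appends each removed
// comment to `notes`. Text inside double-quoted strings is left alone.
// Comments are not tracked across lines: an unterminated /* simply
// swallows the rest of the line.
std::string strip_comments(const std::string& line,
                           std::vector<std::string>& notes) {
    std::string out;
    bool in_string = false;
    std::size_t i = 0;
    while (i < line.size()) {
        char c = line[i];
        bool slash = !in_string && c == '/' && i + 1 < line.size();
        if (c == '"') {
            in_string = !in_string;
            out += c;
            i++;
        } else if (slash && line[i + 1] == '/') {
            notes.push_back(line.substr(i));      // comment runs to end of line
            break;
        } else if (slash && line[i + 1] == '*') {
            std::size_t end = line.find("*/", i + 2);
            if (end == std::string::npos) {       // no closing marker on this line
                notes.push_back(line.substr(i));
                break;
            }
            notes.push_back(line.substr(i, end + 2 - i));
            i = end + 2;                          // resume after the closing marker
        } else {
            out += c;
            i++;
        }
    }
    return out;
}
```

Because scanning resumes right after each closing marker, several block comments on one line are handled in a single pass.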

(2) Handling extra spaces. This part is rather crude: it only handles cases such as if (a >= b), that is, spaces between a special symbol and a letter (or digit). As long as a run of spaces has a special symbol on at least one side, removing the spaces cannot cause an error.
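The rule can be sketched as follows. This is a simplified illustration, not the author's code: is_special is a hypothetical stand-in for the spaces predicate, and the assumption is that any character other than a letter, digit, _, $ or space counts as a special symbol.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for Base::spaces: true for characters that cannot be
// part of an identifier or number, so a neighbouring space is redundant.
bool is_special(char c) {
    return !(std::isalnum(static_cast<unsigned char>(c)) ||
             c == '_' || c == '$' || c == ' ');
}

// Drops every run of spaces that touches a special symbol on either side,
// and any trailing spaces. Spaces inside double-quoted strings are kept.
std::string squeeze_spaces(const std::string& line) {
    std::string out;
    bool in_string = false;
    std::size_t i = 0;
    while (i < line.size()) {
        char c = line[i];
        if (c == '"') in_string = !in_string;
        if (c != ' ' || in_string) {
            out += c;
            i++;
            continue;
        }
        std::size_t j = i;
        while (j < line.size() && line[j] == ' ') j++;    // end of the run
        if (j == line.size()) break;                       // trailing spaces
        bool removable = is_special(line[j]) ||
                         (!out.empty() && is_special(out.back()));
        if (!removable) out.append(j - i, ' ');            // e.g. "int a"
        i = j;
    }
    return out;
}
```

Spaces between two identifier characters, as in "int a", are left untouched because removing them would merge two words.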


    void LexAn::clearnotes()
    {
        char *line = bufferin[buffernum];   // the line being cleaned
        int i, j, k;
        int notecount = 0;
        int flag = 0;                       // 1 while inside a string literal
        char note[100];

        // comments
        for (i = 0; line[i] != '\0'; i++) {
            if (line[i] == '"') {
                flag = 1 - flag;
                continue;
            }
            if (line[i] == '/' && flag == 0) {
                if (line[i + 1] == '/') {
                    // line comment: save it and cut the line here
                    for (j = i; line[j] != '\0'; j++) {
                        note[notecount++] = line[j];
                    }
                    note[notecount] = '\0';
                    notecount = 0;
                    fprintf(fout, "[%s]----[note]\n", note);
                    line[i] = '\0';
                    break;
                }
                if (line[i + 1] == '*') {
                    // block comment: save it up to the closing marker
                    note[notecount++] = '/';
                    note[notecount++] = '*';
                    for (j = i + 2; line[j] != '\0'; j++) {
                        note[notecount++] = line[j];
                        if (line[j] == '*' && line[j + 1] == '/') {
                            note[notecount++] = '/';
                            note[notecount] = '\0';
                            notecount = 0;
                            fprintf(fout, "[%s]----[note]\n", note);
                            j += 2;
                            break;
                        }
                    }
                    // shift the rest of the line over the comment
                    for (k = i; line[j] != '\0'; j++, k++) {
                        line[k] = line[j];
                    }
                    line[k] = '\0';
                    i--;    // rescan from here: there may be more comments
                }
            }
        }

        // spaces
        for (i = 0, flag = 0; line[i] != '\0'; i++) {
            if (line[i] == '"') {
                flag = 1 - flag;
                continue;
            }
            if (line[i] == ' ' && flag == 0) {
                // find the end of this run of spaces
                for (j = i + 1; line[j] != '\0' && line[j] == ' '; j++) {
                }
                if (line[j] == '\0') {      // trailing spaces
                    line[i] = '\0';
                    break;
                }
                if (spaces(line[j]) == 1 || (i > 0 && spaces(line[i - 1]) == 1)) {
                    for (k = i; line[j] != '\0'; j++, k++) {
                        line[k] = line[j];
                    }
                    line[k] = '\0';
                    i--;
                }
            }
        }

        // tabs (deleted outright, even inside strings)
        for (i = 0; line[i] != '\0'; i++) {
            if (line[i] == '\t') {
                for (j = i; line[j] != '\0'; j++) {
                    line[j] = line[j + 1];
                }
                i--;
            }
        }
    }

(ii) The most important part: the state-machine transitions


I am not good at drawing, so I will try to describe the state machine clearly in words; it is best read alongside the source code:

Characters are divided into the following classes, written as <character, class>: <letter, 1> <digit, 2> <$ or _, 3> <\ (escape character), 4> <=, 5> <anything else, 0>
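A classifier for these classes could look like the sketch below. The name char_kind is a hypothetical stand-in for charkind, and the mapping of class 4 to the backslash is inferred from how the string state in the code checks for an escaped quote.

```cpp
#include <cctype>

// Sketch of the character classes used by the state machine
// (not the author's code).
int char_kind(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isalpha(u)) return 1;          // letter
    if (std::isdigit(u)) return 2;          // digit
    if (c == '$' || c == '_') return 3;     // identifier punctuation
    if (c == '\\') return 4;                // escape character
    if (c == '=') return 5;                 // equals sign
    return 0;                               // everything else
}
```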

The initial state is 0:

(1) If the first character is a letter, the word can only be an identifier or a keyword. It ends at the first character that is not a digit, a letter, $ or _, and the word is then extracted.

(2) If the first character is a digit, the word can only be a number (including octal, hexadecimal and floating-point forms), so it continues through digits, letters and "."; in the code, $ or _ is also accepted and switches to the identifier state. Any other character ends the word, which is then extracted.

(3) If the first character is $ or _, the word can only be an identifier. It continues through letters, digits, $ and _; any other character ends the word, which is then extracted.

(4) If the first character is a special character (such as ", ', (, ) or =), it is handled separately. The process is the same as above: the word ends when a combination that cannot occur is encountered. See the code for this part.
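Rules (1) to (3) can be sketched as a small three-state machine. This is a deliberately simplified illustration, not the author's getwords: split_words and the State names are hypothetical, every special character becomes its own one-character word, and strings, character literals and multi-character operators are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class State { Start, Ident, Number };

// Splits one line into words. Spaces only separate words; every other
// special character is emitted as a one-character word.
std::vector<std::string> split_words(const std::string& line) {
    std::vector<std::string> words;
    std::string word;
    State state = State::Start;
    std::size_t i = 0;
    while (i <= line.size()) {
        // A '\0' sentinel one past the end flushes the last word.
        char c = (i < line.size()) ? line[i] : '\0';
        bool alpha = std::isalpha(static_cast<unsigned char>(c)) != 0;
        bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
        bool mark = (c == '$' || c == '_');
        switch (state) {
        case State::Start:
            if (alpha || mark) { word += c; state = State::Ident; }
            else if (digit)    { word += c; state = State::Number; }
            else if (c != ' ' && c != '\0') { words.push_back(std::string(1, c)); }
            i++;
            break;
        case State::Ident:      // rules (1) and (3)
            if (alpha || digit || mark) { word += c; i++; }
            else { words.push_back(word); word.clear(); state = State::Start; }
            break;
        case State::Number:     // rule (2)
            if (alpha || digit || c == '.') { word += c; i++; }
            else { words.push_back(word); word.clear(); state = State::Start; }
            break;
        }
    }
    return words;
}
```

When a word ends, the index is not advanced, so the terminating character is examined again in the start state. This plays the same role as the i-- steps in the original code.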


    // State machine
    void LexAn::getwords(int state)
    {
        char word[100];
        int charcount = 0;
        int finish = 0;
        int num;
        int i;

        for (i = 0; bufferscan[i] != '\0'; i++) {
            switch (state / 10) {
            case 0:     // start of a word
                switch (charkind(bufferscan[i])) {
                case 1: word[charcount++] = bufferscan[i]; state = 10; break;
                case 2: word[charcount++] = bufferscan[i]; state = 20; break;
                case 3: word[charcount++] = bufferscan[i]; state = 30; break;
                case 0:
                case 5:
                    word[charcount++] = bufferscan[i];
                    switch (bufferscan[i]) {
                    case '"':  state = 41; break;
                    case '\'': state = 42; break;
                    case '(': case ')': case '{': case '}': case '[': case ']':
                    case ';': case ',': case '.':
                        state = 50;
                        word[charcount] = '\0';
                        finish = 1;
                        break;
                    case '=': state = 43; break;
                    default:  state = 40; break;
                    }
                    break;
                default:
                    word[charcount++] = bufferscan[i];
                    break;
                }
                break;
            case 1:     // word started with a letter
                switch (charkind(bufferscan[i])) {
                case 1: word[charcount++] = bufferscan[i]; state = 10; break;
                case 2: word[charcount++] = bufferscan[i]; state = 20; break;
                case 3: word[charcount++] = bufferscan[i]; state = 30; break;
                case 0:
                case 5:
                    word[charcount] = '\0';
                    num = 0;
                    while (word[num] != '\0') num++;
                    // identifier length limit required by the experiment
                    if (num > 7) word[7] = '\0';
                    i--;
                    finish = 1;
                    state = 50;
                    break;
                default:
                    word[charcount++] = bufferscan[i];
                    break;
                }
                break;
            case 2:     // number
                switch (charkind(bufferscan[i])) {
                case 1: word[charcount++] = bufferscan[i]; state = 20; break;
                case 2: word[charcount++] = bufferscan[i]; state = 20; break;
                case 3: word[charcount++] = bufferscan[i]; state = 30; break;
                case 0:
                case 5:     // '=' must end the word too
                    if (bufferscan[i] == '.') {
                        word[charcount++] = bufferscan[i];
                        state = 20;
                        break;
                    }
                    word[charcount] = '\0';
                    i--;
                    finish = 1;
                    state = 50;
                    break;
                default:
                    word[charcount++] = bufferscan[i];
                    break;
                }
                break;
            case 3:     // identifier containing $ or _
                switch (charkind(bufferscan[i])) {
                case 1:
                case 2:
                case 3:
                    word[charcount++] = bufferscan[i];
                    state = 30;
                    break;
                case 0:
                case 5:     // '=' must end the word too
                    word[charcount] = '\0';
                    i--;
                    finish = 1;
                    state = 50;
                    break;
                default:
                    word[charcount++] = bufferscan[i];
                    break;
                }
                break;
            case 4:     // special characters
                switch (state) {
                case 40:    // operator
                    switch (charkind(bufferscan[i])) {
                    case 1:
                    case 2:
                    case 3:
                        word[charcount] = '\0';
                        i--;
                        finish = 1;
                        state = 50;
                        break;
                    case 0:
                        word[charcount++] = bufferscan[i];
                        state = 40;
                        break;
                    default:
                        word[charcount++] = bufferscan[i];
                        break;
                    }
                    break;
                case 41:    // string literal
                    word[charcount++] = bufferscan[i];
                    if (bufferscan[i] == '"' && charkind(bufferscan[i - 1]) != 4) {
                        // an unescaped quote closes the string
                        word[charcount] = '\0';
                        finish = 1;
                        state = 50;
                    }
                    break;
                case 42:    // character literal
                    word[charcount++] = bufferscan[i];
                    if (bufferscan[i] == '\'') {
                        word[charcount] = '\0';
                        finish = 1;
                        state = 50;
                    }
                    break;
                case 43:    // = or ==
                    if (bufferscan[i] == '=') {
                        word[charcount++] = bufferscan[i];
                        state = 43;
                    } else {
                        word[charcount] = '\0';
                        finish = 1;
                        i--;
                        state = 50;
                    }
                    break;
                default:
                    word[charcount++] = bufferscan[i];
                    break;
                }
                break;
            case 5:     // a word is complete: output it and start over
                finish = 0;
                state = 0;
                charcount = 0;
                i--;
                wordkind(word);
                break;
            default:
                break;
            }
            // end of line: flush whatever has been collected
            if (bufferscan[i + 1] == '\0') {
                word[charcount] = '\0';
                wordkind(word);
            }
        }
    }

Also note: the experiment requires identifiers longer than 7 characters to be truncated. If normal processing is needed, delete the "if (num > 7)" length check from the code.


(iii) Result:



The full source of this project is on my personal GitHub; stars and forks are welcome.



