Lexical analyzer (c)

Source: Internet
Author: User

Lexical analyzer: the theory of finite state machines is not difficult, but turning that theory into code requires some thought about data structure design:

```cpp
char charList[_CHARLIST_SIZE][15] = {0};
char charList_nu[_CHARLIST_SIZE] = {0};
char charList_index = 0;
char numList[_NUMLIST_SIZE][15] = {0};
char numList_nu[_NUMLIST_SIZE] = {0};
char numList_index = 0;
char delimilterList[_DELIMILTER_SIZE][15] = {0};
char delimilterList_nu[_DELIMILTER_SIZE] = {0};
char delimilterList_index = 0;
```

The layout depends on the final output. The lists hold non-reserved words (identifiers), numbers, and delimiters. Each *_nu array stores the line number of the matching entry, and the three *_index variables record how many entries each list currently holds.

Reserved words:

```cpp
char reserveList[_RESERVE_NUM][15] = {
    "void", "int", "char", "float", "double", "while", "auto", "break",
    "case", "const", "continue", "default", "do", "else", "enum", "extern",
    "for", "goto", "if", "long", "return", "short", "signed", "sizeof",
    "static", "struct", "switch", "typedef", "union", "unsigned", "volatile", "register"};
```

I found these reserved words through Google. They are the 32 keywords of C89/C90. main is not on the list because it is an ordinary identifier, not a keyword, so the analyzer recognizes it like any other identifier.

State machine: the version in the book is incomplete, so I made three additions. I added '_' to the isLetter check, added the + and - signs to the number recognizer, and added handling for double quotation marks.

Input to be analyzed:

```cpp
void _fun() {}
int main() {
    int a = -111;
    int B = +1;
    printf("%d, );
    return 0;
}
```

Note that the string literal in the printf line is never closed. After running the analyzer, you can see that the identifiers and constants have already been separated. I won't show that output here because it is too troublesome.

For double quotation marks, matching the first quote puts the state machine into its initial string state, and the next quote takes it to the final state. If the code is erroneous (such as an unterminated string), the final state is never reached. What should happen then? I did not handle this case separately.
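One way to handle the unterminated-string case is to give the string state a second exit. The sketch below is a hypothetical helper: scanString and StringScan are illustrative names and do not come from the original program. It reads from an in-memory string instead of stdin. It follows C's rule that a string literal may not contain a raw newline, so a '\n' or end of input before the closing quote is reported as an error instead of being swallowed. It does not handle backslash-newline line splicing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result codes for this sketch.
enum class StringScan { Ok, Unterminated };

// Scans a C string literal that starts at src[pos] (which must be '"').
// On return, pos is just past the consumed text and out holds the text read,
// including the closing quote when there is one. A raw newline or the end of
// input before the closing quote yields Unterminated; the newline is left
// unconsumed so the caller's '\n' handling still counts the line.
StringScan scanString(const std::string& src, std::size_t& pos, std::string& out) {
    assert(pos < src.size() && src[pos] == '"');
    out.clear();
    out += src[pos++];                        // opening quote: enter string state
    while (pos < src.size()) {
        char c = src[pos];
        if (c == '\n') return StringScan::Unterminated;
        out += c;
        ++pos;
        if (c == '"') return StringScan::Ok;  // final state reached
        if (c == '\\' && pos < src.size() && src[pos] != '\n') {
            out += src[pos++];                // escaped char, so \" does not close
        }
    }
    return StringScan::Unterminated;          // end of input, no closing quote
}
```

The newline is not consumed, so one bad literal no longer hides the remaining line breaks from the line counter.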
In the output, the total line count (total nu) is 7, which is wrong. Once the opening quotation mark has been matched, the second one can never be found. Every '\n' that follows is therefore treated as string content: the machine stays in the state entered after the first quote, and those newlines are never counted.

As for the rollback pointer: I did not read the input into a character array, so I used the standard library function ungetc to push the character back into the stream. To write the results to a text file, I used freopen to redirect the input file to stdin, and fprintf to write the output. fputc can only write one character at a time.

Processing of numbers, for example: the first line is correct input, and the last two lines are unidentifiable. At first I thought this recognition was definitely wrong, because it also picked up parts that should not be recognized, namely input containing a syntax error. Later I decided it was acceptable, because an error is reported as soon as anything unrecognizable appears. We only need to pay attention to the parts that are identified. The output clearly shows that -3.12e+1.11 was identified as a number (C itself does not allow a fractional exponent).

To make lines 2 and 3 show nothing, you have to handle the case where the final state is never reached. Take +1e-e as an example. The recognizer first consumes +1e-, then fails on the second e and cannot reach a final state. ungetc pushes that e back, and it is read again, this time as an identifier. That is why e appears in line 2, and line 3 has the same cause.

However, as I said before, an error can be reported whenever something cannot be identified. Here I return an error code and record it in the global variable lastRetval, much like GetLastError in Windows programming.

I am writing this while still learning, so some of it may be wrong, and I would welcome advice. Syntax analysis and semantic analysis are still unclear to me.
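An alternative to pushing back only the character that failed is to remember the last accepting state and roll back to it (longest match). The sketch below assumes the usual C-style grammar sign? digits ('.' digits)? ([eE] sign? digits)?. Unlike the recognizer in this article, that grammar does not accept a fraction in the exponent. NumState, step, and matchNumber are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// States of a hypothetical number DFA for
//   sign? digits ('.' digits)? ([eE] sign? digits)?
enum class NumState { Start, Sign, Int, Dot, Frac, Exp, ExpSign, ExpInt, Dead };

// Only states that end a complete number are accepting.
static bool isAccepting(NumState s) {
    return s == NumState::Int || s == NumState::Frac || s == NumState::ExpInt;
}

// Transition function: one character moves the DFA to its next state.
static NumState step(NumState s, char c) {
    bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
    bool sign = (c == '+' || c == '-');
    bool e = (c == 'e' || c == 'E');
    switch (s) {
        case NumState::Start:   return sign ? NumState::Sign : (digit ? NumState::Int : NumState::Dead);
        case NumState::Sign:    return digit ? NumState::Int : NumState::Dead;
        case NumState::Int:     return digit ? NumState::Int
                                     : (c == '.' ? NumState::Dot : (e ? NumState::Exp : NumState::Dead));
        case NumState::Dot:     return digit ? NumState::Frac : NumState::Dead;
        case NumState::Frac:    return digit ? NumState::Frac : (e ? NumState::Exp : NumState::Dead);
        case NumState::Exp:     return sign ? NumState::ExpSign : (digit ? NumState::ExpInt : NumState::Dead);
        case NumState::ExpSign: return digit ? NumState::ExpInt : NumState::Dead;
        case NumState::ExpInt:  return digit ? NumState::ExpInt : NumState::Dead;
        default:                return NumState::Dead;
    }
}

// Returns the length of the longest number starting at src[pos], or 0.
// Characters read after the last accepting state are rolled back simply
// by not being counted in the returned length.
std::size_t matchNumber(const std::string& src, std::size_t pos) {
    assert(pos <= src.size());
    NumState s = NumState::Start;
    std::size_t lastAccept = 0;
    for (std::size_t i = pos; i < src.size(); ++i) {
        s = step(s, src[i]);
        if (s == NumState::Dead) break;
        if (isAccepting(s)) lastAccept = i - pos + 1;
    }
    return lastAccept;
}
```

For +1e-e, matchNumber returns 2. Only +1 is taken as a number, and e-e is left for the following tokens. Whether to report that remainder as an error, as this article does with lastRetval, is a separate design decision.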
Code:

```cpp
#include <stdio.h>
#include <iostream>
#include <cstring>

#define NUMERROR -1
#define _RESERVE_NUM 32
#define _DELIMILTER_NUM 8
#define _DELIMILTER_SIZE 100
#define _CHARLIST_SIZE 100
#define _NUMLIST_SIZE 100
#define _TOKEN_SIZE 100
#define COL 1000
#define LT 1
#define LE 2
#define EQ 3

using namespace std;

FILE *fp;
int lastRetval = 0;  // last return code of analysisCode (NUMERROR after a bad number)

char charList[_CHARLIST_SIZE][15] = {0};      // identifiers
char charList_nu[_CHARLIST_SIZE] = {0};       // line number of each identifier
char charList_index = 0;
char numList[_NUMLIST_SIZE][15] = {0};        // numeric constants
char numList_nu[_NUMLIST_SIZE] = {0};         // line number of each constant
char numList_index = 0;
char delimilterList[_DELIMILTER_SIZE][15] = {0};
char delimilterList_nu[_DELIMILTER_SIZE] = {0};
char delimilterList_index = 0;

char reserveList[_RESERVE_NUM][15] = {
    "void", "int", "char", "float", "double", "while", "auto", "break",
    "case", "const", "continue", "default", "do", "else", "enum", "extern",
    "for", "goto", "if", "long", "return", "short", "signed", "sizeof",
    "static", "struct", "switch", "typedef", "union", "unsigned", "volatile", "register"};
char delimilter[_DELIMILTER_NUM][5] = {"+", "-", "*", "/", "<", ";", "<=", "="};  // six plus two

// Prints the identifiers and constants found on each line.
void nu_print(int nu) {
    int i, cindex, nindex, flag, hasPrint;
    cindex = nindex = 0;
    printf("\n====================== each line to see ======================\n");
    for (i = 0; i < nu; i++) {
        hasPrint = 0;
        while (cindex < charList_index || nindex < numList_index) {
            flag = 0;
            if (charList_nu[cindex] == i + 1) {
                if (0 == hasPrint) { printf("\nline %d\n", i + 1); hasPrint = 1; }
                printf("%s ", charList[cindex]);
                ++cindex;
                flag = 1;
            }
            if (numList_nu[nindex] == i + 1) {
                if (0 == hasPrint) { printf("\nline %d\n", i + 1); hasPrint = 1; }
                printf("%s ", numList[nindex]);
                ++nindex;
                flag = 1;
            }
            if (0 == flag) break;  // nothing left on this line
        }
    }
}

// Prints the identifier and constant tables and also writes them to fp.
void _print(int nu) {
    int i;
    printf("\n================= char of list =================\n");
    for (i = 0; i < charList_index; i++) {
        printf("%s nu: %d\n", charList[i], charList_nu[i]);
        fprintf(fp, "%s nu: %d\n", charList[i], charList_nu[i]);
    }
    printf("\n================= const number of list =================\n");
    for (i = 0; i < numList_index; i++) {
        printf("%s nu: %d\n", numList[i], numList_nu[i]);
        fprintf(fp, "%s nu: %d\n", numList[i], numList_nu[i]);
    }
    printf("\ntotal nu: %d\n", nu);
}

// '_' counts as a letter so identifiers such as _fun are accepted.
bool isLetter(int a) {
    return (a >= 'a' && a <= 'z') || (a >= 'A' && a <= 'Z') || '_' == a;
}

bool isDigit(int a) {
    return a >= '0' && a <= '9';
}

// Appends one character; token is zero-filled, so it stays '\0'-terminated.
void concatenation(char token[_TOKEN_SIZE], int str) {
    int len = strlen(token);
    if (len < _TOKEN_SIZE - 1) token[len] = (char)str;  // keep room for '\0'
}

// Returns 1 for a reserved word, 2 for a delimiter, 0 otherwise.
int reserve(char token[_TOKEN_SIZE]) {
    for (int i = 0; i < _RESERVE_NUM; i++)
        if (!strcmp(reserveList[i], token)) return 1;
    for (int i = 0; i < _DELIMILTER_NUM; i++)
        if (!strcmp(delimilter[i], token)) return 2;
    return 0;
}

// Table entries hold at most 14 characters plus '\0'.
int buildCharList(char token[_TOKEN_SIZE]) {
    strncpy(charList[charList_index], token, 14);
    ++charList_index;
    return charList_index - 1;
}

int buildNumList(char token[_TOKEN_SIZE]) {
    strncpy(numList[numList_index], token, 14);
    ++numList_index;
    return numList_index - 1;
}

// Scans one token that starts with character str; nu is the current line number.
int analysisCode(int str, int &nu) {
    int num = 0;
    char token[_TOKEN_SIZE];
    memset(token, 0, sizeof(token));
    if ('\n' == str) {
        ++nu;
        return '\n';
    } else if (isLetter(str)) {
        // identifier or reserved word
        while (isLetter(str) || isDigit(str)) {
            concatenation(token, str);
            str = getchar();
        }
        ungetc(str, stdin);  // roll back the character that ended the token
        if (0 == reserve(token)) {
            num = buildCharList(token);
            charList_nu[num] = nu;
        }
        return num;
    } else if (isDigit(str) || '+' == str || '-' == str) {
        if (NUMERROR == lastRetval) {
            return NUMERROR;
        }
        int dotFlag, eFlag, numFlag, fFlag;  // which characters may come next
        int eNum, dotNum, fNum;              // how many e / . / signs seen so far
        dotFlag = eFlag = 0;
        numFlag = fFlag = 1;
        eNum = dotNum = fNum = 0;
        while (isDigit(str) || 'e' == str || 'E' == str || '.' == str || '+' == str || '-' == str) {
            if ('e' == str || 'E' == str) {
                if (0 == eFlag || 1 == eNum) { ungetc(str, stdin); return NUMERROR; }
                dotFlag = 0; eFlag = 0; numFlag = 1; fFlag = 1;
                ++eNum; dotNum = 0; fNum = 0;
            } else if ('+' == str || '-' == str) {
                if (0 == fFlag || 1 == fNum) { ungetc(str, stdin); return NUMERROR; }
                dotFlag = 0; eFlag = 0; numFlag = 1; fFlag = 0;
                ++fNum;
            } else if ('.' == str) {
                if (0 == dotFlag || 1 == dotNum) { ungetc(str, stdin); return NUMERROR; }
                dotFlag = 0; eFlag = 0; numFlag = 1; fFlag = 0;
                ++dotNum;
            } else {  // digit
                dotFlag = 1; eFlag = 1; numFlag = 1; fFlag = 0;
            }
            concatenation(token, str);
            str = getchar();
        }
        ungetc(str, stdin);
        num = buildNumList(token);
        numList_nu[num] = nu;
        return num;
    } else if ('"' == str) {
        // string literal: read until the closing quote
        int flag = 0;
        while (0 == flag) {
            concatenation(token, str);
            str = getchar();
            if ('"' == str) flag = 1;
            else if (EOF == str) return NUMERROR;  // unterminated string at end of input
        }
        concatenation(token, str);
        return '"';
    } else {
        // check the two-character operators first: the single-character
        // table below also contains '<'
        if ('<' == str) {
            str = getchar();
            if ('=' == str) return LE;
            ungetc(str, stdin);
            return LT;
        }
        if ('=' == str) {
            str = getchar();
            if ('=' == str) return EQ;
            ungetc(str, stdin);
            return '=';
        }
        for (int i = 0; i < 6; i++) {
            if (delimilter[i][0] == str) return str;
        }
    }
    return 0;  // whitespace and other unrecognized characters are ignored
}

int main() {
    if (NULL == freopen("t1.txt", "r", stdin)) {  // read the source file through stdin
        perror("freopen");
        return 1;
    }
    fp = fopen("D://file.txt", "w");
    if (NULL == fp) {
        perror("fopen");
        return 1;
    }
    char str;
    int nu = 1;
    while (scanf("%c", &str) != EOF) {  // "%c" without a leading space keeps '\n' for line counting
        lastRetval = analysisCode(str, nu);
    }
    _print(nu - 1);
    nu_print(nu - 1);
    fclose(fp);
    fclose(stdin);
    return 0;
}
```
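The listing keeps identifiers and constants in parallel arrays and merges them back by line number in nu_print. That merge alternates between the two lists, so it does not necessarily preserve source order within a line. A hedged alternative design is a single token list kept in source order. The sketch below assumes a hypothetical Token struct and Kind enum, which are not part of the original program, and groups identifiers and numbers by line.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical token kinds and token record for this sketch.
enum class Kind { Identifier, Number, Keyword, Delimiter };

struct Token {
    Kind kind;
    std::string text;
    int line;  // 1-based source line
};

// Collects identifiers and numbers per line. Tokens are appended in the
// order they were scanned, so each line's list follows source order.
std::map<int, std::vector<std::string>> groupByLine(const std::vector<Token>& tokens) {
    std::map<int, std::vector<std::string>> lines;  // ordered by line number
    for (const Token& t : tokens) {
        assert(t.line >= 1);
        if (t.kind == Kind::Identifier || t.kind == Kind::Number) {
            lines[t.line].push_back(t.text);
        }
    }
    return lines;
}
```

A scanner built this way would push_back one Token per recognized token. Printing the map then gives the per-line view without two cursors, and without the fixed 15-character and 100-entry limits of the arrays.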
