Redy lexical recognition: Design and implementation of the scanner

Source: Internet
Author: User

(1) Introduction

Download the code: git clone git://git.code.sf.net/p/redy/code redy-code

This chapter includes:

Design and Implementation of scanners

(2) Design and implementation of scanners

In the previous chapter, I showed how to implement an efficient input buffer and briefly introduced the scanner. The scanner reads the source code of the language and groups its character sequences into words, which are then passed on to syntax analysis.

In redy, the basic words are floating-point numbers, integers, long integers, operators, variables, strings, and keywords. The scanner's task is to identify one word after another from the input character sequence.

Step 1: Use a struct to represent the scanner

#define SCANNER_DEFALUT_LITERIAL_SIZE 128

struct scanner
{
    struct lex_file* s_lf;
    int s_cur_token;
    int s_line;
    char* s_cur_literal;
    int s_literial_size;
};

The struct scanner has five members:

  1. s_lf points to the input buffer of the file being scanned.
  2. s_cur_token holds the type of the word most recently recognized by the scanner.
  3. s_line holds the current line number in the source file. It is used to report the exact position when the scanner meets an invalid word.
  4. s_cur_literal points to the buffer that holds the text of the word most recently recognized.
  5. s_literial_size is the size of the buffer that s_cur_literal points to.

Step 2: Create and destroy a scanner

There are two ways to create a scanner: one creates a scanner for a given file name, and the other creates a scanner from an already opened file. Both functions call sc_init to initialize the scanner.

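/* Shared initializer used by both creation functions. */
static void sc_init(struct scanner* sc,struct lex_file* lf)
{
    sc->s_lf=lf;
    sc->s_cur_token=TOKEN_UNKOWN;
    sc->s_line=1;
    sc->s_cur_literal=(char*)malloc(SCANNER_DEFALUT_LITERIAL_SIZE);
    sc->s_literial_size=SCANNER_DEFALUT_LITERIAL_SIZE;
}

/* Create a scanner for the file with the given name. */
struct scanner* sc_create(char* filename)
{
    struct lex_file* lf=lf_create(filename);
    if(lf==NULL)
    {
        WARN("Open file[%s] Failed",filename);
        return NULL;
    }
    struct scanner* sc=(struct scanner*)malloc(sizeof(*sc));
    if(sc==NULL)
    {
        /* do not leak the input buffer when allocation fails */
        lf_destory(lf);
        return NULL;
    }
    sc_init(sc,lf);
    return sc;
}

/* Create a scanner from an already opened file. */
struct scanner* sc_stream_create(FILE* file)
{
    struct lex_file* lf=lf_stream_create(file);
    if(lf==NULL)
    {
        WARN("Create Scanner Failed");
        return NULL;
    }
    struct scanner* sc=(struct scanner*)malloc(sizeof(*sc));
    if(sc==NULL)
    {
        lf_destory(lf);
        return NULL;
    }
    sc_init(sc,lf);
    return sc;
}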

Scanner destruction

void sc_destory(struct scanner* sc)
{
    lf_destory(sc->s_lf);
    free(sc->s_cur_literal);
    free(sc);
}
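
The create/destroy pairing described in this step can also be tried in isolation. The following is only a sketch under stated assumptions: mini_scanner, mini_init, mini_create, mini_stream_create, mini_destroy and the ms_owns_file flag are hypothetical names invented for this illustration, and a plain FILE* stands in for redy's struct lex_file. It shows two creation functions sharing one initializer, with every failure path releasing what was already acquired.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MINI_LITERAL_SIZE 128   /* hypothetical default, mirrors the 128 used above */

/* Hypothetical stand-in for struct scanner; not redy's real type. */
struct mini_scanner {
    FILE *ms_file;        /* input stream (redy uses struct lex_file here) */
    int   ms_owns_file;   /* 1 if the scanner opened the file itself */
    int   ms_line;        /* current line number, starts at 1 */
    char *ms_literal;     /* buffer for the text of the current word */
    int   ms_literal_size;
};

/* Shared initializer: returns 0 on success, -1 if the buffer cannot be allocated. */
static int mini_init(struct mini_scanner *ms, FILE *file, int owns_file)
{
    assert(ms != NULL);
    ms->ms_file = file;
    ms->ms_owns_file = owns_file;
    ms->ms_line = 1;
    ms->ms_literal = malloc(MINI_LITERAL_SIZE);
    ms->ms_literal_size = MINI_LITERAL_SIZE;
    return ms->ms_literal == NULL ? -1 : 0;
}

/* Create a scanner from an already opened stream; the caller keeps ownership. */
struct mini_scanner *mini_stream_create(FILE *file)
{
    struct mini_scanner *ms;

    if (file == NULL)
        return NULL;
    ms = malloc(sizeof(*ms));
    if (ms == NULL)
        return NULL;
    if (mini_init(ms, file, 0) != 0) {
        free(ms);
        return NULL;
    }
    return ms;
}

/* Create a scanner for a file name; the scanner owns and later closes the file. */
struct mini_scanner *mini_create(const char *filename)
{
    FILE *file = fopen(filename, "r");
    struct mini_scanner *ms;

    if (file == NULL)
        return NULL;
    ms = mini_stream_create(file);
    if (ms == NULL) {
        fclose(file);
        return NULL;
    }
    ms->ms_owns_file = 1;
    return ms;
}

/* Release everything the scanner owns; accepts NULL for convenience. */
void mini_destroy(struct mini_scanner *ms)
{
    if (ms == NULL)
        return;
    if (ms->ms_owns_file)
        fclose(ms->ms_file);
    free(ms->ms_literal);
    free(ms);
}
```

Whether redy's lf_destory closes a stream that was passed in by the caller is not stated in this chapter; the ownership flag here is one possible design choice, not a description of redy.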

Step 3: Scan the source file and return the next word recognized by the scanner

The scanner recognizes character sequences with the state machine introduced earlier. The starting state of the state machine is me_begin. The scanner obtains a character from the input buffer and then calls the state_next function to obtain the successor of the current state. The successor falls into one of three cases:

  1. Final state: the scanner has recognized a word. Because the scanner uses longest-match recognition, it keeps scanning to see whether a longer word can be recognized. In case it cannot, it first calls the lf_mark function to mark the current position so that it can return to this position later.
  2. Error state (lex_state_err): the character sequence scanned so far cannot form a word, so the scanner stops and checks whether it reached a final state earlier. If it did not, the source program contains an invalid character sequence. If it did, the previously recognized word is returned.
  3. Normal state: the scanner continues scanning.

Finally, the scanner copies the recognized word into the member s_cur_literal through the sc_set_cur_literial function.
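
The three cases above amount to longest-match scanning with a remembered fallback position. To make the idea concrete outside of redy, here is a minimal sketch that assumes a tiny hand-written state machine recognizing only integers ("123") and floating-point numbers ("123.45"). All toy_* names, the S_* states and the TOY_* token values are hypothetical and exist only for this illustration; the variable mark plays the role that lf_mark plays in the scanner.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch only (not redy's TOKEN_* values). */
enum toy_token { TOY_ERR, TOY_INT, TOY_FLOAT };

/* States of a tiny state machine for "123" and "123.45". */
enum toy_state { S_BEGIN, S_INT, S_DOT, S_FRAC, S_ERR };

/* Successor of state s on input character c. */
static enum toy_state toy_next(enum toy_state s, char c)
{
    int digit = isdigit((unsigned char)c);

    switch (s) {
    case S_BEGIN: return digit ? S_INT : S_ERR;
    case S_INT:   return digit ? S_INT : (c == '.' ? S_DOT : S_ERR);
    case S_DOT:   return digit ? S_FRAC : S_ERR;
    case S_FRAC:  return digit ? S_FRAC : S_ERR;
    default:      return S_ERR;
    }
}

/* Token reported by a final state, or TOY_ERR for a non-final state. */
static enum toy_token toy_final_token(enum toy_state s)
{
    if (s == S_INT)
        return TOY_INT;
    if (s == S_FRAC)
        return TOY_FLOAT;
    return TOY_ERR;
}

/*
 * Scan one word from the start of text. Stores the length of the longest
 * recognized word in *len and returns its token. Returns TOY_ERR with
 * *len == 0 if no final state was ever reached.
 */
enum toy_token toy_scan(const char *text, size_t *len)
{
    enum toy_state state = S_BEGIN;
    enum toy_token last = TOY_ERR;   /* token of the last final state seen */
    size_t mark = 0;                 /* end of the longest word so far */
    size_t i;

    assert(text != NULL && len != NULL);
    for (i = 0; text[i] != '\0'; i++) {
        state = toy_next(state, text[i]);
        if (state == S_ERR)
            break;                   /* case 2: fall back to the last mark */
        if (toy_final_token(state) != TOY_ERR) {
            last = toy_final_token(state);
            mark = i + 1;            /* case 1: remember this position */
        }
        /* case 3: otherwise simply keep scanning */
    }
    *len = mark;
    return last;
}
```

For the input "12.+x" the sketch passes through the non-final state after the dot, hits the error state on '+', and falls back to the marked position, so it reports an integer of length 2 rather than an error.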

/* Copy the recognized word into s_cur_literal, growing the buffer if needed. */
static void sc_set_cur_literial(struct scanner* sc,char* buf,int length)
{
    if(sc->s_literial_size<length+1)
    {
        char* new_space=(char*)malloc(length+1);
        free(sc->s_cur_literal);
        sc->s_cur_literal=new_space;
        sc->s_literial_size=length+1;   /* record the new capacity */
    }
    memcpy(sc->s_cur_literal,buf,length);
    sc->s_cur_literal[length]='\0';
}

int sc_next_token(struct scanner* sc)
{
    struct lex_file* lf=sc->s_lf;
    char cur;
    char next=lf_next_char(lf);
    struct state* cur_state=&me_begin;
    struct state* next_state;
    struct state* finnal_state=NULL;
    while(1)
    {
        cur=next;
        if(cur==EOF)
        {
            /* return a word already recognized before the end of input */
            sc->s_cur_token=finnal_state?finnal_state->s_token:TOKEN_EOF;
            break;
        }
        next_state=state_next(cur_state,cur);
        if(next_state==&lex_state_err)
        {
            if(finnal_state==NULL)
            {
                sc->s_cur_token=TOKEN_ERR;
            }
            else
            {
                sc->s_cur_token=finnal_state->s_token;
            }
            break;
        }
        /* count a newline only when it is accepted, not when it is a rejected lookahead */
        if(cur=='\n')
        {
            sc->s_line++;
        }
        if(state_final(next_state))
        {
            finnal_state=next_state;
            lf_mark(lf);
        }
        next=lf_next_char(lf);
        cur_state=next_state;
    }
    sc_set_cur_literial(sc,lf->l_buf+lf->l_begin,lf->l_mark-lf->l_begin);
    lf_reset_to_mark(lf);
    return sc->s_cur_token;
}

Step 4: Write a small program to test the scanner

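#include <stdio.h>
#include <stdlib.h>
/* plus the redy headers that declare the scanner and token functions used below */

int main(int argc,char** argv)
{
    if(argc<2)
    {
        printf("usage %s [filename]\n",argv[0]);
        exit(0);
    }
    struct scanner* sc=sc_create(argv[1]);
    if(sc==NULL)
    {
        return -1;
    }
    int token;
    int i=0;
    while(1)
    {
        i++;
        if(i%5==0)
        {
            printf("\n");
        }
        token=sc_next_token(sc);
        if(token==TOKEN_EOF)
        {
            break;
        }
        if(token==TOKEN_ERR)
        {
            goto err;
        }
        if(token==TOKEN_ID)
        {
            /* variables and keywords share TOKEN_ID; symbol_type tells them apart */
            if(symbol_type(sc_token_string(sc))==TOKEN_ID)
            {
                printf("{variable,%s}  ",sc_token_string(sc));
            }
            else
            {
                printf("{keyword,%s}  ",sc_token_string(sc));
            }
            continue;
        }
        /* comments and whitespace are skipped */
        if(token==TOKEN_ANNO)
        {
            continue;
        }
        if(token==TOKEN_WS)
        {
            continue;
        }
        if(token==TOKEN_NEWLINE)
        {
            printf("{newline}  ");
            continue;
        }
        printf("{%s,%s}  ",token_name(token),sc_token_string(sc));
    }
    sc_destory(sc);
    return 0;
err:
    printf("err token at line %d\n",sc->s_line);
    sc_destory(sc);
    return -1;
}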
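
The test program relies on symbol_type to tell keywords from variables after the scanner has reported TOKEN_ID. This chapter does not show how symbol_type works, so the following is only a sketch of one common approach, a linear search through a keyword table. The names sketch_kind, sketch_keywords and sketch_classify are hypothetical, and the keyword list is an assumption made up of the words that appear in keyword position in the sample Redy program in this chapter; it is not redy's actual keyword set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch (redy returns TOKEN_* values instead). */
enum sketch_kind { SKETCH_VARIABLE, SKETCH_KEYWORD };

/* Assumed keyword list, taken from the sample program only. */
static const char *const sketch_keywords[] = {
    "if", "elif", "else", "end", "print"
};

/* Classify an identifier-shaped word as a keyword or a variable. */
enum sketch_kind sketch_classify(const char *word)
{
    size_t count = sizeof(sketch_keywords) / sizeof(sketch_keywords[0]);
    size_t i;

    assert(word != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(word, sketch_keywords[i]) == 0)
            return SKETCH_KEYWORD;
    }
    return SKETCH_VARIABLE;
}
```

Recognizing keywords with the same state-machine path as identifiers and separating them afterwards keeps the state machine small; a hash table could replace the linear search if the keyword set grows.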

Now we use the scanner to scan a small redy program to see how it works.

Redy program:

a=random()
b=random()
if a+b/2==557
    a.inc()
    if a/2
        a.dec()
    else
        a.inc()
    end
elif a+b/3==6
    a.dec()
else
    b=a/2
end
print a
print b

Running result: (the screenshot of the scanner output is not reproduced here)

You can find the programs above in the folder tutorial/Lexical/scripts. After compiling, the executable files are in the bin directory, and the test data is in the folder debug_data.
