In-depth compilation principles-4-C language implementation of a simple lexical analyzer

Last Update:2013-11-25 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Introduction

Just say no, fake tricks.

In this section, we will conduct an experiment and use the C language to implement a simple lexical analyzer to deepen our understanding of lexical analysis. If you are interested, analyze the source code by yourself. It is quite simple and there is no flow chart. Please forgive me. Let's start.

4.1 experiment description

For example, for the source program:

Begin x: = 9: if x> 9 then x: = 2 * x + 1/3; end #

After lexical analysis, the following sequence is output:

<1, begin> <10, x> <18, :=> <11,9> <26, ;>< 2, if> ......

4.1.1 simple lexical analysis

(1) KEYWORDS:

Begin if then while do end

All keywords are in lowercase.

(2) operators and operators

: = +-*/<= <>>>= ;()#

(3) other words are Identifiers (IDS) and Integer constants (SUM), which are defined by the following regular formula:

ID = letter (letter | digit )*

NUM = digit *

(4) spaces are composed of spaces, tabs, and line breaks. Spaces are generally used to separate IDS, SUM, operators, operators, and keywords.

4.1.2 types of codes corresponding to various word symbols:

Table 4.2.1 types of word symbols

Word symbols

Category Code

Word symbols

Category Code

Bgin

: =

Then

Wile

End

Lettet (letter | digit )*

> =

Dight *

;

(

)

4.2 source code reference

[Html]
# Include <stdio. h>
# Include <string. h>

Char prog [80], token [8], ch;
Int syn, p, m, n, sum;
Char * rwtab [6] = {"begin", "if", "then", "while", "do", "end "};

Void scaner (void );

Main ()
{
P = 0;
Printf ("\ n please input a string (end with '#'): \ n ");

Do {
Scanf ("% c", & ch );
Prog [p ++] = ch;
} While (ch! = '#');

P = 0;
Do {
Scaner ();
Switch (syn)
{
Case 11:
Printf ("(%-10d % 5d) \ n", sum, syn );
Break;

Case-1:
Printf ("you have input a wrong string \ n ");
// Getch ();
Return 0;
Break;

Default:
Printf ("(%-10 s % 5d) \ n", token, syn );
Break;
}
} While (syn! = 0 );
// Getch ();
}

Void scaner (void)
{
Sum = 0;

For (m = 0; m <8; m ++)
Token [m ++] = NULL;

Ch = prog [p ++];
M = 0;

While (ch = '') | (ch = '\ n '))
Ch = prog [p ++];

If (ch <= 'Z') & (ch> = 'A') | (ch <= 'Z ') & (ch> = 'A ')))
{
While (ch <= 'Z') & (ch> = 'A') | (ch <= 'Z ') & (ch> = 'A') | (ch> = '0') & (ch <= '9 ')))
{
Token [m ++] = ch;
Ch = prog [p ++];
}

P --;
Syn = 10;

For (n = 0; n <6; n ++)
If (strcmp (token, rwtab [n]) = 0)
{
Syn = n + 1;
Break;
}
}
Else if (ch> = '0') & (ch <= '9 '))
{
While (ch> = '0') & (ch <= '9 '))
{
Sum = sum * 10 + ch-'0 ';
Ch = prog [p ++];
}
P --;
Syn = 11;
}
Else
{
Switch (ch)
{
Case '<':
Token [m ++] = ch;
Ch = prog [p ++];
If (ch = ')
{
Syn = 22;
Token [m ++] = ch;
}
Else
{
Syn = 20;
P --;
}
Break;

Case '> ':
Token [m ++] = ch;
Ch = prog [p ++];
If (ch = ')
{
Syn = 24;
Token [m ++] = ch;
}
Else
{
Syn = 23;
P --;
}
Break;

Case '+ ':
Token [m ++] = ch;
Ch = prog [p ++];
If (ch = '+ ')
{
Syn = 17;
Token [m ++] = ch;
}
Else
{
Syn = 13;
P --;
}
Break;

Case '-':
Token [m ++] = ch;
Ch = prog [p ++];
If (ch = '-')
{
Syn = 29;
Token [m ++] = ch;
}
Else
{
Syn = 14;
P --;
}
Break;

Case '! ':
Ch = prog [p ++];
If (ch = ')
{
Syn = 21;
Token [m ++] = ch;
}
Else
{
Syn = 31;
P --;
}
Break;

Case '= ':
Token [m ++] = ch;
Ch = prog [p ++];
If (ch = ')
{
Syn = 25;
Token [m ++] = ch;
}
Else
{
Syn = 18;
P --;
}
Break;

Case '*':
Syn = 15;
Token [m ++] = ch;
Break;

Case '/':
Syn = 16;
Token [m ++] = ch;
Break;

Case '(':
Syn = 27;
Token [m ++] = ch;
Break;

Case ')':
Syn = 28;
Token [m ++] = ch;
Break;

Case '{':
Syn = 5;
Token [m ++] = ch;
Break;

Case '}':
Syn = 6;
Token [m ++] = ch;
Break;

Case ';':
Syn = 26;
Token [m ++] = ch;
Break;

Case '\"':
Syn = 30;
Token [m ++] = ch;
Break;

Case '#':
Syn = 0;
Token [m ++] = ch;
Break;

Case ':':
Syn = 17;
Token [m ++] = ch;
Break;

Default:
Syn =-1;
Break;
}
}
Token [m ++] = '\ 0 ';
}

4.3 Summary:

Lexical analysis refers to reading a string from the source code sequence of the program cyclically, determining its attributes according to the lexical requirements, and then forming a lexical unit. For programming languages in reality, the lexical structure is complicated and generally expressed using regular expressions.

Author: rill_zhen

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

In-depth compilation principles-4-C language implementation of a simple lexical analyzer

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

In-depth compilation principles-4-C language implementation of a simple lexical analyzer

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support