Using Python to build an HTML-based C syntax highlighter

Last Update:2018-12-04 Source: Internet

Author: User


Academic year, 1st semester

Principles of Compilers

Course design report

Class 02 (II)

Student ID: 19

Name: Liu Xiaoming

Score:

Instructor: Lu Chaohui

I. Design Purpose

Deepen understanding of compilation principles, strengthen hands-on practice and program development capabilities, and improve the ability to analyze and solve problems.

II. Design Tasks

1. Word recognition

C language constants

C language identifiers

2. Text processing of the program

Convert all C comments to uppercase

Convert all C reserved words to uppercase

3. Recursive descent analysis

III. Design Process

1. Overall Design

The program reads a C source file, performs lexical analysis on it, and writes an HTML file in which the recognized tokens are highlighted. It also writes the table of word symbols (tokens). The generated HTML file is named out.html, and the token table is written to token.txt.

To run the program, enter the dist folder and run main *.c, replacing *.c with the name of the C source file. The .c suffix is conventional, but other suffixes also work. To start with the default settings, double-click run.bat in the dist directory, which analyzes sample.c.

The program is divided into three modules: the html module handles the details of HTML file generation; the wordfix module performs the lexical analysis; the main module handles file I/O and overall program control.

2. html.py

This module implements the details of writing the HTML file. It includes the following functions:

writehead()

Writes the header of the HTML file.

writeline(line)

Writes a piece of text to the HTML file and appends two kinds of line break: a newline character, which breaks the line in the HTML source, and an HTML line-break tag, which breaks the line in the rendered page.

writeident(line)

Writes an identifier to the HTML file.

writekeyword(line)

Writes a keyword to the HTML file.

writecomment(line)

Writes comments and preprocessor lines to the HTML file.

writeconst(line)

Writes constants, such as numeric constants and strings, to the HTML file.

writeoper(line)

Writes operators and delimiters to the HTML file.

writetail()

Writes the end of the HTML file and closes the file.

fixmark(instr)

Because the browser does not display some characters literally, they must be converted to other strings before they are written to the HTML file. The fixmark function performs this conversion. The characters converted are whitespace, &, ", > and <.

The html module has a main section for unit testing. The module has a single global member, outfile, which holds the handle of the output HTML file.
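The conversion described for fixmark can be sketched as follows. This is a minimal Python 3 sketch, not the report's Python 2.4 code; the name escape_for_html, the dictionary layout, and the tab width of four non-breaking spaces are assumptions made for illustration.

```python
def escape_for_html(text, tab_width=4):
    """Replace characters that a browser would collapse or misread.

    tab_width is an assumed value; the report does not state one.
    """
    replacements = {
        ' ': '&nbsp;',
        '\t': '&nbsp;' * tab_width,
        '&': '&amp;',
        '"': '&quot;',
        '<': '&lt;',
        '>': '&gt;',
    }
    # Each input character is looked up once, so the '&' inside an
    # emitted entity is never escaped a second time.
    return ''.join(replacements.get(ch, ch) for ch in text)


print(escape_for_html('if (a < b) s = "x";'))
```

Working character by character avoids the ordering problem that chained str.replace calls would have, where '&' must be handled before the entities that contain it.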

3. main.py

Provides program startup and file I/O. It includes the following functions:

openfile(filename)

Opens a file with error handling. Returns the file handle if the file is opened successfully, or False if it cannot be opened.

showfile(filename)

Tests opening a file and displays its contents.

The main section opens the files, sets the global variables of each module, and starts the lexical analyzer.
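The open-with-error-handling pattern described for openfile might look like this in Python 3. It is a sketch under assumptions: the name open_source is hypothetical, and it returns None rather than False on failure so that the caller can test the result with "is None".

```python
import os


def open_source(filename):
    """Return a readable file object, or None if the file cannot be opened."""
    if not os.path.isfile(filename):
        return None
    try:
        return open(filename, 'r')
    except OSError as err:
        # The file exists but could not be opened, e.g. no read permission.
        print('cannot open %s: %s' % (filename, err))
        return None


handle = open_source('sample.c')
if handle is None:
    print('open file failed')
else:
    handle.close()
```

Returning None keeps a successful result and a failure clearly distinct, which a boolean sentinel compared with == does not always do.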

4. wordfix.py

The lexical analysis module contains the following members:

infile

The source file to read, written in a subset of the C language

outfile

The HTML file handle for output; it is not used directly in this module

tokenfile

The file handle used to output the word symbols

outlines

The source code after case conversion, as a list of strings

htmllines

The converted HTML output, as a list of strings

identlist

The identifier list, as a list of strings

digitlist

The numeric constant list, as a list of strings

stringlist

The string list, as a list of strings; it stores the C string literals read from the program

keywords

The list of reserved words in C, plus a few from C++

The module contains the following functions:

isdigit(d)

Returns True if the input character is a digit, otherwise False.

ischar(c)

Determines whether the input character is an English letter, uppercase or lowercase.

isblank(b)

Determines whether the input character is blank. Blanks include spaces, tabs, line feeds, and carriage returns.

iskeyword(word)

Determines whether the input word is a reserved word. If so, its position in the keyword list is returned; otherwise False is returned.

wordfix()

The lexical analysis function, written according to the state transition diagram in the document. Because the program must output the source program with its formatting and comments preserved, the scanner reads one line at a time rather than one buffer at a time, and then recognizes and processes the line it has read.

The module also has a main section for unit testing.
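A lookup that returns either a list position or False, as described for iskeyword, has one pitfall in Python: position 0 compares equal to False. The sketch below (Python 3, with a shortened keyword list and hypothetical names keyword_index and is_keyword) shows the pitfall and one way around it.

```python
KEYWORDS = ['auto', 'break', 'case', 'int', 'return']  # shortened, illustrative list


def keyword_index(word):
    """Return the position of word in KEYWORDS, or False if it is absent."""
    try:
        return KEYWORDS.index(word)
    except ValueError:
        return False


def is_keyword(word):
    # 0 == False is true in Python, so test identity rather than equality.
    return keyword_index(word) is not False


print(keyword_index('auto') == False)  # True, although 'auto' is a keyword
print(is_keyword('auto'))              # True
```

A caller that writes "keyword_index(word) == False" would therefore classify the first keyword in the list as an ordinary identifier.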

5. Word Recognition Process

After each line is read, the blanks at the beginning of the line are consumed and written to the HTML document. The scanner then checks whether any other symbols follow. If not, the line is empty and processing continues with the next line. Otherwise, the first character read determines which case applies. If it is an English letter or an underscore, identifier recognition begins. The following characters of an identifier may be English letters, underscores, or digits; recognition stops at the first other character. Once the identifier is obtained, it is looked up in the reserved word table to determine whether it is a reserved word, and it is handled accordingly.
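The identifier rule above can be sketched as a small scanning function. This is a Python 3 illustration with a hypothetical name, scan_identifier; it assumes ASCII source text, since str.isalnum also accepts letters and digits outside ASCII.

```python
def scan_identifier(line, start):
    """Return (word, end) for the identifier that starts at line[start].

    end is the index just past the identifier. The caller is assumed to
    have checked that line[start] is a letter or an underscore.
    """
    end = start + 1
    # line[end:end + 1] is '' past the end of the line, so the loop
    # stops there without raising IndexError.
    while line[end:end + 1].isalnum() or line[end:end + 1] == '_':
        end += 1
    return line[start:end], end


print(scan_identifier('int x1_y = 3;', 4))
```

The returned word can then be looked up in the reserved word table to decide whether it is a keyword or an identifier.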

If the first character is a digit, number recognition begins. The following characters of a number may be digits or decimal points. The number is recognized and saved to the numeric constant list.
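The number rule can be sketched in the same way. The Python 3 function below, with the hypothetical name scan_number, implements exactly the rule as stated, so it also accepts malformed input such as 1.2.3; a stricter scanner would allow at most one decimal point.

```python
DIGITS = set('0123456789')


def scan_number(line, start):
    """Return (text, end) for the number that starts at line[start].

    Follows the rule as stated: a digit followed by any run of digits
    and decimal points. The caller is assumed to have checked that
    line[start] is a digit.
    """
    end = start + 1
    while end < len(line) and (line[end] in DIGITS or line[end] == '.'):
        end += 1
    return line[start:end], end


print(scan_number('x = 3.14;', 4))
```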

If the next characters are '//' or '#', the rest of the line is recognized as a single-line comment. A preprocessor line is displayed like a single-line comment in the HTML file, but it is treated differently during case conversion: it is not converted to uppercase. A single-line comment ends at the end of the line.

If the next characters are '/*', the scanner enters the multi-line comment state and searches the rest of the line for '*/'. If it is found, the scanner leaves the multi-line comment state and outputs the comment. Otherwise, when the next line is read, the scanner checks whether it is still in the multi-line comment state; if so, it searches that line for '*/'. If '*/' is found, the scanner leaves the comment state. In either case, the line is written to the HTML file as a comment.
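The state carried from one line to the next can be illustrated with a short Python 3 sketch. The function name mark_block_comments is hypothetical, and the sketch is simplified: it only reports which lines contain block-comment text, and it does not notice a '/*' that appears inside a string literal.

```python
def mark_block_comments(lines):
    """Return one boolean per line: True if the line holds block-comment text."""
    in_comment = False
    result = []
    for line in lines:
        if in_comment:
            result.append(True)
            if '*/' in line:
                in_comment = False
        elif '/*' in line:
            result.append(True)
            # Look for the terminator only after the opening marker.
            opened_at = line.index('/*')
            in_comment = '*/' not in line[opened_at + 2:]
        else:
            result.append(False)
    return result


sample = ['int a;', '/* start', 'middle', 'end */', 'int b;']
print(mark_block_comments(sample))
```

The single boolean in_comment plays the role of the multi-line comment state: it is the only information that survives from one line to the next.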

If a double quotation mark (") is read, string recognition begins. No character inside the string is recognized as any other lexical symbol. A string may not span lines.
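String recognition as described can be sketched as a search for the closing quotation mark on the same line. This Python 3 sketch uses a hypothetical name, scan_string, and, like the description, it does not handle escaped quotation marks (\") inside the literal.

```python
def scan_string(line, start):
    """Return (literal, end, closed) for the string starting at line[start].

    The caller is assumed to have checked that line[start] is a double
    quotation mark. If no closing mark is found on the line, the rest of
    the line is returned and closed is False.
    """
    close = line.find('"', start + 1)
    if close == -1:
        return line[start:], len(line), False
    return line[start:close + 1], close + 1, True


print(scan_string('printf("hi");', 7))
```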

Next comes the handling of double-character operators, which covers 12 commonly used operators.

Then comes the handling of single-character operators, which covers 22 commonly used operators.
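Testing double-character operators before single-character ones is a longest-match rule: '+=' must be tried before '+'. A minimal Python 3 sketch follows; the name scan_operator is hypothetical and both operator sets are illustrative subsets, not the report's exact lists of 12 and 22.

```python
DOUBLE_OPS = ('++', '--', '==', '!=', '+=', '-=', '*=', '/=', '&&', '||')  # illustrative subset
SINGLE_OPS = '+-*/%=<>&|^~?:;,()[]{}'  # illustrative subset


def scan_operator(line, start):
    """Return (operator, end), trying two-character operators first.

    Returns (None, start) if no operator begins at line[start].
    """
    two = line[start:start + 2]
    if two in DOUBLE_OPS:
        return two, start + 2
    one = line[start:start + 1]
    # Guard against '', which is a substring of every string.
    if one and one in SINGLE_OPS:
        return one, start + 1
    return None, start


print(scan_operator('a += 1', 2))
print(scan_operator('a + 1', 2))
```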

Finally, all remaining symbols are treated as other symbols and are still written to the output source text.

The exception handling in the program mainly deals with indexes that fall outside the string being read, which happens when the line ends unexpectedly. The end of the line must be handled differently depending on the preceding state.
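The out-of-range index arises because indexing a Python string past its end raises IndexError, whereas slicing returns an empty string. The Python 3 sketch below shows the difference; char_at and skip_blanks are hypothetical helper names used only for illustration.

```python
def char_at(line, pos):
    """Return line[pos], or '' when pos is past the end of the line."""
    return line[pos:pos + 1]


def skip_blanks(line, pos):
    """Return the index of the first non-blank character at or after pos."""
    while char_at(line, pos) in (' ', '\t', '\r', '\n'):
        pos += 1
    return pos


try:
    'ab'[5]
except IndexError:
    print('indexing past the end raises IndexError')

print(repr(char_at('ab', 5)))  # slicing past the end gives ''
print(skip_blanks('   ', 0))   # stops at the end of the line
```

A scanner built on slicing can treat the empty string as an end-of-line marker in every state, which reduces the number of separate end-of-line cases the exception handler has to cover.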

IV. Design Experience

In this course design, I put into practice the compilation principles I had just learned and the Python language, which I had been using for less than 20 days, and I gained some experience. First, the knowledge from the compilation principles course is very practical: it gave specific guidance for the programming, down to the choice of variables. Many tools from the course, such as state transition diagrams and finite automata, greatly reduce the difficulty of compiler design. The lexical analyzer examples in the textbook also helped me a lot.

In addition, I first came into contact with Python on the 25th of last month. It left a deep impression on me as a language that is comfortable to program in. Writing the lexical analyzer in Python was the only task among this semester's programming assignments that I completed ahead of schedule. In 11 hours of programming, I wrote and debugged about 460 lines of code. That time also includes writing the C language examples, handling the HTML output, and configuring the colors.

In this course design, apart from using Windows because of time constraints, all the software tools I used were open source, including Python 2.4.2, Vim 6.3, Notepad++ 3.4, GCC 3.3.3, and grep. This gave me confidence in working with open source software.

# setup.py
from distutils.core import setup
import py2exe

setup(console=["main.py", "html.py", "wordfix.py"])

# Run: python setup.py py2exe

# main.py
# -*- coding: gb2312 -*-

########## Imports ##########

import sys
import os
import html
import wordfix

########## Global variables ##########
# infile = ''
outfile = open('out.html', 'w')

########## Functions ##########
def openfile(filename):
    '''Return a file handle to read.

    Return False if opening failed, else return a file handle.'''
    if os.path.isfile(filename) == False:
        return False
    try:
        f = open(filename, 'r')
    except IOError, detail:
        # str(detail) starts with '[Errno 2]' when the file does not exist
        if str(detail)[1:8] == 'Errno 2':
            return False
        else:
            print detail
            return False
    else:
        return f

def showfile(filename):
    'Test a text file if it can show out'
    # print 'in showfile()'
    # f = open(filename, 'r')
    f = openfile(filename)
    if f == False:
        print 'file not found!'
        sys.exit()
    while True:
        line = f.readline()
        if line == '':
            break
        if line[-1] == '\n':
            line = line[0:len(line)-1]
        print line
    f.close()

##########

if __name__ == '__main__':
    # main()
    print 'main()'
    # print 'wordfix v 0.1'
    # print 'copyright @ 1999-2006, Harry gashero Liu.'
    if len(sys.argv) < 2:
        print 'not enough params, exit!'
        sys.exit()
    else:
        # print 'input file:', sys.argv[1]
        pass
    # showfile(sys.argv[1])
    # f.close()
    f = openfile(sys.argv[1])
    if f == False:
        print 'open file failed'
    else:
        # hand the opened files to the other modules, then run the analyzer
        wordfix.infile = f
        html.outfile = outfile
        html.writehead()
        wordfix.outfile = outfile
        tokenfile = open('token.txt', 'w')
        wordfix.tokenfile = tokenfile
        wordfix.wordfix()
        html.writetail()
        tokenfile.close()
        f.close()
        print 'end of program'
        print wordfix.identlist
        print wordfix.digitlist
        print wordfix.stringlist
        print wordfix.outlines
        # fff = open('othertxt.txt', 'w')
        # for x in wordfix.outlines:
        #     fff.write(x + '\n')
        # fff.close()
        # fff = open('list.txt', 'w')
        # for x in wordfix.identlist:
        #     fff.write(x + '\n')
        # fff.close()

# html.py
# -*- coding: gb2312 -*-

# The output file handle
outfile = ''

# The colours below follow the docstrings. The <font> element, the
# header markup and the tab width are assumed choices.

def writehead():
    "Write a HTML file's header"
    outfile.write('<html>\n<head>\n')
    outfile.write('<title>word fix result</title>\n')
    outfile.write('</head>\n<body>\n')

def writeline(line):
    "Write a HTML section to file"
    # <br> breaks the line in the page, \n breaks it in the HTML source
    outfile.write(fixmark(line) + '<br>\n')

def writeident(line):
    "Write ident in gray"
    outfile.write('<font color="gray">' + \
        fixmark(line) + '</font>')

def writekeyword(line):
    "Write keyword in green"
    outfile.write('<font color="green">' + \
        fixmark(line) + '</font>')

def writecomment(line):
    "Write comment in light blue"
    outfile.write('<font color="lightblue">' + \
        fixmark(line) + '</font>')

def writeconst(line):
    "Write const in red"
    outfile.write('<font color="red">' + \
        fixmark(line) + '</font>')

def writeoper(line):
    "Write operator in yellow"
    outfile.write('<font color="yellow">' + \
        fixmark(line) + '</font>')

def writetail():
    "Write a HTML file's tail"
    outfile.write('</body>\n</html>\n')
    outfile.close()

def fixmark(instr):
    'Fix space to HTML space'
    newc = ''
    for c in instr:
        if c == ' ':
            newc += '&nbsp;'
        elif c == '\t':
            newc += '&nbsp;' * 4    # tab width of 4 is an assumption
        elif c == '&':
            newc += '&amp;'
        elif c == '"':
            newc += '&quot;'
        elif c == '>':
            newc += '&gt;'
        elif c == '<':
            newc += '&lt;'
        elif c == '\n':
            newc += '<br>'
        else:
            newc += c
    return newc

# Unit testing
if __name__ == '__main__':
    f = open('test.html', 'w')
    outfile = f
    ##########
    writehead()
    writeident('python')
    writekeyword('int void shit')
    writeline('')
    writecomment('a comment then')
    writeconst('20140901')
    writeoper('** ++ -- new delete')
    writetail()    # also closes the output file

# wordfix.py
# -*- coding: gb2312 -*-

import html

##### Global variables #####
infile = ''        # input file for the source program; an opened file handle
outfile = ''       # output file for the HTML source; an opened file handle
tokenfile = ''     # output file for the word symbols; an opened file handle
outlines = []      # output list of strings holding the modified source code
htmllines = []     # output list of HTML strings
identlist = []     # identifier table
digitlist = []     # numeric constant table
stringlist = []    # string table
keywords = ['auto', 'break', 'case', 'char', 'continue', 'default',
            'do', 'double', 'else', 'entry', 'enum', 'extern', 'for',
            'float', 'goto', 'if', 'int', 'long', 'new', 'NULL', 'register',
            'return', 'short', 'signed', 'sizeof', 'static', 'struct',
            'switch', 'typedef', 'union', 'unsigned', 'void', 'while']

def isdigit(d):
    "Determine whether the input character is a digit"
    return d in ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

def ischar(c):
    "Determine whether the input character is an English letter, uppercase or lowercase"
    # the length check makes an empty string (end of line) return False
    return len(c) == 1 and (('a' <= c <= 'z') or ('A' <= c <= 'Z'))

def isblank(b):
    "Determine whether the input character is blank: space, tab, line feed or carriage return"
    return b in [' ', '\t', '\n', '\r']

def iskeyword(word):
    "Determine whether the identifier is a keyword: return its position, or False"
    # position 0 compares equal to False, so callers should test 'is False'
    try:
        nnn = keywords.index(word)
        return nnn
    except ValueError:
        return False

def wordfix():
    'word fix'
    newline = ''
    ch = ''
    start = 0
    nowpos = 0
    word = ''
    state = ''
    # initial
    html.outfile = outfile
    while True:
        line = infile.readline()
        if line == '':
            break
        if line[-1] == '\n':
            line = line[0:len(line)-1]
        # start processing the line
        newline = ''
        start = 0
        nowpos = 0
        word = ''
        print 'line:', line
        if state == 'multicomment':
            # inside a multi-line comment
            newline = line
            try:
                if line.index(r'*/') != -1:
                    state = ''
            except ValueError:
                # the multi-line comment terminator was not found
                state = 'multicomment'
            outlines.append(newline.upper())
            html.writecomment(newline)
            html.writeline('')
            continue
        else:
            state = ''
        while True:
            try:
                start = nowpos
                nowpos += 1
                ch = line[start]
                # print 'doing char:', ch, ':', ord(ch), 'nowpos=', nowpos
                while isblank(ch):
                    # consume all blanks
                    state = 'blank'
                    newline += ch
                    html.writecomment(ch)
                    start += 1
                    nowpos += 1
                    ch = line[start:nowpos]
                if ch == '':
                    html.writeline('')
                    break
                if ischar(ch) or ch == '_':
                    # recognize an identifier
                    state = 'ident'
                    nowpos += 1
                    ch = line[nowpos-1:nowpos]
                    # ch is '' at the end of the line, which ends the loop
                    while ch != '' and (ischar(ch) or isdigit(ch) or ch == '_'):
                        nowpos += 1
                        ch = line[nowpos-1:nowpos]
                    nowpos -= 1
                    word = line[start:nowpos]
                    # 'is False' rather than '== False': position 0 equals False
                    if iskeyword(word) is False:
                        # identifier
                        identlist.append(word)
                        newline += word
                        tokenfile.write('id\t\t' + word + '\n')
                        html.writeident(word)
                    else:
                        # keyword
                        newline += word.upper()
                        tokenfile.write('key\t\t' + word + '\n')
                        html.writekeyword(word)
                    start = nowpos
                    continue
                if isdigit(ch):
                    # recognize a numeric constant
                    state = 'digit'
                    nowpos += 1
                    ch = line[nowpos-1:nowpos]
                    while isdigit(ch) or ch == '.':
                        nowpos += 1
                        ch = line[nowpos-1:nowpos]
                    nowpos -= 1
                    word = line[start:nowpos]
                    digitlist.append(word)
                    newline += word
                    tokenfile.write('digit\t\t' + word + '\n')
                    html.writeconst(word)
                    start = nowpos
                    continue
                elif (line[start:start+2] == '//') or ch == '#':
                    # single-line comment; C preprocessor lines are treated the same way
                    state = 'singlecomment'
                    print 'a single comment'
                    word = line[start:]
                    if ch != '#':
                        newline += word.upper()
                    else:
                        newline += word
                    html.writecomment(word)
                    html.writeline('')
                    outlines.append(newline)
                    break
                elif line[start:start+2] == '/*':
                    # multi-line comment
                    state = 'multicomment'
                    print 'go into multi comment'
                    try:
                        # the terminator is on the same line
                        nowpos = line[start+1:].index('*/')
                        state = ''
                        nowpos += (start + 3)
                        word = line[start:]
                        newline += word.upper()
                        html.writecomment(word)
                        html.writeline('')
                        outlines.append(newline)
                        start = nowpos
                        break
                    except ValueError:
                        # the terminator is not on this line
                        state = 'multicomment'
                        word = line[start:]
                        newline += word.upper()
                        html.writecomment(word)
                        html.writeline('')
                        outlines.append(newline)
                        break
                elif ch == '"':
                    # recognize a string
                    state = 'string'
                    try:
                        nowpos = line[start+1:].index('"')
                        state = ''
                        nowpos += (start + 2)
                        word = line[start:nowpos]
                        newline += word
                        html.writeconst(word)
                        stringlist.append(word)
                        start = nowpos
                        continue
                    except ValueError:
                        # the end of the string was not found: an error, left unprocessed
                        state = ''
                        word = line[start:]
                        newline += word
                        html.writeconst(word)
                        html.writeline('')
                        outlines.append(newline)
                        break
                elif line[start:start+2] in ('++', '--', '==', '!=', '<=', '>=',
                                             '+=', '-=', '*=', '/=', '&&', '||'):
                    # 12 double-character operators; '<=' and '>=' are assumed
                    word = line[start:start+2]
                    newline += word
                    nowpos += 1
                    html.writeoper(word)
                    tokenfile.write('oper\t\t' + word + '\n')
                    continue
                elif ch in ('+', '-', '(', ')', '[', ']', '*', '/',
                            ',', '{', '}', '=', ';', '&', '%', '~',
                            '|', '^', '?', ':', '<', '>'):
                    # 22 single-character operators; '=' is assumed to be one of them
                    state = 'signal'
                    word = ch
                    newline += word
                    html.writeoper(word)
                    tokenfile.write('oper\t\t' + word + '\n')
                    continue
                else:
                    state = 'other sign'
                    newline += ch
                    # print 'doing char:', ch, ':', ord(ch), 'in else'
                    tokenfile.write('sign\t\t' + ch + '\n')
                    continue
            except IndexError:
                # reached the end of the line
                if state == 'blank':
                    # the line ended while blanks were being processed
                    outlines.append(newline)
                    # html.writecomment(newline)
                    break
                elif state == 'ident':
                    # the line ended after an identifier; emit any text left over
                    word = line[start:]
                    if word != '':
                        if iskeyword(word) is False:
                            # identifier
                            identlist.append(word)
                            newline += word
                            tokenfile.write('id:' + word + '\n')
                            html.writeident(word)
                        else:
                            # keyword
                            newline += word.upper()
                            tokenfile.write('key:' + word + '\n')
                            html.writekeyword(word)
                    outlines.append(newline)
                    html.writeline('')
                    break
                elif state == 'singlecomment':
                    print 'singlecomment here'
                    break
                elif state == 'digit':
                    # the line ended after a number; emit any text left over
                    word = line[start:]
                    if word != '':
                        newline += word
                        digitlist.append(word)
                        tokenfile.write('digit:' + word + '\n')
                        html.writeconst(word)
                    outlines.append(newline)
                    html.writeline('')
                    break
                else:
                    # html.writecomment(newline)
                    html.writeline('')
                    outlines.append(newline)
                    break

########## main() for unit testing ##########
if __name__ == "__main__":
    if isdigit('4'):
        print 'digit 4'
    if ischar('c'):
        print 'char c'
    if isblank(' '):
        print 'blank'
    if iskeyword('int'):
        print 'keyword int:', iskeyword('int')
    print 'end'

##################################################

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
