Using Python to Design an HTML-Based C Language Syntax Highlighting Program

Compiler Principles Course Design Report
1st semester of the school year
Class: 02 (II)
Student ID: 19
Name: Liu Xiaoming
Score:
Instructor: Lu Chaohui

I. Design Purpose

Deepen understanding of compiler principles, strengthen hands-on practice and program development skills, and improve the ability to analyze and solve problems.

II. Design Tasks

1. Word recognition
    C language constants
    C language identifiers
2. Text processing of the program
    Convert all C comments to uppercase
    Convert all C reserved words to uppercase
3. Recursive descent analysis

III. Design Process

1. Overall Design

The program reads a C language source file, performs lexical analysis on it, and outputs an HTML file in which the recognized tokens are highlighted for display. It also outputs the word symbol table. The generated HTML file is named out.html, and the word symbol table file is named token.txt.

To run the program, enter the dist folder and run:

    main *.c

Replace * with the name of the C source file. The suffix is normally .c, but other suffixes also work. To start with the default settings, double-click run.bat in the dist directory, which analyzes sample.c by default.

The program is divided into three modules: the html module handles the details of HTML file generation; the wordfix module performs the lexical analysis; the main module provides file I/O and overall program control.

2. html.py

Implements the details of HTML file generation. It includes the following functions:

writehead()
    Generates the HTML file header.
writeline(line)
    Outputs some data to the HTML file and appends two kinds of line break, one for the HTML source file and one ("<br>") for the displayed HTML page.
writeident(line)
    Outputs an identifier to the HTML file.
writekeyword(line)
    Outputs a keyword to the HTML file.
writecomment(line)
    Outputs comments and preprocessor lines to the HTML file.
writeconst(line)
    Outputs constants, such as numeric constants and strings, to the HTML file.
writeoper(line)
    Outputs operators to the HTML file.
writetail()
    Writes the end of the HTML file and closes the file.
fixmark(instr)
    Because the browser cannot display some special characters directly, they must be converted to other strings before being written to the HTML file. The fixmark function provides this conversion. The characters converted include whitespace and &, ", >, and <.

The html module provides a main section for unit testing. The module has only one member, outfile, which globally stores the output HTML file handle.

3. main.py

Provides program startup and file I/O operations. It includes the following functions:

openfile(filename)
    Opens a file with error handling. If the file is opened successfully, the file handle is returned; if opening fails, False is returned.
showfile(filename)
    Provides a file-opening test and displays the file contents.

The main section opens the input file, sets the global variables of each module, and starts the lexical analyzer.

4. wordfix.py

The lexical analysis module contains the following members:

infile
    Source file used for reading; the input uses a subset of the C language.
outfile
    HTML file handle for output; it is not used directly in this module.
tokenfile
    File handle used to output the word symbols.
outlines
    Output source code after case conversion; a list of strings.
htmllines
    Output of the converted HTML file; a list of strings.
identlist
    Identifier list; a list of strings.
digitlist
    Numeric constant list; a list of strings.
stringlist
    String list; a list of strings that stores the C language strings read from the program.
keywords
    List of C language reserved words, plus a few from C++.

The module contains the following functions:

isdigit(d)
    Returns True if the input character is a digit, and False otherwise.
ischar(c)
    Determines whether the input character is an English letter, uppercase or lowercase.
isblank(b)
    Determines whether the input character is whitespace. Whitespace includes spaces, tabs, line feeds, and carriage returns.
iskeyword(word)
    Determines whether the input word is a reserved word. If it is, its position in the reserved word list is returned; if not, False is returned.
wordfix()
    The lexical analysis function, written according to the state transition diagram in the documentation. Because the program must output a formatted source program with its comments retained, the scanner does not read a fixed-size buffer each time but one line at a time, and then recognizes and processes the line it has read.

The module also provides a main section for unit testing.

5. Word Recognition Process

After each line is read, the whitespace at the beginning of the line is skipped and written to the HTML document. The program then checks whether anything else remains on the line. If not, the line is empty and processing continues with the next line. If there is content, there are several cases, decided by the first character read.

If the first character is an English letter or an underscore, identifier recognition begins. The following characters of an identifier may be English letters, underscores, and digits; recognition stops at the first other character. After the identifier is obtained, the reserved word table is searched to determine whether it is a reserved word, and the two cases are processed separately.

If the first character is a digit, number recognition begins. The following characters of a number may be digits or a decimal point. The number is recognized and saved to the numeric constant list.

If the next characters are '//' or '#', the text is recognized as a single-line comment.
A line starting with '#' (a preprocessor directive) is also displayed as a single-line comment in the HTML file, but it is treated differently during case conversion: it is not converted to uppercase. A single-line comment ends the processing of the line.

If the next characters are '/*', the program enters the multi-line comment state and searches the rest of the line for the string '*/'. If it is found, the program leaves the multi-line comment state. In either case the comment is then output. Each time a new line is read, the program first checks whether it is still in the multi-line comment state. If it is, it continues searching for '*/'. If the string is found, the comment state is left; whether or not it is found, the line is output to the HTML file as a comment.

If '"' (a double quotation mark) is read, string recognition begins. No character inside the string is recognized as any other lexical symbol. A string may not continue across a line break.

Next comes the processing of two-character operators, covering 12 commonly used operators, and then the processing of single-character operators, covering 22 commonly used operators.

Finally, every symbol not matched above is treated as an "other" symbol and is still written to the output source.

The exception handling in the program mainly deals with indexes that fall outside the range of the line string being read. The cause is that the end of the line is reached unexpectedly. Different end-of-line handling is needed depending on the state the scanner was in.

IV. Design Experience

In this course design I put into practice the compiler principles I had just learned and the Python language, which I had been using for less than 20 days, and I gained some experience from it. First of all, the knowledge from the compiler principles course is very practical; it gave concrete guidance for the programming, down to the choice of variables. Many tools from the course, such as state transition diagrams and finite automata, greatly reduce the difficulty of compiler design.
The lexical analyzer examples in the textbook also helped me a lot. In addition, I first came into contact with Python on the 25th of last month. It left a deep impression on me as a language that is comfortable to program in. Writing the lexical analyzer in Python was the only case, among the several programming assignments this semester, in which I finished ahead of schedule. In about 11 hours of programming I wrote and debugged around 460 lines of code. This time also includes writing the C language example and handling the HTML output and color configuration. In this course design, apart from using Windows because of time constraints, all the software tools I used were open-source software, including Python 2.4.2, Vim 6.3, Notepad++ 3.4, GCC 3.3.3, and grep. This gave me confidence in working with open-source software.

# setup.py
from distutils.core import setup
import py2exe

setup(console=["main.py", "html.py", "wordfix.py"])
# Run: python setup.py py2exe

# main.py
# -*- coding: gb2312 -*-
##### imports #####
import sys
import os
import html
import wordfix

########## global variables ##########
#infile = ''
outfile = open('out.html', 'w')

########## functions ##########
def openfile(filename):
    """Return a file handle to read.
    If return a false then failed,
    else return a file handle."""
    if os.path.isfile(filename) == False:
        return False
    try:
        f = open(filename, 'r')
    except IOError, detail:
        # errno 2: no such file or directory
        if detail.errno == 2:
            return False
        else:
            print detail
            return False
    else:
        return f

def showfile(filename):
    'Test a text file if it can show out'
    #print 'in showfile()'
    #f = open(filename, 'r')
    f = openfile(filename)
    if f == False:
        print 'file not found!'
        sys.exit()
    while True:
        line = f.readline()
        if line == '':
            break
        if line[-1] == '\n':
            line = line[0:len(line)-1]
        print line
    f.close()

##########
if __name__ == '__main__':
    #main()
    print 'main()'
    #print 'wordfix v 0.1'
    #print 'copyright @ 1999-2006, Harry gashero Liu.'
    if len(sys.argv) < 2:
        print 'not enough params, exit!'
        sys.exit()
    else:
        #print 'input file:', sys.argv[1]
        pass
    #showfile(sys.argv[1])
    #f.close()
    f = openfile(sys.argv[1])
    if f == False:
        print 'open file failed'
    else:
        wordfix.infile = f
        html.outfile = outfile
        html.writehead()
        wordfix.outfile = outfile
        tokenfile = open('token.txt', 'w')
        wordfix.tokenfile = tokenfile
        wordfix.wordfix()
        html.writetail()
        print 'end of program'
        print wordfix.identlist
        print wordfix.digitlist
        print wordfix.stringlist
        print wordfix.outlines
        #fff = open('othertxt.txt', 'w')
        #for x in wordfix.outlines:
        #    fff.write(x + '\n')
        #fff.close()
        #fff = open('list.txt', 'w')
        #for x in wordfix.identlist:
        #    fff.write(x + '\n')
        #fff.close()

# html.py
# -*- coding: gb2312 -*-
# The output file handle
outfile = ''

def writehead():
    "Write a HTML file's header"
    # The exact tag strings were damaged in this listing; this is a
    # minimal header that matches the </body> written by writetail().
    outfile.write('<html>\n<head>\n')
    outfile.write('<title>word fix result</title>\n')
    outfile.write('</head>\n<body>\n')

def writeline(line):
    "Write a HTML section to file"
    outfile.write(fixmark(line) + '<br>\n')

def writeident(line):
    "Write ident in blue"
    outfile.write('<font size=4 color="#0000ff">' + \
        fixmark(line) + '</font>')

def writekeyword(line):
    "Write keyword in green"
    outfile.write('<font size=4 color="#00dd00"><b>' + \
        fixmark(line) + '</b></font>')

def writecomment(line):
    "Write comment in magenta"
    outfile.write('<font size=4 color="#ff00ff">' + \
        fixmark(line) + '</font>')

def writeconst(line):
    "Write const in red"
    outfile.write('<font size=4 color="#ff0000">' + \
        fixmark(line) + '</font>')

def writeoper(line):
    "Write operator in bold black"
    outfile.write('<font size=4 color="#000000"><b>' + \
        fixmark(line) + '</b></font>')

def writetail():
    "Write a HTML file's tail"
    outfile.write('</body>\n</html>\n')
    outfile.close()

def fixmark(instr):
    'Fix space to HTML space'
    newc = ''
    for c in instr:
        if c == ' ':
            newc += '&nbsp;'
        elif c == '\t':
            # Tab width of four spaces is an assumption; the original
            # entity string was lost in this listing.
            newc += '&nbsp;&nbsp;&nbsp;&nbsp;'
        elif c == '&':
            newc += '&amp;'
        elif c == '"':
            newc += '&quot;'
        elif c == '>':
            newc += '&gt;'
        elif c == '<':
            newc += '&lt;'
        elif c == '\n':
            newc += '<br>'
        else:
            newc += c
    return newc

# Unit testing
if __name__ == '__main__':
    f = open('test.html', 'w')
    outfile = f
    ##########
    writehead()
    writeident('python')
    writekeyword('int void shit')
    writeline('')
    writecomment('a comment then')
    writeconst('20140901')
    writeoper('** ++ -- new delete')
    writetail()
    f.close()

# wordfix.py
# -*- coding: gb2312 -*-
import html

##### global variables #####
infile = ''       # input file, the source program; an opened file handle
outfile = ''      # output file for the HTML source; an opened file handle
tokenfile = ''    # word symbol output file for lexical analysis; an opened file handle
outlines = []     # output string list holding the modified source code
htmllines = []    # output HTML string list
identlist = []    # output identifier table
digitlist = []    # output constant table
stringlist = []   # output string table
keywords = ['auto', 'break', 'case', 'char', 'continue', 'default',
    'do', 'double', 'else', 'entry', 'enum', 'extern', 'for',
    'float', 'goto', 'if', 'int', 'long', 'new', 'NULL', 'register',
    'return', 'short', 'signed', 'sizeof', 'static', 'struct',
    'switch', 'typedef', 'union', 'unsigned', 'void', 'while']

def isdigit(d):
    "Judge whether the input character is a digit"
    if d in ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']:
        return True
    else:
        return False

def ischar(c):
    "Judge whether the input character is an English letter, uppercase or lowercase"
    # String comparison also copes with the empty string that a slice
    # returns at the end of a line (ord('') would raise TypeError).
    if ('a' <= c <= 'z') or ('A' <= c <= 'Z'):
        return True
    else:
        return False

def isblank(b):
    "Judge whether the input character is blank: space, tab, line feed, or carriage return"
    if b in [' ', '\t', '\n', '\r']:
        return True
    else:
        return False

def iskeyword(word):
    "Judge whether the input identifier is a keyword"
    try:
        nnn = keywords.index(word)
        return nnn
    except ValueError:
        return False

def wordfix():
    'word fix'
    newline = ''
    ch = ''
    start = 0
    nowpos = 0
    word = ''
    state = ''
    # initial
    html.outfile = outfile
    while True:
        line = infile.readline()
        if line == '':
            break
        if line[-1] == '\n':
            line = line[0:len(line)-1]
        # start process
        newline = ''
        start = 0
        nowpos = 0
        word = ''
        print 'line:', line
        if state == 'multicomment':
            # In the multi-line comment state
            newline = line
            try:
                if line.index(r'*/') != -1:
                    state = ''
            except ValueError:
                # The multi-line comment terminator was not found
                state = 'multicomment'
            outlines.append(newline.upper())
            html.writecomment(newline)
            html.writeline('')
            continue
        else:
            state = ''
        while True:
            try:
                start = nowpos
                nowpos += 1
                ch = line[start]
                #print 'doing char:', ch, ':', ord(ch), 'nowpos=', nowpos
                while isblank(ch):
                    # Skip all whitespace
                    state = 'blank'
                    newline += ch
                    html.writecomment(ch)
                    start += 1
                    nowpos += 1
                    ch = line[start:nowpos]
                if ch == '':
                    # Only whitespace was left on the line
                    outlines.append(newline)
                    html.writeline('')
                    break
                if ischar(ch) or ch == '_':
                    # Recognize an identifier
                    state = 'ident'
                    nowpos += 1
                    ch = line[nowpos-1:nowpos]
                    while ischar(ch) or isdigit(ch) or ch == '_':
                        nowpos += 1
                        ch = line[nowpos-1:nowpos]
                        #if ch == '':
                        #    break
                    nowpos -= 1
                    word = line[start:nowpos]
                    # "is False" so that index 0 ('auto') still counts as a keyword
                    if iskeyword(word) is False:
                        # Identifier
                        identlist.append(word)
                        newline += word
                        tokenfile.write('id\t\t' + word + '\n')
                        html.writeident(word)
                    else:
                        # Keyword
                        newline += word.upper()
                        tokenfile.write('key\t\t' + word + '\n')
                        html.writekeyword(word)
                    start = nowpos
                    continue
                #========================================
                if isdigit(ch):
                    # Recognize a numeric constant
                    state = 'digit'
                    nowpos += 1
                    ch = line[nowpos-1:nowpos]
                    while isdigit(ch) or ch == '.':
                        nowpos += 1
                        ch = line[nowpos-1:nowpos]
                        #if ch == '':
                        #    break
                    nowpos -= 1
                    word = line[start:nowpos]
                    digitlist.append(word)
                    newline += word
                    tokenfile.write('digit\t\t' + word + '\n')
                    html.writeconst(word)
                    start = nowpos
                    continue
                #========================================
                elif (line[start:start+2] == '//') or ch == '#':
                    # Single-line comment. C preprocessor lines are also
                    # treated as single-line comments.
                    state = 'singlecomment'
                    print 'a single comment'
                    word = line[start:]
                    if ch != '#':
                        newline += word.upper()
                    else:
                        newline += word
                    html.writecomment(word)
                    html.writeline('')
                    outlines.append(newline)
                    break
                #========================================
                elif line[start:start+2] == '/*':
                    # Multi-line comment
                    state = 'multicomment'
                    print 'go into multi comment'
                    try:
                        # The terminator is on this line. Search after the
                        # opening '/*' so that '/*/' is not taken as closed.
                        nowpos = line[start+2:].index('*/')
                        state = ''
                        nowpos += (start + 4)
                        word = line[start:]
                        newline += word.upper()
                        html.writecomment(word)
                        html.writeline('')
                        outlines.append(newline)
                        start = nowpos
                        break
                    except ValueError:
                        # The end of the comment is not on this line
                        state = 'multicomment'
                        word = line[start:]
                        newline += word.upper()
                        html.writecomment(word)
                        html.writeline('')
                        outlines.append(newline)
                        break
                #========================================
                elif ch == '"':
                    # Recognize a string
                    state = 'string'
                    try:
                        nowpos = line[start+1:].index('"')
                        state = ''
                        nowpos += (start + 2)
                        word = line[start:nowpos]
                        newline += word
                        html.writeconst(word)
                        stringlist.append(word)
                        start = nowpos
                        continue
                    except ValueError:
                        # The end of the string was not found. This is an
                        # error in the source and is not processed further.
                        state = ''
                        word = line[start:]
                        newline += word
                        html.writeconst(word)
                        html.writeline('')
                        outlines.append(newline)
                        break
                elif line[start:start+2] in ['++', '--', '==', '!=',
                        '<<', '>>', '+=', '-=', '*=', '/=', '&&', '||']:
                    # Two-character operator
                    word = line[start:start+2]
                    newline += word
                    nowpos += 1
                    html.writeoper(word)
                    tokenfile.write('signal\t\t' + word + '\n')
                    continue
                elif ch in ['+', '-', '(', ')', '[', ']', '*', '/',
                        ',', '{', '}', ';', '&', '%', '~',
                        '|', '^', '?', ':', '<', '>']:
                    # Single-character operator
                    state = 'signal'
                    word = ch
                    newline += word
                    html.writeoper(word)
                    tokenfile.write('signal\t\t' + word + '\n')
                    continue
                else:
                    state = 'other sign'
                    newline += ch
                    #print 'doing char:', ch, ':', ord(ch), 'in else'
                    tokenfile.write('sign\t\t' + ch + '\n')
                    continue
            except IndexError:
                # Read past the end of the line
                if state == 'blank':
                    # End of line reached while skipping whitespace
                    outlines.append(newline)
                    #html.writecomment(newline)
                    break
                elif state == 'ident':
                    # End of line reached while processing an identifier.
                    # The slice is empty if the token was already output.
                    word = line[start:]
                    if word != '':
                        newline += word
                        if iskeyword(word) is False:
                            # Identifier
                            identlist.append(word)
                            tokenfile.write('id:' + word + '\n')
                            html.writeident(word)
                        else:
                            # Keyword
                            tokenfile.write('key:' + word + '\n')
                            html.writekeyword(word)
                    outlines.append(newline)
                    html.writeline('')
                    break
                elif state == 'singlecomment':
                    print 'singlecomment here'
                    break
                elif state == 'digit':
                    # End of line reached while recognizing a number
                    word = line[start:]
                    if word != '':
                        newline += word
                        digitlist.append(word)
                        tokenfile.write('digit:' + word + '\n')
                        html.writeconst(word)
                    outlines.append(newline)
                    html.writeline('')
                    break
                else:
                    #html.writecomment(newline)
                    html.writeline('')
                    outlines.append(newline)
                    break

########## main() for unit testing ##########
if __name__ == "__main__":
    if isdigit('4'):
        print 'digit 4'
    if ischar('c'):
        print 'char c'
    if isblank(' '):
        print 'blank'
    if iskeyword('int'):
        print 'keyword int:', iskeyword('int')
    print 'end'
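
For readers who want to try the idea on a current interpreter, the recognition order described in section III.5 (identifier, number, single-line comment, string, two-character operator, single-character operator, other) can be sketched in Python 3 roughly as follows. This is only an illustrative sketch under stated assumptions, not the program listed above: scan_line, SKETCH_KEYWORDS, TWO_CHAR_OPS, ONE_CHAR_OPS, and the kind labels are hypothetical names, the reserved word set is abbreviated, and multi-line comments, case conversion, and HTML output are left out.

```python
import string

# Hypothetical names for this sketch; the reserved word set is abbreviated.
SKETCH_KEYWORDS = {"int", "char", "if", "else", "for", "while", "return", "void"}
TWO_CHAR_OPS = {"++", "--", "==", "!=", "<<", ">>", "+=", "-=", "*=", "/=", "&&", "||"}
ONE_CHAR_OPS = set("+-()[]*/,{};&%~|^?:<>")


def scan_line(line):
    """Split one line of C source into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(line):
        ch = line[pos]
        if ch in " \t\r\n":
            # Whitespace separates tokens and is skipped in this sketch.
            pos += 1
            continue
        if ch in string.ascii_letters or ch == "_":
            # Identifier or reserved word: letters, digits, underscores.
            end = pos + 1
            while end < len(line) and (
                line[end] in string.ascii_letters
                or line[end] in string.digits
                or line[end] == "_"
            ):
                end += 1
            kind = "key" if line[pos:end] in SKETCH_KEYWORDS else "id"
        elif ch in string.digits:
            # Numeric constant: digits and decimal points.
            end = pos + 1
            while end < len(line) and (line[end] in string.digits or line[end] == "."):
                end += 1
            kind = "digit"
        elif line.startswith("//", pos) or ch == "#":
            # Single-line comment or preprocessor line: rest of the line.
            end = len(line)
            kind = "comment"
        elif ch == '"':
            # String: up to the closing quote, or the rest of the line.
            close = line.find('"', pos + 1)
            end = len(line) if close == -1 else close + 1
            kind = "string"
        elif line[pos:pos + 2] in TWO_CHAR_OPS:
            end = pos + 2
            kind = "oper"
        elif ch in ONE_CHAR_OPS:
            end = pos + 1
            kind = "oper"
        else:
            end = pos + 1
            kind = "other"
        tokens.append((kind, line[pos:end]))
        pos = end
    return tokens


for kind, text in scan_line("int x1 = 3.14; // pi"):
    print(kind, text)
```

Checking the bounds with "end < len(line)" before indexing avoids the need for an end-of-line exception handler, which is one possible alternative to the state-dependent handling used in wordfix().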