Contents
- 1. Lexical Analysis
- 2. Syntax Analysis
- 3. Message Body Structure Management
- 4. Generate intermediate code
- 5. Todo
The previous post explained how to dispatch messages automatically in C++. The key point is generating the msg_dispatcher template class automatically from an IDL file. Several readers pointed out that an IDL parser is hard to write, and they are right. My first version of the parser was only meant for a demo: I pieced together a few Python functions in one night. It could generate the msg_dispatcher class, but the parser code was so messy that it had no structure at all. Since I intend to develop, extend, and optimize the automatic message dispatch framework in depth, it deserves a proper parser, so I threw away the first version and reimplemented it, again in Python. Currently only the C++ code generator is complete, and it only supports decoding message bodies: encoding is not supported yet, and there is no syntax error reporting. It is not perfect, but it is a good start. This post describes how the parser is implemented.
Complete sample code: svn co http://ffown.googlecode.com/svn/trunk/fflib/lib/generator/
Demo IDL file: svn co http://ffown.googlecode.com/svn/trunk/fflib/lib/generator/example.idl
```
struct student_t
{
    struct book_t
    {
        int16 pages;
    };
    string age;
};
```
1. Lexical Analysis
One advantage of Python is that string processing is very convenient. The first step is to split the IDL source file into individual words. I defined a src_parser_t class for this; it works in the following steps:
1> Read the content of the IDL source file.
2> Split the content into lines with file_content_str.split('\n').
3> Split each line on spaces into words with split(' ').
4> Strip the trailing semicolon from a word.
The parsing code is as follows (about 80 lines):
```python
from pylib.inc import *  # provides struct_def_mgr_t, struct_def_t, field_def_t (see section 3)

# Built-in scalar types recognized by the IDL
BASIC_TYPES = ('int8', 'int16', 'int32', 'float', 'string')

class src_parser_t:
    def __init__(self, file):
        self.file = file
        self.struct_def_mgr = struct_def_mgr_t()
        self.file_content = ''
        self.all_words = []
        with open(file) as f:
            self.file_content = f.read()

    def get_struct_def_mgr(self):
        return self.struct_def_mgr

    def parse_to_words(self):
        # Split the content into lines, then each line into space-separated words
        for line in self.file_content.split('\n'):
            for w in line.split(' '):
                w = w.strip()
                if w != '':
                    self.all_words.append(w)

    def build_struct_relation(self):
        struct_stack = []
        index = 0
        while index < len(self.all_words):
            # struct_def_mgr always sits at the bottom of the stack
            if len(struct_stack) < 1:
                struct_stack.append(self.struct_def_mgr)
            parent_struct = struct_stack[-1]
            cur_word = self.all_words[index]
            if cur_word == 'struct':
                # The next word is the (possibly nested) message body name
                struct_def = struct_def_t(self.all_words[index + 1])
                parent_struct.add_struct(struct_def)
                struct_stack.append(struct_def)
                index += 1
            elif cur_word == '}' or cur_word == '};':
                # The current struct is complete
                struct_stack.pop()
            elif cur_word in BASIC_TYPES:
                field_name = self.all_words[index + 1].split(';')[0]
                parent_struct.add_field(field_def_t(field_name, cur_word, '', ''))
                index += 1
            elif 'dictionary' not in cur_word and '{' not in cur_word and 'array' not in cur_word:
                # A user-defined type, e.g. another struct
                field_name = self.all_words[index + 1].split(';')[0]
                parent_struct.add_field(field_def_t(field_name, cur_word, '', ''))
                index += 1
            elif 'array' in cur_word:
                # array<key_type>
                field_name = self.all_words[index + 1].split(';')[0]
                word_split = cur_word.split('<')
                field_type = word_split[0]
                key_type = word_split[1].split('>')[0]
                parent_struct.add_field(field_def_t(field_name, field_type, key_type, ''))
                index += 1
            elif 'dictionary' in cur_word:
                # dictionary<key_type,val_type>
                field_name = self.all_words[index + 1].split(';')[0]
                word_split = cur_word.split('<')
                field_type = word_split[0]
                key_val_type = word_split[1].split('>')[0].split(',')
                key_type, val_type = key_val_type[0], key_val_type[1]
                parent_struct.add_field(field_def_t(field_name, field_type, key_type, val_type))
                index += 1
            # Anything else (e.g. '{') is ignored
            index += 1

    def exe(self):
        self.parse_to_words()
        self.build_struct_relation()
```
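To see what the lexical pass produces, here is a minimal standalone version of parse_to_words applied to the demo IDL held in a string. The function name tokenize is hypothetical; it mirrors the split-on-newline, split-on-space logic above without reading a file.

```python
# Standalone sketch of the lexical pass in src_parser_t.parse_to_words.
# `tokenize` is a hypothetical helper name used only for illustration.

DEMO_IDL = """struct student_t
{
    struct book_t
    {
        int16 pages;
    };
    string age;
};
"""

def tokenize(content):
    """Split IDL text into words: first by newline, then by single spaces."""
    words = []
    for line in content.split('\n'):
        for w in line.split(' '):
            w = w.strip()  # removes leading/trailing whitespace such as '\t' or '\r'
            if w != '':
                words.append(w)
    return words

if __name__ == '__main__':
    print(tokenize(DEMO_IDL))
```

For the demo file this yields `['struct', 'student_t', '{', 'struct', 'book_t', '{', 'int16', 'pages;', '};', 'string', 'age;', '};']`. Note that the semicolon is still attached to field names such as `pages;`; build_struct_relation removes it later with `split(';')[0]`.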
2. Syntax Analysis
The syntax rules of the IDL file are very simple. Traverse all words and apply the following checks in order:
1> If the current word is struct, the next word is the name of a new message body. It may be a nested message body, but that needs no special handling: simply add the newly created message body object to the struct object at the top of the stack (struct_def_mgr is always at the bottom of the stack), then push the new message body onto the stack.
2> If it is int/string/float/array/dictionary (or a user-defined type), the next word is the field name. Add the new field object to the struct_def object at the top of the stack.
3> '}' means parsing of the current struct is complete: pop the struct_def object off the top of the stack.
4> Ignore all other words (such as '{').
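The four rules can be sketched independently of the classes in section 3. This simplified illustration uses plain dicts instead of struct_def_t and struct_def_mgr_t; the function name build_tree and the dict layout are hypothetical. For brevity, array<...> and dictionary<...> types are kept as raw type strings rather than split into key and value types.

```python
# Simplified sketch of the stack-based syntax pass (rules 1-4).
# Plain dicts stand in for struct_def_t / struct_def_mgr_t; `build_tree`
# and the dict layout are hypothetical, chosen only for illustration.

def new_struct(name):
    return {'name': name, 'structs': [], 'fields': []}

def build_tree(words):
    root = new_struct('')  # plays the role of struct_def_mgr
    stack = [root]
    i = 0
    while i < len(words):
        word = words[i]
        top = stack[-1]
        if word == 'struct':
            # Rule 1: the next word names a (possibly nested) struct
            s = new_struct(words[i + 1])
            top['structs'].append(s)
            stack.append(s)
            i += 1
        elif word in ('}', '};'):
            # Rule 3: the current struct is finished; never pop the root
            if len(stack) > 1:
                stack.pop()
        elif word == '{':
            pass  # Rule 4: ignored
        else:
            # Rule 2: a type followed by a field name ("pages;" -> "pages")
            top['fields'].append((words[i + 1].split(';')[0], word))
            i += 1
        i += 1
    return root

if __name__ == '__main__':
    words = ['struct', 'student_t', '{', 'struct', 'book_t', '{',
             'int16', 'pages;', '};', 'string', 'age;', '};']
    student = build_tree(words)['structs'][0]
    print(student['name'], student['fields'], student['structs'][0]['fields'])
```

Running it prints `student_t [('age', 'string')] [('pages', 'int16')]`: book_t ends up nested under student_t because it was pushed while student_t was on top of the stack.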
3. Message Body Structure Management
1> field_def_t describes a message body field: the field name, type, key_type, val_type, and the parent message body object. For example, for array<int>, key_type is int; for dictionary<int, string>, key_type is int and val_type is string.
2> struct_def_t describes a single message body: its name, the set of nested message bodies, and the set of field objects.
3> struct_def_mgr_t maintains the set of all top-level message bodies.
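The key_type/val_type convention in item 1 can be illustrated with a small helper that decomposes a type word into the three values field_def_t stores. split_type is a hypothetical name, not part of the original code; like the parser, it assumes there are no spaces inside the angle brackets. For the simple cases in item 1 it extracts the same values as build_struct_relation.

```python
# Decompose an IDL type word into (type, key_type, val_type), following the
# convention of field_def_t. `split_type` is a hypothetical helper.

def split_type(word):
    if '<' not in word:
        return word, '', ''                 # scalar or user-defined type
    field_type, inner = word.split('<', 1)
    inner = inner.rsplit('>', 1)[0]         # drop the closing '>'
    if field_type == 'dictionary':
        key_type, val_type = inner.split(',', 1)
        return field_type, key_type, val_type
    return field_type, inner, ''            # array<T>: T is stored as key_type

if __name__ == '__main__':
    for w in ('int16', 'array<int>', 'dictionary<int,string>'):
        print(w, split_type(w))
```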
The code is self-explanatory:
```python
class field_def_t:
    def __init__(self, name, type_, key_type, val_type_):
        self.name = name
        self.parent = None
        self.type = type_
        self.key_type = key_type    # element type for array, key type for dictionary
        self.val_type = val_type_   # value type for dictionary
    def get_name(self):
        return self.name
    def get_parent(self):
        return self.parent
    def set_parent(self, p):
        self.parent = p
    def get_type(self):
        return self.type
    def get_key_type(self):
        return self.key_type
    def get_val_type(self):
        return self.val_type
    def dump(self, prefix=''):
        print(prefix, self.name, self.type, self.key_type, self.val_type)

class struct_def_t:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.all_fields = {}   # field name -> field_def_t
        self.sub_struct = []   # nested struct_def_t objects
    def get_name(self):
        return self.name
    def get_parent(self):
        return self.parent
    def set_parent(self, parent):
        self.parent = parent
    def add_field(self, field_def_):
        self.all_fields[field_def_.get_name()] = field_def_
        field_def_.set_parent(self)
    def add_struct(self, struct_def_):
        self.sub_struct.append(struct_def_)
        struct_def_.set_parent(self)
    def get_all_struct(self):
        return self.sub_struct
    def get_all_field(self):
        return self.all_fields
    def has_field(self, name):
        return name in self.all_fields
    def dump(self, prefix=''):
        print(prefix, self.name, 'include struct:')
        for sub in self.sub_struct:
            sub.dump(prefix + '  ')
        print(prefix, self.name, 'include fields:')
        for field in self.all_fields.values():
            field.dump(prefix + '  ')

class struct_def_mgr_t:
    def __init__(self):
        self.all_struct = {}   # top-level struct name -> struct_def_t
    def get_name(self):
        return ''
    def add_struct(self, struct_def_):
        self.all_struct[struct_def_.get_name()] = struct_def_
        struct_def_.set_parent(None)
    def get_all_struct(self):
        return self.all_struct
    def get_struct(self, name):
        return self.all_struct[name]
    def get_parent(self):
        return ''
    def dump(self):
        for s in self.all_struct.values():
            s.dump()
```
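A code generator typically walks this hierarchy recursively. The sketch below uses minimal stand-ins, Struct and Field (hypothetical, so the block runs on its own), that expose the same get_name/get_type/get_all_field/get_all_struct accessors as the classes above, and collects each struct's C++-style qualified name together with its fields.

```python
# Recursive walk over a struct hierarchy shaped like struct_def_t.
# Struct and Field are minimal hypothetical stand-ins exposing the same
# accessor names, so this block runs without the classes above.

class Field:
    def __init__(self, name, type_):
        self.name, self.type = name, type_
    def get_name(self):
        return self.name
    def get_type(self):
        return self.type

class Struct:
    def __init__(self, name, fields=(), subs=()):
        self.name = name
        self.all_fields = {f.get_name(): f for f in fields}
        self.sub_struct = list(subs)
    def get_name(self):
        return self.name
    def get_all_field(self):
        return self.all_fields
    def get_all_struct(self):
        return self.sub_struct

def collect(struct_def, prefix=''):
    """Return [(qualified_name, [(field_name, type), ...]), ...] depth-first."""
    qualified = prefix + struct_def.get_name()
    fields = [(f.get_name(), f.get_type())
              for f in struct_def.get_all_field().values()]
    result = [(qualified, fields)]
    for sub in struct_def.get_all_struct():
        result.extend(collect(sub, qualified + '::'))
    return result

if __name__ == '__main__':
    book = Struct('book_t', [Field('pages', 'int16')])
    student = Struct('student_t', [Field('age', 'string')], [book])
    for name, fields in collect(student):
        print(name, fields)
```

To start from struct_def_mgr_t, iterate over `get_all_struct().values()`: the manager stores top-level structs in a dict keyed by name, while struct_def_t keeps nested structs in a list.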
4. Generate intermediate code
This is implemented by the code_generator_t class. Since other languages will need to be supported in the future, code generation uses the strategy pattern. Currently only the C++ code_generator is implemented; supporting another language only requires writing a corresponding code_generator_t class.
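The strategy-pattern arrangement might look roughly like the following. This is a hypothetical sketch, not the actual code_generator_t from the repository: the base class, cpp_code_generator_t, generate, the GENERATORS registry, and the IDL-to-C++ type map are all illustrative assumptions. It only emits a plain struct declaration, not the JSON decode logic the real generator produces.

```python
# Hypothetical sketch of a strategy-pattern code generator.
# None of these names come from the original repository.

class code_generator_base_t:
    """Strategy interface: turn one struct description into source text."""
    def generate(self, name, fields):
        raise NotImplementedError

class cpp_code_generator_t(code_generator_base_t):
    # Illustrative IDL -> C++ type mapping (an assumption)
    TYPE_MAP = {'int8': 'int8_t', 'int16': 'int16_t', 'int32': 'int32_t',
                'float': 'float', 'string': 'std::string'}

    def generate(self, name, fields):
        lines = ['struct %s' % name, '{']
        for field_name, field_type in fields:
            # Unknown types (e.g. user-defined structs) pass through unchanged
            cpp_type = self.TYPE_MAP.get(field_type, field_type)
            lines.append('    %s %s;' % (cpp_type, field_name))
        lines.append('};')
        return '\n'.join(lines)

# Supporting another language means adding one class and one entry here.
GENERATORS = {'cpp': cpp_code_generator_t}

def generate_code(lang, name, fields):
    return GENERATORS[lang]().generate(name, fields)

if __name__ == '__main__':
    print(generate_code('cpp', 'book_t', [('pages', 'int16')]))
```

The caller only picks a generator by language name; the parsing side never needs to know which language is being emitted.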
5. Todo
1> Struct message bodies can only be decoded from JSON; encoding to JSON, decoding from binary, and encoding to binary are not supported.
2> Syntax error reporting is unfriendly.
3> Multi-language support.