A few days ago I looked at tinyxml, an open-source XML parser. I didn't understand how it did its parsing, and since I haven't been busy lately, I decided to implement one myself. I'm calling it txml for now. The parsing and query functions are done, and the whole code base is under 1,000 lines. It will be improved further, and the source code will be shared.
Let me briefly explain my approach:
1: Read the XML file and store its contents in a character array;
2: Traverse the array and parse it into a tree;
3: Query the tree by path and by attribute.
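Step 1 can look something like the sketch below, which reads a whole file into one contiguous buffer. `readFile` is a hypothetical helper, not part of txml. A parser can then walk `buffer.c_str()` as a `const char*`.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper (not txml's API): read the entire file into one string.
// Binary mode keeps the bytes exactly as stored, including "\r\n" line ends.
std::string readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    // istreambuf_iterator copies every byte, whitespace and newlines included.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Reading everything up front keeps the parser simple: it only ever moves a pointer forward through memory and never touches the file again.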
The most troublesome part of this parser is turning the character array into a tree. First, look at a simple XML file. It contains a file header, nodes, node names and node values, attribute names and attribute values, child nodes, parent nodes, comments, and so on.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- comment -->
<items>
    <item name="chentaihan">89757</item>
</items>
```
Explaining the parsing implementation clearly is not easy, and the code may be easier to follow. The parser is recursive. Each call parses from the start of a node, that is, from the character '<' to the matching '>'.

The '<' is followed by the node name and then the node's attributes. After the closing '>', if the next character is not '<', the text that follows is the node's value. If it is '<', it is either a child node or the end tag of the current node. Each '<' starts a new recursive call, and whitespace and comments are skipped. The code is as follows:
```cpp
// Recursive parse: each call continues from p inside baseNode.
const char* TxmlParser::ParseContent(const char* p, XmlNode* baseNode) {
    if (p == NULL || !*p) return NULL;
    if (*p == '<') { // Start of a node
        bool isNote;
        p = SkipNote(p, isNote); // Skip comments
        if (isNote) { // It was a comment: continue with the same parent
            ParseContent(p, baseNode);
            return NULL;
        }
        if (*p == '/') { // End tag: move past '>' and continue in the parent
            while (*p && *p != '>') { p++; }
            if (*p) ++p;
            p = SkipWhiteSpace(p);
            ParseContent(p, baseNode->parent); // Next node
        } else { // Node name, then attributes
            string name;
            while (*p && *p != '>' && *p != ' ' && *p != '/') { name.push_back(*p++); }
            XmlNode* node = new XmlNode(name, baseNode);
            baseNode->AppendNode(node);
            if (*p == '>') { // No attributes: parse the new node's content
                ++p;
                p = SkipWhiteSpace(p);
                ParseContent(p, node);
            } else {
                p = GetAttr(p, node);
                if (*p == '/') { // Self-closing "/>": continue with the same parent
                    while (*p && *p != '<') p++;
                    ParseContent(p, baseNode);
                } else { // Skip '>' and parse the new node's content
                    ++p;
                    p = SkipWhiteSpace(p);
                    ParseContent(p, node);
                }
            }
        }
    } else { // Node value
        GetNodeValue(p, baseNode);
    }
    return NULL; // every path now returns; falling off a non-void function is undefined behavior
}
```
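The function above relies on txml helpers (`SkipNote`, `SkipWhiteSpace`, `GetAttr`, `GetNodeValue`) whose code is not shown. For a runnable reference, here is a self-contained sketch of a recursive parser in a more conventional recursive-descent style. Each call parses one element and returns when it reaches that element's end tag, rather than recursing into the parent the way `ParseContent` does.

`Node`, `parseElement` and `parseDocument` are hypothetical names. The sketch assumes well-formed input and ignores entities, CDATA and DOCTYPE.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type for this sketch (not txml's XmlNode).
struct Node {
    std::string name, value;
    std::map<std::string, std::string> attrs;
    std::vector<std::unique_ptr<Node>> children;
};

static void skipSpace(const std::string& s, std::size_t& i) {
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
}

// Skips whitespace, "<?...?>" declarations and "<!--...-->" comments.
static void skipMisc(const std::string& s, std::size_t& i) {
    for (;;) {
        skipSpace(s, i);
        if (s.compare(i, 4, "<!--") == 0) {
            std::size_t end = s.find("-->", i);
            i = (end == std::string::npos) ? s.size() : end + 3;
        } else if (s.compare(i, 2, "<?") == 0) {
            std::size_t end = s.find("?>", i);
            i = (end == std::string::npos) ? s.size() : end + 2;
        } else {
            return;
        }
    }
}

static std::string readName(const std::string& s, std::size_t& i) {
    std::size_t start = i;
    while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i])) &&
           s[i] != '>' && s[i] != '/' && s[i] != '=') ++i;
    return s.substr(start, i - start);
}

// Parses one element starting at s[i] == '<'; on return i is past its end tag.
static std::unique_ptr<Node> parseElement(const std::string& s, std::size_t& i) {
    auto node = std::make_unique<Node>();
    ++i;                                            // skip '<'
    node->name = readName(s, i);
    for (;;) {                                      // attributes: key="value"
        skipSpace(s, i);
        if (i >= s.size()) return node;
        if (s[i] == '/') { i = std::min(i + 2, s.size()); return node; }  // "/>"
        if (s[i] == '>') { ++i; break; }
        std::string key = readName(s, i);
        skipSpace(s, i);
        if (i < s.size() && s[i] == '=') ++i;
        skipSpace(s, i);
        if (i >= s.size()) return node;
        char quote = s[i++];
        std::size_t end = s.find(quote, i);
        if (end == std::string::npos) { i = s.size(); return node; }
        node->attrs[key] = s.substr(i, end - i);
        i = end + 1;
    }
    for (;;) {                                      // content
        skipMisc(s, i);
        if (i >= s.size()) return node;
        if (s.compare(i, 2, "</") == 0) {           // end tag closes this node
            std::size_t end = s.find('>', i);
            i = (end == std::string::npos) ? s.size() : end + 1;
            return node;
        }
        if (s[i] == '<') {                          // child element: recurse
            node->children.push_back(parseElement(s, i));
        } else {                                    // text value up to next '<'
            std::size_t end = s.find('<', i);
            if (end == std::string::npos) end = s.size();
            node->value = s.substr(i, end - i);
            i = end;
        }
    }
}

std::unique_ptr<Node> parseDocument(const std::string& s) {
    std::size_t i = 0;
    skipMisc(s, i);                                 // header and leading comments
    if (i >= s.size() || s[i] != '<') return nullptr;
    return parseElement(s, i);
}
```

On the sample XML shown earlier, `parseDocument` returns an `items` node with one `item` child. That child's `name` attribute is `chentaihan` and its value is `89757`.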
Query by path uses two arrays; call them A and B. The first query step stores its result in A. A then serves as the data source, and the next step stores its result in B. After that, A is cleared, B becomes the data source, and the result goes back into A. This repeats for each segment of the path, and at the end whichever array was filled last holds the query result.

This could also be implemented with recursion. However, deep recursion can overflow the thread stack, and the extra function calls cost some performance.
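A minimal sketch of the two-array idea follows. It assumes a hypothetical `PathNode` type with a name and a list of children. It also assumes the first path segment names the root element, as in a path such as `students/student/courses/course`. `swap` plays the role of "clear A and use B as the new data source".

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal node type for this sketch; txml's XmlNode differs.
struct PathNode {
    std::string name;
    std::vector<PathNode*> children;
};

// Path query without recursion, using two alternating buffers.
// The first segment must match the root; each later segment keeps only the
// children of the current level whose name equals that segment.
std::vector<PathNode*> selectNodes(PathNode* root, const std::string& path) {
    std::vector<PathNode*> a, b;      // a: current level, b: next level
    std::stringstream ss(path);
    std::string segment;
    bool first = true;
    while (std::getline(ss, segment, '/')) {
        if (first) {
            if (root && root->name == segment) a.push_back(root);
            first = false;
            continue;
        }
        b.clear();
        for (PathNode* n : a)
            for (PathNode* c : n->children)
                if (c->name == segment) b.push_back(c);
        a.swap(b);                    // this level's result is the next source
    }
    return a;
}
```

Each level is visited once, so the work is proportional to the number of nodes examined. The only extra memory is the two buffers, whose size is bounded by the width of the tree rather than its depth.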
Query by attribute is also implemented without recursion. It is similar to a frequently asked interview question: print a tree level by level (level-order traversal). Here a queue is likewise used to search level by level. The root node is checked first, and then its direct child nodes are pushed into the queue. Each node taken from the front of the queue is checked; if it does not match, its children are pushed and it leaves the queue.
```cpp
// Query by attribute: level-order (breadth-first) search using a queue.
// count() is checked before operator[] so a lookup never inserts an empty attribute.
XmlNode* XmlNode::SelectSingleNodeByAttr(const string& attrName, const string& attrValue, XmlNode* node) {
    if (node == NULL) return NULL;
    if (node->attribute != NULL && node->attribute->count(attrName) &&
        (*node->attribute)[attrName] == attrValue) {
        return node;
    }
    queue<XmlNode*> pending;
    for (int i = node->ChildCount() - 1; i >= 0; i--) {
        pending.push((*node->childNodes)[i]);
    }
    while (!pending.empty()) {
        XmlNode* tmpNode = pending.front();
        pending.pop();
        if (tmpNode->attribute != NULL && tmpNode->attribute->count(attrName) &&
            (*tmpNode->attribute)[attrName] == attrValue) {
            return tmpNode;
        }
        for (int i = tmpNode->ChildCount() - 1; i >= 0; i--) {
            pending.push((*tmpNode->childNodes)[i]);
        }
    }
    return NULL;
}
```
Having seen the attribute search, it is easy to understand roughly how ConfigurationManager in C# reads a configuration file. The configuration file is very simple: one node contains many child nodes. The root node can therefore essentially be ignored, and the file behaves like a dictionary whose keys are the setting names and whose values are the setting values. With a hash-based dictionary, a lookup takes O(1) time on average.
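To make the dictionary view concrete, here is a sketch that loads already-parsed key/value pairs into a `std::unordered_map`. The pairs come from a flat section shaped like .NET's `<appSettings><add key="..." value="..."/></appSettings>`.

`ConfigSection` and `get` are hypothetical names, and this is not how `ConfigurationManager` itself is implemented. It only shows why a lookup becomes average O(1) once the file is treated as a flat dictionary.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical flat config section: (key, value) pairs held in a hash map.
class ConfigSection {
public:
    explicit ConfigSection(const std::vector<std::pair<std::string, std::string>>& entries) {
        // In this sketch a repeated key overwrites the earlier value.
        for (const auto& kv : entries) settings_[kv.first] = kv.second;
    }

    // Returns the value stored for key, or fallback if the key is absent.
    // find() does not modify the map, so lookups have no side effects.
    std::string get(const std::string& key, const std::string& fallback = "") const {
        auto it = settings_.find(key);
        return it == settings_.end() ? fallback : it->second;
    }

private:
    std::unordered_map<std::string, std::string> settings_;
};
```

The tree walk happens once at load time. After that, every lookup is a single hash probe, independent of how many settings the file contains.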
Simple test:
```cpp
#include <cstdlib>
#include <iostream>
#include <vector>
#include "XmlDocument.h"
using namespace std;

int main() {
    XmlDocument doc;
    doc.Load("test.txt");
    cout << "Xml header: " << doc.Head().c_str() << endl;
    cout << "Show all student scores: ";
    doc.ShowXml(doc);
    cout << endl;

    vector<XmlNode*> vect;
    doc.SelectNodes("students/student/courses/course", vect);
    XmlNode* node = doc.SelectSingleNode("students/student/courses/course");
    if (node != NULL) {
        node->ShowXml(*node);
        cout << "Name: " << node->Name().c_str() << endl;
        cout << "Attribute: ";
        node->ShowAttr();
        cout << endl;
        cout << "Value: " << node->Value().c_str() << endl;
        cout << "ChildCount: " << node->ChildCount() << endl << endl;

        // Iterate over the direct child nodes
        XmlNode::iterator iter = node->Begin();
        while (iter != node->End()) {
            cout << "Name: " << (*iter)->Name().c_str() << endl;
            cout << "Value: " << (*iter)->Value().c_str() << endl;
            ++iter;
        }

        cout << "Find the node whose name is 'English': " << endl;
        XmlNode* node2 = doc.SelectSingleNodeByAttr("name", "English");
        if (node2 != NULL) {
            node2->ShowXml(*node2);
        }
    }
    system("pause"); // Windows-only: keeps the console window open
    return 0;
}
```
The running result is as follows:
Not finished yet, to be continued. The features will get richer, so stay tuned!