Project address: json2xml
What is ANTLR
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator that can be used to read, process, execute, and translate structured text and binary files. It is widely used in both academia and industry, and it is the foundation of many languages, tools, and frameworks.
Today we will use it to implement a Go version of a json2xml converter.
The role of ANTLR
The formal description of a language's syntax is called a grammar. Given a grammar, ANTLR generates a parser for that language, and the parser automatically builds a parse tree (AST). ANTLR can also generate the tree walker automatically, which greatly reduces the cost of hand-coding a parser.
Getting started
Next, we build a tool, using json2xml as the example.
Installation
Using macOS as an example:
brew install antlr
Writing the JSON grammar
    // Derived from http://json.org
    grammar JSON;

    json:   object
        |   array
        ;

    object
        :   '{' pair (',' pair)* '}'    # AnObject
        |   '{' '}'                     # NullObject
        ;

    array
        :   '[' value (',' value)* ']'  # ArrayOfValues
        |   '[' ']'                     # NullArray
        ;

    pair:   STRING ':' value ;

    value
        :   STRING      # String
        |   NUMBER      # Atom
        |   object      # ObjectValue
        |   array       # ArrayValue
        |   'true'      # Atom
        |   'false'     # Atom
        |   'null'      # Atom
        ;

    LCURLY : '{' ;
    LBRACK : '[' ;
    STRING : '"' (ESC | ~["\\])* '"' ;

    fragment ESC : '\\' (["\\/bfnrt] | UNICODE) ;
    fragment UNICODE : 'u' HEX HEX HEX HEX ;
    fragment HEX : [0-9a-fA-F] ;

    NUMBER
        :   '-'? INT '.' INT EXP?   // 1.35, 1.35E-9, 0.3, -4.5
        |   '-'? INT EXP            // 1e10 -3e4
        |   '-'? INT                // -3
        ;

    fragment INT : '0' | '1'..'9' '0'..'9'* ;  // no leading zeros
    fragment EXP : [Ee] [+\-]? INT ;           // \- since - means "range" inside [...]

    WS  :   [ \t\n\r]+ -> skip ;
The file above is written in the ANTLR 4 grammar format.
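To make the NUMBER lexer rule more concrete, here is a minimal sketch that mirrors it with a Go regular expression from the standard library. This is only an illustration of what the rule accepts when read literally, not how ANTLR matches tokens internally; the names intPat, numberRe, and isNumber are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// intPat mirrors the INT fragment: "0", or a non-zero digit followed by digits
// (so no leading zeros).
const intPat = `(?:0|[1-9][0-9]*)`

// numberRe folds the three NUMBER alternatives into one pattern:
// optional '-', INT, optional '.' INT, optional exponent ([Ee], sign, INT).
var numberRe = regexp.MustCompile(
	`^-?` + intPat + `(?:\.` + intPat + `)?(?:[Ee][+-]?` + intPat + `)?$`)

// isNumber reports whether s has the shape described by the NUMBER rule.
func isNumber(s string) bool { return numberRe.MatchString(s) }

func main() {
	// The first five inputs come from the comments in the grammar;
	// "007" is rejected because INT forbids leading zeros.
	for _, s := range []string{"1.35", "1.35E-9", "-4.5", "1e10", "-3", "007"} {
		fmt.Printf("%-8s %v\n", s, isNumber(s))
	}
}
```

Folding the alternatives into one pattern works here because each alternative starts with the same optional minus sign and INT, and differs only in which optional suffixes follow.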
Generating the base parsing code
# antlr4 -Dlanguage=Go -package json2xml JSON.g4
This uses ANTLR to generate the base code with Go as the target language, in a package named json2xml.
The resulting files are as follows:
    $ tree
    ├── JSON.g4
    ├── JSON.interp            # intermediate file for the parser
    ├── JSON.tokens            # token file for the parser
    ├── JSONLexer.interp       # intermediate file for the lexer
    ├── JSONLexer.tokens       # token file for the lexer
    ├── json_base_listener.go  # base listener (listener mode is the default)
    ├── json_lexer.go          # lexer
    ├── json_listener.go       # abstract listener interface
    ├── json_parser.go         # parser
Implementing the parser (listener example)
    package main

    import (
        "fmt"
        "io/ioutil"
        "log"
        "os"
        "strings"
        "testing"

        "c2j/parser/json2xml"

        "github.com/antlr/antlr4/runtime/Go/antlr"
    )

    func init() {
        log.SetFlags(log.LstdFlags | log.Lshortfile)
    }

    type j2xConvert struct {
        *json2xml.BaseJSONListener
        xml map[antlr.Tree]string
    }

    func NewJ2xConvert() *j2xConvert {
        return &j2xConvert{
            &json2xml.BaseJSONListener{},
            make(map[antlr.Tree]string),
        }
    }

    func (j *j2xConvert) setXML(ctx antlr.Tree, s string) {
        j.xml[ctx] = s
    }

    func (j *j2xConvert) getXML(ctx antlr.Tree) string {
        return j.xml[ctx]
    }

    // j2xConvert methods
    func (j *j2xConvert) ExitJson(ctx *json2xml.JsonContext) {
        j.setXML(ctx, j.getXML(ctx.GetChild(0)))
    }

    func (j *j2xConvert) stripQuotes(s string) string {
        if s == "" || !strings.Contains(s, "\"") {
            return s
        }
        return s[1 : len(s)-1]
    }

    func (j *j2xConvert) ExitAnObject(ctx *json2xml.AnObjectContext) {
        sb := strings.Builder{}
        sb.WriteString("\n")
        for _, p := range ctx.AllPair() {
            sb.WriteString(j.getXML(p))
        }
        j.setXML(ctx, sb.String())
    }

    func (j *j2xConvert) ExitNullObject(ctx *json2xml.NullObjectContext) {
        j.setXML(ctx, "")
    }

    func (j *j2xConvert) ExitArrayOfValues(ctx *json2xml.ArrayOfValuesContext) {
        sb := strings.Builder{}
        sb.WriteString("\n")
        for _, p := range ctx.AllValue() {
            sb.WriteString("<element>")
            sb.WriteString(j.getXML(p))
            sb.WriteString("</element>")
            sb.WriteString("\n")
        }
        j.setXML(ctx, sb.String())
    }

    func (j *j2xConvert) ExitNullArray(ctx *json2xml.NullArrayContext) {
        j.setXML(ctx, "")
    }

    func (j *j2xConvert) ExitPair(ctx *json2xml.PairContext) {
        tag := j.stripQuotes(ctx.STRING().GetText())
        v := ctx.Value()
        r := fmt.Sprintf("<%s>%s</%s>\n", tag, j.getXML(v), tag)
        j.setXML(ctx, r)
    }

    func (j *j2xConvert) ExitObjectValue(ctx *json2xml.ObjectValueContext) {
        j.setXML(ctx, j.getXML(ctx.Object()))
    }

    func (j *j2xConvert) ExitArrayValue(ctx *json2xml.ArrayValueContext) {
        j.setXML(ctx, j.getXML(ctx.Array()))
    }

    func (j *j2xConvert) ExitAtom(ctx *json2xml.AtomContext) {
        j.setXML(ctx, ctx.GetText())
    }

    func (j *j2xConvert) ExitString(ctx *json2xml.StringContext) {
        j.setXML(ctx, j.stripQuotes(ctx.GetText()))
    }

    func TestJSON2XMLVisitor(t *testing.T) {
        f, err := os.Open("testdata/json2xml/t.json")
        if err != nil {
            panic(err)
        }
        defer f.Close()
        content, err := ioutil.ReadAll(f)
        if err != nil {
            panic(err)
        }

        // Set up the input
        is := antlr.NewInputStream(string(content))

        // Create the lexer
        lexer := json2xml.NewJSONLexer(is)
        stream := antlr.NewCommonTokenStream(lexer, antlr.LexerDefaultTokenChannel)

        // Create the parser and build the tree
        p := json2xml.NewJSONParser(stream)
        p.BuildParseTrees = true
        tree := p.Json()

        // Finally, walk the AST
        j2x := NewJ2xConvert()
        antlr.ParseTreeWalkerDefault.Walk(j2x, tree)
        log.Println(j2x.getXML(tree))
    }
The code above is fairly simple; the comments explain each step.
The general flow is as follows:
- Create an input stream
- Create a lexer
- Create a token stream, which stores the tokens produced by the lexer
- Create a parser, which processes the tokens
- Start parsing from a grammar rule (here, the json rule)
- Finally, traverse the AST with the default walker that ANTLR provides
Where are the intermediate values and results stored? Simply define a map keyed by tree node:
xml map[antlr.Tree]string
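The idea of annotating tree nodes through a map can be sketched without ANTLR at all. The following is a minimal, self-contained illustration using only the standard library. The node type and the annotate function are hypothetical stand-ins for antlr.Tree and the listener's Exit methods; they are not part of the generated code.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for a parse-tree node.
type node struct {
	tag      string  // element name, used for inner nodes
	text     string  // text, used for leaves
	children []*node // child nodes, empty for leaves
}

// annotate walks the tree bottom-up and stores the XML of every node in
// results, keyed by the node itself. A parent builds its own value by
// looking up the values its children stored earlier.
func annotate(n *node, results map[*node]string) {
	for _, c := range n.children {
		annotate(c, results)
	}
	if len(n.children) == 0 {
		results[n] = n.text
		return
	}
	var sb strings.Builder
	for _, c := range n.children {
		sb.WriteString(results[c])
	}
	results[n] = fmt.Sprintf("<%s>%s</%s>", n.tag, sb.String(), n.tag)
}

func main() {
	leaf := &node{text: "antlr"}
	name := &node{tag: "name", children: []*node{leaf}}
	root := &node{tag: "tool", children: []*node{name}}

	results := make(map[*node]string)
	annotate(root, results)
	fmt.Println(results[root]) // <tool><name>antlr</name></tool>
}
```

Keeping the results in a side map means the tree nodes themselves need no extra fields, which matters when the node types are generated code that you do not want to edit.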
Listener and visitor
ANTLR can generate code for two tree-traversal mechanisms. The listener is generated by default; to generate a visitor as well, pass the additional -visitor option.
The difference between the two mechanisms is that listener methods are called automatically by the walker object that ANTLR provides, whereas in visitor mode your methods must explicitly call visit to reach the child nodes. If you forget that call, the corresponding subtree is never visited.
Summary
ANTLR is a powerful tool that allows common parsing work to be done with much less effort and with very high efficiency. At the same time, the tool separates the parsing process from the program itself, providing sufficient flexibility and maneuverability.