A JSON generator and recursive descent parser implemented in Python
GitHub: https://github.com/EStormLynn/Python-JSON-Parser
Goal
Write a JSON parser from scratch, with the following characteristics:
- Standard-compliant JSON parser and generator
- Handwritten recursive descent parser
- Written in Python 2.7
- Parser and generator together under 500 lines
- Performance analysis and optimization with cProfile
Implemented features
- [x] Parse literals (true, false, null)
- [x] Parse numbers
- [x] Parse strings
- [x] Parse Unicode
- [x] Parse arrays
- [x] Parse objects
- [x] Unit tests
- [x] Generator
- [x] cProfile performance optimization
What JSON is
JSON (JavaScript Object Notation) is a text format for data exchange, standardized by ECMA as the JSON data interchange format (ECMA-404). Here is an example of JSON data:

    {
        "title": "Design Patterns",
        "subtitle": "Elements of Reusable Object-Oriented Software",
        "author": [
            "Erich Gamma",
            "Richard Helm",
            "Ralph Johnson",
            "John Vlissides"
        ],
        "year": 2009,
        "weight": 1.8,
        "hardcover": true,
        "publisher": {
            "Company": "Pearson Education",
            "Country": "India"
        },
        "website": null
    }

As the example shows, JSON is a tree structure built from six data types:
- null: written as null
- boolean: written as true or false
- number: the usual floating-point notation, described in detail in the number-parsing section below
- string: written as "..."
- array: written as [...]
- object: written as {...}
Implementing the parser
es_parser is a handwritten recursive descent parser. Because JSON syntax is very simple, the tokenizer can be omitted: inspecting the next character is enough to know what kind of value follows, and the parser then calls the matching parse function. For the full JSON syntax, after skipping whitespace, only the current character needs to be checked:

    n      -> literal
    t      -> true
    f      -> false
    "      -> string
    0-9/-  -> number
    [      -> array
    {      -> object

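The dispatch table above can be sketched as a lookup on the first non-whitespace character. This is only an illustration: `DISPATCH` and `classify` are hypothetical names, not part of the project, and the function only reports which kind of value follows instead of parsing it.

```python
import string

# Hypothetical names for illustration; not the project's actual API.
DISPATCH = {
    'n': 'literal',
    't': 'true',
    'f': 'false',
    '"': 'string',
    '[': 'array',
    '{': 'object',
}


def classify(text):
    """Return the kind of JSON value that starts `text`, judged by one character."""
    stripped = text.lstrip(' \t\n\r')  # JSON whitespace: space, tab, LF, CR
    if not stripped:
        raise ValueError("expected a value, got only whitespace")
    first = stripped[0]
    if first == '-' or first in string.digits:
        return 'number'
    if first in DISPATCH:
        return DISPATCH[first]
    raise ValueError("unexpected character %r" % first)
```

For example, `classify('  [1, 2]')` returns `'array'` and `classify('-1.5')` returns `'number'`. A real parser would call the matching parse function at this point rather than return a label.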
Two classes are defined, one for a JSON value and one for the JSON text being parsed:
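
    # The JTYPE_* constants are defined elsewhere in the project.
    class EsValue(object):
        __slots__ = ('type', 'num', 'str', 'array', 'obj')

        def __init__(self):
            self.type = JTYPE_UNKNOW


    class context(object):
        def __init__(self, jstr):
            self.json = list(jstr)  # remaining input as a list of characters
            self.pos = 0            # current position in the input
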
Extra whitespace (spaces, tabs, and line breaks) is skipped before each value, for example:
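
    import re

    WHITESPACE = re.compile(r'\s')

    def es_parse_whitespace(context):
        if not context.json:
            return
        pos = 0
        # Stop at the end of the input so all-whitespace text cannot raise IndexError.
        while pos < len(context.json) and WHITESPACE.match(context.json[pos]):
            pos += 1
        context.json = context.json[pos:]
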
Parsing literals
There are three literals: false, true, and null.

    def es_parse_literal(context, literal, mytype):
        e_value = EsValue()
        # Compare the next len(literal) characters with the expected literal.
        if ''.join(context.json[context.pos:context.pos + len(literal)]) != literal:
            raise MyException("PARSE_STATE_INVALID_VALUE, literal error")
        e_value.type = mytype
        context.json = context.json[context.pos + len(literal):]
        return PARSE_STATE_OK, e_value


    def es_parse_value(context, typevalue):
        if context.json[context.pos] == 't':
            return es_parse_literal(context, "true", JTYPE_TRUE)
        if context.json[context.pos] == 'f':
            return es_parse_literal(context, "false", JTYPE_FALSE)
        if context.json[context.pos] == 'n':
            return es_parse_literal(context, "null", JTYPE_NULL)

Parsing numbers
A JSON number is written in decimal and consists of four parts, in order: an optional minus sign, an integer part, an optional fraction, and an optional exponent. Only the integer part is required.
JSON supports scientific notation: the exponent begins with an uppercase E or a lowercase e, followed by an optional sign and one or more digits (0-9).
The JSON standard ECMA-404 presents the number syntax as a railroad diagram, which makes the possible parsing paths easier to see.
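The four-part grammar described above can also be written as a regular expression. The following is a minimal sketch for checking whether a piece of text is a valid JSON number; `NUMBER_RE` and `is_json_number` are illustrative names, and the project itself may validate numbers differently.

```python
import re

# Optional minus, integer part (no leading zeros), optional fraction,
# optional exponent. NUMBER_RE and is_json_number are illustrative names.
NUMBER_RE = re.compile(
    r'-?(?:0|[1-9][0-9]*)'    # minus sign and integer part
    r'(?:\.[0-9]+)?'          # fraction: a dot followed by at least one digit
    r'(?:[eE][+-]?[0-9]+)?'   # exponent: e or E, optional sign, digits
    r'\Z'                     # nothing may follow the number
)


def is_json_number(text):
    """Return True if `text` is exactly one valid JSON number."""
    return NUMBER_RE.match(text) is not None
```

`is_json_number("1.5e3")` returns `True`, while `is_json_number("01")`, `is_json_number("+1")`, and `is_json_number("1.")` all return `False`, because the grammar forbids leading zeros, a leading plus sign, and a dot without following digits.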
Python is dynamically typed, so num in es_value can hold either an int or a float:

    class es_value():
        def __init__(self, type):
            self.type = type
            self.num = 0

Python can convert a string with float() or int(), but int(string) cannot handle scientific notation, so the parser always converts to float first and then to int when the value is an integer:

    typevalue.num = float(numstr)
    if isint:
        typevalue.num = int(typevalue.num)

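The float-first conversion can be sketched as a small standalone helper. How the project decides `isint` is not shown in the text, so the rule used here (no decimal point in the text and an integral value) is an assumption chosen to agree with the unit tests listed in this section; `to_number` is a hypothetical name.

```python
def to_number(numstr):
    """Convert a JSON number string to int when it is integral, else float."""
    value = float(numstr)  # float() accepts scientific notation, int() does not
    # Assumed rule: treat the number as an integer only if the text has no
    # decimal point and the parsed value has no fractional part.
    is_int = '.' not in numstr and value.is_integer()
    return int(value) if is_int else value
```

With this rule `to_number("1e4")` returns the int `10000` and `to_number("1.5e3")` returns the float `1500.0`, whereas calling `int("1e4")` directly raises `ValueError`. One caveat of going through float is that integers larger than 2**53 can lose precision.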
The unit tests implemented include:

    def testnum(self):
        print("\n------------test number-----------")
        self.assertEqual(type(self.parse("1")), type(1))
        self.assertEqual(type(self.parse("1e4")), type(10000))
        self.assertEqual(type(self.parse("-1.5")), type(-1.5))
        self.assertEqual(type(self.parse("1.5e3")), type(1.500))

Parsing strings
Escape sequences in a string must be handled when the string is loaded, and \u escapes must be decoded into Unicode characters:

    def es_parse_string(context):
        charlist = {
            '\\"': '\"',
            "\\'": "\'",
            "\\b": "\b",
            "\\f": "\f",
            "\\r": "\r",
            "\\n": "\n",
            "\\t": "\t",
            "\\u": "\\u",   # kept as-is here, decoded after the loop
            "\\\\": "\\",
            "\\/": "/",
            "\\a": "\a",
            "\\v": "\v",
        }
        # e_value and pos are initialized before this point (not shown in this excerpt).
        while context.json[pos] != '"':
            # Handle escape characters.
            if context.json[pos] == '\\':
                # Join the two characters so the lookup also works when
                # context.json is a list rather than a string.
                c = ''.join(context.json[pos:pos + 2])
                if c in charlist:
                    e_value.str += charlist[c]
                else:
                    e_value.str += ''.join(context.json[pos])
                    pos += 1
                    continue
                pos += 2
            else:
                e_value.str += ''.join(context.json[pos])
                pos += 1
        e_value.type = JTYPE_STRING
        context.json = context.json[pos + 1:]
        context.pos = 1
        if '\\u' in e_value.str:
            e_value.str = e_value.str.encode('latin-1').decode('unicode_escape')
        return PARSE_STATE_OK, e_value

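As a point of comparison, escape handling can also be done in a single pass that decodes \u escapes directly instead of relying on the unicode_escape codec. The sketch below is written for Python 3 (the project targets 2.7), `decode_json_string` is a hypothetical helper, and it deliberately accepts only the escapes that JSON defines. It does not combine UTF-16 surrogate pairs; each \uXXXX escape becomes one code unit.

```python
import string

# Escapes defined by JSON, keyed by the character after the backslash.
SIMPLE_ESCAPES = {
    '"': '"', '\\': '\\', '/': '/',
    'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t',
}


def decode_json_string(body):
    """Decode the text between the quotes of a JSON string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != '\\':
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        esc = body[i + 1]
        if esc == 'u':
            hex_digits = body[i + 2:i + 6]
            if len(hex_digits) != 4 or any(c not in string.hexdigits for c in hex_digits):
                raise ValueError("invalid \\u escape")
            out.append(chr(int(hex_digits, 16)))
            i += 6
        elif esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        else:
            raise ValueError("invalid escape \\%s" % esc)
    return ''.join(out)
```

Rejecting unknown escapes is a design choice of this sketch; the project's table also maps \a, \v, and \', which are not JSON escapes.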
Unit tests:

    def testString(self):
        print("\n------------test string----------")
        self.assertEqual(type(self.parse("\" \\\\line1\\nline2\"")), type("string"))  # input \\ is \
        self.assertEqual(type(self.parse("\"abc\\def\"")), type("string"))
        self.assertEqual(type(self.parse("\"null\"")), type("string"))
        self.assertEqual(type(self.parse("\"Hello world!\"")), type("string"))
        self.assertEqual(type(self.parse("\"\u751f\u5316\u5371\u673a\"")), type("string"))

The es_dumps function: JSON generator
es_dumps serializes a Python structure such as a dict into a JSON string:

    def es_dumps(obj):
        obj_str = ""
        if isinstance(obj, bool):
            if obj is True:
                obj_str += "true"
            else:
                obj_str += "false"
        elif obj is None:
            obj_str += "null"
        elif isinstance(obj, basestring):
            for ch in obj.decode('utf-8'):
                # Strings containing CJK characters are written in escaped form.
                if u'\u4e00' <= ch <= u'\u9fff':
                    obj_str += "\"" + repr(obj.decode('utf-8')) + "\""
                    break
            else:
                obj_str += "\"" + obj + "\""
        elif isinstance(obj, list):
            obj_str += '['
            if len(obj):
                for i in obj:
                    obj_str += es_dumps(i) + ", "
                obj_str = obj_str[:-2]  # drop the trailing ", "
            obj_str += ']'
        elif isinstance(obj, int) or isinstance(obj, float):  # number
            obj_str += str(obj)
        elif isinstance(obj, dict):
            obj_str += '{'
            if len(obj):
                for (k, v) in obj.items():
                    obj_str += es_dumps(k) + ": "
                    obj_str += es_dumps(v) + ", "
                obj_str = obj_str[:-2]  # drop the trailing ", "
            obj_str += '}'
        return obj_str

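For readers on Python 3, the same recursive idea can be sketched more compactly with `str.join`, which avoids trimming a trailing separator. This is not the project's code: `dumps_sketch` and `_escape` are hypothetical names, and the sketch assumes string keys, finite floats, and leaves non-ASCII characters unescaped.

```python
import math


def _escape(text):
    """Escape quotes, backslashes, and control characters for a JSON string."""
    out = []
    for ch in text:
        if ch == '"' or ch == '\\':
            out.append('\\' + ch)
        elif ord(ch) < 0x20:
            out.append('\\u%04x' % ord(ch))
        else:
            out.append(ch)
    return ''.join(out)


def dumps_sketch(obj):
    """Serialize None, bool, int, float, str, list, and dict to JSON text."""
    if obj is None:
        return 'null'
    if isinstance(obj, bool):  # checked before int because bool subclasses int
        return 'true' if obj else 'false'
    if isinstance(obj, float) and not math.isfinite(obj):
        raise ValueError("NaN and infinity are not valid JSON numbers")
    if isinstance(obj, (int, float)):
        return repr(obj)
    if isinstance(obj, str):
        return '"' + _escape(obj) + '"'
    if isinstance(obj, list):
        return '[' + ', '.join(dumps_sketch(item) for item in obj) + ']'
    if isinstance(obj, dict):
        parts = []
        for key, value in obj.items():
            if not isinstance(key, str):
                raise TypeError("JSON object keys must be strings")
            parts.append(dumps_sketch(key) + ': ' + dumps_sketch(value))
        return '{' + ', '.join(parts) + '}'
    raise TypeError("cannot serialize %s" % type(obj).__name__)
```

For example, `dumps_sketch([1, True, None])` returns `[1, true, null]`. The bool check has to come before the number check, as it does in es_dumps, because `isinstance(True, int)` is true in Python.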
cProfile performance analysis
Import the cProfile module for performance analysis and load a JSON file containing the population data of China's 34 provinces:

    import cProfile
    from es_parser import *  # assumes the parser module is named es_parser
    import json

    cProfile.run("print(es_load(\"china.json\"))")

Parts of the code were changed to use Python built-ins, and the context structure was optimized: copying a string is significantly faster than copying a list. Running time dropped from 20 s to 1 s.
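A profiling run of this kind can be sketched with `cProfile.Profile` and `pstats`, which give a sorted report without going through `cProfile.run` and a code string. The example is written for Python 3; `slice_repeatedly` is a toy stand-in for a parser that re-slices its remaining input on every step, and `profile_call` is a hypothetical helper, neither taken from the project.

```python
import cProfile
import io
import pstats


def slice_repeatedly(data):
    """Consume `data` one element at a time by re-slicing the remainder."""
    steps = 0
    while data:
        data = data[1:]  # copies the rest of the input on every step
        steps += 1
    return steps


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative').print_stats(5)  # five most expensive entries
    return result, stream.getvalue()
```

Calling `profile_call(slice_repeatedly, 'x' * 20000)` and then `profile_call(slice_repeatedly, list('x' * 20000))` produces two reports whose timings can be compared to see how the choice between string and list affects the copying cost on a given machine.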