A JSON generator and recursive descent parser implemented in Python

Source: Internet
Author: User
Tags: parse, string

GitHub address: https://github.com/EStormLynn/Python-JSON-Parser

Goal

Write a JSON parser from scratch, with the following characteristics:

    • Standard-compliant JSON parser and generator
    • Handwritten recursive descent parser
    • Written in Python (2.7)
    • Parser and generator together are fewer than 500 lines
    • Performance analysis and optimization with cProfile
Implemented features
    • [x] Parse literals (true, false, null)
    • [x] Parse numbers
    • [x] Parse strings
    • [x] Parse Unicode
    • [x] Parse arrays
    • [x] Parse objects
    • [x] Unit tests
    • [x] Generator
    • [x] cProfile performance optimization
What JSON is

JSON (JavaScript Object Notation) is a text format for data exchange, defined by the ECMA standard "The JSON Data Interchange Format". Here is an example of JSON data:

    {
        "title": "Design Patterns",
        "subtitle": "Elements of Reusable Object-Oriented Software",
        "author": [
            "Erich Gamma",
            "Richard Helm",
            "Ralph Johnson",
            "John Vlissides"
        ],
        "year": 2009,
        "weight": 1.8,
        "hardcover": true,
        "publisher": {
            "Company": "Pearson Education",
            "Country": "India"
        },
        "website": null
    }

This JSON forms a tree structure that contains six data types:

    • null: represented as null
    • boolean: represented as true or false
    • number: ordinary floating-point notation, described in detail in the number parsing section below
    • string: represented as "..."
    • array: represented as [...]
    • object: represented as {...}
Implementing the parser

es_parser is a handwritten recursive descent parser. Because the JSON grammar is very simple, the tokenizer can be omitted: inspecting the next character is enough to tell what kind of value follows, and the matching parse function is then called. For the full JSON grammar, after skipping whitespace, only the current character needs to be checked:

    n -> literal (null)
    t -> true
    f -> false
    " -> string
    0-9/- -> number
    [ -> array
    { -> object
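The dispatch above can be sketched as a small index-based parser. This is a minimal illustration under stated assumptions, not the project's code: the names `MiniParser`, `end_or_comma`, and `parse_json` are hypothetical, string escapes are not handled, and numbers are handed to Python's `int` and `float` instead of being checked against the JSON grammar.

```python
class MiniParser(object):
    """Hypothetical sketch: pick the parse function from the next character."""

    LITERALS = {"n": ("null", None), "t": ("true", True), "f": ("false", False)}

    def __init__(self, text):
        self.text = text
        self.pos = 0  # index of the next unread character

    def peek(self):
        # Return the next character, or "" at the end of the input.
        return self.text[self.pos:self.pos + 1]

    def skip_whitespace(self):
        while self.peek() != "" and self.peek() in " \t\n\r":
            self.pos += 1

    def expect(self, char):
        self.skip_whitespace()
        if self.peek() != char:
            raise ValueError("expected %r at position %d" % (char, self.pos))
        self.pos += 1

    def end_or_comma(self, closer):
        # Consume either the closing bracket (True) or a separating comma (False).
        self.skip_whitespace()
        ch = self.peek()
        if ch not in (closer, ","):
            raise ValueError("expected ',' or %r at position %d" % (closer, self.pos))
        self.pos += 1
        return ch == closer

    def parse_value(self):
        self.skip_whitespace()
        ch = self.peek()
        if ch in self.LITERALS:
            return self.parse_literal(*self.LITERALS[ch])
        if ch == '"':
            return self.parse_string()
        if ch == "[":
            return self.parse_array()
        if ch == "{":
            return self.parse_object()
        if ch == "-" or ch.isdigit():
            return self.parse_number()
        raise ValueError("invalid value at position %d" % self.pos)

    def parse_literal(self, literal, value):
        if not self.text.startswith(literal, self.pos):
            raise ValueError("invalid literal at position %d" % self.pos)
        self.pos += len(literal)
        return value

    def parse_number(self):
        start = self.pos
        while self.peek() != "" and self.peek() in "+-0123456789.eE":
            self.pos += 1
        token = self.text[start:self.pos]
        # Simplification: int()/float() decide validity, not the JSON grammar.
        return int(token) if token.lstrip("-").isdigit() else float(token)

    def parse_string(self):
        # Simplification: no escape handling; the string ends at the next quote.
        end = self.text.index('"', self.pos + 1)  # ValueError if unterminated
        value = self.text[self.pos + 1:end]
        self.pos = end + 1
        return value

    def parse_array(self):
        self.pos += 1  # consume '['
        items = []
        self.skip_whitespace()
        if self.peek() == "]":
            self.pos += 1
            return items
        while True:
            items.append(self.parse_value())
            if self.end_or_comma("]"):
                return items

    def parse_object(self):
        self.pos += 1  # consume '{'
        members = {}
        self.skip_whitespace()
        if self.peek() == "}":
            self.pos += 1
            return members
        while True:
            self.skip_whitespace()
            if self.peek() != '"':
                raise ValueError("expected string key at position %d" % self.pos)
            key = self.parse_string()
            self.expect(":")
            members[key] = self.parse_value()
            if self.end_or_comma("}"):
                return members


def parse_json(text):
    parser = MiniParser(text)
    value = parser.parse_value()
    parser.skip_whitespace()
    if parser.pos != len(text):
        raise ValueError("trailing characters at position %d" % parser.pos)
    return value
```

Each `parse_*` method consumes exactly the characters of its own value and leaves `pos` on the next unread character, which is what lets `parse_array` and `parse_object` call back into `parse_value` recursively.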

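Two classes are written, one for a typed JSON value and one for the JSON string being parsed:

    class EsValue(object):
        __slots__ = ('type', 'num', 'str', 'array', 'obj')

        def __init__(self):
            self.type = JTYPE_UNKNOW


    class context(object):
        def __init__(self, jstr):
            self.json = list(jstr)  # characters of the JSON text
            self.pos = 0            # current read position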

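Extra whitespace such as spaces, tabs, and line breaks is skipped first, for example:

    import re

    WHITESPACE = re.compile(r'\s')

    def es_parse_whitespace(context):
        if not context.json:
            return
        pos = 0
        # Stop at the end of the input as well as at the first non-space character.
        while pos < len(context.json) and WHITESPACE.match(context.json[pos]):
            pos += 1
        context.json = context.json[pos:]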
Parsing literals

There are three literals: false, true, and null.

    def es_parse_literal(context, literal, mytype):
        e_value = EsValue()
        # Compare the next len(literal) characters with the expected literal.
        if ''.join(context.json[context.pos:context.pos + len(literal)]) != literal:
            raise MyException("PARSE_STATE_INVALID_VALUE, literal error")
        e_value.type = mytype
        context.json = context.json[context.pos + len(literal):]
        return PARSE_STATE_OK, e_value


    def es_parse_value(context, typevalue):
        if context.json[context.pos] == 't':
            return es_parse_literal(context, "true", JTYPE_TRUE)
        if context.json[context.pos] == 'f':
            return es_parse_literal(context, "false", JTYPE_FALSE)
        if context.json[context.pos] == 'n':
            return es_parse_literal(context, "null", JTYPE_NULL)
Parsing numbers

A JSON number is written in decimal and consists of four parts, in order: an optional minus sign, an integer part, an optional fraction, and an optional exponent. Only the integer part is required.

JSON numbers can use scientific notation: the exponent begins with an uppercase E or a lowercase e, followed by an optional sign and then one or more digits (0-9).

The JSON standard ECMA-404 presents this grammar as syntax diagrams, which make the possible parsing paths easier to see:
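The four parts described above can also be written as a regular expression. The following is a small sketch for checking a candidate number string; the names `JSON_NUMBER` and `is_json_number` are hypothetical and not part of the project.

```python
import re

# minus?  integer part  fraction?  exponent?
JSON_NUMBER = re.compile(
    r"-?"                   # optional minus sign
    r"(?:0|[1-9][0-9]*)"    # integer part: a single 0, or no leading zero
    r"(?:\.[0-9]+)?"        # optional fraction: '.' followed by digits
    r"(?:[eE][+-]?[0-9]+)?" # optional exponent: e/E, optional sign, digits
    r"\Z"                   # nothing may follow the number
)


def is_json_number(text):
    """Return True if text is a complete JSON number."""
    return JSON_NUMBER.match(text) is not None
```

Under this pattern `-0`, `1.5e3`, and `1E-2` are accepted, while `01`, `.5`, `1.`, and `+1` are rejected, because the integer part is mandatory, may not have a leading zero, and may not carry a plus sign.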

Python is dynamically typed, so num in es_value can hold either an integer or a float:

    class es_value():
        def __init__(self, type):
            self.type = type
            self.num = 0

Python can convert a string to float or int, but int(string) cannot handle scientific notation, so the number string is always converted to float first and then to int if the value is an integer:

    typevalue.num = float(numstr)
    if isint:
        typevalue.num = int(typevalue.num)
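The float-first conversion can be tried in isolation. In the sketch below the function name `to_number` is hypothetical, and the rule that decides whether the result is an integer (no decimal point in the text, and an integral value) is an assumption, since the original does not show how `isint` is computed.

```python
def to_number(numstr):
    """Convert a JSON number string to int or float, going through float first."""
    value = float(numstr)  # float() understands scientific notation
    # Assumed rule: treat the number as an integer only if the text has no
    # decimal point and the value has no fractional part.
    if "." not in numstr and value.is_integer():
        return int(value)
    return value


def int_accepts(numstr):
    """Show whether int() can parse the string directly."""
    try:
        int(numstr)
        return True
    except ValueError:
        return False
```

With these definitions `int_accepts("1e4")` is False, while `to_number("1e4")` gives the int 10000 and `to_number("1.5e3")` stays a float.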

The unit tests implemented include:

    def testNum(self):
        print("\n------------test number-----------")
        # "12" stands in for an integer literal; the original value is garbled in the source.
        self.assertEqual(type(self.parse("12")), type(1))
        self.assertEqual(type(self.parse("1e4")), type(10000))
        self.assertEqual(type(self.parse("-1.5")), type(-1.5))
        self.assertEqual(type(self.parse("1.5e3")), type(1.500))
Parsing strings

Escape sequences inside a string must be handled while loading, and \u sequences must be decoded into Unicode characters.

    def es_parse_string(context):
        # e_value and pos are set up earlier in the full source; that part is not shown here.
        charlist = {
            '\\"': '\"', "\\'": "\'", "\\b": "\b", "\\f": "\f",
            "\\r": "\r", "\\n": "\n", "\\t": "\t", "\\u": "\\u",
            "\\\\": "\\", "\\/": "/", "\\a": "\a", "\\v": "\v",
        }
        while context.json[pos] != '"':
            # Handle escape characters
            if context.json[pos] == '\\':
                c = ''.join(context.json[pos:pos + 2])
                if c in charlist:
                    e_value.str += charlist[c]
                else:
                    e_value.str += ''.join(context.json[pos])
                    pos += 1
                    continue
                pos += 2
            else:
                e_value.str += ''.join(context.json[pos])
                pos += 1
        e_value.type = JTYPE_STRING
        context.json = context.json[pos + 1:]
        context.pos = 1
        # \u sequences were kept as text above; decode them in one pass here.
        if '\\u' in e_value.str:
            e_value.str = e_value.str.encode('latin-1').decode('unicode_escape')
        return PARSE_STATE_OK, e_value

Unit tests:

    def testString(self):
        print("\n------------test string----------")
        self.assertEqual(type(self.parse("\"\\\\line1\\nline2\"")), type("string"))  # input \\ is \
        self.assertEqual(type(self.parse("\"abc\\def\"")), type("string"))
        self.assertEqual(type(self.parse("\"null\"")), type("string"))
        self.assertEqual(type(self.parse("\"hello world!\"")), type("string"))
        self.assertEqual(type(self.parse("\"\u751f\u5316\u5371\u673a\"")), type("string"))
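As a point of comparison for the escape handling in this section, here is a minimal Python 3 sketch that decodes the text between the quotes of a JSON string character by character. It is not the project's code: `unescape_json_string` and `SIMPLE_ESCAPES` are hypothetical names, only the escapes defined by ECMA-404 are accepted (so `\a`, `\v`, and `\'` are rejected), and surrogate pairs are not combined.

```python
SIMPLE_ESCAPES = {
    '"': '"', "\\": "\\", "/": "/",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
}
HEX_DIGITS = "0123456789abcdefABCDEF"


def unescape_json_string(body):
    """Decode escape sequences in the contents of a JSON string (quotes excluded)."""
    out = []
    pos = 0
    while pos < len(body):
        ch = body[pos]
        if ch != "\\":
            out.append(ch)
            pos += 1
            continue
        if pos + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        esc = body[pos + 1]
        if esc == "u":
            # \u must be followed by exactly four hexadecimal digits.
            digits = body[pos + 2:pos + 6]
            if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("invalid \\u escape at position %d" % pos)
            out.append(chr(int(digits, 16)))
            pos += 6
        elif esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            pos += 2
        else:
            raise ValueError("invalid escape \\%s at position %d" % (esc, pos))
    return "".join(out)
```

Decoding `\u` in the same loop as the other escapes avoids a second pass over the string, at the cost of a slightly longer loop body.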
es_dumps function, JSON generator

es_dumps serializes a Python structure such as a dict into a JSON string:

    def es_dumps(obj):
        obj_str = ""
        if isinstance(obj, bool):
            if obj is True:
                obj_str += "true"
            else:
                obj_str += "false"
        elif obj is None:
            obj_str += "null"
        elif isinstance(obj, basestring):
            for ch in obj.decode('utf-8'):
                # Strings containing CJK characters are written via repr().
                if u'\u4e00' <= ch <= u'\u9fff':
                    obj_str += "\"" + repr(obj.decode('utf-8')) + "\""
                    break
            else:
                obj_str += "\"" + obj + "\""
        elif isinstance(obj, list):
            obj_str += '['
            if len(obj):
                for i in obj:
                    obj_str += es_dumps(i) + ", "
                obj_str = obj_str[:-2]  # drop the trailing ", "
            obj_str += ']'
        elif isinstance(obj, int) or isinstance(obj, float):  # number
            obj_str += str(obj)
        elif isinstance(obj, dict):
            obj_str += '{'
            if len(obj):
                for (k, v) in obj.items():
                    obj_str += es_dumps(k) + ": "
                    obj_str += es_dumps(v) + ", "
                obj_str = obj_str[:-2]  # drop the trailing ", "
            obj_str += '}'
        return obj_str
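The same recursive idea can be sketched for Python 3 with string escaping added. This is an illustrative variant rather than the project's generator: `dumps_sketch`, `quote`, and `ESCAPES` are hypothetical names, and rejecting NaN, Infinity, and non-string keys is a choice made for this sketch.

```python
import math

ESCAPES = {
    '"': '\\"', "\\": "\\\\",
    "\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t",
}


def quote(text):
    """Wrap text in double quotes, escaping quotes, backslashes, and control characters."""
    parts = []
    for ch in text:
        if ch in ESCAPES:
            parts.append(ESCAPES[ch])
        elif ord(ch) < 0x20:
            parts.append("\\u%04x" % ord(ch))
        else:
            parts.append(ch)
    return '"' + "".join(parts) + '"'


def dumps_sketch(obj):
    """Serialize None, bool, int, float, str, list/tuple, and dict to JSON text."""
    if obj is None:
        return "null"
    if obj is True:   # bool must be tested before int, since bool is an int subclass
        return "true"
    if obj is False:
        return "false"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, float):
        if not math.isfinite(obj):
            raise ValueError("NaN and Infinity are not valid JSON")
        return repr(obj)
    if isinstance(obj, str):
        return quote(obj)
    if isinstance(obj, (list, tuple)):
        return "[" + ", ".join(dumps_sketch(item) for item in obj) + "]"
    if isinstance(obj, dict):
        members = []
        for key, value in obj.items():
            if not isinstance(key, str):
                raise TypeError("JSON object keys must be strings")
            members.append(quote(key) + ": " + dumps_sketch(value))
        return "{" + ", ".join(members) + "}"
    raise TypeError("cannot serialize %r" % type(obj))
```

Joining the pieces with `", ".join(...)` means there is no trailing separator to trim afterwards, and empty lists and dicts need no special case.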
cProfile performance analysis

Import the cProfile module for performance analysis and load a JSON file with the population of China's 34 provinces:

    import cProfile
    from es_parser import *  # module name assumed; it is missing from the source
    import json

    cProfile.run("print(es_load(\"china.json\"))")

Part of the code was changed to use Python built-ins, and the context structure was optimized: copying a string is significantly faster than copying a list. The running time dropped from 20 s to 1 s.
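A comparison of this kind can be set up with cProfile directly from code. The sketch below is written for Python 3 and is only an illustration: `consume_by_slicing` is a hypothetical stand-in for a parser that re-slices its input after every character, `profile_call` is a hypothetical helper, and no particular timings are implied.

```python
import cProfile
import io
import pstats


def consume_by_slicing(data):
    """Stand-in for a parser that copies the rest of its input after each character."""
    count = 0
    while data:
        data = data[1:]  # copies the remaining list or string
        count += 1
    return count


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, stream.getvalue()


if __name__ == "__main__":
    text = "x" * 20000
    for label, data in (("list", list(text)), ("str", text)):
        _, report = profile_call(consume_by_slicing, data)
        print(label)
        print(report)
```

Running the script prints one report for the list input and one for the string input, so the cumulative times of the two representations can be read side by side.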
