Processing JSON in Python
Concept
Serialization: the process of converting an object's state into a form that can be stored or transmitted over a network, such as JSON or XML. Deserialization is the reverse process: reading the serialized state back from storage (JSON, XML) and re-creating the object.
JSON (JavaScript Object Notation): a lightweight data interchange format that is easier to read and write than XML and easy to parse and generate. Its syntax is based on a subset of JavaScript.
Since Python 2.6 the json module has been part of the standard library, so no extra download is needed. In the json module, serialization and deserialization are called encoding and decoding, respectively.
Encoding: converts a Python object into a JSON string
Decoding: converts a JSON-formatted string into a Python object
Simple data types (str, unicode, int, float, list, tuple, dict) can be processed directly.
The json.dumps method encodes simple data types:
import json

data = [{'a': "A", 'b': (2, 4), 'c': 3.0}]  # a list object
print "DATA:", repr(data)
data_string = json.dumps(data)
print "JSON:", data_string
Output:
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]  # Python dicts are stored without ordering
JSON: [{"a": "A", "c": 3.0, "b": [2, 4]}]
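The JSON output looks much like the repr() of data, apart from a few subtle changes; for example, Python's tuple type becomes a JSON array. The Python-to-JSON encoding rules are:
Python              JSON
dict                object
list, tuple         array
str, unicode        string
int, long, float    number
True                true
False               false
None                null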
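The encoding rules can be checked directly. The following is a minimal sketch in Python 3 syntax (the listings in this article are Python 2); the `samples` list is only an illustration, pairing one value per Python type with the JSON text that `json.dumps` is expected to produce for it.

```python
import json

# One sample value per Python type, paired with the JSON text it encodes to.
samples = [
    ({"k": 1}, '{"k": 1}'),  # dict  -> object
    ([1, 2], '[1, 2]'),      # list  -> array
    ((1, 2), '[1, 2]'),      # tuple -> array
    ("text", '"text"'),      # str   -> string
    (7, '7'),                # int   -> number
    (2.5, '2.5'),            # float -> number
    (True, 'true'),          # True  -> true
    (False, 'false'),        # False -> false
    (None, 'null'),          # None  -> null
]

for value, expected in samples:
    encoded = json.dumps(value)
    assert encoded == expected
    print("%-10r -> %s" % (value, encoded))
```

Note that list and tuple both map to a JSON array, so the distinction between them is lost once the data is encoded.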
The json.loads method decodes simple data types:
import json

data = [{'a': "A", 'b': (2, 4), 'c': 3.0}]  # a list object
data_string = json.dumps(data)
print "ENCODED:", data_string
decoded = json.loads(data_string)
print "DECODED:", decoded
print "ORIGINAL:", type(data[0]['b'])
print "DECODED:", type(decoded[0]['b'])
Output:
ENCODED: [{"a": "A", "c": 3.0, "b": [2, 4]}]
DECODED: [{u'a': u'A', u'c': 3.0, u'b': [2, 4]}]
ORIGINAL: <type 'tuple'>
DECODED: <type 'list'>
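During decoding, a JSON array is converted to a Python list, not to the original tuple type. The JSON-to-Python decoding rules are:
JSON             Python
object           dict
array            list
string           unicode
number (int)     int, long
number (real)    float
true             True
false            False
null             None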
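As a quick check of the decoding rules, here is a small sketch in Python 3 syntax (the article's listings are Python 2, where JSON strings decode to unicode rather than str). The JSON text and the variable names are made up for illustration.

```python
import json

# Illustrative JSON text with one member per JSON value type.
text = ('{"obj": {"k": 1}, "arr": [1, 2], "s": "x", '
        '"i": 7, "f": 2.5, "t": true, "n": null}')
decoded = json.loads(text)

# Map each member name to the name of the Python type it decoded to.
types = {key: type(value).__name__ for key, value in decoded.items()}
print(types)

# A tuple does not survive a round trip: it comes back as a list.
round_tripped = json.loads(json.dumps((2, 4)))
print(round_tripped)
```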
Human-readable JSON output
By default the encoded JSON string is compact and its keys are in no particular order, so the dumps method provides optional parameters that make the output more readable. For example, sort_keys tells the encoder to output dictionary keys in sorted order (a to z).
import json

data = [{'a': 'A', 'b': (2, 4), 'c': 3.0}]
print 'DATA:', repr(data)

unsorted = json.dumps(data)
print 'JSON:', unsorted
print 'SORT:', json.dumps(data, sort_keys=True)
Output:
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
JSON: [{"a": "A", "c": 3.0, "b": [2, 4]}]
SORT: [{"a": "A", "b": [2, 4], "c": 3.0}]
The indent parameter pretty-prints nested structures with the given number of spaces per level, which makes the output clearer to read:
import json

data = [{'a': 'A', 'b': (2, 4), 'c': 3.0}]
print 'DATA:', repr(data)

print 'NORMAL:', json.dumps(data, sort_keys=True)
print 'INDENT:', json.dumps(data, sort_keys=True, indent=2)
Output:
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
NORMAL: [{"a": "A", "b": [2, 4], "c": 3.0}]
INDENT: [
  {
    "a": "A",
    "b": [
      2,
      4
    ],
    "c": 3.0
  }
]
The separators parameter controls the strings placed between items and between keys and values. As the output above shows, both "," and ":" are followed by a space by default, which makes the output look nicer. When transmitting data, however, the more compact the better, so the redundant spaces can be removed by passing separators=(',', ':'):
import json

data = [{'a': 'A', 'b': (2, 4), 'c': 3.0}]
print 'DATA:', repr(data)
print 'repr(data)             :', len(repr(data))
print 'dumps(data)            :', len(json.dumps(data))
print 'dumps(data, indent=2)  :', len(json.dumps(data, indent=2))
print 'dumps(data, separators):', len(json.dumps(data, separators=(',', ':')))
Output:
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
repr(data)             : 35
dumps(data)            : 35
dumps(data, indent=2)  : 76
dumps(data, separators): 29
The skipkeys parameter: during encoding, dict keys must be strings or another basic type (int, long, float, bool or None, which are converted to strings). A key of any other type, such as a tuple, raises TypeError during encoding. Passing skipkeys=True skips such keys instead of raising.
import json

data = [{'a': 'A', 'b': (2, 4), 'c': 3.0, ('d',): 'D tuple'}]

try:
    print json.dumps(data)
except (TypeError, ValueError) as err:
    print 'ERROR:', err
print
print json.dumps(data, skipkeys=True)
Output:
ERROR: keys must be a string

[{"a": "A", "c": 3.0, "b": [2, 4]}]
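The key handling can be tried out with a small sketch in Python 3 syntax (the article's listings are Python 2). The dictionaries below are made up for illustration: keys of basic types are converted to strings, a tuple key raises TypeError, and skipkeys=True drops the offending key.

```python
import json

# Keys of basic types are converted to JSON strings rather than rejected.
basic_keys = {2: "int", 2.5: "float", True: "bool", None: "none"}
coerced = json.dumps(basic_keys)
print(coerced)

# A tuple key is not a basic type, so encoding fails with TypeError.
mixed = {"a": "A", ("d",): "D tuple"}
try:
    json.dumps(mixed)
    raised = False
except TypeError as err:
    raised = True
    print("ERROR: %s" % err)

# With skipkeys=True the unsupported key is silently left out.
skipped = json.dumps(mixed, skipkeys=True)
print(skipped)
```

Silently dropping keys loses data, so skipkeys is probably best reserved for cases where the skipped entries are known to be unimportant.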
Make JSON support custom data types
The examples above all use Python's built-in types. The json module does not handle custom types by default and raises an exception: TypeError: xx is not JSON serializable.
In that case you need to provide your own conversion function:
import json

class MyObj(object):
    def __init__(self, s):
        self.s = s
    def __repr__(self):
        return '<MyObj(%s)>' % self.s

obj = MyObj('helloworld')

try:
    print json.dumps(obj)
except TypeError as err:
    print 'ERROR:', err

# Conversion function
def convert_to_builtin_type(obj):
    print 'default(', repr(obj), ')'
    # Convert the MyObj object into a dict
    d = {'__class__': obj.__class__.__name__,
         '__module__': obj.__module__,
         }
    d.update(obj.__dict__)
    return d

print json.dumps(obj, default=convert_to_builtin_type)
Output:
ERROR: <MyObj(helloworld)> is not JSON serializable
default( <MyObj(helloworld)> )
{"s": "helloworld", "__module__": "__main__", "__class__": "MyObj"}
Note: the __class__ and __module__ values depend on the file in which your code lives.
Conversely, to decode JSON into a Python object you also need a custom conversion function, which is passed to json.loads through the object_hook argument:
# jsontest.py
import json

class MyObj(object):
    def __init__(self, s):
        self.s = s
    def __repr__(self):
        return "<MyObj(%s)>" % self.s

def dict_to_object(d):
    if '__class__' in d:
        class_name = d.pop('__class__')
        module_name = d.pop('__module__')
        module = __import__(module_name)
        print "MODULE:", module
        class_ = getattr(module, class_name)
        print "CLASS", class_
        args = dict((key.encode('ascii'), value) for key, value in d.items())
        print 'INSTANCE ARGS:', args
        inst = class_(**args)
    else:
        inst = d
    return inst

encoded_object = '[{"s":"helloworld","__module__":"jsontest","__class__":"MyObj"}]'

myobj_instance = json.loads(encoded_object, object_hook=dict_to_object)
print myobj_instance
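Output (printed twice, because running jsontest.py as a script makes __import__ load the same file again as the jsontest module, which executes its top-level code a second time):
MODULE: <module 'jsontest' from 'E:\Users\liuzhijun\workspace\python\jsontest.py'>
CLASS <class 'jsontest.MyObj'>
INSTANCE ARGS: {'s': u'helloworld'}
[<MyObj(helloworld)>]
MODULE: <module 'jsontest' from 'E:\Users\liuzhijun\workspace\python\jsontest.py'>
CLASS <class 'jsontest.MyObj'>
INSTANCE ARGS: {'s': u'helloworld'}
[<MyObj(helloworld)>]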
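Encoding with default and decoding with object_hook can be combined into a round trip. The sketch below is in Python 3 syntax (the article's listings are Python 2) and deliberately swaps the `__import__` lookup for an explicit registry, so that only listed classes can be instantiated from incoming JSON. `KNOWN_CLASSES`, `to_builtin` and `from_builtin` are hypothetical names chosen for this illustration, not part of the json module.

```python
import json

class MyObj(object):
    def __init__(self, s):
        self.s = s

    def __repr__(self):
        return "<MyObj(%s)>" % self.s

# Hypothetical registry: only classes listed here can be rebuilt.
KNOWN_CLASSES = {"MyObj": MyObj}

def to_builtin(obj):
    """default= hook: describe a registered object as a plain dict."""
    name = obj.__class__.__name__
    if name not in KNOWN_CLASSES:
        raise TypeError("%r is not JSON serializable" % (obj,))
    d = {"__class__": name}
    d.update(obj.__dict__)
    return d

def from_builtin(d):
    """object_hook= hook: rebuild a registered object from its dict."""
    class_name = d.get("__class__")
    if isinstance(class_name, str) and class_name in KNOWN_CLASSES:
        args = {k: v for k, v in d.items() if k != "__class__"}
        return KNOWN_CLASSES[class_name](**args)
    return d  # ordinary dicts pass through unchanged

encoded = json.dumps([MyObj("helloworld")], default=to_builtin)
restored = json.loads(encoded, object_hook=from_builtin)
print(encoded)
print(restored)
```

The registry is a design choice for this sketch: importing whatever module name appears in the input, as the listing above does, is convenient but lets the JSON text decide which code gets loaded.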
Using the encoder and decoder classes for JSON conversion
JSONEncoder has an iterating interface, iterencode(data), that returns the encoded data as a series of chunks. The advantage is that the chunks can be written to a file or network stream as they are produced, without having to hold the whole encoded string in memory at once.
import json

encoder = json.JSONEncoder()
data = [{'a': 'A', 'b': (2, 4), 'c': 3.0}]

for part in encoder.iterencode(data):
    print 'PART:', part
Output:
PART: [
PART: {
PART: "a"
PART: :
PART: "A"
PART: ,
PART: "c"
PART: :
PART: 3.0
PART: ,
PART: "b"
PART: :
PART: [2
PART: , 4
PART: ]
PART: }
PART: ]
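To illustrate the streaming use case, the following sketch (Python 3 syntax; the article's listings are Python 2) forwards each chunk to a writable stream as soon as it is produced. The `io.StringIO` object is only a stand-in assumed here for a real file or socket wrapper.

```python
import io
import json

data = [{"a": "A", "b": (2, 4), "c": 3.0}]
encoder = json.JSONEncoder()

stream = io.StringIO()  # stand-in for a file or network stream
chunk_count = 0
for chunk in encoder.iterencode(data):
    stream.write(chunk)  # each chunk is written as soon as it is produced
    chunk_count += 1

# The concatenated chunks equal what a single encode() call returns.
assert stream.getvalue() == encoder.encode(data)
print("%d chunks written" % chunk_count)
```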
The encode() method is equivalent to ''.join(encoder.iterencode(data)), with some extra error checking done beforehand (such as for non-string dict keys). For custom objects, we only need to override JSONEncoder's default() method; its implementation is similar to the convert_to_builtin_type() function shown above.
import json

class MyObj(object):
    def __init__(self, s):
        self.s = s
    def __repr__(self):
        return "<MyObj(%s)>" % self.s

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        print 'default(', repr(obj), ')'
        # Convert objects to a dictionary of their representation
        d = {'__class__': obj.__class__.__name__,
             '__module__': obj.__module__,
             }
        d.update(obj.__dict__)
        return d

obj = MyObj('helloworld')
print obj
print MyEncoder().encode(obj)
Output:
<MyObj(helloworld)>
default( <MyObj(helloworld)> )
{"s": "helloworld", "__module__": "__main__", "__class__": "MyObj"}
To decode JSON back into a Python object, subclass JSONDecoder:
import json

# MyObj is looked up in the jsontest module from the earlier example.
class MyDecoder(json.JSONDecoder):
    def __init__(self):
        json.JSONDecoder.__init__(self, object_hook=self.dict_to_object)

    def dict_to_object(self, d):
        if '__class__' in d:
            class_name = d.pop('__class__')
            module_name = d.pop('__module__')
            module = __import__(module_name)
            print 'MODULE:', module
            class_ = getattr(module, class_name)
            print 'CLASS:', class_
            args = dict((key.encode('ascii'), value) for key, value in d.items())
            print 'INSTANCE ARGS:', args
            inst = class_(**args)
        else:
            inst = d
        return inst

encoded_object = '[{"s": "helloworld", "__module__": "jsontest", "__class__": "MyObj"}]'

myobj_instance = MyDecoder().decode(encoded_object)
print myobj_instance
Output:
MODULE: <module 'jsontest' from 'E:\Users\liuzhijun\workspace\python\jsontest.py'>
CLASS: <class 'jsontest.MyObj'>
INSTANCE ARGS: {'s': u'helloworld'}
[<MyObj(helloworld)>]
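Encoder and decoder subclasses can also be handed to json.dumps and json.loads through their cls argument, so the module-level functions do the work. The sketch below is in Python 3 syntax (the article's listings are Python 2); `Point`, `PointEncoder`, `PointDecoder` and the `__point__` marker key are hypothetical names invented for this illustration.

```python
import json

class Point(object):  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Point):
            return {"__point__": True, "x": obj.x, "y": obj.y}
        # Defer to the base class, which raises TypeError for unknown types.
        return json.JSONEncoder.default(self, obj)

class PointDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        kwargs["object_hook"] = self.dict_to_point
        json.JSONDecoder.__init__(self, *args, **kwargs)

    def dict_to_point(self, d):
        if d.get("__point__") is True:
            return Point(d["x"], d["y"])
        return d  # ordinary dicts pass through unchanged

text = json.dumps({"origin": Point(0, 0)}, cls=PointEncoder)
restored = json.loads(text, cls=PointDecoder)
print(text)
print(type(restored["origin"]).__name__)
```

Calling the base class's default() for unrecognized objects keeps the usual "not JSON serializable" error instead of silently encoding arbitrary objects.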
Writing JSON strings to a file stream
The examples above all operate in memory. For large amounts of data it is more appropriate to encode directly to a file-like object, which is what the dump() and load() functions do.
import json
import tempfile

data = [{'a': 'A', 'b': (2, 4), 'c': 3.0}]

f = tempfile.NamedTemporaryFile(mode='w+')
json.dump(data, f)
f.flush()

print open(f.name, 'r').read()
Output:
[{"a": "A", "c": 3.0, "b": [2, 4]}]
Similarly, load() reads JSON from a file-like object:
import json
import tempfile

f = tempfile.NamedTemporaryFile(mode='w+')
f.write('[{"a": "A", "c": 3.0, "b": [2, 4]}]')
f.flush()
f.seek(0)

print json.load(f)
Output:
[{u‘a‘: u‘A‘, u‘c‘: 3.0, u‘b‘: [2, 4]}]
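The two halves can be combined into a file round trip. This is a minimal sketch in Python 3 syntax (the article's listings are Python 2); the temporary file path and the choice of sort_keys and indent are assumptions made for illustration.

```python
import json
import os
import tempfile

data = [{"a": "A", "b": [2, 4], "c": 3.0}]

# Create a temporary file path to write to and read back from.
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
try:
    with open(path, "w") as f:
        json.dump(data, f, sort_keys=True, indent=2)  # encode straight to the file
    with open(path) as f:
        loaded = json.load(f)                         # decode straight from the file
finally:
    os.remove(path)

assert loaded == data
print(loaded)
```

The data here uses a list rather than a tuple for "b", because a tuple would come back as a list and the equality check would fail.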
References:
http://docs.python.org/2/library/json.html
http://www.cnblogs.com/coser/archive/2011/12/14/2287739.html
http://pymotw.com/2/json/