JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for people to read and write and easy for machines to parse and generate. It is well suited to data-exchange scenarios such as communication between a website's front end and back end.

JSON is often compared with XML, and the two are roughly on a par.

Python 2.7 ships with a json module, so `import json` can be used directly.

Official documentation: http://docs.python.org/library/json.html

Online JSON parser: http://www.json.cn/#
## JSON

Put simply, JSON is made up of JavaScript's objects and arrays, and these two structures are enough to represent data of almost any complexity.

Object: in JS an object is the content enclosed in `{ }`, structured as key-value pairs `{ key: value, key: value, ... }`. In object-oriented terms the key is a property of the object and the value is that property's value, so it is easy to understand. A value is read with `object.key`, and it can be a number, a string, an array, or an object.

Array: in JS an array is the content enclosed in square brackets `[ ]`, structured as `["Python", "javascript", "C++", ...]`. As in every language, values are read by index, and each element can be a number, a string, an array, or an object.
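To make the two structures concrete, here is a minimal sketch written for Python 3 (the article's own listings target Python 2.7). The `profile` document and its field names are invented for illustration. Once parsed, a JSON object is read by key and a JSON array by index.

```python
import json

# Hypothetical JSON text: an object holding a string, a number,
# an array of strings, and a nested object.
text = '''
{
    "name": "Ann",
    "age": 30,
    "languages": ["Python", "javascript", "C++"],
    "address": {"city": "Beijing", "zip": "100000"}
}
'''

profile = json.loads(text)

# Object members are read by key.
print(profile["name"])
# Array elements are read by index.
print(profile["languages"][0])
# The two structures nest freely.
print(profile["address"]["city"])
```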
## import json

The json module provides four functions, `dumps`, `dump`, `loads`, and `load`, for converting between strings and Python data types.
### 1. json.loads()

Decodes a JSON-formatted string into a Python object, converting each JSON type to the corresponding Python type (for example, a JSON array becomes a list and a JSON object becomes a dict):

    # json_loads.py
    # -*- coding: utf-8 -*-
    import json

    strList = '[1, 2, 3, 4]'
    strDict = '{"city": "北京", "name": "大猫"}'

    json.loads(strList)
    # [1, 2, 3, 4]

    # JSON data is stored as Unicode automatically
    json.loads(strDict)
    # {u'city': u'\u5317\u4eac', u'name': u'\u5927\u732b'}
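The JSON-to-Python type mapping can be checked directly. The sketch below is written for Python 3, where JSON strings decode to `str` (in Python 2.7 they decode to `unicode`). The sample values are arbitrary.

```python
import json

# One JSON value of each kind: object, array, string, integer,
# real number, true, false, null.
samples = ['{"a": 1}', '[1, 2]', '"text"', '42', '3.14', 'true', 'false', 'null']

decoded = [json.loads(s) for s in samples]

# Print each JSON text next to the name of the Python type it decodes to.
for s, value in zip(samples, decoded):
    print("%-10s -> %s" % (s, type(value).__name__))
```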
### 2. json.dumps()

Encodes a Python object as a JSON string and returns it as a str. Python types are converted to their JSON counterparts (for example, both lists and tuples become JSON arrays):

    # json_dumps.py
    # -*- coding: utf-8 -*-
    import json
    import chardet

    listStr = [1, 2, 3, 4]
    tupleStr = (1, 2, 3, 4)
    dictStr = {"city": "北京", "name": "大刘"}

    json.dumps(listStr)
    # '[1, 2, 3, 4]'
    json.dumps(tupleStr)
    # '[1, 2, 3, 4]'

    # Note: json.dumps() uses ASCII encoding by default when serializing.
    # Pass ensure_ascii=False to disable ASCII escaping and emit UTF-8 instead.
    # chardet.detect() returns a dict in which confidence is the detection accuracy.

    json.dumps(dictStr)
    # '{"city": "\\u5317\\u4eac", "name": "\\u5927\\u5218"}'

    chardet.detect(json.dumps(dictStr))
    # {'confidence': 1.0, 'encoding': 'ascii'}

    print json.dumps(dictStr, ensure_ascii=False)
    # {"city": "北京", "name": "大刘"}

    chardet.detect(json.dumps(dictStr, ensure_ascii=False))
    # {'confidence': 0.99, 'encoding': 'utf-8'}

chardet is a handy character-encoding detection module that can be installed with pip.
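The effect of `ensure_ascii` and of the Python-to-JSON type mapping can be seen without chardet. This is a small sketch for Python 3. The `data` dictionary is made up, and `sort_keys=True` is used only to make the output order predictable.

```python
import json

data = {
    "list": [1, 2],
    "tuple": (1, 2),   # tuples are written as JSON arrays too
    "flag": True,      # True -> true
    "nothing": None,   # None -> null
    "city": "北京",
}

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(data, sort_keys=True)
# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(data, sort_keys=True, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode to the same data; only the text representation differs.
same_data = json.loads(escaped) == json.loads(readable)
print(same_data)
```

Note that the tuple does not survive a round trip: JSON has no tuple type, so it comes back as a list.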
### 3. json.dump()

Serializes a Python built-in type to JSON and writes the result to a file:

    # json_dump.py
    # -*- coding: utf-8 -*-
    import json

    listStr = [{"city": "北京"}, {"name": "大刘"}]
    json.dump(listStr, open("listStr.json", "w"), ensure_ascii=False)

    dictStr = {"city": "北京", "name": "大刘"}
    json.dump(dictStr, open("dictStr.json", "w"), ensure_ascii=False)
### 4. json.load()

Reads JSON text from a file and converts it into a Python type:

    # json_load.py
    import json

    strList = json.load(open("listStr.json"))
    print strList
    # [{u'city': u'\u5317\u4eac'}, {u'name': u'\u5927\u5218'}]

    strDict = json.load(open("dictStr.json"))
    print strDict
    # {u'city': u'\u5317\u4eac', u'name': u'\u5927\u5218'}
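The listings for `json.dump()` and `json.load()` pass `open(...)` straight to the function and never close the file. A possible variant for Python 3 is sketched below. It uses `with` blocks and an explicit `encoding`, and writes to a temporary directory (an assumption made here so that the sketch leaves no files behind).

```python
import json
import os
import tempfile

records = [{"city": "北京"}, {"name": "大刘"}]

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "listStr.json")

    # `with` closes the file even if serialization fails; the explicit
    # encoding keeps non-ASCII text intact when ensure_ascii=False.
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(records, fp, ensure_ascii=False)

    with open(path, encoding="utf-8") as fp:
        restored = json.load(fp)

# The data read back equals the data written.
print(restored == records)
```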
## JsonPath

JsonPath is an information-extraction library for pulling specified data out of JSON documents. Implementations are available in several languages, including JavaScript, Python, PHP, and Java.

JsonPath is to JSON what XPath is to XML.

Download URL: https://pypi.python.org/pypi/jsonpath

Installation: follow the Download URL link to download jsonpath, unpack it, and run `python setup.py install`.

Official documentation: http://goessner.net/articles/JsonPath

JsonPath vs. XPath syntax:

JSON has a clear structure, high readability, and low complexity, and it is very easy to match against. The table below shows how JsonPath syntax corresponds to XPath.

| XPath | JSONPath | Description |
|-------|----------|-------------|
| / | $ | Root node |
| . | @ | Current node |
| / | . or [] | Child node |
| .. | n/a | Parent node; not supported by JsonPath |
| // | .. | Select every matching node, regardless of position |
| * | * | Match all element nodes |
| @ | n/a | Attribute access; not supported, and not needed, because JSON is a recursive key-value structure |
| [] | [] | Iterator (subscript) operator, usable for simple iteration such as array indexing or selection by content |
| \| | [,] | Multiple selection inside an iterator |
| [] | ?() | Filter expression |
| n/a | () | Expression evaluation |
| () | n/a | Grouping; not supported by JsonPath |
Example:
Take Lagou's city JSON file, http://www.lagou.com/lbs/getAllCitySearchLabels.json, as an example and extract the names of all cities.

    # jsonpath_lagou.py
    import urllib2
    import jsonpath
    import json
    import chardet

    url = 'http://www.lagou.com/lbs/getAllCitySearchLabels.json'
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    html = response.read()

    # Convert the JSON-formatted string into a Python object
    jsonobj = json.loads(html)

    # Starting from the root node, match every name node
    citylist = jsonpath.jsonpath(jsonobj, '$..name')

    print citylist
    print type(citylist)

    fp = open('city.json', 'w')
    content = json.dumps(citylist, ensure_ascii=False)
    print content
    fp.write(content.encode('utf-8'))
    fp.close()
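To see what a recursive-descent expression such as `$..name` selects, without installing jsonpath or calling the network, the idea can be approximated in plain Python. The `find_all` helper below is a hand-written stand-in, not part of the jsonpath library. The `sample` data is invented, since the article does not show the real shape of the Lagou response.

```python
def find_all(node, key):
    """Collect every value stored under `key`, at any depth (like `$..key`)."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: the value itself may contain more matches.
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Hypothetical data, loosely shaped like a list of city groups.
sample = {
    "content": {
        "data": [
            {"letter": "A", "cities": [{"id": 1, "name": "Anqing"}]},
            {"letter": "B", "cities": [{"id": 2, "name": "Beijing"},
                                       {"id": 3, "name": "Baoding"}]},
        ]
    }
}

names = find_all(sample, "name")
print(names)
```

The real library supports far more of the syntax in the table (filters, wildcards, subscripts), so this sketch only illustrates the "regardless of position" behaviour of `..`.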
Notes:

json.loads() decodes a JSON-formatted string into a Python object. If json.loads() raises an error, check the encoding of the JSON string being decoded.

If the string passed in is not UTF-8 encoded, specify its character encoding with the `encoding` parameter. For example:
dataDict = json.loads(jsonStrGBK);
Here jsonStrGBK is a JSON string. If it is encoded in GBK rather than UTF-8, the code above raises an error and should be changed to:
dataDict = json.loads(jsonStrGBK, encoding="GBK");
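The `encoding` argument belongs to Python 2. In Python 3 it no longer has any effect (and was removed in 3.9), so non-UTF-8 input is usually decoded by hand before parsing. Here is a minimal sketch for Python 3, assuming the payload arrives as GBK bytes. The `gbk_bytes` value is fabricated to stand in for data read from a file or socket.

```python
import json

# Stand-in for bytes received from a GBK-encoded source.
gbk_bytes = '{"city": "北京", "name": "大猫"}'.encode("gbk")

# Passing the raw bytes fails, because json.loads() assumes UTF-8/16/32.
try:
    json.loads(gbk_bytes)
    decoded_directly = True
except ValueError:  # UnicodeDecodeError is a subclass of ValueError
    decoded_directly = False

# Decode to str (Unicode) with the real encoding first, then parse.
data = json.loads(gbk_bytes.decode("gbk"))

# Re-encode as UTF-8 when the result has to be written out again.
utf8_bytes = json.dumps(data, ensure_ascii=False).encode("utf-8")

print(decoded_directly)
print(data["city"])
```

The same pattern (bytes in one encoding, to Unicode, to bytes in another encoding) applies to any pair of encodings.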
If dataJsonStr has its encoding specified through `encoding` but also contains characters in other encodings, first convert dataJsonStr to Unicode, then call json.loads() with the encoding specified:
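``` python
dataJsonStrUni = dataJsonStr.decode("GB2312")
dataDict = json.loads(dataJsonStrUni, encoding="GB2312")
```

## String encoding conversion

This is the most painful part for Chinese programmers: nearly every garbled-text problem is caused by Chinese characters. Encoding problems are actually easy to handle as long as you remember one thing:

#### Any encoding on any platform can be converted to and from Unicode

To convert between UTF-8 and GBK, first convert UTF-8 to Unicode, then convert Unicode to GBK, and vice versa.

``` python
# This is a UTF-8 encoded string
utf8Str = "你好地球"

# 1. Convert the UTF-8 encoded string to Unicode
unicodeStr = utf8Str.decode("UTF-8")

# 2. Then convert the Unicode string to GBK
gbkData = unicodeStr.encode("GBK")

# 1. Convert the GBK encoded string back to Unicode
unicodeStr = gbkData.decode("gbk")

# 2. Then convert the Unicode string to UTF-8
utf8Str = unicodeStr.encode("UTF-8")
```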
`decode` converts a string in another encoding to Unicode.

`encode` converts a Unicode string to a string in another encoding.

In one sentence: UTF-8 is one way of encoding the Unicode character set.