JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for people to read and write, and easy for machines to parse and generate. It is well suited to data exchange scenarios, such as passing data between the front end and back end of a website.
Official documentation: http://docs.python.org/library/json.html
JSON online parsing website: http://www.json.cn/#
JSON
Put simply, JSON consists of the objects and arrays of JavaScript. These two structures can be combined to represent a wide variety of complex structures.
Object: In JS, an object is the content enclosed in { }, structured as key-value pairs: { key: value, key: value, ... }. In object-oriented languages, the key is a property of the object and the value is the corresponding property value, so this is easy to understand. A value is read as object.key, and its type can be a number, a string, an array, or an object.
Array: In JS, an array is the content enclosed in square brackets [ ], with a structure such as ["Python", "javascript", "C++", ...]. As in all languages, values are read by index, and the type of each value can be a number, a string, an array, or an object.
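As a small illustration of how the two structures nest, the sketch below parses a made-up JSON document with the standard `json` module and reads values by key and by index. It assumes Python 3, and the field names `name`, `languages`, and `address` are hypothetical.

```python
import json

# Hypothetical document: an object holding a string, an array, and a nested object
text = '{"name": "Big Cat", "languages": ["Python", "javascript", "C++"], "address": {"city": "Beijing"}}'

data = json.loads(text)

# Object values are read by key, array values by index
first_language = data["languages"][0]
city = data["address"]["city"]

print(first_language)
print(city)
```

The JSON object becomes a Python `dict` and the JSON array becomes a `list`, so ordinary subscripting is enough to walk the structure.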
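1. import json
The json module provides four functions, dumps, dump, loads, and load, for converting between strings and Python data types.
1.1 json.loads()
Decodes a JSON-format string into a Python object. The type conversions from JSON to Python are as follows:

| JSON | Python |
| --- | --- |
| object | dict |
| array | list |
| string | unicode |
| number (int) | int, long |
| number (real) | float |
| true | True |
| false | False |
| null | None |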
    # -*- coding: utf-8 -*-
    # json_loads.py
    import json

    strList = '[1, 2, 3, 4]'
    strDict = '{"city": "北京", "name": "大猫"}'

    json.loads(strList)
    # [1, 2, 3, 4]

    json.loads(strDict)  # JSON data is automatically stored as Unicode
    # {u'city': u'\u5317\u4eac', u'name': u'\u5927\u732b'}
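The listing above shows Python 2 output (note the `u'...'` strings). As a hedged sketch of the same decoding step under Python 3, where JSON strings decode to `str` instead of `unicode`, the snippet below parses one value of each JSON type and records the resulting Python type names. The keys are made up for illustration.

```python
import json

# One value of each JSON type; the key names are hypothetical
decoded = json.loads(
    '{"obj": {}, "arr": [1, 2], "s": "北京", "i": 1, "f": 1.5, "t": true, "n": null}'
)

# Map each key to the name of the Python type it was decoded into
types = {key: type(value).__name__ for key, value in decoded.items()}
print(types)
```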
1.2 json.dumps()
Encodes a Python object into a JSON string and returns a str object.
The type conversions from Python to JSON are as follows:

| Python | JSON |
| --- | --- |
| dict | object |
| list, tuple | array |
| str, unicode | string |
| int, long, float | number |
| True | true |
| False | false |
| None | null |
    # -*- coding: utf-8 -*-
    # json_dumps.py
    import json
    import chardet

    listStr = [1, 2, 3, 4]
    tupleStr = (1, 2, 3, 4)
    dictStr = {"city": "北京", "name": "大刘"}

    json.dumps(listStr)
    # '[1, 2, 3, 4]'
    json.dumps(tupleStr)
    # '[1, 2, 3, 4]'

    # Note: json.dumps() uses ASCII encoding by default when serializing.
    # Pass ensure_ascii=False to disable ASCII escaping and get UTF-8 encoded output.
    # chardet.detect() returns a dict whose 'confidence' is the detection accuracy.

    json.dumps(dictStr)
    # '{"city": "\\u5317\\u4eac", "name": "\\u5927\\u5218"}'

    chardet.detect(json.dumps(dictStr))
    # {'confidence': 1.0, 'encoding': 'ascii'}

    print json.dumps(dictStr, ensure_ascii=False)
    # {"city": "北京", "name": "大刘"}

    chardet.detect(json.dumps(dictStr, ensure_ascii=False))
    # {'confidence': 0.99, 'encoding': 'utf-8'}
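To make the effect of `ensure_ascii` concrete without the third-party `chardet` package, here is a minimal sketch assuming Python 3. It serializes the same dictionary both ways and shows that the two strings decode back to the same object.

```python
import json

dict_str = {"city": "北京", "name": "大刘"}

# Default: every non-ASCII character is written as a \uXXXX escape
escaped = json.dumps(dict_str)

# ensure_ascii=False: the characters are kept as they are
readable = json.dumps(dict_str, ensure_ascii=False)

print(escaped)
print(readable)

# Both representations carry the same data
same_data = json.loads(escaped) == json.loads(readable)
print(same_data)

# Tuples are serialized as JSON arrays, just like lists
print(json.dumps((1, 2, 3, 4)))
```

The escaped form is safe to send over channels that only handle ASCII, while the unescaped form is easier to read and smaller for text that is mostly non-ASCII.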
1.3 json.dump()
Serializes a Python built-in type to JSON and writes the result to a file.
    # -*- coding: utf-8 -*-
    # json_dump.py
    import json

    listStr = [{"city": "北京"}, {"name": "大刘"}]
    with open("listStr.json", "w") as fp:
        json.dump(listStr, fp, ensure_ascii=False)

    dictStr = {"city": "北京", "name": "大刘"}
    with open("dictStr.json", "w") as fp:
        json.dump(dictStr, fp, ensure_ascii=False)
1.4 json.load()
Reads JSON text from a file and converts it into a Python type.
    # json_load.py
    import json

    with open("listStr.json") as fp:
        strList = json.load(fp)
    print strList
    # [{u'city': u'\u5317\u4eac'}, {u'name': u'\u5927\u5218'}]

    with open("dictStr.json") as fp:
        strDict = json.load(fp)
    print strDict
    # {u'city': u'\u5317\u4eac', u'name': u'\u5927\u5218'}
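The two file-based functions are easiest to see as a round trip. The following is a sketch assuming Python 3. It writes to a temporary directory (an assumption made here so that no files are left behind) and opens the file with an explicit UTF-8 encoding, which matters once `ensure_ascii=False` lets non-ASCII text through.

```python
import json
import os
import tempfile

list_str = [{"city": "北京"}, {"name": "大刘"}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "listStr.json")

    # json.dump writes the serialized text to an open file object
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(list_str, fp, ensure_ascii=False)

    # json.load reads the text back and rebuilds the Python object
    with open(path, encoding="utf-8") as fp:
        loaded = json.load(fp)

print(loaded == list_str)
```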
JsonPath
JsonPath is an information extraction library that pulls specified information out of JSON documents. Implementations are available in several languages, including JavaScript, Python, PHP, and Java.
Download URL: https://pypi.python.org/pypi/jsonpath
Installation: follow the Download URL link to download jsonpath, unpack it, and run python setup.py install
Official documentation: http://goessner.net/articles/JsonPath
JsonPath vs. XPath syntax:
JSON has a clear structure, is highly readable and low in complexity, and is very easy to match against. The following table shows how JsonPath syntax corresponds to XPath.
| XPath | JSONPath | Description |
| --- | --- | --- |
| / | $ | Root node |
| . | @ | Current node |
| / | . or [] | Child node |
| .. | n/a | Parent node; not supported by JsonPath |
| // | .. | Select all matching nodes, regardless of position |
| * | * | Match all element nodes |
| @ | n/a | Attribute access; not supported by JsonPath, because JSON is a recursive key-value structure and has no need for attributes |
| [] | [] | Iterator marker (allows simple iteration, such as array subscripts or selecting values by content) |
| \| | [,] | Multiple selection inside an iterator |
| [] | ?() | Filter operation |
| n/a | () | Expression evaluation |
| () | n/a | Grouping; not supported by JsonPath |
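To give a feel for what the recursive-descent operator `..` does, the sketch below implements a tiny hand-written equivalent of the expression `$..name` in plain Python 3. It is not the jsonpath library and covers only this one operator. The helper name `find_all` and the sample document are hypothetical, and the order of results from a real JsonPath implementation may differ.

```python
def find_all(node, key):
    """Collect every value stored under `key` at any depth (similar to `$..key`)."""
    found = []
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                found.append(value)
            # Keep descending: matches may also sit below this value
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Hypothetical document, not the response of any real service
doc = {
    "store": {
        "name": "A",
        "books": [{"name": "B", "price": 8}, {"name": "C", "price": 12}],
    }
}

names = find_all(doc, "name")
print(names)
```

Writing the traversal by hand shows why `..` is convenient: one short expression replaces an explicit recursion over every dictionary and list in the document.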
JsonPath case study: crawling Lagou
URL: http://www.lagou.com/lbs/getAllCitySearchLabels.json
Goal: Get all cities.
    # jsonpath_lagou.py
    import urllib2
    import jsonpath
    import json

    url = 'http://www.lagou.com/lbs/getAllCitySearchLabels.json'
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    html = response.read()

    # Convert the JSON-format string into a Python object
    jsonobj = json.loads(html)

    # Starting from the root node, match every "name" node
    citylist = jsonpath.jsonpath(jsonobj, '$..name')

    print citylist
    print type(citylist)

    # Serialize the result without ASCII escaping and save it as UTF-8
    fp = open('city.json', 'w')
    content = json.dumps(citylist, ensure_ascii=False)
    print content

    fp.write(content.encode('utf-8'))
    fp.close()
Notes:
json.loads() decodes a JSON-format string into a Python object. If json.loads() raises an error, check the encoding of the JSON string being decoded.
If the string passed in is not UTF-8 encoded, specify its character encoding with the encoding parameter. Consider:
    dataDict = json.loads(jsonStrGBK)
Here jsonStrGBK is a JSON string. If it is encoded in GBK rather than UTF-8, the code above raises an error and should be changed to:
    dataDict = json.loads(jsonStrGBK, encoding="GBK")
If a JSON string dataJsonStr is given the appropriate encoding through encoding but still contains characters in other encodings, first convert dataJsonStr to Unicode, then call json.loads() with the encoding specified:
    dataJsonStrUni = dataJsonStr.decode("GB2312")
    dataDict = json.loads(dataJsonStrUni, encoding="GB2312")
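The notes above describe Python 2, where `json.loads()` accepts an `encoding` argument. In Python 3 that keyword is no longer available (it was removed in Python 3.9), so the usual approach is to decode the bytes yourself and then parse the text. The sketch below assumes Python 3 and simulates a GBK payload in memory; the variable names are illustrative only.

```python
import json

# Simulate a GBK-encoded JSON payload, such as bytes read from a file or a socket
json_bytes_gbk = '{"city": "北京"}'.encode("gbk")

# Decode with the payload's real encoding first, then parse the resulting text
data_dict = json.loads(json_bytes_gbk.decode("gbk"))
print(data_dict["city"])

# Passing the raw GBK bytes straight to json.loads fails, because bytes
# input is expected to be UTF-8, UTF-16, or UTF-32
try:
    json.loads(json_bytes_gbk)
    failed = False
except ValueError:
    failed = True
print(failed)
```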
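## String encoding conversion
#### Any encoding on any platform can be converted to and from Unicode
To convert between UTF-8 and GBK, first convert UTF-8 to Unicode, then convert Unicode to GBK. The reverse direction works the same way.
``` python
# This is a UTF-8 encoded string
utf8Str = "你好地球"

# 1. Convert the UTF-8 encoded string to Unicode
unicodeStr = utf8Str.decode("UTF-8")

# 2. Then convert the Unicode string to GBK
gbkData = unicodeStr.encode("GBK")

# 1. Convert the GBK encoded string back to Unicode
unicodeStr = gbkData.decode("gbk")

# 2. Then convert the Unicode string to UTF-8
utf8Str = unicodeStr.encode("UTF-8")
```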
decode: converts a string in another encoding to Unicode.
encode: converts a Unicode string to a string in another encoding.
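The same round trip can be sketched in Python 3, where the roles are split between two types: `str` is always Unicode, and `bytes` holds encoded data. Under that assumption, `encode` goes from `str` to `bytes` and `decode` goes from `bytes` to `str`.

```python
text = "你好地球"  # a str is already Unicode in Python 3

# Unicode -> UTF-8 bytes
utf8_bytes = text.encode("utf-8")

# UTF-8 -> Unicode -> GBK
gbk_bytes = utf8_bytes.decode("utf-8").encode("gbk")

# And back again: GBK -> Unicode -> UTF-8
round_trip = gbk_bytes.decode("gbk").encode("utf-8")

print(len(utf8_bytes), len(gbk_bytes))
print(round_trip == utf8_bytes)
```

The byte lengths differ because each of these characters takes three bytes in UTF-8 and two bytes in GBK, while the Unicode text in between is identical.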
Python crawler development, Part 1: JSON and JsonPath