JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for people to read and write and easy for machines to parse and generate. It suits data exchange scenarios, such as passing data between the front end and back end of a website.
JSON is comparable to XML as an interchange format.
Python 3.x ships with a json module that can be used directly via import json.
Official documentation: http://docs.python.org/library/json.html
Online JSON parser: http://www.json.cn/#
JSON
Put simply, JSON consists of JavaScript's objects and arrays. These two structures can be nested and combined to represent a wide variety of complex data.
- Object: In JavaScript, an object is the content enclosed in {}. Its data structure is a set of key-value pairs, {key1: value1, key2: value2, ...}. In object-oriented terms, the key is a property of the object and the value is the corresponding property value, so a value is read as object.key. A property value can be a number, a string, an array, or an object.
- Array: In JavaScript, an array is the content enclosed in []. Its data structure is ["Python", "JavaScript", "C++", ...]. As in most languages, values are read by index, and each value can be a number, a string, an array, or an object.
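As a small illustration of the two structures above, the sketch below parses a JSON text that nests an array and an object inside an object, then reads values by key and by index. The sample data and variable names are made up for this example.

```python
import json

# Hypothetical sample: an object holding a string, an array, and a nested object.
text = '{"name": "Ant", "languages": ["Python", "JavaScript", "C++"], "address": {"city": "Beijing"}}'

data = json.loads(text)

# Objects become dicts: read a value by key.
name = data["name"]
# Arrays become lists: read a value by index.
first_language = data["languages"][0]
# Structures nest freely, so lookups can be chained.
city = data["address"]["city"]

print(name, first_language, city)
```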
JSON module
The json module provides four functions, dumps, dump, loads, and load, for converting between JSON strings and Python data types.
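One way to keep the four names apart: the functions ending in s work with strings, while dump and load work with file-like objects. A minimal sketch, using io.StringIO as a stand-in for a real file (the sample record is invented for illustration):

```python
import io
import json

record = {"city": "Beijing", "name": "Ant"}  # made-up sample data

# dumps / loads: Python object <-> JSON string
text = json.dumps(record)
from_text = json.loads(text)

# dump / load: Python object <-> file-like object
buffer = io.StringIO()   # in-memory stand-in for an open text file
json.dump(record, buffer)
buffer.seek(0)           # rewind before reading the content back
from_file = json.load(buffer)

print(from_text == record, from_file == record)
```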
1. json.dumps()
Converts a Python object to a JSON string and returns a str object. The type conversions from Python to JSON are as follows:
| Python | JSON |
| --- | --- |
| dict | object |
| list, tuple | array |
| str | string |
| int, float | number |
| True | true |
| False | false |
| None | null |
    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    __author__ = 'Mayi'

    import json

    liststr = [1, 2, 3, 4]
    tuplestr = (1, 2, 3, 4)
    dictstr = {"city": "Beijing", "name": "Ant"}

    print(json.dumps(liststr))  # [1, 2, 3, 4]
    print(type(json.dumps(liststr)))  # <class 'str'>
    print(json.dumps(tuplestr))  # [1, 2, 3, 4]
    print(type(json.dumps(tuplestr)))  # <class 'str'>

    # Note: json.dumps() escapes non-ASCII characters by default when serializing.
    # Pass ensure_ascii=False to disable the escaping and keep the characters as-is.
    print(json.dumps(dictstr, ensure_ascii=False))  # {"city": "Beijing", "name": "Ant"}
    print(type(json.dumps(dictstr, ensure_ascii=False)))  # <class 'str'>
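The sample values in this section contain only ASCII characters, so the effect of ensure_ascii is not visible in their output. A short sketch with a non-ASCII value (chosen here purely for illustration) shows the difference:

```python
import json

data = {"city": "北京"}  # sample value containing non-ASCII characters

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep the characters as-is

print(escaped)   # {"city": "\u5317\u4eac"}
print(readable)  # {"city": "北京"}
```

Both strings decode back to the same Python object; the parameter only changes how the text is written out.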
2. json.dump()
Serializes a Python built-in type to JSON and writes the result to a file.
    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    __author__ = 'Mayi'

    import json

    liststr = [{"city": "Beijing"}, {"name": "Ant"}]
    # Use a with block so the file is closed (and flushed) after writing
    with open("liststr.json", "w", encoding="utf-8") as f:
        json.dump(liststr, f, ensure_ascii=False)

    dictstr = {"city": "Beijing", "name": "Ant"}
    with open("dictstr.json", "w", encoding="utf-8") as f:
        json.dump(dictstr, f, ensure_ascii=False)
3. json.loads()
Decodes a JSON string into a Python object. The type conversions from JSON to Python are as follows:
| JSON | Python |
| --- | --- |
| object | dict |
| array | list |
| string | str |
| number (int) | int |
| number (real) | float |
| true | True |
| false | False |
| null | None |
    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    __author__ = 'Mayi'

    import json

    strlist = '[1, 2, 3, 4]'
    strdict = '{"city": "Beijing", "name": "Ant"}'

    print(json.loads(strlist))  # [1, 2, 3, 4]
    # JSON strings are decoded into Python str (Unicode) objects
    print(json.loads(strdict))  # {'city': 'Beijing', 'name': 'Ant'}
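To see the conversion table in action, the following sketch decodes one JSON text that contains each kind of value and prints the resulting Python types. The keys and values are invented for illustration. It also shows that JSON text must use double quotes: a single-quoted string is rejected.

```python
import json

text = '{"count": 3, "price": 9.5, "ok": true, "missing": null, "tags": ["a", "b"]}'
obj = json.loads(text)

# Print the Python type each JSON value was converted to.
for key, value in obj.items():
    print(key, type(value).__name__)

# Single quotes are valid for Python literals but not for JSON text.
try:
    json.loads("{'city': 'Beijing'}")
    rejected = False
except json.JSONDecodeError:
    rejected = True

print(rejected)
```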
4. json.load()
Reads JSON text from a file and converts it to a Python type.
    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    __author__ = 'Mayi'

    import json

    # Read back the files written in the json.dump() example
    with open("liststr.json", "r", encoding="utf-8") as f:
        strlist = json.load(f)
    print(strlist)  # [{'city': 'Beijing'}, {'name': 'Ant'}]

    with open("dictstr.json", "r", encoding="utf-8") as f:
        strdict = json.load(f)
    print(strdict)  # {'city': 'Beijing', 'name': 'Ant'}
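Since json.load() depends on a file that json.dump() wrote earlier, it can help to see both halves in one place. The sketch below is a self-contained round trip that writes to a temporary directory; the use of tempfile is a choice made for this example so that no files are left behind.

```python
import json
import os
import tempfile

cities = [{"city": "Beijing"}, {"name": "Ant"}]

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "liststr.json")

    # Serialize the list to the file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cities, f, ensure_ascii=False)

    # Read the same file back into a Python object.
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == cities)  # True
```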
JSONPath
JSONPath is an information extraction library that pulls specified data out of a JSON document. Implementations exist in several languages, including JavaScript, Python, PHP, and Java.
JSONPath is to JSON what XPath is to XML.
- Download: https://pypi.python.org/pypi/jsonpath
- Installation: download and extract the package, then run python setup.py install
- Official documentation: http://goessner.net/articles/JsonPath
JSONPath vs. XPath syntax:
JSONPath syntax is clear, readable, and low in complexity, which makes matching easy. The following table compares it with the corresponding XPath usage.
| XPath | JSONPath | Description |
| --- | --- | --- |
| / | $ | Root node |
| . | @ | Current node |
| / | . or [] | Child node |
| .. | N/A | Parent node; not supported by JSONPath |
| // | .. | Select all matching nodes, regardless of position |
| * | * | Match all element nodes |
| @ | N/A | Attribute access; not supported by JSONPath |
| [] | [] | Subscript operator (can be used for simple iteration, such as array indexing or content-based selection) |
| \| | [,] | Select multiple items within a subscript |
| [] | ?() | Filter expression |
| N/A | () | Expression evaluation |
| () | N/A | Grouping; not supported by JSONPath |
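To make the .. operator concrete, here is a rough pure-Python sketch of what a query such as $..name does: walk every dict and list and collect each value stored under the given key. The helper find_all and the sample data are hypothetical and are not part of the jsonpath package, which handles far more of the syntax in the table.

```python
def find_all(node, key):
    """Collect every value stored under `key`, at any depth (like $..key)."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: matches may be nested inside this value.
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Invented data, nested a few levels deep.
doc = {
    "content": {
        "A": [{"id": 1, "name": "Anshan"}, {"id": 2, "name": "Anyang"}],
        "B": [{"id": 3, "name": "Beijing"}],
    }
}

names = find_all(doc, "name")
print(names)  # ['Anshan', 'Anyang', 'Beijing']
```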
Example:
Taking the Lagou city JSON file at http://www.lagou.com/lbs/getAllCitySearchLabels.json as an example, get all the city names.
    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    __author__ = 'Mayi'

    import urllib.request
    import json
    import jsonpath

    # URL of the Lagou city JSON file
    url = 'http://www.lagou.com/lbs/getAllCitySearchLabels.json'
    # User-Agent
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36'}

    # Build the Request object from the URL and headers, so the request carries a Chrome User-Agent
    request = urllib.request.Request(url, headers=header)
    # Send the request to the server
    response = urllib.request.urlopen(request)
    # Get the page content: bytes
    html = response.read()
    # Decode: bytes to str
    html = html.decode("utf-8")

    # Convert the JSON string to a Python object
    obj = json.loads(html)
    # Starting from the root node, match every name node
    city_list = jsonpath.jsonpath(obj, '$..name')

    # Print the matched name nodes
    print(city_list)
    # Print the result type
    print(type(city_list))

    # Write the result to a local file
    with open("city.json", "w", encoding="utf-8") as f:
        content = json.dumps(city_list, ensure_ascii=False)
        f.write(content)
Crawler: the json module and the jsonpath module