Crawler--json modules and Jsonpath modules

Source: Internet
Author: User
Tags file url xpath

JSON (JavaScript Object Notation) is a lightweight data interchange format that makes it easy for people to read and write. It also facilitates the analysis and generation of machines. Suitable for data interaction scenarios, such as data interaction between the foreground and background of a website.

JSON and XML are comparable.

Python 3.X comes with a JSON module that can be used directly from the import JSON.

Official Document: Http://docs.python.org/library/json.html

JSON online parsing website: http://www.json.cn/#

Json

JSON is simply the object and array of JavaScript, so these two structures are objects and arrays of two structures, which can represent a variety of complex structures.

    1. Object: Object in JS is represented as {} in the content, data structure is {key1:value1, key2:value2, ...} The structure of the key-value pairs, in object-oriented language, key is the property of the object, value is the corresponding property value, so it is easy to understand that the value method is the object. Key Gets the property value, the type of the property value can be a number, a string, an array, an object.
    2. Arrays: Arrays in JS are [] enclosed content, data structures for [' Python ', ' JavaScript ', ' C + + ', ...], in the same way as all languages, using index get, the type of field value can be a number, string, array, object.
JSON module

The JSON module provides four functions: dumps, dump, loads, load, for converting between string and Python data types.

1.json.dumps ()

To convert the Python type to a JSON string, return a str object, from Python to JSON, type conversions against the following:

Python Json
Dict Object
List, tuple Array
STR, Utf-8 String
int, float Number
True True
False False
None Null
#!/usr/bin/python3#-*-conding:utf-8-*-__author__ = ' Mayi ' Import jsonliststr = [1, 2, 3, 4]tuplestr = (1, 2, 3, 4) dictst r = {"City": "Beijing", "name": "Ant"}print (Json.dumps (LISTSTR)) # [1, 2, 3, 4]print (Type (Json.dumps (LISTSTR))) # <class ' str ' >print (Json.dumps (TUPLESTR)) # [1, 2, 3, 4]print (Type (Json.dumps (TUPLESTR))) # <class ' str ' ># Note: json.dumps () ASCII encoding used by default when serializing # Add parameter ensure_ascii=false disable ASCII encoding, press UTF-8 encoding print (Json.dumps (dictstr, ensure_ascii = False)) # {"City" : "Beijing", "name": "Ant"}print (Type (Json.dumps (dictstr, ensure_ascii = False)) # <class ' str ' >

2.json.dump ()

Serializing a python built-in type to a JSON object after writing to a file

#!/usr/bin/python3#-*-conding:utf-8-*-__author__ = ' Mayi ' Import jsonliststr = [{"City": "Beijing"}, {"name": "Ant"}]json.dum P (liststr, open ("Liststr.json", "w", encoding = "Utf-8"), Ensure_ascii = False) Dictstr = {"City": "Beijing", "name": "Ant"}json. Dump (DICTSTR, open ("Dictstr.json", "w", encoding = "Utf-8"), Ensure_ascii = False)

3.json.loads ()

The JSON format string is decoded into a Python object, and the type conversions from JSON to Python are compared as follows:

Json Python
Object Dict
Array List
String Utf-8
Number (int) Int
Number (real) Float
True True
False False
Null None
#!/usr/bin/python3#-*-conding:utf-8-*-__author__ = ' Mayi ' Import jsonstrlist = ' [1, 2, 3, 4] ' strdict = ' {' City ': ' Beijing ', ' Name ":" Ant "} ' Print (Json.loads (strlist)) # [1, 2, 3, 4]# JSON data automatically press Utf-8 to store print (Json.loads (strdict)) # {' City ': ' Beijing ', ' name ' ': ' Ant '}

4.json.load ()

Read a JSON-like string in a file and convert it to a Python type

#!/usr/bin/python3#-*-conding:utf-8-*-__author__ = ' Mayi ' Import jsonstrlist = json.load (Open ("Liststr.json", "R", encoding = "Utf-8")) print (strlist) # [{' City ': ' Beijing '}, {' name ': ' ant '}]strdict = json.load (Open ("Dictstr.json", "R", encoding = "Utf-8")) print (strdict) # {' City ': ' Beijing ', ' name ': ' Ant '}
JsonPath

Jsonpath is an information extraction class library that extracts the specified information from a JSON document and provides multiple language implementations, including: JavaScript, Python, PHP, and Java.

Jsonpath, for JSON, is equivalent to XPath for XML.

    • : Https://pypi.python.org/pypi/jsonpath
    • Installation method: Download after extracting and then execute Python setup.py install
    • Official Document: Http://goessner.net/articles/JsonPath
Jsonpath vs. XPath syntax:

The Jsonpath structure is clear, readability is high, the complexity is low, very easy to match, the following table corresponds to the use of XPath.

 
Xpath JSONPath Describe
/ $ Root node
. @ Current node
/ . or [] Take child nodes
.. N/A Take parent node, Jsonpath not supported
// .. Select all eligible nodes, regardless of location
* * Match all ELEMENT nodes
@ N/A Based on property access, Jsonpath does not support
[] [] Iterators (can be used to do simple iterative operations, such as array subscript, based on content selection, etc.)
| [,] Supports long selection in iterators
[] ? () Support filtering operations
N/A () Supports expression evaluation
() N/A Grouping, Jsonpath not supported
Example:

To pull the net City JSON file: Http://www.lagou.com/lbs/getAllCitySearchLabels.json For example, get all the city names.

#!/usr/bin/python3#-*-conding:utf-8-*-__author__ = ' Mayi ' Import urllib.requestimport jsonimport jsonpath# Pull Hook net city json file URL = ' Http://www.lagou.com/lbs/getAllCitySearchLabels.json ' # user-agent header = {' user-agent ': ' Mozilla /5.0 (Windows NT 6.1; WOW64) applewebkit/537.36 (khtml, like Gecko) chrome/39.0.2171.71 safari/537.36 '}# URL, together with headers, constructs the requests request, which will be shipped with Chrome browser user-agentrequest = urllib.request.Request (url, headers = header) # Send this request to the server response = Urllib.request.urlopen (Request) # get page content: byteshtml = Response.read () # transcoding: bytes to strhtml = Html.decode ("Utf-8") # Convert the JSON format string to a Python object obj = json.loads (HTML) # starts at the root node and matches the name node city_list = Jsonpath.jsonpath (obj, ' $. Name ') # Print Gets the name node print (city_list) # prints its type print (Type (city_list)) # Write local Disk file with open ("City.json", "w", encoding = " Utf-8 ") as f:    content = Json.dumps (city_list, ensure_ascii = False)    f.write (content)

Crawler--json modules and Jsonpath modules

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.