Crawler: the json module and the jsonpath module

Source: Internet
Author: User

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for people to read and write, and easy for machines to parse and generate. It suits data-exchange scenarios such as communication between the front end and the back end of a website.

JSON is often compared with XML, since both are used to exchange structured data.

Python 3.x ships with the json module, so it can be used directly after import json.

Official documentation: http://docs.python.org/library/json.html

Online JSON parser: http://www.json.cn/#

JSON

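JSON is built from two structures taken from JavaScript: objects and arrays. An object is a set of key/value pairs written inside {}, and an array is an ordered list of values written inside []. Because the two can be nested, they can represent arbitrarily complex data.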
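
As a small illustration of how the two structures nest, the sketch below parses a made-up JSON document in which an object holds an array and another object. The keys name, tags, and address are invented for this example and do not come from the article.

```python
import json

# Hypothetical sample document: an object containing an array and a nested object.
text = '{"name": "ant", "tags": ["crawler", "json"], "address": {"city": "Beijing"}}'

data = json.loads(text)

# A JSON object becomes a dict, a JSON array becomes a list.
print(type(data))               # <class 'dict'>
print(type(data["tags"]))       # <class 'list'>
print(data["address"]["city"])  # Beijing
```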

The json module

The json module provides four functions for converting between JSON text and Python data types: dumps, dump, loads, and load. dumps and loads work with strings, while dump and load work with files.
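
The split between the string functions and the file functions can be sketched as follows. This is a minimal illustration, assuming that io.StringIO is an acceptable stand-in for a real file; the sample dictionary is invented.

```python
import io
import json

data = {"city": "Beijing", "ids": [1, 2, 3]}

# dumps / loads work with strings.
text = json.dumps(data)
from_text = json.loads(text)

# dump / load work with file-like objects; io.StringIO stands in for a real file here.
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)  # rewind so that load() reads from the beginning
from_file = json.load(buffer)

print(from_text == data)  # True
print(from_file == data)  # True
```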

1. json.dumps()

Converts a Python object to a JSON string and returns a str object. Python types map to JSON types as follows:

    Python          JSON
    dict            object
    list, tuple     array
    str             string
    int, float      number
    True            true
    False           false
    None            null

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    __author__ = 'mayi'

    import json

    listStr = [1, 2, 3, 4]
    tupleStr = (1, 2, 3, 4)
    dictStr = {"city": "Beijing", "name": "ant"}

    print(json.dumps(listStr))
    # [1, 2, 3, 4]
    print(type(json.dumps(listStr)))
    # <class 'str'>
    print(json.dumps(tupleStr))
    # [1, 2, 3, 4]
    print(type(json.dumps(tupleStr)))
    # <class 'str'>

    # Note: json.dumps() escapes non-ASCII characters by default.
    # Pass ensure_ascii=False to disable the escaping and keep the characters as they are.
    print(json.dumps(dictStr, ensure_ascii=False))
    # {"city": "Beijing", "name": "ant"}
    print(type(json.dumps(dictStr, ensure_ascii=False)))
    # <class 'str'>
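
The effect of ensure_ascii only becomes visible when the data contains non-ASCII characters. The sketch below assumes a sample value "北京" (Beijing), chosen purely for illustration, and also shows the optional indent and sort_keys parameters of json.dumps().

```python
import json

# Assumed sample value: "北京" (Beijing) is used because it contains non-ASCII characters.
data = {"city": "北京", "name": "ant"}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep the characters as they are

print(escaped)   # {"city": "\u5317\u4eac", "name": "ant"}
print(readable)  # {"city": "北京", "name": "ant"}

# Both strings decode back to the same Python object.
print(json.loads(escaped) == json.loads(readable))  # True

# indent and sort_keys are optional parameters that make the output easier to read.
pretty = json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True)
print(pretty)
```

Either form is valid JSON; ensure_ascii=False is mainly useful when a person will read the output or when file size matters.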

2. json.dump()

Serializes a Python object to JSON and writes the result to a file.

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    __author__ = 'mayi'

    import json

    listStr = [{"city": "Beijing"}, {"name": "ant"}]
    # The with statement closes the file once the data has been written.
    with open("listStr.json", "w", encoding="utf-8") as f:
        json.dump(listStr, f, ensure_ascii=False)

    dictStr = {"city": "Beijing", "name": "ant"}
    with open("dictStr.json", "w", encoding="utf-8") as f:
        json.dump(dictStr, f, ensure_ascii=False)
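
To see what json.dump() actually puts on disk, the following sketch writes the same list into a temporary directory and reads the raw text back. The temporary directory is an assumption made here so that the example leaves no files behind; it is not part of the original script.

```python
import json
import os
import tempfile

data = [{"city": "Beijing"}, {"name": "ant"}]

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "listStr.json")

    # The with statement closes the file (and flushes the data) automatically.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)

    # Read the raw text back to see what json.dump() wrote.
    with open(path, "r", encoding="utf-8") as f:
        written = f.read()

print(written)  # [{"city": "Beijing"}, {"name": "ant"}]
```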

3. json.loads()

Decodes a JSON string into a Python object. JSON types map to Python types as follows:

    JSON            Python
    object          dict
    array           list
    string          str
    number (int)    int
    number (real)   float
    true            True
    false           False
    null            None

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    __author__ = 'mayi'

    import json

    strList = '[1, 2, 3, 4]'
    strDict = '{"city": "Beijing", "name": "ant"}'

    print(json.loads(strList))
    # [1, 2, 3, 4]

    # JSON strings are decoded into Python str objects, which are Unicode.
    print(json.loads(strDict))
    # {'city': 'Beijing', 'name': 'ant'}
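
The type table above can be checked directly. The sketch below uses an invented JSON string to list the Python type of each decoded value, shows that a tuple comes back as a list after a round trip, and shows that malformed input raises json.JSONDecodeError, which is worth catching in a crawler because a response is not guaranteed to be valid JSON.

```python
import json

# Hypothetical JSON text covering several JSON types.
text = '{"count": 3, "price": 9.5, "ok": true, "missing": null, "items": [1, "a"]}'
obj = json.loads(text)

type_names = {key: type(value).__name__ for key, value in obj.items()}
print(type_names)
# {'count': 'int', 'price': 'float', 'ok': 'bool', 'missing': 'NoneType', 'items': 'list'}

# A tuple is serialized as a JSON array, so it comes back as a list.
round_trip = json.loads(json.dumps((1, 2, 3)))
print(round_trip)  # [1, 2, 3]

# Single quotes are not valid JSON, so decoding fails.
try:
    json.loads("{'city': 'Beijing'}")
    decode_failed = False
except json.JSONDecodeError:
    decode_failed = True
print(decode_failed)  # True
```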

4. json.load()

Reads JSON text from a file and converts it to a Python object.

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    __author__ = 'mayi'

    import json

    # The with statement closes each file after it has been read.
    with open("listStr.json", "r", encoding="utf-8") as f:
        strList = json.load(f)
    print(strList)
    # [{'city': 'Beijing'}, {'name': 'ant'}]

    with open("dictStr.json", "r", encoding="utf-8") as f:
        strDict = json.load(f)
    print(strDict)
    # {'city': 'Beijing', 'name': 'ant'}
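
json.load(f) gives the same result as reading the file's text and passing it to json.loads(). The sketch below checks this on a temporary file; the temporary directory and the file content are assumptions made for the example.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dictStr.json")

    # Prepare a small JSON file to read from.
    with open(path, "w", encoding="utf-8") as f:
        f.write('{"city": "Beijing", "name": "ant"}')

    # Variant 1: let json.load() read from the file object.
    with open(path, "r", encoding="utf-8") as f:
        via_load = json.load(f)

    # Variant 2: read the text first, then decode it with json.loads().
    with open(path, "r", encoding="utf-8") as f:
        via_loads = json.loads(f.read())

print(via_load)               # {'city': 'Beijing', 'name': 'ant'}
print(via_load == via_loads)  # True
```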

JsonPath

JsonPath is a library for extracting specified information from JSON documents. Implementations are available in multiple languages, including JavaScript, Python, PHP, and Java.

JsonPath is to JSON what XPath is to XML.

  • Download: https://pypi.python.org/pypi/jsonpath
  • Installation: decompress the package and run python setup.py install.
  • Official documentation: http://goessner.net/articles/JsonPath

Comparison between JsonPath and XPath syntax:

JsonPath syntax is clearly structured, readable, and simple, and its expressions are easy to write. The following table maps each XPath construct to its JsonPath counterpart.

    XPath   JSONPath    Description
    /       $           Root node
    .       @           Current node
    /       . or []     Child node
    ..      n/a         Parent node; not supported by JsonPath
    //      ..          Selects all matching nodes regardless of their position
    *       *           Matches all element nodes
    @       n/a         Attribute access; not supported by JsonPath
    []      []          Iterator (supports simple operations such as array subscripts and content-based selection)
    |       [,]         Multiple selection inside the iterator
    []      ?()         Filter expression
    n/a     ()          Script expression
    ()      n/a         Grouping; not supported by JsonPath
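
To make a few rows of the table concrete without depending on any particular JsonPath implementation, the sketch below writes the plain-Python equivalent of four JsonPath expressions. The store/book sample data is invented for illustration, and the code only mimics what each expression selects; it is not how the jsonpath library is implemented.

```python
# Hypothetical sample data, invented for this illustration.
data = {
    "store": {
        "book": [
            {"title": "A", "author": "Ann", "price": 8.5},
            {"title": "B", "author": "Bob", "price": 12.0},
            {"title": "C", "author": "Cat", "price": 9.0},
        ]
    }
}

books = data["store"]["book"]

# $.store.book[0].title     child access with . and an array subscript
first_title = books[0]["title"]

# $.store.book[*].author    * matches every element of the array
authors = [book["author"] for book in books]

# $.store.book[0,2].title   [,] selects several subscripts at once
picked = [books[i]["title"] for i in (0, 2)]

# $.store.book[?(@.price < 10)].title   ?() filters, @ is the current node
cheap = [book["title"] for book in books if book["price"] < 10]

print(first_title)  # A
print(authors)      # ['Ann', 'Bob', 'Cat']
print(picked)       # ['A', 'C']
print(cheap)        # ['A', 'C']
```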

Example:

Take the Lagou city JSON file (http://www.lagou.com/lbs/getAllCitySearchLabels.json) as an example and extract all the city names.

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    __author__ = 'mayi'

    import urllib.request
    import json
    import jsonpath

    # URL of the city JSON file
    url = 'http://www.lagou.com/lbs/getAllCitySearchLabels.json'
    # User-Agent
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/100'}
    # Construct a Request together with the headers; the request will carry Chrome's User-Agent
    request = urllib.request.Request(url, headers=header)
    # Send the request to the server
    response = urllib.request.urlopen(request)
    # Get the page content: bytes
    html = response.read()
    # Transcode: bytes to str
    html = html.decode("utf-8")
    # Convert the JSON string to a Python object
    obj = json.loads(html)
    # Starting from the root node, match every name node
    city_list = jsonpath.jsonpath(obj, '$..name')
    # Print the matched name nodes
    print(city_list)
    # Print their type
    print(type(city_list))
    # Write the result to a local file
    with open("city.json", "w", encoding="utf-8") as f:
        content = json.dumps(city_list, ensure_ascii=False)
        f.write(content)
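
For readers who want to see what the recursive-descent expression $..name selects without installing jsonpath or calling the network, here is a rough standard-library sketch. The helper find_all and the payload are hypothetical: the real response structure is not shown in this article, and the ordering of results from the jsonpath library may differ from this helper.

```python
import json


def find_all(node, key):
    """Collect every value stored under `key` at any depth (similar in spirit to $..key)."""
    found = []
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                found.append(value)
            # Keep descending: nested objects may contain the key as well.
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Hypothetical payload, invented for this sketch.
text = """
{
  "groups": {
    "A": [{"id": 1, "name": "Anshan"}],
    "B": [{"id": 2, "name": "Beijing"}, {"id": 3, "name": "Baoding"}]
  }
}
"""

obj = json.loads(text)
city_list = find_all(obj, "name")
print(city_list)  # ['Anshan', 'Beijing', 'Baoding']
```

Working on a saved or hand-written payload like this is also a convenient way to try out an extraction before pointing the crawler at the live URL.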

 
