JSON and Jsonpath of Python data extraction

Source: Internet
Author: User
Tags xpath

JSON (JavaScript Object Notation) is a lightweight data interchange format that makes it easy for people to read and write. It also facilitates the analysis and generation of machines. Suitable for data interaction scenarios, such as data interaction between the foreground and background of a website.

The comparison between JSON and XML is comparable.

Python 2.7 comes with a JSON module that import json can be used directly.

Official Document: Http://docs.python.org/library/json.html

JSON online parsing website: http://www.json.cn/#

Json

JSON is simply called objects and arrays in JavaScript, so these two structures are objects and arrays of two structures that can represent a variety of complex structures

  1. Object: The object is represented in JS as the { } enclosed content, the { key:value, key:value, ... } structure of the key-value pairs, in the object-oriented language, key is the property of the object, value is the corresponding property value, so it is easy to understand, the value of the method is the object. Key Gets the property value, The type of this property value can be a number, a string, an array, an object.

  2. Array: The array in JS is enclosed in parentheses [ ] , the data structure is ["Python", "javascript", "C++", ...] , the value is the same as in all languages, using index get, the type of field value can be number, string, array, object several.

Import JSON

The JSON module provides four functions:,,, dumps dump loads load for converting between string and Python data types.

1. Json.loads ()

Convert the JSON format string decoding to a Python object from JSON to Python type conversions against the following:

# json_loads.pyimport jsonstrList = ‘[1, 2, 3, 4]‘strDict = ‘{"city": "北京", "name": "大猫"}‘json.loads(strList) # [1, 2, 3, 4]json.loads(strDict) # json数据自动按Unicode存储# {u‘city‘: u‘\u5317\u4eac‘, u‘name‘: u‘\u5927\u732b‘}
2. Json.dumps ()

Implement the Python type into a JSON string and return a Str object to convert a Python object encoding into a JSON string

The conversion from the Python primitive type to the JSON type is compared to the following:

# json_dumps.pyImport JSONImport CHARDETLISTSTR = [1,2,3,4]tuplestr = (1,2,3,4) Dictstr = {"City":"Beijing","Name": "big cat"}json.dumps (LISTSTR) # ' [1, 2, 3, 4] ' Json.dumps (TUPLESTR) ' Span class= "hljs-comment" ># ' [1, 2, 3, 4] ' # Note: json.dumps () The ASCII encoding used by default when serializing # add parameter ensure_ascii=false disable ASCII encoding, press UTF-8 encoding # chardet.detect () to return the dictionary, Where confidence is the detection accuracy json.dumps (DICTSTR) # ' {"City": "\\u5317\\u4eac", "name": "\\u5927\\ u5218 "} ' Chardet.detect (Json.dumps (DICTSTR)) # {' confidence ': 1.0, ' Encoding ': ' ASCII '}< Span class= "Hljs-keyword" >print json.dumps (dictstr, Ensure_ascii=False) # {" City ":" Beijing "," name ":" Da Liu "}chardet.detect (Json.dumps (DICTSTR, ensure_ascii= false)) # {' confidence ': 0.99, ' encoding ': ' Utf-8 '}        

Chardet is a very good code recognition module that can be installed by PIP

3. Json.dump ()

Serializing a python built-in type to a JSON object after writing to a file

# json_dump.pyimport jsonlistStr = [{"city": "北京"}, {"name": "大刘"}]json.dump(listStr, open("listStr.json","w"), ensure_ascii=False)dictStr = {"city": "北京", "name": "大刘"}json.dump(dictStr, open("dictStr.json","w"), ensure_ascii=False)
4. Json.load ()

Read a JSON-like string element in a file into a Python type

# json_load.pyimport jsonstrList = json.load(open("listStr.json"))print strList# [{u‘city‘: u‘\u5317\u4eac‘}, {u‘name‘: u‘\u5927\u5218‘}]strDict = json.load(open("dictStr.json"))print strDict# {u‘city‘: u‘\u5317\u4eac‘, u‘name‘: u‘\u5927\u5218‘}
JsonPath

JsonPath is an information extraction class library that extracts specified information from JSON documents and provides multiple language implementations, including: Javascript, Python, PHP, and Java.

JsonPath, for JSON, is equivalent to XPATH for XML.

: Https://pypi.python.org/pypi/jsonpath

Installation method: Click on Download URL the link to download Jsonpath, after decompression to executepython setup.py install

Official Document: Http://goessner.net/articles/JsonPath

Jsonpath vs. XPath syntax:

The JSON structure is clear, readable, complex, and very easy to match, and the following table corresponds to the use of XPath.

XPath JSONPath Description
/ $ Root node
. @ Current node
/ .Or[] Take child nodes
.. N/A Take parent node, Jsonpath not supported
// .. That is, regardless of location, select all eligible conditions
* * Match all ELEMENT nodes
@ N/A JSON is not supported based on property access because JSON is a key-value recursive structure and is not required.
[] [] Iterator indicator (can be used to do simple iterative operations, such as array subscript, based on content selection, etc.)
| [,] Supports long selection in iterators.
[] ?() Supports filtering operations.
N/A () Supports expression evaluation
() N/A Grouping, Jsonpath not supported
Example:

We take the Http://www.lagou.com/lbs/getAllCitySearchLabels.json city JSON file as an example to get all the cities.

# jsonpath_lagou.pyimport urllib2import jsonpath Import Jsonimport chardeturl =  ' http://www.lagou.com/lbs/ Getallcitysearchlabels.json ' request =urllib2. Request (URL) response = Urllib2.urlopen (request) HTML = Response.read () # Converts a JSON format string to a Python object jsonobj = json.loads (html) # starting from the root node, matching the name node CityList = Jsonpath.jsonpath (Jsonobj, "$". Name ') print citylistprint type (citylist) fp = open ( ' City.json ',  ' W ') content = Json.dumps (citylist, Ensure_ascii=false) print contentfp.write (Content.encode ( Utf-8 ')) fp.close ()              
Precautions:

Json.loads () is to convert the JSON format string decoding to a Python object, and if there is an error in json.loads, be aware of the encoding of the decoded JSON character.

If the encoding of the passed-in string is not UTF-8, you need to specify a character-encoded parameterencoding

dataDict = json.loads(jsonStrGBK);
    • Datajsonstr is a JSON string, assuming its encoding itself is non-UTF-8 words but GBK, then the above code will cause an error, instead of the corresponding:

        dataDict = json.loads(jsonStrGBK, encoding="GBK");
    • If DATAJSONSTR specifies the appropriate encoding through encoding, but it also contains other encoded characters, you need to first convert DATAJSONSTR to Unicode, and then specify the encoding format to call Json.loads ()

``` python

Datajsonstruni = Datajsonstr.decode ("GB2312"); Datadict = Json.loads (Datajsonstruni, encoding= "GB2312");

##字符串编码转换这是中国程序员最苦逼的地方,什么乱码之类的几乎都是由汉字引起的。其实编码问题很好搞定,只要记住一点:####任何平台的任何编码 都能和 Unicode 互相转换UTF-8 与 GBK 互相转换,那就先把UTF-8转换成Unicode,再从Unicode转换成GBK,反之同理。``` python # 这是一个 UTF-8 编码的字符串utf8Str = "你好地球"# 1. 将 UTF-8 编码的字符串 转换成 Unicode 编码unicodeStr = utf8Str.decode("UTF-8")# 2. 再将 Unicode 编码格式字符串 转换成 GBK 编码gbkData = unicodeStr.encode("GBK")# 1. 再将 GBK 编码格式字符串 转化成 UnicodeunicodeStr = gbkData.decode("gbk")# 2. 再将 Unicode 编码格式字符串转换成 UTF-8utf8Str = unicodeStr.encode("UTF-8")

decodeTo convert other encoded strings to Unicode encoding

encodeThe role of converting Unicode encoding to another encoded string

一句话:UTF-8是对Unicode字符集进行编码的一种编码方式

JSON and Jsonpath of Python data extraction

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.