Simple use cases for Requests/lxml

Source: Internet
Author: User

bytearray([source[, encoding[, errors]]])

bytearray([source[, encoding[, errors]]]) returns a new array of bytes. The bytearray type is a mutable sequence, and each element is an integer in the range [0, 255].

The source parameter can be used in several ways:

If source is an integer, the array has that length and is initialized with null bytes;

If source is a string, you must also give the encoding (and optionally errors), and the string is converted to bytes using that encoding;

If source is an iterable, its elements must be integers in the range [0, 255];

If source is an object that supports the buffer interface, the contents of that object are used to initialize the bytearray.
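The four cases above can be tried directly in an interpreter. The following is a minimal sketch for Python 3; the variable names and sample values are made up for illustration.

```python
# From an integer: a zero-filled array of that length.
zeros = bytearray(3)

# From a string: the encoding must be given in Python 3.
encoded = bytearray("héllo", encoding="utf-8")

# From an iterable of integers, each in the range [0, 255].
from_ints = bytearray([72, 105])

# From an object that supports the buffer interface, such as bytes.
from_buffer = bytearray(b"abc")

# Unlike bytes, a bytearray can be modified in place.
from_buffer[0] = ord("x")

print(zeros, encoded, from_ints, from_buffer)
```

An element outside [0, 255], for example bytearray([256]), raises ValueError.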

Use of the lxml library

For a more detailed walkthrough, see http://www.crifan.com/python_try_lxml_parse_html/

In short, selector = etree.HTML(html) parses an HTML string and returns the root element, which prints as <Element html at 0x28b0620>.

That element contains child elements such as <Element head at 0x28c13f0> and <Element body at 0x28c1fa8>.

Use content = selector.xpath('//span[@class="ctt"]') to select the matching elements from the parsed tree.
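As a rough illustration of these steps, the sketch below parses a small hand-written page instead of a downloaded one. The markup and the class names in it are hypothetical stand-ins; only etree.HTML and xpath come from the text above.

```python
from lxml import etree

# Hypothetical markup standing in for a downloaded page.
html = """
<html>
  <head><title>demo</title></head>
  <body>
    <span class="ctt">first post</span>
    <span class="other">ignored</span>
    <span class="ctt">second post</span>
  </body>
</html>
"""

# etree.HTML returns the root <html> element of the parsed tree.
selector = etree.HTML(html)

# Iterating over an element yields its child elements (head, body).
children = [child.tag for child in selector]

# Select every <span> whose class attribute is exactly "ctt".
spans = selector.xpath('//span[@class="ctt"]')
texts = [span.text for span in spans]

print(selector.tag, children, texts)
```

xpath returns a list of elements here, so an empty list (not an exception) signals that nothing matched.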

Use of XPath

XPath describes locations in an XML document as paths through its tree. The "/" character separates one level from the next, and a leading "/" stands for the root of the document (the document itself, not its outermost tag). In an HTML file, the outermost element is therefore "/html". To locate a tag, you can write an absolute path, much like a file path: page.xpath(u"/html/body/p") finds all p tags directly under the body node. Alternatively, start the path with "//", as in page.xpath(u"//p"), which finds all p tags anywhere in the HTML.
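The difference between the two path styles is easiest to see on a page where one p tag is nested more deeply than another. This is a small sketch with made-up markup, using Python 3 string literals (the u prefix is not needed there).

```python
from lxml import etree

# Hypothetical page: one <p> directly under <body>, one nested in a <div>.
page = etree.HTML(
    "<html><body>"
    "<p>top-level paragraph</p>"
    "<div><p>nested paragraph</p></div>"
    "</body></html>"
)

# Absolute path: only <p> elements that are direct children of <body>.
direct = page.xpath("/html/body/p")

# '//' matches <p> elements at any depth in the document.
anywhere = page.xpath("//p")

direct_texts = [p.text for p in direct]
anywhere_texts = [p.text for p in anywhere]
print(direct_texts)
print(anywhere_texts)
```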

In addition, you can add a condition in square brackets, such as [@class], to filter the results and narrow the scope.

When you extract content, you may encounter nested tags. In that case, call xpath('string(.)') on the element to get all of its text, including the text of its descendants, as a single string.
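The sketch below shows both ideas on a hypothetical fragment: a bracketed condition that filters by attribute, and string(.) collecting text across nested tags. The markup and the class name "post" are assumptions made for the example.

```python
from lxml import etree

# Hypothetical fragment; lxml wraps it in <html><body> automatically.
page = etree.HTML(
    '<div class="post">Hello <a href="#">linked <b>bold</b></a> world</div>'
    '<div>no class here</div>'
)

# [@class] keeps only elements that have a class attribute at all.
with_class = page.xpath("//div[@class]")

# [@class="post"] keeps only elements whose class equals "post".
post = page.xpath('//div[@class="post"]')[0]

# .text stops at the first child element, so nested text is lost.
leading_text = post.text

# string(.) concatenates the text of the element and all descendants.
full_text = post.xpath("string(.)")

print(len(with_class), repr(leading_text), repr(full_text))
```

Because string(.) returns the text in document order, it is a convenient way to flatten markup such as links or emphasis inside a post.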

Code sample

# -*- coding: utf8 -*-
__author__ = 'liu_100'

import requests
from lxml import etree

# Cookie sent with the request
cookie = {'Cookies': '_t_wm=8a2006293dfe5dc8c4d35223168328e8; Sub=_2a256te82derxgedh6vcz-srpytiihxvzz1f-rdv6punbunbelrp3kw1lheskxduojyw0wfpmv0w89pmwwxf5_w.; subp= 0033wrsxqpxfm725ws9jqgmf55529p9d9wf1xfn7lmtjssvpaxdgfnzf5jpx5k2hugl.fo24eo-r1kb0eob2djloiexlxk-lb--lb.blxk-lb--lb.blxk-l1 2ql12zlxkblb.2lb.2lxk-lbonl1k5t; suhb=0rr6esvipulf8c; alf=1466944614; ssologinstate=1464352614'}
url = 'http://weibo.cn/u/1890493665'

# html = requests.get(url).content
# print(html)
html = requests.get(url, cookies=cookie).content
# html = requests.get(url, cookies=cookie).text
# html = bytes(bytearray(html, encoding='utf-8'))

# Parse the page and select every span with class "ctt"
selector = etree.HTML(html)
content = selector.xpath('//span[@class="ctt"]')
for each in content:
    # string(.) flattens nested tags into one string
    text = each.xpath('string(.)')
    print(text)
