bytearray([source[, encoding[, errors]]])
bytearray([source[, encoding[, errors]]]) returns a new byte array. The bytearray type is a mutable sequence whose elements are integers in the range [0, 255].
The source parameter:
If source is an integer, returns an array of that length, initialized with zero bytes;
If source is a string, the string is converted to a sequence of bytes using the specified encoding;
If source is an iterable, its elements must be integers in [0, 255];
If source is an object that supports the buffer interface, that object can also be used to initialize the bytearray.
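The four cases above can be tried out directly. This is a minimal sketch, assuming Python 3 (where a string source requires an encoding); the variable names are only for illustration.

```python
# One bytearray per kind of source argument.

# 1. Integer: an array of that many zero bytes.
zeros = bytearray(3)

# 2. String: encoded with the given encoding.
encoded = bytearray("caf\u00e9", encoding="utf-8")

# 3. Iterable of integers, each in [0, 255].
from_ints = bytearray([72, 105])

# 4. Object supporting the buffer interface, such as bytes.
from_bytes = bytearray(b"abc")

# Unlike bytes, a bytearray can be modified in place.
from_bytes[0] = ord("x")

print(zeros)         # bytearray(b'\x00\x00\x00')
print(len(encoded))  # 5 (the accented letter takes two bytes in UTF-8)
print(from_ints)     # bytearray(b'Hi')
print(from_bytes)    # bytearray(b'xbc')
```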
Use of the lxml library
For details, see http://www.crifan.com/python_try_lxml_parse_html/
In short, selector = etree.HTML(html) returns the root element, displayed as <Element html at 0x28b0620>.
It contains <Element head at 0x28c13f0>, <Element body at 0x28c1fa8>, and other child elements.
Use content = selector.xpath('//span[@class="CTT"]') to select the matching elements from the parsed document.
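A small sketch of these steps, assuming lxml is installed. The HTML snippet is made up for illustration and stands in for a downloaded page.

```python
from lxml import etree

# Hypothetical HTML standing in for a downloaded page.
html = b"""
<html>
  <head><title>Demo</title></head>
  <body>
    <span class="CTT">first post</span>
    <span class="other">ignored</span>
    <span class="CTT">second post</span>
  </body>
</html>
"""

# etree.HTML parses the markup and returns the root <html> element.
selector = etree.HTML(html)
print(selector.tag)                        # html
print([child.tag for child in selector])   # ['head', 'body']

# xpath returns a list of the matching elements.
content = selector.xpath('//span[@class="CTT"]')
print([span.text for span in content])     # ['first post', 'second post']
```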
Use of XPath
XPath describes paths in an XML document in a tree-like way, using "/" to separate one level from the next. The leading "/" represents the root of the document (note that this means the document itself, not the outermost tag node).
For an HTML file, for example, the outermost node is "/html". To locate an HTML tag, you can use an absolute path, much like a file path: page.xpath(u"/html/body/p") finds all p tags directly under the body node.
You can also use a relative path: page.xpath(u"//p") finds all p tags anywhere in the HTML.
In addition, you can use conditions such as [@class] to further filter the content to narrow down the scope.
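To compare absolute paths, relative paths, and conditions side by side, here is a minimal sketch assuming lxml is installed. The markup and the class name 'intro' are made up for illustration.

```python
from lxml import etree

page = etree.HTML(
    "<html><body>"
    "<p class='intro'>one</p>"
    "<div><p>two</p></div>"
    "<p>three</p>"
    "</body></html>"
)

# Absolute path: only <p> elements that are direct children of <body>.
direct = page.xpath("/html/body/p")
print([p.text for p in direct])       # ['one', 'three']

# Relative path: every <p> anywhere in the document.
everywhere = page.xpath("//p")
print([p.text for p in everywhere])   # ['one', 'two', 'three']

# Condition on an attribute: only <p> elements that have a class attribute.
with_class = page.xpath("//p[@class]")
print([p.text for p in with_class])   # ['one']

# Condition on an attribute value.
intro = page.xpath("//p[@class='intro']")
print([p.text for p in intro])        # ['one']
```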
When you extract content, you may encounter nested tags; in that case, use xpath('string(.)')
to extract all of the text directly.
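The difference shows up as soon as an element contains child tags. A minimal sketch, assuming lxml is installed, with a made-up span:

```python
from lxml import etree

page = etree.HTML(
    "<html><body>"
    "<span class='CTT'>Hello <a href='#'>@user</a>, see <b>this</b>!</span>"
    "</body></html>"
)
span = page.xpath("//span[@class='CTT']")[0]

# .text stops at the first child tag.
print(repr(span.text))          # 'Hello '

# string(.) concatenates the text of the element and all its descendants.
print(span.xpath("string(.)"))  # Hello @user, see this!
```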
Code sample
#-*-coding:utf8-*-
__author__ = 'liu_100'

import requests
from lxml import etree

# Login cookie copied from the browser, sent along with the request
cookie = {'Cookies': '_t_wm=8a2006293dfe5dc8c4d35223168328e8; Sub=_2a256te82derxgedh6vcz-srpytiihxvzz1f-rdv6punbunbelrp3kw1lheskxduojyw0wfpmv0w89pmwwxf5_w.; subp= 0033wrsxqpxfm725ws9jqgmf55529p9d9wf1xfn7lmtjssvpaxdgfnzf5jpx5k2hugl.fo24eo-r1kb0eob2djloiexlxk-lb--lb.blxk-lb--lb.blxk-l1 2ql12zlxkblb.2lb.2lxk-lbonl1k5t; suhb=0rr6esvipulf8c; alf=1466944614; ssologinstate=1464352614'}
url = 'http://weibo.cn/u/1890493665'

# html = requests.get(url).content
# print(html)
html = requests.get(url, cookies=cookie).content
# html = requests.get(url, cookies=cookie).text
# html = bytes(bytearray(html, encoding='utf-8'))

# Parse the page and select every span with class "CTT"
selector = etree.HTML(html)
content = selector.xpath('//span[@class="CTT"]')
for each in content:
    # string(.) collects all text in the span, including nested tags
    text = each.xpath('string(.)')
    print(text)
Simple use cases for Requests/lxml