Python programmers, especially those who write web crawlers, all know the HTTP library requests, which perfectly embodies the meaning of the phrase "for humans".
Its author, Kenneth Reitz (kennethreitz), is a photography enthusiast who has written many libraries. Besides requests, there is pipenv, a tool that better integrates package management and environment management, the date-time library Maya, and others.
In the past two days he has released a new project called requests-html, "HTML Parsing for Humans" (link: https://github.com/kennethreitz/requests-html). As the name implies, it is used to parse HTML documents. In just two days the project has collected more than 3,000 stars.
When writing crawlers, we used to parse HTML pages with BeautifulSoup or lxml. The BeautifulSoup API is relatively friendly, but its parsing performance is low. lxml uses XPath syntax and parses quickly, but the resulting code is not very readable. Now Kenneth Reitz's new HTML parsing library carries on the fine tradition of the requests library: for humans.
We know that requests is only responsible for making network requests and does not parse the response. You can think of requests-html as a requests library that can also parse HTML documents.
The requests-html codebase is actually very small, currently fewer than 200 lines. It is a thin wrapper around existing libraries that makes them easier for developers to call. It depends on pyquery, requests, lxml, and other libraries.
Installation
pip install requests-html
How to use
>>> from requests_html import session
>>> # Returns a Response object
>>> r = session.get('https://python.org/')
Get all Links
>>> r.html.links
{'/users/membership/', '/about/gettingstarted/'}
>>> # Select an element with a CSS selector
>>> about = r.html.find('#about')[0]
>>> print(about.text)
About
Applications
Quotes
Getting Started
Help
Python Brochure
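To make concrete what "get all links" amounts to, here is a minimal sketch that uses only Python's standard library. It is not how requests-html implements `links` (that library builds on pyquery and lxml). The names `LinkCollector`, `extract_links`, and `SAMPLE` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag into a set."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.add(value)


def extract_links(markup):
    """Return the set of distinct href values found in the markup."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


# Hypothetical sample markup, loosely modelled on the #about menu.
SAMPLE = """
<ul id="about">
  <li><a href="/about/">About</a></li>
  <li><a href="/about/gettingstarted/">Getting Started</a></li>
  <li><a href="/about/">About (duplicate)</a></li>
</ul>
"""

print(sorted(extract_links(SAMPLE)))
```

Because the result is a set, duplicate links collapse into one entry, which matches the set-style output shown in the session above.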
Another very appealing feature is the ability to convert HTML to Markdown text.
>>> # Convert HTML to Markdown text
>>> print(about.markdown)
* [About](/about/)
* [Applications](/about/apps/)
* [Quotes](/about/quotes/)
* [Getting Started](/about/gettingstarted/)
* [Help](/about/help/)
* [Python Brochure](http://brochure.getpython.info/)
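The idea behind this kind of conversion can be sketched with the standard library alone. The example below handles only the simplest case, turning each `<a>` element into a Markdown bullet, and is an assumption-laden illustration rather than the conversion requests-html actually performs. `LinkListToMarkdown` and `links_to_markdown` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkListToMarkdown(HTMLParser):
    """Turn every <a href="...">label</a> into a '* [label](href)' line."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip()
            self.lines.append(f"* [{label}]({self._href})")
            self._href = None


def links_to_markdown(markup):
    """Return a Markdown bullet list with one entry per link in the markup."""
    parser = LinkListToMarkdown()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.lines)


print(links_to_markdown(
    '<ul><li><a href="/about/">About</a></li>'
    '<li><a href="/about/apps/">Applications</a></li></ul>'
))
```

A real converter also has to deal with headings, emphasis, nested lists, and so on, which is exactly the work a library saves you.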
Kenneth Reitz, the greatest engineer in the Python field, strikes again!