1. Beautiful Soup Introduction
Beautiful Soup is a Python library for extracting data from HTML and XML files. It provides idiomatic ways to navigate, search, and modify the parse tree, which can greatly reduce the time it takes to write a crawler.
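The three kinds of operation mentioned above (navigating, searching, modifying) can be sketched on a small document. This is a minimal illustration, assuming beautifulsoup4 is installed; the HTML snippet and the variable names are hypothetical and chosen only for the example. Python's built-in "html.parser" is used so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet used only for illustration
html_doc = """
<html><head><title>Example page</title></head>
<body>
  <p class="intro">Hello, <b>world</b>!</p>
  <a href="http://example.com/one" id="link1">One</a>
  <a href="http://example.com/two" id="link2">Two</a>
</body></html>
"""

# Build the parse tree with Python's built-in parser
soup = BeautifulSoup(html_doc, "html.parser")

# Navigate: tag names used as attributes walk down to the first matching tag
title_text = soup.title.string      # text inside <title>
intro_bold = soup.p.b.get_text()    # text of the <b> inside the first <p>

# Search: find_all returns every matching tag in document order
links = [a["href"] for a in soup.find_all("a")]

# Modify: change an attribute value and the tag's text in place
first_link = soup.find("a", id="link1")
first_link["href"] = "http://example.com/changed"
first_link.string = "Changed"

print(title_text)
print(intro_bold)
print(links)
print(first_link)
```

Changes made to `first_link` are reflected in `soup` itself, so printing `soup` afterwards shows the modified link.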
Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You usually don't have to think about encodings; the exception is a document that does not declare its encoding and whose encoding Beautiful Soup cannot detect automatically. In that case, you only need to specify the original encoding.
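A minimal sketch of specifying the original encoding, assuming beautifulsoup4 is installed. The GBK-encoded bytes below are a hypothetical input with no charset declaration; the `from_encoding` argument tells Beautiful Soup how to decode them, and `original_encoding` reports which encoding was actually used.

```python
from bs4 import BeautifulSoup

# Hypothetical GBK-encoded bytes with no <meta charset> declaration
raw = "<p>你好，世界</p>".encode("gbk")

# State the original encoding explicitly instead of relying on detection
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")

# Inside Python the document is now ordinary Unicode text
text = soup.p.get_text()
print(text)

# The encoding Beautiful Soup used to decode the input
print(soup.original_encoding)

# encode() with no argument produces UTF-8 bytes by default
out = soup.p.encode()
print(out)
```

Without `from_encoding`, Beautiful Soup would have to guess the encoding from the raw bytes, and a short GBK document can easily be misdetected.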
Beautiful Soup works on top of popular Python parsers such as lxml and html5lib, letting you try different parsing strategies or trade flexibility for speed.
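The parser is chosen by the second argument to `BeautifulSoup`. The sketch below feeds the same deliberately malformed markup (a hypothetical example) to each parser and skips any that are not installed: "html.parser" ships with Python, while lxml (fast) and html5lib (lenient, browser-like, slower) are separate installs. Different parsers may build different trees from broken HTML, so the exact outputs are not fixed here.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately malformed markup: unclosed <p> tags and a stray </b>
broken = "<p>one<p>two</b>"

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        soup = BeautifulSoup(broken, parser)
    except FeatureNotFound:
        # lxml and html5lib must be installed separately; skip if missing
        results[parser] = None
        continue
    results[parser] = str(soup)

for parser, output in results.items():
    print(f"{parser:12} -> {output}")
```

Comparing the printed trees is a quick way to see why the choice of parser matters for messy real-world pages.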
2. Beautiful Soup Installation
Beautiful Soup can be installed quickly with pip. The latest version is Beautiful Soup 4, published on PyPI as beautifulsoup4.
$ pip install beautifulsoup4
After installation, the package is imported as bs4:
import bs4
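A quick check that the installation works might look like the sketch below, assuming the pip command above succeeded. It prints the installed version and parses a tiny hypothetical snippet; in practice the `BeautifulSoup` class is usually imported directly from `bs4`.

```python
import bs4
from bs4 import BeautifulSoup

# Installed Beautiful Soup version string
print(bs4.__version__)

# Parse a tiny snippet to confirm the library works end to end
soup = BeautifulSoup("<b>installed</b>", "html.parser")
print(soup.b.string)
```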
Original article: http://www.cnblogs.com/wuwenyan/p/4768572.html
"Python Crawler Learning Notes (2)": Summary of Beautiful Soup Library Knowledge Points