Previously we were using Python's built-in parser, html.parser. The official documentation describes several other parsers as well, so let's compare them.
Parser | How to use | Advantages | Disadvantages
Python's html.parser | BeautifulSoup(markup, "html.parser") | 1. Included with Python 2. Decent speed 3. Lenient | Less lenient in versions before Python 2.7.3 and 3.2.2
lxml's HTML parser | BeautifulSoup(markup, "lxml") | 1. Very fast 2. Lenient | Requires an external C library
lxml's XML parser | BeautifulSoup(markup, ["lxml", "xml"]) or BeautifulSoup(markup, "xml") | 1. Very fast 2. The only XML parser supported | Requires an external C library
html5lib | BeautifulSoup(markup, "html5lib") | 1. Most lenient 2. Parses pages the same way a web browser does 3. Produces valid HTML5 | 1. Very slow 2. Requires an external Python package
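Since lxml and html5lib have to be installed separately, it can help to check which parsers are actually available before passing a name to BeautifulSoup. The following is only a minimal sketch: the helper names `pick_parser` and `make_soup`, and the preference order (lxml first, then html5lib, then html.parser), are assumptions made for illustration and are not part of Beautiful Soup itself.

```python
from importlib.util import find_spec

# Each entry: (parser name passed to BeautifulSoup, module that must be importable).
# The preference order is an assumption of this sketch, not a Beautiful Soup rule.
CANDIDATES = [
    ("lxml", "lxml"),
    ("html5lib", "html5lib"),
    ("html.parser", None),  # standard library, always available
]


def pick_parser(candidates=CANDIDATES):
    """Return the first parser name whose backing module is installed."""
    for name, module in candidates:
        if module is None or find_spec(module) is not None:
            return name
    raise RuntimeError("no parser available")


def make_soup(markup):
    """Parse markup with the first available parser (requires bs4)."""
    # Imported lazily so pick_parser() can be used even without bs4 installed.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, pick_parser())


if __name__ == "__main__":
    print("Using parser:", pick_parser())
    if find_spec("bs4") is not None:
        # Deliberately unclosed tags; each parser repairs them in its own way.
        soup = make_soup("<p>Hello <b>world")
        print(soup.prettify())
```

Because different parsers can build different trees from invalid markup, naming one parser explicitly is the safer choice when results must be reproducible across machines.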
That's all you need to know.
Python Crawler: BeautifulSoup (2)