Python 3 crawler: ways to fetch a web page resource

1. The simplest way:

    import urllib.request

    response = urllib.request.urlopen('http://www.baidu.com/')
    print(response.read())

2. Via a Request object:
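The two calls above need network access, but the same `urlopen` API can be exercised offline: since Python 3.4 the default opener also understands the `data:` URL scheme, where the "URL" itself carries the response body. A minimal sketch (the `data:` URL here is my own example, not from the original):

```python
import urllib.request

# urlopen handles data: URLs via the built-in DataHandler (Python 3.4+),
# so this runs with no network access at all
with urllib.request.urlopen("data:text/plain,Hello") as response:
    body = response.read()

print(body)  # b'Hello'
```

This is handy for trying out `read()`, headers, and context-manager usage before pointing the crawler at a real site.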
    import urllib.request

    req = urllib.request.Request('http://www.baidu.com')
    response = urllib.request.urlopen(req)
    print(response.read())

3. Sending data, GET vs. POST:

The difference between GET and POST requests is that POST requests usually have "side effects": they change the state of the server in some way (for example, submitting a form). With GET, the data is encoded with urllib.parse.urlencode and appended to the URL. Setting a User-Agent header (here an old Internet Explorer string) makes the crawler identify itself as a browser, since some sites refuse Python's default user agent:

    import urllib.parse
    import urllib.request

    url = 'http://www.baidu.com'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    values = {'wd': 'python', 'opt-webpage': 'on', 'ie': 'gbk'}
    headers = {'User-Agent': user_agent}

    query = urllib.parse.urlencode(values)
    full_url = url + '?' + query
    req = urllib.request.Request(full_url, headers=headers)
    response = urllib.request.urlopen(req)
    print(response.read())
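Because the query above passes `ie=gbk`, the server expects the search term to be percent-encoded as GBK bytes rather than the default UTF-8. A small sketch of how `urlencode`'s `encoding` parameter handles this (the Chinese search term 爬虫, "crawler", is my own illustration, not from the original):

```python
from urllib.parse import urlencode, parse_qs

# percent-encode the values using GBK instead of the default UTF-8
qs = urlencode({'wd': '爬虫', 'ie': 'gbk'}, encoding='gbk')
print(qs)

# decoding with the same charset round-trips the original text
decoded = parse_qs(qs, encoding='gbk')
print(decoded['wd'])  # ['爬虫']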
4. Handling errors with URLError and HTTPError:

    import urllib.request
    from urllib.error import URLError, HTTPError

    req = urllib.request.Request('http://www.baidu.com')
    try:
        urllib.request.urlopen(req)
    except HTTPError as e:
        print(e.code)
    except URLError as e:
        print(e.reason)

HTTPError is raised when the server returns an error status code (404, 500, ...); URLError covers the case where the server could not be reached at all, and e.reason explains why.
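Since HTTPError is a subclass of URLError, the order of the except clauses matters: the more specific class must be caught first. The sketch below, runnable without any network because HTTPError objects can be constructed directly, is my own illustration (the `describe` helper and the 404 are not from the original):

```python
from urllib.error import URLError, HTTPError

def describe(err):
    # check HTTPError first: the server answered, but with an error status
    if isinstance(err, HTTPError):
        return 'server error %d' % err.code
    # plain URLError: the server could not be reached at all
    if isinstance(err, URLError):
        return 'failed to reach server: %s' % err.reason
    return 'unknown error'

# HTTPError(url, code, msg, hdrs, fp) can be built by hand for testing
print(describe(HTTPError('http://www.baidu.com', 404, 'Not Found', None, None)))
print(describe(URLError('timed out')))
```

If the isinstance checks were swapped, every HTTPError would be reported as a generic URLError and the status code would be lost.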
5. Openers and Handlers:

An opener chains together handlers, each of which knows how to deal with one particular situation: HTTP basic authentication, redirects, proxies, and so on. To open a URL protected by basic authentication:

    import urllib.request

    # create a password manager
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

    # None as the realm means these credentials apply to any realm
    top_level_url = "http://example.com/foo/"
    password_mgr.add_password(None, top_level_url, 'why', '1223')

    # build an opener around a basic-auth handler
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(handler)

    # use the opener directly ...
    a_url = 'http://www.baidu.com/'
    opener.open(a_url)

    # ... or install it so every urllib.request.urlopen call uses it
    urllib.request.install_opener(opener)

top_level_url can be either a full URL such as "http://example.com/foo/" or an authority, i.e. a host name optionally followed by a port such as "example.com:8080"; the latter contains the port number.
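Whether the password manager is wired up correctly can be checked offline with find_user_password, the same lookup HTTPBasicAuthHandler performs when a 401 response arrives. A sketch mirroring the credentials above:

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, 'why', '1223')

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)

# the lookup the auth handler would perform on a 401 response,
# done here by hand with no network traffic
user, pwd = password_mgr.find_user_password(None, "http://example.com/foo/")
print(user, pwd)  # why 1223
```

Because the realm was registered as None, the stored credentials are returned for any realm the server might name in its WWW-Authenticate challenge.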