First, find a website that publishes free proxy IPs, then crawl the IP addresses and port numbers from that site. Finally, use the crawled IPs as proxies to access the target website.
The key part is the page number embedded in the URL path. The paging and parsing code is as follows:
import urllib.request
from bs4 import BeautifulSoup

def getProxyIp():
    proxy = []
    for i in range(1, 3):
        print(i)
        header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                                'AppleWebKit/537.36 (KHTML, like Gecko) '
                                'Ubuntu Chromium/44.0.2403.89 '
                                'Chrome/44.0.2403.89 Safari/537.36'}
        req = urllib.request.Request(
            url='http://www.xicidaili.com/nt/{0}'.format(i), headers=header)
        r = urllib.request.urlopen(req)
        soup = BeautifulSoup(r, 'html.parser', from_encoding='utf-8')
        table = soup.find('table', attrs={'id': 'ip_list'})
        tr = table.find_all('tr')[1:]
        # Parse out each proxy's IP address, port, and type
        for item in tr:
            tds = item.find_all('td')
            kind = '{0}:{1}'.format(tds[1].get_text().lower(), tds[2].get_text())
            proxy.append(kind)
    return proxy
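To see the table-parsing step in isolation, here is a minimal sketch that runs the same BeautifulSoup logic against an inline HTML fragment (hypothetical data, since the live xicidaili page may change or be unreachable):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the crawled page: a table with id="ip_list",
# one header row, and one <td> each for index, IP, and port.
SAMPLE_HTML = """
<table id="ip_list">
  <tr><th>No.</th><th>IP</th><th>Port</th></tr>
  <tr><td>1</td><td>1.2.3.4</td><td>8080</td></tr>
  <tr><td>2</td><td>5.6.7.8</td><td>3128</td></tr>
</table>
"""

def parse_proxies(html):
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', attrs={'id': 'ip_list'})
    proxies = []
    for row in table.find_all('tr')[1:]:   # skip the header row
        tds = row.find_all('td')
        proxies.append('{0}:{1}'.format(tds[1].get_text().strip(),
                                        tds[2].get_text().strip()))
    return proxies

print(parse_proxies(SAMPLE_HTML))  # ['1.2.3.4:8080', '5.6.7.8:3128']
```

Testing against a fixed fragment like this makes it easy to verify the selector logic before pointing the crawler at the real site.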
The header imitates a browser request. The parsed IP and port pairs make up the proxy list. The next step is to use one of these proxies to access the target website:
proxy_handler = urllib.request.ProxyHandler({'http': proxy_dict})
opener = urllib.request.build_opener(proxy_handler)
urllib.request.install_opener(opener)
req = urllib.request.Request(
    url='http://blog.csdn.net/u013692888/article/details/52714103',
    headers=header)
urllib.request.urlopen(req)
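Free proxies are often dead or slow, so in practice the request through the installed opener should be wrapped with a timeout and error handling. A minimal sketch, assuming `proxy_dict` holds one "ip:port" string from the list built above (the value here is a made-up example):

```python
import urllib.request
import urllib.error

proxy_dict = '1.2.3.4:8080'  # hypothetical example value from getProxyIp()

# Route all http requests made via urlopen through this proxy.
handler = urllib.request.ProxyHandler({'http': proxy_dict})
opener = urllib.request.build_opener(handler)
urllib.request.install_opener(opener)

def fetch(url, timeout=5):
    """Try the request through the installed proxy; return None on failure."""
    try:
        return urllib.request.urlopen(url, timeout=timeout).read()
    except (urllib.error.URLError, OSError):
        return None
```

Returning None on failure lets the caller simply move on to the next proxy in the list instead of crashing on the first dead one.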
Source address: https://github.com/Ahuanghaifeng/python3-ip