Python crawler (3) -- SSL certificates and the handler processor

Source: Internet
Author: User
Tags: ssl certificate

1. SSL certificate issues


In the previous article, we built a small crawler that downloaded several pages of Lianjia (Chain Home) property listings in Shanghai. In practice, when we use urllib to make network requests, we can run into a problem where access is blocked by certificate validation.

When handling HTTPS requests, SSL certificate validation is performed; if validation fails, the user is warned that the certificate is untrusted (that is, it was not issued by a trusted CA).
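To see the difference between the two behaviours without touching any particular website, we can inspect the SSL contexts themselves; this is a minimal sketch using only the standard ssl module:

```python
import ssl

# The default context verifies the server certificate against the
# system's trusted CAs and also checks that the hostname matches
default_ctx = ssl.create_default_context()
print(default_ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(default_ctx.check_hostname)                    # True

# _create_unverified_context() disables both checks, which is why
# passing it to urlopen() silences certificate validation errors
unverified_ctx = ssl._create_unverified_context()
print(unverified_ctx.verify_mode == ssl.CERT_NONE)   # True
print(unverified_ctx.check_hostname)                 # False
```

Note that `_create_unverified_context()` is meant for testing and scraping workarounds; skipping verification removes the protection HTTPS normally provides.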


When SSL authentication fails like this, we have to handle the certificate ourselves: tell the program to ignore the SSL certificate validation error so the site can be accessed normally. For example, visiting 12306:

from urllib import request
# import Python's SSL processing module
import ssl

# ignore SSL validation failure
context = ssl._create_unverified_context()

url = "https://www.12306.cn/mormhweb/"

response = request.urlopen(url, context=context)
html = response.read()
print(html)


2. The handler processor and a custom opener

So far we have been using urlopen(), which is a ready-made, special opener that the module builds for us. But this basic urlopen() does not support proxies, cookies, or other advanced HTTP/HTTPS features. So we use handler processors to build a custom opener with the features we need.

import urllib.request

url = "http://www.whatismyip.com.tw/"

# the parameter is a dictionary: the key is the proxy type,
# the value is the proxy IP and port number
proxy_support = urllib.request.ProxyHandler({'http': '117.86.199.19:8118'})

# then create an opener that contains the proxy
opener = urllib.request.build_opener(proxy_support)
opener.addheaders = [("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50")]

# option 1: install_opener() installs it into the default environment,
# so every later urlopen() call goes through the custom opener
urllib.request.install_opener(opener)
response = urllib.request.urlopen(url)

# option 2: use a one-off opener.open() instead
# req = urllib.request.Request(url)
# response = opener.open(req)

html = response.read().decode('utf-8')
print(html)
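The same pattern covers the other features urlopen() lacks. As a sketch of cookie support (no network request is made here, so the opener is built but never used), a cookie-aware opener combines http.cookiejar with HTTPCookieProcessor:

```python
import http.cookiejar
import urllib.request

# a CookieJar stores cookies from responses and automatically
# re-sends them on subsequent requests made through the opener
cookie_jar = http.cookiejar.CookieJar()
cookie_processor = urllib.request.HTTPCookieProcessor(cookie_jar)

# build_opener() accepts any number of handlers, so this processor
# could be combined with a ProxyHandler in a single opener
opener = urllib.request.build_opener(cookie_processor)
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# install it so later urlopen() calls go through this opener too
urllib.request.install_opener(opener)
```

This is what lets a crawler stay logged in across requests: the session cookie from a login response is kept in the jar and attached to every later request.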

We can see that the IP visiting the website has been replaced by the proxy IP. In the setup above we also used addheaders to attach a User-Agent to the request. The User-Agent (often abbreviated UA) is part of the HTTP protocol, one of the header fields: a special string that tells the website the type and version of the browser being used, the operating system and version, the browser kernel, and so on. Setting a realistic UA is also one of the most common ways of getting around anti-crawler measures.
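Instead of setting the UA on the whole opener, it can also be attached to a single request. A minimal sketch (example.com is just a placeholder URL, and no request is actually sent):

```python
import urllib.request

url = "http://www.example.com/"
ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) "
      "AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50")

# a Request object carries its own headers; urlopen(req) would send them
req = urllib.request.Request(url, headers={"User-Agent": ua})

# urllib normalizes stored header names, so the lookup key is "User-agent"
print(req.get_header("User-agent"))
```

Per-request headers are handy when different pages of the same crawl need different identities, while opener-level addheaders suits a single identity for the whole session.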
