Crawling web pages: handling a target site whose certificate is not valid.
When you use Jsoup to crawl and parse a web page, the following exception may occur:
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
    at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1627)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:204)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:198)
    at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:994)
    at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:142)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:533)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:471)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:904)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1132)
    at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:643)
This is clearly an invalid SSL certificate problem. Many web sites are now being upgraded from HTTP to HTTPS, and it may be that the site has not deployed SSL correctly, which leaves it with an invalid certificate, or that the certificate itself is not recognized by the JVM. In either case, certificate validation fails when its pages are crawled. For pages downloaded through Jsoup, the latest version at the time of writing (1.9.2) provides a validateTLSCertificates(boolean) method; passing false turns certificate validation off.
Jsoup.connect(url).timeout(30000).userAgent(UA).validateTLSCertificates(false).get();
Java's default trust store does not contain most self-signed certificates. For HTTPS requests made without a third-party library, we can create a TrustManager by hand to get around this. Do so only when you are sure about the site you are connecting to; otherwise this method is not recommended, because it disables certificate validation altogether.
import java.io.InputStream;
import java.net.URL;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.*;

public static InputStream getByDisableCertValidation(String url) {
    // Trust manager that accepts every certificate chain without checking it.
    TrustManager[] trustAllCerts = new TrustManager[] {
        new X509TrustManager() {
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
            public void checkClientTrusted(X509Certificate[] certs, String authType) {}
            public void checkServerTrusted(X509Certificate[] certs, String authType) {}
        }
    };
    // Hostname verifier that accepts any host name.
    HostnameVerifier hv = new HostnameVerifier() {
        public boolean verify(String hostname, SSLSession session) {
            return true;
        }
    };
    try {
        SSLContext sc = SSLContext.getInstance("SSL");
        sc.init(null, trustAllCerts, new SecureRandom());
        // These two calls change the defaults for every HttpsURLConnection in the JVM.
        HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
        HttpsURLConnection.setDefaultHostnameVerifier(hv);
        URL target = new URL(url);
        HttpsURLConnection urlConnection = (HttpsURLConnection) target.openConnection();
        InputStream is = urlConnection.getInputStream();
        return is;
    } catch (Exception e) {
        // Any failure is swallowed here; the caller only sees null.
    }
    return null;
}
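Note that the method above installs the trust-all socket factory and hostname verifier as JVM-wide defaults, so every later HTTPS connection in the same process also skips validation. A minimal sketch of a narrower variant follows, assuming you only want to relax validation for one connection. The names ScopedTrustAll, trustAllFactory, and openUnverified are hypothetical and do not come from the original post; only standard JDK classes are used.

```java
import java.io.IOException;
import java.net.URL;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper: disables certificate checks for a single connection only.
final class ScopedTrustAll {

    // Accepts any certificate chain. Unsafe unless you know the target site.
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
        @Override public void checkClientTrusted(X509Certificate[] certs, String authType) { }
        @Override public void checkServerTrusted(X509Certificate[] certs, String authType) { }
    };

    // Accepts any host name, matching the verifier in the original method.
    static final HostnameVerifier ALLOW_ANY_HOST = (hostname, session) -> true;

    // Builds a socket factory backed by the trust-all manager.
    static SSLSocketFactory trustAllFactory() throws GeneralSecurityException {
        SSLContext sc = SSLContext.getInstance("TLS");
        sc.init(null, new TrustManager[] { TRUST_ALL }, new SecureRandom());
        return sc.getSocketFactory();
    }

    // Opens (but does not yet connect) an HTTPS connection with validation
    // disabled on this object only. The address must use the https scheme,
    // otherwise the cast below fails with a ClassCastException.
    static HttpsURLConnection openUnverified(String address)
            throws IOException, GeneralSecurityException {
        HttpsURLConnection conn = (HttpsURLConnection) new URL(address).openConnection();
        conn.setSSLSocketFactory(trustAllFactory());
        conn.setHostnameVerifier(ALLOW_ANY_HOST);
        return conn;
    }
}
```

Calling ScopedTrustAll.openUnverified(url).getInputStream() then reads the page as before, while other HTTPS connections in the process keep the default certificate checks.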
References:
http://snowolf.iteye.com/blog/391931
http://stackoverflow.com/questions/1828775/how-to-handle-invalid-ssl-certificates-with-apache-httpclient