Download Web pages using HttpClient

Source: Internet
Author: User

HttpCore

HttpCore provides basic encapsulation for HTTP protocol client programming, such as formatting the request header and parsing the response header. LineFormatter is the interface used to format request header information; the actual implementation is BasicLineFormatter.

HttpResponseParser parses the response header.

Request parameters are encapsulated in an HttpParams object. BasicHttpParams implements HttpParams with a hash table.
HttpProtocolParams contains specific methods for setting parameters; for example, the setVersion method sets the HTTP protocol version number. org.apache.http.HttpVersion encapsulates all defined HTTP protocol versions, which are 1.1, 1.0, and 0.9. For example, use HttpProtocolParams to set the HTTP protocol version to 1.1.

HttpParams params = new BasicHttpParams();
// set parameters on params
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);

Set connection parameters on an HttpParams object.

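HttpParams params = new BasicHttpParams();
// set the connection timeout (the millisecond value is missing in the source)
HttpConnectionParams.setConnectionTimeout(params, /* connection timeout in ms */);
// set the socket timeout (the millisecond value is missing in the source)
HttpConnectionParams.setSoTimeout(params, /* socket timeout in ms */);
// set the socket buffer size
HttpConnectionParams.setSocketBufferSize(params, 8192);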

HttpProtocolParams has a setUserAgent method that sets the client type.

// set the user agent to match Internet Explorer (the string below identifies MSIE 6.0)
HttpProtocolParams.setUserAgent(params, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");

The HTTP protocol processor is a collection of protocol interceptors that implements the "chain of responsibility" pattern. Each protocol interceptor handles one specific aspect of the protocol. For example, RequestTargetHost adds Host information to the request header, and RequestUserAgent adds User-Agent information to the request header.
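
The interceptor idea can be sketched with plain JDK classes. This is a minimal illustration and not the HttpCore API: the RequestInterceptor and Processor types, the header map, and the sample host and user agent values are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InterceptorChainSketch {

    /** Hypothetical stand-in for a protocol interceptor: handles one aspect of a request. */
    interface RequestInterceptor {
        void process(Map<String, String> headers);
    }

    /** Runs every registered interceptor in order, similar in spirit to a protocol processor. */
    static class Processor {
        private final List<RequestInterceptor> interceptors = new ArrayList<>();

        Processor add(RequestInterceptor interceptor) {
            interceptors.add(interceptor);
            return this;
        }

        void process(Map<String, String> headers) {
            for (RequestInterceptor interceptor : interceptors) {
                interceptor.process(headers);
            }
        }
    }

    /** Builds request headers by passing an empty header map through two interceptors. */
    static Map<String, String> buildHeaders(String host, String userAgent) {
        Processor processor = new Processor()
                // one interceptor is responsible only for the Host header
                .add(headers -> headers.put("Host", host))
                // another is responsible only for User-Agent, and keeps an existing value
                .add(headers -> headers.putIfAbsent("User-Agent", userAgent));
        Map<String, String> headers = new LinkedHashMap<>();
        processor.process(headers);
        return headers;
    }

    public static void main(String[] args) {
        // prints {Host=www.example.com, User-Agent=sketch-client/0.1}
        System.out.println(buildHeaders("www.example.com", "sketch-client/0.1"));
    }
}
```

Each interceptor touches only its own header, so a new concern can be added by registering another interceptor without changing the existing ones.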

An HTTP response is the message the server sends back to the client after receiving and interpreting the request message. The first line of the response message contains the protocol version, followed by the numeric status code and its associated text phrase.

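HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1, HttpStatus.SC_OK, "OK");
System.out.println(response.getProtocolVersion()); // HTTP/1.1
System.out.println(response.getStatusLine().getStatusCode()); // 200
System.out.println(response.getStatusLine().getReasonPhrase()); // OK
System.out.println(response.getStatusLine().toString()); // HTTP/1.1 200 OK
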
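
To make the layout of that first response line concrete, here is a small sketch that splits a status line into its three parts using only the JDK. The StatusLineSketch class is a hypothetical helper for illustration and is not part of HttpCore.

```java
class StatusLineSketch {
    final String protocol;
    final int statusCode;
    final String reasonPhrase;

    StatusLineSketch(String protocol, int statusCode, String reasonPhrase) {
        this.protocol = protocol;
        this.statusCode = statusCode;
        this.reasonPhrase = reasonPhrase;
    }

    /** Splits "HTTP/1.1 200 OK" into protocol version, numeric code, and reason phrase. */
    static StatusLineSketch parse(String line) {
        // limit 3 keeps multi-word reason phrases such as "Not Found" in one piece
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLineSketch(parts[0], code, reason);
    }

    public static void main(String[] args) {
        StatusLineSketch status = parse("HTTP/1.1 200 OK");
        System.out.println(status.protocol);     // HTTP/1.1
        System.out.println(status.statusCode);   // 200
        System.out.println(status.reasonPhrase); // OK
    }
}
```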
Simulating a browser
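import java.util.ArrayList;
import java.util.List;
import org.apache.http.Header;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.message.BasicHeader;

private static List<Header> getHeads() {
// header information sent with every request
String userAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.2)";
List<Header> headers = new ArrayList<Header>();
headers.add(new BasicHeader("Accept-Charset", "gb2312,utf-8;q=0.1,*;q=0.7"));
headers.add(new BasicHeader("Accept-Language", "zh-CN,zh;q=0.5"));
headers.add(new BasicHeader("User-Agent", userAgent));
return headers;
}

// build a client that sends these headers by default
List<Header> headers = getHeads();
CloseableHttpClient httpClient = HttpClientBuilder.create().setDefaultHeaders(headers).build();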
Retry

The HttpRequestRetryHandler interface determines whether an HTTP request can be retried after it encounters a recoverable exception. The DefaultHttpRequestRetryHandler class retries 3 times by default; the code for a different retry count is shown below.

// retry 5 times
HttpRequestRetryHandler retryHandler = new DefaultHttpRequestRetryHandler(5, true);
CloseableHttpClient httpClient = HttpClientBuilder.create().setRetryHandler(retryHandler).build();
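
The decision a retry handler makes can be sketched without the library. The following is only an illustration under simple assumptions: RetrySketch, shouldRetry, and execute are hypothetical names, and the policy (retry on any IOException until the execution count exceeds the limit) is a simplification of what a real handler checks.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class RetrySketch {

    /** Hypothetical policy: retry I/O failures until the execution count exceeds maxRetries. */
    static boolean shouldRetry(Exception error, int executionCount, int maxRetries) {
        return error instanceof IOException && executionCount <= maxRetries;
    }

    /** Runs the task, retrying while the policy allows; otherwise rethrows the failure. */
    static <T> T execute(Callable<T> task, int maxRetries) throws Exception {
        for (int executionCount = 1; ; executionCount++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (!shouldRetry(e, executionCount, maxRetries)) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // the simulated request fails twice, then succeeds on the third attempt
        String result = execute(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated failure");
            }
            return "ok after " + calls[0] + " attempts";
        }, 5);
        System.out.println(result); // ok after 3 attempts
    }
}
```

With a limit of 5, the task runs at most 6 times in this sketch: the first attempt plus up to 5 retries.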

The following code modifies the timeout settings.

// configuration (both millisecond values are missing in the source)
int socketTimeout = /* socket timeout in ms */;
int connectionTimeout = /* connection timeout in ms */;
// request configuration
RequestConfig requestConfig = RequestConfig.custom()
.setConnectTimeout(connectionTimeout)
.setSocketTimeout(socketTimeout)
.build();
// create the client
HttpClient httpClient = HttpClientBuilder.create().setDefaultRequestConfig(requestConfig).build();

Crawling compressed web pages
Some sites return page content in gzip-compressed form. After receiving the response, check whether the content is compressed; if it is, decompress it first and then parse it. The response headers of such a page include Content-Encoding: gzip.
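
The check-then-decompress step might look like the sketch below, which uses only java.util.zip. It assumes the response body and the Content-Encoding value have already been read, and that the page text is UTF-8; GzipBodySketch, decodeBody, and gzip are hypothetical helper names. Clients built with HttpClientBuilder may already decompress gzip responses transparently, in which case a manual step like this would not be needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBodySketch {

    /** Returns the body as text, decompressing first when Content-Encoding is gzip. */
    static String decodeBody(byte[] body, String contentEncoding) throws IOException {
        boolean gzipped = contentEncoding != null
                && "gzip".equalsIgnoreCase(contentEncoding.trim());
        InputStream raw = new ByteArrayInputStream(body);
        try (InputStream in = gzipped ? new GZIPInputStream(raw) : raw) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            // assumption: the page is UTF-8; a real crawler would read the charset from the headers
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Helper that stands in for a server compressing a page. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<html>hello</html>");
        System.out.println(decodeBody(compressed, "gzip")); // <html>hello</html>
    }
}
```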

Reference: http://blog.csdn.net/qy20115549/article/details/52912532

Crawling web pages that require sign-in

Reference: http://www.cnblogs.com/Michael2397/p/7811699.html

Proxy

Reference: http://www.cnblogs.com/Michael2397/p/7821930.html
