A Simple Introduction to HttpClient

Source: Internet
Author: User
Tags form post ssl connection

HTTP://LOVELAOMAO.BLOGBUS.COM/TAG/HTTPCLIENT/

1. Features of HttpClient

Standards-based, pure Java implementation of HTTP 1.0 and 1.1.

All HTTP methods (GET, POST, PUT, DELETE, HEAD, OPTIONS, and TRACE) are implemented in an extensible OO framework.

Supports encryption with HTTPS (HTTP over SSL).

Establishes connections transparently through HTTP proxies.

Tunnels HTTPS connections through HTTP proxies by using the CONNECT method.

Establishes connections transparently through SOCKS proxies (versions 4 and 5) using native Java socket support.

Supports authentication with Basic, Digest, and the encrypting NTLM scheme.

Supports multi-part form POST for uploading large files.

Pluggable secure socket implementations, which makes it easy to use third-party solutions.

Connection management for multi-threaded applications: supports setting the maximum number of connections per host and the maximum total number of connections, and automatically detects and closes stale connections.

Streams the request directly to the server's socket.

Reads the response directly from the server's socket.

Supports persistent connections using KeepAlive in HTTP/1.0 and persistence in HTTP/1.1.

Direct access to the response code and headers sent by the server.

The connection timeout can be set.

HttpMethods implement the Command pattern to allow parallel requests and efficient connection reuse.

The source code is freely available under the Apache Software License.

2. Preparation

For JRE 1.3.*, if you want HttpClient to support HTTPS, you need to download and install JSSE and JCE. The installation steps are:
1) Download JSSE and JCE.
2) Check that the classpath contains no JAR files related to JSSE or JCE.
3) Copy US_export_policy.jar, local_policy.jar, jsse.jar, jnet.jar, jce1_2_x.jar, sunjce_provider.jar, and jcert.jar to the directory:
UNIX: $JDK_HOME/jre/lib/ext
Windows: %JDK_HOME%\jre\lib\ext
4) Edit the java.security file in the following directory:
UNIX: $JDK_HOME/jre/lib/security/
Windows: %JDK_HOME%\jre\lib\security\
5) Change
#
# List of providers and their preference orders:
#
security.provider.1=sun.security.provider.Sun
security.provider.2=com.sun.rsajca.Provider
to
#
# List of providers and their preference orders:
#
security.provider.1=com.sun.crypto.provider.SunJCE
security.provider.2=sun.security.provider.Sun
security.provider.3=com.sun.rsajca.Provider
security.provider.4=com.sun.net.ssl.internal.ssl.Provider

HttpClient also requires commons-logging, which is installed together with HttpClient in the next section.

3. Obtaining the source code
cvs -d :pserver:[email protected]:/home/cvspublic login
password: anoncvs
cvs -d :pserver:[email protected]:/home/cvspublic checkout jakarta-commons/logging
cvs -d :pserver:[email protected]:/home/cvspublic checkout jakarta-commons/httpclient

Compile:
cd jakarta-commons/logging
ant dist
cp dist/*.jar ../httpclient/lib/

cd ../httpclient
ant dist


4. Basic steps for programming with HttpClient

Create an instance of HttpClient.

Create an instance of a method (DeleteMethod, EntityEnclosingMethod, ExpectContinueMethod, GetMethod, HeadMethod, MultipartPostMethod, OptionsMethod, PostMethod, PutMethod, TraceMethod). The target URL is usually passed as a constructor parameter.

Have HttpClient execute the method.

Read the response.

Release the connection.

Process the response.

Two kinds of exceptions can occur while a method is being executed. One is HttpRecoverableException, which signals a transient error; a retry will usually succeed. The other is IOException, which signals a serious error.
A sample routine from this tutorial is available for download.
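
The difference between the two kinds of exceptions can be sketched without the library. The following is a plain-JDK illustration only, not HttpClient code: RecoverableException, Request, and executeWithRetry are hypothetical names invented for this example, and the limit of three attempts is an arbitrary assumption.

```java
import java.io.IOException;

public class RetrySketch {
    // Hypothetical stand-in for a transient failure such as HttpRecoverableException.
    static class RecoverableException extends Exception {
        RecoverableException(String message) { super(message); }
    }

    // Hypothetical request abstraction: returns a status code or throws.
    interface Request {
        int execute() throws RecoverableException, IOException;
    }

    // Retries only on recoverable failures; an IOException propagates at once.
    static int executeWithRetry(Request request, int maxAttempts) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.execute();
            } catch (RecoverableException e) {
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        throw new IOException("Giving up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws IOException {
        final int[] calls = {0};
        // Simulated request that fails twice and then succeeds with status 200.
        int status = executeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RecoverableException("connection dropped");
            }
            return 200;
        }, 3);
        System.out.println("Status " + status + " after " + calls[0] + " calls");
    }
}
```

In real code the body of the request would execute the HttpClient method, and the connection would still have to be released after each attempt.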

5. Authentication

HttpClient supports three different authentication schemes: Basic, Digest, and NTLM. These schemes can be used to authenticate the client to a server or to a proxy, referred to as server authentication and proxy authentication respectively.
1) Server authentication
HttpClient handles server authentication almost transparently; the developer only has to provide the login credentials. The credentials are stored in an instance of the HttpState class and can be set or retrieved with setCredentials(String realm, Credentials cred) and getCredentials(String realm). Note that to set default credentials that are not tied to a specific realm, pass null as the realm parameter. HttpClient's built-in automatic authentication can be turned off with the setDoAuthentication(boolean doAuthentication) method of the HttpMethod class; this affects only the current HttpMethod instance.

Preemptive authentication can be enabled as follows.
client.getState().setAuthenticationPreemptive(true);
In this mode, HttpClient sends the Basic authentication response to the server proactively, before the server has returned an authentication failure response in certain situations, mainly to reduce the overhead of establishing the connection. To make every newly created HttpState instance use preemptive authentication, set a system property as follows.
System.setProperty(Authenticator.PREEMPTIVE_PROPERTY, "true");

HttpClient's implementation of preemptive authentication follows RFC 2617.

2) Proxy authentication
Apart from the fact that the credentials are stored separately, proxy authentication is almost identical to server authentication. Use setProxyCredentials(String realm, Credentials cred) and getProxyCredentials(String realm) to set and retrieve the credentials.
3) Authentication schemes
Basic
The oldest and most widely compatible scheme specified in HTTP. Unfortunately it is also the least secure, because it transmits the user name and password in clear text. It requires a UsernamePasswordCredentials instance, which can be set for a specific realm or used as the default credentials.
Digest
A scheme added in HTTP 1.1. Although it is not supported by as much software as Basic, it is in wide use. The Digest scheme is much safer than Basic because it never transmits the actual password over the network; instead it transmits a hash computed from the password and a random value (nonce) supplied by the server. It requires a UsernamePasswordCredentials instance, which can be set for a specific realm or used as the default credentials.
NTLM
This is the most complex authentication protocol supported by HttpClient. It is a proprietary protocol designed by Microsoft, with no publicly available specification. Because of design flaws, early versions of NTLM were less secure than Digest; after a service pack fixed them, it became more secure than Digest. NTLM requires an NTCredentials instance. Note that because NTLM does not use the concept of realms, HttpClient uses the server's domain name as the realm. Also note that the user name passed to NTCredentials must not be prefixed with the domain name: "adrian" is correct, "DOMAIN\adrian" is wrong.

The way NTLM authentication works differs considerably from Basic and Digest. HttpClient generally handles these differences, but understanding them helps avoid mistakes when using NTLM authentication.

From the point of view of the HttpClient API, NTLM works just like any other authentication scheme, except that an NTCredentials instance must be supplied instead of a UsernamePasswordCredentials instance (the former simply extends the latter).

For NTLM authentication, the realm is the domain name of the machine being connected to, which can be a problem for multi-homed hosts. Only the domain name given to HttpClient for the connection is used for authentication. It is recommended to set the realm to null so that the default is used.

NTLM authenticates a connection rather than a request, so authentication is needed every time a new connection is established, and the connection must be kept open during the authentication process. For this reason, NTLM cannot be used for proxy authentication and server authentication at the same time, nor can it be used with HTTP 1.0 connections or with servers that do not support persistent connections.
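
To make the security difference between Basic and Digest concrete, here is a plain-JDK sketch of what each scheme puts on the wire. It is an illustration under stated assumptions, not HttpClient's implementation: the class and method names are invented for this example, the Digest calculation covers only the qop=auth case of RFC 2617, and the sample inputs are the ones used in the RFC's own examples.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class AuthSchemeSketch {
    // Basic: the credentials are only Base64-encoded, so anyone who sees
    // the header can decode the user name and password.
    static String basicHeader(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.ISO_8859_1);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    // Lower-case hexadecimal MD5 of the given text.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.ISO_8859_1));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Digest (qop=auth): only a hash derived from the password and the
    // server nonce is transmitted, never the password itself.
    static String digestResponse(String user, String realm, String password,
            String method, String uri, String nonce, String nc, String cnonce) {
        String ha1 = md5Hex(user + ":" + realm + ":" + password);
        String ha2 = md5Hex(method + ":" + uri);
        return md5Hex(ha1 + ":" + nonce + ":" + nc + ":" + cnonce + ":auth:" + ha2);
    }

    public static void main(String[] args) {
        System.out.println(basicHeader("Aladdin", "open sesame"));
        System.out.println(digestResponse("Mufasa", "testrealm@host.com",
                "Circle Of Life", "GET", "/dir/index.html",
                "dcd98b7102dd2f0e8b11d0f600bfb0c093", "00000001", "0a4f113b"));
    }
}
```

The Basic value can be turned back into "Aladdin:open sesame" by any Base64 decoder, while the Digest value changes with every nonce and cannot be reversed into the password.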

6. Redirects

Because of technical limitations, and to keep the API of the 2.0 release stable, HttpClient does not handle every redirect automatically. It does, however, support automatic redirects to the same host, on the same port, using the same protocol. Cases that cannot be handled automatically include those that require user interaction and those that are beyond HttpClient's capabilities.
When a server's redirect points to a different host, HttpClient simply returns the redirect status code as the response status. All status codes from 300 to 399 (inclusive) indicate a redirect response. The common ones are:

301 Moved Permanently. HttpStatus.SC_MOVED_PERMANENTLY

302 Moved Temporarily. HttpStatus.SC_MOVED_TEMPORARILY

303 See Other. HttpStatus.SC_SEE_OTHER

307 Temporary Redirect. HttpStatus.SC_TEMPORARY_REDIRECT

When a simple redirect is received, the program should extract the new URL from the HttpMethod object and request it. It is also a good idea to limit the number of redirects, which avoids endless redirect loops. The new URL can be extracted from the Location header field, as follows:
String redirectLocation;
Header locationHeader = method.getResponseHeader("location");
if (locationHeader != null) {
    redirectLocation = locationHeader.getValue();
} else {
    // The response is invalid and does not provide the new location for
    // the resource. Report an error or possibly handle the response
    // like a 404 Not Found error.
}

Special redirects:

300 Multiple Choices. HttpStatus.SC_MULTIPLE_CHOICES
304 Not Modified. HttpStatus.SC_NOT_MODIFIED
305 Use Proxy. HttpStatus.SC_USE_PROXY
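
The advice in this section to cap the number of redirects can be sketched as follows. This is a plain-JDK illustration, not HttpClient code: the map is a hypothetical stand-in for a server (key is the requested URL, value is the Location header it answers with), and resolveFinalUrl and the limit of five are assumptions made for the example.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class RedirectSketch {
    // Follows Location values until a URL has no redirect or the limit is exceeded.
    static String resolveFinalUrl(Map<String, String> locations, String start, int maxRedirects) {
        URI current = URI.create(start);
        for (int hops = 0; hops <= maxRedirects; hops++) {
            String location = locations.get(current.toString());
            if (location == null) {
                return current.toString(); // no redirect: this is the final resource
            }
            // A Location value may be relative, so resolve it against the current URL.
            current = current.resolve(location);
        }
        throw new IllegalStateException("More than " + maxRedirects + " redirects");
    }

    public static void main(String[] args) {
        Map<String, String> locations = new HashMap<>();
        locations.put("http://example.com/a", "/b");                   // relative redirect
        locations.put("http://example.com/b", "http://example.com/c"); // absolute redirect
        System.out.println(resolveFinalUrl(locations, "http://example.com/a", 5));
    }
}
```

A server that keeps redirecting to itself makes the method throw instead of looping forever, which is the reason for the limit.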

7. Character encoding

The header of an HTTP request or response must be encoded in US-ASCII. (In HTTP, a message is divided into two parts: the header, which consists of name-value pairs, and the body, which carries the actual data, such as an HTML page.) This is because the header does not carry data; it only describes the data being transmitted. One exception is the cookie, which is data but is transmitted through the header, so it too must be encoded in US-ASCII.
The body of an HTTP message can use any encoding. The default is ISO-8859-1, and the encoding can be specified with the Content-Type header field. The addRequestHeader method can be used to set the encoding of a request, and getResponseCharSet returns the encoding of a response. Documents such as HTML or XML can also declare an encoding in their own content; take care to distinguish the scope of the two declarations in order to decode the content correctly.
The encoding of URLs is specified by RFC 1738: a URL may consist only of printable US-ASCII characters, carried in 8-bit bytes. The values 80-FF are not US-ASCII characters and 00-1F are control characters, so characters in both ranges must be encoded.
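
Two of the rules in this section can be illustrated with a short plain-JDK sketch: reading the charset from a Content-Type value with ISO-8859-1 as the fallback, and percent-encoding the byte ranges that may not appear in a URL. The class and method names are invented for this example, and converting the text to UTF-8 bytes before percent-encoding is an assumption; the section itself does not say which byte encoding to use. The sketch also encodes the space and DEL (7F), which are not printable characters, and it does not deal with reserved characters such as "?" or "&".

```java
import java.nio.charset.StandardCharsets;

public class EncodingSketch {
    // Returns the charset parameter of a Content-Type value,
    // falling back to ISO-8859-1 when none is given.
    static String charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                return p.substring(8).replace("\"", "");
            }
        }
        return "ISO-8859-1";
    }

    // Percent-encodes control bytes (00-1F, 7F), the space, and
    // non-ASCII bytes (80-FF); all other bytes are copied unchanged.
    static String percentEncode(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;
            if (v <= 0x20 || v >= 0x7f) {
                out.append(String.format("%%%02X", v));
            } else {
                out.append((char) v);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/plain"));
        System.out.println(percentEncode("/caf\u00e9 menu"));
    }
}
```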
  
8. Cookies

HttpClient can manage cookies automatically: it lets the server set cookies and returns them to the server when needed. It also supports setting cookies manually and sending them to the server. Unfortunately, several conflicting specifications describe how cookies should be handled (the Netscape cookie draft, RFC 2109, and RFC 2965), and the cookie implementations of many software vendors follow none of them. To deal with this situation, HttpClient provides policy-driven cookie management. The cookie specifications supported by HttpClient are:
The Netscape cookie draft is the earliest cookie specification and formed the basis for RFC 2109. It nevertheless differs significantly from RFC 2109, so it may be needed for compatibility with some servers.

RFC 2109 is the first official cookie specification. In theory, all servers that handle version 1 cookies should follow it, which is why HttpClient uses it as the default. Unfortunately, the specification is so strict that many servers implement it incorrectly or still follow the Netscape draft. In such cases the compatibility specification should be used.

The compatibility specification is designed to work with as many servers as possible, even those that do not follow the standards. When you run into cookie problems, consider switching to the compatibility specification.

RFC 2965 is not yet supported by HttpClient (support is to be added in a later version). It defines version 2 cookies and addresses the shortcomings of version 1 cookies; RFC 2965 is intended to replace RFC 2109.
There are two ways to choose the cookie specification in HttpClient. The first is to set it on the HttpState:
HttpClient client = new HttpClient();
client.getState().setCookiePolicy(CookiePolicy.COMPATIBILITY);

A specification set this way applies only to the current HttpState. The parameter can be CookiePolicy.COMPATIBILITY, CookiePolicy.NETSCAPE_DRAFT, or CookiePolicy.RFC2109.

The second is to set a system property:
System.setProperty("apache.commons.httpclient.cookiespec", "COMPATIBILITY");
A specification set this way applies to every newly created HttpState object. The value can be "COMPATIBILITY", "NETSCAPE_DRAFT", or "RFC2109".
Cookie problems that seem impossible to resolve can often be fixed by switching to the compatibility specification.

9. What to do when you run into problems with HttpClient

Access the server with a browser to confirm that the server responds properly.

If a proxy is in use, disable it and try again.

Try another server (preferably one running different server software).

Check that the code follows the patterns described in the tutorial.

Set the log level to debug to find out what causes the problem.

Enable the wire trace to follow the client-server communication and locate where the problem actually occurs.

Use telnet or netcat to send requests to the server by hand; this is useful for testing a suspected cause.

Run netcat in listen mode and use it as a server to check how HttpClient handles the response.

Upgrade to the latest HttpClient; the bug may already have been fixed in the latest release.

Ask for help on the mailing lists.

Report the bug in Bugzilla.

10. SSL

With the Java Secure Socket Extension (JSSE), HttpClient fully supports HTTP over Secure Sockets Layer (SSL) or IETF Transport Layer Security (TLS). JSSE is built into JRE 1.4 and later; earlier versions require a manual installation. See the Sun web site or the preparation section of these notes for the procedure.
Using SSL in HttpClient is very simple, as the following two examples show:
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("https://www.verisign.com/");
httpclient.executeMethod(httpget);
System.out.println(httpget.getStatusLine().toString());
If you go through a proxy that requires authentication, do the following:
HttpClient httpclient = new HttpClient();
httpclient.getHostConfiguration().setProxy("myproxyhost", 8080);
httpclient.getState().setProxyCredentials("my-proxy-realm", "myproxyhost",
    new UsernamePasswordCredentials("my-proxy-username", "my-proxy-password"));

GetMethod httpget = new GetMethod("https://www.verisign.com/");
httpclient.executeMethod(httpget);

System.out.println(httpget.getStatusLine().toString());

The steps to customize SSL in HttpClient are as follows:

Provide a socket factory that implements the org.apache.commons.httpclient.protocol.SecureProtocolSocketFactory interface. This socket factory is responsible for opening a socket to the server, using either the standard or a third-party SSL library, and for performing any required initialization such as the connection handshake. This initialization usually happens automatically when the socket is created.

Instantiate an org.apache.commons.httpclient.protocol.Protocol object. Creating this instance requires a valid protocol scheme (such as https), the custom socket factory, and a default port (such as 443 for HTTPS).
Protocol myhttps = new Protocol("https", new MySSLSocketFactory(), 443);
This instance can then be set as the protocol handler.
HttpClient httpclient = new HttpClient();
httpclient.getHostConfiguration().setHost("www.whatever.com", 443, myhttps);
GetMethod httpget = new GetMethod("/");
httpclient.executeMethod(httpget);

By calling the Protocol.registerProtocol method, the custom instance can be registered as the default handler for a given protocol scheme. This makes it easy to define your own protocol scheme (such as myhttps).
Protocol.registerProtocol("myhttps",
    new Protocol("https", new MySSLSocketFactory(), 9443));
...
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("myhttps://www.whatever.com/");
httpclient.executeMethod(httpget);
If you want to replace the default HTTPS handler with your own, simply register it under "https".
Protocol.registerProtocol("https",
    new Protocol("https", new MySSLSocketFactory(), 443));
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("https://www.whatever.com/");
httpclient.executeMethod(httpget);

Known limitations and issues

Persistent SSL connections do not work on Sun JVMs older than 1.4, because of a bug in the JVM.

Non-preemptive authentication fails when the server is accessed through a proxy, because of a design flaw in HttpClient; this is to be fixed in a later release.

Troubleshooting
Many problems, especially with JVMs older than 1.4, are caused by the JSSE installation.
The following code can be used as a final check.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

import javax.net.ssl.SSLSocketFactory;

public class Test {

    public static final String TARGET_HTTPS_SERVER = "www.verisign.com";
    public static final int TARGET_HTTPS_PORT = 443;

    public static void main(String[] args) throws Exception {

        // Open an SSL socket with the default JSSE socket factory.
        Socket socket = SSLSocketFactory.getDefault().
            createSocket(TARGET_HTTPS_SERVER, TARGET_HTTPS_PORT);
        try {
            // Send a minimal HTTP/1.1 GET request by hand.
            Writer out = new OutputStreamWriter(
                socket.getOutputStream(), "ISO-8859-1");
            out.write("GET / HTTP/1.1\r\n");
            out.write("Host: " + TARGET_HTTPS_SERVER + ":" +
                TARGET_HTTPS_PORT + "\r\n");
            out.write("Agent: SSL-TEST\r\n");
            out.write("\r\n");
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), "ISO-8859-1"));

            // Print the raw response line by line.
            String line = null;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            socket.close();
        }
    }
}

11. Multi-threading with HttpClient

The main purpose of using multiple threads is to download in parallel. While HttpClient is running, each HTTP method uses an HttpConnection instance. Because connections are a limited resource and each connection can be used by only one thread and one method at a time, connections must be allocated correctly when they are needed. HttpClient manages connections in a way similar to a JDBC connection pool; this management is done by MultiThreadedHttpConnectionManager.
MultiThreadedHttpConnectionManager connectionManager =
    new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);
The client can now be used to execute multiple methods from multiple threads. Each call to HttpClient.executeMethod() asks the connection manager for a connection instance; the connection is checked out and must be returned to the manager once it is no longer needed. The manager supports two settings:
maxConnectionsPerHost: the maximum number of parallel connections per host; the default is 2.
maxTotalConnections: the maximum total number of parallel connections for the client; the default is 20.

When the manager reuses connections, it picks the one that was returned earliest (a least-recently-used approach).
Because the application that uses HttpClient, rather than HttpClient itself, reads the response body, HttpClient cannot tell when a connection is no longer in use. You therefore have to call releaseConnection() explicitly after reading the response body to release the connection.
MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);
...
// In a thread.
GetMethod get = new GetMethod("http://jakarta.apache.org/");
try {
    client.executeMethod(get);
    // Print response to stdout
    System.out.println(get.getResponseBodyAsStream());
} finally {
    // Be sure the connection is released back to the connection
    // manager
    get.releaseConnection();
}
Every call to HttpClient.executeMethod() must be matched by a call to method.releaseConnection().
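
The check-out and release discipline described in this section can be modelled with a counting semaphore. This is a plain-JDK sketch, not how MultiThreadedHttpConnectionManager is implemented: the class and its methods are hypothetical, the limit of 2 simply mirrors the per-host default mentioned above, and the sketch refuses a checkout when the pool is exhausted rather than waiting for a free connection.

```java
import java.util.concurrent.Semaphore;

public class ConnectionLimitSketch {
    // One permit per connection that may be checked out at the same time.
    private final Semaphore permits;

    ConnectionLimitSketch(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    // Checks a connection out; returns false when the limit is reached.
    boolean tryCheckout() {
        return permits.tryAcquire();
    }

    // Returns a connection so that another thread can use it.
    void release() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionLimitSketch pool = new ConnectionLimitSketch(2);
        Runnable worker = () -> {
            if (pool.tryCheckout()) {
                try {
                    // ... use the connection here ...
                } finally {
                    pool.release(); // always paired with the checkout
                }
            }
        };
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("Available after all threads finish: " + pool.available());
    }
}
```

If a worker skipped the release in the finally block, the permit would be lost for good, which corresponds to a connection that is never returned to the manager.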

12. HTTP methods

HttpClient supports 8 HTTP methods, which are described below.

1. Options

The HTTP OPTIONS method sends a request for information about the communication options available on the request/response chain identified by the request URL. It lets the client determine the options and/or requirements associated with a resource, or the capabilities of a server, before taking any concrete action. The most typical use of this method is to find out which HTTP methods the server supports.
HttpClient has a class called OptionsMethod to support this HTTP method; its getAllowedMethods method makes the typical use described above very simple.

OptionsMethod options = new OptionsMethod("http://jakarta.apache.org");
// Execute the method and handle any exceptions.
...
Enumeration allowedMethods = options.getAllowedMethods();
options.releaseConnection();
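
A server reports the methods it supports in the Allow response header as a comma-separated list. The following plain-JDK sketch shows how such a value breaks down into method names; it only illustrates the header format and is not the code behind getAllowedMethods. The class name, method name, and sample header value are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class AllowHeaderSketch {
    // Splits an Allow header value such as "GET, HEAD, OPTIONS" into method names.
    static List<String> parseAllow(String headerValue) {
        List<String> methods = new ArrayList<>();
        for (String token : headerValue.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                methods.add(name); // method names are case-sensitive, so keep them as sent
            }
        }
        return methods;
    }

    public static void main(String[] args) {
        System.out.println(parseAllow("GET, HEAD, OPTIONS"));
    }
}
```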

2. Get

The HTTP GET method retrieves whatever information (in the form of an entity) is identified by the request URI. If the request URI refers to a data-producing process, it is the data produced by that process that is returned as the entity in the response, not the source code of the process.
If the HTTP request contains an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field, the GET becomes a "conditional GET": the entity is retrieved only if it meets the conditions described by those fields. This reduces unnecessary network traffic and avoids multiple requests for the same resource (such as one request to check and a second to download). Browsers work this way: they keep a temporary directory that caches page data, and when a page is visited again only the modified content is downloaded, which speeds up browsing. For a pure check, HEAD is often a better choice than GET. If the HTTP request contains a Range header field, only the part of the entity that falls within the given range is retrieved. (Anyone who has used a multi-threaded download tool will find this easy to understand.)
The typical use of this method is to download a document from a web server. HttpClient defines a class called GetMethod to support it; the getResponseBody, getResponseBodyAsStream, and getResponseBodyAsString methods of GetMethod return the document (such as an HTML page) contained in the response body. Of these three, getResponseBodyAsStream is usually the best choice, mainly because it avoids buffering all of the downloaded data before the document is processed.

GetMethod get = new GetMethod("http://jakarta.apache.org");
// Execute the method and handle a failed request.
...
InputStream in = get.getResponseBodyAsStream();
// Process the data from the input stream.

get.releaseConnection();

The most common mistake when using GetMethod is not reading the entire response body. Also remember to release the connection explicitly.
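
Reading a response stream to the end in fixed-size chunks, without holding the whole body in memory, can be sketched with the JDK alone. In this illustration a ByteArrayInputStream stands in for the stream returned by getResponseBodyAsStream, and copyFully, the 4096-byte buffer, and the 10000-byte sample body are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopySketch {
    // Copies the stream chunk by chunk until end of stream and returns the byte count.
    static long copyFully(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeBody = new byte[10000]; // stand-in for a response body
        // try-with-resources plays the role of the finally block that releases the connection.
        try (InputStream in = new ByteArrayInputStream(fakeBody);
             OutputStream out = new ByteArrayOutputStream()) {
            System.out.println("Copied " + copyFully(in, out) + " bytes");
        }
    }
}
```

The loop runs until read returns -1, so the entire body is consumed, which addresses the common mistake named above.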

3. Head

The HTTP HEAD method is identical to GET, except that the server must not include a message body in the response. It lets the client obtain basic information about a resource without downloading the resource itself. It is commonly used to check whether a hyperlink is accessible and whether a resource has been modified recently.
The most typical use of the HTTP HEAD method is to obtain basic information about a resource. HttpClient defines the HeadMethod class to support it. Like the other *Method classes, HeadMethod uses getResponseHeaders() to retrieve the header information and has no special methods of its own.

HeadMethod head = new HeadMethod("http://jakarta.apache.org");
// Execute the method and handle a failed request.
...
// Retrieve all header fields of the response.
Header[] headers = head.getResponseHeaders();

// Retrieve only the value of the Last-Modified field.
String lastModified = head.getResponseHeader("last-modified").getValue();

4. Post

The HTTP POST method asks the server to accept the entity enclosed in the request as a new subordinate of the resource identified by the request URI. In essence, this means that the server stores the entity, which is usually handled by a server-side program. The POST method is designed to provide a uniform way of doing the following:
Annotating existing resources

Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles

Submitting a block of data to a data-handling process

Extending a database through an append operation
These operations are expected to have some side effect on the server, such as modifying a database.
HttpClient defines the PostMethod class to support this HTTP method. Using POST in HttpClient takes two basic steps: preparing the data for the request, and then reading the server's response. The request data is supplied by calling setRequestBody(), which accepts three kinds of parameters: an input stream, an array of name-value pairs, or a string. The response is read with the getResponseBody* methods, in the same way as for the GET method.
The common mistakes are not reading the entire response (whether or not the program needs it) and not releasing the connection.
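
When the request data is a set of name-value pairs, such as the fields of an HTML form, the body that travels to the server is normally a URL-encoded string. The following plain-JDK sketch shows that encoding. It is not the code PostMethod uses internally: encodeForm and the sample fields are invented for the example, and UTF-8 is an assumed character encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodySketch {
    // Builds an application/x-www-form-urlencoded body from name-value pairs.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&'); // separator between pairs
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("user", "John Doe");
        fields.put("comment", "a&b=c");
        System.out.println(encodeForm(fields)); // prints user=John+Doe&comment=a%26b%3Dc
    }
}
```

Characters that would otherwise be read as separators, such as "&" and "=", are escaped inside names and values, so the server can split the body unambiguously.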
