1. About HttpClient
HTTP may be the most widely used and most important protocol on the Internet, and more and more Java applications need to access network resources directly over HTTP. Although the java.net package in the JDK provides the basic functionality for accessing HTTP, for most applications the JDK library is neither rich nor flexible enough. HttpClient is a subproject of Apache Jakarta Commons that provides an efficient, up-to-date, and feature-rich client-side toolkit for the HTTP protocol, and it supports the latest versions and recommendations of HTTP. HttpClient is used in many projects; for example, Cactus and HtmlUnit, two other open-source projects under Apache Jakarta, both use it. At the time of writing, the latest version of HttpClient is 4.0-beta2. Note that the examples in this article use the HttpClient 3.x API (package org.apache.commons.httpclient); the 4.x line uses a different API in the org.apache.http package. This article first introduces HttpClient and then, based on the author's working experience, offers solutions to some common problems.
2. HttpClient features
The main features of HttpClient are listed below. For more details, see the HttpClient homepage.
(1) Implements all HTTP methods (GET, POST, PUT, HEAD, etc.)
(2) Supports automatic redirection
(3) Supports HTTPS
(4) Supports proxy servers
3. Using HttpClient's basic functions
(1) GET method
Using HttpClient takes the following six steps:
1. Create an HttpClient instance.
2. Create an instance of a connection method, here GetMethod, passing the address to connect to into the GetMethod constructor.
3. Call the executeMethod method of the instance created in step 1 to execute the method instance created in step 2.
4. Read the response.
5. Release the connection. The connection must be released whether or not the method executed successfully.
6. Process the content obtained.
Following these steps, we now write code that uses the GET method to fetch the content of a web page.
In most cases, the default HttpClient constructor is sufficient:
HttpClient httpClient = new HttpClient();
Create an instance of the GET method, passing the address to connect to into the GetMethod constructor. GetMethod follows redirects automatically; to turn this off, call setFollowRedirects(false).
GetMethod getMethod = new GetMethod(".....");
Call the executeMethod method of the HttpClient instance to execute getMethod. Because the program runs over the network, two exceptions must be handled when calling executeMethod: HttpException and IOException. The first may be caused by a malformed protocol when the GetMethod was constructed (for example, accidentally writing "http" as "htp") or by abnormal content returned by the server; this exception is unrecoverable. The second is usually caused by network problems. For this kind of exception (IOException), HttpClient automatically retries executeMethod according to the recovery policy you specify. The recovery policy can be customized by implementing the HttpMethodRetryHandler interface and installing your implementation with setParameter, as in the code below. This article uses the default recovery policy provided by the library, which automatically retries three times when an exception of the second kind occurs. executeMethod returns an integer, the status code the server returned after the method executed; it indicates whether the method succeeded, whether authentication is required, whether the page was redirected, and so on. By default, a GetMethod instance follows redirects automatically.
// Set the default recovery policy: on an IOException the request is retried up to three times.
// A custom recovery policy can be installed here instead.
getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
        new DefaultHttpMethodRetryHandler());
// Execute getMethod
int statusCode = httpClient.executeMethod(getMethod);
if (statusCode != HttpStatus.SC_OK) {
    System.err.println("Method failed: " + getMethod.getStatusLine());
}
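Instead of the default handler installed above, a custom recovery policy implements HttpMethodRetryHandler. The following is a minimal sketch against the HttpClient 3.x API, assuming commons-httpclient 3.x is on the classpath. The class name LimitedRetryHandler, the retry limit, and the choice of which exceptions to retry are hypothetical illustrations, not the library's default policy.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.UnknownHostException;

import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpMethodRetryHandler;
import org.apache.commons.httpclient.NoHttpResponseException;

// Hypothetical recovery policy: retry transient failures a limited number of
// times, but give up immediately when a retry is unlikely to help.
class LimitedRetryHandler implements HttpMethodRetryHandler {
    private static final int MAX_RETRIES = 3;

    public boolean retryMethod(HttpMethod method, IOException exception, int executionCount) {
        // executionCount is the number of attempts made so far
        if (executionCount > MAX_RETRIES) {
            return false;
        }
        // The server closed the connection without sending a response: retry
        if (exception instanceof NoHttpResponseException) {
            return true;
        }
        // Timeouts (SocketTimeoutException is an InterruptedIOException): do not retry
        if (exception instanceof InterruptedIOException) {
            return false;
        }
        // The host name cannot be resolved: retrying will not fix it
        if (exception instanceof UnknownHostException) {
            return false;
        }
        // Any other I/O error is treated as transient
        return true;
    }
}
```

It would be installed exactly like the default handler: getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new LimitedRetryHandler());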
Once the status code is correct, you can obtain the content. There are three ways to get the content at the target address. The first is getResponseBody, which returns the raw bytes of the response. The second is getResponseBodyAsString, which returns a String; note that this method decodes the body with the charset declared in the response's Content-Type header, falling back to a default when none is declared, so the returned string may be decoded incorrectly. This is discussed in detail in the "Character encoding" section. The third is getResponseBodyAsStream, which is the best choice when the target returns a large amount of data. Here we use the simplest, getResponseBody.
byte[] responseBody = getMethod.getResponseBody();
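For large responses, getResponseBodyAsStream lets the body be processed in chunks instead of being held in memory as one array. A minimal pure-JDK sketch of such chunked processing follows; the helper class StreamCopy and the 8 KB buffer size are hypothetical choices. It copies any InputStream, for example the one returned by getResponseBodyAsStream, to an OutputStream such as a file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    // Copies the stream in fixed-size chunks, so memory use stays constant
    // no matter how large the response body is. Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        return total;
    }
}
```

With HttpClient 3.x the input would be getMethod.getResponseBodyAsStream(), which can return null when the response has no body, so check for null first; the connection should still be released in a finally block afterwards.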
Release the connection. The connection must be released whether or not the method executed successfully.
getMethod.releaseConnection();
Process the content. In this step the content is processed as your application requires; in this example it is simply printed to the console.
System.out.println(new String(responseBody));
The complete program follows; it can also be found as test.GetSample in the attachment.
package test;

import java.io.IOException;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class GetSample {
    public static void main(String[] args) {
        // Construct an HttpClient instance
        HttpClient httpClient = new HttpClient();
        // Create an instance of the GET method
        GetMethod getMethod = new GetMethod("...");
        // Use the default recovery policy provided by the library
        getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler());
        try {
            // Execute getMethod
            int statusCode = httpClient.executeMethod(getMethod);
            if (statusCode != HttpStatus.SC_OK) {
                System.err.println("Method failed: "
                        + getMethod.getStatusLine());
            }
            // Read the content
            byte[] responseBody = getMethod.getResponseBody();
            // Process the content
            System.out.println(new String(responseBody));
        } catch (HttpException e) {
            // Fatal exception: the protocol may be wrong or the returned content invalid
            System.out.println("Please check your provided http address!");
            e.printStackTrace();
        } catch (IOException e) {
            // Network exception
            e.printStackTrace();
        } finally {
            // Release the connection
            getMethod.releaseConnection();
        }
    }
}
(2) POST method
RFC 2616 describes POST as follows: the POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions:
Annotation of existing resources
Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles
Providing a block of data, such as the result of submitting a form, to a data-handling process
Extending a database through an append operation
Using PostMethod in HttpClient is similar to using GetMethod: apart from the PostMethod instance itself, the remaining steps are much the same. In the following example, steps shared with GetMethod are omitted and only the differences are described, using login to the Tsinghua University BBS as the example.
The steps before constructing the PostMethod are the same. Like GetMethod, constructing a PostMethod requires a URI parameter. After the PostMethod instance is created, fill in the form values for it. The BBS login form has two required fields: the user name (field name ID) and the password (field name passwd). Form fields are represented by NameValuePair, whose constructor takes the field name as its first argument and the field value as its second. The setRequestBody method sets all the form values on the PostMethod. In addition, after a successful login the BBS redirects to another page. However, HttpClient does not automatically follow redirects for entity-enclosing requests such as POST and PUT, so you must handle the redirect yourself. For details, see the "Automatic redirection" section below. The code is as follows:
String url = "....";
PostMethod postMethod = new PostMethod(url);
// Fill in the values of the form fields
NameValuePair[] data = { new NameValuePair("ID", "yourusername"),
        new NameValuePair("passwd", "yourpwd") };
// Put the form values into postMethod
postMethod.setRequestBody(data);
// Execute postMethod
int statusCode = httpClient.executeMethod(postMethod);
// HttpClient cannot automatically follow redirects for entity-enclosing
// requests such as POST and PUT.
// 301 or 302
if (statusCode == HttpStatus.SC_MOVED_PERMANENTLY ||
        statusCode == HttpStatus.SC_MOVED_TEMPORARILY) {
    // Get the redirect address from the response header
    Header locationHeader = postMethod.getResponseHeader("location");
    String location = null;
    if (locationHeader != null) {
        location = locationHeader.getValue();
        System.out.println("The page was redirected to: " + location);
    } else {
        System.err.println("Location field value is null.");
    }
    return;
}
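To make it concrete what setRequestBody(NameValuePair[]) sends, the sketch below builds the same kind of application/x-www-form-urlencoded body with plain JDK classes. The helper FormBody is hypothetical, and it assumes UTF-8; the charset PostMethod actually uses is configured separately on the method and is not modelled here. In real code, setRequestBody does this encoding for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

final class FormBody {
    // Joins name=value pairs with '&', percent-encoding names and values
    // (URLEncoder turns spaces into '+', and '&' into %26).
    static String encode(Map<String, String> fields, String charset)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), charset))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), charset));
        }
        return sb.toString();
    }
}
```

Using a LinkedHashMap keeps the fields in insertion order, matching the order of the NameValuePair array.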
4. FAQs about HttpClient
The following describes some common problems encountered when using HttpClient.
Character encoding
The encoding of a target page may be specified in two places: in the HTTP header returned by the server, and in the HTML/XML page itself.
The Content-Type field of the HTTP header may contain character encoding information. For example, the returned header may contain Content-Type: text/html; charset=UTF-8, which indicates that the page is encoded in UTF-8. However, the header returned by the server does not always match the content. For example, in some countries that use double-byte character sets, the server may report UTF-8 while the actual content is not UTF-8, so you need to obtain the page's encoding from elsewhere. If, on the other hand, the server reports a specific encoding other than UTF-8, such as gb2312, that information is likely to be correct. You can call the getResponseCharSet() method of the method object to obtain the encoding declared in the HTTP header.
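When you need the header's charset yourself (for example, to compare it with the one declared in the page), the charset parameter can be read from the raw Content-Type value. A minimal pure-JDK sketch follows; the helper ContentTypeCharset is hypothetical. It returns null when no charset is declared, so the caller can distinguish "absent" from any default value.

```java
final class ContentTypeCharset {
    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=UTF-8". Returns null if there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters follow
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (name.equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes: charset="gb2312"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }
}
```

With HttpClient 3.x, the raw value would come from method.getResponseHeader("Content-Type").getValue().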
Files such as XML or HTML allow the author to specify the encoding directly in the page, for example with a tag such as <meta http-equiv="Content-Type" content="text/html; charset=gb2312"/> or a declaration such as <?xml version="1.0" encoding="gb2312"?>. In such cases the declaration in the page may conflict with the encoding returned in the HTTP header, and you need to determine which one is the real encoding.
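To find the encoding declared inside the page, one common approach is to decode the first few kilobytes as ISO-8859-1 (the declarations themselves are ASCII) and search them for an XML declaration or a meta charset. The regex-based sketch below is a hypothetical helper, DeclaredCharset, not part of HttpClient; regular expressions are a simplification and will miss unusual markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeclaredCharset {
    // Matches <?xml version="1.0" encoding="gb2312"?>
    private static final Pattern XML_DECL = Pattern.compile(
            "<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']",
            Pattern.CASE_INSENSITIVE);
    // Matches <meta ... content="text/html; charset=gb2312"> and <meta charset="...">
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the encoding declared in the page head, or null if none is found.
    static String find(String head) {
        Matcher m = XML_DECL.matcher(head);
        if (m.find()) {
            return m.group(1);
        }
        m = META_CHARSET.matcher(head);
        if (m.find()) {
            return m.group(1);
        }
        return null;
    }
}
```

The value found here can then be compared with the charset from the HTTP header to decide which one to trust.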
Automatic redirection
RFC 2616 defines two main kinds of redirect: 301 and 302. 301 means "Moved Permanently": the requested resource has been permanently moved to a new location, and any future request for this address should be directed to the new address. 302 indicates a temporary redirect. For example, if a server-side servlet calls the sendRedirect method, the client receives a 302 code, and the Location value in the header returned by the server is the destination address of the sendRedirect call.
HttpClient supports automatic redirection, but not yet for entity-enclosing requests such as POST and PUT, so if a POST returns 301 or 302 you must handle it yourself. In the PostMethod example above, to reach the page shown after logging in to the BBS, you must issue a new request to the redirect address, which is available in the Location header field. Note, however, that Location sometimes contains a relative path, so you need to process the returned value before sending a request to the new address.
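Processing a relative Location value amounts to resolving it against the URL of the request that produced the redirect. A minimal pure-JDK sketch using java.net.URI follows; the helper Redirects is hypothetical, and it only covers the 301 and 302 codes discussed here. URI.create throws IllegalArgumentException on malformed input (for example, unescaped spaces), which a real caller should handle.

```java
import java.net.URI;

final class Redirects {
    // True for the two redirect codes discussed above (301 and 302).
    static boolean isRedirect(int status) {
        return status == 301 || status == 302;
    }

    // Resolves a possibly relative Location value against the request URL;
    // absolute Location values are returned unchanged.
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location.trim()).toString();
    }
}
```

The resolved address would then be used to construct a new GetMethod.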
Besides the header, redirection can also be triggered from within the page. The tag that causes automatic page forwarding is <meta http-equiv="refresh" content="5; url=...">. To handle this case in a program, you have to parse the page and perform the redirect yourself. Note that the url value in this tag can also be a relative address; if so, resolve it before following it.
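Parsing the refresh tag can be sketched with a regular expression plus URI resolution. The helper MetaRefresh below is hypothetical and deliberately simple: it assumes http-equiv appears before content in the tag and that the url value is not itself quoted. A real HTML parser would be more robust.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetaRefresh {
    // Matches <meta http-equiv="refresh" content="5; url=...">
    // (assumes http-equiv precedes content; case-insensitive).
    private static final Pattern REFRESH = Pattern.compile(
            "<meta[^>]*http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*"
            + "content\\s*=\\s*[\"']\\s*\\d+\\s*;\\s*url\\s*=\\s*([^\"'>]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the absolute refresh target, or null if the page has no refresh tag.
    static String target(String html, String pageUrl) {
        Matcher m = REFRESH.matcher(html);
        if (!m.find()) {
            return null;
        }
        // The url value may be relative, so resolve it against the page URL
        return URI.create(pageUrl).resolve(m.group(1).trim()).toString();
    }
}
```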
Handling the HTTPS protocol
HttpClient supports SSL, which requires JSSE. Since Sun JDK 1.4, JSSE has been integrated into the JDK; if you are using a version earlier than JDK 1.4, you must install JSSE yourself. JSSE implementations differ between vendors. The following describes how to open an HTTPS connection with HttpClient. There are two approaches: the first is to obtain the certificate issued by the server and import it into the local keystore; the second is to extend HttpClient so that it accepts certificates automatically.
Method 1: obtain the certificate and import it into the local keystore
Install JSSE (skip this step if you are using JDK 1.4 or later). This article uses IBM's JSSE as an example: download the JSSE package from the IBM website, unpack it, and copy ibmjsse.jar to the <java-home>\lib\ext\ directory.
Obtain and import the certificate. The certificate can be exported through IE:
1. Use IE to open the HTTPS URL you want to connect to. A security dialog box is displayed.
2. Click "View Certificate", select the "Details" tab in the dialog box that opens, and click "Copy to File"; then follow the wizard to export the site's certificate to a file.
3. Step 1 of the wizard: on the welcome page, click "Next".
4. Step 2 of the wizard: select the export file format; keep the default and click "Next".
5. Step 3 of the wizard: enter the export file name and click "Next".
6. Step 4 of the wizard: click "Finish" to complete the wizard.
7. A final dialog box reports that the export was successful.
Use keytool to import the exported certificate into the local keystore. keytool is located in <java-home>\bin\; open a command-line window and run the following command in the <java-home>\lib\security\ directory:
keytool -import -noprompt -keystore cacerts -storepass changeit -alias yourEntry1 -file your.cer
The -alias parameter is followed by a unique identifier for this certificate in the keystore; aliases are case-insensitive. The -file parameter is followed by the path and file name of the certificate exported through IE. To delete the certificate you just imported into the keystore, run:
keytool -delete -keystore cacerts -storepass changeit -alias yourEntry1
Write a program to access the HTTPS address. To test whether an HTTPS connection works, simply change the address in the GetSample example to an HTTPS address:
GetMethod getMethod = new GetMethod("your url");
Problems that may occur when running the program:
1. The exception java.net.SocketException: Algorithm SSL not available is thrown. This may happen because the JSSE provider has not been added. If you are using the IBM JSSE provider, add the following to the program:
import java.security.Security;
import com.ibm.jsse.IBMJSSEProvider;  // from ibmjsse.jar
if (Security.getProvider("com.ibm.jsse.IBMJSSEProvider") == null)
    Security.addProvider(new IBMJSSEProvider());
Alternatively, open <java-home>\lib\security\java.security and locate the provider entries:
security.provider.1=sun.security.provider.Sun
security.provider.2=com.ibm.crypto.provider.IBMJCE
Then add the line:
security.provider.3=com.ibm.jsse.IBMJSSEProvider
2. The exception java.net.SocketException: SSL implementation not available is thrown. This may happen because ibmjsse.jar was not copied to the <java-home>\lib\ext\ directory.
3. The exception javax.net.ssl.SSLHandshakeException: unknown certificate is thrown. This means JSSE is installed correctly, but the certificate has probably not been imported into the keystore of the JRE you are running. Follow the steps described above to import your certificate.
Method 2: extend HttpClient to accept certificates automatically
Because this method accepts all certificates, it carries security risks; carefully consider your system's security requirements before using it. The steps are as follows:
Provide a custom socket factory (test.MySecureProtocolSocketFactory). This custom class must implement the interface org.apache.commons.httpclient.protocol.SecureProtocolSocketFactory and uses a custom X509TrustManager (test.MyX509TrustManager). You can obtain
Create an org.apache.commons.httpclient.protocol.Protocol instance, specifying the protocol name and default port number:
Protocol myhttps = new Protocol("https", new MySecureProtocolSocketFactory(), 443);
Register the newly created HTTPS protocol object:
Protocol.registerProtocol("https", myhttps);
Then open the HTTPS target address in the usual way. For the code, see test.nocertifhtthttpsgetsample.
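The trust manager that the steps above rely on (test.MyX509TrustManager) is not shown in the text. Below is a minimal pure-JDK sketch of what such a class might look like; the name TrustAllX509TrustManager is hypothetical. As warned above, it accepts every certificate chain, including forged ones, so it defeats server authentication. The SecureProtocolSocketFactory wrapper is not shown; its createSocket methods could delegate to the SSLSocketFactory returned here.

```java
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// WARNING: performs no certificate validation at all.
final class TrustAllX509TrustManager implements X509TrustManager {
    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
        // No checks: every client certificate chain is accepted
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
        // No checks: every server certificate chain is accepted
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        return new X509Certificate[0];
    }

    // Builds an SSLSocketFactory whose sockets use this trust manager.
    static SSLSocketFactory socketFactory()
            throws NoSuchAlgorithmException, KeyManagementException {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, new TrustManager[] { new TrustAllX509TrustManager() }, null);
        return context.getSocketFactory();
    }
}
```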
5. Handling proxy servers
Using a proxy server with HttpClient is very simple: call the setProxy method of the HttpClient's host configuration, passing the proxy server address as the first parameter and the port number as the second. HttpClient also supports SOCKS proxies.
httpClient.getHostConfiguration().setProxy(hostname, port);