Level: Advanced
Fa Hua Jin, Software Engineer, IBM CSDL, IBM
Zhang Hongchen, Software Engineer, IBM CSDL, IBM
November 10, 2005
HttpClient is a subproject of Apache Jakarta Commons. It provides an efficient, up-to-date, feature-rich client-side programming toolkit for the HTTP protocol, and it supports the latest HTTP versions and recommendations. This article first introduces HttpClient and then offers solutions to some common problems, based on the authors' practical experience.
Introduction to HttpClient
HTTP is probably the most widely used and most important protocol on the Internet, and more and more Java applications need to access network resources directly over HTTP. Although the JDK's java.net package provides basic support for HTTP, for most applications the functionality of the JDK library alone is not rich or flexible enough. HttpClient is a subproject of Apache Jakarta Commons that provides an efficient, up-to-date, feature-rich client-side programming toolkit for the HTTP protocol, and it supports the latest HTTP versions and recommendations. HttpClient is already used in many projects; for example, the two well-known open-source projects Cactus and HtmlUnit both use it. The HttpClient project is very active and has many users. The current version is 3.0 RC4, released on October 11, 2005.
HttpClient features
The main features that HttpClient provides are listed below. For more details, see the HttpClient home page.
- Implements all HTTP methods (GET, POST, PUT, HEAD, etc.)
- Supports automatic redirection
- Supports HTTPS
- Supports proxy servers
The following sections describe how to use these features one by one. First, HttpClient must be installed.
- HttpClient can be downloaded from http://jakarta.apache.org/commons/httpclient/downloads.html
- HttpClient depends on Commons Logging, another Apache Jakarta Commons subproject. Extract commons-logging.jar from the downloaded package and add it to the classpath.
- HttpClient also depends on Commons Codec, another Apache Jakarta Commons subproject. Download the latest Commons Codec from http://jakarta.apache.org/site/downloads/downloads_commons-codec.cgi, extract commons-codec-1.x.jar from the downloaded package, and add it to the classpath.
Basic HttpClient usage
GET method
Using HttpClient involves the following six steps:
1. Create an HttpClient instance.
2. Create an instance of one of the connection methods, here GetMethod. Pass the address to connect to into the GetMethod constructor.
3. Call the executeMethod method of the instance created in step 1 to execute the method instance created in step 2.
4. Read the response.
5. Release the connection. The connection must be released whether or not the method executed successfully.
6. Process the content obtained.
Following these steps, we now write code that uses the GET method to fetch the content of a web page.
- In most cases, the default HttpClient constructor is sufficient.
    HttpClient httpClient = new HttpClient();
- Create an instance of the GET method, passing the address to connect to into the GetMethod constructor. GetMethod follows redirects automatically. To turn this off, call setFollowRedirects(false).
    GetMethod getMethod = new GetMethod("http://www.ibm.com/");
- Call the executeMethod method of the httpClient instance to execute getMethod. Because the program runs over a network, executeMethod can throw two exceptions that must be handled: HttpException and IOException. The first may be caused by a wrong protocol in the address passed to GetMethod, for example mistyping "http" as "htp", or by abnormal content returned by the server; it is unrecoverable. The second is generally caused by network problems. For an IOException, HttpClient automatically tries to re-execute executeMethod according to the recovery policy you specify. The recovery policy can be customized by implementing the HttpMethodRetryHandler interface and installing your implementation with the setParameter method. This article uses the default recovery policy provided by the system, which automatically retries three times when the second kind of exception occurs. executeMethod returns an integer, the status code returned by the server, which indicates whether the method succeeded, whether authentication is required, or whether the page was redirected. By default, a GetMethod instance handles redirects automatically.
    // Use the default recovery policy, which retries automatically three times
    // when an exception occurs. A custom recovery policy can also be set here.
    getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
            new DefaultHttpMethodRetryHandler());
    // Execute getMethod
    int statusCode = httpClient.executeMethod(getMethod);
    if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + getMethod.getStatusLine());
    }
- Once the status code has been checked, the content can be read. There are three ways to get the content of the target address. The first, getResponseBody, returns the content as a byte array. The second, getResponseBodyAsString, returns a String. Note that this method decodes the body using the charset named in the response's Content-Type header and falls back to a default (ISO-8859-1) when none is given, so the returned string may be decoded with the wrong encoding; this is described in detail in the "Character encoding" section. The third, getResponseBodyAsStream, is the best choice when the target address transfers a large amount of data. Here we use the simplest one, getResponseBody.
    byte[] responseBody = getMethod.getResponseBody();
- Release the connection. The connection must be released whether or not the method executed successfully.
    getMethod.releaseConnection();
- Process the content. In this step the content is processed according to your needs. In this example it is simply printed to the console.
    System.out.println(new String(responseBody));
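Of the three ways to read the response listed above, the stream variant suits large bodies because the content never has to be held in memory all at once. The sketch below shows only the chunked copy loop, using plain java.io. The class name StreamCopySketch and the 4 KB buffer size are assumptions made for illustration; in a real program the input would be the stream returned by getResponseBodyAsStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopySketch {
    /** Copies everything from in to out in fixed-size chunks and returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096]; // assumed chunk size
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // A byte array stands in for a response stream here
        InputStream body = new ByteArrayInputStream(new byte[10000]);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(copy(body, sink) + " bytes copied");
    }
}
```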
The following is the complete code of the program, which can also be found in test.GetSample in the attachment.
    package test;

    import java.io.IOException;
    import org.apache.commons.httpclient.*;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.commons.httpclient.params.HttpMethodParams;

    public class GetSample {
        public static void main(String[] args) {
            // Construct the HttpClient instance
            HttpClient httpClient = new HttpClient();
            // Create an instance of the GET method
            GetMethod getMethod = new GetMethod("http://www.ibm.com");
            // Use the default recovery policy provided by the system
            getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                    new DefaultHttpMethodRetryHandler());
            try {
                // Execute getMethod
                int statusCode = httpClient.executeMethod(getMethod);
                if (statusCode != HttpStatus.SC_OK) {
                    System.err.println("Method failed: " + getMethod.getStatusLine());
                }
                // Read the content
                byte[] responseBody = getMethod.getResponseBody();
                // Process the content
                System.out.println(new String(responseBody));
            } catch (HttpException e) {
                // Fatal exception: the protocol may be wrong or the returned content is faulty
                System.out.println("Please check your provided http address!");
                e.printStackTrace();
            } catch (IOException e) {
                // Network exception
                e.printStackTrace();
            } finally {
                // Release the connection
                getMethod.releaseConnection();
            }
        }
    }
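The default recovery policy used in this program retries a request up to three times when an IOException occurs. The idea can be sketched without HttpClient. The code below is not the library's DefaultHttpMethodRetryHandler, only a plain-Java illustration of the same kind of policy; RetrySketch, IoAction, and withRetries are hypothetical names.

```java
import java.io.IOException;

public class RetrySketch {
    /** A piece of work that may fail with an I/O error (hypothetical interface). */
    interface IoAction<T> {
        T run() throws IOException;
    }

    /** Runs the action, retrying after an IOException up to maxRetries times. */
    static <T> T withRetries(IoAction<T> action, int maxRetries) throws IOException {
        int retries = 0;
        while (true) {
            try {
                return action.run();
            } catch (IOException e) {
                if (retries >= maxRetries) {
                    throw e; // retries exhausted: let the caller see the failure
                }
                retries++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        final int[] calls = {0};
        // Fails twice with a simulated network error, then succeeds
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated network failure");
            }
            return "ok";
        }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A custom policy in HttpClient itself would instead implement HttpMethodRetryHandler, as described in the steps above.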
POST method
RFC 2616 defines POST as follows: the POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions:
- Annotation of existing resources
- Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles
- Providing a block of data, such as the result of submitting a form, to a data-handling process
- Extending a database through an append operation
Using PostMethod in HttpClient is much like using GetMethod. Apart from constructing a PostMethod instance instead of a GetMethod instance, the steps are largely the same. The example below omits the steps shared with GetMethod and describes only the differences, using a login to the Tsinghua University BBS as the example.
- The steps before constructing the PostMethod are the same. Like GetMethod, PostMethod takes a URI parameter in its constructor; in this example the login address is http://www.newsmth.net/bbslogin2.php. After the PostMethod instance is created, the form values must be filled in. The BBS login form has two required fields: the user name (field name id) and the password (field name passwd). Form fields are represented by the NameValuePair class, whose constructor takes the field name as its first parameter and the field value as its second. The setRequestBody method sets all the form values on the PostMethod. In addition, after a successful login the BBS redirects to another page, but HttpClient does not follow redirects automatically for requests that require a follow-up action, such as POST and PUT, so you must handle the redirect yourself. For details, see the "Automatic redirection" section below. The code is as follows:
    String url = "http://www.newsmth.net/bbslogin2.php";
    PostMethod postMethod = new PostMethod(url);
    // Fill in the value of each form field
    NameValuePair[] data = {
        new NameValuePair("id", "yourUserName"),
        new NameValuePair("passwd", "yourPwd")
    };
    // Put the form values into postMethod
    postMethod.setRequestBody(data);
    // Execute postMethod
    int statusCode = httpClient.executeMethod(postMethod);
    // HttpClient cannot follow redirects automatically for requests that
    // require a follow-up action, such as POST and PUT
    // 301 or 302
    if (statusCode == HttpStatus.SC_MOVED_PERMANENTLY
            || statusCode == HttpStatus.SC_MOVED_TEMPORARILY) {
        // Retrieve the redirect address from the response header
        Header locationHeader = postMethod.getResponseHeader("location");
        String location = null;
        if (locationHeader != null) {
            location = locationHeader.getValue();
            System.out.println("The page was redirected to: " + location);
        } else {
            System.err.println("Location field value is null.");
        }
        return;
    }
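When the name/value pairs above are set as the request body, they travel as a form-encoded (application/x-www-form-urlencoded) string. The sketch below builds such a string with the JDK's URLEncoder to show roughly what goes over the wire. It is not HttpClient's own encoder; FormBodySketch and encodeForm are hypothetical names, and UTF-8 is an assumed charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodySketch {
    /** Joins the fields as name=value pairs separated by '&', URL-encoding each part. */
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java platform
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "yourUserName");
        fields.put("passwd", "p&ss word"); // placeholder containing characters that need escaping
        System.out.println(encodeForm(fields));
    }
}
```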
For the complete program code, see test.PostSample in the attachment.
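The Location value read in the POST example may be a relative path rather than a full address. A minimal sketch of turning it into an absolute address with the JDK's java.net.URI follows, assuming the address of the original request is known. RedirectSketch and resolveLocation are hypothetical names, and the www.example.com addresses are placeholders.

```java
import java.net.URI;

public class RedirectSketch {
    /** Resolves a Location header value against the URL the request was sent to. */
    static String resolveLocation(String requestUrl, String location) {
        // An absolute location is returned unchanged; a relative one is
        // interpreted relative to the request URL
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        String requestUrl = "http://www.example.com/app/login.php";
        System.out.println(resolveLocation(requestUrl, "main.php"));
        System.out.println(resolveLocation(requestUrl, "/main.php"));
        System.out.println(resolveLocation(requestUrl, "http://other.example.com/x"));
    }
}
```

The resolved address can then be passed to a new GetMethod to fetch the page that follows the login.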
Common problems when using HttpClient
The following describes some common problems encountered when using HttpClient.
Character encoding
The encoding of a target page may be declared in two places: in the HTTP header returned by the server, and in the HTML/XML page itself.
- The Content-Type field in the HTTP header may contain character encoding information. For example, the returned header may contain: Content-Type: text/html; charset=UTF-8. This header says that the page is encoded in UTF-8, but the header returned by the server does not always match the content. For example, servers in some countries that use double-byte character sets may report UTF-8 while the actual content is not UTF-8 encoded, so the page encoding has to be obtained from somewhere else. If, on the other hand, the server reports a specific encoding other than UTF-8, such as gb2312, the information is more likely to be correct. The encoding information in the HTTP header can be obtained with the getResponseCharSet() method of the method object.
- XML and HTML files allow the author to specify the encoding directly in the page, for example with a tag such as <meta http-equiv="Content-Type" content="text/html; charset=gb2312"/> in HTML, or a declaration such as <?xml version="1.0" encoding="gb2312"?> in XML. In these cases the declaration may conflict with the encoding information returned in the HTTP header, and you have to determine for yourself which encoding the content actually uses.
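The two places described above can each be inspected with a few lines of code. The sketch below uses only the JDK and regular expressions to pull a charset name out of a Content-Type value and out of the first bytes of a page. The class and method names are hypothetical, the patterns are simplified and will not cover every page, and deciding which source to trust when the two disagree is left to the caller.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetSketch {
    // charset=... inside a Content-Type value or a <meta> tag (simplified)
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);
    // encoding="..." inside an XML declaration (simplified)
    private static final Pattern XML_ENCODING = Pattern.compile(
            "<\\?xml[^>]*encoding\\s*=\\s*[\"']([\\w.:-]+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in a Content-Type header value, or null. */
    static String fromHeader(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the charset declared near the start of the page itself, or null. */
    static String fromPage(byte[] body) {
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives intact
        String head = new String(body, 0, Math.min(body.length, 1024),
                StandardCharsets.ISO_8859_1);
        Matcher xml = XML_ENCODING.matcher(head);
        if (xml.find()) {
            return xml.group(1);
        }
        Matcher meta = CHARSET.matcher(head);
        return meta.find() ? meta.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] page = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=gb2312\"/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println("header says: " + fromHeader("text/html; charset=UTF-8"));
        System.out.println("page says:   " + fromPage(page));
    }
}
```

Once an encoding has been chosen, the bytes from getResponseBody can be decoded with new String(responseBody, charsetName).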
Automatic redirection
According to RFC 2616's definition of redirection, there are two main types: 301 and 302. 301 means Moved Permanently: the requested resource has been moved to a fixed new location, and any request sent to the old address will be redirected to the new one. 302 means a temporary redirect. For example, when a server-side servlet calls the sendRedirect method, the client receives a 302 code, and the Location value in the response header is the target address of the sendRedirect call.
HttpClient supports automatic redirection, but not yet for requests that require a follow-up action, such as POST and PUT. If 301 or 302 is returned after a POST is submitted, you must handle it yourself. As shown in the PostMethod example above, to reach the page that follows the BBS login you must issue a new request, whose address can be obtained from the Location header field. Note that Location sometimes contains a relative path, so its value must be resolved against the original address before a request is sent to the new address.
Besides the header, a redirect can also be specified within the page itself. The tag that triggers automatic page redirection is: <meta http-equiv="refresh" content="5; url=http://www.ibm.com/us">. To handle this case in your program, you have to parse the page to perform the redirect. The url value in this tag can also be a relative address, in which case it must likewise be resolved before redirecting.
Handling the HTTPS protocol
HttpClient provides SSL support. JSSE must be installed before SSL can be used. In Sun JDK 1.4 and later, JSSE is already integrated into the JDK; with a version earlier than JDK 1.4, JSSE must be installed separately. JSSE implementations differ from vendor to vendor.
The following describes how to open an HTTPS connection with HttpClient. There are two ways to do so. The first is to obtain the certificate issued by the server and import it into the local keystore. The second is to extend HttpClient so that it accepts certificates automatically.
Method 1: obtain the certificate and import it into the local keystore
- Install JSSE (skip this step if you are using JDK 1.4 or later). This article uses IBM's JSSE as an example. Download the JSSE installation package from the IBM website, unpack it, and copy ibmjsse.jar to the <java-home>/lib/ext/ directory.
- Obtain and import the certificate. The certificate can be obtained through IE:
1. Open the HTTPS URL you want to connect to in IE. A dialog box is displayed.
2. Click "View Certificate", select "Details" in the dialog box that appears, and click "Copy to File" to generate a certificate file for the web page by following the wizard.
3. Wizard step 1: on the welcome page, click "Next".
4. Wizard step 2: select the export file format. Keep the default and click "Next".
5. Wizard step 3: enter the name of the export file and click "Next".
6. Wizard step 4: click "Finish" to complete the wizard.
7. A final dialog box is displayed, indicating that the export succeeded.
Use keytool to import the exported certificate into the local keystore. The keytool program is located in <java-home>/bin/. Open a command-line window, change to the <java-home>/lib/security/ directory, and run the following command:
    keytool -import -noprompt -keystore cacerts -storepass changeit -alias yourEntry1 -file your.cer
The -alias parameter is followed by the unique identifier of the certificate in the keystore; it is case-insensitive. The -file parameter is followed by the path and file name of the certificate exported through IE. To delete the certificate just imported into the keystore, run the following command:
    keytool -delete -keystore cacerts -storepass changeit -alias yourEntry1
- Write the program that accesses the HTTPS address. To test whether an HTTPS connection can be made, simply change the address in the GetSample example to an HTTPS address.
    GetMethod getMethod = new GetMethod("https://www.yourdomain.com");
Problems that may occur when running the program:
1. The exception java.net.SocketException: Algorithm SSL not available is thrown. This may occur because no JSSE provider has been added. If the IBM JSSE provider is used, add the following lines to the program:
    if (Security.getProvider("com.ibm.jsse.IBMJSSEProvider") == null)
        Security.addProvider(new IBMJSSEProvider());
Alternatively, open <java-home>/lib/security/java.security and, after the lines
    security.provider.1=sun.security.provider.Sun
    security.provider.2=com.ibm.crypto.provider.IBMJCE
add
    security.provider.3=com.ibm.jsse.IBMJSSEProvider
2. The exception java.net.SocketException: SSL implementation not available is thrown. This may occur because ibmjsse.jar has not been copied to the <java-home>/lib/ext/ directory.
3. The exception javax.net.ssl.SSLHandshakeException: unknown certificate is thrown. This indicates that JSSE is installed correctly, but the certificate has probably not been imported into the keystore of the JRE that is currently running. Import the certificate by following the steps described above.
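When diagnosing the first two problems, it can help to see which security providers the running JRE actually has registered. The sketch below uses only the JDK's java.security.Security class; ProviderCheckSketch and isInstalled are hypothetical names. Note that Security.getProvider looks a provider up by its registered name and returns null when no provider of that name is installed.

```java
import java.security.Provider;
import java.security.Security;

public class ProviderCheckSketch {
    /** True if a provider registered under this name is installed. */
    static boolean isInstalled(String providerName) {
        return Security.getProvider(providerName) != null;
    }

    public static void main(String[] args) {
        // Print every installed provider in preference order
        Provider[] providers = Security.getProviders();
        for (int i = 0; i < providers.length; i++) {
            System.out.println((i + 1) + ": " + providers[i].getName());
        }
    }
}
```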
Method 2: extend HttpClient to accept certificates automatically
Because this method accepts all certificates automatically, it carries security risks. Consider your system's security requirements carefully before using it. The procedure is as follows:
- Provide a custom socket factory (test.MySecureProtocolSocketFactory). This class must implement the org.apache.commons.httpclient.protocol.SecureProtocolSocketFactory interface, and it uses the custom X509TrustManager (test.MyX509TrustManager). You can obtain
- Create an org.apache.commons.httpclient.protocol.Protocol instance, specifying the protocol name and the default port number.
    Protocol myhttps = new Protocol("https", new MySecureProtocolSocketFactory(), 443);
- Register the newly created https protocol object.
    Protocol.registerProtocol("https", myhttps);
- Then open the HTTPS target address in the usual way. For the code, see test. nocertifhtthttpsgetsample.
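The custom trust manager mentioned in the first step is not shown in this section. As a rough idea of what such a class involves, the sketch below uses only the JDK's javax.net.ssl API: a trust manager that accepts every certificate, and an SSLSocketFactory built from it. It is not the attachment's test.MyX509TrustManager; TrustAllSketch, AcceptAllTrustManager, and newSocketFactory are hypothetical names. As warned above, accepting all certificates removes the protection against impersonated servers.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class TrustAllSketch {
    /** Accepts any certificate chain without checking it. Insecure by design. */
    static class AcceptAllTrustManager implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // no check: every client certificate is accepted
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // no check: every server certificate is accepted
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }

    /** Builds a socket factory whose SSL context trusts every certificate. */
    static SSLSocketFactory newSocketFactory() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] { new AcceptAllTrustManager() }, null);
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS is not available in this JRE", e);
        }
    }

    public static void main(String[] args) {
        SSLSocketFactory factory = newSocketFactory();
        System.out.println("Created " + factory.getClass().getName());
    }
}
```

A class implementing SecureProtocolSocketFactory could delegate its createSocket methods to a factory like this one.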
Using a proxy server
Using a proxy server with HttpClient is very simple: call the setProxy method on the HostConfiguration obtained from the HttpClient instance. The first parameter is the proxy server address and the second is the port number. HttpClient also supports SOCKS proxies.
    httpClient.getHostConfiguration().setProxy(hostName, port);
Conclusion
As the introduction above shows, HttpClient supports the HTTP protocol very well and is easy to use. It is updated frequently, and it is powerful, flexible, and extensible. For programmers who want to access HTTP resources directly from Java applications, HttpClient is a valuable tool.
References
- Commons Logging wraps a variety of logging API implementations. See http://jakarta.apache.org/commons/logging/ for details.
- Commons Codec contains general-purpose encoders and decoders, including phonetic, hexadecimal, Base64, and URL encoding. See http://jakarta.apache.org/commons/codec/ for details.
- RFC 2616 is the specification of HTTP/1.1.
- SSL was developed by Netscape Communications Corporation in 1994. TLS V1.0 is a standard defined by the Internet Engineering Task Force (IETF). It is based on SSL V3.0, with some differences in the algorithms used. For example, SSL uses a message authentication code (MAC) algorithm to generate the integrity check value, while TLS uses the keyed-hashing for message authentication code (HMAC) algorithm.
- IBM JSSE provides a Java implementation of SSL (Secure Sockets Layer) and TLS (Transport Layer Security).
- Keytool is a tool for managing keys and certificates. For detailed usage information, see http://www.doc.ic.ac.uk/csg/java/1.3.1docs/tooldocs/solaris/keytool.html.
- The HttpClient home page.
Author profiles
Kingfa Hua is a software engineer at IBM CSDL. He enjoys researching new technologies and has experience in Java network development and web development.
Chen Zhanghong is a software engineer at IBM CSDL and is currently engaged in the development of enterprise e-commerce applications.