Using HttpClient to simulate a browser's GET and POST requests

Source: Internet
Author: User
Tags http authentication


We usually access a web server through a browser such as IE or Navigator, either to view the information on a page or to submit data. Some of these pages are ordinary public pages, some can be used only after the user logs on, and some require authentication or encrypted transmission such as HTTPS. None of this is a problem for today's browsers. In some cases, however, you need to access such pages from a program, for example to "steal" data from someone else's web page, or to use the pages that a site provides to accomplish a task. Suppose we want to know the location that a mobile phone number belongs to and do not have the data ourselves. We then have to rely on another company's existing website: the program submits the phone number to the web page and parses the desired data out of the page that comes back. If the other side is just a very simple page, our program is also very simple, and there is no need to dwell on it here. For authorization reasons, however, many companies do not make their service pages accessible through a simple URL. You must register and log on before you can use them, and this involves cookie handling. Today's popular dynamic web technologies, such as ASP and JSP, all handle session information through cookies. To use a service page that someone else provides, our program must first log on and then access the service page, handling the cookies itself along the way. Think how terrible it would be to do all this with java.net.HttpURLConnection! And this is only one of the common kinds of "stubbornness" of what we call stubborn web servers. What about uploading files over HTTP? There is no need for a headache: with "it", these problems are easy to solve.

We cannot list every possible kind of stubbornness, so we will deal with several of the most common problems. As mentioned earlier, solving them ourselves with java.net.HttpURLConnection would be painful, so we first introduce an open-source project: HttpClient from the Apache open-source organization, which belongs to the Jakarta Commons project. The current version is 2.0 RC2. Commons already has a net subproject, yet HttpClient was split out as a project of its own, which shows that accessing an HTTP server is not easy.

  
The commons-httpclient project is designed specifically to simplify communication between HTTP clients and servers, and it makes the headaches described above easy to solve. For example, you no longer need to care whether the communication uses HTTP or HTTPS: tell it that you want HTTPS and HttpClient does the rest. This article takes several problems that come up frequently when writing HTTP client programs and shows how to solve each of them with HttpClient. To help readers become familiar with the project, we start with a simple example that reads the content of a web page, and then work through the problems mentioned above one by one.

1. Read web page content (HTTP/HTTPS)

The following is a simple example to access a page.

/*
 * Created on 2003-12-14 by liudong
 */
package http.demo;

import java.io.IOException;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;

/**
 * The simplest HTTP client, used to demonstrate how to access a page through GET or POST.
 * @author liudong
 */
public class SimpleClient {
    public static void main(String[] args) throws IOException {
        HttpClient client = new HttpClient();
        // Set the proxy server address and port
        // client.getHostConfiguration().setProxy("proxy_host_addr", proxy_port);
        // Use the GET method. If the server requires HTTPS, just replace http with https in the URL below
        HttpMethod method = new GetMethod("http://java.sun.com");
        // Use the POST method
        // HttpMethod method = new PostMethod("http://java.sun.com");
        client.executeMethod(method);
        // Print the status returned by the server
        System.out.println(method.getStatusLine());
        // Print the returned information
        System.out.println(method.getResponseBodyAsString());
        // Release the connection
        method.releaseConnection();
    }
}

 
In this example, we first create an HTTP client (HttpClient) instance, then choose whether to submit with GET or POST, then execute the chosen method on the HttpClient instance, and finally read the server's response from the method object. This is the basic flow of using HttpClient. In effect, a single line of code handles the entire request, which is very simple!

2. Submit parameters to a web page with GET or POST

The simple example above already showed how to request a page with GET or POST. What this section adds is how to set the parameters that the page needs when it is submitted. With GET, all parameters are placed directly after the page URL: a question mark separates them from the page address, and the parameters are separated from each other with &, for example http://java.sun.com?name=liudong&mobile=123456. With POST it is slightly more work. The example in this section queries the city that a mobile phone number belongs to. The code is as follows:

/*
 * Created on 2003-12-7 by liudong
 */
package http.demo;

import java.io.IOException;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;

/**
 * Parameter submission demo.
 * This program connects to a page that looks up the location of a mobile phone number
 * and queries the province and city of number segment 1330227.
 * @author liudong
 */
public class SimpleHttpClient {
    public static void main(String[] args) throws IOException {
        HttpClient client = new HttpClient();
        client.getHostConfiguration().setHost("www.imobile.com.cn", 80, "http");
        HttpMethod method = getPostMethod(); // use POST to submit data
        client.executeMethod(method);
        // Print the status returned by the server
        System.out.println(method.getStatusLine());
        // Re-decode the result page from its ISO-8859-1 bytes using the platform default charset
        String response = new String(method.getResponseBodyAsString().getBytes("8859_1"));
        // Print the returned information
        System.out.println(response);
        method.releaseConnection();
    }

    /**
     * Use GET to submit data.
     * @return the GET method with the parameter appended to the URL
     */
    private static HttpMethod getGetMethod() {
        return new GetMethod("/simcard.php?simcard=1330227");
    }

    /**
     * Use POST to submit data.
     * @return the POST method with the parameter in the request body
     */
    private static HttpMethod getPostMethod() {
        PostMethod post = new PostMethod("/simcard.php");
        NameValuePair simcard = new NameValuePair("simcard", "1330227");
        post.setRequestBody(new NameValuePair[] { simcard });
        return post;
    }
}

 
In the example above, the page http://www.imobile.com.cn/simcard.php takes one parameter, simcard, whose value is a mobile phone number segment, that is, the first seven digits of a mobile phone number. The server returns the province, city, and other details for the submitted number. To submit with GET, you only need to append the parameter information to the URL. With POST, you use the NameValuePair class to set each parameter name and its value.
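Both submission styles carry the same encoded name=value pairs; only their position in the request differs. The following is a minimal sketch of that encoding using only the JDK. It is not how HttpClient is implemented, and the class and method names (FormEncodingSketch, encode, enc) are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: FormEncodingSketch and its methods are hypothetical names, not HttpClient API.
class FormEncodingSketch {

    // URL-encodes one name or value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Joins the parameters as name=value pairs separated by '&'.
    static String encode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("simcard", "1330227");
        String encoded = encode(params);
        // GET: the encoded pairs follow the '?' in the URL
        System.out.println("GET /simcard.php?" + encoded);
        // POST: the same string is sent as the request body instead
        System.out.println("POST body: " + encoded);
    }
}
```

With POST the pairs travel in the request body, so they do not show up in the URL or in typical access logs, which is one reason login forms usually use it.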

3. Process page redirection

In JSP/Servlet programming, the response.sendRedirect method uses the redirection mechanism of the HTTP protocol. It differs from <jsp:forward ...> in that the latter performs the jump on the server: the application container loads the content of the target page and returns it to the client. The former returns a status code (the possible values are listed in the table below), and the client then reads the URL of the target page and loads the new page itself. Because of this, the program must call HttpMethod.getStatusCode() and check whether the returned value is one of the values in the table to decide whether a jump is needed. Once a jump is confirmed, the new address can be obtained from the Location field of the HTTP response headers.

| Status code | HttpServletResponse constant | Description |
|-------------|------------------------------|-------------|
| 301 | SC_MOVED_PERMANENTLY | The page has been permanently moved to another address. |
| 302 | SC_MOVED_TEMPORARILY | The page has been temporarily moved to another address. |
| 303 | SC_SEE_OTHER | The requested address must be accessed through another URL. |
| 307 | SC_TEMPORARY_REDIRECT | Same as SC_MOVED_TEMPORARILY. |

The following code snippet demonstrates how to handle page redirection.

client.executeMethod(post);
System.out.println(post.getStatusLine().toString());
post.releaseConnection();
// Check for redirection
int statuscode = post.getStatusCode();
if ((statuscode == HttpStatus.SC_MOVED_TEMPORARILY) ||
    (statuscode == HttpStatus.SC_MOVED_PERMANENTLY) ||
    (statuscode == HttpStatus.SC_SEE_OTHER) ||
    (statuscode == HttpStatus.SC_TEMPORARY_REDIRECT)) {
    // Read the new URL
    Header header = post.getResponseHeader("location");
    if (header != null) {
        String newuri = header.getValue();
        if ((newuri == null) || (newuri.equals("")))
            newuri = "/";
        GetMethod redirect = new GetMethod(newuri);
        client.executeMethod(redirect);
        System.out.println("Redirect:" + redirect.getStatusLine().toString());
        redirect.releaseConnection();
    } else {
        System.out.println("Invalid redirect");
    }
}

To test the example above, we can write two JSP pages ourselves, one of which redirects to the other with response.sendRedirect.
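The Location value returned by a server may be relative to the page that was requested, so a client generally has to resolve it against the original URL before following it. The sketch below shows the status check and that resolution step with the JDK's java.net.URI only. The class name, method names, and the localhost URL are assumptions made for illustration and are not part of HttpClient.

```java
import java.net.URI;

// Sketch only: RedirectSketch and its methods are hypothetical names, not HttpClient API.
class RedirectSketch {

    // True for the four redirect status codes discussed in this section.
    static boolean isRedirect(int statusCode) {
        return statusCode == 301 || statusCode == 302
            || statusCode == 303 || statusCode == 307;
    }

    // Resolves a Location header value against the URL of the original request.
    // A missing or empty value falls back to "/".
    static String resolveLocation(String requestUrl, String location) {
        if (location == null || location.isEmpty()) {
            location = "/";
        }
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/app/login.jsp"; // assumed request URL
        System.out.println(isRedirect(302));
        System.out.println(resolveLocation(base, "main.jsp"));
        System.out.println(resolveLocation(base, "/other/index.jsp"));
    }
}
```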

4. Log on with a user name and password

 
This is probably the most common problem in HTTP client programming. Much of the content on many websites is visible only to registered users, so you must log on with a correct user name and password before you can browse the pages you want. The HTTP protocol is stateless: a connection is valid only for the current request and is closed once the response has been delivered. To preserve the user's login information, the cookie mechanism is therefore needed. Take JSP/Servlet as an example. When a browser requests a JSP or servlet page, the application server returns a cookie named JSESSIONID (the name varies between application servers) whose value is a long, unique string. This string is the ID of the session in which the site is currently being accessed. Every time the browser accesses another page of the site, it must send along cookie information such as JSESSIONID, and the application server uses the session ID to look up the corresponding session information.

  
On websites that require logon, the user's data is generally stored in the server-side session after a successful login. When other pages are accessed, the application server reads the session ID of the current request from the cookies sent by the browser and retrieves the corresponding session information. It then checks whether the user's data is present in the session. If it is, access to the page is allowed; otherwise the user is sent to the login page and asked to enter an account and password. This is how websites developed with JSP commonly handle user logon.

For an HTTP client, then, accessing a protected page means simulating what the browser does. First request the login page and read the cookie value; then request the login page again, adding each parameter that the login page requires; finally request the page you actually want. Every request except the first must carry the cookie information so that the server can tell whether the current request has been authenticated. That sounds like a lot, but with HttpClient you do not need to add a single extra line of code: submit the login information to perform the login, then access the desired page directly, exactly as you would an ordinary page, because the HttpClient class already takes care of everything for you. The following example implements this access process.

/*
 * Created on 2003-12-7 by liudong
 */
package http.demo;

import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.cookie.*;
import org.apache.commons.httpclient.methods.*;

/**
 * An example of logging on through a form.
 * @author liudong
 */
public class FormLoginDemo {
    static final String LOGON_SITE = "localhost";
    static final int LOGON_PORT = 8080;

    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        client.getHostConfiguration().setHost(LOGON_SITE, LOGON_PORT);
        // Simulate the login page: login.jsp -> main.jsp
        PostMethod post = new PostMethod("/main.jsp");
        NameValuePair name = new NameValuePair("name", "ld");
        NameValuePair pass = new NameValuePair("password", "ld");
        post.setRequestBody(new NameValuePair[] { name, pass });
        int status = client.executeMethod(post);
        System.out.println(post.getResponseBodyAsString());
        post.releaseConnection();
        // View the cookie information
        CookieSpec cookiespec = CookiePolicy.getDefaultSpec();
        Cookie[] cookies = cookiespec.match(LOGON_SITE, LOGON_PORT, "/", false, client.getState().getCookies());
        if (cookies.length == 0) {
            System.out.println("None");
        } else {
            for (int i = 0; i < cookies.length; i++) {
                System.out.println(cookies[i].toString());
            }
        }
        // Access the required page main2.jsp
        GetMethod get = new GetMethod("/main2.jsp");
        client.executeMethod(get);
        System.out.println(get.getResponseBodyAsString());
        get.releaseConnection();
    }
}
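To make the cookie round trip concrete, the sketch below shows roughly what a client has to do by hand when no library manages state for it: read the Set-Cookie value from the login response and send the name and value back in a Cookie header on later requests. It uses the JDK's java.net.HttpCookie, not HttpClient, and the class name and the session ID are invented for illustration.

```java
import java.net.HttpCookie;
import java.util.List;

// Sketch only: SessionCookieSketch is a hypothetical name, not HttpClient API.
class SessionCookieSketch {

    // Turns the value of a Set-Cookie response header into the value of the
    // Cookie request header that must accompany later requests.
    static String toCookieHeader(String setCookieValue) {
        List<HttpCookie> cookies = HttpCookie.parse(setCookieValue);
        StringBuilder sb = new StringBuilder();
        for (HttpCookie c : cookies) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            // Only name=value goes back to the server; attributes such as Path stay client-side
            sb.append(c.getName()).append('=').append(c.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up session ID in the shape a servlet container might send after login
        String fromServer = "JSESSIONID=0123456789ABCDEF; Path=/";
        System.out.println("Cookie: " + toCookieHeader(fromServer));
    }
}
```

A real client would also have to honor each cookie's domain, path, and expiry before sending it back, which is the bookkeeping a cookie-aware HTTP library does on your behalf.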

5. Submit XML format parameters

Submitting parameters in XML format is very simple: it is only a matter of setting the Content-Type when submitting. The following example reads XML from a file and submits it to the server. This process can be used to test web services.

import java.io.File;
import java.io.FileInputStream;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.EntityEnclosingMethod;
import org.apache.commons.httpclient.methods.PostMethod;

/**
 * An example of submitting data in XML format.
 */
public class PostXMLClient {
    public static void main(String[] args) throws Exception {
        File input = new File("test.xml");
        PostMethod post = new PostMethod("http://localhost:8080/httpclient/xml.jsp");
        // Set the request content to be read directly from the file
        post.setRequestBody(new FileInputStream(input));
        // Send an explicit length when it fits in an int, otherwise use chunked transfer
        if (input.length() < Integer.MAX_VALUE)
            post.setRequestContentLength((int) input.length());
        else
            post.setRequestContentLength(EntityEnclosingMethod.CONTENT_LENGTH_CHUNKED);
        // Specify the request content type
        post.setRequestHeader("Content-Type", "text/xml; charset=GBK");
        HttpClient httpclient = new HttpClient();
        int result = httpclient.executeMethod(post);
        System.out.println("Response status code: " + result);
        System.out.println("Response body: ");
        System.out.println(post.getResponseBodyAsString());
        post.releaseConnection();
    }
}

6. Upload files over HTTP

HttpClient uses a separate HttpMethod subclass, MultipartPostMethod, to handle file uploads. This class encapsulates the details of uploading a file; all we need to do is give it the full path of the file to upload. The following code snippet demonstrates how to use this class.

MultipartPostMethod filePost = new MultipartPostMethod(targetURL);
// Pass a File so that the file's content is uploaded;
// a String value would be sent as an ordinary text field
filePost.addParameter("fileName", new File(targetFilePath));
HttpClient client = new HttpClient();
// Because the file to be uploaded may be large, set the maximum connection timeout here
client.getHttpConnectionManager().getParams().setConnectionTimeout(5000);
int status = client.executeMethod(filePost);

In the code above, targetFilePath is the path of the file to be uploaded.
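For readers curious about what such an upload looks like on the wire, the following sketch builds a multipart/form-data body by hand for a single small text file. It only illustrates the message format under simplifying assumptions (text content, one part). It is not how HttpClient is implemented, and the class name, boundary, and file name are invented for the example.

```java
// Sketch only: MultipartSketch, its boundary and its file name are hypothetical,
// and this is not how HttpClient itself builds the request.
class MultipartSketch {

    // Builds a multipart/form-data body with a single text file part.
    static String body(String boundary, String field, String fileName, String content) {
        String crlf = "\r\n";
        return "--" + boundary + crlf
             + "Content-Disposition: form-data; name=\"" + field
             + "\"; filename=\"" + fileName + "\"" + crlf
             + "Content-Type: text/plain" + crlf
             + crlf                               // blank line ends the part headers
             + content + crlf
             + "--" + boundary + "--" + crlf;     // closing boundary ends the body
    }

    public static void main(String[] args) {
        String boundary = "----sketch-boundary"; // must not occur inside the content
        System.out.println("Content-Type: multipart/form-data; boundary=" + boundary);
        System.out.println();
        System.out.print(body(boundary, "fileName", "notes.txt", "hello"));
    }
}
```

Binary files need the body assembled as bytes rather than as a String, which is part of the detail that a multipart-aware method class hides.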

7. Access pages protected by HTTP authentication

 
We often come across pages where, when we access them, the browser pops up a dialog asking for a user name and password. This kind of user authentication is different from the form-based authentication described earlier: it is an authentication scheme of the HTTP protocol itself. HttpClient supports three authentication schemes: basic, digest, and NTLM. Basic authentication is the simplest and most common, but also the least secure. Digest authentication was added in HTTP 1.1. NTLM is defined by Microsoft rather than by a general specification; the latest version of NTLM is more secure than digest authentication.
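Basic authentication is weak because the credentials are only Base64-encoded, not encrypted, so anyone who can read the request can decode them. The sketch below shows how the Authorization header value is formed, using only the JDK (Java 8 or later). The class and method names are made up for illustration; with HttpClient you supply credentials and the library builds this header for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: BasicAuthSketch and header() are hypothetical names, not HttpClient API.
class BasicAuthSketch {

    // Builds the value of the Authorization header for basic authentication:
    // the word "Basic", a space, then Base64 of "user:password".
    static String header(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder credentials, matching the placeholders used in this section
        System.out.println("Authorization: " + header("username", "password"));
    }
}
```

Because the encoding is trivially reversible, basic authentication is normally combined with HTTPS.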

The following example shows how to access a page protected by authentication:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.methods.GetMethod;

public class BasicAuthenticationExample {
    public BasicAuthenticationExample() {
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        client.getState().setCredentials(
            "www.verisign.com",
            "realm",
            new UsernamePasswordCredentials("username", "password")
        );
        GetMethod g
