Using HttpClient to Simulate Browser GET and POST Requests


Normally we use a browser such as IE or Navigator to access a web server, either to view the information on its pages or to submit some data. Some of the pages we visit are ordinary pages. Others can be used only after the user logs on, or require authentication or encrypted transmission such as HTTPS. The browsers we use handle all of these cases without any trouble. Sometimes, however, you need to access such pages from a program: for example, to "steal" some data from someone else's web pages, or to use pages provided by another site to implement some function. Suppose we want to find out where a mobile phone number is registered but have no such data ourselves. We then have to rely on another company's existing website: submit the phone number to its page and parse the data we want out of the returned page. If the other side is just a very simple page, our program will be very simple too, and there would be no need for this article.

Because of service authorization, however, the pages many companies provide cannot be reached through a simple URL: you must register and log on before you can use the service pages. This involves cookie handling. Popular dynamic page technologies such as ASP and JSP all handle session information through cookies. For our program to use the service pages someone else provides, it must log on first and then access the service page, and along the way it has to handle the cookies itself. Imagine how painful it would be to do all of this with java.net.HttpURLConnection! And this is only one of the common quirks of what we call "stubborn" web servers. What about uploading files over HTTP? No need for a headache: with the right tool, all of these problems become easy to solve.

We cannot list every possible form of stubbornness, so we will deal with several of the most common problems. As mentioned earlier, solving them ourselves with java.net.HttpURLConnection would be painful. So before we start, let us introduce an open-source project: HttpClient from the Apache Software Foundation, part of the Jakarta Commons project. At the time of writing, the current version is 2.0 RC2. Commons already has a Net subproject, yet HttpClient was split out as a separate project, which shows that accessing HTTP servers is no simple matter.

The Commons HttpClient project is designed specifically to simplify programming the communication between HTTP clients and servers, and it makes things that used to be a headache easy. For example, you no longer have to care whether the communication uses HTTP or HTTPS: tell it you want HTTPS, and HttpClient does the rest. This article shows, one problem at a time, how to use HttpClient to solve the issues we often run into when writing HTTP client programs. To help readers become familiar with the project quickly, we start with a simple example that reads the content of a web page and then work step by step through the remaining problems.

1. Read web page content (HTTP/HTTPS)

The following simple example accesses a page.

/*
 * Created on 2003-12-14 by liudong
 */
package http.demo;

import java.io.IOException;

import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;

/**
 * The simplest HTTP client, demonstrating how to access a page through GET or POST.
 * @author liudong
 */
public class SimpleClient {

    public static void main(String[] args) throws IOException {
        HttpClient client = new HttpClient();

        // Set the proxy server address and port
        // client.getHostConfiguration().setProxy("proxy_host_addr", proxy_port);

        // Use the GET method. If the server requires HTTPS, just replace "http" with "https" in the URL below
        HttpMethod method = new GetMethod("http://java.sun.com");

        // Use the POST method
        // HttpMethod method = new PostMethod("http://java.sun.com");

        try {
            client.executeMethod(method);
            // Print the status line returned by the server
            System.out.println(method.getStatusLine());
            // Print the response body
            System.out.println(method.getResponseBodyAsString());
        } finally {
            // Release the connection, even if the request failed
            method.releaseConnection();
        }
    }
}

This example first creates an HTTP client (HttpClient) instance, then chooses GET or POST as the request method, executes that method on the HttpClient instance, and finally reads the server's response from the method object. This is the basic process of using HttpClient. The request itself takes a single line of code (client.executeMethod(method)), which is very simple!
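To make "the whole request" more concrete, here is a minimal sketch, using only the JDK rather than HttpClient, of how a client derives the request line and Host header from a URL such as the one above. The class RawGetRequest is hypothetical, and real clients (HttpClient included) send additional headers such as User-Agent that are omitted here. For HTTPS the same text is sent, just inside an encrypted connection.

```java
import java.net.URI;

/**
 * Hypothetical helper: builds a simplified HTTP/1.1 GET request for a URL,
 * showing how the request line and Host header are derived from it.
 */
final class RawGetRequest {

    static String build(String url) {
        URI uri = URI.create(url);
        // An empty path means the site root
        String target = uri.getRawPath();
        if (target == null || target.isEmpty()) {
            target = "/";
        }
        // The query string, if any, stays on the request line
        if (uri.getRawQuery() != null) {
            target += "?" + uri.getRawQuery();
        }
        // Host carries the port only when it differs from the scheme's default
        int defaultPort = "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
        String host = (uri.getPort() == -1 || uri.getPort() == defaultPort)
                ? uri.getHost()
                : uri.getHost() + ":" + uri.getPort();
        return "GET " + target + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"   // this sketch asks for one response per connection
                + "\r\n";                   // a blank line ends the header section
    }

    public static void main(String[] args) {
        System.out.print(build("http://java.sun.com"));
    }
}
```

For http://java.sun.com this prints the request line GET / HTTP/1.1 followed by Host: java.sun.com.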

2. Submit parameters to a web page with GET or POST

The simplest example above already showed how to request a page with GET or POST. What this section adds is setting the parameters the page requires when it is submitted. With a GET request, all parameters are placed directly after the page URL, separated from the page address by a question mark, and the parameters are separated from each other by &, for example: http://java.sun.com?name=liudong&mobile=123456. With POST it is slightly more involved. The example in this section looks up the city where a mobile phone number is registered. The code is as follows:

/*
 * Created on 2003-12-7 by liudong
 */
package http.demo;

import java.io.IOException;

import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;

/**
 * Parameter submission demo.
 * This program connects to a page that looks up the location of a mobile phone number
 * and queries the province and city of the number prefix 1330227.
 * @author liudong
 */
public class SimpleHttpClient {

    public static void main(String[] args) throws IOException {
        HttpClient client = new HttpClient();
        client.getHostConfiguration().setHost("www.imobile.com.cn", 80, "http");

        HttpMethod method = getPostMethod(); // use POST to submit data
        try {
            client.executeMethod(method);
            // Print the status returned by the server
            System.out.println(method.getStatusLine());
            // If the server does not declare a charset, the body is decoded as ISO-8859-1;
            // recover the raw bytes and decode them with the platform default charset
            String response =
                new String(method.getResponseBodyAsString().getBytes("ISO-8859-1"));
            // Print the result page
            System.out.println(response);
        } finally {
            method.releaseConnection();
        }
    }

    /**
     * Use GET to submit data: the parameter is appended to the URL.
     * @return the configured GET method
     */
    private static HttpMethod getGetMethod() {
        return new GetMethod("/simcard.php?simcard=1330227");
    }

    /**
     * Use POST to submit data: the parameter is sent in the request body.
     * @return the configured POST method
     */
    private static HttpMethod getPostMethod() {
        PostMethod post = new PostMethod("/simcard.php");
        NameValuePair simcard = new NameValuePair("simcard", "1330227");
        post.setRequestBody(new NameValuePair[] { simcard });
        return post;
    }
}

In the preceding example, the GET method only needs the parameter information appended to the URL, while the POST method sets each parameter name and its value through the NameValuePair class.
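The two methods differ only in where the encoded name/value pairs travel. The following JDK-only sketch (the class FormEncoding is hypothetical, not part of HttpClient) encodes pairs the way HTML forms do (application/x-www-form-urlencoded) and shows them either appended to the URL for GET or used as the request body for POST, which is the kind of body a PostMethod built from NameValuePair objects is expected to send. It assumes UTF-8; the page in the example above may well expect a different charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper showing where form parameters go for GET and POST.
 * Both use application/x-www-form-urlencoded encoding; only the location differs.
 */
final class FormEncoding {

    /** Encodes pairs as name=value joined with '&', URL-encoding each name and value. */
    static String encode(Map<String, String> params, String charset) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), charset))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), charset));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, ex);
        }
        return sb.toString();
    }

    /** GET: the encoded pairs follow the page address after a '?'. */
    static String getUrl(String path, Map<String, String> params, String charset) {
        String query = encode(params, charset);
        return query.isEmpty() ? path : path + "?" + query;
    }

    /** POST: the same string becomes the body; Content-Length counts its bytes. */
    static String postHeaders(Map<String, String> params, String charset) {
        byte[] body = encode(params, charset).getBytes(Charset.forName(charset));
        return "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + body.length + "\r\n";
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("simcard", "1330227");
        System.out.println(getUrl("/simcard.php", params, "UTF-8")); // /simcard.php?simcard=1330227
        System.out.print(postHeaders(params, "UTF-8"));              // includes Content-Length: 15
        System.out.println(encode(params, "UTF-8"));                 // simcard=1330227
    }
}
```

Note that URLEncoder turns a space into + and & inside a value into %26, so a value containing & cannot be mistaken for a parameter separator.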

3. Handle page redirection

In JSP/Servlet programming, the response.sendRedirect method uses HTTP's redirection mechanism. It differs from <jsp:forward …> in that the latter performs the jump on the server: the application container loads the content of the target page and returns it to the client. sendRedirect instead returns a status code (the possible values are listed in the table below); the client then reads the URL of the target page and loads the new page itself. HttpClient does not follow such redirects automatically for requests that carry a body, such as POST, so we call HttpMethod.getStatusCode() and check whether the returned value is one of those in the table to decide whether to follow the redirect. If it is, the new address can be read from the Location header of the HTTP response.

| Status code | HttpServletResponse constant | Description |
|---|---|---|
| 301 | SC_MOVED_PERMANENTLY | The page has been permanently moved to another address. |
| 302 | SC_MOVED_TEMPORARILY | The page has been temporarily moved to another address. |
| 303 | SC_SEE_OTHER | The response must be retrieved from another URL, using GET. |
| 307 | SC_TEMPORARY_REDIRECT | Like SC_MOVED_TEMPORARILY, but the client should repeat the request with the same method. |

The following code snippet demonstrates how to handle page redirection.

// Assumes client (HttpClient) and post (PostMethod) are set up as in the previous examples
client.executeMethod(post);
System.out.println(post.getStatusLine().toString());
post.releaseConnection();

// Check for redirection
int statusCode = post.getStatusCode();
if ((statusCode == HttpStatus.SC_MOVED_TEMPORARILY) ||
    (statusCode == HttpStatus.SC_MOVED_PERMANENTLY) ||
    (statusCode == HttpStatus.SC_SEE_OTHER) ||
    (statusCode == HttpStatus.SC_TEMPORARY_REDIRECT)) {
    // Read the new URL from the Location header
    Header header = post.getResponseHeader("location");
    if (header != null) {
        String newUri = header.getValue();
        if ((newUri == null) || (newUri.equals(""))) {
            newUri = "/";
        }
        GetMethod redirect = new GetMethod(newUri);
        client.executeMethod(redirect);
        System.out.println("Redirect:" + redirect.getStatusLine().toString());
        redirect.releaseConnection();
    } else {
        System.out.println("Invalid redirect");
    }
}

To test the example above, write two JSP pages yourself, one of which redirects to the other with response.sendRedirect.
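The snippet above follows a single redirect and passes the Location value to GetMethod unchanged. A more defensive client resolves a relative Location against the URL that returned it and caps the number of hops so that a redirect loop cannot run forever. The following is a minimal JDK-only sketch of that logic; the names Redirects and Response and the fetch function are hypothetical stand-ins for executing a request, so the sketch runs without a server.

```java
import java.net.URI;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of client-side redirect handling: detect a redirect status,
 * resolve the Location header against the current URL, and stop after maxHops.
 */
final class Redirects {

    /** Minimal stand-in for a response: status code plus Location header (may be null). */
    static final class Response {
        final int status;
        final String location;

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** True for the redirect status codes listed in the table above. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303 || status == 307;
    }

    /** Resolves a possibly relative Location value against the URL that returned it. */
    static String resolve(String currentUrl, String location) {
        if (location == null || location.isEmpty()) {
            location = "/"; // same fallback as the snippet above
        }
        return URI.create(currentUrl).resolve(location).toString();
    }

    /** Follows at most maxHops redirects and returns the final URL. */
    static String follow(String url, Function<String, Response> fetch, int maxHops) {
        String current = url;
        for (int hop = 0; hop <= maxHops; hop++) {
            Response r = fetch.apply(current); // stands in for executing a GET
            if (!isRedirect(r.status)) {
                return current;
            }
            current = resolve(current, r.location);
        }
        throw new IllegalStateException("Too many redirects starting from " + url);
    }

    public static void main(String[] args) {
        Map<String, Response> site = Map.of(
                "http://localhost:8080/a.jsp", new Response(302, "b.jsp"),
                "http://localhost:8080/b.jsp", new Response(200, null));
        System.out.println(follow("http://localhost:8080/a.jsp", site::get, 5)); // http://localhost:8080/b.jsp
    }
}
```

The two test JSP pages suggested above map directly onto the fake site in main: a.jsp answers 302 with Location b.jsp, and b.jsp answers 200.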

4. Log on with a user name and password

This is probably the most common problem in HTTP client programming. The content of many websites is visible only to registered users, so you must log on with a correct user name and password before you can reach the pages you want. Because HTTP is stateless (each request is handled independently, and the connection is closed once the request is complete), a cookie mechanism must be used to keep the user's login information. Take JSP/Servlet as an example. When a browser requests a JSP or servlet page, the application server returns a cookie named JSESSIONID (the name varies between application servers) whose value is a long, unique string: the ID of the session currently visiting the site. Each time the browser accesses other pages of the site, it must send cookies such as JSESSIONID along with the request, and the application server uses the session ID to retrieve the corresponding session information.

For websites that require logon, user data is generally stored in the server-side session once the user has logged on successfully. When other pages are accessed, the application server reads the session ID of the current request from the cookie sent by the browser, retrieves the corresponding session information, and checks whether it contains the user information. If it does, the page can be accessed; otherwise the user is sent to the logon page to enter an account name and password. This is the usual way websites developed with JSP handle user logon.

So for an HTTP client to access a protected page, it must simulate what the browser does: first request the logon page and read the cookie value; then request the page the logon form submits to, adding every parameter the logon requires; and finally request the page it actually needs. Every request except the first must of course carry the cookie information, so that the server can tell whether the current request has been authenticated. With HttpClient you don't need a single extra line of code for this: pass in the logon information to perform the logon, then access the desired page directly, no differently from an ordinary page, because HttpClient already does all of the cookie handling for you. It's great! The following example implements this access process.

/*
 * Created on 2003-12-7 by liudong
 */
package http.demo;

import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.cookie.*;
import org.apache.commons.httpclient.methods.*;

/**
 * Demonstrates logging on through a logon form.
 * @author liudong
 */
public class FormLoginDemo {

    static final String LOGON_SITE = "localhost";
    static final int LOGON_PORT = 8080;

    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        client.getHostConfiguration().setHost(LOGON_SITE, LOGON_PORT);

        // Simulate login.jsp -> main.jsp: post the logon form fields to main.jsp
        PostMethod post = new PostMethod("/main.jsp");
        NameValuePair name = new NameValuePair("name", "ld");
        NameValuePair pass = new NameValuePair("password", "ld");
        post.setRequestBody(new NameValuePair[] { name, pass });
        int status = client.executeMethod(post);
        System.out.println("Logon status: " + status);
        System.out.println(post.getResponseBodyAsString());
        post.releaseConnection();

        // View the cookies HttpClient has stored for this site
        CookieSpec cookiespec = CookiePolicy.getDefaultSpec();
        Cookie[] cookies = cookiespec.match(LOGON_SITE, LOGON_PORT, "/", false,
                client.getState().getCookies());
        if (cookies.length == 0) {
            System.out.println("None");
        } else {
            for (int i = 0; i < cookies.length; i++) {
                System.out.println(cookies[i].toString());
            }
        }

        // Access the protected page main2.jsp; the session cookie is sent automatically
        GetMethod get = new GetMethod("/main2.jsp");
        client.executeMethod(get);
        System.out.println(get.getResponseBodyAsString());
        get.releaseConnection();
    }
}
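HttpClient keeps these cookies in its HttpState for you. To show what that bookkeeping amounts to, here is a deliberately simplified JDK-only cookie jar (the class TinyCookieJar is hypothetical). It stores cookies from Set-Cookie headers using java.net.HttpCookie and builds the Cookie header for later requests. It ignores domains, expiry and the Secure flag, treats a missing Path as "/", and uses a plain prefix test for paths, so it illustrates the idea rather than implementing the cookie rules faithfully.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical minimal cookie jar: remembers cookies from Set-Cookie headers and
 * builds the Cookie header for later requests. Simplified: no domain, expiry or
 * Secure handling, and path matching is a plain prefix test.
 */
final class TinyCookieJar {
    private final Map<String, HttpCookie> cookies = new LinkedHashMap<>();

    /** Stores every cookie in one Set-Cookie header value, replacing cookies with the same name. */
    void onSetCookie(String headerValue) {
        for (HttpCookie c : HttpCookie.parse(headerValue)) {
            cookies.put(c.getName(), c);
        }
    }

    /** Returns the Cookie header value for a request path, or null when no cookie applies. */
    String cookieHeaderFor(String requestPath) {
        List<String> parts = new ArrayList<>();
        for (HttpCookie c : cookies.values()) {
            String path = c.getPath() == null ? "/" : c.getPath();
            if (requestPath.startsWith(path)) {
                parts.add(c.getName() + "=" + c.getValue());
            }
        }
        return parts.isEmpty() ? null : String.join("; ", parts);
    }

    public static void main(String[] args) {
        TinyCookieJar jar = new TinyCookieJar();
        // Set-Cookie received in the response to POST /main.jsp (the logon step)
        jar.onSetCookie("JSESSIONID=ABC123; Path=/");
        // The follow-up GET /main2.jsp carries the session cookie back
        System.out.println("Cookie: " + jar.cookieHeaderFor("/main2.jsp")); // Cookie: JSESSIONID=ABC123
    }
}
```

The session ID value ABC123 is invented for the illustration; a real server issues its own long, random value.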

5. Submit parameters in XML format

Submitting parameters in XML format is very simple: it is just a matter of setting the content type of the request. The following example reads XML from a file and submits it to the server; this can be used to test web services.

import java.io.File;
import java.io.FileInputStream;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.EntityEnclosingMethod;
import org.apache.commons.httpclient.methods.PostMethod;

/**
 * Demonstrates submitting data in XML format.
 */
public class PostXMLClient {

    public static void main(String[] args) throws Exception {
        File input = new File("test.xml");
        PostMethod post = new PostMethod("http://localhost:8080/httpclient/xml.jsp");

        // Read the request body directly from the file
        post.setRequestBody(new FileInputStream(input));
        if (input.length() < Integer.MAX_VALUE) {
            // setRequestContentLength takes an int
            post.setRequestContentLength((int) input.length());
        } else {
            post.setRequestContentLength(EntityEnclosingMethod.CONTENT_LENGTH_CHUNKED);
        }

        // Specify the request content type
        post.setRequestHeader("Content-Type", "text/xml; charset=GBK");

        HttpClient httpclient = new HttpClient();
        try {
            int result = httpclient.executeMethod(post);
            System.out.println("Response status code: " + result);
            System.out.println("Response body: ");
            System.out.println(post.getResponseBodyAsString());
        } finally {
            post.releaseConnection();
        }
    }
}
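One detail worth checking when posting XML: the bytes sent must actually be encoded in the charset named in the Content-Type header, and Content-Length counts bytes, not characters. The example above sends the file's bytes unchanged, so it assumes test.xml is already saved in GBK. The following JDK-only sketch (the class XmlBody is hypothetical) builds a body and its matching headers from an XML string.

```java
import java.nio.charset.Charset;

/**
 * Hypothetical helper: prepares an XML request body and the matching headers.
 * The bytes are produced in the same charset that Content-Type announces.
 */
final class XmlBody {
    final byte[] bytes;
    final String contentType;

    XmlBody(String xml, String charsetName) {
        Charset cs = Charset.forName(charsetName); // throws if the JVM lacks the charset
        this.bytes = xml.getBytes(cs);
        this.contentType = "text/xml; charset=" + cs.name();
    }

    /** Value for the Content-Length header: a byte count, not a character count. */
    int contentLength() {
        return bytes.length;
    }

    public static void main(String[] args) {
        // "\u624b\u673a" is the two-character Chinese word for "mobile phone"
        XmlBody body = new XmlBody("<q>\u624b\u673a</q>", "UTF-8");
        System.out.println(body.contentType);     // text/xml; charset=UTF-8
        System.out.println(body.contentLength()); // 13: 7 ASCII bytes + 2 characters x 3 bytes
    }
}
```

The same string encoded as GBK is 11 bytes, because GBK uses two bytes per Chinese character; declaring one charset while sending bytes in another is a common cause of garbled text on the server side.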

6. Upload files over HTTP

HttpClient handles file uploads with a dedicated HttpMethod subclass, MultipartPostMethod, which encapsulates the details of the upload; all we need to do is tell it which file to upload. The following code snippet shows how to use this class.

// Needs java.io.File, org.apache.commons.httpclient.HttpClient and
// org.apache.commons.httpclient.methods.MultipartPostMethod
MultipartPostMethod filePost = new MultipartPostMethod(targetURL);
// Pass the file itself; a String value would be sent as an ordinary text field
filePost.addParameter("fileName", new File(targetFilePath));
HttpClient client = new HttpClient();
// Allow 5000 ms to establish the connection
client.getHttpConnectionManager().getParams().setConnectionTimeout(5000);
int status = client.executeMethod(filePost);

In the code above, targetFilePath is the path of the file to be uploaded and targetURL is the address that receives the upload.
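For reference, here is a simplified, JDK-only sketch of the multipart/form-data body that a single-file upload produces, which is the kind of encoding MultipartPostMethod hides from you. The class name, boundary and field values are hypothetical; a real client generates a random boundary that does not occur in the file and may add further part headers. (In HttpClient 3.x, MultipartPostMethod is deprecated in favor of PostMethod with a MultipartRequestEntity.)

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of a multipart/form-data request body carrying one file.
 * Simplified: caller-supplied boundary, a single part, no escaping of quotes in names.
 */
final class MultipartSketch {

    /** Value for the request's Content-Type header. */
    static String contentType(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }

    /** Builds the complete body: one file part followed by the closing boundary. */
    static byte[] body(String boundary, String fieldName, String fileName,
                       String fileContentType, byte[] fileBytes) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + fileContentType + "\r\n"
                + "\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";
        byte[] h = head.getBytes(StandardCharsets.UTF_8);
        byte[] t = tail.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(h, 0, h.length);
        out.write(fileBytes, 0, fileBytes.length); // raw file content, not re-encoded
        out.write(t, 0, t.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String boundary = "----sketchBoundary";
        byte[] body = body(boundary, "fileName", "test.txt", "text/plain",
                "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("Content-Type: " + contentType(boundary));
        System.out.print(new String(body, StandardCharsets.UTF_8));
    }
}
```

The file bytes are copied into the body as they are, which is why binary files can be uploaded this way without any extra encoding.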

7. Access pages protected by HTTP authentication

We often encounter pages that, when accessed, make the browser pop up a dialog box asking for a user name and password. This kind of user authentication differs from the form-based logon described earlier: it is HTTP authentication. HttpClient supports three authentication schemes: Basic, Digest, and NTLM. Basic authentication is the simplest and most widely supported, but it is insecure. Digest authentication was added with HTTP 1.1, while NTLM is defined by Microsoft rather than by a general standard; the latest version of NTLM is more secure than Digest authentication.

The following example shows how to access a page protected by authentication:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.methods.GetMethod;

public class BasicAuthenticationExample {

    public BasicAuthenticationExample() {
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        // setCredentials(realm, host, credentials): the credentials are used
        // only for the given realm on the given host
        client.getState().setCredentials(
            "realm",
            "www.verisign.com",
            new UsernamePasswordCredentials("username", "password")
        );
        GetMethod get = new GetMethod("https://www.verisign.com/products/index.html");
        // Let HttpClient answer the server's authentication challenge
        get.setDoAuthentication(true);
        try {
            int status = client.executeMethod(get);
            System.out.println(status + " " + get.getResponseBodyAsString());
        } finally {
            get.releaseConnection();
        }
    }
}
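To see why Basic authentication is called insecure, it helps to look at what is actually sent. The following JDK-only sketch (the class BasicAuth is hypothetical) extracts the realm from a WWW-Authenticate challenge, which is the value passed as the realm to setCredentials above, and builds the Authorization header, which is nothing more than Base64 of user:password. It encodes the credentials as UTF-8; older servers may expect ISO-8859-1, and the two agree for plain ASCII names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the Basic authentication exchange:
 * read the realm from the server's challenge and build the Authorization header.
 */
final class BasicAuth {
    private static final Pattern REALM =
            Pattern.compile("realm=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Authorization header value for Basic: "Basic " + base64(user:password). */
    static String header(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** Extracts the realm from a WWW-Authenticate challenge, or null if there is none. */
    static String realm(String challenge) {
        Matcher m = REALM.matcher(challenge);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(realm("Basic realm=\"realm\"")); // realm
        String h = header("username", "password");
        System.out.println(h);                              // Basic dXNlcm5hbWU6cGFzc3dvcmQ=
        // Anyone who sees the header can reverse it: Base64 is an encoding, not encryption
        System.out.println(new String(Base64.getDecoder().decode(h.substring("Basic ".length())),
                StandardCharsets.UTF_8));                   // username:password
    }
}
```

This is why Basic authentication is normally combined with HTTPS, as in the example URL above.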

8. Use HttpClient from multiple threads

Multiple threads may access HttpClient at the same time, for example to download several files from one site concurrently. A single HttpConnection can be used by only one thread at a time. To avoid conflicts in a multithreaded environment, HttpClient provides a multithreaded connection manager class, MultiThreadedHttpConnectionManager; to use it, simply pass it in when constructing the HttpClient instance. The code is as follows:

MultiThreadedHttpConnectionManager connectionManager =
    new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);

The client instance can then be shared by multiple threads.
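The idea behind a multithreaded connection manager can be sketched with the JDK alone: many threads share one client, but only a limited number of requests hold a connection at any moment, and each request hands its connection back when it is done (the role releaseConnection() plays in HttpClient). In the following sketch the class BoundedDownloader and its fetch function are hypothetical; a Semaphore stands in for the pool of connections, and no network I/O takes place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch: threads share one "client", but at most maxConnections
 * requests run at once. fetch stands in for executing a request.
 */
final class BoundedDownloader {
    private final Semaphore connections;
    private final Function<String, String> fetch;
    final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger peak = new AtomicInteger();

    BoundedDownloader(int maxConnections, Function<String, String> fetch) {
        this.connections = new Semaphore(maxConnections);
        this.fetch = fetch;
    }

    String get(String url) throws InterruptedException {
        connections.acquire(); // wait for a free "connection"
        try {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            return fetch.apply(url);
        } finally {
            inFlight.decrementAndGet();
            connections.release(); // comparable to releaseConnection()
        }
    }

    /** Downloads all URLs on a thread pool; results keep the input order. */
    List<String> getAll(List<String> urls, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> get(url)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        BoundedDownloader d = new BoundedDownloader(2, url -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return "content of " + url;
        });
        System.out.println(d.getAll(List.of("/a.zip", "/b.zip", "/c.zip", "/d.zip"), 4));
        System.out.println("peak concurrent requests: " + d.peak.get());
    }
}
```

Four threads are started, yet the peak never exceeds the two permits; the remaining requests simply wait until a permit is released.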

References:

HttpClient home page: http://jakarta.apache.org/commons/httpclient/
How NTLM works: http://davenport.sourceforge.net/ntlm.html
