HttpClient: Cookies, SSL, Multithreading, and HTTP Methods

Source: Internet
Author: User
-Author: sunggsun @ 20:26





8. Cookies

HttpClient can manage cookies automatically: it accepts cookies set by the server and returns them to the server when required. It also lets you set cookies manually and send them to the server. Unfortunately, there are several conflicting rules on how to handle cookies: the Netscape cookie draft, RFC 2109, RFC 2965, and a large number of vendor implementations that follow none of these. To handle this situation, HttpClient provides policy-driven cookie management. HttpClient supports the following cookie specifications:

The Netscape cookie draft is the earliest cookie specification and formed the basis for RFC 2109. It nevertheless differs significantly from RFC 2109, so it may be needed for compatibility with some servers.

RFC 2109 is the first official cookie specification, published by the IETF. In theory, every server that handles version 1 cookies should follow it, which is why HttpClient uses it as the default specification. Unfortunately, the specification is so strict that many servers implement it incorrectly or still follow the Netscape draft. In those cases, use the compatibility specification.

The compatibility specification is designed to work with as many servers as possible, even those that do not comply with the standards. Consider it whenever cookie parsing fails.

RFC 2965 is not currently supported by HttpClient (support is planned for a later version). It defines cookie version 2 and addresses the shortcomings of version 1 cookies. RFC 2965 is intended to eventually replace RFC 2109.
In HttpClient, there are two ways to specify which cookie specification to use:
HttpClient client = new HttpClient();
client.getState().setCookiePolicy(CookiePolicy.COMPATIBILITY);
A specification set this way applies only to the current HttpState. The parameter value can be CookiePolicy.COMPATIBILITY, CookiePolicy.NETSCAPE_DRAFT, or CookiePolicy.RFC2109.

System.setProperty("apache.commons.httpclient.cookiespec", "COMPATIBILITY");
A specification set this way becomes the default for every newly created HttpState object. The parameter value can be "COMPATIBILITY", "NETSCAPE_DRAFT", or "RFC2109".
Cookie parsing problems are common, and switching to the compatibility specification often resolves them.
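
The difference between the specifications is easiest to see on actual headers. The following is a minimal sketch that uses the JDK's own java.net.HttpCookie class, not HttpClient's cookie classes, to parse a Netscape-style header and an RFC 2965-style header. The header strings and the class name CookieVersionSketch are made up for illustration. HttpCookie reports version 0 for Netscape-style cookies and version 1 for RFC 2109/2965-style cookies.

```java
import java.net.HttpCookie;
import java.util.List;

public class CookieVersionSketch {

    // Parses a single Set-Cookie / Set-Cookie2 header and returns the first cookie.
    public static HttpCookie parseFirst(String header) {
        List<HttpCookie> cookies = HttpCookie.parse(header);
        return cookies.get(0);
    }

    public static void main(String[] args) {
        // Netscape-style header: no Version or Max-Age attribute.
        HttpCookie netscape = parseFirst("Set-Cookie: sid=abc123; Path=/");
        // RFC 2965-style header: Set-Cookie2 with explicit Version and Max-Age.
        HttpCookie rfc = parseFirst(
                "Set-Cookie2: sid=abc123; Version=1; Path=/; Max-Age=3600");

        System.out.println(netscape.getName() + " version=" + netscape.getVersion());
        System.out.println(rfc.getName() + " version=" + rfc.getVersion()
                + " maxAge=" + rfc.getMaxAge());
    }
}
```

A server that mixes attributes from both styles in one header is exactly the case where a lenient, compatibility-style policy tends to succeed and a strict one fails.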

  
9. What should I do if I encounter problems using HttpClient?


Use a browser to access the server and check whether the server responds normally.

If you are using a proxy, turn it off and try again.

Try another server, preferably one running different server software.

Check whether the code is written as described in the tutorial.

Set the log level to debug to find the cause of the problem.

Turn on wire trace logging to follow the communication between the client and the server and identify where the problem occurs.

Use telnet or netcat to send the request to the server by hand. This is useful for testing a hypothesis once you think you have found the cause.

Run netcat as a listener and use it as a server to check how HttpClient handles the response.

Try the latest HttpClient release; the bug may already be fixed there.

Ask for help on the mailing list.

Report bugs to Bugzilla.
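
For the debug-logging and wire-trace steps above, here is a minimal sketch. It assumes HttpClient logs through Commons Logging with its SimpleLog backend, which is the usual setup when no other logging framework is configured. The class and method names are illustrative, and the properties need to be set before the first HttpClient class is loaded.

```java
public class WireLogSetup {

    // Sets the system properties that switch Commons Logging to SimpleLog
    // and raise the HttpClient and wire loggers to debug level.
    public static void enableDebugLogging() {
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        // Wire trace: everything sent to and received from the server.
        System.setProperty(
                "org.apache.commons.logging.simplelog.log.httpclient.wire", "debug");
        // Context logging from HttpClient's own classes.
        System.setProperty(
                "org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient",
                "debug");
    }

    public static void main(String[] args) {
        enableDebugLogging();
        System.out.println(System.getProperty("org.apache.commons.logging.Log"));
    }
}
```

Wire logging can produce a lot of output, so it is best kept to short debugging sessions.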

  

10. SSL

With the Java Secure Socket Extension (JSSE), HttpClient fully supports HTTP over Secure Sockets Layer (SSL) and IETF Transport Layer Security (TLS). JSSE is built into JRE 1.4 and later; in earlier versions you need to install and configure it manually. For details, refer to the Sun website.
Using SSL in HttpClient is straightforward. Consider the following two examples:
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("https://www.verisign.com/");
httpclient.executeMethod(httpget);
System.out.println(httpget.getStatusLine().toString());
If you go through a proxy that requires authentication, the code looks like this:
HttpClient httpclient = new HttpClient();
httpclient.getHostConfiguration().setProxy("myproxyhost", 8080);
httpclient.getState().setProxyCredentials("my-proxy-realm", "myproxyhost",
    new UsernamePasswordCredentials("my-proxy-username", "my-proxy-password"));
GetMethod httpget = new GetMethod("https://www.verisign.com/");
httpclient.executeMethod(httpget);
System.out.println(httpget.getStatusLine().toString());

To customize SSL in HttpClient, follow these steps:

Provide a socket factory that implements the org.apache.commons.httpclient.protocol.SecureProtocolSocketFactory interface. This socket factory is responsible for opening a socket to the server using the standard or a third-party SSL library, and for performing initialization such as the connection handshake. Usually this initialization happens automatically when the socket is created.

Instantiate an org.apache.commons.httpclient.protocol.Protocol object. To create this instance you need a valid protocol scheme (such as https), the custom socket factory, and a default port (such as 443 for https).
Protocol myhttps = new Protocol("https", new MySSLSocketFactory(), 443);
The instance can then be set as the protocol handler for a host.
HttpClient httpclient = new HttpClient();
httpclient.getHostConfiguration().setHost("www.whatever.com", 443, myhttps);
GetMethod httpget = new GetMethod("/");
httpclient.executeMethod(httpget);


Call the Protocol.registerProtocol method to register the custom instance as the default handler for a particular protocol identifier. This makes it easy to define your own protocol identifier (such as myhttps).
Protocol.registerProtocol("myhttps",
    new Protocol("https", new MySSLSocketFactory(), 9443));
...
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("myhttps://www.whatever.com/");
httpclient.executeMethod(httpget);
To replace the default https handler with a custom one, simply register it as "https".
Protocol.registerProtocol("https",
    new Protocol("https", new MySSLSocketFactory(), 443));
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("https://www.whatever.com/");
httpclient.executeMethod(httpget);
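
MySSLSocketFactory is not defined in the text above. As a rough sketch of the JSSE side that such a factory could delegate to, the following uses only javax.net.ssl to build an SSLSocketFactory from an explicitly initialized SSLContext. The class name and the choice of the "TLS" protocol are assumptions; wrapping the result in a SecureProtocolSocketFactory implementation is left out because it depends on the HttpClient version in use.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

public class CustomSslContextSketch {

    // Builds a socket factory from a TLS context. Passing null for the key managers,
    // trust managers and random source selects the JVM defaults; a real custom
    // factory would pass its own KeyManager/TrustManager arrays here instead.
    public static SSLSocketFactory createSocketFactory() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, null, null);
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not initialize the SSL context", e);
        }
    }

    public static void main(String[] args) {
        SSLSocketFactory factory = createSocketFactory();
        System.out.println("Supported cipher suites: "
                + factory.getSupportedCipherSuites().length);
    }
}
```

The createSocket methods of the custom factory would then call the corresponding createSocket methods on the SSLSocketFactory returned here.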

Known limitations and problems

Persistent SSL connections do not work on Sun JVMs earlier than 1.4. This is caused by a JVM bug.

Non-preemptive authentication fails when you access the server through a proxy. This is due to a design limitation of HttpClient and will be addressed in a future version.

Troubleshooting
Many problems, especially on JVMs earlier than 1.4, are caused by the JSSE installation.
The following code talks to the server directly through JSSE, without HttpClient, and can be used as a final check.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

import javax.net.ssl.SSLSocketFactory;

public class Test {

    public static final String TARGET_HTTPS_SERVER = "www.verisign.com";
    public static final int TARGET_HTTPS_PORT = 443;

    public static void main(String[] args) throws Exception {

        // Open an SSL socket through the default JSSE socket factory.
        Socket socket = SSLSocketFactory.getDefault().
            createSocket(TARGET_HTTPS_SERVER, TARGET_HTTPS_PORT);
        try {
            // Send a minimal HTTP request by hand.
            Writer out = new OutputStreamWriter(
                socket.getOutputStream(), "ISO-8859-1");
            out.write("GET / HTTP/1.1\r\n");
            out.write("Host: " + TARGET_HTTPS_SERVER + ":" +
                TARGET_HTTPS_PORT + "\r\n");
            out.write("Agent: SSL-TEST\r\n");
            out.write("\r\n");
            out.flush();
            // Print the raw response line by line.
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), "ISO-8859-1"));
            String line = null;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            socket.close();
        }
    }
}


  
11. HttpClient multithreading

The main purpose of multithreading is to download in parallel. While HttpClient is running, each HTTP method uses an HttpConnection instance. Connections are a limited resource, and each connection can be used by only one thread and one method at a time, so connections must be allocated correctly whenever they are needed. HttpClient manages connections in a way similar to a JDBC connection pool. This management is done by MultiThreadedHttpConnectionManager.
MultiThreadedHttpConnectionManager connectionManager =
    new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);
The client can then be used to execute multiple methods from multiple threads. Every call to httpclient.executeMethod() asks the connection manager for a connection. Once granted, the connection is checked out, and you must return it to the manager when you have finished with it. The manager supports two settings:
maxConnectionsPerHost: the maximum number of parallel connections to each host. The default value is 2.
maxTotalConnections: the maximum number of parallel connections for the client as a whole. The default value is 20.

When the manager reuses connections, it takes a least-recently-used approach.
Because the application, not HttpClient itself, reads the body of the response, HttpClient cannot tell when a connection is no longer in use. You must therefore call releaseConnection() explicitly to release the connection once the response body has been read.
MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);
...
// In a thread.
GetMethod get = new GetMethod("http://jakarta.apache.org/");
try {
    client.executeMethod(get);
    // Print the response to stdout.
    System.out.println(get.getResponseBodyAsString());
} finally {
    // Be sure the connection is released back to the connection
    // manager.
    get.releaseConnection();
}
Every call to httpclient.executeMethod() must be matched by a call to method.releaseConnection().
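
To make the two limits concrete, here is a simplified sketch of the bookkeeping a pooling manager has to do. It is not how MultiThreadedHttpConnectionManager is implemented: it only counts slots with java.util.concurrent.Semaphore, and it refuses a checkout when a limit is reached, whereas a real manager would normally make the caller wait. All names in it are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

public class PoolLimitSketch {

    private final int maxPerHost;
    private final Semaphore totalSlots;
    private final Map<String, Semaphore> hostSlots = new ConcurrentHashMap<>();

    public PoolLimitSketch(int maxPerHost, int maxTotal) {
        this.maxPerHost = maxPerHost;
        this.totalSlots = new Semaphore(maxTotal);
    }

    // Tries to check out a connection slot for the host.
    // Returns false if the per-host or the total limit is already reached.
    public boolean tryCheckout(String host) {
        Semaphore slots = hostSlots.computeIfAbsent(host, h -> new Semaphore(maxPerHost));
        if (!slots.tryAcquire()) {
            return false;
        }
        if (!totalSlots.tryAcquire()) {
            slots.release(); // undo the per-host reservation
            return false;
        }
        return true;
    }

    // Returns a previously checked-out slot; the counterpart of releaseConnection().
    public void release(String host) {
        Semaphore slots = hostSlots.get(host);
        if (slots != null) {
            slots.release();
            totalSlots.release();
        }
    }

    public static void main(String[] args) {
        PoolLimitSketch pool = new PoolLimitSketch(2, 20);
        String host = "jakarta.apache.org";
        System.out.println(pool.tryCheckout(host)); // first slot
        System.out.println(pool.tryCheckout(host)); // second slot
        System.out.println(pool.tryCheckout(host)); // per-host limit of 2 reached
        pool.release(host);
        System.out.println(pool.tryCheckout(host)); // a slot is free again
    }
}
```

The third checkout only succeeds after a release, which is why a forgotten releaseConnection() eventually starves every thread that talks to the same host.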

12. HTTP method


There are eight HTTP methods supported by HttpClient, which are described below.

1. Options

The HTTP OPTIONS method asks the server which communication options are available for the resource identified by the request URL. It lets the client determine the options or requirements associated with a resource, or the capabilities of the server, before taking any specific action. The most typical use of this method is to find out which HTTP methods the server supports.
HttpClient provides a class named OptionsMethod to support this HTTP method. Its getAllowedMethods method makes the typical use described above easy to implement.


OptionsMethod options = new OptionsMethod("http://jakarta.apache.org");
// Execute the method and handle any exceptions.
...
Enumeration allowedMethods = options.getAllowedMethods();
options.releaseConnection();
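
In an OPTIONS response, the supported methods are listed in the Allow header. As an illustration of what that header carries, and without using HttpClient itself, here is a small sketch that splits an Allow header value into method names. The class name and the sample header value are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class AllowHeaderSketch {

    // Splits a comma-separated Allow header value into method names,
    // ignoring surrounding whitespace and empty elements.
    public static List<String> parseAllow(String headerValue) {
        List<String> methods = new ArrayList<>();
        for (String token : headerValue.split(",")) {
            String method = token.trim();
            if (!method.isEmpty()) {
                methods.add(method);
            }
        }
        return methods;
    }

    public static void main(String[] args) {
        List<String> methods = parseAllow("GET, HEAD, POST, OPTIONS, TRACE");
        System.out.println(methods);
        System.out.println("POST allowed: " + methods.contains("POST"));
    }
}
```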

2. Get

The HTTP GET method retrieves whatever information (in the form of an entity) is identified by the request URI. If the request URI refers to a data-producing process, the response carries the data produced by that process as the entity, not the source code of the process.
If the request contains an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field, GET becomes a "conditional GET": the entity is transferred only under the conditions described by those fields. This avoids unnecessary network traffic and can reduce the number of requests needed to obtain a resource (for example, one request to check and a second one to download). Browsers work the same way: they keep a cache of page data, and when you revisit a page they download only the content that has changed, which speeds up browsing. For a pure check, HEAD is usually a better choice than GET. If the request contains a Range header field, only the part of the entity that falls within the requested range is retrieved. Anyone who has used a multi-threaded download tool will recognize this.
A typical use of this method is to download documents from a web server. HttpClient defines a class named GetMethod to support it. The getResponseBody, getResponseBodyAsStream, and getResponseBodyAsString methods of GetMethod return the document (such as an HTML page) contained in the response body. getResponseBodyAsStream is usually the best of the three, mainly because it avoids buffering all of the downloaded data before the document is processed.

GetMethod get = new GetMethod("http://jakarta.apache.org");
// Execute the method and handle failed requests.
...
InputStream in = get.getResponseBodyAsStream();
// Use the input stream to process the data.
get.releaseConnection();

The most common mistakes when using GetMethod are not reading the entire response body and not releasing the connection.
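
The conditional GET described above boils down to a date comparison on the server side. The following sketch shows that decision for If-Modified-Since only, using java.time from the JDK. It is an illustration rather than part of HttpClient; the class name, method name, and sample dates are assumptions.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class ConditionalGetSketch {

    // Returns the status a server would send for a GET that may carry
    // If-Modified-Since: 304 (Not Modified) if the resource has not changed
    // since the given date, otherwise 200 with the full entity.
    public static int statusFor(Instant lastModified, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200; // plain GET: always send the entity
        }
        Instant since = ZonedDateTime
                .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
        return lastModified.isAfter(since) ? 200 : 304;
    }

    public static void main(String[] args) {
        Instant lastModified = Instant.parse("1994-11-15T08:12:31Z");
        System.out.println(statusFor(lastModified, null));
        System.out.println(statusFor(lastModified, "Tue, 15 Nov 1994 08:12:31 GMT"));
        System.out.println(statusFor(lastModified, "Mon, 14 Nov 1994 08:12:31 GMT"));
    }
}
```

On the client side, a 304 response has no body, so the cached copy is used and nothing is downloaded again.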

3. Head

The HTTP HEAD method is identical to GET except that the server must not include a message body in the response. With this method you can obtain basic information about a resource without downloading it. It is often used to check whether hyperlinks are reachable and whether resources have been modified recently.
The most typical use of the HTTP HEAD method is to obtain basic information about a resource. HttpClient defines the HeadMethod class to support it. Like the other *Method classes, HeadMethod uses getResponseHeaders() to retrieve the header information and has no special methods of its own.

HeadMethod head = new HeadMethod("http://jakarta.apache.org");
// Execute the method and handle failed requests.
...
// Retrieve all header fields of the response.
Header[] headers = head.getResponseHeaders();

// Retrieve only the value of the Last-Modified header.
String lastModified = head.getResponseHeader("last-modified").getValue();



4. Post

The HTTP POST method asks the server to accept the entity enclosed in the request as a new subordinate of the resource identified by the request URI. In essence, the server is expected to store the entity, which is usually handled by a server-side program. The POST method is designed to cover the following functions in a uniform way:
Annotating existing resources

Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles

Submitting a block of data to a data-handling process

Extending a database through an append operation
These operations are expected to produce "side effects" on the server, such as modifying a database.
HttpClient defines the PostMethod class to support this HTTP method. Using POST in HttpClient involves two basic steps: preparing the data for the request, and then reading the response from the server. Call setRequestBody() to supply the request data. It accepts three kinds of parameters: an input stream, an array of name-value pairs, or a string. To read the response, call one of the getResponseBody* methods, exactly as when handling a GET response.
A common mistake is not reading the entire response (whether or not the program needs it) or not releasing the connection.
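
Name-value pairs in a POST body are typically sent in URL-encoded form (application/x-www-form-urlencoded). The sketch below shows that encoding with the JDK's URLEncoder, independently of HttpClient. The class name, the sample field names, and the choice of UTF-8 are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodySketch {

    // Encodes name-value pairs as an application/x-www-form-urlencoded body,
    // keeping the iteration order of the map.
    public static String encode(Map<String, String> pairs) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> pair : pairs.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(pair.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(pair.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("user", "joe smith");
        pairs.put("comment", "a&b");
        System.out.println(encode(pairs)); // prints user=joe+smith&comment=a%26b
    }
}
```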
