Heritrix 3.1.0 source code parsing (25)

Source: Internet
Author: User

In Heritrix 3.1.0 source code analysis (23), we looked at how Heritrix 3.1.0 extends the HttpClient component's HttpConnection object and its management interface, HttpConnectionManager.

The HttpConnection object creates the socket connection, but it neither writes data to the output stream nor reads data from the input stream. So where does the HttpClient component do this, and how does Heritrix 3.1.0 extend it?

When we use the HttpClient component to request a web page, we create a GetMethod or a PostMethod object, depending on whether the request is a GET or a POST. (There are other HTTP methods as well, which browsers do not currently support.)

These request classes implement the common interface HttpMethod, which declares the methods every request must implement. The interface declares many methods; to make them easier to follow, they can be divided logically into request-related and response-related parts. The important ones are listed below:

    public interface HttpMethod {

        // ...

        boolean validate();

        int getStatusCode();

        byte[] getResponseBody() throws IOException;

        String getResponseBodyAsString() throws IOException;

        InputStream getResponseBodyAsStream() throws IOException;

        int execute(HttpState state, HttpConnection connection)
            throws HttpException, IOException;

        void releaseConnection();

        boolean getDoAuthentication();

        void setDoAuthentication(boolean doAuthentication);

        public HttpMethodParams getParams();

        public void setParams(final HttpMethodParams params);

        public AuthState getHostAuthState();

        public AuthState getProxyAuthState();

        boolean isRequestSent();
    }

When we execute a request, what actually gets called is the execute method of the class that implements this interface.

The interface is implemented by the abstract class HttpMethodBase, which provides the methods common to all of its subclasses (that is, to all request methods). It mainly handles the socket's output and input streams, and its most important method is execute.

    /**
     * Executes this method using the specified <code>HttpConnection</code> and
     * <code>HttpState</code>.
     *
     * @param state {@link HttpState state} information to associate with this
     *        request. Must be non-null.
     * @param conn the {@link HttpConnection connection} to used to execute
     *        this HTTP method. Must be non-null.
     *
     * @return the integer status code if one was obtained, or <tt>-1</tt>
     *
     * @throws IOException if an I/O (transport) error occurs
     * @throws HttpException if a protocol exception occurs.
     */
    public int execute(HttpState state, HttpConnection conn)
        throws HttpException, IOException {

        LOG.trace("enter HttpMethodBase.execute(HttpState, HttpConnection)");

        // this is our connection now, assign it to a local variable so
        // that it can be released later
        this.responseConnection = conn;

        checkExecuteConditions(state, conn);

        this.statusLine = null;
        this.connectionCloseForced = false;

        conn.setLastResponseInputStream(null);

        // determine the effective protocol version
        if (this.effectiveVersion == null) {
            this.effectiveVersion = this.params.getVersion();
        }

        // socket output stream
        writeRequest(state, conn);
        this.requestSent = true;

        // socket input stream
        readResponse(state, conn);

        // the method has successfully executed
        used = true;

        return statusLine.getStatusCode();
    }

In the method above, writeRequest(state, conn) writes the request to the output stream, and readResponse(state, conn) reads the response from the input stream.
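The fixed order of the two steps can be illustrated with a small, library-free sketch. The class names here (SketchMethod, SketchGet, ExecuteOrderDemo) are hypothetical stand-ins rather than HttpClient classes, and in-memory streams take the place of the socket streams that HttpConnection would supply.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for HttpMethodBase: execute() fixes the order
// "write the request first, then read the response".
abstract class SketchMethod {
    private boolean requestSent = false;
    private int statusCode = -1;

    // Template method: subclasses cannot reorder the two steps.
    final int execute(OutputStream out, InputStream in) throws IOException {
        writeRequest(out);   // output-stream side
        requestSent = true;
        readResponse(in);    // input-stream side
        return statusCode;
    }

    // Writes only a request line; a real client would also write headers.
    protected void writeRequest(OutputStream out) throws IOException {
        out.write((getName() + " / HTTP/1.0\r\n\r\n")
                .getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    // Reads only the status line, e.g. "HTTP/1.0 200 OK", and keeps the code.
    protected void readResponse(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.append((char) b);
            }
        }
        statusCode = Integer.parseInt(line.toString().split(" ")[1]);
    }

    boolean isRequestSent() {
        return requestSent;
    }

    abstract String getName();
}

class SketchGet extends SketchMethod {
    @Override
    String getName() {
        return "GET";
    }
}

class ExecuteOrderDemo {
    // Runs one exchange against a canned response and returns
    // "<request line written>|<status code read>".
    static String run() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InputStream in = new ByteArrayInputStream(
                "HTTP/1.0 200 OK\r\n\r\nhello".getBytes(StandardCharsets.US_ASCII));
        try {
            int status = new SketchGet().execute(out, in);
            String written = new String(out.toByteArray(), StandardCharsets.US_ASCII);
            return written.trim() + "|" + status;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because execute is final in this sketch, a subclass can only customize what is written or read, not the sequence. That is the same division of labor the article describes for HttpMethodBase and its subclasses.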

The writeRequest(state, conn) method writes data to the stream. Heritrix 3.1.0 uses it as the entry point for its custom logic and rewrites the HttpMethodBase class, including how cookies and form parameters are written. (I will analyze this part after covering Heritrix 3.1.0's custom cookie and form handling.)

Besides the common logic described above, writeRequest goes on to call boolean writeRequestBody(HttpState state, HttpConnection conn), which is usually implemented by subclasses.

The subclasses of the abstract class HttpMethodBase implement the individual request methods. Here I will analyze only the HttpRecorderGetMethod and HttpRecorderPostMethod classes, which are customized by Heritrix 3.1.0.
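How such a subclass hook fits into the common writing logic can be sketched as follows. BaseSketch, GetSketch, PostSketch and BodyHookDemo are hypothetical names used only for illustration; the sketch takes from HttpClient only the idea of a boolean body-writing hook that the base class calls after its own work, and it leaves out headers such as Content-Length for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical base class: common request-writing logic plus a body hook.
abstract class BaseSketch {
    final void writeRequest(OutputStream out) throws IOException {
        // Common part: request line and the blank line that ends the headers.
        out.write((name() + " /form HTTP/1.0\r\n\r\n")
                .getBytes(StandardCharsets.US_ASCII));
        // Subclass part: the body, if the method has one.
        writeRequestBody(out);
        out.flush();
    }

    abstract String name();

    // Default hook: no body to write. Returns true when writing is complete.
    protected boolean writeRequestBody(OutputStream out) throws IOException {
        return true;
    }
}

// A GET request has no body, so the default hook is enough.
class GetSketch extends BaseSketch {
    @Override
    String name() {
        return "GET";
    }
}

// A POST request overrides the hook to write its form parameters.
class PostSketch extends BaseSketch {
    private final String form;

    PostSketch(String form) {
        this.form = form;
    }

    @Override
    String name() {
        return "POST";
    }

    @Override
    protected boolean writeRequestBody(OutputStream out) throws IOException {
        out.write(form.getBytes(StandardCharsets.US_ASCII));
        return true;
    }
}

class BodyHookDemo {
    // Returns the exact bytes the method writes, decoded as ASCII text.
    static String render(BaseSketch method) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            method.writeRequest(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.print(render(new PostSketch("a=1&b=2")));
    }
}
```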

    public class HttpRecorderGetMethod extends GetMethod {

        protected static Logger logger =
            Logger.getLogger(HttpRecorderGetMethod.class.getName());

        /**
         * Instance of http recorder method.
         */
        protected HttpRecorderMethod httpRecorderMethod = null;

        public HttpRecorderGetMethod(String uri, Recorder recorder) {
            super(uri);
            this.httpRecorderMethod = new HttpRecorderMethod(recorder);
        }

        protected void readResponseBody(HttpState state, HttpConnection connection)
        throws IOException, HttpException {
            // We're about to read the body.  Mark transition in http recorder.
            this.httpRecorderMethod.markContentBegin(connection);
            super.readResponseBody(state, connection);
        }

        protected boolean shouldCloseConnection(HttpConnection conn) {
            // Always close connection after each request. As best I can tell, this
            // is superfluous -- we've set our client to be HTTP/1.0.  Doing this
            // out of paranoia.
            return true;
        }

        public int execute(HttpState state, HttpConnection conn)
        throws HttpException, IOException {
            // Save off the connection so we can close it on our way out in case
            // httpclient fails to (We're not supposed to have access to the
            // underlying connection object; am only violating contract because
            // see cases where httpclient is skipping out w/o cleaning up
            // after itself).
            this.httpRecorderMethod.setConnection(conn);
            return super.execute(state, conn);
        }

        protected void addProxyConnectionHeader(HttpState state, HttpConnection conn)
                throws IOException, HttpException {
            super.addProxyConnectionHeader(state, conn);
            this.httpRecorderMethod.handleAddProxyConnectionHeader(this);
        }
    }

Besides the URL string, the constructor of this class takes a Recorder object, which it uses to initialize the member object httpRecorderMethod. That object has two members: a Recorder (the httpRecorder) and an HttpConnection. The overridden methods of the class call the parent method of the same name and also call the corresponding method on the httpRecorderMethod object; this includes setting its HttpConnection member and calling back into the Recorder to prepare for reading the input stream.

The HttpRecorderPostMethod class extends PostMethod. Its basic logic is similar to that of HttpRecorderGetMethod, so I will not analyze it separately.
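The "call the parent, plus notify the recorder" pattern can be sketched without Heritrix or HttpClient on the classpath. Everything in the block is hypothetical: CountingRecorder stands in for the Recorder, PlainGet for GetMethod, and RecordingGet for HttpRecorderGetMethod. For brevity the sketch calls the recorder directly instead of going through an intermediate helper object like HttpRecorderMethod.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical recorder: counts the bytes read through it and remembers
// the offset at which the response body starts.
class CountingRecorder extends InputStream {
    private final InputStream wrapped;
    private long position = 0;
    private long contentBegin = -1;

    CountingRecorder(InputStream wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public int read() throws IOException {
        int b = wrapped.read();
        if (b != -1) {
            position++;
        }
        return b;
    }

    void markContentBegin() {
        contentBegin = position;
    }

    long getContentBegin() {
        return contentBegin;
    }
}

// Hypothetical parent class: reads the headers, then the body.
class PlainGet {
    protected String body = "";

    void readResponse(InputStream in) throws IOException {
        readHeaders(in);
        readResponseBody(in);
    }

    // Consumes bytes up to and including the blank line that ends the headers.
    private void readHeaders(InputStream in) throws IOException {
        int newlines = 0;
        int b;
        while (newlines < 2 && (b = in.read()) != -1) {
            if (b == '\n') {
                newlines++;
            } else if (b != '\r') {
                newlines = 0;
            }
        }
    }

    protected void readResponseBody(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append((char) b);
        }
        body = sb.toString();
    }
}

// Same behavior as the parent, plus a callback to the recorder
// just before the body is read.
class RecordingGet extends PlainGet {
    private final CountingRecorder recorder;

    RecordingGet(CountingRecorder recorder) {
        this.recorder = recorder;
    }

    @Override
    protected void readResponseBody(InputStream in) throws IOException {
        recorder.markContentBegin();   // mark the header/body transition
        super.readResponseBody(in);    // then let the parent do the reading
    }
}

class RecorderDemo {
    // Returns "<offset where the body starts>:<body text>".
    static String run(String rawResponse) {
        CountingRecorder recorder = new CountingRecorder(new ByteArrayInputStream(
                rawResponse.getBytes(StandardCharsets.US_ASCII)));
        RecordingGet get = new RecordingGet(recorder);
        try {
            get.readResponse(recorder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return recorder.getContentBegin() + ":" + get.body;
    }
}
```

In this sketch the subclass never reads the stream itself. It only records where the headers end, which is the role the markContentBegin call plays in the class shown above.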

---------------------------------------------------------------------------

This series on Heritrix 3.1.0 source code parsing is my own original work.

If you reprint it, please credit the source: Blog Garden (cnblogs), "hedgehog gentle".

Link to this article: http://www.cnblogs.com/chenying99/archive/2013/04/28/3048387.html
