Server Setup Notes: Parsing a Simple Request with an Apache Module

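Generally, when a server receives a request it parses that request to decide whether it is valid and how it should be routed. This section shows how to read the request data from inside a module. (If you repost this, please credit breaksoftware's CSDN blog.)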

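Using the method from "Server Setup Notes: Compiling Apache and Its Modules", we create a Handler project named get_request. The entry function we get to work with in this project is:

    static int get_request_handler(request_rec *r)
    {
        r->content_type = "text/html";

The only data this entry function hands us directly is r, a pointer to a request_rec structure. Its definition in the source is:
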
    /**
     * @brief A structure that represents the current request
     */
    struct request_rec {
        /** The pool associated with the request */
        apr_pool_t *pool;
        /** The connection to the client */
        conn_rec *connection;
        /** The virtual host for this request */
        server_rec *server;

        /** Pointer to the redirected request if this is an external redirect */
        request_rec *next;
        /** Pointer to the previous request if this is an internal redirect */
        request_rec *prev;

        /** Pointer to the main request if this is a sub-request
         * (see http_request.h) */
        request_rec *main;

        /* Info about the request itself... we begin with stuff that only
         * protocol.c should ever touch...
         */
        /** First line of request */
        char *the_request;
        /** HTTP/0.9, "simple" request (e.g. GET /foo\n w/no headers) */
        int assbackwards;
        /** A proxy request (calculated during post_read_request/translate_name)
         *  possible values PROXYREQ_NONE, PROXYREQ_PROXY, PROXYREQ_REVERSE,
         *                  PROXYREQ_RESPONSE
         */
        int proxyreq;
        /** HEAD request, as opposed to GET */
        int header_only;
        /** Protocol version number of protocol; 1.1 = 1001 */
        int proto_num;
        /** Protocol string, as given to us, or HTTP/0.9 */
        char *protocol;
        /** Host, as set by full URI or Host: */
        const char *hostname;

        /** Time when the request started */
        apr_time_t request_time;

        /** Status line, if set by script */
        const char *status_line;
        /** Status line */
        int status;

        /* Request method, two ways; also, protocol, etc..  Outside of protocol.c,
         * look, but don't touch.
         */

        /** M_GET, M_POST, etc. */
        int method_number;
        /** Request method (eg. GET, HEAD, POST, etc.) */
        const char *method;

        /**
         *  'allowed' is a bitvector of the allowed methods.
         *
         *  A handler must ensure that the request method is one that
         *  it is capable of handling.  Generally modules should DECLINE
         *  any request methods they do not handle.  Prior to aborting the
         *  handler like this the handler should set r->allowed to the list
         *  of methods that it is willing to handle.  This bitvector is used
         *  to construct the "Allow:" header required for OPTIONS requests,
         *  and HTTP_METHOD_NOT_ALLOWED and HTTP_NOT_IMPLEMENTED status codes.
         *
         *  Since the default_handler deals with OPTIONS, all modules can
         *  usually decline to deal with OPTIONS.  TRACE is always allowed,
         *  modules don't need to set it explicitly.
         *
         *  Since the default_handler will always handle a GET, a
         *  module which does *not* implement GET should probably return
         *  HTTP_METHOD_NOT_ALLOWED.  Unfortunately this means that a Script GET
         *  handler can't be installed by mod_actions.
         */
        apr_int64_t allowed;
        /** Array of extension methods */
        apr_array_header_t *allowed_xmethods;
        /** List of allowed methods */
        ap_method_list_t *allowed_methods;

        /** byte count in stream is for body */
        apr_off_t sent_bodyct;
        /** body byte count, for easy access */
        apr_off_t bytes_sent;
        /** Last modified time of the requested resource */
        apr_time_t mtime;

        /* HTTP/1.1 connection-level features */

        /** The Range: header */
        const char *range;
        /** The "real" content length */
        apr_off_t clength;
        /** sending chunked transfer-coding */
        int chunked;

        /** Method for reading the request body
         * (eg. REQUEST_CHUNKED_ERROR, REQUEST_NO_BODY,
         *  REQUEST_CHUNKED_DECHUNK, etc...) */
        int read_body;
        /** reading chunked transfer-coding */
        int read_chunked;
        /** is client waiting for a 100 response? */
        unsigned expecting_100;
        /** The optional kept body of the request. */
        apr_bucket_brigade *kept_body;
        /** For ap_body_to_table(): parsed body */
        /* XXX: ap_body_to_table has been removed. Remove body_table too or
         * XXX: keep it to reintroduce ap_body_to_table without major bump? */
        apr_table_t *body_table;
        /** Remaining bytes left to read from the request body */
        apr_off_t remaining;
        /** Number of bytes that have been read  from the request body */
        apr_off_t read_length;

        /* MIME header environments, in and out.  Also, an array containing
         * environment variables to be passed to subprocesses, so people can
         * write modules to add to that environment.
         *
         * The difference between headers_out and err_headers_out is that the
         * latter are printed even on error, and persist across internal redirects
         * (so the headers printed for ErrorDocument handlers will have them).
         *
         * The 'notes' apr_table_t is for notes from one module to another, with no
         * other set purpose in mind...
         */

        /** MIME header environment from the request */
        apr_table_t *headers_in;
        /** MIME header environment for the response */
        apr_table_t *headers_out;
        /** MIME header environment for the response, printed even on errors and
         * persist across internal redirects */
        apr_table_t *err_headers_out;
        /** Array of environment variables to be used for sub processes */
        apr_table_t *subprocess_env;
        /** Notes from one module to another */
        apr_table_t *notes;

        /* content_type, handler, content_encoding, and all content_languages
         * MUST be lowercased strings.  They may be pointers to static strings;
         * they should not be modified in place.
         */
        /** The content-type for the current request */
        const char *content_type;   /* Break these out --- we dispatch on 'em */
        /** The handler string that we use to call a handler function */
        const char *handler;        /* What we *really* dispatch on */

        /** How to encode the data */
        const char *content_encoding;
        /** Array of strings representing the content languages */
        apr_array_header_t *content_languages;

        /** variant list validator (if negotiated) */
        char *vlist_validator;

        /** If an authentication check was made, this gets set to the user name. */
        char *user;
        /** If an authentication check was made, this gets set to the auth type. */
        char *ap_auth_type;

        /* What object is being requested (either directly, or via include
         * or content-negotiation mapping).
         */

        /** The URI without any parsing performed */
        char *unparsed_uri;
        /** The path portion of the URI, or "/" if no path provided */
        char *uri;
        /** The filename on disk corresponding to this response */
        char *filename;
        /* XXX: What does this mean? Please define "canonicalize" -aaron */
        /** The true filename, we canonicalize r->filename if these don't match */
        char *canonical_filename;
        /** The PATH_INFO extracted from this request */
        char *path_info;
        /** The QUERY_ARGS extracted from this request */
        char *args;

        /**
         * Flag for the handler to accept or reject path_info on
         * the current request.  All modules should respect the
         * AP_REQ_ACCEPT_PATH_INFO and AP_REQ_REJECT_PATH_INFO
         * values, while AP_REQ_DEFAULT_PATH_INFO indicates they
         * may follow existing conventions.  This is set to the
         * user's preference upon HOOK_VERY_FIRST of the fixups.
         */
        int used_path_info;

        /** A flag to determine if the eos bucket has been sent yet */
        int eos_sent;

        /* Various other config info which may change with .htaccess files
         * These are config vectors, with one void* pointer for each module
         * (the thing pointed to being the module's business).
         */

        /** Options set in config files, etc. */
        struct ap_conf_vector_t *per_dir_config;
        /** Notes on *this* request */
        struct ap_conf_vector_t *request_config;

        /** Optional request log level configuration. Will usually point
         *  to a server or per_dir config, i.e. must be copied before
         *  modifying */
        const struct ap_logconf *log;

        /** Id to identify request in access and error log. Set when the first
         *  error log entry for this request is generated.
         */
        const char *log_id;

        /**
         * A linked list of the .htaccess configuration directives
         * accessed by this request.
         * N.B. always add to the head of the list, _never_ to the end.
         * that way, a sub request's list can (temporarily) point to a parent's list
         */
        const struct htaccess_result *htaccess;

        /** A list of output filters to be used for this request */
        struct ap_filter_t *output_filters;
        /** A list of input filters to be used for this request */
        struct ap_filter_t *input_filters;

        /** A list of protocol level output filters to be used for this
         *  request */
        struct ap_filter_t *proto_output_filters;
        /** A list of protocol level input filters to be used for this
         *  request */
        struct ap_filter_t *proto_input_filters;

        /** This response can not be cached */
        int no_cache;
        /** There is no local copy of this response */
        int no_local_copy;

        /** Mutex protect callbacks registered with ap_mpm_register_timed_callback
         * from being run before the original handler finishes running
         */
        apr_thread_mutex_t *invoke_mtx;

        /** A struct containing the components of URI */
        apr_uri_t parsed_uri;
        /**  finfo.protection (st_mode) set to zero if no such file */
        apr_finfo_t finfo;

        /** remote address information from conn_rec, can be overridden if
         * necessary by a module.
         * This is the address that originated the request.
         */
        apr_sockaddr_t *useragent_addr;
        char *useragent_ip;

        /** MIME trailer environment from the request */
        apr_table_t *trailers_in;
        /** MIME trailer environment from the response */
        apr_table_t *trailers_out;
    };

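This is a very large structure that covers nearly everything about a request. A beginner will find it hard to understand every member, but our needs are simple, so we list only the members we are likely to care about.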

    /** First line of request */
    char *the_request;

The first line of the request, that is, the request line (method, request target, and protocol).
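
To make the shape of the_request concrete, here is a minimal standalone sketch that splits a request line into its three fields. The struct request_line type and the split_request_line function are hypothetical names made up for this illustration; they are not part of httpd, which already exposes the same pieces through r->method, r->unparsed_uri, and r->protocol.

```c
#include <stdio.h>

/* Hypothetical holder for the three parts of a request line. */
struct request_line {
    char method[16];
    char target[256];
    char protocol[16];
};

/* Splits "METHOD target PROTOCOL" into its parts.
 * Returns 1 on success, 0 if the line does not contain three fields.
 * Field widths are capped so oversized input cannot overflow the buffers. */
int split_request_line(const char *line, struct request_line *out)
{
    return sscanf(line, "%15s %255s %15s",
                  out->method, out->target, out->protocol) == 3;
}
```

Called on a line such as "GET /AP%26AC%3aHE?a=b HTTP/1.1", the target field keeps its percent-escapes and query string, which is the same raw form that unparsed_uri holds.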

    /** Protocol version number of protocol; 1.1 = 1001 */
    int proto_num;
    /** Protocol string, as given to us, or HTTP/0.9 */
    char *protocol;
    /** Host, as set by full URI or Host: */
    const char *hostname;

The protocol version as a number, the protocol as a string, and the host name the client asked for.
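
The struct comment says that 1.1 is stored as 1001, which implies the encoding major * 1000 + minor. The sketch below decodes such a number under that assumption. The function names proto_major and proto_minor are illustrative only; httpd.h ships its own HTTP_VERSION_MAJOR and HTTP_VERSION_MINOR macros for the same job.

```c
/* Decodes a proto_num value, assuming the "major * 1000 + minor"
 * encoding implied by the comment "1.1 = 1001".
 * Both function names are made up for this illustration. */
int proto_major(int proto_num)
{
    return proto_num / 1000;
}

int proto_minor(int proto_num)
{
    return proto_num % 1000;
}
```

Under the same encoding, an HTTP/0.9 request would be stored as 9 and an HTTP/1.0 request as 1000.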
    /** Time when the request started */
    apr_time_t request_time;

The time at which the request started.
    /** The URI without any parsing performed */
    char *unparsed_uri;
    /** The path portion of the URI, or "/" if no path provided */
    char *uri;
    /** The filename on disk corresponding to this response */
    char *filename;

The raw URI before URL-decoding (query string included), the URL-decoded path portion of the URI, and the path on disk of the file that serves the request.
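
The difference between unparsed_uri and uri is the percent-decoding step. The following is a minimal standalone sketch of that step, not the code httpd itself runs; hex_value and percent_decode are hypothetical helper names.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes %XX escapes from src into dst (dst_size bytes, NUL included).
 * Returns 0 on success, -1 on a malformed escape or a too-small buffer. */
int percent_decode(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    while (*src != '\0') {
        char ch = *src;
        if (ch == '%') {
            int hi = hex_value(src[1]);
            /* Only look at src[2] once src[1] is known to be a hex digit. */
            int lo = hi < 0 ? -1 : hex_value(src[2]);
            if (lo < 0) return -1;
            ch = (char)(hi * 16 + lo);
            src += 2;               /* skip the two hex digits */
        }
        if (n + 1 >= dst_size) return -1;   /* keep room for the NUL */
        dst[n++] = ch;
        src++;
    }
    if (dst_size == 0) return -1;
    dst[n] = '\0';
    return 0;
}
```

Decoding the path "/AP%26AC%3aHE" with this function yields "/AP&AC:HE", since %26 is '&' and %3a is ':'.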
    /** The PATH_INFO extracted from this request */
    char *path_info;
    /** The QUERY_ARGS extracted from this request */
    char *args;

The extra path information (PATH_INFO) in the request and its query-string arguments.
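
args holds the query string as one undivided string such as "a=b". A handler that wants a single parameter has to split it itself. The sketch below shows one way to do that in plain C; query_value is a hypothetical helper, and it returns the value still percent-encoded. In a real handler r->args can be NULL when the URL has no query string, so the sketch checks for that.

```c
#include <stddef.h>
#include <string.h>

/* Finds `key` in a query string such as "a=b&c=d" and copies its raw
 * (still percent-encoded) value into out, truncating if out is too small.
 * Returns 1 if the key was found, 0 otherwise. */
int query_value(const char *args, const char *key, char *out, size_t out_size)
{
    size_t key_len = strlen(key);
    const char *p = args;

    if (args == NULL || out_size == 0) return 0;

    while (*p != '\0') {
        const char *end = strchr(p, '&');
        size_t pair_len = end ? (size_t)(end - p) : strlen(p);

        /* A match needs the exact key followed immediately by '='. */
        if (pair_len > key_len &&
            strncmp(p, key, key_len) == 0 && p[key_len] == '=') {
            size_t val_len = pair_len - key_len - 1;
            if (val_len >= out_size) val_len = out_size - 1;
            memcpy(out, p + key_len + 1, val_len);
            out[val_len] = '\0';
            return 1;
        }
        p += pair_len;
        if (*p == '&') p++;         /* step over the separator */
    }
    return 0;
}
```

For the query string "a=b", looking up "a" copies "b" into the output buffer.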
    /** A struct containing the components of URI */
    apr_uri_t parsed_uri;

The detailed result of parsing the request URI into its components.
    char *useragent_ip;

The IP address the request came from.

    /** MIME header environment from the request */
    apr_table_t *headers_in;

The HTTP request headers, stored as a table.

For the basic data types, a routine is easy to write:

    if (r->the_request) {
        ap_rprintf(r, "the request : %s\n", r->the_request);
    }
    else {
        ap_rprintf(r, "the request is NULL\n");
    }

    if (r->protocol) {
        ap_rprintf(r, "protocol : %s\n", r->protocol);
    }
    else {
        ap_rprintf(r, "protocol is NULL\n");
    }

    ap_rprintf(r, "proto_num is %d\n", r->proto_num);

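For the request time, which is of type apr_time_t, see the introduction to modules in "Server Setup Notes: Apache Module Development Basics". After reading the APR source, we can write the following routine: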

    #include <string.h>   /* memset */

    static void print_time(request_rec *r)
    {
        /* Without a request there is nowhere to write output, so just return.
         * (Calling ap_rprintf with a NULL request would crash.) */
        if (!r) {
            return;
        }

        /* Human-readable form of the request time. */
        char data_str[128] = {0};
        apr_status_t status = apr_ctime(data_str, r->request_time);
        if (APR_SUCCESS != status) {
            ap_rprintf(r, "apr_ctime error\n");
        }
        else {
            ap_rprintf(r, "ctime\t:\t%s\n", data_str);
        }

        /* Request time expanded into calendar fields, in GMT. */
        apr_time_exp_t exp_t;
        memset(&exp_t, 0, sizeof(exp_t));
        status = apr_time_exp_gmt(&exp_t, r->request_time);
        if (APR_SUCCESS != status) {
            ap_rprintf(r, "apr_time_exp_gmt error\n");
        }
        else {
            ap_rprintf(r, "exp time\t:\n");
            ap_rprintf(r, "\ttm_usec\t:\t%d\n", exp_t.tm_usec);
            ap_rprintf(r, "\ttm_sec\t:\t%d\n", exp_t.tm_sec);
            ap_rprintf(r, "\ttm_min\t:\t%d\n", exp_t.tm_min);
            ap_rprintf(r, "\ttm_hour\t:\t%d\n", exp_t.tm_hour);
            ap_rprintf(r, "\ttm_mday\t:\t%d\n", exp_t.tm_mday);
            ap_rprintf(r, "\ttm_mon\t:\t%d\n", exp_t.tm_mon);
            ap_rprintf(r, "\ttm_year\t:\t%d\n", exp_t.tm_year);
            ap_rprintf(r, "\ttm_wday\t:\t%d\n", exp_t.tm_wday);
            ap_rprintf(r, "\ttm_yday\t:\t%d\n", exp_t.tm_yday);
            ap_rprintf(r, "\ttm_isdst\t:\t%d\n", exp_t.tm_isdst);
            ap_rprintf(r, "\ttm_gmtoff\t:\t%d\n", exp_t.tm_gmtoff);
        }
    }

apr_time_exp_t is defined in apr_time.h:

    /**
     * a structure similar to ANSI struct tm with the following differences:
     *  - tm_usec isn't an ANSI field
     *  - tm_gmtoff isn't an ANSI field (it's a BSDism)
     */
    struct apr_time_exp_t {
        /** microseconds past tm_sec */
        apr_int32_t tm_usec;
        /** (0-61) seconds past tm_min */
        apr_int32_t tm_sec;
        /** (0-59) minutes past tm_hour */
        apr_int32_t tm_min;
        /** (0-23) hours past midnight */
        apr_int32_t tm_hour;
        /** (1-31) day of the month */
        apr_int32_t tm_mday;
        /** (0-11) month of the year */
        apr_int32_t tm_mon;
        /** year since 1900 */
        apr_int32_t tm_year;
        /** (0-6) days since Sunday */
        apr_int32_t tm_wday;
        /** (0-365) days since January 1 */
        apr_int32_t tm_yday;
        /** daylight saving time */
        apr_int32_t tm_isdst;
        /** seconds east of UTC */
        apr_int32_t tm_gmtoff;
    };

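
An apr_time_t is a 64-bit count of microseconds since the Unix epoch, which is why the expanded structure carries a tm_usec field that struct tm lacks. The sketch below reproduces the expansion with the standard C library alone, so it can be tried without APR. expand_utc and USEC_PER_SEC are names invented for this example (APR's own constant is APR_USEC_PER_SEC). The timestamp used in the usage note is reconstructed from the calendar fields of the sample run in this article, so treat it as an assumption.

```c
#include <stdint.h>
#include <time.h>

#define USEC_PER_SEC 1000000LL  /* stand-in for APR_USEC_PER_SEC */

/* Splits a microsecond timestamp into whole seconds and leftover
 * microseconds, then expands the seconds into UTC calendar fields.
 * Returns 0 on success, -1 if the time cannot be represented. */
int expand_utc(int64_t usec_since_epoch, struct tm *out_tm, int *out_usec)
{
    time_t secs = (time_t)(usec_since_epoch / USEC_PER_SEC);
    struct tm *tmp = gmtime(&secs);   /* not thread-safe; fine for a demo */

    if (tmp == NULL) return -1;
    *out_tm = *tmp;
    *out_usec = (int)(usec_since_epoch % USEC_PER_SEC);
    return 0;
}
```

Passing 1424082039200039 gives 16 February 2015, 10:20:39 UTC with 200039 leftover microseconds. As in apr_time_exp_t, the month is zero-based (1 means February) and the year counts from 1900 (115 means 2015).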
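A routine for apr_uri_t, the structure that holds the already-parsed request URI, is just as simple, so I will not list it. The structure definition alone makes it clear: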

    /**
     * A structure to encompass all of the fields in a uri
     */
    struct apr_uri_t {
        /** scheme ("http"/"ftp"/...) */
        char *scheme;
        /** combined [user[:password]\@]host[:port] */
        char *hostinfo;
        /** user name, as in http://user:passwd\@host:port/ */
        char *user;
        /** password, as in http://user:passwd\@host:port/ */
        char *password;
        /** hostname from URI (or from Host: header) */
        char *hostname;
        /** port string (integer representation is in "port") */
        char *port_str;
        /** the request path (or NULL if only scheme://host was given) */
        char *path;
        /** Everything after a '?' in the path, if present */
        char *query;
        /** Trailing "#fragment" string, if present */
        char *fragment;

        /** structure returned from gethostbyname() */
        struct hostent *hostent;

        /** The port number, numeric, valid only if port_str != NULL */
        apr_port_t port;

        /** has the structure been initialized */
        unsigned is_initialized:1;

        /** has the DNS been looked up yet */
        unsigned dns_looked_up:1;
        /** has the dns been resolved yet */
        unsigned dns_resolved:1;
    };

The awkward part of these routines is walking an apr_table_t. Traversal code for this table is hard to find online, so I based the following on the code in apr_table_clone:

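    static void print_table(request_rec *r, const apr_table_t *t)
    {
        /* Nothing to print without a request to write to or a table to read. */
        if (!r || !t) {
            return;
        }

        /* A table is backed by an array of key/value entries. */
        const apr_array_header_t *array = apr_table_elts(t);
        const apr_table_entry_t *elts = (const apr_table_entry_t *)array->elts;

        for (int i = 0; i < array->nelts; i++) {
            ap_rprintf(r, "\t%s : %s\n", elts[i].key, elts[i].val);
        }
    }
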
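
The same walk over key/value entries can also answer "what is the value of header X?". The standalone sketch below mimics that with a plain array so it compiles without APR. struct kv_entry is a hypothetical stand-in for apr_table_entry_t, and same_name and find_value are illustrative names. In a real module, APR's apr_table_get performs this lookup and also ignores case in the key.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for apr_table_entry_t: one key/value pair. */
struct kv_entry {
    const char *key;
    const char *val;
};

/* Case-insensitive string equality; HTTP header names ignore case. */
int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/* Walks the entries the same way print_table does and returns the first
 * value whose key matches, or NULL if no entry matches. */
const char *find_value(const struct kv_entry *elts, int nelts, const char *key)
{
    for (int i = 0; i < nelts; i++) {
        if (elts[i].key != NULL && same_name(elts[i].key, key)) {
            return elts[i].val;
        }
    }
    return NULL;
}
```

With entries for Host and Connection, a lookup of "connection" in lower case still finds the Connection value, while a lookup of a header that is absent returns NULL.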
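We request the URL http://192.168.191.129/AP%26AC%3aHE?a=b#c. The browser does not send the #c fragment to the server, so it is absent from the request line and the parsed fragment is NULL.

The response is: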

    headers_in start
    Host : 192.168.191.129
    Connection : keep-alive
    Cache-Control : max-age=0
    Accept : text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    User-Agent : Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
    Accept-Encoding : gzip,deflate,sdch
    Accept-Language : zh-CN,zh;q=0.8
    headers_in end
    headers_out start
    headers_out end
    the request : GET /AP%26AC%3aHE?a=b HTTP/1.1
    protocol : HTTP/1.1
    proto_num is 1001
    method : GET
    host name : 192.168.191.129
    unparsed uri : /AP%26AC%3aHE?a=b
    uri : /AP&AC:HE
    filename : /usr/local/apache2/htdocs/AP&AC:HE
    path info :
    args : a=b
    user is NULL
    log id is NULL
    useragent ip : 192.168.191.1
    ctime:Mon Feb 16 18:20:39 2015
    exp time:
    tm_usec:200039
    tm_sec:39
    tm_min:20
    tm_hour:10
    tm_mday:16
    tm_mon:1
    tm_year:115
    tm_wday:1
    tm_yday:46
    tm_isdst:0
    tm_gmtoff:0
    scheme is NULL
    hostinfo is NULL
    user is NULL
    password is NULL
    hostname is NULL
    port_str is NULL
    path : /AP&AC:HE
    query : a=b
    fragment is NULL
    The sample page from mod_get_request.c

