Nginx source code analysis: the HTTP decoding implementation

Source: Internet
Author: User

This article analyzes how Nginx parses and stores HTTP requests, and how well it identifies and handles invalid or even malicious requests. Nginx parses the HTTP protocol with state machines. It has some fault tolerance, but that tolerance is not comprehensive.
Related Configuration

Configuration related to decoding

merge_slashes
Syntax         merge_slashes on | off
Default value  on
Context        http, server
Description    Enables or disables the merging of adjacent slashes when the request line is parsed. For example, a request for http://www.example.com/foo//bar/ produces the following $uri values: with on, /foo/bar/; with off, /foo//bar/. Keep in mind that static location matching is a plain string comparison, so if merge_slashes is off, a request such as /foo//bar/ will not match location /foo/bar/. The directive belongs to the HTTP core module (HttpCoreModule).
underscores_in_headers
Syntax         underscores_in_headers on | off
Default value  off
Context        http, server
Description    Allows or disallows underscores in request header names.
ignore_invalid_headers
Syntax         ignore_invalid_headers on | off
Default value  on
Context        http, server
Description    Controls whether headers with invalid names are ignored. A valid name consists of letters, digits and hyphens, and possibly underscores (controlled by underscores_in_headers). If the directive is specified at the server level, its value is used only if that server is the default one. The value then applies to all virtual hosts listening on the same address and port.
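
The effect of merge_slashes on location matching can be illustrated with a small C sketch. It is not taken from the Nginx source: sketch_merge_slashes and sketch_location_matches are hypothetical helpers that only mirror the behavior described above (adjacent slashes collapsed, static locations compared as plain strings).

```c
#include <string.h>

/*
 * Illustration only, not nginx code: copy `in` to `out`, collapsing each run
 * of adjacent '/' into one '/' when `merge` is non-zero.
 * `out` must have room for strlen(in) + 1 bytes.
 */
void sketch_merge_slashes(const char *in, char *out, int merge)
{
    char prev = '\0';

    for (; *in != '\0'; in++) {
        if (merge && *in == '/' && prev == '/') {
            continue;               /* drop the repeated slash */
        }
        *out++ = *in;
        prev = *in;
    }
    *out = '\0';
}

/* A static location match modeled as a plain string comparison. */
int sketch_location_matches(const char *uri, const char *location)
{
    return strcmp(uri, location) == 0;
}
```

With merge set to 1, "/foo//bar/" becomes "/foo/bar/" and matches the location "/foo/bar/". With merge set to 0 the string is left unchanged and the comparison fails.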



Request Body Related Configuration

client_body_buffer_size
Syntax         client_body_buffer_size size
Default value  8k|16k
Context        http, server, location
Description    Specifies the buffer size for the client request body. If the request body is larger than the buffer, the body, in whole or in part, is written to a temporary file. The default size is twice the memory page size, which is 8K or 16K depending on the platform. When the Content-Length request header specifies a value smaller than the buffer size, Nginx uses the smaller value, so it does not allocate a full-size buffer for every request.
client_body_in_single_buffer
Syntax         client_body_in_single_buffer on | off
Default value  off
Context        http, server, location
Description    Tells Nginx to keep the entire client request body in a single buffer. The directive is recommended when the $request_body variable is used, because it reduces copy operations. Note that when the request body does not fit in the buffer (see client_body_buffer_size), it is still saved to disk.
client_body_in_file_only
Syntax         client_body_in_file_only on | clean | off
Default value  off
Context        http, server, location
Description    Forces Nginx to always store the request body in a temporary disk file, even if the request body size is 0. Note that when the directive is set to on, the file is not removed after the request has completed. The directive is intended for debugging and for the $r->request_body_file method of the embedded Perl module.
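
The buffer sizing rules described for the request body directives can be summarized in a small C sketch. This is a simplified model, not Nginx code: body_plan_t and sketch_plan_body are hypothetical names, and the sketch ignores chunked bodies and data that was already read along with the request headers.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not an nginx structure. */
typedef struct {
    size_t buffer_size;     /* bytes to allocate for the in-memory buffer */
    int    spills_to_file;  /* 1 if the body goes to a temporary file */
} body_plan_t;

/*
 * Model of the rules above: allocate the smaller of Content-Length and
 * client_body_buffer_size, and use a temporary file when the body does not
 * fit in the buffer or when client_body_in_file_only is enabled.
 */
body_plan_t sketch_plan_body(size_t content_length, size_t buffer_size,
                             int in_file_only)
{
    body_plan_t plan;

    plan.buffer_size = content_length < buffer_size ? content_length
                                                    : buffer_size;
    plan.spills_to_file = in_file_only || content_length > buffer_size;

    return plan;
}
```

For a 100-byte body and an 8K buffer the sketch allocates only 100 bytes and stays in memory. A 20000-byte body gets the full 8K buffer and is marked for a temporary file.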


Data Structure

All the results of the decoding are saved in the request structure.

ngx_http_request_t {
    ngx_buf_t               *header_in;    /* buffer that holds the raw request */
    ngx_http_headers_in_t    headers_in;   /* contains a list that holds the parsed request headers */
    ngx_http_headers_out_t   headers_out;  /* contains a list that holds the response headers */
    ...
}

The structure that stores the request headers

ngx_http_headers_in_t structure

typedef struct {
    ngx_list_t                        headers;
    ngx_table_elt_t                  *host;
    ngx_table_elt_t                  *connection;    
    ngx_str_t                         user;
    ngx_str_t                         passwd;
    ngx_array_t                       cookies;

    ngx_str_t                         server;
    off_t                             content_length_n;
    time_t                            keep_alive_n;

    unsigned                          connection_type:2;
    unsigned                          chunked:1;
    unsigned                          msie:1;
    unsigned                          msie6:1;
    unsigned                          opera:1;
    unsigned                          gecko:1;
    unsigned                          chrome:1;
    unsigned                          safari:1;
    unsigned                          konqueror:1;
} ngx_http_headers_in_t;


Decoding Process



ngx_http_process_request_line(ngx_event_t *rev)     // main entry point for request line decoding

ngx_http_process_request_headers(ngx_event_t *rev)  // entry point for request header decoding

The handlers for all request headers are listed in the ngx_http_headers_in array in ngx_http_request.c.
For example:
The handler for the Host header is ngx_http_process_host.
ngx_http_process_host validates the Host value, finds the virtual server, and locates the corresponding server configuration.
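
The idea of a table that maps header names to handlers can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: all names below are hypothetical, the handlers are stubs, and the lookup is a linear, case-insensitive search.

```c
#include <stddef.h>
#include <ctype.h>

typedef int (*header_handler_pt)(const char *value);

typedef struct {
    const char        *name;
    header_handler_pt  handler;
} header_entry_t;

/* Stub handler: rejects an empty Host value. */
int sketch_process_host(const char *value)
{
    return value[0] != '\0' ? 0 : -1;
}

/* Stub handler for headers that need no special treatment. */
int sketch_process_plain(const char *value)
{
    (void) value;
    return 0;
}

/* Hypothetical stand-in for the ngx_http_headers_in table. */
const header_entry_t sketch_headers_in[] = {
    { "Host",       sketch_process_host  },
    { "User-Agent", sketch_process_plain },
    { "Referer",    sketch_process_plain },
    { NULL,         NULL                 }
};

/* Case-insensitive comparison of two header names. */
int sketch_same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns the handler registered for `name`, or NULL if there is none. */
header_handler_pt sketch_find_handler(const char *name)
{
    const header_entry_t *e;

    for (e = sketch_headers_in; e->name != NULL; e++) {
        if (sketch_same_name(e->name, name)) {
            return e->handler;
        }
    }
    return NULL;
}
```

A header without an entry in the table gets no handler; in this model it would simply stay in the generic header list.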

Request initialization: ngx_http_request_t *ngx_http_create_request(ngx_connection_t *c)

r->header_in = hc->nbusy ? hc->busy[0] : c->buffer;
r->http_state = NGX_HTTP_READING_REQUEST_STATE;
Decode Request Line

ngx_http_process_request_line(ngx_event_t *rev)
{
    ngx_http_read_request_header(r);               /* reads data from the connection into the header_in buffer; returns the number of bytes read, or an error code */
    ngx_http_parse_request_line(r, r->header_in);  /* state machine that parses the request line and separates the method, schema, host, port, URI and protocol version */
}
  



[Figure: state machine diagram for request line parsing]
  


CR = '\r'
LF = '\n'

Regular expressions that correspond to the request line

Part of the request line   Corresponding regular expression
Method                     ([A-Z_]+)
Schema                     ([a-zA-Z]+)
Host                       (\[[a-zA-Z0-9:._~!$&'\(\)*+,;=-]*\]|[a-zA-Z0-9.-]*)
Port                       ([0-9]*)?
URI                        (/([^CRLF.%/?#+\0]+/)*([^CRLF.%/?#+\0]+[%?#]|[.%/?#+])[^ CRLF\0]*)?(?=CR|LF| +H)   (the characters . % / ? # + are handled specially; '\0' is never allowed)
Protocol version           HTTP/[1-9]+\.[0-9]+(?= *(CR)?LF)
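
To make the state machine approach concrete, here is a heavily simplified C sketch of a request line parser. It is not the Nginx implementation: the state names, request_line_t and sketch_parse_request_line are hypothetical, only URIs that start with "/" are accepted (no schema, host or port), and the version token is captured but not validated.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *method;   size_t method_len;
    const char *uri;      size_t uri_len;
    const char *version;  size_t version_len;  /* 0 means no version (HTTP/0.9 style) */
} request_line_t;

enum {
    SW_METHOD, SW_SPACES_BEFORE_URI, SW_URI,
    SW_SPACES_BEFORE_VERSION, SW_VERSION, SW_ALMOST_DONE
};

/* Returns 0 on success, -1 on an invalid or incomplete request line. */
int sketch_parse_request_line(const char *buf, size_t len, request_line_t *r)
{
    int    state = SW_METHOD;
    size_t i;

    memset(r, 0, sizeof(*r));
    r->method = buf;

    for (i = 0; i < len; i++) {
        char ch = buf[i];

        switch (state) {

        case SW_METHOD:                       /* ([A-Z_]+) */
            if (ch == ' ') {
                if (r->method_len == 0) return -1;
                state = SW_SPACES_BEFORE_URI;
            } else if ((ch >= 'A' && ch <= 'Z') || ch == '_') {
                r->method_len++;
            } else {
                return -1;
            }
            break;

        case SW_SPACES_BEFORE_URI:
            if (ch == ' ') break;
            if (ch != '/') return -1;         /* sketch: path-only URIs */
            r->uri = &buf[i];
            r->uri_len = 1;
            state = SW_URI;
            break;

        case SW_URI:
            if (ch == ' ')       state = SW_SPACES_BEFORE_VERSION;
            else if (ch == '\r') state = SW_ALMOST_DONE;
            else if (ch == '\n') return 0;    /* no version given */
            else if (ch == '\0') return -1;
            else                 r->uri_len++;
            break;

        case SW_SPACES_BEFORE_VERSION:
            if (ch == ' ') break;
            if (ch == '\r') { state = SW_ALMOST_DONE; break; }
            if (ch == '\n') return 0;
            if (ch != 'H') return -1;         /* the URI ends at " H" */
            r->version = &buf[i];
            r->version_len = 1;
            state = SW_VERSION;
            break;

        case SW_VERSION:
            if (ch == '\r')      state = SW_ALMOST_DONE;
            else if (ch == '\n') return 0;
            else                 r->version_len++;
            break;

        case SW_ALMOST_DONE:                  /* CR must be followed by LF */
            return ch == '\n' ? 0 : -1;
        }
    }

    return -1;   /* line not finished; a real parser would wait for more data */
}
```

Each byte is examined exactly once and the parser only records pointers and lengths into the input buffer, which is the main attraction of the state machine design.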


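ngx_http_process_request_uri  // parses the URI and handles some of the special characters

Definitions:
complex_uri: the URI contains "//", "/." or "#"
quoted_uri: the URI contains "%"
plus_in_uri: the URI contains "+"
space_in_uri: the URI contains " "

uri_ext: points just past the last "." in the final segment of the URI, provided that the "." does not immediately follow a "/".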


Logic

if (r->uri_ext) {
        if (r->args_start) {
            r->exten.len = r->args_start - 1 - r->uri_ext;
        } else {
            r->exten.len = r->uri_end - r->uri_ext;
        }
        r->exten.data = r->uri_ext;
}
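
The extension logic can be tried out in isolation with the following C sketch. It is an approximation rather than Nginx code: sketch_uri_ext is a hypothetical function that scans a NUL-terminated URI, whereas Nginx records uri_ext while the request line state machine runs.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns a pointer to the extension inside `uri` and stores its length in
 * `*len`, or returns NULL (length 0) if the last segment has no extension.
 * The scan stops at '?' or '#', like the args_start / uri_end bounds above.
 */
const char *sketch_uri_ext(const char *uri, size_t *len)
{
    const char *ext = NULL;   /* plays the role of r->uri_ext */
    const char *p;

    for (p = uri; *p != '\0' && *p != '?' && *p != '#'; p++) {
        if (*p == '/') {
            ext = NULL;                       /* a new segment starts */
        } else if (*p == '.' && p > uri && p[-1] != '/') {
            ext = p + 1;                      /* just past the dot */
        }
    }

    *len = ext != NULL ? (size_t) (p - ext) : 0;
    return ext;
}
```

For "/a/b/index.html?x=1" the sketch reports the four bytes "html". For "/a/.hidden" it reports no extension, because the dot directly follows a slash.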

If the URI contains "//", "/.", "#" or "%", a new buffer is allocated for the URI and the URI is parsed again:

if (r->complex_uri || r->quoted_uri) {
    r->uri.data = ngx_pnalloc(r->pool, r->uri.len + 1);
    ngx_http_parse_complex_uri(r, cscf->merge_slashes);
}

[Figure: state machine diagram for URI parsing]

Handling of special characters in the request line URI



Nginx does not support part 3 of the URI, and therefore does not support @.

The rules below refer to parts 6, 7 and 8.

Character   Handling
%           Must be followed by two hexadecimal digits; otherwise the parser returns NGX_HTTP_PARSE_INVALID_REQUEST.
/           Adjacent slashes can be merged; this is controlled by the merge_slashes directive (on/off).
/../        Moves up to the parent directory. For example, /foo/bar/../abc becomes /foo/abc. There must be a directory level in front of /../, otherwise an error is returned; the path can never go above the top level.
.           A dot that comes after the last / and does not immediately follow that / marks the start of r->uri_ext. The extension ends in front of the arguments, or at the end of the URI.
#           Marks the end of the URI; everything after # is ignored. If there is no #, the URI ends at optional spaces followed by (CR)?LF, or at a space followed by H.
?           Everything between ? and # is treated as the arguments. If there is no #, everything after ? is the arguments.
+           Sets r->plus_in_uri = 1.
\r          Must be followed by \n to mark the end of the request line.
\n          Marks the end of the request line.
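
Two of these rules, the % escape check and the /../ handling, are sketched below in C. This is an illustration only and does not reproduce ngx_http_parse_complex_uri: the function names are hypothetical, the two steps run as separate passes, and rejecting %00 is a choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

static int hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/*
 * Decodes %XX escapes from `in` into `out` (room for strlen(in) + 1 bytes).
 * Returns 0, or -1 when '%' is not followed by two hexadecimal digits.
 */
int sketch_unescape(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in == '%') {
            int hi = hex_value(in[1]);
            int lo = hi < 0 ? -1 : hex_value(in[2]);

            if (lo < 0 || hi * 16 + lo == 0) {
                return -1;          /* bad escape, or %00 (sketch choice) */
            }
            *out++ = (char) (hi * 16 + lo);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return 0;
}

/*
 * Resolves "/.." segments in place.
 * Returns 0, or -1 when ".." would climb above the top level.
 */
int sketch_resolve_dotdot(char *path)
{
    char *src = path, *dst = path;

    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '.' && src[2] == '.'
            && (src[3] == '/' || src[3] == '\0'))
        {
            if (dst == path) {
                return -1;          /* no directory level to go back to */
            }
            do {
                dst--;              /* drop the previous segment */
            } while (dst > path && *dst != '/');
            src += 3;
            if (*src == '\0') {
                *dst++ = '/';       /* keep the trailing slash */
            }
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return 0;
}
```

Applied to "/foo/bar/../abc", the second function rewrites the buffer to "/foo/abc". Applied to "/../abc" it reports an error, because there is no directory level in front of the first /../.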


HTTP 0.9 support

HTTP 0.9 is supported. If the protocol version is lower than 1.0, the request headers are not read.

Supported methods

Method      Notes
GET
PUT
POST
COPY
MOVE
LOCK
HEAD
MKCOL
PATCH
TRACE       Rejected with NGX_HTTP_NOT_ALLOWED, error code 405
DELETE
UNLOCK
OPTIONS
PROPFIND
PROPPATCH

Note: "supported" here means that Nginx recognizes these methods while decoding. It does not mean that later processing stages accept them. For example, when Nginx encounters the TRACE method it returns 405 Not Allowed.
Unknown methods: if the method name matches [A-Z_]+, the request is let through and handed over to later processing.
If the method name does not match [A-Z_]+, the request is rejected as a bad request.
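
The three outcomes for a method name (recognized, unknown but well formed, invalid) can be modeled with a short C sketch. The function and enum names are hypothetical, and the list of known methods simply repeats the methods listed above.

```c
#include <stddef.h>
#include <string.h>

enum {
    SKETCH_METHOD_INVALID = -1,   /* does not match [A-Z_]+ */
    SKETCH_METHOD_UNKNOWN = 0,    /* well formed, but not in the list */
    SKETCH_METHOD_KNOWN   = 1
};

int sketch_classify_method(const char *method)
{
    static const char *const known[] = {
        "GET", "PUT", "POST", "COPY", "MOVE", "LOCK", "HEAD", "MKCOL",
        "PATCH", "TRACE", "DELETE", "UNLOCK", "OPTIONS", "PROPFIND",
        "PROPPATCH"
    };
    const char *p;
    size_t      i;

    if (*method == '\0') {
        return SKETCH_METHOD_INVALID;
    }

    /* every character must be an uppercase letter or an underscore */
    for (p = method; *p != '\0'; p++) {
        if (!((*p >= 'A' && *p <= 'Z') || *p == '_')) {
            return SKETCH_METHOD_INVALID;
        }
    }

    for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strcmp(method, known[i]) == 0) {
            return SKETCH_METHOD_KNOWN;
        }
    }

    return SKETCH_METHOD_UNKNOWN;
}
```

In this model "PURGE" is classified as unknown and would be passed on to later processing, while "get" is invalid because of the lowercase letters.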


Decoding the request headers: ngx_http_process_request_headers(ngx_event_t *rev)

ngx_http_read_request_header  // reads data from the network connection
ngx_http_parse_header_line(r, r->header_in, cscf->underscores_in_headers);

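Regular expressions that correspond to a request header
Name   [0-9a-zA-Z-]+   Invalid characters are ignored. The lowercased copy of the name is kept in a 32-byte buffer; once a name exceeds 32 characters, the copy is overwritten from the beginning.
       _ is invalid by default; see the underscores_in_headers directive (passed to the parser as allow_underscores).

Value  [^CRLF\0]+   No length limit; ends with CRLF.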
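
The header name rules can be sketched in C as follows. This is a simplified model rather than the code of ngx_http_parse_header_line: sketch_check_header_name and SKETCH_LC_HEADER_LEN are hypothetical names, and the function works on a complete NUL-terminated name instead of consuming bytes one state at a time.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_LC_HEADER_LEN 32   /* size of the lowercased copy */

/*
 * Returns 1 if `name` contains only valid characters, 0 otherwise.
 * Valid characters are copied in lowercase into `lowcase`; the write index
 * wraps around after 32 bytes, so longer names overwrite the beginning.
 */
int sketch_check_header_name(const char *name, int allow_underscores,
                             char lowcase[SKETCH_LC_HEADER_LEN])
{
    size_t      i = 0;
    int         valid = 1;
    const char *p;

    if (*name == '\0') {
        return 0;
    }

    for (p = name; *p != '\0'; p++) {
        char ch = *p;
        char c;

        if (ch >= 'A' && ch <= 'Z') {
            c = (char) (ch - 'A' + 'a');
        } else if ((ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9')
                   || ch == '-')
        {
            c = ch;
        } else if (ch == '_' && allow_underscores) {
            c = ch;
        } else {
            valid = 0;              /* invalid character: skipped */
            continue;
        }

        lowcase[i] = c;
        i = (i + 1) & (SKETCH_LC_HEADER_LEN - 1);   /* wrap after 32 bytes */
    }

    return valid;
}
```

With allow_underscores set to 0, "X_Real_IP" is reported as invalid, which is the case where ignore_invalid_headers decides whether the line is dropped.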


[Figure: state machine diagram for request header parsing]




Handling of invalid headers: if a header name contains invalid characters, the header is considered invalid. By default Nginx discards the line; this behavior is controlled by the ignore_invalid_headers directive.

Handling of duplicate headers:
Strategy one: the second and later occurrences of the same header are ignored (ngx_http_process_header_line).
Strategy two: if the header is repeated, a 400 error is returned (ngx_http_process_unique_header_line).
Strategy three: multiple occurrences are allowed and are saved in an array (ngx_http_process_multi_header_lines), e.g. X-Forwarded-For and Cookie.

Strategy four: use the header at the top.
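
The first three strategies can be contrasted in a few lines of C. The sketch is hypothetical throughout: sketch_headers_t, the function names and the fixed-size array are inventions for illustration, and only the behavior (ignore later duplicates, reject duplicates with 400, collect all values) follows the description above.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_MULTI 8   /* arbitrary limit for this sketch */

typedef struct {
    const char *host;                       /* handled with strategy one */
    const char *content_length;             /* handled with strategy two */
    const char *cookies[SKETCH_MAX_MULTI];  /* handled with strategy three */
    size_t      ncookies;
} sketch_headers_t;

/* Strategy one: keep the first value, silently ignore later ones. */
int sketch_keep_first(const char **slot, const char *value)
{
    if (*slot == NULL) {
        *slot = value;
    }
    return 0;
}

/* Strategy two: a repeated header is an error (400 Bad Request). */
int sketch_require_unique(const char **slot, const char *value)
{
    if (*slot != NULL) {
        return 400;
    }
    *slot = value;
    return 0;
}

/* Strategy three: every occurrence is appended to an array. */
int sketch_collect(sketch_headers_t *h, const char *value)
{
    if (h->ncookies == SKETCH_MAX_MULTI) {
        return -1;                  /* sketch-only capacity limit */
    }
    h->cookies[h->ncookies++] = value;
    return 0;
}
```

Feeding two Host values through sketch_keep_first leaves the first one in place, while a second Content-Length passed to sketch_require_unique produces 400.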

Request header       Strategy  Header type     Value and notes
Connection           Four      General         If the HTTP version is greater than 1.0, the default is keep-alive; otherwise close. Only close or keep-alive is accepted; anything else is an error.
Host                 One       Request header  For HTTP versions above 1.0, Host must not be empty; otherwise an error is returned.
User-Agent           One       Request header  Nginx recognizes the following six browsers: MSIE (and MSIE 6), Opera, Gecko, Chrome, Safari, Konqueror.
Referer              One       Request header
Content-Type         One       Entity header   Indicates the media type of the entity body sent to the recipient or, for the HEAD method, the media type that would have been sent had the request been a GET.
Range                One       Request header
Transfer-Encoding    One       General
Upgrade              One       General
Accept-Encoding      One       Request header
X-Real-IP            One
Accept               One       Request header
Accept-Language      One       Request header
Depth                One
Destination          One
Overwrite            One
Date                 One       General
Via                            General
Keep-Alive           One
If-Modified-Since    Two       Request header
If-Unmodified-Since  Two       Request header
If-Match             Two       Request header
If-None-Match        Two       Request header
If-Range             Two       Request header
Expect               Two       Request header
Content-Length       Two       Entity header   Must be a number and must not be negative; checked in ngx_http_process_request_header.
Authorization        Two       Request header
X-Forwarded-For      Three
Cookie               Three
Other headers: they are kept in the headers list; how they are handled is up to your own code.



Handling of conflicting or related headers: Transfer-Encoding and Content-Length
In ngx_http_process_request_header, when the Transfer-Encoding value is chunked, Content-Length is ignored. The Transfer-Encoding value can only be identity or chunked; any other value is an error.
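
The interplay between the two headers can be written down as a small C sketch. It is a model of the rule, not the code of ngx_http_process_request_header: body_headers_t and the function names are hypothetical, and a generic -1 stands in for whatever error response Nginx actually sends.

```c
#include <stddef.h>
#include <ctype.h>

typedef struct {
    int  chunked;            /* 1 if the body uses chunked encoding */
    long content_length_n;   /* -1 means "no usable Content-Length" */
} body_headers_t;

/* Case-insensitive string equality. */
int sketch_ieq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;
}

/*
 * `transfer_encoding` may be NULL when the header is absent;
 * `content_length` is -1 when no Content-Length was sent.
 * Returns 0 on success, -1 for an unsupported Transfer-Encoding.
 */
int sketch_check_body_headers(const char *transfer_encoding,
                              long content_length, body_headers_t *out)
{
    out->chunked = 0;
    out->content_length_n = content_length;

    if (transfer_encoding == NULL) {
        return 0;
    }

    if (sketch_ieq(transfer_encoding, "chunked")) {
        out->chunked = 1;
        out->content_length_n = -1;   /* Content-Length is ignored */
        return 0;
    }

    if (sketch_ieq(transfer_encoding, "identity")) {
        return 0;
    }

    return -1;                        /* neither identity nor chunked */
}
```

A request that carries both "Transfer-Encoding: chunked" and "Content-Length: 120" ends up with chunked set and the length discarded, which is the conflict resolution described above.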


Connection and Keep-Alive
If Connection is keep-alive, the Keep-Alive value takes effect.
If the Connection request header does not say close and the HTTP version is greater than 1.0, Connection defaults to keep-alive.
Multi-line request headers are not supported
According to the HTTP/1.1 specification, HTTP itself allows a request header to span multiple lines: any line that begins with a space is treated as a continuation of the previous line. For example:
X-Random-Comment: This is a long sentence,
 so we have to wrap it up and look at it more neatly.

Nginx does not support multi-line request headers.

Cookie processing
Nginx does not parse the Cookie value any further. For a library that does, see:
https://github.com/cloudflare/lua-resty-cookie


Decoding the request body

The Nginx core does not read the request body on its own initiative; that job is left to the modules in the request processing phases. The core does provide the ngx_http_read_client_request_body() interface for reading the request body, and the ngx_http_discard_request_body() interface for discarding it. A module in any phase that is interested in the request body, or that wants to throw away the body sent by the client, can call the corresponding interface. These two interfaces are the standard way to handle the request body. For the body-related directives in the configuration file (such as client_body_in_file_only and client_body_buffer_size) and the built-in body-related variables (such as $request_body and $request_body_file) to work as expected, modules must go through these interfaces. If a module needs its own code path for handling the request body, it should stay as compatible as possible with Nginx's default behavior.


Reading the request body generally happens in Nginx's content handler. Some built-in modules, such as the proxy, fastcgi and uwsgi modules, must forward the client's request body (if any) to the backend service process, so they all call ngx_http_read_client_request_body() to read it. Note that these modules start forwarding data to the backend only after the client's request body has been read in full.

ngx_http_discard_request_body
Using the interfaces

Available Variables

The following are only examples, not a complete list.

Variables of the form $http_xxxx, where xxxx is the header name with hyphens replaced by underscores and uppercase letters converted to lowercase.

For example, the variable for the X-Real-IP header is $http_x_real_ip.
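
The naming rule for $http_ variables is mechanical enough to express in a short C sketch. sketch_http_var_name is a hypothetical helper written for this article; it only demonstrates the rule and is not how Nginx looks variables up internally.

```c
#include <ctype.h>
#include <string.h>

/*
 * Writes the variable name for `header` into `out`, which must have room
 * for strlen(header) + 7 bytes ("$http_" plus the terminating NUL).
 * Hyphens become underscores and uppercase letters become lowercase.
 */
void sketch_http_var_name(const char *header, char *out)
{
    memcpy(out, "$http_", 6);
    out += 6;

    for (; *header != '\0'; header++) {
        *out++ = (*header == '-')
                 ? '_'
                 : (char) tolower((unsigned char) *header);
    }
    *out = '\0';
}
```

Passing "X-Real-IP" yields "$http_x_real_ip", matching the example above.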


Similar variables:

$arg_xxxx

$cookie_xxxx


$uri
$request_body
This variable contains the request body. It is available in locations that are handled by proxy_pass or fastcgi_pass.
$request_body_file
The name of the temporary file that holds the client request body.



