Nginx Request Receiving Process (I)

Source: Internet
Author: User
Tags: epoll

This year our group plans to write a book on Nginx module development and internals. The whole book is published openly online and will be updated regularly at http://tengine.taobao.org/book/index.html. The book analyzes the Nginx 1.2.0 source, the environment is Linux, and the event processing model is epoll; most of the analysis rests on these assumptions. I am responsible for several of the chapters, so I am going to write a series of articles covering their content (mainly Nginx phase module development, the Nginx request processing flow, and so on). This article introduces how Nginx receives a request, including parsing the request header and reading the request body.

First, here is the basic format of an HTTP request as defined in RFC 2616:

    Request = Request-Line
              *(( general-header
               | request-header
               | entity-header ) CRLF)
              CRLF
              [ message-body ]

The first line is the request line, which describes the request method, the resources to access, and the HTTP version used:

    Request-Line = Method SP Request-URI SP HTTP-Version CRLF

The request methods are defined as follows; the most common are GET and POST:

    Method = "OPTIONS"
           | "GET"
           | "HEAD"
           | "POST"
           | "PUT"
           | "DELETE"
           | "TRACE"
           | "CONNECT"
           | extension-method
    extension-method = token
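The method grammar above can be turned into a small lookup. The following is only a sketch for illustration: the enum, the table, and the function name lookup_method are hypothetical and are not taken from the Nginx source. Any name outside the eight listed methods is reported as an extension method, and validation of token characters is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers for the methods listed in RFC 2616. */
typedef enum {
    M_OPTIONS, M_GET, M_HEAD, M_POST, M_PUT,
    M_DELETE, M_TRACE, M_CONNECT, M_EXTENSION
} http_method_t;

/*
 * Map a method name of the given length to an identifier.
 * The name does not need to be NUL-terminated, so it can point
 * straight into a receive buffer. Method names are case-sensitive.
 */
static http_method_t
lookup_method(const char *name, size_t len)
{
    static const struct {
        const char    *name;
        http_method_t  id;
    } table[] = {
        { "OPTIONS", M_OPTIONS }, { "GET", M_GET },
        { "HEAD", M_HEAD },       { "POST", M_POST },
        { "PUT", M_PUT },         { "DELETE", M_DELETE },
        { "TRACE", M_TRACE },     { "CONNECT", M_CONNECT },
    };
    size_t  i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strlen(table[i].name) == len
            && memcmp(table[i].name, name, len) == 0)
        {
            return table[i].id;
        }
    }

    /* Anything else is treated as an extension-method (token). */
    return M_EXTENSION;
}
```

A caller would pass the start of the method and its length as found in the request line, for example lookup_method(line, 3) for a line beginning with "GET ".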

The resource to access is identified by a Uniform Resource Identifier (URI). One of its more general forms (RFC 2396) is as follows:

    <scheme>://<authority><path>?<query>

In practice, the format of the Request-URI varies with the request method; usually only the path and query parts are written.

The HTTP version is defined as follows; versions 1.0 and 1.1 are the ones in general use today:

    HTTP/<major>.<minor>
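As a rough illustration of the version format, the sketch below parses a NUL-terminated string such as "HTTP/1.1" into its major and minor numbers. The function names parse_http_version and read_number are hypothetical, and the limit of three digits per number is an arbitrary choice made here to avoid overflow; Nginx itself parses the version inside its request line state machine rather than with a helper like this.

```c
#include <ctype.h>
#include <string.h>

/* Read 1 to 3 decimal digits at *p and advance *p past them. */
static int
read_number(const char **p, int *out)
{
    int  n = 0, digits = 0;

    while (isdigit((unsigned char) **p)) {
        if (++digits > 3) {
            return -1;              /* arbitrary limit, avoids overflow */
        }
        n = n * 10 + (**p - '0');
        (*p)++;
    }

    if (digits == 0) {
        return -1;                  /* at least one digit is required */
    }

    *out = n;
    return 0;
}

/* Parse "HTTP/<major>.<minor>"; returns 0 on success, -1 on error. */
static int
parse_http_version(const char *s, int *major, int *minor)
{
    if (strncmp(s, "HTTP/", 5) != 0) {
        return -1;                  /* the prefix is case-sensitive */
    }
    s += 5;

    if (read_number(&s, major) != 0) {
        return -1;
    }

    if (*s != '.') {
        return -1;
    }
    s++;

    if (read_number(&s, minor) != 0) {
        return -1;
    }

    return (*s == '\0') ? 0 : -1;   /* reject trailing characters */
}
```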

After the request line come the request headers. RFC 2616 defines three types of header that may appear in a request, general-header, request-header, and entity-header, and lists the common headers of each type. The entity-header type may also contain custom headers.


Now for how Nginx parses the request header. Request processing in Nginx involves two very important data structures, ngx_connection_t and ngx_http_request_t, which represent the connection and the request respectively. Both were described in detail in an earlier chapter of the book, so readers who do not remember them may want to go back and review it. The entire request processing flow, from beginning to end, corresponds to the allocation, initialization, use, reuse, and destruction of these two structures.


During initialization, specifically in the ngx_event_process_init function called in the init process phase, Nginx assigns a connection structure (ngx_connection_t) to each listening socket and sets the handler of the connection's read event member (read) to ngx_event_accept. If the accept mutex is not used, the read event is added to Nginx's event processing model (poll, epoll, etc.) in this same function. Otherwise, Nginx waits until the init process phase has finished, and the read event is added in the worker's event processing loop by whichever process acquires the accept lock.

    static ngx_int_t
    ngx_event_process_init(ngx_cycle_t *cycle)
    {
        ...
        /* Initialize the red-black tree used to manage all timers */
        if (ngx_event_timer_init(cycle->log) == NGX_ERROR) {
            return NGX_ERROR;
        }

        /* Initialize the event model */
        for (m = 0; ngx_modules[m]; m++) {
            if (ngx_modules[m]->type != NGX_EVENT_MODULE) {
                continue;
            }
            if (ngx_modules[m]->ctx_index != ecf->use) {
                continue;
            }
            module = ngx_modules[m]->ctx;
            if (module->actions.init(cycle, ngx_timer_resolution) != NGX_OK) {
                /* fatal */
                exit(2);
            }
            break;
        }
        ...
        /* For each listening socket, assign a connection structure */
        ls = cycle->listening.elts;
        for (i = 0; i < cycle->listening.nelts; i++) {
            c = ngx_get_connection(ls[i].fd, cycle->log);
            if (c == NULL) {
                return NGX_ERROR;
            }
            c->log = &ls[i].log;
            c->listening = &ls[i];
            ls[i].connection = c;
            rev = c->read;
            rev->log = c->log;
            /* Mark this read event as a new-connection (accept) event */
            rev->accept = 1;
            ...
    #if (NGX_WIN32)
            /* Windows is not analyzed here, but the principle is similar */
    #else
            /* Set the read event's handler to ngx_event_accept */
            rev->handler = ngx_event_accept;
            /* With the accept mutex, the listening socket is added to the
               event model later, once the lock has been acquired */
            if (ngx_use_accept_mutex) {
                continue;
            }
            /* Otherwise, add the listening socket to the event model now */
            if (ngx_event_flags & NGX_USE_RTSIG_EVENT) {
                if (ngx_add_conn(c) == NGX_ERROR) {
                    return NGX_ERROR;
                }
            } else {
                if (ngx_add_event(rev, NGX_READ_EVENT, 0) == NGX_ERROR) {
                    return NGX_ERROR;
                }
            }
    #endif
        }
        return NGX_OK;
    }


Once a worker process has added the listening socket's read event to the event processing model, Nginx can start receiving and processing client requests. Suppose a user enters a domain name in the browser's address bar and the DNS server resolves it to an address that Nginx is listening on. When the browser connects, Nginx's event processing model receives the read event and dispatches it to the previously registered handler, ngx_event_accept.
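To make the idea of registering a read event and dispatching to its handler concrete, here is a minimal sketch built directly on the Linux epoll API. It is not the Nginx epoll module: event_t, register_read_event, process_events, and count_handler are hypothetical names, the handler only counts invocations where ngx_event_accept would call accept, and what Nginx actually stores alongside each registered descriptor is more involved than the bare event pointer used here.

```c
#include <sys/epoll.h>
#include <unistd.h>

typedef struct event_s  event_t;

struct event_s {
    int    fd;                       /* descriptor being watched */
    void (*handler)(event_t *ev);    /* called when fd is readable */
    int    fired;                    /* hypothetical counter for the demo */
};

/* Add the event's descriptor to the epoll set, level-triggered. */
static int
register_read_event(int epfd, event_t *ev)
{
    struct epoll_event  ee = {0};

    ee.events = EPOLLIN;
    ee.data.ptr = ev;                /* handed back by epoll_wait() */

    return epoll_ctl(epfd, EPOLL_CTL_ADD, ev->fd, &ee);
}

/* Wait once and dispatch every ready event to its handler. */
static int
process_events(int epfd, int timeout_ms)
{
    struct epoll_event  list[16];
    int                 i, n;

    n = epoll_wait(epfd, list, 16, timeout_ms);

    for (i = 0; i < n; i++) {
        event_t *ev = list[i].data.ptr;
        ev->handler(ev);
    }

    return n;                        /* number of ready events, or -1 */
}

/* Stand-in for an accept handler: it only records that it ran. */
static void
count_handler(event_t *ev)
{
    ev->fired++;
}
```

With a listening socket in ev.fd and an accept routine as the handler, one call to process_events per loop iteration gives the basic shape of the worker's event loop.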


In the ngx_event_accept function, Nginx calls accept to take a connection, together with its socket, off the queue of established connections, allocates a connection structure (ngx_connection_t) for it, and saves the new socket in that structure. It then performs some basic connection initialization:
First, allocate a memory pool for the connection; the initial size defaults to 256 bytes and can be set with the connection_pool_size directive;
Allocate a log structure and save it in the connection for later use by the logging system;
Initialize the connection's I/O send and receive functions; which functions are used depends on the event model and the operating system;
Allocate a socket address (sockaddr) and copy the peer address returned by accept into the sockaddr field;
Save the local socket address in the local_sockaddr field. This value is taken from the listening structure ngx_listening_t, which holds only the listen address set in the configuration file. Because the configured address may be the wildcard *, meaning listen on all addresses, the value saved in the connection may later be replaced with the actual receiving address;
Set the ready flag of the connection's write event to 1; Nginx assumes that a new connection is writable the first time;
If the TCP_DEFER_ACCEPT option is set on the listening socket, data has already arrived on the connection, so the read event is marked ready as well;
Format the peer address saved in the sockaddr field as a readable string and save it in the addr_text field;
Finally, call ngx_http_init_connection to initialize the rest of the connection structure.


The most important job of ngx_http_init_connection is to set up the handlers for the read and write events. The handler of the connection's write event is set to ngx_http_empty_handler, which does nothing. Because Nginx assumes a new connection is writable, it does not register a write event at first: when there is data to send, Nginx writes it to the connection directly, and only if a write cannot complete does it add the write event to the event model and set the real write handler. Later chapters cover this in detail. The handler of the read event is set to ngx_http_init_request. If data is already present on the connection (deferred accept is set), ngx_http_init_request is called directly to process the request; otherwise a timer is set and the read event is added to the event processing model to wait for data to arrive or for the timeout. Whether data is already there, arrives later, or the wait times out, execution ends up in the read event's handler, ngx_http_init_request.
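The decision described in this paragraph can be summarized in a few lines. The sketch below uses simplified stand-in types; event_t, connection_t, and the function names are hypothetical, and the timer_set and active flags merely mark the points where Nginx would arm a timer and register the read event with the event model.

```c
typedef struct event_s  event_t;

struct event_s {
    void    (*handler)(event_t *ev);
    unsigned  ready:1;       /* data is already waiting (deferred accept) */
    unsigned  active:1;      /* stand-in: registered with the event model */
    unsigned  timer_set:1;   /* stand-in: a timeout timer is armed */
    int       calls;         /* hypothetical counter for the demo */
};

typedef struct {
    event_t  read;
    event_t  write;
} connection_t;

/* Write handler that deliberately does nothing. */
static void
empty_handler(event_t *wev)
{
    (void) wev;
}

/* Stand-in for the request initialization handler. */
static void
init_request(event_t *rev)
{
    rev->calls++;
}

static void
init_connection(connection_t *c)
{
    c->read.handler = init_request;
    c->write.handler = empty_handler;

    if (c->read.ready) {
        /* Data has already arrived: process the request right away. */
        c->read.handler(&c->read);
        return;
    }

    /* No data yet: wait for it, bounded by a timeout. */
    c->read.timer_set = 1;
    c->read.active = 1;
}
```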


The main job of ngx_http_init_request is to initialize the request. Because it is an event handler, it takes a single parameter of type ngx_event_t *; the ngx_event_t structure represents an event in Nginx. The context of event handling resembles that of interrupt handling, so to make the relevant information reachable there, Nginx generally stores a reference to the connection structure in the data field of the event structure and a reference to the request structure in the data field of the connection structure. An event handler can then easily obtain the corresponding connection and request structures. Inside the function, Nginx first checks whether the event is a timeout event; if so, it closes the connection and returns. Otherwise it allocates a ngx_http_request_t structure from the connection's memory pool to hold all the information for the request. Once allocated, a reference to the structure is stored in the request field of the connection's hc member so that the request structure can be reused for keep-alive connections or pipelined requests.

Next, Nginx finds a default virtual server configuration based on the port and address on which the request was received. (The default_server parameter of the listen directive marks a virtual server as the default; if it is absent and several virtual servers listen on the same port and address, the first one defined is the default.) This lookup is needed because an Nginx configuration file can define multiple virtual servers (one per server block) listening on different ports and addresses, and virtual servers listening on the same port and address can be told apart by domain name (the server_name directive sets a virtual server's domain names). Each virtual server can have its own configuration, which determines how Nginx handles a request once it has been received. When the configuration is found, it is saved in the request's ngx_http_request_t structure. Note that the default configuration found here by port and address is used only temporarily; Nginx eventually looks up the real virtual server configuration by domain name. The remaining initialization work includes:

Set the handler of the connection's read event to ngx_http_process_request_line, which parses the request line, and set the request's read_event_handler to ngx_http_block_reading. That function does practically nothing (except that when the event model is level-triggered, it removes the event from the event model's watch list to keep it from firing repeatedly); why read_event_handler is set to this function is explained later;
Allocate a buffer for the request header and save its address in the header_in field. The default size is 1024 bytes and can be changed with the client_header_buffer_size directive. Note that this buffer is allocated from the connection's memory pool and its address is also saved in the connection's buffer field, so that the buffer can be reused by the next request on the connection. If the client sends a request header larger than 1024 bytes, Nginx allocates larger buffers: by default each large header buffer is 8K and at most 4 are used, and both values can be set with the large_client_header_buffers directive. The request line, and each individual request header, must fit within one large buffer;
Likewise, Nginx allocates a memory pool for the request, and later memory allocations related to the request generally use it; the default size is 4096 bytes and can be changed with the request_pool_size directive;
Allocate the response header list for the request, with an initial size of 20;
Create the ctx pointer array that holds every module's context, and the variable data;
Set the request's main field to the request itself, indicating that this is a main request; Nginx also has the concept of subrequests, which later chapters describe in detail;
Set the request's count field to 1; count is the request's reference count;
Save the current time in the start_sec and start_msec fields. This is the start time of the request and is later used to compute its processing time (request time). Nginx differs slightly from Apache here: in Nginx the request starts when the first packet from the client is received, whereas Apache starts counting once the client's entire request line has been received;
Initialize the other request fields; for example, uri_changes is set to 11, meaning the request's URI can be rewritten up to 10 times, and subrequests is set to 201, meaning a request can issue up to 200 subrequests;
After all this initialization, ngx_http_init_request calls the read event's handler to actually parse the data sent by the client; that is, processing moves on to ngx_http_process_request_line.
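The chaining of event, connection, and request through their data fields, together with a few of the initial values listed above, can be sketched as follows. The types and function names are simplified, hypothetical stand-ins; in particular the request is allocated with calloc here, whereas Nginx allocates it from the connection's memory pool.

```c
#include <stdlib.h>

typedef struct event_s       event_t;
typedef struct connection_s  connection_t;
typedef struct request_s     request_t;

struct event_s {
    void   *data;                      /* points to the connection */
    void  (*handler)(event_t *ev);
};

struct connection_s {
    void     *data;                    /* points to the request */
    event_t   read;
};

struct request_s {
    connection_t  *connection;
    request_t     *main;               /* itself for a main request */
    unsigned       count;              /* reference count */
    unsigned       uri_changes;
    unsigned       subrequests;
    int            line_handler_calls; /* hypothetical counter */
};

/* A handler receives only the event and walks the chain from there. */
static void
process_request_line(event_t *rev)
{
    connection_t *c = rev->data;
    request_t    *r = c->data;

    r->line_handler_calls++;
}

static void
init_request(event_t *rev)
{
    connection_t *c = rev->data;
    request_t    *r = calloc(1, sizeof(request_t));

    if (r == NULL) {
        return;
    }

    c->data = r;
    r->connection = c;
    r->main = r;
    r->count = 1;
    r->uri_changes = 11;
    r->subrequests = 201;

    /* Switch the read handler, then call it to start parsing. */
    rev->handler = process_request_line;
    rev->handler(rev);
}
```

The point of the design is that a handler with a single event parameter can still reach everything it needs by following two pointers.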


The main job of ngx_http_process_request_line is to parse the request line. Because this involves network I/O, even a very short request line may not be read in one pass, which is why the earlier ngx_http_init_request set ngx_http_process_request_line as the read event's handler. It too takes a single ngx_event_t * parameter, and at its start it likewise checks whether the event is a timeout event: if so, the request and the connection are closed; otherwise normal parsing begins. It first calls ngx_http_read_request_header to read data.


Since ngx_http_process_request_line may be entered many times, ngx_http_read_request_header first checks whether the buffer pointed to by the request's header_in already holds data, and if so returns immediately. Otherwise it reads from the connection into that buffer, reading as much as there is room for, and returns the number of bytes read. If the client has not sent anything yet, the function returns NGX_AGAIN, but does two things first: 1. it sets a timer, 60s by default and configurable with the client_header_timeout directive; if the timer expires before any read event arrives, Nginx closes the request; 2. it calls ngx_handle_read_event to handle the read event, adding it to the event processing model if the connection has not registered it yet. If the client closes the connection prematurely or the read fails with some other error, Nginx returns a 400 error to the client (with no guarantee that the client receives it, since the client may already have closed the connection) and the function returns NGX_ERROR.


If ngx_http_read_request_header reads data successfully, ngx_http_process_request_line calls ngx_http_parse_request_line to parse it. This function implements a finite state machine based on the request line definition in the HTTP specification. Through the state machine, Nginx records where the request method, the request URI, and the HTTP version start in the buffer, along with other useful information for later processing. If no problem occurs while parsing the request line, the function returns NGX_OK; if the request line does not conform to the protocol, it stops parsing immediately and returns the corresponding error code; if the buffer does not yet hold enough data, it returns NGX_AGAIN. Throughout the state machine that parses HTTP requests, Nginx follows two important principles: minimize memory copies and minimize backtracking. Copying memory is relatively expensive, and a large amount of copying lowers runtime efficiency. Wherever a copy would otherwise be needed, Nginx tries to copy only the start and end addresses of the memory rather than the memory itself, which takes just two assignments and greatly reduces the overhead. The consequence is that later operations must not modify the memory, because a modification would affect every reference to that region; such memory must be managed carefully and copied when necessary.

The data structure that best embodies this idea is ngx_buf_t, which represents a buffer in Nginx. In many cases it is enough to save the start and end addresses of a block of memory in its pos and last members and set its memory flag to 1 to denote a block of memory that must not be modified. For a buffer that can be modified, a block of the required size must be allocated and its start address saved, and the ngx_buf_t's temporary flag is then set to 1 to indicate that the memory may be modified.
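Both principles can be seen in a much reduced request line parser. The sketch below is not ngx_http_parse_request_line: the names are hypothetical, only uppercase method letters are accepted, and HTTP/0.9 style lines without a version are rejected. It does show the two ideas, though: each field is recorded as a pair of pointers into the buffer instead of being copied, and the parser state and position are kept between calls so that no byte is examined twice when data arrives in pieces.

```c
#include <stddef.h>

#define PARSE_OK     0
#define PARSE_AGAIN  1
#define PARSE_ERROR  2

/* A field is the half-open range [start, end) inside the buffer. */
typedef struct {
    const char  *start;
    const char  *end;
} slice_t;

enum { S_METHOD, S_URI, S_VERSION, S_LF };

typedef struct {
    int          state;   /* survives across calls */
    const char  *pos;     /* where the next call resumes */
    slice_t      method, uri, version;
} line_parser_t;

/* Parse the bytes in [p->pos, last); may be called repeatedly. */
static int
parse_request_line(line_parser_t *p, const char *last)
{
    const char  *c;

    for (c = p->pos; c < last; c++) {
        char ch = *c;

        switch (p->state) {

        case S_METHOD:
            if (p->method.start == NULL) p->method.start = c;
            if (ch == ' ') {
                if (c == p->method.start) return PARSE_ERROR;
                p->method.end = c;
                p->state = S_URI;
            } else if (ch < 'A' || ch > 'Z') {
                return PARSE_ERROR;
            }
            break;

        case S_URI:
            if (p->uri.start == NULL) p->uri.start = c;
            if (ch == ' ') {
                if (c == p->uri.start) return PARSE_ERROR;
                p->uri.end = c;
                p->state = S_VERSION;
            } else if (ch == '\r' || ch == '\n') {
                return PARSE_ERROR;
            }
            break;

        case S_VERSION:
            if (p->version.start == NULL) p->version.start = c;
            if (ch == '\r') {
                if (c == p->version.start) return PARSE_ERROR;
                p->version.end = c;
                p->state = S_LF;
            } else if (ch == '\n' || ch == ' ') {
                return PARSE_ERROR;
            }
            break;

        case S_LF:
            if (ch != '\n') return PARSE_ERROR;
            p->pos = c + 1;          /* headers would start here */
            return PARSE_OK;
        }
    }

    p->pos = c;                      /* resume here when more data arrives */
    return PARSE_AGAIN;
}
```

Because the slices point into the receive buffer, they stay valid only as long as the buffer is neither modified nor moved; if the data is copied into a larger buffer, the saved pointers have to be adjusted to the new location.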


Back in ngx_http_process_request_line: if ngx_http_parse_request_line returns an error, a 400 error is returned to the client directly;
If it returns NGX_AGAIN, Nginx must determine whether the cause is insufficient buffer space or insufficient data read so far. If the buffer is too small, Nginx calls ngx_http_alloc_large_header_buffer to allocate a larger buffer; if even a large buffer cannot hold the whole request line, Nginx returns a 414 error to the client. Otherwise, after allocating the larger buffer and copying over the data read so far, it calls ngx_http_read_request_header again to read more data and feeds it to the request line state machine, until the request line has been parsed;
If it returns NGX_OK, the request line was parsed correctly. Nginx records the start address and length of the request line, saves the path and parameter portion of the request URI in the uri field of the request structure, saves the start and length of the request method in the method_name field, and records the start and length of the HTTP version in the http_protocol field. It also extracts the arguments and the extension of the requested resource from the URI and saves them in the args and exten fields respectively. The next step is parsing the request headers, which the next article covers.
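As an illustration of how a request URI breaks down into a path, arguments, and an extension, here is a small sketch that records each part as a pointer and length into the original string, without copying. The str_t type and the split_uri function are hypothetical; Nginx gathers the same positions inside its state machine rather than in a separate pass like this one.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char  *data;   /* points into the original URI, not a copy */
    size_t       len;
} str_t;

static void
split_uri(const char *uri, size_t len, str_t *path, str_t *args,
    str_t *exten)
{
    const char  *q = memchr(uri, '?', len);
    size_t       path_len = q ? (size_t) (q - uri) : len;
    size_t       i;

    path->data = uri;
    path->len = path_len;

    /* Arguments are everything after the first '?', if present. */
    args->data = q ? q + 1 : NULL;
    args->len = q ? len - path_len - 1 : 0;

    /* The extension follows the last '.' in the final path segment. */
    exten->data = NULL;
    exten->len = 0;

    for (i = path_len; i > 0; i--) {
        char ch = uri[i - 1];

        if (ch == '/') {
            break;               /* no dot in the last segment */
        }

        if (ch == '.') {
            exten->data = uri + i;
            exten->len = path_len - i;
            break;
        }
    }
}
```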
