Anatomy of Apache operating mechanism

Last Update:2017-07-07 Source: Internet

Author: User



Apache Operating mechanism Analysis:

1. B/S interaction process

How a browser and a web server interact:

1. The browser sends an HTTP request to the server.

2. The server receives the request data, analyzes and processes it, and sends response data back to the browser.

3. The browser receives the server's response data, analyzes and processes it, and displays the final result.

[Figure: snapshot of the browser request data and the server response data]

The exchange of data between browser and server is simple and easy to understand. Anyone working in web development already knows it well, so it is not repeated here and is given for reference only.
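The three steps above can be sketched in a few lines of Python. This is only an illustration of the message format, not how Apache itself parses requests; the function names `parse_request` and `build_response` and the sample host are assumptions made for this example.

```python
def parse_request(raw: str) -> dict:
    """Split a raw HTTP request into request line, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line, e.g. "GET /index.php HTTP/1.1"
    method, path, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}


def build_response(status: int, reason: str, body: str) -> str:
    """Build a minimal HTTP/1.1 response carrying an HTML body."""
    payload = body.encode("utf-8")
    return (f"HTTP/1.1 {status} {reason}\r\n"
            "Content-Type: text/html; charset=utf-8\r\n"
            f"Content-Length: {len(payload)}\r\n"
            "\r\n" + body)


# Step 1: the browser's request; step 2: the server's response.
request = parse_request("GET /index.php HTTP/1.1\r\nHost: example.com\r\n\r\n")
response = build_response(200, "OK", "<p>hi</p>")
```

In step 3 the browser would parse `response` the same way, reading the status line and headers before rendering the body.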

2. Apache Overview

Apache is currently the most widely used web server in the world, known for being cross-platform, efficient, and stable. According to last year's official statistics, Apache accounted for more than 60% of the web server market. On Unix-like (Unix/Linux) platforms in particular, Apache is the most common choice. Other web servers, such as IIS, run only on Windows and are the usual choice for sites built on Microsoft .NET technology.

Apache is not without shortcomings. The most frequent criticism is that it has grown heavier and heavier, and it is widely regarded as a heavyweight web server. As a result, many lightweight alternatives, such as lighttpd and nginx, have appeared in recent years. Their advantage is high runtime efficiency, but their shortcomings are also clear: they are often less mature than Apache and are usually used only in specific scenarios.

3. Apache Component Logic diagram

Apache has a modular design, and its code is, on the whole, more readable than PHP's. The core is small; most functionality is spread across modules, which are loaded on demand when the server starts. If you want to read the Apache source code, start directly with main.c, which contains the most important processing logic.

MPM (Multi-Processing Modules) is one of Apache's core components. Through it, Apache uses operating system resources to manage its processes and thread pools. To get the best performance, Apache provides different MPMs optimized for different platforms (Unix/Linux, Windows), and users can choose among them according to their situation. The two most commonly used MPMs are prefork and worker. Which one your server runs depends on the MPM option specified when Apache was compiled; on Unix-like systems the default is prefork. Because many Unix systems do not support true threads, they use the pre-forked child process (prefork) approach; on platforms with good thread support, such as Windows or Solaris, the hybrid multi-process, multi-threaded worker model is a good choice. Interested readers can consult the relevant material; it is not covered further here.

Another important component is APR (Apache Portable Runtime), an abstraction library over operating system calls. Apache's internal components use it instead of calling the operating system directly, which improves portability. Apache handles PHP through one of its many modules, the PHP module.

The logical composition of Apache and its relationship to the operating system

4. The life cycle of Apache

This section relates to the loading of the PHP module, so it deserves some attention. The following shows the Apache life cycle (prefork mode).

5. Two modes of operation of Apache

Apache has two common modes of operation: prefork and worker.

Working principle and configuration of prefork

1) Working principle:

A single control process (the parent) is responsible for creating child processes, which listen for requests and respond to them. Apache always tries to keep some spare (idle) child processes ready for incoming requests, so clients do not have to wait for a child process to be created before being served. On Unix systems, the parent process usually runs as root so that it can bind port 80, while the child processes it creates usually run as a low-privileged user. The User and Group directives configure that low-privileged user. The user that runs the child processes must have read access to the content being served, but should have as few permissions as possible on everything else.

2) Configuration:

If you do not explicitly select an MPM with "--with-mpm", prefork is the default MPM on Unix platforms. It uses the same pre-forked child process model as Apache 1.3. Prefork itself uses no threads. Version 2.0 keeps it partly for compatibility with version 1.3, and partly because handling each request in a separate child process keeps requests independent of one another, which makes it one of the most stable MPMs.

If you use prefork, after make and make install you can run "httpd -l" to see which MPM is in use. You should see prefork.c (if you see worker.c, the worker MPM is in use, and so on). Then look at the default httpd.conf, which contains the following configuration section:

<IfModule prefork.c>
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0
</IfModule>

Prefork works as follows. After the control process has created "StartServers" child processes, it creates one more process to satisfy the MinSpareServers setting, waits a second, creates two, waits a second, creates four, and so on. The number of processes created per second grows exponentially, up to 32 per second, until the MinSpareServers setting is satisfied. This is where the name prefork comes from. This model avoids creating new processes when a request arrives, which reduces overhead and improves performance.

Suppose MaxClients is 256. When the number of concurrent requests rises toward MaxClients while only 10 idle processes are available, Apache keeps creating processes until their number reaches 256.

When the spike is over and the number of concurrent requests has dropped to perhaps just one, Apache gradually removes processes until the number of processes falls to MaxSpareServers.
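The doubling spawn rate described above can be made concrete with a small simulation. This is a simplified model, not Apache's actual code: the function name `spawn_schedule` is hypothetical, and it assumes no requests arrive while the pool is being filled.

```python
from typing import List


def spawn_schedule(idle: int, min_spare: int, max_rate: int = 32) -> List[int]:
    """Return how many children are created in each one-second round.

    The rate starts at 1 and doubles every round, capped at max_rate,
    until the number of idle children reaches min_spare.
    """
    rounds = []
    rate = 1
    while idle < min_spare:
        rounds.append(rate)
        idle += rate
        rate = min(rate * 2, max_rate)
    return rounds


# Filling a pool of 10 idle children from nothing takes four rounds.
print(spawn_schedule(idle=0, min_spare=10))  # [1, 2, 4, 8]
```

With a much larger target the cap becomes visible: `spawn_schedule(0, 100)` gives `[1, 2, 4, 8, 16, 32, 32, 32]`.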

StartServers: The number of child processes created when the server starts. The prefork default is 5.

MinSpareServers: The minimum number of idle child processes. The default is 5. If the number of idle child processes falls below MinSpareServers, Apache creates new child processes at a maximum rate of one per second. Do not set this parameter too high.

MaxSpareServers: The maximum number of idle child processes. The default is 10. If the number of idle processes exceeds this value, the parent process kills the extra child processes. Do not set this value too high either. If you set it lower than MinSpareServers, Apache automatically adjusts it to MinSpareServers+1. If the site load is heavy, consider increasing MinSpareServers and MaxSpareServers together.

MaxRequestsPerChild: The number of requests each child process may handle. A child process is destroyed automatically after it has handled "MaxRequestsPerChild" requests. 0 means unlimited, that is, the child process is never destroyed. Although the default of 0 lets each child process handle more requests, setting a non-zero value has two important benefits:

It guards against accidental memory leaks;

The number of child processes is reduced automatically when the server load drops.

You can therefore adjust this value according to the server load. In my opinion, about 10000 is appropriate.

MaxClients: The most important of these directives. It sets the number of requests Apache can handle at the same time and is the parameter with the greatest effect on Apache performance.

The default of 150 is far from enough. Once the number of simultaneous requests reaches this value (you can check with ps -ef | grep http | wc -l), further requests are queued until a request being processed completes. This is the main reason HTTP access can be slow even though plenty of system resources remain. The system administrator can adjust this value according to the hardware configuration and the load.

In theory, the larger the value, the more requests can be handled, but in Apache 1.3 the value cannot exceed 256 by default (a hard limit). If you set it higher than 256, Apache will not start. In practice, 256 is not enough for sites with even moderately heavy load. To raise the limit in Apache 1.3, before running "configure" you must manually edit src/include/httpd.h in the source tree, find the line "#define HARD_SERVER_LIMIT 256", change 256 to the value you want (for example 4000), and then recompile Apache.

Apache 2.0, however, adds the ServerLimit directive, which lifts the 256 limit and makes it possible to increase MaxClients without recompiling Apache. Here is a prefork configuration section:

<IfModule prefork.c>
ServerLimit 2000
StartServers 10
MinSpareServers 10
MaxSpareServers 15
MaxClients 1000
MaxRequestsPerChild 10000
</IfModule>

ServerLimit: In the configuration above, ServerLimit is set to its maximum value of 2000, which is sufficient for most sites. If you must go higher, modify the following two lines in server/mpm/prefork/prefork.c in the source tree accordingly:

#define DEFAULT_SERVER_LIMIT 256
#define MAX_SERVER_LIMIT 2000

The values must satisfy MaxClients ≤ ServerLimit ≤ 2000. In other words, prefork supports at most 2000 concurrent requests by default.
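The constraint MaxClients ≤ ServerLimit ≤ 2000 can be checked before touching httpd.conf. The sketch below is a hypothetical helper, not an Apache tool; the ceiling of 2000 is the value quoted in this article and may differ in the source tree of your version.

```python
from typing import List

# Compile-time ceiling as quoted in the text; an assumption for this sketch.
MAX_SERVER_LIMIT = 2000


def check_prefork_limits(max_clients: int, server_limit: int,
                         hard_limit: int = MAX_SERVER_LIMIT) -> List[str]:
    """Return a list of violated constraints (empty if the values are fine)."""
    problems = []
    if server_limit > hard_limit:
        problems.append(
            f"ServerLimit {server_limit} exceeds the compiled-in maximum {hard_limit}")
    if max_clients > server_limit:
        problems.append(
            f"MaxClients {max_clients} exceeds ServerLimit {server_limit}")
    return problems


# The prefork section shown above: MaxClients 1000, ServerLimit 2000.
print(check_prefork_limits(max_clients=1000, server_limit=2000))  # []
```

A non-empty result means the configuration would need either smaller values or a recompile with a larger limit.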

For ServerLimit to take effect, it must be placed before the other directives. To change this hard limit, you must stop the server completely and then start it again (a plain restart is not enough).

Working principle and configuration of worker

How it works: The number of threads each process can have is fixed, and the server increases or decreases the number of processes according to the load. A single control process (the parent) is responsible for creating child processes. Each child process creates ThreadsPerChild service threads plus one listener thread, which listens for incoming requests and passes them to a service thread for processing and response. Apache always tries to maintain a pool of spare (idle) service threads, so clients do not have to wait for new threads or processes to be created before being served. On Unix, the parent process is usually started as root so that it can bind port 80; Apache then creates the child processes and threads as a lower-privileged user. The User and Group directives configure the permissions of the Apache child processes. Although the child processes must have read access to the content they serve, they should be given as few privileges as possible. Also, unless suEXEC is used, the permissions configured by these directives are inherited by CGI scripts.

Compared with prefork, worker is the MPM new in version 2.0 that supports a hybrid multi-threaded, multi-process model. Because it uses threads to handle requests, it can serve a relatively large number of requests at a lower cost in system resources than a process-based server. At the same time, worker still uses multiple processes, each with multiple threads, to retain the stability of a process-based server. This way of working is expected to be the direction in which Apache 2.0 develops.

After ./configure --with-mpm=worker, make, and make install, the default httpd.conf contains the following configuration section:

<IfModule worker.c>
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>

Worker operates as follows. The control process creates "StartServers" child processes, each containing a fixed ThreadsPerChild number of threads, and each thread handles requests independently. To avoid creating threads when a request arrives, MinSpareThreads and MaxSpareThreads set the minimum and maximum number of idle threads, and MaxClients sets the total number of threads across all child processes. If the total number of threads in the existing child processes cannot handle the load, the control process forks new child processes.

StartServers: The number of child processes created at server startup. The default is "3".

ServerLimit: The maximum number of processes the server may be configured to run. Together with ThreadLimit, this directive determines the highest value MaxClients may be set to. Changes to this directive during a restart are ignored, but changes to MaxClients take effect.

MinSpareThreads: The minimum number of idle threads. The default is "75". This MPM monitors the number of idle threads server-wide. If the total number of idle threads in the server is too small, new child processes are created to provide more idle threads.

MaxSpareThreads: The maximum number of idle threads. The default is "250". This MPM monitors the number of idle threads server-wide. If the total number of idle threads in the server is too large, child processes are killed to remove the extra idle threads. The range of MaxSpareThreads is restricted, and Apache automatically corrects the configured value: for worker, it must be greater than or equal to the sum of MinSpareThreads and ThreadsPerChild.

MinSpareThreads and MaxSpareThreads have little effect on Apache performance and can be adjusted as circumstances require.

ThreadLimit: The upper bound on the number of threads per child process, that is, the highest value ThreadsPerChild may be set to. Changes to this directive during a restart are ignored, but changes to ThreadsPerChild take effect. The default is "64".

ThreadsPerChild: The directive most closely tied to performance in the worker MPM. By default, ThreadsPerChild can be at most 64, which is not enough under heavy load. In that case, set the ThreadLimit directive explicitly; its maximum is 20000. These two values come from the following two lines of server/mpm/worker/worker.c in the source tree:

#define DEFAULT_THREAD_LIMIT 64
#define MAX_THREAD_LIMIT 20000

These two lines give the limits for ThreadsPerChild and ThreadLimit. It is best to change 64 to the value you want before running configure. Do not set these values beyond what the system can handle; otherwise Apache may fail to start or the system may become very unstable.

In worker mode, the total number of requests that can be processed concurrently is the number of child processes multiplied by ThreadsPerChild, and this product should be greater than or equal to MaxClients. If the load is heavy and the existing child processes cannot handle it, the control process forks new child processes. By default, the number of child processes can be at most 16; to go beyond that you must declare ServerLimit explicitly (its maximum is 20000). These two values come from the following two lines of server/mpm/worker/worker.c in the source tree:

#define DEFAULT_SERVER_LIMIT 16
#define MAX_SERVER_LIMIT 20000

Note that if ServerLimit is declared explicitly, ServerLimit multiplied by ThreadsPerChild must be greater than or equal to MaxClients, and MaxClients must be an integer multiple of ThreadsPerChild. Otherwise, Apache automatically adjusts the values, possibly to something you did not expect. Here is a worker configuration section:

<IfModule worker.c>
StartServers 3
MaxClients 2000
ServerLimit 25
MinSpareThreads 50
MaxSpareThreads 200
ThreadLimit 200
ThreadsPerChild 100
MaxRequestsPerChild 0
</IfModule>
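The relationships among the worker directives are easy to get wrong by hand. As a rough illustration, the hypothetical helper below checks the rules stated in this section against a set of values; it is a sketch of the arithmetic only and does not reproduce how Apache itself corrects a bad configuration.

```python
from typing import Dict, List


def check_worker_config(cfg: Dict[str, int]) -> List[str]:
    """Return the rules from the text that the given worker values violate."""
    problems = []
    if cfg["ThreadsPerChild"] > cfg["ThreadLimit"]:
        problems.append("ThreadsPerChild must not exceed ThreadLimit")
    if cfg["ServerLimit"] * cfg["ThreadsPerChild"] < cfg["MaxClients"]:
        problems.append("ServerLimit * ThreadsPerChild must be >= MaxClients")
    if cfg["MaxClients"] % cfg["ThreadsPerChild"] != 0:
        problems.append("MaxClients must be a multiple of ThreadsPerChild")
    if cfg["MaxSpareThreads"] < cfg["MinSpareThreads"] + cfg["ThreadsPerChild"]:
        problems.append("MaxSpareThreads must be >= MinSpareThreads + ThreadsPerChild")
    return problems


# Values from the worker configuration section above.
example = {
    "MaxClients": 2000, "ServerLimit": 25,
    "MinSpareThreads": 50, "MaxSpareThreads": 200,
    "ThreadLimit": 200, "ThreadsPerChild": 100,
}
print(check_worker_config(example))  # []
```

Here 25 * 100 = 2500 covers MaxClients 2000, 2000 is a multiple of 100, and 200 is at least 50 + 100, so no rule is violated.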

From the above you can understand how the two important MPMs in Apache 2.0, prefork and worker, operate, and configure the relevant core parameters for the best performance and stability in your situation.

MaxClients: The maximum number of requests (the maximum number of threads) that can be served simultaneously. Any request beyond the MaxClients limit enters a wait queue. The default is ServerLimit multiplied by ThreadsPerChild. To increase MaxClients, you must therefore increase ServerLimit as well.

ThreadsPerChild: The number of resident worker threads each child process creates. The default is 25. A child process creates these threads at startup and creates no more afterwards.

MaxRequestsPerChild: The maximum number of requests each child process may serve during its lifetime. When the MaxRequestsPerChild limit is reached, the child process exits. If MaxRequestsPerChild is "0", the child process never exits.
Setting MaxRequestsPerChild to a non-zero value has two benefits:
1. It prevents an (accidental) memory leak from continuing indefinitely and exhausting memory.
2. It gives processes a finite lifetime, which helps reduce the number of active processes when the server load drops.
Note:
For keepalive connections, only the first request is counted. In effect, this changes the directive to limit the number of connections per child process.
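The keepalive note is easier to see with numbers. The following is a deliberately simplified model of a single child slot, with a hypothetical function name; it only illustrates that each connection counts once toward MaxRequestsPerChild no matter how many requests it carries.

```python
from typing import List


def count_child_recycles(requests_per_connection: List[int],
                         max_requests_per_child: int) -> int:
    """Count how often one child slot is recycled while serving connections.

    Each list entry is the number of requests sent over one keepalive
    connection. Only the first request of a connection is counted, so the
    limit effectively applies to connections. A limit of 0 means never.
    """
    recycles = 0
    counted = 0
    for _requests in requests_per_connection:
        counted += 1  # the whole connection counts as one
        if max_requests_per_child and counted >= max_requests_per_child:
            recycles += 1
            counted = 0
    return recycles


# Three connections carrying 30 requests in total never reach a limit of 4.
print(count_child_recycles([10, 10, 10], max_requests_per_child=4))  # 0
```

With a limit of 2, four connections of any size recycle the child twice, and with a limit of 0 the child is never recycled.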

6. Apache operation

Apache's operation is divided into a start-up phase and a run phase.

6.1. Start-up phase

During the start-up phase, Apache mainly parses configuration files (httpd.conf and any files pulled in by Include directives), loads modules (such as mod_php.so and mod_perl.so), and initializes system resources (such as log files and shared memory segments).

In this phase, Apache runs as the privileged user root (on Unix-like systems) or as Administrator (on Windows) in order to have the widest possible access to system resources.

The assembly of Apache and the "PHP processor" takes place in this phase.

The "PHP processor" is the system module responsible for interpreting and executing your PHP code. The name is made up here to help you follow this section; later chapters give the proper name.

Have you ever installed and configured PHP separately?

If you have, the following is easy to understand. If you have not, try installing it to deepen your understanding. I will still try to keep the explanation simple and the process clear. Installing PHP is in fact very simple; if you are interested, search the web for an installation guide and follow the steps.

Integrating PHP into Apache also requires some settings on the Apache side. Here we use PHP's mod_php5 SAPI mode as the example; the concept of SAPI is explained in detail later.

Assuming the installed versions are Apache 2 and PHP 5, edit Apache's main configuration file httpd.conf and add the following lines:

In a Unix/Linux environment:

LoadModule php5_module modules/mod_php5.so

AddType application/x-httpd-php .php

Note: modules/mod_php5.so is the location where the mod_php5.so file is installed on a Unix-like system.

In a Windows environment:

LoadModule php5_module D:/php/php5apache2.dll

AddType application/x-httpd-php .php

Note: D:/php/php5apache2.dll is the location where the php5apache2.dll file is installed on Windows.

These two directives tell the Apache server that whenever it receives a request for a URL ending in .php, it must call the php5_module module (mod_php5.so/php5apache2.dll) to handle it.
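The effect of the two directives can be pictured as two lookups: suffix to content type, then content type to module. The Python sketch below is only a mental model with hypothetical table and function names; Apache's real dispatch happens inside its own module machinery.

```python
import posixpath
from typing import Optional

# AddType application/x-httpd-php .php
ADD_TYPE = {".php": "application/x-httpd-php"}

# LoadModule php5_module ... makes this module available for that type.
HANDLERS = {"application/x-httpd-php": "php5_module"}


def handler_for(url_path: str) -> Optional[str]:
    """Return the module responsible for a URL path, or None if unmapped."""
    suffix = posixpath.splitext(url_path)[1].lower()
    content_type = ADD_TYPE.get(suffix)
    return HANDLERS.get(content_type)


print(handler_for("/test.php"))    # php5_module
print(handler_for("/index.html"))  # None
```

A request for `/index.html` finds no entry in either table, so in this model it falls through to Apache's default handling.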

The process is shown in the following figure:

The source code for the Apache start-up phase is in server/main.c. I have sorted out how the steps correspond to the source:

Readers unfamiliar with Unix/Linux may ask what kind of file a .so file (mod_php5.so) is.

On Unix/Linux, a file with the .so suffix is a DSO file. A DSO is the equivalent of a DLL on Windows: a binary file that packages a set of functions. A process that calls them loads the file into memory and maps it into its own address space.

DSO stands for Dynamic Shared Object. DLL stands for Dynamic Link Library.

The most important feature of the Apache server architecture is its highly modular design. If you want more processing efficiency, you can link these DSO modules statically when compiling Apache, which improves Apache's processing performance by about 5%.

6.2. Run phase

6.2.1. Run phase overview

During the run phase, Apache's main job is to process user requests.

In this phase, Apache gives up its privileged user level and runs with ordinary permissions. This is mainly a security measure that limits the damage a code defect can cause. Microsoft's IIS, for example, was hit by overflow attacks from malicious code such as "Code Red" and "Nimda".

6.2.2. Run phase process

Apache divides the request processing loop into 11 phases: Post-Read-Request, URI Translation, Header Parsing, Access Control, Authentication, Authorization, MIME Type Checking, FixUp, Response, Logging, and CleanUp.

Apache hook mechanism

The Apache hook mechanism means that Apache allows modules (both internal and external, such as mod_php5.so and mod_perl.so) to inject custom functions into the request processing loop. In other words, a module can attach its own handler functions at any of Apache's processing phases and so take part in request processing.

mod_php5.so/php5apache2.dll injects the custom functions it contains into Apache through the hook mechanism, and these functions handle PHP requests at the relevant phases of Apache's processing.
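The idea of a hook can be sketched independently of Apache's C API. In the simplified Python model below, every class, function, and path name is hypothetical, and real Apache hooks also return status codes that can stop a phase early, which this sketch leaves out.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# The 11 phases of the request processing loop, in order.
PHASES = [
    "post_read_request", "uri_translation", "header_parsing",
    "access_control", "authentication", "authorization",
    "mime_type_checking", "fixup", "response", "logging", "cleanup",
]

Hook = Callable[[dict], None]


class HookRegistry:
    """Modules attach functions to named phases of the processing loop."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = defaultdict(list)

    def register(self, phase: str, func: Hook) -> None:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self._hooks[phase].append(func)

    def process(self, request: dict) -> dict:
        """Run every registered hook on the request, phase by phase."""
        for phase in PHASES:
            for func in self._hooks[phase]:
                func(request)
        return request


def translate_uri(request: dict) -> None:
    # Map the URL path onto a made-up document root.
    request["filename"] = "/var/www" + request["uri"]


def php_response(request: dict) -> None:
    # Stand-in for the PHP module's response handler.
    if request["filename"].endswith(".php"):
        request["body"] = "output of " + request["filename"]


registry = HookRegistry()
registry.register("response", php_response)        # registration order
registry.register("uri_translation", translate_uri)  # does not matter
result = registry.process({"uri": "/test.php"})
```

Although the response hook is registered first, it runs after URI translation because the loop walks the phases in their fixed order, which is the point of the mechanism.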

Hook mechanisms are also common in Windows development, where there are both system-level and application-level hooks. The on-screen word capture feature of common translation software (such as PowerWord) is mostly implemented by installing a system-level hook that replaces the screen-output drawing functions in gdi32.dll with custom functions.

The Apache request processing loop in detail
What happens in each of the 11 phases of the Apache request processing loop?

1. Post-Read-Request phase

In normal request processing, this is the first phase in which a module can insert a hook. Modules that want to get involved in a request very early can use this phase.

2. URI Translation phase
Apache's main job in this phase is to map the requested URL to the local file system. Modules can insert hooks here to run their own mapping logic. mod_alias works in this phase.

3. Header Parsing phase
Apache's main job in this phase is to examine the request headers. This hook is rarely used, because a module can examine the request headers at any point in request processing. mod_setenvif works in this phase.

4. Access Control phase
Apache's main job in this phase is to check, based on the configuration file, whether access to the requested resource is allowed. Apache's standard logic implements the Allow and Deny directives. mod_authz_host works in this phase.

5. Authentication phase
Apache's main job in this phase is to authenticate the user according to the policy set in the configuration file and to set the user name field. Modules can insert hooks here to implement an authentication method.

6. Authorization phase
Apache's main job in this phase is to check, based on the configuration file, whether the authenticated user is allowed to perform the requested action. Modules can insert hooks here to implement a user permission management scheme.

7. MIME Type Checking phase
Apache's main job in this phase is to determine which content handler to use, based on rules about the MIME type of the requested resource. The standard modules mod_negotiation and mod_mime implement this hook.

8. FixUp phase
This is a general-purpose phase that lets a module run any necessary processing before the content generator. Like Post-Read-Request, it is a catch-all hook, and it is also the most commonly used one.

9. Response phase
Apache's main job in this phase is to generate the content returned to the client and send an appropriate response. This phase is the core of the whole process.

10. Logging phase
Apache's main job in this phase is to log the transaction after the response has been sent to the client. Modules may modify or replace Apache's standard logging.

11. CleanUp phase
Apache's main job in this phase is to clean up the environment left behind by the request, for example by handling files and directories or closing sockets. This is the last phase of Apache's request processing.

The way modules are injected into Apache can be seen in the server/core.c file in the source code:

Of the functions that mod_php5.so/php5apache2.dll injects into Apache, the most important is the handler for the Response phase.

7. Apache performance tuning

Suppose Apache is configured as follows:

<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
MaxClients 256
MaxRequestsPerChild 4000
</IfModule>

With ServerLimit not set, test with ab:

ab -n 10000 -c http://192.168.1.191/test.php

If you run the test again with the same concurrency immediately after the first run, most of Apache's child processes have not yet been killed, so less time is spent creating child processes; in other words, some of the child processes are already pre-forked. With n=20000 (ab -n 20000 -c http://192.168.1.191/test.php), the time should therefore be less than 137 * 2 = 274 s.

If you wait a while before testing again, until Apache has killed processes and the number of processes has fallen to MaxSpareServers (20), the time is about the same as the first run (around 137 s).

With n=40000 (ab -n 40000 -c http://192.168.1.191/test.php), watch the number of Apache processes on the server with ps -ef | grep httpd | wc -l. Once the number of Apache processes reaches 256, it stops increasing.

If you configure ServerLimit:

<IfModule prefork.c>
ServerLimit 1000
StartServers 8
MinSpareServers 5
MaxSpareServers 20
MaxClients 1000
MaxRequestsPerChild 4000
</IfModule>

the number of Apache processes can go beyond the 256 limit.

Tuning advice:

1) If the server sees high concurrency only during certain periods, set ServerLimit and adjust MaxClients.

2) If the server is under sustained high load, consider increasing MinSpareServers and MaxSpareServers together.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
