I. Basic HTTP Server
1. Introduction
A server application involves four aspects you need to understand: the client, the server, the protocol between the client and the server, and the resources available on the server.
The client sends a request for a URI to the server over HTTP, and the server returns the response content to the client over HTTP.
2. HTTP Server Functions
An HTTP server responds to client requests either by returning local files or by running local applications that generate the response content, which is then returned to the client.
The basic internal process of a Web server is as follows:
(1) Initialization: start the server program and ask the host to let it use port 80 as the server port.
(2) Request-response loop:
1) Wait for and accept a client request.
2) Check the request.
3) Map the requested URL to a specific file: if the file exists, return it to the client; otherwise, return an error.
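The initialization step and request-response loop above can be sketched in Python as follows. This is a minimal single-threaded illustration under stated assumptions, not how Apache is implemented: the names `resolve_path`, `handle`, and `serve_forever` are hypothetical, only GET is supported, `index.html` is assumed as the default document, and the port defaults to 8080 because binding port 80 usually requires administrator privileges.

```python
import socket
from pathlib import Path
from typing import Optional


def resolve_path(docroot: Path, url_path: str) -> Optional[Path]:
    """Map a URL path to a file under docroot; return None if missing or outside it."""
    url_path = url_path.split("?", 1)[0]  # ignore any query string
    if url_path.endswith("/"):
        url_path += "index.html"  # assumed default document for directory URLs
    root = docroot.resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    # Reject paths such as "/../etc/passwd" that escape the document root.
    if root not in candidate.parents:
        return None
    return candidate if candidate.is_file() else None


def handle(docroot: Path, request: bytes) -> bytes:
    """Check the request line, map the URL to a file, and build an HTTP/1.0 response."""
    try:
        line = request.split(b"\r\n", 1)[0].decode("ascii")
        method, target, _version = line.split()
    except ValueError:  # also covers UnicodeDecodeError
        return b"HTTP/1.0 400 Bad Request\r\nContent-Length: 0\r\n\r\n"
    if method != "GET":
        return b"HTTP/1.0 501 Not Implemented\r\nContent-Length: 0\r\n\r\n"
    path = resolve_path(docroot, target)
    if path is None:
        return b"HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\n"
    body = path.read_bytes()
    head = f"HTTP/1.0 200 OK\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + body


def serve_forever(docroot: Path, host: str = "127.0.0.1", port: int = 8080) -> None:
    # (1) Initialization: bind and listen on the server port.
    with socket.create_server((host, port)) as server:
        # (2) Request-response loop, one client at a time.
        while True:
            conn, _addr = server.accept()
            with conn:
                # Sketch simplification: assume the request head fits in one recv().
                request = conn.recv(65536)
                conn.sendall(handle(docroot, request))
```

Calling `serve_forever(Path("public"))` and browsing to http://127.0.0.1:8080/ would return `public/index.html`. Keeping `handle` separate from the socket loop makes the mapping logic testable without a network.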
The above is the simplest process. A mature Web server also needs the following features:
(1) Complete implementation of the HTTP/0.9, HTTP/1.0, and HTTP/1.1 protocols
(2) Concurrent request processing, using multiple processes or multiple threads
(3) Development interfaces that allow additional functions to be added
(4) Server security mechanisms
(5) Dynamic content generation (CGI and scripting to generate dynamic Web pages)
(6) Support for virtual hosts
(7) Support for proxying
(8) Selection of the appropriate response resource through MIME-type negotiation
For administration, the following also need to be considered:
(1) Robustness and stability for 24x7 operation
(2) Easy configuration
(3) The ability to change the configuration without stopping the server
(4) Easy management: administrators can manage servers effectively with auxiliary tools
(5) Rich logging
3. WWW Documents
Documents on the Web server are organized in a layered or tree structure.
A hyperlink in HTML uses a document name to point to the corresponding information. The link can be a complete document name (absolute name: www.server.org/there/xx.html) or a name relative to the current document (relative name: ./xx.gif). Relative names are usually used for embedded images, because the parts of an HTML document are generally stored in the same place.
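Resolving a relative name against the document that contains it can be illustrated with Python's standard `urllib.parse.urljoin`. The `http://` scheme on the example base name is an assumption added so that it forms a complete URL, and `resolve_link` is a hypothetical wrapper.

```python
from urllib.parse import urljoin


def resolve_link(base_document: str, link: str) -> str:
    """Turn a hyperlink found in base_document into an absolute URL."""
    # Relative names are resolved against the base document's directory;
    # absolute names are returned unchanged.
    return urljoin(base_document, link)


base = "http://www.server.org/there/xx.html"  # scheme added for illustration
print(resolve_link(base, "./xx.gif"))  # http://www.server.org/there/xx.gif
```

This is the same resolution a browser performs before requesting an embedded image from the server.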
However, this logical document tree does not necessarily reflect how Web documents are actually organized. In practice there are three main forms of organization:
(1) All documents are located on the same machine and form a single tree.
(2) Documents are distributed across different machines, with no mirrors between them.
(3) Documents are mirrored across different machines.
4. Working Method (principles and steps of a Web server)
(1) Wait for client requests: listen on a port.
(2) A client request arrives: the client browser sends an ASCII string (the request) to the server, and the Web server reads the request into memory.
(3) Accept the client request: the Web server decodes the request according to HTTP to determine what to do next, including the method (for example, GET), the document requested by the client, and the protocol version the browser uses.
(4) Read other information: the Web server reads the rest of the request as needed (metadata describing the browser and its capabilities).
(5) Take the other actions needed to complete the request: the Web server searches its document tree for the requested file, for example index.html.
If the search succeeds, the file is sent: first a response code and some descriptive information, then the file is read from disk and written to the network.
If the search fails, an error message is returned.
(6) Close the file and the network connection, ending the session.
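Steps (2) through (4) amount to reading the request head and splitting it into a request line and header fields. The Python sketch below shows that decoding; `parse_request` is a hypothetical name, and the sketch assumes the whole head is already in memory and ignores folded or malformed header lines.

```python
from typing import Dict, Tuple


def parse_request(raw: bytes) -> Tuple[str, str, str, Dict[str, str]]:
    """Decode an HTTP request head into (method, target, version, headers)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    # Step (3): the request line holds the method, the document, and the protocol.
    method, target, version = request_line.split(" ")
    # Step (4): the remaining "Name: value" lines describe the client.
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            # Header names are case-insensitive, so store them in lower case.
            headers[name.strip().lower()] = value.strip()
    return method, target, version, headers
```

The server would then use `target` for the document-tree lookup in step (5) and `headers` for decisions such as virtual hosting or content negotiation.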
II. Apache Functions
Apache fully implements HTTP/0.9, 1.0, and 1.1, and also provides features that the protocol does not define, such as virtual hosts. Its basic functions are as follows:
1. Virtual Hosts
A virtual host is a mechanism for running multiple Web sites on one machine.
Virtual hosts can be implemented in three ways:
(1) IP-based: multiple IP addresses are configured on the Web server, and each logical Web server uses one IP address.
Advantage: the simplest approach.
Disadvantage: poor scalability, because the number of IP addresses a machine can have is limited.
(2) Port-based: the Web server has only one IP address, and the different Web sites listen on different ports.
Disadvantage: users must explicitly specify the correct port in their requests.
(3) Name-based (host domain): the Web server has only one IP address, and multiple domain names map to that address.
All sites listen on the same port, and requests are distinguished by the host name in the Host header of the HTTP request (available only since HTTP/1.1).
Apache supports all three kinds of virtual hosts, and the mod_vhost_alias module can be used to configure large numbers of similar virtual hosts.
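Name-based virtual hosting comes down to choosing a site from the Host header. The Python sketch below illustrates that selection; the host names, document roots, and the function `select_docroot` are hypothetical, and this is not Apache's configuration syntax.

```python
from typing import Dict, Optional

# Hypothetical mapping of host names to document roots.
VIRTUAL_HOSTS: Dict[str, str] = {
    "www.site-a.example": "/srv/site-a",
    "www.site-b.example": "/srv/site-b",
}


def select_docroot(host_header: Optional[str], default: str = "/srv/default") -> str:
    """Pick a document root from the Host header (name-based virtual hosting)."""
    if not host_header:
        # Requests without a Host header (e.g. from HTTP/1.0 clients) get the default site.
        return default
    # Drop an optional ":port" suffix and normalize case; IPv6 literals are not handled.
    host = host_header.split(":", 1)[0].lower()
    return VIRTUAL_HOSTS.get(host, default)
```

Because every site shares one IP address and port, the Host header is the only information available to tell the sites apart.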
2. Content Negotiation
Apache may store several different versions of the same document. The client and server can negotiate which version best suits the user in two ways, which can be used independently or in combination.
In the hybrid approach, called "transparent negotiation", a cache uses the agent-driven negotiation information (the list of available versions) provided by the origin server to perform server-driven negotiation for subsequent requests.
(1) Server-driven content negotiation
The server decides which version of the document to send to the client.
In the Accept request header, the client lists the formats it can accept, and the server uses this field to select the content most suitable for the client.
The server's choice may also be based on the language, the content encoding, the contents of other request header fields, and other information about the request (such as the client's IP address).
Advantage: the server can send its "best guess" to the client along with the first response, avoiding an extra round trip when the guess is good enough.
Disadvantages: 1) The server can never determine with certainty what is best for a given user.
2) Having the client describe its capabilities in every request is inefficient.
3) It complicates the implementation of the origin server and of the algorithms that generate responses to requests.
4) It may limit a public cache's ability to use the same response for requests from multiple users.
Negotiation uses the Accept-* request headers (Accept, Accept-Language, Accept-Charset, Accept-Encoding).
(2) Client-driven (agent-driven) content negotiation: the browser selects the version, choosing from a list of available versions provided by the server.
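The server-driven negotiation in (1) can be sketched for the Accept header alone: parse the media ranges with their q-values and pick the available type with the highest score. This is a simplified illustration, not Apache's negotiation algorithm (which also weighs language, encoding, and source quality); `parse_accept` and `choose_variant` are hypothetical names, and a missing Accept header (meaning any type is acceptable) is left to the caller.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges: List[Tuple[str, float]] = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        ranges.append((parts[0].lower(), q))
    return ranges


def choose_variant(accept: str, available: List[str]) -> Optional[str]:
    """Return the available media type with the highest q-value, or None."""
    prefs = parse_accept(accept)
    best, best_q = None, 0.0
    for variant in available:
        vtype = variant.split("/", 1)[0]
        q, specificity = 0.0, -1
        for media_range, range_q in prefs:
            # The most specific matching range wins: "text/html" > "text/*" > "*/*".
            if media_range == variant:
                spec = 2
            elif media_range == vtype + "/*":
                spec = 1
            elif media_range == "*/*":
                spec = 0
            else:
                continue
            if spec > specificity:
                q, specificity = range_q, spec
        if q > best_q:
            best, best_q = variant, q
    return best
```

A result of None corresponds to the case where no stored version is acceptable, which a real server would report as 406 Not Acceptable or answer with a default version.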
3. Persistent Connections
A persistent connection is kept open after a request is served instead of being closed immediately, and subsequent data transfers reuse the same connection.
In HTTP/1.0, the client and server request a persistent connection with the "Connection: keep-alive" header. In HTTP/1.1, connections are persistent by default,
unless either side sends "Connection: close" to close the connection.
Apache provides configuration directives (such as MaxKeepAliveRequests and KeepAliveTimeout) that limit the number of requests handled on one connection and how long the server waits on it; once a limit is exceeded, the connection is closed.
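The keep-alive rules above can be sketched as a single decision the server makes after each response. The function `keep_alive` and its `max_requests` cap are hypothetical, chosen only to mirror the kind of per-connection limit Apache's MaxKeepAliveRequests provides; timeouts are not modeled.

```python
def keep_alive(version: str, connection_header: str, served: int, max_requests: int = 100) -> bool:
    """Decide whether to leave the connection open after `served` responses."""
    tokens = {t.strip().lower() for t in connection_header.split(",") if t.strip()}
    if "close" in tokens:
        return False  # either side may ask to close explicitly
    if served >= max_requests:
        return False  # per-connection request cap reached
    if version == "HTTP/1.1":
        return True  # persistent by default in HTTP/1.1
    # HTTP/1.0 connections are persistent only on explicit request.
    return "keep-alive" in tokens
```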
4. Cache
In a distributed architecture, caching responses improves system performance and speeds up responses to clients.
Caching in HTTP/1.1 is designed to eliminate the need to send requests in many cases, and to eliminate the need to send complete responses in many other cases.
Fewer requests: an expiration mechanism reduces the number of network round trips.
Fewer complete responses: a validation mechanism reduces network bandwidth requirements.
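The expiration mechanism lets a cache reuse a stored response without contacting the server while the response is still fresh. The sketch below checks freshness from the Cache-Control max-age directive, falling back to Expires; `is_fresh` is a hypothetical name, and the age calculation is simplified (it ignores the Age header and request/response delays).

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime
from typing import Dict


def is_fresh(headers: Dict[str, str], received_at: datetime, now: datetime) -> bool:
    """Return True if a cached response may be reused without asking the server."""
    directives: Dict[str, str] = {}
    for item in headers.get("cache-control", "").split(","):
        name, _, value = item.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip()
    if "no-cache" in directives or "no-store" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        # max-age takes precedence over Expires.
        return now - received_at < timedelta(seconds=int(max_age))
    expires = headers.get("expires")
    if expires:
        try:
            return now < parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return False  # an invalid Expires date counts as already expired
    return False
```

Timestamps are expected to be timezone-aware UTC datetimes, matching what `parsedate_to_datetime` returns for dates ending in "GMT".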
Cache-related header fields in HTTP/1.1:
(1) Expires: the time after which the cached copy of a page or URL is considered stale and should no longer be used by the browser without checking with the server.
(2) Cache-Control: can carry multiple directives, specifying for example the maximum time a page may be cached, how it may be cached, whether it may be transformed into another media type, and whether it may be stored on persistent media.
(3) Last-Modified: used with the If-Modified-Since request header in conditional requests, to check whether the page has changed on the server. This avoids sending the file to the browser again, but an HTTP request is still made.
A purely static page generally carries Last-Modified information: Apache reads it from the page file and adds it to the HTTP response header.
For dynamic pages, unless the page explicitly sets Last-Modified (for example, with a header function), Apache returns the current time to the browser as Last-Modified.
For both static and dynamic pages, Firefox records the Last-Modified time of the cached page based on when the server response was received.
(4) ETag: provides stricter validation.
By default, Apache adds an ETag field to the response headers of all static and dynamic files; this can be configured with the FileETag directive in the httpd.conf file.
The FileETag directive sets which file attributes (INode, MTime, Size) are used to create the ETag response header when the document is served from a file.
In a load-balanced environment with multiple servers, the same file may have different ETags or modification dates on different servers, so the browser downloads the file again each time.
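The validation mechanism can be sketched as two steps: the server attaches validators to a response, and on a later conditional request it answers 304 Not Modified if the client's copy still matches. The names `validators` and `conditional_status` are hypothetical, and deriving the ETag from a hash of the content is an assumption of this sketch, not what Apache's FileETag does; weak ETags (`W/"..."`) are not handled.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime
from typing import Dict, Tuple


def validators(body: bytes, mtime: float) -> Tuple[str, str]:
    """Build (ETag, Last-Modified) values from the file's bytes and modification time."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    last_modified = formatdate(mtime, usegmt=True)
    return etag, last_modified


def conditional_status(request_headers: Dict[str, str], etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if "*" in tags or etag in tags else 200
    since = request_headers.get("if-modified-since")
    if since:
        try:
            if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since):
                return 304
        except (TypeError, ValueError):
            pass  # ignore a malformed date and send the full response
    return 200
```

Because this hypothetical ETag depends only on the file's bytes, identical copies on different load-balanced servers would produce the same value, which avoids the repeated downloads described in the last item; a Last-Modified value based on each server's mtime can still differ.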
5. Access Control and Security
Apache protects controlled data through authentication, authorization, and access control (the AAA modules).
(1) Access control
Access control is implemented by a single module.
(2) Authentication
There are two types: Basic authentication and Digest authentication.
(3) Authorization
Apache parses the global configuration file and the local .htaccess configuration files to determine the user's authorization.
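With Basic authentication, the client sends the user name and password joined by a colon and base64-encoded in the Authorization header. The Python sketch below decodes that header; `parse_basic_auth` is a hypothetical name, and checking the credentials against a user store is left out. Because base64 is an encoding, not encryption, Basic credentials are readable by anyone who sees the request, which is the weakness Digest authentication was designed to address.

```python
import base64
from typing import Optional, Tuple


def parse_basic_auth(authorization: str) -> Optional[Tuple[str, str]]:
    """Decode an 'Authorization: Basic ...' header value into (user, password)."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    # Split on the first colon only: passwords may themselves contain colons.
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None
```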
6. Dynamic Content Generation
The simplest function of a Web server is to send static HTML files stored on the server to the client.
Apache can also generate pages dynamically:
(1) CGI scripts: the Web server runs an external program that interprets the script code; the program returns its HTML output to the Web server, which forwards it to the client.
(2) Additional modules that support scripting languages (such as mod_perl): the script is interpreted inside the server, and the script module provides an execution environment for the scripting language plus an API that lets scripts access data outside the script.
There are two types of server-side scripts:
1) HTML files with embedded scripts, such as ASP and PHP: the script code is enclosed in special tags that the script engine module recognizes; when executed, the script's HTML output replaces the original script.
2) HTML documents generated entirely by programs: CGI programs (C, C++, or Perl programs) and Java Servlets.
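The CGI model in (1) can be sketched as follows: the server passes request details to an external program through environment variables, runs it, and forwards whatever the program writes to standard output. This is a minimal illustration; `run_cgi` is a hypothetical name, only a few of the standard CGI meta-variables are set, request bodies (stdin) are not passed, and the example "CGI program" is an inline Python script standing in for a C or Perl program.

```python
import os
import subprocess
import sys
from typing import List


def run_cgi(command: List[str], method: str, query: str, timeout: float = 10.0) -> bytes:
    """Run an external CGI program and return its raw output (headers + body)."""
    env = dict(os.environ)
    # A few of the standard CGI meta-variables.
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query,
        "SERVER_PROTOCOL": "HTTP/1.1",
    })
    # check=True raises CalledProcessError if the program exits with an error.
    result = subprocess.run(command, env=env, capture_output=True, timeout=timeout, check=True)
    return result.stdout


if __name__ == "__main__":
    script = ('import os; print("Content-Type: text/html"); print(); '
              'print("<p>" + os.environ["QUERY_STRING"] + "</p>")')
    print(run_cgi([sys.executable, "-c", script], "GET", "name=web").decode())
```

Starting a new process for every request is what makes plain CGI simple but slow, which is why embedded-interpreter modules such as mod_perl in (2) exist.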