This article explains how the Apache HTTP Server uses the URL of a request to determine the file system location from which to serve a document.
DocumentRoot
To decide what content to serve for a request, httpd by default takes the URL-path of the request (the part of the URL that follows the hostname and port number) and appends it to the DocumentRoot specified in the configuration files. The files and directories under DocumentRoot therefore form the basic document tree that is visible from the web.
For example, if DocumentRoot is set to /var/www/html, a request for http://www.example.com/fish/guppies.html returns the file /var/www/html/fish/guppies.html to the client.
If a directory is requested (that is, the URL-path ends with /), the file to serve is determined by the DirectoryIndex directive. For example, suppose DocumentRoot is set as above and the configuration also contains:
DirectoryIndex index.html index.php
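A request for http://www.example.com/fish/ then makes httpd try to serve /var/www/html/fish/index.html and, if that file does not exist, /var/www/html/fish/index.php. If neither file exists, httpd next tries to generate a directory listing, provided that the mod_autoindex module is loaded and the configuration permits it.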
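As a rough sketch, the index-file lookup and the directory-listing fallback could be configured together for the example document tree as shown below. The Directory path reuses the DocumentRoot from the example above. Whether listings should be enabled at all is a site-specific decision, so treat the Options line as an assumption rather than a recommendation.

```apache
<Directory "/var/www/html">
    # Index files are tried in the order listed.
    DirectoryIndex index.html index.php
    # Let mod_autoindex generate a listing when no index file is found.
    # Requires mod_autoindex to be loaded.
    Options +Indexes
</Directory>
```

Leaving out the Indexes option (or using Options -Indexes) makes httpd refuse the request instead of listing the directory.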
httpd also supports virtual hosting, in which one server accepts requests for more than one host. In this case, a different DocumentRoot can be specified for each virtual host. Alternatively, the directives provided by the mod_vhost_alias module can be used to determine dynamically, from the requested IP address or hostname, the location from which to serve files.
The DocumentRoot directive can be set in the main configuration file (httpd.conf), but more commonly a separate DocumentRoot is specified for each virtual host.
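A minimal sketch of per-host document roots might look like the following. The second hostname and both directory paths are hypothetical and chosen only for illustration. The sketch also assumes name-based virtual hosts on port 80.

```apache
# Requests for www.example.com are served from /var/www/html.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/html"
</VirtualHost>

# Requests for other.example.org are served from a separate tree.
<VirtualHost *:80>
    ServerName other.example.org
    DocumentRoot "/var/www/other"
</VirtualHost>
```

With this in place, the same URL-path (for example /fish/guppies.html) resolves to a different file depending on the Host header of the request.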
Files Outside of DocumentRoot
It is often necessary to make files that live outside DocumentRoot available on the web, and httpd offers several ways to do this. On Unix systems, symbolic links can bring files from other parts of the file system under DocumentRoot. For security reasons, httpd follows a symbolic link only if the Options setting for the relevant directory includes FollowSymLinks or SymLinksIfOwnerMatch.
Alternatively, the Alias directive maps any part of the file system into the web space. For example, with
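The symbolic-link restriction could be relaxed for the example document tree roughly as follows. This is a sketch that assumes the /var/www/html DocumentRoot used earlier. Note that an Options line without + or - prefixes replaces any options inherited from enclosing sections.

```apache
<Directory "/var/www/html">
    # Follow a symbolic link only when the link and its target
    # are owned by the same user.
    Options SymLinksIfOwnerMatch
</Directory>
```

Using FollowSymLinks instead would follow every link regardless of ownership, which is more permissive.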
Alias "/docs" "/var/web"
a request for http://www.example.com/docs/dir/file.html is served from /var/web/dir/file.html. The ScriptAlias directive works the same way, except that everything under the target path is treated as a CGI script.
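In practice an alias target outside DocumentRoot usually also needs an access rule before httpd will serve it. The sketch below assumes Apache 2.4 authorization syntax (Require) and reuses the /var/web path from the example above. The ScriptAlias target path is hypothetical.

```apache
Alias "/docs" "/var/web"
<Directory "/var/web">
    # Explicitly permit access to the aliased directory (Apache 2.4 syntax).
    Require all granted
</Directory>

# Hypothetical CGI directory: files under it are executed, not sent as-is.
ScriptAlias "/cgi-bin/" "/var/cgi-bin/"
```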
Where more flexibility is required, the AliasMatch and ScriptAliasMatch directives perform matching and substitution based on regular expressions. For example:
ScriptAliasMatch "^/~([a-zA-Z0-9]+)/cgi-bin/(.+)" "/home/$1/cgi-bin/$2"
A request for http://example.com/~user/cgi-bin/script.cgi is then mapped to /home/user/cgi-bin/script.cgi in the file system, and the resulting file is treated as a CGI script.
User Directories
On traditional Unix systems, the home directory of a particular user can be referred to as ~user/. The mod_userdir module extends this idea to the web by allowing files under each user's home directory to be accessed with URLs such as the following:
http://www.example.com/~user/file.html
For security reasons, it is inappropriate to give direct access to a user's home directory from the web. The UserDir directive therefore specifies a directory underneath the user's home directory in which web files are kept. With the setting UserDir public_html, the URL above maps to /home/user/public_html/file.html, where /home/user is the user's home directory as specified in /etc/passwd.
For systems where /etc/passwd does not contain the location of the home directory, UserDir can be used in several other forms.
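The forms below sketch the patterns described in the mod_userdir documentation. The user name bob and all paths are placeholders, and in a real configuration only one of these UserDir lines would normally be active.

```apache
# Relative form: a subdirectory of the home directory from /etc/passwd.
# /~bob/one/two.html  ->  ~bob/public_html/one/two.html
UserDir public_html

# Absolute form: the user name is appended to a fixed base directory.
# /~bob/one/two.html  ->  /usr/web/bob/one/two.html
#UserDir "/usr/web"

# Pattern form: the asterisk is replaced by the user name.
# /~bob/one/two.html  ->  /home/bob/www/one/two.html
#UserDir "/home/*/www"
```

The absolute and pattern forms do not consult /etc/passwd, which is why they suit systems where home directories are not recorded there.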
Some people find the ~ symbol (which is often encoded on the web as %7e) awkward in URLs and prefer a different string to represent user directories. The mod_userdir module does not support this. However, if users' home directories are laid out in a regular way, the AliasMatch directive can achieve the desired effect. For example, to map http://www.example.com/upages/user/file.html to /home/user/public_html/file.html, use the following AliasMatch directive:
AliasMatch "^/upages/([a-zA-Z0-9]+)(/(.*))?$" "/home/$1/public_html/$3"
URL Redirection
The configuration directives discussed above tell httpd to answer the client with a file from a particular place in the file system. Sometimes, however, it is preferable to tell the client that the requested content is located at a different URL, so that the client issues a new request for that URL. This mechanism is called redirection and is implemented with the Redirect directive. For example, if the contents of the /foo/ directory under DocumentRoot have all been moved to a new directory /bar/, you can instruct clients to request the content at the new location:
Redirect permanent "/foo/" "http://www.example.com/bar/"
The configuration above redirects any URL-path beginning with /foo/ to the same path on www.example.com, with /foo/ replaced by /bar/. For example, http://www.yousite.com/foo/fish/guppies.html is redirected to http://www.example.com/bar/fish/guppies.html. As this example shows, the client can be redirected to any server, not just the originating one.
httpd also provides the RedirectMatch directive for more complicated redirection problems. For example, to redirect requests for a site's home page to a different site while leaving all other requests alone, use the following configuration:
RedirectMatch permanent "^/$" "http://www.example.com/startpage.html"
Alternatively, to temporarily redirect every page on one site to a particular page on another site, use the following:
RedirectMatch temp ".*" "http://othersite.example.com/startpage.html"
Note 1: The permanent and temp keywords in the two configurations above determine the status code of the redirect response that the client receives: permanent produces 301 (Moved Permanently) and temp produces 302 (Found).
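For reference, the status keywords accepted by Redirect can be sketched as follows. All URL-paths and target URLs here are hypothetical. Comments are placed on their own lines because httpd does not allow a comment on the same line as a directive.

```apache
# 301 Moved Permanently
Redirect permanent "/old/" "http://www.example.com/new/"

# 302 Found (also the default when no status is given)
Redirect temp "/sale/" "http://www.example.com/offers/"

# 303 See Other
Redirect seeother "/form-done" "http://www.example.com/thanks.html"

# 410 Gone: the resource was removed, so no target URL is given
Redirect gone "/obsolete/"
```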
Note 2: The next section discusses httpd's reverse proxy, so it is worth first making clear what distinguishes a redirect. With a redirect, the server explicitly sends the client a redirect response (such as 301) and the client then issues a new request, which means the client is aware of the process. A reverse proxy does the opposite: the server hides the real source of the resources by fetching them itself, so that to the client they appear to come from the proxy server.
Reverse Proxy
httpd also allows documents from a remote site to be brought into the URL space of the local site. This technique is called reverse proxying, because the web server acts like a proxy server: it fetches the documents from the remote server and returns them to the client. Unlike a normal (forward) proxy, however, a reverse proxy makes the remote documents appear to originate from the reverse proxy server itself. In other words, it hides their true source.
In the following example, when a client requests a document under the /foo/ directory, the reverse proxy server fetches it from the /bar/ directory on internal.example.com and returns it to the client as if it came from the reverse proxy server itself.
ProxyPass "/foo/" "http://internal.example.com/bar/"
ProxyPassReverse "/foo/" "http://internal.example.com/bar/"
ProxyPassReverseCookieDomain internal.example.com public.example.com
ProxyPassReverseCookiePath "/foo/" "/bar/"
The ProxyPass directive makes the reverse proxy server fetch the appropriate documents from the remote site (this directive alone is enough for the most basic reverse proxy).
The ProxyPassReverse directive makes httpd rewrite the URLs in the Location, Content-Location, and URI headers of responses coming from the proxied site, so that references to that site are replaced with references to the reverse proxy.
ProxyPassReverseCookieDomain adjusts the domain attribute in Set-Cookie headers.
ProxyPassReverseCookiePath adjusts the path attribute in Set-Cookie headers.
It is important to note that links inside the proxied documents are not rewritten. Any absolute link to internal.example.com in a document therefore causes the client to bypass the proxy and request the resource directly from internal.example.com. To rewrite such links as the content is served, see mod_substitute and mod_proxy_html.
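One possible way to rewrite links in proxied pages is a mod_substitute output filter, sketched below. It assumes that mod_substitute and mod_filter are loaded, that the backend sends uncompressed HTML, and that public.example.com (the name used in the cookie example above) is the public hostname.

```apache
<Location "/foo/">
    # Pass HTML responses through the mod_substitute output filter.
    AddOutputFilterByType SUBSTITUTE text/html
    # Replace the internal hostname with the public one, ignoring case.
    Substitute "s/internal\.example\.com/public.example.com/i"
</Location>
```

This is a plain text substitution, so it does not understand HTML structure. mod_proxy_html is the alternative when links need to be rewritten in a markup-aware way.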
Note: My understanding of a forward proxy is as follows. Suppose we cannot reach the external network directly and must go through a proxy server. The client then actually sends its request to the proxy, which either forwards it to the target named in the request and relays the real response back to the client, or answers directly from its own cache. Either way, the response the client receives is the same as if it had come straight from the target site. The network proxy that we configure in a browser is a forward proxy.