Apache mod_rewrite Learning Notes (1)
Chelong wrote an article long ago on using the mod_rewrite module to hide dynamic pages behind links that look like static pages.
Apache's mod_rewrite module provides a rule-based rewriting engine that rewrites requested URLs on the fly. Because it is so powerful, it is often called the "Swiss Army Knife" of URL manipulation.
The engine is built on a regular-expression parser and rewrites requested URLs on the fly according to rules defined by the web administrator. It supports an unlimited number of rules, and an unlimited number of conditions attached to each rule, which makes it a flexible and powerful URL manipulation mechanism. A URL manipulation can depend on many kinds of tests: server variables, environment variables, HTTP headers, timestamps, and even external database lookups. The module operates on full URLs (including the path-info part) in both per-server context (httpd.conf) and per-directory context (.htaccess), and can even generate query-string parts in its result. The rewritten result can lead to internal sub-processing, an external redirect, or even an internal proxy request. All this flexibility and power comes at the cost of complexity, so don't expect to understand the whole module in one day. (That is why these notes are split into several parts.)
Internal Processing
API Phases
Apache processes an HTTP request in phases, and the Apache API provides a hook for each phase. mod_rewrite uses two of these hooks:
- The URL-to-filename translation hook. It runs after the HTTP request has been read but before any authorization starts.
- The Fixup hook. It runs after authorization has completed and the per-directory configuration files (.htaccess) have been read, but before the content handler is activated.

When a request comes in, Apache first determines the corresponding server (or virtual server). The rewriting engine then processes all mod_rewrite directives from the per-server configuration in the URL-to-filename phase. A few steps later, once the final data directory has been found, the per-directory mod_rewrite directives are triggered in the Fixup phase. In both phases mod_rewrite rewrites URLs either to new URLs or to filenames, and there is no obvious distinction between the two. The API was not originally designed to be used this way, but as of Apache 1.x it is the only way mod_rewrite can operate. To keep things clear, remember the following two points.
1) mod_rewrite rewrites URLs to URLs, URLs to filenames, and even filenames to filenames. Even so, the Apache 1.x API provides only a URL-to-filename hook. Apache 2.0 adds the two missing hooks, which makes the processing clearer. Just remember this fact: Apache does more in the URL-to-filename hook than the API intends.
2) Surprisingly, mod_rewrite can manipulate URLs in per-directory context (for example, from directives in a .htaccess file), even though URLs have long since been translated to filenames by then. It has to work this way because .htaccess files live in the filesystem, so processing has already reached that stage. In other words, at this point it is really too late for any URL manipulation.

To get around this chicken-and-egg problem, mod_rewrite uses a trick. When you manipulate a URL/filename in per-directory context, mod_rewrite first rewrites the filename back to its corresponding URL. That is normally impossible, but the RewriteBase directive described below makes it work. It then starts a new internal sub-request with the new URL, which restarts the API processing.

mod_rewrite tries hard to make these complicated steps transparent to the user, but remember: URL manipulation in per-server context is fast and efficient, while per-directory rewrites are slow and inefficient because of this chicken-and-egg problem. On the other hand, this is the only way mod_rewrite can offer URL manipulation to ordinary users at the directory level.
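The trick in point 2 can be sketched in Python. This is only a simplified model, not mod_rewrite's code. The helper `filename_to_url` and the example paths are hypothetical, and the sketch assumes the rewritten result is a filename inside the directory that holds the .htaccess file. RewriteBase supplies the URL-path that replaces the physical directory prefix.

```python
def filename_to_url(filename: str, physical_dir: str, rewrite_base: str) -> str:
    """Map a per-directory rewrite result back to a URL-path (hypothetical helper).

    Simplified model: the physical directory prefix is replaced by the
    RewriteBase URL-path. The real module would then issue an internal
    sub-request for the returned URL.
    """
    # Normalise both prefixes so each ends with exactly one slash.
    physical = physical_dir.rstrip("/") + "/"
    base = rewrite_base.rstrip("/") + "/"
    if not filename.startswith(physical):
        # Outside this directory: this sketch cannot map it back.
        raise ValueError(f"{filename!r} is not under {physical_dir!r}")
    return base + filename[len(physical):]


# A .htaccess file in /var/www/html/shop combined with "RewriteBase /shop":
print(filename_to_url("/var/www/html/shop/item.php", "/var/www/html/shop", "/shop"))
# prints /shop/item.php
```

Without the RewriteBase value, the sketch would have no URL-path to put in place of the physical prefix. That is the gap the directive fills.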
Ruleset Processing (RewriteRule Directives)
When mod_rewrite is triggered in either of these two API phases, it reads the configured ruleset from its per-server or per-directory configuration structure. The URL rewriting engine then runs the contained ruleset (one or more rules together with their conditions). The engine works exactly the same way in both contexts; only the processing of the final result differs.
The order of rules in a ruleset matters, because the rewriting engine processes them in the order they are defined. The engine loops through the ruleset rule by rule. When a rule matches, it loops through that rule's conditions (RewriteCond directives). For historical reasons the conditions are written before the rule, so the control flow is a little long-winded (see Figure 1).
As the figure shows, the engine proceeds as follows:
- The URL is first matched against the Pattern of each rule. If the match fails, mod_rewrite stops processing that rule immediately and moves on to the next one.
- If the Pattern matches, mod_rewrite looks for the rule's conditions. If there are none, it replaces the URL with the value built from the Substitution and returns to the rule loop.
- If there are conditions, an inner loop checks them in the order they are listed. Conditions work differently from rules: the engine does not match a pattern against the current URL. Instead, it first builds a TestString by expanding variables, back-references, map lookups and so on, and then matches the CondPattern against that TestString.
- If a CondPattern fails to match, the whole set of conditions and the rule itself fail, and (importantly) processing returns to the rule loop. If it matches, the next condition is checked.
- If all conditions match, the substitution defined by the rule is performed.
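The control flow above can be modelled in a few lines of Python. This is a hypothetical sketch, not mod_rewrite's implementation. Python's `re` module stands in for the POSIX regex library, the `Rule` and `Cond` classes are invented for illustration, and TestStrings expand only `$N` and `%{VAR}` (no `%N` or map lookups). No flags are modelled, so every matching rule is applied in turn.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cond:
    test_string: str   # may contain $N and %{VAR}
    cond_pattern: str  # regex matched against the expanded TestString


@dataclass
class Rule:
    pattern: str
    substitution: str
    # Conditions are written before the rule in httpd.conf, checked after its pattern.
    conds: List[Cond] = field(default_factory=list)


def expand(text: str, m: "re.Match", variables: Dict[str, str]) -> str:
    """Expand $N from the rule's match and %{VAR} from `variables`."""
    def backref(b: "re.Match") -> str:
        n = int(b.group(1))
        return (m.group(n) or "") if n <= m.re.groups else ""
    text = re.sub(r"\$(\d)", backref, text)
    return re.sub(r"%\{(\w+)\}", lambda v: variables.get(v.group(1), ""), text)


def run_ruleset(url: str, rules: List[Rule], variables: Dict[str, str]) -> str:
    for rule in rules:                        # outer loop: rules in order
        m = re.search(rule.pattern, url)
        if m is None:
            continue                          # pattern failed: next rule
        conds_ok = all(                       # inner loop: stops at first failure
            re.search(c.cond_pattern, expand(c.test_string, m, variables))
            for c in rule.conds
        )
        if conds_ok:                          # a rule without conditions always passes
            url = expand(rule.substitution, m, variables)  # whole URL replaced
    return url


rules = [
    Rule(r"^/old/(.*)$", "/new/$1"),
    Rule(r"^/new/(.*)$", "/mobile/$1", [Cond("%{HTTP_USER_AGENT}", "Mobile")]),
]
print(run_ruleset("/old/page.html", rules, {"HTTP_USER_AGENT": "Mobile Safari"}))
# prints /mobile/page.html
```

The second rule sees the URL already produced by the first rule, which is why rule order matters.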
Quoting Special Characters
Because mod_rewrite is based on regular expressions, special characters need handling. As of Apache 1.3.20, a special character in a TestString or Substitution string can be escaped (treated as an ordinary character) by prefixing it with a backslash ('\').
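To make the escaping rule concrete, here is a minimal Python sketch that expands `$N` back-references while honouring backslash escapes. The function name and the example URL are hypothetical. The real module also handles `%N`, `%{VAR}` and map lookups, which this sketch omits.

```python
import re
from typing import List, Optional


def expand_with_escapes(template: str, groups: List[Optional[str]]) -> str:
    """Expand $N back-references; a backslash makes the next character literal."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if ch == "\\" and nxt:
            out.append(nxt)                  # escaped character is copied verbatim
            i += 2
        elif ch == "$" and nxt.isdecimal():
            n = int(nxt)
            out.append((groups[n] or "") if n < len(groups) else "")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


m = re.match(r"^/price/(\d+)$", "/price/42")
groups = [m.group(0), *m.groups()]           # index 0 is the whole match
print(expand_with_escapes(r"/show.php?cost=\$$1", groups))
# prints /show.php?cost=$42
```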
Regular Expression Back-References
Whenever you use parentheses in a Pattern or in a CondPattern, back-references are created automatically. You can then use them as $N and %N in a Substitution string or TestString. The accompanying figure shows where the back-referenced values are transferred for expansion.
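A minimal Python sketch of where the two kinds of back-reference come from. It assumes Python's `re` groups behave like the POSIX groups mod_rewrite uses for these simple patterns. The function name, the example host, and expanding undefined references to an empty string are all assumptions of this sketch.

```python
import re
from typing import Optional


def expand_backrefs(template: str,
                    rule_match: Optional["re.Match"],
                    cond_match: Optional["re.Match"]) -> str:
    """Replace $N with groups of the RewriteRule match and %N with groups
    of the last matched RewriteCond, in a single pass over the template."""
    def repl(b: "re.Match") -> str:
        m = rule_match if b.group(1) == "$" else cond_match
        n = int(b.group(2))
        if m is None or n > m.re.groups:
            return ""                        # undefined back-reference -> empty
        return m.group(n) or ""
    return re.sub(r"([$%])(\d)", repl, template)


rule_m = re.search(r"^/user/(\w+)$", "/user/alice")
cond_m = re.search(r"^(www\.)?(example\.com)$", "www.example.com")
print(expand_backrefs("/profile.php?name=$1&host=%2", rule_m, cond_m))
# prints /profile.php?name=alice&host=example.com
```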
Configuration Directives
- RewriteEngine
  Syntax: RewriteEngine on|off
  Default: RewriteEngine off
  Description: Turns the rewriting engine on or off.
  Remarks: Not inherited by default, so each virtual host needs its own RewriteEngine directive.
- RewriteOptions
  Syntax: RewriteOptions Option
  Default: RewriteOptions MaxRedirects=10
  Description: Sets special options for the rewriting engine.
  Remarks: inherit: inherit the configuration of the parent context. MaxRedirects=number: maximum number of internal redirects.
- RewriteLog
  Syntax: RewriteLog file-path
  Default: None
  Description: Sets the name of the rewrite log file.
  Remarks: To disable logging, use RewriteLogLevel 0.
- RewriteLogLevel
  Syntax: RewriteLogLevel Level
  Default: RewriteLogLevel 0
  Description: Sets the verbosity of the rewrite log.
  Remarks: 0 means no logging; levels above 2 are for debugging only; 9 or more logs practically everything.
- RewriteLock
  Syntax: RewriteLock file-path
  Default: None
  Description: Sets the lock file used to synchronize RewriteMap programs.
  Remarks: Must be a local file; only used for map programs.
- RewriteMap
  Syntax: RewriteMap MapName MapType:MapSource
  Default: Not used by default
  Description: Defines a rewriting map.
  Remarks: For more information, see
- RewriteBase
  Syntax: RewriteBase URL-path
  Default: The physical directory path
  Description: Sets the base URL for per-directory rewrites.
  Remarks: For more information, see
- RewriteCond
  Syntax: RewriteCond TestString CondPattern
  Default: None
  Description: Defines a condition for a rewriting rule.
  Remarks: For more information, see
- RewriteRule
  Syntax: RewriteRule Pattern Substitution
  Default: None
  Description: Defines a rewriting rule.
  Remarks: For more information, see
References:
http://httpd.apache.org/docs/mod/mod_rewrite.html
Apache mod_rewrite Learning Notes (2)
I learned the syntax of rewrite rules today.
RewriteRule
Syntax: RewriteRule Pattern Substitution [flags]
Each RewriteRule directive defines one rewriting rule, and the order of rules matters. For Apache 1.2 and later, Pattern is a POSIX regular expression that is matched against the current URL. The current URL is not necessarily the URL originally requested, because earlier rules may already have matched and changed it.
For mod_rewrite, '!' is a valid pattern prefix meaning NOT. It makes it easy to express "does not match this pattern", for example in a final default rule. A negated pattern cannot contain grouped wildcard parts or produce back-references, because a pattern that does not match has no group contents.
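The effect of the '!' prefix can be sketched as follows, with Python's `re` standing in for the POSIX regex engine. `match_pattern` is a hypothetical helper. A negated pattern reports success only when the regex fails, which is exactly why it cannot supply back-references.

```python
import re
from typing import List, Tuple


def match_pattern(pattern: str, url: str) -> Tuple[bool, List[str]]:
    """Return (matched, back-references) for a RewriteRule-style pattern."""
    if pattern.startswith("!"):
        # Negated: succeeds only when the regex does NOT match,
        # so there are no captured groups to hand out.
        return re.search(pattern[1:], url) is None, []
    m = re.search(pattern, url)
    if m is None:
        return False, []
    return True, [m.group(0)] + [g or "" for g in m.groups()]


print(match_pattern(r"!\.png$", "/index.html"))
# prints (True, [])
```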
When the pattern matches, the URL is replaced by Substitution. Besides plain text, Substitution can contain:
- $N: back-references to groups in the RewriteRule pattern (N = 0..9)
- %N: back-references to groups in the last matched RewriteCond pattern (N is the group number)
- %{VARNAME}: server variables
- ${mapname:key|default}: mapping-function calls
These are expanded in exactly the order listed above.
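The expansion order can be sketched in Python as four sequential passes. This is a simplified, hypothetical model. Maps are plain dictionaries here, whereas real RewriteMap sources can be text files, DBM files, programs, or internal functions. Unknown variables and keys expand to an empty string, or to the given default for maps. Because `$N` is expanded first, a back-reference can serve as the key of a map lookup, as the example at the end shows.

```python
import re
from typing import Dict, List


def _ref(groups: List[str], n: int) -> str:
    return groups[n] if n < len(groups) else ""


def expand_substitution(template: str,
                        rule_groups: List[str],
                        cond_groups: List[str],
                        server_vars: Dict[str, str],
                        maps: Dict[str, Dict[str, str]]) -> str:
    # 1. $N: groups from the RewriteRule pattern ("${" is left alone: no digit)
    template = re.sub(r"\$(\d)", lambda b: _ref(rule_groups, int(b.group(1))), template)
    # 2. %N: groups from the last matched RewriteCond pattern
    template = re.sub(r"%(\d)", lambda b: _ref(cond_groups, int(b.group(1))), template)
    # 3. %{VARNAME}: server variables
    template = re.sub(r"%\{(\w+)\}", lambda b: server_vars.get(b.group(1), ""), template)

    # 4. ${mapname:key|default}: map lookups, falling back to default (or "")
    def lookup(b: "re.Match") -> str:
        name, key, default = b.group(1), b.group(2), b.group(3) or ""
        return maps.get(name, {}).get(key, default)
    return re.sub(r"\$\{(\w+):([^|}]*)(?:\|([^}]*))?\}", lookup, template)


print(expand_substitution(
    "/${section:$1|misc}/item-$2?host=%{HTTP_HOST}",
    rule_groups=["/catalog/books/42", "books", "42"],
    cond_groups=["www.example.com", "www"],
    server_vars={"HTTP_HOST": "www.example.com"},
    maps={"section": {"books": "library"}},
))
# prints /library/item-42?host=www.example.com
```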
The URL is completely replaced by the Substitution. Rewriting then continues with the following rules until all rules have been applied, unless processing is explicitly stopped with the L flag.
If Substitution is a single '-' (dash), no substitution is performed and the rule only performs the match.
A substitution can also produce a URL with a query string. Put a '?' in the substitution string, and the text after it is placed into the QUERY_STRING variable. To erase an existing query string, end the substitution string with a single '?'.
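A minimal Python sketch of the query-string rule, assuming the behaviour described above without the QSA flag. `split_substitution` is a hypothetical helper, and `None` stands for an empty QUERY_STRING.

```python
from typing import Optional, Tuple


def split_substitution(substitution: str,
                       current_query: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (new path, new QUERY_STRING) for a rewritten URL."""
    if "?" not in substitution:
        # No '?': the existing QUERY_STRING is passed through untouched.
        return substitution, current_query
    path, new_query = substitution.split("?", 1)
    # Text after '?' replaces QUERY_STRING; a trailing '?' (empty text) erases it.
    return path, (new_query or None)


print(split_substitution("/list.php?", "page=2"))
# prints ('/list.php', None)
```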
If a substitution begins with http://thishost[:thisport], mod_rewrite automatically strips that prefix. As a result, you cannot force an unconditional external redirect to your own server by writing http://thishost in the substitution. Use the R flag instead.
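The prefix reduction can be sketched in Python as below. This is a simplified, hypothetical model. `strip_self_prefix`, the host name, and the default port of 80 are assumptions, and only the plain `http://` scheme is handled.

```python
import re


def strip_self_prefix(substitution: str, this_host: str, this_port: int = 80) -> str:
    """Drop a leading http://thishost[:thisport] so only the URL-path remains."""
    prefix = (r"^http://" + re.escape(this_host)
              + r"(?::" + str(this_port) + r")?(?=/)")
    # Host names compare case-insensitively; URLs for other hosts are untouched.
    return re.sub(prefix, "", substitution, count=1, flags=re.IGNORECASE)


print(strip_self_prefix("http://www.example.com:80/new/page", "www.example.com"))
# prints /new/page
```

Because the self-referencing prefix disappears, the result is an ordinary local URL-path. That is why only the R flag can force a redirect back to the same server.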
Flags is an optional third argument written in square brackets. Multiple flags are separated by commas.
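As a sketch of the flag syntax, the Python below splits a `[...]` flag list at commas. It also resolves the status code implied by R, F and G, using the codes given in the flag descriptions that follow. The helper names, the long-to-short name table, and the rule that the first status flag wins are assumptions of this sketch, not mod_rewrite's own parser.

```python
from typing import List, Optional, Tuple

# Long flag names mapped to their short forms.
ALIASES = {
    "redirect": "R", "forbidden": "F", "gone": "G", "proxy": "P",
    "last": "L", "next": "N", "chain": "C", "type": "T",
    "nosubreq": "NS", "nocase": "NC", "qsappend": "QSA",
    "noescape": "NE", "passthrough": "PT", "skip": "S", "env": "E",
}
# Symbolic codes accepted by R=...; temp is the default.
REDIRECT_NAMES = {"temp": 302, "permanent": 301, "seeother": 303}


def parse_flags(text: str) -> List[Tuple[str, Optional[str]]]:
    """Parse '[flag,flag=value,...]' into (SHORT_NAME, value) pairs, in order."""
    flags = []
    for item in text.strip().strip("[]").split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        short = ALIASES.get(name.strip().lower(), name.strip().upper())
        flags.append((short, value.strip() or None))
    return flags


def response_status(flags: List[Tuple[str, Optional[str]]]) -> Optional[int]:
    """HTTP status implied by R, F or G (first one wins); None if none is set."""
    for name, value in flags:
        if name == "F":
            return 403
        if name == "G":
            return 410
        if name == "R":
            if value is None:
                return 302
            if value.isdigit() and 300 <= int(value) <= 399:
                return int(value)
            # KeyError for unknown names or out-of-range numeric codes.
            return REDIRECT_NAMES[value.lower()]
    return None


print(parse_flags("[R=301,L]"))
# prints [('R', '301'), ('L', None)]
```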
- 'redirect|R [=code]' (force redirect)
Prefixes the substitution with http://thishost[:thisport]/ to build a new URL and forces an external redirect. The new URL is sent to the client, which then issues a new request for it, even though it still points to the current server. If no code is given, the response has status 302 (MOVED TEMPORARILY). To use another code in the range 300-399, give it as a number, or use one of the symbolic names temp (default), permanent, or seeother.
Note: when you use this flag, make sure the substitution is a valid URL. The flag only adds the http://thishost[:thisport]/ prefix, and rewriting continues afterwards. To redirect immediately, also use the L flag to stop the rewriting process.
- 'forbidden|F' (force URL to be forbidden)
Immediately sends back a response with status 403 (FORBIDDEN). Combined with suitable RewriteCond directives, this flag can conditionally block access to certain URLs.
- 'gone|G' (force URL to be gone)
Immediately sends back a response with status 410 (GONE), marking the resource as permanently gone.
- 'proxy|P' (force proxy)
Forces the substitution to be handled internally as a proxy request and passed immediately to the proxy module; rewriting stops here. You must therefore make sure the substitution is a valid URI (typically starting with http://hostname) that the proxy module can handle. Otherwise the proxy module returns an error. This flag is a more powerful implementation of the ProxyPass directive, mapping remote content into the namespace of the local server.
Note: to use this feature, the proxy module must be compiled into the Apache server. You can check by running "httpd -l" and looking for mod_proxy.c in the output. If it is missing, you need to rebuild the httpd program with mod_proxy enabled.
- 'last|L' (last rule)
Stops the rewriting process here, so no further rewrite rules are applied to the current URL. This corresponds to the last command in Perl or the break statement in C.
- 'next|N' (next round)
Restarts the rewriting process from the first rewrite rule. This time the URL being matched is not the original URL but the URL produced by the last rewrite rule. This corresponds to the next command in Perl or the continue statement in C. Be careful not to create an endless loop.
- 'chain|C' (chained with next rule)
Chains the current rule with the next rule, which can itself be chained with the following one, and so on. If the rule matches, processing continues as usual and the flag has no effect. If the rule does not match, all following chained rules are skipped.
- 'type|T=MIME-type' (force MIME type)
Forces the MIME type of the target file to be MIME-type. For example, this can simulate the mod_alias ScriptAlias directive, which forces all files in the mapped directory to have the MIME type "application/x-httpd-cgi".
- 'nosubreq|NS' (used only if no internal sub-request)
Makes the rewriting engine skip this rule when the current request is an internal sub-request. For example, Apache issues sub-requests internally when mod_include looks for possible directory default files (index.xxx). Sub-requests are not always useful, and applying the complete ruleset to them can even cause errors. Use this flag to exclude such rules.
- 'nocase|NC' (no case)
Makes the pattern case-insensitive when it is matched against the current URL.
- 'qsappend|QSA' (query string append)
Forces the rewriting engine to append the query string part of the substitution to the existing query string instead of replacing it. Use this flag when you want a rewrite rule to add more data to the query string.
- 'noescape|NE' (no URI escaping of output)
Normally, special characters in mod_rewrite's output (such as '%', '$' and ';') are escaped to their hexadecimal codes ('%25', '%24' and '%3B'). This flag stops mod_rewrite from doing so. It is available only in Apache 1.3.20 and later.
- 'passthrough|PT' (pass through to next handler)
Forces the rewriting engine to set the uri field of the internal request_rec structure to the value of the filename field. This lets the URI-to-filename translators of other modules (such as the Alias, ScriptAlias and Redirect directives) post-process the output of RewriteRule directives. A trivial example of the semantics: to rewrite /abc to /def with mod_rewrite and then /def to /ghi with mod_alias, you need:
    RewriteRule ^/abc(.*) /def$1 [PT]
    Alias /def /ghi
Without the PT flag, mod_rewrite still does its own job correctly. It rewrites uri=/abc/... to filename=/def/..., exactly as an API-compliant URI-to-filename translator should. But when mod_alias then attempts its own URI-to-filename translation, that translation does not work.
Note: you must use this flag whenever you mix directives from different modules that contain URL-to-filename translators. The typical example is using mod_alias together with mod_rewrite.
- 'skip|S=num' (skip next rule(s))
When the current rule matches, forces the rewriting engine to skip the next num rules. This can be used to build pseudo if-then-else constructs: the last rule of the then-clause gets skip=N, where N is the number of rules in the else-clause.
- 'env|E=VAR:VAL' (set environment variable)
Sets the environment variable VAR to the value VAL. VAL can contain regex back-references ($N and %N), which are expanded. You can use this flag more than once to set several variables. The variables can later be referenced in many places, for example in XSSI or CGI programs, and in a RewriteCond via %{ENV:VAR}.
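The control-flow flags L, N, C and S described above can be sketched in Python as below. This is a hypothetical model for illustration, not mod_rewrite's code. Only `$N` is expanded, all other flags are ignored, and the `MAX_ROUNDS` limit on N restarts is an assumption of this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Rule:
    pattern: str
    substitution: str                             # "-" means: no substitution
    flags: Set[str] = field(default_factory=set)  # subset of {"L", "N", "C"}
    skip: int = 0                                 # value of S=num


MAX_ROUNDS = 10  # hypothetical guard against endless N loops


def _expand(template: str, m: "re.Match") -> str:
    def ref(b: "re.Match") -> str:
        n = int(b.group(1))
        return (m.group(n) or "") if n <= m.re.groups else ""
    return re.sub(r"\$(\d)", ref, template)


def rewrite(url: str, rules: List[Rule]) -> str:
    for _ in range(MAX_ROUNDS):
        restart = False
        i = 0
        while i < len(rules):
            rule = rules[i]
            m = re.search(rule.pattern, url)
            if m is None:
                # C: a failed rule also skips every rule chained after it.
                while "C" in rules[i].flags and i + 1 < len(rules):
                    i += 1
                i += 1
                continue
            if rule.substitution != "-":
                url = _expand(rule.substitution, m)
            if "L" in rule.flags:
                return url                    # L: stop, like Perl's last
            if "N" in rule.flags:
                restart = True                # N: start again from the first rule
                break
            i += 1 + rule.skip                # S=num: jump over num rules
        if not restart:
            return url
    raise RuntimeError("N flag looped more than MAX_ROUNDS times")


# Pseudo if-then-else: the "then" rule skips the single "else" rule.
ifelse = [Rule(r"^/then", "/T", skip=1), Rule(r".*", "/ELSE"), Rule(r"^/T$", "/T-done")]
print(rewrite("/then", ifelse))
# prints /T-done
```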
Note: in per-server configuration files, Pattern is matched against the complete URL. In per-directory configuration files, the directory prefix is automatically removed before matching and added back after substitution. This matters for many kinds of rewriting, because without prefix stripping you would have to match the parent directory, and information about the parent directory is not always available. There is one exception: when the substitution starts with http://, the directory prefix is not added, and an external redirect is forced (or, with the P flag, a proxy request).
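A simplified Python model of the prefix handling in this note. `per_dir_rewrite` and the example paths are hypothetical, only `$N` is expanded, and only the `http://` exception mentioned above is modelled. Note that the `^` anchor in the pattern applies to what remains of the path after the directory prefix is removed.

```python
import re


def per_dir_rewrite(filename: str, dir_prefix: str, pattern: str, substitution: str) -> str:
    """Apply one rule in per-directory context (simplified, hypothetical model)."""
    prefix = dir_prefix.rstrip("/") + "/"
    if not filename.startswith(prefix):
        return filename                       # not inside this directory
    local = filename[len(prefix):]            # prefix stripped before matching
    m = re.search(pattern, local)
    if m is None:
        return filename
    result = re.sub(r"\$(\d)",
                    lambda b: (m.group(int(b.group(1))) or "")
                    if int(b.group(1)) <= m.re.groups else "",
                    substitution)
    if result.startswith("http://"):
        return result                         # exception: prefix is not re-added
    return prefix + result                    # prefix re-added after substitution


print(per_dir_rewrite("/var/www/html/shop/item-42.html", "/var/www/html/shop",
                      r"^item-(\d+)\.html$", "show.php?id=$1"))
# prints /var/www/html/shop/show.php?id=42
```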
Note: to enable the rewriting engine in a per-directory configuration file, you need "RewriteEngine On" in that file, and "Options FollowSymLinks" must be enabled for the directory. If the administrator has disabled FollowSymLinks for security reasons, the rewriting engine cannot be used there.