URL rewriting: RewriteCond and RewriteRule directive formats

Source: Internet
Author: User

The basic function of Rewrite is to redirect URLs and hide the real address, using Perl-compatible regular expressions. It is typically used to implement pseudo-static pages, pseudo-directories, domain name redirects, hotlink protection, and so on. This article explores the technical details of mod_rewrite and URL matching, as well as the formats of the RewriteCond and RewriteRule directives.

Rewrite Module Internal Processing

The internal processing of the Rewrite module is extremely complex. It is still described here, both to help ordinary users avoid low-level mistakes and to let administrators take full advantage of its capabilities.

Rewrite Module API Phases

First, you have to understand that Apache processes an HTTP request in a number of phases. The Apache API provides a hook for each phase.

mod_rewrite uses two of these hooks: first, the URL-to-filename translation hook (used after the HTTP request has been read and before authorization starts); second, the Fixup hook (triggered after the authorization phases and after the folder-level configuration files (.htaccess) have been read, but before the content handler is activated).

So, after Apache receives a request and determines the responding host (or virtual host), the rewrite engine starts processing all mod_rewrite directives in the server-level configuration (this happens in the URL-to-filename phase). Some steps later, the final data folder is determined.

Next, in the Fixup phase, the mod_rewrite directives in the folder-level configuration are triggered. The two phases are not sharply distinct: in both, mod_rewrite rewrites a URL to a new URL or to a file name.

The API was not originally designed to be used this way, but this is now how mod_rewrite uses it. Remembering the following two points will help you understand it better:

1. Although mod_rewrite rewrites URLs to URLs, URLs to file names, and even file names to file names, the API so far provides only a URL-to-filename hook. The two missing hooks were planned for Apache 2.0 to make the process clearer. This causes no trouble for users; they only need to remember that the URL-to-filename hook is used for more than the API originally intended.

2. Remarkably, mod_rewrite also provides folder-level URL operations (.htaccess files), although these files are processed only after the URL has been translated to a file name (this is unavoidable, because .htaccess files live in the file system). In other words, according to the API phases, it is too late at this point for any URL operation. To solve this "chicken and egg" problem, mod_rewrite uses a small trick: when doing a folder-level URL/file name operation, it first rewrites the file name back to the corresponding URL (normally this is not possible, but the RewriteBase directive explains how it is achieved), and then creates a new internal sub-request for this new URL, which starts the API phases again.

mod_rewrite tries to make these complex operations transparent to the user. Keep in mind, however, that server-level URL operations are fast and efficient, while folder-level operations are slow and inefficient because of the chicken-and-egg problem. On the other hand, this is the only way mod_rewrite can offer (locally restricted) URL operations to the average user.
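A minimal sketch of a folder-level rule set that relies on RewriteBase. It assumes that the server configuration contains "Alias /xyz /abc/def", so that the physical folder /abc/def is reached under the URL prefix /xyz; all paths and file names here are chosen purely for illustration.

```apache
# File: /abc/def/.htaccess (hypothetical location)
RewriteEngine On

# Tell mod_rewrite which URL prefix corresponds to this folder, so that
# the rewritten file name can be turned back into a URL for the
# internal sub-request.
RewriteBase /xyz

# In folder-level context the Pattern is matched against the path
# relative to this folder, without a leading slash.
RewriteRule ^oldstuff\.html$ newstuff.html
```

With this in place, a request for /xyz/oldstuff.html would be served from /abc/def/newstuff.html.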

Rewrite Module Rule Set Processing

When mod_rewrite is triggered in these two API phases, it reads the configured rule set from its configuration structure (which was created either at server startup, for the server-level context, or during the directory walk, for the folder-level context). The URL rewriting engine is then started with this rule set (one or more rules together with their conditions). Both the server-level and folder-level rule sets are handled by the same URL rewriting engine; only the handling of the final result differs.

The order of rules in a rule set is very important, because the rewrite engine processes them in a special order: it walks through the rules (RewriteRule directives) one by one, and when a rule matches, it may then walk through the conditions (RewriteCond directives) that belong to that rule. For historical reasons the conditions are written before the rule, so the control flow is a bit long-winded. The details are shown in Figure 1.


Figure 1: The control flow through a rewrite rule set

As the figure shows, the URL is first matched against the Pattern of each rule. If the match fails, mod_rewrite immediately stops processing this rule and continues with the next one. If the match succeeds, mod_rewrite looks for the corresponding rule conditions. If there are none, it simply replaces the URL with the new value constructed from Substitution and continues with the other rules. If conditions exist, it starts an inner loop that processes them in the order in which they are listed.

Rule conditions are handled differently: no pattern is matched against the URL. Instead, a string TestString is first built by expanding variables, back-references, map lookups, and so on, and CondPattern is then matched against it. If the match fails, the entire set of conditions and the corresponding rule fail. If the match succeeds, the next condition is processed, until no conditions are left. If all conditions match, the URL is replaced with Substitution and processing continues.
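A minimal sketch of this ordering in a server-level configuration. The host name, the paths, and the User-Agent test are made up for illustration only.

```apache
RewriteEngine On

# Step 1: the URL is matched against the Pattern ^/download/(.+)$ first.
# Step 2: only if it matches are the two conditions evaluated, in the
#         order listed, even though they are written before the rule.
# Step 3: if both conditions match, the URL is replaced by Substitution.
RewriteCond %{HTTP_HOST}       ^www\.example\.com$ [NC]
RewriteCond %{HTTP_USER_AGENT} !^$
RewriteRule ^/download/(.+)$   /files/$1
```

A request for /other/page never reaches the conditions, because the Pattern already fails in step 1.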

(This section draws on the translation by Jin Bu.)

RewriteCond Directive Format

Syntax: RewriteCond TestString CondPattern [flags]

The RewriteCond directive defines a rule condition. One or more RewriteCond directives can precede a RewriteRule directive. The rule is applied to the current URL only if its Pattern matches and these conditions are also met.

1. TestString is a plain text string which, in addition to ordinary characters, can contain the following expandable constructs:

1) $N: RewriteRule back-reference, where 0 <= N <= 9. $N refers to the text of the current URL matched by the N-th parenthesized group of the Pattern in the RewriteRule that follows the current RewriteCond directives.

2) %N: RewriteCond back-reference, where 0 <= N <= 9. %N refers to the text matched by the N-th parenthesized group of the CondPattern in the last matched RewriteCond.

3) ${mapname:key|default}: RewriteMap expansion.
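The back-references above can be sketched as follows. The section names, the domain, and the target path layout are hypothetical; %{HTTP_HOST} is a server variable of the kind used in the examples later in this article.

```apache
RewriteEngine On

# TestString "$1" expands to the first group of the RewriteRule Pattern
# (the section name); the rule applies only to the sections listed here.
RewriteCond $1 ^(news|blog)$
# The first group of this CondPattern (the subdomain) becomes %1.
RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.example\.com$ [NC]
# %1 refers to the group of the last matched RewriteCond above.
RewriteRule ^/([^/]+)/(.*)$ /sites/%1/$1/$2
```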

2. CondPattern is the condition pattern, a regular expression applied to the current instance of TestString; that is, TestString is expanded first and then matched against CondPattern. CondPattern is a standard extended regular expression with the following additions:

1) The pattern string can be prefixed with a '!' character (exclamation mark) to specify a non-matching pattern. The special tests listed below can also be negated with the '!' prefix.

2) The following special variants can be used in place of a regular expression in CondPattern:

'>CondPattern' (lexically greater): treats CondPattern as a plain string and compares it lexically with TestString. True if TestString is lexically greater than CondPattern.

'=CondPattern' (lexically equal): treats CondPattern as a plain string and compares it lexically with TestString. True if TestString and CondPattern are exactly the same. If CondPattern is just "" (two quotation marks), the condition is true only when TestString is the empty string.

'-d' (is directory): treats TestString as a path name and checks whether it exists and is a folder.

'-f' (is regular file): treats TestString as a path name and checks whether it exists and is a regular file.

'-s' (is regular file with size): treats TestString as a path name and checks whether it exists and is a regular file with a size greater than 0.

'-l' (is symbolic link): treats TestString as a path name and checks whether it exists and is a symbolic link.

'-F' (is existing file, via subrequest): checks whether TestString is a valid file that is accessible under all of the server's currently configured access controls. This check uses an internal subrequest, so use it with care: it can reduce server performance.

'-U' (is existing URL, via subrequest): checks whether TestString is a valid URL that is accessible under all of the server's currently configured access controls. This check uses an internal subrequest, so use it with care: it can reduce server performance.
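As an illustration of the file tests, here is a minimal server-level sketch that sends every request that does not map to an existing file or folder to a single script. The script name index.php and its path parameter are assumptions made for this example.

```apache
RewriteEngine On

# Rewrite only when the requested path is neither an existing regular
# file (-f) nor an existing folder (-d); '!' negates each test.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^/(.*)$ /index.php?path=$1 [L]
```

Unlike -F and -U, these tests only look at the file system and do not trigger a subrequest.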

3. [flags] is the third argument. Multiple flags are separated by commas.

1) 'nocase|NC' (no case): makes the comparison case-insensitive, i.e. there is no difference between uppercase and lowercase letters in either the expanded TestString or the CondPattern. Note that this flag has no effect on file system and subrequest checks.

2) 'ornext|OR' (or next condition): by default, consecutive conditions are combined with AND; this flag combines a condition with the next one using OR instead. For example:

    RewriteCond %{REMOTE_HOST} ^host1.* [OR]
    RewriteCond %{REMOTE_HOST} ^host2.* [OR]
    RewriteCond %{REMOTE_HOST} ^host3.*
    RewriteRule ...

Without the [OR] flag, the condition/rule pair would have to be written three times.

RewriteRule Directive

Syntax: RewriteRule Pattern Substitution [flags]

1) Pattern is a Perl-compatible regular expression that is applied to the current URL. "Current" here means the value of the URL at the moment the rule is applied.

2) Substitution is the string that replaces the original URL when it matches Pattern.

3) In addition, special [flags] can be appended to Substitution as the third argument of the RewriteRule directive. Flags is a comma-separated list of the following flags:

redirect|R[=code] (force redirect)

Prefixing Substitution with http://thishost[:thisport]/ (which makes the new URL a URI) forces an external redirect. If no code is specified, an HTTP response code of 302 (Moved Temporarily) is generated. If you need a different response code in the 300-400 range, just specify it here as a number, or use one of the following symbolic names: temp (default), permanent, seeother. This flag can be used to return a canonicalized URL to the client, for example rewriting "/~" to "/u/", or always appending a slash to /u/user.

Note: When using this flag, you must make sure that the substitution field is a valid URL! Otherwise the redirect will point to an invalid location. Also remember that this flag by itself only prefixes the URL with http://thishost[:thisport]/; the rewrite operation still continues. Usually you will want to stop rewriting and redirect immediately, and for that you also need the 'L' flag.
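The "/~" to "/u/" canonicalization mentioned above could be written roughly as follows in a server-level configuration. This is a sketch; the exact pattern is an assumption about how the user URLs are laid out.

```apache
RewriteEngine On

# Send the client a permanent (301) redirect from /~user/... to
# /u/user/...; L stops rule processing so the redirect is issued at once.
RewriteRule ^/~([^/]+)/?(.*)$ /u/$1/$2 [R=permanent,L]
```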

forbidden|F (force URL to be forbidden)

Forces the current URL to be forbidden, i.e. immediately sends back an HTTP response code of 403 (Forbidden). Combined with several RewriteCond directives, this flag can be used to block certain URLs conditionally.
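One common use of this combination is hotlink protection. The sketch below assumes that the site is served as example.com and that only a few image types need protecting; both are placeholders.

```apache
RewriteEngine On

# Condition 1: the Referer header is not empty.
RewriteCond %{HTTP_REFERER} !^$
# Condition 2: the Referer does not belong to our own site.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Both conditions true: answer image requests with 403 Forbidden.
# The substitution "-" means "leave the URL unchanged".
RewriteRule \.(gif|jpe?g|png)$ - [F]
```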

gone|G (force URL to be gone)

Forces the current URL to be gone, i.e. immediately sends back an HTTP response code of 410 (Gone). Use this flag to indicate that a page has been removed and no longer exists.

proxy|P (force proxy)

This flag forces the substitution part to be treated internally as a proxy request and immediately (i.e. rewrite rule processing stops here) handed over to the proxy module.

You must make sure that the substitution string is a valid URI (for example, typically starting with http://hostname) that can be handled by the Apache proxy module. With this flag, parts of a remote server can be mapped into the namespace of the local server, which gives a more powerful version of the ProxyPass directive.

Note: To use this feature, the proxy module must be compiled into the Apache server. If you are not sure, check whether mod_proxy.c appears in the output of "httpd -l". If it does, mod_rewrite can use this function; if not, you must enable mod_proxy and rebuild the "httpd" program.
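A minimal sketch of such a mapping. The backend host name and the /remote/ prefix are hypothetical, and the example assumes that mod_proxy (with its HTTP support) is available.

```apache
RewriteEngine On

# Hand requests under /remote/ to a backend server through mod_proxy.
# Rule processing stops at this rule once it matches.
RewriteRule ^/remote/(.*)$ http://backend.example.com/$1 [P]
```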

last|L (last rule)

Stops the rewrite operation immediately and applies no further rewrite rules. This corresponds to the last command in Perl or the break statement in C. The flag prevents the currently rewritten URL from being rewritten again by following rules.

For example, use it to rewrite the root path ('/') to a URL that actually exists, such as '/e/www/'.
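The root-path case could be sketched like this in a server-level configuration. The second rule and its /fallback/ target are made up, and are there only to show what L prevents.

```apache
RewriteEngine On

# Map the root path to a location that really exists, then stop:
# the rule below is not applied to a request for "/".
RewriteRule ^/$ /e/www/ [L]
RewriteRule ^/(.*)$ /fallback/$1
```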

next|N (next round)

Re-runs the rewrite operation, starting again with the first rule. The URL processed this time is not the original URL but the URL produced by the last rewrite rule.

This corresponds to the next command in Perl or the continue statement in C.

Use this flag to restart the rewrite operation, i.e. to jump immediately back to the top of the loop.
But be careful not to create an infinite loop!
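A small sketch of a loop that is guaranteed to end: each pass replaces one underscore with a hyphen, so the Pattern eventually stops matching. The choice of characters is only an example.

```apache
RewriteEngine On

# Replace one underscore with a hyphen per pass; N restarts the rule set
# with the rewritten URL until no underscore is left, at which point the
# Pattern no longer matches and the loop ends.
RewriteRule ^(.*)_(.*)$ $1-$2 [N]
```

A rule whose Substitution still matches its own Pattern would never terminate, which is the infinite loop warned about above.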

chain|C (chained with next rule)

This flag chains the current rule to the next rule (which can itself be chained to its successor, and so on).

The effect is as follows: if a rule matches, processing continues with the following rule as usual, i.e. the flag has no effect. If the rule does not match, all following chained rules are skipped. For example, in a folder-level rule set that performs an external redirect, it can be used to remove the ".www" part (which should not appear there).

type|T=MIME-type (force MIME type)

Forces the MIME type of the target file to be MIME-type.

For example, this can be used to emulate the ScriptAlias directive of mod_alias, which internally forces the MIME type of all files in the mapped folder to be "application/x-httpd-cgi".

nosubreq|NS (used only if not an internal sub-request)

This flag forces the rewrite engine to skip the rewrite rule when the current request is an internal sub-request. For example, Apache generates sub-requests internally when mod_include tries to find out about possible folder default files (index.xxx). Applying the complete rule set to a sub-request is not always useful and may even cause errors, so this flag can be used to exclude certain rules.

Use the following guideline to decide: whenever you prefix URLs with a CGI script to force them to be handled by that script, the chance of errors (or unnecessary overhead) on sub-requests is high. In such cases, use this flag.

nocase|NC (no case)

Makes Pattern case-insensitive, i.e. there is no difference between 'A-Z' and 'a-z' when Pattern is matched against the current URL.

qsappend|QSA (query string append)

This flag forces the rewrite engine to append the query string part of the substitution string to the existing query string instead of replacing it. Use this flag when you want to add more data to the query string through a rewrite rule.
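A short sketch of the difference. The script name page.php and the URL layout are assumptions made for illustration.

```apache
RewriteEngine On

# With QSA, /pages/123?one=two is mapped to /page.php?page=123&one=two.
# Without QSA, the original query string would be dropped and the
# request would be mapped to /page.php?page=123.
RewriteRule ^/pages/(.+)$ /page.php?page=$1 [QSA]
```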

noescape|NE (no URI escaping of output)

This flag prevents mod_rewrite from applying the usual URI escaping rules to the result of a rewrite.

Ordinarily, special characters (such as '%', '$', ';', and so on) are escaped into their hexadecimal equivalents.

This flag prevents that, allowing symbols such as the percent sign to appear in the output, for example:

    RewriteRule /foo/(.*) /bar?arg=P1\%3d$1 [R,NE]

which turns '/foo/zed' into a safe request for '/bar?arg=P1=zed'.

passthrough|PT (pass through to next handler)

This flag forces the rewrite engine to set the uri field of the internal request_rec structure to the value of the filename field. This small change makes it possible for the output of a rule to be processed further by Alias, ScriptAlias, Redirect, and other directives from other URI-to-filename translators. A simple example shows what this means: if you want to rewrite /abc to /def with the mod_rewrite engine and then map /def to /ghi with mod_alias, you can do this:

    RewriteRule ^/abc(.*) /def$1 [PT]

    Alias /def /ghi

If you omit the PT flag, mod_rewrite still does its job fine: as an API-compliant URI-to-filename translator, it rewrites uri=/abc/... to filename=/def/.... But when mod_alias then tries to do its own URI-to-filename translation, it will fail.

Note: You must use this flag whenever you mix directives from different modules that contain URI-to-filename translators. Mixing mod_alias and mod_rewrite is the typical example.

For Apache hackers

If the current Apache API had a filename-to-filename hook in addition to the URI-to-filename hook, this flag would not be needed. Without such a hook, however, this flag is the only solution. The Apache Group has discussed this issue, and such a hook is to be added in Apache version 2.0.

skip|S=num (skip next rules)

This flag forces the rewrite engine to skip the next num rules when the current rule matches. It can be used to build a pseudo if-then-else construct: the last rule of the then-clause becomes skip=N, where N is the number of rules in the else-clause. (This is not the same as the 'chain|C' flag!)
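A sketch of such an if-then-else construct in a server-level configuration. The script names (images.php, docs.php, 404.php) and their parameters are hypothetical.

```apache
RewriteEngine On

# "if": the request does not map to an existing file or folder,
# so jump over the three rules of the "then" part.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule .? - [S=3]

# "then" part (the file or folder exists):
RewriteRule ^/(.*\.gif)$  /images.php?file=$1
RewriteRule ^/(.*\.html)$ /docs.php?file=$1
# Last rule of the "then" part: skip the one rule of the "else" part.
RewriteRule .? - [S=1]

# "else" part:
RewriteRule ^/(.*)$ /404.php?file=$1
```

Note that the skip counts have to be updated by hand whenever a rule is added to or removed from either part.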

env|E=VAR:VAL (set environment variable)

This flag sets the environment variable VAR to the value VAL, where VAL can contain the regular expression back-references $N and %N, which are expanded.

This flag can be used more than once to set several variables. The variables can later be dereferenced in many situations, most commonly in XSSI (via <!--#echo var="VAR"-->) or CGI (e.g. $ENV{'VAR'}). They can also be dereferenced in a following RewriteCond directive via %{ENV:VAR}. Use this to strip information from URLs while still remembering it.
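A minimal sketch of one way to use such a variable: marking image requests so that they can be left out of the access log. The variable name "image" and the log path are arbitrary, and the example assumes that a "combined" log format is already defined in the server configuration.

```apache
RewriteEngine On

# Mark image requests by setting the environment variable "image" to 1;
# the URL itself is left unchanged ("-").
RewriteRule \.(png|gif|jpg)$ - [E=image:1]

# mod_log_config then logs only requests where "image" is not set.
CustomLog logs/access_log combined env=!image
```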

cookie|CO=NAME:VAL:domain[:lifetime[:path]] (set cookie)

This flag sets a cookie in the client's browser. The cookie's name is NAME and its value is VAL. The domain field is the domain of the cookie, for example '.apache.org'; the optional lifetime is the lifetime of the cookie in minutes, and the optional path is the path of the cookie.
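For illustration, a rule that sets a cookie when the front page is requested. The cookie name, value, and domain are placeholders.

```apache
RewriteEngine On

# Set the cookie "frontdoor=yes" for the domain .example.com, valid for
# 1440 minutes (24 hours) on path "/", when /index.html is requested.
# The URL itself is left unchanged ("-").
RewriteRule ^/index\.html$ - [CO=frontdoor:yes:.example.com:1440:/]
```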

Case:

Content of city_map.txt:

hangzhou 12

beijing 13

1. hangzhou.google.com/tianqi/20090401 is rewritten to www.google.com/service/detail.html?id=tianqi&date=20090401

    RewriteMap city-map txt:/etc/httpd/conf.d/map/city_map.txt
    RewriteCond %{HTTP_HOST} ^(.+)\.google\.com$
    RewriteRule ^/([\w]+)/([\d]+)$ /service/detail\.html\?id=$1&date=$2&c=${city-map:%1|%1} [PT,L]


Explanation:

%{HTTP_HOST}: the requested host name.

^(.+)\.google\.com$: ^ marks the beginning and $ marks the end of the string.

. (dot) matches any character except a line terminator. + repeats the preceding element one or more times. \ is the escape character.

^/([\w]+)/([\d]+)$: [] defines a character set. \w matches letters, digits, or underscore. \d matches digits.

$1: the string matched by ([\w]+) in the RewriteRule, i.e. tianqi.

$2: the string matched by ([\d]+) in the RewriteRule, i.e. 20090401.

%1: the string matched by (.+) in the RewriteCond, i.e. hangzhou.

${city-map:%1|%1}: the value that city-map holds for the key %1, i.e. for hangzhou; if the map has no such key, the default %1 is used, i.e. hangzhou itself.

2. Can you see what the following rules do?

    RewriteCond %{HTTP_HOST} ^(.+)\.google\.com$
    RewriteRule ^/([\w]+)/([^-]+)-([^-]+)--([^-]+)-([^-]+)--([^-]+)-([^-]+)--([^-]+-[^-]+--[^-]+-[^-]+--[^-]+-[^-]+)$ /$1/$2=$3&$4=$5&$6=$7&$8 [C]
    RewriteCond %{HTTP_HOST} ^(.+)\.google\.com$
    RewriteRule ^/([\w]+)/([^-]+)-([^-]+)--([^-]+)-([^-]+)--([^-]+)-([^-]+)$ /service/list\.html\?frontcategoryid=${category-map:$1|0}&$2=$3&$4=$5&$6=$7&city=${city-map:%1|%1} [PT,L]


Explanation:

The rules convert - (a single hyphen) into = and -- (a double hyphen) into &.

[^-]: inside a character set ([]), ^ negates the set (outside a set it marks the beginning of the line), so [^-] matches any character other than -.

Since N in $N can be at most 9, the C flag is used: the second RewriteRule continues the conversion on the part that the first RewriteRule left unconverted in its last group, $8.

In addition, if a rewritten URL contains Chinese characters, garbled text is quite likely, because Apache URL-decodes the URL while rewriting, so when JK forwards the request the string is no longer encoded. In this case you can either encode the URL twice, or, when the request is received, take the bytes as ISO-8859-1 and build a new UTF-8 string from them: new String(str.getBytes("ISO-8859-1"), "UTF-8").
