Solving the Problem of URLs Containing Chinese File Names

Source: Internet
Author: User
Network technology, September 08

Solving the problem of URLs containing Chinese file names: using mod_fileiri

Of course, the best way to avoid problems with Chinese file names is to "never use Chinese file names". In many cases, however, we are still forced to use them, and that is the situation this article deals with.

The most common problem with Chinese file names on a Web server is that users cannot access the files.

The main reason is that a URL carries no character-set information. Non-ASCII characters in a URL can only be transmitted as escaped bytes in %HH form (for example, %A4).

Take, for example, the following URL containing a Chinese file name:

http://bbs.giga.net.tw/fileiri/中文.html

Because the characters 中文 ("Chinese") are not ASCII characters, when you enter the URL above in the browser's address bar, the browser escapes the non-ASCII characters into %HH form before sending them.
(For details of this encoding, see the section "Non-ASCII characters in URI attribute values" of the HTML 4.01 specification.)

But should the characters in the URL be encoded as Big5 or as UTF-8 before they are escaped?

In Microsoft Internet Explorer, under Tools → Internet Options → Advanced, there is an option named "Always send URLs as UTF-8" in the English version.

When this option is checked (the default on most computers), the characters in the URL are treated as Unicode and sent UTF-8 encoded, so the URL is silently converted to:

http://bbs.giga.net.tw/fileiri/%E4%B8%AD%E6%96%87.html

That is, the browser encodes 中文 in UTF-8, escapes it as %E4%B8%AD%E6%96%87, and then sends the encoded URL to the server.

Most Chinese characters take 3 bytes in UTF-8, so 中文 becomes the six bytes %E4 %B8 %AD %E6 %96 %87.
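The conversion the browser performs can be reproduced with Python's standard `urllib.parse.quote`. This is only an illustration of the encoding itself, assuming Python 3, not a description of how IE implements it.

```python
from urllib.parse import quote

name = "中文"

# Encode the name as UTF-8, then escape every byte as %HH.
utf8_bytes = name.encode("utf-8")
encoded = quote(name, encoding="utf-8")

print(len(utf8_bytes))  # 6: each character takes 3 bytes in UTF-8
print(encoded)          # %E4%B8%AD%E6%96%87
print("http://bbs.giga.net.tw/fileiri/" + encoded + ".html")
```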

But if the "Always send URLs as UTF-8" option is not checked, things are different.
In that case the characters in the URL are sent in Big5, and the URL becomes:

http://bbs.giga.net.tw/fileiri/%A4%A4%A4%E5.html

Each character takes two bytes in Big5, so 中文 becomes the four bytes %A4 %A4 %A4 %E5.
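The same name encoded with Python's built-in `big5` codec shows why the two URLs differ: identical characters, different bytes on the wire. A small sketch, assuming Python 3 and its standard codecs.

```python
from urllib.parse import quote

name = "中文"

# Same file name, encoded as Big5 instead of UTF-8.
big5_bytes = name.encode("big5")
big5_encoded = quote(name, encoding="big5")
utf8_encoded = quote(name, encoding="utf-8")

print(len(big5_bytes))                # 4: each character takes 2 bytes in Big5
print(big5_encoded)                   # %A4%A4%A4%E5
print(big5_encoded == utf8_encoded)   # False: same name, different URL
```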

This is only one browser's behavior. With different browsers, or even the same browser with different settings, the same Chinese file name may be sent in different URL encodings.

How should the server handle this URL string?

As described above, the URL itself carries no charset information, so the server cannot tell which character set the client used to encode the Chinese characters in the URL.

As a result, the server simply uses whatever bytes it receives as the file name.

When it receives a UTF-8 URL, it looks for the file on the file system using the UTF-8 name; when it receives a Big5 URL, it looks for the file using the Big5 name.

So if the file names on the file system are in UTF-8, a Big5 URL will never find the file; conversely, if the file names are in Big5, a UTF-8 URL will never find the file!
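The byte-for-byte lookup described above can be simulated in a few lines. The `files_on_disk` set and the `lookup` function are hypothetical stand-ins for the file system and the server, assuming a single file whose name was stored as Big5 bytes.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical file system: one file whose name is stored as Big5 bytes
# (for example, uploaded by FTP from a Windows client).
files_on_disk = {"中文.html".encode("big5")}

def lookup(url_path: str) -> bool:
    """Percent-decode the path to raw bytes and require an exact byte match,
    since the server has no charset information to do anything smarter."""
    return unquote_to_bytes(url_path) in files_on_disk

print(lookup("%A4%A4%A4%E5.html"))        # True:  Big5 URL matches the Big5 name
print(lookup("%E4%B8%AD%E6%96%87.html"))  # False: UTF-8 URL does not
```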

This is especially common in Apache/UNIX environments where files are uploaded by FTP from Windows: most Chinese file names end up stored in Big5.

So most such sites ask users to uncheck the "Always send URLs as UTF-8" option; that way, Big5 Chinese file names can be accessed without problems.

However, this approach is not very user-friendly. Most users find it inconvenient to change their browser settings!

Is there a way for the administrator to solve the problem completely on the server, without users changing their settings?

That is exactly what this article introduces: the Apache module mod_fileiri.

The main function of mod_fileiri is to let the server determine the encoding of a URL and then convert or redirect it as needed, so that the server can handle URLs in UTF-8 and in other character sets at the same time.
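One way to "determine the encoding of a URL" is to try a strict UTF-8 decode first and fall back to the legacy charset. The sketch below is our own heuristic for illustration, not mod_fileiri's actual C code; `guess_charset` is a hypothetical name, and Big5 is assumed as the legacy charset. It works because Big5 bytes rarely form valid UTF-8, but a Big5 string can occasionally also be valid UTF-8, so it is a guess rather than a proof.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes

def guess_charset(raw: bytes, legacy: str = "big5") -> Optional[str]:
    """Return "utf-8" if raw decodes strictly as UTF-8, else the legacy
    charset if it decodes as that, else None."""
    for charset in ("utf-8", legacy):
        try:
            raw.decode(charset)
        except UnicodeDecodeError:
            continue
        return charset
    return None

print(guess_charset(unquote_to_bytes("%E4%B8%AD%E6%96%87.html")))  # utf-8
print(guess_charset(unquote_to_bytes("%A4%A4%A4%E5.html")))        # big5
```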

The following describes how to install and configure mod_fileiri:

First, obtain this module from CVS:

# fetch http://dev.w3.org/cvsweb/~checkout~/apache-modules/mod_fileiri/mod_fileiri.c

Then compile and install the module with apxs (run this as root):

# /usr/local/sbin/apxs -i -a -c mod_fileiri.c
/usr/local/share/apache2/build/libtool .....
/usr/local/share/apache2/build/libtool .....
/usr/local/share/apache2/build/libtool .....
cp .libs/mod_fileiri.so /usr/local/libexec/apache2/mod_fileiri.so
----------------------------------------------------------------------
Libraries have been installed in:
   /usr/local/libexec/apache2

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,--rpath -Wl,LIBDIR' linker flag

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
grep: /usr/local/libexec/apache2/mod_fileiri.la: No such file or directory
grep: /usr/local/libexec/apache2/mod_fileiri.la: No such file or directory
Warning!  dlname not found in /usr/local/libexec/apache2/mod_fileiri.la.
Assuming installing a .so rather than a libtool archive.
chmod 755 /usr/local/libexec/apache2/mod_fileiri.so
[activating module `fileiri' in /usr/local/etc/apache2/httpd.conf]

apxs not only builds mod_fileiri.so but also installs it into /usr/local/libexec/apache2,
and (because of the -a flag) it even updates httpd.conf to load the module for you!

Then just restart Apache:

# /usr/local/etc/rc.d/apache2.sh restart

The next step is to configure mod_fileiri.

mod_fileiri provides three directives: FileIRI, FilenameCharset, and OldFilenameCharset. They can be placed in the server config, directory, or virtual host context, or even in an .htaccess file.

FileIRI takes one of four values: Off, On, Backwards, and Only.

Off needs little explanation: setting FileIRI to Off disables the module's processing.

On is for the case where directory and file names on the file system use a legacy encoding such as Big5. The files are then served to every client that requests them with UTF-8-encoded URLs. In addition, if
mod_fileiri finds that a URL is not UTF-8 encoded, it answers with HTTP/1.0 301 Moved Permanently, redirecting the client to the UTF-8 form of the URL.
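The On behavior described above can be summarized as a decision function. This is a minimal sketch of the logic as this article describes it, not mod_fileiri's actual implementation; `handle_on`, its `("serve" | "redirect" | "not_found", value)` return convention, and the `FILENAME_CHARSET` constant (standing in for `FilenameCharset Big5`) are all hypothetical.

```python
from typing import Tuple, Union
from urllib.parse import quote, unquote_to_bytes

# Hypothetical stand-in for the "FilenameCharset Big5" directive.
FILENAME_CHARSET = "big5"

def handle_on(url_path: str) -> Tuple[str, Union[bytes, str, None]]:
    """Sketch of FileIRI On: names on disk are stored in FILENAME_CHARSET."""
    raw = unquote_to_bytes(url_path)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: read it as the legacy charset and answer with a
        # 301 pointing at the UTF-8 form of the same URL.
        try:
            text = raw.decode(FILENAME_CHARSET)
        except UnicodeDecodeError:
            return ("not_found", None)
        return ("redirect", quote(text, encoding="utf-8"))
    # UTF-8 URL: convert to the legacy-encoded name used on disk.
    try:
        return ("serve", text.encode(FILENAME_CHARSET))
    except UnicodeEncodeError:
        return ("not_found", None)

print(handle_on("/fileiri/%A4%A4%A4%E5.html"))
# ('redirect', '/fileiri/%E4%B8%AD%E6%96%87.html')
print(handle_on("/fileiri/%E4%B8%AD%E6%96%87.html"))
# ('serve', b'/fileiri/\xa4\xa4\xa4\xe5.html')
```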

For the 中文.html example used earlier in this article, the configuration would be as follows (I put it directly in the .htaccess file of that directory):

<IfModule mod_fileiri.c>
  FileIRI          On
  FilenameCharset  Big5
</IfModule>

With this in place, let's test it with wget:

# wget -S -O /dev/null http://bbs.giga.net.tw/fileiri/中文.html
--10:29:34--  http://bbs.giga.net.tw/fileiri/%A4%A4%A4%E5.html
           => `/dev/null'
Resolving bbs.giga.net.tw... 203.187.29.180
Connecting to bbs.giga.net.tw|203.187.29.180|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 301 Moved Permanently
  Date: Thu, 08 Sep 2005 02:29:37 GMT
  Server: Apache/2.0.54 (FreeBSD) PHP/5.0.4
  Location: http://bbs.giga.net.tw/fileiri/%e4%b8%ad%e6%96%87.html
  Content-Length: 354
  Content-Type: text/html; charset=iso-8859-1
  X-Cache: MISS from webamupa@gigamedia
  Connection: keep-alive
Location: http://bbs.giga.net.tw/fileiri/%e4%b8%ad%e6%96%87.html [following]
--10:29:34--  http://bbs.giga.net.tw/fileiri/%e4%b8%ad%e6%96%87.html
           => `/dev/null'
Reusing existing connection to bbs.giga.net.tw:80.
HTTP request sent, awaiting response...
  HTTP/1.0 200 OK
  Date: Wed, 07 Sep 2005 12:36:30 GMT
  Server: Apache/2.0.54 (FreeBSD) PHP/5.0.4
  Last-Modified: Wed, 07 Sep 2005 12:31:33 GMT
  ETag: "30bd3-14-b986f340;bc359880"
  Accept-Ranges: bytes
  Content-Length: 20
  Content-Type: text/html
  Age: 451
  X-Cache: HIT from webamupa@gigamedia
  Connection: keep-alive
Length: 20 [text/html]

100%[=====================================>]
(1.59 Mb/s) - `/dev/null' saved [20/20]

As shown above, the URL originally sent in Big5 is redirected to its UTF-8 form, and the file is then retrieved successfully, even though 中文.html on the file system is actually named in Big5.
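To confirm that the request and the 301 Location in the transcript name the same file, each path can be decoded with its own charset. A small check in Python; the two path strings are copied from the wget output above.

```python
from urllib.parse import unquote

request_path = "/fileiri/%A4%A4%A4%E5.html"         # what wget sent (Big5)
location_path = "/fileiri/%e4%b8%ad%e6%96%87.html"  # the 301 Location (UTF-8)

# Decoded with the matching charset, both yield the same name.
print(unquote(request_path, encoding="big5"))    # /fileiri/中文.html
print(unquote(location_path, encoding="utf-8"))  # /fileiri/中文.html
```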

So FileIRI On solves most problems with Chinese file names!

What if your file names are in UTF-8? Then you need the Backwards option, for example:

<IfModule mod_fileiri.c>
  FileIRI             Backwards
  OldFilenameCharset  Big5
</IfModule>

This setting means that all file names on the file system are in UTF-8; URLs that are not UTF-8 encoded (here, Big5 URLs, as set by OldFilenameCharset) are redirected to their UTF-8 version, so those requests can still be served.
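For comparison with On, here is the same kind of sketch for Backwards, again based only on the description above rather than on the module's source. `handle_backwards` and `OLD_FILENAME_CHARSET` (standing in for `OldFilenameCharset Big5`) are hypothetical names.

```python
from typing import Tuple, Union
from urllib.parse import quote, unquote_to_bytes

# Hypothetical stand-in for the "OldFilenameCharset Big5" directive.
OLD_FILENAME_CHARSET = "big5"

def handle_backwards(url_path: str) -> Tuple[str, Union[bytes, str, None]]:
    """Sketch of FileIRI Backwards: names on disk are already UTF-8."""
    raw = unquote_to_bytes(url_path)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Legacy URL: redirect to the UTF-8 form of the same name.
        try:
            text = raw.decode(OLD_FILENAME_CHARSET)
        except UnicodeDecodeError:
            return ("not_found", None)
        return ("redirect", quote(text, encoding="utf-8"))
    # UTF-8 URL: the bytes already match the UTF-8 name on disk.
    return ("serve", raw)

print(handle_backwards("/fileiri/%A4%A4%A4%E5.html"))
# ('redirect', '/fileiri/%E4%B8%AD%E6%96%87.html')
```

The only difference from the On sketch is the direction of the mapping: UTF-8 requests are served unchanged because the names on disk are already UTF-8.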

FileIRI has one more option, Only. With this setting, files whose names on the file system use a legacy encoding are served only to UTF-8-encoded URLs. This mode is rarely used, so we will not cover it further.

You can try out the various configuration combinations yourself. The original mod_fileiri documentation is at:

http://www.w3.org/2003/06/mod_fileiri/

For more background on character sets in URLs, see the following introduction:

An Introduction to Multilingual Web Addresses: handling the path
