Server.URLEncode() under gb2312 and UTF-8

Source: Internet
Author: User

I. Why set a locale? The reason was mentioned earlier. Note, however, that the locale setting has no direct relationship with whether you can browse Chinese web pages. Even with a standard English locale such as en_US.ISO-8859-1 you can still browse Chinese pages, as long as your system has the corresponding character set (this is not strictly required) and a suitable font (such as SimSun); the browser will then display the page in Chinese. Specifically, after the page is transferred to your machine, the browser determines which character set the page uses, looks up a suitable font for that character set in the font library, and the text-rendering engine draws the corresponding characters on the screen.
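The decoding step described above depends only on the character set the page declares, not on the operating system's locale. The following is a minimal sketch of that step under stated assumptions: the function names `charset_from_content_type` and `decode_page` are hypothetical, the charset is read only from an HTTP `Content-Type` value (real browsers also check `<meta>` tags and may sniff the content), the ISO-8859-1 fallback mirrors the old HTTP/1.1 default for text types, and font lookup and rendering are not modeled at all.

```python
def charset_from_content_type(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value, lowercased."""
    # Skip the media type (e.g. "text/html") and scan the parameters after it.
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    # Assumption: fall back to ISO-8859-1 when no charset is declared.
    return default


def decode_page(body: bytes, content_type: str) -> str:
    """Decode a page body with the charset the page declares; the OS locale is never consulted."""
    return body.decode(charset_from_content_type(content_type))


page = "偷".encode("gb2312")  # the same bytes a gb2312 server would send
print(decode_page(page, "text/html; charset=GB2312"))
```

Because nothing in `decode_page` reads the locale, the same bytes decode the same way on an en_US machine as on a zh_CN one; only the missing font would make the characters unreadable.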

In a dynamic website you write url = "finduser.asp?name=" & Server.URLEncode("偷猫") ("stealing cat"), and everything seems to work as a matter of course. However, many people know that it works without knowing why.
If your website uses only gb2312 or only UTF-8, and you never need to analyze referrers from other websites, it does not matter whether you know how Server.URLEncode behaves under the two encodings. But if you need to pass content between two websites, extract keywords from search-engine referrers, or send data with Ajax, the encoding problem will confront you immediately.
To see the problem for yourself, search Baidu and Google for the same keyword, such as "一级棒" ("excellent"): the strings of percent-encoded bytes in the two result URLs are different.
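To reproduce the difference without a search engine, percent-encode the same keyword under both encodings. This sketch uses Python's `urllib.parse.quote`, whose `encoding` parameter selects the byte encoding. It only illustrates the two encodings; it does not claim which one a given search engine uses today.

```python
from urllib.parse import quote, unquote

keyword = "一级棒"  # "excellent"

as_utf8 = quote(keyword, encoding="utf-8")     # 3 bytes per character
as_gb2312 = quote(keyword, encoding="gb2312")  # 2 bytes per character

print(as_utf8)    # %E4%B8%80%E7%BA%A7%E6%A3%92
print(as_gb2312)  # a different, shorter run of %XX codes

# Each string round-trips only with the encoding that produced it.
print(unquote(as_gb2312, encoding="gb2312"))
```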
To understand these two encodings, we should start with "GB" and "Unicode". When Westerners invented the computer, they used the narrow ASCII code. The Chinese then extended it into the compatible "gb2312" code, and went from "gb2312" to "GBK" to "GB18030", each time remaining backward compatible. In the end, however, an international "Unicode" encoding was defined that is completely incompatible with the GB series. As a result, network applications split into two main camps: gb2312 and UTF-8.
The string of "%" codes in such URLs both originates in and must be interpreted through a particular encoding. First, Server.URLEncode() applied to the same Chinese text produces different results under different encodings. Second, each result can only be decoded correctly by the matching encoding. Normally, encoding and decoding on the same website do not conflict. But that holds only "normally", because even on a single website each ASP page can set its own encoding.
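The mismatch described above is easy to reproduce. This sketch decodes the percent-encoded forms of 偷 (taken from the byte values given later in this article) with both the matching and the wrong encoding, using `urllib.parse.unquote`. Its default `errors="replace"` turns undecodable bytes into U+FFFD instead of raising an exception.

```python
from urllib.parse import unquote

utf8_url = "%E5%81%B7"  # 偷 percent-encoded as UTF-8
gb_url = "%CD%B5"       # 偷 percent-encoded as gb2312

# Matching encodings recover the character.
print(unquote(utf8_url, encoding="utf-8"))   # 偷
print(unquote(gb_url, encoding="gb2312"))    # 偷

# Mismatched encodings silently produce the wrong text.
print(unquote(gb_url, encoding="utf-8"))     # U+0375, a Greek numeral sign, not 偷
print(unquote(utf8_url, encoding="gb2312"))  # contains U+FFFD replacement characters
```

The gb2312 bytes CD B5 happen to form a valid two-byte UTF-8 sequence, so the wrong decoding raises no error at all. This is why mixed-encoding bugs can go unnoticed.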
Within China, it is hard to say which of the two encodings is better. Outside China, however, a computer may not support the gb2312 character set, but it is sure to support the Unicode character set. Therefore, as web development has progressed, UTF-8 has become the trend. Ajax communication also uses UTF-8 as its default encoding (form submission, by contrast, automatically uses the encoding of the page).
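The form-versus-Ajax distinction means the server must know which encoding each request arrived in. This is a hedged simulation, not browser code. It assumes a form posted from a gb2312 page and an Ajax request from the same page, builds both query strings with `urllib.parse.urlencode`, and decodes them server-side with `urllib.parse.parse_qs`. Both functions accept an `encoding` argument.

```python
from urllib.parse import parse_qs, urlencode

fields = {"s": "一级棒"}

# Assumption: a form on a gb2312 page is submitted in the page's encoding.
form_query = urlencode(fields, encoding="gb2312")
# Assumption: an Ajax request sends UTF-8 regardless of the page's encoding.
ajax_query = urlencode(fields, encoding="utf-8")

print(form_query)
print(ajax_query)  # s=%E4%B8%80%E7%BA%A7%E6%A3%92

# The server recovers the text only with the encoding each request used.
print(parse_qs(form_query, encoding="gb2312")["s"])  # ['一级棒']
print(parse_qs(ajax_query, encoding="utf-8")["s"])   # ['一级棒']
```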
The following describes how the strings of percent signs in these URLs are computed. They are the byte values of the text in gb2312 and in UTF-8 (derived from Unicode), respectively. For example, the gb2312 code of 偷 ("steal") is 0xCDB5, so in a URL it becomes "%CD%B5". UTF-8 is more complex because of the conversion between Unicode and UTF-8: the Unicode code point of 偷 is 0x5077, which converts to 0xE581B7 in UTF-8, so in a URL it becomes "%E5%81%B7". The conversion from Unicode to UTF-8 is not discussed in this article.
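Although the article leaves the Unicode-to-UTF-8 step aside, a short sketch confirms the numbers above. It is limited to the three-byte range U+0800 to U+FFFF, which covers common Chinese characters; the helper names are hypothetical. For brevity, `percent_encode` escapes every byte, including ASCII, which real URL encoders do not do.

```python
def utf8_three_byte(code_point: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF (not a surrogate) as 3-byte UTF-8."""
    if not 0x0800 <= code_point <= 0xFFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("this sketch only handles 3-byte BMP code points")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])


def percent_encode(raw: bytes) -> str:
    """Write every byte as %XX with uppercase hex digits."""
    return "".join(f"%{b:02X}" for b in raw)


steal = "\u5077"  # 偷
print(percent_encode(steal.encode("gb2312")))      # %CD%B5
print(percent_encode(utf8_three_byte(ord(steal))))  # %E5%81%B7
```

For 0x5077 the three fields are 0x5, 0x01 and 0x37, giving E5 81 B7, which matches the article's 0xE581B7.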
Next, let's look at how to recover the characters from a URL-encoded string. The Request.QueryString method is simple, but it only works for pages that use the same encoding, and only for values passed in the URL. To decode strings in a different encoding, first convert them into gb2312 or Unicode code values, then turn those into characters with the Chr() and ChrW() functions, respectively. A "urldecode" function circulates on the Internet, but it only handles the gb2312 case, and its code is not very well designed.
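The two-step idea above (first recover the raw code values, then map them to characters in a chosen encoding) can be sketched outside ASP as follows. The function name `url_decode` is hypothetical and this is not the circulating ASP "urldecode". It gathers all bytes first and decodes them once, which is roughly what Python's own `urllib.parse.unquote_plus` does internally.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def url_decode(text: str, encoding: str) -> str:
    """Decode %XX escapes and '+' into bytes, then decode those bytes with `encoding`."""
    raw = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        pair = text[i + 1:i + 3]
        if ch == "%" and len(pair) == 2 and set(pair) <= HEX_DIGITS:
            raw.append(int(pair, 16))  # one escaped byte
            i += 3
        elif ch == "+":
            raw.append(0x20)  # form-style encoding of a space
            i += 1
        else:
            raw.extend(ch.encode(encoding))  # literal character, kept as-is
            i += 1
    # Only now is the encoding applied, so multi-byte sequences stay intact.
    return raw.decode(encoding)


print(url_decode("%CD%B5", "gb2312"))    # 偷
print(url_decode("%E5%81%B7", "utf-8"))  # 偷
```

Decoding all bytes at the end, instead of converting each %XX separately the way a per-code Chr() call would, avoids splitting a two- or three-byte character in half.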
However, I found two problems during testing. First, the Chr() function fails on ASP pages whose CodePage is not gb2312's; in other words, only gb2312 ASP pages can convert freely between the two encodings. Second, if you type "?s=一级棒" ("excellent") directly into the address bar without URL-encoding it, a page with that CodePage does not recognize it, whereas IE remembers an entered "?s=%E4%B8%80%E7%BA%A7%E6%A3%92" as "?s=一级棒".
