[ASP] Why the codepage setting matters

Source: Internet
Author: User

I have been studying UTF-8 encoding for the past few days and it has made my head spin, so I am writing down my understanding for discussion.
Criticism is welcome. What follows is my own thinking; if anything is wrong, please do not hesitate to point it out.

Some related background first:

First, the operating system
Windows uses Unicode internally. Folder names, file names, and so on are all Unicode and display correctly under any system language.

Second, input methods
Microsoft Pinyin outputs Unicode, while Intelligent ABC outputs Simplified Chinese codes (so on a non-Simplified-Chinese system Intelligent ABC cannot be used and can only type English).

Third, the textarea of a web page
The textarea of a web page works in Unicode, so it can display whatever you type into it. Input boxes built in Flash, however, sometimes do not.

Fourth, Access 2000
Data stored in Access is Unicode and can be displayed under any system language.
If some characters look wrong in the datasheet view, it is because the display font is not a Unicode font.
The Arial Unicode MS font displays all of them. (See Access Help: search for "Unicode" for instructions.)

Fifth, Word
When Word converts Simplified Chinese to Traditional Chinese, the internal code is still Simplified Chinese; the result is really just traditional-form characters within the Simplified Chinese character set.

Sixth, ASP
ASP is Unicode internally: all text is stored as Unicode and is converted to the specified character set only when needed.

First, the conclusion:
<%@ codepage=936%> Simplified Chinese
<%@ codepage=950%> Traditional Chinese
<%@ codepage=65001%> UTF-8

CODEPAGE specifies which encoding IIS uses to read the strings passed to it (form submissions, values in the address bar, and so on).

It also specifies the encoding to which all text variables are converted from Unicode,
and the encoding to which data fetched from the database is converted from Unicode. (Note this point; it is important.)

Key terms:
Read: the string itself does not change, only its interpretation. Read as Simplified Chinese it is one set of characters; read as Traditional Chinese it is another.

Conversion: the system actively converts the text, for example "translating" a character from Unicode to Big5, so that the internal code becomes Big5. If Big5 has no corresponding character, the Unicode form (&#xxxx;) is kept.
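The difference between the two terms can be sketched in Python. This is only an illustration: Python's "gbk" and "big5" codecs stand in for code pages 936 and 950, and the "xmlcharrefreplace" error handler stands in for the fallback to the &#xxxx; form. How ASP and IIS behave in detail is not verified here.

```python
# Sketch only: Python codecs stand in for the code pages in this article.
text = "\u7ed3\u8bba"  # two Simplified Chinese characters held as Unicode

# "Read": the bytes stay the same, only the interpretation changes.
gb_bytes = text.encode("gbk")                              # bytes in code page 936
read_as_gb = gb_bytes.decode("gbk")                        # correct reading
read_as_big5 = gb_bytes.decode("big5", errors="replace")   # misreading

# "Conversion": the text is re-encoded. Characters that Big5 lacks
# fall back to numeric character references.
converted = text.encode("big5", errors="xmlcharrefreplace")

print(read_as_gb == text)    # same bytes, read correctly
print(read_as_big5 == text)  # same bytes, read with the wrong code page
print(converted)
```

In the "read" case gb_bytes is never modified; only the decoded result differs. In the "conversion" case new bytes are produced.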

Simplified Chinese: 化六个结论
Hexadecimal Unicode form: &#x5316;&#x516d;&#x4e2a;&#x7ED3;&#x8bba;
Decimal Unicode form: &#21270;&#20845;&#20010;&#32467;&#35770;
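The two forms are the same code points written in base 16 and base 10. A minimal Python sketch that reproduces them follows; the helper name to_ncr is made up for this illustration.

```python
def to_ncr(text, hexadecimal=False):
    """Return text as HTML numeric character references (hypothetical helper)."""
    if hexadecimal:
        return "".join(f"&#x{ord(ch):x};" for ch in text)
    return "".join(f"&#{ord(ch)};" for ch in text)


# The five code points listed above, written as escapes.
sample = "\u5316\u516d\u4e2a\u7ed3\u8bba"

print(to_ncr(sample, hexadecimal=True))
print(to_ncr(sample))
```

The hexadecimal digits come out in lower case here; browsers accept either case.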

Here is the encoding conversion process as I have worked it out:
Client: IME (Unicode) -> input box (Unicode) -> converted from Unicode to the encoding named by charset -> the form sends the encoded text

Server side (receiving): IIS decodes the form encoding -> reads the result using the encoding specified by codepage -> converts it to the corresponding Unicode -> the value can be read with Request("") -> some processing -> saved to the database as Unicode

Server side (displaying): reads the Unicode data from the database -> converts it to the encoding specified by codepage -> generates the page source -> IE reads and displays it according to charset.
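The whole round trip can be simulated in a few lines of Python. This is a sketch under stated assumptions, not a model of IIS: the function name and parameters are invented for illustration, Python codecs stand in for code pages, the form's URL encoding is left out because encoding and decoding cancel, and the browser is assumed to send characters missing from the form charset as decimal character references.

```python
def submit_and_display(typed, form_charset, add_codepage, read_codepage, page_charset):
    """Simulate typing text into a form and seeing it on the display page."""
    # Client: Unicode input is converted to the charset of the form page.
    # Characters the charset lacks are sent as numeric character references.
    sent = typed.encode(form_charset, errors="xmlcharrefreplace")
    # Server, saving page: the bytes are read using codepage and become Unicode.
    stored = sent.decode(add_codepage, errors="replace")
    # Server, display page: Unicode from the database is converted to codepage.
    page_bytes = stored.encode(read_codepage, errors="replace")
    # Browser: the page bytes are read according to charset.
    return page_bytes.decode(page_charset, errors="replace")


typed = "\u5316\u516d\u4e2a\u7ed3\u8bba"

# Everything agrees on Simplified Chinese: the text survives.
consistent = submit_and_display(typed, "gbk", "gbk", "gbk", "gbk")

# The form page says Big5 but the server pages use code page 936.
mismatched = submit_and_display(typed, "big5", "gbk", "gbk", "gbk")

print(consistent == typed)
print(mismatched)
```

In the mismatched run the characters that Big5 cannot represent travel as plain ASCII references, so they pass through every later step untouched, while the characters that were encoded as Big5 bytes are misread.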

The following examples illustrate this.
Example one:
Suppose there are three ASP pages that make up a typical message board:
1. write.asp: a simple input form that submits to add.asp.
<meta http-equiv="Content-Type" content="text/html; charset=big5">
2. add.asp: receives the message and saves it to the database.
<%@ codepage=936%>
3. read.asp: fetches the message from the database and displays it.
<%@ codepage=936%> with charset=gb2312, or
<%@ codepage=950%> with charset=big5

Take a guess: in write.asp I type "six discussion" with the Microsoft Pinyin input method. What will read.asp display in the end?
Dizzy yet? Let's analyze it from the beginning.

Example two:
Change the <%@ codepage=936%> in add.asp from Example one to <%@ codepage=950%>.

Here is what we find:
1. If the text typed in does not match the charset, a conversion takes place and some characters may appear in their Unicode (&#xxxx;) form. This is where those characters come from. After that they are preserved through the whole process.
2. In add.asp, codepage determines which language's Unicode the text saved to the database corresponds to. With codepage=936,
the database stores the Unicode of Simplified Chinese (take the database back to a Simplified Chinese system and everything is normal);
with codepage=950 it stores the Unicode of Traditional Chinese (take it back to a Simplified Chinese system and it is wrong).

3. Note how the string changes along the way:

1) Input method -> charset: Unicode is mapped to the specified character set
2) Charset -> form encoding: a simple encoding of the string
3) Form decoding: the reverse of step 2; the two steps cancel out
4) String -> read by codepage: the string is unchanged; this is the step where a "misreading" can happen
5) Conversion to the corresponding Unicode: the character set specified by codepage is mapped to Unicode
6) Intermediate processing and saving to the database: no change, stored directly as Unicode
7) Reading the database according to codepage: Unicode is mapped to the character set specified by codepage
8) Display: read according to the character set specified by charset; the string is unchanged.

To illustrate, take Example two:

Dizzy? Now let's put this knowledge to use.

Case 1:
Code that runs fine on a Simplified Chinese system is moved to hosting space abroad. The database becomes garbled, and the original data is displayed garbled as well.
Analysis: most of us use a Simplified Chinese system, where the default is codepage=936, so leaving the directive out normally does no harm.
On a foreign host the problem surfaces. Data from the database is converted from Unicode to the English encoding, so the original Simplified Chinese data is converted to English and is naturally garbled when displayed as GB.
As shown in the figure, newly typed text appears normal, but what is stored in the database is the Unicode of English.
Solution: add <%@ codepage=936%> to every page.
The whole process then only ever converts between Simplified Chinese and its corresponding Unicode.
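A small Python sketch of why the original data breaks. It assumes the foreign host defaults to Windows-1252 (the article only says "English", so the exact code page is an assumption) and uses Python's "gbk" codec in place of code page 936.

```python
# Unicode text as the database holds it (two Simplified Chinese characters).
stored = "\u7ed3\u8bba"

# Assumed default on an English-language host: Windows-1252.
# Neither character exists there, so each one is replaced.
on_english_host = stored.encode("cp1252", errors="replace")

# With codepage=936 the conversion targets Simplified Chinese instead.
with_936 = stored.encode("gbk")

print(on_english_host)
print(with_936.decode("gbk") == stored)
```

Once the text has been reduced to replacement characters, no charset setting on the page can bring it back, which is why the directive has to be set on the server side.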

Case 2:
You have Simplified Chinese code and data and want to turn them into a fully Traditional Chinese version. What do you do?
Analysis:
1. Change the encoding of all code files to Big5, choosing Traditional Chinese as the encoding when saving each file.
2. <%@ codepage=950%>
3. charset=big5
4. The Access version doesn't matter, because the data in Access is Unicode.
5. Done. The code can now run on a purely Traditional Chinese system.
6. Remaining problem: some of the original Simplified Chinese data will read back as question marks. The effect is the same as in Example one when reading with 950 and displaying as Big5. The Unicode of Simplified Chinese is being converted to Traditional Chinese, and characters with no counterpart there become question marks.
7. Fix: use a temporary ASP page with codepage=65001 to read the Simplified Chinese Unicode, convert it to Traditional Chinese with a unicode->big5 function, and write it back to the database. That should do it.
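The idea in step 7 can be sketched in Python. The three-entry table below is hypothetical and exists only for illustration; a real conversion needs a complete Simplified-to-Traditional mapping, and the function name is made up. Python's "big5" codec stands in for code page 950.

```python
# Hypothetical three-entry table, for illustration only.
S2T = str.maketrans({
    "\u4e2a": "\u500b",  # simplified form -> traditional form
    "\u7ed3": "\u7d50",
    "\u8bba": "\u8ad6",
})


def to_traditional(text):
    """Map simplified characters to traditional ones using the table above."""
    return text.translate(S2T)


original = "\u516d\u4e2a\u7ed3\u8bba"  # Simplified Chinese text from the database

# Converting straight to Big5 loses the characters Big5 does not contain.
direct = original.encode("big5", errors="replace")

# Mapping to traditional forms first lets every character encode cleanly.
fixed = to_traditional(original).encode("big5")

print(direct)
print(fixed.decode("big5"))
```

The mapping runs on Unicode text before any code page conversion, which is why the temporary page reads the data with codepage=65001: UTF-8 can carry every character without loss.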

Both cases are purely my own deduction and have not been verified.
