String and encoding


Over the past few months I have run into many character-encoding problems in my text-message (SMS) projects. After reading a lot of material, experimenting by hand, and getting tips from others, I am now somewhat less confused. Here is a summary to share (please point out any mistakes).

First, think of a byte array as the carrier of a string.
Strings in .NET are Unicode (UTF-16) encoded, and a string is displayed by interpreting its carrier as Unicode.

The following describes several common functions in my own words:
(This is my own summary; I have not checked it against MSDN.)
bytes = System.Text.Encoding.Unicode.GetBytes(str)
Purpose: converts the carrier of str from Unicode to Unicode, that is, no conversion takes place. You can therefore use this function to obtain the byte array that carries the string.
str = System.Text.Encoding.Unicode.GetString(bytes)
Purpose: converts the byte array from Unicode to Unicode, that is, no conversion takes place. The byte array becomes the carrier of str.

bytes = System.Text.Encoding.UTF8.GetBytes(str)
Purpose: converts the carrier of str from Unicode to UTF-8. Returns the converted byte array.
str = System.Text.Encoding.UTF8.GetString(bytes)
Purpose: converts the byte array from UTF-8 to Unicode. The converted byte array becomes the carrier of str.

bytes = System.Text.Encoding.GetEncoding("gb2312").GetBytes(str)
Purpose: converts the carrier of str from Unicode to GB2312. Returns the converted byte array.
str = System.Text.Encoding.GetEncoding("gb2312").GetString(bytes)
Purpose: converts the byte array from GB2312 to Unicode. The converted byte array becomes the carrier of str.

More generally:
bytes = System.Text.Encoding.GetEncoding("XXX").GetBytes(str)
Purpose: converts the carrier of str from Unicode to XXX. Returns the converted byte array.
str = System.Text.Encoding.GetEncoding("XXX").GetString(bytes)
Purpose: converts the byte array from XXX to Unicode. The converted byte array becomes the carrier of str.
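
The GetBytes/GetString pairs above can be sketched as one round trip. This is only an illustration: RoundTrip is a hypothetical helper name, and "测试" is assumed as the sample string. The RegisterProvider call is an assumption for .NET Core / .NET 5+, where code-page encodings such as gb2312 are not available until the provider is registered; on .NET Framework that line can be dropped.

```vbnet
Imports System
Imports System.Text

Module EncodingRoundTrip
    ' Hypothetical helper: encode str with the named encoding, then decode it again.
    Function RoundTrip(str As String, encodingName As String) As String
        Dim enc As Encoding = Encoding.GetEncoding(encodingName)
        Dim bytes As Byte() = enc.GetBytes(str)   ' Unicode -> encodingName
        Return enc.GetString(bytes)               ' encodingName -> Unicode
    End Function

    Sub Main()
        ' Assumption: running on .NET Core / .NET 5+, where gb2312 needs this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)

        ' Both calls decode with the same encoding that produced the bytes,
        ' so the original string comes back.
        Console.WriteLine(RoundTrip("测试", "utf-8") = "测试")
        Console.WriteLine(RoundTrip("测试", "gb2312") = "测试")
    End Sub
End Module
```

The string survives only because GetString uses the same encoding that GetBytes used; Example 3 below shows what happens when the two differ.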

Here are some documents about character encoding that I have collected: http://61.145.116.154/BM/
Also:
http://www.unicode.org/charts/unihan.html
Look up a character's glyph, UTF-8 bytes, GB location code, and so on by its Unicode code point.
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/GB2312.TXT
Unicode and GB2312 mapping table
http://www.sun.com/developers/gadc/technicalpublications/articles/mabiao.txt
Unicode and GBK mapping table

-------------------------------------------------------------------------------------

Example 1: get the byte array for various encodings

For "测试", the tables give:
Unicode: 75 109 | 213 139
(hexadecimal) 4B 6D D5 8B
UTF-8: 230 181 139 | 232 175 149
(hexadecimal) E6 B5 8B E8 AF 95
GBK: 178 226 | 202 212
(hexadecimal) B2 E2 CA D4
GB2312 (7-bit code as listed in GB2312.TXT): 50 98 | 74 84

Imports System.Text

Dim str As String = "测试"
Dim bytes As Byte()
bytes = Encoding.Unicode.GetBytes(str)
' bytes: 75 109 213 139
bytes = Encoding.UTF8.GetBytes(str)
' bytes: 230 181 139 232 175 149
bytes = Encoding.GetEncoding("gb2312").GetBytes(str)
' bytes: 178 226 202 212  <- not 50 98 74 84: the table lists the 7-bit code, and the encoder sets the high bit of each byte (adds 128)
bytes = Encoding.Default.GetBytes(str)
' bytes: 178 226 202 212 (here the system default code page is GB)
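
The difference between the GB2312 table values (50 98 74 84) and the bytes the encoder returns (178 226 202 212) is the high bit of each byte. The following sketch checks that arithmetic. EncodeGb2312 and StripHighBit are hypothetical helper names, and the RegisterProvider call is an assumption for .NET Core / .NET 5+ (it can be dropped on .NET Framework).

```vbnet
Imports System
Imports System.Text

Module Gb2312HighBit
    ' Hypothetical helper: encode a string with the "gb2312" encoding.
    Function EncodeGb2312(str As String) As Byte()
        ' Assumption: .NET Core / .NET 5+, where code-page encodings need this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Return Encoding.GetEncoding("gb2312").GetBytes(str)
    End Function

    ' Hypothetical helper: clear the high bit of every byte (subtract 128 from bytes >= 128).
    Function StripHighBit(bytes As Byte()) As Byte()
        Dim result(bytes.Length - 1) As Byte
        For i As Integer = 0 To bytes.Length - 1
            result(i) = CByte(bytes(i) And &H7F)
        Next
        Return result
    End Function

    ' Format a byte array as space-separated decimal values.
    Function Describe(bytes As Byte()) As String
        Return String.Join(" ", Array.ConvertAll(Of Byte, String)(bytes, Function(x) x.ToString()))
    End Function

    Sub Main()
        Dim encoded As Byte() = EncodeGb2312("测试")
        Console.WriteLine(Describe(encoded))                ' 178 226 202 212
        Console.WriteLine(Describe(StripHighBit(encoded)))  ' 50 98 74 84
    End Sub
End Module
```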

Example 2: convert a UTF-8 encoded string to a Unicode encoded string

Dim bytes As Byte() = {230, 181, 139, 232, 175, 149}
Dim str As String = Encoding.Unicode.GetString(bytes)
' The carrier of str is now the UTF-8 encoding of "测试". Read as Unicode, it displays as three unrelated characters: U+B5E6 U+E88B U+95AF.
' Per the code table, the Unicode bytes 230 181 | 139 232 | 175 149 are exactly those three characters.
bytes = Encoding.Unicode.GetBytes(str)
' bytes: 230 181 139 232 175 149, unchanged
str = Encoding.UTF8.GetString(bytes)
' str: "测试"
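
If the goal is to turn UTF-8 bytes into Unicode bytes, the detour through a mis-decoded string is not needed. As a minimal sketch, Encoding.Convert decodes with the source encoding and re-encodes with the destination encoding in one call. Utf8BytesToUnicodeBytes is a hypothetical wrapper name used only for this illustration.

```vbnet
Imports System
Imports System.Text

Module Utf8ToUnicode
    ' Hypothetical wrapper around Encoding.Convert(srcEncoding, dstEncoding, bytes).
    Function Utf8BytesToUnicodeBytes(utf8Bytes As Byte()) As Byte()
        Return Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes)
    End Function

    Sub Main()
        Dim utf8Bytes As Byte() = {230, 181, 139, 232, 175, 149}
        Dim unicodeBytes As Byte() = Utf8BytesToUnicodeBytes(utf8Bytes)

        ' BitConverter.ToString prints the bytes in hexadecimal, separated by dashes.
        Console.WriteLine(BitConverter.ToString(unicodeBytes))   ' 4B-6D-D5-8B

        ' The converted bytes are a valid Unicode carrier for the original text.
        Console.WriteLine(Encoding.Unicode.GetString(unicodeBytes) = "测试")
    End Sub
End Module
```

The result matches the Unicode row for "测试" in Example 1 (75 109 213 139).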

Example 3:
(Reference: http://expert.csdn.net/Expert/topic/1861/1861857.xml?temp=.558407)
For "个", the tables give:
Unicode: 42 78
UTF-8: 228 184 170
GBK: 184 246

Dim s As String = "个"
Dim b As Byte()
b = Encoding.UTF8.GetBytes(s)
' Converts 42 78 from Unicode to UTF-8. b: 228 184 170
s = Encoding.Default.GetString(b)
' Converts 228 184 170 from GB to Unicode. s: "涓" (the Unicode bytes of "涓" are 147 109)
b = Encoding.Unicode.GetBytes(s)
' At this point the carrier of s is 147 109 0 0  <-- the problem has already occurred
b = Encoding.Default.GetBytes(s)
' Converts the carrier 147 109 0 0 of s from Unicode to GB. b: 228 184 0
s = Encoding.UTF8.GetString(b)
' Converts 228 184 0 from UTF-8 to Unicode. s displays as ""
b = Encoding.Unicode.GetBytes(s) ' b(0) = 0  b(1) = 0
' At this point the carrier of s is 0 0.

Summary: it is easy to take for granted that a string passed through Unicode->UTF-8, GB->Unicode, Unicode->GB, UTF-8->Unicode comes back as the original string. In some cases (this example, for instance) it does not.
Cause: a GB string can be converted to Unicode (Unicode characters that do not exist in GB are encoded as "?"), but the premise is that the bytes being converted really are GB encoded. In the step "converts 228 184 170 from GB to Unicode", 228 184 170 is actually the UTF-8 encoding of "个", so data is lost during the conversion.

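One way to see where the data is lost is to decode a byte array with a candidate encoding, encode the result again, and compare with the original bytes. The sketch below does that for 228 184 170 (the UTF-8 bytes of "个"). SurvivesRoundTrip is a hypothetical helper name, and the RegisterProvider call is an assumption for .NET Core / .NET 5+ (it can be dropped on .NET Framework). The exact replacement character produced by the wrong decode may differ between .NET versions, so the code only checks whether the bytes come back unchanged.

```vbnet
Imports System
Imports System.Linq
Imports System.Text

Module LossyDecodeCheck
    ' Hypothetical helper: True when decoding and re-encoding with enc
    ' reproduces exactly the same bytes.
    Function SurvivesRoundTrip(bytes As Byte(), enc As Encoding) As Boolean
        Dim decoded As String = enc.GetString(bytes)
        Dim reencoded As Byte() = enc.GetBytes(decoded)
        Return bytes.SequenceEqual(reencoded)
    End Function

    Sub Main()
        ' Assumption: .NET Core / .NET 5+, where gb2312 needs this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)

        Dim utf8Bytes As Byte() = {228, 184, 170}   ' UTF-8 encoding of "个"

        ' Correct encoding: the bytes come back unchanged.
        Console.WriteLine(SurvivesRoundTrip(utf8Bytes, Encoding.UTF8))                  ' True

        ' Wrong encoding: 228 184 decodes as one GB character, the trailing 170
        ' is an incomplete pair, so the original bytes cannot be recovered.
        Console.WriteLine(SurvivesRoundTrip(utf8Bytes, Encoding.GetEncoding("gb2312"))) ' False
    End Sub
End Module
```

A check like this only detects loss for a particular input; the reliable rule is still to decode bytes with the encoding that produced them.
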
Note: this article is from http://search.csdn.net/Expert/topic/1880/1880675.xml?temp=.6692926
