C# Transcoding

Source: Internet
Author: User

using System.Text;

string unicodeString = "中国";

// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;

// Encode the string as UTF-16 bytes.
byte[] unicodeBytes = unicode.GetBytes(unicodeString);

// Convert the bytes from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);

To obtain the GB2312 encoding instead:

Encoding ansi = Encoding.GetEncoding("gb2312");
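
For comparison with the Python sessions later in this article, the same transcoding steps can be sketched in Python 3. This is only an illustration: the variable names are made up, "utf-16-le" is assumed to be the Python counterpart of Encoding.Unicode, and replacing unrepresentable characters with "?" is assumed to correspond to the default behavior of the C# snippet.

```python
# Python 3 sketch; names are illustrative and not taken from the original article.
text = "中国"

# Assumption: Encoding.Unicode corresponds to UTF-16 little-endian without a BOM.
utf16_bytes = text.encode("utf-16-le")

# Transcode UTF-16 -> ASCII. Chinese characters have no ASCII representation,
# so errors="replace" substitutes "?" for each of them.
ascii_bytes = utf16_bytes.decode("utf-16-le").encode("ascii", errors="replace")

# Transcode UTF-16 -> GB2312, which can represent both characters.
gb_bytes = utf16_bytes.decode("utf-16-le").encode("gb2312")

print(utf16_bytes.hex(" "))  # 2d 4e fd 56
print(ascii_bytes)           # b'??'
print(gb_bytes.hex(" "))     # d6 d0 b9 fa
```

Transcoding always goes through the decoded text: bytes in the source encoding are decoded to characters, and the characters are then encoded in the target encoding.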

 

Analysis of the conversion relationships among UTF-8, GB2312, and Unicode

1. UCS stands for Universal Character Set.
2. UTF stands for UCS Transformation Format.
3. GB2312 locates characters by zone-position code. Zones 01-09 hold symbols and other non-Chinese characters, zones 16-55 hold the level-1 Chinese characters, and zones 56-87 hold the level-2 Chinese characters. For example, the first Chinese character is position 01 of zone 16, '啊', so its zone-position code is 1601 in decimal, or 0x1001 when the zone and the position are each written as one hex byte.
To turn the zone-position code into the machine internal code, add 0xA0 to the zone byte and to the position byte separately. The machine internal code of '啊' is therefore 0xB0A1.
4. National standard code = zone-position code + 0x2020
Machine internal code = national standard code + 0x8080, that is, zone-position code + 0xA0A0
>>> print unicode('\xb0\xa1', 'gb2312')  # convert the machine internal code to Unicode
啊
>>> print unicode(chr(0xb0) + chr(0xa1), 'gb2312')
啊
>>> a = u'\u00a9'
>>> print a
©
>>> a.encode('utf-8')
'\xc2\xa9'
>>> print chr(65)
A
>>> print unichr(0x8089)
肉
5. Unicode assigns Chinese characters new codes whose method and ordering are completely different from GB2312. In Unicode the basic Chinese characters occupy 0x4E00 through 0x9FA5, so converting between Unicode and GB2312 requires a lookup table for fast conversion [Luther.gliethttp]
6. The wchar_t type can be used to store Unicode characters.
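
The zone-position arithmetic in items 3 and 4 can be checked with a short Python 3 sketch. The helper name quwei_to_codes is hypothetical, and the result is verified against Python's built-in gb2312 codec rather than against the standard itself.

```python
def quwei_to_codes(zone, position):
    """Return (national_code, internal_code) for a GB2312 zone/position pair.

    Hypothetical helper for illustration; zone and position are the decimal
    values from the zone-position code, e.g. 16 and 1 for code 1601.
    """
    zone_position = (zone << 8) | position  # zone 16, position 1 -> 0x1001
    national = zone_position + 0x2020       # add 0x20 to each byte
    internal = national + 0x8080            # add another 0x80 to each byte
    return national, internal


national, internal = quwei_to_codes(16, 1)
print(hex(national))  # 0x3021
print(hex(internal))  # 0xb0a1

# The internal code is what is stored as bytes, so the codec can decode it.
internal_bytes = internal.to_bytes(2, "big")
print(internal_bytes.decode("gb2312"))  # 啊
```

Adding 0x2020 and then 0x8080 is the same as adding 0xA0 to each byte, which is why every byte of a GB2312 Chinese character has its highest bit set.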

In ubuntu, use
Locale displays the default encoding of the system.
Luther @ gliethttp :~ $ Locale
Lang = en_US.UTF-8
Lc_ctype = "en_US.UTF-8"
Lc_numeric = "en_US.UTF-8"
Lc_time = "en_US.UTF-8"
Lc_collate = "en_US.UTF-8"
Lc_monetary = "en_US.UTF-8"
Lc_messages = "en_US.UTF-8"
Lc_paper = "en_US.UTF-8"
Lc_name = "en_US.UTF-8"
Lc_address = "en_US.UTF-8"
Lc_telephone = "en_US.UTF-8"
Lc_measurement = "en_US.UTF-8"
Lc_identification = "en_US.UTF-8"
Lc_all =
Luther @ gliethttp :~ $ Vim utf8.c
Then enter: Luther China
Save, view and save, utf8.c's HEX:
0000000: 6C 75 74 68 65 72 E4 B8 ad E5 9B BD 0a Luther .......
Use Python for further analysis:
>>> ord('\n')
10
>>> a = '中国'  # a plain byte string, stored in the terminal's encoding
>>> a
'\xe4\xb8\xad\xe5\x9b\xbd'  # these are UTF-8 bytes, so UTF-8 is the encoding in effect on my Ubuntu system
>>> a = u'中国'  # a Unicode string
>>> a
u'\u4e2d\u56fd'  # the Unicode code points
>>> a.encode('gb2312')  # convert to the GB2312 machine internal code
'\xd6\xd0\xb9\xfa'
>>> a.encode('utf-8')  # convert to UTF-8
'\xe4\xb8\xad\xe5\x9b\xbd'  # the same bytes as the plain byte string above
>>> a = u'中国'
>>> a
u'\u4e2d\u56fd'
>>> b = a.encode('gb2312')  # convert Unicode to GB2312
>>> b
'\xd6\xd0\xb9\xfa'
>>> unicode(b, 'gb2312')  # convert GB2312 back to Unicode
u'\u4e2d\u56fd'
The Unicode code point of a character:
>>> u'葛'
u'\u845b'
Print the code point as a visible character:
>>> print u'\u845b'
葛

[Note: the following content is from http://www.ixpub.net/thread-865394-1-1.html]
UTF-8
Now that Unicode is clear, what is UTF-8, and why does it exist?
To convert ASCII to UCS-2, just insert a 0x00 byte before each ASCII code. Text encoded this way therefore contains bytes that look like control or special characters, such as '\0' or '/', which can cause serious errors in UNIX and in some C functions. UCS-2 is therefore not suitable as an external encoding of Unicode.
That is why UTF-8 was created. So how does UTF-8 encode characters, and how does it solve the UCS-2 problem?
Example:
E4 BD A0    11100100 10111101 10100000
This is the UTF-8 encoding of the character "你" (you).
4F 60       01001111 01100000
This is the Unicode code point of "你".
Split according to the UTF-8 encoding rules, the three bytes are: xxxx0100 xx111101 xx100000
Concatenating the bits that are not marked x gives 0100 111101 100000, which is 0x4F60, the Unicode code point of "你".
Note that the three leading 1 bits of the first byte indicate that this UTF-8 sequence consists of three bytes.
In a multi-byte UTF-8 sequence no byte can be mistaken for a sensitive ASCII character, because the highest bit of every byte is always 1.
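
The decomposition of E4 BD A0 can be reproduced with a few bit operations. The following Python 3 sketch uses a hypothetical helper, decode_utf8_3byte, that handles only the 3-byte form and does not perform the stricter checks a real decoder needs, such as rejecting overlong forms.

```python
def decode_utf8_3byte(b0, b1, b2):
    """Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).

    Illustrative only: overlong encodings and surrogates are not rejected.
    """
    if (b0 & 0xF0) != 0xE0 or (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Mask off the marker bits and concatenate the remaining 4 + 6 + 6 bits.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


code_point = decode_utf8_3byte(0xE4, 0xBD, 0xA0)
print(hex(code_point))  # 0x4f60
print(chr(code_point))  # 你

# Every byte of the sequence has its highest bit set, so none is an ASCII byte.
print(all(b >= 0x80 for b in (0xE4, 0xBD, 0xA0)))  # True
```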

The conversion relationships between Unicode and UTF-8 are as follows:
U-00000000 - U-0000007F: 0xxxxxxx // no leading 1: a single byte
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx // two leading 1s: 2 bytes
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx // three leading 1s: 3 bytes
U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx // and so on
U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
The 5-byte and 6-byte forms belong to the original UTF-8 definition; the current standard (RFC 3629) stops at U+10FFFF, so at most 4 bytes are used.
To convert a Unicode code point to UTF-8, simply fill its bits into the x positions of the matching pattern.
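
As a minimal sketch of filling a code point's bits into the x positions, the Python 3 function below implements the first four rows of the table. The name encode_utf8 is hypothetical, the function stops at 4 bytes (U+10FFFF) as current UTF-8 does, and it does not reject surrogate code points the way the built-in codec does.

```python
def encode_utf8(code_point):
    """Encode one code point as UTF-8 by filling its bits into the patterns.

    Illustrative helper: covers 1 to 4 bytes and does not reject surrogates.
    """
    if code_point < 0:
        raise ValueError("code point out of range")
    if code_point < 0x80:        # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x110000:    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of range")


print(encode_utf8(0x4F60).hex(" "))  # e4 bd a0
print(encode_utf8(0x00A9).hex(" "))  # c2 a9

# Cross-check against Python's own encoder for one code point per row.
for cp in (0x41, 0xA9, 0x4F60, 0x1F600):
    assert encode_utf8(cp) == chr(cp).encode("utf-8")
```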

We can therefore see that Unicode and UTF-8 are related by a linear, purely mechanical conversion, whereas Unicode and GB2312 are not. Converting between Unicode and GB2312 requires a lookup table, much like the algorithm for converting between the Gregorian calendar and the lunar calendar, where no linear computation is possible either [Luther.gliethttp]
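
The lack of a linear relationship is easy to observe. The Python 3 sketch below relies on the built-in gb2312 codec as the lookup table and compares three characters used earlier in this article; the variable names are illustrative.

```python
chars = "啊中国"

rows = []
for ch in chars:
    code_point = ord(ch)
    gb_code = int.from_bytes(ch.encode("gb2312"), "big")
    rows.append((ch, code_point, gb_code, gb_code - code_point))
    print(ch, hex(code_point), hex(gb_code), hex(gb_code - code_point))

# If the mapping were a fixed offset, this set would contain a single value.
offsets = {offset for _, _, _, offset in rows}
print(len(offsets))  # 3

# The two encodings do not even agree on the order of the characters.
by_unicode = sorted(chars, key=ord)
by_gb2312 = sorted(chars, key=lambda ch: ch.encode("gb2312"))
print(by_unicode)  # ['中', '啊', '国']
print(by_gb2312)   # ['啊', '国', '中']
```

Because neither a constant offset nor a shared ordering exists, a converter has to carry a table that maps each GB2312 code to its Unicode code point, which is what the codec does internally.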
