Differences between ANSI, Unicode, and multi-byte encodings

Source: Internet
Author: User

What are ANSI and Unicode? They are two different character encoding methods. As the terms are used on Windows, ANSI uses 8 bits per character, while Unicode uses 16 bits. An 8-bit ANSI encoding can represent only 256 characters. That is more than enough for the 26 English letters, but far too few for writing systems with thousands of characters, such as Chinese, Korean, and Japanese. The Unicode standard was introduced to solve this problem.

In software development, particularly with the string-processing functions in C, code must distinguish between ANSI and Unicode. How are ANSI and Unicode strings defined, how are they used, and how do we convert between them?

I. Definition:

ANSI: char str[1024]; string processing functions such as strcpy(), strcat(), and strlen() are available.

UNICODE: wchar_t str[1024]; the corresponding wide-character string processing functions are available.

II. Available functions:

ANSI (char): strcat(), strcpy(), strlen(), and the other functions whose names begin with str.

UNICODE (wchar_t): wcscat(), wcscpy(), wcslen(), and the other functions whose names begin with wcs.
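As a minimal sketch of the two function families side by side, the following builds the same greeting once with char and the str functions and once with wchar_t and the wcs functions. The helper names greet_narrow and greet_wide are hypothetical and exist only for this illustration; cap is the buffer capacity in elements, not bytes.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Build "Hello, <name>" in a char buffer using the str* family.
 * Returns the resulting length in chars, or 0 if the buffer is too small. */
size_t greet_narrow(char *dst, size_t cap, const char *name)
{
    if (cap < strlen("Hello, ") + strlen(name) + 1)
        return 0;
    strcpy(dst, "Hello, ");
    strcat(dst, name);
    return strlen(dst);
}

/* The same logic with wchar_t and the wcs* family.
 * Note the L prefix on wide string literals. */
size_t greet_wide(wchar_t *dst, size_t cap, const wchar_t *name)
{
    if (cap < wcslen(L"Hello, ") + wcslen(name) + 1)
        return 0;
    wcscpy(dst, L"Hello, ");
    wcscat(dst, name);
    return wcslen(dst);
}
```

The two functions differ only in the character type, the literal prefix, and the function-name prefix, which is what makes the macro-based switching described later possible.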

III. System support:

Windows 98: only ANSI is supported.

Windows 2000: supports both ANSI and Unicode.

Windows CE: Only Unicode is supported.

Notes:

1. COM supports only Unicode.

2. Windows 2000 is Unicode-based throughout, so using ANSI on Windows 2000 comes at a cost. You do not have to convert strings yourself, but the system converts them behind the scenes, which consumes CPU time and memory.

3. If you must use Unicode on Windows 98, you have to convert between the encodings manually.
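A manual conversion can be sketched with the standard C functions mbstowcs() and wcstombs(), which convert according to the current locale (set with setlocale()). On Windows the Win32 functions MultiByteToWideChar() and WideCharToMultiByte() serve the same purpose; they are not used here so that the sketch stays portable. The helper names narrow_to_wide and wide_to_narrow are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a multibyte (char) string to a wide string.
 * cap is the capacity of dst in wchar_t elements.
 * Returns the number of wide characters written (excluding the
 * terminator), or (size_t)-1 on an invalid sequence or if dst is too small. */
size_t narrow_to_wide(wchar_t *dst, size_t cap, const char *src)
{
    size_t n;
    if (cap == 0)
        return (size_t)-1;
    n = mbstowcs(dst, src, cap);
    if (n == (size_t)-1 || n == cap) {   /* error, or no room for L'\0' */
        dst[0] = L'\0';
        return (size_t)-1;
    }
    return n;
}

/* Convert a wide string to a multibyte (char) string.
 * cap is the capacity of dst in bytes.
 * Returns the number of bytes written (excluding the terminator),
 * or (size_t)-1 on an unconvertible character or if dst is too small. */
size_t wide_to_narrow(char *dst, size_t cap, const wchar_t *src)
{
    size_t n;
    if (cap == 0)
        return (size_t)-1;
    n = wcstombs(dst, src, cap);
    if (n == (size_t)-1 || n == cap) {   /* error, or no room for '\0' */
        dst[0] = '\0';
        return (size_t)-1;
    }
    return n;
}
```

In the default "C" locale only plain ASCII text is guaranteed to convert; for Chinese or other multi-byte text the program would first need to select a suitable locale.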

IV. How to differentiate:

In software development we often need to support both ANSI and Unicode. It is impractical to change every string type and every string function call by hand each time the character type changes. For this reason, the standard C runtime library and Windows provide a macro-based mechanism.

The C runtime library provides the _UNICODE macro (with a leading underscore), and Windows provides the UNICODE macro (without an underscore). If both _UNICODE and UNICODE are defined, the code is compiled against the Unicode versions; otherwise it is compiled and run in ANSI mode.

Defining the macros alone does not convert anything automatically. A set of generic character type definitions is also required.

1. TCHAR

If the UNICODE macro is defined, TCHAR is defined as wchar_t.

typedef wchar_t TCHAR;

Otherwise, TCHAR is defined as char.

typedef char TCHAR;

2. LPTSTR

If the UNICODE macro is defined, LPTSTR is defined as LPWSTR.

typedef LPWSTR LPTSTR;

Otherwise, LPTSTR is defined as LPSTR.

typedef LPSTR LPTSTR;
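The switching mechanism can be imitated in portable C to show how it works. The sketch below does not use the real Windows headers: my_tchar, my_lptstr, MY_T, my_tcslen, my_tcscpy, and the MY_UNICODE macro are hypothetical stand-ins for TCHAR, LPTSTR, _T(), _tcslen(), _tcscpy(), and UNICODE/_UNICODE, and copy_title is an invented example function.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the generic-text names.
 * Compile with -DMY_UNICODE to select the wide-character versions. */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(s)    L##s          /* turns "text" into L"text" */
#define my_tcslen  wcslen
#define my_tcscpy  wcscpy
#else
typedef char my_tchar;
#define MY_T(s)    s
#define my_tcslen  strlen
#define my_tcscpy  strcpy
#endif

typedef my_tchar *my_lptstr;     /* pointer to a generic-text string */

/* Written once against the generic names; compiles unchanged in both modes.
 * cap is the capacity of dst in characters. Returns the copied length,
 * or 0 if dst is too small. */
size_t copy_title(my_lptstr dst, size_t cap)
{
    const my_tchar *title = MY_T("Untitled");
    if (my_tcslen(title) + 1 > cap)
        return 0;
    my_tcscpy(dst, title);
    return my_tcslen(dst);
}
```

The function body never mentions char, wchar_t, strlen, or wcslen directly, so changing a single compiler definition switches the whole program between the two modes.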

 

 

Tips on internal code

Character encoding: Computers store every character as a binary number, and a character encoding defines that mapping. Currently the most common character set is ANSI, and the binary encoding of the ANSI character set is called ANSI code. Both DOS and Windows use ANSI code. The character encoding that a system uses internally is called the system internal code.

Chinese character internal code: ANSI code is a single-byte (8-bit) encoding set. It can hold at most 256 characters, far too few for Chinese characters. Different regions therefore designed their own Chinese character encoding sets on top of ANSI code to handle large numbers of Chinese characters. These encodings use a single byte for ANSI English characters (so they remain compatible with ANSI code) and two bytes for Chinese characters. Because a system that uses one Chinese character internal code cannot recognize text in another, exchanging text between such systems is difficult.
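The mix of one-byte and two-byte characters means that the byte length of a string is no longer its character count. The sketch below counts characters under a deliberately simplified rule: any byte with the high bit set (0x80 or above) is assumed to start a two-byte character. Real encodings define narrower lead-byte ranges, and on Windows the IsDBCSLeadByte() function answers this question for the current code page. The function name dbcs_char_count is hypothetical.

```c
#include <stddef.h>

/* Count the characters in a string that mixes single-byte ANSI
 * characters with double-byte Chinese characters.
 * Assumption: every byte >= 0x80 is the lead byte of a two-byte character. */
size_t dbcs_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != '\0') {
        if (*p >= 0x80 && p[1] != '\0')
            p += 2;   /* lead byte plus trail byte: one Chinese character */
        else
            p += 1;   /* single-byte character, or a dangling lead byte */
        count++;
    }
    return count;
}
```

For example, a six-byte string holding one English letter, two GB-encoded Chinese characters, and another English letter is counted as four characters, whereas strlen() reports six.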

GB code: GB code is the simplified Chinese character encoding scheme released by China in 1980. It is widely used in mainland China and Singapore and is also known as the national standard code. It encodes 6763 Chinese characters, covering most of the characters in everyday use.

GBK code: GBK code is an extension of GB code that also encodes traditional Chinese characters. The simplified Chinese editions of Windows 95 and Windows 98 both use GBK as the system internal code.

Big5: Big5 is a Chinese character encoding for traditional Chinese characters. It is widely used on computer systems in Taiwan and Hong Kong.

HZ: HZ is a Chinese character encoding widely used on the Internet.

ISO-2022 CJK code: ISO-2022 is an encoding standard from the International Organization for Standardization (ISO) for characters in different languages. It uses two bytes per character. The Chinese, Japanese, and Korean encodings are called ISO-2022-CN, ISO-2022-JP, and ISO-2022-KR respectively, and the three are generally referred to collectively as CJK code. Currently, CJK codes are used mainly on the Internet.

Unicode code: Unicode is also an international standard encoding. It uses two bytes per character and is incompatible with ANSI code. It is currently used in a range of software.

Internal code conversion: For historical and regional reasons, several text encoding schemes sometimes coexist, especially for Chinese characters. Characters in an encoding other than the system internal code cannot be displayed correctly, so they must go through internal code conversion: characters in a non-system encoding are converted into internal code characters that the system can recognize. Antarctic star is one excellent piece of software of this kind. Other internal code converters include sitongli, magicwin98, cross-strait communication, and Chinese Character communication.
