Ansi, UTF8, Unicode encoding (continued)

Last Update:2013-11-15 Source: Internet

Author: User


ANSI strings are the ones we are most familiar with: an English letter occupies one byte, a Chinese character occupies two bytes, and the string ends with a '\0'. This encoding is commonly used in plain txt files.
Unicode strings, as the term is used on Windows (UTF-16), store each character, whether a Chinese character or an English letter, in 2 bytes; only characters outside the Basic Multilingual Plane take 4. In the VC++ world, Microsoft prefers Unicode, with types such as wchar_t.
UTF-8 is a variable-length encoding of Unicode. The English letter A is 0x0041 in Unicode. For English text this storage method wastes 50% of the space, so UTF-8 encodes English letters in a single byte. Chinese characters, however, occupy three bytes in UTF-8, which is less economical than the two bytes they take in ANSI. This is why Chinese web pages have commonly used ANSI encoding while foreign web pages commonly use UTF-8. In the program, converting a 15.7 MB txt file from UTF-8 to ANSI reduced its size to 10.8 MB.
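
The byte counts above can be checked with a small sketch. The helper names utf8Length and utf16Length are hypothetical and not part of any library; they only apply the standard UTF-8 and UTF-16 length rules to a single code point.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes UTF-8 needs for one Unicode code point.
std::size_t utf8Length(std::uint32_t cp) {
    if (cp < 0x80)    return 1;  // ASCII, e.g. 'A' (U+0041)
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;  // most Chinese characters, e.g. U+4E2D
    return 4;                    // supplementary planes
}

// Number of bytes UTF-16 (the Windows "Unicode" encoding) needs.
std::size_t utf16Length(std::uint32_t cp) {
    return cp < 0x10000 ? 2 : 4; // 4 bytes means one surrogate pair
}
```

For U+0041 the two functions return 1 and 2; for U+4E2D they return 3 and 2, which matches the trade-off described above.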

In general, two functions declared in the Windows header file handle all of these conversions. Include the header file:

#include <Windows.h>

Multi-Byte Character Set -> Unicode Character Set

  int MultiByteToWideChar(
    __in   UINT CodePage,
    __in   DWORD dwFlags,
    __in   LPCSTR lpMultiByteStr,
    __in   int cbMultiByte,
    __out  LPWSTR lpWideCharStr,
    __in   int cchWideChar
  );

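Unicode Character Set -> Multi-Byte Character Set

  int WideCharToMultiByte(
    __in   UINT CodePage,
    __in   DWORD dwFlags,
    __in   LPCWSTR lpWideCharStr,
    __in   int cchWideChar,
    __out  LPSTR lpMultiByteStr,
    __in   int cbMultiByte,
    __in   LPCSTR lpDefaultChar,
    __out  LPBOOL lpUsedDefaultChar
  );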

WideCharToMultiByte uses its last two parameters only when a character has no representation in the code page given by CodePage. When a character cannot be converted, the function substitutes the character pointed to by lpDefaultChar. If lpDefaultChar is NULL, the function uses the system default character, which is usually a question mark. This is dangerous for file operations, because the question mark is a wildcard in file names. The flag pointed to by lpUsedDefaultChar reports whether any substitution took place.
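
To make the risk concrete, here is a portable model of the substitution rule. It is not the Windows API: toAsciiWithDefault is a hypothetical helper, and 7-bit ASCII is assumed as the target code page so that no mapping table is needed. Its defaultChar and usedDefault parameters play the roles of lpDefaultChar and lpUsedDefaultChar.

```cpp
#include <string>

// Portable model only; the target "code page" is assumed to be 7-bit ASCII.
// Every UTF-16 unit that ASCII cannot represent is replaced by defaultChar,
// and usedDefault reports whether that happened at least once.
std::string toAsciiWithDefault(const std::u16string& in,
                               char defaultChar,
                               bool& usedDefault) {
    usedDefault = false;
    std::string out;
    for (char16_t u : in) {
        if (u < 0x80) {
            out.push_back(static_cast<char>(u));
        } else {
            out.push_back(defaultChar);
            usedDefault = true;
        }
    }
    return out;
}
```

Given a name that contains one Chinese character followed by .txt, the result contains a question mark in its place and usedDefault is true. Checking that flag before using the result as a file name avoids handing a wildcard to a file operation.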

Program header file:

#include <iostream>
#include <string>
#include <fstream>
#include <Windows.h>
using namespace std;

ANSI to Unicode

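    const char* sAnsi = "sample text";                              // any ANSI string
    int sLen = MultiByteToWideChar(CP_ACP, 0, sAnsi, -1, NULL, 0);  // count in wchar_t, including '\0'
    wchar_t* sUnicode = new wchar_t[sLen];
    MultiByteToWideChar(CP_ACP, 0, sAnsi, -1, sUnicode, sLen);
    ofstream rtxt("unicode.txt", ios::binary);                      // placeholder output path
    rtxt.write((char*)sUnicode, sLen * sizeof(wchar_t));            // also writes the terminator
    rtxt.close();
    delete[] sUnicode;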

Unicode to ANSI

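    const wchar_t* sUnicode = L"sample text";                       // any Unicode string
    int sLen = WideCharToMultiByte(CP_ACP, 0, sUnicode, -1, NULL, 0, NULL, NULL);  // count in bytes, including '\0'
    char* sAnsi = new char[sLen];
    WideCharToMultiByte(CP_ACP, 0, sUnicode, -1, sAnsi, sLen, NULL, NULL);
    // ... use sAnsi ...
    delete[] sAnsi;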

Unicode to UTF8

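    const wchar_t* sUnicode = L"sample text";                       // any Unicode string
    int sLen = WideCharToMultiByte(CP_UTF8, 0, sUnicode, -1, NULL, 0, NULL, NULL);  // count in bytes, including '\0'
    char* sUtf8 = new char[sLen];
    // With CP_UTF8 the last two parameters must be NULL.
    WideCharToMultiByte(CP_UTF8, 0, sUnicode, -1, sUtf8, sLen, NULL, NULL);
    // ... use sUtf8 ...
    delete[] sUtf8;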

UTF8 to Unicode

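    const char* sUtf8 = "sample text";                              // any UTF-8 string
    int sLen = MultiByteToWideChar(CP_UTF8, 0, sUtf8, -1, NULL, 0); // count in wchar_t, including '\0'
    wchar_t* sUnicode = new wchar_t[sLen];
    MultiByteToWideChar(CP_UTF8, 0, sUtf8, -1, sUnicode, sLen);
    ofstream rtxt("unicode.txt", ios::binary);                      // placeholder output path
    rtxt.write((char*)sUnicode, sLen * sizeof(wchar_t));
    rtxt.close();
    delete[] sUnicode;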
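
For readers who want to see what the UTF-8 to Unicode step does without the Windows API, the decoding can be sketched portably. utf8ToUtf16Bmp is a hypothetical helper and not a substitute for MultiByteToWideChar: it assumes sequences of one to three bytes (the Basic Multilingual Plane), does not validate continuation bytes, and replaces any other byte with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-16 code units, BMP only (a simplifying assumption).
std::u16string utf8ToUtf16Bmp(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {                                        // 0xxxxxxx
            out.push_back(b0);
            i += 1;
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < in.size()) {  // 110xxxxx 10xxxxxx
            unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            out.push_back(static_cast<char16_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F)));
            i += 2;
        } else if ((b0 & 0xF0) == 0xE0 && i + 2 < in.size()) {  // 1110xxxx 10xxxxxx 10xxxxxx
            unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            unsigned char b2 = static_cast<unsigned char>(in[i + 2]);
            out.push_back(static_cast<char16_t>(
                ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)));
            i += 3;
        } else {                                                // unsupported or truncated
            out.push_back(0xFFFD);
            i += 1;
        }
    }
    return out;
}
```

The three bytes E4 B8 AD decode to the single unit U+4E2D, which is the three-bytes-to-two-bytes relationship described earlier.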

ANSI to UTF-8 and UTF-8 to ANSI are combinations of the conversions above: Unicode serves as the intermediate form, so the text is converted twice.
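
The two-step route can be sketched portably as follows. Latin-1 is assumed here as a stand-in for the ANSI code page, purely because each Latin-1 byte maps to the code point of the same value; a Chinese code page would need a lookup table. All three function names are hypothetical.

```cpp
#include <string>

// Step 1: "ANSI" -> Unicode. Latin-1 is an assumption made for brevity.
std::u16string latin1ToUtf16(const std::string& in) {
    std::u16string out;
    for (unsigned char c : in) out.push_back(static_cast<char16_t>(c));
    return out;
}

// Step 2: Unicode -> UTF-8 (BMP only; surrogate pairs are not handled).
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (char16_t u : in) {
        if (u < 0x80) {
            out.push_back(static_cast<char>(u));
        } else if (u < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (u >> 6)));
            out.push_back(static_cast<char>(0x80 | (u & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xE0 | (u >> 12)));
            out.push_back(static_cast<char>(0x80 | ((u >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (u & 0x3F)));
        }
    }
    return out;
}

// The combined conversion goes through the Unicode intermediate form.
std::string latin1ToUtf8(const std::string& in) {
    return utf16ToUtf8(latin1ToUtf16(in));
}
```

Keeping the two steps as separate functions mirrors the structure of the Windows calls: one call into the wide form, one call out of it.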

For network transmission we usually use UTF-8, but inside a program we are used to working with ANSI. At the very least, UTF-8 text is displayed as garbled characters in VS2010. The following functions combine the procedures above to convert a UTF-8 encoded txt file to ANSI.

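    // Convert a UTF-8 string to ANSI. The caller must delete[] the result.
    char* changeTxtEncoding(const char* szU8)
    {
        int wcsLen = ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, NULL, 0);
        wchar_t* wszString = new wchar_t[wcsLen];
        ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, wszString, wcsLen);
        int ansiLen = ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, NULL, 0, NULL, NULL);
        char* szAnsi = new char[ansiLen];
        ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, szAnsi, ansiLen, NULL, NULL);
        delete[] wszString;
        return szAnsi;
    }

    // Read a UTF-8 txt file and rewrite it in place as ANSI.
    void changeTextFromUtf8ToAnsi(const char* filename)
    {
        ifstream infile(filename);
        string strLine, strResult;
        while (getline(infile, strLine))
            strResult += strLine + "\n";
        infile.close();
        char* changeTemp = new char[strResult.length() + 1];  // +1 for '\0'
        strcpy(changeTemp, strResult.c_str());
        char* changeResult = changeTxtEncoding(changeTemp);
        ofstream outfile(filename);
        outfile << changeResult;
        outfile.close();
        delete[] changeTemp;
        delete[] changeResult;
    }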

Notes on problems encountered:
A. The length() and size() member functions of std::string return the actual length of the string, excluding the terminating '\0'.
B. For a char*, strlen() likewise returns the actual length of the string, excluding the '\0'.
C. The sizeof operator, applied to a char array, does count the '\0'. For example, given char str[] = "Hello"; sizeof(str) is 6.
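
The three notes can be checked with a short sketch. The function names are hypothetical; copyWithTerminator is included only to show why a buffer that receives a copy of a string needs length() + 1 bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

const char str[] = "Hello";

// sizeof on the array counts the terminating '\0'.
static_assert(sizeof(str) == 6, "5 characters plus the terminator");

std::size_t lengthViaStrlen() { return std::strlen(str); }          // excludes '\0'
std::size_t lengthViaString() { return std::string(str).length(); } // excludes '\0'

// A buffer that receives a copy of s therefore needs length() + 1 bytes.
std::vector<char> copyWithTerminator(const std::string& s) {
    std::vector<char> buf(s.length() + 1);
    std::memcpy(buf.data(), s.c_str(), s.length() + 1);
    return buf;
}
```

Note that sizeof only behaves this way on the array itself. Applied to a char* that points at the same text, it yields the size of the pointer, not of the string.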

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
