ANSI, UTF-8, and Unicode encoding (continued)

Source: Internet
Author: User

ANSI strings are the kind we are most familiar with. "ANSI" here means the system's default code page, such as GBK on Chinese Windows. An English letter occupies one byte and a Chinese character occupies 2 bytes, the string ends with a '\0', and the encoding is commonly used in txt text files.
Unicode strings, in the Windows sense (UTF-16), store each character, whether a Chinese character or an English letter, in 2 bytes. Characters outside the Basic Multilingual Plane take 4 bytes. In the VC++ world, Microsoft prefers Unicode, for example through the wchar_t type.
UTF-8 is a variable-length ("compressed") form of Unicode. The English letter A is 0x0041 in Unicode. For English text this two-byte storage wastes 50% of the space, so UTF-8 encodes English letters in a single byte. Chinese characters, however, occupy three bytes in UTF-8, which is clearly less economical than the two bytes they need in ANSI. This is why Chinese web pages commonly use ANSI encoding while foreign web pages commonly use UTF-8. In one run of the program, converting a 15.7 MB txt file from UTF-8 to ANSI reduced its size to 10.8 MB.
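The byte counts above can be checked with a small encoder. The following is a minimal, portable sketch rather than anything from the Windows API: the helper name encodeUtf8 is hypothetical, and it assumes the input is a valid code point that is not a surrogate.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes cp is a valid scalar value (no surrogate check in this sketch).
std::string encodeUtf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {                       // 1 byte: ASCII, e.g. English letters
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: most Chinese characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: beyond the Basic Multilingual Plane
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, encodeUtf8(0x0041) yields one byte for the letter A, while encodeUtf8(0x4E2D) yields three bytes for a common Chinese character.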

Generally, two functions declared through the Windows header file, MultiByteToWideChar and WideCharToMultiByte, handle each of these conversions. Add the header file:

#include <Windows.h>

Multi-byte character set -> Unicode character set

int MultiByteToWideChar(
  __in   UINT CodePage,
  __in   DWORD dwFlags,
  __in   LPCSTR lpMultiByteStr,
  __in   int cbMultiByte,
  __out  LPWSTR lpWideCharStr,
  __in   int cchWideChar
);

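Unicode character set -> multi-byte character set

int WideCharToMultiByte(
  __in   UINT CodePage,
  __in   DWORD dwFlags,
  __in   LPCWSTR lpWideCharStr,
  __in   int cchWideChar,
  __out  LPSTR lpMultiByteStr,
  __in   int cbMultiByte,
  __in   LPCSTR lpDefaultChar,
  __out  LPBOOL lpUsedDefaultChar
);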

WideCharToMultiByte uses its last two parameters only when a character has no corresponding representation in the code page given by CodePage. When a character cannot be converted, the function substitutes the character pointed to by lpDefaultChar. If that parameter is NULL, the function uses the system default character, which is usually a question mark. This is very dangerous for file operations, because the question mark is a wildcard in file names.
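To make the substitution behavior concrete, here is a portable sketch that imitates it. It does not call the Windows API: narrowToAscii is a hypothetical helper, the target "code page" is assumed to be plain 7-bit ASCII, and the bool argument plays the role that lpUsedDefaultChar plays in the real function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the lpDefaultChar / lpUsedDefaultChar mechanism:
// narrow a wide string to 7-bit ASCII, replacing every character that does
// not fit with defaultChar and reporting whether any replacement happened.
std::string narrowToAscii(const std::wstring& wide, char defaultChar,
                          bool& usedDefaultChar)
{
    assert(static_cast<unsigned char>(defaultChar) < 0x80);
    usedDefaultChar = false;
    std::string out;
    out.reserve(wide.size());
    for (wchar_t ch : wide) {
        if (static_cast<std::uint32_t>(ch) < 0x80) {
            out += static_cast<char>(ch);   // representable: copy as-is
        } else {
            out += defaultChar;             // not representable: substitute
            usedDefaultChar = true;
        }
    }
    return out;
}
```

A caller building a file name could pass a harmless default such as '_' instead of '?', and refuse to continue when the flag comes back true, so that a lossy conversion never silently turns into a wildcard.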

Headers used by the program:

#include <iostream>
#include <string>
#include <fstream>
#include <Windows.h>
using namespace std;

ANSI to Unicode

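void ansiToUnicode()   // function name assumed
{
    const char* sAnsi = "ANSI text";   // placeholder literal
    // First call returns the required length in wide characters, including the L'\0'.
    int sLen = MultiByteToWideChar(CP_ACP, 0, sAnsi, -1, NULL, 0);
    wchar_t* sUnicode = new wchar_t[sLen];
    // Second call performs the conversion.
    MultiByteToWideChar(CP_ACP, 0, sAnsi, -1, sUnicode, sLen);
    ofstream rtxt("unicode.txt", ios::binary);   // placeholder file name
    rtxt.write((char*)sUnicode, sLen * sizeof(wchar_t));
    rtxt.close();
    delete[] sUnicode;
    sUnicode = NULL;
}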

Unicode to ANSI

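void unicodeToAnsi()   // function name assumed
{
    const wchar_t* sUnicode = L"Unicode text";   // placeholder literal
    // First call returns the required length in bytes, including the '\0'.
    int sLen = WideCharToMultiByte(CP_ACP, 0, sUnicode, -1, NULL, 0, NULL, NULL);
    char* sAnsi = new char[sLen];
    WideCharToMultiByte(CP_ACP, 0, sUnicode, -1, sAnsi, sLen, NULL, NULL);
    ofstream rtxt("ansi.txt", ios::binary);   // placeholder file name
    rtxt.write(sAnsi, sLen);
    rtxt.close();
    delete[] sAnsi;
    sAnsi = NULL;
}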

Unicode to UTF8

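void unicodeToUtf8()   // function name assumed
{
    const wchar_t* sUnicode = L"Unicode text";   // placeholder literal
    // First call returns the required length in bytes, including the '\0'.
    int sLen = WideCharToMultiByte(CP_UTF8, 0, sUnicode, -1, NULL, 0, NULL, NULL);
    char* sUtf8 = new char[sLen];
    WideCharToMultiByte(CP_UTF8, 0, sUnicode, -1, sUtf8, sLen, NULL, NULL);
    ofstream rtxt("utf8.txt", ios::binary);   // placeholder file name
    rtxt.write(sUtf8, sLen);
    rtxt.close();
    delete[] sUtf8;
    sUtf8 = NULL;
}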

UTF8 to Unicode

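void utf8ToUnicode()   // function name assumed
{
    const char* sUtf8 = "UTF-8 text";   // placeholder literal
    // First call returns the required length in wide characters, including the L'\0'.
    int sLen = MultiByteToWideChar(CP_UTF8, 0, sUtf8, -1, NULL, 0);
    wchar_t* sUnicode = new wchar_t[sLen];
    MultiByteToWideChar(CP_UTF8, 0, sUtf8, -1, sUnicode, sLen);
    ofstream rtxt("unicode.txt", ios::binary);   // placeholder file name
    rtxt.write((char*)sUnicode, sLen * sizeof(wchar_t));
    rtxt.close();
    delete[] sUnicode;
    sUnicode = NULL;
}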

Converting ANSI to UTF-8, or UTF-8 to ANSI, combines two of the conversions above: Unicode serves as the intermediate form, and the text is converted twice.
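The two-step idea can be sketched portably, without the Windows API. In the sketch below every function name is hypothetical, and Latin-1 is assumed as a stand-in for the ANSI code page because its byte values map directly to the Unicode code points U+0000 to U+00FF. A real ANSI code page such as GBK needs a lookup table, which is what MultiByteToWideChar supplies.

```cpp
#include <cassert>
#include <string>

// Step 1: single-byte text -> UTF-16 (the intermediate form).
// Latin-1 is an assumption standing in for an ANSI code page.
std::u16string latin1ToUtf16(const std::string& narrow)
{
    std::u16string wide;
    wide.reserve(narrow.size());
    for (unsigned char byte : narrow)
        wide += static_cast<char16_t>(byte);   // byte value == code point
    return wide;
}

// Step 2: UTF-16 -> UTF-8. This sketch handles the Basic Multilingual
// Plane only; surrogate pairs are rejected by the assertion.
std::string utf16ToUtf8(const std::u16string& wide)
{
    std::string out;
    for (char16_t unit : wide) {
        assert(unit < 0xD800 || unit > 0xDFFF);
        if (unit < 0x80) {
            out += static_cast<char>(unit);
        } else if (unit < 0x800) {
            out += static_cast<char>(0xC0 | (unit >> 6));
            out += static_cast<char>(0x80 | (unit & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (unit >> 12));
            out += static_cast<char>(0x80 | ((unit >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (unit & 0x3F));
        }
    }
    return out;
}

// The combined conversion: two passes with UTF-16 in the middle.
std::string latin1ToUtf8(const std::string& narrow)
{
    return utf16ToUtf8(latin1ToUtf16(narrow));
}
```

Keeping the two steps as separate functions mirrors the structure of the Windows calls: one function goes from bytes to wide characters, the other from wide characters back to bytes in a different encoding.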

For network transmission we often use UTF-8 encoding, but during program processing we are used to ANSI encoding. At the very least, UTF-8 text is displayed as garbled characters in VS2010. The following functions combine the procedures above to convert a UTF-8 encoded txt file to ANSI encoding.

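// Converts a UTF-8 string to ANSI; the caller must delete[] the result.
char* changeTxtEncoding(char* szU8)
{
    // UTF-8 -> Unicode
    int wcsLen = ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, NULL, 0);
    wchar_t* wszString = new wchar_t[wcsLen];
    ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, wszString, wcsLen);
    wcout << wszString << endl;
    // Unicode -> ANSI
    int ansiLen = ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, NULL, 0, NULL, NULL);
    char* szAnsi = new char[ansiLen];
    ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, szAnsi, ansiLen, NULL, NULL);
    delete[] wszString;
    return szAnsi;
}

void changeTextFromUtf8ToAnsi(const char* filename)   // parameter name assumed
{
    ifstream infile(filename);
    string strLine = "";
    string strResult = "";
    // Read the whole file, restoring the line breaks that getline removes.
    while (getline(infile, strLine))
    {
        strResult += strLine + "\n";
    }
    infile.close();
    // Copy into a writable buffer; +1 leaves room for the '\0'.
    char* changeTemp = new char[strResult.length() + 1];
    changeTemp[strResult.length()] = '\0';
    strcpy(changeTemp, strResult.c_str());
    char* changeResult = changeTxtEncoding(changeTemp);
    // ... write changeResult to the output file, then delete[] changeTemp and changeResult
}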

Problem record:
A. The length() and size() functions of the string type return the actual length of the string, excluding the '\0';
B. The strlen() function for char* strings also returns the actual length of the string, excluding the '\0';
C. Note that sizeof, which is an operator rather than a function, counts the '\0' when applied to a char array. For example, with char str[] = "Hello"; sizeof(str) is 6.
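The three notes above can be checked side by side. This is a small illustrative sketch: the struct Lengths and both function names are hypothetical, introduced only for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical container for the three measurements of "Hello".
struct Lengths {
    std::size_t viaSize;     // std::string::size(), excludes '\0'
    std::size_t viaStrlen;   // strlen(), excludes '\0'
    std::size_t viaSizeof;   // sizeof on the array, includes '\0'
};

Lengths measureHello()
{
    char str[] = "Hello";
    std::string s = str;
    assert(s.size() == s.length());   // the two members always agree
    return { s.size(), std::strlen(str), sizeof(str) };
}

// Buffer size needed to copy a std::string with strcpy: one extra byte
// for the '\0' that length() does not count.
std::size_t bufferSizeFor(const std::string& s)
{
    return s.length() + 1;
}
```

This is the reason a buffer that receives a copy of a string through strcpy is allocated with length() + 1 bytes. Note also that sizeof applied to a char* pointer, rather than an array, gives the size of the pointer and not of the string.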
