Learning About Encoding Problems (2)

Source: Internet
Author: User

Multibyte and wide characters

string/char* and wstring/wchar_t* in C++

C++ test

On Windows

    const char* cName = "北京市";
    // Convert the multi-byte string to a wide-character string.
    unsigned short wsName[50] = {0};
    // First call: ask for the required length (the result includes the terminating null).
    int wideCharCount = MultiByteToWideChar(CP_ACP, 0, (LPCSTR)cName, -1, NULL, 0) - 1;
    // Second call: perform the conversion into wsName.
    MultiByteToWideChar(CP_ACP, 0, (LPCSTR)cName, -1, (LPWSTR)wsName, wideCharCount + 1);
    for (int i = 0; i < wideCharCount; i++) {
        printf("%d ", wsName[i]);
    }
    printf("\n");
Output
21271 20140 24066
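The three values can be cross-checked without the Windows API. The following is a minimal sketch in Python 3, assuming the source string is 北京市 (the characters that 21271, 20140, and 24066 decode to); the helper name code_points is hypothetical and not part of the original test.

```python
def code_points(text):
    """Return the Unicode code point of each character as a decimal integer."""
    return [ord(ch) for ch in text]


if __name__ == "__main__":
    s = "\u5317\u4eac\u5e02"  # 北京市, written with escapes so the file encoding does not matter
    points = code_points(s)
    print(" ".join(str(cp) for cp in points))       # decimal, as in the output above
    print(" ".join("U+%04X" % cp for cp in points))  # the usual hexadecimal notation
```

The decimal form matches what the C++ loop prints; the hexadecimal form is the notation used in Unicode tables.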

On Linux

The test code is as follows:
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <locale.h>
    #include <wchar.h>
    #include <iostream>
    #include <string>
    using namespace std;

    void multibyte_to_widechar_test();
    void read_file(const char* fname);
    void dump_uchar(unsigned char ch);

    int main()
    {
        multibyte_to_widechar_test();
        read_file("chs");
        printf("Any key pressed to exit...\n");
        getchar();
        return 0;
    }

    // Convert a GBK multibyte string to wide characters and dump the code values.
    void multibyte_to_widechar_test()
    {
        typedef string str_t;
        // Remember the current locale so it can be restored at the end.
        str_t cur_loc = setlocale(LC_ALL, NULL);
        printf("cur_locale = %s\n", cur_loc.c_str());
        setlocale(LC_ALL, "zh_CN.GBK");

        // The source file must be saved as GBK for this literal to hold GBK bytes.
        char mb_buf[100];
        strcpy(mb_buf, "北京市");
        int mbstr_len = strlen(mb_buf);

        wchar_t* wcstr = NULL;
        // With a NULL destination, mbstowcs only counts the wide characters needed.
        int wcstr_len = mbstowcs(wcstr, mb_buf, 0) + 1;
        printf("mb_len = %d, wc_len = %d\n", mbstr_len, wcstr_len);

        wcstr = new wchar_t[wcstr_len];
        // The third argument is the capacity of the destination in wide characters.
        int ret = mbstowcs(wcstr, mb_buf, wcstr_len);
        if (ret <= 0) {
            printf("Conversion failed\n");
        } else {
            printf("Conversion succeeded\n");
            // stdout is already byte-oriented because of the printf calls above,
            // so this wide-character call may print nothing.
            wprintf(L"%ls\n", wcstr);

            // View1: each wide character as a decimal code value.
            printf("View1 =====\n");
            for (int i = 0; i < wcstr_len - 1; i++) {
                int code = (int)wcstr[i];
                printf("%d\t", code);
            }
            printf("\n");

            // View2: each code value split into its high and low byte, in hex.
            printf("View2 =====\n");
            for (int i = 0; i < wcstr_len - 1; i++) {
                int code = (int)wcstr[i];
                dump_uchar((unsigned char)(code / 256));
                dump_uchar((unsigned char)(code % 256));
            }
            printf("\n");
        }
        delete[] wcstr;
        setlocale(LC_ALL, cur_loc.c_str());
    }

    // Print one byte as 0xNN followed by a tab.
    void dump_uchar(unsigned char ch)
    {
        const char* str = "0123456789abcdef";
        printf("0x%c%c\t", str[ch / 16], str[ch % 16]);
    }

    // Read the first line of a file and dump its raw bytes.
    void read_file(const char* fname)
    {
        FILE* fp = fopen(fname, "r");
        if (!fp) {
            return;
        }
        printf("===============\n");
        char buffer[100] = {0};
        fgets(buffer, sizeof(buffer), fp);
        printf("%s", buffer);

        // View1: every byte of the line in hex.
        printf("View1 ===========\n");
        int len = strlen(buffer) - 1;  // drop the trailing newline
        for (int i = 0; i < len; i++) {
            dump_uchar((unsigned char)buffer[i]);
        }
        printf("\n");

        // View2: each byte pair as a little-endian 16-bit decimal value.
        printf("View2 ===========\n");
        for (int i = 0; i < len; i += 2) {
            unsigned char low = (unsigned char)buffer[i];
            unsigned char high = (unsigned char)buffer[i + 1];
            printf("%d ", (high << 8) | low);
        }
        printf("\n");
        fclose(fp);
    }
The multibyte_to_widechar_test function converts the multibyte (GBK) string to Unicode wide characters and then prints the code values of the wide string. read_file reads a line from a file and prints its encoded bytes.
The file chs was created directly in vi with the content "北京市". The .bash_profile contains the following setting:
export LC_ALL="zh_CN.GBK"
So the chs file is encoded as GBK by default.
Compile with g++ test.cpp -o app_test, then run it. The output:
    ~/peteryfren/cpp/encode_app> ./app_test
    cur_locale = C
    mb_len = 6, wc_len = 4
    Conversion succeeded
    View1 =====
    21271   20140   24066
    View2 =====
    0x53    0x17    0x4e    0xac    0x5e    0x02
    ===============
    北京市
    View1 ===========
    0xb1    0xb1    0xbe    0xa9    0xca    0xd0
    View2 ===========
    45489 43454 53450
    Any key pressed to exit...
The Unicode code values of "北京市" match the output on Windows. The GBK (GB2312) values of "北京市", with each byte pair read as a little-endian 16-bit number, are 45489, 43454, 53450. The file created by vi on Linux is also GBK encoded, consistent with the setting in .bash_profile.
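The GBK figures come from pairing the bytes and reading each pair low byte first, which is what the second view in the file-reading function does. A sketch in Python 3, assuming the text is 北京市 and relying on Python's built-in gbk and gb2312 codecs; gbk_pair_values is a hypothetical helper name.

```python
import struct


def gbk_pair_values(text):
    """Encode text as GBK and read each 2-byte pair as a little-endian uint16."""
    raw = text.encode("gbk")
    if len(raw) % 2:
        raise ValueError("expected only double-byte GBK characters")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


if __name__ == "__main__":
    s = "\u5317\u4eac\u5e02"  # 北京市
    print(s.encode("gbk").hex())                    # raw GBK bytes as hex
    print(gbk_pair_values(s))                       # little-endian pair values
    print(s.encode("gb2312") == s.encode("gbk"))    # same bytes for these characters
```

Read in lead-byte-first order the same pairs are 0xB1B1, 0xBEA9, and 0xCAD0, which is how GBK code tables usually list them.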
By the way, on Linux iconv can convert a UTF-8 encoded file to GBK:
iconv -f UTF-8 -t GBK test.txt -o pp.txt
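The same kind of transcoding can be sketched in Python 3. This is an illustration rather than a replacement for iconv: transcode_file is a hypothetical helper, and the demo writes its own temporary test.txt and pp.txt instead of touching real files.

```python
import os
import tempfile


def transcode_file(src_path, dst_path, src_enc="utf-8", dst_enc="gbk"):
    """Read src_path as src_enc and write the same text to dst_path as dst_enc."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_enc, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_enc, newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "test.txt")
    dst = os.path.join(workdir, "pp.txt")
    with open(src, "wb") as f:
        f.write("\u5317\u4eac\u5e02\n".encode("utf-8"))  # 北京市 as UTF-8
    transcode_file(src, dst)
    with open(dst, "rb") as f:
        print(f.read().hex())  # GBK bytes followed by the newline
```

Decoding to text first and encoding again means a character with no GBK representation raises UnicodeEncodeError instead of being silently dropped.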

Python 2.7 test
    >>> s = u'北京市'
    >>> s
    u'\u5317\u4eac\u5e02'
    >>> gbks = '北京市'
    >>> gbks
    '\xb1\xb1\xbe\xa9\xca\xd0'
    >>> s.encode('utf-8')
    '\xe5\x8c\x97\xe4\xba\xac\xe5\xb8\x82'
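In Python 2.7, a literal with the u prefix is a Unicode string; without the prefix it is a byte string in the terminal's encoding, which is GBK here. Python 3.3 does not show the byte codes of a string this way: >>> s displays the characters themselves, much like >>> print(s).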
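In Python 3 every str is Unicode, and a byte form only appears after an explicit encode. A sketch of a Python 3 counterpart to the 2.7 session, assuming the string is 北京市; the variable names are illustrative.

```python
s = "\u5317\u4eac\u5e02"  # 北京市

# ascii() gives the escaped form, similar to the Python 2 repr of a u'...' literal.
unicode_escape = ascii(s)
# Explicit encodes replace the implicit terminal encoding of Python 2 byte strings.
gbk_bytes = s.encode("gbk")
utf8_bytes = s.encode("utf-8")

print(unicode_escape)  # '\u5317\u4eac\u5e02'
print(gbk_bytes)       # b'\xb1\xb1\xbe\xa9\xca\xd0'
print(utf8_bytes)      # b'\xe5\x8c\x97\xe4\xba\xac\xe5\xb8\x82'
```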
Windows text encoding verification
1. ANSI: use the Notepad that ships with Windows to create a txt file with the default settings, named npd.txt. Open it with UE and switch to hex view.
In this case, the Chinese text in the file is GBK (GB2312) encoded, consistent with the file bytes output on Linux.
2. Unicode: open npd.txt in Notepad and choose Save As. The encoding shown is ANSI; select Unicode and save as npd_u.txt.
The Unicode encoding matches the output on Windows and Linux above.
3. UTF-8: open npd.txt again, choose Save As, select UTF-8 as the encoding, and save as npd_utf8.txt.
The UTF-8 bytes are consistent with the Python experiment.
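The three Notepad variants can be approximated in Python 3. This sketch rests on assumptions the text does not state: that ANSI on a Simplified Chinese system means GBK (code page 936), that Notepad's Unicode option means UTF-16 little-endian with a byte order mark, and that its UTF-8 option also writes a byte order mark. The function name notepad_bytes is hypothetical.

```python
import codecs


def notepad_bytes(text, option):
    """Return the bytes a saved file would hold for a save option (assumed mapping)."""
    if option == "ANSI":
        return text.encode("gbk")  # assumption: ANSI code page is 936
    if option == "Unicode":
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")  # assumption: BOM + UTF-16LE
    if option == "UTF-8":
        return codecs.BOM_UTF8 + text.encode("utf-8")  # assumption: BOM is written
    raise ValueError("unknown option: %r" % option)


if __name__ == "__main__":
    s = "\u5317\u4eac\u5e02"  # 北京市
    for option in ("ANSI", "Unicode", "UTF-8"):
        print(option, notepad_bytes(s, option).hex())
```

If a hex view of the saved files shows extra leading bytes such as ff fe or ef bb bf, those would be the byte order mark rather than part of the text.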
A study of string encoding problems: http://blog.csdn.net/ryfdizuo/article/details/17324051
GB18030 and the commonly used GBK are both extensions of GB2312; every character already in GB2312 keeps the same encoding.
References

1. http://blog.csdn.net/xiaobai1593/article/details/7063535

2. GBK2312 encoding table see: http://ff.163.com/newflyff/gbk-list/

3. Unicode encoding table see: http://jlqzs.blog.163.com/blog/static/2125298320070101826277/

