Displaying BMPString content in a Linux terminal

Source: Internet
Author: User

The previous blog post described how to print BMPString content in the Windows console, but that method does not work on Linux: running the same sample code on Linux produces garbled output. The cause is that the wchar_t type has a different size in bytes on Windows than on Linux.

The following C program runs on both Windows and Linux:

#include <stdio.h>
#include <wchar.h>
#if defined(_WIN32) || defined(_WIN64)
#include <stdlib.h>
#endif

int main(void)
{
  /* sizeof yields a size_t, so cast it to match the %d conversion */
  printf("Wide character (wchar_t type) length is %d bytes.\n", (int)sizeof(wchar_t));
#if defined(_WIN32) || defined(_WIN64)
  system("pause");  /* keep the console window open on Windows */
#endif
  return 0;
}

The output differs between the two platforms.

On 64-bit Windows, building the program with Microsoft's compiler as both a 32-bit and a 64-bit executable shows that wchar_t is 2 bytes long.

On 64-bit Linux, compiling with 64-bit GCC shows that wchar_t is 4 bytes long.

When processing a BMPString on Windows, take the character "中" as an example: its UTF-16 encoding is 0x4E, 0x2D. We convert it from big-endian to little-endian byte order, giving 0x2D, 0x4E, and then print it with the wprintf() function.
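The byte-order conversion on its own can be sketched as follows. The earlier post's code is not reproduced here, so this is only an illustration of the swap step; the helper name swap_utf16_byte_pairs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: swap each pair of bytes in place, turning
   big-endian UTF-16 into little-endian UTF-16 (or the reverse).
   len is the length in bytes and is assumed to be even. */
void swap_utf16_byte_pairs(unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL && len % 2 == 0);
    for (i = 0; i + 1 < len; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

Applied to the bytes 0x4E, 0x2D, this leaves 0x2D, 0x4E in the buffer, which is the 2-byte wchar_t layout described for Windows.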

When processing a BMPString on Linux, take the same character "中", whose UTF-16 encoding is 0x4E, 0x2D. We first extend it from 2 bytes to 4 bytes by adding two zero bytes in front, which gives 0x0, 0x0, 0x4E, 0x2D. We then convert it from big-endian to little-endian order, so the encoding becomes 0x2D, 0x4E, 0x0, 0x0, and print it with the wprintf() function.
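For a single character, the two steps can be sketched at the byte level like this. The helper name bmp_unit_to_utf32le is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expand one big-endian UTF-16 code unit (2 bytes)
   into the 4-byte little-endian form described above. */
void bmp_unit_to_utf32le(const unsigned char in[2], unsigned char out[4])
{
    assert(in != NULL && out != NULL);
    out[0] = in[1];  /* low byte comes first in little-endian order */
    out[1] = in[0];  /* then the high byte */
    out[2] = 0x00;   /* the two added bytes are zero */
    out[3] = 0x00;
}
```

For the input 0x4E, 0x2D the output bytes are 0x2D, 0x4E, 0x0, 0x0, matching the sequence worked out in the text.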

So the general method for displaying a BMPString on Linux is to first convert the UTF-16 big-endian encoding to UTF-32 big-endian by prepending two zero bytes to each character, and then convert the UTF-32 big-endian encoding to UTF-32 little-endian. (Alternatively, convert the UTF-16 big-endian encoding to little-endian order first and then append two zero bytes to each character; the result is the same.) This works because a BMPString contains only characters from the Basic Multilingual Plane, each of which fits in a single 16-bit code unit, and it assumes a little-endian machine such as x86.
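The byte shuffling described here depends on the machine's byte order and on wchar_t being 4 bytes. A variation that avoids both assumptions is to compute each character value arithmetically and store it in a wchar_t. This is a different technique from the byte-level method above, shown only as a sketch; the helper name bmp_to_wchar is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: decode len bytes of big-endian UTF-16 (BMP
   characters only, no surrogate pairs) into out, which must have room
   for len / 2 + 1 elements. Returns the number of characters written,
   not counting the terminating L'\0'. */
size_t bmp_to_wchar(const unsigned char *bmp, size_t len, wchar_t *out)
{
    size_t i, n = len / 2;

    assert(bmp != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        /* high byte first, then low byte: value = high * 256 + low */
        out[i] = (wchar_t)(((unsigned int)bmp[2 * i] << 8) | bmp[2 * i + 1]);
    }
    out[n] = L'\0';
    return n;
}
```

After a successful setlocale() call, the resulting array can be passed to wprintf() with the %ls conversion in the same way as the byte buffer built by the method above.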

A sample program is given below:

/**************************************************
* Author: Han Wei
* Author's blog: http://blog.csdn.net/henter/
* Date: Oct 31st, 2014
* Description: demonstrate how to print BMPString on Linux console
**************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

/**************************************************
* Function name: PrintBMPStringOnLinux
* Function: print a BMPString in the Linux terminal
* Parameters:
*   BMPString      [in]  the BMPString
*   BMPString_len  [in]  length of the BMPString, in bytes
* Return value:
*   0   success
*   -1  failure
**************************************************/
int PrintBMPStringOnLinux(unsigned char *BMPString, unsigned int BMPString_len)
{
  unsigned char *buffer;
  unsigned int buffer_len, i;
  unsigned char *p, *q;

  /* The buffer is twice the byte length of the BMPString plus four bytes.
     The extra four bytes hold the string terminator, which has type
     wchar_t and is encoded as 0x0, 0x0, 0x0, 0x0. */
  buffer_len = BMPString_len * 2 + 4;
  if (!(buffer = (unsigned char *)malloc(buffer_len)))
  {
#ifdef _DEBUG
    fprintf(stderr, "malloc() function failed!\n");
#endif
    return (-1);
  }
  memset(buffer, 0, buffer_len);

  /* Swap each byte pair and leave the two upper bytes zero, so that
     UTF-16 big-endian becomes UTF-32 little-endian. This assumes a
     4-byte wchar_t and a little-endian machine. */
  p = buffer;
  q = BMPString;
  for (i = 0; i < BMPString_len / 2; i++)
  {
    *p = *(q + 1);
    *(p + 1) = *q;
    p += 4;
    q += 2;
  }

  /* The zh_CN.utf8 locale must be installed for non-ASCII output to work */
  setlocale(LC_ALL, "zh_CN.utf8");
  wprintf(L"BMPString: %ls\n", (wchar_t *)buffer);
  free(buffer);
  return 0;
}

int main(void)
{
  int error_code;
  unsigned char BMPString_data1[] = {0x4E, 0x2D, 0x56, 0xFD};
  /* Unicode encoding of the Chinese string "中国" (China) */
  unsigned char BMPString_data2[] = {0x0, 0x55, 0x0, 0x73, 0x0, 0x65, 0x0, 0x72};
  /* Unicode encoding of the English string "User" */
  unsigned char str[] = {0x2D, 0x4E, 0x0, 0x0, 0xFD, 0x56, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0};
  wchar_t wstr[sizeof(str) / sizeof(wchar_t)];

  /* Error messages go to stderr: stdout becomes wide-oriented after the
     first wprintf() call and must not be mixed with printf(). */
  if ((error_code = PrintBMPStringOnLinux(BMPString_data1, sizeof(BMPString_data1))))
  {
    fprintf(stderr, "Print BMPString on Linux console failed!\n");
    return (-1);
  }

  if ((error_code = PrintBMPStringOnLinux(BMPString_data2, sizeof(BMPString_data2))))
  {
    fprintf(stderr, "Print BMPString on Linux console failed!\n");
    return (-1);
  }

  /* The following shows how Unicode characters are stored on Linux:
     each character held in a wchar_t occupies 4 bytes, stored in
     little-endian order. The bytes are copied into a wchar_t array
     first so that the access is properly aligned. */
  wprintf(L"\n");
  setlocale(LC_ALL, "zh_CN.utf8");
  memcpy(wstr, str, sizeof(str));
  wprintf(L"%ls\n", wstr);
  return 0;
}

The program was compiled with 64-bit GCC and run on 64-bit CentOS. With the zh_CN.utf8 locale available, it prints "BMPString: 中国" and "BMPString: User", followed by "中国" from the final wprintf() call.
