Displaying the contents of a BMPString in a Linux terminal

Source: Internet
Author: User

In the previous blog post, we described how to output the contents of a BMPString in the Windows console, but that method does not work under Linux. If you run the sample code from that post under Linux, the output is garbled. The reason is that the wchar_t type has a different size in bytes under Windows and under Linux.

The following C program can be compiled and run under both Windows and Linux:

#include <stdio.h>
#include <wchar.h>
#if defined(_WIN32) || defined(_WIN64)
#include <stdlib.h>
#endif

int main(void)
{
  /* sizeof yields a size_t, so cast it to match the %d conversion */
  printf("Wide character (wchar_t type) length is %d bytes.\n", (int) sizeof(wchar_t));
#if defined(_WIN32) || defined(_WIN64)
  system("pause");  /* keep the Windows console window open */
#endif
  return 0;
}

The output differs between the two systems. Under 64-bit Windows, both the 32-bit and the 64-bit executables built with the Microsoft compiler report that wchar_t is 2 bytes long. Under 64-bit Linux, the executable built with 64-bit GCC reports that wchar_t is 4 bytes long.

When processing a BMPString under Windows, take the character "中" as an example. Its UTF-16 encoding is 0x4E, 0x2D. We convert it from big-endian to little-endian order, giving 0x2D, 0x4E, and then output it with the wprintf() function.
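The byte swap described above can be sketched in C as follows. This is only a minimal illustration, not the code from the previous post: the helper name utf16be_to_units and its parameters are hypothetical. It assembles each code unit arithmetically, so the result is in host byte order whatever the machine's endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode big-endian UTF-16 bytes into host-order
 * 16-bit code units. At most out_cap units are written; a trailing odd
 * byte is ignored. Returns the number of code units written. */
size_t utf16be_to_units(const unsigned char *in, size_t in_len,
                        uint16_t *out, size_t out_cap)
{
    size_t n = in_len / 2;
    size_t i;

    if (n > out_cap)
        n = out_cap;
    for (i = 0; i < n; i++) {
        /* high byte comes first in big-endian order */
        out[i] = (uint16_t)((in[2 * i] << 8) | in[2 * i + 1]);
    }
    return n;
}
```

For the bytes 0x4E, 0x2D the helper yields the single code unit 0x4E2D. On a platform where wchar_t is 2 bytes, such units could be passed to wprintf() once a terminating zero is appended.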

When processing a BMPString under Linux, take the same character "中", whose UTF-16 encoding is 0x4E, 0x2D. We widen it from 2 bytes to 4 bytes by adding two zero bytes in front, giving 0x0, 0x0, 0x4E, 0x2D, and then convert it from big-endian to little-endian order, giving 0x2D, 0x4E, 0x0, 0x0. The result is then output with the wprintf() function.

So the general method for displaying a BMPString in Linux is to first convert the UTF-16 big-endian encoding to UTF-32 big-endian by prepending two zero bytes to each character, and then convert the UTF-32 big-endian encoding to UTF-32 little-endian. (You can also convert the UTF-16 big-endian encoding to little-endian order first and then append two zero bytes to each character; the result is the same.) The little-endian step applies because the target machine, x86 Linux, stores wchar_t values in little-endian order.
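The widening step can also be written without any manual byte shuffling. The sketch below is an assumption-laden illustration rather than the author's method: the helper name bmp_to_wcs and its parameters are hypothetical, and it handles BMP characters only. Because each wchar_t is built from the two bytes by arithmetic, the code does not depend on the size or byte order of wchar_t.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: widen big-endian UTF-16 (BMP only) into a
 * NUL-terminated wchar_t string. out_cap counts wchar_t elements and
 * must include room for the terminator. Returns the number of
 * characters written, not counting the terminator. */
size_t bmp_to_wcs(const unsigned char *in, size_t in_len,
                  wchar_t *out, size_t out_cap)
{
    size_t n = in_len / 2;
    size_t i;

    if (out_cap == 0)
        return 0;
    if (n > out_cap - 1)
        n = out_cap - 1;
    for (i = 0; i < n; i++) {
        /* combine high and low byte into one code point */
        out[i] = (wchar_t)(((unsigned)in[2 * i] << 8) | in[2 * i + 1]);
    }
    out[n] = L'\0';
    return n;
}
```

After a successful setlocale() call, the resulting string could be printed with wprintf(L"%ls\n", out).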

A sample program is given below:

/**************************************************
* Author: HAN Wei
* Author's blog: http://blog.csdn.net/henter/
* Date: Oct 31st, 2014
* Description: demonstrate how to print BMPString on Linux console
**************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

/**************************************************
* Function name: PrintBMPStringOnLinux
* Function: output a BMPString in a Linux terminal
* Parameters:
*   BMPString      [in]  BMPString data (UTF-16, big-endian)
*   BMPString_len  [in]  BMPString length, in bytes
* Return value: 0 on success, -1 on failure
**************************************************/
int PrintBMPStringOnLinux(unsigned char *BMPString, unsigned int BMPString_len)
{
  unsigned char *buffer;
  unsigned int buffer_len, i;
  unsigned char *p, *q;

  /* The buffer is twice the byte length of the BMPString plus four bytes.
     The extra four bytes hold the string terminator (of type wchar_t),
     whose encoding is 0x0, 0x0, 0x0, 0x0. This assumes a 4-byte,
     little-endian wchar_t, as on x86 Linux. */
  buffer_len = BMPString_len * 2 + 4;
  if ( !(buffer = (unsigned char *)malloc(buffer_len)) )
  {
#ifdef _DEBUG
    fprintf(stderr, "malloc() function failed!\n");
#endif
    return (-1);
  }
  memset(buffer, 0, buffer_len);

  /* Copy each 2-byte character into a 4-byte slot, swapping the two bytes */
  p = buffer;
  q = BMPString;
  for (i = 0; i < BMPString_len / 2; i++)
  {
    *p = *(q+1);
    *(p+1) = *q;
    p += 4;
    q += 2;
  }

  setlocale(LC_ALL, "zh_CN.utf8");
  wprintf(L"BMPString:%ls\n", (wchar_t *)buffer);
  free(buffer);
  return 0;
}

int main(void)
{
  int error_code;
  unsigned char bmpstring_data1[] = {0x4e, 0x2d, 0x56, 0xfd};
  /* Unicode encoding of the Chinese string "中国" */
  unsigned char bmpstring_data2[] = {0x0, 0x55, 0x0, 0x73, 0x0, 0x65, 0x0, 0x72};
  /* Unicode encoding of the English string "User" */
  unsigned char str[] = {0x2d, 0x4e, 0x0, 0x0, 0xfd, 0x56, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0};
  wchar_t wstr[3];  /* assumes sizeof(wchar_t) == 4, so 12 bytes in total */

  /* Error messages go to stderr: once wprintf() has been used, stdout is
     wide-oriented and must not be mixed with printf(). */
  if ( (error_code = PrintBMPStringOnLinux(bmpstring_data1, sizeof(bmpstring_data1))) )
  {
    fprintf(stderr, "Print BMPString on Linux console failed!\n");
    return (-1);
  }

  if ( (error_code = PrintBMPStringOnLinux(bmpstring_data2, sizeof(bmpstring_data2))) )
  {
    fprintf(stderr, "Print BMPString on Linux console failed!\n");
    return (-1);
  }

  /* Below is an example of how Unicode characters are stored in Linux.
     As the result shows, each UTF-16 character saved in a wchar_t is
     4 bytes long and stored in little-endian order. */
  wprintf(L"\n");
  memcpy(wstr, str, sizeof(str));  /* copy rather than cast, to avoid misaligned access */
  setlocale(LC_ALL, "zh_CN.utf8");
  wprintf(L"%ls\n", wstr);
  return 0;
}

The program was compiled with the 64-bit GCC compiler and run under 64-bit CentOS. The output is as follows:

[Figure: screenshot of the program's terminal output]
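As an alternative sketch, not part of the original program, a BMPString can also be converted straight to UTF-8 bytes, which avoids depending on the size and byte order of wchar_t. The helper name bmp_to_utf8 and its signature are hypothetical. It assumes BMP-only input (surrogate pairs are not handled) and a terminal that expects UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: encode big-endian UTF-16 (BMP only) as a
 * NUL-terminated UTF-8 string. Returns the number of bytes written,
 * not counting the terminator, or (size_t)-1 if out is too small. */
size_t bmp_to_utf8(const unsigned char *in, size_t in_len,
                   char *out, size_t out_cap)
{
    size_t i, o = 0;

    for (i = 0; i + 1 < in_len; i += 2) {
        unsigned cp = ((unsigned)in[i] << 8) | in[i + 1];
        size_t need = cp < 0x80 ? 1 : (cp < 0x800 ? 2 : 3);

        if (o + need + 1 > out_cap)   /* keep room for the terminator */
            return (size_t)-1;
        if (cp < 0x80) {
            out[o++] = (char)cp;
        } else if (cp < 0x800) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    if (o + 1 > out_cap)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

The resulting buffer could be written with fputs() to a stream that has not already been used with wprintf(), since a stream should not mix byte and wide output.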
