Guidelines for Handling Multi-Byte (ANSI) and Unicode Strings

Source: Internet
Author: User

When we write programs, string processing is the most common task, and converting between ANSI and Unicode often leaves us dizzy and confused.

Unicode is a good encoding, and our programs should use it wherever possible. When writing programs, it is best to follow these guidelines:

Basic guidelines:

1. Think of a text string as an array of characters, not as an array of chars or bytes.

2. Use the generic data types (such as TCHAR and PTSTR) to represent text characters and strings.

The reason is that the WinNT.h header file contains the following definitions (the code is abridged):

    #ifndef VOID
    #define VOID void
    typedef char CHAR;
    typedef short SHORT;
    typedef long LONG;
    #if !defined(MIDL_PASS)
    typedef int INT;
    #endif
    #endif

    //
    // UNICODE (Wide Character) types
    //
    #ifndef _MAC
    typedef wchar_t WCHAR;    // wc, 16-bit UNICODE character
    #else
    // some Macintosh compilers don't define wchar_t in a convenient location, or define it as a char
    typedef unsigned short WCHAR;    // wc, 16-bit UNICODE character
    #endif

    typedef WCHAR *PWCHAR, *LPWCH, *PWCH;
    typedef CONST WCHAR *LPCWCH, *PCWCH;
    typedef WCHAR *NWPSTR, *LPWSTR, *PWSTR;
    typedef PWSTR *PZPWSTR;
    typedef CONST PWSTR *PCZPWSTR;
    typedef WCHAR UNALIGNED *LPUWSTR, *PUWSTR;
    typedef CONST WCHAR *LPCWSTR, *PCWSTR;
    typedef PCWSTR *PZPCWSTR;
    typedef CONST WCHAR UNALIGNED *LPCUWSTR, *PCUWSTR;
    typedef CONST WCHAR *LPCWCHAR, *PCWCHAR;
    typedef CONST WCHAR UNALIGNED *LPCUWCHAR, *PCUWCHAR;

    //
    // UCS (Universal Character Set) types
    //
    typedef unsigned long UCSCHAR;
    #define UCSCHAR_INVALID_CHARACTER (0xffffffff)
    #define MIN_UCSCHAR (0)

    //
    // ANSI (Multi-byte Character) types
    //
    typedef CHAR *PCHAR, *LPCH, *PCH;
    typedef CONST CHAR *LPCCH, *PCCH;
    typedef CHAR *NPSTR, *LPSTR, *PSTR;
    typedef PSTR *PZPSTR;
    typedef CONST PSTR *PCZPSTR;
    typedef CONST CHAR *LPCSTR, *PCSTR;
    typedef PCSTR *PZPCSTR;

    //
    // Neutral ANSI/UNICODE types and macros
    //
    #ifdef UNICODE                      // r_winnt

    #ifndef _TCHAR_DEFINED
    typedef WCHAR TCHAR, *PTCHAR;
    typedef WCHAR TBYTE, *PTBYTE;
    #define _TCHAR_DEFINED
    #endif /* !_TCHAR_DEFINED */

    typedef LPWCH LPTCH, PTCH;
    typedef LPWSTR PTSTR, LPTSTR;
    typedef LPCWSTR PCTSTR, LPCTSTR;
    typedef LPUWSTR PUTSTR, LPUTSTR;
    typedef LPCUWSTR PCUTSTR, LPCUTSTR;
    typedef LPWSTR LP;
    #define __TEXT(quote) L##quote      // r_winnt

    #else /* UNICODE */                 // r_winnt

    #ifndef _TCHAR_DEFINED
    typedef char TCHAR, *PTCHAR;
    typedef unsigned char TBYTE, *PTBYTE;
    #define _TCHAR_DEFINED
    #endif /* !_TCHAR_DEFINED */

    typedef LPCH LPTCH, PTCH;
    typedef LPSTR PTSTR, LPTSTR, PUTSTR, LPUTSTR;
    typedef LPCSTR PCTSTR, LPCTSTR, PCUTSTR, LPCUTSTR;
    #define __TEXT(quote) quote         // r_winnt

    #endif /* UNICODE */                // r_winnt

3. Use explicit data types to represent bytes, byte pointers, and data buffers (such as BYTE and PBYTE), for the same reason as above.

4. Use the TEXT or _T macro for literal characters and strings. Both macros expand at compile time to the narrow or wide form of the literal, depending on the character set configured for the project.
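As a rough illustration of items 2 and 4, the sketch below imitates the WinNT.h mechanism in portable C++. The names DEMO_UNICODE, DemoTChar, DEMO_TEXT, DemoLength, and GreetingLength are hypothetical stand-ins chosen for this example (for UNICODE, TCHAR, TEXT, and a string-length call); real Windows code would include the Windows headers and use the real names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical switch standing in for the real UNICODE symbol.
// Comment it out to build the narrow-character variant.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;                 // plays the role of TCHAR
#define DEMO_TEXT_IMPL(quote) L##quote     // plays the role of __TEXT
inline std::size_t DemoLength(const DemoTChar* s) { return std::wcslen(s); }
#else
typedef char DemoTChar;
#define DEMO_TEXT_IMPL(quote) quote
inline std::size_t DemoLength(const DemoTChar* s) { return std::strlen(s); }
#endif

// Extra level of indirection so that macro arguments are expanded first,
// in the same way the real TEXT macro forwards to __TEXT.
#define DEMO_TEXT(quote) DEMO_TEXT_IMPL(quote)

// The same source line compiles unchanged for either character set.
inline std::size_t GreetingLength() {
    const DemoTChar* greeting = DEMO_TEXT("Hello");
    return DemoLength(greeting);
}
```

With DEMO_UNICODE defined, the literal becomes L"Hello" and DemoTChar is wchar_t; without it, the literal stays "Hello" and DemoTChar is char. In both cases GreetingLength() counts five characters.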

5. Perform global substitutions (for example, replace PSTR with PTSTR), for the same reason as item 2.

6. Modify calculations that relate to strings. When a function expects the buffer size as a number of characters, pass _countof(szBuffer) rather than sizeof(szBuffer).

When allocating memory for a string, remember that memory is allocated in bytes, so use malloc(nCharacters * sizeof(TCHAR)) instead of malloc(nCharacters).

A macro in the following style handles this:

    #define chmalloc(nCharacters) (TCHAR*)malloc((nCharacters) * sizeof(TCHAR))
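The difference between counting bytes and counting characters in item 6 can be sketched in portable C++. CountOf, AllocChars, and DuplicateWide are hypothetical helpers written for this example: CountOf stands in for _countof, AllocChars for a chmalloc-style macro, and wchar_t for a Unicode TCHAR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>

// Hypothetical stand-in for _countof: number of elements in a fixed-size array.
template <typename T, std::size_t N>
constexpr std::size_t CountOf(T (&)[N]) { return N; }

// Hypothetical helper: allocate room for nCharacters characters, not bytes.
inline wchar_t* AllocChars(std::size_t nCharacters) {
    return static_cast<wchar_t*>(std::malloc(nCharacters * sizeof(wchar_t)));
}

// Copies src into a freshly allocated buffer sized in characters.
// The caller releases the result with std::free.
inline wchar_t* DuplicateWide(const wchar_t* src) {
    assert(src != nullptr);
    std::size_t cch = std::wcslen(src) + 1;   // characters, including the terminator
    wchar_t* copy = AllocChars(cch);
    if (copy != nullptr) {
        std::wmemcpy(copy, src, cch);         // wmemcpy also counts in characters
    }
    return copy;
}
```

For a buffer declared as wchar_t szBuffer[16], CountOf(szBuffer) is 16, while sizeof(szBuffer) is 16 * sizeof(wchar_t) bytes. Passing the byte count where a character count is expected overstates the buffer size and invites an overrun.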

7. Avoid the printf family of functions, especially the %s and %S field types, for converting between ANSI and Unicode strings. The correct approach is to use the MultiByteToWideChar and WideCharToMultiByte functions.
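A minimal sketch of the two-call pattern these functions are usually used with: first ask for the required size, then convert. The wrapper names AnsiToWide and WideToAnsi are hypothetical, and CP_ACP (the system ANSI code page) is an assumption about which code page is wanted. The non-Windows branch is only a fallback so that the example builds elsewhere; it swaps in the standard mbstowcs and wcstombs functions, which depend on the current C locale.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#else
#include <cstdlib>
#endif

// Hypothetical wrapper: narrow (ANSI / multi-byte) string to wide string.
// Stops at the first embedded null character.
inline std::wstring AnsiToWide(const std::string& narrow) {
    if (narrow.empty()) return std::wstring();
#ifdef _WIN32
    // First call: ask how many wide characters are needed (terminator included).
    int needed = MultiByteToWideChar(CP_ACP, 0, narrow.c_str(), -1, nullptr, 0);
    if (needed <= 0) return std::wstring();
    std::vector<wchar_t> buffer(static_cast<std::size_t>(needed));
    // Second call: perform the conversion into the buffer.
    MultiByteToWideChar(CP_ACP, 0, narrow.c_str(), -1, buffer.data(), needed);
#else
    // Fallback, not the Windows API: length query, then conversion.
    std::size_t needed = std::mbstowcs(nullptr, narrow.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1)) return std::wstring();
    std::vector<wchar_t> buffer(needed + 1);
    std::mbstowcs(buffer.data(), narrow.c_str(), needed + 1);
#endif
    return std::wstring(buffer.data());
}

// Hypothetical wrapper: wide string to narrow (ANSI / multi-byte) string.
inline std::string WideToAnsi(const std::wstring& wide) {
    if (wide.empty()) return std::string();
#ifdef _WIN32
    int needed = WideCharToMultiByte(CP_ACP, 0, wide.c_str(), -1,
                                     nullptr, 0, nullptr, nullptr);
    if (needed <= 0) return std::string();
    std::vector<char> buffer(static_cast<std::size_t>(needed));
    WideCharToMultiByte(CP_ACP, 0, wide.c_str(), -1,
                        buffer.data(), needed, nullptr, nullptr);
#else
    std::size_t needed = std::wcstombs(nullptr, wide.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1)) return std::string();
    std::vector<char> buffer(needed + 1);
    std::wcstombs(buffer.data(), wide.c_str(), needed + 1);
#endif
    return std::string(buffer.data());
}
```

The buffers are sized from the value the first call reports, in characters for the wide direction and in bytes for the narrow direction, which keeps the size arithmetic of item 6 explicit.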

8. Define both UNICODE and _UNICODE, or neither of them. Visual Studio defines them by default when a new project is created.
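One possible way to enforce this rule is a compile-time check placed in a common header. UNICODE is the symbol the Windows headers test and _UNICODE is the one the C runtime headers test, so defining only one of them makes the two sets of headers disagree. The constant kUnicodeBuild is a hypothetical name used only in this sketch.

```cpp
#include <cassert>

// Stop the build if only one of the two symbols is defined.
#if defined(UNICODE) && !defined(_UNICODE)
#error "UNICODE is defined but _UNICODE is not; define both or neither."
#elif !defined(UNICODE) && defined(_UNICODE)
#error "_UNICODE is defined but UNICODE is not; define both or neither."
#endif

// Hypothetical constant: true when this translation unit is a Unicode build.
#if defined(UNICODE)
constexpr bool kUnicodeBuild = true;
#else
constexpr bool kUnicodeBuild = false;
#endif
```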

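9. Use the secure string functions: those with the _s suffix or those with the StringCch prefix. The StringCch functions truncate the result, and report an error, when the destination buffer is too small. The _s functions take the destination size in characters and treat an overflow as an error rather than truncating.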
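The truncating behaviour can be illustrated with a small portable sketch. CopyWithTruncation is a hypothetical helper written for this example; it is not the real StringCchCopy and only imitates the idea of copying what fits, always null-terminating, and reporting whether anything was cut off.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper, NOT the real StringCchCopy.
// cchDest is the destination size in characters, including the terminator.
// Returns true if the whole source fit; returns false if the copy was
// truncated or the arguments were unusable.
inline bool CopyWithTruncation(wchar_t* dest, std::size_t cchDest,
                               const wchar_t* src) {
    if (dest == nullptr || src == nullptr || cchDest == 0) {
        return false;
    }
    std::size_t i = 0;
    // Leave one slot free for the terminating null character.
    while (i + 1 < cchDest && src[i] != L'\0') {
        dest[i] = src[i];
        ++i;
    }
    dest[i] = L'\0';            // the destination is always null-terminated
    return src[i] == L'\0';     // false means part of the source was dropped
}
```

Copying L"abcdef" into a four-character buffer leaves L"abc" in the destination and returns false, so the caller can tell that the text was shortened instead of silently overrunning the buffer.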

10. Use the /GS and /RTCs compiler options to detect buffer overruns automatically.

Using Unicode is a good programming habit, but sometimes we have no choice but to work with ANSI. What should we do then?

That case is covered in the article on converting between Unicode and ANSI strings.

Articles in the same series:

"The Conversion of Unicode and ANSI Strings"

"Make Your Program More Applicable--Use ANSI and Unicode Export Functions"

http://blog.csdn.net/blpluto/article/details/5755162
