Win32 API: Unicode Encoding and Wide Characters
- Computers were invented in the United States, so the first character set was built around English. In the 1960s the Americans defined their own encoding, ASCII, which uses 7 bits per character and can therefore represent 128 characters. Memory was expensive at the time, and 128 characters were enough for English.
- As computers spread to Europe, the 128 ASCII characters were no longer enough, so ASCII was extended.
- Many European languages need letters that English does not have. Spanish, for example, needs ñ.
- Extended ASCII uses all 8 bits of a byte, which gives 256 codes. The first 128 stay fixed as standard ASCII, and the upper 128 were assigned to other European languages. Europe has more than ten languages, though, and 128 extra codes could not cover them all. The compromise was the "code page" mechanism: instead of using more memory per character, a number selects which character set the upper 128 codes stand for. For example, code page 936 is Simplified Chinese and code page 437 is the original IBM PC (US English) code page. Under a Chinese code page, bytes above 127 are read as Chinese characters; under code page 437, the same bytes are accented Latin letters and box-drawing symbols.
- Three ASCII codes worth memorizing:
- 1. Lowercase a: 97 (mnemonic: the year Hong Kong returned to China, 1997)
- 2. Uppercase A: 65
- 3. The digit 0: 48 (mnemonic: 97 - 49, the year of Hong Kong's return minus the year the People's Republic of China was founded)
- Some years later, computers reached Asia, and 256 single-byte codes were nowhere near enough.
- Chinese alone needs thousands of characters, far more than the 128 upper codes could hold. The answer was DBCS (double-byte character set), a hybrid encoding in which a character takes either one or two bytes, leaving room for tens of thousands of characters. DBCS became the mainstream encoding on Chinese-language computers.
- DBCS has a built-in flaw: careless handling produces garbled text. Because an English character takes one byte and a Chinese character takes two, character width is not uniform and mistakes are easy to make; a program cannot tell from an arbitrary byte where a character starts. Every parser has to apply two rules while scanning a string, which also makes processing inherently slower.
- Finally, Unicode was introduced. It can be seen as a patch on DBCS that unifies the rules: every character, English included, is encoded in exactly two bytes, and for English characters the high byte is simply filled with 0 (so A, 0x41, becomes 0x0041). (This fixed two-byte form is UCS-2; Windows now uses UTF-16, in which characters beyond U+FFFF take two 16-bit units.)
- The drawback of Unicode is memory: English text takes twice as much space as in ASCII, which can look wasteful, but on current hardware this is not a real problem. Even so, at the time of writing it was still not the mainstream encoding in use.
- Using the character sets in code
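- char: 1 byte per unit. Under DBCS a character occupies one or two chars.
- wchar_t: a wide character, 2 bytes per unit on Windows (Unicode encoding).
- On Windows, wchar_t is effectively unsigned short (2 bytes): the C headers typedef it that way, and in C++ it is a built-in type of the same size. (On Linux with GCC, wchar_t is 4 bytes.)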
- Prefix a string literal with L (as in L"Hello") to tell the compiler to store it as a wide-character string.
- Wide strings must be manipulated with functions that accept wchar_t.
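const wchar_t *pwszText = L"Hello wchar";   // the L prefix makes this a wide-character literal
wprintf(L"%ls\n", pwszText);                  // %ls prints a wchar_t string in both MSVC and standard C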
- The char* functions of standard C (strlen, strcpy, printf and so on) cannot be used on wide strings; every operation needs its wide counterpart (wcslen, wcscpy, wprintf and so on).
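- New type in Windows: TCHAR. winnt.h defines it roughly as follows (simplified):
#ifdef UNICODE
typedef wchar_t TCHAR;   // Unicode build: TCHAR is a wide character
#else
typedef char TCHAR;      // ANSI/DBCS build: TCHAR is a plain char
#endif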
- Watch where the macro is defined. #ifdef XXX only sees a #define XXX that appears earlier (above it) in the same translation unit. If the code tests the same macro in several places, all of them must see one consistent definition; otherwise some checks find it and others do not, and the code contradicts itself.
- In the sample code, #define UNICODE is placed before #include <windows.h> because windows.h includes winnt.h, and winnt.h contains an "#ifdef UNICODE" check (for example, to choose the TCHAR typedef). Note that tchar.h, which provides _TEXT and _T, tests _UNICODE with a leading underscore, so a Unicode build should define both macros.
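// Standalone version without a precompiled header (stdafx.h), so the defines come first.
// winnt.h (pulled in by windows.h) tests UNICODE; tchar.h tests _UNICODE.
#define UNICODE
#define _UNICODE
#include <stdio.h>
#include <tchar.h>
#include <windows.h>

void T_char(void)
{
    // _TEXT("Hello") becomes L"Hello" when _UNICODE is defined, "Hello" otherwise
    const TCHAR *pszText = _TEXT("Hello");
#ifdef UNICODE
    wprintf(L"%ls\n", pszText);        // wide build: print a wchar_t string
#else
    printf("Single:%s\n", pszText);    // narrow build: print a char string
#endif
}

int main(void)
{
    T_char();
    return 0;
}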
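- Example: wprintf's Unicode support is limited and incomplete, so it needs to be replaced. The program below tries to print every 16-bit code; on the Windows console, characters outside the current code page typically come out as "?" or not at all.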
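// WinChar.cpp: Defines the entry point for the console application.
//
#define UNICODE
#define _UNICODE
#include <stdio.h>
#include <tchar.h>
#include <windows.h>

// Tries to print every 16-bit code 0x0000-0xFFFF, one row of 256 per high byte.
void PrintUnicode(void)
{
    for (WORD nHigh = 0; nHigh < 256; nHigh++)
    {
        for (WORD nLow = 0; nLow < 256; nLow++)
        {
            wchar_t ch = (wchar_t)(nHigh * 256 + nLow);  // combine high and low byte
            wprintf(L"%lc", ch);                         // %lc prints one wide character
        }
        // Stay with wide output: standard C does not allow byte functions such as
        // printf on a stream that wprintf has already made wide-oriented.
        wprintf(L"\n");
    }
}

int main(void)
{
    PrintUnicode();
    return 0;
}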
- To print Unicode text reliably on the Windows console, use the WriteConsole API (its wide version is WriteConsoleW):
BOOL WINAPI WriteConsole(
    _In_             HANDLE  hConsoleOutput,          // console output handle (from GetStdHandle)
    _In_       const VOID    *lpBuffer,               // buffer holding the text to write
    _In_             DWORD   nNumberOfCharsToWrite,   // number of characters (not bytes) to write
    _Out_opt_        LPDWORD lpNumberOfCharsWritten,  // receives the number of characters actually written
    _Reserved_       LPVOID  lpReserved               // reserved; must be NULL
);
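- GetStdHandle returns one of the three standard device handles: standard input (the keyboard by default), standard output (the console screen), and standard error (also the console screen by default). These are the only handles that refer to the standard devices; other handles refer to objects such as files or memory.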
HANDLE WINAPI GetStdHandle(
    _In_ DWORD nStdHandle  // STD_INPUT_HANDLE, STD_OUTPUT_HANDLE, or STD_ERROR_HANDLE
);  // returns the requested standard handle, or INVALID_HANDLE_VALUE on failure