Build and run environment: VS2012 + Win32 + Debug.
1. Introduction to C++ data types
C++ is a strongly typed language. Every variable (or function) in a C++ program must be declared before it is used. A data type determines two things: how data of that type is stored in memory, and which operations may legitimately be performed on it.
C++ data types are divided into basic data types and non-basic data types. Non-basic data types are usually called composite data types or constructed data types. To highlight how C++ differs from traditional C in this area, this article calls the non-basic types that embody object-oriented features constructed data types, and calls the remaining non-basic types composite data types. The C++ data types are classified as follows:
The basic data types are predefined by C++ and are also called built-in data types. Non-basic data types are types that users create as needed, following the C++ syntax rules. The difference between a constructed data type and a composite data type is that an instance of a constructed data type is called an object, which is a collection of attributes and methods. Constructed data types in this sense were introduced by C++ and embody the idea of object-oriented program design. One notable feature of a constructed data type is that a constructor defined by the type is called automatically whenever an instance of the type is created. In other words, a variable of a constructed data type is initialized by its constructor.
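The point about constructors can be illustrated with a minimal sketch. The class Person and the helper make_example_person are hypothetical names invented for this illustration; they do not come from the original article.

```cpp
#include <string>

// Hypothetical constructed data type (a class), used only for illustration.
class Person {
public:
    // The constructor runs automatically whenever a Person object is created.
    Person(const std::string& name, int age) : name_(name), age_(age) {}

    // Methods that expose the attributes.
    const std::string& name() const { return name_; }
    int age() const { return age_; }

private:
    std::string name_;  // attribute
    int age_;           // attribute
};

// Defining the variable p is what triggers the constructor call;
// no separate initialization step is needed.
inline Person make_example_person() {
    Person p("Zhang San", 30);
    return p;
}
```

Calling make_example_person() returns an object whose name() and age() already hold the values passed to the constructor.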
Note: when a variable is defined with a basic data type, the type comes first and the variable name follows it directly. With a composite data type, the variable name does not necessarily come entirely after the type. For example, in the array definition int a[8]; the identifier a has type int[8], yet it appears in the middle of the type. In addition, the type must not be parenthesized when defining or declaring a variable. Writing (int*) p; does not define a pointer: it is a cast expression that converts p to type int*.
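As a rough sketch of the note above, the following helper checks the type of an array variable at compile time and uses (int*) as a cast rather than as a declaration. The function name array_demo and the variable raw are made up for this example, and the code assumes a compiler with C++11 static_assert and decltype support.

```cpp
#include <type_traits>

inline int array_demo() {
    // The identifier a sits in the middle of its type, which is int[8].
    int a[8] = {10, 20, 30};
    static_assert(std::is_same<decltype(a), int[8]>::value,
                  "a has type int[8]");
    static_assert(sizeof(a) == 8 * sizeof(int),
                  "the array holds eight ints");

    void* raw = a;        // the array decays to a pointer to its first element
    int* p = (int*) raw;  // (int*) here is a cast applied to raw, not a declaration
    return p[1];          // second element of a
}
```

Here array_demo() returns 20, the second element of the array, read through the converted pointer.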
2. Wide-character and single-byte character types
The traditional character type char is a single-byte type: it occupies one byte and stores the character's ASCII code. char can also be treated as a single-byte integer. Where plain char is signed, as it is by default in Visual C++, its range is -128 to 127. A single-byte unsigned integer is written unsigned char and has the range 0 to 255.
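The ranges mentioned above can be queried from the standard library instead of being memorized. This is a small sketch; the helper names are invented for illustration. Whether plain char is signed depends on the implementation, so the sketch reports that property rather than assuming it.

```cpp
#include <limits>

// sizeof(char) is 1 by definition on every implementation.
static_assert(sizeof(char) == 1, "char occupies exactly one byte");

// Limits of the explicitly signed and unsigned single-byte types.
inline int signed_char_min()   { return std::numeric_limits<signed char>::min(); }
inline int signed_char_max()   { return std::numeric_limits<signed char>::max(); }
inline int unsigned_char_max() { return std::numeric_limits<unsigned char>::max(); }

// Whether plain char behaves like signed char is implementation-defined.
inline bool plain_char_is_signed() { return std::numeric_limits<char>::is_signed; }
```

With 8-bit bytes and two's complement, these helpers report -128, 127 and 255; plain_char_is_signed() tells which of the two ranges plain char follows on the current compiler.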
In Visual C++, each Chinese character in a char string occupies 2 bytes, and the highest bit of each of those bytes is 1. The width of a wide character is left to the compiler implementation, as long as Unicode characters can be stored. Visual C++ implements wchar_t as 2 bytes. Two bytes clearly cannot represent every Unicode character, but they can represent up to 65,536 distinct values, which is enough for the script of one country; conversion between encodings follows the current system locale.
A single-byte char cannot hold a Chinese character. A definition such as char c = '好'; produces a compiler warning, and only the low byte of the character's encoding is stored in the variable c.
The C++ language also supports a wide character type, wchar_t, which is used to represent Unicode characters. To support Unicode processing, the C++ library provides wide-character counterparts of the character-handling functions and places their declarations in a dedicated header file.
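As an illustration of such library support, the sketch below pairs two narrow string functions with their wide counterparts. It assumes the standard headers <cstring> and <cwchar>; the original text does not name the header it has in mind, and the wrapper names are invented for this example.

```cpp
#include <cstddef>
#include <cstring>  // strlen, strcmp: functions for char strings
#include <cwchar>   // wcslen, wcscmp: counterparts for wchar_t strings

// Length in characters, not counting the terminator.
inline std::size_t narrow_length(const char* s)  { return std::strlen(s); }
inline std::size_t wide_length(const wchar_t* s) { return std::wcslen(s); }

// Equality test for two wide strings.
inline bool wide_equal(const wchar_t* a, const wchar_t* b) {
    return std::wcscmp(a, b) == 0;
}
```

Both narrow_length("ABC") and wide_length(L"ABC") return 3, even though the two strings occupy a different number of bytes in memory.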
In Visual C++, wchar_t and char are two distinct data types that differ in both storage layout and usage. See the example below.
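#include <iostream>
#include <clocale>   // setlocale
#include <cstdio>    // getchar
using namespace std;

int main(int argc, char* argv[])
{
    char* p;
    wchar_t s[] = L"ABC";
    char name[] = "张三";        // "Zhang San" written in Chinese characters
    wchar_t wname[] = L"张三";

    cout << sizeof(wchar_t) << " ";   // outputs 2 with Visual C++
    cout << sizeof(s) << endl;        // outputs 8: three characters plus the terminator

    // Dump the bytes of the wide string s.
    p = (char*)s;
    for (size_t i = 0; i < sizeof(s); ++i)
        cout << (int)p[i] << " ";
    cout << endl;

    cout << s << " ";     // cout has no inserter for wchar_t strings, so this does not print ABC
    wcout << s << endl;   // prints ABC

    // Dump the bytes of the narrow string name.
    for (size_t i = 0; i < sizeof(name); ++i)
        cout << (int)name[i] << " ";
    cout << endl;

    // Dump the bytes of the wide string wname.
    p = (char*)wname;
    for (size_t i = 0; i < sizeof(wname); ++i)
        cout << (int)p[i] << " ";
    cout << endl;

    cout << name << endl;
    // setlocale(LC_ALL, "chs");   // with this line enabled, wname below is output
    wcout << wname << endl;
    getchar();
    return 0;
}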
Program output:
Reading the program above leads to the following conclusions:
(1) wchar_t and char are different data types with different widths. sizeof(char) == 1, whereas the width of wchar_t depends on the compiler implementation and is chosen so that Unicode characters can be stored; conversion between encodings follows the current system locale. In Visual C++, wchar_t occupies two bytes.
(2) A string literal of type wchar_t must start with L; otherwise a compile error occurs. A character constant of type wchar_t should also start with L, for example wchar_t wc = L'A';. If the L is omitted there, the compiler automatically converts the char value to wchar_t.
(3) For Western characters (such as 'A', 'B', 'C'), a variable of type wchar_t holds 0x00 in its high byte and the character's ASCII value in its low byte.
(4) A char string ends with a single-byte '\0', and a wchar_t string ends with a double-byte L'\0'.
(5) In the Simplified Chinese edition of Windows 7, char strings use GBK encoding, so each Chinese character in a char string occupies two bytes. The highest bit of both bytes is 1, which distinguishes them from Western characters, so printing their byte values as integers yields two negative numbers. In a wchar_t string, each Chinese character is represented by two bytes in UTF-16 encoding, so the same Chinese character is stored with different code values in the two kinds of string. UTF-16 is not byte-compatible with ASCII, and cout has no inserter for wchar_t strings, so cout << s in the program above does not print ABC; it prints the address of the array instead. Note also that UTF-16 stores commonly used characters in two bytes and less common Chinese characters in four bytes. A single two-byte wchar_t therefore cannot hold a four-byte UTF-16 character: storing one in a single wchar_t loses data.
(6) In the program above, the statement cout << name << endl; prints the name ("Zhang San") correctly, but the statement wcout << wname << endl; produces no visible output. If the string wname contained only Western characters, the output would still appear. This behavior of console programs depends on the console's default locale, that is, on which encoding it is set to use. Once the locale has been set with setlocale, the encoding conversion is performed; see the commented-out setlocale call in the program.
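Several of the conclusions above can be checked with a few small helpers. This is a sketch rather than part of the original program: all function names are invented for illustration, and to_utf16 is a simplified encoder that assumes a valid Unicode code point and does no input validation.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// (2) The L prefix selects the wide type of a character literal.
static_assert(std::is_same<decltype('A'), char>::value, "'A' is a char");
static_assert(std::is_same<decltype(L'A'), wchar_t>::value, "L'A' is a wchar_t");

// (2) Without the prefix, the char value is converted to wchar_t implicitly.
inline wchar_t widen_plain_literal() {
    wchar_t wc = 'A';
    return wc;
}

// (3) For an ASCII character, every bit above the low byte is zero.
inline bool high_bits_are_zero(wchar_t wc) {
    return (static_cast<unsigned long>(wc) >> 8) == 0;
}

// (4) sizeof counts the terminator: three characters plus '\0' or L'\0'.
inline std::size_t narrow_array_bytes() { char s[] = "ABC"; return sizeof(s); }
inline std::size_t wide_array_bytes()   { wchar_t s[] = L"ABC"; return sizeof(s); }

// (5) Encode one code point as UTF-16 code units (16 bits each).
// Code points below 0x10000 fit in one unit; the rest need a surrogate pair.
inline std::vector<unsigned short> to_utf16(unsigned long cp) {
    std::vector<unsigned short> units;
    if (cp < 0x10000UL) {
        units.push_back(static_cast<unsigned short>(cp));
    } else {
        unsigned long v = cp - 0x10000UL;
        units.push_back(static_cast<unsigned short>(0xD800UL + (v >> 10)));     // high surrogate
        units.push_back(static_cast<unsigned short>(0xDC00UL + (v & 0x3FFUL))); // low surrogate
    }
    return units;
}
```

For example, to_utf16(0x5F20) yields the single unit 0x5F20, while to_utf16(0x20000) yields the two units 0xD840 and 0xDC00, which is why one 2-byte wchar_t cannot hold such a character. wide_array_bytes() returns 4 * sizeof(wchar_t), which is 8 with Visual C++ and matches the value printed by the program.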