C string
There are three encoding modes, corresponding to three character types.
(1) Single-byte character set (SBCS).
-In this encoding mode, every character is represented by exactly one byte.
-ASCII is an SBCS. The end of an SBCS string is marked by a single byte with the value 0 ('\0').
-Single-byte characters include the Latin alphabet, accented characters, and the graphic characters defined by the ASCII standard and the DOS operating system.
For example, "Hi!" is stored as follows. (Each character takes 1 byte; value range: 00 ~ FF (hexadecimal))
┌─────────────────────────────┐
Size                  1 byte  1 byte  1 byte  1 byte
Memory content (hex)  48      69      21      00
Character             H       i       !       NUL (end of string)
└─────────────────────────────┘
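The layout in the table can be checked with a few lines of C. This is a minimal sketch, assuming an ASCII execution character set; dump_bytes is a hypothetical helper name introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes every byte of s, including the terminating 0, into out as
   two-digit hex values separated by spaces.
   Returns the number of bytes that were written out.
   dump_bytes is a hypothetical helper, not a standard function. */
size_t dump_bytes(const char *s, char *out, size_t out_size)
{
    assert(s != NULL && out != NULL && out_size > 0);

    size_t total = strlen(s) + 1;   /* +1 for the terminating 0 byte */
    size_t used = 0;
    size_t dumped = 0;

    out[0] = '\0';
    for (size_t i = 0; i < total; i++) {
        /* Cast to unsigned char so bytes above 0x7F print as 80..FF. */
        int w = snprintf(out + used, out_size - used, "%s%02X",
                         i ? " " : "", (unsigned char)s[i]);
        if (w < 0 || (size_t)w >= out_size - used) {
            out[used] = '\0';       /* buffer too small: drop partial output */
            break;
        }
        used += (size_t)w;
        dumped++;
    }
    return dumped;
}
```

Calling dump_bytes("Hi!", buf, sizeof buf) with a 32-byte buffer fills buf with the same four values shown in the table, the last one being the end-of-string byte.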
(2) Multibyte character set (MBCS).
-MBCS in Windows contains two character types: single-byte characters and double-byte characters. Since most of the multibyte characters used by Windows are two bytes long, the term MBCS is often used interchangeably with DBCS (double-byte character set).
-In DBCS encoding mode, certain values are reserved to indicate that they are part of a double-byte character.
For example, in Shift-JIS (a common Japanese encoding), a value in the range 0x81-0x9F or 0xE0-0xFC means "this is a double-byte character, and the next byte is part of the character." Such values are called "lead bytes", and they are all larger than 0x7F. The byte following a lead byte is called the "trail byte". In DBCS, a trail byte can be any non-zero value. As in SBCS, the end of a DBCS string is marked by a single byte with the value 0.
-Double-byte characters are used to represent East Asian and Middle Eastern languages.
┌────────────────────────────┐
Size                  2 bytes  2 bytes  2 bytes  1 byte
Memory content (hex)  C4 E3    BA C3    A3 A1    00
Character             你       好       ！       NUL (end of string)
(The three characters spell "Hello!" in Chinese.)
└────────────────────────────┘
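Because a character may take one or two bytes, counting characters in a DBCS string means inspecting each byte for a lead-byte value. The following is a minimal sketch using the Shift-JIS lead-byte ranges quoted above; the function names is_sjis_lead_byte and sjis_char_count are hypothetical, and the code assumes the input really is Shift-JIS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns nonzero if b falls in a Shift-JIS lead-byte range
   (0x81-0x9F or 0xE0-0xFC). Hypothetical helper for illustration. */
int is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Counts characters (not bytes) in a 0-terminated Shift-JIS string. */
size_t sjis_char_count(const char *s)
{
    assert(s != NULL);

    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != 0) {
        /* A lead byte plus a non-zero trail byte forms one character.
           A lead byte directly before the terminator is counted alone
           so the loop never steps past the end of the string. */
        if (is_sjis_lead_byte(*p) && p[1] != 0)
            p += 2;
        else
            p += 1;
        count++;
    }
    return count;
}
```

For the five-byte string 93 FA 96 7B 21 (two double-byte characters followed by "!"), strlen reports 5 while sjis_char_count reports 3. Note that the trail byte 0x7B is also the ASCII code for "{", which is why a byte cannot be interpreted without knowing whether a lead byte precedes it.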
(3) Unicode.
-Unicode, in the 16-bit form used on Windows, is an encoding in which every character is encoded with two bytes. Unicode characters are sometimes referred to as wide characters because they are wider (use more storage) than single-byte characters.
-Note that Unicode is not considered an MBCS. What distinguishes an MBCS is that its characters are encoded with varying numbers of bytes. A Unicode string uses a two-byte 0 value as its end marker.
-Unicode is used internally by COM and by the Windows NT operating system.
┌───────────────────────────────┐
Size                  2 bytes  2 bytes  2 bytes  2 bytes  2 bytes
Memory content (hex)  FF FE    48 00    69 00    21 00    00 00
Character             (mark)   H        i        !        NUL (end of string)
└───────────────────────────────┘
The leading FF FE is a byte order mark. It identifies little-endian Unicode, in which the second (low) byte of each character is stored first.
If the mark is FE FF, the text is big-endian Unicode, and H is stored as 00 48.
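The byte order mark can be tested directly on a raw byte buffer. The sketch below assumes the buffer holds 16-bit Unicode as described here; the enum and the function names detect_bom and read_unit are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum byte_order { ORDER_UNKNOWN, ORDER_LITTLE_ENDIAN, ORDER_BIG_ENDIAN };

/* Inspects the first two bytes of buf for a byte order mark. */
enum byte_order detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return ORDER_LITTLE_ENDIAN;
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return ORDER_BIG_ENDIAN;
    }
    return ORDER_UNKNOWN;
}

/* Reads the two-byte value stored at byte offset off.
   The caller must make sure off + 1 is inside the buffer. */
unsigned read_unit(const unsigned char *buf, size_t off, enum byte_order order)
{
    assert(buf != NULL);

    if (order == ORDER_BIG_ENDIAN)
        return ((unsigned)buf[off] << 8) | buf[off + 1];
    /* Little endian: the low byte comes first. */
    return ((unsigned)buf[off + 1] << 8) | buf[off];
}
```

With the bytes FF FE 48 00, detect_bom reports little endian and read_unit at offset 2 yields 0x0048; with FE FF 00 48 it reports big endian and yields the same value.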
When you use char, you are working with single-byte characters. Double-byte characters are also manipulated through the char type (this is one of the many oddities of double-byte characters that we will see). Unicode characters are represented by wchar_t. Unicode character and string literals are written with the prefix L. For example:
wchar_t wch = L'1';        // 2 bytes, 0x0031
wchar_t *wsz = L"Hello";   // 12 bytes, 6 wide characters (including the terminating 0)
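The byte counts in the comments hold where wchar_t is two bytes wide, as on Windows; on many other platforms wchar_t is four bytes. The sketch below therefore computes the storage from sizeof(wchar_t) instead of hard-coding it. wide_storage_bytes is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Returns the number of bytes a wide string occupies,
   including its terminating wide 0 character. */
size_t wide_storage_bytes(const wchar_t *ws)
{
    assert(ws != NULL);

    /* wcslen counts wide characters, not bytes, and excludes the terminator. */
    return (wcslen(ws) + 1) * sizeof(wchar_t);
}
```

For L"Hello", wcslen returns 5 and wide_storage_bytes returns 6 * sizeof(wchar_t), which is 12 when wchar_t is two bytes wide.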
2. In the C language, there is no string data type; a character array ending with a null ('\0') character is used to hold the string.
Declaration:
char a[100];
Operations:
// Initialization:
char a[100] = "Hello world!";
const char *p = "Hello world!";
// Assignment (a string can be initialized with "=" when it is defined, but it cannot be assigned to later with "="):
strcpy(a, "Ni hao!");
// Get the string length:
strlen(a);
// Print the string:
printf("%s", a);
Conversion: string to number: int atoi(const char *nptr);
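Two practical points follow from these operations: strcpy does not check the size of the destination array, and atoi cannot report a failed conversion (it returns 0 for both "0" and "abc"). The sketch below wraps both in checked helpers; copy_string and parse_int are hypothetical names, and using strtol in place of atoi is a substitution made here to detect errors.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Copies src into dst only if it fits, terminator included.
   Returns 0 on success and -1 if dst is too small. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    if (strlen(src) + 1 > dst_size)
        return -1;
    strcpy(dst, src);               /* safe: the size was checked above */
    return 0;
}

/* Converts decimal text to int. Returns 1 on success and 0 if the text
   is not a complete number or does not fit in an int. */
int parse_int(const char *s, int *out)
{
    assert(s != NULL && out != NULL);

    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0')   /* no digits, or trailing garbage */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;
    *out = (int)v;
    return 1;
}
```

With char a[100], copy_string(a, sizeof a, "Ni hao!") succeeds and leaves strlen(a) at 7, while the same call on a 4-byte array is rejected instead of overflowing it.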
String functions:
String length: size_t strlen(const char *str);
String copy: char *strcpy(char *targetstr, const char *originalstr);
String concatenation: char *strcat(char *str1, const char *str2);
String comparison: int strcmp(const char *str1, const char *str2); // compares the character codes; returns a positive value if str1 > str2, a negative value if str1 < str2, and 0 if they are equal
String search: char *strchr(const char *str, int ch); // returns a pointer to the first occurrence of ch in the string, or NULL if it is not found
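The return values of these functions are easy to misread: the C standard only guarantees the sign of the strcmp result, and strchr reports a miss with a NULL pointer. The sketch below shows one way to turn them into an index and a -1/0/1 value, plus a size-checked strcat; index_of, compare_sign and append_string are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first occurrence of ch in s, or -1 if absent. */
ptrdiff_t index_of(const char *s, char ch)
{
    assert(s != NULL);

    const char *hit = strchr(s, ch);
    return hit != NULL ? hit - s : -1;   /* pointer difference gives the index */
}

/* Reduces the strcmp result to exactly -1, 0, or 1. */
int compare_sign(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}

/* Appends src to dst only if the result fits in dst_size bytes.
   Returns 0 on success and -1 if there is not enough room. */
int append_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    if (strlen(dst) + strlen(src) + 1 > dst_size)
        return -1;
    strcat(dst, src);
    return 0;
}
```

Starting from char buf[16] = "Hello", appending " world" succeeds and index_of(buf, 'w') is 6; a further append of " and more" is refused because the result would need 21 bytes.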
C++ string manipulation
// Initialization:
string mystr = "Hello world!";
// Assignment:
mystr = "Ni Hao";
// Conversion: char* to string
char a[] = "Hello world!";
string b;
b = a;
// Conversion: string to char*
string b = "Ni Hao";
char a[] = "Hello world!";
strcpy(a, b.c_str());
A string object cannot be automatically converted to a C string; you need to use the member function c_str().