1. Characters
2. Integers
3. Floating-point numbers
4. Pictures, sounds, videos
1. Characters
There are three main kinds of character codes: input codes, internal (in-machine) codes and glyph codes.

- The input code is the encoding used when a character is entered from the keyboard or another external device.
- The internal code is the encoding used to store the character in memory or on disk.
- The glyph code is the dot-matrix (bitmap) pattern that the display draws for the character.
Here we discuss only the internal code.
The four main encodings are ASCII, GBK, Unicode and UTF-8.
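ASCII is the oldest encoding. It covers only Latin letters, digits, punctuation and control characters. It is a 7-bit code. It was later extended to 8 bits, and the extra 128 values were used for additional characters.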
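GBK: because ASCII cannot represent Chinese, China established its own encoding standards.

- GBK represents a Chinese character with two bytes. The first of those bytes has its highest bit set to 1.
- ASCII characters keep their single-byte form, with the highest bit 0, so GBK remains compatible with ASCII.
- GBK is the default encoding on Simplified Chinese Windows.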
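Unicode: as people all over the world began using computers, every language needed its own characters, and separate national encodings were no longer enough.

- Unicode unifies all of them in a single specification.
- It was originally designed as a 2-byte code. Today its code points range from U+0000 to U+10FFFF.
- Unicode itself is only a character set. It assigns each symbol a number (a code point) but does not specify how that number is stored in bytes.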
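UTF-8: storing every character in two or more bytes wastes a lot of space on text that is mostly English, so the UTF-8 encoding was introduced. Note that UTF-8 is only one way of implementing Unicode, not a separate character set.

- It is variable-length, using 1 to 4 bytes per character.
- ASCII characters take 1 byte, and Chinese characters generally take 3.
- It is a prefix code. The leading bits of the first byte tell how many bytes the character occupies, and every continuation byte starts with the bits 10.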
ASCII is in fact a subset of UTF-8: every ASCII byte is also a valid one-byte UTF-8 character. As a result, a large amount of legacy software that supports only ASCII keeps working under UTF-8.
A common arrangement is to use a fixed-width Unicode form uniformly in memory. The text is converted to UTF-8 when it is saved to disk or transmitted.
For example, when you edit a file in Notepad:

1. The UTF-8 characters read from the file are converted to Unicode in memory.
2. When you finish editing, the Unicode text is converted back to UTF-8 and saved to the file.
Practical issues in programs:
1. char is a single byte and holds an ASCII character. wchar_t is two bytes on Windows (one UTF-16 code unit), but it is 4 bytes on Linux.
typedef char *LPSTR;
typedef const char *LPCSTR;
typedef TCHAR *LPTSTR;
typedef const TCHAR *LPCTSTR;
typedef wchar_t *LPWSTR;
typedef const wchar_t *LPCWSTR;
In these names, LP stands for "long pointer", C for const, STR for string, T for TCHAR and W for wchar_t. Finally, TCHAR itself is defined as follows:
#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif
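2. When a file is opened in text mode, the newline character \n is handled differently depending on the operating system.
   - On Linux, a newline is stored as the single byte 10 (LF).
   - On Windows, it is stored as the two bytes 13 10 (CR LF). Text mode converts these to and from \n.
   - In addition, in Windows text mode the SUB character (Ctrl+Z, byte 26) is treated as an end-of-file marker when reading.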
2. Integers
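Integers come in two kinds: signed and unsigned. To simplify arithmetic and the logic circuits that implement it, signed numbers are stored in two's complement.

Every value is stored in a fixed width of n bits, so each operation actually computes $(a \circ b) \bmod 2^n$, where $\circ$ is the operation. This means that to decrease a number from $a$ to $b$ (with $b < a$), we can instead add

$$c = b + 2^n - a,$$

because $a + c = b + 2^n \equiv b \pmod{2^n}$.

In other words, subtracting $x$ is the same as adding $2^n - x$, which is exactly the two's-complement encoding of $-x$. One adder circuit can therefore handle both addition and subtraction.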
3. Floating-point numbers
Floating-point numbers follow the IEEE 754 standard. A number is divided into three fields: a sign bit, a mantissa (fraction) field and an exponent field.

- Because the sign is stored in its own bit, the mantissa can conveniently use sign-magnitude form.
- The exponent is stored in biased (excess-K) form.
For example, a 32-bit float has 1 sign bit, 8 exponent bits and 23 mantissa bits.
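The exponent is mainly used for comparing magnitudes, for instance when aligning two numbers before adding them, rather than for general arithmetic. Its encoding is therefore chosen to make two things easy:
1. Comparison: with a bias, a larger exponent is also a larger unsigned bit pattern.
2. Special values: 0 and the maximum (∞ and NaN) use the all-zeros and all-ones exponent patterns, which are easy to test.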
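With 8 exponent bits, the stored field spans 0 to 255, and the bias is defined as 127. The actual exponent is therefore the stored value minus 127, giving a range of -127 to 128. The two extremes (stored 0 and 255) are reserved for the special values below, so normal numbers use exponents -126 to 127.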
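As a worked example of the bias, consider a normal binary32 number with sign $s$, stored exponent $E$ (1 to 254) and fraction bits $f$. Its value is given by the standard IEEE 754 formula for normal numbers, which includes an implicit leading 1. Below, the formula is applied to encode 0.15625:

$$
\begin{aligned}
x &= (-1)^{s} \times (1.f)_2 \times 2^{E-127}, \qquad 1 \le E \le 254 \\
0.15625 &= \tfrac{5}{32} = (0.00101)_2 = (1.01)_2 \times 2^{-3} \\
\Rightarrow\ s &= 0,\quad E = -3 + 127 = 124 = (01111100)_2,\quad f = 0100\ldots0
\end{aligned}
$$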
If the exponent field is 0 and the fraction is 0, the number is ±0 (the sign bit decides which).
If the exponent field is $2^{e}-1$ (all ones, where e is the number of exponent bits) and the fraction is 0, the number is ±∞ (again depending on the sign bit).
If the exponent field is $2^{e}-1$ and the fraction is not 0, the value is NaN (not a number).
(To be continued)
Data representation in a computer