Byte: the byte is the unit in which information is transmitted over a network or stored on a hard disk or in memory. Computer information technology uses the byte to measure both storage capacity and transmission capacity. One byte equals eight binary bits.

In ASCII, an English letter (upper- or lowercase) occupies one byte. In a Simplified Chinese ANSI encoding such as GB2312, a Chinese character occupies two bytes. Likewise, English punctuation occupies one byte and Chinese punctuation occupies two bytes. Inside a computer, a byte is generally an 8-bit binary number used as a basic numerical unit; for example, one ASCII code fits in one byte.

Unit conversion:
1 TB (terabyte) = 1024 GB = 2^40 bytes
1 GB (gigabyte) = 1024 MB = 2^30 bytes
1 MB (megabyte) = 1024 KB = 2^20 bytes
1 KB (kilobyte) = 1024 bytes = 2^10 bytes
1 byte = 8 bits
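The conversions above can be checked with a short Python sketch. It assumes the 1024-based (binary) units used in this article and Python's built-in `ascii` and `gb2312` codecs; the helper names `unit_size` and `bytes_to_bits` are illustrative only, not part of any library.

```python
# Binary (1024-based) storage units: each unit is 2**10 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def unit_size(unit: str) -> int:
    """Return the number of bytes in one `unit`, assuming 1 KB = 1024 bytes."""
    return 1024 ** UNITS.index(unit)

def bytes_to_bits(n_bytes: int) -> int:
    """1 byte = 8 bits."""
    return n_bytes * 8

for unit in UNITS[1:5]:
    exponent = UNITS.index(unit) * 10
    print(f"1 {unit} = 2**{exponent} = {unit_size(unit)} bytes")

# Byte counts of single characters in two encodings
print(len("A".encode("ascii")))    # 1: an English letter in ASCII
print(len("中".encode("gb2312")))  # 2: a Chinese character in GB2312
print(len("，".encode("gb2312")))  # 2: a Chinese full-width comma in GB2312
```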
Note: larger units include the petabyte (PB, 1 PB = 1024 TB), exabyte (EB, 1 EB = 1024 PB), zettabyte (ZB, 1 ZB = 1024 EB), and yottabyte (YB, 1 YB = 1024 ZB).

1.2 Characters, bytes, and strings

The key to understanding encoding is understanding exactly what characters and bytes are. The two concepts are easily confused, so we distinguish them here:

Character: a mark that people use; an abstract symbol. Examples: '1', '中', 'A', '$', '¥'.
Byte: a unit of data storage in a computer; an 8-bit binary number, i.e. a very concrete piece of storage. Examples: 0x01, 0x45, 0xFA.
ANSI string: in memory, if characters are stored in an ANSI encoding, one character may take one or more bytes; such a string is called an ANSI string or multi-byte string. Example: "123中文" (three digits followed by two Chinese characters) occupies 7 bytes.

Character sets and code pages: ANSI encoding comes in several different character sets (charsets), and the same byte sequence represents different characters in different character sets. To parse an ANSI string correctly you must select the right character set; otherwise you get so-called garbled characters. Each language version of an operating system has a default character set, which the system uses to parse ANSI strings when no character set is specified. For example, if on Simplified Chinese Windows we open an ANSI text file (a text file containing only ANSI strings) that was saved on Japanese Windows, we will see garbled characters. However, if we open the file in a text editor that offers encoding options, such as Visual Studio, and select the correct character set, we see the original text.

Note: a traditional Chinese character in a Simplified Chinese character set and the same character in a Traditional Chinese character set do not necessarily have the same encoding (in practice they appear to be completely different).

Each character set has a unique number called its code page.
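The garbled-text problem can be reproduced in a few lines of Python. This is a minimal sketch, assuming Python's `cp936` codec (an alias of GBK, the Simplified Chinese Windows code page) and `cp932` codec (the Japanese Windows code page); `decode_ansi` is a hypothetical helper, and `errors="replace"` is used so that a wrong guess yields replacement characters instead of an exception.

```python
# Store "123中文" as an ANSI (multi-byte) string, then read it back with two code pages.
text = "123中文"

# Encode with the Simplified Chinese code page (Python's "cp936" is an alias of GBK).
ansi_bytes = text.encode("cp936")
print(len(text), "characters ->", len(ansi_bytes), "bytes")  # 5 characters -> 7 bytes

def decode_ansi(data: bytes, code_page: str) -> str:
    """Interpret raw ANSI bytes with the given code page; undecodable bytes become U+FFFD."""
    return data.decode(code_page, errors="replace")

correct = decode_ansi(ansi_bytes, "cp936")  # the right character set
garbled = decode_ansi(ansi_bytes, "cp932")  # Japanese code page: the wrong one here
print(correct)  # 123中文
print(garbled)  # not the original text: garbled characters

# The same character can have different byte values in different character sets.
print("中".encode("gbk").hex(), "中".encode("big5").hex())
```

The digits survive either way because both code pages agree with ASCII in the single-byte range; only the Chinese characters come out wrong.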
The code page of Simplified Chinese (GB2312) is 936, while the system default character set has code page number 0, which means a suitable character set is selected according to the system's language settings.

Unicode string: in memory, if characters are stored as their numbers (code points) in Unicode, the string is called a Unicode string or wide-character string. In the UTF-16 form used for Windows wide strings, each such character occupies two bytes (characters outside the Basic Multilingual Plane take four). For example, L"123中文" occupies 10 bytes.

Because different ANSI encodings use different character sets, for a given multi-byte string we must know which character set it uses before we can know which characters it contains. For a Unicode string, the characters it represents stay the same in any environment. Unicode is a single unified standard that defines codes for the vast majority of the world's characters, so Latin letters, digits, Simplified Chinese, Traditional Chinese, Japanese, and so on can all be stored in the same way.
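The 10-byte figure for L"123中文" can be confirmed with a small Python sketch. It assumes UTF-16 little-endian, which matches the in-memory layout of a Windows `wchar_t` string; the terminating NUL that a C string literal would add is not counted.

```python
# A "wide" string in UTF-16 little-endian, the layout of a Windows wchar_t string.
text = "123中文"

wide = text.encode("utf-16-le")
print(len(wide))  # 10: five characters, two bytes each

# Code points are fixed by the Unicode standard, independent of the system code page.
print([hex(ord(ch)) for ch in text])  # ['0x31', '0x32', '0x33', '0x4e2d', '0x6587']

# Decoding needs no character-set guess: the UTF-16 bytes always map back to the same text.
assert wide.decode("utf-16-le") == text

# Caveat: characters outside the Basic Multilingual Plane need a surrogate pair (4 bytes).
print(len("\U0001F600".encode("utf-16-le")))  # 4
```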