What Is the Size of a Byte?

Source: Internet
Author: User

Byte: a byte is a unit of information that is transmitted over a network or stored on a hard disk or in memory.
The byte is the unit that computing uses to measure storage capacity and transmission volume; 1 byte equals 8 binary bits.
In ASCII, an English letter (upper or lower case) occupies one byte. Chinese characters are not part of ASCII; in a double-byte Chinese encoding such as GB2312, a Chinese character occupies two bytes.
Symbols: English punctuation occupies one byte and Chinese punctuation occupies two. For example, the English period "." takes 1 byte, while the Chinese full stop "。" takes 2 bytes.
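The byte counts above can be checked with a short Python sketch. It assumes that Python's "gbk" codec stands in for the double-byte Chinese encoding the text implies; the names samples and sizes are illustrative only.

```python
# Byte counts for single characters under two legacy encodings.
# Assumption: "gbk" represents the double-byte Chinese encoding the text implies.
samples = {
    "A": "ascii",  # upper-case English letter
    "a": "ascii",  # lower-case English letter
    ".": "ascii",  # English period
    "中": "gbk",   # Chinese character
    "。": "gbk",   # Chinese full stop
}

# Encode each character and count the resulting bytes.
sizes = {ch: len(ch.encode(enc)) for ch, enc in samples.items()}

for ch, n in sizes.items():
    print(f"{ch!r}: {n} byte(s)")
```

Under other encodings the counts differ, which is why the encoding always has to be stated alongside a byte count.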
A byte is a sequence of binary digits that a computer treats as a single unit, typically 8 bits; one ASCII code, for example, fits in one byte. The units convert as follows:
1 terabyte (TB) = 1024 gigabytes (2^40 bytes)
(1 TB = 1024 GB)
1 gigabyte (GB) = 1024 megabytes (2^30 bytes)
(1 GB = 1024 MB)
1 megabyte (MB) = 1024 kilobytes (2^20 bytes)
(1 MB = 1024 KB)
1 kilobyte (KB) = 1024 bytes (2^10 bytes)
(1 KB = 1024 B)
1 byte (B) = 8 bits (b)
Note: the TB is currently the largest unit in common use for computer hard disk capacity. By one rough, often-repeated estimate, 10 TB is approximately the storage capacity of a human brain.
Larger units: PB (petabyte, 1 PB = 1024 TB), EB (exabyte, 1 EB = 1024 PB), ZB (zettabyte, 1 ZB = 1024 EB), YB (yottabyte, 1 YB = 1024 ZB), ...
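The conversions above follow one rule: each unit is 1024 times the previous one. A minimal Python sketch of that rule, using the 1024-based convention of this text, might look like this. The names UNITS, bytes_in, and humanize are hypothetical helpers, not part of any library.

```python
BITS_PER_BYTE = 8
# Units in ascending order; each step is a factor of 1024 (the convention used in the text).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`."""
    return 1024 ** UNITS.index(unit)

def humanize(n_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value below 1024."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024

print(bytes_in("TB") == 2 ** 40)   # 1 TB expressed in bytes
print(humanize(1536))              # 1536 bytes in a larger unit
print(bytes_in("KB") * BITS_PER_BYTE, "bits in 1 KB")
```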
1.2 Characters, bytes, and strings
The key to understanding encoding is to understand the concepts of the character and the byte accurately. The two are easy to confuse, so let's distinguish them here:
Concept | Description | Examples
Character | A notation that people use; a symbol in the abstract sense. | '1', '中', 'a', '$', '¥', ...
Byte | A unit of data stored in a computer; an 8-bit binary number that occupies a very specific piece of storage. | 0x01, 0x45, 0xFA, ...
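The distinction between the two concepts can be illustrated in Python, where str holds characters and bytes holds bytes. This is only a sketch: the character '中' and the three encodings listed are chosen for illustration and are not prescribed by the text.

```python
ch = "中"                 # one abstract character
code_point = ord(ch)      # its Unicode number, independent of any storage format

# The same character becomes different byte sequences under different encodings.
encoded = {enc: ch.encode(enc) for enc in ("utf-8", "gbk", "utf-16-le")}
for enc, data in encoded.items():
    print(f"{enc}: {len(data)} byte(s), hex {data.hex()}")

# A byte is just a number from 0 to 255, here the example values from the text.
raw = bytes([0x01, 0x45, 0xFA])
print(list(raw))
```

The character count stays at one throughout, while the byte count depends entirely on the encoding chosen.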
ANSI string
If the characters of a string in memory are stored in an ANSI encoding, one character may be represented by one byte or by several bytes, and we call the string an ANSI string or a multibyte string. For example, "中文123" occupies 7 bytes.
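The 7-byte figure can be reproduced with a per-character breakdown. This sketch assumes the example string is 中文123 (two Chinese characters followed by three digits) and that the ANSI encoding in question is GBK, the one used on Simplified Chinese Windows.

```python
text = "中文123"

# Byte width of each character when encoded as GBK (assumed ANSI encoding).
per_char = [(c, len(c.encode("gbk"))) for c in text]
total = len(text.encode("gbk"))

for c, width in per_char:
    print(f"{c!r}: {width} byte(s)")
print(f"{len(text)} characters, {total} bytes")
```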
Character sets and code pages
ANSI encoding covers a number of different character sets (charsets). The same sequence of bytes represents different characters under different character sets. To parse an ANSI string correctly you must therefore also choose the correct character set; otherwise the text comes out garbled. Each language version of an operating system has a default character set, which the system uses to interpret ANSI strings when no character set is specified. In other words, if we open an ANSI text file (a text file containing only ANSI strings) that was saved on a Japanese operating system under the Simplified Chinese version of Windows, we will see garbled characters. However, if we open the file in a text editor that lets us select the encoding, such as Visual Studio, and choose the correct character set, we will see its original content. Note: the encodings of Chinese characters in the Simplified Chinese and Traditional Chinese character sets are not necessarily the same (in practice they are quite different).
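The garbling described above is easy to reproduce. The following sketch assumes a Japanese greeting saved as Shift-JIS and then read as GBK; both the sample text and the choice of codecs are illustrative.

```python
original = "こんにちは"

# Bytes as a Japanese system would save them (assumed: Shift-JIS).
raw = original.encode("shift_jis")

# Reading the same bytes with the wrong character set gives garbled text.
# errors="replace" keeps the demo from raising on byte pairs GBK cannot map.
wrong = raw.decode("gbk", errors="replace")

# Reading them with the correct character set restores the original.
right = raw.decode("shift_jis")

print("wrong charset:", wrong)
print("right charset:", right)
```

The bytes never change between the two reads; only the character set used to interpret them does.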
Each character set has a unique number, called its code page. The code page for Simplified Chinese (GB2312) is 936. The system default character set is code page 0, which means that a suitable character set is selected according to the system's language settings.
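Python exposes several Windows code pages as codecs named "cp" plus the number, so a code page number can be turned into a decoder directly. The helper decode_with_code_page below is hypothetical, and the sketch handles only explicit code page numbers; the special value 0 (system default) is not covered.

```python
def decode_with_code_page(raw: bytes, code_page: int) -> str:
    """Decode raw bytes with the Python codec named after a Windows code page."""
    return raw.decode(f"cp{code_page}")

# Encode with code page 936 (Simplified Chinese) and decode it again.
raw = "中文".encode("cp936")
print(raw.hex())
print(decode_with_code_page(raw, 936))
```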
Unicode
If the characters of a string in memory are stored in Unicode form, we call the string a Unicode string or a wide-character string. In this form (UTF-16, as used by wide strings on Windows), each of these characters occupies two bytes. For example, "中文123" occupies 10 bytes.
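The 10-byte figure corresponds to two bytes per character. A minimal check in Python, assuming the example string is 中文123 and that the two-byte form meant here is UTF-16 without a byte order mark:

```python
text = "中文123"

# UTF-16 little-endian, no byte order mark: two bytes for each of these characters.
wide = text.encode("utf-16-le")
print(len(text), "characters,", len(wide), "bytes")

# For comparison only: other Unicode encodings of the same string differ in size.
with_bom = text.encode("utf-16")   # Python prepends a 2-byte byte order mark
utf8 = text.encode("utf-8")        # variable width, 1 to 4 bytes per character
print(len(with_bom), len(utf8))
```

Characters outside the Basic Multilingual Plane take four bytes in UTF-16, so the two-bytes-per-character rule holds for this example but not for every string.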
The difference between Unicode and ANSI is loosely analogous to the difference between "full-width" and "half-width" characters in an input method.
Because different ANSI encodings follow different standards (different character sets), we must know which character set a given multibyte string uses before we can know which characters it contains. For a Unicode string, the characters it represents are always the same, regardless of the environment. Unicode is a single unified standard that defines encodings for most of the world's characters, so Latin letters, digits, Simplified Chinese, Traditional Chinese, and Japanese can all be stored together in one encoding.
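As an illustration of this point, one string can mix all of the scripts mentioned and still survive a round trip through a Unicode encoding without any character set being chosen. The sample string and the helper round_trips are assumptions made for this sketch.

```python
# Latin letters, digits, Simplified Chinese, Traditional Chinese, and Japanese together.
mixed = "abc 123 简体 繁體 ひらがな"

def round_trips(text: str, encoding: str) -> bool:
    """Return True if encoding then decoding gives back the same text."""
    return text.encode(encoding).decode(encoding) == text

for enc in ("utf-8", "utf-16-le"):
    print(enc, round_trips(mixed, enc))

# Each character has one fixed Unicode code point, whatever the environment.
code_points = [hex(ord(c)) for c in "中文"]
print(code_points)
```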
