What is the number of a byte?

Last Update:2016-10-23 Source: Internet

Author: User


Byte: the byte is the unit in which information is transmitted over a network or stored on a hard disk or in memory.
The byte is the unit that computing uses to measure storage capacity and transmission volume; 1 byte equals 8 binary bits.
In ASCII, an English letter (upper or lower case) occupies one byte. In a double-byte Chinese encoding such as GB2312, a Chinese character occupies two bytes.
Symbols: English punctuation occupies one byte and Chinese punctuation occupies two bytes. For example, the English period "." takes 1 byte, while the Chinese period "。" takes 2 bytes.
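The byte counts above can be checked with a short Python sketch. The helper name `byte_length` is made up for this illustration, and GB2312 is assumed as the Chinese encoding, since ASCII itself cannot represent Chinese characters.

```python
def byte_length(text: str, encoding: str) -> int:
    """Return how many bytes `text` occupies when stored in `encoding`."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    # English letter and punctuation in ASCII: one byte each.
    print(byte_length("a", "ascii"))
    print(byte_length(".", "ascii"))
    # Chinese character and Chinese period in GB2312: two bytes each.
    print(byte_length("中", "gb2312"))
    print(byte_length("。", "gb2312"))
```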
A byte is a sequence of binary digits that a computer treats as a single unit, typically 8 bits; one ASCII code, for example, fits in one byte. The units convert as follows:
1 terabyte (TB) = 1024 gigabytes (GB) = 2^40 bytes
1 gigabyte (GB) = 1024 megabytes (MB) = 2^30 bytes
1 megabyte (MB) = 1024 kilobytes (KB) = 2^20 bytes
1 kilobyte (KB) = 1024 bytes (B) = 2^10 bytes
1 byte (B) = 8 bits
Note: at the time of writing, TB is the largest unit commonly used for hard disk capacity. 10 TB is sometimes cited as a rough estimate of the storage capacity of a human brain.
Larger units exist as well: PB (petabyte, 1 PB = 1024 TB), EB (exabyte, 1 EB = 1024 PB), ZB (zettabyte, 1 ZB = 1024 EB), YB (yottabyte, 1 YB = 1024 ZB), ...
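The conversions above can be sketched in Python as follows. The names `UNITS`, `to_bytes`, and `human_readable` are hypothetical helpers introduced for this example, and the sketch assumes the 1024-based convention used in this article.

```python
# Units in ascending order; each one is 1024 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value: float, unit: str) -> int:
    """Convert a value in the given unit (e.g. 'GB') to a byte count."""
    exponent = UNITS.index(unit.upper())
    return int(value * 1024 ** exponent)


def human_readable(n: int) -> str:
    """Format a byte count using the largest unit that keeps the value below 1024."""
    size = float(n)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024


if __name__ == "__main__":
    print(to_bytes(1, "TB"))         # same value as 2 ** 40
    print(human_readable(1536))      # 1.50 KB
    print(human_readable(2 ** 40))   # 1.00 TB
```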
1.2 Characters, bytes, and strings
The key to understanding encodings is to understand the concepts of the character and the byte precisely. The two are easy to confuse, so let's distinguish them here:
Concept | Description | Examples
Character | A mark that people use; a symbol in the abstract sense. | '1', '中', 'a', '$', '￥', ...
Byte | The unit in which a computer stores data: an 8-bit binary number, a concrete piece of storage space. | 0x01, 0x45, 0xFA, ...
ANSI string
If the characters of a string in memory are stored in an ANSI encoding, where one character may be represented by one byte or by several bytes, we call the string an ANSI string or a multibyte string. For example, "中文123" occupies 7 bytes: two bytes for each Chinese character and one byte for each digit.
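The difference between counting characters and counting bytes in a multibyte string can be seen in a small Python sketch. It assumes GBK stands in for the Simplified Chinese ANSI encoding; the function name `describe` is made up for the example.

```python
def describe(text: str, encoding: str = "gbk") -> tuple[int, int, str]:
    """Return (character count, byte count, hex dump) of `text` in `encoding`."""
    raw = text.encode(encoding)
    return len(text), len(raw), raw.hex(" ")


if __name__ == "__main__":
    chars, size, dump = describe("中文123")
    print(chars)  # 5 characters
    print(size)   # 7 bytes
    print(dump)   # two bytes per Chinese character, one per digit
```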
Character sets and code pages
ANSI encodings come in different character sets (charsets). The same sequence of bytes represents different characters under different character sets. To parse an ANSI string correctly you must also choose the correct character set; otherwise the text comes out garbled. Each language version of an operating system has a default character set, which the system uses to interpret ANSI strings when no character set is specified. For example, if an ANSI text file (a text file containing only ANSI strings) was saved on a Japanese operating system and we open it under the Simplified Chinese version of Windows, we see garbled characters. If we instead open the file in a text editor that lets us choose the encoding, such as Visual Studio, and select the correct character set, we see the original text. Note: the Simplified Chinese and Traditional Chinese character sets are not the same; in practice they differ considerably.
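The garbling described above can be reproduced with a few lines of Python. This is only a sketch: Shift_JIS is assumed as the Japanese ANSI character set and GBK as the Simplified Chinese one, and the sample text and the helper `decode_as` are invented for the illustration.

```python
def decode_as(raw: bytes, charset: str) -> str:
    """Interpret raw bytes under the given character set, replacing undecodable bytes."""
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    original = "日本語のテキスト"
    raw = original.encode("shift_jis")        # bytes as saved on a Japanese system

    garbled = decode_as(raw, "gbk")           # read with the wrong character set
    recovered = decode_as(raw, "shift_jis")   # read with the correct character set

    print(garbled)                 # unrelated characters
    print(recovered == original)   # True
```

The bytes on disk never change; only the character set chosen to interpret them does.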
Each character set is identified by a unique number called its code page. The code page for Simplified Chinese (GB2312) is 936. Code page 0 denotes the system default character set, meaning that a suitable character set is chosen according to the system's language settings.
Unicode
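If the characters of a string in memory are stored in Unicode form, we call the string a Unicode string or a wide-character string. In the two-byte form of Unicode described here (UTF-16, for commonly used characters), each character occupies two bytes. For example, "中文123" occupies 10 bytes: five characters at two bytes each.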
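The 10-byte figure can be verified in Python, assuming the two-byte form meant here is UTF-16 without a byte order mark (`utf-16-le`). The sketch also shows two caveats that the two-bytes-per-character rule leaves out: characters outside the Basic Multilingual Plane take four bytes in UTF-16, and UTF-8 uses a different, variable width.

```python
def utf16_size(text: str) -> int:
    """Byte count of `text` in UTF-16 (little-endian, no byte order mark)."""
    return len(text.encode("utf-16-le"))


def utf8_size(text: str) -> int:
    """Byte count of `text` in UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    print(utf16_size("中文123"))  # 10: five characters, two bytes each
    print(utf16_size("😀"))       # 4: a surrogate pair
    print(utf8_size("中文123"))   # 9: three bytes per Chinese character, one per digit
```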
The difference between Unicode and ANSI is comparable to the difference between "full-width" and "half-width" characters in an input method.
Because different ANSI encodings follow different standards (different character sets), we must know which character set a given multibyte string uses before we can tell which characters it contains. A Unicode string, by contrast, always represents the same characters regardless of the environment. Unicode is a single unified standard that defines codes for most of the world's characters, so Latin letters, digits, Simplified Chinese, Traditional Chinese, and Japanese can all be stored together in one encoding.
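As a rough illustration of this point, the Python sketch below tries to store one mixed-script string in several encodings. The sample string and the helper `can_encode` are made up for the example; the legacy character sets listed are only a sample.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Latin letters, digits, Simplified Chinese, Traditional Chinese, Japanese kana.
    mixed = "abc 123 汉字 漢字 かな"

    for encoding in ("ascii", "gb2312", "shift_jis", "utf-8", "utf-16"):
        print(encoding, can_encode(mixed, encoding))

    # A Unicode encoding round-trips the whole string unchanged.
    print(mixed.encode("utf-8").decode("utf-8") == mixed)
```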

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
