Calculating computer storage capacity (bytes)

Source: Internet
Author: User

The most basic unit a computer uses to express the size of stored data is the byte.

Byte: the unit in which information is transmitted over a network or stored on a hard disk or in memory.

 


The byte is a unit of measurement used in computing for storage capacity and transmission volume. One byte equals eight binary bits.

In ASCII, an English letter (upper or lower case) occupies one byte. In a double-byte Chinese encoding such as GB2312, a Chinese character occupies two bytes.
Symbols: English punctuation occupies one byte, and Chinese punctuation occupies two bytes.
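
The byte counts above can be checked with a short Python sketch. It assumes GB2312 as the double-byte Chinese encoding (the text does not name one at this point). The sample characters and the helper name byte_length are chosen only for illustration.

```python
# Count the bytes one character occupies in a given encoding.
# GB2312 is an assumed choice; ASCII characters keep their one-byte form in it.

def byte_length(ch: str, encoding: str = "gb2312") -> int:
    """Return how many bytes a single character occupies in `encoding`."""
    return len(ch.encode(encoding))

# English letter, English comma, Chinese character, Chinese (full-width) comma
samples = ["A", ",", "中", "，"]

for ch in samples:
    print(repr(ch), byte_length(ch), "byte(s)")
```

The same character can occupy a different number of bytes in another encoding, so the counts only hold for the encoding named in the call.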

A byte is a sequence of binary digits, generally 8 bits, that a computer treats as one numerical unit. For example, one ASCII code is stored in one byte. The unit conversions are:

1 TB (terabyte) = 1024 GB = 2^40 bytes
1 GB (gigabyte) = 1024 MB = 2^30 bytes
1 MB (megabyte) = 1024 KB = 2^20 bytes
1 KB (kilobyte) = 1024 bytes = 2^10 bytes
1 byte = 8 bits

Note: Larger units include PB (petabyte, 1 PB = 1024 TB), EB (exabyte, 1 EB = 1024 PB), ZB (zettabyte, 1 ZB = 1024 EB), and YB (yottabyte, 1 YB = 1024 ZB).
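
The conversions above all use a factor of 1024 per step, which can be sketched in Python as follows. The names UNITS, to_bytes, and human_readable are illustrative, not part of any library.

```python
# Binary (1024-based) storage unit conversion, following the list above.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value: float, unit: str) -> int:
    """Convert `value` expressed in `unit` to a number of bytes."""
    exponent = UNITS.index(unit.upper())  # B -> 0, KB -> 1, MB -> 2, ...
    return int(value * 1024 ** exponent)

def human_readable(num_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value below 1024."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024

print(to_bytes(1, "TB"))        # 2 to the power of 40
print(human_readable(2 ** 30))  # one gigabyte
print(to_bytes(1, "B") * 8, "bits in one byte")
```

Disk vendors often use a factor of 1000 instead, so a drive's advertised capacity can differ from what this 1024-based calculation reports.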

1.2 Characters, bytes, and strings
The key to understanding encodings is an accurate grasp of two concepts, character and byte, which are easy to confuse. They are distinguished here:
Concept: Character
Description: A mark that people use; an abstract symbol.
Examples: '1', '中', 'A', '$', '¥', ...

Concept: Byte
Description: A unit of data storage in a computer; an 8-bit binary number, which is a very concrete piece of storage space.
Examples: 0x01, 0x45, 0xFA, ...

ANSI string
If the characters of a string in memory are ANSI encoded, one character may be represented by one or more bytes. We call such a string an ANSI string or multi-byte string. For example, "中文123" occupies 7 bytes: two 2-byte Chinese characters plus three 1-byte digits.
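
The 7-byte figure can be reproduced in Python. This sketch assumes the sample string is "中文123" (two Chinese characters followed by three digits) and that the ANSI encoding is the Simplified Chinese one, for which Python's "gbk" codec is used here.

```python
# An "ANSI" (multi-byte) string: character count and byte count differ.
text = "中文123"                  # assumed sample: 2 Chinese characters + 3 digits
ansi_bytes = text.encode("gbk")  # assumed Simplified Chinese ANSI encoding

print(len(text))        # number of characters
print(len(ansi_bytes))  # number of bytes
print(ansi_bytes.hex(" "))  # the individual bytes, in hexadecimal
```

The character count is 5 while the byte count is 7, which is exactly the character/byte distinction this section draws.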

Character Set and code page
ANSI encoding comes in many different character sets (charsets). The same byte sequence represents different characters in different character sets, so an ANSI string can only be parsed correctly with the right character set. Otherwise garbled characters appear. Each language version of an operating system has a default character set, and if none is specified, the system uses that default to parse ANSI strings. For example, if an ANSI text file (a text file containing only ANSI strings) saved on a Japanese operating system is opened on a Simplified Chinese version of Windows, we will see garbled characters. If we instead open the file in a text editor that offers encoding options, such as Visual Studio, and select the correct character set, the original text appears. Note: a Traditional Chinese character in a Simplified Chinese character set does not necessarily have the same encoding as that character in a Traditional Chinese character set (in practice the encodings appear to be completely different).
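
The garbling described above can be reproduced in a few lines of Python. This is only a sketch: the sample text is hypothetical, and Python's "shift_jis" and "gbk" codecs stand in for the Japanese and Simplified Chinese system character sets.

```python
# The same bytes, parsed with two different character sets.
original = "こんにちは"              # hypothetical text saved on a Japanese system
raw = original.encode("shift_jis")  # the bytes as written to the file

# Parsed with the wrong character set (Simplified Chinese): garbled characters.
# errors="replace" substitutes a marker for any byte sequence invalid in GBK.
wrong = raw.decode("gbk", errors="replace")

# Parsed with the correct character set: the original text.
right = raw.decode("shift_jis")

print(wrong)
print(right)
```

Nothing in the bytes themselves records which character set produced them, which is why the reader has to supply it.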

Each character set has a unique number called its code page. The code page for Simplified Chinese (GB2312) is 936, while code page 0 stands for the system default, meaning that a suitable character set is selected based on the system's language settings.
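
As a rough illustration, Python exposes code page 936 under the codec name "cp936". Python has no direct counterpart to code page 0, so locale.getpreferredencoding is used below as an approximate stand-in for the system default; that mapping is an assumption of this sketch.

```python
import codecs
import locale

# Code page 936 is available in Python under the codec name "cp936".
info = codecs.lookup("cp936")
print(info.name)  # the canonical codec name Python resolves it to

# Encoding by code page number and by name gives the same bytes.
same = "中".encode("cp936") == "中".encode("gbk")
print(same)

# Approximate stand-in for "code page 0": the encoding the system prefers.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

The last value depends on the machine's language settings, so it will differ between systems.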

Unicode
If the characters of a string in memory are stored as their Unicode numbers, the string is called a Unicode string or wide-character string. In the 16-bit form used on Windows (UTF-16), each commonly used character occupies two bytes. For example, L"中文123" occupies 10 bytes.
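
The 10-byte figure can be checked the same way. This sketch assumes the sample string is "中文123" and that the wide-character form is UTF-16 without a byte order mark, for which Python's "utf-16-le" codec is used.

```python
# A wide-character (UTF-16) string: two bytes per commonly used character.
text = "中文123"                   # assumed sample: 5 characters
wide = text.encode("utf-16-le")   # little-endian UTF-16, no byte order mark

print(len(text))  # number of characters
print(len(wide))  # number of bytes

# Characters outside the Basic Multilingual Plane need four bytes in UTF-16.
emoji_bytes = "😀".encode("utf-16-le")
print(len(emoji_bytes))
```

Unlike the ANSI case, the digits also take two bytes each here, so 5 characters give 10 bytes.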

Because ANSI encoding standards differ (their character sets differ), we must know which character set a given multi-byte string uses before we can know which characters it contains. For a Unicode string, the characters it represents stay the same in any environment. Unicode is a single unified standard that defines the encoding of the vast majority of the world's characters, so Latin letters, digits, Simplified Chinese, Traditional Chinese, and Japanese can all be stored in the same way.
