First, let's look at a question: how many bytes of memory does the string "java" occupy in the Java language? To answer it, we must first understand what a "byte" is and what a "character" is.
Byte: a byte is the unit in which information is transmitted over a network or stored on a hard disk or in memory. It is the unit of measurement that computing uses for storage capacity and transmission capacity. One byte equals 8 binary digits (bits), so a byte is an 8-bit binary number: a very concrete piece of storage space.
Character: a symbol used by people, a symbol in the abstract sense, such as '1', '中', 'a', '$', '¥', ...
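As a quick check of the two definitions in Java itself, here is a minimal sketch that reports the fixed sizes of Java's `byte` and `char` primitive types. The class and method names (`SizeDemo`, `bitsPerByte`, `bitsPerChar`) are made up for illustration only.

```java
final class SizeDemo {
    // Number of bits in Java's byte primitive type.
    static int bitsPerByte() {
        return Byte.SIZE;
    }

    // Number of bits in Java's char primitive type.
    static int bitsPerChar() {
        return Character.SIZE;
    }

    public static void main(String[] args) {
        System.out.println("byte: " + bitsPerByte() + " bits");
        System.out.println("char: " + bitsPerChar() + " bits ("
                + Character.BYTES + " bytes)");
    }
}
```

A Java `byte` is 8 bits and a Java `char` is 16 bits, so one `char` occupies two bytes.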
When it comes to characters, we have to mention two different coding standards, ANSI and Unicode. (I only touch on them briefly here; look them up yourself if you are interested.) In ANSI encodings the basic unit is 8 bits: English characters are stored in a single byte and Chinese characters in two bytes. In Unicode, as discussed here, characters are 16 bits, so both English and Chinese characters are stored in two bytes. Unicode is an international standard encoding that uses two bytes per character and is not compatible with the ANSI codes.
The ANSI rule for Chinese is as follows: a byte with a value below 127 keeps its original meaning, but two bytes with values above 127 placed together represent one Chinese character. The first byte (the "high byte") ranges from 0xA1 to 0xF7 and the second byte (the "low byte") ranges from 0xA1 to 0xFE, so that a large set of Chinese characters can be assembled from such pairs. Even the digits, punctuation marks, and letters that already existed in ASCII were all given new two-byte encodings; these are the so-called "full-width" characters, while the original ones below 127 are called "half-width" characters.
In Unicode, whether it is a half-width English letter or a full-width Chinese character, each is uniformly "one character", and each is also uniformly "two bytes".
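We can draw a simple conclusion: under the ANSI coding standards, punctuation marks, digits, and uppercase and lowercase letters occupy one byte each, while Chinese characters occupy two bytes. Under the 16-bit Unicode encoding described here (UTF-16 in Java's terms), each of these characters occupies two bytes; the rarer characters that need a pair of 16-bit units are left aside.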
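This conclusion can be checked by encoding a letter and a Chinese character and counting the bytes. The sketch below rests on two assumptions: GBK stands in for the "ANSI" code page for Chinese (it may not be installed on every Java runtime, so the code checks first), and `UTF_16BE` stands in for the two-byte Unicode form, since it writes no byte-order mark. The names `EncodedSizeDemo` and `byteCount` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedSizeDemo {
    // Number of bytes needed to store the text in the given encoding.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String letter = "a";
        String hanzi = "\u4e2d"; // the Chinese character U+4E2D

        // Two-byte Unicode form: UTF-16, big-endian, no byte-order mark.
        System.out.println(byteCount(letter, StandardCharsets.UTF_16BE));
        System.out.println(byteCount(hanzi, StandardCharsets.UTF_16BE));

        // Assumption: GBK represents the "ANSI" code page for Chinese.
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            System.out.println(byteCount(letter, gbk));
            System.out.println(byteCount(hanzi, gbk));
        }
    }
}
```

On a runtime where GBK is available, the four counts should be 2, 2, 1, and 2: two bytes for each character in the Unicode form, and one byte for the letter and two for the Chinese character under GBK.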
Now let's look at strings. Because there are two coding standards for characters, strings are likewise divided into two types.
String (ANSI): if the "characters" in memory are in an ANSI-encoded form, where one character may be represented by one byte or by several bytes, we call the string an ANSI string or a multibyte string.
String (Unicode): if the "characters" in memory are in Unicode form, we call the string a Unicode string or a wide-byte string.
Because the different ANSI encodings define different standards, for a given multibyte string we must know which encoding rule it uses before we can know which "characters" it contains. For a Unicode string, the "character" content it represents is always the same, regardless of the environment.
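The dependence of a multibyte string on its encoding rule can be illustrated by decoding the same two bytes in two ways. This is only a sketch: it assumes GBK as one example of an ANSI encoding for Chinese (checked at run time, since it may be missing) and ISO-8859-1 as an example of a Western single-byte encoding. The names `DecodeDemo` and `decode` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Interpret the same raw bytes under the given encoding.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xD6, (byte) 0xD0 };

        // Read as ISO-8859-1, each byte is a separate Western character.
        String western = decode(raw, StandardCharsets.ISO_8859_1);
        System.out.println(western.length() + " character(s): " + western);

        // Read as GBK (if available), the pair forms one Chinese character.
        if (Charset.isSupported("GBK")) {
            String chinese = decode(raw, Charset.forName("GBK"));
            System.out.println(chinese.length() + " character(s): " + chinese);
        }
    }
}
```

The same two bytes come out as the two characters 'Ö' and 'Ð' under ISO-8859-1 but as the single character '中' under GBK, which is why the encoding rule must be known before the bytes can be interpreted.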
As a result, the problem we raised above is solved because the characters in Java are encoded in Unicode, so the "Learn Java" string takes up 10 bytes in the Java language .
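The arithmetic can be reproduced with a minimal sketch that multiplies a string's length in `char` units by the size of one `char`. It counts character data only, not the object overhead that a Java `String` also carries, and the five-character example string (one Chinese character followed by "java") is an assumption chosen for illustration. The names `StringSizeDemo` and `charDataBytes` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class StringSizeDemo {
    // Bytes of character data, at two bytes per 16-bit char.
    static int charDataBytes(String s) {
        return s.length() * Character.BYTES;
    }

    public static void main(String[] args) {
        String ascii = "java";
        String mixed = "\u5b66java"; // U+5B66 followed by "java"

        System.out.println(charDataBytes(ascii)); // 8
        System.out.println(charDataBytes(mixed)); // 10

        // Cross-check: encoding to UTF-16 without a byte-order mark
        // gives the same count.
        System.out.println(mixed.getBytes(StandardCharsets.UTF_16BE).length); // 10
    }
}
```

Four characters give 8 bytes and five characters give 10 bytes, whether or not the characters are Chinese.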
Java pitfalls: the concepts of characters and bytes, and the difference between them