Java IO (1) Basics--Bytes and characters

Source: Internet
Author: User

Whatever you fear tends to happen: that is the well-known "Murphy's Law". Java fundamentals cover a lot of ground, and few people dare to claim a solid Java foundation. That goes not only for fresh graduates but also for programmers who have worked for years. Even experienced programmers rarely dare to say their Java foundation is solid, let alone that they have mastered Java; often they are merely practiced, as the old saying goes: "nothing special, just a well-practiced hand."

IO is the part I am most wary of. It is not difficult: it has only two aspects, input and output. But how often do you really use it? You may not write much concurrent code yourself, yet concurrency is everywhere, and you become familiar with it by seeing and writing it again and again. IO, by contrast, is usually confined to a single module, so not every programmer touches the IO API while developing and maintaining their own modules. When they do run into it, they are often at a loss about how to write it.

I want to study IO to solidify my Java foundation, hoping to become proficient in Java. This article starts the Java IO series by introducing two concepts: bytes and characters. Because the Java IO API is divided into byte streams and character streams, understanding bytes and characters will help with everything that follows.

Byte (byte)

A byte is a unit of data storage in a computer. Smaller than the byte is the bit, the smallest unit of data storage: one bit stores a single binary digit, 0 or 1, and 1 byte consists of 8 bits.

Larger than the byte is the KB (kilobyte), where 1 KB = 1024 B, followed by the MB (megabyte), where 1 MB = 1024 KB, and then GB, TB, and so on.
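As a quick check of the arithmetic, here is a minimal Java sketch that derives each unit from the previous one by shifting left 10 bits (multiplying by 1024), following the 1024-based convention used above. The class name StorageUnits is only an illustrative choice.

```java
class StorageUnits {
    static final int BITS_PER_BYTE = Byte.SIZE; // 8 bits in one byte
    static final long KB = 1L << 10;            // 1 KB = 1024 B
    static final long MB = KB << 10;            // 1 MB = 1024 KB
    static final long GB = MB << 10;            // 1 GB = 1024 MB
    static final long TB = GB << 10;            // 1 TB = 1024 GB

    public static void main(String[] args) {
        System.out.println("1 byte = " + BITS_PER_BYTE + " bits");
        System.out.println("1 KB = " + KB + " B");
        System.out.println("1 MB = " + MB + " B");
        System.out.println("1 GB = " + GB + " B");
        System.out.println("1 TB = " + TB + " B");
    }
}
```

Shifting by 10 bits is the same as multiplying by 1024, which is why these units line up with powers of two.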

Java has a data type that represents a byte, namely byte, so it is worth reviewing what Java's byte type can actually hold.

A byte is made up of 8 bits, so as an unsigned value it can represent [0, 255] (closed interval). In Java, however, byte is signed: its highest bit is the sign bit. Setting the sign bit aside leaves 7 bits, which can represent [0, 127] for the positive numbers. With the highest bit set to 1 for negative numbers, the negatives would seem to cover [-127, 0], so the range of byte would be [-127, 127]. Is that really the case? No, that analysis is wrong. The value range of Java's byte type is [-128, 127].
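You can confirm the range on your own JVM with a minimal Java sketch. Byte.MIN_VALUE, Byte.MAX_VALUE and Byte.toUnsignedInt are JDK APIs; the class name ByteRange and the helper toByte are illustrative. The casts also show that narrowing an int to a byte keeps only its low 8 bits.

```java
class ByteRange {
    // Illustrative helper: narrowing cast that keeps only the low 8 bits of the int.
    static byte toByte(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        System.out.println("byte range: [" + Byte.MIN_VALUE + ", " + Byte.MAX_VALUE + "]"); // [-128, 127]
        System.out.println("(byte) 127 = " + toByte(127)); // 127
        System.out.println("(byte) 128 = " + toByte(128)); // -128
        System.out.println("(byte) 255 = " + toByte(255)); // -1
        // Byte.toUnsignedInt (Java 8+) reads the same 8 bits as an unsigned value in [0, 255].
        System.out.println("unsigned view of -1 = " + Byte.toUnsignedInt((byte) -1)); // 255
    }
}
```

Why 128 turns into -128 is exactly what the discussion of number encodings below explains.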

The analysis went wrong because it ignored how numbers are actually encoded for storage in a computer. This brings in three concepts: sign-magnitude (the "original code"), ones' complement (the "inverse code"), and two's complement (the "complement").

    • Sign-magnitude (original code): the highest bit is the sign bit, 0 for positive and 1 for negative, and the remaining bits hold the magnitude. The flawed analysis above assumed that values are stored in sign-magnitude form, which is why it arrived at [-127, 127] for Java's byte.
    • Ones' complement (inverse code): the highest bit is again the sign bit. For a positive number the ones' complement is identical to the sign-magnitude form; for a negative number, every bit except the sign bit is inverted.
    • Two's complement (complement): the highest bit is again the sign bit. For a positive number the two's complement is identical to the sign-magnitude form; for a negative number, every bit except the sign bit is inverted and then 1 is added (ones' complement + 1). Computers store integer values in two's complement.
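To make the three definitions concrete, the following sketch computes each 8-bit form for a value in [-127, 127] (sign-magnitude cannot represent -128). The class and method names are hypothetical. The two's complement method applies the "invert, then add 1" rule by hand, so its result can be compared with the bits an actual Java byte holds.

```java
class SignedEncodings {
    // Formats the low 8 bits of v as an 8-character binary string.
    static String bits8(int v) {
        String s = Integer.toBinaryString(v & 0xFF);
        return "00000000".substring(s.length()) + s;
    }

    // Sign-magnitude: sign bit, then |n| in the remaining 7 bits (n in [-127, 127]).
    static String signMagnitude(int n) {
        int magnitude = Math.abs(n);
        return bits8(n < 0 ? 0x80 | magnitude : magnitude);
    }

    // Ones' complement: negatives invert every bit except the sign bit.
    static String onesComplement(int n) {
        int magnitude = Math.abs(n);
        return bits8(n < 0 ? 0x80 | (~magnitude & 0x7F) : magnitude);
    }

    // Two's complement: negatives take the ones' complement and add 1.
    static String twosComplement(int n) {
        int magnitude = Math.abs(n);
        return bits8(n < 0 ? (0x80 | (~magnitude & 0x7F)) + 1 : magnitude);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, -3}) {
            System.out.println(n + ": sign-magnitude " + signMagnitude(n)
                    + ", ones' complement " + onesComplement(n)
                    + ", two's complement " + twosComplement(n)
                    + ", bits of (byte) n " + bits8((byte) n));
        }
    }
}
```

For -3 it prints sign-magnitude 10000011, ones' complement 11111100 and two's complement 11111101, and the last value matches the bits of (byte) -3.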

A simple program shows that the computer stores numbers in two's complement form:

    public class ComplementDemo {
        public static void main(String[] args) {
            // 3 is 11 in binary; for a positive number the two's complement equals the sign-magnitude form
            System.out.println("Binary of 3 (two's complement same as sign-magnitude 11): " + Integer.toBinaryString(3));
            // -3 is 111 in 3-bit sign-magnitude; an int has 4 bytes = 32 bits, so look only at the last 3 bits
            // (to check: subtract 1 from those 3 bits, then invert all but the sign bit to get back 111)
            System.out.println("Two's complement of -3: " + Integer.toBinaryString(-3));
        }
    }

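The output shows that values are actually stored in two's complement: for 3 the program prints 11, while for -3 it prints a 32-bit string of thirty 1s followed by 01, whose last three bits, 101, are the two's complement of -3.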

Now that we know sign-magnitude (original code), ones' complement (inverse code) and two's complement, and that numbers are stored in two's complement, let's return to the range of Java's byte type. The non-negative values (highest bit 0) cover [0, 127], 128 numbers in total. In sign-magnitude, the negative values (highest bit 1) would cover [-127, -0], that is, the bit patterns 11111111 down to 10000000. Note the conflict here: 10000000 would mean "-0", which is redundant because 0 is already represented by 00000000. Two's complement removes this redundancy. Take -0: its sign-magnitude form is 10000000, its ones' complement is 11111111, and adding 1 gives 1 00000000; the carry does not fit in 8 bits and is discarded, leaving 00000000, exactly the pattern for +0. Zero therefore has a single representation, and the freed pattern 10000000 is assigned to -128. So the negative range is [-128, -1], and the full range of byte is [-128, 127]. In short, because +0 and -0 are the same value, using 10000000 for -128 avoids wasting a bit pattern and extends the range by one.
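The "-0" argument can be replayed in a few lines of Java. This is an illustrative sketch: the class MinusZero and its helper fromSignMagnitude are hypothetical names, and the helper simply applies the rule from the list above (for negatives, invert all but the sign bit, add 1, keep 8 bits) to an 8-bit sign-magnitude pattern.

```java
class MinusZero {
    // Converts an 8-bit sign-magnitude pattern to its 8-bit two's complement pattern.
    static int fromSignMagnitude(int pattern) {
        if ((pattern & 0x80) == 0) {
            return pattern; // non-negative: unchanged
        }
        int onesComplement = pattern ^ 0x7F; // invert the 7 magnitude bits
        return (onesComplement + 1) & 0xFF;  // add 1; a carry out of the 8th bit is discarded
    }

    public static void main(String[] args) {
        // "-0" (10000000) becomes 00000000, the same pattern as +0.
        System.out.println(Integer.toBinaryString(fromSignMagnitude(0b10000000))); // 0
        // The freed pattern 10000000 is read by a Java byte as -128.
        System.out.println((byte) 0b10000000); // -128
        // Adding 1 to the largest byte wraps around to that same pattern.
        byte b = Byte.MAX_VALUE;
        b++;
        System.out.println(b); // -128
    }
}
```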

In this section, starting from the byte as the unit of data storage, we worked out the value range of Java's byte type and reviewed how numbers are encoded in a computer, which should give a clearer picture of what a byte is. Next, let's look at characters.

Character (Char)

Characters represent text and symbols. People communicate through human language, while computers communicate in binary. When information travels person to computer to person, the computer in the middle must "encode" our language symbols for transmission, and the computer-to-person step is called "decoding". This is somewhat like "encryption" and "decryption".
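In Java, "encoding" and "decoding" correspond to turning a String into bytes and back with a chosen character set. A minimal sketch, assuming UTF-8 as the character set (UTF-8 is discussed at the end of this article); the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodeDecode {
    // "Encoding": human-readable text -> bytes the computer can store or transmit.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // "Decoding": bytes -> text, using the same character set.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("Hi");
        System.out.println(Arrays.toString(bytes)); // [72, 105]
        System.out.println(decode(bytes));          // Hi
    }
}
```

Decoding only recovers the original text when both sides agree on the character set, just as encrypted text can only be read with the matching key.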

When computers first appeared, they could only handle English characters, where "handling" includes both display and storage. As mentioned earlier, storage requires encoding, so a table is needed to say which number stands for A, which for B, and so on, much like the Morse code table. This "code table" is the encoding called ASCII.
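ASCII is exactly such a code table: each English letter, digit or symbol maps to a number below 128, which fits in 7 bits. A small sketch that looks up entries by encoding with Java's built-in US-ASCII charset; the helper name asciiCode is hypothetical, and note that this charset replaces characters outside ASCII with '?'.

```java
import java.nio.charset.StandardCharsets;

class AsciiTable {
    // Hypothetical helper: the ASCII code of c, found by encoding it with US-ASCII.
    static int asciiCode(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.US_ASCII)[0];
    }

    public static void main(String[] args) {
        for (char c : new char[] {'A', 'B', 'a', '0'}) {
            int code = asciiCode(c);
            System.out.println(c + " -> " + code + " (binary " + Integer.toBinaryString(code) + ")");
        }
    }
}
```

Running it shows, for example, that 'A' maps to 65 (binary 1000001) and 'B' to 66.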

  

As computers spread to other countries and regions, Chinese, Japanese, Korean and other scripts needed to be encoded. ASCII, designed only for English letters and symbols, could not cover them, so ANSI encodings (also called extended ASCII) appeared. ANSI is really a convention for localized encodings: on a Chinese operating system ANSI means GB2312 (or its extension, GBK), on a Japanese operating system it means a JIS-based encoding, and so on. These encodings keep ASCII's single bytes 0x00-0x7F for English and use 2 bytes per character, each byte in the range 0x80-0xFF, for other scripts. Two bytes are 16 bits, which in theory allow 2^16 combinations; even after excluding the patterns that fall in the ASCII range 0x00-0x7F, that leaves room for many characters. GB2312, for example, covers more than 6,000 commonly used Chinese characters. However, this approach introduced a new problem: it is only localized, so in a GB2312 environment Japanese text cannot be encoded. Internationalization was needed.
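The 1-byte/2-byte split can be observed with Java's GBK charset. This sketch assumes the runtime provides GBK (standard JDKs do, but it is not among the charsets every Java platform is required to support, so the code checks first); the class and helper names are illustrative. It encodes "A" followed by the Chinese character 中 (U+4E2D).

```java
import java.nio.charset.Charset;

class AnsiDemo {
    // Illustrative helper: the bytes of text in the named charset, as unsigned values 0-255.
    static int[] unsignedBytes(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        int[] result = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            result[i] = bytes[i] & 0xFF;
        }
        return result;
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("GBK")) {
            System.out.println("GBK is not available in this runtime");
            return;
        }
        // "A" stays a single ASCII byte; U+4E2D takes two bytes, each in 0x80-0xFF.
        for (int b : unsignedBytes("A\u4e2d", "GBK")) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

With GBK available it prints 41 D6 D0: one byte for the English letter and two high bytes for the Chinese character.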

As computing developed, internationalization became more and more important, and that included changing how text is encoded. To avoid the incompatibilities between ANSI encodings, a new standard, Unicode, was developed. Java uses Unicode, which fits its cross-platform nature, and this explains why Java's char type occupies 2 bytes: a char is one 16-bit Unicode (UTF-16) code unit. (Strictly speaking, characters outside the Basic Multilingual Plane, such as many emoji, need two chars.) Unicode solves the incompatibility of different languages across platforms, but it has a small drawback: it takes somewhat more space than the two earlier encodings. A string stored in memory this way is called a "wide-character string", and much of the later work on character encoding focused on reducing its size. The extra space comes from using 2 bytes per character even for English, which ASCII and ANSI store in 1 byte.
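A short sketch comparing sizes; the class name Utf16Size and the helper encodedLength are illustrative. UTF_16BE is used so that no byte-order mark is added, and the emoji is written with \u escapes to avoid source-encoding issues.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16Size {
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // A Java char is one 16-bit UTF-16 code unit.
        System.out.println("bytes per char: " + Character.BYTES); // 2
        String english = "Java";
        System.out.println("ASCII:    " + encodedLength(english, StandardCharsets.US_ASCII) + " bytes"); // 4
        System.out.println("UTF-16BE: " + encodedLength(english, StandardCharsets.UTF_16BE) + " bytes"); // 8
        // U+1F600 lies outside the Basic Multilingual Plane: 2 chars (a surrogate pair) for 1 character.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length() + " chars, " + emoji.codePointCount(0, emoji.length()) + " code point"); // 2 chars, 1 code point
    }
}
```

The same four English letters take twice as many bytes in UTF-16 as in ASCII, which is the space cost described above.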

Each English character therefore wastes 1 byte: in UTF-16, 'A' is 0x00 0x41, and the first byte is always zero. This leads to another piece of computer fundamentals: multi-byte data is laid out in memory either in big-endian order (high-order byte first) or in little-endian order (low-order byte first). In big-endian order, the high-order byte sits at the lower memory address and the low-order byte at the higher address; in little-endian order, the high-order byte sits at the higher address and the low-order byte at the lower address. For example, the 16-bit value 0x0041 is stored as 00 41 in big-endian order and as 41 00 in little-endian order. Note that Java uses big-endian order by default: DataOutputStream and ByteBuffer write the high-order byte first, and the "UTF-16" charset encodes in big-endian order.
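Java's java.nio API makes byte order explicit, so both layouts can be printed directly. A sketch with an illustrative class and helper name: it writes the char 'A' (0x0041) in both orders, then shows the default order of a new ByteBuffer and the output of the "UTF-16" charset, which starts with the byte-order mark FE FF.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EndianDemo {
    // Writes a char into a 2-byte buffer using the given byte order.
    static byte[] charBytes(char c, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putChar(c).array();
    }

    public static void main(String[] args) {
        // A newly created ByteBuffer is always big-endian.
        System.out.println(ByteBuffer.allocate(2).order()); // BIG_ENDIAN
        System.out.println(Arrays.toString(charBytes('A', ByteOrder.BIG_ENDIAN)));    // [0, 65]
        System.out.println(Arrays.toString(charBytes('A', ByteOrder.LITTLE_ENDIAN))); // [65, 0]
        // "UTF-16" encodes big-endian and prepends the byte-order mark FE FF (-2, -1 as signed bytes).
        System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_16))); // [-2, -1, 0, 65]
        // The CPU's native order varies by hardware (x86, for example, is little-endian).
        System.out.println(ByteOrder.nativeOrder());
    }
}
```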

Back to encodings. Because this form of Unicode uses 2 bytes for every character, it wastes space, so the variable-length UTF-8 encoding was built on top of Unicode. UTF-8 allocates space flexibly: different characters occupy different numbers of bytes (1 byte for ASCII characters, 3 bytes for most common Chinese characters, up to 4 bytes in total). It stays compatible with ASCII while using space as efficiently as possible.
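The variable length is easy to see by measuring a few characters. A sketch with an illustrative class and helper name; the samples (A, é, 中 and an emoji, written as \u escapes in the source) are chosen only to cover the 1- to 4-byte cases.

```java
import java.nio.charset.StandardCharsets;

class Utf8Demo {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u00e9", "\u4e2d", "\uD83D\uDE00"};
        for (String s : samples) {
            String codePoint = Integer.toHexString(s.codePointAt(0)).toUpperCase();
            System.out.println("U+" + codePoint + " -> " + utf8Length(s) + " byte(s)");
        }
    }
}
```

It reports 1, 2, 3 and 4 bytes for U+41, U+E9, U+4E2D and U+1F600 respectively, and "A" encodes to the same single byte 0x41 that ASCII uses.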

These are the basics needed for the Java IO series; they will make the byte streams and character streams covered later easier to understand.
