Java IO (1): Basic Knowledge, Bytes and Characters

Last Update:2017-11-26 Source: Internet

Author: User

Whatever you fear is exactly what happens: this is the well-known "Murphy's Law". Java fundamentals cover a great deal of ground. Those who dare to claim a solid Java foundation are either newcomers or programmers who have been working for N years. Yet programmers who have been working for N years often do not dare to say that their Java foundation is solid, let alone that they are proficient. More often it is simply a matter of practice.

What I fear is IO. It is not that IO is difficult; it comes down to just two things, input and output. The problem is that it is not used very often. Concurrency, by contrast, shows up everywhere, so you become familiar with it simply by writing it a lot, whereas IO is usually confined to a single module. As a result, not every programmer uses IO-related APIs while developing and maintaining their own modules, and many end up embarrassed at not knowing how to write IO code when they need it.

I want to study IO, consolidate my own Java foundation, and become someone who is proficient in Java. As the start of the Java IO series, this article first introduces two concepts: the byte and the character. The reason is that the Java IO APIs are divided into byte streams and character streams, so understanding what bytes and characters are will help us understand IO later.

Byte

A byte is a unit of data storage in a computer. Smaller than the byte is the bit, the smallest unit of measurement for data storage. One bit stores a single binary digit, 0 or 1, and one byte consists of 8 bits.

Larger than the byte are the KB (kilobyte), 1 KB = 1024 B, then the MB (megabyte), 1 MB = 1024 KB, followed by GB, TB, and so on.
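The unit relationships above are plain powers of two. A minimal sketch follows; the class and method names are illustrative and do not come from the original article.

```java
class StorageUnitsDemo {
    static final long BITS_PER_BYTE = Byte.SIZE; // 8 bits in one byte
    static final long KB = 1024L;                // bytes in 1 KB
    static final long MB = 1024L * KB;           // bytes in 1 MB
    static final long GB = 1024L * MB;           // bytes in 1 GB

    // Number of bits needed to hold the given number of bytes.
    static long bitsIn(long bytes) {
        return bytes * BITS_PER_BYTE;
    }

    public static void main(String[] args) {
        System.out.println("1 byte = " + bitsIn(1) + " bits");
        System.out.println("1 KB   = " + KB + " bytes");
        System.out.println("1 MB   = " + MB + " bytes");
        System.out.println("1 GB   = " + GB + " bytes");
    }
}
```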

Java has a byte data type for representing bytes. Let's review what we know about byte in Java.

As mentioned above, one byte equals eight binary bits, so the values that one byte can represent span [0, 255] (a closed range). In Java, however, the byte type is signed: its highest bit is the sign bit. Apart from the sign bit, seven binary digits remain, and seven binary digits can represent [0, 127]; these are the positive numbers. When the highest bit is 1 the value is negative, so the negative range would be [-127, 0]. That would make the range of byte [-127, 127]. Is that right? No. The analysis above is incorrect. In Java, the value range of the byte data type is [-128, 127].
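The range can be checked directly against the JDK constants. The sketch below also shows what happens when an unsigned value such as 200 is forced into a signed byte; the helper name toUnsigned is an illustrative choice, not something from the original article.

```java
class ByteRangeDemo {
    // Reinterpret a signed Java byte as its unsigned value in [0, 255].
    static int toUnsigned(byte b) {
        return b & 0xFF; // equivalent to Byte.toUnsignedInt(b)
    }

    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE);  // -128
        System.out.println(Byte.MAX_VALUE);  // 127

        byte b = (byte) 200;                 // keeps only the low 8 bits: 11001000
        System.out.println(b);               // -56, because the highest bit is read as a sign
        System.out.println(toUnsigned(b));   // 200
    }
}
```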

The mistake is that the analysis did not take into account how numeric values are encoded for storage in a computer. That brings us to three concepts: sign-magnitude (original code), ones' complement (inverse code), and two's complement (complement code).

Sign-magnitude (original code): the highest bit is the sign bit, 0 for a positive number and 1 for a negative number, and the remaining bits hold the absolute value. The mistaken analysis above assumed that values are stored in sign-magnitude form, which is why it arrived at [-127, 127] as the range of the byte type.
Ones' complement (inverse code): the highest bit is again the sign bit. For a positive number it is identical to the sign-magnitude form; for a negative number every bit except the sign bit is inverted.
Two's complement (complement code): the highest bit is again the sign bit. For a positive number it is identical to the sign-magnitude form; for a negative number it is the ones' complement plus 1. Computers store numeric values in two's complement.

We can verify with a short program that numeric values are stored in two's complement.

// Positive 3: the sign-magnitude form ends in 11, and the two's complement is identical.
System.out.println("The sign-magnitude form of positive 3 is 11, and its two's complement is the same: "
        + Integer.toBinaryString(3));
// Negative 3: an int occupies 4 bytes = 32 bits, so look only at the last 3 bits.
System.out.println("The sign-magnitude form of -3 is a sign bit of 1 followed by 11, and its two's complement is: "
        + Integer.toBinaryString(-3)
        + " (to check, subtract 1 from the last three bits and invert them to get the magnitude back)");

The output shows that values in the computer are indeed stored in two's complement.
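To see all three encodings side by side for 8-bit values, the definitions above can be turned into a small sketch. The class and method names are illustrative, and the sign-magnitude and ones' complement helpers are built by hand purely for comparison, since Java itself only stores two's complement.

```java
class SignedEncodingsDemo {
    // Render the low 8 bits of v as a zero-padded binary string.
    static String bits8(int v) {
        String s = Integer.toBinaryString(v & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    // Sign-magnitude: sign bit followed by |v| in 7 bits (valid for -127..127).
    static String signMagnitude(int v) {
        int sign = v < 0 ? 0x80 : 0x00;
        return bits8(sign | Math.abs(v));
    }

    // Ones' complement: for negatives, invert every bit except the sign bit.
    static String onesComplement(int v) {
        if (v >= 0) return bits8(v);
        return bits8((0x80 | Math.abs(v)) ^ 0x7F);
    }

    // Two's complement: for negatives, the ones' complement plus one.
    static String twosComplement(int v) {
        if (v >= 0) return bits8(v);
        return bits8(((0x80 | Math.abs(v)) ^ 0x7F) + 1);
    }

    public static void main(String[] args) {
        System.out.println(signMagnitude(3));    // 00000011
        System.out.println(twosComplement(3));   // 00000011
        System.out.println(signMagnitude(-3));   // 10000011
        System.out.println(onesComplement(-3));  // 11111100
        System.out.println(twosComplement(-3));  // 11111101
        System.out.println(bits8((byte) -3));    // 11111101, what Java actually stores
    }
}
```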

Having covered sign-magnitude, ones' complement, and two's complement, and knowing that values are stored in two's complement, we can return to the range of Java's byte type. The non-negative values (highest bit 0) span [0, 127], 128 numbers in total. For negative values (highest bit 1), the sign-magnitude range is [-127, -0], with bit patterns running from 11111111 to 10000000. Note that these are sign-magnitude patterns, and they contain a conflict: a representation of -0 appears, which is clearly unreasonable because 0 is already included among the non-negative numbers.

Two's complement resolves the conflict. The sign-magnitude form of -0 is 10000000 and its ones' complement is 11111111. Adding 1 produces a carry out of the highest bit, which does not fit in 8 bits and is discarded, leaving 00000000. So 0 and -0 share a single pattern, and the pattern 10000000 is left over. Rather than waste it, two's complement assigns it the value -128. The negative range is therefore [-128, 0), and the range of the byte type is [-128, 127].
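The boundary patterns discussed above can be inspected directly. This is a small sketch with illustrative names; it prints the 8-bit patterns at the edges of the range and shows that negating zero yields zero again, while stepping past 127 wraps around to -128.

```java
class MinByteDemo {
    // Render a byte as a zero-padded 8-bit binary string.
    static String bits8(byte b) {
        return String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(bits8((byte) 0));     // 00000000
        System.out.println(bits8((byte) 127));   // 01111111
        System.out.println(bits8((byte) -128));  // 10000000
        System.out.println(bits8((byte) -1));    // 11111111

        // Negate zero in 8 bits: invert (11111111), add one (1 00000000),
        // and let the cast discard the ninth bit.
        byte negatedZero = (byte) ((~0 & 0xFF) + 1);
        System.out.println(negatedZero);         // 0

        // One past the maximum wraps around to the minimum.
        byte wrapped = (byte) (Byte.MAX_VALUE + 1);
        System.out.println(wrapped);             // -128
    }
}
```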

In this section we started from the byte as the unit of computer data storage, moved on to the value range of the byte data type in Java, and reviewed how numeric values are encoded for storage. This should give us a better understanding of the byte. Next, we look at what a character is.

Character (Char)

Characters represent text and symbols. People communicate with each other in human languages, while computers communicate in binary. When a computer sits in the middle (person, computer, person), the person-to-computer step has to "encode" the symbols of human language, and the computer-to-person step is called "decoding". This is somewhat similar to encryption and decryption.
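The encode and decode steps can be sketched as a round trip from characters to bytes and back. The choice of UTF-8 as the shared table is an assumption made for this example, and the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodeDecodeDemo {
    // Encoding: human-readable characters to bytes.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding: bytes back to characters, using the same table.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = encode("Hello");
        System.out.println(Arrays.toString(wire)); // [72, 101, 108, 108, 111]
        System.out.println(decode(wire));          // Hello
    }
}
```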

When computers first appeared, they could only handle English characters, where handling includes both display and storage. As mentioned above, encoding for storage requires a table that says what A is, what B is, and so on, much like the code book for Morse code. The code table of that era was ASCII.
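The ASCII table can be queried from Java, because a char converts directly to its numeric code. The sketch below uses an illustrative helper, asciiCode, and also shows what the JDK does with a character that has no ASCII entry.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiTableDemo {
    // Return the ASCII code of c, or -1 if c is outside the 7-bit table.
    static int asciiCode(char c) {
        return c < 128 ? (int) c : -1;
    }

    public static void main(String[] args) {
        System.out.println(asciiCode('A')); // 65
        System.out.println(asciiCode('B')); // 66
        System.out.println(asciiCode('a')); // 97

        byte[] bytes = "AB".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(bytes)); // [65, 66]

        // A Chinese character has no ASCII entry; getBytes substitutes '?' (63).
        byte[] lossy = "\u4e2d".getBytes(StandardCharsets.US_ASCII);
        System.out.println(lossy[0]); // 63
    }
}
```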

As computers developed and spread to other countries and regions, Chinese, Japanese, and Korean characters also had to be encoded. The original ASCII could not meet this need, since it was designed to hold only English letters and symbols. ANSI encodings (also called extended ASCII) were introduced at this point. "ANSI" is really a label for a localized standard encoding: on a Chinese operating system, ANSI means GB2312 (and its extension, GBK), while on a Japanese operating system, ANSI means JIS. An ANSI encoding represents such a character with two bytes, the first of which lies in the range 0x80-0xFF. Two bytes, that is, 16 binary digits, can in theory represent 2^16 = 65,536 characters. Characters in the range 0x00-0x7F still take a single byte, as in ASCII, so fewer combinations are actually available, but this is still a large number of characters. GB2312 encodes more than 6,000 commonly used Chinese characters. However, this approach brings a new problem: it is only a localization. Japanese text, for example, cannot be encoded in a GB2312 environment. Internationalization is therefore required.
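The localization problem can be reproduced in a few lines. This sketch assumes the running JRE ships the GBK and Shift_JIS charsets (they are optional in the JDK, so the code checks first), and it uses Shift_JIS as a stand-in for the Japanese ANSI code page. The class and method names are illustrative.

```java
import java.nio.charset.Charset;

class AnsiCodePageDemo {
    // Number of bytes the text occupies in the named charset.
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    // Write with one charset and read with another, as happens when
    // a file moves between differently localized systems.
    static String transcodeWrongly(String text, String writeWith, String readWith) {
        byte[] bytes = text.getBytes(Charset.forName(writeWith));
        return new String(bytes, Charset.forName(readWith));
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("GBK") || !Charset.isSupported("Shift_JIS")) {
            System.out.println("This JRE lacks the GBK or Shift_JIS charset");
            return;
        }
        String zhong = "\u4e2d"; // a common Chinese character

        System.out.println(encodedLength("A", "GBK"));   // 1: the ASCII range stays single-byte
        System.out.println(encodedLength(zhong, "GBK")); // 2: two bytes for a Chinese character

        // The same two bytes mean something else under a Japanese code page.
        String garbled = transcodeWrongly(zhong, "GBK", "Shift_JIS");
        System.out.println(garbled.equals(zhong));       // false
    }
}
```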

As computers continued to develop, internationalization became more and more important, and encoding had to change with it. To avoid the incompatibilities among ANSI encodings, a new encoding standard, Unicode, was developed. Java uses Unicode, which fits its cross-platform nature. This also explains why the char data type in Java occupies 2 bytes: Java stores each char as a 16-bit Unicode (UTF-16) code unit. Characters beyond the first 65,536 code points are represented by a pair of chars. Unicode solves the incompatibility between different languages on different platforms, but it has a small drawback: it takes more space than the two earlier schemes. Strings stored in memory using the Unicode character set are called "wide-character strings", and much of the later work on character encoding focuses on reducing the number of bytes used. The extra space comes from using 2 bytes per character even for English text, whereas ASCII and ANSI use a single byte for English. The letter A, for example, is stored as the two bytes 0x00 0x41.
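The size of char and its relationship to Unicode characters can be checked with standard JDK calls. The helper names in this sketch are illustrative. The last example uses a character outside the 16-bit range to show that such a character occupies two chars.

```java
class CharSizeDemo {
    // Number of 16-bit char units Java uses to hold the text.
    static int charUnits(String text) {
        return text.length();
    }

    // Number of Unicode characters (code points) in the text.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES);        // 2
        System.out.println((int) 'A');              // 65, the same code as in ASCII

        String zhong = "\u4e2d";                    // a Chinese character
        System.out.println(charUnits(zhong));       // 1
        System.out.println(codePoints(zhong));      // 1

        String emoji = "\uD83D\uDE00";              // U+1F600, outside the 16-bit range
        System.out.println(charUnits(emoji));       // 2
        System.out.println(codePoints(emoji));      // 1
    }
}
```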

This wastes 1 byte for every English character: the letter A (0x0041), for instance, is stored with a high byte of 0x00 that carries no information. This leads to another piece of basic computer knowledge, namely whether multi-byte data is laid out in memory in big-endian mode (high byte first) or little-endian mode (low byte first). In big-endian mode, the high-order byte sits at the lower memory address and the low-order byte at the higher memory address. In little-endian mode, the high-order byte sits at the higher memory address and the low-order byte at the lower memory address. Writing A as 0x00 0x41, with the high-order byte at the lower address, is big-endian order. Java uses big-endian order in its own formats and APIs, for example in class files and as the default byte order of java.nio.ByteBuffer.
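The two byte orders can be compared using java.nio.ByteBuffer, which lets the caller pick the order explicitly. This is a minimal sketch with an illustrative helper name, layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    // Lay out one char in a 2-byte buffer using the requested byte order.
    static byte[] layout(char c, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(2).order(order);
        buf.putChar(c);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] big = layout('A', ByteOrder.BIG_ENDIAN);
        byte[] little = layout('A', ByteOrder.LITTLE_ENDIAN);

        System.out.printf("big-endian:    %02X %02X%n", big[0], big[1]);       // 00 41
        System.out.printf("little-endian: %02X %02X%n", little[0], little[1]); // 41 00

        // A new ByteBuffer defaults to big-endian regardless of the hardware.
        System.out.println(ByteBuffer.allocate(2).order()); // BIG_ENDIAN
        // The hardware's own order depends on the CPU the JVM runs on.
        System.out.println(ByteOrder.nativeOrder());
    }
}
```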

Back to encoding. Because representing every character in 2 bytes wastes space, a variable-length encoding based on Unicode, UTF-8, was introduced. It allocates space flexibly: different characters occupy different numbers of bytes, so it preserves compatibility while using space far more economically.
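The variable-length behaviour is easy to observe by encoding a few characters and counting bytes. The sketch below compares UTF-8 with fixed 16-bit units (UTF-16BE is used so that no byte-order mark is added); the helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Bytes needed for the text in UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed for the text in UTF-16 (big-endian, no byte-order mark).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // English letter
            "\u00e9",       // Latin letter e with acute accent
            "\u4e2d",       // Chinese character
            "\uD83D\uDE00"  // emoji U+1F600
        };
        for (String s : samples) {
            System.out.println(utf8Length(s) + " byte(s) in UTF-8, "
                    + utf16Length(s) + " in UTF-16");
        }
        // Prints 1/2, 2/2, 3/2 and 4/4 bytes respectively.
    }
}
```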

That covers the basic knowledge needed for Java IO, which should make the byte streams and character streams in Java IO easier to understand.
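As a brief preview of how the two concepts map onto the two stream families, the sketch below reads the same data once through a byte stream and once through a character stream. It is only an illustration under stated assumptions: the data is held in memory, UTF-8 is chosen as the encoding, and the class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamKindsDemo {
    // A byte stream delivers one byte per read() call.
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A character stream decodes the bytes and delivers one char per read() call.
    static int countChars(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int n = 0;
            while (reader.read() != -1) n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One English letter (1 byte) plus one Chinese character (3 bytes) in UTF-8.
        byte[] data = "A\u4e2d".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data)); // 4
        System.out.println(countChars(data)); // 2
    }
}
```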

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
