Java and unsigned things

Source: Internet
Author: User

What is the problem with unsigned types in Java?

In languages such as C and C++, integer types come in several lengths: char, short, int, and long. (Strictly speaking, char is not meant as an integer type, but it can be used as one, and in practice many C programmers use char to store small integers.) On many platforms these types occupy 1, 2, 4, and 8 bytes respectively. Note, however, that the sizes of these types differ from platform to platform; long in particular is only 4 bytes on many 32-bit systems. Because Java was designed to be cross-platform, its sizes are fixed: byte is always 1 byte, short is 2 bytes, int is 4 bytes, and long is 8 bytes.

Every integer type in C also has a corresponding "unsigned" version, but Java has no such feature. I find Java's lack of unsigned types really inconvenient. Think about it: a great many hardware interfaces, network protocols, and file formats use unsigned values!

(The char type in Java is different from char in C. In Java, char uses two bytes to hold a Unicode value; in C, char is one byte and typically holds an ASCII value. A Java char can serve as an unsigned short, holding integers from 0 to 2^16 - 1, but this can lead to surprises. For example, when you print the value, what is printed is the character it encodes rather than the decimal string for the number.)

So how do we deal with the absence of unsigned types in Java?

Well, you may not like this solution... The answer is: use a signed type that is larger than the unsigned type you need. For example, use a short to hold an unsigned byte and a long to hold an unsigned int (or even a char to hold an unsigned short). Yes, this looks wasteful, because it takes twice the storage, but there is no better way.
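As a minimal sketch of this advice, the following holds the largest unsigned byte, unsigned short, and unsigned int in a short, a char, and a long, and shows the printing surprise with char mentioned above. The class name WiderTypesDemo and its members are made up for this illustration and do not come from the article.

```java
// Illustrative sketch: the class and member names are assumptions, not from the article.
final class WiderTypesDemo {
    // Largest values of the three unsigned widths, held in wider signed Java types.
    static final short MAX_UNSIGNED_BYTE = 255;        // a byte tops out at 127
    static final char MAX_UNSIGNED_SHORT = 65535;      // char covers 0..65535
    static final long MAX_UNSIGNED_INT = 4294967295L;  // an int tops out at 2147483647

    // Text produced when a char is printed directly: the character, not the number.
    static String printedAsChar(char value) {
        return String.valueOf(value);
    }

    // Text produced when the char is first widened to int: the decimal number.
    static String printedAsNumber(char value) {
        return String.valueOf((int) value);
    }

    public static void main(String[] args) {
        char anUnsignedShort = 65;
        System.out.println(printedAsChar(anUnsignedShort));   // prints "A"
        System.out.println(printedAsNumber(anUnsignedShort)); // prints "65"
        System.out.println(MAX_UNSIGNED_INT);                  // prints "4294967295"
    }
}
```

Casting the char to int before printing is the simplest way to get the numeric form.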
In addition, note that reads and writes of long variables are not guaranteed to be atomic, so if several threads share such a variable you have to take care of synchronization yourself.

How do we read and store data in unsigned form?

If someone sends you a bunch of bytes over the network that contain unsigned values (or you read such bytes from a file), you need some extra processing to convert them into the larger Java types. Byte order is another concern, but let's set it aside for now and assume the data is in "network byte order", that is, big-endian, which is also the standard byte order in Java.

Reading from network byte order

Suppose we start with a byte array and want to read an unsigned byte, an unsigned short, and an unsigned int from it.

    short anUnsignedByte = 0;
    char anUnsignedShort = 0;
    long anUnsignedInt = 0;

    int firstByte = 0;
    int secondByte = 0;
    int thirdByte = 0;
    int fourthByte = 0;

    byte[] buf = getMeSomeData();
    // Check to make sure we have enough bytes
    if (buf.length < (1 + 2 + 4))
        doSomeErrorHandling();
    int index = 0;

    // Unsigned byte: one byte, widened to a short
    firstByte = (0x000000FF & (int) buf[index]);
    index++;
    anUnsignedByte = (short) firstByte;

    // Unsigned short: two bytes, high byte first, stored in a char
    firstByte = (0x000000FF & (int) buf[index]);
    secondByte = (0x000000FF & (int) buf[index + 1]);
    index = index + 2;
    anUnsignedShort = (char) (firstByte << 8 | secondByte);

    // Unsigned int: four bytes, high byte first, stored in a long
    firstByte = (0x000000FF & (int) buf[index]);
    secondByte = (0x000000FF & (int) buf[index + 1]);
    thirdByte = (0x000000FF & (int) buf[index + 2]);
    fourthByte = (0x000000FF & (int) buf[index + 3]);
    index = index + 4;
    anUnsignedInt = ((long) (firstByte << 24 | secondByte << 16 | thirdByte << 8 | fourthByte)) & 0xFFFFFFFFL;

Well, that looks a little complicated, but it is actually quite straightforward. First, you see a lot of this:

    0x000000FF & (int) buf[index]

This first promotes the signed byte to an int and then performs a bitwise AND on that int, so that only the low 8 bits are kept.
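To see what the mask changes, and to gather the three reads above into reusable pieces, here is a small sketch. The class UnsignedReads and its method names are hypothetical helpers introduced for illustration. They assume big-endian (network order) input and rely on ordinary array indexing for bounds checks.

```java
// Hypothetical helpers for illustration; assumes big-endian (network order) input.
final class UnsignedReads {
    // Unsigned byte at buf[index], widened to a short (0..255).
    static short readUnsignedByte(byte[] buf, int index) {
        return (short) (0xFF & buf[index]);
    }

    // Unsigned 16-bit value at buf[index..index+1], held in a char (0..65535).
    static char readUnsignedShort(byte[] buf, int index) {
        int first = 0xFF & buf[index];
        int second = 0xFF & buf[index + 1];
        return (char) (first << 8 | second);
    }

    // Unsigned 32-bit value at buf[index..index+3], held in a long (0..4294967295).
    static long readUnsignedInt(byte[] buf, int index) {
        int first = 0xFF & buf[index];
        int second = 0xFF & buf[index + 1];
        int third = 0xFF & buf[index + 2];
        int fourth = 0xFF & buf[index + 3];
        return ((long) (first << 24 | second << 16 | third << 8 | fourth)) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] buf = {(byte) 0xF0, (byte) 0xFF, (byte) 0xFE,
                      (byte) 0x80, 0x00, 0x00, 0x01};

        int withoutMask = buf[0];      // sign-extended to -16
        int withMask = 0xFF & buf[0];  // low 8 bits only: 240
        System.out.println(withoutMask + " vs " + withMask);  // prints "-16 vs 240"

        System.out.println(readUnsignedByte(buf, 0));         // prints "240"
        System.out.println((int) readUnsignedShort(buf, 1));  // prints "65534"
        System.out.println(readUnsignedInt(buf, 3));          // prints "2147483649"
    }
}
```

The last value, 0x80000001, has its top bit set, so it would come out negative if it were kept in an int.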
Because byte in Java is signed, whenever the unsigned value of a byte is greater than 127, its highest bit is set to 1 (strictly speaking this is not a sign bit, because numbers are encoded in two's complement), and Java treats the byte as a negative number. When such a negative byte is promoted to int, bits 0 through 7 are kept and bits 8 through 31 are all set to 1. The bitwise AND with 0x000000FF then clears bits 8 through 31.

The expression above can be written more briefly as:

    0xFF & buf[index]

Java supplies the leading zeros of 0xFF automatically, and the bitwise operator & causes the byte to be promoted to int automatically.

Next, you see a lot of the bitwise left-shift operator <<. It shifts the bits of its left operand to the left by the number of positions given by its right operand. So if you have int foo = 0x000000FF, then foo << 8 gives 0x0000FF00 and foo << 16 gives 0x00FF0000.

Last is the bitwise OR operator |. Suppose you have loaded the two bytes of an unsigned short into two ints, giving 0x00000012 and 0x00000034. After shifting the first one left by 8 bits you have 0x00001200 and 0x00000034, and now you need to splice them together. That is what the bitwise OR is for: 0x00001200 | 0x00000034 gives 0x00001234, which can be stored in a Java char.

Those are the basic operations. An unsigned int, however, has to be stored in a long. The steps are the same as above, except that the assembled int is cast to long and then ANDed with 0xFFFFFFFFL. The mask is needed because the cast sign-extends the int whenever its top bit is set. The trailing L tells Java to treat the constant as a long.

Writing to network byte order

Suppose we want to write the values we read in the previous step back into a buffer. We read them in the order unsigned byte, unsigned short, unsigned int. Now, for whatever reason, suppose they are to be written out in a different order.
We will write them in the order unsigned int, unsigned short, unsigned byte.

    // Each right-hand side is a long or an int, so an explicit (byte) cast is required.
    buf[0] = (byte) ((anUnsignedInt & 0xFF000000L) >> 24);
    buf[1] = (byte) ((anUnsignedInt & 0x00FF0000L) >> 16);
    buf[2] = (byte) ((anUnsignedInt & 0x0000FF00L) >> 8);
    buf[3] = (byte) (anUnsignedInt & 0x000000FFL);

    buf[4] = (byte) ((anUnsignedShort & 0xFF00) >> 8);
    buf[5] = (byte) (anUnsignedShort & 0x00FF);

    buf[6] = (byte) (anUnsignedByte & 0xFF);

What about byte order? What does it mean? Do I need to care? And what is network byte order?

The big-endian byte order used by Java is also called "network byte order". Intel x86 processors are little-endian (unless you are running a Java program on them). Data files created on x86 systems are usually (but not necessarily) little-endian, while data files created by Java programs are usually (but not necessarily) big-endian. Any system can write data in whatever byte order it chooses.

What does byte order mean?

Byte order is the order in which a computer stores the bytes of a multi-byte value in memory. There are two common schemes: big-endian and little-endian. And yes, you do need to pay attention to byte order. If you read a little-endian data file as though it were big-endian, or vice versa, you will most likely get garbage.

Any number, however it is written, for example 500,000,007 or its hexadecimal form 0x1DCD6507, can be thought of as a string of digits with a beginning (leftmost) and an end (rightmost). In English, the first digit is the most significant: the 5 in 500,000,007 stands for 500,000,000. The last digit is the least significant: the 7 in 500,000,007 stands for 7. When we talk about byte order, we mean the order in which the digits are written down. We always start with the most significant digit and work down to the least significant. Or do we?
In the example above, the value 500,000,007 has the hexadecimal representation 0x1DCD6507, which splits into four separate bytes: 0x1D, 0xCD, 0x65, and 0x07, or 29, 205, 101, and 7 in decimal. The most significant byte, 29, stands for 29 * 256 * 256 * 256 = 486539264. Next comes 205, standing for 205 * 256 * 256 = 13434880, then 101, standing for 101 * 256 = 25856, and finally 7, standing for 7 * 1 = 7. Added together:

    486539264 + 13434880 + 25856 + 7 = 500000007

When the computer stores these four bytes in memory, suppose they go to addresses 2056, 2057, 2058, and 2059. The question is: which byte goes at which address? One possibility is that address 2056 holds 29, 2057 holds 205, 2058 holds 101, and 2059 holds 7, the same order in which you write the number down. We call this big-endian. Other computer architectures store 7 at address 2056, 101 at 2057, 205 at 2058, and 29 at 2059. This order is called little-endian.

The same applies to 2-byte and 8-byte values. The most significant byte is called the MSB, and the least significant byte is called the LSB.

So why should I care about byte order?

That depends. Usually you do not need to worry about it. Whatever platform a Java program runs on, its byte order is the same, so within Java it is not a concern. But what if you have to process data produced by programs written in other languages? Then byte order is a big deal. You must decode the data using the same byte order it was encoded with, and vice versa. If you are lucky, the byte order is stated in the API documentation, protocol specification, or file format description. If not... good luck!

The most important thing is to know clearly which byte order you are using and which byte order the data you are processing uses. If the two differ, you need extra processing to get correct results.
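The layout described above can be checked with java.nio.ByteBuffer, which lets you choose the byte order explicitly. This is a sketch rather than the article's own method, and the class ByteOrderDemo and its method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative sketch; ByteOrderDemo and its methods are hypothetical names.
final class ByteOrderDemo {
    // The four bytes of value, laid out in the requested byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Reads four bytes in the given order as an unsigned 32-bit value held in a long.
    static long decodeUnsignedInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] big = encode(500_000_007, ByteOrder.BIG_ENDIAN);
        byte[] little = encode(500_000_007, ByteOrder.LITTLE_ENDIAN);

        // Java prints bytes as signed, so 205 (0xCD) shows up as -51.
        System.out.println(Arrays.toString(big));    // prints "[29, -51, 101, 7]"
        System.out.println(Arrays.toString(little)); // prints "[7, 101, -51, 29]"

        // Decoding with the matching order recovers the value...
        System.out.println(decodeUnsignedInt(little, ByteOrder.LITTLE_ENDIAN)); // prints "500000007"
        // ...while decoding with the wrong order silently yields a different number.
        System.out.println(decodeUnsignedInt(little, ByteOrder.BIG_ENDIAN));    // prints "124112157"
    }
}
```

Nothing fails when the byte order is wrong. You simply get a plausible-looking but incorrect number, which is why the data's byte order has to be known in advance.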
Also, if you are dealing with unsigned values, make sure that the right bytes end up in the right positions of the corresponding int, short, or long.

What is network byte order?

When the IP protocol was designed, big-endian was chosen as the network byte order. Numeric values in IP packets are stored in network byte order. The byte order of the computer that generates the packets is called the host byte order, and it may or may not be the same as network byte order. Like network byte order, Java's byte order is big-endian.

Why is there no unsigned type?

Why doesn't Java provide unsigned types? Good question! I have often found this very strange, especially since many network protocols were already using unsigned values. In 1999 I spent a long time searching the Web (Google was not so good back then), because I kept thinking this could not be right. Then one day I came across an interview with one of Java's inventors (was it Gosling? I don't quite remember; I wish I had saved the web page), in which the designer said something to this effect: "Hey! Unsigned types make things complicated. Nobody really needs unsigned types, so we threw them out."

Here is a page recording an interview with James Gosling that may shed some light:

Q: Programmers often discuss the advantages and disadvantages of programming in a "simple language". What do you think about this? Do you consider C/C++/Java to be simple languages?

Ritchie: (omitted)

Stroustrup: (omitted)

Gosling: For me as a language designer, what "simple" really ended up meaning was whether I could expect a developer to hold the spec in their head. By that definition Java, for instance, is not simple, and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple.
