We know that Tomcat's communication is based on sockets, and what passes between the server-side and client-side sockets is a raw, undecoded byte stream. Computers work in binary because a transistor has two states, on and off, which stand for 1 and 0. Eight such bits make up one byte, and the byte is the smallest unit the application layer works with.
When communicating over the network with a socket program, if we do not know which encoding to use when a message arrives, the best approach is to read the raw byte stream from the socket's underlying input stream and convert it to characters only after the encoding has been confirmed; otherwise decoding errors will occur. The encodings we commonly see are ASCII, GB2312, Unicode, UTF-8 and so on, and there are many others. Why are there so many different encodings?
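The idea of reading bytes first and decoding later can be sketched as follows. This is a minimal illustration, not Tomcat code: the class `RawBytesFirst` and its methods are hypothetical names, and a `ByteArrayInputStream` stands in for a socket's input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Tomcat.
final class RawBytesFirst {

    // Read the stream as plain bytes without assuming any encoding.
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Convert to characters only once the encoding is known.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Stand-in for a socket stream: the UTF-8 bytes of the character U+4E2D.
        byte[] sent = "\u4e2d".getBytes(StandardCharsets.UTF_8);
        byte[] raw = readAll(new ByteArrayInputStream(sent));

        String right = decode(raw, StandardCharsets.UTF_8);
        String wrong = decode(raw, StandardCharsets.ISO_8859_1);
        System.out.println("decoded with UTF-8 matches: " + right.equals("\u4e2d"));
        System.out.println("decoded with ISO-8859-1 matches: " + wrong.equals("\u4e2d"));
    }
}
```

Because the raw bytes are kept untouched, the same buffer can be decoded again with a different charset if the first guess turns out to be wrong.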
ASCII stands for American Standard Code for Information Interchange. Computers first became popular in the United States, so at that time all computers used ASCII. An ASCII code occupies 8 bits, and the values 0 to 127 represent different characters, including various symbols, English letters, and Arabic numerals. Because these characters need only 7 bits, the highest bit is filled with 0. These characters fully met the needs of English-speaking Americans, since any English word can be broken into letters and each letter represented by its ASCII code.
Later, as computers developed and other countries adopted them, those countries found that ASCII was not enough to encode their native scripts. The 8 bits of a byte can represent 256 values in total. The values 0-127 were already used by the American standard and could not be changed for compatibility reasons, which left only 128-255, and those remaining 128 values were soon not enough either. The only option was to use two or more bytes to represent one character, and each country made its own rules. China created the GB2312 encoding: to stay compatible with ASCII, a byte with a value up to 127 still means an ASCII character, while two consecutive bytes that are both greater than 127 together represent a Chinese character, with each byte value falling within a certain range. After a round of extensions, the problem of encoding Chinese characters was basically solved.
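The byte rule described above (a value up to 127 is a single ASCII character, a value above 127 starts a two-byte character) can be sketched as a simple scan. This is a simplified illustration under stated assumptions: the class `DoubleByteScan` is a hypothetical name, and the code does not validate the exact lead and trail byte ranges that GB2312 defines.

```java
// Hypothetical helper for illustration only; it does not validate GB2312 byte ranges.
final class DoubleByteScan {

    // Count characters in a buffer that mixes ASCII and two-byte characters.
    static int countCharacters(byte[] data) {
        int count = 0;
        int i = 0;
        while (i < data.length) {
            // Java bytes are signed, so mask to get the 0-255 value.
            int value = data[i] & 0xFF;
            if (value <= 127) {
                i += 1; // single-byte ASCII character
            } else {
                i += 2; // lead byte plus the byte that follows it
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'A', then the two bytes 0xD6 0xD0 (one Chinese character), then 'B'.
        byte[] data = {0x41, (byte) 0xD6, (byte) 0xD0, 0x42};
        System.out.println(data.length + " bytes, " + countCharacters(data) + " characters");
    }
}
```

Note the `& 0xFF` mask: without it, any byte above 127 would appear as a negative number in Java and the comparison against 127 would give the wrong answer.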
Many other countries then encoded their own scripts according to their own rules, and no one could read another country's codes, so the situation was chaotic for a while. To end this, the Unicode standard was introduced (maintained by the Unicode Consortium and kept aligned with ISO/IEC 10646), with the aim of covering every written symbol in the world. In its original form it encoded every character with two bytes (16 bits), and to stay compatible with ASCII, the code points 0-127 still represent the original ASCII characters. With Unicode, the characters of the world were indeed unified.
Unicode unifies all characters, but a problem remains. An English character needs only one byte, yet with two-byte Unicode it has to carry an extra, meaningless byte, which means transferring one more useless byte per character over the network. This is why UTF-8 was introduced. It is one implementation of Unicode and a variable-length encoding: UTF-8 specifies that every ASCII character is represented in a single byte, which avoids unnecessary waste of space.
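The space difference can be checked with the standard `String.getBytes(Charset)` call. The sketch below is only an illustration; the class `EncodedLength` and its method names are hypothetical, and `UTF_16BE` is used as the fixed two-byte form because it writes no byte order mark.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class EncodedLength {

    // Number of bytes the string occupies in UTF-8 (variable length).
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies in UTF-16BE (two bytes per 16-bit unit).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "A";
        String chinese = "\u4e2d";
        System.out.println("A: UTF-8=" + utf8Length(ascii) + ", UTF-16BE=" + utf16Length(ascii));
        System.out.println("U+4E2D: UTF-8=" + utf8Length(chinese) + ", UTF-16BE=" + utf16Length(chinese));
    }
}
```

An ASCII letter costs one byte in UTF-8 and two in the fixed-width form, while a Chinese character costs three bytes in UTF-8, so the saving applies to ASCII-heavy text.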
Apart from the common encodings above, there are many others. Different encodings specify different rules, but basically all of them remain compatible with ASCII, so ASCII can be said to be the most fundamental one. This section discusses a common method for decoding ASCII: table-driven mode.
A web container establishes communication between its two ends on top of the HTTP protocol and transfers messages over sockets. The transfer necessarily involves an agreement on encoding; without one, messages will be decoded incorrectly. An HTTP request message consists of three parts: the request line, the request headers, and the request body (detailed in the preceding HTTP protocol section). HTTP stipulates that the request line and request headers must be ASCII-encoded, which unifies all servers that communicate over HTTP and avoids the confusion caused by different system default encodings. An ASCII code is 1 byte (8 bits) long and Java's byte type is also 1 byte long, an exact match, so Tomcat buffers the ASCII-encoded messages it receives in arrays of type byte. In general, we care most about decoding ASCII codes into digits, letters, and a few special symbols, which are enough to form commonly used words and statements. As the ASCII table shows, codes 48-57 represent the digits 0-9, codes 65-90 represent A-Z, and codes 97-122 represent a-z.
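Since each ASCII character is exactly one byte, a number in a header can be decoded straight from the byte buffer without building a String first. The sketch below shows this for the digit codes 48-57. It is an illustration under stated assumptions, not Tomcat's implementation: the class `AsciiNumber` and its `parseInt` signature are hypothetical, and overflow is not handled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; overflow is not checked.
final class AsciiNumber {

    // Decode len ASCII digit bytes starting at off into an int.
    static int parseInt(byte[] buf, int off, int len) {
        if (len <= 0) {
            throw new NumberFormatException("no digits");
        }
        int value = 0;
        for (int i = off; i < off + len; i++) {
            byte b = buf[i];
            // ASCII codes 48-57 are the digits 0-9.
            if (b < 48 || b > 57) {
                throw new NumberFormatException("not an ASCII digit: " + b);
            }
            value = value * 10 + (b - 48);
        }
        return value;
    }

    public static void main(String[] args) {
        // A header as it would sit in the receive buffer.
        byte[] header = "Content-Length: 1024".getBytes(StandardCharsets.US_ASCII);
        int length = parseInt(header, header.length - 4, 4);
        System.out.println("parsed length = " + length);
    }
}
```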
The decoding process involves some logical processing, for example checking whether a decoded value is an English letter, whether it is uppercase or lowercase, whether it is a digit, or whether it is a whitespace character, as well as converting it to another type or changing its case. The usual approach is to test directly with if-else: to decide whether an ASCII code is an English letter, check whether it lies between 65 and 90 or between 97 and 122. The table-driven approach does not do this. Instead it keeps an "is it an English letter" result table in memory: an array indexed by ASCII code, in which the entries marked T (true) are the English letters, so the value stored in the array is the answer.
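The two approaches can be put side by side in a minimal sketch. The class `AlphaTable` and its method names are hypothetical and chosen for illustration; the input is assumed to be a single byte value, which is masked to the range 0-255 before the lookup.

```java
// Hypothetical class for illustration only; not Tomcat's implementation.
final class AlphaTable {

    // Result table: index is the byte value, entry is true for English letters.
    private static final boolean[] IS_ALPHA = new boolean[256];

    static {
        // Fill the table once, when the class is loaded.
        for (int c = 'A'; c <= 'Z'; c++) {
            IS_ALPHA[c] = true;
        }
        for (int c = 'a'; c <= 'z'; c++) {
            IS_ALPHA[c] = true;
        }
    }

    // The usual approach: range checks on every call.
    static boolean isAlphaIfElse(int c) {
        return (c >= 65 && c <= 90) || (c >= 97 && c <= 122);
    }

    // The table-driven approach: one array lookup.
    static boolean isAlpha(int c) {
        return IS_ALPHA[c & 0xFF];
    }

    public static void main(String[] args) {
        System.out.println("isAlpha('G') = " + isAlpha('G'));
        System.out.println("isAlpha('7') = " + isAlpha('7'));
    }
}
```

The range checks move into the static initializer and run once; every later call is a single indexed read.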
Similarly, more tables can be kept in memory for other needs: the result of each logical test is computed in advance, and reading the array element gives the result directly. Table-driven mode is often used to replace long chains of if-else and switch-case tests, and it helps improve the readability and maintainability of code. The ASCII table-driven class used by Tomcat is org.apache.tomcat.util.buf.Ascii (Ascii.java).
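A sketch of several precomputed tables held together in one class might look like the following. It is written in the spirit of the approach described here and is not a copy of Tomcat's class: `AsciiTables`, its method names, and the particular set of whitespace characters are assumptions made for illustration.

```java
// Hypothetical class for illustration only; not a copy of Tomcat's Ascii class.
final class AsciiTables {

    private static final byte[] TO_LOWER = new byte[256];
    private static final byte[] TO_UPPER = new byte[256];
    private static final boolean[] IS_DIGIT = new boolean[256];
    private static final boolean[] IS_WHITE = new boolean[256];

    static {
        // By default every byte value maps to itself.
        for (int i = 0; i < 256; i++) {
            TO_LOWER[i] = (byte) i;
            TO_UPPER[i] = (byte) i;
        }
        // Letters map to their counterpart in the other case.
        for (int lc = 'a'; lc <= 'z'; lc++) {
            int uc = lc - 'a' + 'A';
            TO_UPPER[lc] = (byte) uc;
            TO_LOWER[uc] = (byte) lc;
        }
        for (int d = '0'; d <= '9'; d++) {
            IS_DIGIT[d] = true;
        }
        // Assumed whitespace set for this sketch.
        IS_WHITE[' '] = true;
        IS_WHITE['\t'] = true;
        IS_WHITE['\r'] = true;
        IS_WHITE['\n'] = true;
    }

    static int toLower(int c) {
        return TO_LOWER[c & 0xFF] & 0xFF;
    }

    static int toUpper(int c) {
        return TO_UPPER[c & 0xFF] & 0xFF;
    }

    static boolean isDigit(int c) {
        return IS_DIGIT[c & 0xFF];
    }

    static boolean isWhite(int c) {
        return IS_WHITE[c & 0xFF];
    }

    public static void main(String[] args) {
        System.out.println((char) toLower('H') + " " + (char) toUpper('h'));
        System.out.println(isDigit('5') + " " + isWhite(' '));
    }
}
```

Adding a new test or conversion means adding one more array and filling it in the initializer, while the lookup methods stay one line each.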
Table-driven ASCII decoding in the Tomcat kernel