Information encoding: string and text

Source: Internet
Author: User
Tags: printable characters

Old-fashioned text, meaning strings of printable (displayable) characters, is perhaps the most common way to represent information. Text is convenient because people are used to dealing with all kinds of information expressed as strings of characters in books, newspapers, and on computer displays. Therefore, once we specify how to encode text for transmission, we can send almost any other kind of data: first represent it as text, then encode the text. Obviously, we can represent numbers and boolean values as Strings, such as "123478962", "6.02e23", "true", and "false". We have also seen that a string can be converted to a byte array by calling the getBytes() method (see TCPEchoClient.java). Of course, there are other ways to do this.

To better understand what is going on, we first need to think of text as being made up of symbols, or characters. In fact, every String instance corresponds to a sequence (array) of characters (type char[]). A char value in Java is represented internally as an integer. For example, the character 'a', that is, the symbol for the letter "a", corresponds to the integer 97; the character 'X' corresponds to 88, and the symbol '!' (exclamation mark) corresponds to 33.

A mapping between a set of symbols and a set of integers is called a coded character set. You may have heard of the coded character set known as ASCII (American Standard Code for Information Interchange). ASCII maps the letters of the English alphabet, digits, punctuation, and some other special (non-printable) symbols to integers between 0 and 127. It has been used for data transmission since the 1960s and is still widely used in application protocols such as HTTP (the protocol used by the World Wide Web).
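The character-to-integer mapping and the "represent as text, then encode" idea described above can be illustrated with a minimal sketch. The class and method names (TextAsBytes, codeOf, encodeAsText, decodeLong) are hypothetical and chosen only for this illustration, and US-ASCII is assumed as the encoding because the sample values contain only ASCII characters.

```java
import java.nio.charset.StandardCharsets;

class TextAsBytes {
    // A Java char is a 16-bit integer; widening it to int reveals its code.
    static int codeOf(char c) {
        return c;
    }

    // Step 1: express the value as text. Step 2: encode the text as bytes.
    // US-ASCII is an assumption that holds for digits and English letters.
    static byte[] encodeAsText(Object value) {
        return String.valueOf(value).getBytes(StandardCharsets.US_ASCII);
    }

    // Reverse the process: decode the bytes back to text, then parse the text.
    static long decodeLong(byte[] bytes) {
        return Long.parseLong(new String(bytes, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println(codeOf('a'));   // prints 97
        System.out.println(codeOf('X'));   // prints 88
        System.out.println(codeOf('!'));   // prints 33

        byte[] wire = encodeAsText(123478962L);
        System.out.println(wire.length);       // prints 9, one byte per digit
        System.out.println(decodeLong(wire));  // prints 123478962
    }
}
```

Note that the text form costs nine bytes here, whereas a binary long always occupies eight; text trades compactness for readability.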
However, because ASCII omits symbols used by many languages other than English, it is less than ideal for developing applications and protocols intended to work in today's global economy. Java therefore uses an international standard coded character set called Unicode to represent values of type char and String. Unicode maps symbols from most of the languages and symbol systems of the world onto integers. A Java char holds a value between 0 and 65,535; later versions of Unicode define characters beyond that range, which Java represents as pairs of char values. This makes Unicode much better suited to internationalized programs. For example, the Japanese hiragana symbol for the syllable "o" maps to the integer 12,362. Unicode includes ASCII: each symbol defined by ASCII maps to the same integer in Unicode as it does in ASCII. This provides a degree of backward compatibility between ASCII and Unicode.

So the sender and receiver must agree on a mapping from symbols to integers in order to communicate using text messages. Is that all they need to agree on? It depends. For a small set of characters with no integer value larger than 255, nothing more is needed, because each character can be encoded as a single byte. For a code that may use larger integer values, which require more than one byte to represent, there is more than one way to encode those values on the wire. Thus, the sender and receiver also need to agree on how those integers will be represented as byte sequences, that is, on an encoding scheme. The combination of a coded character set and a character encoding scheme is called a charset (see RFC 2278). It is possible to define your own charset, but there is hardly ever a reason to do so: a large number of standardized charsets are already in use around the world. Java supports the use of arbitrary charsets, and every implementation is required to support at least the following: US-ASCII (another name for ASCII), ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16.
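To make the difference between a coded character set and an encoding scheme concrete, the following sketch encodes the hiragana "o" (Unicode 12,362, written as the escape \u304A) with each charset the passage lists as required. The class name CharsetSketch and the hex helper are hypothetical; the charset constants come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetSketch {
    // Hiragana "o": the same integer (12362) in every Unicode-based charset.
    static final String HIRAGANA_O = "\u304A";

    // The six charsets every Java implementation is required to support.
    static final Charset[] REQUIRED = {
        StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
        StandardCharsets.UTF_8, StandardCharsets.UTF_16BE,
        StandardCharsets.UTF_16LE, StandardCharsets.UTF_16
    };

    // Render bytes as space-separated, two-digit uppercase hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println((int) HIRAGANA_O.charAt(0)); // prints 12362
        for (Charset cs : REQUIRED) {
            // Same symbol, same integer, but a different byte sequence
            // depending on the encoding scheme.
            System.out.println(cs.name() + ": " + hex(HIRAGANA_O.getBytes(cs)));
        }
        // UTF-8 yields E3 81 8A, UTF-16BE yields 30 4A, UTF-16LE yields 4A 30.
    }
}
```

US-ASCII and ISO-8859-1 cannot represent this character at all; getBytes(Charset) substitutes the charset's replacement byte in that case, which is one reason both sides must agree on a charset that covers the symbols they intend to exchange.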
Calling the getBytes() method of a String instance returns a byte array containing the String encoded according to the platform's default charset. On many platforms the default charset is UTF-8; however, in localities that make frequent use of characters outside the ASCII charset, it may be something different. To ensure that a String is encoded using a particular charset, simply supply the name of the charset as a String argument to getBytes(); the resulting byte array contains the string as represented in the given encoding. (Note that the TCP echo client/server examples in Section 2.2.1 are indifferent to the encoding, because they never interpret the data they receive.)

The following example illustrates getBytes(). On the platform on which this book was written, "Test!".getBytes() returns the string encoded according to UTF-8, one byte per character. "Test!".getBytes("UTF-16BE") instead returns an array in which each character is encoded as a two-byte sequence, high-order byte first. "Test!".getBytes("IBM037") returns yet another, different byte array.

The example above shows that the sender and receiver must agree on the representation of text strings. The easiest way to do so is to specify one of the standard charsets. As we have seen, a String can be written to an OutputStream by first converting it to bytes and then writing the bytes to the stream, but that approach requires the encoding to be specified on every call to getBytes(). Later in this chapter we will see how to simplify the process by specifying the encoding once when constructing text messages.
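The getBytes() calls described above can be tried with a short sketch like the one below. The class name GetBytesSketch and the encode helper are hypothetical. IBM037 is not among the charsets every JVM must provide, so the sketch checks Charset.isSupported before using it and makes no claim about its output.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

class GetBytesSketch {
    // Encode text with the named charset and return the bytes as unsigned
    // values (0 to 255), which are easier to read than Java's signed bytes.
    static int[] encode(String text, String charsetName) {
        byte[] bytes;
        try {
            bytes = text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            // Wrapped so that callers need not handle a checked exception.
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
        int[] unsigned = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            unsigned[i] = bytes[i] & 0xFF;
        }
        return unsigned;
    }

    public static void main(String[] args) {
        // The no-argument getBytes() uses this charset; it varies by platform.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Prints [84, 101, 115, 116, 33]
        System.out.println(Arrays.toString(encode("Test!", "UTF-8")));

        // Prints [0, 84, 0, 101, 0, 115, 0, 116, 0, 33]
        System.out.println(Arrays.toString(encode("Test!", "UTF-16BE")));

        // IBM037 is optional, so check for it before encoding.
        if (Charset.isSupported("IBM037")) {
            System.out.println(Arrays.toString(encode("Test!", "IBM037")));
        } else {
            System.out.println("IBM037 is not available on this JVM");
        }
    }
}
```

Passing the charset name explicitly, as encode does, keeps the result independent of the platform default, which is the property a sender and receiver need.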
