The use of String.getBytes() in Java

Source: Internet
Author: User

In Java, String's getBytes() method returns a byte array encoded in the platform's default charset, which traditionally follows the operating system's default encoding. This means the returned bytes can differ from one operating system to another!
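A minimal sketch of this point: the no-argument getBytes() follows whatever Charset.defaultCharset() reports on the current machine, while an explicit charset gives the same bytes everywhere. The class and method names are illustrative, and the Chinese character 中 ("middle", "medium") is written as a Unicode escape so that the source file itself stays plain ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Bytes produced with the platform default charset; may vary between machines.
    public static byte[] platformBytes(String s) {
        return s.getBytes(Charset.defaultCharset()); // equivalent to s.getBytes()
    }

    // Bytes produced with an explicit charset; identical on every machine.
    public static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the character U+4E2D
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default length:  " + platformBytes(s).length);
        System.out.println("UTF-8 length:    " + utf8Bytes(s).length);
    }
}
```

The first two printed lines depend on the machine; only the UTF-8 length is fixed.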


The String.getBytes(String decode) method returns the byte array representation of the string in the encoding named by decode, for example:

byte[] b_gbk = "中".getBytes("GBK");
byte[] b_utf8 = "中".getBytes("UTF-8");
byte[] b_iso88591 = "中".getBytes("ISO8859-1");


These calls return the byte arrays for the character "中" in the GBK, UTF-8, and ISO8859-1 encodings respectively. The length of b_gbk is 2, the length of b_utf8 is 3, and the length of b_iso88591 is 1.
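The lengths quoted above can be checked with a short program. This is a sketch, not the author's code: encodedLength is a hypothetical helper, the character 中 ("middle", "medium") is written as a Unicode escape, and the GBK line is guarded on the assumption that a trimmed-down runtime might not ship that charset.

```java
import java.nio.charset.Charset;

public class EncodedLengthDemo {
    // Number of bytes the string occupies in the named charset.
    public static int encodedLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the character U+4E2D
        System.out.println("UTF-8:      " + encodedLength(s, "UTF-8"));
        System.out.println("ISO-8859-1: " + encodedLength(s, "ISO-8859-1"));
        // GBK is not one of the charsets every Java runtime must provide.
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK:        " + encodedLength(s, "GBK"));
        }
    }
}
```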

Conversely, the character "中" can be restored with new String(byte[], decode), which parses the byte[] into a string using the encoding named by decode.

String s_gbk = new String(b_gbk, "GBK");
String s_utf8 = new String(b_utf8, "UTF-8");
String s_iso88591 = new String(b_iso88591, "ISO8859-1");

If you print s_gbk, s_utf8, and s_iso88591, you will find that s_gbk and s_utf8 are both "中", while s_iso88591 is an unrecognized character. Why can't "中" be restored after encoding it with ISO8859-1 and then decoding it again? The reason is simple: the ISO8859-1 code table contains no Chinese characters, so "中".getBytes("ISO8859-1") cannot produce a correct encoded value for "中" (getBytes substitutes the replacement byte for "?" instead), and restoring it through new String() is out of the question.
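The lossy round trip can be reproduced with a small sketch like the one below. The helper names are illustrative, and 中 ("middle", "medium") is written as a Unicode escape. The key observation is that the unmappable character has already been replaced by a question mark at encoding time, so no decoding step can bring it back.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encode to UTF-8 and decode again; lossless for any valid string.
    public static String roundTripUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Encode to ISO-8859-1 and decode again; characters outside Latin-1 are lost.
    public static String roundTripLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the character U+4E2D
        System.out.println("UTF-8 round trip intact:      " + roundTripUtf8(s).equals(s));
        System.out.println("ISO-8859-1 round trip intact: " + roundTripLatin1(s).equals(s));
        System.out.println("ISO-8859-1 round trip gives:  " + roundTripLatin1(s));
    }
}
```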

Therefore, when using the String.getBytes(String decode) method to obtain a byte[], make sure that every character in the string has a code value in the decode encoding table, so that the resulting byte[] can be restored correctly.
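One way to make that check explicit, sketched here as an assumption rather than as the author's method, is to ask a CharsetEncoder whether it can encode the string before calling getBytes, or to encode strictly so that an unmappable character raises an exception instead of being silently replaced. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodableCheck {
    // True if every character of s has a representation in the charset.
    public static boolean canEncode(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    // Strict encoding: throws instead of substituting '?' for unmappable characters.
    public static byte[] encodeStrict(String s, Charset cs) throws CharacterCodingException {
        ByteBuffer buf = cs.newEncoder().encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the character U+4E2D
        System.out.println("UTF-8 can encode:      " + canEncode(s, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 can encode: " + canEncode(s, StandardCharsets.ISO_8859_1));
        try {
            encodeStrict(s, StandardCharsets.ISO_8859_1);
        } catch (CharacterCodingException e) {
            System.out.println("strict encoding failed: " + e);
        }
    }
}
```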

Sometimes Chinese characters have to fit special requirements (for example, HTTP headers require their content to be ISO8859-1 encoded). In that case the bytes of the Chinese text can be carried inside an ISO8859-1 string, for example:

String s_iso88591 = new String("中".getBytes("UTF-8"), "ISO8859-1");

The resulting s_iso88591 string is actually three ISO8859-1 characters. After these characters are passed to the destination, the destination program reverses the process with String s_utf8 = new String(s_iso88591.getBytes("ISO8859-1"), "UTF-8") to recover the correct Chinese character "中". This both complies with the protocol and supports Chinese.
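The two halves of this trick can be sketched as a pair of helpers. The names wrap and unwrap are hypothetical, and 中 ("middle", "medium") is written as a Unicode escape. The approach works because ISO-8859-1 maps every byte value 0 to 255 to exactly one character, so the UTF-8 bytes survive the detour unchanged.

```java
import java.nio.charset.StandardCharsets;

public class Latin1TunnelDemo {
    // Sender side: carry the UTF-8 bytes of s inside an ISO-8859-1 string.
    public static String wrap(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Receiver side: recover the original bytes and decode them as UTF-8.
    public static String unwrap(String carried) {
        return new String(carried.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d"; // the character U+4E2D
        String carried = wrap(original);
        System.out.println("carried length:  " + carried.length());
        System.out.println("restored intact: " + unwrap(carried).equals(original));
    }
}
```

Both sides have to agree on the inner encoding (UTF-8 here); if the receiver assumes a different one, the text is garbled again.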


===================================================

Detailed procedures for Java encoding transformations

Our common Java programs include the following categories:
* Classes that run directly on the console (including GUI classes)
* JSP code classes (note: a JSP is a variant of a servlet class)
* Servlet classes
* EJB classes
* Other support classes that cannot be run directly
Any of these class files may contain Chinese strings, and we often use the first three kinds of Java programs to interact directly with the user for character input and output. For example, in JSPs and servlets we receive characters from the client, and these may include Chinese characters. Whatever role these Java classes play, their life cycle looks like this:
* The programmer chooses a suitable editor on some operating system, writes the source code, and saves it with the .java extension. For example, we use Notepad on Chinese Win2K to edit a Java source program;
* The programmer compiles the source code with javac.exe from the JDK to produce .class files (JSP files are compiled by the container, which invokes the JDK);
* The classes are run directly or deployed into a web container to run, and the results are output.
So how do the JDK and the JVM encode, decode, and run these files during these stages?
Here we use the Chinese Win2K operating system as an example to show how Java classes are encoded and decoded.
The first step: on Chinese Win2K we write a Java source file (any of the five kinds of Java program above) in an editor such as Notepad. By default the file is saved in the operating system's default encoding, GBK (the operating system's default format is the file.encoding format), producing a .java file. In other words, before it is compiled, a Java source file is stored in the operating system's default file.encoding format, and it contains both Chinese characters and English program code. To view the system's file.encoding parameter, you can use the following code:
public class ShowSystemDefaultEncoding {
    public static void main(String[] args) {
        // file.encoding holds the JVM's default encoding, taken from the OS
        String encoding = System.getProperty("file.encoding");
        System.out.println(encoding);
    }
}
The second step: we compile the Java source program with the JDK's javac.exe. Because the JDK is an international version, if we do not specify the source file's encoding with the -encoding parameter at compile time, javac.exe first obtains the operating system's default encoding, that is, the file.encoding parameter (which holds the operating system's default encoding, GBK on Chinese Win2K). The JDK then reads the source program into memory, converting it from the file.encoding format into Java's internal Unicode format. javac then compiles the converted Unicode text into a .class file held in memory, and finally the JDK saves this compiled class file to the operating system, which is the .class file we see. The .class file we end up with therefore stores its content in a Unicode encoding (string constants are written in a modified UTF-8 form), including the Chinese strings from our source program, which have been converted from the file.encoding format to Unicode.
JSP source files are handled differently in this step. The web container calls the JSP compiler, which checks whether the JSP file declares a file encoding. If it does not, the JSP compiler calls the JDK to convert the JSP file into a temporary servlet class using the JVM's default character encoding (that is, the default file.encoding of the operating system where the web container resides), then compiles it into a Unicode-format class and saves it in a temporary folder.
For example, on Chinese Win2K the web container converts the JSP file from the GBK encoding into Unicode and compiles it into a temporarily saved servlet class, which then responds to the user's request.
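The decoding that happens at the start of compilation can be imitated in a few lines. This sketch does not invoke javac; it only shows, under the assumption that an editor saved a source file as UTF-8, what the text looks like when its bytes are decoded with the right and the wrong charset. With the real compiler, the equivalent safeguard is passing the file's actual encoding, for example javac -encoding UTF-8 Demo.java. The class name, helper name, and file name are illustrative.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SourceDecodingDemo {
    // Reads a file's bytes and decodes them with the given charset, the way a
    // compiler decodes source text into Unicode before compiling it.
    public static String readAs(Path file, Charset cs) throws IOException {
        return new String(Files.readAllBytes(file), cs);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("Demo", ".java");
        try {
            // Pretend an editor saved this line of source code as UTF-8.
            String source = "String s = \"\u4e2d\";"; // contains U+4E2D
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));

            String right = readAs(file, StandardCharsets.UTF_8);
            String wrong = readAs(file, StandardCharsets.ISO_8859_1);
            System.out.println("decoded as UTF-8 matches:      " + right.equals(source));
            System.out.println("decoded as ISO-8859-1 matches: " + wrong.equals(source));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```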
The third step is to run the classes compiled in the second step, which falls into four scenarios:
A. Classes that run directly on the console
B. EJB classes and support classes that cannot be run directly (such as JavaBean classes)
C. JSP code and servlet classes
D. Between Java programs and databases
Let's look at these four scenarios.
A. Classes that run directly on the console
In this case, running the class first requires JVM support, which means a JRE must be installed on the operating system. The process is as follows. First, java starts the JVM, which reads the class file stored in the operating system into memory, where the class is held in Unicode format. The JVM then runs it. If the class needs to receive user input, it by default decodes the user-entered string using the file.encoding format and converts it to Unicode in memory (the user can set the encoding of the input stream). After the program runs, the resulting string (Unicode encoded) is handed back to the JVM, and finally the JRE converts the string to the file.encoding format (the user can set the encoding of the output stream), passes it to the operating system's display interface, and outputs it there. Every conversion step above must use the correct encoding, otherwise garbled characters appear in the end.
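Setting the encoding of the input and output streams, as mentioned in the parentheses above, usually means wrapping the byte streams in a reader and a writer with an explicit charset. The sketch below uses in-memory streams as stand-ins for System.in and System.out so that it runs without a terminal; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitStreamCharsets {
    // Decode one line of input bytes into a Unicode string using an explicit charset.
    public static String readLine(InputStream in, Charset cs) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs));
        return reader.readLine();
    }

    // Encode a Unicode string to the output stream using an explicit charset.
    public static void write(OutputStream out, String text, Charset cs) throws IOException {
        Writer writer = new OutputStreamWriter(out, cs);
        writer.write(text);
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Simulated keyboard input: U+4E2D followed by a newline, as UTF-8 bytes.
        byte[] typed = "\u4e2d\n".getBytes(StandardCharsets.UTF_8);
        String line = readLine(new ByteArrayInputStream(typed), StandardCharsets.UTF_8);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        write(sink, line, StandardCharsets.UTF_8);
        // prints: chars read: 1, bytes written: 3
        System.out.println("chars read: " + line.length() + ", bytes written: " + sink.size());
    }
}
```

In a real console program the same wrappers would be placed around System.in and System.out, with the charset chosen to match what the terminal actually uses.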
B. EJB classes and support classes that cannot be run directly (such as JavaBean classes)
EJB classes and support classes that cannot be run directly generally do not handle user input and output themselves; they usually exchange data with other classes. After being compiled in the second step, they are saved in the operating system as classes whose content is Unicode encoded. From then on, as long as nothing is lost when parameters are passed between them and other classes, they run correctly.
C. JSP code and servlet classes
After the second step, a JSP file has also been converted into a servlet class file. Unlike a standard servlet, it does not live in the classes directory but in the web container's temporary directory, so in this step we also treat it as a servlet.
For servlets, when a client requests one, the web container calls its JVM to run the servlet. The JVM first reads the servlet's class file from the system and loads it into memory, where the servlet class's code is encoded in Unicode. The JVM then runs the servlet class in memory. If the servlet needs to accept characters from the client while running, such as form input values and values passed in the URL, and the program does not set an encoding for accepting these parameters, the web container by default decodes the incoming values as ISO-8859-1 and converts them to Unicode in the web container's JVM memory. When the servlet runs and generates output, the output string is in Unicode format, and the container then sends that string (HTML markup, user-facing strings, and so on) to the client browser for display. If an output encoding is specified at this point, the output is sent to the browser in that encoding; if not, it is sent to the client's browser in ISO-8859-1 by default.
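The effect of the container's ISO-8859-1 default on a request parameter can be imitated without a servlet container by using java.net.URLEncoder and java.net.URLDecoder. This is only a simulation under the assumption that the browser submitted the form as percent-encoded UTF-8; the class and method names are hypothetical. In a real servlet the usual remedy is to call request.setCharacterEncoding("UTF-8") before reading any parameter and to declare the charset of the response, for example with response.setContentType("text/html; charset=UTF-8").

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormParameterDemo {
    // What a browser sends for a UTF-8 form: percent-encoded UTF-8 bytes.
    public static String browserEncode(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, "UTF-8");
    }

    // What a container does when it decodes the raw parameter with a given charset.
    public static String containerDecode(String raw, String charsetName)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charsetName);
    }

    // Repair a value that was wrongly decoded as ISO-8859-1 but was really UTF-8.
    public static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String typed = "\u4e2d"; // the character U+4E2D entered in a form field
        String raw = browserEncode(typed);
        String garbled = containerDecode(raw, "ISO-8859-1");
        System.out.println("raw parameter:   " + raw);
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("repaired intact: " + repair(garbled).equals(typed));
    }
}
```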
D. Between Java programs and databases
For almost all database JDBC drivers, data passed between Java programs and databases uses ISO-8859-1 as the default encoding. So when our program stores data containing Chinese in the database, JDBC first converts the Unicode data inside the program to ISO-8859-1 and then passes it to the database, and the database also saves it as ISO-8859-1 by default. This is why the Chinese data we read back from the database is often garbled.
