The previous three posts focused on characters and character encodings, and readers should now have a basic understanding of the various encodings. That understanding is essential for making sense of Chinese-text problems in Java, but it is only the start. The following posts focus on how garbled text arises in Java, what garbled text actually is, and how to fix the problem at its root. Let's conquer Java's annoying garbled-text problem together!
The Java Encoding conversion process
The Java classes we write are what interact most directly with users (input and output), and the text involved in that interaction may well include Chinese. Whether a class talks to a database or to front-end pages, its lifecycle is always the same:
1. The programmer writes code in an editor on the operating system and saves it as .java files; these are the source files.
2. The source files are compiled into .class files with javac (javac.exe on Windows) from the JDK.
3. The classes are run directly, or deployed in a web container, to produce output.
That is only the macro view, and it is not enough to understand the problem. We need to look at how Java actually encodes and decodes text at each step:
Step one: when we write a Java source file in an editor, the file is saved in the operating system's default encoding (unless the editor is configured otherwise; a Chinese-language operating system generally uses GBK), producing a .java file. In other words, the source file is saved in the encoding given by file.encoding, which defaults to the operating system's encoding. The following code prints the system's file.encoding value:
System.out.println(System.getProperty("file.encoding"));
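A slightly fuller, runnable sketch of the check above also prints the JVM's default charset, which is what String.getBytes() and new String(byte[]) use when no charset is passed. One caveat worth knowing: on JDK 18 and later (JEP 400) file.encoding defaults to UTF-8 on every platform, and the operating system's own encoding is reported by the native.encoding property (JDK 17+). The class name ShowEncoding is just for illustration.

```java
import java.nio.charset.Charset;

public class ShowEncoding {
    // Value of the file.encoding system property as seen by this JVM.
    static String fileEncoding() {
        return System.getProperty("file.encoding");
    }

    // The JVM's default charset, used by String.getBytes() and new String(byte[])
    // whenever no charset is passed explicitly.
    static Charset defaultCharset() {
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + fileEncoding());
        System.out.println("default charset = " + defaultCharset().name());
        // native.encoding exists on JDK 17+; on older JDKs this prints null.
        System.out.println("native.encoding = " + System.getProperty("native.encoding"));
    }
}
```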
Step two: when we compile the source with javac, the JDK first checks the -encoding compiler option to determine the source character set. If that option is not given, the JDK falls back to the operating system default, the file.encoding value. It then converts the source text from that encoding into Java's internal Unicode representation in memory.
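To make step two concrete, here is a small simulation (an assumption-laden sketch, not what javac literally runs) of what happens when the source bytes on disk and the encoding the compiler assumes disagree. The text uses GBK as its example. The sketch uses UTF-8 and ISO-8859-1 instead because every JVM is guaranteed to support those two, but the failure is the same one you get from compiling a GBK file as UTF-8 or vice versa. In practice the fix is to state the encoding explicitly, for example javac -encoding UTF-8 Hello.java.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceDecoding {
    // Pretend these bytes are a source file that the editor saved as UTF-8.
    static byte[] savedSource(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode the bytes the way the compiler would under a given -encoding value.
    static String compilerView(byte[] bytes, Charset assumed) {
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        byte[] onDisk = savedSource("中文");
        // Matching encoding: the literal survives.
        System.out.println(compilerView(onDisk, StandardCharsets.UTF_8));
        // Mismatched encoding: each UTF-8 byte becomes its own Latin-1 character.
        System.out.println(compilerView(onDisk, StandardCharsets.ISO_8859_1));
    }
}
```

Note that the garbled version has six characters instead of two: the compiler has no way to tell that it guessed wrong, so the mistake is baked into the .class file.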
Step three: the JDK writes the compiled in-memory information to a .class file. String constants in the .class file are stored in a Unicode-based form (the class file format uses a modified UTF-8), so whether the characters are Chinese or English, the class file no longer depends on the encoding the source file was saved in.
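The sketch below illustrates the two Unicode forms involved, assuming nothing beyond the standard library. In memory a String is a sequence of UTF-16 code units, while DataOutputStream.writeUTF produces modified UTF-8 (a two-byte length prefix, then UTF-8 except that \u0000 is written as two bytes). That is the same encoding family the class file format uses for string constants; writeUTF is used here only as a convenient way to see it, not because the compiler calls it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class InternalForms {
    // Number of UTF-16 code units the String occupies in memory.
    static int utf16Units(String s) {
        return s.length();
    }

    // Bytes produced by writeUTF: 2-byte length prefix + modified UTF-8 body.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String s = "中a";
        System.out.printf("first char: U+%04X%n", (int) s.charAt(0));
        System.out.println("UTF-16 units:   " + utf16Units(s));
        System.out.println("UTF-8 bytes:    " + s.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("writeUTF bytes: " + modifiedUtf8(s).length);
        // \u0000 is where modified UTF-8 differs from standard UTF-8 (2 bytes vs 1).
        System.out.println("writeUTF(\"a\\u0000\") bytes: " + modifiedUtf8("a\u0000").length);
    }
}
```

Whatever encoding the source was saved in, "中" ends up as the single code unit U+4E2D in memory and as the same three body bytes in the modified UTF-8 form.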
JSP source files are handled a little differently at this step. The web container calls the JSP compiler, which first checks whether the JSP file declares its encoding (for example, through the pageEncoding attribute of the page directive). If it does not, the JSP compiler has the JDK convert the JSP file into a temporary servlet class using the default encoding, then compiles that into a .class file, which is kept in the container's temporary folder.
Step four: run the compiled classes. There are three cases:
1. Running directly on the console.
2. JSP/servlet classes.
3. Between Java classes and the database.
Each of these three cases handles encoding differently.
1. Classes run on the console
In this case, the JVM first reads the class file from the operating system into memory, where the class is held in Unicode, and then runs it. If the program needs input from the user, that input arrives in the file.encoding encoding and is converted to Unicode before being stored in memory. When the program produces results, they are converted back to the file.encoding encoding and handed to the operating system, which displays them. In short: input (file.encoding) -> Unicode in memory -> output (file.encoding).
Every encoding conversion along this path must be correct; if any single one is wrong, the result is garbled text.
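A minimal sketch of the console case, assuming the terminal sends UTF-8 bytes (substitute GBK or whatever your console really uses). Instead of relying on defaults, it names the charset explicitly when wrapping the input stream and the output stream, so decoding on the way in and encoding on the way out use the same charset. The "typed" bytes are simulated with a ByteArrayInputStream so the example runs without a real terminal; readLine is a hypothetical helper, not a library method.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleRoundTrip {
    // Decode one line of input bytes with the given charset into an in-memory (UTF-16) String.
    // The reader is deliberately not closed: with System.in you would not want to close it.
    static String readLine(InputStream in, Charset cs) {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset consoleCs = StandardCharsets.UTF_8;      // assumption: the terminal uses UTF-8
        byte[] typed = "你好\n".getBytes(consoleCs);      // bytes the terminal would send

        String good = readLine(new ByteArrayInputStream(typed), consoleCs);
        String bad = readLine(new ByteArrayInputStream(typed), StandardCharsets.ISO_8859_1);

        // Encode output with the same charset (this PrintStream constructor needs JDK 10+).
        PrintStream out = new PrintStream(System.out, true, consoleCs);
        out.println(good);   // decoded with the matching charset
        out.println(bad);    // decoded with the wrong charset: garbled
    }
}
```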
2. Servlet classes
Because JSP files are ultimately converted into servlets (only where they are stored differs), JSP files are covered here as well.
When a user requests a servlet, the web container has its JVM run the servlet. The JVM first loads the servlet's class into memory, where the servlet code is held in Unicode, and then runs it. If the servlet needs data sent by the client (such as form fields or URL parameters), the web container receives that data. If the program has set an encoding for the incoming parameters, the container decodes them with that encoding; otherwise it uses the default ISO-8859-1 (in many containers, how the URL query string is decoded is a separate container setting). The JVM then converts the received data to Unicode and stores it in memory. The servlet's output is also Unicode. When the web container sends that output to the client, it uses the encoding the program specified for the response, or ISO-8859-1 if none was specified. In short: request bytes (declared encoding or ISO-8859-1) -> Unicode in memory -> response bytes (declared encoding or ISO-8859-1).
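In a real servlet the usual fix is to call request.setCharacterEncoding("UTF-8") before reading any parameter, and to declare the output encoding with something like response.setContentType("text/html; charset=UTF-8"). The sketch below does not use the Servlet API, so it can run standalone. It simulates the default path, assuming the browser sent UTF-8 because the page was served as UTF-8, and shows the well-known repair for a parameter that was already decoded as ISO-8859-1. The repair works only because ISO-8859-1 maps every byte 0x00 to 0xFF to exactly one character, so no information has been lost yet. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class RequestParamRepair {
    // What a container does when no request encoding is set: decode the raw bytes as ISO-8859-1.
    static String decodeAsContainerDefault(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Workaround: turn the mis-decoded String back into the original bytes,
    // then decode those bytes with the charset the browser really used.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] sentByBrowser = "张三".getBytes(StandardCharsets.UTF_8); // assumption: page served as UTF-8
        String garbled = decodeAsContainerDefault(sentByBrowser);
        System.out.println(garbled);          // garbled text
        System.out.println(repair(garbled));  // original name restored
    }
}
```

Setting the encoding up front is still the better design: the repair trick has to be applied to every parameter, and it silently breaks if the container's default is ever changed.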
3. Database Section
Java programs connect to the database through a JDBC driver, and many drivers (older ones in particular) default to the ISO-8859-1 encoding unless a character set is configured. In that case, when a Java program passes data to the database, the driver first converts it from Unicode to ISO-8859-1 and then stores it, so by default the data is saved in the database as ISO-8859-1.
-----Original article: http://cmsblogs.com/?p=1475. Please respect the author's work and credit the source when reposting.
-----Personal site: http://cmsblogs.com
Solving garbled Chinese text in Java (IV): The Java encoding conversion process