Detailed process of Java character encoding and conversion

Source: Internet
Author: User

Common Java programs fall into the following categories:
* Classes that run directly on the console (including classes with a visual interface)
* JSP code classes (note: a JSP is a variant of a servlet class)
* Servlet classes
* EJB classes
* Other support classes that cannot be run directly

These class files may contain Chinese strings, and the first three kinds of Java program commonly interact directly with users to input and output characters. For example, JSPs and servlets receive characters sent from the client, and these may include Chinese characters. Regardless of the role a Java class plays, its lifecycle is as follows:

* The programmer uses an editor on some operating system to write the source code and saves it on that system with the .java extension. For example, you can use Notepad to edit a Java source program in Win2k;
* The programmer uses javac.exe from the JDK to compile the source code into .class files (JSP files are compiled by the container, which calls the JDK);
* The classes are run directly, or deployed to a web container and run there, and the results are output.
During these steps, how do the JDK and the JVM encode, decode, and run these files?

Here, the Chinese Win2k operating system is used as an example to illustrate how Java classes are encoded and decoded.

Step 1: in Win2k, we write a Java source program file (any of the five kinds of Java program listed above) with an editor such as Notepad. By default, the file is saved in GBK, the default encoding of the operating system (held in the file.encoding system property), to form a .java file. In other words, before the Java program is compiled, the source file is stored in the operating system's default file.encoding format. The Java source program contains both Chinese characters and English program code. To view the system's file.encoding parameter, you can use the following code:
public class ShowSystemDefaultEncoding {
    public static void main(String[] args) {
        // file.encoding holds the default encoding the JVM picked up at startup
        String encoding = System.getProperty("file.encoding");
        System.out.println(encoding);
    }
}

Step 2: we use javac.exe from the JDK to compile the Java source program. Because the JDK is an international release, if we do not specify the encoding of the source file at compile time, javac first obtains the operating system's default encoding, that is, the file.encoding parameter (GBK on Chinese Win2k). The JDK then converts the source program from the file.encoding format to Java's internal Unicode format and places it in memory. javac compiles the converted Unicode text into a .class file. At this point the class file is Unicode encoded and held temporarily in memory; the JDK then saves it to the operating system to form the .class file we see. The resulting .class file therefore stores its content in a Unicode-based format (string constants in a class file are kept in a modified UTF-8 form). The Chinese strings from our source program are still in it, but they have been converted from the file.encoding format to Unicode.
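
The decoding that javac applies to the source bytes can be imitated in a few lines. The sketch below is not javac's implementation: the class name SourceDecodingSketch and its helper decode are hypothetical, and it assumes the running JDK provides the GBK charset. It decodes the GBK bytes of two Chinese characters once with the right charset and once with the wrong one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceDecodingSketch {
    // The characters U+4E2D U+6587 as a GBK-encoded source file
    // stores them on disk: D6 D0 CE C4.
    static final byte[] GBK_BYTES = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

    // Decode raw file bytes into Java's in-memory Unicode string form.
    static String decode(byte[] raw, Charset sourceEncoding) {
        return new String(raw, sourceEncoding);
    }

    public static void main(String[] args) {
        String right = decode(GBK_BYTES, Charset.forName("GBK"));
        String wrong = decode(GBK_BYTES, StandardCharsets.ISO_8859_1);
        // Correct charset: each byte pair becomes one character.
        System.out.println(right.length() + " " + Integer.toHexString(right.charAt(0)));
        // Wrong charset: every byte becomes its own unrelated character.
        System.out.println(wrong.length() + " " + Integer.toHexString(wrong.charAt(0)));
    }
}
```

With GBK the four bytes become two characters; with ISO-8859-1 they become four, which is the kind of garbling a mismatched source encoding produces at compile time.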

In this step, JSP source files are handled differently. The web container calls the JSP compiler, which first checks whether a file encoding is set in the JSP file. If none is set, the JSP compiler calls the JDK to convert the JSP file into a temporary servlet class using the JVM's default character encoding (that is, the default file.encoding of the operating system hosting the web container); the servlet is then compiled into a Unicode class and saved in a temporary folder. For example, on Chinese Win2k, the web container converts the JSP file from GBK to Unicode and then compiles it into a temporarily saved servlet class that responds to user requests.

Step 3: run the classes compiled in step 2. There are four cases:

A. Classes run directly on the console
B. EJB classes and support classes that cannot be run directly (such as JavaBean classes)
C. JSP code and servlet classes
D. Between Java programs and databases

Each case is discussed below.

A. Classes run directly on the console

In this case a JVM is needed to run the class, so a JRE must be installed on the operating system. The process is as follows. First, the java command starts the JVM, which reads the class file stored on the operating system into memory; the class in memory is in Unicode format, and the JVM runs it. If the class needs to receive user input, it by default decodes the entered string using the file.encoding format, converts it to Unicode, and keeps it in memory (you can set the encoding of the input stream). After the program runs, the generated string (Unicode encoded) is handed back to the JVM, and the JRE converts it to the file.encoding format (you can set the encoding of the output stream), passes it to the operating system's display interface, and outputs it there.

Each conversion step above must use the correct encoding to avoid garbled characters.
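
The output-side conversion can be observed by capturing what a PrintStream writes. This is a minimal sketch, not how the JRE wires up the console: OutputEncodingSketch and encodeOnOutput are hypothetical names, and an in-memory buffer stands in for the display. It assumes the GBK charset is available in the running JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;

class OutputEncodingSketch {
    // Print text through a PrintStream bound to the named charset and
    // return the bytes that would reach the console or a file.
    static byte[] encodeOnOutput(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charsetName)) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, Unicode in memory
        System.out.println(encodeOnOutput(text, "GBK").length);   // 2 bytes per character
        System.out.println(encodeOnOutput(text, "UTF-8").length); // 3 bytes per character
    }
}
```

The same in-memory string yields different bytes depending on the output encoding, so the terminal or file reader must expect the encoding the stream actually used.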

B. EJB classes and support classes that cannot be run directly (such as JavaBean classes)

EJB classes and support classes that cannot be run directly generally do not interact with users for input and output; they usually exchange input and output with other classes. After being compiled in step 2, they are saved on the operating system as classes whose content is Unicode encoded. As long as nothing is lost when parameters are passed between them and other classes, they will run correctly.

C. JSP code and servlet classes

After step 2, a JSP file has also been converted into a servlet class file. Unlike a standard servlet, it does not live in the classes directory but in the web container's temporary directory. In this step we therefore treat it as a servlet as well.

For a servlet, when a client requests it, the web container calls its JVM to run the servlet. The JVM first reads the servlet class from the system and loads it into memory, where the servlet class code is Unicode encoded, and then runs it. If the servlet needs to accept characters from the client while running, such as values entered in a form or passed in the URL, and no encoding is set for this, the web container by default decodes the incoming values as ISO-8859-1 and converts them to Unicode in the JVM memory of the web container. After the servlet runs, the output it generates is in Unicode format. The container then sends the Unicode string generated by the servlet (such as HTML markup and user output strings) to the client browser for display. If an output encoding has been specified, the string is sent to the browser in that encoding; if not, it is sent in ISO-8859-1 by default.
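
The effect of the container's ISO-8859-1 default on an incoming parameter can be reproduced without a servlet container. The following sketch uses hypothetical names (ParameterDecodingSketch, decodeAsContainerDefault, recover) and assumes the browser sent the value as UTF-8 bytes. It shows the garbled result and a common way to recover from it by re-encoding to the original bytes and decoding again.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ParameterDecodingSketch {
    // What happens when no request encoding is set:
    // each byte is treated as one Latin-1 character.
    static String decodeAsContainerDefault(byte[] wireBytes) {
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte value to exactly one char, so the
    // original bytes can be recovered and decoded with the real charset.
    static String recover(String garbled, Charset actualEncoding) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), actualEncoding);
    }

    public static void main(String[] args) {
        String typed = "\u4e2d\u6587";
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8); // assumed browser encoding
        String garbled = decodeAsContainerDefault(wire);
        System.out.println(garbled.equals(typed));                                  // false
        System.out.println(recover(garbled, StandardCharsets.UTF_8).equals(typed)); // true
    }
}
```

Recovery works only because the wrong decode was ISO-8859-1, which loses no bytes. Setting the request encoding before reading parameters avoids the problem instead of repairing it.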

D. Between Java programs and databases

For almost all database JDBC drivers, data passed between a Java program and the database uses ISO-8859-1 as the default encoding. So when our program stores data containing Chinese characters, JDBC first converts the data from the program's internal Unicode format to ISO-8859-1 and then passes it to the database, which by default also saves it as ISO-8859-1. This is why Chinese data read back from the database is so often garbled.
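
Why a Unicode-to-ISO-8859-1 conversion damages Chinese text can be shown without a database. This sketch does not involve a real JDBC driver; LossyEncodingSketch and toLatin1 are hypothetical names that only mimic the conversion step described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LossyEncodingSketch {
    // Encode a Unicode string as ISO-8859-1, as a driver defaulting to
    // that encoding would before sending data to the database.
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] stored = toLatin1("\u4e2d\u6587");
        // Characters that ISO-8859-1 cannot represent are replaced by '?' (byte 63).
        System.out.println(Arrays.toString(stored)); // [63, 63]
        System.out.println(new String(stored, StandardCharsets.ISO_8859_1)); // ??
    }
}
```

Unlike the wrong-decode case, this loss cannot be undone afterwards, because the original characters were never written.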

3. Several principles that must be clarified when analyzing common Java Chinese problems

First, the detailed analysis above shows that for any Java program the key encoding conversions in its lifecycle are the one performed when it is first compiled into a class file and the one performed when it finally outputs to the user.
Second, we must know the common encodings that Java supports at compile time:
* ISO-8859-1, 8-bit, also known by aliases such as 8859_1 and ISO_8859_1
* Cp1252, the American English encoding, the same as the ANSI standard code page
* UTF-8, an encoding of Unicode
* GB2312, also known as gb2312-80, gb2312-1980, and so on
* GBK, also known as MS936, an extension of GB2312
There are also other encodings, such as those for Korean, Japanese, and traditional Chinese. We should also note the following compatibility relationships between these encodings:
Unicode and UTF-8 correspond one to one. GB2312 can be regarded as a subset of GBK; that is, GBK is an extension of GB2312. GBK contains 20902 Chinese characters, with a code range of 0x8140-0xFEFE, and all of these characters map one to one to Unicode 2.0.
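
These relationships can be checked against the charsets shipped with the JDK. The sketch below is illustrative only: the class and method names are hypothetical, it assumes the JDK provides both GB2312 and GBK, and U+9555 is picked as an assumed example of a character that GBK covers but GB2312-80 does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetCompatibilitySketch {
    static final Charset GB2312 = Charset.forName("GB2312");
    static final Charset GBK = Charset.forName("GBK");

    // For text inside GB2312, GBK produces identical bytes (subset relation).
    static boolean sameBytes(String text) {
        return Arrays.equals(text.getBytes(GB2312), text.getBytes(GBK));
    }

    // Ask a charset whether it can represent a given character.
    static boolean canEncode(Charset cs, char c) {
        return cs.newEncoder().canEncode(c);
    }

    // UTF-8 can represent any Unicode string and decode it back unchanged.
    static boolean utf8RoundTrips(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(text);
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("\u4e2d\u6587"));
        System.out.println(canEncode(GB2312, '\u9555'));
        System.out.println(canEncode(GBK, '\u9555'));
        System.out.println(utf8RoundTrips("\u4e2d\u6587 abc"));
    }
}
```

Because GBK is the superset, declaring GBK is the safer choice whenever it is unclear whether a text stays within GB2312.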

Third, for a .java source file stored on the operating system, we can specify the encoding of its content at compile time with the -encoding option. Note that if the source program contains Chinese characters and -encoding names some other encoding, the result is obviously wrong. If -encoding specifies the source file encoding as GBK or GB2312, then no matter which system we compile on, a Java source program containing Chinese characters will have its Chinese correctly converted to Unicode and stored in the class file.

Finally, we must be clear that almost all web containers use ISO-8859-1 as their default internal character encoding, while almost all browsers pass parameters in UTF-8 by default. Therefore, even though the Java source file specifies the correct encoding on the way in, its data is still handled as ISO-8859-1 when running inside the container.

4. Classification of Chinese problems and recommended optimal solutions

With the Java file-processing principles above in hand, we can propose a set of recommended methods that best solve the Chinese character problem.
Our goal is this: a Java source program that contains Chinese strings or processes Chinese, compiled on a Chinese system, can be moved to any other operating system and run correctly, or can be compiled on another operating system and run correctly, passing Chinese and English parameters correctly and exchanging Chinese and English strings with the database correctly.
Our specific approach is to enforce the correct encoding at the points where a Java program's transcoding begins and ends, and wherever the program converts user input or output.

The specific solution is as follows:

1. For classes running directly on the console
In this case, if the program needs to receive input from the user, or produce output for the user, that may contain Chinese characters, we recommend using character streams to handle the input and output. Specifically, the following character node stream types apply:
File: FileReader, FileWriter
the byte node stream types are FileInputStream, FileOutputStream
Memory (array): CharArrayReader, CharArrayWriter
the byte node stream types are ByteArrayInputStream, ByteArrayOutputStream
Memory (string): StringReader, StringWriter
Pipe: PipedReader, PipedWriter
the byte node stream types are PipedInputStream, PipedOutputStream
In addition, the following character processing streams should be used:
BufferedWriter, BufferedReader
the byte processing streams are BufferedInputStream, BufferedOutputStream
InputStreamReader, OutputStreamWriter
the byte processing streams are DataInputStream, DataOutputStream
InputStreamReader and OutputStreamWriter convert between byte streams and character streams according to a specified character set, for example:
InputStreamReader in = new InputStreamReader(System.in, "gb2312");
OutputStreamWriter out = new OutputStreamWriter(System.out, "gb2312");
For example, the following sample Java program meets these requirements:

// Read.java
import java.io.*;

public class Read {
    public static void main(String[] args) throws IOException {
        String str = "\nChinese test, which is an internal hard-coded string" + "\ntest English character";
        String strin = "";
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in, "gb2312")); // decode console input as Chinese
        BufferedWriter stdout = new BufferedWriter(new OutputStreamWriter(System.out, "gb2312")); // encode console output as Chinese
        stdout.write("Enter:");
        stdout.flush();
        strin = stdin.readLine();
        stdout.write("this is from the user input string:" + strin);
        stdout.write(str);
        stdout.flush();
    }
}
At the same time, we compile the program as follows:
javac -encoding gb2312 Read.java
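
The same wrapping applies to the file node streams listed above. The sketch below is an illustration under stated assumptions rather than part of the original program: FileEncodingSketch and its methods are hypothetical names, a temporary file stands in for real data, and the GBK charset is assumed to be available. It writes a string through an OutputStreamWriter with an explicit encoding and reads it back through a matching InputStreamReader.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class FileEncodingSketch {
    // Write text to a file, encoding characters with the named charset.
    static void write(File file, String text, String charsetName) {
        try (Writer w = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), charsetName))) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the whole file back, decoding bytes with the named charset.
    static String read(File file, String charsetName) {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charsetName))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write then read with the same charset; true if the text survives.
    static boolean roundTrips(String text, String charsetName) {
        try {
            File tmp = File.createTempFile("encoding-sketch", ".txt");
            try {
                write(tmp, text, charsetName);
                return read(tmp, charsetName).equals(text);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("\u4e2d\u6587 test", "GBK"));
    }
}
```

Naming the charset on both sides makes the file independent of the platform's file.encoding, which FileReader and FileWriter would otherwise fall back to.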

2. For EJB classes and support classes that cannot be run directly (such as JavaBean classes)

Because these classes are called by other classes and do not interact with users directly, we recommend that they use character streams internally to process Chinese strings in the program (as in the previous section). When compiling the classes, use the -encoding gb2312 parameter to indicate that the source files are encoded in a Chinese format.

3. For servlets

For servlets, we recommend the following approach:

When compiling the servlet's source program, use -encoding to specify GBK or GB2312 as the encoding. In the servlet, call response.setContentType("text/html; charset=GBK") (or gb2312) to set the output encoding, and call request.setCharacterEncoding("gb2312") before reading user input. This way, whichever operating system the servlet class is ported to, Chinese displays correctly as long as the client's browser supports Chinese. The following is a correct example:

// HelloWorld.java
package hello;

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class HelloWorld extends HttpServlet {
    public void init() throws ServletException {}

    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        request.setCharacterEncoding("gb2312"); // set the input encoding
        response.setContentType("text/html; charset=gb2312"); // set the output encoding
        PrintWriter out = response.getWriter(); // PrintWriter output is recommended
        out.println("Hello world! This is created by Servlet! Test Chinese!");
    }

    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        request.setCharacterEncoding("gb2312"); // set the input encoding
        response.setContentType("text/html; charset=gb2312"); // set the output encoding
        String name = request.getParameter("name");
        String id = request.getParameter("id");
        if (name == null) name = "";
        if (id == null) id = "";
        PrintWriter out = response.getWriter(); // PrintWriter output is recommended
        out.println("your input Chinese string is:" + name);
        out.println("the id you entered is:" + id);
    }

    public void destroy() {}
}
Use javac -encoding gb2312 HelloWorld.java to compile this program.

4. Between Java programs and databases

To avoid garbled data when transferring between Java programs and databases, we recommend the following:
1. Handle encoding in the Java program using the methods described above.
2. Change the database's default encoding to GBK or GB2312.

For example, in MySQL, we can edit the configuration file my.ini.
Add the following in the [mysqld] section:
default-character-set=GBK
And add:
[client]
default-character-set=GBK
In SQL Server 2K, we can set the default language of the database to Simplified Chinese.

5. JSP code

Because JSP files are compiled dynamically by the web container at run time, if the encoding of the JSP source file is not specified, the JSP compiler uses the file.encoding value of the server's operating system to compile the JSP. This is where porting problems most often arise. For example, a JSP file that runs well on Chinese Win2k may fail on English Linux even though the client is the same, because the container obtains a different operating system encoding when compiling the JSP file (file.encoding on Chinese Win2k differs from file.encoding on English Linux, and the English Linux file.encoding does not support Chinese, so the compiled JSP class is faulty). Most of the problems discussed online are of this kind: a JSP file displays incorrectly after being ported to another platform. Once we understand how Java converts encodings, these problems are much easier to solve. The recommended solution is as follows:

1. We must ensure that the JSP is output to the client in a Chinese encoding. In every case, first add the following line to the JSP source code:

<%@ page contentType="text/html; charset=gb2312" %>

2. For the JSP to obtain input parameters correctly, add the following line to the head of the JSP source file:

<% request.setCharacterEncoding("gb2312"); %>

3. For the JSP compiler to decode a JSP file containing Chinese characters correctly, we must specify the encoding of the JSP source file in the file itself. Specifically, add the following line to the head of the JSP source file:
<%@ page pageEncoding="gb2312" %>
Or
<%@ page pageEncoding="GBK" %>
This is a newly added instruction in JSP specification 2.0.
We created
