Encoding problems in Java

Source: Internet
Author: User
Tags app service

I have been trying to understand character encoding in Java for a long time. I read a number of articles online, but it all stayed foggy. Only recently, after watching Fanglixun's web course, did I begin to understand it a little.

These notes record my understanding, to check whether I can explain it clearly.

The first question: when I define a string in Java code, how is it encoded?

A string is essentially a char array, so the encoding of char is the encoding of the string. Which encoding does char use, then? Why does converting '中' to int give 20013?

        char c = '中';
        System.out.println((int) c); // 20013

The first concept: Unicode

Unicode is an "encoding". An encoding in this sense is a one-to-one mapping between numbers and characters, and nothing more than a mapping.

The encoding used by Java strings is Unicode. (More precisely, each char holds one 16-bit UTF-16 code unit, which for a character such as '中' is the same as its Unicode number.)

Here is a simple example to show this:

Use a conversion tool to turn the character '中' into its Unicode escape; the result is '\u4e2d'. Replace the original '中' with '\u4e2d'. The printed result is the same as for '中'.

        char c = '\u4e2d';
        System.out.println((int) c); // 20013

What does '\u' mean?

'\u' introduces a Unicode escape. It is followed by four hexadecimal digits that give the Unicode number of the character. The following code verifies that hexadecimal 4e2d is 20013.

        Integer num = Integer.valueOf("4e2d", 16); // 20013
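The checks above can be gathered into one small program. This is only a sketch: the class name CharCodeDemo and the helper names codeOf and parseHex are made up for illustration, and the character is written as an escape so the result does not depend on the encoding of the source file.

```java
class CharCodeDemo {
    // Returns the number stored in a char.
    static int codeOf(char c) {
        return c; // a char widens to int without an explicit cast
    }

    // Parses the hexadecimal digits of a Unicode escape into a number.
    static int parseHex(String hexDigits) {
        return Integer.parseInt(hexDigits, 16);
    }

    public static void main(String[] args) {
        char c = '\u4e2d';
        System.out.println(codeOf(c));                      // 20013
        System.out.println(parseHex("4e2d"));               // 20013
        System.out.println(Integer.toHexString(codeOf(c))); // 4e2d
    }
}
```

Both routes arrive at the same number, which is what one would expect if the escape is just another way of writing the character's Unicode number.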

Second question: What is the difference between encoding and encoding format?

This part comes from the coding and encoding format problems in Java. wind

1. Unicode is an "encoding": a one-to-one mapping between numbers and characters, and nothing more than a mapping.

2. GBK and UTF-8 are "encoding formats": formats used to serialize or store, as bytes, the numbers mentioned in 1.

Encoding and encoding format:

Java strings use Unicode. While a string lives in memory (that is, while code manipulates it through a String reference), it has an encoding but no encoding format. Saying that a String object in a Java program "is GBK" or "is UTF-8" is therefore wrong: a String needs no encoding format in memory, it is simply a Unicode string.

An encoding format is needed only when the string has to be transmitted over the network or written to a file. That is also where garbled text comes from.
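A minimal sketch of that boundary, assuming nothing beyond the standard library: the String itself carries no encoding format, and one only appears when the text is turned into bytes and back. The class name EncodeDecodeDemo and its two helpers are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDecodeDemo {
    // Serializes text to bytes with the given encoding format.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Rebuilds text from bytes, interpreting them with the given encoding format.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d"; // the character used in the earlier examples
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);

        // Same format on both sides: the text survives the round trip.
        System.out.println(text.equals(decode(utf8, StandardCharsets.UTF_8)));      // true
        // Different format on the receiving side: the text comes back garbled.
        System.out.println(text.equals(decode(utf8, StandardCharsets.ISO_8859_1))); // false
    }
}
```

The second result is the typical source of garbled text: the bytes are intact, but they are read with a different format from the one used to write them.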

GBK and UTF-8:

Both GBK and UTF-8 are used to serialize or store Unicode-encoded data, but they are two different formats. Besides the format itself, the range of Unicode characters each one covers is also different.

UTF-8 takes the characters of many countries into account and covers the entire Unicode code table, so the number of bytes it uses to store one character varies from 1 to 4.

GBK considers only Chinese, a small part of the Unicode character set. Two bytes can represent more than 60,000 values, which is enough for the vast majority of commonly used Chinese characters, so GBK stores each Chinese character in 2 bytes (ASCII characters take 1 byte).
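The byte lengths can be checked directly. The sketch below assumes the JDK in use ships the GBK charset (full JDK distributions normally do); ByteLengthDemo and byteLength are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Number of bytes needed to store the text in the given encoding format.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        String ascii = "A";
        String chinese = "\u4e2d";

        System.out.println(byteLength(ascii, StandardCharsets.UTF_8));   // 1
        System.out.println(byteLength(chinese, StandardCharsets.UTF_8)); // 3
        System.out.println(byteLength(ascii, gbk));                      // 1
        System.out.println(byteLength(chinese, gbk));                    // 2
    }
}
```

The same character therefore occupies a different number of bytes depending on the format, while its Unicode number stays the same.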

ASCII code and Unicode:

ASCII code, like Unicode encoding, is also an "encoding".

The ASCII code has a much smaller range: it defines codes for 128 characters.

Unicode is a very large set, with room for more than one million symbols. As its name suggests, it is meant to be an encoding that contains all symbols.

There are other encodings as well, but I have not used them and do not know much about them.

The third question: where is the encoding format used?

As mentioned earlier, an encoding format is needed when a string has to be transmitted over the network or written to a file.

Transmission over the network:

For Java web developers, transmission over the network means the Java web. Where does the encoding format need to be set in a Java web application?

First, a diagram (copied from the web). Assume that the browser is IE, the web server is Tomcat, the page is a JSP, and the application service is a servlet.

1. A page opened in the browser maps to a JSP of the application service. Which encoding format does the browser use to decode it?

The browser's decoding format is specified in the JSP, for example by two lines that often appear in a JSP file.

What does each of them mean, and how do they differ?

<!-- This line means that the content of this JSP is encoded in UTF-8 (that is, the content is UTF-8 encoded and sent to the browser). -->
<%@ page language="java" import="java.util.*" pageEncoding="UTF-8"%>

<!-- This line tells the browser: when you decode the page, decode it as UTF-8. -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

With this in place, the browser decodes the output of this JSP as UTF-8.

The decoding format can also be changed in the browser itself:

If a web page is garbled, try decoding it with a different encoding format; it may then display correctly.

2. The response can also set the encoding format of the content and specify the decoding format for the browser.

Developers know that a JSP is a special kind of servlet. In a servlet, the response has two encoding-related methods, which do the same job as the two encoding lines in the JSP.

They are:

// The response content is encoded in UTF-8 and sent to the browser.
response.setCharacterEncoding("UTF-8");

// Tell the browser to use UTF-8 when decoding as well.
response.setContentType("text/html;charset=UTF-8");

3. The request object lets you specify which encoding format the application service uses to decode the data it receives.

In the two cases above, the servlet encodes its own output and tells the browser how to decode it. The servlet can also specify which encoding format is used to decode the data it receives.

    The statement below specifies UTF-8 for decoding the data sent by the browser. It only takes effect for data submitted with POST. Data sent with GET is decoded as ISO-8859-1 by default.
request.setCharacterEncoding("UTF-8");

4. In the Tomcat configuration file server.xml, specify the encoding used to decode parameters sent by the browser.

    <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8" />

The attribute URIEncoding="UTF-8" does roughly the same job as the statement request.setCharacterEncoding("UTF-8"): it tells the Tomcat server how to decode what it receives. The difference is that URIEncoding applies to the URI, including the query parameters of a GET request, while request.setCharacterEncoding applies to the request body.

If it is not specified, the Tomcat server decodes with ISO-8859-1 by default. (This is the default in Tomcat 7 and earlier; Tomcat 8 and later default to UTF-8.)
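The effect of the server-side decoding charset can be imitated outside a servlet container. The sketch below is not Tomcat's implementation; it only uses java.net.URLEncoder and java.net.URLDecoder (the Charset overloads need Java 10 or later) to show what happens when a UTF-8 percent-encoded parameter is decoded as ISO-8859-1, and how such text can be repaired afterwards. QueryDecodingDemo and its method names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QueryDecodingDemo {
    // Percent-encodes a parameter value the way a UTF-8 page would send it.
    static String browserEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Decodes a percent-encoded value with the charset the server is configured to use.
    static String serverDecode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    // Repairs text whose UTF-8 bytes were wrongly decoded as ISO-8859-1.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expected = "\u4e2d";
        String sent = browserEncode(expected);
        System.out.println(sent); // %E4%B8%AD

        String wrong = serverDecode(sent, StandardCharsets.ISO_8859_1);
        System.out.println(expected.equals(wrong));         // false
        System.out.println(expected.equals(repair(wrong))); // true

        String right = serverDecode(sent, StandardCharsets.UTF_8);
        System.out.println(expected.equals(right));         // true
    }
}
```

The repair step works because ISO-8859-1 maps every byte to exactly one character, so the original bytes can be recovered. Configuring the server to decode with the right charset in the first place avoids the detour.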

Writing to a file:

1. Content is encoded when it is written to a file. The default encoding format I observed was GB2312 (this probably depends on the system; on my Simplified Chinese system the test showed GB2312).

You can also specify in code which encoding format the file stream uses when writing to the file, for example "UTF-8".

new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
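A small sketch of writing with an explicit encoding format and reading the file back with the same one. The temporary file and the names FileCharsetDemo, write and read are assumptions made for the example; IOException is wrapped so the helpers are easy to call.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class FileCharsetDemo {
    // Writes text to the file using an explicitly chosen encoding format.
    static void write(File file, String text, Charset charset) {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file and decodes its bytes with the given encoding format.
    static String read(File file, Charset charset) {
        try {
            return new String(Files.readAllBytes(file.toPath()), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("charset-demo", ".txt"); // throwaway file for the demo
        file.deleteOnExit();

        String text = "\u4f60\u597d"; // two Chinese characters
        write(file, text, StandardCharsets.UTF_8);

        System.out.println(file.length());                                  // 6
        System.out.println(text.equals(read(file, StandardCharsets.UTF_8))); // true
    }
}
```

Passing the charset explicitly on both sides removes the dependency on whatever default the system happens to use.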

2. Different software supports decoding different encoding formats for a file.

For example:

When a CSV file is opened directly, Excel (on a Simplified Chinese system) decodes it as GB2312 and does not recognize plain UTF-8.

Notepad supports UTF-8.

So sometimes, when I download a CSV file whose content was written in UTF-8, the following happens:

Opened in Excel, the Chinese text in the CSV file is garbled. Opened as a text file in Notepad, the Chinese displays correctly.

To avoid garbled text in Excel in this case, the code has to write the content in GB2312.

You can also change the encoding format of the file with Notepad's "Save as".

3. How do I determine the encoding of a file?

On Windows 7, I right-clicked to create a new text document and opened it in Notepad++. The encoding shown is ANSI.

I then generated a TXT file from Java code without explicitly specifying an encoding format and opened it in Notepad++. The encoding shown is UTF-8.
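When no encoding format is specified, Java falls back to the JVM's default charset, which depends on the platform and the JDK version (JDK 18 and later default to UTF-8). A minimal way to see which one is in effect, with DefaultCharsetDemo as an illustrative name:

```java
import java.nio.charset.Charset;

class DefaultCharsetDemo {
    // Name of the charset Java uses when no encoding format is given.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The printed value differs between machines and JDK versions.
        System.out.println(defaultCharsetName());
    }
}
```

Running this on the machine that produced the file is a quick way to explain which encoding an unspecified write ended up using.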

I ran a few small tests:

Test 1. Specify the UTF-8 encoding format

        // needs java.io.File, java.io.FileOutputStream, java.io.OutputStreamWriter
        String fullPath = "D:\\test2.txt";
        File file = new File(fullPath);
        if (!file.exists()) {
            file.createNewFile();
        }
        OutputStreamWriter write = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
        write.write("你好");
        write.flush();
        write.close();

The resulting file is encoded in UTF-8.

Test 2. Specify the GB2312 encoding format

        // needs java.io.File, java.io.FileOutputStream, java.io.OutputStreamWriter
        String fullPath = "D:\\test2.txt";
        File file = new File(fullPath);
        if (!file.exists()) {
            file.createNewFile();
        }
        OutputStreamWriter write = new OutputStreamWriter(new FileOutputStream(file), "GB2312");
        write.write("你好");
        write.flush();
        write.close();

The resulting file is encoded in GB2312.

Test 3. First generate the file with UTF-8 and output "你好", then append "你好" with GB2312 (opening the stream in append mode, new FileOutputStream(file, true)).

Summary:

The first write outputs "你好" in UTF-8. At this point the first "你好" in the file is UTF-8 encoded, and the editor decodes the file as UTF-8.

The second write appends "你好" in GB2312. Now the second "你好" is GB2312 encoded, and the editor switches to decoding the file as GB2312.

In other words, in this test the TXT file was decoded as a whole using the encoding format of the last write. A plain text file does not record its encoding format, so the editor has to pick one for the entire file.

So after the second "你好" is written, the first "你好" shows up garbled.
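The result of Test 3 can be reproduced in memory without an editor. The sketch below concatenates the same two-character text encoded once as UTF-8 and once in the Chinese national encoding, then decodes the whole byte sequence with one format at a time. It uses GBK, a superset of GB2312, as the decoder, on the assumption that this is close to what a Simplified Chinese Windows editor does; MixedEncodingDemo and mixedBytes are made-up names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MixedEncodingDemo {
    static final Charset GBK = Charset.forName("GBK"); // assumption: available in this JDK

    // Mimics a file written in UTF-8 and then appended to in GBK.
    static byte[] mixedBytes(String text) {
        byte[] first = text.getBytes(StandardCharsets.UTF_8);
        byte[] second = text.getBytes(GBK);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(first, 0, first.length);
        out.write(second, 0, second.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d";
        byte[] bytes = mixedBytes(text);
        System.out.println(bytes.length); // 10 (6 UTF-8 bytes + 4 GBK bytes)

        // Decoding everything as GBK: the appended part is fine, the first part is garbled.
        String asGbk = new String(bytes, GBK);
        System.out.println(asGbk.endsWith(text));   // true
        System.out.println(asGbk.startsWith(text)); // false

        // Decoding everything as UTF-8: the first part is fine, the appended part is garbled.
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(asUtf8.startsWith(text)); // true
        System.out.println(asUtf8.endsWith(text));   // false
    }
}
```

Whichever single format is chosen, one half of the content is unreadable, which matches the observation that mixing encoding formats in one text file garbles part of it.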
