What does the file.encoding property in Java really determine?

Source: Internet
Author: User

First, the code:

import java.util.Map.Entry;
import java.util.Properties;
import java.util.Set;

public class PropertiesTest {
	public static void main(String[] args) {
		System.out.println("file.encoding:" + System.getProperty("file.encoding"));
		System.out.println("sun.jnu.encoding:" + System.getProperty("sun.jnu.encoding"));
		Properties pro = System.getProperties();
		Set<Entry<Object, Object>> entrySet = pro.entrySet();
		// Uncomment to print every system property in turn:
		// for (Entry<Object, Object> entry : entrySet) {
		// 	System.out.println(entry.getKey() + ":" + entry.getValue());
		// }
	}
}
This code prints the values of the file.encoding and sun.jnu.encoding properties; the commented-out section prints the values of all system properties in turn.

So what are the file.encoding and sun.jnu.encoding properties in Java? Where do they come from, and what are they used for? Most of the answers I found online were vague, so I decided to investigate for myself.


Let's first look at what these two properties are for. After all, we need to know what they do before it is worth caring where they come from.

Here is the result of running the code above in Eclipse:


Now let's look at another piece of code:

import java.io.UnsupportedEncodingException;

public class PropertiesTest {
	public static void main(String[] args) throws UnsupportedEncodingException {
		System.out.println("file.encoding:" + System.getProperty("file.encoding"));
		System.out.println("sun.jnu.encoding:" + System.getProperty("sun.jnu.encoding"));
		encodingTest();
	}

	private static void encodingTest() throws UnsupportedEncodingException {
		String s = "我们是中国人"; // "We are Chinese"
		// With the no-argument getBytes() the result is certainly not garbled
		byte[] bytes = s.getBytes("UTF-8");
		// Decodes with the JVM default encoding
		String s2 = new String(bytes);
		System.out.println(s2);
	}
}
This first converts a string to bytes and then constructs a new string from those bytes. If neither getBytes() nor new String(bytes) is given an encoding, the text is not garbled, because both use the default encoding, as the method documentation describes:

The result: the Chinese text prints correctly.

Changing getBytes() to getBytes("UTF-8") still prints the Chinese text correctly.

Changing it again to getBytes("GBK") garbles the Chinese output.
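The experiment above can be reproduced without depending on the JVM default at all. The following is only a minimal sketch: `RoundTripDemo` and `roundTrip` are hypothetical names, not part of the original test class, and the Chinese text is written with Unicode escapes so that the result does not depend on how the source file itself is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encodes the text with one charset and decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "我们是中国人" written as Unicode escapes.
        String s = "\u6211\u4eec\u662f\u4e2d\u56fd\u4eba";
        Charset utf8 = StandardCharsets.UTF_8;
        Charset gbk = Charset.forName("GBK");

        // Matching charsets survive the round trip.
        System.out.println(s.equals(roundTrip(s, utf8, utf8))); // true
        System.out.println(s.equals(roundTrip(s, gbk, gbk)));   // true
        // Mismatched charsets garble the text, in either direction.
        System.out.println(s.equals(roundTrip(s, gbk, utf8)));  // false
        System.out.println(s.equals(roundTrip(s, utf8, gbk)));  // false
    }
}
```

The garbling comes from the mismatch between the encoding and decoding charsets, not from either charset on its own.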



So the default encoding appears to be UTF-8. To confirm this conjecture, let's trace the source code:

public byte[] getBytes() {
    return StringCoding.encode(value, 0, value.length);
}
Follow it into the encode method:

static byte[] encode(char[] ca, int off, int len) {
    String csn = Charset.defaultCharset().name();
    try {
        // use charset name encode() variant which provides caching.
        return encode(csn, ca, off, len);
    } catch (UnsupportedEncodingException x) {
        warnUnsupportedCharset(csn);
    }
    try {
        return encode("ISO-8859-1", ca, off, len);
    } catch (UnsupportedEncodingException x) {
        // If this code is hit during VM initialization, MessageUtils is
        // the only way we will be able to get any kind of error message.
        MessageUtils.err("ISO-8859-1 charset not available: " + x.toString());
        // If we can not find ISO-8859-1 (a required encoding) then things
        // are seriously wrong with the installation.
        System.exit(1);
        return null;
    }
}
Here defaultCharset() should return a Charset object, so we keep following:

public static Charset defaultCharset() {
    if (defaultCharset == null) {
        synchronized (Charset.class) {
            String csn = AccessController.doPrivileged(
                new GetPropertyAction("file.encoding"));
            Charset cs = lookup(csn);
            if (cs != null)
                defaultCharset = cs;
            else
                defaultCharset = forName("UTF-8");
        }
    }
    return defaultCharset;
}
As you can see, this method reads the file.encoding property and looks up the corresponding Charset object. If no Charset matches the property value, it falls back to the UTF-8 Charset (charset names are case-insensitive). This shows that the file.encoding property does determine the so-called default encoding, as long as its value is valid. As for the sun.jnu.encoding property, it is used less often, so I will keep you in suspense and cover it next time ^_^.
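To see this relationship on your own machine, a small sketch like the following prints the property next to the charset the JVM actually uses. `DefaultCharsetDemo` and `sameCharset` are hypothetical names used only for illustration. Note that the source traced above comes from an older JDK; on JDK 18 and later the default charset is UTF-8 unless it is overridden, so the values printed may differ from the ones in this article.

```java
import java.nio.charset.Charset;

class DefaultCharsetDemo {
    // Charset names are case-insensitive, so both names resolve to the same Charset.
    static boolean sameCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        // The raw property value, which may be any string.
        System.out.println("file.encoding property:   " + System.getProperty("file.encoding"));
        // The charset that getBytes() and new String(bytes) use when none is given.
        System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset().name());

        System.out.println(sameCharset("utf-8", "UTF-8")); // true
    }
}
```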


Now that we know what the property does, and that its effect is far from small, let's see what actually determines the file.encoding property.

(PS: In general, to make sure a program produces the same results wherever it runs, avoid relying on the default encoding; specify the encoding explicitly instead.)
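As an illustration of that advice, here is a minimal sketch that names the charset at every conversion point. The class `ExplicitCharsetDemo`, the helper `writeThenRead`, and the temporary file name are assumptions made for this example only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetDemo {
    // Writes the text as UTF-8 and reads it back as UTF-8,
    // so the JVM default encoding never takes part.
    static String writeThenRead(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("encoding-demo", ".txt");
        try {
            // "我们是中国人" written as Unicode escapes.
            String s = "\u6211\u4eec\u662f\u4e2d\u56fd\u4eba";
            System.out.println(s.equals(writeThenRead(tmp, s))); // true
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because both directions name StandardCharsets.UTF_8, the result should be the same whatever value file.encoding has.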

Many answers online say that the encoding of the source file containing the main method determines the file.encoding property.

Let's verify:

Select the class file in the Explorer view, or place the cursor in the file's editor, and press Alt+Enter to open the file's Properties page and see my settings:


It really is UTF-8, which is why my file.encoding property was UTF-8. Now I will change this setting:


After the change, the Chinese text in the file is of course garbled. Run the program and look at the result:


Sure enough, the file.encoding property became "ISO-8859-1".

Change the setting again, this time to US-ASCII, and after running the program the file.encoding property becomes "US-ASCII".

The conclusion would be that the value of file.encoding is the encoding of the source file in which the main method resides. The netizens really are clever.


But that would be jumping to conclusions.

So far we have done everything through Eclipse, which is indirect and always feels slightly off the mark. To really get to the bottom of this, we have to go to the console.


Change the file back to UTF-8 encoding so that the Chinese text in it is no longer garbled, leave getBytes() and new String(bytes) on the default encoding with no charset specified, and then compile and run the code directly from the console:


Something strange happens. This is the same code we first ran in Eclipse, yet the result has changed: file.encoding is now "GBK", even though the source file containing the main method is clearly encoded as UTF-8. The Chinese text is garbled as well. What is going on?

Set the garbling aside for a moment and look at the change in file.encoding first. It became "GBK" while the file is still encoded as UTF-8, so the earlier conclusion does not hold up.


Next, let's set the file.encoding property directly at run time by adding the JVM argument "-Dfile.encoding=utf-8":


This time the value of file.encoding changed to utf-8, which shows that file.encoding can indeed be set. Now let's try setting an encoding name that does not exist:


This works too, and the garbling stays exactly the same throughout. So file.encoding can be set to any value, and the garbling is not caused by the file.encoding property.
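Why does a made-up encoding name not cause a failure? In the defaultCharset() source traced earlier, an unknown name simply falls back to UTF-8. The sketch below imitates that fallback using public API only; `CharsetFallbackDemo` and `resolveOrUtf8` are hypothetical names, and this is not the JDK's own code. Newer JDKs may treat unsupported file.encoding values differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetFallbackDemo {
    // Returns the named charset if the JVM supports it, otherwise UTF-8,
    // imitating the fallback seen in the older defaultCharset() source.
    static Charset resolveOrUtf8(String name) {
        try {
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalArgumentException e) {
            // The name is syntactically illegal (for example, an empty string).
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(resolveOrUtf8("ISO-8859-1").name());       // ISO-8859-1
        System.out.println(resolveOrUtf8("no-such-encoding").name()); // UTF-8
    }
}
```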


From this we can guess why, in Eclipse, the file.encoding property matches the encoding of the file containing the main method: when Eclipse starts the Java virtual machine, it automatically passes the encoding from the file's properties as a run-time argument.
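One way to check this guess is to ask the running JVM which arguments it was started with. The sketch below uses the standard RuntimeMXBean for that; `JvmArgsDemo` and `findFileEncodingArg` are hypothetical names introduced for the example.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmArgsDemo {
    // Returns the first "-Dfile.encoding=..." argument, or null if there is none.
    static String findFileEncodingArg(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Dfile.encoding=")) {
                return arg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String found = findFileEncodingArg(jvmArgs);
        System.out.println(found != null ? found : "no -Dfile.encoding argument was passed");
    }
}
```

Running it once from Eclipse and once from the console should show whether the launcher added the argument.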


We know that a file is stored on disk in binary form. We saved the Java source file as UTF-8 and then ran javac by hand. Because we did not specify an encoding, the compiler had no way of knowing which encoding to use to parse the file, so it could only use the system default encoding. (This is the default encoding of Windows, which is not the same thing as the JVM default encoding mentioned earlier.) Let's look at the default encoding on Windows:


Here 936 indicates that the default encoding is GBK.

So the problem is clearly here: a file saved as UTF-8 and parsed as GBK is bound to be garbled. Now add the compiler option "-encoding utf-8", then recompile and run:


The run-time value of file.encoding is still GBK, yet the output is no longer garbled. The garbling was indeed introduced at compile time; the run-time side is fine.

We can verify this another way. Back in Eclipse, modify the code and rerun:


The garbled output is identical to the one above. This proves that "鎴戜滑鏄腑鍥戒汉" really is what results when "我们是中国人" ("We are Chinese") is saved as UTF-8 and then read as GBK.

The difference is that earlier the error happened at compile time: after compilation the string had already become "鎴戜滑鏄腑鍥戒汉", and nothing went wrong at run time.

This time the Eclipse compilation is fine, but the program encodes "我们是中国人" to bytes as UTF-8 and then decodes those bytes as GBK, which produces the same garbling.


So we can guess that Eclipse adds the corresponding encoding argument by default both when compiling and when running, and that this argument matches the encoding of the source file in which the main method resides.