This story starts with a small pit I fell into this week: when I ran Tomcat locally for debugging, the web app reported an error on startup. The error message said there was a problem with the encoding of an XML file. I opened the file to see what was abnormal. Like the other XML files, it declared UTF-8 at the top, except in lowercase, so I foolishly changed it to uppercase... and then... of course that did not fix it!!! Ahem. I resolutely went to consult a colleague, who took one glance, tossed me the parameter -Dfile.encoding=UTF-8, and told me to add it to the web server's VM options. That did it.
Well, it's the same old encoding story again (eh, fine!). But why does this work? I still don't understand it from beginning to end, but I did manage to sort out a few questions along the way, such as what on earth -D is (it defines a Java system property on the JVM command line). O(∩_∩)O ... Digging into the problem went through several stages:
1. What is the system default for file.encoding? (Actually, I don't know whether this kind of thing even has a system default. Anyway, I mean the default value... ha ha...)
First of all, I naturally wanted to know what the default encoding was. I created an empty project, wrote a class, and printed System.getProperty("file.encoding") in main. The result baffled me: it was clearly UTF-8. So the default output encoding should be UTF-8 and there should be no problem. Why wasn't that default being used?
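The check described above can be sketched roughly as follows. The class name DefaultEncodingCheck and its helper methods are made up for illustration; only System.getProperty and Charset.defaultCharset are standard JDK calls. The output depends on how the JVM was launched, so no expected value is shown.

```java
import java.nio.charset.Charset;

// Hypothetical class name; a minimal sketch of the "empty project" test.
public class DefaultEncodingCheck {

    // The raw system property, exactly as the running JVM reports it.
    static String fileEncodingProperty() {
        return System.getProperty("file.encoding");
    }

    // The charset the JVM actually uses when no charset is given explicitly.
    static Charset effectiveDefault() {
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding property = " + fileEncodingProperty());
        System.out.println("default charset        = " + effectiveDefault());
    }
}
```

Running this from the IDE and from a plain terminal is a quick way to see whether the two launchers pass different options to the JVM.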
2. Where is file.encoding actually used?
I found a well-written article: http://cmsblogs.com/?p=1475. It systematically analyzes how Java handles encoding on output, and the process differs from case to case. The article walks through three scenarios: 1) output directly to the console, 2) deployed to a web server and output to a web page, 3) output to a database. This answered my first question, namely why the output to the web page did not use that UTF-8: the file.encoding I read with System.getProperty() describes the encoding used for output to the console, and has nothing to do with the web page.
I also learned that when the Java virtual machine processes data, the data is held in memory as Unicode, including what comes from .class files and data read in from outside. So my understanding is this: everything in memory flows around as Unicode, and when it has to be output somewhere, the file.encoding that applies to that destination is looked up and the Unicode data is converted to that encoding on the way out. Thinking about it carefully, this fits Java's platform independence very well. With a weapon as powerful as Unicode at hand (an encoding that can cover every character, I am amazed), it is natural to use it as the single internal format and to handle the platform-specific cases separately.
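To make the "Unicode in memory, converted on output" idea concrete, here is a small sketch. The class and method names are hypothetical; the charsets come from java.nio.charset.StandardCharsets. It encodes one Chinese character (U+4E2D) into two encodings and then decodes the UTF-8 bytes with the wrong charset, which is how garbled text appears.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; a sketch of Unicode-in-memory vs. bytes-on-output.
public class UnicodeInMemoryDemo {

    // The character U+4E2D, written as an escape so the encoding of this
    // source file does not matter.
    static final String TEXT = "\u4e2d";

    // String (Unicode in memory) -> bytes in the requested encoding.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Bytes -> String, interpreting the bytes with the given charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] utf8 = encode(TEXT, StandardCharsets.UTF_8);      // 3 bytes
        byte[] utf16 = encode(TEXT, StandardCharsets.UTF_16BE);  // 2 bytes
        System.out.println("UTF-8 bytes:    " + utf8.length);
        System.out.println("UTF-16BE bytes: " + utf16.length);

        // Decoding with the matching charset restores the original string.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(TEXT));

        // Decoding with a different charset yields three unrelated characters.
        String garbled = decode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("garbled length: " + garbled.length());
    }
}
```

The same in-memory string produces different byte sequences depending on the target encoding, and the reader of those bytes has to agree on that encoding.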
3. Where does the so-called default file.encoding value come from?
To tell the truth, this was the question that gave me the biggest headache, because I am truly clueless about tooling!! Who knows which configuration file in which corner this thing is written in!! I searched, and kept searching, and finally found it: the IDE controls the default file.encoding for the console. I use IntelliJ, and in $IntelliJHome/Contents/Info.plist (that "home" is a name I made up, you get the idea) I did find this string:
<key>VMOptions</key>
<string>-Dfile.encoding=UTF-8 -XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=50 -ea -Dsun.io.useCanonCaches=false -Djava.net.preferIPv4Stack=true -Xverify:none -Xbootclasspath/a:../lib/boot.jar</string>
<key>WorkingDirectory</key>
See that "-Dfile.encoding=UTF-8"? There it is!! What a sense of accomplishment!!! So the story comes to a pause here: the UTF-8 I found at first came from the IDE's default settings, which apply to the console. For the web container, file.encoding is specified again as UTF-8 in the VM options of the Tomcat run configuration, and that is why the web page output is now OK.
But wait! How could I stop there? If there is a getProperty, there is also a setProperty. Didn't I just say that at output time the corresponding file.encoding is looked up and the Unicode data is converted to it? Then if I first output with UTF-8, call setProperty to switch to another encoding, and output the same content again, shouldn't the second output be garbled?
It turns out that trying to modify file.encoding at runtime with setProperty has no effect at all; the output is exactly what it was before. So I looked it up: file.encoding only takes effect if it is specified when the JVM initializes. At runtime it is effectively a read-only property. The value is reportedly cached during initialization, and even if you change the property afterwards, only the cached initial value is read. If you are interested you can read the source; a related discussion is at http://stackoverflow.com/questions/1749064/how-to-find-the-default-charset-encoding-in-java. I tried it, and indeed, only changing the VM option could make the output garbled. A few key lines of code:
new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding(); // the effective file.encoding, i.e. the value cached at initialization
System.getProperty("file.encoding"); // the current value of the file.encoding property: real, but with no effect
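The two calls above can be put into a small experiment like the one below. This is only a sketch: the class name SetPropertyHasNoEffect and its helper are invented for illustration, and the replacement encoding is an arbitrary choice that just has to differ from the current default. Note that getEncoding() may report a historical name such as UTF8 rather than UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical class name; a sketch of the setProperty experiment.
public class SetPropertyHasNoEffect {

    // Changes file.encoding at runtime and reports whether the effective
    // default charset stayed the same. Restores the property afterwards.
    static boolean defaultSurvivesSetProperty() {
        Charset before = Charset.defaultCharset();
        String original = System.getProperty("file.encoding");
        // Pick any encoding that differs from the current default.
        String other = before.name().equals("ISO-8859-1") ? "UTF-8" : "ISO-8859-1";
        System.setProperty("file.encoding", other);
        try {
            return before.equals(Charset.defaultCharset());
        } finally {
            if (original == null) {
                System.clearProperty("file.encoding");
            } else {
                System.setProperty("file.encoding", original);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("default unchanged by setProperty: "
                + defaultSurvivesSetProperty());
        // A writer created without an explicit charset uses the effective default.
        try (OutputStreamWriter w = new OutputStreamWriter(new ByteArrayOutputStream())) {
            System.out.println("writer encoding: " + w.getEncoding());
        }
        System.out.println("property value:  " + System.getProperty("file.encoding"));
    }
}
```

To actually change the effective encoding, the value has to be passed when the JVM starts, for example as -Dfile.encoding=UTF-8 in the VM options.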
That's all.
A first look at Java's garbled-text problem