Java-based Unicode

Source: Internet
Author: User

During Java development, garbled characters may appear, or some files cannot be correctly recognized or read, such as the common message resource files (properties) used by the validator. Such files must be re-encoded to Unicode. The reason is that Java uses Unicode by default, while our operating system uses GBK encoding, so the files must be converted from the system encoding into a form that Java recognizes correctly.

1. Introduction to native2ascii: native2ascii is a tool provided with the Sun Java SDK. It converts text files in other encodings (such as *.txt, *.ini, *.properties, *.java, etc.) into Unicode-escaped form, in which each non-ASCII character is written as \uXXXX. The reason for transcoding is internationalization of the program. Unicode encoding definition: Unicode (Uniform Code, Universal Code, Single Code) is a character encoding used on computers. It assigns a unified and unique binary code to each character in each language to meet the requirements of cross-language and cross-platform text conversion and processing. Development started in 1990 and it was officially announced in 1994. As computing power has grown, Unicode has become widespread in the decade or more since its launch. (Note: this definition of Unicode comes from the Internet.)

2. Obtaining native2ascii: After the JDK is installed (on Windows, for example), the bin directory under the JDK installation directory contains native2ascii.exe.

3. native2ascii command-line format:
native2ascii -[options] [inputfile [outputfile]]
Note:
-[options]: the command switches. Two options are available.
-reverse: converts Unicode escapes back to the local or specified encoding. If no encoding is specified, the local encoding is used.
-encoding encoding_name: specifies the encoding to use for the conversion, where encoding_name is the encoding name.
[inputfile [outputfile]]
inputfile: the full name of the input file.
outputfile: the name of the output file. If this parameter is omitted, the output goes to the console.

4. Best Practice: First add the JDK bin directory to the system PATH variable. Create a test directory on the C: drive and a zh.txt file in the test directory. The file content is "熔岩" (lava). Open a command prompt and change to the C:\test directory. Then follow the steps below one by one, paying attention to how the encoding changes.
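On a system whose default encoding is not GBK, the sample file can be created programmatically so that it holds the same bytes as in this walkthrough. The following is only a sketch: the class and method names are made up for illustration, it writes into a temporary directory rather than C:\test, and it assumes the JVM ships the GBK charset (full JDK builds normally do).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class CreateSampleFileSketch {
    // The two sample characters, written as Java escapes so that this
    // source file stays pure ASCII, encoded with GBK (assumed available).
    static byte[] sampleBytes() {
        return "\u7194\u5ca9".getBytes(Charset.forName("GBK"));
    }

    // Writes zh.txt into the given directory and returns its path.
    static Path writeSample(Path dir) throws IOException {
        Path file = dir.resolve("zh.txt");
        Files.write(file, sampleBytes());
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("test");
        Path file = writeSample(dir);
        System.out.println(file + " holds " + Files.size(file) + " bytes");
    }
}
```

The file holds four bytes, two per character, which matters for the ISO8859-1 step later on.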

A: Convert zh.txt to Unicode escapes and output to u.txt.
native2ascii zh.txt u.txt
Open u.txt; its content is "\u7194\u5ca9".
B: Convert zh.txt to Unicode escapes and output to the console.
C:\test> native2ascii zh.txt
\u7194\u5ca9
As you can see, the console outputs "\u7194\u5ca9".
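The forward conversion in steps A and B can be approximated in a few lines of Java, which is handy on JDK 9 and later, where the native2ascii tool is no longer shipped. This is a minimal sketch rather than the tool's actual implementation; the class and method names are invented for illustration, and it assumes the text has already been decoded into a Java String.

```java
final class ForwardEscapeSketch {
    // Writes every character above 0x7F as a backslash-u escape with four
    // lowercase hex digits; plain ASCII characters are copied unchanged.
    static String toEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i); // one UTF-16 code unit at a time
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The content of zh.txt, given as Java escapes to keep this file ASCII.
        String zh = "\u7194\u5ca9";
        System.out.println(toEscapes(zh)); // prints \u7194\u5ca9
    }
}
```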
C: Convert zh.txt with the ISO8859-1 encoding and output to i.txt.
native2ascii -encoding ISO8859-1 zh.txt i.txt
Open the i.txt file; its content is "\u00c8\u00db\u00d1\u00d2".
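One way to see where the four escapes in step C come from: the two characters in zh.txt occupy four bytes in GBK, and decoding those bytes as ISO8859-1 turns each byte into one character. The sketch below illustrates that reading of the result; the names are hypothetical, the byte values are taken from the output shown above, and the GBK charset is assumed to be available in the JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StepCSketch {
    // Decodes raw bytes with the given charset, then writes each non-ASCII
    // character as a backslash-u escape.
    static String decodeAndEscape(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        StringBuilder out = new StringBuilder();
        for (char c : decoded.toCharArray()) {
            out.append(c > 0x7F ? String.format("\\u%04x", (int) c) : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The four bytes stored in zh.txt when it is saved as GBK.
        byte[] gbkBytes = {(byte) 0xC8, (byte) 0xDB, (byte) 0xD1, (byte) 0xD2};
        // Read as ISO-8859-1: every byte becomes one character.
        System.out.println(decodeAndEscape(gbkBytes, StandardCharsets.ISO_8859_1));
        // Read as GBK: each byte pair becomes one Chinese character.
        System.out.println(decodeAndEscape(gbkBytes, Charset.forName("GBK")));
    }
}
```

The first line printed matches i.txt and the second matches u.txt, which suggests that in the forward direction the encoding given to -encoding is used to read the input file.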
D: Convert u.txt back to the local encoding and output to u_nv.txt.
native2ascii -reverse u.txt u_nv.txt
Open the u_nv.txt file; its content is "熔岩" (lava).
E: Convert u.txt back to the local encoding and output to the console.
C:\test> native2ascii -reverse u.txt
熔岩
As you can see, the console outputs "熔岩".
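The reverse direction used in steps D and E can be sketched in the same spirit: scan the text and replace each escape with the character it names. This is an illustrative approximation with invented names, not the tool's own code, and it only handles well-formed four-digit escapes.

```java
final class ReverseEscapeSketch {
    // True when the four characters starting at 'from' are all hex digits.
    static boolean isHex4(String text, int from) {
        if (from + 4 > text.length()) {
            return false;
        }
        for (int i = from; i < from + 4; i++) {
            if (Character.digit(text.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Replaces each backslash-u escape (four hex digits) with the character
    // it names; everything else is copied unchanged.
    static String fromEscapes(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("\\u", i) && isHex4(text, i + 2)) {
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String restored = fromEscapes("\\u7194\\u5ca9"); // the content of u.txt
        // Compare instead of printing, so the result does not depend on the
        // console encoding.
        System.out.println(restored.equals("\u7194\u5ca9")); // prints true
    }
}
```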
F: Convert i.txt back to the local encoding and output to i_nv.txt.
native2ascii -reverse i.txt i_nv.txt
Open the i_nv.txt file; its content is "\u00c8\u00db\u00d1\u00d2". The content is exactly the same before and after transcoding. In other words, nothing was converted, or the approach was wrong.

G: Convert i.txt to GBK encoding and output to i_gbk.txt.
native2ascii -reverse -encoding GBK i.txt i_gbk.txt
Open the i_gbk.txt file; its content is "\u00c8\u00db\u00d1\u00d2". Again the content is exactly the same before and after transcoding. In other words, nothing was converted, or the approach was wrong.
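A likely explanation for the unchanged output in steps F and G is that the four characters named by the escapes have no representation in GBK, so they are left as escapes. That is an interpretation rather than something the steps above prove, but the encodability itself can be checked with the standard CharsetEncoder API. The class and method names here are invented, and the GBK charset is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class EncodableCheckSketch {
    // Reports whether the named charset has a byte sequence for the character.
    static boolean canEncode(String charsetName, char c) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        // The four characters named by the escapes in i.txt.
        char[] fromIText = {'\u00c8', '\u00db', '\u00d1', '\u00d2'};
        for (char c : fromIText) {
            System.out.printf("U+%04X GBK=%b ISO-8859-1=%b%n",
                    (int) c, canEncode("GBK", c), canEncode("ISO-8859-1", c));
        }
    }
}
```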

H: Transcode i.txt to the local GBK encoding and output it to the console.
C:\test> native2ascii -reverse -encoding ISO8859-1 i.txt
熔岩
This time the goal is reached: i.txt was produced with ISO8859-1, and after the reverse conversion with the same encoding the console shows "熔岩" (lava). The lesson is that -encoding always names the encoding of the native (non-escaped) side of the conversion. With native2ascii -reverse, it is the encoding in which the output is written; with plain native2ascii, it is the encoding in which the input file is read. In step C the GBK bytes of zh.txt were read as ISO8859-1, one character per byte, and here those four characters are written back as the same four bytes, which a GBK console displays as "熔岩". This is very important. Remember it!
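The round trip in step H can be traced byte by byte. The sketch below is an illustration under stated assumptions: the input string stands in for the already-decoded escapes of i.txt, the method name is hypothetical, and the final GBK decode imitates what a GBK console does with the bytes it receives.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StepHSketch {
    // Writes the characters as ISO-8859-1 (one byte each), then interprets
    // those bytes the way a GBK console would (two bytes per character).
    static String showOnGbkConsole(String decodedEscapes) {
        byte[] written = decodedEscapes.getBytes(StandardCharsets.ISO_8859_1);
        return new String(written, Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        // The four characters named by the escapes in i.txt.
        String fromIText = "\u00c8\u00db\u00d1\u00d2";
        String shown = showOnGbkConsole(fromIText);
        // Compare against the original two characters of zh.txt.
        System.out.println(shown.equals("\u7194\u5ca9")); // prints true
    }
}
```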

Create a new file 12a.txt with the content "12 axyz". Let's look at how plain letters and digits are encoded.

I: Convert the text file of plain letters and digits to Unicode escapes.
native2ascii 12a.txt 12a_nv.txt
Open the 12a_nv.txt file; its content is "12 axyz".
Continue testing by converting with the ISO8859-1 encoding.
C:\test> native2ascii -encoding ISO8859-1 12a.txt
12 axyz
The result is still not transcoded.
From these results we can conclude that for plain digits and letters, the content is the same before and after transcoding.
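The reason plain digits and letters pass through unchanged is that the encodings used here all represent ASCII characters with the same single bytes, and ASCII characters are never escaped. A small sketch to check the byte-level part of that claim, with invented names and the GBK charset assumed to be available:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiUnchangedSketch {
    // True when the text encodes to identical bytes in both charsets.
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    public static void main(String[] args) {
        String sample = "12 axyz"; // the content of 12a.txt
        Charset gbk = Charset.forName("GBK");
        System.out.println(sameBytes(sample, StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1)); // true
        System.out.println(sameBytes(sample, StandardCharsets.US_ASCII, gbk)); // true
        // For comparison, the Chinese sample does differ between encodings.
        System.out.println(sameBytes("\u7194\u5ca9", StandardCharsets.ISO_8859_1, gbk)); // false
    }
}
```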

5. Conclusion: native2ascii is a very good transcoding tool, and its transcoding is reversible. Despite the name, it is best understood not simply as "local encoding to ASCII" but as a general conversion tool between encoded text files and their Unicode-escaped form. The encoding specified during conversion is either the input file's encoding or the output file's encoding, depending on the direction of the conversion. For details, refer to the Best Practice section.
