Processing of characters in the Java language

Shanxi Province Network Management Center Ren June

----Summary: This paper discusses how characters are represented in the Java language, in particular how Chinese text is handled, and argues that the key to character processing is converting 16-bit Unicode characters into a character form that the underlying platform running the Java virtual machine can understand.

----Keywords: Java, character, 8-bit, 16-bit, Unicode character set

----Java is a programming language, a runtime system, a set of development tools, and an application programming interface (API). Java builds on the familiar, useful features of C++ and removes C++'s complex, dangerous, and redundant elements. It is a safer, simpler, and easier-to-use language.

1. Character Representation in Java

----Java and C describe characters differently. Java uses the 16-bit Unicode character set (which covers the characters of many languages), so a Java character is a 16-bit unsigned integer, and a character variable holds a single character rather than a whole string.
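
----As a minimal sketch of this point, the following program prints the width of a char and the numeric values of a Latin and a Chinese character. The class name CharWidthDemo and the method toCodeUnit are hypothetical names chosen for illustration; the Chinese character is written as a Unicode escape so that the example does not depend on the encoding of the source file.

```java
// Illustrative sketch; CharWidthDemo and toCodeUnit are hypothetical names.
class CharWidthDemo {
    // Widening a char to int never yields a negative value,
    // because char is an unsigned 16-bit type.
    static int toCodeUnit(char c) {
        return c;
    }

    public static void main(String[] args) {
        char latin = 'A';
        char chinese = '\u4e2d'; // a common Chinese character, U+4E2D

        System.out.println(Character.SIZE);            // 16
        System.out.println(toCodeUnit(latin));         // 65
        System.out.println(toCodeUnit(chinese));       // 20013
        System.out.println((int) Character.MAX_VALUE); // 65535

        // A char variable holds one character; a String holds a sequence of them.
        String both = "A\u4e2d";
        System.out.println(both.length());             // 2
    }
}
```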

----A character is a single letter; letters form a word, words form a sentence, and so on. For text that contains characters such as Chinese, however, things are not that simple.

----Java's basic char type is defined as an unsigned 16-bit value, and it is the only unsigned type in Java. The main reason for representing characters in 16 bits is so that Java can support any character in the 16-bit Unicode character set, which lets Java describe or display any language that Unicode supports and improves portability. However, supporting a language's strings internally and being able to display or print those strings correctly are often two different problems.

----Because the Oak (Java's original code name) development team worked mainly on UNIX and UNIX-derived systems, the most convenient and practical character set for its developers was ISO Latin-1. The team's UNIX heritage also meant that the Java I/O system was modeled largely on the UNIX stream concept, in which every I/O device is represented as a stream of 8-bit bytes. As a result, Java has 16-bit characters but only 8-bit input and output devices, which is a problem. Wherever a Java string is read or written 8 bits at a time, a small piece of code, known as a "hack", is needed to map 8-bit characters to 16-bit Unicode or to split 16-bit Unicode into 8-bit characters.
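
----The mapping between 16-bit characters and 8-bit bytes can be sketched with the standard String and charset classes. This is only an illustration under stated assumptions: the class name EncodingMapDemo and its helper methods are hypothetical, UTF-8 stands in for whatever 8-bit encoding the platform uses, and StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch; EncodingMapDemo and its helpers are hypothetical names.
class EncodingMapDemo {
    // Split 16-bit chars into 8-bit bytes using UTF-8.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Map 8-bit bytes back to 16-bit chars using UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // ISO Latin-1 has no code for Chinese characters, so they are replaced.
    static byte[] encodeLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "A\u4e2d"; // 'A' followed by the Chinese character U+4E2D

        byte[] utf8 = encodeUtf8(text);
        System.out.println(utf8.length);                    // 4 (1 byte + 3 bytes)
        System.out.println(decodeUtf8(utf8).equals(text));  // true

        byte[] latin1 = encodeLatin1(text);
        System.out.println(latin1.length);                  // 2
        System.out.println((char) latin1[1]);               // ?
    }
}
```

----The last line shows how information is lost when the 8-bit encoding cannot express a character: the Chinese character is written out as a question mark.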

2. Problems and Solutions

----Suppose we want to read information from a file, in particular a file containing Chinese text, and display it on the screen. Typically we open the file with FileInputStream and read characters with the readChar method of DataInputStream, as follows:


import java.io.*;

public class rf {
    public static void main(String args[]) {
        FileInputStream fis;
        DataInputStream dis;
        char c;

        try {
            fis = new FileInputStream("Xinxi.txt");
            dis = new DataInputStream(fis);
            while (true) {
                // readChar() combines two raw bytes into one 16-bit char
                c = dis.readChar();
                System.out.print(c);
                System.out.flush();
                if (c == '\n') break;
            }
            fis.close();
        } catch (Exception e) {} // errors, including end of file, are ignored
        System.exit(0);
    }
}

----In practice, however, running this program produces nothing but garbled output. The contents of Xinxi.txt are not displayed correctly because readChar treats every two bytes of the file as one 16-bit Unicode character, whatever encoding the file actually uses, and System.out.print then outputs the result as 8-bit ISO Latin-1 characters.
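
----The effect of readChar can be reproduced without a file. The sketch below feeds two bytes that represent the text "AB" in an 8-bit encoding to a DataInputStream; the class name ReadCharDemo and the helper readOneChar are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch; ReadCharDemo and readOneChar are hypothetical names.
class ReadCharDemo {
    // Reads one char the way DataInputStream does: two bytes, high byte first.
    static char readOneChar(byte[] data) {
        try (DataInputStream dis =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            return dis.readChar();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ascii = {0x41, 0x42}; // the text "AB" in an 8-bit encoding
        char c = readOneChar(ascii);

        // The two letters are fused into a single, unrelated character.
        System.out.println(Integer.toHexString(c)); // 4142
    }
}
```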

----Java 1.1 introduced a new set of Reader and Writer classes for handling characters. We can use the InputStreamReader class instead of DataInputStream to process the file. The program above is modified as follows:
import java.io.*;

public class RF {
    public static void main(String args[]) {
        FileInputStream fis;
        InputStreamReader irs;
        int ch;

        try {
            fis = new FileInputStream("Xinxi.txt");
            // InputStreamReader decodes the bytes using the default charset
            irs = new InputStreamReader(fis);
            while (true) {
                ch = irs.read();
                if (ch == -1) break;   // end of file
                System.out.print((char) ch);
                System.out.flush();
                if (ch == '\n') break; // stop after the first line
            }
            fis.close();
        } catch (Exception e) {}
        System.exit(0);
    }
}

----This version prints the text in Xinxi.txt correctly, including Chinese text. However, the file may come from a different machine, that is, from a different operating platform or one that uses a different Chinese character encoding. For example, a client may upload the file to a server, and the server then reads it. In that case the program above may still not produce the correct result. The reason is that the input encoding conversion fails, and we need to make the following changes:


......
int c1;
int j = 0;
StringBuffer str = new StringBuffer();
char lll[][] = new char[20][500];
String ll = "";
try {
    fis = new FileInputStream("Fname.txt");
    irs = new InputStreamReader(fis);
    // read up to 50 characters; c1 is the number actually read (-1 at end of file)
    c1 = irs.read(lll[1], 0, 50);
    // append only the characters that were read
    while (j < c1) {
        str.append(lll[1][j]);
        j = j + 1;
    }
    ll = str.toString();
    System.out.println(ll);
} catch (IOException e) {
    System.out.println(e.toString());
}
......

----With this change, the output is correct. Of course, the fragment above is not a complete program; it only illustrates the solution.
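
----A further option, which the fragment above does not use, is to name the file's encoding explicitly when constructing the InputStreamReader, so that the conversion does not depend on the default of the machine that reads the file. The following is a sketch under stated assumptions: the class name ExplicitCharsetRead and the method readAll are hypothetical, and a byte array encoded as UTF-8 stands in for the uploaded file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; ExplicitCharsetRead and readAll are hypothetical names.
class ExplicitCharsetRead {
    // Decodes every byte of the stream using the charset named by the caller.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[512];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n); // append only the chars actually read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for an uploaded file: the UTF-8 bytes of "A" + U+4E2D.
        byte[] uploaded = "A\u4e2d".getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(uploaded),
                               StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(uploaded),
                               StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 2: decoded with the right charset
        System.out.println(wrong.length()); // 4: each byte became one char
    }
}
```

----For a real file, the same method could be given a FileInputStream and the charset the sender actually used, for example one obtained from Charset.forName with a name such as "GBK", assuming the Java runtime in use provides that charset.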

----In short, character processing in Java, and especially the processing of Chinese text, is somewhat special. The key to character processing in Java is converting 16-bit Unicode characters into a character form that the platform running the Java virtual machine can understand.


