Character Processing in Java
---- Abstract: This article discusses how characters are represented and processed in Java, with emphasis on the handling of Chinese characters. The key to character processing is converting Java's 16-bit Unicode characters into the character form understood by the local underlying platform, that is, the platform on which the Java virtual machine runs.
---- Keywords: Java, character, 8-bit, 16-bit, Unicode Character Set
---- Java is a programming language, a runtime system, a set of development tools, and an application programming interface (API). Java builds on the familiar and useful features of C++ while removing its complex, dangerous, and redundant elements, making it a safer, simpler, and easier-to-use language.
1. Character Representation in Java
---- The Java language describes characters differently from the C language. Java uses the 16-bit Unicode character set (a standard that encodes characters from many languages), so a Java character is a 16-bit unsigned integer. A character variable stores a single character, not a complete string.
---- A character is a single letter; many letters make up a word, a group of words makes up a sentence, and so on. Things are not that simple, however, once characters such as Chinese characters are involved.
---- Java's primitive char type is defined as an unsigned 16-bit value; it is the only unsigned type in Java. The main reason for using 16-bit characters is to let Java support any Unicode character, which makes Java well suited to describing or displaying any language covered by Unicode. However, being able to represent strings in a language and being able to print them correctly are often two different problems. Since the Oak development group (Oak was Java's original name) worked mainly on UNIX systems and UNIX-derived systems, the most convenient and practical character set for the developers was ISO Latin-1. That UNIX heritage also led to a Java I/O system largely modeled on the UNIX stream concept, in which every type of I/O device is represented by a stream of 8-bit bytes. As a result, Java has 16-bit characters but only 8-bit I/O devices, and this mismatch brings some inconvenience: anywhere a Java string is read or written over an 8-bit stream, a small piece of conversion code (a "hack") is needed to map 8-bit bytes to 16-bit Unicode characters, or to split 16-bit Unicode characters into 8-bit bytes.
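---- As a minimal sketch (not part of the original article) of what a 16-bit unsigned char means in practice, the following program prints the numeric Unicode values of a Latin letter and a Chinese character, along with the largest value a char can hold:

public class CharWidth {
    public static void main(String args[]) {
        char latin = 'A';       // fits within 8 bits
        char han = '\u4E2D';    // the Chinese character "中" needs the full 16 bits
        System.out.println((int) latin);               // prints 65
        System.out.println((int) han);                 // prints 20013
        System.out.println((int) Character.MAX_VALUE); // prints 65535: char is an unsigned 16-bit value
    }
}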
2. Problems and Solutions
---- We often need to read information from a file, especially a file containing Chinese text, and display what is read on the screen. Typically, we open the file with the FileInputStream class and read characters with the readChar method:

import java.io.*;

public class RF {
    public static void main(String args[]) {
        FileInputStream fis;
        DataInputStream dis;
        char c;
        try {
            fis = new FileInputStream("xinxi.txt");
            dis = new DataInputStream(fis);
            while (true) {
                c = dis.readChar();     // reads a 16-bit Unicode character
                System.out.print(c);
                System.out.flush();
                if (c == '\n') break;
            }
            fis.close();
        } catch (Exception e) {}
        System.exit(0);
    }
}
---- But in fact, running this program produces nothing but useless garbled characters; the contents of xinxi.txt cannot be displayed. The reason is that readChar reads a 16-bit Unicode character, while System.out.print writes it out as an 8-bit ISO Latin-1 character.
---- Java 1.1 introduced a new set of Reader and Writer classes for handling characters. We can use the InputStreamReader class instead of DataInputStream to process the file. The program above can be modified as follows:

import java.io.*;
public class RF {
    public static void main(String args[]) {
        FileInputStream fis;
        InputStreamReader irs;
        char ch;
        try {
            fis = new FileInputStream("xinxi.txt");
            irs = new InputStreamReader(fis);   // decodes bytes using the platform's default encoding
            while (true) {
                ch = (char) irs.read();
                System.out.print(ch);
                System.out.flush();
                if (ch == '\n') break;
            }
            fis.close();
        } catch (Exception e) {}
        System.exit(0);
    }
}
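---- The Reader classes have Writer counterparts that perform the reverse conversion when writing. As a supplementary sketch (not from the original article; the file name and text are chosen only for illustration), an OutputStreamWriter can turn 16-bit Java characters back into the byte form expected by the underlying platform:

import java.io.*;

public class WF {
    public static void main(String args[]) {
        try {
            FileOutputStream fos = new FileOutputStream("out.txt");
            // OutputStreamWriter encodes 16-bit chars into bytes using the platform's default encoding
            OutputStreamWriter osw = new OutputStreamWriter(fos);
            osw.write("汉字处理\n");    // a line of Chinese text
            osw.close();               // flushes and closes the underlying stream
        } catch (IOException e) {
            System.out.println(e.toString());
        }
    }
}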
---- In this way, the text in xinxi.txt can be displayed correctly, including Chinese. There is a further complication, however, when xinxi.txt comes from a different machine, that is, from a different operating platform (or a machine with a different Chinese character encoding); for example, a client may upload the file to the server, while the reading of the text is performed on the server. If the program above is used as it stands, it may still fail to produce the correct result, because the input encoding is not converted properly. We then also need to make changes along the following lines:
......
int c1;
int j = 0;
StringBuffer str = new StringBuffer();
char lll[][] = new char[20][2500];
String ll = "";
try {
    fis = new FileInputStream("fname.txt");
    irs = new InputStreamReader(fis);
    c1 = irs.read(lll[1], 0, 50);        // read up to 50 characters into the buffer
    while (lll[1][j] != '\u0000') {      // collect characters until the first unused (zero) element
        str.append(lll[1][j]);
        j = j + 1;
    }
    ll = str.toString();
    System.out.println(ll);
} catch (IOException e) {
    System.out.println(e.toString());
}
......
---- In this way, the output result is correct. Of course, the program above is incomplete; it is only meant to illustrate the solution. ---- In short, character processing in Java, especially the processing of Chinese text, is quite special. The key to character processing in Java is to convert 16-bit Unicode characters into a character form that can be understood by the local underlying platform, that is, the platform on which the Java virtual machine runs.
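---- One common way to make that conversion explicit, sketched below as a supplement to the article, is to pass the encoding name to InputStreamReader, which has accepted a charset name since Java 1.1. The charset name "GB2312" is only an assumed example and should match the encoding the file was actually written in:

import java.io.*;

public class RFE {
    public static void main(String args[]) {
        try {
            FileInputStream fis = new FileInputStream("xinxi.txt");
            // Decode the 8-bit bytes with an explicitly named encoding instead of the platform default.
            // "GB2312" is only an assumed example.
            InputStreamReader irs = new InputStreamReader(fis, "GB2312");
            int ch;
            while ((ch = irs.read()) != -1) {   // read() returns -1 at end of file
                System.out.print((char) ch);
            }
            System.out.flush();
            irs.close();
        } catch (IOException e) {
            System.out.println(e.toString());
        }
    }
}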