Processing of characters in Java

Last Update:2018-12-05 Source: Internet

Author: User


---- Abstract: This article discusses how characters are represented in Java, especially the handling of Chinese characters. The key to character processing is converting 16-bit Unicode characters into a form that the underlying local platform, that is, the platform running the Java virtual machine, can understand.

---- Keywords: Java, character, 8-bit, 16-bit, Unicode Character Set

---- Java is a programming language, a runtime system, a set of development tools, and an application programming interface (API). Java builds on the familiar and useful features of C++ and removes its complex, dangerous, and redundant elements, making it a safer, simpler, and easier-to-use language.

1. Java Character Representation

---- Java and C represent characters differently. Java uses the 16-bit Unicode character set (a standard that describes the characters of many languages), so a Java character is a 16-bit unsigned integer. A character variable stores a single character rather than a complete string.
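---- The 16-bit, unsigned nature of char can be checked directly. The following is a minimal sketch; the class name CharBasics and the helper codeOf are made up for illustration, and the Chinese characters are written as Unicode escapes so that the result does not depend on the encoding of the source file.

```java
// Sketch: a Java char is a 16-bit unsigned value.
class CharBasics {
    // Widening a char to int never produces a negative number.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char latin = 'A';
        char chinese = '\u4e2d'; // a Chinese character, written as a Unicode escape

        System.out.println(codeOf(latin));                        // 65
        System.out.println(Integer.toHexString(codeOf(chinese))); // 4e2d
        System.out.println((int) Character.MAX_VALUE);            // 65535
        System.out.println(Character.SIZE);                       // 16

        // A char holds one character; a String holds a sequence of them.
        String word = "\u4e2d\u6587";
        System.out.println(word.length());                        // 2
    }
}
```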

---- A character is a single letter. Many letters constitute a word, a group of words constitutes a sentence, and so on. However, things are not that simple for text that contains characters such as Chinese characters.

---- Java's basic char type is defined as an unsigned 16-bit value; it is the only unsigned primitive type in Java. The main reason for using 16-bit characters is to let Java support any Unicode character, which makes Java suitable for describing or displaying any language that Unicode supports. However, being able to support strings in a language and being able to print those strings correctly are often two different problems. The Oak development group (Oak was Java's original code name) worked mainly on UNIX and UNIX-derived systems, so the most convenient and practical character set for the developers was ISO Latin-1. This UNIX heritage also led to a Java I/O system modeled largely on the UNIX stream concept, in which every type of I/O device is represented as a stream of 8-bit bytes. As a result, the Java language has 16-bit characters but only 8-bit input and output devices, which is a shortcoming. Wherever Java strings are read or written as 8-bit data, there must be a small piece of code, a "hack", that maps 8-bit characters to 16-bit Unicode or splits 16-bit Unicode into 8-bit characters.
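---- The mapping between 8-bit bytes and 16-bit characters described above can be sketched with String.getBytes and the String(byte[], Charset) constructor. This is only an illustration: the class name ByteCharMapping and the helpers encode and decode are hypothetical, and UTF-8 and ISO Latin-1 are used simply because every Java platform provides them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: converting between 16-bit chars and 8-bit bytes with a named charset.
class ByteCharMapping {
    // Split 16-bit characters into 8-bit bytes.
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    // Map 8-bit bytes back to 16-bit characters.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters

        // UTF-8 needs three bytes for each of these characters.
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        System.out.println(utf8.length);                                         // 6

        // Decoding with the same charset restores the original text.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));   // true

        // Decoding with the wrong charset turns every byte into its own char.
        String garbled = decode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());                                    // 6

        // ISO Latin-1 cannot represent Chinese characters; each becomes '?'.
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length + " " + (char) latin1[0]);              // 2 ?
    }
}
```

---- The wrong-charset case is the typical source of garbled Chinese text: the bytes are intact, but they are mapped to the wrong characters.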

2. Problems and Solutions

---- Suppose we need to read information from a file, in particular a file that contains Chinese text, and display it on the screen. A typical first attempt opens the file with FileInputStream and reads characters with the readChar method of DataInputStream:

    import java.io.*;

    public class RF {
        public static void main(String args[]) {
            FileInputStream fcm;
            DataInputStream dis;
            char c;

            try {
                fcm = new FileInputStream("xinxi.txt");
                dis = new DataInputStream(fcm);
                while (true) {
                    // readChar combines the next two bytes into one 16-bit char
                    c = dis.readChar();
                    System.out.print(c);
                    System.out.flush();
                    if (c == '\n') break;
                }
                fcm.close();
            } catch (Exception e) {} // also reached at end of file (EOFException)
            System.exit(0);
        }
    }

---- In fact, running this program prints only garbled characters. The contents of xinxi.txt are not displayed correctly because readChar reads a 16-bit Unicode character, that is, two bytes at a time, while System.out.print outputs it as an eight-bit ISO Latin-1 character.

---- Java 1.1 introduced a new set of interfaces, readers and writers, for processing characters. We can use the InputStreamReader class instead of DataInputStream to read the file. The program is modified as follows:

    import java.io.*;

    public class RF {
        public static void main(String args[]) {
            FileInputStream fcm;
            InputStreamReader irs;
            int n;
            char ch;

            try {
                fcm = new FileInputStream("xinxi.txt");
                // Decodes bytes into chars using the default charset
                irs = new InputStreamReader(fcm);
                // read() returns -1 at end of file
                while ((n = irs.read()) != -1) {
                    ch = (char) n;
                    System.out.print(ch);
                    System.out.flush();
                    if (ch == '\n') break;
                }
                fcm.close();
            } catch (Exception e) {}
            System.exit(0);
        }
    }
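---- The difference between the two reading strategies can be reproduced without a file. The sketch below feeds the same bytes to DataInputStream.readChar and to an InputStreamReader. The class name ReadCharVsReader and both helper methods are hypothetical, and UTF-8 is assumed as the encoding of the bytes purely for the sake of a repeatable example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: the same bytes read with readChar and with an InputStreamReader.
class ReadCharVsReader {
    // Pairs up raw bytes (high byte first), ignoring the real encoding.
    static String viaReadChar(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                sb.append(dis.readChar());
            }
        } catch (EOFException end) {
            // readChar signals the end of the input with EOFException
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Decodes the bytes into chars according to the given charset.
    static String viaReader(byte[] data, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            int n;
            while ((n = reader.read()) != -1) {
                sb.append((char) n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters
        byte[] data = text.getBytes(StandardCharsets.UTF_8); // six bytes

        String paired = viaReadChar(data);
        System.out.println(paired.length());      // 3
        System.out.println(paired.equals(text));  // false

        String decoded = viaReader(data, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(text)); // true
    }
}
```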

---- With InputStreamReader, the text in xinxi.txt, including Chinese text, is output correctly. However, the file may come from a different machine, that is, a machine running a different operating platform or using a different Chinese character encoding. For example, a client may upload a file to a server, and the server then reads the text in it. If the program above is used for this, it may still fail to produce the correct result, because the input is not converted from the encoding in which it was written. The following changes are also needed:

    ......
    int c1;
    int j = 0;
    StringBuffer str = new StringBuffer();
    char lll[][] = new char[20][500];
    String ll = "";
    try {
        fcm = new FileInputStream("fname.txt");
        irs = new InputStreamReader(fcm);
        // Read up to 50 characters into the second row of the buffer
        c1 = irs.read(lll[1], 0, 50);
        // Copy only the characters actually read (c1 is -1 at end of file)
        while (j < c1) {
            str.append(lll[1][j]);
            j = j + 1;
        }
        ll = str.toString();
        System.out.println(ll);
    } catch (IOException e) { System.out.println(e.toString()); }
    ......

---- In this way, the output is correct. Of course, the program above is incomplete; it only illustrates the approach.

---- In short, character processing in Java, especially the processing of Chinese text, needs special care. The key is to convert 16-bit Unicode characters into a form that the underlying local platform, that is, the platform running the Java virtual machine, can understand.
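---- An InputStreamReader created without a charset argument decodes with the default charset of the machine that reads the file. For a file written on another machine, one option is to name the encoding explicitly by using the InputStreamReader(InputStream, Charset) constructor. The following is a minimal sketch rather than the article's own method: the class name ExplicitCharsetRead and its helpers are hypothetical, a temporary file stands in for the uploaded file, and UTF-16BE is assumed as the writer's encoding only for demonstration. In practice the encoding has to be known or agreed on with the client.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: reading a file whose encoding differs from the reader's default.
class ExplicitCharsetRead {
    // Reads the first line of the file, decoding its bytes with the given charset.
    static String readFirstLine(Path file, Charset cs) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), cs))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file in one charset and reads it back in another.
    static String writeThenRead(String text, Charset writtenAs, Charset readAs) {
        try {
            Path file = Files.createTempFile("upload", ".txt");
            try {
                Files.write(file, (text + "\n").getBytes(writtenAs));
                return readFirstLine(file, readAs);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters

        // Reader and writer agree on the encoding: the text survives.
        String same = writeThenRead(text, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16BE);
        System.out.println(text.equals(same));  // true

        // Reader assumes a different encoding: the text is garbled.
        String wrong = writeThenRead(text, StandardCharsets.UTF_16BE, StandardCharsets.UTF_8);
        System.out.println(text.equals(wrong)); // false
    }
}
```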

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
