Java FAQ Series (6) -- A Discussion of Strings

Last Update: 2018-12-03 Source: Internet

Author: User


Author: zookeeper (zangweiren)
Web: http://zangweiren.javaeye.com

>>> Please credit the source when reprinting! <<<

Last time we reviewed how many String objects get created in several frequently asked interview questions. This time we start from a few other common interview questions and review other aspects of the String class.

1. Does the String class have a length() method? Does an array have a length() method?

Of course the String class has a length() method. Here is its definition in the String source code:
Java code
public int length() {
    return count;
}

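In the JDK source quoted above, a String keeps its characters in an internal char array named value, and the count field holds the number of those chars that belong to the string, so length() returns the number of chars, that is, UTF-16 code units. (Newer JDKs lay out these fields differently, but length() still means the same thing.) Arrays, by contrast, have no length() method. In Java an array is also an object, and its methods are inherited from the Object class. What an array does have is a public final field named length, the only field an array type declares. This holds for arrays of every type.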
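The following minimal sketch contrasts the two forms (the class and method names are made up for illustration). It also shows that length() counts chars, that is, UTF-16 code units, rather than user-visible characters:

```java
public class LengthDemo {
    // String: length() is a method call; it returns the number of UTF-16 code units (chars)
    public static int stringLength(String s) {
        return s.length();
    }

    // Array: length is a public final field, so it is read without parentheses
    public static int arrayLength(int[] arr) {
        return arr.length;
    }

    public static void main(String[] args) {
        System.out.println(stringLength("hello"));            // 5
        System.out.println(arrayLength(new int[] {1, 2, 3})); // 3
        // new int[3].length() would not compile: arrays have no length() method

        // U+1F600 lies outside the Basic Multilingual Plane and is stored as two chars
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());                           // 2
        System.out.println(emoji.codePointCount(0, emoji.length()));  // 1
    }
}
```

So for text that may contain such characters, codePointCount() is the closer match to "number of characters".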

2. Can a char hold a Chinese character?

See the following example:
Java code
public class ChineseTest {
    public static void main(String[] args) {
        // Assign a Chinese character to each char variable
        char a = '中';
        char b = '文';
        char c = '测';
        char d = '试';
        char e = '成';
        char f = '功';
        System.out.print(a);
        System.out.print(b);
        System.out.print(c);
        System.out.print(d);
        System.out.print(e);
        System.out.print(f);
    }
}

It compiles without errors and prints the six characters, which read "Chinese test succeeded":

中文测试成功

The answer speaks for itself. But why can a char variable hold a Chinese character? In Java a char occupies 2 bytes (it holds one 16-bit UTF-16 code unit), and a common Chinese character, which lies in Unicode's Basic Multilingual Plane, fits into a single char. An English letter needs only one byte, so it can be stored in a byte variable, but a Chinese character cannot. See:
Java code
public class ChineseTest {
    public static void main(String[] args) {
        // Assign a letter to a byte variable
        byte a = 'a';
        // Assigning a Chinese character to a byte variable causes a compile-time error
        // byte b = '中';

        System.out.println("byte a = " + a);
        // System.out.println("byte b = " + b);
    }
}

Output:

byte a = 97

As you can see, what actually gets stored in the byte variable is the ASCII code of the character 'a', which is 97. The assignment compiles because 'a' is a compile-time constant whose value fits in the range of byte (-128 to 127); the value of '中' is 20013, which does not fit, so that assignment is rejected.
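A small sketch (hypothetical class and method names) makes the numbers concrete: the value of 'a' fits in a byte, the value of '中' does not, and a character outside the Basic Multilingual Plane does not even fit in a single char:

```java
public class CharRangeDemo {
    // True if the char's numeric value fits in a byte (char is unsigned, so only the upper bound matters)
    public static boolean fitsInByte(char c) {
        return c <= Byte.MAX_VALUE;
    }

    public static void main(String[] args) {
        char letter = 'a';
        char chinese = '\u4E2D'; // U+4E2D, the first character of the example above
        System.out.println((int) letter);        // 97
        System.out.println((int) chinese);       // 20013
        System.out.println(fitsInByte(letter));  // true
        System.out.println(fitsInByte(chinese)); // false

        // U+20000 (CJK Extension B) is outside the BMP and needs a surrogate pair
        String rare = new String(Character.toChars(0x20000));
        System.out.println(rare.length());       // 2
    }
}
```

The last case is why "a char can hold a Chinese character" holds for common characters but not for every character in Unicode.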

Let's go back to the first example. Can we join a, b, c, d, e, and f together and print them in one statement? Let's try:
Java code
public class ChineseTest {
    public static void main(String[] args) {
        // Assign a Chinese character to each char variable
        char a = '中';
        char b = '文';
        char c = '测';
        char d = '试';
        char e = '成';
        char f = '功';
        System.out.print(a + b + c + d + e + f);
    }
}

Output:

156035

This is clearly not what we wanted: we misused the "+" operator. When at least one operand of "+" is a String, it performs string concatenation. When both operands are chars, however, they are promoted to int and "+" performs arithmetic addition. So 156035 is the sum of the numeric (Unicode) values of the six characters '中', '文', '测', '试', '成', and '功'.
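To print the six characters as text instead of a number, a String has to enter the expression before any two chars are added. The sketch below (hypothetical class and method names; the characters are written as Unicode escapes so the file compiles under any source encoding) contrasts the arithmetic sum with two ways of building the string:

```java
public class ConcatDemo {
    // U+4E2D U+6587 U+6D4B U+8BD5 U+6210 U+529F: the six characters from the example
    static final char A = '\u4E2D', B = '\u6587', C = '\u6D4B',
                      D = '\u8BD5', E = '\u6210', F = '\u529F';

    // char + char: both operands are promoted to int, so this is integer addition
    public static int sum() {
        return A + B + C + D + E + F;
    }

    // "" + char yields a String, so every following + is a concatenation
    public static String concat() {
        return "" + A + B + C + D + E + F;
    }

    // Alternative: build the String directly from a char array
    public static String fromArray() {
        return new String(new char[] {A, B, C, D, E, F});
    }

    public static void main(String[] args) {
        System.out.println(sum());                        // 156035
        System.out.println(concat());                     // the six characters as text
        System.out.println(fromArray().equals(concat())); // true
    }
}
```

Order matters: "+" is evaluated left to right, so A + B + "" first adds the two chars as numbers (20013 + 25991 = 46004) and only then appends the empty string, producing "46004".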

3. Reversing a string.

This is another common interview question. As an example, let's take a pangram, a short sentence with a complete meaning that uses all 26 English letters:

A quick brown fox jumps over the lazy dog.

The most common approach is to fetch the characters from the last position to the first and print them to the console one by one:
Java code
public class StringReverse {
    public static void main(String[] args) {
        // Original string
        String s = "A quick brown fox jumps over the lazy dog.";
        System.out.println("Original string: " + s);

        System.out.print("Reversed string: ");
        for (int i = s.length(); i > 0; i--) {
            System.out.print(s.charAt(i - 1));
        }

        // Alternatively, convert the string into a char array and traverse it in reverse
        char[] data = s.toCharArray();
        System.out.println();
        System.out.print("Reversed string: ");
        for (int i = data.length; i > 0; i--) {
            System.out.print(data[i - 1]);
        }
    }
}

Output:

Original string: A quick brown fox jumps over the lazy dog.
Reversed string: .god yzal eht revo spmuj xof nworb kciuq A
Reversed string: .god yzal eht revo spmuj xof nworb kciuq A

Although these two approaches are common, they are not the simplest. It is simpler to use a method the JDK already provides:
Java code
public class StringReverse {
    public static void main(String[] args) {
        // Original string
        String s = "A quick brown fox jumps over the lazy dog.";
        System.out.println("Original string: " + s);

        System.out.print("Reversed string: ");
        StringBuffer buff = new StringBuffer(s);
        // The reverse() method of java.lang.StringBuffer reverses the character sequence
        System.out.println(buff.reverse().toString());
    }
}

Output:

Original string: A quick brown fox jumps over the lazy dog.
Reversed string: .god yzal eht revo spmuj xof nworb kciuq A
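In single-threaded code, StringBuilder (available since Java 5) offers the same reverse() method without StringBuffer's synchronization. Both reverse() implementations also treat a surrogate pair (a character outside the BMP, stored as two chars) as a single unit, whereas the char-by-char loops shown earlier swap the two halves and produce an invalid string. The sketch below (hypothetical class and method names) demonstrates the difference:

```java
public class ReverseDemo {
    // Copies chars from the end, like the loops above: this splits surrogate pairs
    public static String reverseByChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            sb.append(s.charAt(i));
        }
        return sb.toString();
    }

    // StringBuilder.reverse() keeps each surrogate pair in its original order
    public static String reverseWithBuilder(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        String text = "A quick brown fox jumps over the lazy dog.";
        System.out.println(reverseWithBuilder(text)); // .god yzal eht revo spmuj xof nworb kciuq A

        // "a", then U+1F600 stored as the surrogate pair D83D DE00, then "b"
        String withEmoji = "a\uD83D\uDE00b";
        System.out.println(reverseWithBuilder(withEmoji).equals("b\uD83D\uDE00a")); // true
        System.out.println(reverseByChars(withEmoji).equals("b\uDE00\uD83Da"));     // true: pair broken
    }
}
```

For plain ASCII text such as the pangram, all approaches give the same result; the difference only shows up with supplementary characters.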

4. Truncating a string that contains Chinese characters by bytes.

Implement a method that truncates a string to a given number of bytes. For example, for the string "我zwr爱Java" (我 means "I" and 爱 means "love"), the first four bytes should be "我zw" rather than "我zwr", and the method must never cut a Chinese character in half.

English letters and Chinese characters occupy different numbers of bytes, and those numbers also vary between encodings. The following example prints how many bytes one English letter and one Chinese character occupy in several common encodings:
Java code
import java.io.UnsupportedEncodingException;

public class EncodeTest {
    /**
     * Prints the number of bytes the string occupies in the specified encoding,
     * followed by the encoding name.
     *
     * @param s
     *            the string
     * @param encodingName
     *            the encoding name
     */
    public static void printByteLength(String s, String encodingName) {
        System.out.print("Bytes: ");
        try {
            System.out.print(s.getBytes(encodingName).length);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        System.out.println("; encoding: " + encodingName);
    }

    public static void main(String[] args) {
        // Any single English letter gives the same byte counts
        String en = "a";
        String ch = "人";

        // Number of bytes of an English letter in various encodings
        System.out.println("English letter: " + en);
        EncodeTest.printByteLength(en, "GB2312");
        EncodeTest.printByteLength(en, "GBK");
        EncodeTest.printByteLength(en, "GB18030");
        EncodeTest.printByteLength(en, "ISO-8859-1");
        EncodeTest.printByteLength(en, "UTF-8");
        EncodeTest.printByteLength(en, "UTF-16");
        EncodeTest.printByteLength(en, "UTF-16BE");
        EncodeTest.printByteLength(en, "UTF-16LE");

        System.out.println();

        // Number of bytes of a Chinese character in various encodings
        System.out.println("Chinese character: " + ch);
        EncodeTest.printByteLength(ch, "GB2312");
        EncodeTest.printByteLength(ch, "GBK");
        EncodeTest.printByteLength(ch, "GB18030");
        EncodeTest.printByteLength(ch, "ISO-8859-1");
        EncodeTest.printByteLength(ch, "UTF-8");
        EncodeTest.printByteLength(ch, "UTF-16");
        EncodeTest.printByteLength(ch, "UTF-16BE");
        EncodeTest.printByteLength(ch, "UTF-16LE");
    }
}

The output is as follows:

English letter: a
Bytes: 1; encoding: GB2312
Bytes: 1; encoding: GBK
Bytes: 1; encoding: GB18030
Bytes: 1; encoding: ISO-8859-1
Bytes: 1; encoding: UTF-8
Bytes: 4; encoding: UTF-16
Bytes: 2; encoding: UTF-16BE
Bytes: 2; encoding: UTF-16LE

Chinese character: 人
Bytes: 2; encoding: GB2312
Bytes: 2; encoding: GBK
Bytes: 2; encoding: GB18030
Bytes: 1; encoding: ISO-8859-1
Bytes: 3; encoding: UTF-8
Bytes: 4; encoding: UTF-16
Bytes: 2; encoding: UTF-16BE
Bytes: 2; encoding: UTF-16LE

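UTF-16BE and UTF-16LE are two members of the Unicode encoding family. The Unicode Standard defines three encoding forms, UTF-8, UTF-16, and UTF-32, which give seven encoding schemes in total: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE. Java's char type holds a UTF-16 code unit, and when Java encodes text with the plain "UTF-16" charset it writes big-endian bytes preceded by a 2-byte byte-order mark; that is why "UTF-16" reports 4 bytes for a single character while UTF-16BE and UTF-16LE report 2. ISO-8859-1 cannot represent a Chinese character at all, so getBytes() replaces it with a single '?' byte. The output shows that GB2312, GBK, and GB18030 all meet the requirement of this problem (one byte per English letter, two per Chinese character). In what follows we use GBK as the example.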
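To see where the extra bytes come from, the sketch below (the class name and the hex helper are made up for illustration) prints the actual bytes. With the plain UTF-16 charset the letter a is preceded by the byte-order mark FE FF, and ISO-8859-1 turns the unmappable 人 into the single byte 3F, which is '?':

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Formats bytes as space-separated two-digit uppercase hex, e.g. "FE FF 00 61"
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String letter = "a";
        System.out.println(hex(letter.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 61
        System.out.println(hex(letter.getBytes(StandardCharsets.UTF_16BE))); // 00 61
        System.out.println(hex(letter.getBytes(StandardCharsets.UTF_16LE))); // 61 00

        String chinese = "\u4EBA"; // U+4EBA, the character used in the example above
        System.out.println(hex(chinese.getBytes(StandardCharsets.ISO_8859_1))); // 3F
    }
}
```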

We cannot directly use the substring(int beginIndex, int endIndex) method, because it counts characters: '我' and 'z' each count as one character of length 1. In fact, we only need to tell Chinese characters apart from English letters: in GBK, a Chinese character takes one byte more than an English letter.
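A quick check of that claim (hypothetical class and method names; the example string is written with Unicode escapes): substring(0, 4) returns four characters, 我zwr, which occupy five bytes in GBK because 我 alone takes two:

```java
import java.nio.charset.Charset;

public class SubstringBytesDemo {
    // Number of bytes s occupies when encoded with the given charset
    public static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // The example string: U+6211 ("I"), "zwr", U+7231 ("love"), "Java"
        String s = "\u6211zwr\u7231Java";
        String firstFourChars = s.substring(0, 4);
        System.out.println(firstFourChars.length());         // 4 characters
        System.out.println(byteLength(firstFourChars, gbk)); // 5 bytes in GBK
    }
}
```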
Java code
import java.io.UnsupportedEncodingException;

public class CutString {

    /**
     * Determines whether a character is a Chinese character.
     *
     * @param c
     *            the character
     * @return true for a Chinese character, false for an English letter
     * @throws UnsupportedEncodingException
     *             if Java does not support the encoding used
     */
    public static boolean isChineseChar(char c)
            throws UnsupportedEncodingException {
        // If the character takes more than 1 byte, treat it as a Chinese character.
        // This way of telling English letters from Chinese characters is not rigorous,
        // but it is sufficient for this problem.
        return String.valueOf(c).getBytes("GBK").length > 1;
    }

    /**
     * Truncates a string by bytes.
     *
     * @param original
     *            the original string
     * @param count
     *            the number of bytes to keep
     * @return the truncated string
     * @throws UnsupportedEncodingException
     *             if Java does not support the encoding used
     */
    public static String substring(String original, int count)
            throws UnsupportedEncodingException {
        // Only process a string that is neither null nor empty
        if (original != null && !"".equals(original)) {
            // A Java String has no byte encoding of its own; GBK matters only when counting bytes.
            // Truncate only if 0 < count < number of GBK bytes in the original string
            if (count > 0 && count < original.getBytes("GBK").length) {
                StringBuffer buff = new StringBuffer();
                char c;
                for (int i = 0; i < count; i++) {
                    c = original.charAt(i);
                    buff.append(c);
                    if (CutString.isChineseChar(c)) {
                        // A Chinese character takes 2 bytes, so one fewer character fits
                        --count;
                    }
                }
                return buff.toString();
            }
        }
        return original;
    }

    public static void main(String[] args) {
        // Original string
        String s = "我zwr爱Java";
        System.out.println("Original string: " + s);
        try {
            System.out.println("First 1 byte: " + CutString.substring(s, 1));
            System.out.println("First 2 bytes: " + CutString.substring(s, 2));
            System.out.println("First 4 bytes: " + CutString.substring(s, 4));
            System.out.println("First 6 bytes: " + CutString.substring(s, 6));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}

Output:

Original string: 我zwr爱Java
First 1 byte: 我
First 2 bytes: 我
First 4 bytes: 我zw
First 6 bytes: 我zwr爱
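When the cutoff falls in the middle of a Chinese character, the method above keeps the whole character, so the result can be one byte longer than requested: the first 1 byte gives 我, and the first 6 bytes give 我zwr爱, which is 7 bytes in GBK. If the requirement is read instead as "at most count bytes", the partial character should be dropped. The sketch below is one possible way to do that, not the author's code; the class and method names are hypothetical. It assumes a stateless charset such as GBK or UTF-8, because it encodes one code point at a time (with the plain "UTF-16" charset that would add a byte-order mark per character), and it works on code points so it never splits a surrogate pair either:

```java
import java.nio.charset.Charset;

// Hypothetical helper: truncates to at most maxBytes bytes without splitting a character
public class ByteTruncate {

    /**
     * Returns the longest prefix of s whose encoding in cs fits in maxBytes bytes.
     * Assumes a stateless charset (e.g. GBK or UTF-8) where byte counts add up per character.
     */
    public static String truncate(String s, int maxBytes, Charset cs) {
        if (s == null) {
            return null;
        }
        int used = 0; // bytes consumed so far
        int i = 0;    // char index of the next code point
        while (i < s.length()) {
            int width = Character.charCount(s.codePointAt(i)); // 1 or 2 chars
            int bytes = s.substring(i, i + width).getBytes(cs).length;
            if (used + bytes > maxBytes) {
                break; // the next character does not fit completely, so drop it
            }
            used += bytes;
            i += width;
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // The example string: U+6211 ("I"), "zwr", U+7231 ("love"), "Java"
        String s = "\u6211zwr\u7231Java";
        for (int n : new int[] {1, 2, 4, 6}) {
            System.out.println("First " + n + " byte(s): [" + truncate(s, n, gbk) + "]");
        }
    }
}
```

With GBK, main prints [] for 1 byte, [我] for 2, [我zw] for 4, and [我zwr] for 6, so the result never exceeds the requested length. Which rounding direction is correct depends on how the interview question is interpreted.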

This article is an English version of an article originally written in Chinese on aliyun.com.
