Puzzle 14: Escape Rout

Source: Internet
Author: User


The following program uses two Unicode escapes, which represent Unicode characters by their hexadecimal code. What does the program print?

 
    public class EscapeRout {
        public static void main(String[] args) {
            // \u0022 is the Unicode escape for double quote (")
            System.out.println("a\u0022.length() + \u0022b".length());
        }
    }

A superficial analysis suggests that the program should print 26, because there are 26 characters between the outer quotation marks of "a\u0022.length() + \u0022b".

A slightly deeper analysis suggests that the program should print 16: each of the two Unicode escapes takes six characters in the source file but represents only one character in the string, so the string should be 10 characters shorter than it looks (26 - 10 = 16). If you run the program, you will find that neither answer is right. It prints neither 26 nor 16, but 2.

The key to this puzzle is that Java gives no special treatment to Unicode escapes inside string literals. The compiler translates Unicode escapes into the characters they represent before it parses the program into tokens [JLS 3.2]. The first Unicode escape in the program therefore closes a one-character string literal ("a"), and the second opens another one-character string literal ("b"). The program prints the value of the expression "a".length() + "b".length(), which is 2.
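The early translation step described above can be imitated with a small sketch. It is only an approximation of what a compiler does: the class and method names (UnicodeTranslationSketch, translate) are made up for illustration, and the regular expression ignores the JLS rule about escaped backslashes. It treats the puzzle's statement as plain text and rewrites each Unicode escape before anything resembling tokenizing happens.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeTranslationSketch {
    // Simplified assumption: a backslash, one or more 'u', then four hex digits.
    // The real rule also checks how many backslashes precede the escape.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    // Replaces every Unicode escape in the text with the character it denotes.
    static String translate(String source) {
        Matcher m = ESCAPE.matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(source, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(source, last, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // The argument expression from the puzzle, held as raw text.
        String raw = "\"a\\u0022.length() + \\u0022b\".length()";
        System.out.println(translate(raw));
    }
}
```

Running the sketch shows the text the tokenizer would effectively see: two one-character string literals joined by a plus sign, rather than one long literal.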

If the author of the program really wants this behavior, the following statement will be much clearer:

 
    System.out.println("a".length() + "b".length());

More likely, the author wanted to put two double quotation marks inside the string literal. Unicode escapes cannot do that, but escape sequences can [JLS 3.10.6]. The escape sequence for a double quote is a backslash followed by a double quote (\"). If you replace the Unicode escapes in the original program with this escape sequence, it prints 16, as expected:

    System.out.println("a\".length() + \"b".length());

Many characters have escape sequences, including the single quote (\'), linefeed (\n), tab (\t), and backslash (\\). You can use escape sequences in both character literals and string literals.
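As a rough illustration of these escape sequences, the sketch below checks that each one stands for a single character and that the corrected statement yields 16. The class name EscapeSequenceSketch and its helper method are hypothetical names chosen for this example.

```java
public class EscapeSequenceSketch {
    // Backslash-quote keeps each quotation mark inside the literal.
    static int quotedLength() {
        return "a\".length() + \"b".length();
    }

    public static void main(String[] args) {
        System.out.println(quotedLength());   // prints 16
        System.out.println("\n".length());    // prints 1 (linefeed)
        System.out.println("\t".length());    // prints 1 (tab)
        System.out.println("\\".length());    // prints 1 (backslash)
        char quote = '\'';                    // single quote in a char literal
        System.out.println((int) quote);      // prints 39
    }
}
```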

In fact, you can put any ASCII character into a string literal or a character literal with a special kind of escape sequence called an octal escape, but it is best to use the ordinary escape sequences where possible.
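A minimal sketch of octal escapes follows, assuming nothing beyond the standard language rules; the class name OctalEscapeSketch and the method sameQuote are invented for the example. Octal 42 is decimal 34, the double quote, and octal 101 is decimal 65, the letter A.

```java
public class OctalEscapeSketch {
    // The octal escape and the ordinary escape denote the same character.
    static boolean sameQuote() {
        return "\42".equals("\"");
    }

    public static void main(String[] args) {
        System.out.println(sameQuote());        // prints true
        System.out.println('\101' == 'A');      // prints true
        System.out.println("a\42b".length());   // prints 3
    }
}
```

The ordinary form is easier to read at a glance, which is one reason to prefer it when one exists.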

Both ordinary escape sequences and octal escapes are much preferable to Unicode escapes because, unlike Unicode escapes, escape sequences are processed after the program has been parsed into tokens.

ASCII is the lowest common denominator of character sets. It contains only 128 characters, but Unicode contains more than 65,000. A Unicode escape lets you insert a Unicode character into a program that uses only ASCII characters. A Unicode escape means exactly the same thing as the character it represents.

Unicode escapes are designed for use when a programmer needs to insert a character that cannot be represented in the source file's character set. They are used primarily to put non-ASCII characters into identifiers, string literals, character literals, and comments. Occasionally, a Unicode escape adds clarity to a program by positively identifying one of several similar-looking characters.
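The equivalence between a Unicode escape and the character it represents can be sketched as follows. The class and method names are hypothetical. The first method spells an identifier with an escape for the letter a, which is legal but shown only to illustrate the equivalence, not as recommended style. The second uses an escape for a non-ASCII character (e with acute accent, code point E9 hexadecimal) so that the source file stays pure ASCII.

```java
public class UnicodeEscapeSketch {
    // The escaped name in the return statement is the same identifier as abc.
    static int readThroughEscapedName() {
        int abc = 42;
        return \u0061bc;
    }

    // A non-ASCII character written with only ASCII source characters.
    static int eAcuteCodePoint() {
        char c = '\u00e9';
        return c;
    }

    public static void main(String[] args) {
        System.out.println(readThroughEscapedName());  // prints 42
        System.out.println(eAcuteCodePoint());         // prints 233
    }
}
```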

In summary, prefer escape sequences to Unicode escapes in string and character literals. Unicode escapes can be confusing because they are processed so early in the compilation sequence. Do not use Unicode escapes to represent ASCII characters: inside string and character literals, use escape sequences; outside of these literals, insert ASCII characters directly into the source file.
