1. Octal escape sequence: \ followed by 1 to 3 octal digits; range '\000' to '\377'
\0: null character
2. Unicode escape: \u followed by four hexadecimal digits; range 0 to 65535
\u0000: null character
3. Special characters: 3
\": double quotation mark
\': single quotation mark
\\: backslash
4. Control characters: 5
\r: carriage return
\n: line feed (newline)
\f: form feed
\t: horizontal tab
\b: backspace
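As a quick check of the four kinds of escape listed above, the following minimal sketch returns the numeric codes that the compiler produces for them. The class and method names are illustrative only and do not come from the original text.

```java
import java.util.Arrays;

public class EscapeKinds {
    // Numeric codes of the five control-character escapes, in list order.
    static int[] controlCodes() {
        return new int[] { '\r', '\n', '\f', '\t', '\b' };
    }

    // Numeric codes of the three special-character escapes.
    static int[] specialCodes() {
        return new int[] { '\"', '\'', '\\' };
    }

    // An octal escape and a Unicode escape can both denote the letter A (code 65).
    static boolean bothDenoteA() {
        return '\101' == 'A' && '\u0041' == 'A';
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(controlCodes())); // [13, 10, 12, 9, 8]
        System.out.println(Arrays.toString(specialCodes())); // [34, 39, 92]
        System.out.println(bothDenoteA());                   // true
    }
}
```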
Escape of the period: . ==> \u002e
Escape of the dollar sign: $ ==> \u0024
Escape of the caret: ^ ==> \u005e
Escape of the opening curly brace: { ==> \u007b
Escape of the opening square bracket: [ ==> \u005b
Escape of the opening parenthesis: ( ==> \u0028
Escape of the vertical bar: | ==> \u007c
Escape of the closing parenthesis: ) ==> \u0029
Escape of the asterisk: * ==> \u002a
Escape of the plus sign: + ==> \u002b
Escape of the question mark: ? ==> \u003f
Escape of the backslash: \ ==> \u005c
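The characters in this list are the ones that have a special meaning in regular expressions, so the list is presumably meant for use with java.util.regex; that reading is an assumption, as the text does not say so. Under that assumption, the sketch below shows one way a code from the list can match a literal period. The backslash has to be doubled in the Java source so that the regex engine, and not the compiler, interprets the escape. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class RegexDotEscape {
    // The doubled backslash stops the compiler from translating the escape,
    // so the regex engine receives it and matches a literal period.
    static String[] splitOnLiteralDot(String s) {
        return s.split("\\u002e");
    }

    // Same result; Pattern.quote is usually easier to read.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("."));
    }

    // With a single backslash the compiler substitutes a bare period, which
    // the regex engine treats as "any character". Every piece is then empty,
    // and split drops trailing empty strings.
    static String[] splitOnAnyChar(String s) {
        return s.split("\u002e");
    }

    public static void main(String[] args) {
        System.out.println(splitOnLiteralDot("a.b.c").length); // 3
        System.out.println(splitQuoted("a.b.c").length);       // 3
        System.out.println(splitOnAnyChar("a.b.c").length);    // 0
    }
}
```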
The following program uses two Unicode escapes, each of which represents a Unicode character by its hexadecimal code. What does the program print?
Java code
public class EscapeRout {
    public static void main(String[] args) {
        // \u0022 is the Unicode escape for the double quotation mark (")
        System.out.println("a\u0022.length() + \u0022b".length());
    }
}
A very superficial analysis of the program would suggest that it should print 26, because there are 26 characters between the two double quotes that delimit the string "a\u0022.length() + \u0022b".
A slightly deeper analysis would suggest that the program should print 16, because each of the two Unicode escapes takes six characters in the source file but represents only one character in the string, so the string should be 10 characters shorter than it looks. If you run the program, you will find that neither answer is right: it prints 2.
The key to understanding this puzzle is that Java gives Unicode escapes no special treatment inside string literals. The compiler translates Unicode escapes into the characters they represent [JLS 3.2] before it parses the program into tokens. The first Unicode escape in the program therefore closes a one-character string literal ("a"), and the second opens another one-character string literal ("b"). The program prints the value of the expression "a".length() + "b".length(), which is 2.
If this really were the behavior the program's author wanted, the following statement would be much clearer:
Java code
System.out.println("a".length() + "b".length());
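To confirm that the two forms really are the same program after translation, here is a minimal sketch that evaluates both side by side. The class and method names are illustrative and not part of the original puzzle.

```java
public class EscapeRoutCheck {
    // After the compiler translates the two escapes into double quotes, this
    // line is two one-character string literals joined by a plus sign.
    static int withUnicodeEscapes() {
        return "a\u0022.length() + \u0022b".length();
    }

    // The same expression written out plainly.
    static int spelledOut() {
        return "a".length() + "b".length();
    }

    public static void main(String[] args) {
        System.out.println(withUnicodeEscapes()); // 2
        System.out.println(spelledOut());         // 2
    }
}
```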
More likely, the author wanted to place two double-quote characters inside the string literal. Unicode escapes cannot do this, but escape sequences can [JLS 3.10.6]. The escape sequence for a double quote is a backslash followed by a double quotation mark (\"). If you replace the Unicode escapes in the original program with escape sequences, it prints 16, as expected. The string then contains a, a double quote, the nine characters of .length(), a space, a plus sign, a space, a second double quote, and b. (Without the spaces around the plus sign the count would be 14.)
Java code
System.out.println("a\".length() + \"b".length());
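The count of 16 can be checked directly. The sketch below, with hypothetical class and method names, returns the string built with escape sequences so that both its contents and its length can be inspected.

```java
public class EscapedQuotes {
    // One string literal; each backslash-quote pair contributes a single
    // double-quote character to the string.
    static String escaped() {
        return "a\".length() + \"b";
    }

    public static void main(String[] args) {
        System.out.println(escaped());          // a".length() + "b
        System.out.println(escaped().length()); // 16
    }
}
```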
Many characters have their own escape sequence, including the single quotation mark (\'), line feed (\n), tab (\t), and backslash (\\). You can use escape sequences in both character literals and string literals.
In fact, you can place any ASCII character in a string literal or a character literal by using a special kind of escape sequence called an octal escape, but it is best to use the normal escape sequences whenever possible.
Inside literals, both normal escape sequences and octal escapes are preferable to Unicode escapes because, unlike Unicode escapes, escape sequences are processed after the program has been parsed into tokens.
ASCII is the lowest common denominator of character sets: it has only 128 characters, whereas Unicode has more than 65,000. A Unicode escape can be used to insert a Unicode character into a program that otherwise uses only ASCII characters. A Unicode escape means exactly the same thing as the character it represents.
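A short sketch of octal escapes follows, including one reason the normal escape sequences are easier to read: an octal escape stops after at most three digits (and after two if the first digit is 4 or higher), so a digit that follows it is an ordinary character. The class and method names are illustrative only.

```java
public class OctalEscapes {
    // Octal 110 is decimal 72 ('H'); octal 151 is decimal 105 ('i').
    static String hi() {
        return "\110\151";
    }

    // Octal 42 is decimal 34, the double quotation mark.
    static char quote() {
        return '\42';
    }

    // The escape consumes only the digits 123 (decimal 83, 'S'); the trailing
    // 4 is a separate character, so this string has two characters.
    static String escapeThenDigit() {
        return "\1234";
    }

    public static void main(String[] args) {
        System.out.println(hi());                       // Hi
        System.out.println(quote() == '\"');            // true
        System.out.println(escapeThenDigit().length()); // 2
    }
}
```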
Unicode escapes are designed for cases where a programmer needs to insert a character that cannot be represented in the character set of the source file. They are used primarily to place non-ASCII characters in identifiers, string literals, character literals, and comments. Occasionally, a Unicode escape is also used to make clear which of several similar-looking characters is meant, which improves the clarity of the program.
In summary, prefer escape sequences to Unicode escapes in string and character literals. Unicode escapes can cause confusion because they are processed so early in compilation. Do not use Unicode escapes to represent ASCII characters: inside string and character literals, use escape sequences; outside these literals, insert the ASCII characters directly into the source file.
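The sketch below illustrates both points with hypothetical names: a Unicode escape keeps a non-ASCII character out of an ASCII-only source file, and an escape inside an identifier is the same identifier as its plain spelling. The second method uses an escape for an ASCII letter purely to demonstrate that equivalence; it is not something to imitate.

```java
public class UnicodeEscapeUses {
    // The escape stands for e with an acute accent (code 233), so the source
    // file itself stays pure ASCII.
    static String cafe() {
        return "caf\u00e9";
    }

    // The escape is translated before tokenizing, so the declaration below
    // introduces a variable named abc. For illustration only.
    static int escapedIdentifier() {
        int \u0061bc = 7;
        return abc;
    }

    public static void main(String[] args) {
        System.out.println(cafe().length());        // 4
        System.out.println((int) cafe().charAt(3)); // 233
        System.out.println(escapedIdentifier());    // 7
    }
}
```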
Escape characters in Java