Character encoding and the number of bytes occupied by strings in Java

Source: Internet
Author: User

 

The number of bytes occupied by strings in Java is closely related to character encoding.

Encoding in Java development involves three things: the IDE (source file) encoding, the operating system's default encoding, and the character encoding applied to Java strings.

For example, when writing a Java program in Eclipse, you can set the source encoding in the project properties. If it is not set, the source files default to the operating system encoding. The encoding set here is the encoding of the code files. When writing a Java program with Vim, you can set the LANG environment variable, for example zh_CN.UTF-8 or zh_CN.GB18030, and the code file is then saved in the encoding that LANG specifies. This is the IDE encoding, and it matters: if a Java code file is saved as UTF-8 and your IDE is set to GB18030, the file is displayed as garbled text.
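The mismatch described above can be reproduced without an editor. The following is a minimal sketch, not how any particular IDE works internally: it encodes a string as UTF-8 and decodes the same bytes as GB18030. The class and method names (`MojibakeDemo`, `reopen`) are made up for illustration, and the text is written with Unicode escapes so the example does not depend on the encoding of its own source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // "\u4e2d\u56fd" is the two Chinese characters for "China"; the escapes
    // keep this file independent of the source-file encoding.
    static final String TEXT = "\u4e2d\u56fdABC";

    // Encode with one charset and decode with another, as an editor does
    // when it opens a file with the wrong encoding setting.
    static String reopen(String text, Charset savedAs, Charset openedAs) {
        byte[] onDisk = text.getBytes(savedAs);
        return new String(onDisk, openedAs);
    }

    public static void main(String[] args) {
        Charset gb18030 = Charset.forName("GB18030");
        String wrong = reopen(TEXT, StandardCharsets.UTF_8, gb18030);
        String right = reopen(TEXT, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("readable when opened as GB18030: " + wrong.equals(TEXT));
        System.out.println("readable when opened as UTF-8:   " + right.equals(TEXT));
    }
}
```

The bytes on disk are identical in both cases; only the charset used to interpret them differs.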

Java character encoding refers to the encoding applied to a string in Java. For example, the following program, run on Windows 7, calculates the number of bytes a string occupies in various encodings:

[Java]
View plaincopyprint?
  1. Public class charset {
  2. Public static void main (string [] ARGs ){
  3. // Todo auto-generated method stub
  4. String MSG = "China ABC ";
  5. System. Out. println (MSG );
  6. Int Len = MSG. getbytes (). length; // encoding by default Operating System Encoding
  7. System. Out. println (LEN );
  8. Try {
  9. Len = msg. getbytes ("gb2312"). length; // output 7
  10. System. Out. println ("gb2312:" + Len );
  11. Len = msg. getbytes ("GBK"). length; // output 7
  12. System. Out. println ("GBK:" + Len );
  13. Len = msg. getbytes ("gb18030"). length; // output 7, 2*2 + 3. One Chinese Character occupies 2 bytes, one English letter and one byte
  14. System. Out. println ("gb18030:" + Len );
  15. Len = msg. getbytes ("UTF-8"). length; // output 9, 2*3 + 3 = 9, a Chinese character occupies 3 bytes, an English letter a byte.
  16. System. Out. println ("UTF-8:" + Len );
  17. Len = msg. getbytes ("UTF-16"). length; // OUTPUT 12
  18. System. Out. println ("UTF-16:" + Len );
  19. Len = msg. getbytes ("UTF-32"). length; // output 20
  20. System. Out. println ("UTF-32:" + Len );
  21. Len = msg. getbytes ("Unicode"). length; // OUTPUT 12
  22. System. Out. println ("UNICODE:" + Len );
  23. } Catch (Java. Io. unsupportedencodingexception E)
  24. {
  25. System. Out. println (E. getmessage (). tostring ());
  26. }
  27. }
  28. }


Program output:

China ABC
7
Gb2312: 7
GBK: 7
Gb18030: 7
UTF-8: 9
UTF-16: 12
UTF-32: 20
UNICODE: 12

Analysis:
The value of msg.getBytes().length is 7 because the character encoding of this Windows 7 system is GBK. When no charset is specified, Java encodes the string with the operating system's default encoding, so the result is 7 bytes, the same as for GB2312, GBK, and GB18030. Note that since JDK 18 the default charset is UTF-8 on every platform unless configured otherwise, so this call returns 9 on newer JDKs.
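The remaining numbers in the output can be broken down the same way. The sketch below is an illustration with made-up names (`ByteCountDemo`, `count`); it uses `Charset` objects instead of charset names, which also avoids the checked `UnsupportedEncodingException`. It shows that Java's "UTF-16" encoder prepends a 2-byte byte order mark (2 + 5*2 = 12), while "UTF-16BE" and "UTF-16LE" do not (5*2 = 10), and that "UTF-32" uses 4 bytes per character with no mark (5*4 = 20).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteCountDemo {
    // Number of bytes the string occupies in the given charset.
    static int count(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Two Chinese characters followed by "ABC", written with escapes so
        // the result does not depend on the source-file encoding.
        String msg = "\u4e2d\u56fdABC";

        // The no-argument getBytes() uses this charset.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default: " + msg.getBytes().length);

        System.out.println("UTF-8: " + count(msg, StandardCharsets.UTF_8));       // 9
        System.out.println("UTF-16: " + count(msg, StandardCharsets.UTF_16));     // 12 (BOM + 10)
        System.out.println("UTF-16BE: " + count(msg, StandardCharsets.UTF_16BE)); // 10
        System.out.println("UTF-16LE: " + count(msg, StandardCharsets.UTF_16LE)); // 10
        System.out.println("UTF-32: " + count(msg, Charset.forName("UTF-32")));   // 20
    }
}
```

Passing an explicit charset is the simplest way to make the byte count independent of the machine the program runs on.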

If this program is run on Linux instead, the result depends on the LANG setting:

[Plain]
View plaincopyprint?
  1. [Zhankunlin @ icthtc javatest] $ export lang = zh_cn.gb18030
  2. [Zhankunlin @ icthtc javatest] $ Vim charset. Java (when writing a Java code file, the code used is zh_cn.gb18030, that is, the code file encoding is gb18030)
  3. [Zhankunlin @ icthtc javatest] $ javac charset. Java
  4. [Zhankunlin @ icthtc javatest] $ Java charset (lang = zh_cn.gb18030, that is, the default encoding is gb18030)
  5. China ABC
  6. 7 (the default encoding is gb18030, so it occupies 7 bytes)
  7. Gb2312: 7
  8. GBK: 7
  9. Gb18030: 7
  10. UTF-8: 9
  11. UTF-16: 12
  12. UTF-32: 20
  13. UNICODE: 12
  14. [Zhankunlin @ icthtc javatest] $ export lang = zh_CN.UTF-8 (Change System code to UTF-8)
  15. [Zhankunlin @ icthtc javatest] $ Java charset
  16. Juan .. ABC (because the xshell Terminal code is not set to UTF-8, so the print garbled)
  17. 9 (the operating system encoding is UTF-8, so it occupies 9 bytes)
  18. Gb2312: 7
  19. GBK: 7
  20. Gb18030: 7
  21. UTF-8: 9
  22. UTF-16: 12
  23. UTF-32: 20
  24. UNICODE: 12
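The garbled first line in the second run comes from the bytes written to standard output, not from the string itself: the program writes bytes in one encoding and the terminal interprets them in another. As a rough sketch of that step, the example below writes the same text through two `PrintStream` objects with different encodings and counts the bytes produced. The class and method names (`ConsoleEncodingDemo`, `bytesWritten`) are made up for illustration, and an in-memory buffer stands in for the terminal.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleEncodingDemo {
    // Print the text through a PrintStream that uses the named charset and
    // return how many bytes reached the underlying stream.
    static int bytesWritten(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.size();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Two Chinese characters followed by "ABC".
        String msg = "\u4e2d\u56fdABC";
        System.out.println("GB18030: " + bytesWritten(msg, "GB18030")); // 7
        System.out.println("UTF-8: " + bytesWritten(msg, "UTF-8"));     // 9
    }
}
```

A terminal that expects GB18030 but receives the 9 UTF-8 bytes decodes them into different characters, which matches the garbled line in the session above.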


Next, set the Xshell terminal encoding to UTF-8 and run the program again:


  1. [Zhankunlin @ icthtc javatest] $ Java charset
  2. China ABC (print normal)
  3. 9
  4. Gb2312: 7
  5. GBK: 7
  6. Gb18030: 7
  7. UTF-8: 9
  8. UTF-16: 12
  9. UTF-32: 20
  10. UNICODE: 12
  11. [Zhankunlin @ icthtc javatest] $ Vim charset. Java


Now recompile. The code file is still encoded in GB18030, but the system encoding at compile time is UTF-8. Unless an encoding is specified, the compiler reads the code file using the operating system encoding, so it reports a warning about line 6 (the warning text is garbled in this capture; only "UTF8" is legible):

    [zhankunlin@icthtc javatest]$ javac Charset.java
    Charset.java:6: (garbled warning text mentioning UTF8)
    String msg = "ABC";
    ^
    Charset.java:6: (garbled warning text mentioning UTF8)
    String msg = "ABC";
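The warning arises because the GB18030 bytes of the two Chinese characters are not valid UTF-8. The sketch below is only an illustration of that check, not javac's actual implementation: it runs a strict decoder over the bytes of the source line and reports whether they decode cleanly. The names `SourceDecodingDemo` and `decodesCleanly` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SourceDecodingDemo {
    // True if every byte sequence is valid in the given charset. REPORT makes
    // the decoder throw instead of silently substituting a replacement character.
    static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset gb18030 = Charset.forName("GB18030");
        // The source line from the article, as it is stored on disk in GB18030.
        byte[] onDisk = "String msg = \"\u4e2d\u56fdABC\";".getBytes(gb18030);
        System.out.println("valid as GB18030: " + decodesCleanly(onDisk, gb18030));               // true
        System.out.println("valid as UTF-8: " + decodesCleanly(onDisk, StandardCharsets.UTF_8)); // false
    }
}
```

Telling the reader which charset the bytes are really in, as the -encoding option does for javac, removes the problem.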


  1. [Zhankunlin @ icthtc javatest] $ javac-encoding gb18030 charset. Java (if you use the-encoding option to specify the encoding format of the program file, compilation will not fail)
  2. [Zhankunlin @ icthtc javatest] $ Java charset {print normal, because the xshell terminal encoding has been set to UTF-8 }}
  3. China ABC
  4. 9
  5. Gb2312: 7
  6. GBK: 7
  7. Gb18030: 7
  8. UTF-8: 9
  9. UTF-16: 12
  10. UTF-32: 20
  11. UNICODE: 12

