This article explains how the byte length of Chinese and English characters depends on the character encoding in PHP and Java.
1. PHP
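PHP is similar to the C language: a string is a plain sequence of bytes, and functions such as strlen() and substr() count bytes, not characters. With GBK encoding, an English character occupies 1 byte and a Chinese character occupies 2 bytes. With UTF-8 encoding, an English character still occupies 1 byte, but a Chinese character occupies 3 to 4 bytes (3 for common Chinese characters, 4 for rare ones outside the Basic Multilingual Plane). This often causes trouble when you need the character length of a string or need to truncate it, because a byte count or byte offset no longer matches the number of characters.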
Solutions to this problem are easy to find online. The simplest is to use the mbstring extension, whose mb_strlen() and mb_substr() functions count and truncate by character rather than by byte.
2. Java
A char in Java occupies two bytes: Java stores strings in Unicode (UTF-16), so a Chinese character and an English letter each occupy one 2-byte char. However, when a string is converted to bytes in another encoding, the number of bytes per character depends on that encoding. For example:
Public class Test {public static void main (String [] args) {String str = "we aaaaa"; int byte_len = str. getBytes (). length; int len = str. length (); System. out. println ("bytes:" + byte_len); System. out. println ("character length:" + len );}}
In the example above, the output is 9 and 7 when the JVM's default charset is GBK, but 11 and 7 when it is UTF-8. The byte count changes with the encoding, while str.length() returns the same value whatever the encoding, because it returns the number of chars in the string: a Chinese character counts as one, just like an English letter.