1.PHP
PHP handles strings much as C does: a string is a sequence of bytes. In ASCII, a character takes 1 byte. In GBK, an English letter takes 1 byte and a Chinese character takes 2 bytes. In UTF-8, an English letter still takes 1 byte, but a Chinese character takes 3 to 4 bytes (typically 3). This usually causes trouble when you need the character length of a string or want to cut out a substring. For example, strlen("中文") returns 4 when the script is saved as GBK but 6 when it is saved as UTF-8, and substr can cut a Chinese character in half and produce garbled output.
Answers to these problems are easy to find online. The simplest is to use the mbstring extension: mb_substr takes substrings by character rather than by byte, and mb_strlen counts characters.
2.Java
A char in Java is 2 bytes. Java stores strings as Unicode (UTF-16), so each char is a 2-byte unit, and a common Chinese or English character occupies exactly one char. When the string is converted to bytes in another encoding, however, each character occupies a different number of bytes. For example:
public class Test {public static void Main (string[] args) { String str = "We aaaaa"; int byte_len = Str.getbytes (). length; int len = Str.length (); System.out.println ("Byte length:" + byte_len); System.out.println ("character length:" + len);} }
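In the example above, the output under GBK is 9 and 7, while the output under UTF-8 is 11 and 7: the two Chinese characters take 2 bytes each under GBK and 3 bytes each under UTF-8, plus 1 byte for each of the five letters. In other words, the length returned by str.length() is the same regardless of the encoding used. The method returns the number of characters in the string (strictly, the number of UTF-16 char units), and each common Chinese or English character counts as one.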
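The result of getBytes() with no argument depends on the platform default charset, so the same program can print different byte lengths on different machines. As a minimal sketch, the variant below passes the charset explicitly so that both cases can be compared in one run. The class name EncodingLengthDemo and the helper byteLength are hypothetical names chosen for illustration, and the code assumes the running JDK provides the GBK charset (standard JDK distributions normally do). The Chinese characters are written as Unicode escapes so that the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLengthDemo {
    // Hypothetical helper: number of bytes the string occupies in the given charset.
    public static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "\u6211\u4EEC" is the two Chinese characters 我们, followed by five ASCII letters.
        String str = "\u6211\u4EECaaaaa";

        // Assumption: the GBK charset is available in this JDK;
        // Charset.forName throws UnsupportedCharsetException otherwise.
        Charset gbk = Charset.forName("GBK");

        System.out.println("GBK byte length: " + byteLength(str, gbk));                        // 9
        System.out.println("UTF-8 byte length: " + byteLength(str, StandardCharsets.UTF_8));   // 11
        System.out.println("character length: " + str.length());                               // 7
    }
}
```

Naming the charset at the call site keeps the byte count independent of the machine's locale settings, which is usually what you want when the bytes are written to a file or sent over a network.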
That covers how the byte length of Chinese and English characters relates to the encoding in PHP and Java. I hope it is helpful to readers interested in PHP.