Just a reminder: strlen() is often used to compute the length of a string in bytes. That is correct as long as strlen() counts bytes, which is PHP's default behaviour. When mbstring function overloading (mbstring.func_overload) is enabled and a multi-byte character set is in use, strlen() returns the number of characters instead, so this assumption no longer holds. When you need the number of bytes of an ASCII or UTF-8 encoded string in that situation, it is better to use the following function:
/**
 * Count the number of bytes of a given string.
 * Input string is expected to be ASCII or UTF-8 encoded.
 * Warning: the function doesn't return the number of chars
 * in the string, but the number of bytes.
 *
 * @param string $str The string to compute number of bytes
 *
 * @return int The length in bytes of the given string.
 */
function strbytes($str)
{
    // Strings are expected to be in ASCII or UTF-8 format

    // Number of characters in string (when strlen() is overloaded by mbstring)
    $strlen_var = strlen($str);

    // String bytes counter, also the byte offset of the next character
    $d = 0;

    /*
     * Iterate over every character in the string, advancing the counter
     * by the byte length of each UTF-8 sequence. The isset() check stops
     * the loop at the end of the string when strlen() is not overloaded
     * and has already returned the byte count.
     */
    for ($c = 0; $c < $strlen_var && isset($str[$d]); ++$c) {
        // $str[$d] replaces the old $str{$d} syntax, which PHP 8 rejects
        $ord_var_c = ord($str[$d]);

        switch (true) {
            case (($ord_var_c >= 0x20) && ($ord_var_c <= 0x7F)):
                // Characters U-00000000 - U-0000007F (same as ASCII)
                $d++;
                break;

            case (($ord_var_c & 0xE0) == 0xC0):
                // Characters U-00000080 - U-000007FF, mask 110XXXXX
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 2;
                break;

            case (($ord_var_c & 0xF0) == 0xE0):
                // Characters U-00000800 - U-0000FFFF, mask 1110XXXX
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 3;
                break;

            case (($ord_var_c & 0xF8) == 0xF0):
                // Characters U-00010000 - U-001FFFFF, mask 11110XXX
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 4;
                break;

            case (($ord_var_c & 0xFC) == 0xF8):
                // Characters U-00200000 - U-03FFFFFF, mask 111110XX
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 5;
                break;

            case (($ord_var_c & 0xFE) == 0xFC):
                // Characters U-04000000 - U-7FFFFFFF, mask 1111110X
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 6;
                break;

            default:
                // Control characters and anything unexpected count as one byte
                $d++;
        }
    }

    return $d;
}
This function has been adapted from the JSON function used to convert characters to their UTF-8 representation.
With this new function we solved a problem in the JSON and PEAR/SOAP PHP libraries.
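As a way to cross-check a byte counter like the one above, the following is a minimal sketch that compares the byte count and the character count of a sample string. It assumes the mbstring extension is loaded; the sample string and the variable names are only illustrative, and this is not the technique the function above uses. Passing the '8bit' encoding to mb_strlen() makes it count one unit per byte, so the result should not depend on whether strlen() is overloaded.

```php
<?php
// Illustrative sample: "héllo wörld", written with explicit UTF-8 byte
// escapes so the encoding of this source file does not matter.
$sample = "h\xC3\xA9llo w\xC3\xB6rld";

// Byte count: with '8bit', mb_strlen() treats every byte as one unit.
$bytes = mb_strlen($sample, '8bit');

// Character count: the same bytes decoded as UTF-8.
$chars = mb_strlen($sample, 'UTF-8');

echo "bytes: $bytes, characters: $chars\n";
```

For this sample the script should print "bytes: 13, characters: 11", because "é" and "ö" each take two bytes in UTF-8. A correct byte counter should return the first number for the same input.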