The strlen() function and the byte length of multi-byte encoded strings

Source: Internet
Author: User
Keep in mind what strlen() actually returns. The function is often used to compute the length of a string in bytes, and by default that is what it returns. When mbstring function overloading (mbstring.func_overload) replaces strlen() with mb_strlen(), however, it returns the number of characters instead. The two values agree as long as the string is single-byte encoded; with a multi-byte character set they no longer do. So when you need the number of bytes of an ASCII or UTF-8 encoded string in such a setup, it is better to use the following function:

/**
 * Count the number of bytes in a given string.
 * The input string is expected to be ASCII or UTF-8 encoded.
 * Warning: the function doesn't return the number of characters
 * in the string, but the number of bytes.
 *
 * @param string $str the string whose bytes are counted
 *
 * @return int the length in bytes of the given string
 */
function strbytes($str)
{
    // Strings are expected to be in ASCII or UTF-8 format

    // Number of characters in the string (strlen() counts characters only
    // when mbstring overloading is active; otherwise it counts bytes)
    $strlen_var = strlen($str);

    // String bytes counter
    $d = 0;

    /*
     * Iterate over every character in the string, advancing the byte
     * counter by the length of the UTF-8 sequence its lead byte announces.
     * The isset() check stops the loop once the end of the string is reached.
     */
    for ($c = 0; $c < $strlen_var && isset($str[$d]); ++$c) {

        // Lead byte of the current character (the old $str{$d} syntax
        // is no longer valid in current PHP versions)
        $ord_var_c = ord($str[$d]);

        switch (true) {
            case (($ord_var_c >= 0x20) && ($ord_var_c <= 0x7F)):
                // Characters U-00000000 - U-0000007F (same as ASCII)
                $d++;
                break;

            case (($ord_var_c & 0xE0) == 0xC0):
                // Characters U-00000080 - U-000007FF, mask 110xxxxx
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 2;
                break;

            case (($ord_var_c & 0xF0) == 0xE0):
                // Characters U-00000800 - U-0000FFFF, mask 1110xxxx
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 3;
                break;

            case (($ord_var_c & 0xF8) == 0xF0):
                // Characters U-00010000 - U-001FFFFF, mask 11110xxx
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 4;
                break;

            case (($ord_var_c & 0xFC) == 0xF8):
                // Characters U-00200000 - U-03FFFFFF, mask 111110xx
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 5;
                break;

            case (($ord_var_c & 0xFE) == 0xFC):
                // Characters U-04000000 - U-7FFFFFFF, mask 1111110x
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 6;
                break;

            default:
                // Control characters and stray bytes count as one byte
                $d++;
        }
    }

    return $d;
}

This function has been adapted from the JSON function used to convert characters to their UTF-8 representation.

With this new function we solved problems in the JSON and PEAR SOAP PHP libraries.
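
Where the mbstring extension is installed, the same byte count can probably be obtained without a hand-written loop. The sketch below is not part of the original note: it assumes mbstring is available, and the helper name byteLength is made up for illustration. It relies on mb_strlen() with the '8bit' encoding, which treats every byte as one character, and compares the result with the character count for a few UTF-8 samples.

```php
<?php
// Hypothetical helper (name chosen for this example only).
// Assumes the mbstring extension is loaded.
function byteLength(string $str): int
{
    // With the '8bit' encoding every byte counts as one character,
    // so mb_strlen() returns the raw byte length of the string.
    return mb_strlen($str, '8bit');
}

// "\u{...}" escapes produce UTF-8 bytes regardless of the file encoding.
$samples = [
    'abc',          // ASCII only
    "\u{00E9}",     // e with acute accent
    "\u{20AC}",     // euro sign
    "\u{1F600}",    // emoji outside the Basic Multilingual Plane
];

foreach ($samples as $s) {
    printf(
        "%d character(s), %d byte(s)\n",
        mb_strlen($s, 'UTF-8'), // character count
        byteLength($s)          // byte count
    );
}
```

For these samples the character counts are 3, 1, 1 and 1, while the byte counts are 3, 2, 3 and 4, which is the difference the article warns about.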
