PHP multi-byte strings

Source: Internet
Author: User
PHP string encoding extension: multi-byte strings (mbstring)

Introduction

Although every character needed in many languages can be mapped to an 8-bit value, several languages require so many characters for written communication that their range cannot fit in a single byte. (A byte consists of 8 bits, and each bit holds one of two values, 1 or 0, so a single byte can represent only 2^8 = 256 distinct values.) Multi-byte character encoding schemes were developed to express more than 256 characters in the regular byte-based encoding system.
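To make the byte arithmetic concrete, here is a small PHP sketch that prints how many bytes a few characters occupy in UTF-8. It assumes the source file itself is saved as UTF-8; strlen() counts bytes, not characters.

```php
<?php
// A byte has 8 bits, each 0 or 1, so it can hold 2 ** 8 = 256 distinct values.
$distinctByteValues = 2 ** 8;
echo $distinctByteValues, "\n"; // 256

// Assumption: this file is saved as UTF-8, so the literals below are UTF-8 bytes.
// strlen() counts bytes; bin2hex() shows the raw byte sequence.
foreach (['A', 'é', '€', '日'] as $char) {
    printf("%s: %d byte(s), hex %s\n", $char, strlen($char), bin2hex($char));
}
// A: 1 byte(s), hex 41
// é: 2 byte(s), hex c3a9
// €: 3 byte(s), hex e282ac
// 日: 3 byte(s), hex e697a5
```

Only ASCII characters fit in one byte here; everything else needs a multi-byte sequence.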

When you manipulate strings in a multi-byte encoding (trim, split, splice, and so on), you need special functions, because in such a scheme two or more consecutive bytes may represent a single character. If you apply a function that is not multi-byte aware to such a string, it will probably fail to detect where a multi-byte character begins or ends and produce a corrupted, garbled string that has most likely lost its original meaning.
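A minimal sketch of the problem, assuming the mbstring extension is loaded and the file is saved as UTF-8: the byte-based substr() splits a character in half, while mb_substr() counts characters and keeps the result valid.

```php
<?php
$text = 'Größe'; // 5 characters; 'ö' and 'ß' take 2 bytes each in UTF-8

echo strlen($text), "\n";             // 7 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n"; // 5 (characters)

// Byte-based substr() keeps "Gr" plus only the first byte of 'ö'.
$broken = substr($text, 0, 3);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// Character-based mb_substr() returns the first three whole characters.
$intact = mb_substr($text, 0, 3, 'UTF-8');
echo $intact, "\n";                             // Grö
var_dump(mb_check_encoding($intact, 'UTF-8')); // bool(true)
```

Passing the encoding explicitly, as done here, avoids depending on whatever internal encoding happens to be configured.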

mbstring provides multi-byte-aware string functions that help you deal with multi-byte encodings in PHP. In addition, mbstring handles character encoding conversion between the supported encodings. mbstring is designed to handle Unicode-based encodings such as UTF-8 and UCS-2, as well as many single-byte encodings for convenience.
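As a sketch of the conversion feature, the snippet below converts UTF-8 text to ISO-8859-1 (a single-byte encoding) and to UCS-2, then back. The sample string and target encodings are illustrative choices; the file is assumed to be saved as UTF-8.

```php
<?php
$utf8 = 'café'; // 5 bytes in UTF-8 ('é' is 2 bytes)

// mb_convert_encoding(string, to_encoding, from_encoding)
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo strlen($latin1), "\n";  // 4: 'é' becomes the single byte 0xE9
echo bin2hex($latin1), "\n"; // 636166e9

// Converting back restores the original UTF-8 bytes.
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($roundTrip === $utf8); // bool(true)

// UCS-2 stores every character in exactly two bytes.
$ucs2 = mb_convert_encoding($utf8, 'UCS-2', 'UTF-8');
echo strlen($ucs2), "\n"; // 8
```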

mbstring is a non-default extension, which means it is not enabled by default. You must explicitly enable the module with the configure option (--enable-mbstring).
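Because the extension may be missing from a given build, a script can check for it at runtime before calling any mb_* function. A minimal sketch:

```php
<?php
// extension_loaded() reports whether mbstring was compiled in or loaded as a
// shared extension; without it, every mb_* call is an undefined function.
if (extension_loaded('mbstring')) {
    echo "mbstring is available\n";
    echo 'Internal encoding: ', mb_internal_encoding(), "\n";
} else {
    echo "mbstring is not loaded; mb_* functions are undefined\n";
}
```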

HTTP input and output

HTTP input/output character encoding conversion may also convert binary data. If you use binary data for HTTP input/output, you are responsible for controlling the character encoding conversion.

As of PHP 4.3.3, if the enctype attribute of the HTML form is set to multipart/form-data and mbstring.encoding_translation is set to On in php.ini, the POST variables and the names of uploaded files are also converted to the internal character encoding. However, the conversion is not applied to the query keys.

HTTP input character conversion cannot be controlled from a PHP script. To disable it, you must configure it in php.ini.

Example #1 Disabling HTTP input conversion in php.ini

; Disable HTTP input conversion
mbstring.http_input = pass
; Disable HTTP input conversion (PHP 4.3.0 or later)
mbstring.encoding_translation = Off

When PHP runs as an Apache module, these settings can also be overridden per virtual host in httpd.conf or per directory in .htaccess.

There are several ways to enable HTTP output character encoding conversion. One is php.ini; another is ob_start() with mb_output_handler() as the ob_start() callback function.

Example #2 php.ini setting example

; Enable output character encoding conversion for all PHP pages
; Enable output buffering
output_buffering = On
; Set mb_output_handler to enable output conversion
output_handler = mb_output_handler

Example #3 Script example
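A minimal script version of the ob_start() approach might look like the following. UTF-8 as the HTTP output encoding is an assumption chosen for illustration.

```php
<?php
// Assumption: UTF-8 is the desired HTTP output encoding.
mb_http_output('UTF-8');

// Route buffered output through mb_output_handler, which converts it from the
// internal encoding to the HTTP output encoding when the buffer is flushed.
ob_start('mb_output_handler');

echo "Grüße\n";

// Flush the buffer (running it through the handler) and stop buffering.
ob_end_flush();
```

Unlike the php.ini setting, this enables conversion only for the scripts that call it.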

 

Multi-byte string functions

mb_check_encoding - Check whether a string is valid in the specified encoding

mb_convert_case - Perform case folding on a string

mb_convert_encoding - Convert a string from one character encoding to another

mb_convert_kana - Convert "kana" one from another ("zen-kaku", "han-kaku" and more)

mb_convert_variables - Convert the character encoding of one or more variables

mb_decode_mimeheader - Decode a string in a MIME header field

mb_decode_numericentity - Decode HTML numeric string references to characters

mb_detect_encoding - Detect the character encoding of a string

mb_detect_order - Set/get the character encoding detection order

mb_encode_mimeheader - Encode a string for a MIME header

mb_encode_numericentity - Encode characters to HTML numeric string references

mb_encoding_aliases - Get aliases of a known encoding type

mb_ereg_match - Regular expression match for a multibyte string

mb_ereg_replace_callback - Perform a regular expression search and replace with multibyte support using a callback

mb_ereg_replace - Replace regular expression with multibyte support

mb_ereg_search_getpos - Return the start point for the next regular expression match

mb_ereg_search_getregs - Retrieve the result from the last multibyte regular expression match

mb_ereg_search_init - Set up the string and regular expression for a multibyte regular expression match

mb_ereg_search_pos - Return the position and length of a matched part of the multibyte regular expression for a predefined multibyte string

mb_ereg_search_regs - Return the matched part of a multibyte regular expression

mb_ereg_search_setpos - Set the start point of the next regular expression match

mb_ereg_search - Multibyte regular expression match for a predefined multibyte string

mb_ereg - Regular expression match with multibyte support

mb_eregi_replace - Replace regular expression with multibyte support, ignoring case

mb_eregi - Regular expression match ignoring case, with multibyte support

mb_get_info - Get the internal settings of mbstring

mb_http_input - Detect the HTTP input character encoding

mb_http_output - Set/get the HTTP output character encoding

mb_internal_encoding - Set/get the internal character encoding

mb_language - Set/get the current language

mb_list_encodings - Return an array of all supported encodings

mb_output_handler - Callback function that converts the character encoding in the output buffer

mb_parse_str - Parse GET/POST/COOKIE data and set global variables

mb_preferred_mime_name - Get the MIME charset string

mb_regex_encoding - Set/get the character encoding for multibyte regex

mb_regex_set_options - Set/get the default options for mbregex functions

mb_send_mail - Send encoded mail

mb_split - Split a multibyte string using a regular expression

mb_strcut - Get part of a string, measured in bytes, without splitting characters

mb_strimwidth - Get a string truncated to the specified width

mb_stripos - Find the position of the first occurrence of a string within another, case-insensitive

mb_stristr - Find the first occurrence of a string within another, case-insensitive

mb_strlen - Get the string length

mb_strpos - Find the position of the first occurrence of a string within another

mb_strrchr - Find the last occurrence of a character in a string within another

mb_strrichr - Find the last occurrence of a character in a string within another, case-insensitive

mb_strripos - Find the position of the last occurrence of a string within another, case-insensitive

mb_strrpos - Find the position of the last occurrence of a string within another

mb_strstr - Find the first occurrence of a string within another

mb_strtolower - Make a string lowercase

mb_strtoupper - Make a string uppercase

mb_strwidth - Return the width of a string

mb_substitute_character - Set/get the substitution character

mb_substr_count - Count the number of substring occurrences

mb_substr - Get part of a string

