Brief introduction
In many languages, every necessary character can be mapped one-to-one to an 8-bit value. Several languages, however, need so many characters for written communication that they cannot all be encoded in a single byte. A byte consists of 8 bits, and each bit holds one of two values, 1 or 0, so a byte can represent only 256 distinct values (2 to the 8th power). Multibyte character encoding schemes were developed to express more than 256 characters in a conventional byte-based encoding system.
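To make the byte arithmetic concrete, here is a minimal sketch. It assumes UTF-8 as the multibyte encoding (the text above does not name one) and PHP 7.0 or later for the "\u{...}" escape syntax; the variable names are illustrative only.

```php
<?php
// One byte holds 8 bits, so it can take 2^8 = 256 distinct values.
$valuesPerByte = 2 ** 8;

// Assumption: UTF-8 is used here purely for illustration.
$ascii = 'A';          // U+0041 fits in a single byte
$kanji = "\u{65E5}";   // U+65E5 needs three bytes in UTF-8

// strlen() counts bytes, not characters.
$asciiBytes = strlen($ascii);
$kanjiBytes = strlen($kanji);

printf(
    "%d values per byte; 'A' uses %d byte(s), U+65E5 uses %d byte(s)\n",
    $valuesPerByte,
    $asciiBytes,
    $kanjiBytes
);
```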
When you manipulate multibyte-encoded strings (trimming, splitting, splicing, and so on), you need to use specialized functions, because two or more consecutive bytes may represent a single character in these encoding schemes. If you apply a function that is not multibyte-aware to such a string, it may fail to detect where a multibyte character begins and produce a garbled string that has essentially lost its original meaning.
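The difference between byte-oriented and multibyte-aware functions can be sketched as follows. The example assumes a UTF-8 string and PHP 7.0 or later (for the "\u{...}" escapes); the sample text and variable names are made up for illustration.

```php
<?php
// Assumption: the string is UTF-8 encoded.
$text = "\u{65E5}\u{672C}\u{8A9E}";   // three characters, nine bytes

// strlen() counts bytes; mb_strlen() counts characters.
$byteLength = strlen($text);
$charLength = mb_strlen($text, 'UTF-8');

// substr() cuts by byte offset, so 4 bytes = one whole character
// plus the first byte of the next one: the result is garbled.
$broken = substr($text, 0, 4);
$brokenIsValid = mb_check_encoding($broken, 'UTF-8');

// mb_substr() cuts by character offset and keeps characters intact.
$firstTwo = mb_substr($text, 0, 2, 'UTF-8');
$firstTwoIsValid = mb_check_encoding($firstTwo, 'UTF-8');

var_dump($byteLength, $charLength, $brokenIsValid, $firstTwoIsValid);
```

Passing the encoding explicitly to each call avoids relying on whatever internal encoding happens to be configured.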
mbstring provides multibyte-specific string functions that help you deal with multibyte encodings in PHP. In addition, mbstring handles character encoding conversion between the possible encoding pairs. For convenience, mbstring is designed to handle Unicode-based encodings such as UTF-8 and UCS-2, as well as many single-byte encodings.
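As a rough illustration of conversion between encodings, the sketch below converts a short UTF-8 string to a single-byte encoding and to UCS-2 with mb_convert_encoding(). The sample word and the choice of ISO-8859-1 and UCS-2BE as targets are assumptions made for the example; PHP 7.0 or later is assumed for the "\u{...}" escape.

```php
<?php
// "caf" followed by U+00E9: four characters, five bytes in UTF-8.
$utf8 = "caf\u{E9}";

// Single-byte encoding: every character fits in one byte.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// UCS-2 (big-endian): every character takes two bytes.
$ucs2 = mb_convert_encoding($utf8, 'UCS-2BE', 'UTF-8');

// Converting back should restore the original UTF-8 bytes.
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

printf(
    "UTF-8: %d bytes, ISO-8859-1: %d bytes, UCS-2BE: %d bytes\n",
    strlen($utf8),
    strlen($latin1),
    strlen($ucs2)
);
```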
mbstring is a non-default extension, which means it is not enabled by default. You must explicitly enable the module with the configure option (--enable-mbstring).
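Because the extension may be absent, a script can check for it before calling any mb_* function. A minimal sketch, with an illustrative variable name:

```php
<?php
// Check at runtime whether the mbstring extension is available.
$mbstringLoaded = extension_loaded('mbstring');

if ($mbstringLoaded) {
    // Called without arguments, mb_internal_encoding() returns the current setting.
    echo 'mbstring is enabled; internal encoding: ', mb_internal_encoding(), "\n";
} else {
    echo "mbstring is not enabled; PHP must be built or configured with it.\n";
}
```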
HTTP Input and Output
HTTP input/output character encoding conversion may also convert binary data. If binary data is used for HTTP input/output, the user is expected to control the character encoding conversion.
Since PHP 4.3.3, if the enctype attribute of an HTML form is set to multipart/form-data and mbstring.encoding_translation is set to On in php.ini, the POST variables and the names of uploaded files are also converted to the internal character encoding. However, the conversion is not applied to the query keys.
There is no way to control HTTP input character conversion from within a PHP script. To disable HTTP input character conversion, you must do so in php.ini.
Example #1 Disable the HTTP input conversion in php.ini
;; Disable HTTP input conversion
mbstring.http_input = pass
;; Disable HTTP input conversion (PHP 4.3.0 or later)
mbstring.encoding_translation = Off
When PHP is running as an Apache module, these settings can also be overridden per virtual host in httpd.conf or per directory with .htaccess files.
There are several ways to enable output character encoding conversion. One is to use php.ini; another is to call ob_start() with mb_output_handler() as the ob_start callback function.
Example #2 php.ini Setup Example
;; Enable output character encoding conversion for all PHP pages

;; Enable output buffering
output_buffering = On

;; Set mb_output_handler to enable output conversion
output_handler = mb_output_handler
Example #3 Script Example
<?php
// Enable output character encoding conversion only for this page

// Set the HTTP output character encoding to SJIS
mb_http_output('SJIS');

// Start buffering and specify "mb_output_handler" as the callback function
ob_start('mb_output_handler');
?>
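The effect of mb_output_handler depends on the response headers and php.ini settings, so it is hard to observe in isolation. As a rough stand-in, the sketch below captures buffered output and converts it by hand with mb_convert_encoding(). This only illustrates the kind of conversion involved; it is not how mb_output_handler itself is implemented. It assumes the script's strings are UTF-8 and PHP 7.0 or later for the "\u{...}" escapes.

```php
<?php
// Assumption: internal strings are UTF-8 (five hiragana characters).
$greeting = "\u{3053}\u{3093}\u{306B}\u{3061}\u{306F}";

// Capture everything the page prints.
ob_start();
echo $greeting;
$utf8Output = ob_get_clean();

// Convert the captured output to SJIS by hand.
$sjisOutput = mb_convert_encoding($utf8Output, 'SJIS', 'UTF-8');

// Each of these characters takes 3 bytes in UTF-8 and 2 bytes in SJIS.
$utf8Bytes = strlen($utf8Output);
$sjisBytes = strlen($sjisOutput);

printf("UTF-8: %d bytes, SJIS: %d bytes\n", $utf8Bytes, $sjisBytes);
```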
Multibyte String Functions
mb_check_encoding - Check if a string is valid for the specified encoding
mb_convert_case - Perform case folding on a string
mb_convert_encoding - Convert a string from one character encoding to another
mb_convert_kana - Convert one type of "kana" to another ("zen-kaku", "han-kaku" and more)
mb_convert_variables - Convert the character encoding of one or more variables
mb_decode_mimeheader - Decode a string in a MIME header field
mb_decode_numericentity - Decode HTML numeric string references to characters
mb_detect_encoding - Detect the character encoding
mb_detect_order - Set/get the character encoding detection order
mb_encode_mimeheader - Encode a string for a MIME header
mb_encode_numericentity - Encode characters to HTML numeric string references
mb_encoding_aliases - Get the aliases of a known encoding type
mb_ereg_match - Regular expression match for a multibyte string
mb_ereg_replace_callback - Perform a regular expression search and replace with multibyte support using a callback
mb_ereg_replace - Replace regular expression with multibyte support
mb_ereg_search_getpos - Return the start point for the next regular expression match
mb_ereg_search_getregs - Retrieve the result from the last multibyte regular expression match
mb_ereg_search_init - Set up the string and regular expression for a multibyte regular expression match
mb_ereg_search_pos - Return the position and length of the matched part of the multibyte regular expression for a predefined multibyte string
mb_ereg_search_regs - Return the matched part of a multibyte regular expression
mb_ereg_search_setpos - Set the start point of the next regular expression match
mb_ereg_search - Multibyte regular expression match for a predefined multibyte string
mb_ereg - Regular expression match with multibyte support
mb_eregi_replace - Replace regular expression with multibyte support, ignoring case
mb_eregi - Regular expression match with multibyte support, ignoring case
mb_get_info - Get the internal settings of mbstring
mb_http_input - Detect the HTTP input character encoding
mb_http_output - Set/get the HTTP output character encoding
mb_internal_encoding - Set/get the internal character encoding
mb_language - Set/get the current language
mb_list_encodings - Return an array of all supported encodings
mb_output_handler - Callback function that converts the character encoding in the output buffer
mb_parse_str - Parse GET/POST/COOKIE data and set global variables
mb_preferred_mime_name - Get the MIME charset string
mb_regex_encoding - Set/get the character encoding for multibyte regex
mb_regex_set_options - Set/get the default options for mbregex functions
mb_send_mail - Send encoded mail
mb_split - Split a multibyte string using a regular expression
mb_strcut - Get part of a string
mb_strimwidth - Get a string truncated to the specified width
mb_stripos - Find the position of the first occurrence of a string in another string, case insensitive
mb_stristr - Find the first occurrence of a string within another string, case insensitive
mb_strlen - Get the length of a string
mb_strpos - Find the position of the first occurrence of a string in another string
mb_strrchr - Find the last occurrence of a character in a string within another string
mb_strrichr - Find the last occurrence of a character in a string within another string, case insensitive
mb_strripos - Find the position of the last occurrence of a string in another string, case insensitive
mb_strrpos - Find the position of the last occurrence of a string in another string
mb_strstr - Find the first occurrence of a string within another string
mb_strtolower - Make a string lowercase
mb_strtoupper - Make a string uppercase
mb_strwidth - Return the width of a string
mb_substitute_character - Set/get the substitution character
mb_substr_count - Count the number of substring occurrences
mb_substr - Get part of a string
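A few of the functions listed above can be tried out with the short sketch below. The sample strings are made up for illustration, UTF-8 is assumed throughout, and PHP 7.0 or later is assumed for the "\u{...}" escapes.

```php
<?php
// Assumption: all strings are UTF-8. This one is 11 characters long,
// three of which are accented letters that take two bytes each.
$text = "\u{C9}t\u{E9} \u{E0} Paris";

// Case conversion also handles the accented letters.
$upper = mb_strtoupper($text, 'UTF-8');
$lower = mb_strtolower($text, 'UTF-8');

// mb_strpos() reports a character offset; strpos() reports a byte offset.
$charPos = mb_strpos($text, 'Paris', 0, 'UTF-8');
$bytePos = strpos($text, 'Paris');

// mb_strwidth() counts East Asian full-width characters as width 2.
$cjk = "\u{65E5}\u{672C}\u{8A9E}";
$width = mb_strwidth($cjk, 'UTF-8');

var_dump($upper, $lower, $charPos, $bytePos, $width);
```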