PHP Multi-byte string

Source: Internet
Author: User

Brief introduction

In many languages every necessary character can be mapped one-to-one to an 8-bit value. Several languages, however, require so many characters for written communication that their encoding range cannot fit in a single byte. (A byte consists of 8 bits. Each bit can hold only two values, 1 or 0, so a byte can represent only 256 distinct values, that is, 2 to the eighth power.) Multibyte character encoding schemes were developed to express more than 256 characters in an ordinary byte-based encoding system.
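The arithmetic can be checked directly. This is a minimal sketch, assuming PHP 7 or later (for the "\u{...}" escape syntax) with mbstring loaded; the sample string is only an illustration, chosen because each of its three characters takes three bytes in UTF-8.

```php
<?php
// One byte holds 8 bits, so it can take 2^8 distinct values.
$valuesPerByte = 2 ** 8;

// Three Japanese characters written as Unicode escapes, so the result
// does not depend on the encoding this file is saved in.
$text = "\u{65E5}\u{672C}\u{8A9E}";

$byteCount = strlen($text);             // counts bytes
$charCount = mb_strlen($text, 'UTF-8'); // counts characters

echo "values per byte: $valuesPerByte\n";              // 256
echo "bytes: $byteCount, characters: $charCount\n";    // bytes: 9, characters: 3
```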

When you manipulate (trim, split, splice, and so on) strings in a multibyte encoding, you need to use special functions, because two or more consecutive bytes may represent a single character in such an encoding scheme. If you apply a string function that is not multibyte-aware, it may fail to detect where a multibyte character begins or ends, and the result is a corrupted string that has most likely lost its original meaning.
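To make the failure concrete, the sketch below cuts the same string once by bytes and once by characters. It assumes PHP 7 or later and UTF-8 data; the variable names are illustrative only.

```php
<?php
$text = "\u{65E5}\u{672C}\u{8A9E}"; // 3 characters, 9 bytes in UTF-8

// Byte-oriented: keeps the first 4 bytes, which cuts the second
// character off after its first byte.
$byteCut = substr($text, 0, 4);

// Character-oriented: keeps the first 2 whole characters.
$charCut = mb_substr($text, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): broken sequence
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```

The byte-oriented result ends in a dangling lead byte, which is exactly the kind of garbage string described above.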

mbstring provides multibyte-specific string functions that help you deal with multibyte encodings in PHP. In addition, mbstring converts character encodings between the possible encoding pairs. For convenience, mbstring is designed to handle Unicode-based encodings such as UTF-8 and UCS-2, as well as many single-byte encodings.
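As an illustration of conversion between encoding pairs, the following sketch converts one short UTF-8 string to a single-byte encoding and to UCS-2, then back again. It assumes PHP 7 or later; the sample word and the choice of ISO-8859-1 and UCS-2BE are only examples.

```php
<?php
$utf8 = "caf\u{E9}"; // "cafe" with an acute accent: 4 characters, 5 bytes

// mb_convert_encoding(string, target encoding, source encoding)
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$ucs2   = mb_convert_encoding($utf8, 'UCS-2BE', 'UTF-8');
$back   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // 636166c3a9
echo bin2hex($latin1), "\n"; // 636166e9
echo bin2hex($ucs2), "\n";   // 00630061006600e9
var_dump($back === $utf8);   // bool(true)
```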

mbstring is a non-default extension, which means it is not enabled by default. You must explicitly enable the module with a configure option (--enable-mbstring) when building PHP.
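Because the extension may be missing, a script can check for it before calling any mb_* function. A minimal sketch; the message text is illustrative, and how the extension is actually installed depends on your platform.

```php
<?php
// extension_loaded() reports whether the named extension is available
// in the running PHP binary.
if (extension_loaded('mbstring')) {
    $status = 'mbstring is available';
} else {
    $status = 'mbstring is not loaded; enable it in your PHP build or configuration';
}

echo $status, "\n";
```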

HTTP Input and output

HTTP input/output character encoding conversion may also convert binary data. If binary data is used for HTTP input/output, the user is expected to control the character encoding conversion.

Since PHP 4.3.3, if the enctype attribute of the HTML form is set to multipart/form-data and mbstring.encoding_translation is set to On in php.ini, the POST variables and the names of uploaded files are also converted to the internal character encoding. However, the conversion is not applied to the query keys.

HTTP input: there is no way to control HTTP input character conversion from within a PHP script. To disable HTTP input character conversion, you must do so in php.ini.

Example #1 Disable the HTTP input conversion in php.ini

;; Disable HTTP input conversion
mbstring.http_input = pass

;; Disable HTTP input conversion (PHP 4.3.0 or later)
mbstring.encoding_translation = Off
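Although a script cannot switch this conversion off, it can at least report what the configuration currently says. A minimal sketch, assuming only the two directive names from Example #1; the values printed depend entirely on your installation, so no output is shown.

```php
<?php
// Read-only inspection of the two directives from Example #1.
$settings = [
    'mbstring.http_input'           => ini_get('mbstring.http_input'),
    'mbstring.encoding_translation' => ini_get('mbstring.encoding_translation'),
];

foreach ($settings as $name => $value) {
    // ini_get() returns false when PHP does not know the directive,
    // for example when mbstring is not loaded.
    echo $name, ' = ', var_export($value, true), "\n";
}
```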

When PHP runs as an Apache module, these settings can also be overridden per virtual host in httpd.conf or per directory with .htaccess files.

There are several ways to enable output character encoding conversion. One is to use php.ini; another is to use ob_start() with mb_output_handler() as the ob_start callback function.

Example #2 php.ini Setup Example

;; Enable output character encoding conversion for all PHP pages

;; Enable output buffering
output_buffering = On

;; Set mb_output_handler to enable output conversion
output_handler = mb_output_handler

Example #3 Script Example

<?php
// Enable output character encoding conversion for this page only

// Set the HTTP output character encoding to SJIS
mb_http_output('SJIS');

// Start buffering and specify "mb_output_handler" as the callback function
ob_start('mb_output_handler');
?>

Multi-byte String functions

mb_check_encoding - check if a string is valid in the specified encoding

mb_convert_case - perform case folding on a string

mb_convert_encoding - convert character encoding

mb_convert_kana - convert one kind of "kana" to another ("zen-kaku", "han-kaku" and more)

mb_convert_variables - convert the character encoding of one or more variables

mb_decode_mimeheader - decode a string in a MIME header field

mb_decode_numericentity - decode HTML numeric string references to characters

mb_detect_encoding - detect character encoding

mb_detect_order - set/get the character encoding detection order

mb_encode_mimeheader - encode a string for a MIME header

mb_encode_numericentity - encode characters to HTML numeric string references

mb_encoding_aliases - get aliases of a known encoding type

mb_ereg_match - regular expression match for a multibyte string

mb_ereg_replace_callback - perform a regular expression search and replace with multibyte support using a callback

mb_ereg_replace - replace regular expression with multibyte support

mb_ereg_search_getpos - return the start point for the next regular expression match

mb_ereg_search_getregs - retrieve the result from the last multibyte regular expression match

mb_ereg_search_init - set up the string and regular expression for a multibyte regular expression match

mb_ereg_search_pos - return the position and length of a matched part of the multibyte regular expression for a predefined multibyte string

mb_ereg_search_regs - return the matched part of a multibyte regular expression

mb_ereg_search_setpos - set the start point of the next regular expression match

mb_ereg_search - multibyte regular expression match for a predefined multibyte string

mb_ereg - regular expression match with multibyte support

mb_eregi_replace - replace regular expression with multibyte support, ignoring case

mb_eregi - regular expression match with multibyte support, ignoring case

mb_get_info - get the internal settings of mbstring

mb_http_input - detect the HTTP input character encoding

mb_http_output - set/get the HTTP output character encoding

mb_internal_encoding - set/get the internal character encoding

mb_language - set/get the current language

mb_list_encodings - return an array of all supported encodings

mb_output_handler - callback function that converts character encoding in the output buffer

mb_parse_str - parse GET/POST/COOKIE data and set global variables

mb_preferred_mime_name - get the MIME charset string

mb_regex_encoding - set/get the character encoding for multibyte regex

mb_regex_set_options - set/get the default options for mbregex functions

mb_send_mail - send encoded mail

mb_split - split a multibyte string using a regular expression

mb_strcut - get part of a string, counted in bytes

mb_strimwidth - get a string truncated to the specified width

mb_stripos - find the position of the first occurrence of a string within another, case insensitive

mb_stristr - find the first occurrence of a string within another, case insensitive

mb_strlen - get the length of a string

mb_strpos - find the position of the first occurrence of a string within another

mb_strrchr - find the last occurrence of a character within a string

mb_strrichr - find the last occurrence of a character within a string, case insensitive

mb_strripos - find the position of the last occurrence of a string within another, case insensitive

mb_strrpos - find the position of the last occurrence of a string within another

mb_strstr - find the first occurrence of a string within another

mb_strtolower - make a string lowercase

mb_strtoupper - make a string uppercase

mb_strwidth - return the width of a string

mb_substitute_character - set/get the substitution character

mb_substr_count - count the number of substring occurrences

mb_substr - get part of a string
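
A few of the functions listed above can be tried together. This is a minimal sketch rather than a complete tour; it assumes PHP 7 or later, UTF-8 data, and sample strings chosen only for illustration. The encoding is passed explicitly so the results do not depend on the configured internal encoding.

```php
<?php
$text = "\u{65E5}\u{672C}\u{8A9E}"; // three CJK characters, 9 bytes in UTF-8

// Length and search position are reported in characters, not bytes.
$length   = mb_strlen($text, 'UTF-8');                  // 3
$position = mb_strpos($text, "\u{8A9E}", 0, 'UTF-8');   // 2
$bytePos  = strpos($text, "\u{8A9E}");                  // 6 (byte offset)

// Display width: each of these characters occupies two columns.
$width = mb_strwidth($text, 'UTF-8');                   // 6

// Case conversion that understands accented letters.
$upper = mb_strtoupper("\u{E9}t\u{E9}", 'UTF-8');       // U+00C9 T U+00C9

// Count occurrences of a multibyte substring.
$count = mb_substr_count("\u{65E5}\u{672C}\u{65E5}", "\u{65E5}", 'UTF-8'); // 2

echo "length=$length position=$position bytePos=$bytePos width=$width count=$count\n";
echo bin2hex($upper), "\n"; // c38954c389
```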
