Strings, Encoding, and UTF-8 in PHP

Source: Internet
Author: User
I have recently read many articles about character encoding, so I am writing up what I learned about PHP, strings, encoding, and UTF-8 in two blog posts. This post is the first half and has four parts: "Definition and use of strings", "String conversion", "The nature of PHP strings", and "Multi-byte strings". The first half is relatively basic.


Definition and use of strings

In PHP, a string can be defined in four ways:

Single-quoted string

A single-quoted string is similar to a raw string in Python: it does not parse variables, and it does not interpret escape sequences other than \' and \\. For example, in $str = 'Hello\nworld', the \n does not produce a line feed.

Double-quoted string

A double-quoted string parses variables and interprets escape sequences, neither of which a single-quoted string does.

I find the octal and hexadecimal escape sequences particularly interesting:

    \[0-7]{1,3}          # octal notation
    \x[0-9A-Fa-f]{1,2}   # hexadecimal notation
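The difference between the two quoting styles, and the two escape notations, can be seen in a minimal sketch. The variable names here are illustrative only and do not come from the original post.

```php
<?php
$name = 'world';

// Single quotes: no variable parsing, and \n stays as two literal characters.
$single = 'Hello\n$name';

// Double quotes: $name is interpolated and \n becomes a line feed.
$double = "Hello\n$name";

// Octal \101 and hexadecimal \x41 both denote the byte 0x41, the letter "A".
$octal = "\101";
$hex = "\x41";

echo strlen($single), "\n"; // 12
echo strlen($double), "\n"; // 11
var_dump($octal === $hex);  // bool(true)
```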


Heredoc

Heredoc is similar to a triple-quoted (long) string in Python and can define strings that span multiple lines. Its syntax rules are very strict, so use it with care.

$ Str = <


Nowdoc

Nowdoc is similar to a single-quoted string and does not parse variables. It is suitable for defining a large block of text without having to escape special characters.
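As a rough illustration of the two multi-line forms, here is a minimal sketch. The identifier EOT and the variable names are arbitrary choices for this example, not taken from the original post.

```php
<?php
$name = 'PHP';

// Heredoc: behaves like a double-quoted string, so $name is parsed.
$heredoc = <<<EOT
Hello $name
second line
EOT;

// Nowdoc: the opening identifier is in single quotes and nothing is parsed.
$nowdoc = <<<'EOT'
Hello $name
second line
EOT;

echo $heredoc, "\n";
echo $nowdoc, "\n";
```

Placing the closing identifier at the very start of its own line keeps the example valid on older PHP versions as well, which is one of the strict rules worth remembering.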

Variable parsing

The most powerful feature of PHP strings is variable parsing: variables embedded in a string are resolved from the surrounding context at runtime (PHP is an interpreted language), which is very useful.

With the simple syntax, a string can directly contain variables, array elements, and object properties. The complex syntax wraps an expression in {}.
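Before moving to more involved cases, a small sketch may help contrast the two syntaxes. All names below ($list, $obj, and so on) are made up for illustration.

```php
<?php
$name = 'world';
$list = array('a' => 'apple', 1 => 'one');
$obj = new stdClass();
$obj->title = 'post';

// Simple syntax: a variable, array elements (keys unquoted), an object property.
$simple = "Hello $name, $list[a], $list[1], $obj->title";

// Complex syntax: {} marks exactly where the expression ends,
// so text can follow it immediately.
$complex = "{$list['a']}s and {$obj->title}s";

echo $simple, "\n";  // Hello world, apple, one, post
echo $complex, "\n"; // apples and posts
```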

Let's look at the power of variable parsing through an example.

    class beers {
        const softdrink = 'softdrink';
        public static $ale = 'ale';
        public $data = array(1, 3, "k" => 4);
    }

    $softdrink = "softdrink";
    $ale = "ale";
    $arr = array("arr1", "arr2", "arr3" => "arr4", "arr4" => array(1, 2));
    $arr4 = "arr4";
    $obj = new beers;

    echo "line1:{$arr[1]}\n";              // arr2
    echo "line2:{$arr['arr4'][0]}\n";      // 1
    echo "line3:{$obj->data[1]}\n";        // 3
    echo "line4:{${$arr['arr3']}}\n";      // value of $arr4: arr4
    echo "line5:{${$arr['arr3']}[1]}\n";   // second byte of $arr4: r
    echo "line6:{${beers::softdrink}}\n";  // value of $softdrink: softdrink
    echo "line7:{${beers::$ale}}\n";       // value of $ale: ale

String conversion

Another reason PHP feels simpler than Python is implicit type conversion, which simplifies many operations. Here we look at conversion to string.

Casting to string

    $var = 10;
    $dvar = (string) $var;
    echo $dvar . "_" . gettype($dvar);   // 10_string

The strval() function returns the string value of a variable:

    $var = 10.2;
    $dvar = strval($var);
    echo gettype($var) . "_" . $dvar . "_" . gettype($dvar);   // double_10.2_string

The settype() function sets the type of a variable in place:

    $str = "10hello";
    settype($str, "integer");
    echo $str;   // 10

Explicit conversion of other types to string follows fixed rules. For example, the boolean TRUE is converted to the string "1". It is worth knowing these rules.
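A few of these rules can be checked directly. This is a minimal sketch; the variable names are illustrative only.

```php
<?php
// Each value is converted to string explicitly to show the result.
$fromTrue  = (string) true;   // "1"
$fromFalse = (string) false;  // "" (empty string)
$fromNull  = (string) null;   // "" (empty string)
$fromInt   = (string) 42;     // "42"
$fromFloat = strval(1.5);     // "1.5"

var_dump($fromTrue, $fromFalse, $fromNull, $fromInt, $fromFloat);
```

Note that FALSE and NULL both become the empty string, so they cannot be told apart once converted.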

Automatic type conversion

The conversions above are explicit, but automatic conversion matters more: inside an expression, values are converted to the type the context requires. For example:

    $bool = true;
    $str = 10 + "hello";       // older PHP versions give 10; PHP 8 throws a TypeError for a non-numeric string
    echo $bool . "_" . $str;   // true becomes "1" in string context

The nature of PHP strings

The PHP documentation explains:

A string in PHP is implemented as an array of bytes plus an integer indicating the length of the buffer. It carries no information about how those bytes translate to characters; that task is left to the programmer. There are no restrictions on the values a string can contain; in particular, bytes with value 0 can appear anywhere in the string.

PHP does not dictate the encoding of a string; that is up to the programmer. A string literal is encoded according to the encoding of the PHP file. For example, if the file is saved as GBK, the string literals in it are GBK.

A note on binary safety: a byte with value 0 (NUL) can appear at any position in a string, but some PHP functions are not binary-safe. They hand the string to C functions at the underlying layer, which ignore everything after the NUL byte.
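The binary-safe behaviour of the core string functions can be sketched as follows. The sample string is an assumption made for this example.

```php
<?php
// A five-byte string with a NUL byte in the middle.
$bytes = "ab\0cd";

// strlen() uses the stored length, so the NUL byte does not end the string.
echo strlen($bytes), "\n";  // 5

// bin2hex() shows every byte, including the 00 in the middle.
echo bin2hex($bytes), "\n"; // 6162006364

// substr() can also read past the NUL byte.
$tail = substr($bytes, 3);
echo $tail, "\n";           // cd
```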

As long as the PHP file encoding is ASCII-compatible, string operations work well. However, string functions operate on raw bytes regardless of the file encoding, so keep the following in mind:

  • Some functions assume that a string is single-byte encoded but do not need to interpret the bytes as specific characters. substr() is an example.

  • Many functions accept an explicit encoding parameter; if it is omitted, the default is read from php.ini. htmlentities() is an example.

  • Some functions depend on the current locale, and these can only operate on single-byte encodings.
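The first two points can be illustrated with a short sketch. The sample strings are written as hex escapes so that the example does not depend on the encoding of the file it is saved in; they are assumptions for this illustration.

```php
<?php
// "中国" as UTF-8: two characters, three bytes each.
$utf8 = "\xE4\xB8\xAD\xE5\x9B\xBD";

// substr() counts bytes, so this cuts the first character in the middle.
$cut = substr($utf8, 0, 2);
echo bin2hex($cut), "\n"; // e4b8

// htmlentities() with the encoding passed explicitly instead of relying on php.ini.
// "caf\xC3\xA9" is "café" in UTF-8.
$escaped = htmlentities("caf\xC3\xA9 <b>", ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";      // caf&eacute; &lt;b&gt;
```

Passing the encoding explicitly makes the result independent of the server configuration, which is usually the safer choice.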

Although PHP has no internal Unicode support, it can handle UTF-8 encoded data, and in most cases this causes no problems. The following situations, however, need special handling:

  • Converting a string that is not UTF-8 encoded

  • A UTF-8 encoded web page where the user's browser submits the form in GBK (ignoring the meta tag)

  • A UTF-8 encoded PHP file in which strlen("中国") returns 6 instead of the actual number of characters (2)

How can these problems be solved? PHP provides the mbstring extension!
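The length problem, and how mbstring addresses it, can be sketched like this, assuming the mbstring extension is loaded. The string is given as hex escapes for the UTF-8 bytes of "中国" so the example does not depend on the file encoding.

```php
<?php
// "中国" encoded as UTF-8.
$utf8 = "\xE4\xB8\xAD\xE5\x9B\xBD";

// strlen() counts bytes.
$byteLength = strlen($utf8);               // 6

// mb_strlen() counts characters in the given encoding.
$charLength = mb_strlen($utf8, 'UTF-8');   // 2

// mb_substr() cuts on character boundaries rather than byte boundaries.
$first = mb_substr($utf8, 0, 1, 'UTF-8');  // the three bytes of "中"

echo $byteLength, " ", $charLength, " ", bin2hex($first), "\n"; // 6 2 e4b8ad
```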

Multi-byte string

The mbstring extension is not enabled by default; PHP must be built with --enable-mbstring.

First, let's look at the mbstring directives in php.ini. It took me a long time to understand them.

  • mbstring.language: as I understand it, this can simply be treated as UTF-8.

  • mbstring.internal_encoding: this is unrelated to the encoding of the PHP file. Most mbstring functions need to know the encoding of the string they process; if no encoding is passed explicitly, this value is used. In later PHP versions it is superseded by default_charset.

  • mbstring.http_input: the default HTTP input encoding (excluding GET parameters). It is normally the same as the encoding of the HTML page. It is likewise superseded by default_charset.

  • mbstring.http_output: this directive confuses me. What is HTTP output? Isn't PHP's output just the page? Where does this concept come from?

  • mbstring.encoding_translation: disabled by default. If enabled, PHP automatically converts POST variables and the names of uploaded files to the encoding given by mbstring.internal_encoding. I have not tried it; uploading a file with a Chinese name would be a way to test it. I recommend leaving it disabled and handling the conversion in code.
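To see which values are in effect on a given installation, the settings can be read at runtime. This is a minimal sketch that assumes the mbstring extension is loaded; the printed values depend on the local configuration, so none are shown.

```php
<?php
// The general default charset from php.ini.
$defaultCharset = ini_get('default_charset');

// The encoding mbstring functions fall back to when none is passed.
$internal = mb_internal_encoding();

echo "default_charset: $defaultCharset\n";
echo "mb_internal_encoding(): $internal\n";

// Passing the encoding explicitly avoids depending on either setting.
$length = mb_strlen("\xE4\xB8\xAD\xE5\x9B\xBD", 'UTF-8'); // 2
```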

Next, let's look at some functions of the mbstring extension:

  • mb_http_input(): detects the HTTP input character encoding. I find it useful when handling uploaded file names.

  • mb_convert_encoding(): a commonly used function. Pay attention to the third parameter, the source encoding.

  • mb_detect_order(): sets or gets the order in which character encodings are detected.

  • mb_list_encodings(): returns the list of supported encodings.
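A short sketch of three of these functions follows, assuming the mbstring extension is loaded. The GBK bytes below are the encoding of "中国" and are used only as sample input.

```php
<?php
// "中国" encoded as GBK.
$gbk = "\xD6\xD0\xB9\xFA";

// The third argument names the source encoding; omitting it makes
// the function fall back to the configured internal encoding.
$utf8 = mb_convert_encoding($gbk, 'UTF-8', 'GBK');
echo bin2hex($utf8), "\n"; // e4b8ade59bbd

// With no argument, mb_detect_order() returns the current detection order.
$order = mb_detect_order();

// The list of encodings this build of mbstring supports.
$encodings = mb_list_encodings();
var_dump(in_array('UTF-8', $encodings, true)); // bool(true)
```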

Note: PHP files must use an ASCII-compatible encoding.

In particular, do not use BIG-5 as the encoding of a PHP file, especially when identifiers or string literals contain non-ASCII characters. If the file encoding is BIG-5 anyway, try to convert input and output content to UTF-8.

Zend Multibyte

Finally, Zend Multibyte, which I do not yet understand deeply. First, do not confuse it with the mbstring extension. Zend Multibyte mode is disabled by default and is turned on with the zend.multibyte directive. The declare() construct then specifies the encoding the PHP parser should use for a script.

What is this for? As mentioned above, PHP files must use an ASCII-compatible encoding. To use an encoding that is not ASCII-compatible, such as BIG-5, enable this directive; the parser then uses the encoding given by mbstring.script_encoding (zend.script_encoding since PHP 5.4) to read PHP files.
