Strings, encodings, and UTF-8 in PHP

Source: Internet
Author: User
Tags php language type casting

I have recently read many articles about character encoding, so I am splitting what I learned about PHP, strings, encodings, and UTF-8 into two posts. This post is the first half and has four parts: "Definition and use of strings", "String conversions", "The nature of PHP strings", and "Multi-byte strings". The first half covers the basics.

Definition and use of strings

PHP has four ways to write a string literal:

Single-quoted strings

A single-quoted string is similar to a raw string in Python: variables are not parsed and special characters are not escaped (apart from \' and \\). For example, in $str = 'Hello\nworld', the \n does not produce a newline.

Double-quoted strings

A double-quoted string supports what a single-quoted string does not: variable parsing and escape sequences for special characters.

I am particularly interested in the octal and hexadecimal escape sequences:

    \[0-7]{1,3}         # octal notation
    \x[0-9A-Fa-f]{1,2}  # hexadecimal notation
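As a minimal sketch of the difference between the two quoting styles and of the octal and hexadecimal escapes, the snippet below compares byte lengths and byte values. The variable names are illustrative only.

```php
<?php
// Single quotes: backslash sequences other than \\ and \' stay literal.
$single = 'Hello\nworld';
// Double quotes: \n becomes one newline byte.
$double = "Hello\nworld";

echo strlen($single), "\n"; // 12: the backslash and the "n" are both kept
echo strlen($double), "\n"; // 11: "\n" is a single byte

// Octal 101 and hexadecimal 41 both name the byte for "A".
$octal = "\101";
$hex   = "\x41";
var_dump($octal === 'A', $hex === 'A'); // bool(true) bool(true)

// Three hexadecimal escapes spelling out the UTF-8 bytes of "中".
$zhong = "\xE4\xB8\xAD";
echo bin2hex($zhong), "\n"; // e4b8ad
```

Writing multi-byte characters as byte escapes keeps the example independent of the encoding the script file happens to be saved in.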

Heredoc

Heredoc resembles a triple-quoted string in Python and can define a string that spans multiple lines. Its syntax is strict and deserves attention: the opening identifier must be followed immediately by a newline, and the closing identifier goes on its own line.

    $str = <<<EOD
    hello\nworld
    EOD;

Nowdoc

Nowdoc is to single-quoted strings what heredoc is to double-quoted strings: no variables are parsed and no escape sequences are processed. It is well suited to a large block of text whose special characters you do not want to escape.
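The contrast between heredoc and nowdoc can be sketched with the same body text in both forms. The variable $name and the identifier EOD are arbitrary choices for this illustration.

```php
<?php
$name = 'world';

// Heredoc: parsed like a double-quoted string.
$heredoc = <<<EOD
hello\t$name
EOD;

// Nowdoc: the opening identifier is single-quoted and the body is taken literally.
$nowdoc = <<<'EOD'
hello\t$name
EOD;

var_dump($heredoc === "hello\tworld"); // bool(true): \t and $name were processed
var_dump($nowdoc === 'hello\t$name');  // bool(true): nothing was processed
```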

Variable parsing

The most powerful feature of PHP strings is variable parsing: variables inside a string are resolved at run time (PHP is an interpreted language), which makes some surprising things possible.

The simple syntax lets a string contain variables, array elements, and object properties. The complex syntax wraps an expression in {}.

One example shows the power of variable parsing:

    class Beers {
        const softdrink = 'softdrink';
        public static $ale = 'ale';
        public $data = array(1, 3, "k" => 4);
    }

    $softdrink = "softdrink";
    $ale = "ale";
    // The contents of the nested array are missing from the source; 'a' and 'b' are placeholders.
    $arr = array("arr1", "arr2", "arr3" => "arr4", "arr4" => array('a', 'b'));
    $arr4 = "arr4";
    $obj = new Beers;

    echo "line1:{$arr[1]}\n";              // array element
    echo "line2:{$arr['arr4'][0]}\n";      // element of a nested array
    echo "line3:{$obj->data[1]}\n";        // element of an object property
    echo "line4:{${$arr['arr3']}}\n";      // variable variable: resolves to $arr4
    echo "line5:{${$arr['arr3']}[1]}\n";   // second byte of $arr4
    echo "line6:{${Beers::softdrink}}\n";  // variable named by a class constant
    echo "line7:{${Beers::$ale}}\n";       // variable named by a static property

String conversions

One reason PHP can feel simpler than Python is that implicit type conversion shortens many operations. String conversion illustrates this.

String type casting

    $var = 10;
    $dvar = (string) $var;
    echo $dvar . "_" . gettype($dvar);   // 10_string

The strval() function returns the string value of a variable:

    $var = 10.2;
    $dvar = strval($var);
    echo gettype($var) . "_" . $dvar . "_" . gettype($dvar);   // double_10.2_string

The settype() function sets the type of a variable:

    $str = "10hello";
    settype($str, "integer");
    echo $str;   // 10

When a value of another type is cast to a string, the conversion follows fixed rules; for example, the boolean TRUE becomes the string "1". These rules are worth knowing well.
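A few of these rules can be checked directly. The sketch below only covers scalar values and null; the variable names are illustrative.

```php
<?php
$fromTrue  = (string) true;   // "1"
$fromFalse = (string) false;  // "" (empty string, not "0")
$fromNull  = (string) null;   // ""
$fromInt   = (string) 10;     // "10"
$fromFloat = strval(10.2);    // "10.2"

var_dump($fromTrue, $fromFalse, $fromNull, $fromInt, $fromFloat);
```

Note that FALSE and NULL both become the empty string, so the result of a cast cannot tell them apart.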

Automatic type conversion

The conversions above are explicit. Automatic type conversion deserves more attention: in an expression that needs a string, a value is converted to a string automatically. For example:

    $bool = true;
    $str = 10 + "Hello";       // PHP 7 evaluates this to 10 (7.1+ adds a warning); PHP 8 throws a TypeError
    echo $bool . "_" . $str;   // 1_10 on PHP 7
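A version-independent sketch of automatic conversion, using only well-formed numeric strings so that it behaves the same on PHP 7 and PHP 8. The variable names are illustrative.

```php
<?php
// Arithmetic context: numeric strings become numbers.
$sum   = "5" + 3;     // int(8)
$float = "1.5" + 1;   // float(2.5)

// String context: the concatenation operator converts its operands to strings.
$n = 10;
$joined    = $n . "";      // "10"
$withTrue  = true . "_";   // "1_"
$withFalse = false . "_";  // "_"

var_dump($sum, $float, $joined, $withTrue, $withFalse);
```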

The nature of PHP strings

The PHP documentation explains it as follows:

A string in PHP is implemented as an array of bytes plus an integer giving the length of the buffer. It carries no information about how those bytes map to characters; that is left to the programmer. There is no restriction on the values a string may contain; in particular, a byte with value 0 may appear anywhere in the string.

PHP does not prescribe an encoding for strings; how a string is encoded is up to the programmer. A string literal is encoded the same way as the PHP file that contains it. For example, if the file is saved as GBK, the string literals in it are GBK.

This is also where the concept of binary safety comes in. A byte with value 0 (NUL) can appear anywhere in a string, but some PHP functions are not binary-safe: they hand the string to C library functions, which ignore everything after the NUL.
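The "bytes plus a length" model can be observed with a string that contains a NUL byte. This is a small sketch using binary-safe functions only; the sample string is arbitrary.

```php
<?php
$s = "abc\0def";

$length = strlen($s);                 // 7: the NUL byte counts like any other byte
$middle = bin2hex(substr($s, 2, 3));  // "630064": the bytes "c", NUL, "d"
$pos    = strpos($s, "def");          // 4: searching continues past the NUL

var_dump($length, $middle, $pos);
```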

As long as the PHP file encoding is ASCII-compatible, string operations generally work well. But string functions operate on raw bytes regardless of the file encoding, so keep the following in mind:

    • Some functions assume the string uses some single-byte encoding, but do not need to interpret the bytes as specific characters. substr() is an example.

    • Many functions take the encoding as an explicit parameter and otherwise fall back to a default from php.ini. htmlentities() is an example.

    • Other functions depend on the current locale, and these operate byte by byte.
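The first two kinds of function in the list above can be sketched as follows. The input bytes are written as escapes so the result does not depend on how the script file is saved; the variable names are illustrative.

```php
<?php
$s = "\xE4\xB8\xAD\xE5\x9B\xBD"; // the UTF-8 bytes of "中国"

// Byte-oriented: substr() counts bytes, so this cuts the first character in half.
$cut = substr($s, 0, 2);
echo bin2hex($cut), "\n"; // e4b8 (an incomplete UTF-8 sequence)

// Encoding passed explicitly: htmlentities() takes it as the third argument.
$escaped = htmlentities("<\xC3\xA9>", ENT_QUOTES, 'UTF-8'); // "\xC3\xA9" is "é" in UTF-8
echo $escaped, "\n"; // &lt;&eacute;&gt;
```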

In short, PHP has no internal Unicode support, but it can work with UTF-8, and in most cases nothing goes wrong. The following situations, however, need extra handling:

    • Converting strings that are not UTF-8 encoded.

    • A page is served as UTF-8, but the user's browser submits the form in GBK (ignoring the META tag).

    • In a UTF-8 encoded PHP file, strlen("中国") returns 6 (the byte count) rather than the actual number of characters (2).

So how do we solve these problems? PHP offers the mbstring extension.
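A minimal sketch of how mbstring addresses the length problem and the wrongly encoded form input, assuming the mbstring extension is loaded. The byte strings stand in for file content and for a submitted form value; they are illustrative, not taken from a real request.

```php
<?php
$utf8 = "\xE4\xB8\xAD\xE5\x9B\xBD"; // "中国" encoded as UTF-8
$gbk  = "\xD6\xD0\xB9\xFA";         // "中国" encoded as GBK

$bytes = strlen($utf8);                             // 6
$chars = mb_strlen($utf8, 'UTF-8');                 // 2
$first = bin2hex(mb_substr($utf8, 0, 1, 'UTF-8'));  // "e4b8ad": one whole character

// Validate input before trusting it to be UTF-8.
$utf8IsValid = mb_check_encoding($utf8, 'UTF-8');   // true
$gbkIsValid  = mb_check_encoding($gbk, 'UTF-8');    // false: not a valid UTF-8 sequence

var_dump($bytes, $chars, $first, $utf8IsValid, $gbkIsValid);
```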

Multi-byte strings

The mbstring extension is not enabled by default; PHP has to be built with --enable-mbstring.

Let's first look at the mbstring directives in php.ini, which took me a long time to understand.

    • mbstring.language: I take this to mean UTF-8.

    • mbstring.internal_encoding: this encoding has nothing to do with the PHP file encoding. Most mbstring functions need to know the encoding of the string they process; if it is not passed explicitly, they use this value. In newer PHP versions this directive is replaced by default_charset.

    • mbstring.http_input: the default encoding of HTTP input (GET parameters not included). It usually matches the encoding of the HTML page. This directive is also replaced by default_charset.

    • mbstring.http_output: this one confused me. What is "HTTP output"? Isn't PHP's output simply the page? Why is there a separate concept for it?

    • mbstring.encoding_translation: this is the important one. It is off by default. When enabled, PHP automatically converts POST variables and the names of uploaded files to the encoding given by mbstring.internal_encoding. I have not tried it; uploading a file with a Chinese name would be one way to test it. I recommend leaving it off and letting the programmer handle these issues.
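The fallback to an internal encoding described above can be sketched at run time, assuming the mbstring extension is loaded. The values printed by the first two lines depend on the local php.ini, so no output is shown for them.

```php
<?php
// Inspect what the mbstring functions would fall back to on this installation.
echo ini_get('default_charset'), "\n";
echo mb_internal_encoding(), "\n";

$s = "\xE4\xB8\xAD\xE5\x9B\xBD"; // "中国" encoded as UTF-8

// Option 1: set the internal encoding once and omit the argument afterwards.
mb_internal_encoding('UTF-8');
$implicit = mb_strlen($s);          // 2, using the internal encoding set above

// Option 2: pass the encoding explicitly on every call.
$explicit = mb_strlen($s, 'UTF-8'); // 2

var_dump($implicit, $explicit);
```

Passing the encoding explicitly is more verbose, but the code then behaves the same regardless of server configuration.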

Now for some of the functions in the mbstring extension:

    • mb_http_input(): detects the character encoding of HTTP input. It seems useful for handling the names of uploaded files.

    • mb_convert_encoding(): a frequently used function; note the third parameter, the source encoding.

    • mb_detect_order(): sets or gets the detection order of character encodings.

    • mb_list_encodings(): returns the list of encodings that mbstring supports.
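Three of the functions listed above can be tried together in a short sketch, assuming the mbstring extension is loaded. The GBK input bytes and the detection order chosen here are illustrative.

```php
<?php
$gbk = "\xD6\xD0\xB9\xFA"; // "中国" encoded as GBK

// Arguments: the string, the target encoding, then the source encoding.
$utf8 = mb_convert_encoding($gbk, 'UTF-8', 'GBK');
echo bin2hex($utf8), "\n"; // e4b8ade59bbd

// Set the detection order, then read it back and use it for strict detection.
mb_detect_order(['ASCII', 'UTF-8']);
$detected = mb_detect_encoding($utf8, mb_detect_order(), true);
echo $detected, "\n"; // UTF-8

// The list of supported encodings depends on the installation.
$supported = mb_list_encodings();
var_dump(in_array('UTF-8', $supported, true)); // bool(true)
```

If the third argument of mb_convert_encoding() is omitted, the function has to fall back to a default source encoding, which is a common cause of garbled output.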

Key point: the encoding of a PHP file must be ASCII-compatible.

Do not use Big5 as the PHP file encoding, especially when identifiers or string literals contain Big5 text. If the file really is encoded in Big5, convert input and output to UTF-8 wherever possible.

Zend multibyte

Finally, a word about Zend multibyte. My understanding of it is not deep, but the first thing to note is not to confuse it with the mbstring extension. Zend multibyte mode is off by default and is turned on with the zend.multibyte directive. The encoding the PHP parser uses for a script is then specified with the declare(encoding=...) directive.

What is this directive for? As mentioned above, PHP files need an ASCII-compatible encoding. Files in an encoding that is not ASCII-compatible, such as Big5, can be handled in this mode: the PHP parser reads the mbstring.script_encoding setting (named zend.script_encoding since PHP 5.4) and uses that encoding to parse the PHP file.
