The String data type in PHP

Source: Internet
Author: User
Tags intl

A string is a series of characters, where a character is the same as a byte. This means that PHP supports only a 256-character set and hence does not offer native Unicode support. See the detailed description of the string type for more information.

Note: a string can be as large as 2 GB.

Syntax

A string can be expressed in four ways:

  • Single quotes

  • Double quotation marks

  • Heredoc syntax structure

  • Nowdoc syntax structure (from PHP 5.3.0)

Single quotes

The simplest way to define a string is to enclose it in single quotes (the character ').

To include a literal single quote, escape it with a backslash (\'). To include a literal backslash, double it (\\). A backslash before any other character is kept as a backslash: sequences such as \r or \n have no special meaning in a single-quoted string and are output exactly as written.

Note: Unlike in double-quoted strings and heredocs, variables and escape sequences for special characters are not expanded in single-quoted strings.
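As a minimal sketch of the rules above (the variable names are illustrative and not part of the original text), the following shows which backslash sequences are honoured inside single quotes:

```php
<?php
// Illustrative variable; single quotes never expand it.
$name = 'World';

$quote     = 'Arnold once said: "I\'ll be back"'; // \' yields a literal single quote
$backslash = 'C:\\temp';                          // \\ yields one backslash
$noEscape  = 'Line one\nstill line one';          // \n stays as a backslash followed by n
$noVar     = 'Hello $name';                       // $name is not replaced

echo $quote, PHP_EOL;
echo $backslash, PHP_EOL;
echo $noEscape, PHP_EOL;
echo $noVar, PHP_EOL;
```

Every string here is printed on a single line, because none of them contains a real line feed.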

 
Double quotation marks

If the string is enclosed in double quotes ("), PHP interprets the following escape sequences for special characters:

Sequence              Description
\n                    Line feed (LF, 0x0A (10) in ASCII)
\r                    Carriage return (CR, 0x0D (13) in ASCII)
\t                    Horizontal tab (HT, 0x09 (9) in ASCII)
\v                    Vertical tab (VT, 0x0B (11) in ASCII) (since PHP 5.2.5)
\e                    Escape (ESC, 0x1B (27) in ASCII) (since PHP 5.4.0)
\f                    Form feed (FF, 0x0C (12) in ASCII) (since PHP 5.2.5)
\\                    Backslash
\$                    Dollar sign
\"                    Double quote
\[0-7]{1,3}           The character whose octal code matches the regular expression
\x[0-9A-Fa-f]{1,2}    The character whose hexadecimal code matches the regular expression

As in single-quoted strings, a backslash before any other character is output as well. Before PHP 5.1.1, the backslash in \{$var} was not output.

The most important feature of double-quoted strings is that variable names are expanded. See the section on variable parsing for details.
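A short sketch of a few of the escape sequences from the table; the variable names are illustrative, and bin2hex() is used only to make the resulting bytes visible:

```php
<?php
$newline = "\n";        // one byte, 0x0A
$tab     = "\t";        // one byte, 0x09
$octal   = "\101";      // octal 101 = decimal 65 = "A"
$hex     = "\x41";      // hexadecimal 41 = decimal 65 = "A"
$dollar  = "Cost: \$5"; // \$ keeps the dollar sign literal
$unknown = "\q";        // not a known escape, so the backslash is kept

echo bin2hex($newline), PHP_EOL;  // 0a
echo bin2hex($tab), PHP_EOL;      // 09
echo $octal, $hex, PHP_EOL;       // AA
echo $dollar, PHP_EOL;            // Cost: $5
echo strlen($unknown), PHP_EOL;   // 2
```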

Heredoc structure

The third way to write a string is the heredoc syntax: <<<. After this operator, provide an identifier, then a newline. The string itself follows, and then the same identifier again to close the quotation.

The closing identifier must begin in the first column of the line. The identifier must also follow the same naming rules as any other label in PHP: it may contain only alphanumeric characters and underscores, and must start with a letter or an underscore.

Warning

The line with the closing identifier must contain no other characters, except possibly a semicolon (;). That means the identifier may not be indented, and there may be no spaces or tabs before or after the semicolon. The first character before the closing identifier must be a newline as defined by the local operating system, which is \n on UNIX systems, including Mac OS X. The closing identifier (possibly followed by a semicolon) must also be followed by a newline. These rules describe the PHP 5 versions covered here; PHP 7.3 and later relax them and allow the closing identifier to be indented.

If this rule is broken and the closing identifier is not "clean", PHP does not treat it as a closing identifier and keeps looking for one. If no proper closing identifier is found before the end of the file, a parse error is reported at the last line.

Heredocs cannot be used to initialize class properties. Since PHP 5.3, this limitation applies only to heredocs that contain variables.

Example #1 Invalid example

 

Heredoc text behaves just like a double-quoted string, without the double quotes. Quotes in a heredoc therefore do not need to be escaped, but the escape sequences listed above can still be used. Variables are expanded, but the same care must be taken when expressing complex variables inside a heredoc as with double-quoted strings.

Example #2 Heredoc string quoting example

<?php
class foo
{
    public $foo;
    public $bar;

    function __construct()
    {
        $this->foo = 'Foo';
        $this->bar = array('Bar1', 'Bar2', 'Bar3');
    }
}

$foo = new foo();
$name = 'MyName';

echo <<<EOT
My name is "$name". I am printing some $foo->foo.
Now, I am printing some {$foo->bar[1]}.
This should print a capital 'A': \x41
EOT;
?>

The above example will output:

My name is "MyName". I am printing some Foo.
Now, I am printing some Bar2.
This should print a capital 'A': A

Heredoc syntax can also be used to pass data to function arguments:

Example #3 Heredoc in arguments example
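A minimal sketch of what such a call might look like. The variable name $values is illustrative; the heredoc is used directly as an array element, with the closing identifier alone at the start of its line.

```php
<?php
// The heredoc body becomes the only element of the array.
$values = array(<<<EOD
foobar!
EOD
);

var_dump($values);
```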

 

As of PHP 5.3.0, a heredoc can also be used to initialize static variables and class properties and constants:

Example #4 Using a heredoc to initialize static values
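The following sketch illustrates the idea under the assumption that the heredocs contain no variables. The names staticExample and HeredocHolder are hypothetical and chosen only for this illustration.

```php
<?php
// Static variable initialised with a heredoc.
function staticExample()
{
    static $bar = <<<LABEL
Nothing in here...
LABEL;
    return $bar;
}

// Class constant and property initialised the same way.
class HeredocHolder
{
    const BAR = <<<FOOBAR
Constant example
FOOBAR;

    public $baz = <<<FOOBAR
Property example
FOOBAR;
}

echo staticExample(), PHP_EOL;
echo HeredocHolder::BAR, PHP_EOL;
```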

 
Nowdoc structure

Nowdocs are to single-quoted strings what heredocs are to double-quoted strings. A nowdoc is written like a heredoc, but no parsing is done inside it. The construct is well suited to embedding PHP code or other large blocks of text without the need for escaping. It shares some features with the SGML <![CDATA[ ]]> construct, in that it declares a block of text that is not to be parsed.

A nowdoc is introduced by the same <<< sequence used for heredocs, but the identifier that follows is enclosed in single quotes, e.g. <<<'EOT'. All the rules for heredoc identifiers also apply to nowdoc identifiers, especially those regarding the closing identifier.

Example #6 Nowdoc string quoting example

<?php
class foo
{
    public $foo;
    public $bar;

    function __construct()
    {
        $this->foo = 'Foo';
        $this->bar = array('Bar1', 'Bar2', 'Bar3');
    }
}

$foo = new foo();
$name = 'MyName';

echo <<<'EOT'
My name is "$name". I am printing some $foo->foo.
Now, I am printing some {$foo->bar[1]}.
This should not print a capital 'A': \x41
EOT;
?>

The above example will output:

My name is "$name". I am printing some $foo->foo.
Now, I am printing some {$foo->bar[1]}.
This should not print a capital 'A': \x41

Note:

Unlike heredocs, nowdocs can be used in any static data context. The typical example is initializing class properties or constants:

Example #7 Static data example
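A minimal sketch of a nowdoc used as a property initializer. The class name NowdocHolder and the text inside the nowdoc are hypothetical; the point is that the body is stored verbatim.

```php
<?php
class NowdocHolder
{
    // Nothing inside the nowdoc is expanded or unescaped.
    public $bar = <<<'EOT'
bar with $literal {$text} and \n untouched
EOT;
}

echo (new NowdocHolder())->bar, PHP_EOL;
```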

 

Note:

Nowdoc support was added in PHP 5.3.0.

Variable parsing

When a string is written in double quotes or as a heredoc, variables inside it are parsed.

There are two kinds of syntax: a simple one and a complex one. The simple syntax is the most common and convenient. It lets you embed a variable, an array value, or an object property in a string with minimal effort.

The complex syntax can be recognised by the curly braces surrounding the expression.

Simple syntax

When the parser encounters a dollar sign ($), it greedily takes as many characters as possible to form a valid variable name. Enclose the variable name in curly braces to state explicitly where the name ends.

 

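For example, with $juice set to "apple", echoing "He drank some $juice juice." and then "He drank some juice made of $juices." outputs the following. In the second string the parser looks for a variable named $juices, which does not exist:

He drank some apple juice.
He drank some juice made of .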
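A minimal sketch of how the parser picks the variable name, and how braces mark where the name ends. The variable names are illustrative.

```php
<?php
$juice = "apple";

// The parser reads "$juice" and stops at the space.
$plain = "He drank some $juice juice.";

// Braces mark where the variable name ends, so the "s" is plain text.
$braced = "He drank some juice made of {$juice}s.";

echo $plain, PHP_EOL;   // He drank some apple juice.
echo $braced, PHP_EOL;  // He drank some juice made of apples.
```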

Similarly, an array index or an object property can be parsed. With array indices, the closing square bracket (]) marks the end of the index. The same rules apply to object properties as to simple variables.

Example #8 Simple syntax example

<?php
$juices = array("apple", "orange", "koolaid1" => "purple");

echo "He drank some $juices[0] juice.".PHP_EOL;
echo "He drank some $juices[1] juice.".PHP_EOL;
echo "He drank some juice made of $juice[0]s.".PHP_EOL; // Won't work
echo "He drank some $juices[koolaid1] juice.".PHP_EOL;

class people {
    public $john = "John Smith";
    public $jane = "Jane Smith";
    public $robert = "Robert Paulsen";

    public $smith = "Smith";
}

$people = new people();

echo "$people->john drank some $juices[0] juice.".PHP_EOL;
echo "$people->john then said hello to $people->jane.".PHP_EOL;
echo "$people->john's wife greeted $people->robert.".PHP_EOL;
echo "$people->robert greeted the two $people->smiths."; // Won't work
?>

The above example will output:

He drank some apple juice.
He drank some orange juice.
He drank some juice made of s.
He drank some purple juice.
John Smith drank some apple juice.
John Smith then said hello to Jane Smith.
John Smith's wife greeted Robert Paulsen.
Robert Paulsen greeted the two .

To express more complex structures, use complex syntax.

Complex (curly brackets) syntax

This is called complex not because the syntax is complex, but because it allows complex expressions to be used.

Any scalar variable, array element, or object property with a string representation can be included with this syntax. Write the expression the same way it would appear outside the string, then wrap it in { and }. Because { cannot be escaped, the syntax is recognised only when the $ immediately follows the {. Use {\$ to get a literal {$. Some examples make this clearer:

<?php
// Works
echo "This square is {$square->width}00 centimeters broad.";

// Works; quoted array keys are only parsed correctly with the curly-brace syntax
echo "This works: {$arr['key']}";

// Works
echo "This works: {$arr[4][3]}";

// Wrong, for the same reason that $foo[bar] is wrong outside a string.
// It only works if a constant named foo can be found; otherwise an
// E_NOTICE (undefined constant) error is raised.
echo "This is wrong: {$arr[foo][3]}";

// Works. When using multi-dimensional arrays inside strings,
// always wrap them in braces
echo "This works: {$arr['foo'][3]}";

// Works
echo "This works: " . $arr['foo'][3];

echo "This works too: {$obj->values[3]->name}";

echo "This is the value of the var named $name: {${$name}}";

echo "This is the value of the var named by the return value of getName(): {${getName()}}";

echo "This is the value of the var named by the return value of \$object->getName(): {${$object->getName()}}";

// Won't work, outputs: This is the return value of getName(): {getName()}
echo "This is the return value of getName(): {getName()}";
?>

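Class properties can also be accessed through variables inside strings using this syntax.

<?php
class foo {
    public $bar = 'I am bar.';
}

$foo = new foo();
$bar = 'bar';
$baz = array('foo', 'bar', 'baz', 'quux');
echo "{$foo->$bar}\n";
echo "{$foo->{$baz[1]}}\n";
?>

The above example will output: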

I am bar.

I am bar.

Note:

Functions, method calls, static class variables, and class constants inside {$} work since PHP 5. However, the value accessed is interpreted as the name of a variable in the scope in which the string is defined. Using single curly braces ({}) alone does not work for accessing the return values of functions or methods, or the values of class constants or static class variables.

 
Access and modify characters in a string

Characters within a string can be accessed and modified by specifying the zero-based offset of the desired character in square brackets, as with arrays, for example $str[42]. Think of a string as an array of characters for this purpose. The functions substr() and substr_replace() can be used to extract or replace more than one character.

Note: Strings can also be accessed with curly braces, as in $str{42}. This form was deprecated in PHP 7.4 and removed in PHP 8.0.

Warning

Writing to an offset beyond the end of the string pads the string with spaces. Non-integer offsets are converted to integers. An illegal offset type emits E_NOTICE. Writing to a negative offset emits E_NOTICE, while reading from a negative offset returns an empty string. Only the first character of an assigned string is used. Assigning an empty string assigns a NUL byte.

Warning

Internally, PHP strings are byte arrays. As a result, accessing or modifying a string with array brackets is not multi-byte safe, and should only be done with strings in a single-byte encoding such as ISO-8859-1.

Example #9 Some string examples
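A short sketch of reading and writing single characters by offset. The variable names are illustrative, and only square brackets with non-negative integer offsets are used, since those behave the same across PHP versions.

```php
<?php
// Read the first and third characters.
$str   = 'This is a test.';
$first = $str[0];                  // "T"
$third = $str[2];                  // "i"

// Read the last character.
$str  = 'This is still a test.';
$last = $str[strlen($str) - 1];    // "."

// Modify the last character in place.
$str = 'Look at the sea';
$str[strlen($str) - 1] = 'e';      // "Look at the see"

// Writing past the end pads the gap with spaces.
$padded    = 'abc';
$padded[5] = 'x';                  // "abc  x"

echo $first, $third, $last, PHP_EOL;
echo $str, PHP_EOL;
echo $padded, PHP_EOL;
```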

 

As of PHP 5.4, string offsets must be either integers or integer-like strings; otherwise a warning is raised. Previously, an offset such as "foo" was silently converted to 0.

Example #10 Differences between PHP 5.3 and PHP 5.4

 

Output of the above example in PHP 5.3:

string(1) "b"
bool(true)
string(1) "b"
bool(true)
string(1) "a"
bool(true)
string(1) "b"
bool(true)

Output of the above example in PHP 5.4:

string(1) "b"
bool(true)

Warning: Illegal string offset '1.0' in /tmp/t.php on line 7
string(1) "b"
bool(false)

Warning: Illegal string offset 'x' in /tmp/t.php on line 9
string(1) "a"
bool(false)
string(1) "b"
bool(false)

Note:

Accessing variables of other types (not including arrays or objects implementing the appropriate interfaces) using [] or {} silently returns NULL.

Note:

PHP 5.5 added support for accessing characters within string literals using [] or {}.

Useful functions and operators

Strings can be concatenated using the '.' (dot) operator. Note that the '+' (addition) operator does not work for this. See string operators for more information.
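A brief sketch of the difference between the two operators; the variable names are illustrative:

```php
<?php
$greeting  = 'Hello' . ', ' . 'World';  // the dot operator joins strings
$greeting .= '!';                       // .= appends in place

$sum = '2' + '3';                       // + performs arithmetic and gives integer 5

echo $greeting, PHP_EOL;  // Hello, World!
var_dump($sum);           // int(5)
```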

There are many useful functions for string operations.

See the string functions section for general functions, and the regular expression functions or the Perl-compatible regular expression functions for advanced find-and-replace functionality.

There are also functions for URL strings, and functions to encrypt and decrypt strings (mcrypt and mhash).

Finally, see also the character type functions.

Convert to string

A value can be converted to a string using the (string) cast or the strval() function. String conversion is done automatically in any expression where a string is needed. This happens when using the echo or print functions, or when a variable is compared to a string. The sections on types and type juggling make the following clearer. See also the settype() function.

A boolean TRUE value is converted to the string "1". Boolean FALSE is converted to "" (the empty string). This allows conversion back and forth between boolean and string values.

An integer or float is converted to a string representing the number textually (including the exponent part for floats). Floating point numbers can also be converted using exponential notation (4.1E+6).

Note:

The decimal point character is defined in the script's locale (category LC_NUMERIC). See the setlocale() function.

Arrays are always converted to the string "Array". Because of this, echo and print cannot by themselves show the contents of an array. To view a single element, use a construct such as echo $arr['foo']. See below for tips on viewing the entire contents.

In PHP 4, objects were always converted to the string "Object". To print the values of an object's properties for debugging, read the paragraphs below. To get the class name of an object, use the get_class() function. As of PHP 5, the __toString method is used when applicable.

Resources are always converted to strings with the structure "Resource id #1", where 1 is the unique number PHP assigns to the resource at runtime. Do not rely on this structure; it is subject to change. To get a resource's type, use the get_resource_type() function.

NULL is always converted to an empty string.

As stated above, directly converting an array, object, or resource to a string gives no useful information about the value beyond its type. See the functions print_r() and var_dump() for more effective ways of inspecting the contents of these types.
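The scalar conversions described above can be sketched as follows. The variable names are illustrative, and print_r() is called with true as its second argument so that it returns the dump instead of printing it.

```php
<?php
$fromTrue  = (string) true;     // "1"
$fromFalse = (string) false;    // ""
$fromInt   = strval(42);        // "42"
$fromFloat = (string) 4.1E+6;   // "4100000"
$fromNull  = (string) null;     // ""

// Inspect an array's contents rather than casting it to a string.
$dump = print_r(array('foo' => 'bar'), true);

var_dump($fromTrue, $fromFalse, $fromInt, $fromFloat, $fromNull);
echo $dump;
```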

Most PHP values can also be converted to strings for permanent storage. This is called serialization, and is performed by the serialize() function. If the PHP engine was built with WDDX support, PHP values can also be serialized as well-formed XML text.
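A minimal round-trip sketch with serialize() and unserialize(); the array contents are made up for illustration:

```php
<?php
$data = array('name' => 'apple', 'count' => 3);

$stored   = serialize($data);      // a plain string, suitable for storage
$restored = unserialize($stored);  // rebuilds the original value

echo $stored, PHP_EOL;             // a:2:{s:4:"name";s:5:"apple";s:5:"count";i:3;}
var_dump($restored === $data);     // bool(true)
```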

Convert string to numeric value

When a string is evaluated in a numeric context, the resulting value and type are determined as follows.

If the string does not contain any of the characters '.', 'e', or 'E', and its numeric value fits within the integer range (as defined by PHP_INT_MAX), the string is evaluated as an integer. In all other cases it is evaluated as a float.

The value is given by the initial portion of the string. If the string starts with valid numeric data, that is the value used. Otherwise, the value is 0 (zero). Valid numeric data is an optional sign, followed by one or more digits (optionally containing a decimal point), followed by an optional exponent. The exponent is an 'e' or 'E' followed by one or more digits.
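The leading-number rule can be sketched with explicit casts. Casts are used here as an assumption of convenience: newer PHP versions raise warnings or errors when non-numeric strings are used directly in arithmetic, whereas casts follow the rule quietly. The sample strings are illustrative.

```php
<?php
$a = (float) "10.5";          // 10.5: the whole string is a valid float
$b = (float) "-1.3e3";        // -1300.0: sign and exponent are honoured
$c = (int) "10 Small Pigs";   // 10: parsing stops at the first invalid character
$d = (int) "bob3";            // 0: the string does not start with a number

var_dump($a, $b, $c, $d);
```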

 

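For more information on this conversion, see the Unix manual page for strtod(3).

To test any of the examples in this section, copy and paste them and add the following line to see the result:

<?php
echo "\$foo==$foo; type is " . gettype($foo) . "<br />\n";
?>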

Do not expect to get the code of a character by converting it to an integer, as is done in C. Use the ord() and chr() functions to convert between ASCII codes and characters.
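A small sketch contrasting the cast with ord() and chr(); the variable names are illustrative:

```php
<?php
$code = ord('A');    // 65, the byte value of "A"
$char = chr(97);     // "a", the character for byte value 97
$cast = (int) 'A';   // 0, because "A" does not start with a number

var_dump($code, $char, $cast);
```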

Detailed description of string types

A string in PHP is implemented as an array of bytes plus an integer giving the length of the buffer. It holds no information about how those bytes translate to characters; that task is left to the programmer. There are no restrictions on the values a string may contain. In particular, bytes with value 0 ("NUL bytes") are allowed anywhere in the string (however, a few functions, described in this manual as not "binary safe", may hand the string to libraries that ignore data after a NUL byte).
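As a brief illustration that a NUL byte is ordinary data inside a PHP string (the variable names are illustrative):

```php
<?php
$binary = "ab\0cd";          // \0 is a NUL byte in the middle of the string

$length = strlen($binary);   // 5: the NUL byte is counted like any other byte
$hex    = bin2hex($binary);  // "6162006364"

var_dump($length, $hex);
```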

This nature of the string type explains why there is no separate "byte" type in PHP: strings take that role. Functions that return non-textual data, for instance arbitrary data read from a network socket, still return strings.

Given that PHP does not dictate a specific encoding for strings, how are string literals encoded? For instance, is the string "á" equivalent to "\xE1" (ISO-8859-1), "\xC3\xA1" (UTF-8, normalization form C), "\x61\xCC\x81" (UTF-8, normalization form D), or some other representation? The answer is that the string is encoded in whatever way the script file is encoded. So if the script is written in ISO-8859-1, the string is encoded in ISO-8859-1, and so on. This does not apply when Zend Multibyte is enabled: in that case the script may be written in an arbitrary encoding (explicitly declared or detected) and is then converted to a certain internal encoding, which is the encoding used for string literals. Note that there are some constraints on the encoding of the script (or on the internal encoding, if Zend Multibyte is enabled): it should almost always be an ASCII-compatible superset, such as UTF-8 or ISO-8859-1. Note also that state-dependent encodings, in which the same byte values can be used in initial and non-initial shift states, may be problematic.
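The three byte sequences mentioned above can be compared directly. This sketch writes them with hexadecimal escapes so that the result does not depend on how the script file itself is encoded; the variable names are illustrative.

```php
<?php
$latin1  = "\xE1";           // "á" in ISO-8859-1
$utf8Nfc = "\xC3\xA1";       // "á" in UTF-8, normalization form C
$utf8Nfd = "\x61\xCC\x81";   // "a" plus a combining acute accent, form D

// To PHP these are three different byte strings of different lengths.
var_dump(strlen($latin1), strlen($utf8Nfc), strlen($utf8Nfd)); // 1, 2, 3
var_dump($utf8Nfc === $utf8Nfd);                               // bool(false)
```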

Of course, to be useful, functions that operate on text may have to make assumptions about how the string is encoded. Unfortunately, PHP's functions vary widely in how they approach this:

  • Some functions assume that the string is encoded in some (any) single-byte encoding, but do not need to interpret those bytes as specific characters. This is the case for substr(), strpos(), strlen(), and strcmp(), for instance. Another way to think of these functions is that they operate on memory buffers, i.e. they work with bytes and byte offsets.

  • Other functions are passed the encoding of the string, possibly assuming a default if none is given. This is the case for htmlentities() and the majority of the functions in the mbstring extension.

  • Others use the current locale (see setlocale()), but operate byte by byte. This is the case for strcasecmp(), strtoupper(), and ucfirst(). They can therefore be used only with single-byte encodings, and only when the encoding matches the locale. For instance, strtoupper("á") may return "Á" if the locale is set correctly and á is encoded with a single byte. If it is encoded in UTF-8, the correct result is not returned, and the resulting string may or may not be corrupted, depending on the current locale.

  • Finally, some functions simply assume that the string uses a specific encoding, usually UTF-8. This is the case for most functions in the intl extension and in the PCRE extension (in the latter case, only when the u modifier is used). Although this is due to their special purpose, utf8_decode() assumes a UTF-8 encoding and utf8_encode() assumes an ISO-8859-1 encoding.
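To make the first and last categories concrete, the sketch below contrasts byte-oriented functions with a PCRE call that uses the u modifier. It assumes a PCRE build with UTF-8 support, which is the usual case; the variable names are illustrative.

```php
<?php
$text = "\xC3\xA1bc";                      // "ábc" encoded as UTF-8

$bytes     = strlen($text);                // 4: strlen() counts bytes
$chars     = preg_match_all('/./u', $text); // 3: with /u, "." matches whole characters
$firstByte = substr($text, 0, 1);          // "\xC3": only half of the "á"

var_dump($bytes, $chars, bin2hex($firstByte));
```

Cutting the string at a byte offset, as substr() does here, leaves an incomplete UTF-8 sequence, which is the kind of data corruption the list above warns about.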

Ultimately, writing correct programs that use Unicode depends on carefully avoiding functions that do not work with it and that are likely to corrupt the data, and using instead functions that behave correctly, generally from the intl and mbstring extensions. However, using functions that can handle Unicode encodings is only the beginning. Whatever functions the language provides, it is essential to know the Unicode specification. For instance, a program that assumes there are only uppercase and lowercase letters is making a wrong assumption.
