Serialization and deserialization in PHP tested in UTF8 and GBK encodings

Source: Internet
Author: User
Tags object serialization

In PHP, serialization causes no trouble as long as a single encoding is used throughout. Many people, however, run into the problem that serialize returns different values under UTF8 and GBK:

When data is serialized with serialize under one of the two encodings (UTF8 or GBK) and passed to unserialize under the other, deserialization can fail.
The main cause is that strlen counts bytes, so the same Chinese string has a different length under each encoding: a Chinese character takes 2 bytes in GBK and 3 bytes in UTF8.

<?php$array=array (' title ' + ' PHP tutorial sharing web ', ' url ' = ' http://www.111cn.net '); Echo serialize ($array);//GBK encoding  a:2:{s:5: "title"; s:13: "PHP tutorial sharing network"; s:3: "url"; s:20: "Http://www.111cn.net";} UTF8 code a:2:{s:5: "title"; s:18: "PHP tutorial sharing network"; s:3: "url"; s:20: "Http://www.111cn.net";}? >

To solve this problem, recalculate the length of every string before deserializing.
Solution

<?php$str= ' a:2:{s:5: "title"; s:13: "PHP tutorial sharing network"; s:3: "url"; s:20: "Http://www.111cn.net";} '; $regex = '/s\:(\d+) \:\ "([^\"]+) \ "/isx"; $str = Preg_replace_callback ($regex, "Fixser", $str); function Fixser ($matches) {return ' s: '. strlen ($matches [2]). ': '. '. $matches [2]. ' "';}? >

The callback can also be written as an anonymous function:

<?php$str= ' a:2:{s:5: "title"; s:13: "PHP tutorial sharing network"; s:3: "url"; s:20: "Http://www.111cn.net";} '; $regex = '/s\:(\d+) \:\ "([^\"]+) \ "/isx"; $str = Preg_replace_callback ($regex, function ($matches) {return ' s: '. strlen ($ MATCHES[2]). ': '. '. $matches [2]. ' "';}, $str);? >

The PHP serialization format is a simple text format, but it is sensitive to letter case and to whitespace (spaces, carriage returns, newlines, and so on), and string lengths are counted in bytes (8-bit characters). It is therefore more accurate to describe PHP serialized content as a byte stream.
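This strictness is easy to check with a few hand-written tokens. The sketch below is illustrative only; the variable names are made up for the example.

```php
<?php
// @ suppresses the notice/warning that unserialize() raises on malformed input.
$ok         = unserialize('i:1;');        // valid integer token
$upper_case = @unserialize('I:1;');       // wrong case for the type letter
$with_space = @unserialize('i: 1;');      // stray space inside the token
$wrong_len  = @unserialize('s:2:"abc";'); // declared 2 bytes, content has 3

var_dump($ok, $upper_case, $with_space, $wrong_len);
```

Only the first call returns a value; the other three return false.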

Therefore, when the format is implemented in another language whose strings are stored as Unicode rather than as bytes, the serialized content should not be kept in a string. It should be kept in a byte stream object or a byte array; otherwise exchanging data with PHP will produce errors.

PHP marks each type of data with a different letter. The article "Using Serialized PHP with Yahoo! Web Services" on the Yahoo developer site lists all of these letters and their meanings:

a - array
b - boolean
d - double
i - integer
o - common object
r - reference
s - string
C - custom object
O - class
N - null
R - pointer reference
U - unicode string
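The markers that PHP emits most often can be seen by serializing one value of each type. This is a small illustrative sketch; the $samples array and its keys are only labels for the example.

```php
<?php
$samples = array(
    'N' => serialize(null),            // N;
    'b' => serialize(true),            // b:1;
    'i' => serialize(42),              // i:42;
    'd' => serialize(1.5),             // d:1.5;
    's' => serialize('abc'),           // s:3:"abc";
    'a' => serialize(array('x' => 1)), // a:1:{s:1:"x";i:1;}
    'O' => serialize(new stdClass()),  // O:8:"stdClass":0:{}
);

foreach ($samples as $marker => $text) {
    echo $marker, ' => ', $text, "\n";
}
```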

N represents NULL, while b, d, i, and s represent the four scalar types. The PHP serialization implementations currently available in other languages generally support serializing and deserializing these types, although their handling of s (strings) has problems.

a and O are the most commonly used composite types. Most implementations in other languages handle serialization and deserialization of a well, but for O they implement only the PHP 4 object serialization format and do not support the extended object serialization format of PHP 5.

r and R represent object references and pointer references respectively. They are useful as well: serializing complex arrays and objects produces data containing these two markers. They are explained in detail later. No implementation of these two markers has yet been found in other languages.
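Both markers can be produced with very small inputs. The following sketch is illustrative: it stores the same object twice in an array to obtain an object reference, and links two array slots with & to obtain a pointer reference. Variable names are made up for the example.

```php
<?php
// Object reference: the same object appears twice in one array.
$object  = new stdClass();
$objects = serialize(array($object, $object));

// Pointer reference: two array slots bound together with &.
$values    = array(1);
$values[1] = &$values[0];
$references = serialize($values);

echo $objects, "\n", $references, "\n";

// The links survive a round trip through unserialize().
$pair = unserialize($objects);
$same_object = ($pair[0] === $pair[1]);

$copy    = unserialize($references);
$copy[0] = 99; // writing through one slot changes the other
$linked_value = $copy[1];

var_dump($same_object, $linked_value);
```

In the printed output, the second array element of each serialized text is a short r: or R: token pointing back at an earlier value instead of a repeated copy.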

C was introduced in PHP 5 and represents a custom object serialization method. Because it is seldom used, implementations in other languages do not need to support it, but it is also explained in detail later.

U was introduced in PHP 6 and represents a Unicode-encoded string. Because PHP 6 can store strings as Unicode, it provides this string format for serialization. PHP 5 and PHP 4 do not support this type, and those two versions are currently the mainstream, so implementations in other languages are advised not to serialize with it, although they may implement its deserialization. Its format is also explained later.

Finally there is o, the only data type I have not yet figured out. This marker was introduced in PHP 3 to serialize objects, but was replaced by O from PHP 4 onward. In the PHP 3 source code, the serialization and deserialization of o is essentially the same as that of the array type a. In the source code of PHP 4, PHP 5, and PHP 6, the serializer no longer emits it, yet the deserializers of these versions still contain code that handles it, and I have not worked out what they treat it as. So it is not explained further for now.
