The PHP serialization (serialize) format in detail


This article, written by Andot, is a classic introduction to the PHP serialization format. Original source: http://www.coolcode.cn/?p=170


1. Objective
PHP (starting with PHP 3.0.5) provides a pair of functions for saving objects by serializing and deserializing them: serialize and unserialize. However, the PHP manual only describes how to use these two functions; it does not explain the format of the serialized result. This makes it difficult to implement PHP serialization in other languages. I have collected a number of PHP serialization implementations written in other languages, but they are incomplete and fail when serializing or deserializing some of the more complex objects. So I decided to write this document on the PHP serialization format as a more complete reference for anyone writing a PHP serializer in another language. What I describe here comes from writing test programs and reading the PHP source code, so I cannot be 100% sure that everything is right. I have tried to make sure that what I write is correct, and wherever I am not certain I say so clearly in the text. Additions and corrections are welcome.


2. Overview
PHP's serialized output is a simple text format, but it is sensitive to letter case and to whitespace (spaces, carriage returns, line feeds, and so on), and string lengths are counted in bytes (8-bit characters). It is therefore more accurate to describe PHP's serialized output as a byte stream format. When implementing it in a language whose strings are stored as Unicode rather than as bytes, the serialized content should not be held in a string. Hold it in a byte stream object or a byte array instead, otherwise errors will occur when exchanging data with PHP.
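
To illustrate the byte-versus-character point, here is a minimal sketch in Python, used only as an example of an "other language" whose strings are Unicode. The helper name php_string is hypothetical, and the sketch assumes that text is exchanged with PHP as UTF-8; neither comes from the original article.

```python
def php_string(text: str, encoding: str = "utf-8") -> bytes:
    """Build s:<length>:"<value>"; with the length counted in bytes."""
    raw = text.encode(encoding)              # convert Unicode text to bytes first
    return b's:%d:"%s";' % (len(raw), raw)   # len(raw) is a byte count


print(php_string("abc"))   # 3 characters, 3 bytes
print(php_string("é"))     # 1 character, but 2 bytes in UTF-8
```

The result is kept as a bytes object rather than a str, which is the point the paragraph above makes about byte stream objects.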

PHP marks each type of data with a different letter. The article "Using serialized PHP with Yahoo! Web Services" on the Yahoo developer website lists all of the letters and their meanings:

a - array
b - boolean
d - double
i - integer
o - common object
r - reference
s - string
C - custom object
O - class
N - null
R - pointer reference
U - unicode string

N represents NULL, while b, d, i, and s represent the four scalar types. The PHP serializers currently implemented in other languages basically all handle serialization and deserialization of these types, although some have problems with their implementation of s (strings).

a and O are the most commonly used composite types. Most implementations in other languages handle serialization and deserialization of a well, but for O they implement only the PHP 4 object serialization format and do not support the extensions to the object format made in PHP 5.

r and R represent object references and pointer references respectively. Both are useful: data carrying these two markers is produced when serializing more complex arrays and objects. We will explain them in detail later. I have not yet found an implementation in another language that handles these two markers.

C was introduced in PHP 5 and represents custom object serialization. Implementations in other languages do not strictly need it, because it is rarely used, but it will still be explained in detail later.

U was introduced in PHP 6 and represents a Unicode-encoded string. Because PHP 6 can store strings as Unicode, it provides a serialization format for such strings. The currently mainstream PHP 5 and PHP 4 do not support this type, so implementing its serialization in other languages is not recommended, although its deserialization can be implemented. Its format will also be explained later.

Finally there is o, the only data type marker I have not yet figured out. It was introduced in PHP 3 to serialize objects, but was replaced by O from PHP 4 onward. In the PHP 3 source code you can see that o is serialized and deserialized in essentially the same way as the array marker a. It does not appear in the serialization code of PHP 4, PHP 5, or PHP 6, yet the deserialization code of all those versions still handles it, and I have not worked out what it is treated as there. So I will not say more about it for now.
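
Because the markers are case-sensitive (o and O, r and R mean different things), a parser in another language has to dispatch on the exact byte. The lookup table below is a small Python sketch of that idea; the names TYPE_TAGS and tag_name are hypothetical, and the descriptions are simply the ones from the list above.

```python
# Marker byte -> meaning, copied from the list above (case matters).
TYPE_TAGS = {
    b"a": "array", b"b": "boolean", b"d": "double", b"i": "integer",
    b"o": "common object", b"r": "reference", b"s": "string",
    b"C": "custom object", b"O": "class", b"N": "null",
    b"R": "pointer reference", b"U": "unicode string",
}


def tag_name(serialized: bytes) -> str:
    """Return the meaning of the first marker in a serialized value."""
    tag = serialized[:1]            # a one-byte slice, still of type bytes
    if tag not in TYPE_TAGS:
        raise ValueError("unknown type marker: %r" % tag)
    return TYPE_TAGS[tag]


print(tag_name(b"i:1;"))
print(tag_name(b"r:1;"), "/", tag_name(b"R:1;"))
```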


3. Serialization of NULL and scalar types
The serialization of NULL and the scalar types is the simplest and forms the basis for serializing composite types. Many PHP developers are probably already familiar with this material. If you feel you have mastered it, you can skip this chapter.

3.1. Serialization of NULL
In PHP, NULL is serialized as:

N;

3.2. Serialization of Boolean data
The Boolean data is serialized as:

b:<digit>;
Where <digit> is 0 or 1: <digit> is 0 when the Boolean value is false, and 1 otherwise.
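
The two simplest cases can be sketched in Python as follows. The function names are hypothetical; the sketch only restates the N; and b:<digit>; forms, and it rejects any other spelling because the format is sensitive to case and whitespace.

```python
def serialize_null() -> bytes:
    return b"N;"                            # NULL carries no payload


def serialize_bool(value: bool) -> bytes:
    return b"b:1;" if value else b"b:0;"    # <digit> is 1 for true, 0 for false


def parse_bool(token: bytes) -> bool:
    if token not in (b"b:0;", b"b:1;"):     # case and whitespace both matter
        raise ValueError("not a serialized Boolean: %r" % token)
    return token == b"b:1;"


print(serialize_null(), serialize_bool(True), serialize_bool(False))
```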

3.3. Serialization of Integer data
The integer data (integer) is serialized as:

i:<number>;
Where <number> is an integer in the range -2147483648 to 2147483647 (the range of a signed 32-bit integer). The number may be preceded by a plus or minus sign. If the number being serialized is outside that range, PHP serializes it as a floating-point value instead of an integer. If a serialized integer outside this range is supplied (PHP itself never produces one), deserialization does not return the expected value.
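
A serializer written in a language with arbitrary-precision integers has to apply the range check itself. The Python sketch below assumes the 32-bit range quoted above and falls back to the d: form outside it; serialize_int is a hypothetical name, and the exact digits PHP would print for the floating-point fallback may differ from Python's repr.

```python
INT_MIN, INT_MAX = -2147483648, 2147483647    # range quoted in the text


def serialize_int(value: int) -> bytes:
    """Emit i:<number>; inside the range, d:<number>; outside it."""
    if INT_MIN <= value <= INT_MAX:
        return b"i:%d;" % value
    # Out of range: emit a double, as the text says PHP does.
    return b"d:%s;" % repr(float(value)).encode("ascii")


print(serialize_int(42))
print(serialize_int(INT_MAX + 1))
```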

3.4. Serialization of double type data
The double data (floating-point number) is serialized as:

d:<number>;
Where <number> is a floating-point number with the same range as floating-point numbers in PHP. It can be written in integer form, decimal form, or scientific notation. If the serialized number is larger than the maximum PHP can represent, deserialization returns infinity (INF); if it is smaller in magnitude than the smallest value PHP can represent, deserialization returns 0.
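
The three accepted spellings, and the overflow and underflow behavior, can be tried out with a small Python sketch. parse_double is a hypothetical helper; it relies on Python's own float(), which happens to behave the same way for these cases, rather than on anything taken from PHP's source.

```python
def parse_double(token: bytes) -> float:
    """Parse d:<number>; and return the value as a float."""
    if not (token.startswith(b"d:") and token.endswith(b";")):
        raise ValueError("not a serialized double: %r" % token)
    return float(token[2:-1].decode("ascii"))


for sample in (b"d:1;", b"d:0.5;", b"d:1.5E+3;", b"d:1e999;", b"d:1e-999;"):
    print(sample, "->", parse_double(sample))
```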

3.5. Serialization of String data
String data (String) is serialized as:

s:<length>:"<value>";
Where <length> is the length of <value>. <length> is a non-negative integer and may be preceded by a plus sign (+). <value> is the string value, in which every character is a single-byte character, corresponding to character codes 0 to 255. Each character stands for itself; there are no escape sequences. The quotation marks ("") on either side of <value> are required but are not counted in <length>. In effect, <value> is a byte stream and <length> is the number of bytes in it.
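
Since there is no escaping, a parser cannot look for the closing quotation mark; it has to read exactly <length> bytes. The Python sketch below shows one way to do that. parse_string is a hypothetical name, and the function handles only the s: form described above.

```python
def parse_string(data: bytes, pos: int = 0):
    """Read s:<length>:"<value>"; at pos; return (value, next_pos)."""
    if data[pos:pos + 2] != b"s:":
        raise ValueError("expected the s: marker")
    colon = data.index(b":", pos + 2)
    length = int(data[pos + 2:colon])      # int() also accepts a leading '+'
    start = colon + 2                      # skip ':' and the opening quote
    end = start + length                   # read by byte count, not by quote
    if data[colon + 1:start] != b'"' or data[end:end + 2] != b'";':
        raise ValueError("malformed string")
    return data[start:end], end + 2


# The value itself contains a quote and a semicolon, with no escaping.
print(parse_string(b's:5:"a"b;c";'))
```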

4. Serialization of simple composite types
PHP's composite types are arrays (array) and objects (object). This chapter describes the serialization format of these two types in simple cases. Nested composite types and objects with custom serialization are discussed in detail in later chapters.

4.1. Serialization of arrays
Arrays are usually serialized as:

a:<n>:{<key 1><value 1><key 2><value 2>...<key n><value n>}
Where <n> is the number of array elements, <key 1>, <key 2>, ..., <key n> are the array subscripts, and <value 1>, <value 2>, ..., <value n> are the element values corresponding to those subscripts.

A subscript can only be an integer or a string, and it is serialized in the same format as integer or string data.

An element value can be of any type, and it is serialized in the same format as any other value of that type.
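
Putting the pieces so far together, a recursive serializer for NULL, the scalar types, and arrays might look like the Python sketch below. It is an illustration under stated assumptions, not a complete implementation: PHP arrays are modeled as Python dicts whose keys are int or str, strings are assumed to be UTF-8, and the function name serialize is chosen here only to mirror PHP's.

```python
def serialize(value) -> bytes:
    """Serialize None, bool, int, float, str, and dict (as a PHP array)."""
    if value is None:
        return b"N;"
    if isinstance(value, bool):            # test bool before int: bool is an int
        return b"b:%d;" % int(value)
    if isinstance(value, int):
        return b"i:%d;" % value
    if isinstance(value, float):
        return b"d:%s;" % repr(value).encode("ascii")
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b's:%d:"%s";' % (len(raw), raw)
    if isinstance(value, dict):
        # Each element is <key><value>; note there is no ';' after the '}'.
        body = b"".join(serialize(k) + serialize(v) for k, v in value.items())
        return b"a:%d:{%s}" % (len(value), body)
    raise TypeError("unsupported type: %s" % type(value).__name__)


print(serialize({1: 1, "value": {0: "x"}}))
```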

4.2. Serialization of objects
Objects (object) are usually serialized as:

O:<length>:"<class name>":<n>:{<field name 1><field value 1><field name 2><field value 2>...<field name n><field value n>}
Where <length> is the length of the object's class name <class name>, and <n> is the number of fields (see note 1) in the object. These include the fields declared with var, public, protected, and private in the object's class and its ancestor classes, but not the static members declared with static or const. In other words, only instance fields are counted.

<field name 1>, <field name 2>, ..., <field name n> are the names of the fields, and <field value 1>, <field value 2>, ..., <field value n> are the values corresponding to those names.

A field name is a string, and it is serialized in the same format as string data.

A field value can be of any type, and it is serialized in the same format as any other value of that type.
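
As a rough illustration of the object format, the Python sketch below builds the O: form for an object whose fields are all public and whose values are integers or strings. Both function names are hypothetical, and the restriction to two value types is only there to keep the example short.

```python
def serialize_field(value) -> bytes:
    """Serialize an int or a str (enough for this small example)."""
    if isinstance(value, int) and not isinstance(value, bool):
        return b"i:%d;" % value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b's:%d:"%s";' % (len(raw), raw)
    raise TypeError("unsupported type: %s" % type(value).__name__)


def serialize_object(class_name: str, fields: dict) -> bytes:
    """Build O:<length>:"<class name>":<n>:{...} for public fields."""
    name = class_name.encode("utf-8")
    body = b"".join(serialize_field(k) + serialize_field(v)
                    for k, v in fields.items())
    return b'O:%d:"%s":%d:{%s}' % (len(name), name, len(fields), body)


print(serialize_object("SampleClass", {"value": 1}))
```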

However, how a field name is serialized depends on its declared visibility. The next section focuses on the serialization of field names.

4.3. Serialization of Object field names
Fields declared with var and with public are both public fields, so their field names are serialized in the same way. The name of a public field is serialized exactly as it was declared, except that the serialized name does not include the variable prefix $ used in the declaration.

Fields declared with protected are protected fields: they are visible in the declaring class and its subclasses, but not from outside an object instance of the class. When the name of a protected field is serialized, it is preceded by the prefix

\0*\0

Here \0 stands for the single character with ASCII code 0, not the two-character combination of a backslash and a zero.

Fields declared with private are private fields: they are visible only in the declaring class, not in its subclasses and not from outside an object instance of the class. When the name of a private field is serialized, it is preceded by the prefix

\0<declared class name>\0

Here <declared class name> is the name of the class that declares the private field, not the class of the object being serialized, because the declaring class may be an ancestor of the object's class.

When a field name is serialized as a string, the string value includes the prefix added for its visibility, and the string length includes the length of that prefix. The \0 characters are counted in the length as well.
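
The prefix rules can be checked with a short Python sketch. The function name mangle and its parameters are hypothetical; the sketch simply prepends the prefixes described above and then serializes the result as an ordinary string, so the \0 bytes are included in the length.

```python
def mangle(name: str, visibility: str = "public",
           declaring_class: str = "") -> bytes:
    """Serialize a field name with the prefix for its visibility."""
    raw = name.encode("utf-8")
    if visibility == "protected":
        raw = b"\0*\0" + raw
    elif visibility == "private":
        raw = b"\0" + declaring_class.encode("utf-8") + b"\0" + raw
    elif visibility != "public":
        raise ValueError("unknown visibility: %r" % visibility)
    return b's:%d:"%s";' % (len(raw), raw)     # length includes the prefix


print(mangle("value"))
print(mangle("value", "protected"))
print(mangle("value", "private", "SampleClass"))
```

For the private case the length is 1 + 11 + 1 + 5 = 18 bytes: two \0 bytes, the class name, and the field name.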


--------------------------------------------------------------------------------

Note 1: The PHP manual calls fields "properties". In fact, the object members defined through __set and __get, introduced in PHP 5, are better candidates for the name "property", because their behavior matches that of properties in other languages, while what the PHP manual calls properties are called fields in other languages (for example, C#). To avoid confusion, this document calls them fields rather than properties.

5. Serialization of nested composite types
The previous chapter discussed the serialization of simple composite types, and you will have found that simple arrays and objects are actually easy to serialize. But how does PHP serialize an object or array that contains itself, or the case where A contains B and B in turn contains A? This chapter discusses the serialized form in these cases.

5.1. Object references and pointer references
In PHP, scalar data is passed by value, while composite data (objects and arrays) is passed by reference. However, the reference passing of composite data differs from the reference passing requested explicitly with the & symbol: the former is an object reference, and the latter is a pointer reference.

Before interpreting object references and pointer references, let's look at a few examples.

<?php
echo "<pre>";
class SampleClass {
    var $value;
}
$a = new SampleClass();
$a->value = $a;     // plain assignment: an object reference

$b = new SampleClass();
$b->value = &$b;    // assignment with &: a pointer reference

echo serialize($a);
echo "\n";
echo serialize($b);
echo "\n";
echo "</pre>";
?>
The output of this example is as follows:

O:11:"SampleClass":1:{s:5:"value";r:1;}
O:11:"SampleClass":1:{s:5:"value";R:1;}
As you can see, the value field of $a is serialized as r:1, while the value field of $b is serialized as R:1.

But what's the difference between an object reference and a pointer reference?

You can look at the following example:

echo "<pre>";
class SampleClass {
    var $value;
}
$a = new SampleClass();
$a->value = $a;
                   
$b = new SampleClass();
$b->value = &$b;

$a->value = 1;      // replaces only the field
$b->value = 1;      // replaces $b itself, because of the & above

var_dump($a);
var_dump($b);
echo "</pre>";
You may find the result unexpected:

object(SampleClass)#1 (1) {
  ["value"]=>
  int(1)
}
int(1)
Changing $a->value changes only $a->value, whereas changing $b->value changes $b itself. That is the difference between an object reference and a pointer reference.

Unfortunately, PHP gets this wrong when serializing arrays. Although an array is itself passed as an object reference, PHP seems to forget this during serialization. Look at the following example:

echo "<pre>";
$a = array();
$a[1] = 1;
$a["value"] = $a;

echo $a["value"]["value"][1];   // before the round trip
echo "\n";
$a = unserialize(serialize($a));
echo $a["value"]["value"][1];   // after the round trip
echo "</pre>";
The result:

1
You will find that after serializing and then deserializing the original array, its structure has changed. The value 1 that was originally at $a["value"]["value"][1] is lost after deserialization, which is why only one line is printed.

What is the reason? Let's output the results after serialization to see:

$a = array();
$a[1] = 1;
$a["value"] = $a;

echo serialize($a);
The result:

a:2:{i:1;i:1;s:5:"value";a:2:{i:1;i:1;s:5:"value";N;}}
It turns out that after serialization, $a["value"]["value"] has become NULL instead of an object reference.

That is, PHP generates the object reference marker (r) only when serializing objects. No object reference is generated when serializing scalar types, arrays, or NULL. However, if a reference is created explicitly with the & symbol, it is serialized as the pointer reference marker (R).

5.2. The number after the reference marker
As you may have noticed in the previous examples, object references (r) and pointer references (R) have the format:

r:<number>;
R:<number>;
You may well wonder what that <number> is. This section discusses it in detail.

In short, <number> is the position at which the referenced item first appears in the serialized string. This position is not a character offset but an ordinal count of serialized items (items of every type, not only objects).

That may not be very clear yet, so here is an example:

class ClassA {
    var $int;
    var $str;
    var $bool;
    var $obj;
    var $pr;
}

$a = new ClassA();
$a->int = 1;
$a->str = "Hello";
$a->bool = false;
$a->obj = $a;           // object reference back to $a
$a->pr = &$a->str;      // pointer reference to the str field

echo serialize($a);
The result of this example is:

O:6:"ClassA":5:{s:3:"int";i:1;s:3:"str";s:5:"Hello";s:4:"bool";b:0;s:3:"obj";r:1;s:2:"pr";R:3;}
In this example, the first thing serialized is the ClassA object itself, so it is given the number 1. Next come the members of the object. The first member serialized is the int field, which is numbered 2; then str, numbered 3; and so on. When the obj member is reached, PHP finds that it has already been serialized and was numbered 1, so it is serialized as r:1;. The pr member comes next; PHP finds that it is actually a reference to the str member, which is numbered 3, so pr is serialized as R:3;.

How does PHP number the items it serializes? PHP first creates an empty table when serialization starts. Before each item is serialized, PHP computes a hash value for it and checks whether that hash value is already in the table. If it is not, the hash value is appended to the table and the add succeeds. If it is already there, the add fails; but before reporting the failure, PHP checks whether the item is a reference (one defined with the & symbol), and if it is not, an entry is still added to the table even though the add is reported as failed. On failure, the position of the earlier occurrence is returned as well.

After the attempt to add the hash value, if the add failed, PHP checks whether the item is a reference or an object. For a reference it emits the R marker, and for an object it emits the r marker. Because a failed add also returns the position of the earlier occurrence, the number that follows R or r is that position.
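
The numbering procedure just described can be modeled in a few lines of Python. This is only a model of the description in the text, not a port of PHP's source: the class name VarTable is hypothetical, and plain string labels stand in for the hash values PHP computes.

```python
class VarTable:
    """Toy model of the table used to number serialized items."""

    def __init__(self):
        self.first_seen = {}    # identity -> number given at first occurrence
        self.count = 0          # how many numbers have been handed out

    def add(self, identity, is_ref: bool):
        """Return None on success, or the earlier number on failure."""
        if identity in self.first_seen:
            if not is_ref:
                self.count += 1     # a repeated non-reference still uses a number
            return self.first_seen[identity]
        self.count += 1
        self.first_seen[identity] = self.count
        return None


# The ClassA example: the object, int, str, bool, obj (the object again),
# and pr (a & reference to str). The second field says whether & was used.
items = [("a", False), ("int", False), ("str", True),
         ("bool", False), ("a", False), ("str", True)]
table = VarTable()
markers = []
for identity, is_ref in items:
    earlier = table.add(identity, is_ref)
    if earlier is None:
        markers.append(None)                    # serialized normally
    else:
        markers.append(("R:%d;" if is_ref else "r:%d;") % earlier)
print(markers)
```

The last two entries come out as r:1; and R:3;, matching the obj and pr fields in the serialized ClassA string.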

5.3. Deserialization of object references
PHP behaves in an interesting way when deserializing object references. If the string being deserialized was not produced by PHP's own serialize() but was constructed by hand or generated in another language, PHP still correctly deserializes the data an object reference points to, even when that reference does not point to an object. For example:

echo "<pre>";
class StrClass {
    var $a;
    var $b;
}

// Hand-written input: field b is an object reference (r:2) to the string in a.
$a = unserialize('O:8:"StrClass":2:{s:1:"a";s:5:"Hello";s:1:"b";r:2;}');

var_dump($a);
echo "</pre>";
The result:

object(StrClass)#1 (2) {
  ["a"]=>
  string(5) "Hello"
  ["b"]=>
  string(5) "Hello"
}
You will find that after deserializing the example above, the value of $a->b is the same as the value of $a->a, even though $a->a is not an object but a string. So when serializing in another language, you need not treat strings strictly as scalars: you may serialize repeated identical strings inside a composite value as object references, and PHP will still deserialize them correctly. This reduces the size of the serialized content.
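
To show how a deserializer in another language might resolve these numbers, here is a compact Python sketch. It is written under several assumptions that do not come from the article: only the N, b, i, d, s, a, O, r, and R markers are handled, arrays and objects both become Python dicts (an object's class name is stored under a made-up "__class__" key), array keys and field names are not numbered, and R is resolved to the same value as r because Python has no equivalent of PHP's & aliasing.

```python
def unserialize(data: bytes):
    """Parse a PHP-serialized byte string (N, b, i, d, s, a, O, r, R only)."""
    slots = []      # slots[n - 1] holds the item numbered n
    pos = 0         # current read position in data

    def read_until(stop: bytes) -> bytes:
        nonlocal pos
        end = data.index(stop, pos)
        token, pos = data[pos:end], end + 1
        return token

    def read_quoted() -> bytes:
        # Reads <length>:"<bytes>" and stops just after the closing quote.
        nonlocal pos
        length = int(read_until(b":"))
        value = data[pos + 1:pos + 1 + length]
        pos += length + 2
        return value

    def read_value(numbered: bool = True):
        nonlocal pos
        tag = data[pos:pos + 1]
        pos += 2                                   # skip the marker and ':' (or "N;")
        if tag in (b"a", b"O"):
            value = {}
            if tag == b"O":
                value["__class__"] = read_quoted()
                pos += 1                           # skip ':'
            slots.append(value)                    # number it before its members
            count = int(read_until(b":"))
            pos += 1                               # skip '{'
            for _ in range(count):
                key = read_value(numbered=False)   # keys are not numbered
                value[key] = read_value()
            pos += 1                               # skip '}'
            return value
        if tag == b"N":
            value = None
        elif tag == b"b":
            value = read_until(b";") == b"1"
        elif tag == b"i":
            value = int(read_until(b";"))
        elif tag == b"d":
            value = float(read_until(b";").decode("ascii"))
        elif tag == b"s":
            value = read_quoted()
            pos += 1                               # skip ';'
        elif tag in (b"r", b"R"):
            value = slots[int(read_until(b";")) - 1]
            if tag == b"R":
                return value                       # R takes no number of its own
        else:
            raise ValueError("unsupported type marker: %r" % tag)
        if numbered:
            slots.append(value)
        return value

    return read_value()


obj = unserialize(b'O:8:"StrClass":2:{s:1:"a";s:5:"Hello";s:1:"b";r:2;}')
print(obj)
```

With the hand-written input from the example above, the b field resolves to the same byte string as the a field.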


6. Custom Object serialization
TBD

7. Serialization of Unicode strings
TBD

8. References
Source code for serialization and deserialization in PHP 3

Source code for serialization in PHP 4

Source code for deserialization in PHP 4

Source code for serialization in PHP 5

Source code for deserialization in PHP 5

Source code for serialization in PHP 6

Source code for deserialization in PHP 6

Introduction to serialization and deserialization in the PHP manual

Using serialized PHP with Yahoo! Web Services

Implementations of PHP serialize in other languages:

JavaScript version (Stable): http://www.devpro.it/code/102.html
Perl version (Stable): http://hurring.com/code/perl/serialize/
Another Perl version: http://www.cpan.org/modules/by-module/PHP/JBROWN/php-serialization/
Python version (Beta): http://hurring.com/code/python/serialize/
Java version (Pre-Alpha): http://hurring.com/code/java/serialize/
Ruby version: http://www.aagh.net/files/ruby/php_serialize.rb
Flash/ActionScript version: http://sourceforge.net/projects/serializerclass/
C# version: http://sourceforge.net/projects/csphpserial/



