The Unity3D asset bundle format is not publicly documented. Still, knowing the packaging format makes better differential updates possible. Because the data in an asset bundle can be split into independent units, a dedicated diff-and-merge tool can compare only the units that changed, which works much better than a plain binary diff.
None of the information available on the Internet comes from official sources. The most popular resource is an open-source tool called disunity. It is written in Java, and only the source code is provided, with no description of the format (which matters more than the code). By reading the disunity code, I worked out the following notes:
Asset bundles come in a compressed mode and an uncompressed mode. Compressed mode simply uses the open-source LZMA library to compress the entire uncompressed bundle. The compressed data starts with a 13-byte header: the first 5 bytes are the props passed to the LZMA decoding API, the next 4 bytes are the length of the decompressed data, and the last 4 bytes can be ignored.
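The 13-byte header can be split with a few lines of code. The following is a minimal sketch rather than disunity's implementation: the names `LzmaHeader` and `parseLzmaHeader` are hypothetical, and the byte order of the 4-byte length is assumed to be little-endian, as in the standalone .lzma file header.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical holder for the fields of the 13-byte header.
struct LzmaHeader {
    std::array<std::uint8_t, 5> props;  // handed to the LZMA decoder as-is
    std::uint32_t uncompressedSize;     // length of the decompressed data
};

// Splits the first 13 bytes of the compressed data into its fields.
LzmaHeader parseLzmaHeader(const std::vector<std::uint8_t>& data) {
    constexpr std::size_t kHeaderSize = 13;
    if (data.size() < kHeaderSize) {
        throw std::runtime_error("compressed data shorter than 13-byte header");
    }
    LzmaHeader header{};
    for (std::size_t i = 0; i < header.props.size(); ++i) {
        header.props[i] = data[i];
    }
    // Assumption: the 4-byte length is stored little-endian.
    header.uncompressedSize = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        header.uncompressedSize |= static_cast<std::uint32_t>(data[5 + i]) << (8 * i);
    }
    // Bytes 9..12 are ignored, as described above.
    return header;
}
```

The props and the length would then be passed to whichever LZMA decoder is in use, together with the data that starts at byte 13.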
Once the compressed data has been decompressed, it is identical to the uncompressed mode. Only the uncompressed format is discussed below:
The asset bundle file header is serialized from a data structure like this:
struct AssetBundleFileHead {
    struct LevelInfo {
        unsigned int PackSize;
        unsigned int UncompressedSize;
    };
    string FileID;
    unsigned int Version;
    string MainVersion;
    string BuildVersion;
    size_t MinimumStreamedBytes;
    size_t HeaderSize;
    size_t NumberOfLevelsToDownloadBeforeStreaming;
    size_t LevelCount;
    LevelInfo LevelList[];
    size_t CompleteFileSize;
    size_t FileInfoHeaderSize;
    bool Compressed;
};
The fields are serialized in order. A string is terminated by \0; size_t is a 4-byte big-endian integer; bool is a single byte; and a vector is simply its elements laid out one after another.
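These serialization rules can be sketched as a small reader over an in-memory buffer. This is only an illustration under the rules stated above: the class name `BundleReader` and its method names are hypothetical and are not taken from disunity.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sequential reader for the outer bundle header fields.
class BundleReader {
public:
    explicit BundleReader(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

    // Reads bytes up to (and consuming) the terminating \0.
    std::string readString() {
        std::string s;
        for (std::uint8_t b = readByte(); b != 0; b = readByte()) {
            s.push_back(static_cast<char>(b));
        }
        return s;
    }

    // Reads a 4-byte big-endian integer (the "size_t" of the header).
    std::uint32_t readU32BE() {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) {
            v = (v << 8) | readByte();
        }
        return v;
    }

    // Reads a single-byte bool; any non-zero value counts as true.
    bool readBool() { return readByte() != 0; }

    std::size_t position() const { return pos_; }

private:
    std::uint8_t readByte() {
        if (pos_ >= data_.size()) {
            throw std::out_of_range("read past end of buffer");
        }
        return data_[pos_++];
    }

    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
};
```

Reading the file header then amounts to calling these methods in field order: `readString()` for FileID, `readU32BE()` for Version, and so on. After the last field, `position()` can be compared against HeaderSize as a sanity check.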
The asset bundle format varies with the Unity version. Version indicates the bundle format version; Version = 3 is used from Unity 3.5 through 4.x, and it is the only version discussed here. HeaderSize should equal the length of the file header data above.
An asset bundle is composed of multiple asset files, which are packed one after another. They are described by the following serialized structure:
struct AssetFileHeader {
    struct AssetFileInfo {
        string name;
        size_t offset;
        size_t length;
    };
    size_t FileCount;
    AssetFileInfo File[];
};
In this way, we can split apart the assets that are packaged together (in most cases, there is only one). offset is measured from the end of the file header, so adding HeaderSize to it gives the asset's offset within the entire bundle.
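Reading the asset table and converting its offsets might look like the sketch below. It assumes that the table begins directly at HeaderSize and that its fields follow the same encoding as the file header (big-endian 4-byte integers, \0-terminated strings). The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct AssetFileInfo {
    std::string name;
    std::uint32_t offset;  // relative to the end of the file header
    std::uint32_t length;
};

// Reads a 4-byte big-endian integer at `pos` and advances `pos`.
// vector::at() throws std::out_of_range if the buffer is truncated.
std::uint32_t readU32BE(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v = (v << 8) | buf.at(pos++);
    }
    return v;
}

// Reads a \0-terminated string at `pos` and advances `pos` past the terminator.
std::string readCString(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::string s;
    while (buf.at(pos) != 0) {
        s.push_back(static_cast<char>(buf.at(pos++)));
    }
    ++pos;  // skip the terminator
    return s;
}

// Assumption: the AssetFileHeader starts at `headerSize` within the bundle.
std::vector<AssetFileInfo> readAssetTable(const std::vector<std::uint8_t>& bundle,
                                          std::size_t headerSize) {
    std::size_t pos = headerSize;
    const std::uint32_t count = readU32BE(bundle, pos);
    std::vector<AssetFileInfo> files;
    for (std::uint32_t i = 0; i < count; ++i) {
        AssetFileInfo info;
        info.name = readCString(bundle, pos);
        info.offset = readU32BE(bundle, pos);
        info.length = readU32BE(bundle, pos);
        files.push_back(info);
    }
    return files;
}

// Offset of an asset measured from the start of the whole bundle.
std::size_t absoluteOffset(std::size_t headerSize, const AssetFileInfo& info) {
    return headerSize + info.offset;
}
```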
Each asset has its own data header. Besides the basic header structure assetheader, the data header has three additional parts, which disunity calls typetree, objectpath, and assetref. Note: the format varies between Unity3D versions. We only care about the format of the current version, where format is 9 (other versions differ in field sizes and other details).
struct assetheader {
    size_t typetreesize;
    size_t filesize;
    unsigned int format;
    size_t dataoffset;
    size_t unknown;
};
Unity serializes asset data in a simple, brute-force way: the whole process follows the data structure of each object. The typetree is a description of that data structure, and with this description each object can be deserialized.
The assetheader is followed by the typetree. However, the typetree is optional in an asset bundle, because the data structure information can be built into the engine beforehand (the engine generally only supports its built-in data types). When a bundle is published to a mobile device, the typetree is not packaged into it.
Each asset object has a class ID, which can be looked up in the typetree for deserialization. The mapping between class IDs and specific types can be found in the official Unity3D documentation. However, if we only want to compare differences at the object level (rather than comparing the individual attributes of a given object), we do not need to parse the details of each object, so they are not covered here (if you are interested, you can read the disunity code; the format is not complex).
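An object-level comparison of this kind can be sketched as follows. It assumes that each object has already been extracted as a raw byte range and keyed by an integer identifier; the types `ObjectMap` and `BundleDiff` and the function `diffObjects` are hypothetical names for illustration, not part of any existing tool.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical view of one asset: object identifier -> raw object bytes.
using ObjectMap = std::map<std::uint32_t, std::vector<std::uint8_t>>;

// Identifiers of objects that differ between two versions of an asset.
struct BundleDiff {
    std::vector<std::uint32_t> added;
    std::vector<std::uint32_t> removed;
    std::vector<std::uint32_t> changed;
};

// Compares objects as opaque byte ranges, without deserializing them.
BundleDiff diffObjects(const ObjectMap& oldObjects, const ObjectMap& newObjects) {
    BundleDiff diff;
    for (const auto& entry : newObjects) {
        const auto it = oldObjects.find(entry.first);
        if (it == oldObjects.end()) {
            diff.added.push_back(entry.first);
        } else if (it->second != entry.second) {
            diff.changed.push_back(entry.first);
        }
    }
    for (const auto& entry : oldObjects) {
        if (newObjects.find(entry.first) == newObjects.end()) {
            diff.removed.push_back(entry.first);
        }
    }
    return diff;
}
```

A patch would then only need to carry the bytes of the added and changed objects plus the list of removed identifiers.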
The typetreesize field in assetheader gives the size of the typetree part. Next comes the description data for each asset object:
struct ObjectHeader {
    struct ObjectInfo {
        int pathID;
        int offset;
        int length;
        byte classID[8];
    };
    int ObjectCount;
    ObjectInfo Object[];
};
Here, all int values are 4-byte little-endian integers (unlike the big-endian encoding used in the outer file format). In Unity3D, each object has a unique string path, but the asset bundle does not store the string directly. It stores a hash integer instead, which can also be seen as the object's index number. The actual objects are placed after the data header, at the position given by offset.
The offset here is relative to the current asset block. To get the correct position relative to the entire file, compute: file HeaderSize + asset offset + asset dataoffset + the object offset given here.
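The object table and the offset arithmetic can be sketched like this. The function names are hypothetical, the caller is assumed to know where the ObjectHeader starts in the buffer, and the int fields are stored as unsigned values here purely to keep the arithmetic simple.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ObjectInfo {
    std::uint32_t pathID;
    std::uint32_t offset;  // relative to the asset's data area
    std::uint32_t length;
    std::array<std::uint8_t, 8> classID;
};

// Reads a 4-byte little-endian integer at `pos` and advances `pos`.
// vector::at() throws std::out_of_range if the buffer is truncated.
std::uint32_t readU32LE(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(buf.at(pos++)) << (8 * i);
    }
    return v;
}

// Reads ObjectCount followed by that many ObjectInfo entries, starting at `pos`.
std::vector<ObjectInfo> readObjectTable(const std::vector<std::uint8_t>& buf,
                                        std::size_t pos) {
    const std::uint32_t count = readU32LE(buf, pos);
    std::vector<ObjectInfo> objects;
    for (std::uint32_t i = 0; i < count; ++i) {
        ObjectInfo info{};
        info.pathID = readU32LE(buf, pos);
        info.offset = readU32LE(buf, pos);
        info.length = readU32LE(buf, pos);
        for (std::uint8_t& b : info.classID) {
            b = buf.at(pos++);
        }
        objects.push_back(info);
    }
    return objects;
}

// Position of an object measured from the start of the whole bundle file:
// HeaderSize + asset offset + asset dataoffset + object offset.
std::size_t objectFilePosition(std::size_t headerSize, std::size_t assetOffset,
                               std::size_t dataOffset, const ObjectInfo& info) {
    return headerSize + assetOffset + dataOffset + info.offset;
}
```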
The assetref table, which follows the objectheader, records the asset's reference relationships: it lists the external assets referenced by the asset in the bundle. The structure of the assetref table is as follows:
struct AssetTable {
    struct AssetRef {
        byte GUID[8];
        int type;
        string filePath;
        string assetPath;
    };
    int Count;
    byte Unknown;
    vector<AssetRef> Refs;
};