SuperPack: A New Serialization Format with a Smaller Payload

Shape Security has open-sourced a new schemaless binary serialization format named SuperPack.

SuperPack is a binary serialization format, which helps it reduce payload size. According to Shape Security, for a given 4.48 KB sample, the SuperPack payload is the smallest among a number of schemaless formats.

YAML and BSON are very verbose and increase the message payload. JSON is much better than YAML, but because it is a text-based encoding, it is still much larger than SuperPack. After gzip compression the picture changes considerably: the YAML, JSON, and SuperPack payloads end up very close to one another, at 12-14% of the original message size.
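The effect of gzip on a text encoding is easy to check. The following is a minimal sketch that uses only Python's standard `json` and `gzip` modules. The sample records are made up for illustration and are not the 4.48 KB sample mentioned above, so the ratio it prints will not match the 12-14% figure.

```python
import gzip
import json


def json_sizes(obj):
    """Return (raw_bytes, gzipped_bytes) for the compact JSON encoding of obj."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Made-up, highly repetitive sample data (an assumption, not from the article).
sample = [{"name": f"cat{i}", "food": "tuna", "indoor": True} for i in range(200)]
raw_size, gz_size = json_sizes(sample)
print(raw_size, gz_size)
```

Repetitive keys and values are exactly what a general-purpose compressor removes, which is why the gap between formats narrows after compression.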

One of the main advantages of the SuperPack encoding is that there is no need to exchange a schema with clients ahead of time: the data type information is included in the payload. SuperPack has 36 predefined data types, including common ones such as true, false, uint16, uint32, and float32, and less common ones such as uint6, nint4, and array5. These types represent the values that are most likely to appear in a message.

SuperPack also contains types for arrays, strings, and maps, as well as an extension type that allows users to add new types. In addition, it has two optional optimizations that reduce the payload in particular scenarios: repeated string optimization and repeated keyset optimization.
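The idea behind a repeated string optimization can be sketched as a lookup table: each distinct string is stored once, and every occurrence becomes a small index into that table. This is an illustration of the general technique only. The function names are hypothetical, and the sketch does not reproduce how SuperPack lays the table out in its payload.

```python
def dedupe_strings(values):
    """Return (table, refs): each distinct string once, plus one index per input."""
    table = []      # distinct strings in order of first appearance
    index_of = {}   # string -> position in table
    refs = []
    for s in values:
        if s not in index_of:
            index_of[s] = len(table)
            table.append(s)
        refs.append(index_of[s])
    return table, refs


def restore_strings(table, refs):
    """Inverse of dedupe_strings."""
    return [table[i] for i in refs]


table, refs = dedupe_strings(["tuna", "salmon", "tuna", "tuna"])
print(table, refs)
```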

We interviewed Michael Ficarra, a research engineer and free/open source software (FOSS) coordinator at Shape Security, for more details about SuperPack.

InfoQ: Your encoded payload is smaller than that of other schemaless formats. How is your approach different?

MF: The philosophy behind SuperPack is that even if we cannot anticipate the schema of the data, the structures or values in the data are likely to recur many times. For example, suppose there is a data structure called "cats" that associates each person with their cat. Instead of encoding every record in full, given that every cat has a name, a birthday, and a favorite food, we encode that set of keys only once and then reference it later, using a very efficient protobuf-style packing for the values.
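The keyset idea from this answer can be sketched as follows: every distinct set of keys is stored once, and each record is reduced to a keyset index plus its values. The function names and the cat records are made up for illustration, and the in-memory tuples here stand in for whatever byte layout SuperPack actually uses.

```python
def pack_keysets(records):
    """Replace each dict by (keyset_index, values); store each key tuple once."""
    keysets = []    # distinct key tuples in order of first appearance
    index_of = {}   # key tuple -> position in keysets
    rows = []
    for record in records:
        keys = tuple(record)  # dicts preserve insertion order
        if keys not in index_of:
            index_of[keys] = len(keysets)
            keysets.append(keys)
        rows.append((index_of[keys], list(record.values())))
    return keysets, rows


def unpack_keysets(keysets, rows):
    """Rebuild the original list of dicts."""
    return [dict(zip(keysets[i], values)) for i, values in rows]


# Hypothetical sample records.
cats = [
    {"name": "Tom", "birthday": "2012-03-01", "food": "tuna"},
    {"name": "Felix", "birthday": "2015-07-19", "food": "salmon"},
]
keysets, rows = pack_keysets(cats)
print(keysets, rows)
```

With many records sharing the same keys, the key names are written once instead of once per record.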

In addition, some values are more common than others and should have a more efficient representation. If you look at the detailed description of the format, you will find that every value is preceded by a single-byte indicator of its type, which we call the "type tag". Within the type tag space, we reserve ranges so that all or part of a value can be encoded in the tag itself. As a simple example, there are two Boolean tags: one for the value true and another for the value false. Similarly, there are 64 "uint6" type tags that allow us to represent each number between 0 and 63 in a single byte. An array must encode both its length and its entries, and for an array with fewer than 32 entries the length can be encoded in the tag. Going back to the previous example, a cat usually has fewer than 64 whiskers, and most people will not have more than 32 cats, so these values can be stored very efficiently.
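A minimal sketch of this tag scheme is shown below. The tag values are assumptions chosen for illustration and are not taken from the SuperPack specification. Only Booleans, integers from 0 to 63, and arrays with fewer than 32 entries are covered.

```python
# Hypothetical tag layout for illustration only (not the SuperPack spec).
UINT6_BASE = 0x00   # tags 0x00..0x3F carry the integers 0..63 directly
ARRAY5_BASE = 0x40  # tags 0x40..0x5F carry array lengths 0..31
FALSE_TAG = 0x60
TRUE_TAG = 0x61


def encode(value) -> bytes:
    """Encode Booleans, integers 0..63, and lists of fewer than 32 such values."""
    # Booleans are checked first because bool is a subclass of int in Python.
    if value is True:
        return bytes([TRUE_TAG])
    if value is False:
        return bytes([FALSE_TAG])
    if isinstance(value, int) and 0 <= value < 64:
        return bytes([UINT6_BASE + value])
    if isinstance(value, list) and len(value) < 32:
        out = bytearray([ARRAY5_BASE + len(value)])
        for item in value:
            out += encode(item)
        return bytes(out)
    raise ValueError(f"value not covered by this sketch: {value!r}")


def decode(data: bytes, pos: int = 0):
    """Decode one value starting at pos and return (value, next_pos)."""
    tag = data[pos]
    pos += 1
    if tag == TRUE_TAG:
        return True, pos
    if tag == FALSE_TAG:
        return False, pos
    if tag < ARRAY5_BASE:
        return tag - UINT6_BASE, pos
    if tag < FALSE_TAG:
        items = []
        for _ in range(tag - ARRAY5_BASE):
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown tag: {tag:#x}")
```

Under this layout the list `[24, True, 0]` takes four bytes: one tag carrying the array length, then one tag per entry.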

InfoQ: Have you compared SuperPack with schema-driven binary formats such as Protocol Buffers? Would the protobuf payload be significantly smaller?

MF: We have not done this type of comparison. I think that in most scenarios the protobuf payload will be smaller, unless SuperPack's string deduplication is particularly effective. When your requirements allow it, you should use a schema-driven format, especially if you are also using a lossless data compression algorithm such as LZW or DEFLATE.

InfoQ: How much time does encoding and decoding a message take?

MF: Encoding time varies depending on whether the encoder enables the optional keyset and string deduplication optimizations. There are also some legacy performance challenges at the language level, such as JavaScript using the IEEE 754 double-precision format for all numbers.
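One plausible reading of the language-level challenge is that when every number arrives as a double, an encoder has to inspect each value at run time to find the smallest type that represents it exactly. The sketch below shows such a check under that assumption. The function name and the "double" label are hypothetical, negative integer types such as nint4 are left out for brevity, and this is not the logic of the SuperPack JavaScript implementation.

```python
import struct


def pick_numeric_type(x) -> str:
    """Pick the smallest type name that holds the double x without loss."""
    x = float(x)  # mimic a runtime in which every number is a double
    if x.is_integer() and x >= 0:
        if x < 2 ** 6:
            return "uint6"
        if x < 2 ** 16:
            return "uint16"
        if x < 2 ** 32:
            return "uint32"
    try:
        # Round-trip through 4 bytes; equality means float32 is lossless here.
        if struct.unpack("<f", struct.pack("<f", x))[0] == x:
            return "float32"
    except OverflowError:
        pass  # magnitude too large for float32
    return "double"


for n in (5, 300, 70000, 0.5, 0.1):
    print(n, pick_numeric_type(n))
```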

InfoQ: Do you have any plans to support other languages?

MF: Of course! We already have a Java implementation, which for now is used only inside Shape Security. It is not ready to be open-sourced, but if we hear that there is demand, that will accelerate the process. I would be happy to help if the community wanted to start a new implementation for another ecosystem. I think a Rust implementation would be very exciting!

In addition, it is worth mentioning that SuperPack is still very young. If readers have any suggestions for improving it, we would be happy to hear them: just open an issue on the canonical issue tracker. We hope future versions of SuperPack will be even better!

Currently, SuperPack ships with a JavaScript transcoder (encoder and decoder), and transcoders for other languages can be developed based on it. SuperPack is open source under a very permissive license.

English original: "SuperPack, a New Serialization Format with a Smaller Payload", by Abel Avram
