1. Comparison of the technical principles of Java serialization tools
- Binary formats, language-specific
Java built-in (Java native), Java manual (hand-written according to member variable types), fst-serialization, Kryo
- Binary formats, generic (language-unspecific)
Protobuf (Google), Thrift (Facebook), Avro (generic), Hessian
- JSON Format
Jackson, Gson, Fastjson
- JSON-like formats
CKS (textual JSON-like format), BSON (JSON-like format with extended data types), Jackson BSON, MongoDB
- XML-based formats
XML (XStream)
Java serialization tools fall broadly into the categories above. Put simply, they divide into two families: binary formats and text formats (JSON, XML).
In terms of speed, the following rules generally hold:
Binary > textual
Language-specific > language-unspecific
Among textual formats, JSON carries less redundancy than XML, and JSON serialization libraries are more mature, with a richer choice of good frameworks. The sections below focus on Kryo, fast-serialization, Fastjson, and Protocol Buffers.
2. Analysis of typical Java serialization tools
Building RPC frameworks on mature serialization solutions such as Protobuf, Thrift, and Avro is a proven approach at internet companies.
2.1 Java native serialization
The serialization facility that ships with Java can handle the serialization task in most scenarios; one article explains its mechanism in great detail (7820018) and is worth reading. During serialization, the built-in mechanism records not only the fully qualified class name of the object but also the class definition, including every other class it references. This is a very large overhead, especially when serializing a single object. Because the mechanism records all of this metadata, deserialization fails when the package name of the class is changed. The performance problems of Java's built-in serialization can be summarized as follows:
- Serializing a single object recursively serializes all of its member variables (instance variables). This default behavior easily causes unnecessary serialization overhead.
- Serialization and deserialization use reflection to recursively gather information about all member variables, and if a class does not define its own serialVersionUID, the runtime must compute one for it and for every other class involved. This process is expensive.
- With the default mechanism, the complete definition of every serialized class is recorded, including package names, superclass information, and member variables.
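The metadata overhead described above can be made visible with a rough sketch that uses only the JDK. The class `SamplePoint` and the helper names are hypothetical, chosen only for illustration; the "manual" variant corresponds to the hand-written approach listed in section 1 and writes nothing but the two field values.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class used only for this illustration.
class SamplePoint implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    SamplePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class NativeVsManualSize {
    // Native serialization: stream header, class descriptor (class name,
    // serialVersionUID, field names and types), then the field values.
    static byte[] nativeBytes(SamplePoint p) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Manual encoding: only the two int values, 4 bytes each.
    static byte[] manualBytes(SamplePoint p) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeInt(p.x);
            dos.writeInt(p.y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        SamplePoint p = new SamplePoint(3, 4);
        System.out.println("native: " + nativeBytes(p).length + " bytes");
        System.out.println("manual: " + manualBytes(p).length + " bytes");
    }
}
```

The manual form is always 8 bytes here, while the native form also carries the class descriptor. The price of the manual form is that reader and writer must agree on the field layout out of band.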
2.2 Optimized Java serialization tools
- Kryo
Kryo addresses several problems of Java's native serialization mechanism with many optimizations, and it ships many serializers, including ones built on Unsafe. Regarding Unsafe-based serialization, note that after JDK 1.7 access to the Unsafe class (sun.misc.Unsafe) is closed by default. For more on Kryo, see the Kryo wiki.
- Fast-serialization
fst-serialization is a relatively new serialization tool. The evaluation in 2-1 shows it somewhat behind Kryo in speed, but in my own tests in a production environment its results were almost identical to Kryo's: content could be deserialized and rendered instantly.
2.3 JSON
Good JSON parsing libraries perform quite well, and some are even faster than certain binary serialization methods.
2.4 Protocol Buffers
Protocol Buffers is a technique for serializing structured data that supports multiple languages such as C++, Java, and Python. It can be used to persist data or to serialize data for transmission over a network. Compared with XML-based techniques, its most obvious advantages are that it is more space-efficient (data is stored as a binary stream), faster, and more flexible.
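One reason the binary stream is compact is that Protocol Buffers writes integers as base-128 varints, so small values take few bytes. The following is a minimal JDK-only sketch of that encoding for non-negative int values; the class and method names are hypothetical and this is not the Protobuf library's API.

```java
import java.io.ByteArrayOutputStream;

class VarintSketch {
    // Encodes a non-negative int as a base-128 varint: 7 payload bits per
    // byte, least significant group first, high bit set on every byte
    // except the last one.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reverses encode(): accumulates 7-bit groups until a byte without
    // the continuation bit is seen.
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        // 300 -> bytes 0xAC 0x02
        StringBuilder hex = new StringBuilder();
        for (byte b : encode(300)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
        System.out.println(decode(encode(300)));
    }
}
```

For comparison, the same value written as the XML fragment `<id>300</id>` (a made-up element name) takes 12 bytes of text before any parsing cost is considered.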
In addition, Protobuf supports a relatively small number of data types and does not support constant types. Because it is designed purely as a presentation-layer protocol, Protobuf itself does not ship an RPC framework.
2.5 Thrift
Thrift is a high-performance, lightweight RPC framework open-sourced by Facebook, created to meet the demands of large data volumes and distributed, cross-language, cross-platform communication. Thrift is therefore not just a serialization protocol but an RPC framework. Compared with JSON and XML, Thrift offers large improvements in space overhead and parsing performance, which makes it an excellent RPC solution for distributed systems with high performance requirements. However, because Thrift's serialization is embedded in the framework, and the framework does not expose its serialization and deserialization interfaces, it is difficult to use together with other transport-layer protocols (such as HTTP).
2.6 Avro
Avro offers high parsing performance and very compact serialized data, which makes it well suited to high-performance serialization services.
Avro provides two serialization formats: JSON and binary. The binary format is comparable with Protobuf in space overhead and parsing performance, while the JSON format makes debugging easier during testing. Avro supports a very rich range of data types, including a union type like the one in C++. Schemas can be written in JSON or in an IDL (still experimental) similar to those of Thrift and Protobuf, and the two forms can be converted into each other. The schema can be sent together with the data, which, combined with the self-describing nature of JSON, makes Avro ideal for dynamically typed languages. When a file is persisted, Avro usually stores the schema with it, so an Avro file is self-describing and works well as a persistent data format for Hive, Pig, and MapReduce. For different schema versions, the server and client can confirm the schema with each other during the handshake phase of an RPC call, which greatly improves the final data parsing speed.
3. Examples of commonly used Java serialization techniques
Kryo with registration, FST, Kryo, Gson, Fastjson, JDK
3.1 JDK
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public static byte[] serialize(Object obj) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // Closing the ObjectOutputStream flushes it before the bytes are read.
    try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
        oos.writeObject(obj);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return baos.toByteArray();
}

public static Object deserialize(byte[] bits) {
    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bits))) {
        return ois.readObject();
    } catch (IOException | ClassNotFoundException e) {
        throw new RuntimeException(e);
    }
}
3.2 Fastjson
One of the most basic functions of a JSON library is serialization and deserialization. Fastjson supports serializing Java beans directly, using the com.alibaba.fastjson.JSON class for both directions.
import com.alibaba.fastjson.JSON;

public static String serialize(Object obj) {
    return JSON.toJSONString(obj);
}

public static Object deserialize(String json, Class<?> clazz) {
    return JSON.parseObject(json, clazz);
}
3.3 FST
FST (fast-serialization) is a reimplementation of Java object serialization built for speed. It serializes faster (2 to 10 times), produces smaller output, and is compatible with JDK native serialization.
FST has released version 2.0, and the package name changed with it, so the upgrade is not smooth. In addition, the official recommendation is to use the latest 1.58 release for stability reasons.
import java.io.Serializable;
// FSTConfiguration is in de.ruedigermoeller.serialization (1.x) or org.nustaq.serialization (2.x).

// One shared configuration is reused for every call.
static FSTConfiguration configuration = FSTConfiguration.createDefaultConfiguration();

public static byte[] serialize(Object obj) {
    return configuration.asByteArray((Serializable) obj);
}

public static Object deserialize(byte[] sec) {
    return configuration.asObject(sec);
}
3.4 Gson
This example uses the JSON format, converted with Google Gson.
import com.google.gson.Gson;

static Gson gson = new Gson();

public static String serialize(Object obj) {
    return gson.toJson(obj);
}

public static Object deserialize(String json, Class<?> clazz) {
    return gson.fromJson(json, clazz);
}
3.5 Jackson
Jackson (http://jackson.codehaus.org) is a Java-based open source JSON processing library. The whole library consists of 3 jar packages (using the latest version, 2.2):
- jackson-core: the core package (required), which provides an API based on "stream mode" parsing.
- jackson-databind: the data binding package (optional), which provides APIs based on object binding and the tree model.
- jackson-annotations: the annotation package (optional), which provides annotation support.
Compared with other Java JSON libraries, such as json-lib and Gson, Jackson has the following advantages:
- It is fully featured and offers several modes of JSON parsing. "Object binding" is easy to use, and the annotations package adds a lot of convenience during development.
- Its performance is high: the parsing efficiency of "stream mode" exceeds that of most similar JSON packages.
Core package: JsonParser (JSON stream reading), JsonGenerator (JSON stream output).
Data binding package: ObjectMapper (builds tree mode and object binding mode), JsonNode (tree node).
import com.fasterxml.jackson.databind.ObjectMapper;

// ObjectMapper is thread-safe once configured, so one instance is reused.
private static final ObjectMapper MAPPER = new ObjectMapper();

public static String serialize(Object obj) {
    try {
        return MAPPER.writeValueAsString(obj);
    } catch (Exception e) {
        // Fail loudly instead of printing the stack trace and returning null.
        throw new RuntimeException(e);
    }
}

public static Object deserialize(String json, Class<?> clazz) {
    try {
        return MAPPER.readValue(json, clazz);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
3.6 Kryo and Kryo with registration
Kryo runs about 20 times faster than Java Serializable.
Kryo output is about half the size of Java Serializable output.
Kryo has two modes:
One is to register the class first and then write the object with writeObject. In fact, if you do not register first, the class is registered when the object is written and is assigned an ID.
Note that for RPC, classes must be registered in the same order on both sides. Otherwise an error occurs, because both sides must agree on the unique ID of each class.
The other is to write the class name together with the object, using writeClassAndObject.
writeClassAndObject first writes (-1 + 2) (an agreed marker number), then the class ID (the first time, it writes -1 followed by the class ID and class name), then the reference relationship (see the reference implementation), and finally the actual data.
Note that this information is cleared each time writeClassAndObject is called, so there is no need to worry about errors when interacting with a client.
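import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

// A Kryo instance is not thread-safe; do not share it across threads.
static Kryo kryo = new Kryo();

public static byte[] serialize(Object obj) {
    // Start at 2048 bytes and let the buffer grow without limit (-1),
    // so that larger objects do not overflow a fixed buffer.
    Output output = new Output(2048, -1);
    kryo.writeClassAndObject(output, obj);
    byte[] bs = output.toBytes();
    output.close();
    return bs;
}

public static Object deserialize(byte[] src) {
    Input input = new Input(src);
    Object obj = kryo.readClassAndObject(input);
    input.close();
    return obj;
}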
With registration:
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.objenesis.strategy.StdInstantiatorStrategy;

static Kryo kryo = new Kryo();

static {
    kryo.setReferences(false);
    kryo.setRegistrationRequired(false);
    kryo.setInstantiatorStrategy(new StdInstantiatorStrategy());
}

public static byte[] serialize(Object obj) {
    kryo.register(obj.getClass());
    // Growable buffer: 2048 bytes initially, no upper limit (-1).
    Output output = new Output(2048, -1);
    kryo.writeObject(output, obj);
    byte[] bs = output.toBytes();
    output.close();
    return bs;
}

public static Object deserialize(byte[] src, Class<?> clazz) {
    kryo.register(clazz);
    Input input = new Input(src);
    Object obj = kryo.readObject(input, clazz);
    input.close();
    return obj;
}
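The earlier note that both sides of an RPC call must register classes in the same order can be illustrated with a toy model. `ToyRegistry` below is a hypothetical stand-in that hands out IDs in registration order; it is not Kryo's API and only mimics the idea that the wire carries a numeric ID instead of a class name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a class registry; this is NOT Kryo's API.
class ToyRegistry {
    private final List<String> names = new ArrayList<>();

    // Assigns the next free ID in registration order and returns it.
    int register(String className) {
        names.add(className);
        return names.size() - 1;
    }

    int idOf(String className) {
        return names.indexOf(className);
    }

    String nameOf(int id) {
        return names.get(id);
    }
}

class RegistrationOrderDemo {
    // The class the reader resolves when the writer sends the ID of className.
    static String resolve(ToyRegistry writer, ToyRegistry reader, String className) {
        return reader.nameOf(writer.idOf(className));
    }

    public static void main(String[] args) {
        ToyRegistry client = new ToyRegistry();
        client.register("User");
        client.register("Order");

        ToyRegistry sameOrder = new ToyRegistry();
        sameOrder.register("User");
        sameOrder.register("Order");

        ToyRegistry swapped = new ToyRegistry();
        swapped.register("Order");
        swapped.register("User");

        System.out.println(resolve(client, sameOrder, "User")); // User
        System.out.println(resolve(client, swapped, "User"));   // Order
    }
}
```

With the swapped order, the reader silently maps the ID to the wrong class. In Kryo itself, the `register(Class, int)` overload takes an explicit ID, which removes the dependence on registration order.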
Java Serializable. Advantages: natively supported by Java, no third-party library needed, relatively simple to use. Disadvantages: not cross-language, output is relatively large, and in some cases it is sensitive to changes in object properties.
For an object to be serialized and deserialized, its class must implement the Serializable interface. Declaring a unique serialVersionUID is not enforced, but whether it is declared has a significant effect on the forward and backward compatibility of the serialized form.
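The effect of declaring serialVersionUID can be inspected with the JDK's `ObjectStreamClass`. In this minimal sketch the two classes are hypothetical: one pins the value explicitly, and the other leaves it to the runtime, which derives it from the class structure.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical classes used only for this illustration.
class PinnedVersion implements Serializable {
    // Explicit value: stays the same when fields or methods are added later.
    private static final long serialVersionUID = 1L;
    String name;
}

class ComputedVersion implements Serializable {
    // No serialVersionUID: the runtime computes one from the class structure,
    // so changing the class can change the value.
    String name;
}

class SerialVersionUidDemo {
    // Returns the serialVersionUID the serialization runtime uses for the type.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("pinned:   " + uidOf(PinnedVersion.class));
        System.out.println("computed: " + uidOf(ComputedVersion.class));
    }
}
```

When the value stored in a stream differs from the value of the local class, deserialization fails with `InvalidClassException`, which is why a pinned value helps when classes evolve between versions.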
4. Summary
For a system that already uses Java native serialization, Kryo and fst-serializer are good replacements: they require only simple changes to the code and greatly improve speed and performance. This is especially true of fst-serializer, where simply replacing the Output/InputStream already yields a considerable performance gain. The tool is still new, however, and its stability needs more testing.
If the program already serializes in JSON format, consider introducing a good JSON parsing library; on the server side, Jackson is a popular choice.
Protobuf is better seen as a message exchange format that replaces XML. It is very fast, but the message format must be defined up front, which is costly for JavaBeans with many member variables and complex business logic. For a stable existing system, the overall cost of adoption is fairly high.
The following table compares several of these solutions:
| Serialization tool | Serialization speed | Serialized size | Programming model complexity | Community activity | Jar size |
| --- | --- | --- | --- | --- | --- |
| Kryo | very fast | small | simple | high | 132 KB |
| fst-serializer | fast | small | very simple | high | 246 KB |
| Protobuf | fast | fairly large | fairly complex | stable | 329 KB |
| Fastjson | fairly fast | large | simple | average | 338 KB |
| Jackson | average | large | simple | stable | 1.1 MB |
| Gson | fairly slow | large | simple | stable | 189 KB |
Reference:
http://blog.51cto.com/zlfwmm/1761401
44495549
https://www.javacodegeeks.com/2010/07/java-best-practices-high-performance.html
http://www.javacodegeeks.com/2010/07/java-best-practices-high-performance.html
7820018
http://www.javacodegeeks.com/2012/07/native-cc-like-performance-for-java.html
https://www.ibm.com/developerworks/cn/linux/l-cn-gpb/