5 things you don't know about the serialization of Java objects
Transferred from: http://developer.51cto.com/art/201506/479979.htm
A few years ago, while working with a software team on an application written in the Java language, I saw firsthand the benefit of knowing a little more about Java object serialization than the average programmer.
About this series
So you think you know Java programming? In fact, most programmers only scratch the surface of the Java platform, learning just enough to complete the task at hand. In this series, Ted Neward digs into the core features of the Java platform to uncover little-known facts that can help you solve even the toughest programming challenges.
About a year earlier, a developer responsible for managing all of the application's user settings had decided to store them in a Hashtable and then serialize that Hashtable to disk for persistence. Whenever a user changed a setting, the Hashtable was simply written back to disk.
This was an elegant, open-ended settings system, but it broke down when the team decided to migrate from Hashtable to the Java Collections library's HashMap.
Hashtable and HashMap have different, incompatible formats on disk. Unless the team ran some kind of data conversion utility over every persisted user setting (a very large task), it looked as though Hashtable would remain the application's storage format forever.
The team felt deadlocked, but only because they did not know one important fact about Java serialization: it allows types to evolve over time. Once I showed them how to perform the serialization substitution automatically, they completed the transition to HashMap as planned.
This article is the first in a series dedicated to uncovering useful trivia about the Java platform: obscure knowledge that will sooner or later come in handy when solving Java programming challenges.
The Java Object Serialization API is a good place to start because it has been part of the platform since JDK 1.1. The five things this article describes about serialization should persuade you to take a second look at even the standard Java APIs.
Java object serialization is one of the pioneering features introduced in JDK 1.1 as a mechanism for converting the state of a Java object into a byte array for storage or transport, and then converting the byte array back to the original state of the Java object.
In essence, serialization "freezes" the state of an object, moves that state somewhere (writes it to disk, sends it over the network, and so on), and then "thaws" it to get a usable Java object back. All of this happens a bit like magic, thanks to the ObjectInputStream / ObjectOutputStream classes, full-fidelity metadata, and the programmer's willingness to "opt in" to the process by tagging their classes with the Serializable marker interface.
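The freeze-and-thaw cycle can be sketched entirely in memory, without touching the disk. This is a minimal illustration, not code from the original article: the FreezeThawDemo and Point names are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class FreezeThawDemo {
    // Hypothetical value class, used only for this illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // "Freeze": convert the object's state into a byte array.
    static byte[] freeze(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // "Thaw": rebuild an equivalent object from those bytes.
    static Object thaw(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = (Point) thaw(freeze(new Point(3, 4)));
        System.out.println(copy.x + "," + copy.y);
    }
}
```

The byte array returned by freeze() is exactly what would otherwise be written to a file or a socket.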
Listing 1 shows a Person class that implements Serializable.
Listing 1. Serializable Person
package com.tedneward;

public class Person implements java.io.Serializable
{
    public Person(String fn, String ln, int a)
    {
        this.firstName = fn; this.lastName = ln; this.age = a;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public int getAge() { return age; }
    public Person getSpouse() { return spouse; }

    public void setFirstName(String value) { firstName = value; }
    public void setLastName(String value) { lastName = value; }
    public void setAge(int value) { age = value; }
    public void setSpouse(Person value) { spouse = value; }

    public String toString()
    {
        // Guard against a missing spouse so toString() never throws
        return "[Person: firstName=" + firstName +
            " lastName=" + lastName +
            " age=" + age +
            " spouse=" + (spouse != null ? spouse.getFirstName() : "[null]") +
            "]";
    }

    private String firstName;
    private String lastName;
    private int age;
    private Person spouse;
}
Once Person is serializable, it is easy to write its state to disk and read it back, as the following JUnit 4 unit test demonstrates.
Listing 2. Serializing and deserializing a Person
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

import org.junit.Test;

public class SerTest
{
    @Test public void serializeToDisk()
    {
        try
        {
            com.tedneward.Person ted = new com.tedneward.Person("Ted", "Neward", 39);
            com.tedneward.Person charl = new com.tedneward.Person("Charlotte", "Neward", 38);

            ted.setSpouse(charl); charl.setSpouse(ted);

            // Write the object graph to disk
            FileOutputStream fos = new FileOutputStream("tempdata.ser");
            ObjectOutputStream oos = new ObjectOutputStream(fos);
            oos.writeObject(ted);
            oos.close();
        }
        catch (Exception ex)
        {
            fail("Exception thrown during test: " + ex.toString());
        }

        try
        {
            // Read the object graph back
            FileInputStream fis = new FileInputStream("tempdata.ser");
            ObjectInputStream ois = new ObjectInputStream(fis);
            com.tedneward.Person ted = (com.tedneward.Person) ois.readObject();
            ois.close();

            assertEquals("Ted", ted.getFirstName());
            assertEquals("Charlotte", ted.getSpouse().getFirstName());

            // Clean up the file
            new File("tempdata.ser").delete();
        }
        catch (Exception ex)
        {
            fail("Exception thrown during test: " + ex.toString());
        }
    }
}
So far there is nothing new or exciting here, but it is a good starting point. We'll use Person to discover five things you might not know about Java object serialization.
1. Serialization allows refactoring
Serialization tolerates a certain amount of class variation, so that even after refactoring, ObjectInputStream can still read the old data correctly.
The key changes that the Java Object Serialization Specification can manage automatically are:
- Adding new fields to a class
- Changing a field from static to non-static
- Changing a field from transient to non-transient
Going the other way (from non-static to static, or from non-transient to transient) or deleting fields requires additional massaging, depending on the degree of backward compatibility you need.
Refactoring a serialized class
Now that we know serialization allows refactoring, let's see what happens when we add a new field to the Person class.
As shown in Listing 3, PersonV2 introduces a new field representing gender on top of the original Person class.
Listing 3. Adding a new field to the serialized Person
enum Gender
{
    MALE, FEMALE
}

public class Person implements java.io.Serializable
{
    public Person(String fn, String ln, int a, Gender g)
    {
        this.firstName = fn; this.lastName = ln; this.age = a; this.gender = g;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public Gender getGender() { return gender; }
    public int getAge() { return age; }
    public Person getSpouse() { return spouse; }

    public void setFirstName(String value) { firstName = value; }
    public void setLastName(String value) { lastName = value; }
    public void setGender(Gender value) { gender = value; }
    public void setAge(int value) { age = value; }
    public void setSpouse(Person value) { spouse = value; }

    public String toString()
    {
        // Guard against a missing spouse so toString() never throws
        return "[Person: firstName=" + firstName +
            " lastName=" + lastName +
            " gender=" + gender +
            " age=" + age +
            " spouse=" + (spouse != null ? spouse.getFirstName() : "[null]") +
            "]";
    }

    private String firstName;
    private String lastName;
    private int age;
    private Person spouse;
    private Gender gender;
}
Serialization uses a hash calculated from almost everything in a given class (method names, field names, field types, access modifiers, and so on) and compares it against the hash value stored in the serialized stream.
To convince the Java runtime that the two types are in fact the same, the second and all subsequent versions of Person must have the same serialization version hash as the first version (stored as the private static final serialVersionUID field). What we need, therefore, is the serialVersionUID value, which is calculated by running the JDK serialver command against the original (V1) version of the Person class.
Once you have Person's serialVersionUID, not only can you create PersonV2 objects from the original Person's serialized data (fields that are new in PersonV2 are set to the default value for their type, most often "null"), but the reverse also works: you can deserialize original Person objects from PersonV2 data, with no surprises.
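As a rough sketch of how the version hash behaves, the value can also be inspected from code through java.io.ObjectStreamClass, which should report the same number the serialver tool prints. The class names and the value 42L below are placeholders chosen for this illustration; in a real migration you would paste in the value computed for your own V1 class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionDemo {
    // No explicit serialVersionUID: the runtime derives one from the class shape,
    // so adding a field or method would change it.
    static class ImplicitPerson implements Serializable {
        private String firstName;
    }

    // Explicit serialVersionUID: the value stays fixed even as fields are added.
    static class PinnedPerson implements Serializable {
        private static final long serialVersionUID = 42L; // placeholder value
        private String firstName;
        private String gender; // imagine this field was added in a later version
    }

    // Returns the serialVersionUID the serialization runtime will use for a type.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("computed: " + uidOf(ImplicitPerson.class));
        System.out.println("pinned:   " + uidOf(PinnedPerson.class));
    }
}
```

Pinning the value in the first released version of a class avoids having to recover it later with serialver.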
2. Serialization is not secure
It often comes as an unpleasant surprise to Java developers that the serialization binary format is fully documented and entirely reversible. In fact, simply dumping the contents of a serialized binary stream to the console is enough to figure out what the class looks like and what it contains.
This has troubling security implications. For example, when a remote method call is made through RMI, any private fields in the objects sent across the wire appear in the socket stream almost as plain text, which clearly violates even the simplest security requirements.
Fortunately, serialization lets you "hook" the serialization process and protect (or obscure) the field data before serialization and restore it after deserialization. You can do this by providing a writeObject method on a Serializable object.
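A short sketch makes the exposure concrete. The Account class and its password field are hypothetical names invented for this example; the point is only that a private field's name and value are readable in the raw stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class ClearTextDemo {
    // Hypothetical class with a private, supposedly hidden field.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String password;

        Account(String password) {
            this.password = password;
        }
    }

    // Serializes the object and returns the raw stream as text.
    static String dump(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        // ISO-8859-1 maps every byte to exactly one char, so ASCII text stays readable.
        return new String(bytes.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        String raw = dump(new Account("s3cret"));
        System.out.println(raw.contains("password")); // the field name is in the stream
        System.out.println(raw.contains("s3cret"));   // and so is its value
    }
}
```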
Obscuring serialized data
Suppose the sensitive data in the Person class is the age field. After all, a lady never reveals her age. We can obscure this data by shifting its bits to the left before serialization and shifting them back after deserialization. (You could develop a more secure algorithm; this one is only an example.)
To "hook" the serialization process, we'll implement a writeObject method on Person; to "hook" the deserialization process, we'll implement a readObject method on the same class. It is important to get the details of both methods right: if the access modifier, parameters, or name differ from what is shown in Listing 4, the code will fail silently and Person's age will be exposed.
Listing 4. Obscuring serialized data
public class Person implements java.io.Serializable
{
    public Person(String fn, String ln, int a)
    {
        this.firstName = fn; this.lastName = ln; this.age = a;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public int getAge() { return age; }
    public Person getSpouse() { return spouse; }

    public void setFirstName(String value) { firstName = value; }
    public void setLastName(String value) { lastName = value; }
    public void setAge(int value) { age = value; }
    public void setSpouse(Person value) { spouse = value; }

    private void writeObject(java.io.ObjectOutputStream stream)
        throws java.io.IOException
    {
        // "Encrypt"/obscure the sensitive data
        age = age << 2;
        try
        {
            stream.defaultWriteObject();
        }
        finally
        {
            // Restore the in-memory value so the live object is unchanged
            age = age >> 2;
        }
    }

    private void readObject(java.io.ObjectInputStream stream)
        throws java.io.IOException, ClassNotFoundException
    {
        stream.defaultReadObject();

        // "Decrypt"/de-obscure the sensitive data (the inverse of the shift above)
        age = age >> 2;
    }

    public String toString()
    {
        return "[Person: firstName=" + firstName +
            " lastName=" + lastName +
            " age=" + age +
            " spouse=" + (spouse != null ? spouse.getFirstName() : "[null]") +
            "]";
    }

    private String firstName;
    private String lastName;
    private int age;
    private Person spouse;
}
If you need to see the obscured data, you can always just look at the serialized data stream or file. And because the format is fully documented, the contents of the serialized stream can be read even when the class itself is not available.
3. Serialized data can be signed and sealed
The previous tip assumes that you want to obscure serialized data, not encrypt it or ensure that it has not been modified. Cryptographic encryption and signature management could certainly be implemented using writeObject and readObject, but there is a better way.
If you need to encrypt and sign an entire object, the simplest thing is to put it in a javax.crypto.SealedObject and/or java.security.SignedObject wrapper. Both are serializable, so wrapping your object in a SealedObject creates a sort of "gift box" around the original object. You need a symmetric key to decrypt it, and the key must be managed separately. Similarly, you can use SignedObject for data verification; its keys (a private key for signing and the matching public key for verifying) must likewise be managed separately.
Together, these two objects let you seal and sign serialized data without getting bogged down in the details of digital signature verification or encryption. Neat, huh?
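The following is a minimal sketch of signing a payload and then sealing the signed wrapper. The algorithm choices (AES in CBC mode, SHA256withRSA, a 2048-bit RSA pair) and the SealAndSignDemo helper names are assumptions made for this example, not recommendations from the original article, and the keys are generated on the spot only to keep the sketch self-contained.

```java
import java.io.Serializable;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.security.SignedObject;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SealedObject;
import javax.crypto.SecretKey;

class SealAndSignDemo {
    static SecretKey newAesKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    static KeyPair newRsaPair() throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // Signs the payload with the private key; the payload itself stays readable.
    static SignedObject sign(Serializable payload, KeyPair pair) throws Exception {
        return new SignedObject(payload, pair.getPrivate(), Signature.getInstance("SHA256withRSA"));
    }

    // Checks the signature with the matching public key.
    static boolean verify(SignedObject signed, KeyPair pair) throws Exception {
        return signed.verify(pair.getPublic(), Signature.getInstance("SHA256withRSA"));
    }

    // Encrypts the serialized payload; the generated IV is stored inside the SealedObject.
    static SealedObject seal(Serializable payload, SecretKey key) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return new SealedObject(payload, cipher);
    }

    // Decrypts and deserializes the payload.
    static Object unseal(SealedObject sealed, SecretKey key) throws Exception {
        return sealed.getObject(key);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newAesKey();
        KeyPair pair = newRsaPair();

        // Sign first, then seal the signed wrapper (both wrappers are Serializable).
        SealedObject box = seal(sign("Ted Neward, 39", pair), key);

        SignedObject signed = (SignedObject) unseal(box, key);
        System.out.println("signature valid: " + verify(signed, pair));
        System.out.println("payload: " + signed.getObject());
    }
}
```

Because the SealedObject is itself serializable, the box can be written to an ObjectOutputStream like any other object.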
4. Serialization can put a proxy in the stream
In many cases, a class contains a core element of data from which the other fields in the class can be derived or retrieved. In such cases, it is not necessary to serialize the entire object. You could mark the derived fields transient, but the class would still have to explicitly check whether a field was initialized every time a method accessed it.
If serialization is the primary concern, it is better to nominate a flyweight or proxy to be put into the stream instead. Providing a writeReplace method on the original Person lets a different kind of object be serialized in its place. Similarly, if a readResolve method is found during deserialization, it is called and the replacement object it returns is supplied to the caller.
Packing and unpacking the proxy
Together, the writeReplace and readResolve methods enable a Person class to pack all of its data (or some core subset of it) into a PersonProxy, put it into a stream, and then unpack it later during deserialization.
Listing 5. You complete me, I take your place
class PersonProxy implements java.io.Serializable
{
    public PersonProxy(Person orig)
    {
        data = orig.getFirstName() + "," + orig.getLastName() + "," + orig.getAge();
        if (orig.getSpouse() != null)
        {
            Person spouse = orig.getSpouse();
            data = data + "," + spouse.getFirstName() + "," + spouse.getLastName() + "," + spouse.getAge();
        }
    }

    public String data;

    // Called during deserialization: rebuild the Person (and spouse) from the packed string
    private Object readResolve() throws java.io.ObjectStreamException
    {
        String[] pieces = data.split(",");
        Person result = new Person(pieces[0], pieces[1], Integer.parseInt(pieces[2]));
        if (pieces.length > 3)
        {
            result.setSpouse(new Person(pieces[3], pieces[4], Integer.parseInt(pieces[5])));
            result.getSpouse().setSpouse(result);
        }
        return result;
    }
}

public class Person implements java.io.Serializable
{
    public Person(String fn, String ln, int a)
    {
        this.firstName = fn; this.lastName = ln; this.age = a;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public int getAge() { return age; }
    public Person getSpouse() { return spouse; }

    // Called during serialization: put a PersonProxy into the stream instead of this object
    private Object writeReplace() throws java.io.ObjectStreamException
    {
        return new PersonProxy(this);
    }

    public void setFirstName(String value) { firstName = value; }
    public void setLastName(String value) { lastName = value; }
    public void setAge(int value) { age = value; }
    public void setSpouse(Person value) { spouse = value; }

    public String toString()
    {
        return "[Person: firstName=" + firstName +
            " lastName=" + lastName +
            " age=" + age +
            " spouse=" + (spouse != null ? spouse.getFirstName() : "[null]") +
            "]";
    }

    private String firstName;
    private String lastName;
    private int age;
    private Person spouse;
}
Note that PersonProxy has to track all of Person's data. This often means the proxy needs to be an inner class of Person so that it can access the private fields. Sometimes the proxy also needs to track down other object references and serialize them manually, such as Person's spouse.
This trick is one of the few that does not need to be read/write balanced. For example, a version of a class that has been refactored into a different type can provide a readResolve method to silently convert a serialized object to the new type. Similarly, it can use a writeReplace method to take old classes and serialize them into new versions.
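An unbalanced readResolve of this kind might look like the sketch below, which echoes the Hashtable-to-HashMap story from the introduction. LegacySettings and Settings are hypothetical classes invented for the illustration; the team's actual substitution code is not shown in the article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class MigrationDemo {
    // Hypothetical old type that existing files were written with.
    static class LegacySettings implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Hashtable<String, String> values = new Hashtable<>();

        void put(String key, String value) {
            values.put(key, value);
        }

        // Invoked by ObjectInputStream after this object is read;
        // whatever it returns is handed to the caller instead.
        private Object readResolve() throws ObjectStreamException {
            return new Settings(new HashMap<>(values));
        }
    }

    // Hypothetical new type the application wants to work with.
    static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, String> values;

        Settings(Map<String, String> values) {
            this.values = values;
        }

        String get(String key) {
            return values.get(key);
        }
    }

    // Serializes and immediately deserializes an object in memory.
    static Object roundTrip(Serializable obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        LegacySettings old = new LegacySettings();
        old.put("theme", "dark");

        Object loaded = roundTrip(old);
        System.out.println(loaded.getClass().getSimpleName());
        System.out.println(((Settings) loaded).get("theme"));
    }
}
```

There is no matching writeReplace here: old data is only ever read, and anything written afterward is already in the new type.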
5. Trust, but verify
It would be nice to assume that the data in a serialized stream is always the same data that was originally written to it. But, as a former President of the United States once said, "Trust, but verify."
For serialized objects, this means validating the fields to make sure they still hold legitimate values after deserialization, "just in case." To do this, implement the ObjectInputValidation interface and override the validateObject() method. If something is amiss when the method is called, throw an InvalidObjectException.
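A minimal sketch of such a validator follows. One detail the text leaves implicit: validateObject() is only called if the object registers itself through ObjectInputStream.registerValidation(), which is normally done inside readObject. The CheckedPerson class and its rules (a non-null name, a non-negative age) are assumptions made up for this example, and its constructor deliberately skips validation so that a bad stream can be simulated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectInputValidation;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ValidationDemo {
    // Hypothetical class; the constructor performs no checks on purpose.
    static class CheckedPerson implements Serializable, ObjectInputValidation {
        private static final long serialVersionUID = 1L;
        private final String name;
        private final int age;

        CheckedPerson(String name, int age) {
            this.name = name;
            this.age = age;
        }

        int getAge() {
            return age;
        }

        private void readObject(ObjectInputStream stream)
                throws IOException, ClassNotFoundException {
            // Ask the stream to call validateObject() once the whole graph is read.
            stream.registerValidation(this, 0);
            stream.defaultReadObject();
        }

        @Override
        public void validateObject() throws InvalidObjectException {
            if (name == null || age < 0) {
                throw new InvalidObjectException("CheckedPerson has illegal field values");
            }
        }
    }

    // Serializes and immediately deserializes an object in memory.
    static Object roundTrip(Serializable obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CheckedPerson ok = (CheckedPerson) roundTrip(new CheckedPerson("Ted", 39));
        System.out.println("valid age: " + ok.getAge());

        try {
            roundTrip(new CheckedPerson("Ted", -1)); // stands in for a corrupted stream
        } catch (InvalidObjectException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```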