Serialization

Serialization enables the state of objects in a Java program to be captured and written out to a byte stream [Sun 04b]. The preserved state can later be reinstated by deserialization. Serialization also allows Java method calls to be transmitted over a network for Remote Method Invocation (RMI), wherein objects are marshalled (serialized), exchanged between distributed virtual machines, and unmarshalled (deserialized). Serialization is also used extensively in JavaBeans.

An object can be serialized as follows:

// try-with-resources closes (and therefore flushes) the stream, even on failure
try (ObjectOutputStream oos = new ObjectOutputStream(
        new FileOutputStream("SerialOutput"))) {
    oos.writeObject(someObject);
}

The object can then be deserialized as follows:

// try-with-resources closes the stream once the object has been read
try (ObjectInputStream ois = new ObjectInputStream(
        new FileInputStream("SerialOutput"))) {
    someObject = (SomeClass) ois.readObject();
}

Serialization captures all the fields of an object, including the non-public fields that are normally inaccessible, provided that the object's class implements the Serializable interface. If the byte stream to which the serialized values are written is readable, the values of the normally inaccessible fields may be deduced. Moreover, it may be possible to modify or forge the preserved values so that the deserialized object holds corrupted values.
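The exposure described above can be illustrated with a minimal sketch. The Account class, its secret field, and the PrivateFieldLeakDemo helper are hypothetical names introduced only for this example; the object is serialized to an in-memory buffer rather than a file so that the raw bytes can be inspected directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical class: the private field has no accessor and is
// unreachable through the class's normal API.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String secret;

    Account(String secret) {
        this.secret = secret;
    }
}

class PrivateFieldLeakDemo {
    // Serializes the object into an in-memory byte array.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Returns true if the raw stream contains the given ASCII text.
    // ISO-8859-1 maps every byte to one char, so no bytes are lost.
    static boolean streamContains(byte[] stream, String text) {
        String raw = new String(stream, StandardCharsets.ISO_8859_1);
        return raw.contains(text);
    }

    public static void main(String[] args) throws IOException {
        byte[] stream = serialize(new Account("s3cr3t-token"));
        // Both the private field's name and its value are readable.
        System.out.println(streamContains(stream, "secret"));
        System.out.println(streamContains(stream, "s3cr3t-token"));
    }
}
```

Anyone who can read the stream can therefore recover the value without any access to the class itself, and anyone who can write to it can substitute a different value before deserialization.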

Introducing a security manager does not prevent the normally inaccessible fields from being serialized and deserialized (although permission must be granted to write to and read from the file or network if the byte stream is being stored or transmitted). Network traffic (including RMI) can, however, be protected by using SSL/TLS.

Classes that require special handling during object serialization or deserialization can implement the following methods with exactly the following signatures [API 2006]:

private void writeObject(java.io.ObjectOutputStream out) throws IOException;
private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException;

When a Serializable class does not implement writeObject(), it is serialized by the default mechanism, which writes all of its instance fields regardless of access level (public, protected, package-private, or private), except for those marked transient. Static fields are never part of the serialized state. Likewise, if a Serializable class does not implement readObject(), it is deserialized by restoring all of those same fields, with the exception of the transient fields, which are left at their default values.
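The interaction between transient fields and the custom methods can be sketched as follows. The Session class, its fields, and the CustomSerializationDemo helper are hypothetical and exist only for illustration. The password field is transient and is deliberately never written; loginCount is also transient but is written and validated by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only to illustrate the mechanism.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String user;          // handled by default serialization
    private transient String password;  // never written to the stream
    private transient int loginCount;   // written by hand in writeObject()

    Session(String user, String password, int loginCount) {
        this.user = user;
        this.password = password;
        this.loginCount = loginCount;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();       // non-transient, non-static fields
        out.writeInt(loginCount);       // extra data, in a fixed order
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();         // restores 'user'
        int count = in.readInt();       // must mirror writeObject()
        if (count < 0) {
            throw new InvalidObjectException("negative login count");
        }
        loginCount = count;
    }

    String getUser() { return user; }
    String getPassword() { return password; }
    int getLoginCount() { return loginCount; }
}

class CustomSerializationDemo {
    // Serializes and immediately deserializes a Session in memory.
    static Session roundTrip(Session s)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(s);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Session) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Session copy = roundTrip(new Session("alice", "hunter2", 3));
        System.out.println(copy.getUser());       // alice
        System.out.println(copy.getPassword());   // null
        System.out.println(copy.getLoginCount()); // 3
    }
}
```

Validating values inside readObject() is one way to reject a forged or corrupted stream before the object becomes visible to the rest of the program; the specific check shown here is only an example.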

When multiple objects are serialized on an ObjectOutputStream, the ObjectOutputStream ensures that each object is written to the stream only once. It accomplishes this by retaining a reference (or handle) to each object written to the stream. When a previously written object is written to the stream again, it is replaced with a reference to the originally written data in the stream. This substitution takes place without regard to whether the object's contents have changed in the interim. This table of serialized object references also prevents garbage collection of the previously written objects, because the garbage collector cannot reclaim objects that are still strongly reachable. This behavior is both desirable and correct for data that potentially contains arbitrary object graphs, especially when the graphs are fully allocated and constructed prior to serialization. Likewise, the deserialization process can then use these references to efficiently deserialize a complete object graph.
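The substitution described above can be observed with a small sketch. The HandleTableDemo class and its method are hypothetical names chosen for this example. An array is written, modified, and written again to the same stream, and both entries are then read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class HandleTableDemo {
    // Writes the same array twice, mutating it between the writes,
    // then reads both stream entries back.
    static int[][] writeMutateWrite()
            throws IOException, ClassNotFoundException {
        int[] data = {1, 2, 3};
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(data); // full contents are written
            data[0] = 99;          // change made after the first write
            oos.writeObject(data); // only a back-reference is written
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            int[] first = (int[]) ois.readObject();
            int[] second = (int[]) ois.readObject();
            return new int[][] {first, second};
        }
    }

    public static void main(String[] args) throws Exception {
        int[][] copies = writeMutateWrite();
        System.out.println(copies[0][0]);           // 1
        System.out.println(copies[1][0]);           // 1, not 99
        System.out.println(copies[0] == copies[1]); // true
    }
}
```

The second read yields the very same object as the first, holding the contents as they were at the time of the first write, which shows that the later modification never reached the stream.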