What is serialization?


Santosh Kumar Singh · Answer

Introduction

Serialization can be defined as the process of storing the state of an object instance to a storage medium. During this process, the public and private fields of the object and the name of the class, including the assembly containing the class, are converted to a stream of bytes, which is then written to a data stream. When the object is subsequently deserialized, an exact clone of the original object is created.

When implementing a serialization mechanism in an object-oriented environment, you have to make a number of tradeoffs between ease of use and flexibility. The process can be automated to a large extent, provided you are given sufficient control over it. For example, situations may arise where simple binary serialization is not sufficient, or there might be a specific reason to decide which fields in a class need to be serialized. The following sections examine the robust serialization mechanism provided with the .NET Framework and highlight a number of important features that allow you to customize the process to meet your needs.

Persistent Storage

It is often necessary to store the values of an object's fields to disk and then retrieve this data at a later stage. Although this is easy to achieve without relying on serialization, that approach is often cumbersome and error prone, and it becomes progressively more complex when you need to track a hierarchy of objects. Imagine writing a large business application containing many thousands of objects and having to write code to save and restore the fields and properties to and from disk for each object. Serialization provides a convenient mechanism for achieving this objective with minimal effort.

The Common Language Runtime (CLR) manages how objects are laid out in memory, and the .NET Framework provides an automated serialization mechanism by using reflection. When an object is serialized, the name of the class, the assembly, and all the data members of the class instance are written to storage.
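To make the reflection step concrete, here is a minimal sketch assuming a hypothetical Account class and a hypothetical FieldSnapshot.Capture helper (neither is part of the .NET Framework). It records the type name, the assembly name, and every instance field, public or private, while skipping fields marked NonSerialized, which is roughly the information the default policy writes. It is not how BinaryFormatter works internally; it only uses standard System.Reflection calls, and for brevity it ignores fields inherited from base classes.

```csharp
// Silences the SYSLIB0050 obsolescence warning that .NET 8+ may raise for FieldInfo.IsNotSerialized.
#pragma warning disable SYSLIB0050
using System;
using System.Collections.Generic;
using System.Reflection;

[Serializable]
public class Account
{
    public string Owner;                  // public field: captured
    private decimal balance;              // private field: captured as well
    [NonSerialized] public int CacheHits; // skipped, as the default policy would

    public Account(string owner, decimal balance)
    {
        Owner = owner;
        this.balance = balance;
    }

    public decimal Balance => balance;
}

public static class FieldSnapshot
{
    // Hypothetical helper: gathers type name, assembly name and field values via reflection.
    public static Dictionary<string, object> Capture(object obj)
    {
        Type type = obj.GetType();
        var snapshot = new Dictionary<string, object>
        {
            ["$type"] = type.FullName,
            ["$assembly"] = type.Assembly.FullName,
        };

        // Instance fields regardless of visibility; DeclaredOnly limits the sketch to one class.
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        foreach (FieldInfo field in type.GetFields(flags))
        {
            if (field.IsNotSerialized)
                continue; // honour [NonSerialized]
            snapshot[field.Name] = field.GetValue(obj);
        }
        return snapshot;
    }
}

public static class Program
{
    public static void Main()
    {
        var snapshot = FieldSnapshot.Capture(new Account("Ada", 42m));
        foreach (var pair in snapshot)
            Console.WriteLine($"{pair.Key} = {pair.Value}");
    }
}
```

Note that the private balance field shows up alongside the public Owner field, matching the statement that private members are serialized too.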
Objects often store references to other instances in member variables. When the class is serialized, the serialization engine keeps track of all referenced objects already serialized to ensure that the same object is not serialized more than once. The serialization architecture provided with the .NET Framework correctly handles object graphs and circular references automatically. The only requirement placed on object graphs is that all objects referenced by the object being serialized must also be marked as Serializable (see Basic Serialization). If this is not done, an exception will be thrown when the serializer attempts to serialize the unmarked object.

When the serialized class is deserialized, the class is recreated and the values of all the data members are automatically restored.

Marshal By Value

Objects are only valid in the application domain where they are created. Any attempt to pass the object to another application domain as a parameter, or to return it as a result, will fail unless the object derives from MarshalByRefObject or is marked as Serializable. If the object is marked as Serializable, the object will automatically be serialized, transported from one application domain to the other, and then deserialized to produce an exact copy of the object in the second application domain. This process is typically referred to as marshal by value.

When an object derives from MarshalByRefObject, an object reference will be passed from one application domain to another, rather than the object itself. You can also mark an object that derives from MarshalByRefObject as Serializable. When this object is used with remoting, the formatter responsible for serialization, which has been preconfigured with a SurrogateSelector, takes control of the serialization process and replaces all objects derived from MarshalByRefObject with a proxy.
Without the SurrogateSelector in place, the serialization architecture follows the standard serialization rules (see Steps in the Serialization Process below).

Basic Serialization

The easiest way to make a class serializable is to mark it with the Serializable attribute as follows:

```csharp
using System;

[Serializable]
public class MyObject {
  public int n1 = 0;
  public int n2 = 0;
  public String str = null;
}
```

The code snippet below shows how an instance of this class can be serialized to a file:

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

MyObject obj = new MyObject();
obj.n1 = 1;
obj.n2 = 24;
obj.str = "Some String";

IFormatter formatter = new BinaryFormatter();
// The using block closes the stream even if Serialize throws.
using (Stream stream = new FileStream("MyFile.bin", FileMode.Create,
                                      FileAccess.Write, FileShare.None))
{
  formatter.Serialize(stream, obj);
}
```

This example uses a binary formatter to do the serialization. All you need to do is create an instance of the stream and the formatter you intend to use, and then call the Serialize method on the formatter. The stream and the object instance to serialize are provided as parameters to this call. Although this is not explicitly demonstrated in this example, all member variables of a class will be serialized, even variables marked as private. In this respect, binary serialization differs from the XmlSerializer class, which serializes only public fields and properties.

Restoring the object back to its former state is just as easy. First, create a formatter and a stream for reading, and then instruct the formatter to deserialize the object. The code snippet below shows how this is done.

```csharp
IFormatter formatter = new BinaryFormatter();
MyObject obj;
using (Stream stream = new FileStream("MyFile.bin", FileMode.Open,
                                      FileAccess.Read, FileShare.Read))
{
  // Deserialize returns object, so cast back to the original type.
  obj = (MyObject) formatter.Deserialize(stream);
}

// Here's the proof
Console.WriteLine("n1: {0}", obj.n1);
Console.WriteLine("n2: {0}", obj.n2);
Console.WriteLine("str: {0}", obj.str);
```

The BinaryFormatter used above is very efficient and produces a very compact byte stream. All objects serialized with this formatter can also be deserialized with it, which makes it an ideal tool for serializing objects that will be deserialized on the .NET platform. Note that constructors are not called when an object is deserialized. This violates some of the usual contracts the runtime makes with the object writer, so developers should make sure they understand the ramifications before marking an object as serializable.

If portability is a requirement, use the SoapFormatter instead. Simply replace the formatter in the code above with SoapFormatter, and call Serialize and Deserialize as before. This formatter produces the following output for the example used above.

```xml
<SOAP-ENV:Envelope
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle=
  "http://schemas.microsoft.com/soap/encoding/clr/1.0
  http://schemas.xmlsoap.org/soap/encoding/"
  xmlns:a1="http://schemas.microsoft.com/clr/assem/ToFile">
  <SOAP-ENV:Body>
    <a1:MyObject id="ref-1">
      <n1>1</n1>
      <n2>24</n2>
      <str id="ref-3">Some String</str>
    </a1:MyObject>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```

It is important to note that the Serializable attribute cannot be inherited. If you derive a new class from MyObject, the new class must be marked with the attribute as well, or it cannot be serialized. For example, when you attempt to serialize an instance of the class below, you will get a SerializationException informing you that the MyStuff type is not marked as serializable.

```csharp
public class MyStuff : MyObject
{
  public int n3;
}
```

Using the serialization attribute is convenient, but it has limitations, as demonstrated above. Refer to the guidelines (see Serialization Guidelines below) regarding when to mark a class for serialization, since serialization cannot be added to a class after it has been compiled.

Selective Serialization

A class often contains fields that should not be serialized. For example, assume a class stores a thread ID in a member variable. When the class is deserialized, the thread that the ID referred to at serialization time might no longer be running, so serializing this value does not make sense. You can prevent member variables from being serialized by marking them with the NonSerialized attribute as follows:

```csharp
[Serializable]
public class MyObject
{
  public int n1;
  [NonSerialized] public int n2;
  public String str;
}
```

Custom Serialization

You can customize the serialization process by implementing the ISerializable interface on an object. This is particularly useful in cases where the value of a member variable is invalid after deserialization, but you need to provide the variable with a value in order to reconstruct the full state of the object. Implementing ISerializable involves implementing the GetObjectData method and a special constructor that will be used when the object is deserialized. The sample code below shows how to implement ISerializable on the MyObject class from a previous section.

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class MyObject : ISerializable
{
  public int n1;
  public int n2;
  public String str;

  public MyObject()
  {
  }

  // Deserialization constructor: reads back the values stored by GetObjectData.
  protected MyObject(SerializationInfo info, StreamingContext context)
  {
    n1 = info.GetInt32("i");
    n2 = info.GetInt32("j");
    str = info.GetString("k");
  }

  // Stores each field as a name/value pair.
  public virtual void GetObjectData(SerializationInfo info, StreamingContext context)
  {
    info.AddValue("i", n1);
    info.AddValue("j", n2);
    info.AddValue("k", str);
  }
}
```

When GetObjectData is called during serialization, you are responsible for populating the SerializationInfo object provided with the method call. Simply add the variables to be serialized as name/value pairs. Any text can be used as the name. You have the freedom to decide which member variables are added to the SerializationInfo, provided that sufficient data is serialized to restore the object during deserialization. Derived classes should call the GetObjectData method on the base object if the latter implements ISerializable.

It is important to stress that you need to implement both GetObjectData and the special constructor when ISerializable is added to a class. The compiler reports an error if GetObjectData is missing, but since it is impossible to enforce the implementation of a constructor, no warning is given if the constructor is absent, and an exception will be thrown when an attempt is made to deserialize a class without the constructor. The current design was favored over a SetObjectData method to get around potential security and versioning problems. For example, a SetObjectData method must be public if it is defined as part of an interface, so users would have to write code to defend against the SetObjectData method being called multiple times. One can imagine the headaches that could be caused by a malicious application calling SetObjectData on an object that is in the middle of executing some operation.

During deserialization, the SerializationInfo is passed to the class using the constructor provided for this purpose. Any visibility constraints placed on the constructor are ignored when the object is deserialized, so you can mark the constructor as public, protected, internal, or private. It is a good idea to make the constructor protected unless the class is sealed, in which case the constructor should be marked private.
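A minimal sketch of this contract, assuming a hypothetical sealed Point class and a hypothetical ManualRoundTrip.Copy helper: GetObjectData fills a SerializationInfo with name/value pairs, and reflection then locates the private (SerializationInfo, StreamingContext) constructor, which is roughly how a formatter gets past its visibility. No stream or formatter is involved, because BinaryFormatter is marked obsolete starting with .NET 5; the pragma only silences obsolescence warnings that newer runtimes may raise for SerializationInfo, StreamingContext and FormatterConverter.

```csharp
// Silences SYSLIB0050/SYSLIB0051 warnings that .NET 8+ may raise for these formatter-era APIs.
#pragma warning disable SYSLIB0050, SYSLIB0051
using System;
using System.Reflection;
using System.Runtime.Serialization;

[Serializable]
public sealed class Point : ISerializable
{
    public int X;
    public int Y;

    public Point(int x, int y) { X = x; Y = y; }

    // The class is sealed, so the deserialization constructor is private.
    private Point(SerializationInfo info, StreamingContext context)
    {
        X = info.GetInt32("x");
        Y = info.GetInt32("y");
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("x", X);
        info.AddValue("y", Y);
    }
}

public static class ManualRoundTrip
{
    // Hypothetical helper: runs both halves of ISerializable in memory, without a formatter.
    public static T Copy<T>(T source) where T : ISerializable
    {
        var context = new StreamingContext(StreamingContextStates.Clone);
        var info = new SerializationInfo(typeof(T), new FormatterConverter());
        source.GetObjectData(info, context);

        // Find the (SerializationInfo, StreamingContext) constructor whatever its visibility.
        ConstructorInfo ctor = typeof(T).GetConstructor(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic,
            null,
            new[] { typeof(SerializationInfo), typeof(StreamingContext) },
            null);
        if (ctor == null)
            throw new SerializationException($"{typeof(T)} has no deserialization constructor.");

        return (T)ctor.Invoke(new object[] { info, context });
    }
}

public static class Program
{
    public static void Main()
    {
        Point copy = ManualRoundTrip.Copy(new Point(3, 4));
        Console.WriteLine($"({copy.X}, {copy.Y})");
    }
}
```

If the constructor were removed, the sketch would fail only at run time with a SerializationException, mirroring the point above that the compiler cannot enforce the constructor.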
To restore the state of the object, simply retrieve the values of the variables from the SerializationInfo using the names used during serialization. If the base class implements ISerializable, the base constructor should be called to allow the base object to restore its variables.

When you derive a new class from one that implements ISerializable, the derived class must implement both the constructor and the GetObjectData method if it has any variables that need to be serialized. The code snippet below shows how this is done using the MyObject class shown previously.

```csharp
[Serializable]
public class ObjectTwo : MyObject
{
  public int num;

  public ObjectTwo() : base()
  {
  }

  // Chain to the base deserialization constructor so MyObject restores its own fields.
  protected ObjectTwo(SerializationInfo si, StreamingContext context)
    : base(si, context)
  {
    num = si.GetInt32("num");
  }

  public override void GetObjectData(SerializationInfo si, StreamingContext context)
  {
    base.GetObjectData(si, context);
    si.AddValue("num", num);
  }
}
```

Do not forget to call the base class in the deserialization constructor. If this is not done, the base class's deserialization constructor never runs, the base class's fields are not restored, and the object will not be fully constructed after deserialization.

Objects are reconstructed from the inside out, and calling methods during deserialization can have undesirable side effects, since the methods called might refer to object references that have not been deserialized by the time the call is made. If the class being deserialized implements the IDeserializationCallback interface, its OnDeserialization method will automatically be called when the entire object graph has been deserialized. At this point, all the child objects referenced have been fully restored. A hash table is a typical example of a class that is difficult to deserialize without using the callback described above. It is easy to retrieve the key/value pairs during deserialization, but adding these objects back to the hash table can cause problems, since there is no guarantee that the key objects, whose hash codes the table depends on, have been fully deserialized yet. Calling methods on a hash table at this stage is therefore not advisable.

Steps in the Serialization Process

When the Serialize method is called on a formatter, object serialization proceeds according to the following rules:

1. A check is made to determine whether the formatter has a surrogate selector. If it does, check whether the surrogate selector handles objects of the given type. If the selector handles the object type, GetObjectData is called on the surrogate (an ISerializationSurrogate) that the selector returns.
2. If there is no surrogate selector, or if it does not handle the type, a check is made to determine whether the object is marked with the Serializable attribute. If it is not, a SerializationException is thrown.
3. If it is marked appropriately, check whether the object implements ISerializable. If it does, GetObjectData is called on the object.
4. If it does not implement ISerializable, the default serialization policy is used, serializing all fields not marked as NonSerialized.

Versioning

The .NET Framework provides support for versioning and side-by-side execution, and all classes will work across versions if the interfaces of the classes remain the same. Since serialization deals with member variables and not interfaces, be cautious when adding or removing member variables in classes that will be serialized across versions. This is especially true for classes that do not implement ISerializable. Any change to the state of the current version, such as adding member variables, changing the types of variables, or changing their names, means that existing objects of the same type cannot be successfully deserialized if they were serialized with a previous version.

If the state of an object needs to change between versions, class authors have two choices:

- Implement ISerializable. This allows you to take precise control of the serialization and deserialization process, allowing future state to be added and interpreted correctly during deserialization.
- Mark nonessential member variables with the NonSerialized attribute. This option should only be used when you expect minor changes between different versions of a class. For example, when a new variable has been added to a later version of a class, the variable can be marked as NonSerialized to ensure the class remains compatible with previous versions.

Serialization Guidelines

You should consider serialization when designing new classes, since a class cannot be made serializable after it has been compiled. Some questions to ask are: Do I have to send this class across application domains? Will this class ever be used with remoting? What will my users do with this class? Maybe they will derive a new class from mine that needs to be serialized. When in doubt, mark the class as serializable.
It is probably better to mark all classes as serializable unless:

- They will never cross an application domain. If serialization is not required and the class needs to cross an application domain, derive the class from MarshalByRefObject.
- The class stores special pointers that are only applicable to the current instance of the class. If a class contains unmanaged memory or file handles, for example, ensure these fields are marked as NonSerialized or don't serialize the class at all.
- Some of the data members contain sensitive information. In this case, it is probably advisable to implement ISerializable and serialize only the required fields.
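The last guideline and the first versioning option above both come down to ISerializable. A minimal sketch, assuming a hypothetical UserRecord class: GetObjectData never writes the Password field, and the deserialization constructor enumerates the SerializationInfo entries so that data written by an older version, without an "email" entry, still loads. The Demo class drives both halves by hand rather than through a formatter, and Restore is a hypothetical, demo-only hook.

```csharp
// Silences SYSLIB0050/SYSLIB0051 warnings that .NET 8+ may raise for these formatter-era APIs.
#pragma warning disable SYSLIB0050, SYSLIB0051
using System;
using System.Runtime.Serialization;

[Serializable]
public class UserRecord : ISerializable
{
    public string Name;
    public string Email;    // assumed to be new in "version 2" of the class
    public string Password; // sensitive: never written by GetObjectData

    public UserRecord() { }

    protected UserRecord(SerializationInfo info, StreamingContext context)
    {
        Name = info.GetString("name");
        // Version tolerance: older data has no "email" entry, so probe instead of calling GetString.
        foreach (SerializationEntry entry in info)
        {
            if (entry.Name == "email")
                Email = (string)entry.Value;
        }
        // Password is intentionally left null after deserialization.
    }

    public virtual void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("name", Name);
        info.AddValue("email", Email);
        // Password is deliberately omitted.
    }

    // Hypothetical demo-only hook so the sketch can reach the protected constructor.
    internal static UserRecord Restore(SerializationInfo info, StreamingContext context)
        => new UserRecord(info, context);
}

public static class Demo
{
    private static readonly StreamingContext Context =
        new StreamingContext(StreamingContextStates.Persistence);

    // Current version: write through GetObjectData, read back through the constructor.
    public static UserRecord RoundTrip(UserRecord user)
    {
        var info = new SerializationInfo(typeof(UserRecord), new FormatterConverter());
        user.GetObjectData(info, Context);
        return UserRecord.Restore(info, Context);
    }

    // Simulated data from an older version that only ever wrote "name".
    public static UserRecord FromVersion1(string name)
    {
        var info = new SerializationInfo(typeof(UserRecord), new FormatterConverter());
        info.AddValue("name", name);
        return UserRecord.Restore(info, Context);
    }

    public static void Main()
    {
        var copy = RoundTrip(new UserRecord { Name = "ada", Email = "ada@example.com", Password = "secret" });
        Console.WriteLine($"{copy.Name}, {copy.Email}, password restored: {copy.Password != null}");

        var old = FromVersion1("grace");
        Console.WriteLine($"{old.Name}, email present: {old.Email != null}");
    }
}
```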

Sumit Kesarwani · Answer

Serialization is the process of converting an object into a
stream of bytes.

Sumit Kesarwani

Can you answer this question?

2 Answers

Liked By