Handling Cyclic References during Serialization

In the previous post, we discussed handling subtypes during protobuf serialization. In this post, we continue with serialization and discuss the handling of cyclic references. We will look at how cyclic references can be addressed in three common kinds of serialization.

  • Json
  • Xml
  • Protobuf

As an example of cyclic references, we will use the classic Folder-File relation. Each folder can have multiple children, and each child can be either a Folder or a File. Every child also points back to its parent folder, which is what makes the object graph cyclic. This is a simple relation that everyone understands, which makes it easy to use as an example.

public abstract class FileFolderBase
{
    public Guid Id { get; set; }

    public string Name { get; set; }

    public Folder Parent { get; set; }
}

public class Folder:FileFolderBase
{
    public IEnumerable<FileFolderBase> Children { get; set; }
}

public class File : FileFolderBase
{
}

public class FolderSerialized
{
    public Folder Root { get; set; }
}

As you can observe, we have Folder and File classes, both derived from FileFolderBase and sharing the Id, Name, and Parent properties. Additionally, Folder has a Children property, which contains the collection of subfolders and files.

Also note that we have defined a wrapper class, FolderSerialized, which is used for serialization purposes. We will explain later in the example why this class was added.
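To make the cycle concrete, here is a minimal sketch that uses trimmed-down copies of the classes above. The CycleDemo class and its CountNodes and BuildSample helpers are hypothetical names introduced only for illustration. It shows that a child's Parent points back to the folder whose Children contains it, so a traversal that follows both properties terminates only if it tracks the nodes it has already visited.

```csharp
using System;
using System.Collections.Generic;

public abstract class FileFolderBase
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Folder Parent { get; set; }
}

public class Folder : FileFolderBase
{
    public IEnumerable<FileFolderBase> Children { get; set; }
}

public class File : FileFolderBase { }

// Hypothetical demo class, not part of the original sample.
public static class CycleDemo
{
    // Counts the distinct nodes reachable through Parent and Children.
    // The visited set is what stops the walk from looping on the cycle.
    public static int CountNodes(FileFolderBase node, HashSet<FileFolderBase> visited)
    {
        if (node is null || !visited.Add(node)) return 0;

        var count = 1 + CountNodes(node.Parent, visited);
        if (node is Folder folder && folder.Children != null)
        {
            foreach (var child in folder.Children)
                count += CountNodes(child, visited);
        }
        return count;
    }

    // Builds a root folder with a single file whose Parent points back to it.
    public static Folder BuildSample()
    {
        var root = new Folder { Id = Guid.NewGuid(), Name = "root" };
        var file = new File { Id = Guid.NewGuid(), Name = "ChildAtRoot", Parent = root };
        root.Children = new List<FileFolderBase> { file };
        return root;
    }

    public static void Main()
    {
        var root = BuildSample();
        foreach (var child in root.Children)
        {
            // The back-reference is the very same root instance.
            Console.WriteLine(ReferenceEquals(child.Parent, root));
        }
        Console.WriteLine(CountNodes(root, new HashSet<FileFolderBase>()));
    }
}
```

A serializer faces the same problem as CountNodes: it either has to remember which objects it has already written, or it has to leave one direction of the relation out of the payload.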

The Acceptance Criteria

Before we delve into the different serialization cases, it is important to set up the acceptance criteria for our case: we should be able to successfully retrieve any folder structure after a serialization/deserialization round trip. Let us first define the service contract for our serializers.

public interface ISerializer<Tout>
{
    Tout Serialize<T>(T item);
    T Deserialize<T>(Tout data);
}

With that in place, let us write our Unit Test cases, which would act as our acceptance tests in this example.

[Theory]
[ClassData(typeof(TestData))]
public void SerializeAndReadbackTests<T>(ISerializer<T> serializer, FolderSerialized folder)
{
    var serializedData = serializer.Serialize(folder);
    var deserializedData = serializer.Deserialize<FolderSerialized>(serializedData);

    AssertSameNode(folder.Root, deserializedData.Root);

    // Recursively compares the deserialized node against the original one.
    void AssertSameNode(FileFolderBase expected, FileFolderBase actual)
    {
        // The Parent link is the cyclic part of the graph, so verify it as well.
        Assert.Equal(expected.Parent?.Id, actual.Parent?.Id);

        switch (actual)
        {
            case Folder actualFolder:
                // Fail loudly if the node came back as a different subtype.
                var expectedFolder = Assert.IsType<Folder>(expected);
                Assert.Equal(expectedFolder.Name, actualFolder.Name);
                Assert.Equal(expectedFolder.Id, actualFolder.Id);
                Assert.Equal(expectedFolder.Children.Count(), actualFolder.Children.Count());

                foreach (var childItem in actualFolder.Children)
                {
                    AssertSameNode(expectedFolder.Children.Single(x => x.Id == childItem.Id), childItem);
                }
                break;

            case File actualFile:
                var expectedFile = Assert.IsType<File>(expected);
                Assert.Equal(expectedFile.Name, actualFile.Name);
                Assert.Equal(expectedFile.Id, actualFile.Id);
                break;

            default:
                throw new NotImplementedException();
        }
    }
}

As you can observe, we also pass the service under test as a parameter to the test case, so that the same test case can be used for all three serializers. It is time to define the test case data for our Unit Test method.

public class TestData : IEnumerable<object[]>
{
    public IEnumerator<object[]> GetEnumerator()
    {
        yield return new object[]
        {
            new JsonSerializer(),
            GetTwoLevelData()
        };

        yield return new object[]
        {
            new JsonSerializer(),
            GetSingleLevelData()
        };

        yield return new object[]
        {
            new XmlDataSerializer(),
            GetTwoLevelData()
        };

        yield return new object[]
        {
            new XmlDataSerializer(),
            GetSingleLevelData()
        };

        yield return new object[]
        {
            new ProtobufSerializer(),
            GetTwoLevelData()
        };

        yield return new object[]
        {
            new ProtobufSerializer(),
            GetSingleLevelData()
        };
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    private object GetSingleLevelData()
    {
        FileFolderBase root = new Folder
        {
            Id = Guid.Parse("00000000-0000-0000-0000-000000000000"),
            Name = "root",
            Parent = null,
        };
        FileFolderBase leafAtRoot = new File
        {
            Id = Guid.Parse("00000000-0000-0000-0000-000000000001"),
            Name = "ChildAtRoot",
            Parent = (Folder)root,
        };

        ((Folder)root).Children = new List<FileFolderBase> { leafAtRoot };
        return new FolderSerialized
        {
            Root = (Folder)root,
        };
    }
    private object GetTwoLevelData()
    {
        FileFolderBase root = new Folder
        {
            Id = Guid.Parse("00000000-0000-0000-0000-000000000000"),
            Name = "root",
            Parent = null,
        };

        FileFolderBase leafAtRoot = new File
        {
            Id = Guid.Parse("00000000-0000-0000-0000-000000000001"),
            Name = "ChildAtRoot",
            Parent = (Folder)root
        };

        FileFolderBase folderAtRoot = new Folder
        {
            Id = Guid.Parse("00000000-0000-0000-0000-000000000010"),
            Name = "folderAtRoot",
            Parent = (Folder)root,
        };

        FileFolderBase leafAtSubFolder = new File
        {
            Id = Guid.Parse("00000000-0000-0000-0000-000000000002"),
            Name = "ChildAtSubFolder",
            Parent = (Folder)folderAtRoot
        };

        ((Folder)folderAtRoot).Children = new List<FileFolderBase> { leafAtSubFolder };

        ((Folder)root).Children = new List<FileFolderBase> {leafAtRoot, folderAtRoot};

        return new FolderSerialized
        {
            Root = (Folder)root,
        };
    }
}

We have defined two scenarios, a single-level tree and a two-level tree, and each is tested against every serializer. The test data code is quite self-explanatory, so we will move on to the more important parts of the code.

Cyclic Reference in Json Serialization

Let us first address what is probably the easiest of the three in terms of programmer effort. For Json serialization, we will use the popular Json.Net library.

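Json.Net doesn't mandate any special attributes on the DTOs. All it needs are two serializer settings: TypeNameHandling, so that the File and Folder subtypes survive the round trip, and PreserveReferencesHandling, so that the Parent back-references are written as references instead of tripping Json.Net's default self-referencing loop check (ReferenceLoopHandling.Error).

public class JsonSerializer:ISerializer<string>
{
    // The same settings must be used for writing and reading.
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        // Emits $type where the runtime type differs from the declared type (File/Folder).
        TypeNameHandling = TypeNameHandling.Auto,
        // Emits $id/$ref so that Parent points at an already written object instead of recursing.
        PreserveReferencesHandling = PreserveReferencesHandling.Objects
    };

    public string Serialize<T>(T item)
    {
        return JsonConvert.SerializeObject(item, Formatting.Indented, Settings);
    }

    public T Deserialize<T>(string serializedData)
    {
        ArgumentNullException.ThrowIfNull(serializedData);

        return JsonConvert.DeserializeObject<T>(serializedData, Settings);
    }
}

With these settings in place, the Unit Test cases succeed for the Json serializer.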

Cyclic Reference in Xml Serialization

We will use the DataContractSerializer to serialize/deserialize the object into Xml. The DataContractSerializer is more verbose compared to Json.Net and requires decorating our DTOs with certain attributes.

[DataContract(IsReference = true)]
[KnownType(typeof(File))]
[KnownType(typeof(Folder))]
public abstract class FileFolderBase
{
    [DataMember]
    public Guid Id { get; set; }

    [DataMember]
    public string Name { get; set; }

    [DataMember]
    public Folder Parent { get; set; }
}

[DataContract(IsReference = true)]
public class File : FileFolderBase
{
}

[DataContract(IsReference = true)]
public class Folder:FileFolderBase
{
    [DataMember]
    public IEnumerable<FileFolderBase> Children { get; set; }
}

[DataContract]
public class FolderSerialized
{
    [DataMember]
    public Folder Root { get; set; }
}


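There are a couple of key points to note here. The first is the IsReference property of the DataContractAttribute. It instructs the DataContractSerializer to insert XML constructs that preserve object reference information, so an object that has already been written is referred to by id instead of being written again. Note that File and Folder repeat IsReference = true: derived data contracts are expected to use the same IsReference value as their base type. The second is the KnownTypeAttribute, which lets the serializer identify the subtypes.

The serializer wrapper itself needs nothing special.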

public class XmlDataSerializer:ISerializer<string>
{
    public string Serialize<T>(T item)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using var output = new StringWriter();
        using var writer = new System.Xml.XmlTextWriter(output);
        
        serializer.WriteObject(writer, item);
        return output.GetStringBuilder().ToString();
    }

    public T Deserialize<T>(string serializedData)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(serializedData));
        return (T)serializer.ReadObject(memoryStream);
    }
}

That would be all you need for serializing cyclic references in XML with DataContractSerializer. Again, it wasn’t a headache, just a little more verbose compared to Json.
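To see the effect of IsReference in isolation, the following sketch round-trips a tiny parent/child graph and checks that the child's Parent is the very same instance as the deserialized root. Node and IsReferenceDemo are hypothetical names introduced for this illustration; only DataContractSerializer and the attributes come from the framework. In the generated XML, the references typically show up as z:Id and z:Ref attributes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical stand-in for the Folder/File hierarchy, kept to a single type.
[DataContract(IsReference = true)]
public class Node
{
    [DataMember]
    public string Name { get; set; }

    [DataMember]
    public Node Parent { get; set; }

    [DataMember]
    public List<Node> Children { get; set; } = new List<Node>();
}

public static class IsReferenceDemo
{
    // Serializes the graph into memory and reads it straight back.
    public static Node RoundTrip(Node root)
    {
        var serializer = new DataContractSerializer(typeof(Node));
        using var stream = new MemoryStream();
        serializer.WriteObject(stream, root);
        stream.Position = 0;
        return (Node)serializer.ReadObject(stream);
    }

    public static void Main()
    {
        var root = new Node { Name = "root" };
        root.Children.Add(new Node { Name = "child", Parent = root });

        var copy = RoundTrip(root);

        // With reference preservation, the back-reference resolves to the
        // deserialized root itself rather than to a second copy of it.
        Console.WriteLine(ReferenceEquals(copy.Children[0].Parent, copy));
    }
}
```

Without IsReference = true, the serializer would have no way to express "this is the object I already wrote" and would reject the cyclic graph instead.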

Cyclic Reference in Protobuf Serialization

However, things change when it comes to Protobuf serialization, and this is the main reason I thought of writing this blog entry. Protobuf doesn't support cyclic references natively. protobuf-net, the library most commonly used by .Net developers for protobuf serialization, offered a feature in v2.x that made it easier to support cyclic references. It was similar to the IsReference property of the DataContractAttribute: the ProtoMemberAttribute had an AsReference property that did something similar.

However, since v3.x the property has been marked obsolete and is no longer supported. For this reason, we have to depend on workarounds for serializing cyclic references in protobuf.

The solution I used relies on a callback that runs right after deserialization and restores the references. This requires some noticeable changes in the code. The first change is in FileFolderBase.

[ProtoContract]
[ProtoInclude(100, typeof(Folder))]
[ProtoInclude(200, typeof(File))]
public abstract class FileFolderBase
{
    [ProtoMember(1)]
    public Guid Id { get; set; }

    [ProtoMember(2)]
    public string Name { get; set; }

    public Folder Parent { get; set; }

    [ProtoMember(3)]
    public Guid ParentId 
    { 
        get => Parent?.Id ?? Guid.Empty; 
        set
        {
            if(Parent is null)
            {
                Parent = new Folder();
            }
            Parent.Id = value;
        } 
    }

}

As you can observe, we have introduced an additional property, ParentId, of type Guid. This now holds the reference to the parent. Notice that we have skipped the ProtoMemberAttribute on the original Parent property and instead marked ParentId for serialization. During deserialization, the ParentId setter creates a placeholder Folder that carries nothing but the Id. During the post-processing callback, we use that ParentId to find the real parent node for any given node and assign it to the Parent property, replacing the placeholder.

Other than that, we have also added the ProtoIncludeAttribute to allow the serializer to recognize the allowed subtypes. The Folder and File classes don't need many changes other than the obvious addition of ProtoContractAttribute and ProtoMemberAttribute.

[ProtoContract]
public class Folder:FileFolderBase
{

    [ProtoMember(4)]
    public IEnumerable<FileFolderBase> Children { get; set; }
}

[ProtoContract]
public class File : FileFolderBase
{
}


However, as mentioned earlier, we need to define the callback method that is called after deserialization to fix the relationships. This is done in the FolderSerialized class, which is the reason the wrapper class was introduced.

[ProtoContract]
public class FolderSerialized
{
    [ProtoMember(1)]
    public Folder Root { get; set; }

    [ProtoAfterDeserialization]
    public void AfterDeserialization()
    {
        var dictionary = new Dictionary<Guid, Folder>()
        {
            [Guid.Empty] = null
        };

        FixParentRelation(Root, dictionary);
    }

    private void FixParentRelation(FileFolderBase node, Dictionary<Guid, Folder> dictionary)
    {
        (node switch
        {
            Folder folder => (Action)(() =>
            {
                folder.Parent = dictionary[folder.ParentId];
                dictionary[folder.Id] = folder;

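                // An empty Children collection is not written to the payload, so it can come back as null.
                foreach (var child in folder.Children ?? Enumerable.Empty<FileFolderBase>())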
                {
                    FixParentRelation(child, dictionary);
                }
            }),

            File file => (Action)(() =>
            {
                file.Parent = dictionary[file.ParentId];
            })
        })();
    }
}


The most noticeable code here is the AfterDeserialization method and the ProtoAfterDeserializationAttribute. The attribute tells the serializer (not the compiler) to call the marked method once deserialization of the object has completed. The method itself, as you can observe from the code above, walks the tree from the root, registers every folder in a dictionary keyed by Id, and uses each node's ParentId to look up and assign the missing Parent, which completes the cyclic relation. The serializer class itself is a thin wrapper over protobuf-net's Serializer.

public class ProtobufSerializer:ISerializer<byte[]>
{
    public byte[] Serialize<T>(T item)
    {
        using var stream = new MemoryStream();
        Serializer.Serialize(stream, item);
        return stream.ToArray();
    }

    public T Deserialize<T>(byte[] data)
    {
        using var stream = new MemoryStream(data);
        return Serializer.Deserialize<T>(stream);
    }
}
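Because the children are nested inside their folder in the serialized payload, the parent of each node is also known from the traversal itself. The following is a sketch of that alternative, not the approach used in this post: it restores Parent while walking down the tree and needs neither ParentId nor a dictionary. The classes are trimmed copies of the ones above, and ParentFixup.Restore is a hypothetical helper name; in the protobuf version it would be called from the method marked with ProtoAfterDeserialization.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class FileFolderBase
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Folder Parent { get; set; }
}

public class Folder : FileFolderBase
{
    public IEnumerable<FileFolderBase> Children { get; set; }
}

public class File : FileFolderBase { }

// Hypothetical helper, not part of the original sample.
public static class ParentFixup
{
    // Assigns Parent top-down: every node receives the folder it was found in.
    public static void Restore(FileFolderBase node, Folder parent)
    {
        if (node is null) return;

        node.Parent = parent;

        if (node is Folder folder)
        {
            // A folder without children may come back with Children == null.
            foreach (var child in folder.Children ?? Enumerable.Empty<FileFolderBase>())
            {
                Restore(child, folder);
            }
        }
    }

    public static void Main()
    {
        // Simulate a freshly deserialized tree: structure intact, Parent links missing.
        var file = new File { Id = Guid.NewGuid(), Name = "ChildAtSubFolder" };
        var sub = new Folder
        {
            Id = Guid.NewGuid(),
            Name = "folderAtRoot",
            Children = new List<FileFolderBase> { file }
        };
        var root = new Folder
        {
            Id = Guid.NewGuid(),
            Name = "root",
            Children = new List<FileFolderBase> { sub }
        };

        Restore(root, null);

        Console.WriteLine(ReferenceEquals(file.Parent, sub));
        Console.WriteLine(ReferenceEquals(sub.Parent, root));
    }
}
```

The trade-off is that this only works for strict tree-shaped data. The ParentId approach from the post is closer to what is needed when a node can be referenced from places other than its parent's Children collection.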

As you can see, Protobuf serialization has trouble with cyclic references, and it is important for developers to account for this when designing DTOs for serialization. It helps to keep in mind that the serialized structure can differ from the original structure, and that post-deserialization callbacks can restore the relations where needed.

As usual, the complete source code for this example is available on my Github.

That’s all for now. Happy Coding.
