In possibly every code base you might have seen, usage of Collections would have been a common sight. The introduction of Generic Collections in early stages of evolutin of language has made is all the more favourite of developers comparing to the non-generic cousins that existed before. One of the most common patterns seen in any code base is as follows.
var persons = GetPersonList();
var studentList = new List<Student>();
foreach(var person in persons)
{
studentList.Add(new Student
{
FirstName = person.FirstName,
LastName = person.LastName
});
}
At the first glance, there is nothing wrong with the code. In fact, it works well too. Unlike arrays, one doesn’t need to declare the size of List
beforehand, and it sort of magically does the trick to expand itself. However, there is a hidden problem caused due to the optimized way the Collection
is implemented to magically expand.
Beneath the List declaration, there is still an array, which is made to resize according to the ever growing requirements of the Collection. This is done in the Add method, which is defined as
public void Add(T item)
{
if (_size == _items.Length) EnsureCapacity(_size + 1);
_items[_size++] = item;
_version++;
}
private void EnsureCapacity(int min)
{
if (_items.Length < min) {
int newCapacity = _items.Length == 0? _defaultCapacity : _items.Length * 2;
if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
if (newCapacity < min) newCapacity = min;
Capacity = newCapacity;
}
}
The Add()
method checks if the array is full, and if so, it calls the EnsureCapacity
method to double the Capacity
.
The resizing of course happens in the Capacity
Property.
public int Capacity
{
get {
Contract.Ensures(Contract.Result<int>() >= 0);
return _items.Length;
}
set {
if (value < _size) {
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.value, ExceptionResource.ArgumentOutOfRange_SmallCapacity);
}
Contract.EndContractBlock();
if (value != _items.Length) {
if (value > 0) {
T[] newItems = new T[value];
if (_size > 0) {
Array.Copy(_items, 0, newItems, 0, _size);
}
_items = newItems;
}
else {
_items = _emptyArray;
}
}
}
}
The key take away from the Add/EnsureCapcity methods is that every time the size of array is breached, the capacity of the array is doubled. This leads to couple of undesired things.
- Expensive operation to resize the array
As the List grows, there would innumerous expensive resizing operations, which is quite undesired.
- Possibility that a lot of memory spaces left allocated but unused.
Consider if the number of Person
in the personList
is 258. Due to the logic shown above, the Capacity of Array would now 512. That is a whole lot of unused allocated memory space when you are working with memory intensive application.
// Default Behavior
Console.WriteLine($"StudentList(Count={studentList.Count},Capacity={studentList.Capacity})");
// Output
StudentList(Count=258,Capacity=512)
The ideal way to counter this would be to use overloaded constructor and initialize the size of array.
var studentList = new List<Student>(persons.Count());
This would reduce the need for resizing of internal array.
// With Collection Size Iniitialized as above
Console.WriteLine($"StudentList(Count={studentList.Count},Capacity={studentList.Capacity})");
// Output
StudentList(Count=258,Capacity=258)
Of course there could be places where you might not be having the convenience of knowing the collection size beforehand. But still, if you are able to be pick an initial size well, you could eliminate or reduce the need to resize the array.