Two common traps when using parallel loops can be summarized as follows:
* The amount of work done in the loop is not significantly larger than the amount of time spent synchronizing any shared state.
* The amount of work done per iteration is less than the cost of the delegate or method invocation.
Both problems result in significant performance penalties. However, both can be addressed using a Partitioner.
A Partitioner splits the range into a set of tuples, each describing a subset of the original collection to iterate over. Let’s write some code with and without a Partitioner and benchmark the two.
```csharp
[Benchmark]
public void ParallelLoopWithoutPartitioner()
{
    var maxValue = 100000;
    var sum = 0L;
    Parallel.For(0, maxValue, value =>
    {
        // Every iteration synchronizes on the shared variable.
        Interlocked.Add(ref sum, value);
    });
}

[Benchmark]
public void ParallelLoopWithPartitioner()
{
    var maxValue = 100000;
    var sum = 0L;
    var partitioner = Partitioner.Create(0, maxValue);
    Parallel.ForEach(partitioner, range =>
    {
        var (minValueInRange, maxValueInRange) = range;
        // Use long: a range's subtotal can overflow int.
        var subTotal = 0L;
        for (int value = minValueInRange; value < maxValueInRange; value++)
        {
            subTotal += value;
        }
        // One synchronized update per range instead of one per iteration.
        Interlocked.Add(ref sum, subTotal);
    });
}
```
Both methods calculate the sum of the first N numbers using parallel loops while updating a shared variable, `sum`.
In the first approach, every iteration contends for the shared variable, which creates a significant ‘wait delay’. The second approach uses the Partitioner to split the range into subsets, so the shared state is touched only once per subset. The benchmark results are shown below.
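To make the tuple ranges concrete, here is a minimal sketch (a standalone console program, not part of the benchmark above) that prints the `(start, end)` pairs produced by `Partitioner.Create` for a small range. The exact number and size of the ranges are chosen by the runtime and may vary by machine and core count.

```csharp
using System;
using System.Collections.Concurrent;

class PartitionerDemo
{
    static void Main()
    {
        // Partitioner.Create(fromInclusive, toExclusive) yields Tuple<int, int>
        // values, each covering a contiguous slice [Item1, Item2) of the range.
        var partitioner = Partitioner.Create(0, 100);

        foreach (var (start, end) in partitioner.GetDynamicPartitions())
        {
            // Each tuple is one chunk that a worker would iterate over locally.
            Console.WriteLine($"[{start}, {end})");
        }
    }
}
```

Iterating the partitions sequentially like this is only for illustration; `Parallel.ForEach` consumes them concurrently, handing each chunk to a worker thread.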