PERFORMANCE: IEnumerable<>.Sum - Rudi Grobler
Thursday, June 19, 2008 1:22 PM rudi

PERFORMANCE: IEnumerable<>.Sum

Every once in a while you have to dig deep into your application and look for problems! I have a application that wasn't performing! After searching a bit I found my problem:

I do a huge amount of queries and then sum some of the fields in these collections using the Sum extension method (System.Linq.Enumerable). This does make the code easier to read but I never realized the performance hit!!!

To test this, I wrote a very simple application with a list of employees.

class Employee
{
    public int Age { get; set; }
}

Each employee is assigned a random age

Random rnd = new Random();
List<Employee> employees = new List<Employee>();
for (int i = 0; i < 1000000; i++)
    employees.Add(new Employee() { Age = rnd.Next(18, 80) });

Next, I calculate the sum of each employee's age using 4 different methods.

Method 1: IEnumerable<>.Sum

sum = employees.Sum(e => e.Age);

Method 2: for

for (int i = 0; i < employees.Count; i++)
{
    sum += employees[ i ].Age;
}

Method 3: foreach

foreach (var e in employees)
{
    sum += e.Age;
}

Method 4: IEnumerable<>.ForEach

employees.ForEach(delegate(Employee e)
{
    sum += e.Age;
});

The results

Based on my tests, IEnumerable<>.ForEach, foreach and for performed basically the same (for was the slowest, 17% slower than IEnumerable<>.ForEach), but what was vary scary is that the IEnumerable<>.Sum was on average 92% slower!!!

Conclusion

Although the syntax of the new IEnumerable<>.Sum simplify code, the performance hit MUST not be forgotten!!!

kick it on DotNetKicks.com Filed under: , , ,

Comments

# re: PERFORMANCE: IEnumerable<>.Sum

Friday, June 20, 2008 12:51 PM by Mike Griffin

Have you ever stepped though it line for line, the amount of reflective calls is amazing and my guess is the reason why foreach is way faster.

# re: PERFORMANCE: IEnumerable<>.Sum

Friday, June 20, 2008 2:05 PM by Tim C

And the actual numbers?

78 ms, 54ms, 55ms, 37ms. Yeah it's twice as slow, but we're talking 30 MILLISECONDS here over a million items. I wouldn't say that's a huge performance hit.

Also, there's no reflection at all in the sum method. It's a compiled extension method. The overhead is because of the lambda function call. Making a million function calls can be expensive.

# re: PERFORMANCE: IEnumerable<>.Sum

Friday, June 20, 2008 2:17 PM by rudi

Hi Tim C,

I agree, we are talking of milliseconds... If I only needed to do 1 Sum, the simplicity of the .Sum would win! My problem is that I am doing +/- 100 sums per refresh...

The point of the article is just to say that the simplicity is not free and you should know that is is slower!

# re: PERFORMANCE: IEnumerable<>.Sum

Friday, June 20, 2008 2:39 PM by Chad Myers

Sum is worse because it iterates over the collection twice: 1.) to select() the number properties and put them into their own IEnumerable<int> (or <long> or <short>, etc) and then 2.) to sum() that new IEnumerable.

.ForEach() is more sensible because you can do the selecting and summing in one sweep of the list.

# re: PERFORMANCE: IEnumerable<>.Sum

Friday, June 20, 2008 2:41 PM by Egil Hansen

I my nativity I thought that the c# 3.0 compiler was intelligent enough to optimize the lambda function calls out and build an inline for loop.

Interesting, and good to know.

# re: PERFORMANCE: IEnumerable<>.Sum

Friday, June 20, 2008 7:47 PM by Paco

The performance difference is caused by extra stack-frames only. delegates/lamda's are slower than normal method calls because the JIT compiler has needs some extra startup-time.

If you bother about this small performance-differences, you should try profiling your application. A profiler shows the real performence problem. Doing 100 sums per web-request doesn't sound like a situation where the sum operation is the bottleneck.

# re: PERFORMANCE: IEnumerable<>.Sum

Friday, June 20, 2008 9:52 PM by rudi

Tnx Paco,

I agree, the sum wasn't my only bottlenecks and profiling helped me catch the problems...

The purpose of this post is just to highlight that on average .Sum takes twice as long as a foreach!

# re: PERFORMANCE: IEnumerable<>.Sum

Saturday, June 21, 2008 3:42 AM by Joe Feser

Were these tests done in Debug or Release mode?

Release mode should be able to inline the call.

Joe Feser

# re: PERFORMANCE: IEnumerable<>.Sum

Saturday, June 21, 2008 8:53 AM by rudi

Good point, it was done in debug mode!

Will check the results in release mode

# Arjan`s World &raquo; LINKBLOG for June 21, 2008

Saturday, June 21, 2008 3:16 PM by Arjan`s World » LINKBLOG for June 21, 2008

Pingback from  Arjan`s World    &raquo; LINKBLOG for June 21, 2008

# re: PERFORMANCE: IEnumerable<>.Sum

Saturday, June 21, 2008 9:06 PM by Jeevan James

@Chad: The Sum() function doesn't iterate over the collection twice. It simply uses a foreach to iterate through the collection once.

This is how the Sum<int> method looks like in Reflector:

public static int Sum(this IEnumerable<int> source)

{

   if (source == null)

   {

       throw Error.ArgumentNull("source");

   }

   int num = 0;

   foreach (int num2 in source)

   {

       num += num2;

   }

   return num;

}

# Reflective Perspective - Chris Alcock &raquo; The Morning Brew #120

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #120

# re: PERFORMANCE: IEnumerable<>.Sum

Tuesday, June 24, 2008 2:42 PM by Jon von Gillern

What will change this is PLINQ (Parallel Linq), even if you don't have a multi-core PC, by using the AsParallel enumeration extension method, you'll make sure that when you're application does get run on a multicore machine (which in the next 3-5 years is an inevitably) you're code will run as fast as possible.