PERFORMANCE: IEnumerable<>.Sum

Every once in a while you have to dig deep into your application and look for problems! I have a application that wasn't performing! After searching a bit I found my problem:

I do a huge amount of queries and then sum some of the fields in these collections using the Sum extension method (System.Linq.Enumerable). This does make the code easier to read but I never realized the performance hit!!!

To test this, I wrote a very simple application with a list of employees.

class Employee
{
    public int Age { get; set; }
}

Each employee is assigned a random age

Random rnd = new Random();
List<Employee> employees = new List<Employee>();
for (int i = 0; i < 1000000; i++)
    employees.Add(new Employee() { Age = rnd.Next(18, 80) });

Next, I calculate the sum of each employee's age using 4 different methods.

Method 1: IEnumerable<>.Sum

sum = employees.Sum(e => e.Age);

Method 2: for

for (int i = 0; i < employees.Count; i++)
{
    sum += employees[ i ].Age;
}

Method 3: foreach

foreach (var e in employees)
{
    sum += e.Age;
}

Method 4: IEnumerable<>.ForEach

employees.ForEach(delegate(Employee e)
{
    sum += e.Age;
});

The results

Based on my tests, IEnumerable<>.ForEach, foreach and for performed basically the same (for was the slowest, 17% slower than IEnumerable<>.ForEach), but what was vary scary is that the IEnumerable<>.Sum was on average 92% slower!!!

Conclusion

Although the syntax of the new IEnumerable<>.Sum simplify code, the performance hit MUST not be forgotten!!!

kick it on DotNetKicks.com
Published Thursday, June 19, 2008 1:22 PM by rudi

Comments

# re: PERFORMANCE: IEnumerable<>.Sum

Have you ever stepped though it line for line, the amount of reflective calls is amazing and my guess is the reason why foreach is way faster.

Friday, June 20, 2008 12:51 PM by Mike Griffin

# re: PERFORMANCE: IEnumerable<>.Sum

And the actual numbers?

78 ms, 54ms, 55ms, 37ms. Yeah it's twice as slow, but we're talking 30 MILLISECONDS here over a million items. I wouldn't say that's a huge performance hit.

Also, there's no reflection at all in the sum method. It's a compiled extension method. The overhead is because of the lambda function call. Making a million function calls can be expensive.

Friday, June 20, 2008 2:05 PM by Tim C

# re: PERFORMANCE: IEnumerable<>.Sum

Hi Tim C,

I agree, we are talking of milliseconds... If I only needed to do 1 Sum, the simplicity of the .Sum would win! My problem is that I am doing +/- 100 sums per refresh...

The point of the article is just to say that the simplicity is not free and you should know that is is slower!

Friday, June 20, 2008 2:17 PM by rudi

# re: PERFORMANCE: IEnumerable<>.Sum

Sum is worse because it iterates over the collection twice: 1.) to select() the number properties and put them into their own IEnumerable<int> (or <long> or <short>, etc) and then 2.) to sum() that new IEnumerable.

.ForEach() is more sensible because you can do the selecting and summing in one sweep of the list.

Friday, June 20, 2008 2:39 PM by Chad Myers

# re: PERFORMANCE: IEnumerable<>.Sum

I my nativity I thought that the c# 3.0 compiler was intelligent enough to optimize the lambda function calls out and build an inline for loop.

Interesting, and good to know.

Friday, June 20, 2008 2:41 PM by Egil Hansen

# re: PERFORMANCE: IEnumerable<>.Sum

The performance difference is caused by extra stack-frames only. delegates/lamda's are slower than normal method calls because the JIT compiler has needs some extra startup-time.

If you bother about this small performance-differences, you should try profiling your application. A profiler shows the real performence problem. Doing 100 sums per web-request doesn't sound like a situation where the sum operation is the bottleneck.

Friday, June 20, 2008 7:47 PM by Paco

# re: PERFORMANCE: IEnumerable<>.Sum

Tnx Paco,

I agree, the sum wasn't my only bottlenecks and profiling helped me catch the problems...

The purpose of this post is just to highlight that on average .Sum takes twice as long as a foreach!

Friday, June 20, 2008 9:52 PM by rudi

# re: PERFORMANCE: IEnumerable<>.Sum

Were these tests done in Debug or Release mode?

Release mode should be able to inline the call.

Joe Feser

Saturday, June 21, 2008 3:42 AM by Joe Feser

# re: PERFORMANCE: IEnumerable<>.Sum

Good point, it was done in debug mode!

Will check the results in release mode

Saturday, June 21, 2008 8:53 AM by rudi

# Arjan`s World &raquo; LINKBLOG for June 21, 2008

Pingback from  Arjan`s World    &raquo; LINKBLOG for June 21, 2008

Saturday, June 21, 2008 3:16 PM by Arjan`s World » LINKBLOG for June 21, 2008

# re: PERFORMANCE: IEnumerable<>.Sum

@Chad: The Sum() function doesn't iterate over the collection twice. It simply uses a foreach to iterate through the collection once.

This is how the Sum<int> method looks like in Reflector:

public static int Sum(this IEnumerable<int> source)

{

   if (source == null)

   {

       throw Error.ArgumentNull("source");

   }

   int num = 0;

   foreach (int num2 in source)

   {

       num += num2;

   }

   return num;

}

Saturday, June 21, 2008 9:06 PM by Jeevan James

# Reflective Perspective - Chris Alcock &raquo; The Morning Brew #120

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #120

# re: PERFORMANCE: IEnumerable<>.Sum

What will change this is PLINQ (Parallel Linq), even if you don't have a multi-core PC, by using the AsParallel enumeration extension method, you'll make sure that when you're application does get run on a multicore machine (which in the next 3-5 years is an inevitably) you're code will run as fast as possible.

Tuesday, June 24, 2008 2:42 PM by Jon von Gillern