Greg Beech's Website

Translating C# 3.0 query syntax for Linq-to-Objects, Part 2: SelectMany

This is the second entry in a series demonstrating how C# 3.0 query syntax is translated into the underlying method calls for Linq-to-Objects, and showing that you can write your own versions of the Linq extension methods. In the first entry I outlined the types and collections I'd be using for this series, and looked at how a simple Linq query with a single from clause translates to the Select extension method.

This translation is very useful for converting collections using a 1:1 mapping; however, it isn't so useful when there is a 1:many relationship. For example, if we wanted to get all the orders by selecting customer.Orders instead of customer.Id, we'd end up with a return type of IEnumerable<Collection<Order>>, whereas what we'd really like is an IEnumerable<Order>.
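To make the type problem concrete, here is a minimal sketch. The Customer and Order classes are simplified stand-ins for the types from the first entry (their exact shape is an assumption), the Select method is a plain 1:1 version written out so System.Linq isn't needed, and the NestedResultDemo name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Simplified stand-ins for the types introduced in the first entry (assumed shape).
public class Order
{
    public int Id { get; set; }
}

public class Customer
{
    private readonly Collection<Order> orders = new Collection<Order>();
    public int Id { get; set; }
    public Collection<Order> Orders { get { return orders; } }
}

public static class SelectExtensions
{
    // A minimal 1:1 Select, written out so that System.Linq is not required.
    public static IEnumerable<TResult> Select<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TResult> selector)
    {
        foreach (var item in source)
        {
            yield return selector(item);
        }
    }
}

public static class NestedResultDemo
{
    public static IEnumerable<Collection<Order>> OrdersPerCustomer(IEnumerable<Customer> customers)
    {
        // One element per customer, and each element is a whole collection of orders.
        return from customer in customers select customer.Orders;
    }
}
```

Note the declared return type: the query yields one collection per customer rather than a flat sequence of orders, so a caller would need a nested loop to reach each individual Order.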

Fortunately, the Linq query syntax allows multiple from clauses, so we can return individual orders and get the desired return type:

var items = from customer in customers from order in customer.Orders select order;

Unfortunately, when we try to compile this we get an error indicating that the compiler can't translate the query using our Select method, and instead wants a method named 'SelectMany'.

error CS1935: Could not find an implementation of the query pattern for source type 'System.Collections.ObjectModel.Collection<Customer>'.  'SelectMany' not found.  Are you missing a reference to 'System.Core.dll' or a using directive for 'System.Linq'?

SelectMany is a more complex version of Select. Rather than mapping a source item directly to a result item, it maps each source item to a collection of intermediate items, and then maps each pairing of a source item with one of its intermediate items to a single result item. The results for all source items are returned as one flat sequence:

public static IEnumerable<TResult> SelectMany<TSource, TCollection, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TCollection>> collectionSelector,
    Func<TSource, TCollection, TResult> resultSelector)
{
    foreach (var item in source)
    {
        foreach (var intermediateItem in collectionSelector(item))
        {
            yield return resultSelector(item, intermediateItem);
        }
    }
}
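As a quick sanity check of the method above, the following sketch calls it directly on plain strings rather than on the series' customer types. The class and method names (SequenceExtensions, SelectManyDemo, PairWordsWithLetters) are hypothetical, and System.Linq is deliberately not imported so that the call binds to the hand-written method.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceExtensions
{
    // Same shape as the method above, repeated so this sketch compiles on its own.
    public static IEnumerable<TResult> SelectMany<TSource, TCollection, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TCollection>> collectionSelector,
        Func<TSource, TCollection, TResult> resultSelector)
    {
        foreach (var item in source)
        {
            foreach (var intermediateItem in collectionSelector(item))
            {
                yield return resultSelector(item, intermediateItem);
            }
        }
    }
}

public static class SelectManyDemo
{
    public static List<string> PairWordsWithLetters(IEnumerable<string> words)
    {
        // A string is an IEnumerable<char>, so each word acts as its own
        // intermediate collection; the second delegate sees the word and one letter.
        IEnumerable<string> pairs = words.SelectMany(
            word => (IEnumerable<char>)word,
            (word, letter) => word + ":" + letter);

        // Copying into a list forces the lazy iterator to run.
        return new List<string>(pairs);
    }
}
```

Calling PairWordsWithLetters with the words "ab" and "c" gives the flat list "ab:a", "ab:b", "c:c": one result per (source item, intermediate item) pairing, in source order.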

The equivalent method call to the Linq query, as translated by the compiler, is as follows:

var items = customers.SelectMany(customer => customer.Orders, (customer, order) => order);

So we can see that the transformation here is a bit more complex. Once again the first collection after the in keyword is the source for the statement, but this time, rather than mapping to the value after the select keyword, the first delegate maps to the value after the second in keyword. The second delegate then maps from the first range variable (customer) and the second range variable (order) to the value after the select keyword.

[Figure: query translation illustration]
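The role of the second delegate is easier to see when the select clause uses both range variables. The sketch below is an illustration under assumptions: Customer and Order are simplified stand-ins for the types from the first entry, TwoFromDemo and its methods are hypothetical names, and the SelectMany method is repeated so the example compiles without System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Simplified stand-ins for the types introduced in the first entry (assumed shape).
public class Order
{
    public int Id { get; set; }
}

public class Customer
{
    private readonly Collection<Order> orders = new Collection<Order>();
    public int Id { get; set; }
    public Collection<Order> Orders { get { return orders; } }
}

public static class SequenceExtensions
{
    public static IEnumerable<TResult> SelectMany<TSource, TCollection, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TCollection>> collectionSelector,
        Func<TSource, TCollection, TResult> resultSelector)
    {
        foreach (var item in source)
        {
            foreach (var intermediateItem in collectionSelector(item))
            {
                yield return resultSelector(item, intermediateItem);
            }
        }
    }
}

public static class TwoFromDemo
{
    public static List<string> WithQuerySyntax(IEnumerable<Customer> customers)
    {
        // The expression after select becomes the body of the second delegate.
        var items = from customer in customers
                    from order in customer.Orders
                    select customer.Id + "/" + order.Id;
        return new List<string>(items);
    }

    public static List<string> WithMethodSyntax(IEnumerable<Customer> customers)
    {
        // Hand-written equivalent of the query above.
        var items = customers.SelectMany(
            customer => customer.Orders,
            (customer, order) => customer.Id + "/" + order.Id);
        return new List<string>(items);
    }
}
```

Both methods should return the same list for the same input, since the query form is only a different spelling of the method call.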

What if we had a more complex statement with a third from clause? This clearly can't translate directly to a single SelectMany call, as it has two intermediate collections rather than one, and yet the following statement compiles successfully:

var items =
    from customer in customers
    from order in customer.Orders
    from line in order.Lines
    select line;

The key here is chaining multiple SelectMany calls together:

var items = customers
    .SelectMany(
        customer => customer.Orders,
        (customer, order) => new { customer, order })
    .SelectMany(
        tuple => tuple.order.Lines,
        (tuple, line) => line);

The first SelectMany works in much the same way as the simpler case, but instead of directly selecting the result it creates an instance of an anonymous type, acting as a tuple, that contains both of the range variables. The second SelectMany then takes each tuple, uses it to reach the second intermediate collection (order.Lines), and produces the final result given after the select keyword. You can really start to see the benefit of the query syntax, as it handles all the chaining with anonymous types for you in a much cleaner way.

[Figure: multiple from query translation illustration]

From this we can formulate the more general translation: when there are multiple from clauses in a row, the first two are processed as a pair by SelectMany. Each subsequent from clause translates to a chained SelectMany call that is fed with the result of the previous SelectMany, with each successive range variable aggregated into an anonymous tuple. The SelectMany call for the final from clause yields the result defined after the select keyword.
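The general rule can be checked with a small sketch in which the select clause uses every range variable, so the values carried along in the anonymous type are visible. Plain integer sequences are used instead of the series' customer types; ThreeFromDemo and its method names are hypothetical, and the SelectMany method is repeated so the example compiles without System.Linq.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceExtensions
{
    public static IEnumerable<TResult> SelectMany<TSource, TCollection, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TCollection>> collectionSelector,
        Func<TSource, TCollection, TResult> resultSelector)
    {
        foreach (var item in source)
        {
            foreach (var intermediateItem in collectionSelector(item))
            {
                yield return resultSelector(item, intermediateItem);
            }
        }
    }
}

public static class ThreeFromDemo
{
    public static List<int> WithQuerySyntax(
        IEnumerable<int> xs, IEnumerable<int> ys, IEnumerable<int> zs)
    {
        var items = from x in xs
                    from y in ys
                    from z in zs
                    select x * 100 + y * 10 + z;
        return new List<int>(items);
    }

    public static List<int> WithMethodSyntax(
        IEnumerable<int> xs, IEnumerable<int> ys, IEnumerable<int> zs)
    {
        var items = xs
            // The first two from clauses are handled as a pair; x and y are
            // bundled into an anonymous type so both stay reachable later.
            .SelectMany(x => ys, (x, y) => new { x, y })
            // The last from clause produces the select expression, reading
            // the earlier range variables back out of the bundle.
            .SelectMany(pair => zs, (pair, z) => pair.x * 100 + pair.y * 10 + z);
        return new List<int>(items);
    }
}
```

With xs = { 1, 2 }, ys = { 3 } and zs = { 4, 5 }, both methods return 134, 135, 234, 235. A fourth from clause would add one more SelectMany to the chain, with the second call bundling pair and z into a further anonymous type instead of computing the result.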

That's it for Select and SelectMany - next time we'll have a look at filtering the results.


Posted Mar 29 2008, 10:44 AM by Greg Beech


Copyright (C) Greg Beech. All rights reserved.