This is the third entry in a series demonstrating how C# 3.0 query syntax is translated into the underlying method calls for Linq-to-Objects, and that you can write your own versions of the Linq extension methods. The previous entries were:
Now that we know how basic queries to select and project data are translated, we'll start looking at the more powerful things you can do with Linq. One of the most common operations is filtering the results with a where clause. For example, if we wanted to get all customers who have not placed any orders, we could use the following statement to retrieve them:
var items = from customer in customers where customer.Orders.Count == 0 select customer;
Here the compiler looks for a method called Where that takes a predicate, which is evaluated against each item to decide whether it should be returned. A simple implementation is:
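// Must be declared in a top-level, non-generic static class to work as an extension method.
public static IEnumerable<TSource> Where<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> predicate)
{
    // Yield only the items for which the predicate returns true.
    foreach (var item in source)
    {
        if (predicate(item))
        {
            yield return item;
        }
    }
}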
The Linq query is translated using this method to:
var items = customers.Where(customer => customer.Orders.Count == 0);
It's a straightforward transformation: the range variable becomes the parameter of the predicate, and the expression after the where keyword becomes its body. The only thing that might seem a little odd is that the select customer part doesn't have a corresponding Select call. This is because the Where method already returns a sequence of the correct type, so the Select would be a no-op (simply projecting customer to customer). The C# specification calls this a degenerate select, and the compiler removes it when it follows another query clause such as this where.
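As a quick check of the translation, here is a minimal sketch that compiles the query against a hand-written Where. It deliberately leaves out using System.Linq so that the Where below is the only candidate, and it defines no Select at all. The Customer and Order classes, the MyEnumerable and WhereDemo names, and the sample data are hypothetical stand-ins for your own types.

```csharp
using System;
using System.Collections.Generic;
// No "using System.Linq;": the Where below is the only query operator in scope.

// Hypothetical domain types standing in for your own model classes.
public class Order
{
}

public class Customer
{
    public string Name { get; set; } = "";
    public List<Order> Orders { get; } = new List<Order>();
}

public static class MyEnumerable
{
    // The Where operator from the article; note that no Select is defined.
    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, bool> predicate)
    {
        foreach (var item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}

public static class WhereDemo
{
    // Alice has no orders; Bob has one.
    public static List<Customer> SampleCustomers()
    {
        var alice = new Customer { Name = "Alice" };
        var bob = new Customer { Name = "Bob" };
        bob.Orders.Add(new Order());
        return new List<Customer> { alice, bob };
    }

    // Query syntax: binds to MyEnumerable.Where, and the trailing
    // "select customer" is dropped, so no Select method is needed.
    public static List<string> ViaQuery(List<Customer> customers)
    {
        var items = from customer in customers
                    where customer.Orders.Count == 0
                    select customer;
        return Names(items);
    }

    // The method call the compiler produces for the query above.
    public static List<string> ViaMethod(List<Customer> customers)
    {
        var items = customers.Where(customer => customer.Orders.Count == 0);
        return Names(items);
    }

    // Enumerates the sequence (running the deferred Where) and collects names.
    private static List<string> Names(IEnumerable<Customer> items)
    {
        var names = new List<string>();
        foreach (var customer in items)
        {
            names.Add(customer.Name);
        }
        return names;
    }
}
```

Both ViaQuery and ViaMethod return just "Alice" for the sample data. Adding from customer in customers select customer to this sketch should fail to compile with an error saying Select was not found: a query consisting only of a degenerate select is still translated into a Select call, so that a query expression never evaluates to the source collection itself.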

Things get a little more interesting when we have multiple from clauses before the where clause, e.g.
var items =
    from customer in customers
    from order in customer.Orders
    where order.Lines.Count > 0
    select order;
We saw last time that multiple from clauses translate into SelectMany calls, and that when they are chained an anonymous tuple is built up containing all of the preceding range variables (the C# specification describes this using transparent identifiers). Applying this principle, the compiler translates the query to:
var items = customers
    .SelectMany(
        customer => customer.Orders,
        (customer, order) => new { customer, order })
    .Where(
        tuple => tuple.order.Lines.Count > 0)
    .Select(
        tuple => tuple.order);
The anonymous tuple produced by SelectMany is passed into the Where method, so the predicate can access any range variable and the compiler rewrites each reference as a member of the tuple (order becomes tuple.order). The translation also has to call Select, because a sequence of anonymous tuples is not the desired result type of the expression; each tuple has to be projected back into an order.
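To see this translation work against hand-written operators, here is a minimal sketch that again omits using System.Linq and supplies the three operators the query needs: SelectMany with a result selector, Where and Select. It runs both the query and the hand-translated method chain over the same data. The class names, the Order.Id property and the sample data are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
// No "using System.Linq;": the query must bind to the operators defined below.

// Hypothetical domain types; Id exists only so results are easy to check.
public class Order
{
    public int Id { get; set; }
    public List<string> Lines { get; } = new List<string>();
}

public class Customer
{
    public List<Order> Orders { get; } = new List<Order>();
}

public static class MyEnumerable
{
    // Filters the source with a predicate.
    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        foreach (var item in source)
        {
            if (predicate(item)) yield return item;
        }
    }

    // Projects each item of the source.
    public static IEnumerable<TResult> Select<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        foreach (var item in source) yield return selector(item);
    }

    // SelectMany with a result selector: the form consecutive from clauses use.
    public static IEnumerable<TResult> SelectMany<TSource, TCollection, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TCollection>> collectionSelector,
        Func<TSource, TCollection, TResult> resultSelector)
    {
        foreach (var item in source)
        {
            foreach (var child in collectionSelector(item))
            {
                yield return resultSelector(item, child);
            }
        }
    }
}

public static class SelectManyDemo
{
    // Orders 1 and 3 have lines; order 2 is empty.
    public static List<Customer> SampleCustomers()
    {
        var first = new Customer();
        first.Orders.Add(new Order { Id = 1, Lines = { "widget" } });
        first.Orders.Add(new Order { Id = 2 });
        var second = new Customer();
        second.Orders.Add(new Order { Id = 3, Lines = { "gadget" } });
        return new List<Customer> { first, second };
    }

    public static List<int> ViaQuery(List<Customer> customers)
    {
        var items =
            from customer in customers
            from order in customer.Orders
            where order.Lines.Count > 0
            select order;
        return Ids(items);
    }

    // The translation written out by hand; the anonymous type plays the
    // role of the tuple carrying both range variables.
    public static List<int> ViaMethods(List<Customer> customers)
    {
        var items = customers
            .SelectMany(
                customer => customer.Orders,
                (customer, order) => new { customer, order })
            .Where(tuple => tuple.order.Lines.Count > 0)
            .Select(tuple => tuple.order);
        return Ids(items);
    }

    // Enumerates the orders and collects their ids.
    private static List<int> Ids(IEnumerable<Order> orders)
    {
        var ids = new List<int>();
        foreach (var order in orders) ids.Add(order.Id);
        return ids;
    }
}
```

Both methods return the ids 1 and 3. If the Select operator is removed from MyEnumerable, the query form should no longer compile, because its translation ends with a Select call that projects each tuple back to an order.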
It's interesting to note that the compiler's translation of the Linq query isn't optimal. It could determine that neither the where clause nor the select clause uses the customer range variable, and make SelectMany project each order directly rather than wrapping it in an anonymous tuple. That would avoid one allocation per order in SelectMany and allow us to discard the Select call entirely, as the sequence would already be of the desired type:
var items = customers
    .SelectMany(
        customer => customer.Orders,
        (customer, order) => order)
    .Where(
        order => order.Lines.Count > 0);
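One way to convince yourself that the hand-optimized chain is equivalent is to run it alongside the compiler's translation. This sketch uses the standard System.Linq operators and counts the anonymous tuples the translated version creates, which is the per-order allocation the optimized version avoids. The types, the OptimizationDemo name and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain types; Id exists only so results are easy to compare.
public class Order
{
    public int Id { get; set; }
    public List<string> Lines { get; } = new List<string>();
}

public class Customer
{
    public List<Order> Orders { get; } = new List<Order>();
}

public static class OptimizationDemo
{
    // Orders 1 and 3 have lines; order 2 is empty.
    public static List<Customer> SampleCustomers()
    {
        var first = new Customer();
        first.Orders.Add(new Order { Id = 1, Lines = { "widget" } });
        first.Orders.Add(new Order { Id = 2 });
        var second = new Customer();
        second.Orders.Add(new Order { Id = 3, Lines = { "gadget" } });
        return new List<Customer> { first, second };
    }

    // The compiler's translation, instrumented to count the tuples it allocates.
    public static List<int> Translated(List<Customer> customers, out int tuplesCreated)
    {
        int created = 0;
        var orders = customers
            .SelectMany(
                customer => customer.Orders,
                (customer, order) =>
                {
                    created++;
                    return new { customer, order };
                })
            .Where(tuple => tuple.order.Lines.Count > 0)
            .Select(tuple => tuple.order);
        // ToList forces enumeration, so the count is final afterwards.
        // The extra Select only extracts ids for comparison.
        var ids = orders.Select(order => order.Id).ToList();
        tuplesCreated = created;
        return ids;
    }

    // The hand-optimized chain: SelectMany yields orders directly, so no
    // tuples are created and no Select is needed to unwrap them.
    public static List<int> Optimized(List<Customer> customers)
    {
        var orders = customers
            .SelectMany(
                customer => customer.Orders,
                (customer, order) => order)
            .Where(order => order.Lines.Count > 0);
        // As above, this Select only extracts ids for comparison.
        return orders.Select(order => order.Id).ToList();
    }
}
```

With the sample data both methods return orders 1 and 3. The translated chain creates three tuples (one per order, whether or not the order passes the filter), while the optimized chain creates none.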
Even so, I'd still recommend using the query syntax because it's clearer, unless the code is in an extremely performance-critical part of your application or you're dealing with very large collections where the extra allocations might matter. The query translations are likely to improve in future versions of the C# compiler, so live with it for now and enjoy a free performance upgrade later on.
Next time we'll see how orderby clauses are translated.
Posted Apr 03 2008, 12:41 AM by Greg Beech