Greg Beech's Website

Translating C# 3.0 query syntax for Linq-to-Objects, Part 3: Where

This is the third entry in a series demonstrating how C# 3.0 query syntax is translated into the underlying method calls for Linq-to-Objects, and showing that you can write your own versions of the Linq extension methods.

Now that we know how basic queries to select and project data are translated, we'll start looking at the more powerful things you can do with Linq. One of the most common operations is to filter the results using a where clause. For example, if we wanted to get all customers that have not placed an order, we could use the following statement to retrieve them:

var items = from customer in customers where customer.Orders.Count == 0 select customer;

Here the compiler is looking for a method called Where that takes a predicate, which is applied to each item to determine whether it should be returned:

public static IEnumerable<TSource> Where<TSource>(
    this IEnumerable<TSource> source, 
    Func<TSource, bool> predicate)
{
    foreach (var item in source)
    {
        if (predicate(item))
        {
            yield return item;
        }
    }
}

The Linq query is translated using this method to:

var items = customers.Where(customer => customer.Orders.Count == 0);

It's a straightforward transformation: the range variable becomes the parameter of the predicate, and the expression after the where keyword becomes its body. The only thing that might seem a little odd is that the select customer part doesn't have any corresponding Select call. This is because the Where method already returns a sequence of the correct type, so the Select would be a no-op (simply projecting customer to customer) and the compiler omits it.
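To see the binding in action, here is a minimal sketch that compiles the query against a hand-written Where rather than the one in System.Linq. The Customer class, the MyEnumerable and WhereDemo names, the PredicateCalls counter and the sample data are all assumptions made up for illustration; only the Where signature comes from the listing above. Everything is declared inside a namespace so that this Where is the closest candidate the compiler finds.

```csharp
using System;
using System.Collections.Generic;
// Deliberately no "using System.Linq;" so the query can only bind to our Where.

namespace WhereDemo
{
    // Hypothetical model type; orders are plain strings to keep the sketch short.
    public class Customer
    {
        public string Name { get; set; }
        public List<string> Orders { get; set; }
    }

    public static class MyEnumerable
    {
        // Counts predicate evaluations so we can see this method is the one in use.
        public static int PredicateCalls;

        public static IEnumerable<TSource> Where<TSource>(
            this IEnumerable<TSource> source,
            Func<TSource, bool> predicate)
        {
            foreach (var item in source)
            {
                PredicateCalls++;
                if (predicate(item))
                {
                    yield return item;
                }
            }
        }
    }

    public static class Program
    {
        public static List<Customer> SampleCustomers()
        {
            return new List<Customer>
            {
                new Customer { Name = "Alice", Orders = new List<string> { "A1" } },
                new Customer { Name = "Bob", Orders = new List<string>() },
                new Customer { Name = "Carol", Orders = new List<string>() }
            };
        }

        public static List<string> NamesWithoutOrders(IEnumerable<Customer> customers)
        {
            // Translated to customers.Where(customer => customer.Orders.Count == 0);
            // no Select method is needed for this query to compile.
            var items = from customer in customers
                        where customer.Orders.Count == 0
                        select customer;

            var names = new List<string>();
            foreach (var customer in items)
            {
                names.Add(customer.Name);
            }
            return names;
        }

        public static void Main()
        {
            var names = NamesWithoutOrders(SampleCustomers());
            Console.WriteLine(string.Join(", ", names.ToArray())); // Bob, Carol
            Console.WriteLine(MyEnumerable.PredicateCalls);        // 3
        }
    }
}
```

Note that MyEnumerable defines no Select method at all, yet the query still compiles, which is consistent with the select clause being dropped from the translation.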

[Figure: Query syntax translation illustration]

Things get a little more interesting when we have multiple from clauses before the where clause, for example:

var items =
    from customer in customers
    from order in customer.Orders
    where order.Lines.Count > 0
    select order;

We saw last time that multiple from clauses translate into SelectMany calls, and when they are chained an anonymous tuple is built up containing all of the preceding range variables. Using this principle, the compiler's translation is equivalent to:

var items = customers
    .SelectMany(
        customer => customer.Orders,
        (customer, order) => new { customer, order })
    .Where(
        tuple => tuple.order.Lines.Count > 0)
    .Select(
        tuple => tuple.order);

The anonymous tuple resulting from SelectMany is passed into the Where method, which means you can access any range variable within the predicate and the compiler will extract it for you. The translation also has to make a call to Select as the anonymous tuple is not the desired return type of the expression, so it has to be projected back into an order.
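As a rough check of this translation, the sketch below runs the query syntax and the explicit method chain over the same data and prints the matching order ids. The Order and Customer classes, the Id property, the MultipleFromDemo namespace and the sample data are assumptions invented for the example; the two query shapes are taken from the listings above, and the standard System.Linq operators are used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace MultipleFromDemo
{
    // Hypothetical model types; only the members the queries touch are defined.
    public class Order
    {
        public int Id { get; set; }
        public List<string> Lines { get; set; }
    }

    public class Customer
    {
        public string Name { get; set; }
        public List<Order> Orders { get; set; }
    }

    public static class Program
    {
        public static List<Customer> SampleCustomers()
        {
            return new List<Customer>
            {
                new Customer
                {
                    Name = "Alice",
                    Orders = new List<Order>
                    {
                        new Order { Id = 1, Lines = new List<string> { "Widget" } },
                        new Order { Id = 2, Lines = new List<string>() }
                    }
                },
                new Customer
                {
                    Name = "Bob",
                    Orders = new List<Order>
                    {
                        new Order { Id = 3, Lines = new List<string> { "Gadget", "Gizmo" } }
                    }
                }
            };
        }

        public static List<int> ViaQuerySyntax(List<Customer> customers)
        {
            var items =
                from customer in customers
                from order in customer.Orders
                where order.Lines.Count > 0
                select order;
            return items.Select(order => order.Id).ToList();
        }

        public static List<int> ViaMethodCalls(List<Customer> customers)
        {
            var items = customers
                .SelectMany(
                    customer => customer.Orders,
                    (customer, order) => new { customer, order })
                .Where(
                    tuple => tuple.order.Lines.Count > 0)
                .Select(
                    tuple => tuple.order);
            return items.Select(order => order.Id).ToList();
        }

        private static string Format(List<int> ids)
        {
            return string.Join(", ", ids.Select(id => id.ToString()).ToArray());
        }

        public static void Main()
        {
            var customers = SampleCustomers();
            Console.WriteLine(Format(ViaQuerySyntax(customers))); // 1, 3
            Console.WriteLine(Format(ViaMethodCalls(customers))); // 1, 3
        }
    }
}
```

Order 2 is dropped by both forms because it has no lines, while the customer half of each pair is carried through the Where call without being read.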

[Figure: Translation with multiple from clauses to where]

It's interesting to note that the compiler's translation of the Linq query isn't optimal; it could determine that the predicate does not access the customer range variable and make SelectMany project an order rather than the anonymous tuple. This would mean fewer object allocations by SelectMany and allow us to discard the Select call entirely as the collection would already be of the desired type:

var items = customers
    .SelectMany(
        customer => customer.Orders,
        (customer, order) => order)
    .Where(
        order => order.Lines.Count > 0);

Even so, I'd still recommend using the query syntax because it's clearer, unless this is in an extremely performance-critical part of your application or you're dealing with very large collections where allocations might be an issue. The query translations are likely to be better in future versions of the C# compiler, so live with it for now and enjoy a free performance upgrade later on.
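If you do decide to hand-optimise, it's worth confirming that the shorter chain returns the same results, and remembering that it only applies when the predicate ignores the earlier range variables. The sketch below is one way to check this; the model classes, the HandOptimisedDemo namespace, the method names and the sample data are all hypothetical, and the third query (which filters on the customer's name) is an invented case where the pair has to be kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace HandOptimisedDemo
{
    // Hypothetical model types; only the members used below are defined.
    public class Order
    {
        public int Id { get; set; }
        public List<string> Lines { get; set; }
    }

    public class Customer
    {
        public string Name { get; set; }
        public List<Order> Orders { get; set; }
    }

    public static class Program
    {
        public static List<Customer> SampleCustomers()
        {
            var alice = new Customer { Name = "Alice", Orders = new List<Order>() };
            alice.Orders.Add(new Order { Id = 1, Lines = new List<string> { "Widget" } });
            alice.Orders.Add(new Order { Id = 2, Lines = new List<string>() });

            var bob = new Customer { Name = "Bob", Orders = new List<Order>() };
            bob.Orders.Add(new Order { Id = 3, Lines = new List<string> { "Gadget" } });

            return new List<Customer> { alice, bob };
        }

        // Same shape as the compiler's translation: build a pair, filter, unwrap.
        public static List<int> CompilerShape(List<Customer> customers)
        {
            return customers
                .SelectMany(customer => customer.Orders, (customer, order) => new { customer, order })
                .Where(tuple => tuple.order.Lines.Count > 0)
                .Select(tuple => tuple.order)
                .Select(order => order.Id)
                .ToList();
        }

        // Hand-optimised: project straight to the order, so no pair and no unwrapping Select.
        public static List<int> HandOptimised(List<Customer> customers)
        {
            return customers
                .SelectMany(customer => customer.Orders, (customer, order) => order)
                .Where(order => order.Lines.Count > 0)
                .Select(order => order.Id)
                .ToList();
        }

        // Here the predicate reads customer as well, so the pair must survive until Where.
        public static List<int> PredicateNeedsCustomer(List<Customer> customers)
        {
            var items =
                from customer in customers
                from order in customer.Orders
                where customer.Name != "Bob" && order.Lines.Count > 0
                select order;
            return items.Select(order => order.Id).ToList();
        }

        private static string Format(List<int> ids)
        {
            return string.Join(", ", ids.Select(id => id.ToString()).ToArray());
        }

        public static void Main()
        {
            var customers = SampleCustomers();
            Console.WriteLine(Format(CompilerShape(customers)));          // 1, 3
            Console.WriteLine(Format(HandOptimised(customers)));          // 1, 3
            Console.WriteLine(Format(PredicateNeedsCustomer(customers))); // 1
        }
    }
}
```

This only checks that the results match for one small data set; it makes no attempt to measure the allocation difference.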

Next time we'll see how orderby clauses are translated.


Posted Apr 03 2008, 12:41 AM by Greg Beech

Comments

Frans Bouma wrote re: Translating C# 3.0 query syntax for Linq-to-Objects, Part 3: Where
on 04-04-2008 5:00 PM

Isn't all this documented in the C# 3.0 spec document, chapter 7? You can find a lot of the info you're describing here in that chapter, i.e. what is translated into what.

Yep, the official documentation is section 7.15.2. It's certainly useful from an implementation standpoint, and is technically correct of course (probably more so than my descriptions), but I just didn't find it that readable, or find that it had a logical progression through building up more complex queries.

As with most things, the information is already available somewhere else. I was just hoping to present it in a more accessible format, and explore what the various methods are actually doing under the covers so people (mainly myself if I'm honest) can fully understand the code they're writing.

The implementations have been pretty simple so far but they get more interesting once we start ordering and grouping - Greg

Greg Beech's Tech Blog wrote Translating C# 3.0 query syntax for Linq-to-Objects, Part 4: Let
on 04-21-2008 8:20 PM

This is the fourth entry in a series demonstrating how C# 3.0 query syntax is translated into the underlying

Copyright (C) Greg Beech. All rights reserved.