The Linq query syntax introduced in C# 3.0 is an extremely powerful way to manipulate collections of objects. The compiler always translates the syntax into method calls, and what happens next depends on the query provider: Linq-to-SQL, for example, turns the resulting expression trees into SQL statements. The simplest case is when you're dealing with collections of normal objects and the method calls operate on them directly; this is known as Linq-to-Objects. Although there are now hundreds of articles on the web explaining how to use the syntax for Linq-to-Objects, I haven't found any that go into the gritty details of how the query syntax is translated into the underlying method calls. It's a gap this series aims to fill.
Why should you care what's happening underneath the syntax as long as it works? Firstly, you'll understand the code you're writing; as with pointers or garbage collection, you don't need to understand it to create working code, but you can write better code if you do. Secondly, if you can write the underlying method call equivalents of the query syntax you can take advantage of some of the more powerful features not available through it, such as accessing the index of the current item. Thirdly, well, it's just plain interesting, right?
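To make that second point concrete, here is a minimal sketch of the index-aware overload of Select from System.Linq, which has no query syntax equivalent. The IndexDemo class, the NumberNames method and the sample names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexDemo
{
    // Pairs each name with its zero-based position, using the Select
    // overload whose selector receives the index as a second argument.
    public static List<string> NumberNames(IEnumerable<string> names)
    {
        return names.Select((name, index) => index + ": " + name).ToList();
    }

    public static void Main()
    {
        var names = new[] { "Alice", "Bob", "Carol" }; // hypothetical sample data
        foreach (var line in NumberNames(names))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // 0: Alice
        // 1: Bob
        // 2: Carol
    }
}
```

There is no way to get at the index from a select clause, so this is one of the cases where you have to drop down to the method call form.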
Throughout this series I'm going to use the following types to demonstrate the query syntax and translations:
using System.Collections.ObjectModel;

public class Customer
{
    public Customer() { this.Orders = new Collection<Order>(); }
    public string Name { get; set; }
    public Collection<Order> Orders { get; private set; }
}

public class Order
{
    public Order() { this.Lines = new Collection<Line>(); }
    public int Id { get; set; }
    public string CustomerName { get; set; }
    public Collection<Line> Lines { get; private set; }
}

public class Line
{
    public string ProductCode { get; set; }
    public decimal UnitPrice { get; set; }
    public int UnitCount { get; set; }
}

public class Product
{
    public string Code { get; set; }
    public string Name { get; set; }
}
I'll also assume we have the following collections which are implicitly available, having been constructed for us somewhere:
Collection<Customer> customers; // contains all customers, orders and lines
Collection<Product> products; // contains all products
In this entry I'll warm up with a trivial Linq expression that selects all the customer names:
var names = from customer in customers select customer.Name;
If we try to build this without importing the System.Linq namespace then we get the following compiler error:
error CS1061: 'System.Collections.ObjectModel.Collection<Customer>' does not contain a definition for 'Select' and no extension method 'Select' accepting a first argument of type 'System.Collections.ObjectModel.Collection<Customer>' could be found (are you missing a using directive or an assembly reference?)
This error indicates that to translate the statement the compiler is looking for a method or extension method named 'Select' on the collection. It also allows us to bust the first common myth of Linq: that the extension methods in the System.Linq namespace are somehow special. They aren't. The compiler doesn't care how this Select method is implemented, or whether it is declared on the type itself or as an extension method; it is just looking for a method named Select that it can call on the collection with a lambda expression as its argument. For a collection, a suitable extension method has the following signature:
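As a quick sketch of how little the compiler cares, the query syntax below compiles against a type that isn't a collection at all and has an ordinary instance method named Select. Box<T> and BoxDemo are hypothetical names invented for this example, and System.Linq is deliberately not imported.

```csharp
using System;

// A hypothetical single-value container. It does not implement
// IEnumerable<T>; all it offers is an instance method called Select.
public class Box<T>
{
    public Box(T value) { this.Value = value; }
    public T Value { get; private set; }

    // An ordinary instance method; the select clause binds to it.
    public Box<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new Box<TResult>(selector(this.Value));
    }
}

public static class BoxDemo
{
    public static Box<int> NameLength(Box<string> box)
    {
        // Translated by the compiler to: box.Select(name => name.Length)
        return from name in box select name.Length;
    }

    public static void Main()
    {
        var result = NameLength(new Box<string>("Greg"));
        Console.WriteLine(result.Value); // prints 4
    }
}
```

The result here is a Box<int> rather than an IEnumerable<int>, because the query simply returns whatever the Select method it binds to returns.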
public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector);
This method takes an enumerable sequence of source items and produces an enumerable sequence of result items on a 1:1 basis, using a delegate to convert each source item. Now while we could import the System.Linq namespace, which would give us access to a suitable extension method defined on the Enumerable class, we aren't going to. Just to prove that there really is nothing at all magical about Linq, we're going to write our own version of the extension methods and use those instead. Here's Select, which like any extension method has to be declared inside a non-generic static class:
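// Requires: using System; using System.Collections.Generic;
// The class name is arbitrary; extension methods just need a static class.
public static class Extensions
{
    public static IEnumerable<TResult> Select<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TResult> selector)
    {
        // Project each source item in turn through the selector delegate.
        foreach (var item in source)
        {
            yield return selector(item);
        }
    }
}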
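One consequence of writing Select with yield return is worth a quick sketch: the method is an iterator, so the selector doesn't run when Select is called, only when the result is enumerated. The MyLinq and DeferredDemo class names and the sample strings below are assumptions made for this example; the Select body is the same as the one above.

```csharp
using System;
using System.Collections.Generic;

public static class MyLinq
{
    // Same implementation as in the article.
    public static IEnumerable<TResult> Select<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TResult> selector)
    {
        foreach (var item in source)
        {
            yield return selector(item);
        }
    }
}

public static class DeferredDemo
{
    // Returns the number of selector calls made before and after enumeration.
    public static int[] CountSelectorCalls()
    {
        var calls = 0;
        var source = new List<string> { "a", "b", "c" }; // hypothetical sample data
        var projected = source.Select(s => { calls++; return s.ToUpper(); });
        var before = calls;                  // the selector has not run yet
        foreach (var item in projected) { }  // enumerating drives the iterator
        var after = calls;
        return new[] { before, after };
    }

    public static void Main()
    {
        var counts = CountSelectorCalls();
        Console.WriteLine(counts[0] + " " + counts[1]); // prints "0 3"
    }
}
```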
Now the compiler will compile the code using our Select implementation, translating the query into the following method call:
var names = customers.Select(customer => customer.Name);
Looking at the Linq query syntax along with the equivalent method call, you can see that the collection after the in keyword becomes the source for Select, the range variable after the from keyword becomes the parameter to the selector delegate, and the value after the select keyword becomes the body of the selector delegate.
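To check that mapping, the following sketch runs the query form and the method call form side by side against our own Select and compares the results. The cut-down Customer class, the MyLinq and TranslationDemo names and the sample customers are assumptions made to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Customer
{
    public string Name { get; set; }
}

public static class MyLinq
{
    public static IEnumerable<TResult> Select<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TResult> selector)
    {
        foreach (var item in source)
        {
            yield return selector(item);
        }
    }
}

public static class TranslationDemo
{
    public static string Describe()
    {
        var customers = new Collection<Customer>
        {
            new Customer { Name = "Alice" }, // hypothetical sample data
            new Customer { Name = "Bob" }
        };

        // Query syntax: 'customers' is the source, 'customer' the range variable.
        var fromQuery = from customer in customers select customer.Name;

        // Method call form: source.Select(rangeVariable => selectExpression)
        var fromMethod = customers.Select(customer => customer.Name);

        return string.Join(",", fromQuery) + "|" + string.Join(",", fromMethod);
    }

    public static void Main()
    {
        Console.WriteLine(Describe()); // prints "Alice,Bob|Alice,Bob"
    }
}
```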
That's a fairly straightforward transformation, but next time we'll look at what happens when there are multiple from clauses in a row, where it gets a bit more complicated.
Posted Mar 25 2008, 10:32 PM by Greg Beech