2009 07 10 flattening sub queries in from clauses

Published on July 10th, 2009 at 14:28

Flattening Sub-Queries in From Clauses

Following up on my recent post on the progress we are making with re-linq, I’d like to describe a particular query transformation we implemented last week: flattening of sub-queries in from clauses.

To understand the problem, take a look at the following query:

var query = from c in Customers
            from oi in (from o in c.Orders
                        where o.OrderNumber > 100
                        from oi1 in o.OrderItems
                        select oi1)
            where oi.Product == "Comb"
            select oi;

This is a query that holds a sub-query in its second from clause. re-linq represents this as an AdditionalFromClause with a SubQueryExpression in its FromExpression. But where’s the point?

Well, that query could just as well be written as follows:

var query = from c in Customers
            from o in c.Orders
            where o.OrderNumber > 100
            from oi in o.OrderItems
            where oi.Product == "Comb"
            select oi;

This second query is semantically equivalent to the query above – both select

all order items for “Combs”
of orders with numbers greater than 100
from all customers.

But there is a huge difference in complexity – the first code has a dependent subquery (i.e. the subquery accesses a variable from the outer scope), which is difficult to translate to many query systems, like, for example, SQL. The second query, on the other hand, is very easy to translate to SQL.

That’s why we’ve pre-implemented this transformation – you can find it here:
Remotion.Data.Linq.Transformations.SubQueryFromClauseFlattener.

There’s a caveat, however: The automatic flattening only works if the subquery contains neither result operators (such as Distinct, Take, or GroupBy) nor OrderByClauses. Why? Because a Distinct or OrderByClause can’t simply be pulled into the outer query – that would change the semantics of the query. Don’t believe me? Try it out!

Using the power of Linq-to-Objects:

var values = new[] { 2, 1, 1 };

var query1 = from i1 in values
             from i2 in (from i3 in values select i3).Distinct ()
             select new { i1, i2 };

var query2 = (from i1 in values
              from i3 in values
              select new { i1, i3 }).Distinct ();

Console.WriteLine ("Inner Distinct: "query1.Aggregate ("", (s, r) => s + r.ToString()));
Console.WriteLine ("Outer Distinct: "query2.Aggregate ("", (s, r) => s + r.ToString()));

var query3 = from i1 in values
             from i2 in (from i3 in values orderby i3 select i3)
             select new { i1, i2 };

var query4 = from i1 in values
             from i3 in values
             orderby i3
             select new { i1, i3 };

Console.WriteLine ("Inner OrderBy: query3.Aggregate ("", (s, r) => s + r.ToString ()));
Console.WriteLine ("Outer OrderBy: query4.Aggregate ("", (s, r) => s + r.ToString ()));

This yields the following result (formatted and annotated by me):

Inner Distinct: { i1 = 2, i2 = 2 }{ i1 = 2, i2 = 1 }{ i1 = 1, i2 = 2 }{ i1 = 1, i2 = 1 }
                { i1 = 1, i2 = 2 } { i1 = 1, i2 = 1 }  
Outer Distinct: { i1 = 2, i3 = 2 }{ i1 = 2, i3 = 1 }{ i1 = 1, i3 = 2 }{ i1 = 1, i3 = 1 }

[Much fewer items with an outer distinct!]

Inner OrderBy: { i1 = 2, i2 = 1 }{ i1 = 2, i2 = 1 }{ i1 = 2, i2 = 2 }{ i1 = 1, i2 = 1 }
               { i1 = 1, i2 = 1 }{ i1 = 1, i2 = 2 }{ i1 = 1, i2 = 1 }{ i1 = 1, i2 = 1 }
               { i1 = 1, i2 = 2 }  
Outer OrderBy: { i1 = 2, i3 = 1 }{ i1 = 2, i3 = 1 }{ i1 = 1, i3 = 1 }{ i1 = 1, i3 = 1 }
               { i1 = 1, i3 = 1 }{ i1 = 1, i3 = 1 }{ i1 = 2, i3 = 2 }{ i1 = 1, i3 = 2 }
               { i1 = 1, i3 = 2 }

If you think about it, it’s logical: result operators and OrderByClauses can completely change (or at least reorder) the whole result set of the query. Of course it makes a difference if you only apply them to a part of the data or to all of it.

Therefore, SubQueryFromClauseFlattener will throw an exception if it encounters either a result operator or an OrderByClause in the outer QueryModel.

One last thing: In the code sample at the beginning of this post, why did I put the sub-query into an AdditionalFromClause rather than the MainFromClause? Hint: The answer lies in the expression tree…

2009 07 10 flattening sub queries in from clauses

Flattening Sub-Queries in From Clauses

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally