-
Notifications
You must be signed in to change notification settings - Fork 58
2009 07 10 flattening sub queries in from clauses
Published on July 10th, 2009 at 14:28
Following up on my recent post on the progress we are making with re-linq, I’d like to describe a particular query transformation we implemented last week: flattening of sub-queries in from clauses.
To understand the problem, take a look at the following query:
var query = from c in Customers
from oi in (from o in c.Orders
where o.OrderNumber > 100
from oi1 in o.OrderItems
select oi1)
where oi.Product == "Comb"
select oi;
This is a query that holds a sub-query in its second from clause. re-linq represents this as an AdditionalFromClause with a SubQueryExpression in its FromExpression. But where’s the point?
Well, that query could just as well be written as follows:
var query = from c in Customers
from o in c.Orders
where o.OrderNumber > 100
from oi in o.OrderItems
where oi.Product == "Comb"
select oi;
This second query is semantically equivalent to the query above – both select
- all order items for “Combs”
- of orders with numbers greater than 100
- from all customers.
But there is a huge difference in complexity – the first code has a dependent subquery (i.e. the subquery accesses a variable from the outer scope), which is difficult to translate to many query systems, like, for example, SQL. The second query, on the other hand, is very easy to translate to SQL.
That’s why we’ve pre-implemented this transformation – you can find it here:
Remotion.Data.Linq.Transformations.SubQueryFromClauseFlattener.
There’s a caveat, however: The automatic flattening only works if the subquery contains neither result operators (such as Distinct, Take, or GroupBy) nor OrderByClauses. Why? Because a Distinct or OrderByClause can’t simply be pulled into the outer query – that would change the semantics of the query. Don’t believe me? Try it out!
Using the power of Linq-to-Objects:
var values = new[] { 2, 1, 1 };
var query1 = from i1 in values
from i2 in (from i3 in values select i3).Distinct ()
select new { i1, i2 };
var query2 = (from i1 in values
from i3 in values
select new { i1, i3 }).Distinct ();
Console.WriteLine ("Inner Distinct: "query1.Aggregate ("", (s, r) => s + r.ToString()));
Console.WriteLine ("Outer Distinct: "query2.Aggregate ("", (s, r) => s + r.ToString()));
var query3 = from i1 in values
from i2 in (from i3 in values orderby i3 select i3)
select new { i1, i2 };
var query4 = from i1 in values
from i3 in values
orderby i3
select new { i1, i3 };
Console.WriteLine ("Inner OrderBy: query3.Aggregate ("", (s, r) => s + r.ToString ()));
Console.WriteLine ("Outer OrderBy: query4.Aggregate ("", (s, r) => s + r.ToString ()));
This yields the following result (formatted and annotated by me):
Inner Distinct: { i1 = 2, i2 = 2 }{ i1 = 2, i2 = 1 }{ i1 = 1, i2 = 2 }{ i1 = 1, i2 = 1 }
{ i1 = 1, i2 = 2 } { i1 = 1, i2 = 1 }
Outer Distinct: { i1 = 2, i3 = 2 }{ i1 = 2, i3 = 1 }{ i1 = 1, i3 = 2 }{ i1 = 1, i3 = 1 }
[Much fewer items with an outer distinct!]
Inner OrderBy: { i1 = 2, i2 = 1 }{ i1 = 2, i2 = 1 }{ i1 = 2, i2 = 2 }{ i1 = 1, i2 = 1 }
{ i1 = 1, i2 = 1 }{ i1 = 1, i2 = 2 }{ i1 = 1, i2 = 1 }{ i1 = 1, i2 = 1 }
{ i1 = 1, i2 = 2 }
Outer OrderBy: { i1 = 2, i3 = 1 }{ i1 = 2, i3 = 1 }{ i1 = 1, i3 = 1 }{ i1 = 1, i3 = 1 }
{ i1 = 1, i3 = 1 }{ i1 = 1, i3 = 1 }{ i1 = 2, i3 = 2 }{ i1 = 1, i3 = 2 }
{ i1 = 1, i3 = 2 }
If you think about it, it’s logical: result operators and OrderByClauses can completely change (or at least reorder) the whole result set of the query. Of course it makes a difference if you only apply them to a part of the data or to all of it.
Therefore, SubQueryFromClauseFlattener will throw an exception if it encounters either a result operator or an OrderByClause in the outer QueryModel.
One last thing: In the code sample at the beginning of this post, why did I put the sub-query into an AdditionalFromClause rather than the MainFromClause? Hint: The answer lies in the expression tree…