Can someone explain the relationship between implementing WITH (…) AS X … and integer linear programs?
I wish this post had more examples of concrete non-tree queries. Is cross-joining a CTE to itself realistic?
Imo this discussion seems to assume multiple references of a CTE contain many of the same rows with different transforms which are later joined.. what’s an actual example of this? I don’t think I’ve ever seen it
Usually CTEs are shorthand for doing a few joins. And when you reference them multiple times, it’s almost always with different rows in the CTE table. AKA little-no reuse potential. Postgres WITH AS NOT MATERIALIZED of course
More deserves to be written on the ILP idea, I haven't actually tried to make it work but it seems to me like the only real direction to optimize ("optimize" used in the mathematical sense, rather than the programming sense) queries that you can't just exploit the principle of optimality on (see this earlier article I wrote [1] for a bit more exposition). I think maybe some of the egg [2] people have experimented with this.
Speaking from experience, there's lots of rewrites you'd want to do in a query optimizer that having access to efficient DAG-shaped query plans would make tenable, but as a specific example they are an important part of doing full subquery decorrelation [3].
Also just in general the idea of at least being worst case optimal (polynomial faster multi way join-project w.r.t. input rows, assuming fixed query (but those polynomials are given by concepts like the "fractional hypertree width" of a query) but pathologically connected/connecting input rows to the join):
the simplest example is a simple two-column table of a directed graph's edges, e(from, to). Then:
Triangle(a, b, c) := e(a, b), e(b, c), e(c, a).
#Triangle <= c * (#e)^1.5, for a small c. (I think c = 3 in this case, but I'm not sure.)
Using binary joins for pathological input will however take O((#e)^2). If you do row-granularity lookup using an auxiliary index (actually 2 in the simplified way, but they can be merged by pushing the post-lookup choice down to index creation) along with indexing e twice (once for each of the two columns), you can do it in O((#e)^1.5):
Create auxiliary indices eaux1 and eaux2:
eaux1(from, (#e(from, to))).
eaux2(to, (#e(from, to))).
Iterate over e(a, b):
Look up eaux1(b, ?countbc) and eaux2(a, ? countca).
If countbc < countca, then:
expand through point lookup e(b, ?c). Filter those resulting e(?a, b), e(b, ?c) candidates using a point membership query suitable index of e(c, a).
else:
expand through point lookup e(?c, a). Filter those resulting e(a, ?b), e(?c, a) candidates using a point membership query suitable index of e(b, c).
Done.
As can be seen, the trick is to select between the current options for "next table to join current partial row against" at each nesting level of the generally nested-loop-join structured query execution, using premade indices that reveal the iteration count of the upcoming inner nested loop in O(1) time.
Yes! Technically you have to say `WITH RECURSIVE ...` first, but yes, it's a very common thing to do to, for example, compute the transitive closure of nested groups. Just remember to use `UNION` not `UNION ALL`! (Otherwise you will get an infinite loop if there's a circular group.)
WITH RECURSIVE closure AS (
// Seed the closure
SELECT parent AS parent, child AS child
FROM groups
// Repeat the right side of the UNION
// until no new rows are added
UNION
// Recursively join the closure to the
// groups
SELECT g.parent, c.child
FROM groups g
JOIN closure c ON g.child = c.parent
);
I wish this post had more examples of concrete non-tree queries. Is cross-joining a CTE to itself realistic?
Imo this discussion seems to assume multiple references of a CTE contain many of the same rows with different transforms which are later joined.. what’s an actual example of this? I don’t think I’ve ever seen it
Usually CTEs are shorthand for doing a few joins. And when you reference them multiple times, it’s almost always with different rows in the CTE table. AKA little-no reuse potential. Postgres WITH AS NOT MATERIALIZED of course