Commit 02e9565

Functions vs rules (#982)
## Goal
Document the similarities and differences between 2.x rules and 3.x functions, and explain how functions should be equally expressive while being slightly more flexible.
1 parent 1211d25 commit 02e9565

3 files changed

+202
-0
lines changed

core-concepts/modules/ROOT/pages/typeql/query-variables-patterns.adoc

Lines changed: 1 addition & 0 deletions
@@ -187,6 +187,7 @@ A variable is "bound" in a disjunction if it is bound in every branch of the dis
 Variables which are present in & bound by only some of the branches of the disjunction
 (and not present outside it) are considered "internal" to that disjunction.
 
+[,typeql]
 ----
 #!test[read, count=2]
 match
Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
= Functions vs rules

TypeDB 3 introduces functions as the way to reason over your data.
They replace rules, which served that purpose in TypeDB 2.
Functions work similarly to rules, providing an intuitive abstraction over subqueries, and should be able to replace them in most cases.
This page discusses the similarities, the differences, and the factors to consider when moving from rules to functions.

== Separating data & computation
Whereas rules were a natural fit for the predicate-logic-based semantics of TypeDB 2,
functions are better suited to TypeDB 3's type-theory-based semantics.
The major difference for the user is that data and reasoning constructs are now separate.
Rules allowed the user to reason purely in terms of the data model: they silently completed the data by inferring instances of existing types.
Functions force the user to separate reasoning from the data model by explicitly defining a function for each computation that infers something new.

Although this shift sounds significant, most advanced rules already required the user to think about the computation involved, for efficiency reasons.

== Similarities
Although one would ideally think natively in functions (rather than in terms of rules), the similarities between rules and functions are easy to see.

For example, a rule that completes a `reachable` relation transitively can be seen as a function that computes all pairs of nodes where one is reachable from the other.
Conversely, the function can be seen as a special type of relation in which the roles are implicitly specified by the position of the arguments or returned concepts.

[,typeql]
----
define
rule transitive-reachability:
when {
  {
    (from: $from, to: $to) isa edge;
  } or {
    (from: $from, to: $via) isa edge;
    (from: $via, to: $to) isa reachable;
  };
} then {
  (from: $from, to: $to) isa reachable;
};

# query
match
$x isa node; $y isa node;
(from: $x, to: $y) isa reachable;
----

[,typeql]
----
define
fun reachable() -> { node, node }:
match
  {
    (from: $from, to: $to) isa edge;
  } or {
    let $from, $via in reachable();
    (from: $via, to: $to) isa edge;
  };
return { $from, $to };

# query
match
$x isa node; $y isa node;
let $x, $y in reachable();
----

An equally valid view is to treat the function as a predicate that checks whether one node is reachable from the other.

[,typeql]
----
define
fun reachable($from: node, $to: node) -> bool:
match
  {
    (from: $from, to: $to) isa edge;
  } or {
    (from: $from, to: $via) isa edge;
    true == reachable($via, $to);
  };
return check;

# query
match
$x isa node; $y isa node;
true == reachable($x, $y);
----

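To make the correspondence concrete, here is a minimal Python sketch (not TypeDB code) of what the rule and both functions above compute, assuming a small hypothetical edge set. `reachable_pairs` mirrors the two branches of the disjunction and iterates to a fixpoint; `is_reachable` is the predicate view over the same result. All names are illustrative, and the naive fixpoint loop says nothing about how TypeDB actually evaluates functions.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]

# Hypothetical sample data: d -> a -> b -> c.
EDGES: Set[Edge] = {("a", "b"), ("b", "c"), ("d", "a")}


def reachable_pairs(edges: Set[Edge]) -> Set[Edge]:
    """Enumerate every (from, to) pair, like `fun reachable() -> { node, node }`."""
    # Base branch: every edge is itself a reachable pair.
    result = set(edges)
    while True:
        # Recursive branch: extend a known pair (from, via) by an edge (via, to).
        derived = {
            (frm, to)
            for (frm, via) in result
            for (src, to) in edges
            if via == src
        }
        if derived <= result:
            return result  # Fixpoint reached: nothing new can be derived.
        result |= derived


def is_reachable(edges: Set[Edge], frm: str, to: str) -> bool:
    """Predicate view, like `fun reachable($from, $to) -> bool`."""
    return (frm, to) in reachable_pairs(edges)
```

Both views are derived from the same set of tuples; they differ only in which positions the caller supplies and which it receives.
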

== Arguments & return values
As the examples show, the same rule can be expressed as a function in several ways.
This is understandable: mathematically, a relation is a collection of tuples,
whereas a function is a mapping from an input domain to an output range.
So there is some ambiguity as to which concepts are arguments and which are returned.

The choice depends on what the query needs to compute.
Does it need to enumerate the pairs of nodes for which one is reachable from the other?
Or check whether one node is reachable from another?
Or enumerate the nodes reachable from a given node, as the function below does?

[,typeql]
----
define
fun reachable_from($from: node) -> { node }:
match
  {
    (from: $from, to: $to) isa edge;
  } or {
    let $via in reachable_from($from);
    (from: $via, to: $to) isa edge;
  };
return { $to };

# query
match
$x isa node, has id "123";
$y isa node;
let $y in reachable_from($x);
----

Currently, each of these use cases needs its own function definition.
The arguments are inputs to the function and must be bound by the rest of the query.
The function binds the returned values to the variables on the left of `in`.

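The effect of binding an argument can be illustrated outside TypeDB. The Python sketch below is an analogy rather than TypeDB's evaluation strategy: it computes only the nodes reachable from one given start node, so it never explores parts of the graph the caller did not ask about. The graph data and the helper name `reachable_from` are assumptions made for this example.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical adjacency list; the "x -> y" component is unrelated to "a".
GRAPH: Dict[str, List[str]] = {
    "a": ["b"],
    "b": ["c"],
    "c": [],
    "x": ["y"],
    "y": [],
}


def reachable_from(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from `start` via one or more edges."""
    seen: Set[str] = set()
    queue = deque(graph.get(start, []))  # direct successors: the base case
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already reported; also guards against cycles
        seen.add(node)
        queue.extend(graph.get(node, []))  # follow one more edge
    return seen
```

Here `start` plays the role of the bound argument and the returned set plays the role of the values bound to the variable on the left of `in`.
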

=== Contextually bound functions
Although the split between arguments and returned concepts is a reasonable consequence of
having the user think about the computation explicitly, it is less declarative than rules,
where the planner would infer which role-players of a relation should be bound when evaluating the rule.
A future version of TypeDB will introduce "contextually bound functions", where the planner may choose which
variables are inputs to the function and which are outputs, allowing a single function definition
to satisfy all the cases discussed above.

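As a rough illustration of the idea only (the feature is not yet available, and nothing here reflects its eventual design), the Python sketch below uses a single hypothetical definition, `reachable`, in which either end may be bound or left unbound (`None`). The edge data and all names are assumptions. For brevity it computes the full closure and then filters, which a real planner would presumably avoid.

```python
from typing import Optional, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical sample data: a -> b -> c.
EDGES: Set[Pair] = {("a", "b"), ("b", "c")}


def _closure(edges: Set[Pair]) -> Set[Pair]:
    """Transitive closure of `edges` by naive fixpoint iteration."""
    result = set(edges)
    while True:
        derived = {(f, t) for (f, v) in result for (s, t) in edges if v == s}
        if derived <= result:
            return result
        result |= derived


def reachable(frm: Optional[str] = None, to: Optional[str] = None) -> Set[Pair]:
    """One definition, several binding patterns: None means 'unbound'."""
    return {
        (f, t)
        for (f, t) in _closure(EDGES)
        if frm in (None, f) and to in (None, t)
    }
```

Calling `reachable()` enumerates all pairs, `reachable(frm="a")` enumerates what is reachable from `a`, and `reachable("a", "c")` acts as a check: a non-empty result means the pair holds.
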

== What functions can do
Since functions aren't forced to return relations (or ownerships) defined in the schema,
they are more flexible and can also operate on raw values, including structs (once these are implemented).
Since they don't infer new instances, they also avoid the overhead
of making the result of the computation abide by the rules governing the inferred types.
In short, they are a simple way of doing reasoning.

Functions are still evaluated on demand, in a goal-driven way.
This means we don't materialize all the results of a function when the data is updated.
Instead, we only compute the calls relevant to the query being evaluated.
The advantage of a goal-driven approach over an eagerly materializing one is that it supports very large models
(in theory, infinitely large models, i.e. models with an infinite number of inferred concepts).
A simple illustration is this toy function, which produces the natural numbers.

[,typeql]
----
with fun nat() -> { integer }:
match
  { let $x = 0; } or
  { let $y in nat(); let $x = $y + 1; };
return { $x };

match let $x in nat();
----

Since TypeDB's gRPC endpoint also supports reactive streaming,
this query will stream out natural numbers on demand.

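The on-demand behaviour can be mimicked with a lazy generator. The Python sketch below is an analogy, not TypeDB's execution model: `nat` mirrors the two branches of the toy function, and the consumer decides how many results to pull. The helper `first_n` is a name introduced here for illustration.

```python
from itertools import islice
from typing import Iterator, List


def nat() -> Iterator[int]:
    """Lazily produce 0, 1, 2, ... following the two branches of the toy function."""
    yield 0              # first branch: $x = 0
    for x in nat():      # second branch: every result of nat(), plus one
        yield x + 1


def first_n(n: int) -> List[int]:
    """Pull only the first `n` results; nothing beyond them is computed."""
    return list(islice(nat(), n))
```

Each further result adds one level of generator nesting, so this toy version is only suitable for small `n`; it is meant to show demand-driven evaluation, not an efficient counter.
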

This isn't quite an infinite domain, since we are limited to the range of the `integer` type, which is a signed 64-bit integer.
This is understandable, since most programming languages have the same limitation,
but it does bring us to the next section.

=== What functions can't do (yet)
TypeDB in its current state can't represent "arbitrarily large" models.

This limitation is unlikely to be practically relevant,
since most uses of functions only need to compute tuples of concepts which exist in the database.
Since such tuples are drawn from a combination of finite sets, the result is combinatorially large but still finite.

The problem is still (theoretically) interesting for two reasons: (1) TypeDB 2 could do it; (2) a feature-complete TypeDB 3 could too.

==== How did TypeDB 2 do it? Nested relations.
In theory, one can infer an arbitrary number of nested relations,
because you can always infer a new relation which relates the one you just inferred.

In particular, this makes state-space search on "open" worlds possible.
For games like the Rubik's Cube, it is possible to assign each state a number (not necessarily one that fits in 64 bits).
For open worlds such as an arbitrary blocks world, this doesn't work, since the number of "objects" of each type isn't fixed.
You would instead represent a state as a relation involving the previous state and the action taken.

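A minimal Python sketch of this encoding, with hypothetical names throughout: each `Step` plays the part of an inferred relation linking the previous state to the action taken. Since a `Step` can always wrap another `Step`, the nesting depth is unbounded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """Stand-in for an inferred relation: (previous state, action taken)."""
    previous: Optional["Step"]  # None stands for the initial state
    action: str


def history(state: Optional[Step]) -> List[str]:
    """Unwind the nesting to recover the actions in the order they were taken."""
    actions: List[str] = []
    while state is not None:
        actions.append(state.action)
        state = state.previous
    actions.reverse()
    return actions


# Build a state by nesting one step inside the next (illustrative actions).
state: Optional[Step] = None
for action in ["pick up A", "stack A on B", "pick up C"]:
    state = Step(state, action)
```
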

==== How will TypeDB 3 do it? Lists.
Lists can be arbitrarily long. A state can be represented by specifying the initial state
and the list of actions that leads from it to this state.

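The list-based encoding can be sketched in Python, assuming a toy blocks world in which the only action is moving a block onto another block or onto the table. A state is the pair (initial configuration, list of actions), and the concrete configuration is recovered by replaying the actions. The `Move` type, the `replay` helper and the sample data are invented for this example.

```python
from typing import Dict, List, NamedTuple


class Move(NamedTuple):
    """Hypothetical action: put `block` onto `onto` (a block name or "table")."""
    block: str
    onto: str


# A configuration maps each block to whatever it currently rests on.
Config = Dict[str, str]


def replay(initial: Config, actions: List[Move]) -> Config:
    """Apply the actions in order to a copy of the initial configuration.

    No legality checks are performed; this only shows the representation.
    """
    config = dict(initial)
    for move in actions:
        config[move.block] = move.onto
    return config


INITIAL: Config = {"A": "table", "B": "table", "C": "A"}
ACTIONS: List[Move] = [Move("C", "table"), Move("A", "B"), Move("C", "A")]
```

Because the action list can grow without a fixed bound, so can the set of distinct states that this representation can name.
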

[NOTE]
====
Modern computers have finite memory, so none of this is truly infinite.
The difference between the finiteness of TypeDB 3's model without and with lists
is that the former is constrained by the limits of TypeQL's expressivity,
while the latter is constrained by the limits of the underlying computer.
====

[NOTE]
====
This is based on ideas & results from the field of deductive databases.
Much of the research there is conducted on Datalog,
a "syntactic subset" of Prolog that bans compound terms.
Lists are central to Prolog and are implemented as recursive compound terms.
The exclusion of compound terms has implications for the decidability of Datalog.
====

reference/modules/ROOT/pages/typeql/functions/index.adoc

Lines changed: 6 additions & 0 deletions
@@ -64,4 +64,10 @@ Learn how to work with stream-return functions.
 ****
 Learn how to work with scalar functions.
 ****
+
+.xref:{page-version}@reference::typeql/functions/functions-vs-rules.adoc[]
+[.clickable]
+****
+How do functions differ from TypeDB 2 rules?
+****
 --
