Skip to content

Commit 07e70c5

Browse files
Write up draft
1 parent 50501b3 commit 07e70c5

File tree

1 file changed

+192
-2
lines changed

1 file changed

+192
-2
lines changed
Lines changed: 192 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,194 @@
11
= Functions v/s rules
22

3-
[placeholder]
4-
Comparing 3.x functions to 2.x rules
3+
TypeDB 3 introduces functions to reason over your data.
4+
They replace rules, which were the way to reason in TypeDB 2.
5+
Functions work in a similar way to rules, providing an intuitive abstraction over subqueries. They should be able to replace them in most cases.
6+
This page discusses the similarities, differences and factors to consider when moving from rules to functions.
7+
8+
== Separating data & computation
9+
Whereas rules were a natural fit for predicate-logic based semantics of TypeDB 2,
10+
rules are better suited for TypeDB 3's type-theory based semantics.
11+
The major difference to the user is that there is now a separation between the data and reasoning machinery.
12+
Rules allowed the user to think reason purely in terms of the data-model and silently completed the data with inferred concepts.
13+
Functions force the user to separate reasoning from the data-model, making the computation involved in the query more explicit and the process more transparent overall.
14+
15+
Although this shift sounds significant, the truth is most advanced rules required the user to think of the computation involved for efficiency reasons.
16+
17+
== Similarities
18+
Although one would ideally think natively in functions, it is easy to see the similarities between rules and functions.
19+
20+
E.g. A rule completing a `reachable` relation transitively, can be seen as a function which computes all pairs of reachable nodes.
21+
Conversely, the function can be seen as special type of relation where the roles are implicitly specified by the position of arguments or returned concepts.
22+
23+
[,typeql]
24+
----
25+
define
26+
rule transitive-reachability:
27+
when {
28+
{
29+
(from: $from, to: $to) isa edge;
30+
} or {
31+
(from: $from, to: $via) isa edge;
32+
(from: $via, to: $to) isa reachable;
33+
};
34+
} then {
35+
(from: $from, to: $to) isa reachable;
36+
};
37+
38+
# query
39+
match
40+
$x isa node; $y isa node;
41+
(from: $x, to: $y) isa reachable;
42+
----
43+
44+
[,typeql]
45+
----
46+
define
47+
fun reachable() -> { node, node }:
48+
match
49+
{
50+
(from: $from, to: $to) isa edge;
51+
} or {
52+
let $from, $via in reachable();
53+
(from: $via, to: $to) isa edge;
54+
};
55+
return { $from, $to };
56+
57+
# query
58+
match
59+
$x isa node; $y isa node;
60+
let $x, $y in reachable();
61+
----
62+
63+
An equally valid way is to see a function as a predicate computing whether one node is a reachable from the other.
64+
[,typeql]
65+
----
66+
define
67+
fun reachable($from: node, $to: node) -> bool:
68+
match
69+
{
70+
(from: $from, to: $to) isa edge;
71+
} or {
72+
(from: $from, to: $via) isa edge;
73+
true == reachable($via, $to);
74+
};
75+
return check;
76+
77+
# query
78+
match
79+
$x isa node; $y isa node;
80+
true == reachable($x, $y);
81+
----
82+
83+
== Arguments & return values
84+
From the example, there are multiple ways of expressing the same rule as a function.
85+
This is understandable given (mathematically) a relation is a collection of tuples,
86+
whereas a function is a mapping from the input domain to the output range.
87+
So there is some ambiguity as to which members are inputs and which are outputs.
88+
89+
This depends on the context - what needs to be computed in the query.
90+
We may want to enumerate the pairs of nodes for which one is reachable from the other,
91+
check whether a node is reachable from the other,
92+
or even enumerate the nodes reachable from a given node:
93+
94+
[,typeql]
95+
----
96+
define
97+
fun reachable_from($from: node) -> { node }:
98+
match
99+
{
100+
(from: $from, to: $to) isa edge;
101+
} or {
102+
let $via in reachable_from($from);
103+
(from: $via, to: $to) isa edge;
104+
};
105+
return { $to };
106+
107+
# query
108+
match
109+
$x isa node, has id "123";
110+
$y isa node;
111+
let $y in reachable_from($x);
112+
----
113+
114+
Currently, each of these use cases would need a different function to be defined.
115+
The arguments are inputs to the function and must be bound by the rest of the query.
116+
The function will bind the returned values to the variables on the left of `in`.
117+
118+
119+
== Partial evaluation of functions
120+
Although the split between arguments and returned concepts is a reasonable consequence of
121+
having the user think about the computation explicitly, it's less declarative than the rules
122+
where the planner would infer which role-players of a relation should be bound before evaluating the rules.
123+
In the future, we hope to introduce "partial functions" where the planner may choose to leave some
124+
it to the function to bind some of the arguments instead of requiring them as outputs.
125+
126+
== What functions can do
127+
Since functions aren't forced to return relations (or ownerships) which are defined in the schema,
128+
they are more flexible and can also operate on raw values, including structs (when they are implemented).
129+
Since they don't infer new instances, they also avoid the overhead
130+
of making the result of the subquery abide by the rules governing the inferred types.
131+
In short, they're a really simple way of doing reasoning.
132+
133+
Functions are still evaluated on demand in a goal-driven way.
134+
This means we don't materialize all the results of the function when the data is updated.
135+
Instead, we only compute the calls relevant to the query being evaluated.
136+
The advantage of a goal-driven approaches over an eagerly materializing one is that they support very large models
137+
(in theory infinitely large models - i.e. an infinite number of inferred concepts).
138+
A simple illustration is this toy function that produces natural numbers.
139+
140+
[,typeql]
141+
----
142+
with fn nat() -> { integer }:
143+
match
144+
{let $x = 0;} or
145+
{ let $x in nat() + 1; };
146+
return { $x };
147+
148+
match let $x in nat();
149+
----
150+
151+
Since TypeDB's grpc endpoint supports reactive streaming,
152+
this query will stream them out natural numbers on demand.
153+
This isn't quite an infinite domain, since we are limited to the domain of the `integer` type which is a signed 64-bit integer.
154+
Understandable, since this is the case with most programming languages -
155+
but it does bring us to the next section.
156+
157+
=== What functions can't do (yet)
158+
To the best of our knowledge, TypeDB in its current state can't do "arbitrarily large" models.
159+
160+
This limitation is unlikely to be practically relevant, since most uses of functions only need to compute tuples of concepts which exist in the database -
161+
that means combination of finite sets - combinatorially large but still finite.
162+
163+
It's still (theoretically) interesting for two reasons - (1) TypeDB 2 could do it; (2) A feature complete TypeDB 3 can too.
164+
165+
==== How did TypeDB 2 do it? Nested relations.
166+
In theory, one can infer an arbitrary number of nested relations,
167+
because you can always infer a new one which relates the one you just inferred.
168+
169+
In particular this makes state-space search on "open" worlds possible.
170+
For games like the Rubik's cube, it is possible to assign each state a number (not necessarily one that fits in 64 bits).
171+
For open worlds such as an arbitrary block's world, this doesn't work since the number of "objects" of each type aren't fixed.
172+
You would instead represent a state as a relation involving the previous state and the action taken.
173+
174+
==== How will TypeDB 3 solve this problem? Lists.
175+
Lists can be arbitrarily long. A state can be represented by specifying the initial state,
176+
and the list of actions which brings us to this state.
177+
178+
179+
[NOTE]
180+
====
181+
Modern computers have finite memory so none of this is truly infinite.
182+
The difference between the finiteness of TypeDB 3's model without and with lists
183+
is that the former is constrained by TypeQL's limited expressivity,
184+
while the latter is constrained by the limits of the underlying computer.
185+
====
186+
187+
[NOTE]
188+
====
189+
This is based on ideas & results from the deductive databases field.
190+
A lot of research there is conducted on datalog -
191+
a "syntactic subset" of prolog that bans compound terms.
192+
Lists are central to prolog and are implemented as recursive compound terms.
193+
The exclusion of compound terms has implications on the decidability of datalog.
194+
====

0 commit comments

Comments
 (0)