|
| 1 | += Functions v/s rules |
| 2 | + |
| 3 | +TypeDB 3 introduces functions to reason over your data. |
| 4 | +They replace rules, which were the way to reason in TypeDB 2. |
| 5 | +Functions work in a similar way to rules, providing an intuitive abstraction over subqueries. They should be able to replace them in most cases. |
| 6 | +This page discusses the similarities, differences and factors to consider when moving from rules to functions. |
| 7 | + |
| 8 | +== Separating data & computation |
| 9 | +Whereas rules were a natural fit for predicate-logic based semantics of TypeDB 2, |
| 10 | +functions are better suited for TypeDB 3's type-theory based semantics. |
| 11 | +The major difference to the user is that there is now a separation between the data and reasoning constructs. |
| 12 | +Rules allowed the user to think reason purely in terms of the data-model by silently completing the data by inferring instances of existing types. |
| 13 | +Functions force the user to separate reasoning from the data-model, by explicitly creating a new function for any new computation that infers something new. |
| 14 | + |
| 15 | +Although this shift sounds significant, most advanced rules do require the user to think of the computation involved for efficiency reasons. |
| 16 | + |
| 17 | +== Similarities |
| 18 | +Although one would ideally think natively in functions (rather than thinking of them in terms of rules) it is easy to see the similarities between rules and functions. |
| 19 | + |
| 20 | +E.g. A rule completing a `reachable` relation transitively, can be seen as a function which computes all pairs of nodes where one is reachable from the other. |
| 21 | +Conversely, the function can be seen as special type of relation where the roles are implicitly specified by the position of arguments or returned concepts. |
| 22 | + |
| 23 | +[,typeql] |
| 24 | +---- |
| 25 | +define |
| 26 | +rule transitive-reachability: |
| 27 | +when { |
| 28 | + { |
| 29 | + (from: $from, to: $to) isa edge; |
| 30 | + } or { |
| 31 | + (from: $from, to: $via) isa edge; |
| 32 | + (from: $via, to: $to) isa reachable; |
| 33 | + }; |
| 34 | +} then { |
| 35 | + (from: $from, to: $to) isa reachable; |
| 36 | +}; |
| 37 | +
|
| 38 | +# query |
| 39 | +match |
| 40 | + $x isa node; $y isa node; |
| 41 | + (from: $x, to: $y) isa reachable; |
| 42 | +---- |
| 43 | + |
| 44 | +[,typeql] |
| 45 | +---- |
| 46 | +define |
| 47 | +fun reachable() -> { node, node }: |
| 48 | +match |
| 49 | + { |
| 50 | + (from: $from, to: $to) isa edge; |
| 51 | + } or { |
| 52 | + let $from, $via in reachable(); |
| 53 | + (from: $via, to: $to) isa edge; |
| 54 | + }; |
| 55 | +return { $from, $to }; |
| 56 | +
|
| 57 | +# query |
| 58 | +match |
| 59 | + $x isa node; $y isa node; |
| 60 | + let $x, $y in reachable(); |
| 61 | +---- |
| 62 | + |
| 63 | +An equally valid way is to see a function as a predicate computing whether one node is a reachable from the other. |
| 64 | +[,typeql] |
| 65 | +---- |
| 66 | +define |
| 67 | +fun reachable($from: node, $to: node) -> bool: |
| 68 | +match |
| 69 | + { |
| 70 | + (from: $from, to: $to) isa edge; |
| 71 | + } or { |
| 72 | + (from: $from, to: $via) isa edge; |
| 73 | + true == reachable($via, $to); |
| 74 | + }; |
| 75 | +return check; |
| 76 | +
|
| 77 | +# query |
| 78 | +match |
| 79 | + $x isa node; $y isa node; |
| 80 | + true == reachable($x, $y); |
| 81 | +---- |
| 82 | + |
| 83 | +== Arguments & return values |
| 84 | +From the example, there are multiple ways of expressing the same rule as a function. |
| 85 | +This is understandable given (mathematically) a relation is a collection of tuples, |
| 86 | +whereas a function is a mapping from the input domain to the output range. |
| 87 | +So there is some ambiguity as to which concepts are arguments and which are to be returned. |
| 88 | + |
| 89 | +The choice depends on what the query needs to compute. |
| 90 | +Does it need to enumerate the pairs of nodes for which one is reachable from the other? |
| 91 | +Or check whether a node is reachable from the other? |
| 92 | +Or even enumerate the nodes reachable from a given node, as the function below does? |
| 93 | + |
| 94 | +[,typeql] |
| 95 | +---- |
| 96 | +define |
| 97 | +fun reachable_from($from: node) -> { node }: |
| 98 | +match |
| 99 | + { |
| 100 | + (from: $from, to: $to) isa edge; |
| 101 | + } or { |
| 102 | + let $via in reachable_from($from); |
| 103 | + (from: $via, to: $to) isa edge; |
| 104 | + }; |
| 105 | +return { $to }; |
| 106 | +
|
| 107 | +# query |
| 108 | +match |
| 109 | + $x isa node, has id "123"; |
| 110 | + $y isa node; |
| 111 | + let $y in reachable_from($x); |
| 112 | +---- |
| 113 | + |
| 114 | +Currently, each of these use cases would need a different function to be defined. |
| 115 | +The arguments are inputs to the function and must be bound by the rest of the query. |
| 116 | +The function will bind the returned values to the variables on the left of `in`. |
| 117 | + |
| 118 | +=== Contextually bound functions |
| 119 | +Although the split between arguments and returned concepts is a reasonable consequence of |
| 120 | +having the user think about the computation explicitly, it is less declarative than rules - |
| 121 | +where the planner would infer which role-players of a relation should be bound when evaluating the rule. |
| 122 | +A future version of TypeDB will introduce "contextually bound functions" where the planner may choose which |
| 123 | +variables are input to the functions and which are output - allowing a single function definition |
| 124 | +to satisfy all the cases discussed above. |
| 125 | + |
| 126 | +== What functions can do |
| 127 | +Since functions aren't forced to return relations (or ownerships) which are defined in the schema, |
| 128 | +they are more flexible and can also operate on raw values, including structs (when they are implemented). |
| 129 | +Since they don't infer new instances, they also avoid the overhead |
| 130 | +of making the result of the computation abide by the rules governing the inferred types. |
| 131 | +In short, they're a really simple way of doing reasoning. |
| 132 | + |
| 133 | +Functions are still evaluated on demand in a goal-driven way. |
| 134 | +This means we don't materialize all the results of the function when the data is updated. |
| 135 | +Instead, we only compute the calls relevant to the query being evaluated. |
| 136 | +The advantage of a goal-driven approaches over an eagerly materializing one is that they support very large models |
| 137 | +(in theory infinitely large models - i.e. an infinite number of inferred concepts). |
| 138 | +A simple illustration is this toy function that produces natural numbers. |
| 139 | + |
| 140 | +[,typeql] |
| 141 | +---- |
| 142 | +with fn nat() -> { integer }: |
| 143 | +match |
| 144 | + {let $x = 0;} or |
| 145 | + { let $x in nat() + 1; }; |
| 146 | +return { $x }; |
| 147 | +
|
| 148 | +match let $x in nat(); |
| 149 | +---- |
| 150 | + |
| 151 | +Since TypeDB's grpc endpoint also supports reactive streaming, |
| 152 | +this query will stream out natural numbers on demand. |
| 153 | +This isn't quite an infinite domain, since we are limited to the domain of the `integer` type which is a signed 64-bit integer. |
| 154 | +Understandable, since this is the case with most programming languages - |
| 155 | +but it does bring us to the next section. |
| 156 | + |
| 157 | +=== What functions can't do (yet) |
| 158 | +TypeDB in its current state can't do "arbitrarily large" models. |
| 159 | + |
| 160 | +This limitation is unlikely to be practically relevant, |
| 161 | +since most uses of functions only need to compute tuples of concepts which exist in the database. |
| 162 | +Since this is a combination of finite sets, it is combinatorially large but still finite. |
| 163 | + |
| 164 | +The problem is still (theoretically) interesting for two reasons - (1) TypeDB 2 could do it; (2) A feature complete TypeDB 3 could too. |
| 165 | + |
| 166 | +==== How did TypeDB 2 do it? Nested relations. |
| 167 | +In theory, one can infer an arbitrary number of nested relations, |
| 168 | +because you can always infer a new relation which relates the one you just inferred. |
| 169 | + |
| 170 | +In particular this makes state-space search on "open" worlds possible. |
| 171 | +For games like the Rubik's cube, it is possible to assign each state a number (not necessarily one that fits in 64 bits). |
| 172 | +For open worlds such as an arbitrary block's world, this doesn't work since the number of "objects" of each type aren't fixed. |
| 173 | +You would instead represent a state as a relation involving the previous state and the action taken. |
| 174 | + |
| 175 | +==== How will TypeDB 3 do it? Lists. |
| 176 | +Lists can be arbitrarily long. A state can be represented by specifying the initial state, |
| 177 | +and the list of actions which brings us to this state. |
| 178 | + |
| 179 | + |
| 180 | +[NOTE] |
| 181 | +==== |
| 182 | +Modern computers have finite memory so none of this is truly infinite. |
| 183 | +The difference between the finiteness of TypeDB 3's model without and with lists |
| 184 | +is that the former is constrained by the limits of TypeQL's expressivity, |
| 185 | +while the latter is constrained by the limits of the underlying computer. |
| 186 | +==== |
| 187 | + |
| 188 | +[NOTE] |
| 189 | +==== |
| 190 | +This is based on ideas & results from the deductive databases field. |
| 191 | +A lot of research there is conducted on datalog - |
| 192 | +a "syntactic subset" of prolog that bans compound terms. |
| 193 | +Lists are central to prolog and are implemented as recursive compound terms. |
| 194 | +The exclusion of compound terms has implications on the decidability of datalog. |
| 195 | +==== |
0 commit comments