Whilst the query vertex tree functionality provided by the fabric-facts core module is powerful, it’s somewhat “low-level”, imperative and verbose to use. Therefore, this namespace provides a more userfriendly, succinct “high level” layer to construct query trees in a completely declarative way and exposes further functionality, like an expression language to define result filters, query variable projection, variable injections, aggregations, sorting etc.
The overall approach to define queries is based on the idea of matching patterns in the graph, usually by specifying a mixture of qvars (query variables) and fixed literals and/or other constraint expressions. The query spec DSL is highly inspired by the W3C SPARQL language, albeit expressed in a more Clojuresque way and not limited to RDF semantics.
Query specs are described as standard Clojure maps and therefore offer all the usual benefits including composability & EDN serialization. The latter therefore also allows query specs to be stored in the graphs as part of facts and then be able to define queries returning query specs (e.g. for UI templating/widgets), thus enabling a completely dynamic and datadriven approach to design applications/UIs.
The expression language part of the DSL is responsible to compile variable lookups and result processing expressions into actual Clojure functions. The language is extensible via its multi-method implementation. See section further below for details.
The only public part of this namespace consists of this single function:
(defn add-query-from-spec!
[g spec]
(-> (compile-query g spec)
(compile-query-result g spec)))
Dozens of examples can be found in the test namespace for this module.
For a general overview of query patterns and supported query types,
please also see the Queries section in the thi.ng.fabric.facts.core
namespace.
(dsl/add-query-from-spec!
graph
'{:q [{:where [[?res schema ?schema] [?res type ?type]]}
{:optional [[?res title ?title]]}]
:filter (or (= ?res toxi) (= ?type person))
:group-by [?type ?res]
:aggregate {?num (agg-count ?res)}
:order ?type})
Via the thi.ng.fabric.facts.queryviz namespace, such query specs can also be visualized (using Graphviz). The following diagram is a visual representation of the query above:
A query spec map can have the following toplevel keys. Only the =:q= key is required, all others are optional.
Key | Possible values | Default | Description |
---|---|---|---|
:q | Vector | none | Vector of sub-queries (see below) |
:aggregate | aggregation fn / DSL expr | none | Inject new qvars of aggregated values |
:bind | {?qvar fn} | none | Inject new qvars in result maps |
:filter | predicate fn / DSL expr | none | Keep only result maps for which fn returns truthy |
:group-by | fn, qvar, qvar vec, DSL expr | none | Grouping criteria for results, causes result map |
:limit | positive int | none | Limit results to max. N items |
:order | fn, qvar, qvar vec, DSL expr | none | Sort results by given key fn |
:select | :* or [?a ?b ..] | :* | Keep all or only given qvars in result maps |
:unique | true / false | false | Ensure all qvars in each result map have unique vals |
:values | {?qvar #{v1 v2} …} | none | Pre-bind qvar to set of possible values |
The DSL supports the following sub-query types. Each type (apart from
:path
) takes a number of query patterns and always computes their
joined result set. The sub-query types only differ in how each
sub-query’s results are merged with those of others. Each sub-query
can also specify a number of options to modify its result set. These
options are described in this section.
Standard join with other sub-query results.
;; match any two people knowing eachother
;; but only if we know both of their nicknames too
{:q [{:where [[?a knows ?b]
[?a nickname ?anick]
[?b nickname ?bnick]]}]}
Optional join with other sub-query results. Since on the RHS is the optional part in this join type, this type should not be used as first sub-query in a main spec.
Examples:
;; match people acquaintances
;; optionally match 1st person's nickname and/or age
{:q [{:where [[?a knows ?b]]}
{:optional [[?a nickname ?nick]]}
{:optional [[?a age ?age]]}]}
Important: Note the use of two separate :optional
sub-queries. If
both the optional ?nick
and ?age
where part of the same sub-query,
then only ?a
’s with both ?nick
and ?age
would be matched. By
keeping them separate, this query will also match ?a
’s with only
?nick
or ?age
values.
A query union simply merges its own result maps with that of other sub-queries. This query type too should not be used as first sub-query in a main spec.
Important: This type is only meant to merge distinct result sets without any shared qvars!
Examples:
;; return any "knowing" and friend relationships
{:q [{:where [[?a knows ?b]]}
{:union [[?e friend ?f]]}]}
;; remove matching triples from existing result set:
;; first match all "knows" relationships, but
;; remove any results where ?b is bill's friend
{:q [{:where [[?a knows ?b]]}
{:minus [[?b friend bill]]}]}
A sub-query of this type takes a single path query pattern and options defining the accepted min/max path lengths:
Examples:
;; match grandparent rels
{:q [{:path [?gp [parent-of parent-of] ?gc]}]}
;; match ancestor rels w/ path lens 2-4
{:q [{:path [?a [parent-of] ?b] :min 2 :max 4}]}
;; match friends of children (and optionally their friends)
{:q [{:path [?p [parent-of friend friend] ?f] :min 2 :max 3}]}
Unless noted otherwise, query options have the same semantics regardless if used at the top-level or for individual sub-queries. In the latter case, of course they only apply to a single sub-query.
Use of these modifiers can have a positive impact on down-stream joins and other result processing performance and memory requirements. However, be careful to not overconstrain intermediate results.
Furthermore, sub-queries with options are not re-usable by other queries. So in cases where there’re many similar queries, it is recommended to only apply top-level options, manipulating the final result set of a query.
TODO (see tests for various examples)
Inject (or overwrite) qvars in result maps (e.g. to format dates,
concatenate strings etc.). The value of this option must be a map of
{?qvar expression..}
;; find all persons and their first/surnames
;; inject new qvar ?name (string concat) into results
;; only return person & name
{:q [{:where [[?p firstname ?fn]
[?p surname ?sn]]
:bind {?name (str ?fn " " ?sn)}
:select [?p ?name]}]}
A single DSL expression or Clojure predicate fn applied to each single result (map). The result is only kept if the filter is truthy.
;; global & local filters
;; find procedures created by any of the 3 authors before given date
{:q [{:where [[?p type procedure] [?p date-created ?date]]}
{:optional [[?p author ?a]] :filter (in-set? ?a alice bob charlie)}]
:filter (< ?date #inst "2015-09-01")}
Limit the result set to N items.
;; find first underage person and (optionally his/her name)
{:q [{:where [[?a type person] [?a age ?age]] :filter (< ?age 21) :limit 1}
{:optional [[?a name ?n]]}]}
Select qvars to be projected into result set. This is also useful for large join queries w/ various intermediate qvars, not required to be retained outside the sub-query.
;; find books in price range w/ their authors and topics (via category)
;; don't include price and category in results
{:q [{:where [[?a author ?b]
[?b type book]
[?b price ?p]]
:filter (and (>= ?p 10) (<= ?p 20))
:select [?a ?b]}
{:where [[?b category ?cat]
[?cat includes-topic ?t]]
:select [?b ?t]}]}
The query engine ensures that - within a single query pattern - unique
qvars contain unique values (see fact-verifier description in core
namespace). However, when using query joins, unions, aggregations,
qvars injections etc. this behavior is not guaranteed, unless
explicitly enabled by setting this option key to true
. If enabled,
each result map’s qvars will have unique values (or the result map is
removed if not). For most cases, this can be ignored, but this feature
is essential for some queries, as shown below.
;; given these facts...
[a knows b]
[b knows a]
[b knows c]
[c knows b]
;; ...this query...
{:q [{:where [[?x knows ?y] [?y knows ?z]]}] :select [?x ?z]}
;; ...produces these results:
#{{?x b, ?z b}
{?x a, ?z a}
{?x a, ?z c} ;; unique
{?x c, ?z a} ;; unique
{?x c, ?z c}}
;; Without :unique flag this query produces results where ?x & ?z are
;; bound to same values, which is not what we want in this case!
;; Same query with :unique...
{:q [{:where [[?x knows ?y] [?y knows ?z]]}] :select [?x ?z] :unique true}
;; ...produces the correct results:
#{{?x a, ?z c} {?x c, ?z a}}
Pre-constrains qvars to a set of possible values. Unlike the :filter
option, which applies to results and allows a lot of flexibility
through the use of expression syntax, specifying :values
acts as
pre-filter and is applied to index selection vertices, preceeding
the query vertex in terms of data flow. Only literal values are
supported for this option. Also, queries specifying this option are
not re-usable, but in most circumstances are faster than
post-filtered queries, since the initial set of matching facts is
potentially much smaller.
;; query based on :filter example above
;; qvar ?a is pre-bound to values in given set
{:q [{:where [[?p type procedure] [?p date-created ?date]]}
{:optional [[?p author ?a]]}]
:values {?a #{alice bob charlie}}}
(defn compile-query-option
[expr] (if (fn? expr) expr (compile-expr expr)))
(defn compile-result-order
[order]
(if (fn? order)
order
(if (sequential? order)
(fn [r] (reduce #(conj % (get r %2)) [] order))
(fn [r] (get r order)))))
(defn compile-query-bindings
[binds]
(let [binds (reduce-kv
(fn [acc k v] (assoc acc k (compile-expr v)))
{} binds)]
(fn [res]
(let [res'(reduce-kv
(fn [acc k f]
(assoc acc k (f acc)))
res binds)]
(info :pre-bind res)
(info :post-bind res')
res'))))
(defn compile-result-aggregation
[agg]
(let [agg (reduce-kv
(fn [acc k v] (assoc acc k (compile-expr v)))
{} agg)]
(fn [results]
(reduce-kv
(fn [acc k afn]
(assoc acc k (afn results)))
{} agg))))
(defn compile-result-grouping
[group]
(if (fn? group)
group
(cond
(and (sequential? group) (every? ff/qvar? group))
(fn [r] (reduce #(conj % (get r %2)) [] group))
(ff/qvar? group)
(fn [r] (get r group))
:else
(compile-expr group))))
(defn sub-query-options
[opts spec]
(-> opts
(select-keys [:filter :limit :select :unique :bind :values])
(update :filter #(if % (compile-query-option %)))
(update :bind #(if % (compile-query-bindings %)))
(update :values #(or % (:values spec)))))
(defmulti compile-sub-query
(fn [g parent q spec] (some #{:where :optional :union :minus :path} (keys q))))
(defmethod compile-sub-query :where
[g parent q spec]
(let [pat (:where q)
opts (sub-query-options q spec)
q (if (< 1 (count pat))
(ff/add-query-join! g (:transform spec) pat opts)
(ff/add-param-query! g (:transform spec) (first pat) opts))]
(if parent
(ff/add-join! g parent q {})
q)))
(defmethod compile-sub-query :optional
[g parent q spec]
(let [pat (:optional q)
opts (sub-query-options q spec)
q (if (< 1 (count pat))
(ff/add-query-join! g (:transform spec) pat opts)
(ff/add-param-query! g (:transform spec) (first pat) opts))]
(if parent
(ff/add-join! g ff/join-optional parent q {})
q)))
(defmethod compile-sub-query :union
[g parent q spec]
(let [pat (:union q)
opts (sub-query-options q spec)
q (if (< 1 (count pat))
(ff/add-query-join! g (:transform spec) pat opts)
(ff/add-param-query! g (:transform spec) (first pat) opts))]
(if parent
(ff/add-query-union! g [parent q] {})
q)))
(defmethod compile-sub-query :minus
[g parent q spec]
(let [pat (:minus q)
opts (sub-query-options q spec)
q (if (< 1 (count pat))
(ff/add-query-join! g (:transform spec) pat opts)
(ff/add-param-query! g (:transform spec) (first pat) opts))]
(if parent
(ff/add-query-negation! g parent q {})
q)))
(defmethod compile-sub-query :path
[g parent q spec]
(let [opts (merge (select-keys q [:min :max]) (sub-query-options q spec))
q (ff/add-path-query! g (:transform spec) (:path q) opts)]
(if parent
(ff/add-join! g parent q {})
q)))
(defn compile-query
[g {:keys [q] :as spec}]
(let [spec (update spec :transform #(or % (ff/fact-transform g)))]
(reduce
(fn [acc sq] (compile-sub-query g acc sq spec))
(compile-sub-query g nil (first q) spec)
(rest q))))
(defn compile-query-result-spec
[spec]
(cond-> spec
(:bind spec) (update :bind compile-query-bindings)
(:filter spec) (update :filter compile-query-option)
(:order spec) (update :order compile-result-order)
(:group-by spec) (update :group-by compile-result-grouping)
(:aggregate spec) (-> (assoc :aggregate* (:aggregate spec))
(update :aggregate compile-result-aggregation))))
(defn compile-query-result
[result g spec]
(ff/add-query-result! g (compile-query-result-spec spec) result))
Expressions in the DSL are defined as standard S-Expressions and in
general follow the same rules as in Clojure/Lisp. However, due to the
context in which they’re used, there’re slightly different evaluation
semantics. Furthermore, the DSL does not differentiate between lists
or vectors, so it’s valid to use either (expr (expr))
or [expr
[expr]]
. Using S-Expressions allows users to serialize query specs as
EDN and the multi-method based implementation provides an easy-to-use
mechanism to extend the DSL with more functionality.
A simple query filter might state (< ?age 21)
(or [< ?age 21]
). As
in standard Clojure, this is a predicate producing a boolean result.
However, the DSL compiles this expression to this Clojure code:
(fn [res] (< ((fn [r] (r '?age)) res) ((fn [_] 21) res)))
This indirection is needed because the qvar ?age
is not bound to a
fixed value (remember, the result set of a query is a multi-set) and
for a :filter
expression this compiled function is applied to each
individual result map in the set:
(def results '#{{?age 23 ?name "alice"} {?age 18 ?name "bob"}})
(filter (compile-expr '(< ?age 21)) results)
;; ({?age 18, ?name "bob"})
The compile-expr
multi-method is used recursively to analyze and
compile a given DSL (sub)expression and (if possible) delegates to
matched implementations:
::const
- equivalent toidentity
(default catch-all)::qvar
- query variable lookup function::unary
- special overrides for single arity fns::binary
- special overrides for 2-arity fns::varargs
- fns with variable length arity
The last 3 dispatch types above can make use of basic type checking of
their arguments and will return nil
if any of the args violates the
type check. If type checking is disabled (e.g. see str
below), any
argument type can be used.
Comparisons have been implemented using c.c/compare
to allow not
just comparing of numbers, but also strings, dates etc. This is only
safe to use though, if it’s known that both arguments are of the same
type.
(def vararg-ops
{'+ [number? +]
'- [number? -]
'* [number? *]
'/ [number? /]
'= [nil =]
'not= [nil not=]
'str [nil str]})
(def unary-ops
{'not [nil not]
'int [number? int]
'float [number? double]
'abs [number? #(if (neg? %) (- %) %)]
'sqrt [number? #(Math/sqrt (double %))]
'exp [number? #(Math/exp (double %))]
'sin [number? #(Math/sin (double %))]
'asin [number? #(Math/asin (double %))]
'cos [number? #(Math/cos (double %))]
'acos [number? #(Math/acos (double %))]
'tan [number? #(Math/tan (double %))]
'atan [number? #(Math/atan (double %))]
'floor [number? #(long (Math/floor (double %)))]
'ceil [number? #(long (Math/ceil (double %)))]
'round [number? #(Math/round (double %))]})
(def binary-ops
{'< [nil nil #(neg? (compare % %2))]
'> [nil nil #(pos? (compare % %2))]
'<= [nil nil #(<= (compare % %2) 0)]
'>= [nil nil #(>= (compare % %2) 0)]
'pow [number? number? #(Math/pow (double %) (double %2))]
'atan2 [number? number? #(Math/atan2 (double %) (double %2))]
'logn [number? number? #(/ (Math/log (double %)) (Math/log (double %2)))]})
(defmulti compile-expr
(fn [expr]
(cond
(sequential? expr) (let [op (first expr)]
(cond
(vararg-ops op) ::varargs
(unary-ops op) ::unary
(binary-ops op) ::binary
:else op))
(ff/qvar? expr) ::qvar
:else ::const)))
(defmethod compile-expr ::const
[const] (fn [_] const))
(defmethod compile-expr ::qvar
[qvar] #(% qvar))
(defmethod compile-expr ::varargs
[[op & more]]
(let [[check op] (vararg-ops op)
args (mapv compile-expr more)]
(if check
(fn [res]
(let [args' (sequence (comp (map #(% res)) (filter identity)) args)]
(when (every? check args')
(apply op args'))))
(fn [res]
(apply op (sequence (comp (map #(% res)) (filter identity)) args))))))
(defmethod compile-expr ::unary
[[op x]]
(let [[check op] (unary-ops op)
x (compile-expr x)]
(if check
(fn [res] (let [x' (x res)] (when (check x') (op x'))))
(fn [res] (op (x res))))))
(defmethod compile-expr ::binary
[[op x y]]
(let [[checkx checky op] (binary-ops op)
x (compile-expr x)
y (compile-expr y)]
(cond
(and checkx checky) (fn [res]
(let [x' (x res) y' (y res)]
(when (and (checkx x') (checky y')) (op x' y'))))
checkx (fn [res]
(let [x' (x res) y' (y res)]
(when (checkx x') (op x' y'))))
checky (fn [res]
(let [x' (x res) y' (y res)]
(when (checky y') (op x' y'))))
:else (fn [res] (op (x res) (y res))))))
(defmethod compile-expr 'and
[[_ & more]]
(let [args (mapv compile-expr more)]
(fn [res] (every? #(% res) args))))
(defmethod compile-expr 'or
[[_ & more]]
(let [args (mapv compile-expr more)]
(fn [res] (some #(% res) args))))
(defmethod compile-expr 'match
[[_ re x]]
(let [re (if (regexp?* re) re (re-pattern re))
x (compile-expr x)]
(fn [res] (let [x' (x res)] (when (string? x') (re-find re x'))))))
(defmethod compile-expr 'in-set?
[[_ x & more]]
(let [x (compile-expr x)
choices (mapv compile-expr more)]
(fn [res] (let [x' (x res)] (some #(= (% res) x') choices)))))
These expressions are used to compute reductions and inject them as new qvars into the result set. Unlike the other expressions above, aggregation fns are provided the full result set (instead of just single results).
Important: For queries making use of :group-by
and :aggregate
,
the aggregation fns are applied to each group separately, thus only
receiving sub-sets of the full result set. A decision had to be made
which of the two options is processed earlier and :group-by
made
sense to take precedence in most use cases.
(defn aggregation-with
[op x]
(let [x (compile-expr x)
tx (comp (map x) (filter identity))]
(fn [results]
(when (seq results)
(transduce tx op results)))))
(def min* (fn ([] nil) ([x] x) ([x y] (if x (min x y) y))))
(def max* (fn ([] nil) ([x] x) ([x y] (if x (max x y) y))))
(defmethod compile-expr 'agg-sum
[[_ x]]
(aggregation-with + x))
(defmethod compile-expr 'agg-min
[[_ x]]
(aggregation-with min* x))
(defmethod compile-expr 'agg-max
[[_ x]]
(aggregation-with max* x))
(defmethod compile-expr 'agg-avg
[[_ x]]
(let [x (compile-expr x)
tx (comp (map x) (filter identity))]
(fn [results]
(let [res (sequence tx results)]
(when (seq res)
(double (/ (reduce + res) (count res))))))))
(defmethod compile-expr 'agg-mean
[[_ x]]
(let [x (compile-expr x)
tx (comp (map x) (filter identity))]
(fn [results]
(let [res (sequence tx results)]
(nth (sort res) (bit-shift-right (count res) 1) nil)))))
(defmethod compile-expr 'agg-collect
[[_ x]]
(let [x (compile-expr x)
tx (comp (map x) (filter identity))]
(fn [results] (into #{} tx results))))
(defmethod compile-expr 'agg-count
[[_ x]]
(if x
(let [collect (compile-expr ['agg-collect x])]
(fn [results] (count (collect results))))
(fn [results] (count results))))
(defmethod compile-expr 'group-bins-of
[[_ x n]]
(let [x (compile-expr x)]
(fn [res] (* (Math/floor (/ (x res) n)) n))))
(defn regexp?*
[x] #?(:clj (= java.util.regex.Pattern (type x)) :cljs (regexp? x)))
(ns thi.ng.fabric.facts.dsl
#?@(:clj
[(:require
[thi.ng.fabric.core :as f]
[thi.ng.fabric.facts.core :as ff]
[clojure.set :as set]
[clojure.core.async :as a :refer [go go-loop chan close! <! >! alts! timeout]]
[taoensso.timbre :refer [debug info warn]])]
:cljs
[(:require-macros
[cljs.core.async.macros :refer [go go-loop]]
[cljs-log.core :refer [debug info warn]])
(:require
[thi.ng.fabric.core :as f]
[thi.ng.fabric.facts.core :as ff]
[clojure.set :as set]
[cljs.core.async :refer [chan close! <! >! alts! timeout]])]))
<<helpers>>
<<expr>>
<<dsl>>
<<dsl-public>>