diff --git a/core-concepts/modules/ROOT/pages/typeql/index.adoc b/core-concepts/modules/ROOT/pages/typeql/index.adoc index 4927a6228..46a044ca4 100644 --- a/core-concepts/modules/ROOT/pages/typeql/index.adoc +++ b/core-concepts/modules/ROOT/pages/typeql/index.adoc @@ -66,4 +66,10 @@ TypeQL functions bring the familiar abstraction from programming languages to da **** Short definitions of terms used throughout the guide, with links to the section which introduces them. **** --- \ No newline at end of file + +.xref:{page-version}@core-concepts::typeql/invalid-patterns.adoc[] +[.clickable] +**** +A deeper look at invalid patterns involving disjoint variable reuse +**** +-- diff --git a/core-concepts/modules/ROOT/pages/typeql/invalid-patterns.adoc b/core-concepts/modules/ROOT/pages/typeql/invalid-patterns.adoc new file mode 100644 index 000000000..1b88db715 --- /dev/null +++ b/core-concepts/modules/ROOT/pages/typeql/invalid-patterns.adoc @@ -0,0 +1,198 @@ += Advanced: Invalid Patterns + +This section dives into the semantics of TypeQL, +using the <<_disjunctive_normal_form>> to explain why <<_disjoint_variable_reuse>> makes a pattern invalid. + + +== Disjoint variable reuse +This section demonstrates some of the cases which can arise due to +xref:new_core_concepts::typeql/query-variables-patterns.adoc#_disjoint_variable_usage_errors[disjoint variable reuse] + +[unbound negations] +=== Unbound negation inputs +Notice we can write a pattern where the input variable to a negation is not bound in the parent. +For example: +[,typeql] +---- +match +$p1 isa person; $p2 isa person; +{ + $emp1 isa employment, links (employer: $company, employee: $p1); +} or { + $edu1 isa education, links (institute: $institute, attendee: $p2); +}; + +not { $emp2 isa employment, links (employer: $company, employee: $p2); }; +not { $edu2 isa employment, links (institute: $institute, attendee: $p2); }; +---- +At first glance, this looks like a reasonable query: +we query for persons `$p1` and `$p2` who neither worked for the same company, +nor attended the same institute. +However, you can see that the input variables for the negations (`$company` and `$institute`) +are not local to the negation, but also not bound in the parent conjunction. + +==== Disjunctive Normal Form +The best way to think about these requirements is to convert the query to Disjunctive Normal Form +by rewriting the pattern using "distributivity" and examining each branch: + +`A; { B; } or { C; };` becomes `{A; B} or {A; C};` + +In this case, we get the pattern: +[,typeql] +---- +match +{ + $p1 isa person; $p2 isa person; + $emp1 isa employment, links (employer: $company, employee: $p1); + not { $emp2 isa employment, links (employer: $company, employee: $p2); }; + not { $edu2 isa employment, links (institute: $institute, attendee: $p2); }; +} or { + $p1 isa person; $p2 isa person; + $edu1 isa education, links (institute: $institute, attendee: $p2); + not { $emp2 isa employment, links (employer: $company, employee: $p2); }; + not { $edu2 isa education, links (institute: $institute, attendee: $p2); }; +}; +---- +Although this could now be a valid logic query, +the first branch requires that `$p2` did not attend *any* institute, +and the second branch requires that `$p2` was not employed by *any* employer. +This is clearly not what we intended to write. Hence, we flag these as invalid TypeQL queries. + +[NOTE] +==== +There is, of course, a way to express the intended query: +[,typeql] +.The correct query +---- +$p1 isa person; $p2 isa person; +not { + $emp1 isa employment, links (employer: $company, employee: $p1); + $emp2 isa employment, links (employer: $company, employee: $p2); +}; +not { + $edu1 isa education, links (institute: $institute, attendee: $p2); + $edu2 isa education, links (institute: $institute, attendee: $p2); +}; +---- +==== + +=== Reusing branch local variables in disjunctions + +Consider another case of questionable query composition: +[,typeql] +---- +match +$p1 isa person; $p2 isa person; +{ + $emp1 isa employment, links (employer: $company, employee: $p1); +} or { + $edu1 isa education, links (institute: $institute, attendee: $p2); +}; +{ + $emp2 isa employment, links (employer: $company, employee: $p2); +} or { + $edu2 isa education, links (institute: $institute, attendee: $p2); +}; +---- +Ideally, this would be a query to find two persons `$p1` and `$p2` who +were either employed by the same company, or attended the same institute. + +The DNF quickly reveals the mistake: +[,typeql] +---- +match +{ + $p1 isa person; $p2 isa person; + $emp1 isa employment, links (employer: $company, employee: $p1); + $emp2 isa employment, links (employer: $company, employee: $p2); +} or { + $p1 isa person; $p2 isa person; + $edu1 isa education, links (institute: $institute, attendee: $p2); + $emp2 isa employment, links (employer: $company, employee: $p2); +} or +{ + $p1 isa person; $p2 isa person; + $emp1 isa employment, links (employer: $company, employee: $p1); + $edu2 isa education, links (institute: $institute, attendee: $p2); +} or { + $p1 isa person; $p2 isa person; + $edu1 isa education, links (institute: $institute, attendee: $p2); + $edu2 isa education, links (institute: $institute, attendee: $p2); +}; +---- + +You can see the query we meant to write in two of those branches: +[,typeql] +---- +match +$p1 isa person; $p2 isa person; +{ + $emp1 isa employment, links (employer: $company, employee: $p1); + $emp2 isa employment, links (employer: $company, employee: $p2); +} or { + $edu1 isa education, links (institute: $institute, attendee: $p2); + $edu2 isa education, links (institute: $institute, attendee: $p2); +}; +---- + +The problem lies in the other two branches. +[,typeql] +---- +match +{ + $p1 isa person; $p2 isa person; + $emp1 isa employment, links (employer: $company, employee: $p1); + $edu2 isa education, links (institute: $institute, attendee: $p2); +} or { + $p1 isa person; $p2 isa person; + $edu1 isa education, links (institute: $institute, attendee: $p2); + $emp2 isa employment, links (employer: $company, employee: $p2); +}; +---- +This will return any persons `$p1` & `$p2` when +either (1) `$p1` is employed by *any* and `$p2` attended *any* institute; +or (2) `$p2` is employed by *any* company and `$p1` attended *any* institute. + +Notice `$company` is "internal" to both the first and second disjunctions (The same is the case for `$institute`). +TypeQL throws a "disjoint variable re-use" error for such cases. + +=== Single origin for optionals +Optional variables can only be used in only one `try` block in a given match clause. Consider: +[,typeql] +---- +match + try { $x isa person, has name "John"; }; + try { $x isa person, has name "James"; }; +---- +The reason this is banned is similar to the <<_unbound_negation_inputs>> case above: +It is unclear if `$x` is an input or either of the `try` blocks, or an optional variable. + +The same variable can be used in `try` blocks in the next stage. +This is because the variable is bound - either to a concept, or to `None` - and is an 'input' to the block. +The block simply fails if it tries to use a variable bound to `None` in a constraint. + +[NOTE] +==== +The semantics of try blocks dictate the equivalence of + +`try { P };` and `{ P } or { not { P' }; };` + +where, `P'` is the pattern obtained by renaming all the 'optional' variables in `P` with fresh ones. +Rewriting the above query with this equivalence (assuming `$x` to be 'optional' and renaming it, although this is ambiguous) +[,tyepql] +---- +{ $x "John"; } or not { $x_1 "John"; }; +{ $x "James"; } or not { $x_2 "James"; }; +---- + +Taking the DNF gives us 4 cases, and some very unintuitive behaviour: +[,typeql] +---- +{ $x "John"; $x "James"; } or # persons named both James and John. +{ $x "John"; not { $x_2 "James"; }; } or # If nobody is named James, then persons named John +{ not { $x_1 "John"; }; $x "James"; } or # If nobody is named John, then persons named James +{ not { $x_1 "John"; }; not { $x_2 "James"; }; }; # If nobody is named either, then a single answer `None` +# If there is one person named John, and a different person named James, then no answers +---- + +==== diff --git a/core-concepts/modules/ROOT/pages/typeql/query-variables-patterns.adoc b/core-concepts/modules/ROOT/pages/typeql/query-variables-patterns.adoc index dc1c3f42b..6eae54bd3 100644 --- a/core-concepts/modules/ROOT/pages/typeql/query-variables-patterns.adoc +++ b/core-concepts/modules/ROOT/pages/typeql/query-variables-patterns.adoc @@ -329,156 +329,20 @@ Notice that a variable that is "bound" in a conjunction is guaranteed to be boun in ALL execution paths - i.e., regardless of which branch is taken in each disjunction. An answer to a pattern contains only those variables bound in the root conjunction. -// TODO: This could be its own page -=== Invalid patterns: unbound negation inputs -Notice we can write a pattern where the input variable to a negation is not bound in the parent. -For example: +=== Disjoint variable usage errors +Since variable names are scoped to a pipeline, it's possible to write a query in which +a certain variable is internal to several different patterns. For example: [,typeql] ---- match -$p1 isa person; $p2 isa person; -{ - $emp1 isa employment, links (employer: $company, employee: $p1); -} or { - $edu1 isa education, links (institute: $institute, attendee: $p2); -}; - -not { $emp2 isa employment, links (employer: $company, employee: $p2); }; -not { $edu2 isa employment, links (institute: $institute, attendee: $p2); }; ----- -At first glance, this looks like a reasonable query: -we query for persons `$p1` and `$p2` who neither worked for the same company, -nor attended the same institute. -However, you can see that the input variables for the negations (`$company` and `$institute`) -are not bound in the parent conjunction. Hence, this is an invalid query. - -==== DNF -The best way to think about these requirements is to convert the query to Disjunctive Normal Form -by rewriting the pattern using "distributivity" and examining each branch: - -`A; { B; } or { C; };` becomes `{A; B} or {A; C};` - -In this case, we get the pattern: -[,typeql] ----- -match -{ - $p1 isa person; $p2 isa person; - $emp1 isa employment, links (employer: $company, employee: $p1); - not { $emp2 isa employment, links (employer: $company, employee: $p2); }; - not { $edu2 isa employment, links (institute: $institute, attendee: $p2); }; -} or { - $p1 isa person; $p2 isa person; - $edu1 isa education, links (institute: $institute, attendee: $p2); - not { $emp2 isa employment, links (employer: $company, employee: $p2); }; - not { $edu2 isa education, links (institute: $institute, attendee: $p2); }; -}; ----- -Although this could now be a valid logic query, -the first branch requires that `$p2` did not attend *any* institute, -and the second branch requires that `$p2` was not employed by *any* employer. -This is clearly not what we meant. Hence, we flag these as invalid TypeQL queries. - -[NOTE] -==== -There is, of course, a way to express the intended query: -[,typeql] -.The correct query ----- -$p1 isa person; $p2 isa person; -not { - $emp1 isa employment, links (employer: $company, employee: $p1); - $emp2 isa employment, links (employer: $company, employee: $p2); -}; -not { - $edu1 isa education, links (institute: $institute, attendee: $p2); - $edu2 isa education, links (institute: $institute, attendee: $p2); -}; ----- -==== - -=== Invalid patterns: Disjoint variable re-use - -Consider another case of questionable query composition: -[,typeql] +{ $x isa person, has name "John"; } or { $y isa person, has name "James"; }; +{ $x isa person, has age 25; } or { $y isa person, has age 50; }; ---- -match -$p1 isa person; $p2 isa person; -{ - $emp1 isa employment, links (employer: $company, employee: $p1); -} or { - $edu1 isa education, links (institute: $institute, attendee: $p2); -}; -{ - $emp2 isa employment, links (employer: $company, employee: $p2); -} or { - $edu2 isa education, links (institute: $institute, attendee: $p2); -}; ----- -Ideally, this would be a query to find two persons `$p1` and `$p2` who -were either employed by the same company, or attended the same institute. - -The DNF quickly reveals the mistake: -[,typeql] ----- -match -{ - $p1 isa person; $p2 isa person; - $emp1 isa employment, links (employer: $company, employee: $p1); - $emp2 isa employment, links (employer: $company, employee: $p2); -} or { - $p1 isa person; $p2 isa person; - $edu1 isa education, links (institute: $institute, attendee: $p2); - $emp2 isa employment, links (employer: $company, employee: $p2); -} or -{ - $p1 isa person; $p2 isa person; - $emp1 isa employment, links (employer: $company, employee: $p1); - $edu2 isa education, links (institute: $institute, attendee: $p2); -} or { - $p1 isa person; $p2 isa person; - $edu1 isa education, links (institute: $institute, attendee: $p2); - $edu2 isa education, links (institute: $institute, attendee: $p2); -}; ----- - -You can see the query we meant to write in two of those branches: -[,typeql] ----- -match -$p1 isa person; $p2 isa person; -{ - $emp1 isa employment, links (employer: $company, employee: $p1); - $emp2 isa employment, links (employer: $company, employee: $p2); -} or { - $edu1 isa education, links (institute: $institute, attendee: $p2); - $edu2 isa education, links (institute: $institute, attendee: $p2); -}; ----- - -The problem lies in the other two branches. -[,typeql] ----- -match -{ - $p1 isa person; $p2 isa person; - $emp1 isa employment, links (employer: $company, employee: $p1); - $edu2 isa education, links (institute: $institute, attendee: $p2); -} or { - $p1 isa person; $p2 isa person; - $edu1 isa education, links (institute: $institute, attendee: $p2); - $emp2 isa employment, links (employer: $company, employee: $p2); -}; ----- -This will return any persons `$p1` & `$p2` when -either (1) `$p1` is employed by *any* and `$p2` attended *any* institute; -or (2) `$p2` is employed by *any* company and `$p1` attended *any* institute. - -Notice `$company` is "internal" to both the first and second disjunctions (The same is the case for `$institute`). -TypeQL throws a "disjoint variable re-use" error for such cases. +Such patterns are considered invalid and will throw disjoint variable usage errors. +See the xref:core-concepts::typeql/invalid-patterns.adoc[invalid patterns] page for more examples. ==== Valid variable scopes: -From these, it can be seen that a variable is either: +Rephrasing the xref:reference::typeql/patterns/index.adoc#_scopes[unique scope principle], a variable is either: 1. Internal to one branch of one disjunction. 2. Internal to one negation. diff --git a/core-concepts/modules/ROOT/partials/nav.adoc b/core-concepts/modules/ROOT/partials/nav.adoc index f183bf1c3..edf64879d 100644 --- a/core-concepts/modules/ROOT/partials/nav.adoc +++ b/core-concepts/modules/ROOT/partials/nav.adoc @@ -20,6 +20,7 @@ ** xref:{page-version}@core-concepts::typeql/query-variables-patterns.adoc[] ** xref:{page-version}@core-concepts::typeql/query-clauses.adoc[] ** xref:{page-version}@core-concepts::typeql/queries-as-functions.adoc[] +** xref:{page-version}@core-concepts::typeql/invalid-patterns.adoc[] // ** xref:{page-version}@core-concepts::typeql/best-practices.adoc[] * xref:{page-version}@core-concepts::drivers/index.adoc[]