Commit 84e2954

gregfelice and claude authored
Add MERGE ON CREATE SET / ON MATCH SET support (#2347)
- Add MERGE ON CREATE SET / ON MATCH SET support

  Implements the openCypher-standard ON CREATE SET and ON MATCH SET clauses for the MERGE statement. This allows conditional property updates depending on whether MERGE created a new path or matched an existing one:

      MERGE (n:Person {name: 'Alice'})
        ON CREATE SET n.created = timestamp()
        ON MATCH SET n.updated = timestamp()

  Implementation spans parser, planner, and executor:

  - Grammar: new merge_actions_opt/merge_actions/merge_action rules in cypher_gram.y, with ON keyword added to cypher_kwlist.h
  - Nodes: on_match/on_create lists on cypher_merge, corresponding on_match_set_info/on_create_set_info on cypher_merge_information, and prop_expr on cypher_update_item (all serialized through copy/out/read funcs)
  - Transform: cypher_clause.c transforms ON SET items and stores prop_expr for direct expression evaluation
  - Executor: cypher_set.c extracts apply_update_list() from process_update_list(); cypher_merge.c calls it at all merge decision points (simple merge, terminal, non-terminal with eager buffering, and first-clause-with-followers paths)

  Key design choice: prop_expr stores the Expr* directly in cypher_update_item rather than using prop_position into the scan tuple. The planner strips target list entries for SET expressions that CustomScan doesn't need, making prop_position references dangling. By storing the expression directly (only for MERGE ON SET items), we evaluate it with ExecInitExpr/ExecEvalExpr independent of the scan tuple layout.

  Includes regression tests covering: basic ON CREATE SET, basic ON MATCH SET, combined ON CREATE + ON MATCH, multiple SET items, expression evaluation, interaction with WITH clause, and edge property updates.

- Move ExecInitExpr for ON CREATE/MATCH SET items from per-row execution in apply_update_list() to plan initialization in begin_cypher_merge(). Follows the established pattern used by cypher_target_node (id_expr_state, prop_expr_state).
- Add prop_expr_state field to cypher_update_item with serialization support in outfuncs/readfuncs/copyfuncs.
- apply_update_list() uses pre-initialized state when available, falls back to per-row init for plain SET callers.
- Fix misleading comment: "ON MATCH SET" → "ON CREATE SET" for Case 1 first-run test.
- Add Case 1 second-run test that triggers ON MATCH SET with a predecessor clause (MATCH ... MERGE ... ON MATCH SET).
- Add ON to safe_keywords in cypher_gram.y so that property keys and labels named 'on' still work (e.g., n.on, MATCH (n:on)). All other keywords added as tokens are also in safe_keywords.
- Add chained (non-terminal) MERGE regression tests exercising the eager-buffering code path with ON CREATE SET and ON MATCH SET. First run creates both nodes (ON CREATE SET fires), second run matches both (ON MATCH SET fires).
- Move ExecStoreVirtualTuple before apply_update_list unconditionally in Case 1 non-terminal and terminal MERGE paths, matching the pattern at Case 3 (line 994). Ensures tts_nvalid is set for downstream ExecProject even when ON CREATE SET is absent.
- Add resolve_merge_set_exprs() helper to deduplicate the prop_expr resolution loops for ON MATCH SET and ON CREATE SET. Includes ereport when target entry is missing (internal error, should never happen).
- Add regression test for ON keyword as label name, confirming backward compatibility via safe_keywords grammar path.
- The four ExecStoreVirtualTuple calls in exec_cypher_merge were triggering an Assert failure under --enable-cassert:

      TRAP: failed Assert("TTS_EMPTY(slot)"), File: execTuples.c, Line: 1748

  ExecStoreVirtualTuple (execTuples.c:1748) asserts that its target slot is in the TTS_EMPTY state. In our MERGE executor, process_path writes directly into the subquery's scan tuple slot -- which already holds the subquery's output tuple and therefore is NOT empty. On a release build the assertion compiles out and ExecStoreVirtualTuple just clears the flag and sets tts_nvalid; on an --enable-cassert build the backend aborts and takes down the regression run.

  We only need the bookkeeping half of ExecStoreVirtualTuple (clear TTS_FLAG_EMPTY and set tts_nvalid = natts) -- not the "store semantics" that motivate the assertion. Add a small static helper mark_scan_slot_valid() that does exactly the bookkeeping, and replace the four call sites. Release-build behavior is byte-identical since Assert() compiles to nothing; cassert-build behavior now matches release.

- Fix MERGE ON CREATE/MATCH SET crash when RHS references a bound variable

  When MERGE has a previous clause (e.g. MATCH, UNWIND), transform_cypher_merge takes the lateral-left-join path via transform_merge_make_lateral_join. That helper called addRangeTableEntryForJoin with nscolumns=NULL, leaving the join ParseNamespaceItem's p_nscolumns unset. For queries that did not subsequently resolve a column reference against that nsitem (e.g. RETURN, which runs in a fresh namespace built by handle_prev_clause), the NULL was harmless.

  Our ON CREATE / ON MATCH SET transform runs in-line, before the MERGE query becomes a subquery, so transform_cypher_set_item_list consulted the join's nsitem directly. colNameToVar -> scanNSItemForColumn then dereferenced p_nscolumns[attnum-1] = NULL[0] and the backend segfaulted on any ON SET whose RHS referenced a bound variable.

  Populate the join's p_nscolumns from res_colvars. The Var we end up producing for a bound entity lives inside prop_expr, which is opaque to the planner, so it is not rewritten to match the plan's output slots. At ExecEvalScalarVar time only varattno is consulted, and scantuple's layout mirrors the join's eref->colnames (via make_target_list_from_join). Use the join rtindex and 1-based eref position so scantuple[varattno - 1] resolves to the correct entity column at runtime; without this, Vars for a (varno=l_rte) and b (varno=r_rte) with varattno=1 both hit scantuple[0] and b.id evaluated to a.id.

  Also initialise apply_update_list's new_property_value at its declaration. All control paths reach the single alter_property_value call with the variable set, but -Wmaybe-uninitialized fires at -O2 because the compiler cannot prove remove_property == isnull when prop_expr is non-NULL.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
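The varattno collision described in the last fix can be illustrated with a small standalone C model. This is only a sketch: ModelVar, eval_scalar_var(), and remap_to_join() are hypothetical names made up for illustration and are not PostgreSQL or AGE code, and the real Var node and ExecEvalScalarVar carry more state than shown. It assumes, as the message above states, that only varattno is consulted at evaluation time and that the scan tuple is laid out in join column order.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a Var node (hypothetical, not the real struct):
 * which range-table entry it names and which 1-based column it reads. */
typedef struct ModelVar {
    int varno;    /* range-table index; ignored at evaluation time */
    int varattno; /* 1-based position in the scan tuple */
} ModelVar;

/* Model of the lookup: only varattno selects the column, so two Vars with
 * different varno but the same varattno read the same slot. */
long eval_scalar_var(const ModelVar *var, const long *scantuple, size_t natts)
{
    assert(var->varattno >= 1 && (size_t) var->varattno <= natts);
    return scantuple[var->varattno - 1];
}

/* Hypothetical remap: address the entity through the join's range-table
 * index and its 1-based position in the join's column list. */
ModelVar remap_to_join(int join_rtindex, int join_position)
{
    ModelVar v;

    v.varno = join_rtindex;
    v.varattno = join_position;
    return v;
}
```

With a scan tuple holding a's id in slot 0 and b's id in slot 1, Vars {varno=1, varattno=1} and {varno=2, varattno=1} both read slot 0, which is the "b.id evaluated to a.id" symptom. A Var built with remap_to_join(3, 2) reads slot 1 instead; the join index 3 is an arbitrary illustrative value.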
1 parent 6f520fe commit 84e2954

13 files changed

Lines changed: 788 additions & 28 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -15,3 +15,4 @@ __pycache__
 **/apache_age_python.egg-info
 
 drivers/python/build
+*.bc

regress/expected/cypher_merge.out

Lines changed: 249 additions & 0 deletions
@@ -2001,9 +2001,258 @@ SELECT * FROM cypher('issue_1954', $$ MATCH (n) DETACH DELETE n $$) AS (a agtype
 ---
 (0 rows)
 
+--
+-- ON CREATE SET / ON MATCH SET tests (issue #1619)
+--
+SELECT create_graph('merge_actions');
+NOTICE: graph "merge_actions" has been created
+create_graph
+--------------
+
+(1 row)
+
+-- Basic ON CREATE SET: first run creates the node
+SELECT * FROM cypher('merge_actions', $$
+MERGE (n:Person {name: 'Alice'})
+ON CREATE SET n.created = true
+RETURN n.name, n.created
+$$) AS (name agtype, created agtype);
+name | created
+---------+---------
+"Alice" | true
+(1 row)
+
+-- ON MATCH SET: second run matches the existing node
+SELECT * FROM cypher('merge_actions', $$
+MERGE (n:Person {name: 'Alice'})
+ON MATCH SET n.found = true
+RETURN n.name, n.created, n.found
+$$) AS (name agtype, created agtype, found agtype);
+name | created | found
+---------+---------+-------
+"Alice" | true | true
+(1 row)
+
+-- Both ON CREATE SET and ON MATCH SET (first run = create)
+SELECT * FROM cypher('merge_actions', $$
+MERGE (n:Person {name: 'Bob'})
+ON CREATE SET n.created = true
+ON MATCH SET n.matched = true
+RETURN n.name, n.created, n.matched
+$$) AS (name agtype, created agtype, matched agtype);
+name | created | matched
+-------+---------+---------
+"Bob" | true |
+(1 row)
+
+-- Both ON CREATE SET and ON MATCH SET (second run = match)
+SELECT * FROM cypher('merge_actions', $$
+MERGE (n:Person {name: 'Bob'})
+ON CREATE SET n.created = true
+ON MATCH SET n.matched = true
+RETURN n.name, n.created, n.matched
+$$) AS (name agtype, created agtype, matched agtype);
+name | created | matched
+-------+---------+---------
+"Bob" | true | true
+(1 row)
+
+-- ON CREATE SET with MERGE after MATCH (Case 1: has predecessor, first run = create)
+SELECT * FROM cypher('merge_actions', $$
+MATCH (a:Person {name: 'Alice'})
+MERGE (a)-[:KNOWS]->(b:Person {name: 'Charlie'})
+ON CREATE SET b.source = 'merge_create'
+RETURN a.name, b.name, b.source
+$$) AS (a agtype, b agtype, source agtype);
+a | b | source
+---------+-----------+----------------
+"Alice" | "Charlie" | "merge_create"
+(1 row)
+
+-- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor, second run = match)
+SELECT * FROM cypher('merge_actions', $$
+MATCH (a:Person {name: 'Alice'})
+MERGE (a)-[:KNOWS]->(b:Person {name: 'Charlie'})
+ON MATCH SET b.visited = true
+RETURN a.name, b.name, b.visited
+$$) AS (a agtype, b agtype, visited agtype);
+a | b | visited
+---------+-----------+---------
+"Alice" | "Charlie" | true
+(1 row)
+
+-- Multiple SET items in a single ON CREATE SET
+SELECT * FROM cypher('merge_actions', $$
+MERGE (n:Person {name: 'Dave'})
+ON CREATE SET n.a = 1, n.b = 2
+RETURN n.name, n.a, n.b
+$$) AS (name agtype, a agtype, b agtype);
+name | a | b
+--------+---+---
+"Dave" | 1 | 2
+(1 row)
+
+-- Reverse order: ON MATCH before ON CREATE should work
+SELECT * FROM cypher('merge_actions', $$
+MERGE (n:Person {name: 'Eve'})
+ON MATCH SET n.seen = true
+ON CREATE SET n.new = true
+RETURN n.name, n.new
+$$) AS (name agtype, new agtype);
+name | new
+-------+------
+"Eve" | true
+(1 row)
+
+-- Error: ON CREATE SET specified more than once
+SELECT * FROM cypher('merge_actions', $$
+MERGE (n:Person {name: 'Bad'})
+ON CREATE SET n.a = 1
+ON CREATE SET n.b = 2
+RETURN n
+$$) AS (n agtype);
+ERROR: ON CREATE SET specified more than once
+LINE 1: SELECT * FROM cypher('merge_actions', $$
+^
+-- Error: ON MATCH SET specified more than once
+SELECT * FROM cypher('merge_actions', $$
+MERGE (n:Person {name: 'Bad'})
+ON MATCH SET n.a = 1
+ON MATCH SET n.b = 2
+RETURN n
+$$) AS (n agtype);
+ERROR: ON MATCH SET specified more than once
+LINE 1: SELECT * FROM cypher('merge_actions', $$
+^
+-- Chained (non-terminal) MERGE with ON CREATE SET (eager-buffering path)
+SELECT * FROM cypher('merge_actions', $$
+MERGE (a:Person {name: 'Frank'})
+ON CREATE SET a.created = true
+MERGE (a)-[:KNOWS]->(b:Person {name: 'Grace'})
+ON CREATE SET b.created = true
+RETURN a.name, a.created, b.name, b.created
+$$) AS (a_name agtype, a_created agtype, b_name agtype, b_created agtype);
+a_name | a_created | b_name | b_created
+---------+-----------+---------+-----------
+"Frank" | true | "Grace" | true
+(1 row)
+
+-- Chained (non-terminal) MERGE with ON MATCH SET (second run = match)
+SELECT * FROM cypher('merge_actions', $$
+MERGE (a:Person {name: 'Frank'})
+ON MATCH SET a.matched = true
+MERGE (a)-[:KNOWS]->(b:Person {name: 'Grace'})
+ON MATCH SET b.matched = true
+RETURN a.name, a.matched, b.name, b.matched
+$$) AS (a_name agtype, a_matched agtype, b_name agtype, b_matched agtype);
+a_name | a_matched | b_name | b_matched
+---------+-----------+---------+-----------
+"Frank" | true | "Grace" | true
+(1 row)
+
+-- ON keyword as label name (backward compat via safe_keywords)
+SELECT * FROM cypher('merge_actions', $$
+CREATE (n:on {name: 'test'})
+RETURN n.name
+$$) AS (name agtype);
+name
+--------
+"test"
+(1 row)
+
+-- Issue #2347: RHS of ON CREATE / ON MATCH SET referencing a bound
+-- variable crashed the backend when MERGE had a previous clause, because
+-- the lateral-join's ParseNamespaceItem had p_nscolumns=NULL.
+-- ON CREATE SET with RHS referencing the outer MATCH's variable
+SELECT * FROM cypher('merge_actions', $$ CREATE (:Person {name:'Anchor'}) $$) AS (a agtype);
+a
+---
+(0 rows)
+
+SELECT * FROM cypher('merge_actions', $$
+MATCH (a:Person {name: 'Anchor'})
+MERGE (b:Person {name: 'FromOuter'})
+ON CREATE SET b.source_name = a.name
+RETURN a.name, b.name, b.source_name
+$$) AS (a_name agtype, b_name agtype, b_source agtype);
+a_name | b_name | b_source
+----------+-------------+----------
+"Anchor" | "FromOuter" | "Anchor"
+(1 row)
+
+-- ON CREATE SET with RHS referencing the MERGE-bound variable itself
+SELECT * FROM cypher('merge_actions', $$
+MATCH (a:Person {name: 'Anchor'})
+MERGE (b:Person {name: 'SelfRef'})
+ON CREATE SET b.echo_name = b.name
+RETURN b.name, b.echo_name
+$$) AS (b_name agtype, b_echo agtype);
+b_name | b_echo
+-----------+-----------
+"SelfRef" | "SelfRef"
+(1 row)
+
+-- ON CREATE SET driven by UNWIND with self-reference on the RHS
+-- (Muhammad's second reproducer)
+SELECT * FROM cypher('merge_actions', $$
+UNWIND ['U1', 'U2'] AS nm
+MERGE (n:Person {name: nm})
+ON CREATE SET n.copy_name = n.name
+RETURN n.name, n.copy_name
+$$) AS (n_name agtype, n_copy agtype);
+n_name | n_copy
+--------+--------
+"U1" | "U1"
+"U2" | "U2"
+(2 rows)
+
+-- Multiple SET items mixing outer-ref, self-ref, and literal RHS
+SELECT * FROM cypher('merge_actions', $$
+MATCH (a:Person {name: 'Anchor'})
+MERGE (b:Person {name: 'MultiItem'})
+ON CREATE SET b.from_a = a.name, b.self = b.name, b.lit = 'literal'
+RETURN b.from_a, b.self, b.lit
+$$) AS (fa agtype, sf agtype, lit agtype);
+fa | sf | lit
+----------+-------------+-----------
+"Anchor" | "MultiItem" | "literal"
+(1 row)
+
+-- ON MATCH SET with variable RHS (second run on existing node)
+SELECT * FROM cypher('merge_actions', $$
+MATCH (a:Person {name: 'Anchor'})
+MERGE (b:Person {name: 'FromOuter'})
+ON CREATE SET b.source_name = a.name
+ON MATCH SET b.last_seen_by = a.name
+RETURN b.source_name, b.last_seen_by
+$$) AS (src agtype, last agtype);
+src | last
+----------+----------
+"Anchor" | "Anchor"
+(1 row)
+
+-- cleanup
+SELECT * FROM cypher('merge_actions', $$ MATCH (n) DETACH DELETE n $$) AS (a agtype);
+a
+---
+(0 rows)
+
 --
 -- delete graphs
 --
+SELECT drop_graph('merge_actions', true);
+NOTICE: drop cascades to 5 other objects
+DETAIL: drop cascades to table merge_actions._ag_label_vertex
+drop cascades to table merge_actions._ag_label_edge
+drop cascades to table merge_actions."Person"
+drop cascades to table merge_actions."KNOWS"
+drop cascades to table merge_actions."on"
+NOTICE: graph "merge_actions" has been dropped
+drop_graph
+------------
+
+(1 row)
+
 SELECT drop_graph('issue_1907', true);
 NOTICE: drop cascades to 4 other objects
 DETAIL: drop cascades to table issue_1907._ag_label_vertex
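
The 'Bob' tests in cypher_merge.out run the same MERGE twice and expect exactly one of the two actions to fire on each run. A minimal C model of that contract follows, assuming an in-memory array in place of a graph. Graph, Node, merge_node(), set_created(), and set_matched() are hypothetical names for illustration only and do not exist in AGE; an unset property is modeled as false, whereas the real query returns null.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* Hypothetical node: a name plus the two flags the tests set. */
typedef struct Node {
    const char *name;
    bool created; /* set by the ON CREATE action */
    bool matched; /* set by the ON MATCH action */
} Node;

typedef struct Graph {
    Node nodes[MAX_NODES];
    int count;
} Graph;

typedef void (*merge_action)(Node *n);

void set_created(Node *n) { n->created = true; }
void set_matched(Node *n) { n->matched = true; }

/* Find a node by name or create it. Exactly one action runs per call;
 * either callback may be NULL, which models omitting that clause. */
Node *merge_node(Graph *g, const char *name,
                 merge_action on_create, merge_action on_match)
{
    Node *n;
    int i;

    for (i = 0; i < g->count; i++) {
        if (strcmp(g->nodes[i].name, name) == 0) {
            if (on_match != NULL)
                on_match(&g->nodes[i]);
            return &g->nodes[i];
        }
    }

    assert(g->count < MAX_NODES);
    n = &g->nodes[g->count++];
    n->name = name;
    n->created = false;
    n->matched = false;
    if (on_create != NULL)
        on_create(n);
    return n;
}
```

Calling merge_node(&g, "Bob", set_created, set_matched) on an empty graph leaves created set and matched unset; calling it a second time sets matched as well and does not add a second node, which mirrors the two expected result rows for 'Bob'.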
