ci: add graphify structural impact analysis to PR review and structure audit by andreatgretel · Pull Request #567 · NVIDIA-NeMo/DataDesigner

andreatgretel · 2026-04-21T20:42:23Z

📋 Summary

Add a graphify-based AST analysis tool that builds a
directed graph of the codebase (~6s, zero LLM cost) to surface architectural risk before the
review agent starts. Integrates into both the PR review workflow (pre-computed context for the
agent) and the Wednesday structure audit (week-over-week graph diff).

Hat tip to @nabinchha for suggesting graphify as the right tool for this.

🔗 Related Issue

Closes #472 (Phase 4 — structural analysis)

🔄 Changes

✨ Added

structural_impact.py —
standalone analysis tool with two modes:
- --changed-files (PR review): extracts changed AST entities against the full codebase graph,
  reports risk level (LOW/MEDIUM/HIGH), god nodes affected, import direction violations, and
  cross-package dependencies
- --full (structure audit): god node rankings, cross-package edge summary table, import
  violation detection, and graph diff against the previous week's cached graph
Step 3.5 in the
review-code skill —
agent reads the pre-computed structural impact to calibrate review depth (HIGH risk = extra
scrutiny on blast radius and backward compatibility)
graphify-out/ added to .gitignore (9MB graph cache is CI-only, persisted via actions/cache)

🔧 Changed

agentic-ci-pr-review.yml —
new step installs graphifyy in a lightweight temp venv and runs structural analysis before the
agent starts. Tool is read from the base-branch checkout (base-agents/) so fork PRs cannot
inject code that runs with GH_TOKEN in scope. Base-branch checkout expanded from
.agents/recipes to also include .agents/tools
agentic-ci-daily.yml —
conditional graphifyy install for the structure suite; graphify-out/ added to the cache path
for week-over-week graph diffs
pr-review recipe —
agent reads /tmp/structural-impact-{{pr_number}}.md and appends findings to the review
structure recipe —
new Section 4 with graphify instructions, baseline tracking, and week-over-week diff guidance

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

agentic-ci-pr-review.yml —
the security model: structural_impact.py runs from base-agents/ (base-branch sparse
checkout), not from the PR's files, to prevent fork PRs from injecting code that executes
with secrets in scope. The --repo-root flag ensures the tool still analyzes the PR's
source code correctly despite living in a different checkout
structural_impact.py —
new 361-line tool; pure AST analysis, no DD imports at runtime. Depends on
graphifyy (tree-sitter based) and networkx (transitive)

📊 Retroactive validation on 3 merged PRs

Ran graphify retroactively against PRs #502, #545, and #557 to validate the signal quality:

PR	Graphify risk	Key signal	Actual critical findings
#502 (skip.when, 22 files)	HIGH — 5 god nodes, 9 violations, 12/73 clusters	`DatasetBuilder` (88 deps) + `ExecutionGraph` (89 deps) flagged as highest-risk	allow_resize propagation bug and positional merge divergence — both in the flagged entities
#557 (jinja engine, 20 files)	HIGH — config→engine violation, 52-dep env class	`config.__getattr__()` calls into engine	jsonpath filter bug at exactly that cross-package seam
#545 (async bridge, 6 files)	MEDIUM — 9 clusters for 6 files	Error hierarchy connectivity (52 deps)	Fragile error string matching — in the flagged error class

In all 3 cases, the most critical review findings correlated with the highest-connectivity
entities in the graphify report. The risk level (LOW/MEDIUM/HIGH) matched actual review
effort needed.

🧪 Testing

structural_impact.py smoke-tested in both modes (--changed-files, --full)
against the live DD codebase
--repo-root flag tested from external checkout directory (simulates CI base-agents/ path)
graphify API signatures verified against graphifyy==0.4.23
Pre-commit hooks pass (ruff, yaml, merge conflict check)
Unit tests added/updated — N/A: CI tool, no testable library code
E2E tests added/updated — validated retroactively against 3 merged PRs (see above)

✅ Checklist

Follows commit message conventions
Commits are signed off (DCO)
Architecture docs updated — N/A: .agents/ tooling, not package architecture

…e audit Add a graphify-based AST analysis tool that builds a directed graph of the codebase (~2s, no LLM calls) to detect architectural impact. Integrates into both the PR review workflow (pre-computed before claude runs) and the Wednesday structure audit (with week-over-week diff). PR review: extracts changed files against the full codebase graph, reports risk level (LOW/MEDIUM/HIGH), god nodes affected, import direction violations, and cross-package dependencies. Output saved to /tmp and read by the review agent. Structure audit: produces god node rankings, cross-package edge summary table, import violation detection, and graph diff against previous week's cached graph. Baselines saved for runner memory trend tracking.

- Fix KeyError: god_nodes() returns 'degree' not 'edges' (3 call sites) - Fix deduped vs raw violation count inconsistency in baselines.json - Security: run structural_impact.py from base-branch checkout so fork PRs cannot inject code that executes with GH_TOKEN in scope - Add --repo-root flag so the tool resolves package paths correctly when invoked from a different checkout directory - Replace make install-dev + .venv with lightweight /tmp/graphify-venv (only graphifyy needed, saves ~2min CI per PR review) - Add graphify-out/ to .gitignore (9MB graph cache is CI-only)

github-actions · 2026-04-21T20:44:11Z

Summary

PR #567 adds a graphify-based AST structural-impact tool and wires it into two CI workflows:

agentic-ci-pr-review.yml: a pre-review step installs graphifyy into a temp venv, runs the new .agents/tools/structural_impact.py against changed Python files, and writes /tmp/structural-impact-{PR}.md for the review agent to consume.
agentic-ci-daily.yml (structure suite): conditionally installs graphify and caches graphify-out/ for week-over-week graph diffs.
New tool (.agents/tools/structural_impact.py, 361 LOC): builds a directed AST graph of the three DD packages, flags god nodes, import-direction violations (interface → engine → config), cross-package edges, and emits risk levels (LOW/MEDIUM/HIGH).
Recipe/skill updates: pr-review recipe reads the pre-computed report and appends it to the review; structure recipe adds a Section 4 with baseline tracking; review-code skill adds Step 3.5 describing how to calibrate review depth based on risk.

Retroactive validation against 3 merged PRs is cited in the description; signal correlation with actual critical findings looks plausible.

Findings

Security — correctly handled, one residual concern

The critical security story (tool must run from trusted base-branch code, not PR code) is executed well:

Sparse checkout expanded from .agents/recipes to also include .agents/tools; the script is invoked from base-agents/.agents/tools/structural_impact.py, not from the PR checkout.
--repo-root "${{ github.workspace }}" cleanly separates trusted tool location from untrusted analysis target.
continue-on-error: true on the structural step means a tool regression or graphify pin drift will not block PR review.

One residual concern in agentic-ci-pr-review.yml:

CHANGED_PY=$(gh pr diff "$PR_NUMBER" --name-only | grep '\.py$' || true)
...
--changed-files $CHANGED_PY \

$CHANGED_PY is deliberately unquoted to word-split into multiple args, but that also subjects the values to IFS word-splitting and glob expansion. A PR introducing a .py file whose name contains whitespace or glob characters would produce incorrect arg boundaries. Git allows such filenames. Low risk (the tool treats each path as a file and filters by Path.exists()/.suffix), but consider reading into an array:

mapfile -t CHANGED_PY < <(gh pr diff "$PR_NUMBER" --name-only | grep '\.py$' || true)
...
--changed-files "${CHANGED_PY[@]}"

This also fixes a latent bug: if CHANGED_PY is only newlines (no matches), the unquoted expansion produces zero args and nargs="+" will error — but the early if [ -z "$CHANGED_PY" ] guard covers that case today.

Correctness

_dedup key truncation (structural_impact.py:60): dedup key uses label[:30] on each side. Two distinct entities that share the first 30 characters of their label would be merged. Probably rare in practice but not obviously intended — prefer the full label, or document the truncation as intentional.
pip install ... | tail -3 (both workflows): piping suppresses the pip exit code under the default set +o pipefail. A failed install would proceed to run structural_impact.py, which would then ImportError on graphify. continue-on-error catches this, but the log would be confusing. Either set pipefail for these steps or drop | tail -3.
Global _REPO_ROOT mutation (structural_impact.py:21, 308): the global _REPO_ROOT pattern works but is fragile — _PACKAGE_SUBDIRS is resolved at import time against the default, while _collect_source_files reads _REPO_ROOT at call time. If anyone later changes _PACKAGE_SUBDIRS to absolute paths, the override breaks silently. Pass repo_root explicitly through the call chain, or evaluate subdirs inside the function.
_get_package interface detection (structural_impact.py:53-57): matches on the literal directory segment data-designer followed by src. This correctly skips data-designer-engine / data-designer-config because the earlier substring checks win, but the fallthrough logic is subtle. A brief comment explaining the ordering dependency would save a future reader.
NetworkX API compatibility (structural_impact.py:273-277): the try: node_link_graph(..., edges="links") / except TypeError: node_link_graph(...) fallback covers a real API change across NetworkX versions. Good defensive coding; worth a one-line comment referencing the NetworkX version boundary so it isn't mistaken for dead code later.

Style / project conventions

Missing type annotations: project requires full annotations (AGENTS.md). _build_graph, _cross_package_edges, _changed_files_mode, _full_mode, main are missing return types on some paths; _build_graph returns dict which hides its structure — consider a TypedDict or a small @dataclass. The returned G is a NetworkX DiGraph; typing it as dict defeats downstream type checking.
from __future__ import annotations: present at top of file. Good.
Absolute imports only: satisfied.
SPDX header: present.
Lazy heavy imports (AGENTS.md "Fast imports" invariant): graphify.* and networkx are imported at module top-level. For a standalone CI tool invoked directly this is fine — the invariant is about keeping data_designer import time low, and this file is in .agents/tools/, not in the package. Worth confirming the tool isn't auto-imported from anywhere in data_designer.

Test coverage

The checklist marks unit tests as "N/A: CI tool, no testable library code." This is partially true but several pure helpers are clearly testable and carry real logic: _get_package, _dedup, the risk-level decision tree in _changed_files_mode, and the direction-violation classifier in _cross_package_edges. A small test module (even a single pytest file in .agents/tools/tests/) would protect against silent regressions — especially since graphify is pinned loosely (graphifyy, no version) and API drift is the most likely failure mode.
Retroactive validation on 3 merged PRs is good signal-quality evidence, but it isn't a regression guard.

Performance

~6s per PR and ~2s for full audit (per description) against a small codebase is acceptable.
The full graph is built even for PR review (only changed files would not be enough to compute cross-package context). That is the correct trade-off.
graphify-out/ at ~9MB adds to the cache footprint but only for the structure suite. Fine.

Minor / nits

agentic-ci-pr-review.yml:166: pip install graphifyy --quiet with no version pin. Recommend pinning (e.g., graphifyy==0.4.23, the version the PR says was tested against) to match the existing graphify API signatures verified against graphifyy==0.4.23 claim in the description. Same for the daily workflow.
structural_impact.py:9-12: docstring usage examples are helpful; consider adding --full --previous-graph ... example too.
_fmt_gods mixes bullet-list output (affected) and table output (full) in one function via a flag — two small formatters would read more cleanly, but this is purely cosmetic.
Recipe change in pr-review/recipe.md asks the agent to append structural impact before the Verdict section. The skill's Step 3.5 also tells the agent to use it to calibrate review depth. These two instructions are compatible but slightly duplicative — confirm the agent applies both (use during review, then append the raw report). Not a blocker.

Verdict

Approve with minor follow-ups. The core design — trusted-path execution of a pure-AST tool with --repo-root pointing at the untrusted checkout — is sound, the fallback story (continue-on-error) is correct, and the integration into both review and audit surfaces is clean.

Recommended before merge:

Quote/array-expand $CHANGED_PY in the workflow (latent filename-handling bug).
Pin graphifyy version in both pip install steps.
Either set pipefail or drop | tail -3 around pip install so install failures surface cleanly.

Recommended after merge:
4. Add a minimal pytest for _get_package, _dedup, and the risk-level branches in _changed_files_mode — graphify API drift is the most likely future break.
5. Replace the dict return type of _build_graph with a TypedDict/dataclass for proper typing.

No blocking issues; security model is correct; retroactive validation supports the signal claim.

greptile-apps · 2026-04-21T20:45:11Z

Greptile Summary

This PR integrates a graphify-based AST analysis tool (structural_impact.py) into the PR review and weekly structure audit workflows. It adds a pre-computed structural impact report (risk level, god nodes, import violations, cross-package deps) that the review agent reads before starting, and extends the daily structure suite with god node tracking and week-over-week graph diffs. All bugs flagged in previous review rounds (dedup truncation, risk reason accuracy, changed_node_ids derivation from the already-built graph) have been resolved.

Confidence Score: 5/5

Safe to merge — all previously flagged bugs have been resolved and the remaining logic is correct.

No P0 or P1 findings remain. The security model (tool executed from base-branch checkout), path matching, risk classification logic, and dedup/rank computations all look correct. The continue-on-error guard on the analysis step prevents CI failures from blocking reviews if the tool misbehaves.

No files require special attention.

Important Files Changed

Filename	Overview
.agents/tools/structural_impact.py	New 411-line AST graph analysis tool; all previously flagged bugs (dedup truncation, risk reason, changed_node_ids derivation) addressed in prior commits; remaining code logic is correct.
.github/workflows/agentic-ci-pr-review.yml	Adds structural impact analysis step; security model correct (tool read from base-branch checkout); graphifyy pinned to 0.4.23; continue-on-error guards against analysis failures; path handling and mapfile usage correct.
.github/workflows/agentic-ci-daily.yml	Adds conditional graphifyy install for structure suite and extends cache path to include graphify-out/ for week-over-week diffs; changes are minimal and correct.
.agents/recipes/pr-review/recipe.md	Adds step to read pre-computed structural impact file and append it to the review output; integration instructions are clear and correct.
.agents/recipes/structure/recipe.md	Adds Section 4 with graphify run instructions, god node tracking guidance, and baseline update steps; template is coherent and consistent with tool's output format.
.agents/skills/review-code/SKILL.md	Adds Step 3.5 describing how the agent should consume the structural impact file; guidance is well-structured and clearly scoped (HIGH/MEDIUM/LOW risk calibration).
.gitignore	Adds graphify-out/ to .gitignore; correct since the cache directory is CI-only and persisted via actions/cache.

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant BC as base-agents/ checkout
    participant WS as PR workspace
    participant Agent as Review Agent

    GHA->>GHA: gh pr diff --name-only | grep .py → /tmp/changed-py.txt
    GHA->>BC: sparse-checkout .agents/recipes + .agents/tools (base branch)
    GHA->>BC: python base-agents/.agents/tools/structural_impact.py
    Note over BC,WS: --repo-root points to PR workspace<br/>tool reads PR source, runs from base branch
    BC->>WS: extract(all_source_files) → build graph G
    BC->>BC: derive changed_node_ids from G.nodes[source_file]
    BC->>BC: assess risk (god nodes / violations / clusters)
    BC->>GHA: write /tmp/structural-impact-{pr}.md
    GHA->>Agent: start review agent
    Agent->>Agent: read /tmp/structural-impact-{pr}.md
    Agent->>Agent: calibrate review depth (HIGH→extra pass 2 scrutiny)
    Agent->>Agent: /review-code {pr_number}
    Agent->>Agent: append Structural Impact section to review file

_{Reviews (4): Last reviewed commit: "feat(ci): warn when changed files touch ..." | Re-trigger Greptile}

Pin graphifyy==0.4.23 in both CI workflows to prevent breaking changes from unpinned installs. Fix _dedup() label truncation at 30 chars that could merge distinct entities sharing a common prefix.

…filenames Replace unquoted $CHANGED_PY word-split with mapfile + array expansion to prevent glob expansion and correctly handle filenames containing spaces or special characters.

Derive changed_node_ids from the already-built graph by matching source_file paths instead of running a separate extraction pass. Removes implicit dependency on graphify ID stability across independent extractions. Fix MEDIUM risk reason to reflect the actual trigger (cluster spread vs high-connectivity entity) instead of always reporting cluster count.

…stale artifacts Split the workflow step to isolate GH_TOKEN from graphifyy execution, preventing a compromised package release from exfiltrating write-scoped tokens. Scan both edge directions in _cross_package_edges so inbound dependents and violations where the changed node is the target are visible. Detect deleted files and report them as a risk signal. Include relation type in dedup key so distinct edge types between the same labels are not collapsed. Clean stale /tmp artifacts before running analysis to prevent reruns from reading old reports.

nabinchha · 2026-04-30T21:04:46Z

Thanks for putting this together, @andreatgretel — the retroactive validation against three merged PRs is a really nice way to make the case for the signal quality.

Summary

Adds a graphify-based AST analyzer at .agents/tools/structural_impact.py that builds a directed graph of the codebase and produces a risk report consumed by both the PR-review workflow (pre-computed structural context for the agent) and the Wednesday structure audit (with week-over-week diff). Implementation matches the stated intent, and several rounds of self-review/Codex feedback have already addressed the most critical issues (version pinning, dedup correctness, MEDIUM risk reason, ID stability, edge-coverage scan-both-directions, base-branch security model, and changed-files word-splitting).

Findings

Warnings — Worth addressing

.agents/tools/structural_impact.py:85 — Missing type annotation on G parameter

What: _cross_package_edges(G, node_ids: set[str] | None = None) doesn't annotate G. STYLEGUIDE.md is explicit: "ALWAYS add type annotations to all functions, methods, and class attributes."
Why: Pure function in a new file, so this sets the precedent for follow-on contributions. _build_graph returns a bare dict for the same reason.
Suggestion: Since networkx is on its way out of the repo, annotating with nx.DiGraph would point in the wrong direction. The pragmatic answer for a private helper is G: Any — STYLEGUIDE discourages Any for return types, but it's acceptable on params when the underlying type is intentionally not part of the project's surface. For _build_graph, swap the dict-literal return for a small @dataclass (STYLEGUIDE prefers dataclass over TypedDict for "typed struct with no validation"), which also lets callers do analysis.graph instead of analysis["graph"]:

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True, slots=True)
class _Analysis:
    graph: Any
    communities: dict[int, list[str]]
    cohesion: dict[str, Any]
    god_nodes: list[dict[str, Any]]

def _build_graph(files: list[Path]) -> _Analysis: ...
def _cross_package_edges(G: Any, node_ids: set[str] | None = None) -> ...: ...

Call sites become G, communities, gods = analysis.graph, analysis.communities, analysis.god_nodes. If a richer type for G is wanted later without taking a dep, a Protocol with just the methods we use (.nodes, .edges, .degree, number_of_nodes, number_of_edges) is the structural-typing route.

.agents/tools/structural_impact.py:30-34 — _PACKAGE_SUBDIRS silently misses files in new packages

What: The package-source roots are hardcoded to the three current packages. If someone adds a new package (packages/data-designer-foo/...), _collect_source_files won't pick it up and _get_package won't classify it, so its files would silently report "0 AST entities" with no warning.
Why: The graphify report would falsely show a localized change for what could be a major new-package PR. This is a quiet failure mode — the kind the structural analyzer is meant to catch, not produce.
Suggestion: Either (a) discover packages dynamically by globbing packages/*/src/data_designer/ and inferring layer from the parent dirname, or (b) add a guard: when a --changed-files path doesn't fall under any known subdir, emit a visible warning line in the report (e.g., _Note: 3 changed files outside known package roots — analysis may be incomplete_). Option (a) is more future-proof; option (b) is a minimal hedge.

Suggestions — Take it or leave it

.agents/tools/structural_impact.py — No unit tests for pure helpers

What: PR description marks unit tests as N/A ("CI tool, no testable library code"), but _get_package, _dedup, and _collect_source_files are pure functions that don't depend on graphify and are exactly the kind of thing that breaks silently during refactors.
Why: AGENTS.md says "no untested code paths"; even though .agents/tools/ is meta-tooling rather than package code, the parser-classifier logic carries enough subtle behavior (substring vs path-segment matching, dedup key composition) that tests would pay for themselves the first time someone touches it. Codex/Greptile already caught two latent bugs here in the same review cycle.
Suggestion: A small tests/test_structural_impact.py covering _get_package (engine/config/interface/external paths), _dedup (collisions on shared prefixes — exactly the bug fixed in bfc5db69), and a smoke test for _collect_source_files against a tmp dir would be ~30 lines and prevent regressions.

.agents/tools/structural_impact.py:313 — Bare except Exception

What: The diff fallback catches Exception broadly to print _Could not diff against previous graph: {exc}_.
Why: STYLEGUIDE: "Prefer specific exception types over bare except." The intent (graceful degradation) is fine, but a broad catch will also swallow KeyboardInterrupt-adjacent bugs and typos.
Suggestion: Narrow to the realistic set, e.g., except (json.JSONDecodeError, KeyError, TypeError, OSError) as exc:. If graphify defines its own diff exception, include that.

.agents/tools/structural_impact.py:297, 317 — In-function imports

What: from networkx.readwrite import json_graph (line 297, inside conditional) and from graphify.export import to_json (line 317, unconditional within _full_mode).
Why: STYLEGUIDE: "Place imports at module level, not inside functions (exception: unavoidable for performance reasons)." to_json is always called when _full_mode runs, so deferring it doesn't save anything; networkx is already a transitive dep of graphifyy and loaded eagerly.
Suggestion: Hoist both to the existing top-level import block.

.agents/tools/structural_impact.py:25, 340 — Module-level mutable _REPO_ROOT

What: _REPO_ROOT is a module global mutated in main() via global _REPO_ROOT, then read by _collect_source_files, _rel, and _full_mode.
Why: Common CLI shortcut, but it makes the helpers harder to unit-test (you'd need to mutate a global to vary the input) and couples three otherwise-pure functions to module state.
Suggestion: Optional refactor — thread repo_root through as a parameter on the helpers that need it. Not blocking; just easier on future-you when adding tests per the Suggestion above.

.github/workflows/agentic-ci-daily.yml:124-126 — Install graphify shares the project venv

What: The PR-review workflow isolates graphifyy in /tmp/graphify-venv, but the daily workflow installs it into the project's .venv directly (.venv/bin/python -m pip install graphifyy==0.4.23). networkx is currently a real data-designer-engine dep so this is invisible today.
Why: Once the planned networkx removal lands (it's currently only used in samplers per offline context), this step will silently re-introduce networkx into .venv as a transitive of graphifyy. That partially undoes the dep cleanup, and you only notice if someone audits pip list mid-job.
Suggestion: Use the same temp-venv pattern as the PR-review workflow:

- name: Install graphify
  if: matrix.suite == 'structure'
  run: |
    python -m venv /tmp/graphify-venv
    /tmp/graphify-venv/bin/python -m pip install graphifyy==0.4.23 2>&1 | tail -3

Then invoke structural_impact.py via /tmp/graphify-venv/bin/python in the audit step. Symmetrical to the PR-review setup, and isolates graphifyy's transitives from the project venv permanently.

.agents/tools/structural_impact.py:196 vs 79 — Path-normalization asymmetry

What: changed_paths = {str(p.resolve()) for p in changed_files} resolves symlinks, but the source files passed to extract(files) (via _collect_source_files) are built as _REPO_ROOT / sub and only inherit whatever resolution _REPO_ROOT had. If _REPO_ROOT is already resolved and the package dirs contain no symlinks (the realistic CI case), they match. If a self-hosted runner ever has a symlinked checkout dir or a bind-mounted source tree, the sets could diverge and silently produce "0 entities."
Why: Retroactive validation against three real PRs shows it works today, so this is latent rather than active. But it's the same class of silent-mismatch bug we've fixed twice already in this PR.
Suggestion: Either drop .resolve() on changed_paths and trust _REPO_ROOT resolution, or also .resolve() the paths inside _collect_source_files. Symmetric is what we want.

What Looks Good

The split-step security model is the right call — structural_impact.py runs in a separate workflow step with no env: block, so even a compromised graphifyy release can't see GH_TOKEN or the API key. The base-branch sparse checkout for .agents/tools (so fork PRs can't inject the runner code itself) closes the obvious hole. The expanded SECURITY comment makes the intent legible to future maintainers.
The retroactive validation table (PRs feat: add skip.when conditional column generation #502, fix: bridge model.generate() to agenerate() for custom columns in async engine #545, feat: add RunConfig jinja rendering engine #557 with correlation between graphify risk and actual review findings) is exactly the kind of evidence that justifies a new tool. It also makes the LOW/MEDIUM/HIGH thresholds defensible rather than arbitrary.
Iterative response to review feedback is impressive: in six commits the PR went through five distinct fixes (pinning, dedup truncation, MEDIUM risk reason, ID stability via graph-derived nodes, both-direction edge coverage), each with a clear commit message explaining what changed and why. The integration is meaningfully more robust because of it.

Verdict

Ship it (with nits) — Only Warnings around type annotations and the hardcoded package list, plus a handful of Suggestions. Nothing blocking, but the type annotations are worth addressing in this PR since they set the convention for the new file. Everything else can land as-is or in a follow-up.

This review was generated by an AI assistant.

…w except, isolate daily graphify - structural_impact.py: - replace bare _build_graph dict return with frozen _Analysis dataclass - add G: Any annotation on _cross_package_edges (STYLEGUIDE: all params typed) - hoist `from graphify.export import to_json` and `from networkx.readwrite import json_graph` to module top (no perf justification for deferred import) - narrow `except Exception` in graph-diff fallback to (JSONDecodeError, KeyError, TypeError, OSError) - agentic-ci-daily.yml: install graphifyy into /tmp/graphify-venv instead of the project .venv, matching agentic-ci-pr-review.yml. Keeps graphify's transitives (networkx) out of the project venv permanently. - structure/recipe.md: invoke the tool via /tmp/graphify-venv/bin/python to match the workflow change.

A new package under packages/ that isn't in _PACKAGE_SUBDIRS is silently absent from the graph - the analyzer would falsely report LOW risk with 0 entities. Add a _Note line in the changed-files report when any changed or deleted file lives under packages/<unknown>/, so the failure mode the analyzer is supposed to surface isn't itself silent. _KNOWN_PACKAGE_DIRS is derived from _PACKAGE_SUBDIRS so future additions stay in sync without a second source of truth.

andreatgretel · 2026-05-04T17:29:18Z

Thanks @nabinchha, this was a careful review and the type-annotation push set the right precedent for the file. Almost everything went into two follow-up commits on the branch:

508f119 — STYLEGUIDE-cited fixes + daily venv isolation

_build_graph now returns a frozen _Analysis dataclass; call sites use attribute access. _cross_package_edges(G: Any, ...) per your suggestion - graphify intentionally not in the project's typed surface, and networkx is on its way out so nx.DiGraph would point in the wrong direction.
Hoisted from graphify.export import to_json and from networkx.readwrite import json_graph to module top - no perf justification for either being deferred.
Narrowed the graph-diff fallback to (json.JSONDecodeError, KeyError, TypeError, OSError), matching what the try block can realistically raise.
Daily workflow now installs graphifyy into /tmp/graphify-venv, symmetric to the PR-review setup. Tweaked the structure recipe to invoke the tool via that interpreter so graphify's transitives stay out of .venv permanently - aligns with the planned networkx removal you mentioned.

07b0e74 — unknown-package hedge

Picked option (b) from your suggestion. _KNOWN_PACKAGE_DIRS is derived from _PACKAGE_SUBDIRS (single source of truth), and the changed-files report now emits a _Note: changes touch unknown package(s) (...)_ line when any changed or deleted file lives under packages/<unknown>/. The silent-miss failure mode the analyzer was producing is now self-reporting.
Smoke-tested with a synthetic packages/data-designer-foo/... path and confirmed the warning fires there but stays quiet for known packages.

Deferred to #601 — unit tests for _get_package, _dedup, _collect_source_files, plus the related _REPO_ROOT parameter-threading and path-resolution symmetry. The three are cheap to do together and the test wiring needs a small CI decision (.agents/tools/ is outside the package suites), so it lands cleaner as its own change.

andreatgretel added 2 commits April 21, 2026 20:14

andreatgretel requested a review from a team as a code owner April 21, 2026 20:42

andreatgretel temporarily deployed to agentic-ci April 21, 2026 20:42 — with GitHub Actions Inactive

greptile-apps Bot reviewed Apr 21, 2026

View reviewed changes

Comment thread .github/workflows/agentic-ci-pr-review.yml Outdated

Comment thread .github/workflows/agentic-ci-daily.yml

Comment thread .agents/tools/structural_impact.py Outdated

fix: pin graphifyy version and fix dedup truncation

bfc5db6

Pin graphifyy==0.4.23 in both CI workflows to prevent breaking changes from unpinned installs. Fix _dedup() label truncation at 30 chars that could merge distinct entities sharing a common prefix.

greptile-apps Bot reviewed Apr 21, 2026

View reviewed changes

Comment thread .agents/tools/structural_impact.py Outdated

fix(ci): use array expansion for changed-files arg to handle special …

913cb28

…filenames Replace unquoted $CHANGED_PY word-split with mapfile + array expansion to prevent glob expansion and correctly handle filenames containing spaces or special characters.

greptile-apps Bot reviewed Apr 21, 2026

View reviewed changes

Comment thread .agents/tools/structural_impact.py Outdated

andreatgretel added 2 commits April 21, 2026 21:15

github-actions Bot mentioned this pull request Apr 27, 2026

Agentic CI: Issue & PR Triage Tracker #562

Open

andreatgretel added 2 commits May 4, 2026 17:18

andreatgretel mentioned this pull request May 4, 2026

tests: cover structural_impact.py pure helpers; thread repo_root and align path resolution #601

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: add graphify structural impact analysis to PR review and structure audit#567

ci: add graphify structural impact analysis to PR review and structure audit#567
andreatgretel wants to merge 8 commits intomainfrom
andreatgretel/feat/graphify-structural-rebase

andreatgretel commented Apr 21, 2026

Uh oh!

github-actions Bot commented Apr 21, 2026

Uh oh!

greptile-apps Bot commented Apr 21, 2026 •

edited

Loading

Confidence Score: 5/5

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nabinchha commented Apr 30, 2026

Uh oh!

andreatgretel commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

andreatgretel commented Apr 21, 2026

📋 Summary

🔗 Related Issue

🔄 Changes

✨ Added

🔧 Changed

🔍 Attention Areas

📊 Retroactive validation on 3 merged PRs

🧪 Testing

✅ Checklist

Uh oh!

github-actions Bot commented Apr 21, 2026

Summary

Findings

Security — correctly handled, one residual concern

Correctness

Style / project conventions

Test coverage

Performance

Minor / nits

Verdict

Uh oh!

greptile-apps Bot commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nabinchha commented Apr 30, 2026

Summary

Findings

Warnings — Worth addressing

Suggestions — Take it or leave it

What Looks Good

Verdict

Uh oh!

andreatgretel commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented Apr 21, 2026 •

edited

Loading