Skip to content

Conversation

@maparent
Copy link
Collaborator

@maparent maparent commented Nov 1, 2025

https://linear.app/discourse-graphs/issue/ENG-1032/minimal-json-ld-export-of-the-discourse-graph

Summary by CodeRabbit

Release Notes

  • New Features

    • Added JSON-LD export format for improved knowledge graph integration
    • Enhanced Markdown export with improved formatting options
    • Added modified timestamp tracking to improve data audit trails
  • Refactor

    • Reorganized export utilities for improved code maintainability
  • Chores

    • Added foundational RDF vocabulary schemas for DiscourseGraph ontology

✏️ Tip: You can customize this high-level summary in your review settings.

@linear
Copy link

linear bot commented Nov 1, 2025

@supabase
Copy link

supabase bot commented Nov 1, 2025

This pull request has been ignored for the connected project zytfjzqyijgagqxrzbmz because there are no changes detected in packages/database/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

@maparent maparent marked this pull request as draft November 3, 2025 02:05
@maparent maparent force-pushed the eng-1032-minimal-json-ld-export-of-the-discourse-graph branch 5 times, most recently from 3ff8a2c to 4632249 Compare December 23, 2025 02:57
@maparent
Copy link
Collaborator Author

@CodeRabbit review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 23, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 23, 2025

📝 Walkthrough

Walkthrough

This PR introduces refactored export utilities for Roam data, extracting previously inline logic into dedicated modules (canonicalRoamUrl, exportUtils, jsonld, pageToMarkdown) and updating getExportTypes to depend on them. It also adds RDF schema definitions for DiscourseGraph node types and relations.

Changes

Cohort / File(s) Summary
URL/Link Utilities
apps/roam/src/utils/canonicalRoamUrl.ts, apps/roam/src/utils/supabaseContext.ts
New utility for constructing canonical Roam graph URLs; supabaseContext refactored to use this utility instead of inline concatenation.
Export Utilities
apps/roam/src/utils/exportUtils.ts
New module providing export-related helpers: array deduplication, page data aggregation with Discourse node matching, filename sanitization and shortening, link formatting, and PullBlock-to-TreeNode transformation with recursive UID collection.
Markdown Export
apps/roam/src/utils/pageToMarkdown.ts, apps/roam/src/utils/getExportTypes.ts
New markdown rendering module converting Roam blocks to Markdown with support for embeds, block refs, TODO markers, and frontmatter; getExportTypes refactored to delegate markdown generation and remove 10+ internal helpers.
Page Metadata
apps/roam/src/utils/getPageMetadata.ts
Added modified timestamp tracking alongside created time; destructuring and return types updated to include modified date.
JSON-LD Export
apps/roam/src/utils/jsonld.ts
New module for building JSON-LD representations of Discourse-derived data, including schema context, node/relation definitions, and complete graph assembly with metadata and content rendering.
RDF Schema Definitions
apps/website/public/schema/dg_base.ttl, apps/website/public/schema/dg_core.ttl
New Turtle ontology files defining DiscourseGraph foundation schema (NodeSchema, Role, RelationDef, RelationInstance) and core node types (Question, Claim, Evidence, Source) with bidirectional relations.

Sequence Diagram(s)

sequenceDiagram
    participant GetExport as getExportTypes
    participant PageMD as pageToMarkdown
    participant ExportUtils as exportUtils
    participant JLDBuilder as jsonld
    participant RoamAPI as Roam API
    participant MetaUtil as getPageMetadata

    GetExport->>MetaUtil: get page metadata
    MetaUtil->>RoamAPI: pull page with timestamps
    MetaUtil-->>GetExport: page metadata (created/modified)

    GetExport->>PageMD: render page to markdown
    PageMD->>ExportUtils: getFilename, toLink, uniqJsonArray
    ExportUtils-->>PageMD: formatted links & names
    PageMD->>RoamAPI: fetch tree/content as needed
    PageMD-->>GetExport: markdown content + uid set

    GetExport->>ExportUtils: getPageData (match nodes, deduplicate)
    ExportUtils->>RoamAPI: fallback fetch if needed
    ExportUtils-->>GetExport: enriched results with types

    GetExport->>JLDBuilder: build JSON-LD graph
    JLDBuilder->>ExportUtils: pull metadata, process relations
    JLDBuilder->>PageMD: render page content for JSON-LD
    JLDBuilder-->>GetExport: complete JSON-LD document
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly reflects the main objective of adding JSON-LD export functionality for the discourse-graph, which is the core purpose evidenced by new modules (jsonld.ts, pageToMarkdown.ts, exportUtils.ts) and schema files (dg_base.ttl, dg_core.ttl).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (7)
apps/website/public/schema/dg_base.ttl (2)

3-3: Remove or correct redundant prefix declaration.

The : prefix duplicates the rdfs namespace from line 2. Since the default prefix : is not used elsewhere in this file, consider removing this line.

🔎 Proposed fix
 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
-@prefix : <http://www.w3.org/2000/01/rdf-schema#> .
 @prefix dc: <http://purl.org/dc/elements/1.1/> .

18-20: Inconsistent indentation.

Line 19 lacks indentation before rdfs:subClassOf, which is inconsistent with the formatting style used elsewhere (e.g., lines 23-26). While valid Turtle syntax, consistent indentation improves readability.

🔎 Proposed fix
 dgb:NodeSchema
-rdfs:subClassOf owl:Class;
+    rdfs:subClassOf owl:Class;
     rdfs:comment "Subclasses of DiscourseGraph nodes"@en .
apps/website/public/schema/dg_core.ttl (1)

3-3: Remove redundant prefix declaration.

Similar to dg_base.ttl, the : prefix duplicates the rdfs namespace and is unused in this file. Consider removing it for clarity.

🔎 Proposed fix
 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
-@prefix : <http://www.w3.org/2000/01/rdf-schema#> .
 @prefix dc: <http://purl.org/dc/elements/1.1/> .
apps/roam/src/utils/canonicalRoamUrl.ts (1)

2-3: Add explicit return type for the function.

Per coding guidelines, functions should have explicit return types. Consider adding : string to clarify the return type.

🔎 Proposed fix
-const canonicalRoamUrl = (graphName = window.roamAlphaAPI.graph.name) =>
-  ROAM_URL_PREFIX + graphName;
+const canonicalRoamUrl = (graphName = window.roamAlphaAPI.graph.name): string =>
+  ROAM_URL_PREFIX + graphName;
apps/roam/src/utils/getPageMetadata.ts (1)

15-37: LGTM! The metadata enhancement is well-implemented.

The addition of the modified timestamp enriches the metadata surface for downstream consumers. The fallback behavior (using date for both created and modified when no results) is sensible.

Consider adding an explicit return type for getPageMetadata per coding guidelines.

apps/roam/src/utils/pageToMarkdown.ts (1)

302-372: Consider extracting content assembly into smaller helper functions.

The content assembly template literal is complex and spans multiple concerns (main content, discourse context, references). While functional, breaking this into smaller helper functions would improve readability and maintainability.

apps/roam/src/utils/jsonld.ts (1)

136-149: Consider avoiding direct mutation of page.content.

The code mutates page.content directly within the Promise.all callback. While this works, it couples the async processing with side effects. Consider returning the content and merging it afterward for clearer data flow.

🔎 Alternative approach
-  await Promise.all(
-    pageData.map(async (page: Result) => {
-      const r = await pageToMarkdown(page, {
-        ...settings,
-        allNodes,
-        linkType: "roam url",
-        blockAnchors: true,
-        blockRefsAsLinks: true,
-      });
-      page.content = r.content;
-      numTreatedPages += 1;
-      await updateExportProgress(0.1 + (numTreatedPages / numPages) * 0.75);
-    }),
-  );
+  const pageContents = await Promise.all(
+    pageData.map(async (page: Result, index: number) => {
+      const r = await pageToMarkdown(page, {
+        ...settings,
+        allNodes,
+        linkType: "roam url",
+        blockAnchors: true,
+        blockRefsAsLinks: true,
+      });
+      await updateExportProgress(0.1 + ((index + 1) / numPages) * 0.75);
+      return { uid: page.uid, content: r.content };
+    }),
+  );
+  const contentByUid = Object.fromEntries(
+    pageContents.map(({ uid, content }) => [uid, content]),
+  );
+  pageData.forEach((page) => {
+    page.content = contentByUid[page.uid];
+  });
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 670fe22 and 767adbc.

📒 Files selected for processing (9)
  • apps/roam/src/utils/canonicalRoamUrl.ts
  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/getPageMetadata.ts
  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/pageToMarkdown.ts
  • apps/roam/src/utils/supabaseContext.ts
  • apps/website/public/schema/dg_base.ttl
  • apps/website/public/schema/dg_core.ttl
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/main.mdc)

**/*.{ts,tsx}: Use Tailwind CSS for styling where possible
When refactoring inline styles, use tailwind classes
Prefer type over interface in TypeScript
Use explicit return types for functions
Avoid any types when possible
Prefer arrow functions over regular function declarations
Use named parameters (object destructuring) when a function has more than 2 parameters
Use PascalCase for components and types
Use camelCase for variables and functions
Use UPPERCASE for constants
Function names should describe their purpose clearly
Prefer early returns over nested conditionals for better readability

Files:

  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/getPageMetadata.ts
  • apps/roam/src/utils/pageToMarkdown.ts
  • apps/roam/src/utils/supabaseContext.ts
  • apps/roam/src/utils/canonicalRoamUrl.ts
apps/roam/**/*.{js,ts,tsx,jsx,json}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Prefer existing dependencies from package.json when working on the Roam Research extension

Files:

  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/getPageMetadata.ts
  • apps/roam/src/utils/pageToMarkdown.ts
  • apps/roam/src/utils/supabaseContext.ts
  • apps/roam/src/utils/canonicalRoamUrl.ts
apps/roam/**/*.{ts,tsx,jsx,js,css,scss}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension

Files:

  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/getPageMetadata.ts
  • apps/roam/src/utils/pageToMarkdown.ts
  • apps/roam/src/utils/supabaseContext.ts
  • apps/roam/src/utils/canonicalRoamUrl.ts
apps/roam/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

apps/roam/**/*.{ts,tsx,js,jsx}: Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Files:

  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/getPageMetadata.ts
  • apps/roam/src/utils/pageToMarkdown.ts
  • apps/roam/src/utils/supabaseContext.ts
  • apps/roam/src/utils/canonicalRoamUrl.ts
apps/roam/**

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Implement the Discourse Graph protocol in the Roam Research extension

Files:

  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/getPageMetadata.ts
  • apps/roam/src/utils/pageToMarkdown.ts
  • apps/roam/src/utils/supabaseContext.ts
  • apps/roam/src/utils/canonicalRoamUrl.ts
apps/website/**

📄 CodeRabbit inference engine (.cursor/rules/website.mdc)

Prefer existing dependencies from apps/website/package.json when adding dependencies to the website application

Files:

  • apps/website/public/schema/dg_core.ttl
  • apps/website/public/schema/dg_base.ttl
🧠 Learnings (20)
📓 Common learnings
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/** : Implement the Discourse Graph protocol in the Roam Research extension
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Applied to files:

  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/pageToMarkdown.ts
  • apps/roam/src/utils/supabaseContext.ts
  • apps/roam/src/utils/canonicalRoamUrl.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

Applied to files:

  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/pageToMarkdown.ts
  • apps/roam/src/utils/supabaseContext.ts
  • apps/roam/src/utils/canonicalRoamUrl.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{js,ts,tsx,jsx,json} : Prefer existing dependencies from package.json when working on the Roam Research extension

Applied to files:

  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/supabaseContext.ts
  • apps/roam/src/utils/canonicalRoamUrl.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/** : Implement the Discourse Graph protocol in the Roam Research extension

Applied to files:

  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/supabaseContext.ts
  • apps/roam/src/utils/canonicalRoamUrl.ts
📚 Learning: 2025-06-22T10:40:52.752Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Applied to files:

  • apps/roam/src/utils/jsonld.ts
  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/getPageMetadata.ts
  • apps/roam/src/utils/pageToMarkdown.ts
  • apps/roam/src/utils/supabaseContext.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `node.type` serves as the UID field rather than having a conventional `node.uid` field. This is an unusual naming convention where the type field actually contains the unique identifier.

Applied to files:

  • apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.

Applied to files:

  • apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-12-07T20:54:20.007Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.

Applied to files:

  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getPageMetadata.ts
  • apps/roam/src/utils/supabaseContext.ts
📚 Learning: 2025-11-05T21:57:14.909Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.

Applied to files:

  • apps/roam/src/utils/exportUtils.ts
  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/getPageMetadata.ts
  • apps/roam/src/utils/pageToMarkdown.ts
  • apps/roam/src/utils/supabaseContext.ts
📚 Learning: 2025-07-13T16:47:14.352Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-07-13T16:47:14.352Z
Learning: In the discourse-graph codebase, types.gen.ts contains automatically generated database function type definitions that may have reordered signatures between versions. This reordering is expected behavior from the code generation process and should not be flagged as an issue requiring fixes.

Applied to files:

  • apps/roam/src/utils/getExportTypes.ts
📚 Learning: 2025-09-01T19:09:07.052Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-09-01T19:09:07.052Z
Learning: User prefers to avoid making types explicit in each require statement due to anti-DRY concerns when working with module imports in TypeScript.

Applied to files:

  • apps/roam/src/utils/getExportTypes.ts
📚 Learning: 2025-07-21T14:22:20.752Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 291
File: packages/database/supabase/functions/create-space/index.ts:0-0
Timestamp: 2025-07-21T14:22:20.752Z
Learning: In the discourse-graph codebase, types.gen.ts is not accessible from Supabase edge functions, requiring duplication of types and utilities when needed in the edge function environment at packages/database/supabase/functions/.

Applied to files:

  • apps/roam/src/utils/getExportTypes.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,jsx,js,css,scss} : Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension

Applied to files:

  • apps/roam/src/utils/getExportTypes.ts
  • apps/roam/src/utils/supabaseContext.ts
📚 Learning: 2025-05-20T03:06:16.600Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/schema.yaml:116-121
Timestamp: 2025-05-20T03:06:16.600Z
Learning: In the discourse-graph project's LinkML schema (packages/database/schema.yaml), attributes and slots are equivalent constructs. Items can be defined either as slots or attributes without needing to duplicate them in both sections.

Applied to files:

  • apps/website/public/schema/dg_base.ttl
📚 Learning: 2025-10-08T18:55:18.029Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-10-08T18:55:18.029Z
Learning: In the discourse-graph database schema, security views like `my_concepts`, `my_contents`, and `my_documents` were introduced to significantly improve query performance. Direct table access had dismal performance, while the view-based approach provides substantial performance gains.

Applied to files:

  • apps/website/public/schema/dg_base.ttl
📚 Learning: 2025-11-23T23:53:43.094Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-11-23T23:53:43.094Z
Learning: In `apps/roam/src/utils/supabaseContext.ts`, the Roam URL normalization uses a targeted string replacement `url.replace("/?server-port=3333#/", "/#/")` rather than URL parsing to avoid impacting unforeseen URL patterns. This conservative approach is intentional to handle only the specific known case.

Applied to files:

  • apps/roam/src/utils/supabaseContext.ts
  • apps/roam/src/utils/canonicalRoamUrl.ts
📚 Learning: 2025-05-30T14:37:30.215Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 182
File: apps/website/app/api/supabase/rpc/search-content/route.ts:53-56
Timestamp: 2025-05-30T14:37:30.215Z
Learning: The search-content route (apps/website/app/api/supabase/rpc/search-content/route.ts) is intentionally designed to be platform-agnostic rather than Roam-specific. The generic "Platform" terminology (like subsetPlatformIds) is used because the route will support multiple platforms in the near future, not just Roam.

Applied to files:

  • apps/roam/src/utils/supabaseContext.ts
📚 Learning: 2025-06-25T22:56:17.522Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-06-25T22:56:17.522Z
Learning: In the Roam discourse-graph system, the existence of the configuration page (identified by DISCOURSE_CONFIG_PAGE_TITLE) and its corresponding UID is a system invariant. The code can safely assume this page will always exist, so defensive null checks are not needed when using `getPageUidByPageTitle(DISCOURSE_CONFIG_PAGE_TITLE)`.

Applied to files:

  • apps/roam/src/utils/supabaseContext.ts
📚 Learning: 2025-06-23T11:49:45.457Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:11-40
Timestamp: 2025-06-23T11:49:45.457Z
Learning: In the DiscourseGraphs/discourse-graph codebase, direct access to `window.roamAlphaAPI` is the established pattern throughout the codebase. The team prefers to maintain this pattern consistently rather than making piecemeal changes, and plans to address dependency injection as a global refactor when scheduled.

Applied to files:

  • apps/roam/src/utils/canonicalRoamUrl.ts
🧬 Code graph analysis (2)
apps/roam/src/utils/jsonld.ts (5)
apps/roam/src/utils/getExportSettings.ts (1)
  • getExportSettings (112-125)
apps/roam/src/utils/pageToMarkdown.ts (1)
  • pageToMarkdown (234-375)
apps/roam/src/utils/types.ts (1)
  • Result (42-46)
apps/roam/src/utils/getRelationData.ts (1)
  • getRelationDataUtil (9-61)
apps/roam/src/utils/exportUtils.ts (2)
  • getPageData (19-67)
  • uniqJsonArray (8-17)
apps/roam/src/utils/exportUtils.ts (1)
apps/roam/src/utils/types.ts (1)
  • Result (42-46)
🪛 ast-grep (0.40.3)
apps/roam/src/utils/exportUtils.ts

[warning] 79-85: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/\[/g, "\\[") .replace(/]/g, "\\]") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (15)
apps/website/public/schema/dg_core.ttl (2)

12-26: LGTM! Well-defined node types.

The four core node types (Question, Claim, Evidence, Source) are clearly defined with appropriate labels and descriptive comments that align with scientific discourse modeling.


9-9: Fix namespace mismatch between dg_core.ttl and dg_base.ttl.

This file references <https://discoursegraphs.com/schema/dg_base#> but the actual dg_base.ttl file (line 8) declares its namespace as <https://discoursegraphs.com/schema/db_base#> (note db_base not dg_base). This mismatch will cause undefined prefix errors when loading these ontologies together.

Either:

  • Update dg_base.ttl to use dg_base in its namespace URI, or
  • Update this line to use db_base to match the existing definition
⛔ Skipped due to learnings
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-10-08T18:55:18.029Z
Learning: In the discourse-graph database schema, security views like `my_concepts`, `my_contents`, and `my_documents` were introduced to significantly improve query performance. Direct table access had dismal performance, while the view-based approach provides substantial performance gains.
apps/website/public/schema/dg_base.ttl (1)

22-26: Reconsider OWL modeling approach for metaproperty restrictions.

The dgb:Role class uses owl:Restriction on rdfs:domain and rdfs:range (lines 24-25). While these are not annotation properties but rather global RDFS typing axioms, using OWL restrictions on them is an unusual pattern. Metaproperties like rdfs:domain are typically not the target of OWL class-local restrictions; they define schema-level typing rather than instance-level constraints. Verify this modeling choice aligns with your ontology design intent.

apps/roam/src/utils/supabaseContext.ts (1)

5-5: LGTM!

Clean refactor that centralizes URL construction by using the new canonicalRoamUrl utility instead of inline concatenation.

Also applies to: 58-58

apps/roam/src/utils/getExportTypes.ts (3)

56-82: LGTM!

The getJsonData function is well-refactored to use getPageData utility, maintaining clean separation of concerns.


84-135: LGTM! Good use of early return pattern.

The Markdown export callback correctly uses early return guard and leverages pageToMarkdown for content generation.


150-172: LGTM! JSON-LD export is well-structured.

The experimental JSON-LD export is properly integrated with progress updates and uses the new getJsonLdData utility. Good use of pretty-printing with JSON.stringify(data, undefined, " ") for readability.

apps/roam/src/utils/pageToMarkdown.ts (2)

31-36: LGTM! Regex patterns are well-defined.

The embed regex patterns correctly capture Roam embed syntax with proper handling of multiple parentheses variations.


114-135: LGTM! Well-structured function signature with typed options.

The toMarkdown function has a clear signature with properly typed options object using named parameters, which aligns with coding guidelines for functions with many parameters.

apps/roam/src/utils/jsonld.ts (1)

11-35: LGTM! Well-structured JSON-LD context.

The context mapping includes standard vocabularies (RDF, RDFS, OWL, Dublin Core, PROV, SIOC) and custom DiscourseGraph schema terms. Good use of CURIE prefixes.

apps/roam/src/utils/exportUtils.ts (5)

8-17: LGTM! Clean deduplication utility.

The uniqJsonArray function correctly deduplicates objects by stable JSON serialization of sorted entries.


19-67: LGTM! Robust page data resolution.

The getPageData function handles both discourse graph export mode and standard export mode with proper deduplication via matchedTexts set.


90-120: LGTM! Well-implemented filename sanitization.

The getFilename function properly handles:

  • Special character removal
  • Length truncation with ellipsis
  • Extension handling
  • Content simplification option

122-129: LGTM!

The toLink function correctly handles wikilinks, alias, and roam URL link types with proper extension stripping.


131-153: LGTM! Clean tree transformation utilities.

Both pullBlockToTreeNode and collectUids are well-implemented with proper recursive handling of children and sensible defaults for optional fields.

@maparent maparent force-pushed the eng-1032-minimal-json-ld-export-of-the-discourse-graph branch 2 times, most recently from 03b5309 to 38e7953 Compare December 24, 2025 18:16
@maparent
Copy link
Collaborator Author

@CodeRabbit check corrections in last commit

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 24, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the list of commits in the PR branch
echo "=== Recent commits in PR branch ==="
git log --oneline -10 origin/eng-1032-minimal-json-ld-export-of-the-discourse-graph

echo -e "\n=== Last commit details ==="
git log -1 --pretty=fuller origin/eng-1032-minimal-json-ld-export-of-the-discourse-graph

echo -e "\n=== Diff of last commit ==="
git diff origin/eng-1032-minimal-json-ld-export-of-the-discourse-graph~1..origin/eng-1032-minimal-json-ld-export-of-the-discourse-graph

Length of output: 884


🏁 Script executed:

#!/bin/bash
# Get PR commits using GitHub CLI
echo "=== PR Commits ==="
gh pr view 529 --json commits --jq '.commits[] | "\(.oid[0:7]) \(.messageHeadline)"'

echo -e "\n=== Last commit SHA ==="
LAST_COMMIT=$(gh pr view 529 --json commits --jq '.commits[-1].oid')
echo "$LAST_COMMIT"

echo -e "\n=== Last commit details ==="
gh pr view 529 --json commits --jq '.commits[-1] | "Commit: \(.oid)\nAuthor: \(.authors[0].login)\nDate: \(.committedDate)\nMessage: \(.messageHeadline)\n\(.messageBody)"'

echo -e "\n=== Diff of last commit ==="
git fetch origin "$LAST_COMMIT" 2>/dev/null
git show "$LAST_COMMIT" --stat

echo -e "\n=== Full diff ==="
git show "$LAST_COMMIT"

Length of output: 4470


@maparent I've reviewed the corrections in your last commit (38e7953). Overall, excellent work addressing the feedback! Here's what I found:

✅ Correct fixes applied:

  1. jsonld.ts: Fixed OWL property name from owl:inverse-of to owl:inverseOf - this is the correct camelCase form in OWL.

  2. dg_base.ttl:

    • Fixed namespace prefix typo: db_basedg_base
    • Fixed range typo: dbg:NodeSchemadgb:NodeSchema
  3. dg_core.ttl:

    • Corrected all owl:inverseOf references to use specific relation names instead of generic labels (e.g., dg:Opposesdg:OpposesCE, dg:Supportsdg:SupportsCE, etc.)
    • Fixed indentation for consistency
    • Fixed casing: dg:Addressesdg:addresses

⚠️ Potential improvement:

exportUtils.ts - The regex escaping was improved from escaping only [ and ] to also escaping ?, ., and +. However, the current implementation may still be incomplete. Consider escaping additional regex metacharacters that could appear in nodeFormat:

.replace(/([[\](){}?.*+^$|\\])/g, "\\$1")

This would handle: [ ] ( ) { } ? . * + ^ $ | \

Since the code later handles {...} as template placeholders separately, escaping them upfront shouldn't cause issues, but you may want to verify this doesn't break the template logic.

The corrections look solid! Would you like me to help verify if the additional regex escaping is needed based on the actual nodeFormat values used in the codebase?

@maparent
Copy link
Collaborator Author

@CodeRabbit I will stick to the same regex escaping as getDiscourseNodeFormatInnerExpression for now.

@maparent maparent marked this pull request as ready for review December 24, 2025 18:21
@maparent maparent requested a review from mdroidian December 24, 2025 18:21
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 24, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the getDiscourseNodeFormatInnerExpression function to understand the regex pattern
rg -nP "getDiscourseNodeFormatInnerExpression" -A 20

Length of output: 4759


@maparent Sounds good! I can confirm that your current implementation matches the pattern in getDiscourseNodeFormatInnerExpression, which escapes [, ], ?, ., and +. Maintaining consistency with the existing codebase is a solid approach for maintainability.

The corrections in your last commit look great overall. Nice work on fixing the OWL property names and namespace issues! 👍


🧠 Learnings used
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 372
File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116
Timestamp: 2025-08-25T15:53:21.799Z
Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.

@maparent
Copy link
Collaborator Author

Copy link
Contributor

@mdroidian mdroidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this. I want to flag that this PR is quite large (~1,500 lines across 9 files), which makes it difficult to review safely and efficiently.

Per our Project Handbook PR guidelines (linked below), we generally aim for smaller, well-scoped PRs (~200 lines is the sweet spot; larger changes should be tightly scoped and self-contained):

https://linear.app/discourse-graphs/document/project-handbook-dd139f6e26fc#devops-aec020c3

To move this forward, could you please do one of the following:

  1. Split the PR into smaller, focused PRs (e.g., foundational changes vs. feature usage/integration), or

  2. If splitting isn’t feasible, add guidance to make review tractable, specifically:

    • Clarify which changes are required for core functionality vs. stylistic / organizational / refactor.
    • Add inline GitHub comments calling out any non-functional changes so reviewers can focus on correctness first.

If helpful, I’m also open to a brief walkthrough (written or Loom) explaining the structure of the changes and how the pieces fit together.

Let me know which approach you’d prefer.

Copy link
Collaborator Author

Hmmm… Here's a proposal: Let me break this into

1. Just refactorings, breaking up the existing code in different files

  1. JSON-LD specific changes.
    1. If needed, I could separate changes to markdown from the main jsonld.

Would that be easier to consume?

Copy link
Contributor

Yes. Please separate into the smallest, most focused tickets/PRs.
And yes, please separate the changes to markdown and add why they are needed, a link to the request, or how they relate to the jsonld ticket.
Thank you.

@maparent maparent force-pushed the eng-1032-minimal-json-ld-export-of-the-discourse-graph branch from 38e7953 to 3e400cb Compare December 24, 2025 21:52
Copy link
Collaborator Author

Ok will work on that tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

No open projects
Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants