Skip to content

Conversation

@caidanw
Copy link
Contributor

@caidanw caidanw commented Dec 2, 2025

Summary

Validate lexions DID authority through a DNS check, and validate that the lexicon matches the zod schema from the official atproto sdk.

Also added a dev compose file, and squashed the db schema since there is no production data yet.

…n ingest route

- Use LexiconSchemaRecord type and type guard for incoming records
- Resolve and check NSID DID authority matches record DID
- Return error if record is invalid or authority does not match
- Refactor to use new @atproto/lexicon and @atproto/lexicon-resolver utilities
…r NSID resolution

Remove extension from @atproto/lexicon-resolver type and explicitly define LexiconSchemaRecord with required 'id' property. Update type guard and documentation to clarify purpose for DID authority resolution.
…nd zod in ingest route

Parse incoming lexicon records with parseLexiconDoc, validate with zod, and improve error handling for invalid records. Store parsed LexiconDoc in the database instead of raw input.
- Replace single lexicons table with valid_lexicons and invalid_lexicons tables
- Add repo_did to primary key [nsid, cid, repo_did] to track lexicon migrations
- Store validation errors in invalid_lexicons for developer debugging
- Use different column names (data vs raw_data) for semantic clarity
- Add indexes on nsid and repo_did for both tables
- Include repo_rev field to track repository state at ingestion time
- Store valid lexicons in valid_lexicons table after successful parsing
- Store invalid lexicons in invalid_lexicons table with validation errors
- DNS validation acts as gate before any storage (prevents DDOS)
- Include repo_did and repo_rev in all stored records
- Improve logging to distinguish valid vs invalid ingestion events
- Remove onConflictDoUpdate logic (primary key now includes repo_did)
- Combine initial lexicons table and updated schema into one migration
- Remove intermediate migration file (0001)
- Clean schema shows final valid_lexicons and invalid_lexicons tables
…om Nexus and Postgres settings

Provides a Docker Compose override for development, disabling lexhub in Docker, reducing resource usage for Nexus and Postgres, and enabling debug logging and SQL query logging.
- Check for error shape (has 'issues' array) instead of instanceof check
- parseLexiconDoc throws errors that may not pass instanceof z.ZodError check
- Add onConflictDoNothing to invalid_lexicons insert
- Ensures invalid lexicons are now properly stored for debugging
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive lexicon validation to the ATProto lexicon hub by implementing DNS-based DID authority verification and Zod schema validation. The changes transform the simple lexicon storage system into a robust validation pipeline that separates valid from invalid lexicons while maintaining detailed debugging information.

Key Changes

  • Two-tier validation: DNS authority check followed by schema validation using @atproto/lexicon parser
  • Dual table architecture: Separate valid_lexicons and invalid_lexicons tables for production data and debugging
  • Enhanced schema: Added repo_did and repo_rev tracking with composite primary keys to handle lexicon migrations

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/util/lexicon.ts New type guard and interface for lexicon schema records
src/db/schema.ts Complete schema refactor with separate valid/invalid lexicon tables and migration tracking
src/app/api/ingest/route.ts Added DNS validation and schema parsing with error handling for invalid lexicons
package.json Added @atproto dependencies for lexicon parsing and DNS resolution
package-lock.json Dependency lockfile updates for new @atproto packages
drizzle/0000_init.sql Squashed migration creating dual table structure with appropriate indexes
drizzle/meta/_journal.json Updated migration timestamp
compose.override.yaml New development configuration for local development workflow

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// DNS validation gate: Reject if DNS doesn't resolve or doesn't match the repo DID
// This helps prevents spoofing and DDOS attacks by only storing lexicons with valid DNS authority
if (did === undefined || did !== commit.did) {
return ackEvent("NSID DID authority does not match record DID");
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message could be more helpful for debugging. Consider providing more context such as: "NSID DID authority validation failed: expected ${did ?? 'none'}, got ${commit.did}" to help developers understand which DID mismatched.

Suggested change
return ackEvent("NSID DID authority does not match record DID");
return ackEvent(`NSID DID authority validation failed: expected ${did ?? 'none'}, got ${commit.did}`);

Copilot uses AI. Check for mistakes.
Comment on lines +110 to +112
let lexiconDoc: LexiconDoc;
try {
lexiconDoc = parseLexiconDoc(lexiconRecord);
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable lexiconDoc is declared with let but is only assigned once and never reassigned. Consider using const instead for better code clarity and to prevent accidental reassignment.

Suggested change
let lexiconDoc: LexiconDoc;
try {
lexiconDoc = parseLexiconDoc(lexiconRecord);
try {
const lexiconDoc = parseLexiconDoc(lexiconRecord);

Copilot uses AI. Check for mistakes.
const did = await resolveLexiconDidAuthority(nsid);

// DNS validation gate: Reject if DNS doesn't resolve or doesn't match the repo DID
// This helps prevents spoofing and DDOS attacks by only storing lexicons with valid DNS authority
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple issues in this comment:

  1. Grammar: "This helps prevents" should be "This helps prevent" (no 's')
  2. Spelling: "DDOS" should be "DDoS" (standard capitalization for Distributed Denial of Service)
Suggested change
// This helps prevents spoofing and DDOS attacks by only storing lexicons with valid DNS authority
// This helps prevent spoofing and DDoS attacks by only storing lexicons with valid DNS authority

Copilot uses AI. Check for mistakes.

// DNS validation gate: Reject if DNS doesn't resolve or doesn't match the repo DID
// This helps prevents spoofing and DDOS attacks by only storing lexicons with valid DNS authority
if (did === undefined || did !== commit.did) {
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DNS validation check at line 105 could allow a subtle security issue. If resolveLexiconDidAuthority returns null instead of undefined, this check would pass when it shouldn't. Consider using a more explicit check: if (!did || did !== commit.did) to handle both undefined and null cases.

Suggested change
if (did === undefined || did !== commit.did) {
if (!did || did !== commit.did) {

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants