-
Notifications
You must be signed in to change notification settings - Fork 0
feat: validate lexicons #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…n ingest route - Use LexiconSchemaRecord type and type guard for incoming records - Resolve and check NSID DID authority matches record DID - Return error if record is invalid or authority does not match - Refactor to use new @atproto/lexicon and @atproto/lexicon-resolver utilities
…r NSID resolution Remove extension from @atproto/lexicon-resolver type and explicitly define LexiconSchemaRecord with required 'id' property. Update type guard and documentation to clarify purpose for DID authority resolution.
…nd zod in ingest route Parse incoming lexicon records with parseLexiconDoc, validate with zod, and improve error handling for invalid records. Store parsed LexiconDoc in the database instead of raw input.
- Replace single lexicons table with valid_lexicons and invalid_lexicons tables - Add repo_did to primary key [nsid, cid, repo_did] to track lexicon migrations - Store validation errors in invalid_lexicons for developer debugging - Use different column names (data vs raw_data) for semantic clarity - Add indexes on nsid and repo_did for both tables - Include repo_rev field to track repository state at ingestion time
- Store valid lexicons in valid_lexicons table after successful parsing - Store invalid lexicons in invalid_lexicons table with validation errors - DNS validation acts as gate before any storage (prevents DDOS) - Include repo_did and repo_rev in all stored records - Improve logging to distinguish valid vs invalid ingestion events - Remove onConflictDoUpdate logic (primary key now includes repo_did)
- Combine initial lexicons table and updated schema into one migration - Remove intermediate migration file (0001) - Clean schema shows final valid_lexicons and invalid_lexicons tables
…om Nexus and Postgres settings Provides a Docker Compose override for development, disabling lexhub in Docker, reducing resource usage for Nexus and Postgres, and enabling debug logging and SQL query logging.
…thing in ingest route
- Check for error shape (has 'issues' array) instead of instanceof check - parseLexiconDoc throws errors that may not pass instanceof z.ZodError check - Add onConflictDoNothing to invalid_lexicons insert - Ensures invalid lexicons are now properly stored for debugging
… handling in ingest route
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds comprehensive lexicon validation to the ATProto lexicon hub by implementing DNS-based DID authority verification and Zod schema validation. The changes transform the simple lexicon storage system into a robust validation pipeline that separates valid from invalid lexicons while maintaining detailed debugging information.
Key Changes
- Two-tier validation: DNS authority check followed by schema validation using
@atproto/lexiconparser - Dual table architecture: Separate
valid_lexiconsandinvalid_lexiconstables for production data and debugging - Enhanced schema: Added
repo_didandrepo_revtracking with composite primary keys to handle lexicon migrations
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/util/lexicon.ts | New type guard and interface for lexicon schema records |
| src/db/schema.ts | Complete schema refactor with separate valid/invalid lexicon tables and migration tracking |
| src/app/api/ingest/route.ts | Added DNS validation and schema parsing with error handling for invalid lexicons |
| package.json | Added @atproto dependencies for lexicon parsing and DNS resolution |
| package-lock.json | Dependency lockfile updates for new @atproto packages |
| drizzle/0000_init.sql | Squashed migration creating dual table structure with appropriate indexes |
| drizzle/meta/_journal.json | Updated migration timestamp |
| compose.override.yaml | New development configuration for local development workflow |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // DNS validation gate: Reject if DNS doesn't resolve or doesn't match the repo DID | ||
| // This helps prevents spoofing and DDOS attacks by only storing lexicons with valid DNS authority | ||
| if (did === undefined || did !== commit.did) { | ||
| return ackEvent("NSID DID authority does not match record DID"); |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message could be more helpful for debugging. Consider providing more context such as: "NSID DID authority validation failed: expected ${did ?? 'none'}, got ${commit.did}" to help developers understand which DID mismatched.
| return ackEvent("NSID DID authority does not match record DID"); | |
| return ackEvent(`NSID DID authority validation failed: expected ${did ?? 'none'}, got ${commit.did}`); |
| let lexiconDoc: LexiconDoc; | ||
| try { | ||
| lexiconDoc = parseLexiconDoc(lexiconRecord); |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable lexiconDoc is declared with let but is only assigned once and never reassigned. Consider using const instead for better code clarity and to prevent accidental reassignment.
| let lexiconDoc: LexiconDoc; | |
| try { | |
| lexiconDoc = parseLexiconDoc(lexiconRecord); | |
| try { | |
| const lexiconDoc = parseLexiconDoc(lexiconRecord); |
| const did = await resolveLexiconDidAuthority(nsid); | ||
|
|
||
| // DNS validation gate: Reject if DNS doesn't resolve or doesn't match the repo DID | ||
| // This helps prevents spoofing and DDOS attacks by only storing lexicons with valid DNS authority |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple issues in this comment:
- Grammar: "This helps prevents" should be "This helps prevent" (no 's')
- Spelling: "DDOS" should be "DDoS" (standard capitalization for Distributed Denial of Service)
| // This helps prevents spoofing and DDOS attacks by only storing lexicons with valid DNS authority | |
| // This helps prevent spoofing and DDoS attacks by only storing lexicons with valid DNS authority |
|
|
||
| // DNS validation gate: Reject if DNS doesn't resolve or doesn't match the repo DID | ||
| // This helps prevents spoofing and DDOS attacks by only storing lexicons with valid DNS authority | ||
| if (did === undefined || did !== commit.did) { |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DNS validation check at line 105 could allow a subtle security issue. If resolveLexiconDidAuthority returns null instead of undefined, this check would pass when it shouldn't. Consider using a more explicit check: if (!did || did !== commit.did) to handle both undefined and null cases.
| if (did === undefined || did !== commit.did) { | |
| if (!did || did !== commit.did) { |
Summary
Validate lexions DID authority through a DNS check, and validate that the lexicon matches the zod schema from the official atproto sdk.
Also added a dev compose file, and squashed the db schema since there is no production data yet.