fix: auto-reject PROGRAM messages with non-dict metadata #1137

Open
odesenfans wants to merge 1 commit into main from fix/reject-program-invalid-metadata

Conversation

@odesenfans
Collaborator

Summary

Some PROGRAM messages slipped past validation while ExecutableContent.metadata still accepted lists. The current validator requires a dict, so parsing those rows through parsed_content fails and surfaces as a 500 on GET /api/v0/messages/<hash> (e.g. 42a4a8...3d96f3 returns 500, while the same hash on epyc is correctly reported as rejected).

This change:

  • Adds mark_processed_message_as_rejected in aleph.repair. It mirrors mark_pending_message_as_rejected but starts from a MessageDb row instead of a PendingMessageDb: cleans up VM rows for program/instance, upserts rejected_messages, flips message_status to REJECTED, and deletes the messages row. The trigger keeps message_counts consistent; FK cascades clean message_confirmations and account_costs.
  • Adds _reject_invalid_program_metadata and wires it into repair_node so the API rejects affected PROGRAM messages on every startup. The query uses jsonb_typeof(content->'metadata') = 'array'; an empty result is a no-op.
  • Ships deployment/scripts/reject_processed_messages.py for ad-hoc cleanups when a restart is not an option. Dry-run by default, --commit to persist; targets specific hashes via --hash / --hashes-file. Runs from inside the API container against the deployed config at /var/pyaleph/config.yml.
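
For reviewers who want to reason about which rows the startup query selects, here is a minimal in-memory sketch of the same predicate. It is not the code in this PR: the message dict shape and the helper names `jsonb_typeof` and `has_list_metadata` are assumptions made for illustration, and the real selection happens in SQL via `jsonb_typeof(content->'metadata') = 'array'`.

```python
from typing import Any, Mapping, Optional


def jsonb_typeof(value: Any) -> str:
    """Approximate PostgreSQL's jsonb_typeof for an already-decoded JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


def has_list_metadata(message: Mapping[str, Any]) -> bool:
    """Return True for PROGRAM messages whose content.metadata is a JSON array.

    `message` is a hypothetical dict with "type" and "content" keys, standing in
    for a row of the messages table.
    """
    if message.get("type") != "PROGRAM":
        return False
    content: Optional[Mapping[str, Any]] = message.get("content")
    if not content or "metadata" not in content:
        # A missing key yields SQL NULL in PostgreSQL, which never equals 'array'.
        return False
    return jsonb_typeof(content["metadata"]) == "array"
```

Dict metadata, JSON null metadata, missing metadata, and non-PROGRAM messages all fall through to `False`, which matches the cases listed in the test plan.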

Test plan

  • venv/bin/python -m pytest tests/test_repair.py -v — 5 tests, all pass (rejects list metadata, preserves dict/None metadata, ignores non-program types, no-op on empty DB).
  • venv/bin/python -m pytest tests/db/test_messages.py tests/db/test_credit_balances.py — adjacent suites still pass (63 total).
  • venv/bin/ruff check + black + isort clean on changed files.
  • Manual: on a staging snapshot, confirm targeted hashes flip from PROCESSED to REJECTED and GET /messages/<hash> no longer 500s.

🤖 Generated with Claude Code

Some PROGRAM messages slipped past validation while ExecutableContent.metadata
accepted lists. The current validator requires a dict, so reading those rows
fails parsed_content and surfaces as 500s on GET /messages/<hash>. Move them
to REJECTED at startup so the API renders them like nodes that rejected them
in the first place.

The transition logic also lives behind a deployment/scripts helper for ad-hoc
cleanups when waiting for a restart is not an option.
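
As an illustration of the helper's command-line contract (dry-run unless --commit is given, hashes supplied via --hash or --hashes-file), a minimal argparse sketch could look like the following. This is not the shipped script: whether --hash is repeatable, the one-hash-per-line file format, and the function names `build_parser` and `collect_hashes` are all assumptions.

```python
import argparse
from pathlib import Path
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Move processed messages to REJECTED (dry-run by default)."
    )
    # Assumption: --hash may be passed several times.
    parser.add_argument("--hash", dest="hashes", action="append", default=[],
                        help="item hash of a message to reject")
    # Assumption: the file holds one hash per line.
    parser.add_argument("--hashes-file", type=Path, default=None,
                        help="file listing item hashes, one per line")
    parser.add_argument("--commit", action="store_true",
                        help="persist changes instead of only reporting them")
    return parser


def collect_hashes(args: argparse.Namespace) -> List[str]:
    """Merge hashes from --hash and --hashes-file, dropping blanks and duplicates."""
    hashes = list(args.hashes)
    if args.hashes_file is not None:
        for line in args.hashes_file.read_text().splitlines():
            line = line.strip()
            if line:
                hashes.append(line)
    # dict.fromkeys keeps the first occurrence and preserves order.
    return list(dict.fromkeys(hashes))
```

Keeping `--commit` as an opt-in `store_true` flag means that forgetting it can only produce a report, never a write.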

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@foxpatch-aleph left a comment

Clean, well-structured fix for a production bug where PROGRAM messages with list-typed metadata cause 500s via parsed_content. Implements a reusable rejection utility for processed messages, wires a repair function into startup, and ships a companion CLI script. Thorough test coverage and good code quality throughout.

src/aleph/repair.py (line 69): Consider using session.execute(delete_vm_updates(...)) instead of _ = list(...) to avoid loading results into memory and make the intent clearer. The list() is needed to force execution, but a comment explaining why would help maintainers.
