For a high-level overview and introduction to the project, please see the README.
This document describes the architecture of the nft_ingester
project, outlining its components, their responsibilities, data flow, and database interactions.
The system is designed with the following principles in mind:
- Scalability: Components are designed to handle large volumes of data and high transaction throughput.
- Maintainability: Clear separation of concerns between components, well-defined interfaces, and extensive use of logging and metrics.
- Data Consistency: Robust mechanisms for ensuring data integrity, including validation against finalized data sources and gap filling.
- Performance: Optimized for both write-heavy ingestion and read-heavy API access, leveraging both RocksDB and PostgreSQL.
- Extensibility: Modular design allows for future additions and modifications, such as the potential separation of metadata into its own database.
The system is composed of several specialized binaries, each performing a specific role in processing and managing NFT data from the Solana blockchain. The system is heavily asynchronous, leveraging tokio
for concurrency.
-
ingester
(nft_ingester/src/bin/ingester/main.rs
):- Responsibilities:
- The core of the system, responsible for real-time ingestion of NFT data from the Solana blockchain.
- Processes account updates and token transfers, handling both confirmed and finalized data.
- Writes data to the main RocksDB database.
- Integrates multiple sub-components:
- API server (similar to the standalone
api
binary, but integrated for performance). - Backfill (for historical data).
- Gap filler (to handle missing data).
- Sequence consistent checker (ensures data integrity).
- Redis workers (for asynchronous processing of account and transaction data).
- API server (similar to the standalone
- Manages connections to Redis, RocksDB, and PostgreSQL.
- Handles graceful shutdown and metrics reporting.
- Performs signature fetching for finalized data validation (critical for Bubblegum tree consistency).
- Key Dependencies: RocksDB, PostgreSQL, Redis.
- Responsibilities:
-
synchronizer
(nft_ingester/src/bin/synchronizer/main.rs
):- Responsibilities:
- Ensures data consistency between RocksDB (the primary data store) and PostgreSQL (the indexed, queryable store).
- Performs a full synchronization of data, handling different asset types.
- Uses a configurable batch size and parallel tasks for efficient data transfer.
- Key Dependencies: RocksDB, PostgreSQL.
- Responsibilities:
-
slot_persister
(nft_ingester/src/bin/slot_persister/main.rs
):- Responsibilities:
- Fetches and persists finalized slot data to the slots RocksDB database.
- Provides a reliable source of finalized data for backfilling and validation.
- Can fetch data from either BigTable (optional) or an RPC source.
- Handles retries and error handling, ensuring the slots data is consistent and sequential - that is the slot number is always increasing and never skipped (although it still may miss some slots due to the eventual consistency of bigtable).
- Key Dependencies: RocksDB, BigTable (optional), Solana RPC.
- Responsibilities:
-
backfill
(nft_ingester/src/bin/backfill/main.rs
):- Responsibilities:
- Handles backfilling of historical blockchain data from the slots database, ensuring complete data coverage.
- Fetches and processes historical transaction data from the Slots RocksDB.
- Parses transaction data to extract Bubblegum NFT-related operations.
- Uses
DirectBlockParser
andBubblegumTxProcessor
for specific transaction types. - Writes processed data to the main RocksDB database.
- Provides progress tracking.
- Key Dependencies: RocksDB, Solana RPC.
- Responsibilities:
-
api
(nft_ingester/src/bin/api/main.rs
):- Responsibilities:
- Provides RPC API for querying NFT data.
- Queries both RocksDB and PostgreSQL databases.
- Optionally performs checks for gaps in Merkle tree sequences.
- Uses a
JsonWorker
to handle JSON metadata fetching and refreshing.
- Current Status: While functional, it's less actively used than the integrated API within the
ingester
due to performance considerations with secondary RocksDB instances. Future optimization is planned. - Key Dependencies: RocksDB, PostgreSQL, Redis.
- Responsibilities:
-
explorer
(nft_ingester/src/bin/explorer/main.rs
):- Responsibilities:
- Provides a RESTful service for exploring the RocksDB data directly.
- Allows iteration over keys and retrieval of values within specific RocksDB column families.
- Useful for debugging and data inspection.
- Key Dependencies:
rocksdb::DB
,axum
,metrics_utils
. - Key Dependencies: RocksDB.
- Responsibilities:
-
migrator
(nft_ingester/src/bin/migrator/main.rs
):- Responsibilities:
- Handles JSON metadata keys migration from RocksDB to PostgreSQL.
- Supports "full" (migrate JSONs and set tasks) and "jsons only" modes.
- Is a one-time tool to migrate the data from RocksDB to PostgreSQL.
- Key Dependencies: RocksDB, PostgreSQL.
- Responsibilities:
-
rocksdb_backup
(nft_ingester/src/bin/rocksdb_backup/main.rs
):- Responsibilities:
- Creates backups of the RocksDB database.
- Configurable backup and archive directories.
- Option to flush RocksDB before backup.
- Key Dependencies: RocksDB.
- Responsibilities:
-
raw_backup
(nft_ingester/src/bin/raw_backup/main.rs
):- Responsibilities:
- (Deprecated) Creates a raw backup of the RocksDB
OffChainData
column family.
- (Deprecated) Creates a raw backup of the RocksDB
- Key Dependencies: RocksDB.
- Responsibilities:
-
slot_checker
(nft_ingester/src/bin/slot_checker/main.rs
): * Responsibilities:- Verifies the consistency of slot data between RocksDB and BigTable (if used) or RPC source.
- Identifies missing slots. * Key Dependencies: RocksDB, BigTable (optional), Solana RPC.
-
synchronizer_utils
(nft_ingester/src/bin/synchronizer_utils/main.rs
): * Responsibilities:- Provides utilities for working with the synchronizer, allowing querying of asset indexes.
- Useful for debugging and data inspection. * Key Dependencies: RocksDB, PostgreSQL.
-
dumper
(nft_ingester/src/bin/dumper/main.rs
):- Responsibilities:
- Exports data from RocksDB to CSV files in the same manner as the synchronizer will do for a full sync.
- Iterates through RocksDB data (creators, assets, authorities, etc.).
- Writes the data to separate CSV files, potentially sharding the output for large datasets.
- Dumps last known keys for tracking progress.
- Key Dependencies: RocksDB, PostgreSQL, Redis.
- Responsibilities:
-
Finalized Data Sources:
- Slots Database (RocksDB): Contains only finalized slot data. Used for backfilling and data validation.
- Data Origin: The Slots DB can be populated from either BigTable or a Solana RPC node.
- Signature Fetching (Integrated in
ingester
): Fetches and validates transaction signatures, crucial for Bubblegum tree consistency and resolving fork-related issues. Signatures can be fetched from either BigTable or a Solana RPC node.
-
Non-Finalized Data Source:
- Redis: Receives real-time account and transaction updates from the Geyser plugin via a Redis queue.
This section describes the main data flow and processing steps within the system.
flowchart TB
subgraph Data Sources
Geyser[Geyser Plugin] -->|"Real-time\nTransactions"| Redis
BigTable[(BigTable)] -->|"Historical\nTransactions"| Ingester
SolanaRPC[Solana RPC] -->|"Historical\nTransactions"| Ingester
SlotsDB[(Slots RocksDB)] -->|"Finalized\nSlots"| Backfill
end
subgraph Ingestion and Processing
Redis([Redis]) -->|"Real-time\nTransactions"| Ingester
Ingester -->|"Schedule Metadata\nDownload"| DownloadQueue[(PostgreSQL\nDownload Queue)]
Ingester -->|"Store Asset\nData"| MainRocksDB[(Main RocksDB)]
Backfill -->|"Process Historical\nData"| Ingester
JsonWorker[JSON Worker] -->|"Fetch Metadata\nTask"| DownloadQueue
JsonWorker -->|"Download\nMetadata"| MetadataSource[External Metadata Source]
JsonWorker -->|"Save Metadata"| MainRocksDB
end
subgraph Synchronization and API
MainRocksDB -->|"Synchronize"| Synchronizer
Synchronizer -->|"Update Index"| IndexDB[(PostgreSQL Index)]
APIServer[API Server] -->|"Query Data"| MainRocksDB
APIServer -->|"Query Index"| IndexDB
Clients <--->|"API Requests"| APIServer
end
subgraph OptionalComponents
SlotChecker[Slot Checker] -->|"Check Slot\nConsistency"| SlotsDB
SlotChecker -.-> BigTable
end
Data Preparation:
- Real-time Data: The
ingester
receives real-time transaction data from the Geyser plugin via Redis. It also has the capability to receive data from Solana RPC nodes and BigTable. - Historical Data: The
backfill
component fetches historical transaction data from the Slots RocksDB, processing it through theingester
. - Filtering and Scheduling: The
ingester
filters transactions, focusing on NFT-related records. It schedules metadata download tasks in the PostgreSQL download queue. - Metadata Download: The
JsonWorker
(used byingester
andapi
) picks up tasks from the PostgreSQL queue, fetches metadata from external sources, and saves it to the main RocksDB. - Slot Data: The
slot_persister
fetches finalized slot data and stores it in the Slots RocksDB. Theslot_checker
can optionally verify slot consistency between the Slots RocksDB and BigTable.
Data Synchronization and Access:
- Synchronization: The
synchronizer
continuously updates the PostgreSQL index with data from the main RocksDB, enabling efficient searching. - API Access: Clients interact with the system through the API server (integrated within the
ingester
), which queries both RocksDB and PostgreSQL.
- Location: Configured via
ROCKS_DB_PATH
environment variable. - Purpose: Fast, local storage for NFT data. Serves as the primary data source for many operations.
- Data Stored:
AssetCompleteDetails
: Core NFT data (ownership, metadata URIs, etc.).OffChainData
: Downloaded JSON metadata.ClItems
: Data related to compressed NFTs and Merkle trees.TokenAccount
: Data related to token accounts.- Various indexing information.
- Column Families: Uses various column families (e.g.,
asset
,offchain_data
,cl_items
,parameters
, etc.). - Access Pattern: Write-heavy from
ingester
andbackfill
. Read-heavy fromingester
(integrated API),api
,explorer
, andsynchronizer
.
- Location: Configured via
ROCKS_SLOTS_DB_PATH
environment variable. - Purpose: Stores raw, finalized block data. Used for backfilling and data validation.
- Data Stored:
RawBlock
data. - Column Families: Uses the
RawBlock::NAME
column family. - Access Pattern: Write-heavy from
slot_persister
. Read-heavy fromingester
,backfill
andslot_checker
.
- Purpose: Provides a relational database for indexed, queryable NFT data. Used for complex queries and API access. Also serves as a task queue for metadata downloads.
- Data Stored: Mirrors much of the data in RocksDB, but in a structured, indexed format. Includes asset information, ownership, etc. Also stores metadata download tasks.
- Migrations: Uses the
PG_MIGRATIONS_PATH
to manage database schema changes. - Access Pattern: Write-heavy from
synchronizer
and (minorly)ingester
(for task queue). Read-heavy fromingester
(integrated API) andapi
.
- Purpose: Used as a message queue for receiving real-time updates from the Geyser plugin and for various asynchronous tasks, such as processing account updates from the snapshot parsing.
- Data Stored: Messages representing token account updates and bubblegum transactions.
- Access Pattern: Used by the
ingester
for receiving real-time and snapshot data.
- Purpose: Can be used as an alternative source for fetching slot data.
- Data Stored: Raw block data.
- Access Pattern: Read-only, used by
slot_checker
,slot_persister
, and potentiallybackfill
.
The current architecture supports the potential separation of metadata into a dedicated database (e.g., another RocksDB database or a document store like MongoDB, or a private S3-compatible object storage).
- Benefits:
- Improved query performance for metadata-specific operations.
- Reduced RocksDB size and improved performance.
- Independent scaling of the metadata store.
- Implementation Considerations:
- Would require a synchronization mechanism between the main data store and the metadata store.
- Need for consistency guarantees.
- Potential for a cache layer for frequently accessed metadata.
Future improvements are planned to enable more efficient use of the standalone api
binary, addressing performance issues related to secondary RocksDB instances. This might involve:
- Optimizing request timing consistency.
- Implementing a caching layer.
- Potentially consolidating API functionality within the
ingester
.
The system uses several mounted volumes for different purposes:
- Main RocksDB data store
- Slots database (finalized slots only)
- PostgreSQL data
- Backup directories
- File storage for additional assets
- Heap dumps for debugging
- Migration files
Each component has specific access patterns (read-only, read-write) to these volumes to maintain data integrity and performance.