
Design proposal for multiple context creation & use #282

@rvagg

Related to #275.

This is an extension of the original refactor (#135) that gave us contexts, with the intent of making our APIs support multiple SPs in single calls.

Currently we do:

```js
const context = await synapse.storage.createContext()
// and then
await synapse.storage.upload(blob, { context })
```

In synapse.storage (StorageManager), the upload(), download(), and even terminateDataSet() (which should be renamed to terminate()) functions should be extended so that their context argument can also be an array of contexts. The operations would then be performed on all contexts in ~parallel.

```js
const context1 = await synapse.storage.createContext()
const context2 = await synapse.storage.createContext()
// and then
await synapse.storage.upload(blob, { context: [context1, context2] })
```
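
Under the hood this can be little more than normalizing the option to an array and fanning the work out. A minimal sketch, assuming a hypothetical per-context uploadToContext() helper and simplified types (not the real SDK surface):

```ts
// Types and helper assumed for illustration only
interface StorageContext { providerId: number }
interface UploadResult { pieceCid: string }
declare function uploadToContext (blob: Blob, ctx: StorageContext): Promise<UploadResult>

async function upload (
  blob: Blob,
  options: { context: StorageContext | StorageContext[] }
): Promise<UploadResult[]> {
  const contexts = Array.isArray(options.context) ? options.context : [options.context]
  // allSettled lets us report partial failure instead of aborting every
  // upload when a single provider misbehaves
  const results = await Promise.allSettled(contexts.map((ctx) => uploadToContext(blob, ctx)))
  const failed = results.filter((r): r is PromiseRejectedResult => r.status === 'rejected')
  if (failed.length > 0) {
    throw new AggregateError(failed.map((r) => r.reason), `${failed.length}/${contexts.length} uploads failed`)
  }
  return results.map((r) => (r as PromiseFulfilledResult<UploadResult>).value)
}
```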

Further, we should make it easy to create multiple contexts, and document this as the golden path for getting set up:

```js
const contexts = await synapse.storage.createContexts({ count: 2 }) // we can make the default `2` so the option is not even necessary
await synapse.storage.upload(blob, { contexts })
```

This becomes our default way of setting up and uploading. createContext() (singular) becomes an advanced operation; devs are encouraged to just call createContexts(), and they're up and running with built-in duplicate operations.

There are some challenges here:

Context creation

createContexts() could have the following options object:

```ts
export interface CreateContextsOptions {
  /** Number of contexts to create (optional, defaults to 2) */
  count?: number
  /**
   * Specific provider IDs to use (optional)
   * Cannot be used with providerAddresses
   * Must be no longer than count
   */
  providerIds?: number[]
  /**
   * Specific provider addresses to use (optional)
   * Cannot be used with providerIds
   * Must be no longer than count
   */
  providerAddresses?: string[]
  /**
   * Specific data set IDs to use (optional)
   * Cannot be used with provider options
   * Must be no longer than count
   */
  dataSetIds?: number[]
  /**
   * Custom metadata for the data sets (key-value pairs)
   * When smart-selecting data sets, this metadata will be used to match.
   */
  metadata?: Record<string, string>
  /** Force creation of new data sets, even if candidates exist */
  forceCreateDataSets?: boolean
  /** Callbacks for the creation process (will need to change to handle multiples) */
  callbacks?: StorageCreationCallbacks
  /** Maximum number of uploads to process in a single batch (default: 32, minimum: 1) */
  uploadBatchSize?: number
}
```
  • By default we can use smart provider selection, but when creating multiples we have to be able to land on separate providers. This could involve extending the smart selection to accept a list of providers to exclude, so that it selects any provider but the one(s) listed. In the case of a fresh wallet with no data sets, we run it once, get a provider, then run it again excluding that provider, and we now have two separate providers (see the sketch after this list).
  • We have to support the various cases of existing data sets: there may be enough that match the metadata requirements, or there may be fewer than count, in which case we need to create more.
  • We currently have to pipeline data set creation due to the sequential clientDataSetId nonce in the contract. @wjmelements is working on an EIP-3009-style non-sequential nonce solution to make this parallel, but for now we can just queue creations up and run them one by one.
  • Callbacks for context creation will need tweaking to handle multiples (StorageCreationCallbacks above).
  • How do we handle failures? Do we automatically fall back to trying again, excluding the provider that failed?
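
A rough sketch of what the selection loop could look like, assuming a hypothetical selectProvider() that runs the existing smart selection with an exclusion list, and a hypothetical createContextForProvider() wrapping the current single-context path:

```ts
// Hypothetical helpers standing in for the real internals
interface StorageContext { providerId: number }
declare function selectProvider (opts: { exclude: number[] }): Promise<{ id: number }>
declare function createContextForProvider (providerId: number): Promise<StorageContext>

async function createContexts ({ count = 2 }: { count?: number } = {}): Promise<StorageContext[]> {
  const contexts: StorageContext[] = []
  const exclude: number[] = []
  for (let i = 0; i < count; i++) {
    // exclude already-selected providers so each context lands on a new SP
    const provider = await selectProvider({ exclude })
    exclude.push(provider.id)
    // data set creation is serialized for now because clientDataSetId is a
    // sequential nonce in the contract (see the nonce work mentioned above)
    contexts.push(await createContextForProvider(provider.id))
  }
  return contexts
}
```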

Upload

Uploads can be done in parallel without any problems; we only have per-data set nonces, so we can even submit AddRoots in parallel for different providers. But there are some nuances:

  • Callbacks may need to be adjusted to account for multiples. We could pretend operations are singular and present them that way, but surfacing the multiples would give users the opportunity to be more granular in their UX feedback.
  • Error conditions will need some care. We could start off with basic failure notifications, but how do we communicate partial failure, and is there any remedial action we can or should take? We could at least advise the user to retry with only the context(s) that failed, but it'll be a bit knotty.
  • CommP should only be calculated once! The streaming calculation in #280 (feat(pdp): new PieceCID-last flow (internal only)) only needs to be done for one of the upload streams, and the resulting CommP can be used to finalize all of the uploads (see the sketch after this list).
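
A sketch of what the CommP-once flow could look like, with assumed names (calculateCommP(), pushToProvider(), finalizeUpload()) standing in for the real internals:

```ts
// Hypothetical helpers, assumed for illustration only
interface StorageContext { providerId: number }
declare function calculateCommP (stream: ReadableStream<Uint8Array>): Promise<string>
declare function pushToProvider (ctx: StorageContext, stream: ReadableStream<Uint8Array>): Promise<void>
declare function finalizeUpload (ctx: StorageContext, commP: string): Promise<void>

async function uploadOnce (blob: Blob, contexts: StorageContext[]): Promise<string> {
  // tee the first upload stream so the CommP hasher rides alongside it;
  // every other provider just gets a fresh stream from the Blob
  const [toHasher, toFirst] = blob.stream().tee()
  const commPPromise = calculateCommP(toHasher)
  await Promise.all(contexts.map((ctx, i) =>
    pushToProvider(ctx, i === 0 ? toFirst : blob.stream())))
  // one CommP, reused to finalize every upload
  const commP = await commPPromise
  await Promise.all(contexts.map((ctx) => finalizeUpload(ctx, commP)))
  return commP
}
```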

Download

This is nice because you're simply saying "try downloading from any of these providers", and that's an easy operation: race to the first provider to return a response body and cancel the rest. We have code that does most of this already.
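
A sketch of the race, assuming a hypothetical fetchFromProvider() that takes an AbortSignal:

```ts
// Hypothetical per-context download call, assumed for illustration only
interface StorageContext { providerId: number }
declare function fetchFromProvider (ctx: StorageContext, pieceCid: string, signal: AbortSignal): Promise<Response>

async function download (pieceCid: string, contexts: StorageContext[]): Promise<Response> {
  // one controller per attempt so we can cancel the losers without
  // aborting the winner's body stream
  const controllers = contexts.map(() => new AbortController())
  // Promise.any resolves with the first fulfilled attempt and only
  // rejects (with an AggregateError) if every provider fails
  const winner = await Promise.any(contexts.map(async (ctx, i) => {
    const res = await fetchFromProvider(ctx, pieceCid, controllers[i].signal)
    if (!res.ok) throw new Error(`provider ${ctx.providerId} returned ${res.status}`)
    return { res, i }
  }))
  controllers.forEach((c, i) => { if (i !== winner.i) c.abort() })
  return winner.res
}
```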
