Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transport protocol update take 2 #333

Conversation

hannahhoward
Copy link
Collaborator

@hannahhoward hannahhoward commented Jun 3, 2022

What

This is an addition to #331, aiming to set us up for multiple transport protocols AS WELL AS protocol detection & transport detection.

How

#331 introduced a wrapped message format containing the transport information, which enables multiple transports to share the libp2p protocol for sending messages.

However, it does not actually help us do protocol negotiation with multiple transports, or transport versioning

Specifically:

  • I want to know what potential data transfer message formats another data transfer node supports (for an evolving message format)
  • I want to know whether a data transfer node can perform a transfer over a specific transport
  • I want to known what potential versions of a well known transport another node supports (for an evolving transport)
  • I want to connect to another node over a given transport with the latest possible message formats I support and latest transport versions I support, but fall back to those I support for compatibility

Here's what I implemented here:

Introduced a transport version

What is a transport version? A transport version indicates a way of using a transport protocol to facilitate data transfer. For example, in go-graphsync we've ended up switching graphsync extensions over time to support request resuming. At one time, we supported Do-Not-Send-Cids, then we switched to using Do-Not-Send-First-Blocks. It was actually this change that led to the unusual introduction of go-data-transfer 1.2.0 libp2p protocol even though there was no change in the message format (hence the confusingly name "message1_1prime" folder -- our messages haven't changed since 1.1.0, but we upped the libp2p protocol cause how the graphsync transport changed). A transport version doesn't say anything about the version of messages encoded within, but if there are specific mechanics to how a transport works that change, this allows us to version it independently.

While this might seem like another layer of versioning on top of already complex versioning scheme, it actually makes sense so that it's possible to produce clear specifications for how data transfer and a transport protocol interact to a potential implementer.

So "data transfer graphsync transport 1.0.0" (starting at 1.0.0 as the only supported version):

  • Use Do-Not-Send-First-Blocks for resumes
  • Use fil/data-transfer/incoming-request/1.1, fil/data-transfer/outgoing-block/1.1, and fil/data-transfer/1.1 extensions to encode messages (why these terrible names? oy cause awful haphazardly evolving design with no intention behind it)

Handle transport support and versioning negotiation by adding them to the protocol string for Libp2p

This is potentially controversial, as it technically means each transport gets its own libp2p stream per version, which implies a fairly rapid enumeration of protocols. In practice it seems to work pretty well given a very clear naming scheme /data-transfer/~message version~/~transport-name~/~transport-version~. This enables us to support an evolution of the Protocolmethod on the network interface that tells me:Given a peer and a transport, tell me if they support that transport, what their best message version is, and what their best transport version is"

The alternative here is just to make a seperate transport-query protocol. I'm torn on this. If we do that, we always need a message round trip between peers to tell us what is supported, which seems poorer for performance (in the other scheme, the peer store conveniently stores the important information about connection after first connection).

Keep Wrapping Messages Including Adding Transport Versions

While encoding transport information in the protocol name actually means we don't necessarily need wrapped messages, the cost to do so is minimal. Moreover, it has good usages for other potential transports that may want to implement how they operate without the libp2p protocol negotiation offered.

move the network to a subdirectory, also cleanup usage of all selector, move gstestdata to the itest
directory
introduces a new wrapped protocol that encodes the transport id so that we can support multiple
transports using the network protocol
@hannahhoward hannahhoward requested review from rvagg and aschmahmann June 3, 2022 23:45
Support multiple transports on the libp2p protocol, via different protocol naming, and using libp2p
to do protocol negotiation
@hannahhoward hannahhoward force-pushed the feat/refactor-transport-protocol-update-part-2 branch from b82ce2e to f4b0b9b Compare June 3, 2022 23:51
messageVersion, _ := MessageVersion(dtProtocol)
if isLegacyProtocol(dtProtocol) {

if transportID == datatransfer.LegacyTransportID {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no check here for if LegacyTransportVersion is in the list of versions, so does that mean that any registration of "graphsync" will set up a legacy protocol regardless of what versions are provided? so there's no way to use this with "graphsync" without also wiring up for legacy? Probably not an immediate concern but we'll presumably want to phase that out at some point in the future.

Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me, thanks for the detailed explanation. On your last point, of doubling up on the transport and versioning information being represented in messages and protocol negotiation - it's not entirely free, but can be minimal cost. But can you expand on this?

Moreover, it has good usages for other potential transports that may want to implement how they operate without the libp2p protocol negotiation offered.

Can you enumerate some usages that we may end up with, do you see any on the immediate horizon or are you just anticipating hypotheticals?

Regarding the message format, I wouldn't mind trimming it down a bit, currently it'll be like:

{
  "ID": "graphsync",
  "TV": "1.2.0",
  "Msg": { ... }
}

Perhaps we should go for the single-key union map style again for this, make it:

{
  "graphsync/1.2.0": { ... }
}

I mentioned this in multiformats/js-multiformats#161 too, which just has the version though. We could use union with a representation keyed to do this, so we get to enforce a single-key union with precise matches of transports and versions we support at the message decoding layer. We'd then be pushing the verbosity up into the Go type layer, so we'd have things like type Graphsync120Message struct { Message } (which we could make an interface for with Transport() and Version() methods that give us that information).

@hannahhoward
Copy link
Collaborator Author

@rvagg

Can you enumerate some usages that we may end up with, do you see any on the immediate horizon or are you just anticipating hypotheticals?

Honestly, in the back of my mind is always HTTP and whether we could make data transfer work with it.

@hannahhoward
Copy link
Collaborator Author

@rvagg I'm open to the keyed union approach, but can we check in about how bindnode works with keyed unions? If I'm reading the code right, if we have a schema like this:

type WrappedMessage union {
  | Bitswap100Message "bitswap/1.0.0"
  | Graphsync120Message "graphsync/1.2.0"
} representation keyed

what we end up in go with is something like this no:

type Bitswap100Message struct { Message }
type Graphsync120Message struct { Message }
type WrappedMessage struct {
    Graphsync120Message *Graphsync120Message
    Bitswap100Message *Bitswap100Message
 }

That seems mildly unwieldy, though perhaps we can just thank go's type system for not supporting unions well.

Anyway let's talk more about that.

@rvagg
Copy link
Member

rvagg commented Jun 9, 2022

That struct might be unwieldily as it is but I don't think it's hard to fix if we have strong enough assumptions about it being a single-property union:

type Bitswap100Message struct { Message }
type Graphsync120Message struct { Message }
type WrappedMessage struct {
    Graphsync120Message *Graphsync120Message
    Bitswap100Message *Bitswap100Message
 }

func (wm *WrappedMessage) Message() Message {
  if wm.Graphsync120Message != nil {
    return wm.Graphsync120Message.Message
  }
  return wm.Bitswap100Message.Message
}

func (wm *WrappedMessage) ProtocolName() string {
  if wm.Graphsync120Message != nil {
    return "graphsync"
  }
  return "bitswap"
}

func (wm *WrappedMessage) ProtocolVersion() string {
  // same thing
}

Possibly some panics inserted where the assumptions fail would be appropriate too.

Then we just evolve the parent through its methods as we add protocols and have to deal with compatibility concerns.

@hannahhoward hannahhoward merged commit 6b581f2 into feat/refactor-transport-part-1 Jun 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants