SRD
1. INTRODUCTION
This document describes the Scalable Reliable Datagram (SRD) Transport QP type
initially used in AWS Elastic Fabric networks.
SRD allows significant savings in the number of QPs required to establish
all-to-all process connectivity in large clusters, where a large number of processes
typically run on each endnode.
With the Reliable Connected (RC) Transport Service, the number of QPs and
connection contexts required per endnode to achieve full process to process
connectivity is equal to N*p*p (where N is the number of nodes in the cluster
and p the number of processes per node). As the number of processes grows
together with the number of cores per system, the number of RC QPs (and their
associated memory resources) starts to have a significant impact. Within the last
five years, the maximum number of processors available per node in public clouds
has grown 3×.
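To make the scaling concrete, here is a minimal sketch of the N*p*p formula quoted above. The function name and the sample cluster sizes are illustrative assumptions, not figures from any real deployment.

```python
def rc_qps_per_node(num_nodes: int, procs_per_node: int) -> int:
    """RC QPs per endnode for full process-to-process connectivity (N*p*p)."""
    return num_nodes * procs_per_node * procs_per_node


if __name__ == "__main__":
    # Hypothetical cluster: 1000 nodes, with processes per node tripling.
    for p in (16, 48):
        print(f"N=1000, p={p}: {rc_qps_per_node(1000, p)} RC QPs per node")
```

Because the count is quadratic in p, tripling the number of processes per node multiplies the RC QP count per node by nine.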
The Reliable Datagram (RD) model reduces the number of QPs required for full
connectivity in the scenario above to p, thus significantly improving the
scalability of the solution for large clusters of multicore endnodes. However,
the RD Transport Service has a significant limitation: it supports only a single
outstanding message per EE context. SRD is a new approach in the spirit of the
RD model. It differs from RD in several ways, but first and foremost it
eliminates the above limitation. With SRD there is no limit to the number of
outstanding messages on the wire. Like RD, SRD QPs provide reliable delivery.
Unlike RD (and similar to UD), SRD QPs provide out-of-order delivery without
segmentation support. This allows support for
multiple outstanding messages without creating head-of-line blocking, with
decoupling of transport processing from QP buffer management, so that separate
application flows can be multiplexed without interfering with each other. Moreover,
out-of-order delivery is beneficial even on a single application flow, as it
prevents head-of-line blocking between independent messages. For example, SCSI
and NVMe allow out-of-order command execution, and thus can benefit from
out-of-order delivery from transport to ULP.
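The head-of-line blocking argument above can be illustrated with a toy model. This is only a sketch: it assumes independent single-packet messages identified by a sequence number, and the function names and the arrival order are hypothetical. Each result entry is a pair (arrival step at which the message is handed to the ULP, message number).

```python
def in_order_delivery(arrivals):
    """Deliver messages strictly by sequence number; later arrivals wait."""
    delivered, buffered, next_seq = [], set(), 0
    for step, seq in enumerate(arrivals):
        buffered.add(seq)
        # Release every message that is now contiguous with what was delivered.
        while next_seq in buffered:
            buffered.remove(next_seq)
            delivered.append((step, next_seq))
            next_seq += 1
    return delivered


def out_of_order_delivery(arrivals):
    """Deliver each message as soon as it arrives (SRD-style behaviour)."""
    return [(step, seq) for step, seq in enumerate(arrivals)]


if __name__ == "__main__":
    # Hypothetical arrival order: message 1 was lost and retransmitted last.
    arrivals = [0, 2, 3, 1]
    print("in-order:    ", in_order_delivery(arrivals))
    print("out-of-order:", out_of_order_delivery(arrivals))
```

With in-order delivery, messages 2 and 3 are held until the retransmitted message 1 arrives at step 3. With out-of-order delivery they reach the ULP at steps 1 and 2.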
SRD does not expose EE contexts to users as RD does. Instead of EE context, each
SRD WR includes the AH (Address Handle) of the remote destination (as in UD). Each
AH is implicitly associated with an SRD context. An SRD context is used to
provide reliable communication to a remote node, similar to an RD EE context,
but without explicit management by the user. SRD contexts are implicitly
controlled by AH and
QP management operations. If a QP is destroyed, all pending Send WRs on that QP
are implicitly canceled, and their transport processing is aborted, without
affecting SRD processing of other WRs. If an AH is destroyed, any
outstanding WRs using that AH are completed in error.
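The effect of destroying a QP or an AH on outstanding Send WRs can be sketched with a small bookkeeping model. This is an illustration only, not a verbs provider: the class names, the string labels for QPs and AHs, and the choice to return canceled WR ids from destroy_qp are assumptions. The error status used for AH destruction is the Work Request Flushed Error listed in section 3.3.1.

```python
from dataclasses import dataclass, field


@dataclass
class SendWR:
    wr_id: int
    qp: str  # hypothetical QP label
    ah: str  # hypothetical AH label


@dataclass
class SrdModel:
    """Toy model of outstanding Send WRs."""
    outstanding: list = field(default_factory=list)

    def post_send(self, wr: SendWR) -> None:
        self.outstanding.append(wr)

    def destroy_qp(self, qp: str) -> list:
        """Cancel pending WRs on this QP; WRs on other QPs are untouched."""
        canceled = [wr.wr_id for wr in self.outstanding if wr.qp == qp]
        self.outstanding = [wr for wr in self.outstanding if wr.qp != qp]
        return canceled

    def destroy_ah(self, ah: str) -> list:
        """Complete outstanding WRs that use this AH in error."""
        completions = [(wr.wr_id, "Work Request Flushed Error")
                       for wr in self.outstanding if wr.ah == ah]
        self.outstanding = [wr for wr in self.outstanding if wr.ah != ah]
        return completions


if __name__ == "__main__":
    model = SrdModel()
    model.post_send(SendWR(1, "qp_a", "ah_x"))
    model.post_send(SendWR(2, "qp_a", "ah_y"))
    model.post_send(SendWR(3, "qp_b", "ah_x"))
    print(model.destroy_qp("qp_a"))  # WRs 1 and 2 are canceled
    print(model.destroy_ah("ah_x"))  # WR 3 completes in error
```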
Similar to RD, SRD guarantees at-most-once delivery, and indicates in Send
completions whether a WR was accepted at the responder. The SRD requester detects
lost packets and retransmits them, similar to RC/RD. ACKs generated by the
responder are
somewhat different from RC/RD ACKs, because of the shared nature of SRD contexts
(Receive WQEs could be fetched from multiple QPs, and Send WRs could be
addressed to different destinations). A request that passed SRD transport checks
but failed QP checks will be dropped and ACKed with a drop indication, so that
the requester will generate a completion with appropriate error status. I.e.,
receiver QP errors do not affect SRD handling of any other WRs (in particular on
the same SRD context or the same SQ).
2. SRD SOFTWARE TRANSPORT INTERFACE
The SRD QP type offers semantics that are similar to those of UD on both
the requester and the responder side, except for the reliability and ordering
guarantees. An AH is provided with each Work Request as in UD, but SRD
retransmits lost packets and can deliver packets out of order. Currently
only the Send operation is supported, but nothing precludes support for RDMA
operations in the future (with weak memory consistency).
3. SRD SOFTWARE TRANSPORT VERBS
SRD does not require new verbs, but it adds new completion error statuses for SQ
WRs and new async event types.
3.1 TRANSPORT RESOURCE MANAGEMENT
3.1.1 QUEUE PAIR
3.1.1.1 CREATE QUEUE PAIR
A new transport service type is defined for SRD QPs. The rest of the input
modifiers are the same as for the UD QP type. Likewise, the QP state machine is
the same as for the UD QP type.
3.2 WORK REQUEST PROCESSING
3.2.1 QUEUE PAIR OPERATIONS
3.2.1.1 POST SEND REQUEST
For SRD QPs, Post Send Request is the same as for UD QPs.
3.2.1.2 POST RECEIVE REQUEST
For SRD QPs, Post Receive Request is the same as for UD QPs.
3.2.2 COMPLETION QUEUE OPERATIONS
3.2.2.1 POLL FOR COMPLETION
Completions for Send WRs posted to SRD QPs are similar to those for WRs posted
to regular QPs. Success is reported after the WR is ACKed by the responder.
In addition to local errors, new types of remote errors are returned for
requests that caused the responder to send an ACK with a drop indication. These
errors can occur when the destination QP does not exist, is in error state, or
does not have posted Receive WRs. These errors do not affect SRD context state
or local QP state.
Completions for Receive WRs posted to SRD RQs are the same as for WRs posted to
UD RQs.
3.3 RESULT TYPES
3.3.1 COMPLETION RETURN STATUS
The following existing status values can be reported for SRD Send WRs:
- Success – Operation completed successfully.
- Local Length Error – The sum of the Data Segment lengths exceeds the port MTU.
- Local QP Operation Error – An internal QP consistency error was detected
while processing this Work Request.
- Local Protection Error – The locally posted Work Request’s Data Segment
does not reference a Memory Region that is valid for the requested operation.
- Work Request Flushed Error – A Work Request was in process or outstanding
when the QP transitioned into the Error State. Another possible cause is that
the requester destroyed the AH used on this WR.
- Bad Response Error – An unexpected transport layer opcode was returned by the
responder.
- Remote Invalid Request Error – The responder detected an invalid message on
the channel. One possible cause is that the operation is not supported by this
receive queue.
- Remote Operation Error – The operation could not be completed successfully
by the responder. Possible causes include a responder QP-related error that
prevented the responder from completing the request or a malformed WQE on
the Receive Queue.
In addition, new SRD-specific Work Completion errors can be reported:
- Bad Dest QP Error – The responder rejected the request because the destination
QP does not exist or is in error state.
- SRD RNR error – Returned for requests rejected by the responder because the
Receive Queue does not have posted WRs. The requester does not perform any
retries.
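The relationship between responder QP checks, the ACK with drop indication, and the two SRD-specific statuses can be sketched as follows. This is a minimal model under stated assumptions: the ACK labels (ACK, ACK_DROP_BAD_DEST_QP, ACK_DROP_RNR), the DestQP class, and the QP numbers are hypothetical and do not describe the SRD wire format. The request is assumed to have already passed SRD transport checks.

```python
from dataclasses import dataclass


@dataclass
class DestQP:
    in_error: bool = False
    posted_recv_wrs: int = 0


def responder_ack(qps: dict, dest_qpn: int) -> str:
    """Return the ACK flavour for a request that passed SRD transport checks."""
    qp = qps.get(dest_qpn)
    if qp is None or qp.in_error:
        return "ACK_DROP_BAD_DEST_QP"
    if qp.posted_recv_wrs == 0:
        return "ACK_DROP_RNR"
    qp.posted_recv_wrs -= 1  # one Receive WR is consumed by the incoming Send
    return "ACK"


# Requester side: map the ACK flavour to the Send completion status.
COMPLETION_STATUS = {
    "ACK": "Success",
    "ACK_DROP_BAD_DEST_QP": "Bad Dest QP Error",
    "ACK_DROP_RNR": "SRD RNR error",
}


if __name__ == "__main__":
    # Hypothetical responder: QP 7 has one Receive WR, QP 8 is in error,
    # and QP 9 does not exist.
    qps = {7: DestQP(posted_recv_wrs=1), 8: DestQP(in_error=True)}
    for qpn in (7, 7, 8, 9):
        print(qpn, COMPLETION_STATUS[responder_ack(qps, qpn)])
```

In this model a dropped request changes nothing except the completion reported for that one WR, which mirrors the statement that these errors do not affect SRD context state or local QP state. The requester reports the SRD RNR error directly instead of retrying.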
3.3.2 ASYNCHRONOUS EVENTS
3.3.2.1 UNAFFILIATED ASYNCHRONOUS EVENTS
- Remote Unresponsive Event – The local transport timeout was exceeded while
trying to send messages to a specific destination (AH). SRD context state
associated with the AH is not affected.