Eventstream Refactor #816

Open: bretambrose wants to merge 61 commits into main

@bretambrose bretambrose commented Jun 19, 2025

This PR is a substantial rewrite of the eventstream RPC bindings as well as the code-generated clients for eventstream-based services. This refactor was necessitated by a variety of deadlock and race condition problems with the original implementation. The complexity of the original implementation made targeted fixes nearly impossible to apply.

Refactor Goals

  • No blocking in destructors. To maintain behavioral compatibility with the previous implementation, we synchronously simulate the asynchronous events that would have occurred during a blocking destroy.
  • A simplified synchronization model in which
    • User callbacks are never invoked inside a lock
    • C APIs are never invoked inside a lock
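
One common way to honor the "no callbacks inside a lock" rule is to decide what needs to happen while holding the lock and to act only after releasing it. The following is a minimal sketch of that pattern, not code from this PR; `StreamState`, `SetOnClosed`, `Close`, and `IsClosed` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical class: it only illustrates "mutate under the lock, invoke after unlocking".
class StreamState
{
  public:
    using OnClosed = std::function<void(int)>;

    void SetOnClosed(OnClosed callback)
    {
        std::lock_guard<std::mutex> lock(m_lock);
        m_onClosed = std::move(callback);
    }

    bool IsClosed() const
    {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_closed;
    }

    void Close(int errorCode)
    {
        OnClosed callback;
        {
            std::lock_guard<std::mutex> lock(m_lock);
            if (m_closed)
            {
                return; // already closed, nothing to report
            }
            m_closed = true;
            // Take ownership of the callback while locked; do not call it yet.
            callback = std::move(m_onClosed);
            m_onClosed = nullptr;
        }
        // The lock is released here, so the callback may safely re-enter this object.
        if (callback)
        {
            callback(errorCode);
        }
    }

  private:
    mutable std::mutex m_lock;
    bool m_closed = false;
    OnClosed m_onClosed;
};
```

Because `std::mutex` is not recursive, a callback that calls back into `IsClosed()` would deadlock if `Close()` invoked it while still holding the lock. With the structure above, that re-entrant call is safe.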

Public API Changes

The original implementation exposed a large number of unnecessary details in the public API. As part of the refactor, we make a number of publicly visible changes that, while technically breaking, we believe should not be user-impacting. We consider a change to be user-impacting if it is a breaking change to a type that is used during service client interaction.

We detail each change below, along with the reasoning for why we think it is safe. If you were mocking any of these changed type contracts, the changes will be breaking for you.

  • All OperationModelContext subclasses have been made private. These types were used internally by the service model and there is no reason to expose them.
  • ContinuationCallbackData - Removed. It was not user-facing and is unneeded in the refactor.
  • ClientContinuationHandler - Public functions that were only for internal use have been removed. The class no longer serves a purpose but has been retained in case users were tracking operations by it.
  • ClientContinuation - Internal type that has been re-implemented as the private type ClientContinuationImpl
  • ClientOperation
    • Constructor type signature has changed - This type is only constructed internally by generated code
    • GetOperationResult API removed - This function could not be called externally without triggering exceptions by multi-consuming a promise's future
    • WithLaunchMode - This function persists but no longer does anything useful. Launch mode is no longer relevant to the processing of operations and was a mistake to include originally.
  • ClientConnection - This is an internal type used by generated service clients.
    • Constructor signature changed.
    • SendPing and SendPingResponse removed.
    • Connect and NewStream signatures changed.
    • bool operator removed
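
The `GetOperationResult` removal rests on a standard-library rule: a `std::promise` hands out its future only once. The sketch below demonstrates that rule in isolation. It assumes, without confirmation from the PR, that the removed function retrieved the future of a promise that internal code had already consumed; `SecondRetrievalThrows` is a hypothetical helper name.

```cpp
#include <cassert>
#include <future>

// Returns true when a second get_future() call on the same promise throws
// std::future_error with the future_already_retrieved error code.
bool SecondRetrievalThrows()
{
    std::promise<int> promise;
    std::future<int> first = promise.get_future(); // the one permitted retrieval
    try
    {
        std::future<int> second = promise.get_future(); // expected to throw
    }
    catch (const std::future_error &error)
    {
        return error.code() == std::future_errc::future_already_retrieved;
    }
    return false;
}
```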

Additional Changes

We now launch an EchoTest RPC server in CI and run a much larger suite of tests against it.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Bret Ambrose added 30 commits April 11, 2025 13:55
@bretambrose changed the title from "Eventstream refactor2" to "Eventstream Refactor" on Jun 19, 2025

MessageType messageType,
uint32_t messageFlags,
OnMessageFlushCallback onMessageFlushCallback) noexcept;
std::shared_ptr<ClientContinuationImpl> NewStream() noexcept;
Contributor

Having ClientContinuationImpl in the public API doesn't seem right. Users of ClientConnection can't do anything meaningful with this method unless they have access to this private class, which effectively limits the possible callers of this method to ClientOperation.

However, I don't see a good solution right away. In essence, ClientConnection creates a new instance of ClientOperation, since ClientOperation is a thin wrapper around ClientContinuationImpl. From a technical perspective, a new method like ClientOperation createNewClientOperation(...); would make more sense, but it would probably look ugly in the ClientOperation subclasses. It also looks odd from a logical perspective: a connection creating an operation.
A better alternative might be to make NewStream() private and mark ClientOperation as a friend of ClientConnection.
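
The friend-based alternative could look roughly like the sketch below. These are simplified stand-ins that reuse the class names from the discussion, not the real aws-crt-cpp declarations; the integer stream id, `StreamsCreated`, and `StreamId` are hypothetical additions that make the behavior observable.

```cpp
#include <cassert>
#include <memory>

// Simplified stand-in for the private continuation type.
class ClientContinuationImpl
{
  public:
    explicit ClientContinuationImpl(int id) : m_id(id) {}
    int Id() const noexcept { return m_id; }

  private:
    int m_id;
};

class ClientConnection
{
  public:
    int StreamsCreated() const noexcept { return m_nextId; }

  private:
    // Only ClientOperation may create streams; NewStream() is no longer public API.
    friend class ClientOperation;

    std::shared_ptr<ClientContinuationImpl> NewStream()
    {
        return std::make_shared<ClientContinuationImpl>(m_nextId++);
    }

    int m_nextId = 0;
};

class ClientOperation
{
  public:
    explicit ClientOperation(ClientConnection &connection) : m_continuation(connection.NewStream()) {}

    int StreamId() const noexcept { return m_continuation->Id(); }

  private:
    std::shared_ptr<ClientContinuationImpl> m_continuation;
};
```

One design consequence to keep in mind: friendship is not inherited, so classes derived from `ClientOperation` would not be able to call `NewStream()` themselves. They would have to obtain their continuation through the base-class constructor, as above.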

m_valueByteBuf =
Crt::ByteBufNewCopy(lhs.m_allocator, lhs.m_valueByteBuf.buffer, lhs.m_valueByteBuf.len);
m_underlyingHandle = lhs.m_underlyingHandle;
m_underlyingHandle.header_value.variable_len_val = m_valueByteBuf.buffer;
Contributor

Shouldn't we check for AWS_EVENT_STREAM_HEADER_STRING and AWS_EVENT_STREAM_HEADER_BYTE_BUF for this assignment to make sense?

Contributor Author

I don't think it changes anything to unconditionally assign them.
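
Whether the guard matters depends on how the header value is stored. If the variable-length pointer shares storage with an inline fixed-size value (an assumption here, worth checking against the aws-c-event-stream header definition), an unconditional pointer assignment would overwrite inline values such as integers. The sketch below uses hypothetical stand-in types (`RawHeader`, `OwnedHeader`, `HeaderType`) to show a copy that rebinds the pointer only for variable-length types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins; they are not the real event-stream structures.
enum class HeaderType { Int32, String, ByteBuf };

struct RawHeader
{
    HeaderType type;
    union
    {
        std::uint8_t *variableLenVal; // pointer to bytes owned elsewhere
        std::uint8_t staticVal[16];   // fixed-size values stored inline
    } value;
    std::uint16_t valueLen;
};

class OwnedHeader
{
  public:
    explicit OwnedHeader(const RawHeader &raw) : m_raw(raw)
    {
        if (IsVariableLength(m_raw.type))
        {
            // Take a private copy of the bytes and point the raw header at it.
            m_storage.assign(raw.value.variableLenVal, raw.value.variableLenVal + raw.valueLen);
            m_raw.value.variableLenVal = m_storage.data();
        }
    }

    OwnedHeader(const OwnedHeader &other) : m_raw(other.m_raw), m_storage(other.m_storage)
    {
        // Rebind only when the pointer member is the one in use; otherwise the
        // inline value copied with m_raw must be left untouched.
        if (IsVariableLength(m_raw.type))
        {
            m_raw.value.variableLenVal = m_storage.data();
        }
    }

    OwnedHeader &operator=(const OwnedHeader &) = delete; // omitted for brevity

    const RawHeader &Raw() const noexcept { return m_raw; }

  private:
    static bool IsVariableLength(HeaderType type) noexcept
    {
        return type == HeaderType::String || type == HeaderType::ByteBuf;
    }

    RawHeader m_raw;
    std::vector<std::uint8_t> m_storage;
};
```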

Comment on lines +2284 to +2288
// Horribly awkward cast due to the infuriating original API design
Crt::Allocator *allocator = m_allocator;
auto errorResponse = Crt::ScopedResource<OperationError>(
static_cast<OperationError *>(result.m_message.value().m_shape.release()),
[allocator](OperationError *shape) { Crt::Delete(shape, allocator); });
Contributor

@sfod sfod Jul 4, 2025

This cast from ScopedResource<Base> to ScopedResource<Derived> does indeed look awkward. Similar casts happen in a couple of other places in this source file and in many places in the generated code.

We can define CastToBase and CastToDerived for ScopedResource in aws-crt-cpp like this:

template <
    typename Derived,
    typename Base,
    typename std::enable_if<std::is_base_of<Base, Derived>::value, bool>::type = true>
ScopedResource<Base> CastToBase(ScopedResource<Derived> derived)
{
    const auto &deleter = derived.get_deleter();
    return ScopedResource<Base>(
        derived.release(), [deleter](Base *base) { deleter(static_cast<Derived *>(base)); });
}

template <
    typename Base,
    typename Derived,
    typename std::enable_if<std::is_base_of<Base, Derived>::value, bool>::type = true>
ScopedResource<Derived> CastToDerived(ScopedResource<Base> base)
{
    return ScopedResource<Derived>(static_cast<Derived *>(base.release()), base.get_deleter());
}

// plus specializations for std::is_base_of<Base, Derived> == false, to print human-readable errors
// maybe also add second parameter for a custom deleter

and then these awkward blocks will transform to something kinda still verbose, but more readable:

auto errorResponse = Crt::CastToDerived<AbstractShapeBase, OperationError>(
    std::move(result.m_message->m_shape));

and

auto errorShape = m_operationModelContext->AllocateOperationErrorFromPayload(
    modelName, payloadStringView, m_allocator);
auto shape = Crt::CastToBase<OperationError, AbstractShapeBase>(std::move(errorShape));
result.m_message = MessageDeserialization{
    EventStreamMessageRoutingType::Error,
    std::move(shape)
};
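
To check that the original deleter survives both conversions, the proposal can be exercised in isolation. The sketch below assumes that `ScopedResource<T>` is an alias for `std::unique_ptr<T, std::function<void(T *)>>`, and it uses the hypothetical types `Shape` and `ErrorShape` in place of `AbstractShapeBase` and `OperationError`. The deleter conversion in `CastToDerived` is written out explicitly to keep overload resolution unambiguous.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>

// Assumption: this mirrors the aws-crt-cpp alias; verify against the library headers.
template <typename T> using ScopedResource = std::unique_ptr<T, std::function<void(T *)>>;

// Hypothetical shape hierarchy for the demonstration.
struct Shape
{
    virtual ~Shape() = default;
};

struct ErrorShape : Shape
{
    int code = 42;
};

template <
    typename Derived,
    typename Base,
    typename std::enable_if<std::is_base_of<Base, Derived>::value, bool>::type = true>
ScopedResource<Base> CastToBase(ScopedResource<Derived> derived)
{
    // Copy the deleter first, then wrap it so it still receives a Derived pointer.
    auto deleter = derived.get_deleter();
    return ScopedResource<Base>(
        derived.release(), [deleter](Base *base) { deleter(static_cast<Derived *>(base)); });
}

template <
    typename Base,
    typename Derived,
    typename std::enable_if<std::is_base_of<Base, Derived>::value, bool>::type = true>
ScopedResource<Derived> CastToDerived(ScopedResource<Base> base)
{
    // A deleter taking Base* can be called with Derived*, so this conversion is valid.
    std::function<void(Derived *)> deleter = base.get_deleter();
    return ScopedResource<Derived>(static_cast<Derived *>(base.release()), std::move(deleter));
}
```

A round trip through both helpers should run the original deleter exactly once, when the final owner goes out of scope. Note that `CastToDerived` uses `static_cast`, so it is only valid when the caller already knows the dynamic type of the object.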

Contributor

note: I'll implement these functions if you're not against the idea
