Skip to content

local echo (5/n): Create outbox messages on send #1472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

PIG208
Copy link
Member

@PIG208 PIG208 commented Apr 10, 2025

This is stacked atop #1463.

No UI/user-facing change in this PR.

@PIG208 PIG208 force-pushed the pr-echo-5 branch 15 times, most recently from eeb6ef2 to ac35860 Compare April 16, 2025 01:35
@PIG208 PIG208 requested a review from chrisbobbe April 16, 2025 01:35
@PIG208 PIG208 added the maintainer review PR ready for review by Zulip maintainers label Apr 16, 2025
Copy link
Collaborator

@chrisbobbe chrisbobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Comments below.

Comment on lines 557 to 561
/// Always equal to `connection.zulipFeatureLevel`
/// and `account.zulipFeatureLevel`.
int get zulipFeatureLevel => connection.zulipFeatureLevel!;

String get zulipVersion => account.zulipVersion;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about also moving zulipVersion along with zulipFeatureLevel, so they stay together?

@@ -626,6 +627,17 @@ class MessageListView with ChangeNotifier, _MessageSequence {
}
}

void handleOutboxMessage(OutboxMessage outboxMessage) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a name like _addOutboxMessage? There are multiple tasks that could accurately be described as a message list "handling" an outbox message.

/// Remove the [outboxMessage] from the view.
///
/// This is a no-op if the message is not found.
void removeOutboxMessageIfExists(OutboxMessage outboxMessage) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just removeOutboxMessage? I think we can leave the "if exists" part as implied.

/// ```
///
/// During its lifecycle, it is guaranteed that the outbox message is deleted
/// as soon an message event with a matching [MessageEvent.localMessageId]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "a message event"

Comment on lines 21 to 36
/// ```
/// ┌─────────────────────────────────────┐
/// │ Event received, │
/// Send │ or we abandoned │
/// immediately. │ 200. the queue. ▼
/// (create) ──────────────► sending ────────► sent ──────────────► (delete)
/// │4xx, │ ▲
/// │other error, │Reached User │
/// │or reached │time limit. cancels.│
/// │time limit. ▼ │
/// └───────────► failed ─────────────────┘
/// ```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave out the "reached time limit" parts in this first version? The "failed" state doesn't feel accurate for when that happens (especially when coming there from "sent"), and it's not part of the spec for #133, the more complicated outbox design, so we'd have to remove it when implementing that. It's also not necessary for the main idea of #1441:

The point is to better handle the case where you type out a message and hit send, and it fails because you don't have working network at that moment.

It looks like "reached time limit" comes from a parenthetical in the #1441 spec, with "perhaps":

Then if a message fails to send, we show on the local-echo placeholder an option that lets you recover it. (Similarly perhaps if it's been a while, like 10s, since trying to send and the request hasn't completed one way or another.)

If we want to keep "reached time limit" in this PR, how about adding a new node for it in the diagram, separate from "failed"? Then I think it'll be easier to reason clearly about some things later: what should the UI say for this state (not "failed" because we haven't been told that it failed); what should happen if the send-message event arrives when in this state.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the "User cancels" label comes from #133. In that context, it means the user decided not to press a "Retry" button to retry the message-send request, and they want us to just forget about the message-send attempts and their failures.

That label doesn't feel accurate in this context, where a retry button isn't part of the picture. I think the word from the #1441 spec is "recover"; how about we say "User recovers the draft" or similar:

Then if a message fails to send, we show on the local-echo placeholder an option that lets you recover it. […]

You might want to retry sending, or just copy the text to save elsewhere. To cover both options, it can take the text and just put it back in the compose box. […] The placeholder in the message list then disappears.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "sent" state needed? What would happen if we just ignored a 200 response and removed "sent" from the diagram, and didn't consider the message to be sent until we got its event?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the "Send immediately" label is implied and can be removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combining all those points, what do you think of this as an updated diagram:

///                  We abandoned the queue.
///                 ┌──────────────────────────────────────┐
///                 │                                      │
///                 │    Event received.                   ▼
/// (create) ─► sending ──────────────────────────────► (delete)
///                 │                                      ▲
///                 │ 4xx or other           User restores │
///                 │ error.                 the draft.    │
///                 └──────────────► failed ───────────────┘

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding time limit, started a discussion here: #mobile > handle failed send @ 💬

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the diagram update, started a discussion here: #mobile > #F1441 Handle retry state machine @ 💬

Comment on lines +169 to +177
check(connection.lastRequest).isA<http.Request>()
..bodyFields['queue_id'].equals(store.queueId)
..bodyFields['local_id'].equals('${outboxMessage.localMessageId}');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this connection.lastRequest needed, and if so, should the other similar tests have a check like it too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move these checks to the helper, but I moved them from there to this test in a previous revision since just checking it once seems fine to me; this de-duplicates some helper code and most tests focus on other things.

..hidden.isTrue();
}));

test('while message is being sent, message event arrives, then the send fails', () => awaitFakeAsync((async) async {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth a comment (if there isn't one in the implementation) that this can actually happen: the message-send can succeed, but then the message-send request has a network issue that doesn't affect the event-poll request.


// Handle the event after the message is sent but before the debounce
// timeout. The outbox message should remain hidden since the send
// request was sucessful.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "successful"

Comment on lines 233 to 236
check(store.outboxMessages).isEmpty();
check(outboxMessage)
..state.equals(OutboxMessageLifecycle.sent)
..hidden.isTrue();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of these tests have checks on outboxMessage after checking that store.outboxMessages is empty. Do they all need checks like that? Anyway, this test and later ones do it without a comment explaining why:

// […] The outbox message should no
// longer get updated because it is not in the store any more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second sentence of the comment before store.handleEvent is meant to address this check:

      // Handle the event after the message is sent but before the debounce
      // timeout.  The outbox message should remain hidden since the send
      // request was successful.

Perhaps it will be clearer to move this right before the check.

..hidden.isFalse();
});

test('send request pending until after kSendMessageTimeLimit, completes successfully, then message event arrives', () => awaitFakeAsync((async) async {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I skipped reading the tests with kSendMessageTimeLimit in their names, pending an earlier comment that might lead to removing them)

@PIG208
Copy link
Member Author

PIG208 commented Apr 16, 2025

Updated the PR! Thanks for the review. There are some pending questions mainly with regards to the state diagram and the send time limit. I introduced the new waitPeriodExpired state in this revision, with some changes to the states.

@PIG208 PIG208 removed the maintainer review PR ready for review by Zulip maintainers label Apr 17, 2025
@PIG208
Copy link
Member Author

PIG208 commented Apr 17, 2025

Will be working on a new revision to reorganize some of the implementation code with further state machine changes.

@PIG208
Copy link
Member Author

PIG208 commented Apr 17, 2025

The PR has been updated to implement the state diagram discussed in chat: #mobile > #F1441 Handle retry state machine @ 💬

@PIG208 PIG208 force-pushed the pr-echo-5 branch 2 times, most recently from b96c5e9 to 13edb83 Compare April 17, 2025 22:56
@PIG208 PIG208 added the maintainer review PR ready for review by Zulip maintainers label Apr 18, 2025
@PIG208 PIG208 requested a review from chrisbobbe April 18, 2025 00:44
@PIG208
Copy link
Member Author

PIG208 commented Apr 22, 2025

(pushed another update to make timestamps used in sendMessage testable)

@PIG208 PIG208 removed the maintainer review PR ready for review by Zulip maintainers label Apr 22, 2025
@PIG208 PIG208 added the maintainer review PR ready for review by Zulip maintainers label Apr 22, 2025
@PIG208 PIG208 force-pushed the pr-echo-5 branch 3 times, most recently from 975bc28 to e6c9192 Compare April 23, 2025 22:39
PIG208 added 5 commits April 25, 2025 19:40
The point of this helper is to replicate what a topic sent from the
client will become, after being processed by the server.

This important when trying to create a local copy of a stream message,
whose topic can get translated when it's delivered by the server.
This will be the same as `DateTime.timestamp()` in live code (therefore
the NFC). For testing, utcNow uses a clock instance that can be controlled
by FakeAsync.

We could have made call sites of `DateTime.now()` use it too, but
those for now don't need it for testing.
While we do create outbox messages, there are in no way user-visible
changes since the outbox messages don't end up in message list views.
We create skeletons for helpers needed from message list view, but don't
implement them yet, to make the diff smaller.

For testing, similar to TypingNotifier.debugEnable, we add
MessageStoreImpl.debugOutboxEnable for tests that do not intend to
cover outbox messages.

Some of the delays to fake responses added in tests are not necessary
because the future of sendMessage is not completed immediately, but we
still add them to keep the tests realistic.
Copy link
Collaborator

@chrisbobbe chrisbobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Here's a review of the first four commits:

0035700 api [nfc]: Add TopicName.interpretAsServer
d7cb9ec store [nfc]: Move zulip{FeatureLevel,Version} to PerAccountStoreBase
d09a52d test [nfc]: Generate timestamps
44af0d3 binding [nfc]: Add utcNow

and a partial review of the fifth, the main commit:

d61e252 message: Create an outbox message on send; manage its states

Comment on lines +598 to +620
/// Convert this topic to match how it would appear on a message object from
/// the server, assuming the topic is originally for a send-message request.
///
/// For a client that does not support empty topics,
/// a modern server (FL>=334) would convert "(no topic)" and empty topics to
/// `store.realmEmptyTopicDisplayName`.
///
/// See also: https://zulip.com/api/send-message#parameter-topic
TopicName interpretAsServer({
required int zulipFeatureLevel,
required String? realmEmptyTopicDisplayName,
}) {
if (zulipFeatureLevel < 334) {
assert(_value.isNotEmpty);
return this;
}
if (_value == kNoTopicTopic || _value.isEmpty) {
// TODO(#1250): this assumes that the 'support_empty_topics'
// client_capability is false; update this when we set it to true
return TopicName(realmEmptyTopicDisplayName!);
}
return TopicName(_value);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a // TODO(server-10) for…removing this method, I guess? Or at least simplifying.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is whitespace-trimming also a step in any servers' interpretation of topics? Not that this method has to mirror that logic necessarily. But we'd probably want to explicitly expect (and assert) that that step has already been done when this method is called, otherwise the summary line wouldn't be quite accurate:

  /// Convert this topic to match how it would appear on a message object from
  /// the server, assuming the topic is originally for a send-message request.

Comment on lines +16 to +17
const kLocalEchoDebounceDuration = Duration(milliseconds: 300); // TODO(#1441) find the right values for this
const kSendMessageRetryWaitPeriod = Duration(seconds: 10); // TODO(#1441) find the right values for this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "find the right value for this"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, kSendMessageRetryWaitPeriod doesn't sound like the right name to me. There's no automatic retry, and, as I mentioned at #1472 (comment), no "Retry" button. This wait-period logic will need to be cleanly removed as part of #133 (see my comment just before that one) which has both kinds of retry, and that should be easier if this thing doesn't also have the "retry" label :)

Maybe kSendMessageOfferRestoreWaitPeriod?

Comment on lines +208 to +222
// Because either of the values can get updated, the actual topic
// can change, for example, between "(no topic)" and "general chat",
// or between different names of "general chat". This should be
// uncommon during the lifespan of an outbox message.
//
// There's also an unavoidable race that has the same effect:
// an admin could change the name of "general chat"
// (i.e. the value of realmEmptyTopicDisplayName) concurrently with
// the user making the send request, so that the setting in effect
// by the time the request arrives is different from the setting the
// client last heard about. The realm update events do not have
// information about this race for us to update the prediction
// correctly.
zulipFeatureLevel: zulipFeatureLevel,
realmEmptyTopicDisplayName: realmEmptyTopicDisplayName),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a draft where I try to explain the bug a bit more (especially with the first line):

            // Doing this interpretation just once on creating the outbox message
            // allows an uncommon bug, because either of these values can change.
            // During the outbox message's life, a predicted "(no topic)" topic
            // could become stale/wrong when zulipFeatureLevel changes,
            // or a predicted "general chat" topic could become stale/wrong
            // when realmEmptyTopicDisplayName changes.
            //
            // Shrug. The same effect is caused by an unavoidable race:
            // an admin could change the name of "general chat"
            // (i.e. the value of realmEmptyTopicDisplayName)
            // concurrently with the user making the send request,
            // so that the setting in effect by the time the request arrives
            // is different from the setting the client last heard about.
            zulipFeatureLevel: zulipFeatureLevel,
            realmEmptyTopicDisplayName: realmEmptyTopicDisplayName),

Comment on lines +233 to +243
_outboxMessageDebounceTimers[localMessageId] = Timer(kLocalEchoDebounceDuration, () {
assert(outboxMessages.containsKey(localMessageId));
_outboxMessageDebounceTimers.remove(localMessageId);
_updateOutboxMessage(localMessageId, newState: OutboxMessageState.waiting);
});

_outboxMessageWaitPeriodTimers[localMessageId] = Timer(kSendMessageRetryWaitPeriod, () {
assert(outboxMessages.containsKey(localMessageId));
_outboxMessageWaitPeriodTimers.remove(localMessageId);
_updateOutboxMessage(localMessageId, newState: OutboxMessageState.waitPeriodExpired);
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asserts don't look right to me; wouldn't they throw if an outbox message was removed before the timer finished?, which can happen in this part of the state diagram:

///              Event received.
///              Or we abandoned the queue.
/// (any state) ────────────────────────────► (delete)

Comment on lines +253 to +255
// `localMessageId` is not necessarily in the store. This is because
// message event can still arrive before the send request fails to
// networking issues.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe "fails with networking issues" or "fails because of networking issues"

required OutboxMessageState newState,
}) {
final outboxMessage = outboxMessages[localMessageId];
if (outboxMessage == null || outboxMessage.state == newState) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a legitimate need for callers to pass a new state that's the same as the old state? Reading this, I'm wondering if it might be accidentally covering up some confusion in how the state machine is implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintainer review PR ready for review by Zulip maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants