Refactor WebSocketContext::read_message_frame #515

akonradi-signal · 2025-09-08T18:21:48Z

Reduce the amount of nested code by returning earlier on errors and from self-contained logic instead of branching with if/else and match expressions. Use an enum type to consolidate common code and avoid a panic! call.

akonradi-signal · 2025-09-08T18:24:30Z

src/protocol/mod.rs

    /// Try to decode one message frame. May return None.
    fn read_message_frame(&mut self, stream: &mut impl Read) -> Result<Option<Message>> {
-        if let Some(frame) = self
+        let frame = match self


It would be nice if this could be written as let Some(frame) = ... else { ... }; but that's not available until Rust 1.65 and the MSRV here is 1.63.
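For comparison, here is a minimal sketch of the two forms side by side. `next_frame` and the `String` result are hypothetical stand-ins for illustration, not tungstenite's API; the second function needs Rust 1.65 or newer, so it would not build under the 1.63 MSRV.

```rust
// Hypothetical stand-in for "try to get the next frame"; not tungstenite's API.
fn next_frame(queue: &mut Vec<u8>) -> Option<u8> {
    queue.pop()
}

// MSRV 1.63-compatible form: `match` with an early return on `None`.
fn read_with_match(queue: &mut Vec<u8>) -> Result<Option<String>, String> {
    let frame = match next_frame(queue) {
        Some(frame) => frame,
        None => return Ok(None),
    };
    Ok(Some(format!("frame {frame}")))
}

// Equivalent `let ... else` form; requires Rust 1.65 or newer.
fn read_with_let_else(queue: &mut Vec<u8>) -> Result<Option<String>, String> {
    let Some(frame) = next_frame(queue) else {
        return Ok(None);
    };
    Ok(Some(format!("frame {frame}")))
}
```

Both functions keep the happy path at the top indentation level; the only difference is the syntax used for the early return.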

daniel-abramov

I agree with these ones:

✅ Early return from the first match. It definitely looks better!
✅ Replacing if let Some(ref mut msg) = self.incomplete { .. } else { return .. } with self.incomplete.as_mut().ok_or(..). Definitely looks more elegant!
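A rough sketch of the second point, assuming a simplified context whose pending message is just a byte buffer (`Context`, `extend_nested`, and `extend_flat` are hypothetical names, and `ProtocolError` here is a one-variant stand-in for the real enum):

```rust
#[derive(Debug, PartialEq)]
enum ProtocolError {
    UnexpectedContinueFrame,
}

// Hypothetical, simplified context: the pending message is a plain byte buffer.
#[derive(Default)]
struct Context {
    incomplete: Option<Vec<u8>>,
}

impl Context {
    // Nested form: the success path lives inside the `if let` body.
    fn extend_nested(&mut self, payload: &[u8]) -> Result<usize, ProtocolError> {
        if let Some(ref mut msg) = self.incomplete {
            msg.extend_from_slice(payload);
            Ok(msg.len())
        } else {
            Err(ProtocolError::UnexpectedContinueFrame)
        }
    }

    // Flat form: `as_mut().ok_or(..)?` returns the error early, so the
    // success path stays at the top indentation level.
    fn extend_flat(&mut self, payload: &[u8]) -> Result<usize, ProtocolError> {
        let msg = self
            .incomplete
            .as_mut()
            .ok_or(ProtocolError::UnexpectedContinueFrame)?;
        msg.extend_from_slice(payload);
        Ok(msg.len())
    }
}
```

The two methods behave identically; only the shape of the control flow differs.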

As for excessive branching: while I generally agree that it may make code less readable and that read_message_frame() would benefit from some refactoring, I'm not sure that the rest of the changes bring the desired simplification for the following reasons:

The current handling of the frame's opcode is just under 70 LOC. I may be biased, but I find 70 LOC that contain a "flat representation" of all possible cases (including errors) quite useful. With the new changes, I get a bit more confused, because, e.g., the handling of the reserved code for the Ctl opcode is done within the match, but the handling of the reserved code for the Data opcode is done in a different place (inside the DataMessageType::try_from() conversion function) for no apparent reason.
I'm generally confused about the new DataMessageType. It feels like it was only introduced to remove a single match case inside the read_message_frame() and that it has no utility otherwise. The TryFrom<Data> for DataMessageType essentially states that we can create DataMessageType from Data, by e.g. doing Data::Binary => Self::Initial(IncompleteMessageType::Binary), which feels logically incorrect, because we cannot state that the message is incomplete before checking fin first (frame.header().is_final). Another problem that it introduces is that it changes the order of operations: since try_from() transforms reserved code into an error, DataMessageType::try_from(data)? would early return an error when the reserved code is used. However, the previous implementation would not do so if the self.incomplete is Some(..) (it would return a different error), because the match arm for _ if self.incomplete.is_some() would be checked before the OpData::Reserved(i).
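To make the ordering concern concrete, here is a small sketch with stand-in types (the enums mirror the names used in this thread but are simplified, and both function names are hypothetical). It models only which error wins, not the actual frame handling.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Data {
    Continue,
    Text,
    Binary,
    Reserved(u8),
}

#[derive(Debug, PartialEq)]
enum ProtocolError {
    ExpectedFragment(Data),
    UnknownDataFrameType(u8),
}

// Ordering as in the existing match: the guard arm on the pending message
// is tried before the reserved-opcode arm.
fn check_guard_first(data: Data, has_incomplete: bool) -> Result<(), ProtocolError> {
    match data {
        Data::Continue => Ok(()),
        _ if has_incomplete => Err(ProtocolError::ExpectedFragment(data)),
        Data::Text | Data::Binary => Ok(()),
        Data::Reserved(i) => Err(ProtocolError::UnknownDataFrameType(i)),
    }
}

// Ordering with an eager conversion: the reserved opcode is rejected
// before the pending-message state is consulted.
fn check_convert_first(data: Data, has_incomplete: bool) -> Result<(), ProtocolError> {
    if let Data::Reserved(i) = data {
        return Err(ProtocolError::UnknownDataFrameType(i));
    }
    if has_incomplete && data != Data::Continue {
        return Err(ProtocolError::ExpectedFragment(data));
    }
    Ok(())
}
```

The two functions agree on every input except a reserved opcode arriving while a message is pending, which is exactly the case described above.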

akonradi-signal · 2025-09-09T14:38:55Z

I've pulled out the uncontroversial changes into separate commits while leaving the total change the same. As for the rest of the feedback:

I agree 70 LOC is not bad now. I'm working on adding support for the "permessage-deflate" extension (another take on Add permessage-deflate support, again #426), which makes this code a bit less compact, and wanted to try to get ahead of that. If separating the data and control frame handling doesn't seem useful now I can omit that.
The DataMessageType is my attempt to not move a match but to remove a panic! that is required to satisfy the compiler but can't be hit in practice. I believe that transforming an input into a custom enum type that is matched exhaustively is a good pattern for ensuring that this kind of "knowledge at a (for now, small) distance" doesn't fail to get updated in the future. I had originally written DataMessageType as an inline definition in read_message_frame but figured that might be stylistically unpalatable and so moved it; I think that scoping might make it more obvious that the intended usage is only for control flow. The early try_from introducing a different error path is a great observation. It's not inherent to the usage of a separate enum type, though, and I'm happy to preserve the original error path if you agree with me that this "custom enum" path is still worth pursuing.
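As an illustration of the pattern (not the code in this PR), the sketch below contrasts an inner match that needs a fallback arm with a conversion into a small enum that is matched exhaustively. All type and function names are hypothetical stand-ins.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Data {
    Continue,
    Text,
    Binary,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum MessageKind {
    Text,
    Binary,
}

// Re-matching inside a `Text | Binary` arm: the compiler cannot tell that
// `Continue` is impossible here, so a fallback arm is required.
fn kind_with_fallback(data: Data) -> Option<MessageKind> {
    match data {
        Data::Continue => None,
        Data::Text | Data::Binary => Some(match data {
            Data::Text => MessageKind::Text,
            Data::Binary => MessageKind::Binary,
            _ => panic!("unreachable: outer arm only admits Text or Binary"),
        }),
    }
}

// Custom enum that carries the result of the classification.
enum FrameKind {
    Continue,
    Initial(MessageKind),
}

fn classify(data: Data) -> FrameKind {
    match data {
        Data::Continue => FrameKind::Continue,
        Data::Text => FrameKind::Initial(MessageKind::Text),
        Data::Binary => FrameKind::Initial(MessageKind::Binary),
    }
}

// Matching on the custom enum is exhaustive, so no fallback arm is needed,
// and adding a variant later becomes a compile error instead of a panic.
fn kind_exhaustive(data: Data) -> Option<MessageKind> {
    match classify(data) {
        FrameKind::Continue => None,
        FrameKind::Initial(kind) => Some(kind),
    }
}
```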

akonradi-signal · 2025-09-09T17:15:29Z

Funnily enough I tried writing a test against the current master for the existing behavior of read_message_frame:

#[test]
fn reserved_data_frame_type_in_incomplete_message() {
    let mut incoming = WriteMoc(Cursor::new(&[
        0x01, 0x06, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, // "Hello,"
        0x06, // reserved frame code
        0x00, 0x06, 0x20, 0x74, 0x68, 0x65, 0x72, 0x65, // (continuation) " there"
    ]));

    let mut context = WebSocketContext::new(Role::Client, None);

    let err = context.read(&mut incoming).unwrap_err();
    assert!(
        matches!(err, Error::Protocol(ProtocolError::UnknownDataFrameType(0x06))),
        "err was {err:?}"
    );
}

An error is produced by WebSocketContext::read, but it's not that one! It turns out that there's an extra check for the reserved opcodes over in tungstenite-rs/src/protocol/frame/frame.rs (lines 193 to 196 in 6520d8f):

    match opcode {
        OpCode::Control(Control::Reserved(_)) | OpCode::Data(Data::Reserved(_)) => {
            return Err(Error::Protocol(ProtocolError::InvalidOpcode(first & 0x0F)))
        }

So the match block below shouldn't even encounter those reserved opcodes.
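A minimal sketch of that kind of early rejection, assuming only the opcode assignments from RFC 6455 section 5.2. `parse_first_byte`, `OpCode`, and `InvalidOpcode` are simplified stand-ins rather than tungstenite's types, and the RSV bits are ignored here.

```rust
#[derive(Debug, PartialEq)]
enum OpCode {
    Continue,
    Text,
    Binary,
    Close,
    Ping,
    Pong,
}

// Stand-in error carrying the rejected 4-bit opcode.
#[derive(Debug, PartialEq)]
struct InvalidOpcode(u8);

// Decode the FIN bit and opcode from the first header byte, rejecting
// reserved opcodes (0x3-0x7 and 0xB-0xF) before any message-level logic runs.
fn parse_first_byte(first: u8) -> Result<(bool, OpCode), InvalidOpcode> {
    let is_final = (first & 0x80) != 0;
    let opcode = match first & 0x0F {
        0x0 => OpCode::Continue,
        0x1 => OpCode::Text,
        0x2 => OpCode::Binary,
        0x8 => OpCode::Close,
        0x9 => OpCode::Ping,
        0xA => OpCode::Pong,
        reserved => return Err(InvalidOpcode(reserved)),
    };
    Ok((is_final, opcode))
}
```

With a check like this at the header level, a first byte of 0x06 is rejected before a later opcode match ever sees it, which is consistent with the test above failing with a different error than expected.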

daniel-abramov · 2025-09-10T17:38:17Z

I've pulled out the uncontroversial changes into separate commits while leaving the total change the same.

Thanks! If you prefer, you could create a separate PR with those commits, so that I can merge them right away.

The DataMessageType is my attempt to not move a match but to remove a panic! that is required to satisfy the compiler but can't be hit in practice.

I agree that having panic!() for an unreachable branch is less than ideal (we could replace it with unreachable!() though, to make the intention clear), but I'm still not quite convinced that it's worth introducing a new entity (type) that has somewhat unclear semantics and whose only utility is to remove that single match.

Eagerly check conditions for a data frame and encode the result explicitly in a custom enum type. This lets us combine code from two match arms and eliminate a panic! that was impossible to hit but required by the compiler.

akonradi-signal · 2025-09-10T21:19:13Z

I've pushed a new final commit on the branch that takes a more focused approach to the refactoring. I've preserved the top-level match on the frame opcode and the precedence of checks for error conditions.

As noted in the commit description, this does let us get rid of a panic!() and lets us collapse the max size checks for text and binary messages. If you remain unconvinced I'll go ahead and open a separate PR for the first two commits and we can iterate more or just drop the last one.

daniel-abramov · 2025-09-11T12:53:13Z

Thanks! I've pondered over the changes in order to understand what exactly I do not like about the current (master) implementation of that logic, and why the suggested change (final commit) still feels a bit wrong to me.

I think the main thing I find suboptimal about the implementation in master is not the use of panic!() in the unreachable branch (we could use unreachable!() after all; this does not make the code unreadable). My main concern is that the match in question is used as a replacement for a long and very nested if { } else if { } else if { } state machine, which leads to the use of numerous different match arms, including several conditionals with if. This makes the execution flow feel somewhat complicated (at least for those unfamiliar with the codebase or who do not recall all the details), and the order of the match checks becomes significant. I assume that it might have been the reason why you did not like that match block as well?

The only way to get rid of this complication (without some major re-thinking / refactoring) is to split the processing of the data frame into 2 stages. I believe that your latest change tries to achieve it: judging from the code, the first match stage seems to process the data opcode by ensuring that the received frame adheres to the protocol and matches the expectation of the current state of the websocket (e.g., self.incomplete). However, the next match again checks that the (unchanged) internal state (self.incomplete) is valid and returns a protocol error if it is not. In other words, I don't feel like it simplifies the execution flow much (but I may be biased).

I attempted to formulate a suggestion based on your changes and what I believe might address the points you dislike, while considering my concerns and maintaining code readability.

Your Version

enum FrameType {
    Continue,
    Initial(IncompleteMessageType),
}

let fin = frame.header().is_final;
let frame_type = match data {
    OpData::Continue => Ok(FrameType::Continue),
    _ if self.incomplete.is_some() => Err(ProtocolError::ExpectedFragment(data)),
    OpData::Text => Ok(FrameType::Initial(IncompleteMessageType::Text)),
    OpData::Binary => Ok(FrameType::Initial(IncompleteMessageType::Binary)),
    OpData::Reserved(i) => Err(ProtocolError::UnknownDataFrameType(i)),
}?;

match frame_type {
    FrameType::Continue => {
        let msg = self
            .incomplete
            .as_mut()
            .ok_or(ProtocolError::UnexpectedContinueFrame)?;
        msg.extend(frame.into_payload(), self.config.max_message_size)?;

        if fin {
            Ok(Some(self.incomplete.take().unwrap().complete()?))
        } else {
            Ok(None)
        }
    }
    FrameType::Initial(data_type) => {
        if fin {
            check_max_size(frame.payload().len(), self.config.max_message_size)?;
            Ok(Some(match data_type {
                IncompleteMessageType::Text => Message::Text(frame.into_text()?),
                IncompleteMessageType::Binary => {
                    Message::Binary(frame.into_payload())
                }
            }))
        } else {
            let mut incomplete = IncompleteMessage::new(data_type);
            incomplete
                .extend(frame.into_payload(), self.config.max_message_size)?;
            self.incomplete = Some(incomplete);
            Ok(None)
        }
    }
}

My Suggestion

let fin = frame.header().is_final;

let payload = match (data, self.incomplete.as_mut()) {
    (OpData::Continue, None) => Err(ProtocolError::UnexpectedContinueFrame),
    (OpData::Continue, Some(incomplete)) => {
        incomplete.extend(frame.into_payload(), self.config.max_message_size)?;
        Ok(None)
    }
    (_, Some(_)) => Err(ProtocolError::ExpectedFragment(data)),
    (OpData::Text, _) => Ok(Some((frame.into_payload(), MessageType::Text))),
    (OpData::Binary, _) => Ok(Some((frame.into_payload(), MessageType::Binary))),
    (OpData::Reserved(i), _) => Err(ProtocolError::UnknownDataFrameType(i)),
}?;

match (payload, fin) {
    (None, true) => Ok(Some(self.incomplete.take().unwrap().complete()?)),
    (None, false) => Ok(None),
    (Some((payload, t)), true) => {
        check_max_size(payload.len(), self.config.max_message_size)?;
        match t {
            MessageType::Text => Ok(Some(Message::Text(payload.try_into()?))),
            MessageType::Binary => Ok(Some(Message::Binary(payload))),
        }
    }
    (Some((payload, t)), false) => {
        let mut incomplete = IncompleteMessage::new(t);
        incomplete.extend(payload, self.config.max_message_size)?;
        self.incomplete = Some(incomplete);
        Ok(None)
    }
}
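To experiment with the tuple-match shape outside the crate, the following self-contained sketch may help. It is an approximation under stated assumptions: every type is a simplified stand-in, size limits are omitted, text is decoded lossily instead of being validated, and `Context::on_data_frame` is a hypothetical name.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpData {
    Continue,
    Text,
    Binary,
    Reserved(u8),
}

#[derive(Debug, PartialEq)]
enum ProtocolError {
    UnexpectedContinueFrame,
    ExpectedFragment(OpData),
    UnknownDataFrameType(u8),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum MessageType {
    Text,
    Binary,
}

#[derive(Debug, PartialEq)]
enum Message {
    Text(String),
    Binary(Vec<u8>),
}

// Simplified stand-in: the pending message is a (type, bytes) pair.
#[derive(Default)]
struct Context {
    incomplete: Option<(MessageType, Vec<u8>)>,
}

// Simplification: lossy decoding instead of UTF-8 validation.
fn complete(kind: MessageType, bytes: Vec<u8>) -> Message {
    match kind {
        MessageType::Text => Message::Text(String::from_utf8_lossy(&bytes).into_owned()),
        MessageType::Binary => Message::Binary(bytes),
    }
}

impl Context {
    fn on_data_frame(
        &mut self,
        data: OpData,
        fin: bool,
        payload: Vec<u8>,
    ) -> Result<Option<Message>, ProtocolError> {
        // Stage 1: validate the opcode against the pending-message state.
        // `None` means the payload was appended to the pending message.
        let initial = match (data, self.incomplete.as_mut()) {
            (OpData::Continue, None) => return Err(ProtocolError::UnexpectedContinueFrame),
            (OpData::Continue, Some((_, buffer))) => {
                buffer.extend_from_slice(&payload);
                None
            }
            (_, Some(_)) => return Err(ProtocolError::ExpectedFragment(data)),
            (OpData::Text, None) => Some((MessageType::Text, payload)),
            (OpData::Binary, None) => Some((MessageType::Binary, payload)),
            (OpData::Reserved(i), None) => {
                return Err(ProtocolError::UnknownDataFrameType(i))
            }
        };

        // Stage 2: decide what to emit based only on stage 1's result and FIN.
        match (initial, fin) {
            (None, true) => {
                let (kind, bytes) = self
                    .incomplete
                    .take()
                    .expect("a continuation was accepted, so a message is pending");
                Ok(Some(complete(kind, bytes)))
            }
            (None, false) => Ok(None),
            (Some((kind, bytes)), true) => Ok(Some(complete(kind, bytes))),
            (Some(pending), false) => {
                self.incomplete = Some(pending);
                Ok(None)
            }
        }
    }
}
```

Feeding it a non-final Text frame followed by a final Continue frame yields one reassembled text message, while a Binary or Reserved frame in between is rejected with ExpectedFragment and leaves the pending message untouched.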

What do you think?

daniel-abramov · 2025-09-14T16:33:42Z

Superseded by:

akonradi-signal · 2025-09-26T20:21:41Z

I'm working on adding support for the "permessage-deflate" extension (another take on Add permessage-deflate support, again #426), which makes this code a bit less compact, and wanted to try to get ahead of that.

Closing the loop: https://github.com/signalapp/tungstenite-rs supports permessage-deflate. I incorporated the refactors from #518 as 119f4d7. I had been worried they would be difficult to adopt, but I'm pleasantly surprised with how the overall flow turned out!

akonradi-signal force-pushed the refactor-read-frame branch from e0bd90e to 49f024a Compare September 8, 2025 18:30

daniel-abramov reviewed Sep 8, 2025

akonradi-signal added 2 commits September 9, 2025 10:24

Refactor top-level branch with early error return

d33a2b3

Replace if let Some(...) with .ok_or(...)?

3ab50c6

akonradi-signal force-pushed the refactor-read-frame branch from 49f024a to 20a8c8f Compare September 9, 2025 14:25

akonradi-signal force-pushed the refactor-read-frame branch from 20a8c8f to 5da8286 Compare September 10, 2025 21:13

akonradi-signal mentioned this pull request Sep 11, 2025

Reduce nesting in WebSocketContext::read_message_frame #517

Merged

daniel-abramov mentioned this pull request Sep 11, 2025

refactor: simplify processing of incoming data frames #518

Merged

daniel-abramov closed this Sep 14, 2025
