Conversation

@SvizelPritula

@SvizelPritula SvizelPritula commented May 16, 2024

In #328, @kazk implemented support for the permessage-deflate extension. Implementing this extension required the ability to parse the Sec-WebSocket-Extensions header, so @kazk submitted a pull request to the headers crate. However, the maintainers of the headers crate don't want to support this header yet, as they don't want to commit to any particular API. As such, they suggested that it should be implemented in this or some other crate, tweaked as necessary, and potentially moved into the headers crate in the future.

This PR reverts the commit that reverted @kazk's commit that added permessage-deflate support. It also copies his implementation of SecWebsocketExtensions intended for the headers crate.

The implementation of SecWebsocketExtensions relied on some internal utilities of the headers crate. I've removed some of those dependencies and re-implemented others. I have also gated the implementation, along with any code that relies on it, behind the existing handshake feature, since it depends on the headers crate, which in turn pulls in many other HTTP crates.
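For illustration, here is a minimal sketch of the kind of feature gating described above; the module names are hypothetical and not taken from the PR:

    // Hypothetical module layout, for illustration only. The extension-header
    // parsing only compiles when the `handshake` feature (and with it the
    // `headers` dependency) is enabled.
    #[cfg(feature = "handshake")]
    pub mod extensions {
        // `SecWebsocketExtensions` and its parser would live here.
    }

    // permessage-deflate builds on the negotiation above, so in this sketch it
    // is gated on both features.
    #[cfg(all(feature = "handshake", feature = "deflate"))]
    pub mod deflate {
        // DEFLATE compression and decompression contexts would live here.
    }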

Blocking issues

  • @nakedible-p found an error in the implementation. As I know very little about the WebSocket spec, I'm unsure if I can fix it properly.
  • The headers crate is licensed under the MIT license, while this crate uses a dual MIT+Apache license. We would need to add a proper license notice, or @kazk would need to consent to the re-licensing of his PR.

Unresolved questions

  • It is impossible to use the deflate feature without having tungstenite handle the WebSocket handshake. There is a WebSocket::from_raw_socket_with_extensions method to allow for exactly that, but it takes an Extensions struct, which has no public constructor.
  • The only way to get such a struct is from the WebSocketConfig::accept_offers method, which implements the server part of extension negotiation and requires the handshake feature. This method is intended to allow tungstenite to be integrated into web frameworks. No similar public method exists for clients, although that might not be a big issue, since a client knows up front whether a request will be a WebSocket request.
  • To me, the WebSocketConfig::accept_offers feels out of place on the WebSocketConfig. We could make it a standalone function, perhaps in the extensions module.
  • I have made the SecWebsocketExtensions implementation public. This is necessary to make the WebSocketConfig::accept_offers method usable. This does mean that if SecWebsocketExtensions were to be added to headers in the future, switching to that implementation would (I think) be a breaking change. An alternative would be to have WebSocketConfig::accept_offers take and return a HeaderValue, which it would parse itself. This would allow us to make SecWebsocketExtensions a private implementation detail.
  • This is a breaking change, since a new field was added to WebSocketConfig and a new variant was added to the Error enum. Since this field and variant depend on the deflate feature, I've annotated both with #[non_exhaustive]. I've done the same to ProtocolError, which currently has variants gated behind handshake. This unfortunately means that you cannot create a WebSocketConfig using struct initializer syntax; instead, you have to create an instance with WebSocketConfig::default and mutate it afterward (see the sketch below). I see no way to avoid this. It would be possible to disable #[non_exhaustive] if all relevant features are enabled.
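To illustrate that last point, here is a minimal sketch of what downstream code has to do once WebSocketConfig is #[non_exhaustive]; the max_message_size field is used purely as an example and may not reflect the current API:

    use tungstenite::protocol::WebSocketConfig;

    fn make_config() -> WebSocketConfig {
        // With #[non_exhaustive], struct initializer syntax (even with
        // `..Default::default()`) no longer compiles outside the defining
        // crate, so the config has to be built by mutating a default instance.
        let mut config = WebSocketConfig::default();
        config.max_message_size = Some(64 << 20); // hypothetical limit
        config
    }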

@nakedible-p

I will be happy to go through the details of the permessage-deflate spec – or rather even fix the bug myself, along with adding the missing window bits support. However, the bug is extremely niche, as these extension fields are very rarely used, so I'd prefer to try to get this merged and then do an improvement pull on top of that.

@SvizelPritula
Author

@kazk Would you be willing to relicense your PR under the MIT + Apache dual license as used by this crate?

If not, I believe we can use the code under MIT + Apache anyway, as long as we add a NOTICE file to the repo that looks a little bit like this:

This crate contains code copied from the headers crate, licensed as follows:

*Insert copy of headers license here.*

Additionally, we would probably have to add "Copyright (c) 2014-2023 Sean McArthur" to LICENSE-MIT, which is odd, given that no part of the code in this PR was actually written by Sean McArthur.

@zitsen

zitsen commented Nov 13, 2024

What's the status of this PR?

@SvizelPritula
Author

What's the status of this PR?

I forgot about it, sorry.

I'd like to modify it to meet the necessary licensing obligations, which shouldn't be too hard.

@SvizelPritula SvizelPritula marked this pull request as ready for review November 14, 2024 18:09
@SvizelPritula
Author

I'm not a lawyer, but I think that the notices I've added to the license files should be compliant with all relevant licenses. As such, this PR is now ready for review! 🥳 Sorry for the delay.

@kazk
Contributor

kazk commented Nov 15, 2024

@SvizelPritula

I apologize for the delay.

@kazk Would you be willing to relicense your hyperium/headers#88 under the MIT + Apache dual license as used by this crate?

Yes, you can do whatever is necessary for what I did :)
Thanks for your work!

kazk has agreed to relicense his original PR under MIT+Apache, which means we don't have to add extra notices to our license files.

This reverts commit 049c753.
@zitsen

zitsen commented Dec 11, 2024

Glad to see it's still going on, thanks for your great work @SvizelPritula @kazk.

So are there any other problems blocking this?

Sorry for the disturbance, but it seems that I should ping you, @nakedible-p @daniel-abramov.

@daniel-abramov
Member

I remember I checked #328 back then, and it seems like there are many common parts (back then it was reverted due to dependencies). So yeah, I'm not against merging it.

I'd also appreciate it if someone who tracked the issue could approve it, though (to ensure that I did not miss anything since I did not go as thoroughly through all the changes as I typically try to do).

@goriunov

goriunov commented Dec 18, 2024

We have been running a copy of this branch for a couple of weeks now in production. At least in the standard cases it seems to work well, in both directions (server to client and client to server), with clients in different languages connecting to the server.

@daniel-abramov
Member

We have been running a copy of this branch for a couple of weeks now in production. At least in the standard cases it seems to work well, in both directions (server to client and client to server), with clients in different languages connecting to the server.

This sounds pretty good! It's always good to know that someone tested it in production!

@SvizelPritula, would you be interested in rebasing on top of the master branch? (There have been quite significant changes recently that affect performance and change the API surface a bit.)

@goriunov

@daniel-abramov @SvizelPritula just checking in. Is there anything else that needs to be done before this branch can be merged?

@SvizelPritula
Author

The tests are now fixed, everything should be working.

Sorry for taking so long, I was busy with exams.

Member

@daniel-abramov daniel-abramov left a comment

Thanks for the update!

I had to quickly go through the changes again (albeit skipping all deflate-specific logic), as I keep forgetting things between reviews (the PR is pretty large).

Generally it looks good. I'm only a bit concerned with the slightly more complicated logic in protocol/** (we have more branching and more conditions), but making it better would possibly require a bigger overhaul.

I also decided to run our benchmarks locally and it seems like this version is somewhat slower than the version in master when it comes to reading. I consistently get a regression of around 15-17%.

read+unmask 100k small messages (server)
                        time:   [6.6915 ms 6.7087 ms 6.7273 ms]
                        change: [+14.968% +15.822% +16.508%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  6 (6.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

read 100k small messages (client)
                        time:   [6.4633 ms 6.4821 ms 6.5024 ms]
                        change: [+17.048% +17.722% +18.370%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Unfortunately, I did not have time to dig in and investigate the root causes, but I assume that it might be related to the changes around IncompleteMessage / extend_incomplete(), and/or handling of OpData::Continue.

Comment on lines +4 to +5
autobahn/client/
autobahn/server/
Member

Suggested change
autobahn/client/
autobahn/server/

Probably leftovers from local testing? :)

Author

Those folders contain the test results after running scripts/autobahn-client.sh or scripts/autobahn-server.sh. Those shouldn't be committed, so I added them to .gitignore.

.as_mut()
.and_then(|x| x.compression.as_mut())
.unwrap()
.decompress(payload.to_vec(), fin)?
Member

Note that this involves copying the data, which might get expensive. Since the original payload is not required after the decompression, it's probably better to use into() to directly convert it into Vec<u8>.
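As a hedged illustration of the difference (assuming the payload is a bytes::Bytes here, which is not necessarily the exact type used at this point in the PR):

    use bytes::Bytes;

    fn main() {
        let payload = Bytes::from(vec![1u8, 2, 3, 4]);

        // `to_vec()` always allocates a new Vec and copies the bytes.
        let copied: Vec<u8> = payload.to_vec();

        // `into()` consumes the `Bytes` and can hand over the underlying
        // allocation without copying when it is uniquely owned.
        let moved: Vec<u8> = payload.into();

        assert_eq!(copied, moved);
    }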

SvizelPritula and others added 2 commits February 21, 2025 16:44
Removes unused code from a test and fixes a typo in a comment.

Co-authored-by: Daniel Abramov <[email protected]>
@zitsen

zitsen commented Mar 18, 2025

Any update? @SvizelPritula @daniel-abramov

@daniel-abramov
Member

Any update? @SvizelPritula @daniel-abramov

Unfortunately, I did not have an opportunity to investigate what exactly causes the performance regression.

I understand that some people would benefit from permessage-deflate merged and released with the new version, but unfortunately, I can't promise that it will be part of the next release as of now. I'd need to take a closer look to see why the performance degraded and if there are easy ways to solve this.

PS: It takes some time to check this PR each time I see changes made to it, since it is very large; whenever I want to check any updates, I often need to go through all of the changes again to recall things.

@SvizelPritula
Author

Any update? @SvizelPritula @daniel-abramov

I have tried to investigate the regression a bit, but I haven't been successful yet.

This PR is very old, and the crate has seen a lot of performance improvements since it began. (In fact, IIRC this PR is older than the benchmark that regressed.) This unfortunately makes it difficult to find out which change exactly is to blame.

@goriunov

Hey, wondering if there is any progress on this PR?

It would be great to get it into main at some point. We are currently running some production builds from an older version of this branch, which is not ideal.

Unfortunately, we cannot run the system without compression, as we have very traffic-dependent systems that stream terabytes per month, and if we remove compression the AWS bill goes crazy.

Unfortunately, there is no properly tested alternative among Rust WebSocket crates with compression built in, so we kind of have to hack around it.

@daniel-abramov
Member

@goriunov, I can understand that. However, merging it "as is" is a bit complicated because there is a minor performance degradation, and no one has had time to investigate its cause. I would be somewhat uncomfortable merging it in the current state, knowing that it may cause some issues that I won't be able to debug/solve quickly (it's a pretty large PR after all).

I thought about it for a bit, and I think one of the ways to approach the problem would be as follows:

  1. 70% of the code is deflate and WebSocket extension parsing. Perhaps this could be merged separately under a feature gate or be moved to a separate crate (unless there is already one for it), as it may also help other websocket crates that might want to implement the feature.
  2. Then, the rest of the code could be submitted as separate (smaller) PRs that are easier to review and discuss. This would also allow us to see at which point the performance degradation starts to happen and debug it appropriately.

Unfortunately, I did not participate that much in the development of this feature and could not take it over, as I did not really use deflate in my projects and I rarely use WebSockets nowadays, so I always struggled to find time to justify the effort (I believe it would probably take a couple of days to properly go through the changes one more time, merge them one by one, investigate the issues, benchmark, etc.).

@SvizelPritula
Author

I think I managed to resolve the performance regression.

Unfortunately, the performance of the benchmarks still regresses when the deflate feature is enabled - but that at least doesn't impact existing users.

// Only `permessage-deflate` is supported at the moment.
pub(crate) fn generate_offers(&self) -> Option<SecWebsocketExtensions> {
#[cfg(feature = "deflate")]
{
@erebe erebe May 11, 2025

Can be simplified for readability and to avoid a mutable variable:

        #[cfg(feature = "deflate")]
        {
            if let Some(compression) = self.compression.map(|c| c.generate_offer()) {
                Some(SecWebsocketExtensions::new(vec![compression]))
            } else {
                None
            }
        }

&self,
#[allow(unused)] offers: &SecWebsocketExtensions,
) -> Option<(SecWebsocketExtensions, Extensions)> {
#[cfg(feature = "deflate")]

read: avoid mutable and make flow explicit

        #[cfg(feature = "deflate")]
        {
            // To support more extensions, store extension context in `Extensions` and
            // concatenate negotiation responses from each extension.
            if let Some(compression) = &self.compression {
                if let Some((agreed, compression)) = compression.accept_offer(offers) {
                    let extensions = Extensions { compression: Some(compression) };
                    return Some((SecWebsocketExtensions::new(vec![agreed]), extensions))
                }
            } 
            
            None
        }


/// Value for `Sec-WebSocket-Extensions` request header.
pub(crate) fn generate_offer(&self) -> WebsocketExtension {
let mut offers = Vec::new();

The allocation can be avoided by unrolling the if

        // > a client informs the peer server of a hint that even if the server doesn't include the
        // > "client_no_context_takeover" extension parameter in the corresponding
        // > extension negotiation response to the offer, the client is not going
        // > to use context takeover.
        // > https://www.rfc-editor.org/rfc/rfc7692#section-7.1.1.2
        match (self.server_no_context_takeover, self.client_no_context_takeover) {
            (true, true) => to_header_value(&[HeaderValue::from_static(SERVER_NO_CONTEXT_TAKEOVER), HeaderValue::from_static(CLIENT_NO_CONTEXT_TAKEOVER)]),
            (true, false) => to_header_value(&[HeaderValue::from_static(SERVER_NO_CONTEXT_TAKEOVER)]),
            (false, true) => to_header_value(&[HeaderValue::from_static(CLIENT_NO_CONTEXT_TAKEOVER)]),
            (false, false) => to_header_value(&[]), 
        }

}

#[cfg(feature = "handshake")]
fn to_header_value(params: &[HeaderValue]) -> WebsocketExtension {

It can be sped up for the nominal case, where params is empty, instead of creating a new buffer each time and parsing the header just to hold the known value permessage-deflate:

if params.is_empty() {
   return WebsocketExtension::default()
}

// Otherwise, pay the cost of creating a new buffer, creating the header value, and parsing it into a websocket extension

// https://datatracker.ietf.org/doc/html/rfc7692#section-7.2.1
// 1. Compress all the octets of the payload of the message using DEFLATE.
let mut output = Vec::with_capacity(data.len());
let before_in = self.compressor.total_in() as usize;
@erebe erebe May 11, 2025

You should not cast a u64 into a usize, as on 32-bit architectures this can overflow easily.
Instead, turn the usize from data.len() into a u64, which is safer (as 128-bit architectures are not really a thing).

The offset can be converted into a usize more safely, or use usize::try_from().
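A sketch of the direction being suggested, using flate2's Compress::total_in (which returns u64); the helper functions here are hypothetical:

    use flate2::Compress;

    // Keep byte counters in u64 and widen `data.len()` instead of narrowing
    // `total_in()` down to usize, which can be lossy on 32-bit targets.
    fn consumed_since(compressor: &Compress, before_in: u64) -> u64 {
        compressor.total_in() - before_in
    }

    fn remaining_input(data: &[u8], consumed: u64) -> usize {
        let total = data.len() as u64; // usize -> u64 is always lossless
        usize::try_from(total - consumed).expect("remaining input fits in usize")
    }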

// After this step, the last octet of the compressed data contains
// (possibly part of) the DEFLATE header bits with the "BTYPE" bits
// set to 00.
output.truncate(output.len() - 4);
@erebe erebe May 11, 2025

The 4 can be replaced by TRAILER.len().

data.extend_from_slice(&TRAILER);
}

let before_in = self.decompressor.total_in() as usize;

Same comment regarding turning a u64 into a usize.

is_final: bool,
) -> Result<Vec<u8>, DeflateError> {
if is_final {
data.extend_from_slice(&TRAILER);
@erebe erebe May 11, 2025

If possible, it would be better to avoid the extend_from_slice, as it can reallocate and copy the whole vector to make more room.

Instead, it might be better to do an extra decompress_vec call on error if is_final is set.

}

// Compress the data of message.
pub(crate) fn compress(&mut self, data: &[u8]) -> Result<Vec<u8>, DeflateError> {

To be more aligned with the library, the signature should use Bytes instead of &[u8]:

fn compress(&mut self, data: Bytes) -> Result<Bytes, DeflateError>

It would avoid turning the Vec<u8> into a Bytes when creating the Frame object.
For the data argument, it does not change anything; it simply forwards the Bytes buffer.

Ok(output)
}

pub(crate) fn decompress(
@erebe erebe May 11, 2025

Same here, it should use Bytes (or BytesMut); it would avoid a conversion.

.map_err(|e| DeflateError::Decompress(e.into()))?
{
Status::Ok => output.reserve(2 * output.len()),
Status::BufError | Status::StreamEnd => break,
Author

This doesn't check at all whether the stream has actually ended with the final packet, I think.

pub fn verify_response(
&self,
response: Response,
_config: &Option<WebSocketConfig>,

Since _config is always being used in the function (not behind a feature gate), it should probably not be marked as intentionally unused

@akonradi-signal
Contributor

I'd like to request that the additions of #[non_exhaustive], especially on enum types, be removed. For authors of libraries that use tungstenite, being able to rely on exhaustive matching lets us leverage the compiler to ensure at compile time that we've handled every possible error case appropriately. Without this guarantee, we have to include a branch for the "unknown" case in every match on an error so that there's a generic fallback path. The danger is that updating our tungstenite dependency from one version to another would change those generic fallback paths from never-used to sometimes-used. I'd much rather be forced, as a tungstenite consumer, to update my code when new error cases are added because my crate fails to compile.

Messages of doom aside, there's precedent already for having error cases that can't be produced without certain features being turned on. My library doesn't enable tungstenite's TLS support, but we still have to handle the Error::Tls case. I'd rather do that (with an unreachable!("tungstenite TLS isn't used")) than have to be forever vigilant for new error variants on upgrade, because #[non_exhaustive] means the compiler is no longer helping me.
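For illustration, here is a small self-contained sketch of the trade-off being described; the enum is a stand-in, not tungstenite's actual Error type:

    // Hypothetical library error type, used only to illustrate the point.
    pub enum LibError {
        Io(std::io::Error),
        Protocol(String),
        Tls, // only produced when a feature this consumer doesn't enable is on
    }

    pub fn describe(err: &LibError) -> &'static str {
        // With an exhaustive enum, adding a new variant upstream turns this
        // match into a compile error, forcing the consumer to handle it.
        match err {
            LibError::Io(_) => "I/O error",
            LibError::Protocol(_) => "protocol violation",
            LibError::Tls => unreachable!("TLS support isn't enabled"),
            // With #[non_exhaustive], the compiler would instead demand a
            // `_ => ...` arm, and new upstream variants would silently fall
            // into it after an upgrade.
        }
    }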

@0x676e67 0x676e67 mentioned this pull request Jul 8, 2025
akonradi-signal added a commit to signalapp/tungstenite-rs that referenced this pull request Sep 12, 2025
Add support for the "permessage-deflate" websocket protocol extension as
specified by RFC 7692.

This is based off of snapview#426 but adds
- separation between header parsing and negotiation logic
- support for the client/server max window bits parameters
- more maintainable feature-guarding
- additional unit testing