Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYSTEM_SENDBUF_SIZE is massively oversized on big-endian Linux systems #355

Closed
awilfox opened this issue Aug 20, 2024 · 5 comments
Closed

Comments

@awilfox
Copy link
Contributor

awilfox commented Aug 20, 2024

I was attempting to try out Servo on a Power9 system, and was met with:

awilcox on gwyn ~/Code/contrib/servo % ./mach run -d
memory allocation of 985162418487256 bytes failed
Redirecting call to abort() to mozalloc_abort

That didn't seem right, and running in the debugger I saw:

#9  alloc::alloc::handle_alloc_error (layout=...) at library/alloc/src/alloc.rs:391
#10 0x0000000100695530 in alloc::raw_vec::handle_error (e=...) at library/alloc/src/raw_vec.rs:594
#11 0x00000001088d2750 in alloc::raw_vec::RawVec<u8, alloc::alloc::Global>::with_capacity_in<u8, alloc::alloc::Global> (
    capacity=<optimized out>)
    at /home/awilcox/Code/awilfox/rust-next/user/rust/src/rustc-1.80.0-src/library/alloc/src/raw_vec.rs:160
#12 alloc::vec::Vec<u8, alloc::alloc::Global>::with_capacity_in<u8, alloc::alloc::Global> (capacity=<optimized out>)
    at /home/awilcox/Code/awilfox/rust-next/user/rust/src/rustc-1.80.0-src/library/alloc/src/vec/mod.rs:699
#13 alloc::vec::Vec<u8, alloc::alloc::Global>::with_capacity<u8> (capacity=985162418487256)
    at /home/awilcox/Code/awilfox/rust-next/user/rust/src/rustc-1.80.0-src/library/alloc/src/vec/mod.rs:481
#14 0x00000001088bf970 in ipc_channel::platform::unix::recv (fd=12, blocking_mode=...) at src/platform/unix/mod.rs:1002
(gdb) frame 14
#14 0x00000001088bf970 in ipc_channel::platform::unix::recv (fd=12, blocking_mode=...) at src/platform/unix/mod.rs:1002
1002	        main_data_buffer = Vec::with_capacity(OsIpcSender::get_max_fragment_size());

get_max_fragment_size() calls Self::first_fragment_size(*SYSTEM_SENDBUF_SIZE). which is lazily-initialised with the value of get_system_sendbuf_size(), which is implemented as:

           let mut socket_sendbuf_size: usize = 0;
            let mut socket_sendbuf_size_len = mem::size_of::<usize>() as socklen_t;
            if getsockopt(self.fd.0, libc::SOL_SOCKET, libc::SO_SNDBUF,
                &mut socket_sendbuf_size as *mut _ as *mut c_void,
                &mut socket_sendbuf_size_len as *mut socklen_t,
            ) < 0 {
                return Err(UnixError::last());
            }
            Ok(socket_sendbuf_size)

If I write the equivalent C out with some debugging printfs:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void) {
	int sock = socket(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0);
	size_t size = 0;
	socklen_t len = sizeof(size);
	if(getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &size, &len) < 0)
		fprintf(stderr, "failed: %d (%s)\n", errno, strerror(errno));
	else
		fprintf(stdout, "succeeded: %u: %lu\n", len, size);
	if(len == 4) fprintf(stdout, "corrected: %u\n", (uint32_t)size);
	if(len == 4) fprintf(stdout, "corrected other side: %u\n", size >> 32);
	return 0;
}

I get:

succeeded: 4: 985162418487296
corrected: 0
corrected other side: 229376

That 64-bit value is the same number we see in the crash. The shifted value of 229,376 looks correct from the output of sysctl:

net.core.rmem_default = 229376
@awilfox
Copy link
Contributor Author

awilfox commented Aug 20, 2024

Forcing it to only ever use 32-bit values, like:

diff --git a/src/platform/unix/mod.rs b/src/platform/unix/mod.rs
index ec3c712..bd24794 100644
--- a/src/platform/unix/mod.rs
+++ b/src/platform/unix/mod.rs
@@ -213,8 +213,8 @@ impl OsIpcSender {
     /// Some of it is reserved by the kernel for bookkeeping.
     fn get_system_sendbuf_size(&self) -> Result<usize, UnixError> {
         unsafe {
-            let mut socket_sendbuf_size: usize = 0;
-            let mut socket_sendbuf_size_len = mem::size_of::<usize>() as socklen_t;
+            let mut socket_sendbuf_size: u32 = 0;
+            let mut socket_sendbuf_size_len = mem::size_of::<u32>() as socklen_t;
             if getsockopt(
                 self.fd.0,
                 libc::SOL_SOCKET,
@@ -225,7 +225,7 @@ impl OsIpcSender {
             {
                 return Err(UnixError::last());
             }
-            Ok(socket_sendbuf_size)
+            Ok(socket_sendbuf_size as usize)
         }
     }
 

allows ipc-channel to pass its whole test suite:

test result: ok. 82 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 17.35s

where before it just crashed:

running 82 tests
memory allocation of 985162418487256 bytes failed
error: test failed, to rerun pass `--lib`

I don't like this change, but I can't find a way to do the equivalent of a reinterpret_cast in Rust. A naive cast to u32 just results in a bunch of panics (attempt to subtract with overflow) because it is reading the wrong 'half' of the 64-bit value.

If this change would be acceptable I'd happily make a merge request with a DCO. But I'd rather do this properly.

@sagudev
Copy link
Member

sagudev commented Aug 20, 2024

I think we need to read to byte array, then use https://doc.rust-lang.org/std/primitive.usize.html#method.from_be_bytes to read back using native byte order.

@sagudev
Copy link
Member

sagudev commented Aug 20, 2024

Actually per https://pubs.opengroup.org/onlinepubs/007904975/functions/getsockopt.html this should be c_int, so I think your u32 patch is more correct. See also https://docs.rs/socket2/latest/src/socket2/socket.rs.html#1004-1009

@sagudev
Copy link
Member

sagudev commented Aug 20, 2024

Ideally we would just rewrite it using https://github.com/rust-lang/socket2

awilfox added a commit to awilfox/ipc-channel that referenced this issue Aug 24, 2024
This is defined by POSIX as an 'int'.  Using a usize causes an invalid
value to be read on 64-bit big endian platforms.

Signed-off-by: A. Wilcox <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Aug 24, 2024
This is defined by POSIX as an 'int'.  Using a usize causes an invalid
value to be read on 64-bit big endian platforms.

Signed-off-by: A. Wilcox <[email protected]>
@sagudev
Copy link
Member

sagudev commented Aug 24, 2024

Fixed in #357

@sagudev sagudev closed this as completed Aug 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants