Skip to content

Stateful NAT: IP/port allocator #697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Stateful NAT: IP/port allocator #697

wants to merge 7 commits into from

Conversation

qmonnet
Copy link
Member

@qmonnet qmonnet commented Jul 15, 2025

WIP

@qmonnet qmonnet added this to the GW R1 milestone Jul 15, 2025
@qmonnet qmonnet requested a review from mvachhar July 15, 2025 13:02
@qmonnet qmonnet self-assigned this Jul 15, 2025
@qmonnet qmonnet added the area/nat Related to Network Address Translation (NAT) label Jul 15, 2025
impl NatPort {
const MIN: u16 = 1024 + 1;

pub fn new_checked(port: u16) -> Result<NatPort, NatPortError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to forbid representation of < 1024 ports? Would it be worth having instead a NatUnreservedPort or NatNonPrivPort or something? Also, what happens if the source port we are Natting is priveleged?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to forbid representation of < 1024 ports?

This was my understanding. I can't remember the context of the discussion.

Would it be worth having instead a NatUnreservedPort or NatNonPrivPort or something?
Also, what happens if the source port we are Natting is priveleged?

We probably have an issue with the return session (not implemented yet). I'll need to revisit this part in a follow-up PR.

target_src_port,
target_dst_port,
);
Self::update_stats(&mut new_state, total_bytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you get here with target_* being None? Is that ok? Would you not add the state entry? If so, can the check for no translation earlier just be dropped?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we hide all the session manipulation behind a trait and a separate object? We need to move to have a single session table for all flows that can be shared between NAT and firewall, so some kind of abstraction where the session table manipulation object can be passed in to NAT would be very helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you get here with target_* being None? Is that ok? Would you not add the state entry? If so, can the check for no translation earlier just be dropped?

PR has changed and I'm not sure this question still applies.

Can we hide all the session manipulation behind a trait and a separate object? We need to move to have a single session table for all flows that can be shared between NAT and firewall, so some kind of abstraction where the session table manipulation object can be passed in to NAT would be very helpful.

This is what traits NatSession and NatSessionManager are for?

// SPDX-License-Identifier: Apache-2.0
// Copyright Open Network Fabric Authors

use super::AllocatorError;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this commit, it would be helpful to separate the code that finds the session table from the code that uses it so that the session table manipulation, and even the session table entry can be passed into the NAT module in the (near term) future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this commit, it would be helpful to separate the code that finds the session table from the code that uses it so that the session table manipulation, and even the session table entry can be passed into the NAT module in the (near term) future.

This commit (and PR) is not related to the session table, but instead focuses on the IP/port allocation that is used to create new entries in the table.

I think we have a decent amount of separation for the allocator? It is currently attached as a field to the struct StatefulNat, but the only way the stage interacts with it is by calling its self.allocator.allocate_v4() (or IPv6 equivalent) method, it never accesses to the allocator's internals directly.

Let's chat, if this does not address your concern.

@qmonnet qmonnet force-pushed the pr/qmonnet/nat-alloc branch from 63f429e to b931830 Compare July 16, 2025 17:08
qmonnet added 7 commits July 17, 2025 13:12
Leave room for the IP/port allocator in allocator.rs, move NatPort
definitions and implementations to their own, dedicated file.

Signed-off-by: Quentin Monnet <[email protected]>
The IP/port allocator may, or may not return NAT-related target source
and destination addresses and ports, depending on whether NAT is
desirable for the considered connection. For ports, we already used an
Option<NatPort>; make the allocator use one for the IP addresses, too.

Adjust the stateful NAT code accordingly, depending on whether we need
to NAT or not (based on whether we get Some IP address, or None).

Signed-off-by: Quentin Monnet <[email protected]>
Make set_source_port() and set_destination_port() correctly handle and
pass up any error that might happen when trying to update the source or
destination ports, respectively, as part of the NAT processing for a
packet.

Also change them to take a non-optional NatPort object as an argument
(unwrap the Option in the caller rather than in methods that shouldn't
be called at all if there's no port to translate).

Signed-off-by: Quentin Monnet <[email protected]>
Rework our allocator draft to introduce a NatAllocator trait and a
default NatDefaultAllocator implementation. It should be easy to replace
this default allocator with alternative NatAllocator implementations for
future experiments.

The default allocator implementation is currently only a skeleton, and
will be completed in future commits.

One change worth noting is that we no longer "find" the NAT pool prior
to calling the allocation function. Instead, the allocator itself will
be responsible for looking up from the right pool - or whatever internal
structure it decides to use - and return a valid IP for the Peering in
use, based on the information it gets from the NatTuple object.

Signed-off-by: Quentin Monnet <[email protected]>
Implement port allocation for stateful NAT. Introduce structures
PortBlockAllocator, PortAllocator.

The idea is the following: we split the L4 ports space in "blocks" of
256 ports. A PortAllocator works on one of these blocks, and allocates
ports until the block is full. A PortBlockAllocator is a collection of
all the blocks forming the 65k-port range, listed in a randomised order,
and holds states (free, heating (being allocated), or cooling (fully
allocated, waiting for connections timeouts to expire before being
garbage-collected)) for each of these blocks. It also contains a
PortAllocator object, to work with the block currently in use for
allocating new ports. Ports are allocated in a linear fashion within a
port block.

Signed-off-by: Quentin Monnet <[email protected]>
Introduce an IP allocator for stateful NAT. This is a simple IP
allocator relying on a large, albeit compressed, bitmap. With up to 2^32
elements, it can map the whole IPv4 address range. For IPv6, we can map
it to 2^32 elements, which should realistically cover the use cases we
can afford: with 65k ports per IP address, this makes for about 277
trillion connections, each requiring multiple bytes in memory.

Allocation for IPv4 and IPv6 differs a little: our bitmap is large
enough to cover the IPv4 space, so we can just translate IPv4 addresses
from and to u32 integers. For IPv6, the bitmap does not represent all
possible addresses, but instead we use it to store up to the first 4^32
addresses from our list of allowed prefixes for the VpcExpose in use,
which means we have a small conversion step between the bitmap lookup
and the obtention of the address to allocate.

Signed-off-by: Quentin Monnet <[email protected]>
Based on the IP and port allocator submodules that we implemented in the
last commits, connect the different parts to implement the default
IP/port allocator for stateful NAT.

This default allocator contains several maps of NatPool objects, using
VPD-related IDs, destination addresses, and L4 protocol type to form
keys. For each packet, the ID, L4 protocol, and destination address, all
passed as part of the packet tuple, are used to retrieve the two NatPool
objects to use for source and destination NAT, respectively. If there is
no NatPool for a given direction, then NAT is not required for this
direction, for the connection under consideration.

Signed-off-by: Quentin Monnet <[email protected]>
@qmonnet qmonnet force-pushed the pr/qmonnet/nat-alloc branch from b931830 to 4b4596b Compare July 17, 2025 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/nat Related to Network Address Translation (NAT)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants