-
Notifications
You must be signed in to change notification settings - Fork 178
snet: implement STUN for NAT traversal #4863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…r on same port as SCION dataplane
pkg/snet/stun_conn.go
Outdated
| if timeout <= 0 { | ||
| return nil, os.ErrDeadlineExceeded | ||
| } | ||
| deadlineExceeded = time.After(timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time.After is quite expensive. Maybe we can reuse a time.Timer, potentially even allocated once as member of *stunConn (probably can't be reused for all callers though, but at least across retransmissions)? This would reduce the number of runtime allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package main
import (
"testing"
"time"
)
func BenchmarkTimeAfter(b *testing.B) {
for b.Loop() {
for range 7 {
<-time.After(1 * time.Microsecond)
}
}
}
func BenchmarkReuseTimer(b *testing.B) {
t := time.NewTimer(0)
<-t.C // wait for initial fire
for b.Loop() {
for range 7 {
t.Reset(1 * time.Microsecond)
<-t.C
}
}
}goos: darwin
goarch: arm64
pkg: benchfmt
cpu: Apple M4 Pro
BenchmarkTimeAfter-14 39968 29366 ns/op 1736 B/op 21 allocs/op
BenchmarkReuseTimer-14 42736 28290 ns/op 0 B/op 0 allocs/op
PASS
ok benchfmt 2.624s
Reusing a time.Timer results in no garbage being generated for the GC to clean up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the change, reusing time.Timer on retransmissions and deadline changes to reduce the number of timer allocations. Allocating as member on stunConn does not work though, since we need separate timers for concurrent calls.
| close(c.readDeadlineChanged) | ||
| close(c.writeDeadlineChanged) | ||
| c.readDeadlineChanged = make(chan struct{}) | ||
| c.writeDeadlineChanged = make(chan struct{}) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks rather wasteful as it probably can be done with sync.Cond without runtime allocations. But I can't tell straight away how complicated that would be to do here and whether it's worth it. This is just an idea, so feel free to resolve this comment without action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot wait on sync.Cond directly inside a select statement. If we use a Cond, we would still need to bridge it to a channel in the waiting call, e.g. like this:
deadlineChanged := make(chan struct{})
go func() {
c.mutex.Lock()
c.readDeadlineCond.Wait()
c.mutex.Unlock()
deadlineChanged <- struct{}{}
}()
select {
...
I don't know which is better idiomatically. Any thoughts?
romshark
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, stated my suggestions in the comments.
| } | ||
| listenHostPort := uint16(c.local.Host.Port) | ||
|
|
||
| // Rewrite source IP if STUN is in use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably move this to a separate function to keep the code here easily readable:
// Rewrite source IP if STUN is in use
var err error
listenHostIP, listenHostPort, err = c.rewriteSourceForSTUN(raddr, nextHop, listenHostIP, listenHostPort)
if err != nil {
return 0, err
}and at the end of the file:
// rewriteSourceForSTUN returns the source IP and port for STUN rewriting.
// If STUN is in use and the destination is in a different IA, it returns
// the mapped address; otherwise it returns the original address unchanged.
func (c *scionConnWriter) rewriteSourceForSTUN(
raddr net.Addr,
nextHop *net.UDPAddr,
listenHostIP netip.Addr,
listenHostPort uint16,
) (netip.Addr, uint16, error) {
scionPacketConn, ok := c.conn.(*SCIONPacketConn)
if !ok {
return listenHostIP, listenHostPort, nil
}
stunConn, ok := scionPacketConn.conn.(*stunConn)
if !ok {
return listenHostIP, listenHostPort, nil
}
var sameIA bool
switch a := raddr.(type) {
case *UDPAddr:
sameIA = a.IA.Equal(c.local.IA)
case *SVCAddr:
sameIA = a.IA.Equal(c.local.IA)
}
if sameIA {
return listenHostIP, listenHostPort, nil
}
nextHopIP, ok := netip.AddrFromSlice(nextHop.IP)
if !ok {
return netip.Addr{}, 0, serrors.New("invalid next hop IP", "ip", nextHop.IP)
}
nextHopIP = nextHopIP.Unmap()
nextHopAddrPort := netip.AddrPortFrom(nextHopIP, uint16(nextHop.Port))
mappedAddr, err := stunConn.mappedAddr(nextHopAddrPort)
if err != nil {
return netip.Addr{}, 0, serrors.New("Error getting mapped address for STUN", "stun", err)
}
return mappedAddr.Addr(), mappedAddr.Port(), nil
}| listenHostPort := uint16(c.local.Host.Port) | ||
|
|
||
| // Rewrite source IP if STUN is in use | ||
| if scionPacketConn, ok := c.conn.(*SCIONPacketConn); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing those casts twice I think we should cache whether it's a stun conn or not and then simply check a boolean. Otherwise we have to do a type cast for every write...
// in NewCookedConn
hasSTUN: hasSTUNConn(pconn),
// below in the same file
// hasSTUNConn checks if the provided PacketConn has STUN enabled.
func hasSTUNConn(pc PacketConn) bool {
scionPacketConn, ok := pc.(*SCIONPacketConn)
if !ok {
return false
}
_, ok = scionPacketConn.conn.(*stunConn)
return ok
}then the check here becomes: (if you use my function suggestion from above)
if !c.hasSTUN {
return listenHostIP, listenHostPort, nil
}
scionPacketConn := c.conn.(*SCIONPacketConn)
stunConn := scionPacketConn.conn.(*stunConn)| // whether the underlying connection is a stunConn, which indicates that NAT traversal | ||
| // is in use. | ||
| // TODO: Is it necessary to check that the address matches one of the mapped addresses? | ||
| scionConn, ok := c.conn.(*SCIONPacketConn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to the other comment I have in the writer, I think having to cast twice for checking for a stun conn is not great. Maybe a bool flag could help?
Background
SCION packet headers contain a SRC address to which packets should be returned. This address needs to be visible/reachable by the first-hop border router, assuming that the path gets simply reversed by the peer. This address may not be easy to discover if the sender is separated from the receiver by a NAT.
A solution has been designed that uses STUN for NAT traversal, allowing clients behind NATs to query the first-hop border router to obtain their external mapped address. This external address can then be used as the SRC address in the SCION header. The relevant design document can be found here. The implementation of a STUN server at the border router has been done here. This PR and its implementation details were originally discussed here
Summary of Changes
The present PR concerns the client side implementation of the NAT travelsal mechanism in the snet library. Modifications involve adding a configurable option to
SCIONNetworkthat enables transparent NAT traversal. If enabled, SCION connections obtained bySCIONNetwork.Dialwill automatically query the first-hop border router for the external address using a STUN request. The external address is then used as the clients SRC address instead of its local address when sending packets.The STUN mechanism is implemented compliant with RFC8489, which includes retransmissions on timeouts. Changes to the behavior of the SCION connection's methods are kept to a minimum. Specifically, the
ReadFromandWriteTomethods closely mimick the behavior ofnet.UDPconn. The STUN mechanism happens in the background and is transparent towards client applications.In summary, the changes in this PR allow client applications to connect to the SCION internet even in the presence of a NAT device.
Testing
A new integration test has been added in
/acceptance/stun. The test simulates a SCION network topology with a NAT device present. A test client behind the NAT sends a test packet using snet'sSCIONNetwork.Dialto a test server outside of the NAT, in a different AS. The test server, in turn, responds with a packet to the client. The test succeeds if the returning packet arrives at the client without issue. As usual, this test is part of the CD/CI pipeline.The integration test can be run, manually, using the following instructions:
make, thenmake docker-imagesbazel test --config=integration --test-output=streamed //acceptance/stun:test