Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/tls: SMTP connection timeout in Go Version 1.23 #71257

Closed
littlerest1 opened this issue Jan 13, 2025 · 14 comments
Closed

crypto/tls: SMTP connection timeout in Go Version 1.23 #71257

littlerest1 opened this issue Jan 13, 2025 · 14 comments
Labels
BugReport Issues describing a possible bug in the Go implementation. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@littlerest1
Copy link

Go version

go version go1.23.2 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/yanz1/Library/Caches/go-build'
GOENV='/Users/yanz1/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/yanz1/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/yanz1/go'
GOPRIVATE=''
GOPROXY=''
GOROOT='/opt/homebrew/Cellar/[email protected]/1.23.2/libexec'
GOSUMDB='off'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/opt/homebrew/Cellar/[email protected]/1.23.2/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.2'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/s4/j38_716j1q1dmb564z0ynfq00000gq/T/go-build782530738=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Was using library crypto/tls and net for smtp dialing and communication
the code is working fine in MacOs arm64 system. However the binary file built by command env GOOS=linux GOARCH=amd64 go build -o and run in AWS Linux EC2 instance will receive read tcp: timeout issue. This issue is resolved by downgrading the Go version to 1.22.


package main

import (
	"crypto/tls"
	"io"
	"fmt"
	"log"
	"net"
	"strings"
	"time"
)

func main() {
	smtpHost := "PRIVATE HOST"
	smtpPort := "587"

	tlsConfig := &tls.Config{
		ServerName: smtpHost,
	}

	dialer := &net.Dialer{
		Timeout: 10 * time.Second,
	}
	conn, err := dialer.Dial("tcp", smtpHost+":"+smtpPort)

	if err != nil {
		log.Fatal("failed to create tlsDialer:", "error", err)
	}

	fmt.Fprintf(conn, "STARTTLS\r\n")
	buf := make([]byte, 0, 4096) // big buffer
	tmp := make([]byte, 256)     // using small tmo buffer for demonstrating
	for {
		n, err := conn.Read(tmp)
		log.Println(string(tmp))
		if err != nil {
			if err != io.EOF {
				log.Fatal("read error:", err)
			}
			break
		}
		if strings.Contains(string(tmp), "Go ahead with TLS") {
			client := tls.Client(conn, tlsConfig)
			fmt.Fprintf(client, "AUTH LOGIN\r\n") // Time out happens here. Tried to send`HLEO` also received timeout issue
			log.Println("Start TLS connection and request AUTH")
		}

		///log.Println("got", string(tmp))
		buf = append(buf, tmp[:n]...)
	}
	log.Println("total size:", len(buf))

}

What did you see happen?

{
    "time": "2025-01-07T09:16:34.914889396Z",
    "level": "WARN",
    "msg": "error while sending email",
    "error": "read tcp 10.33.124.75:46246->10.248.46.8:587: read: connection timed out"
}

What did you expect to see?

AUTH LOGIN command successfully send after STARTTLS command and receive the request for username and password for further authentication

@gabyhelp gabyhelp added the BugReport Issues describing a possible bug in the Go implementation. label Jan 14, 2025
@ianlancetaylor
Copy link
Member

CC @golang/security

@seankhliao
Copy link
Member

maybe the same as #70232

@seankhliao seankhliao added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jan 14, 2025
@littlerest1
Copy link
Author

maybe the same as #70232

seems like so

@mjl-
Copy link

mjl- commented Jan 22, 2025

@littlerest1 I am not sure this is the same situation. The other issue is for a specific non-Go TLS client connecting to a Go TLS server. And the symptom is that that other client is dropping the connection, there is no timeout involved. You are using a Go TLS client. Which software is the TLS server?

Looking at your code, I have a few suggestions for the code you're using:

  • You're looking for "Go ahead with TLS", but the conn.Read just before may not have consumed the full the line. If that's the case, the tls client is initialized on data that comes before the start of the TLS packets. That may cause bytes to be interpreted as something else, e.g. a large packet size, larger than what remote is sending, resulting in a read timeout eventually. You should read to the end of the line. Log what you've read with "%q", it will show if you've consumed the newline.
  • Handle errors from fmt.Printf on the connection/client. May reveal useful information.
  • After making the tls client, explicitly do the handshake with client.Handshake() and handle any errors. May help a little understanding which i/o exactly is timing out.
  • Also use tmp[:n] instead of tmp when logging.

@littlerest1
Copy link
Author

@littlerest1 I am not sure this is the same situation. The other issue is for a specific non-Go TLS client connecting to a Go TLS server. And the symptom is that that other client is dropping the connection, there is no timeout involved. You are using a Go TLS client. Which software is the TLS server?

Looking at your code, I have a few suggestions for the code you're using:

  • You're looking for "Go ahead with TLS", but the conn.Read just before may not have consumed the full the line. If that's the case, the tls client is initialized on data that comes before the start of the TLS packets. That may cause bytes to be interpreted as something else, e.g. a large packet size, larger than what remote is sending, resulting in a read timeout eventually. You should read to the end of the line. Log what you've read with "%q", it will show if you've consumed the newline.
  • Handle errors from fmt.Printf on the connection/client. May reveal useful information.
  • After making the tls client, explicitly do the handshake with client.Handshake() and handle any errors. May help a little understanding which i/o exactly is timing out.
  • Also use tmp[:n] instead of tmp when logging.

The reason I was using the tls dial, its because the is no reasonable error log when I was using net/smtp library. So that I demonstrate the email sending process one by one.

Will try better logging for more info about why it get io timeout

@mjl-
Copy link

mjl- commented Jan 23, 2025

ah yeah, that makes sense. when debugging, looking at the traffic with tcpdump may also be helpful (and easy to do).

@thomvaill
Copy link
Contributor

Hi,
We have exactly the same issue. Our net/smtp client was working fine on go 1.22.
After upgrading to 1.23, we get these timeouts.
After debugging like you did thanks to your snippet, it also times out at the TLS handshake step.
I agree with @mjl- : this is a different issue than #70232

@littlerest1 have you managed to find a workaround?
On our side we tried to mess with the allowed ciphers but without success

Could this issue be reopened?

@thomvaill
Copy link
Contributor

I managed to find 2 workarounds!

  1. Re-implement net/smtp SendEmail() function to be able to pass this custom tls.Config option:
config := &tls.Config{
    MaxVersion: tls.VersionTLS12,
}

Because in our case our SMTP server only accepts TLS v1.2, and with this fix it works

  1. Or just set GODEBUG=tlskyber=0

So the regression looks like to come from this commit: d0edd9a
Which is documented here: https://tip.golang.org/doc/go1.23

The experimental post-quantum key exchange mechanism X25519Kyber768Draft00 is now enabled by default when Config.CurvePreferences is nil. The default can be reverted by adding tlskyber=0 to the GODEBUG environment variable.

We definitely should re-open this issue or open a new one.
I would love to open a new one but unfortunately I was only able to reproduce the issue with our company's internal SMTP server (I don't know the software they use), not a public one. I also tested to create a custom one with go, but still not able to reproduce.
So the bug must be related to some implementations of TLS v1.2 on the server side :(

@seankhliao
Copy link
Member

if it's the remote server breaking on kyber, it's not really considered a bug in go, see https://tldr.fail/

@thomvaill
Copy link
Contributor

Thank you @seankhliao you are completely right!
I totally understand the progressive rollout of post-quantum algorithms, but I think a better communication with some disclaimers should have been issued.
This bug is very difficult to debug when you don't have the knowledge provided by https://tldr.fail/ (very helpful by the way!)
I saw in the changelog that Kyber was removed in go 1.24 and replaced by another one, the problem must be the same?
Maybe the Go team should communicate more about this

@rolandshoemaker
Copy link
Member

We could perhaps add a note to the 1.24 release notes (and perhaps the crypto/tls package) that usage of the X25519MLKEM768 kex (the name of the standardized Kyber kex) may cause failures with TLS stacks that don't handle large records correctly.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/651036 mentions this issue: doc: add a disclamer for the TLS handshake timeouts TL;DR bug in godebug.md

@thomvaill
Copy link
Contributor

We could perhaps add a note to the 1.24 release notes (and perhaps the crypto/tls package) that usage of the X25519MLKEM768 kex (the name of the standardized Kyber kex) may cause failures with TLS stacks that don't handle large records correctly.

You are right! Usually when you have regressions like this after an upgrade, you instinctively check the release notes, so it might help.
I created 2 PRs:

Regarding the crypto/tls package I was not sure where to document this, so I didn't.

gopherbot pushed a commit to golang/website that referenced this issue Feb 20, 2025
…bug.md

Relates to golang/go#71257. Since post-quantum TLS algorithms are enabled by default, we should warn about the possible bugs with legacy servers (see https://tldr.fail/)

Change-Id: Id5a565c2959c5b0f7b8bdd53ce9e70b191c4bf38
GitHub-Last-Rev: a709e6f
GitHub-Pull-Request: #311
Reviewed-on: https://go-review.googlesource.com/c/website/+/650857
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Roland Shoemaker <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 20, 2025
…bug.md

Relates to #71257. Since post-quantum TLS algorithms are enabled by default, we should warn about the possible bugs with legacy servers (see https://tldr.fail/)

Change-Id: I06a5d8a927497ea2141007b14a90af27e0891720
GitHub-Last-Rev: 476e646
GitHub-Pull-Request: #71865
Reviewed-on: https://go-review.googlesource.com/c/go/+/651036
Reviewed-by: Roland Shoemaker <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BugReport Issues describing a possible bug in the Go implementation. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

8 participants