Skip to content

Using xray.SQLContext with SetMaxOpenConns/SetConnMaxLifetime sometimes "failed to begin subsegment". #391

Open
@vbqz

Description

@vbqz

Hi,

My team bumped into failed to begin subsegment panics when using SQLContext together with setting limits on connection numbers and durations.

With aws-xray-sdk-go v1.8.0 the problem still occurs, but logs instead of panics (which is great!). But I wanted to check if we are misusing the SDK in anyway, and give you a heads up if there is some kind of race condition here.

Issue

When using a database connection created like:

instance, err = xray.SQLContext("postgres", url)
...
instance.SetMaxOpenConns(10)
instance.SetConnMaxLifetime(30 * time.Second)

Executing queries with instrumented contexts occasionally caused panics like:

"LOG panic: failed to begin subsegment named 'xyz@localhost': segment cannot be found."

From what I can tell, there is a small chance that database/sql will try to open a fresh conn using context.Background(), instead of using the caller context, or re-using an existing connection.

Reprod

I managed to come up with this slightly contrived program (tons of connections and very low lifetimes) that reliably triggers the failure on my machine:

package main

import (
	"context"
	"database/sql"
	"log"
	"net"
	"time"

	"github.com/aws/aws-xray-sdk-go/xray"

	// "sqlite" database driver.
	_ "modernc.org/sqlite"
)

func initDB() (*sql.DB, error) {
	db, err := xray.SQLContext("sqlite", "tmp.sqlite")
	if err != nil {
		return nil, err
	}

	db.SetMaxOpenConns(10)
	db.SetConnMaxLifetime(time.Millisecond)

	return db, nil

}

func oneRequest(db *sql.DB) error {
	var err error

	ctx, seg := xray.BeginSegment(context.Background(), "main")
	defer seg.Close(err)

	_, err = db.ExecContext(ctx, "select sqlite_version();")
	if err != nil {
		return err
	}

	return nil
}

func xraySink() {
	pc, err := net.ListenPacket("udp", ":2000")
	if err != nil {
		log.Fatalf("listen: %s", err)
	}
	defer pc.Close()

	buf := make([]byte, 1024)
	for {
		pc.ReadFrom(buf)
	}
}

func main() {

	// Give X-ray something to log to.
	go xraySink()

	db, err := initDB()
	if err != nil {
		log.Fatalf("open DB: %s", err)
	}

	start := make(chan struct{})

	for i := 0; i < 10_000; i++ {
		go func(i int) {
			<-start
			err := oneRequest(db)
			if err != nil {
				log.Fatalf("oneRequest failed: %s", err)
			}
		}(i)
	}
	close(start)
	time.Sleep(time.Second)
}

The stacktrace at the failed subsegment is:

0  0x00000001029a3e90 in github.com/aws/aws-xray-sdk-go/xray.BeginSubsegment
   at /Users/viktor/go/pkg/mod/github.com/aws/[email protected]/xray/segment.go:283
1  0x000000010299db84 in github.com/aws/aws-xray-sdk-go/xray.Capture
   at /Users/viktor/go/pkg/mod/github.com/aws/[email protected]/xray/capture.go:19
2  0x00000001029ae85c in github.com/aws/aws-xray-sdk-go/xray.(*driverConnector).Connect
   at /Users/viktor/go/pkg/mod/github.com/aws/[email protected]/xray/sql_go110.go:59
3  0x0000000102267f58 in database/sql.(*DB).openNewConnection
   at /Users/viktor/local/go/src/database/sql/sql.go:1238
4  0x0000000102267ec0 in database/sql.(*DB).connectionOpener
   at /Users/viktor/local/go/src/database/sql/sql.go:1228
5  0x0000000102265a00 in database/sql.OpenDB.func1
   at /Users/viktor/local/go/src/database/sql/sql.go:792
6  0x00000001021a47e4 in runtime.goexit
   at /Users/viktor/local/go/src/runtime/asm_arm64.s:1172

So its the connectionOpener goroutine that gets a request for another connection, and tries to do it with context.Background().

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions