Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CreateProxyTxn is called twice for a reqID #387

Closed
trentm opened this issue Apr 25, 2023 · 2 comments
Closed

CreateProxyTxn is called twice for a reqID #387

trentm opened this issue Apr 25, 2023 · 2 comments
Labels
aws-λ-extension AWS Lambda Extension bug Something isn't working

Comments

@trentm
Copy link
Member

trentm commented Apr 25, 2023

/cc @lahsivjar

APM AWS Lambda extension version (apm-aws-lambda version): latest main (commit 02f4ef1) plus a patch for some debug logging

Describe the bug

I believe the extension is reporting a transaction twice in a certain case -- at least it appears that CreateProxyTxn is called twice. My case is when (a) using the extension and Node.js APM agent layers that support the recent partial transactions work and (b) a lambda function executes that results in terminating the function process (via process.exit(42) in my case). This results in the extension receiving the platform.runtimeDone event from the Logs API, and then the SHUTDOWN event. The handlers for both of those events result in calling b.finalizeInvocation(...).

I'll show full repro steps below, but the annotated log is here: https://gist.github.com/trentm/66ebb5eb18c43eea0a8c013e193a41e6

My guess at the issue: should this code be deleting reqID from b.invocations?

func (b *Batch) OnLambdaLogRuntimeDone(reqID, status string, time time.Time) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.finalizeInvocation(reqID, status, time)
}

To Reproduce

  • create a new Lambda function, mine is named trentm-play-fn3

    name: trentm-play-fn3
    runtime: Node.js 18.x
    architecture: x86_64
    advanced settings/
        [x] Enable function URL
            Auth type: NONE
    

    Function URL: https://[REDACTED].lambda-url.us-west-2.on.aws/
    You don't have to enable a function URL, but it is a convenient way to invoke the function.

  • Handler code.
    WARNING: The AWS Lambda console defaults to creating an index.mjs handler
    file. This uses ES modules that the Node.js APM agent doesn't currently support.
    You must (a) delete index.mjs and (b) create a index.js file with:

    module.exports.handler = async (event) => {
        console.log('[handler] going to process.exit now');
        process.exit(42);
    };
  • Add layers to get the latest (unreleased) ext change, and the draft Node.js APM
    changes (feat: pre-register partial transactions in AWS Lambda apm-agent-nodejs#3285) to make the
    POST /register/transaction calls to the extension. You can use the following, or build your own:

    arn:aws:lambda:us-west-2:627286350134:layer:trentm-play-apm-lambda-extension:8
    arn:aws:lambda:us-west-2:627286350134:layer:trentm-dev-elastic-apm-node:53
    

    See note below for the patch to the extension in my build (for debug logging).

  • Configure envvars for the Lambda:

    NODE_OPTIONS=-r elastic-apm-node/start
    ELASTIC_APM_LAMBDA_APM_SERVER=https://trentm-play11.apm.us-west2.gcp.elastic-cloud.com:9243
    ELASTIC_APM_SECRET_TOKEN=[REDACTED]
    ELASTIC_APM_LOG_LEVEL=trace
    
  • Call the Lambda function once. We expect a 502 because it will crash.

    % curl -i https://[REDACTED].lambda-url.us-west-2.on.aws/
    HTTP/1.1 502 Bad Gateway
    Date: Tue, 25 Apr 2023 22:30:32 GMT
    Content-Type: application/json
    Content-Length: 21
    Connection: keep-alive
    x-amzn-RequestId: 343754cf-7763-402d-a7d8-0d2e5bda956a
    X-Amzn-Trace-Id: root=1-64485487-0126d8ea0a9690263d1bb94b;sampled=0;lineage=d073293c:0
    
    Internal Server Error
    
  • Look at the logs via aws logs tail --since 10s --follow /aws/lambda/trentm-play-fn3
    Gist of the annotated log output: https://gist.github.com/trentm/66ebb5eb18c43eea0a8c013e193a41e6

Patch to apm-aws-lambda.git for debug logging

the diff
diff --git a/accumulator/batch.go b/accumulator/batch.go
index 61618a4..5db1869 100644
--- a/accumulator/batch.go
+++ b/accumulator/batch.go
@@ -25,6 +25,7 @@ import (
        "time"

        "github.com/tidwall/gjson"
+       "go.uber.org/zap"
 )

 var (
@@ -79,15 +80,17 @@ type Batch struct {
        // invoke lifecycle then it is possible to receive the agent init request
        // before extension invoke is registered.
        currentlyExecutingRequestID string
+       logger                      *zap.SugaredLogger
 }

 // NewBatch creates a new BatchData which can accept a
 // maximum number of entries as specified by the arguments.
-func NewBatch(maxSize int, maxAge time.Duration) *Batch {
+func NewBatch(maxSize int, maxAge time.Duration, logger *zap.SugaredLogger) *Batch {
        return &Batch{
                invocations: make(map[string]*Invocation),
                maxSize:     maxSize,
                maxAge:      maxAge,
+               logger:      logger,
        }
 }

@@ -311,6 +314,7 @@ func (b *Batch) finalizeInvocation(reqID, status string, time time.Time) error {
        if !ok {
                return fmt.Errorf("invocation for requestID %s does not exist", reqID)
        }
+       b.logger.Infof("XXX call inc.CreateProxyTxn: status=%s time=%s", status, time)
        proxyTxn, err := inc.CreateProxyTxn(status, time)
        if err != nil {
                return err
diff --git a/app/app.go b/app/app.go
index 23ef134..d981c00 100644
--- a/app/app.go
+++ b/app/app.go
@@ -61,7 +61,6 @@ func New(ctx context.Context, opts ...ConfigOption) (*App, error) {

        app := &App{
                extensionName: c.extensionName,
-               batch:         accumulator.NewBatch(defaultMaxBatchSize, defaultMaxBatchAge),
        }

        var err error
@@ -70,6 +69,8 @@ func New(ctx context.Context, opts ...ConfigOption) (*App, error) {
                return nil, err
        }

+       app.batch = accumulator.NewBatch(defaultMaxBatchSize, defaultMaxBatchAge, app.logger)
+
        apmServerAPIKey, apmServerSecretToken, err := loadAWSOptions(ctx, c.awsConfig, app.logger)
        if err != nil {
                return nil, err
@trentm trentm added the bug Something isn't working label Apr 25, 2023
@github-actions github-actions bot added the aws-λ-extension AWS Lambda Extension label Apr 25, 2023
@lahsivjar
Copy link
Contributor

lahsivjar commented Apr 26, 2023

at least it appears that CreateProxyTxn is called twice.

This is expected. CreateProxyTxn can be called multiple times. Once an invocation is finalized, we mark it as such (ref). In CreateProxyTxn we check if it really needs to create a proxy transaction - here we check if the finalized flag is set (ref).

I have created a PR to rename CreateProxyTxn to MaybeCreateProxyTxn to better indicate what the method does.

My guess at the issue: should this code be deleting reqID from b.invocations?

runtimeDone is not the last event for the invocation. After runtimeDone we receive the platformReport log, this is the event which triggers cleanup from the invocations datastructure. We use some of the invocation data to enrich the platformReport and that is why we don't cleanup invocations when we receive runtimeDone. Note that platfromReport will not be reported for the last invocation in case of failures, this is known (ref) but there is no known solution for this.

Are you observing duplicate transactions being reported by the extension?

@trentm
Copy link
Member Author

trentm commented Apr 26, 2023

Are you observing duplicate transactions being reported by the extension?

No. My bad for jumping the gun on this. Thanks for the update.

@trentm trentm closed this as completed Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws-λ-extension AWS Lambda Extension bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants