@aaronc (Member) commented Oct 29, 2025

Description

This PR:

  • adds out-of-the-box global OpenTelemetry declarative configuration support to the telemetry/ package via https://pkg.go.dev/go.opentelemetry.io/contrib/otelconf/v0.3.0
  • deprecates all the existing methods in telemetry/, which are based on https://pkg.go.dev/github.com/hashicorp/go-metrics; that library is under-maintained and requires mutex locks and map lookups on every telemetry call
  • provides default routing of go-metrics telemetry data to OpenTelemetry when OpenTelemetry is enabled
  • instruments BaseApp with OpenTelemetry spans and with block and tx counter metrics
  • integrates OpenTelemetry shutdown into the server and provides a TestingMain function for tests to export telemetry data
  • configures the log/slog default logger to send logs to OpenTelemetry and allows the otelslog bridge to be used for logging. NOTE: this leaves a bit of a disconnect with the SDK's existing logging infrastructure, which currently just writes to stdout; this can either be dealt with in this PR or in a follow-up

The OpenTelemetry Go libraries are very actively maintained, most vendors in the space are adding OpenTelemetry support, and generally the industry seems to be heading in this direction. Much of our existing telemetry code exists only to configure basic telemetry exporting; with otelconf declarative config we don't need to maintain any of this ourselves, and the out-of-the-box experience is quite simple, even for use in testing.
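The deprecation bullet above says go-metrics needs mutex locks and map lookups on every telemetry call. The sketch below is a simplified, hypothetical model of that per-call cost next to a counter handle that is created once and then reused. It is not a model of how go-metrics or the OpenTelemetry SDK is actually implemented, and `mapSink` and `counter` are invented for illustration.

```go
package main

import (
	"strings"
	"sync"
	"sync/atomic"
)

// mapSink is a hypothetical model of a keyed metrics sink: every IncrCounter
// call flattens the key, takes a lock, and does a map lookup.
type mapSink struct {
	mu       sync.Mutex
	counters map[string]float64
}

func newMapSink() *mapSink { return &mapSink{counters: make(map[string]float64)} }

func (s *mapSink) IncrCounter(key []string, val float32) {
	k := strings.Join(key, ".") // key flattening also runs on every call
	s.mu.Lock()
	s.counters[k] += float64(val)
	s.mu.Unlock()
}

// counter models a handle registered once up front and then reused; in this
// sketch the hot path is a single atomic add with no lock and no map lookup.
type counter struct{ n atomic.Int64 }

func (c *counter) Add(delta int64) { c.n.Add(delta) }
func (c *counter) Value() int64    { return c.n.Load() }
```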

Closes: SDK-427

github-actions bot removed the C:log label Oct 31, 2025
_ "cosmossdk.io/api/cosmos/app/v1alpha1"
fmt "fmt"
io "io"
reflect "reflect"

CodeQL code scanning notice (Sensitive package import, Note): Certain system packages contain functions which may be a possible source of non-determinism.

import (
fmt "fmt"
io "io"
reflect "reflect"

CodeQL code scanning notice (Sensitive package import, Note): Certain system packages contain functions which may be a possible source of non-determinism.

// InitializeOpenTelemetry initializes the OpenTelemetry SDK.
// We assume that the otel configuration file is in `~/.<your_node_home>/config/otel.yaml`.
// An empty otel.yaml is automatically placed in the directory above in the `appd init` command.
func InitializeOpenTelemetry(homeDir string) error {
Member Author

I'd prefer we just pass in a filename here instead of a dir. Then the user can use whatever filename they want, and it would be better for testing too.

Member Author

Also, a more future-proof approach here would be either an options struct, e.g.:

type Options struct {
  ConfigFilename string
}

func InitializeOpenTelemetry(opts Options) error {

or variadic options, e.g.:

type Option interface {
  applyOption(*options)
}

func WithConfigFile(filename string) Option { ... }

func InitializeOpenTelemetry(opts ...Option) error {

FWIW, otel-go universally uses the ...Option pattern.
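A runnable sketch of the variadic-options shape from the comment above. The `optionFunc` adapter and `resolveOptions` helper are hypothetical (neither is in the PR), and this `InitializeOpenTelemetry` only resolves and validates its options; a real implementation would go on to parse the file and build the SDK.

```go
package main

import "errors"

// options holds resolved settings; it is unexported so callers can only
// change it through Option values.
type options struct {
	configFile string
}

// Option matches the interface proposed in the comment; the unexported
// method keeps the set of options closed to this package.
type Option interface {
	applyOption(*options)
}

// optionFunc adapts a plain function to the Option interface.
type optionFunc func(*options)

func (f optionFunc) applyOption(o *options) { f(o) }

// WithConfigFile sets the path of the OpenTelemetry YAML config file.
func WithConfigFile(filename string) Option {
	return optionFunc(func(o *options) { o.configFile = filename })
}

// resolveOptions applies opts in order, so later options override earlier ones.
func resolveOptions(opts ...Option) options {
	var o options
	for _, opt := range opts {
		opt.applyOption(&o)
	}
	return o
}

// InitializeOpenTelemetry is a hypothetical sketch: it only checks that a
// config file was supplied and does not start any SDK.
func InitializeOpenTelemetry(opts ...Option) error {
	o := resolveOptions(opts...)
	if o.configFile == "" {
		return errors.New("no OpenTelemetry config file given")
	}
	return nil
}
```

A practical upside of this shape is that new settings (say, a resource name or a shutdown timeout) can be added later as new `With...` functions without changing the function signature.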

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 46.26506% with 223 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.43%. Comparing base (3516334) to head (6c2a994).
⚠️ Report is 5 commits behind head on main.

| File with missing lines | Patch % | Lines missing |
|---|---|---|
| telemetry/config.go | 1.47% | 134 |
| telemetry/compat.go | 0.00% | 59 |
| server/start.go | 0.00% | 10 |
| telemetry/testing.go | 0.00% | 7 |
| types/module/module.go | 88.63% | 5 |
| baseapp/baseapp.go | 94.59% | 2 |
| client/cmd.go | 0.00% | 2 |
| telemetry/metrics.go | 0.00% | 2 |
| contrib/x/crisis/abci.go | 0.00% | 1 |
| x/genutil/client/cli/init.go | 66.66% | 1 |
Additional details and impacted files

@@            Coverage Diff             @@
##             main   #25516      +/-   ##
==========================================
+ Coverage   70.35%   70.43%   +0.08%     
==========================================
  Files         825      833       +8     
  Lines       53809    54387     +578     
==========================================
+ Hits        37857    38308     +451     
- Misses      15952    16079     +127     
| File with missing lines | Coverage Δ |
|---|---|
| baseapp/abci.go | 86.90% <100.00%> (+0.78%) ⬆️ |
| baseapp/grpcserver.go | 74.07% <100.00%> (ø) |
| baseapp/state/manager.go | 84.37% <100.00%> (+2.23%) ⬆️ |
| baseapp/state/state.go | 85.71% <ø> (ø) |
| blockstm/stm.go | 79.31% <100.00%> (ø) |
| server/api/server.go | 66.31% <ø> (ø) |
| server/config/config.go | 93.10% <ø> (ø) |
| server/config/toml.go | 81.81% <ø> (ø) |
| server/grpc/server.go | 45.00% <100.00%> (+0.69%) ⬆️ |
| telemetry/wrapper.go | 57.14% <ø> (ø) |
... and 29 more

... and 10 files with indirect coverage changes

baseapp/abci.go Outdated
// are used as part of gRPC paths
if grpcHandler := app.grpcQueryRouter.Route(req.Path); grpcHandler != nil {
-return app.handleQueryGRPC(grpcHandler, req), nil
+return app.handleQueryGRPC(grpcHandler, req, ctx), nil
Contributor

It makes more sense to have ctx be the first argument here.

server/start.go Outdated
traceCleanupFn()

// shutdown telemetry with a 5 second timeout
shutdownCtx, _ := context.WithTimeout(context.Background(), 5*time.Second)
Contributor

Why is it OK to discard this cancel function?

)

var (
sdk *otelconf.SDK
Contributor

Can we give this var a more descriptive name?

// parse cosmos extra config
var extraCfg extraConfig
err = yaml.Unmarshal(bz, &extraCfg)
Contributor

So we can support all of these as sinks simultaneously?

Contributor

Yes, these are all independent exporters. The config just tells otel to forward:

  • metrics if metrics_file is set
  • traces if trace_file is set
  • logs if logs_file is set
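Because each exporter is gated by its own field, any combination can be active at once. A minimal sketch of that selection logic; `extraFileConfig` and `enabledFileExporters` are hypothetical names whose fields mirror three fields of the `cosmosExtra` struct in this PR, and the real code presumably builds exporters rather than returning signal names.

```go
package main

// extraFileConfig is a hypothetical subset of the cosmos extra config.
type extraFileConfig struct {
	TraceFile   string
	MetricsFile string
	LogsFile    string
}

// enabledFileExporters lists the signals that get a file exporter. The checks
// are independent, so zero, one, two, or all three can be enabled together.
func enabledFileExporters(cfg extraFileConfig) []string {
	var signals []string
	if cfg.MetricsFile != "" {
		signals = append(signals, "metrics")
	}
	if cfg.TraceFile != "" {
		signals = append(signals, "traces")
	}
	if cfg.LogsFile != "" {
		signals = append(signals, "logs")
	}
	return signals
}
```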

@Eric-Warehime (Contributor) left a comment

LGTM. Please document the additional config values.

Comment on lines 250 to 258
type cosmosExtra struct {
	TraceFile           string   `json:"trace_file" yaml:"trace_file" mapstructure:"trace_file"`
	MetricsFile         string   `json:"metrics_file" yaml:"metrics_file" mapstructure:"metrics_file"`
	MetricsFileInterval string   `json:"metrics_file_interval" yaml:"metrics_file_interval" mapstructure:"metrics_file_interval"`
	LogsFile            string   `json:"logs_file" yaml:"logs_file" mapstructure:"logs_file"`
	InstrumentHost      bool     `json:"instrument_host" yaml:"instrument_host" mapstructure:"instrument_host"`
	InstrumentRuntime   bool     `json:"instrument_runtime" yaml:"instrument_runtime" mapstructure:"instrument_runtime"`
	Propagators         []string `json:"propagators" yaml:"propagators" mapstructure:"propagators"`
}
Contributor

Can you document what these fields are and the potential valid values for them?

@aaronc (Member Author), Dec 4, 2025

I would also note that this stuff should be considered experimental. Otel contrib will update their configuration setup upstream, and depending on what they do we may want to change this. I expect they will soon include native support for dumping to a file, and they should be adding support for propagators soon too.

@aljo242 (Contributor) commented Dec 4, 2025

lint failing just on deprecated fields

@technicallyty (Contributor)

lint failing just on deprecated fields

should be fixed

6 participants