@gregory-at-cvs gregory-at-cvs commented Nov 22, 2025

PR #2: Session Management Infrastructure

Description

Adds foundational infrastructure for session tracking across all telemetry signals.

This PR includes:

Core Session Utilities

  • SessionIdentifiers - Data class for session.id and session.previous_id
  • SessionExtensions - Kotlin extensions for adding session IDs to spans/logs/metrics
  • SessionIdentifierFacade - Facade pattern for consistent session access
  • SessionIdFacade - UUID fallback for session ID generation
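
A minimal sketch of the idea behind SessionIdentifiers and the extension helpers, in plain Java with a `Map` standing in for an OpenTelemetry attributes builder. The class shape, the `putInto` method name, and the blank-value handling are assumptions for illustration (the trimming mirrors the commit note about trimming the previous session id before the empty check). Only the keys `session.id` and `session.previous_id` come from the semantic convention.

```java
import java.util.Map;

// Hypothetical stand-in for the PR's SessionIdentifiers data class.
final class SessionIdentifiers {
    static final String SESSION_ID = "session.id";
    static final String SESSION_PREVIOUS_ID = "session.previous_id";

    final String currentId;
    final String previousId; // may be null or blank when there is no earlier session

    SessionIdentifiers(String currentId, String previousId) {
        this.currentId = currentId;
        this.previousId = previousId;
    }

    // Copies the identifiers into an attribute map, skipping null or blank values.
    void putInto(Map<String, String> attributes) {
        putIfNotBlank(attributes, SESSION_ID, currentId);
        putIfNotBlank(attributes, SESSION_PREVIOUS_ID, previousId);
    }

    private static void putIfNotBlank(Map<String, String> attributes, String key, String value) {
        if (value == null) {
            return;
        }
        String trimmed = value.trim();
        if (!trimmed.isEmpty()) {
            attributes.put(key, trimmed);
        }
    }
}
```

With this shape, a first session (no previous id) produces only `session.id`, so consumers never see an empty `session.previous_id` attribute.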

Metrics Infrastructure

  • SessionMetricExporterAdapter - Adapter pattern for adding session IDs to metrics
  • Decorator pattern for all metric data types (Sum, Gauge, Histogram, etc.)
  • Factory classes for creating session-enhanced metric data
  • 11 wrapper models for different metric types

Key Addition: This enables session-based metrics analytics - a capability
not present in the iOS or JavaScript SDKs! This allows correlating metrics
like memory usage, network bytes, or performance counters to specific user sessions.

Design Patterns Used

  • Adapter Pattern - Wraps metric exporters to inject session attributes
  • Factory Pattern - Creates session-enhanced metric data
  • Decorator Pattern - Adds session attributes without modifying original data
  • Facade Pattern - Simplifies session identifier access
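
As a rough illustration of how the adapter and decorator patterns listed above can combine, the sketch below wraps an exporter and hands its delegate copies of each point with a session attribute added. `MetricPoint`, `PointExporter`, and `SessionPointExporterAdapter` are hypothetical, simplified stand-ins invented for this example; they are not the classes in this PR, and the real OpenTelemetry SDK exporter and metric data types have a much larger surface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified data point (not an OpenTelemetry SDK type).
final class MetricPoint {
    final String name;
    final double value;
    final Map<String, String> attributes;

    MetricPoint(String name, double value, Map<String, String> attributes) {
        this.name = name;
        this.value = value;
        this.attributes = attributes;
    }
}

// Hypothetical, simplified exporter interface.
interface PointExporter {
    void export(List<MetricPoint> points);
}

// Adapter: sits in front of a delegate exporter and passes it decorated copies.
final class SessionPointExporterAdapter implements PointExporter {
    private final PointExporter delegate;
    private final Supplier<String> sessionIdSupplier;

    SessionPointExporterAdapter(PointExporter delegate, Supplier<String> sessionIdSupplier) {
        this.delegate = delegate;
        this.sessionIdSupplier = sessionIdSupplier;
    }

    @Override
    public void export(List<MetricPoint> points) {
        String sessionId = sessionIdSupplier.get();
        List<MetricPoint> decorated = new ArrayList<>(points.size());
        for (MetricPoint point : points) {
            // Decorator step: copy the attributes so the original point is not modified.
            Map<String, String> attributes = new HashMap<>(point.attributes);
            attributes.put("session.id", sessionId);
            decorated.add(new MetricPoint(point.name, point.value, attributes));
        }
        delegate.export(decorated);
    }
}
```

Because the session id is read at export time, the ordering of wrappers matters: in this sketch the adapter has to sit in front of any exporter that persists points, otherwise the persisted copies would lack the attribute.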

Related PRs

This is part 2 of 3 in a comprehensive session management enhancement:

Dependencies:

  • ⚠️ Depends on [PR 1] - Requires thread-safe SessionManager before building infrastructure
  • 📦 Provides foundation for [PR 3] - Core utilities will be used by instrumentation integration

Review Strategy:

  • Ready for review now - Can review for design/architecture feedback
  • ⚠️ Should not merge until PR #1 merges - Has technical dependency
  • 💡 Best reviewed alongside PR 3 - Provides context on how infrastructure is used

Type of Change

  • Bug fix
  • New feature (non-breaking)
  • Breaking change
  • Infrastructure improvement

Checklist

  • Code follows project style guidelines (spotless applied)
  • Self-review completed
  • Comprehensive test coverage added
  • New and existing tests pass locally
  • Documentation added for new components

Testing

  • Test coverage for all factories and adapters (~1,400 lines of tests)
  • Metric data transformation tests
  • Session identifier extraction tests
  • Edge cases (empty IDs, null handling, etc.)

Additional Context

Comparison with Other Platforms:

  • iOS Swift SDK: Has session support for spans/logs, but NOT metrics
  • Web JavaScript SDK: Has interface only, no automatic implementation yet
  • This Android implementation: First to support session IDs on metrics!

This infrastructure establishes the foundation for comprehensive session tracking.
PR 3 integrates this across all instrumentation modules.

Why metrics matter: Session-based metrics enable queries like:

  • "Average memory usage per session"
  • "Network bytes transferred in sessions that crashed"
  • "Session duration vs. performance metrics correlation"

None of the other SDKs can answer these questions today!
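
To make the first query concrete, here is a small sketch of "average memory usage per session" computed over exported data points that carry a session id. `MemorySample` and `averageBySession` are hypothetical names for this illustration; in practice the aggregation would run in whatever backend stores the metrics.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class SessionQuerySketch {
    // Hypothetical exported data point: a memory reading tagged with a session id.
    static final class MemorySample {
        final String sessionId;
        final double megabytes;

        MemorySample(String sessionId, double megabytes) {
            this.sessionId = sessionId;
            this.megabytes = megabytes;
        }
    }

    // Groups samples by session id and averages the readings within each group.
    static Map<String, Double> averageBySession(List<MemorySample> samples) {
        return samples.stream().collect(Collectors.groupingBy(
                sample -> sample.sessionId,
                TreeMap::new,
                Collectors.averagingDouble(sample -> sample.megabytes)));
    }
}
```

The grouping key is the session id itself, which is why this kind of query only works when the id travels with each data point.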

References

Squashed commits:
[749a138] split up the metrics exporter adapter interface and concrete class into two separate files
[1156525] addressed warnings
[1ebaf46] added the incubating annotation, added documentation, and reordered the metrics exporters to ensure the ids are applied prior to writing the metrics to disk to ensure the right ids are set
[c137908] removed redundant comments
[097eef5] spotless apply updates
[7e9fee9] added infrastructure to set the ids for metrics as well
[a582f73] Reduced the repeated code amongst the code to set the session identifiers
[9ddf60f] api and spotless apply updates
[907e06e] added trim to the previous session id before the empty check and reduced a condition for adding attributes
[e98ffc5] added documentation and integrated the log record builder with setting the session identifiers
[974f26f] enabled consistent session ID handling across spans and log record with the appenders
[260d3c3] added SessionIdFacade pattern with UUID fallback, session extensions for telemetry types, and test coverage
[df2c13d] fixed SessionManager concurrency
Fixes detekt UnusedImports warnings in:
- MetricDataTypeFactory.kt
- PointDataFactory.kt
- SessionMetricDataTypeFactory.kt

Signed-off-by: Gregory-Rasmussen_cvsh <[email protected]>
@gregory-at-cvs gregory-at-cvs force-pushed the feat/session-management-infrastructure branch from 01506ff to e4b2def Compare November 22, 2025 05:09
Add missing getPreviousSessionId() method to SessionProvider mock in
SdkPreconfiguredRumBuilderTest to match updated interface.

Signed-off-by: Gregory-Rasmussen_cvsh <[email protected]>

codecov bot commented Nov 22, 2025

Codecov Report

❌ Patch coverage is 93.33333% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.46%. Comparing base (7e5a63e) to head (6f25335).
⚠️ Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...opentelemetry/android/OpenTelemetryRumBuilder.java | 50.00% | 4 Missing and 1 partial ⚠️ |
| ...oid/agent/session/factory/SessionManagerFactory.kt | 0.00% | 4 Missing ⚠️ |
| ...ndroid/export/models/DoublePointWithSessionData.kt | 81.81% | 2 Missing ⚠️ |
| .../android/export/models/LongPointWithSessionData.kt | 81.81% | 2 Missing ⚠️ |
| ...models/ExponentialHistogramPointWithSessionData.kt | 95.00% | 1 Missing ⚠️ |
| ...oid/export/models/HistogramPointWithSessionData.kt | 94.44% | 1 Missing ⚠️ |
| .../io/opentelemetry/android/ktx/SessionExtensions.kt | 94.44% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1420      +/-   ##
==========================================
+ Coverage   63.65%   65.46%   +1.80%     
==========================================
  Files         159      180      +21     
  Lines        3134     3353     +219     
  Branches      326      341      +15     
==========================================
+ Hits         1995     2195     +200     
- Misses       1042     1058      +16     
- Partials       97      100       +3     

@fractalwrench
Member

Thanks for opening these PRs @gregory-at-cvs - it will be really great to improve the session management. My initial instinct is that this and the other PR have a very large diff that will make it hard to review the code thoroughly.

@breedx-splk do we have any sort of official/unofficial policy for acceptable PR size on this project? My usual benchmark is <500 LOC changes. Would be interested to hear your thoughts and whether it's worth making any implicit rules explicit in the contributing guide.

@gregory-at-cvs
Author

gregory-at-cvs commented Nov 24, 2025

You're welcome and thanks @fractalwrench.

@fractalwrench and @breedx-splk For this project, do you have guidelines on the number of LoC for different types of changes: architectural, wide breadth or far reaching, new features, or bug fixes?

I'll look at ways to break this PR and the integration PR down into smaller sets of changes. I'll get back to you on this.

@breedx-splk
Contributor

Thanks @gregory-at-cvs and @fractalwrench. I agree that this PR is likely too large to meaningfully review. I added some new text to the CONTRIBUTING.md in #1425 to help address the challenges around massive PRs, but I don't think that having a fixed number of lines makes sense. If we want to come up with a number for guidance, that's fine, but we should discuss that elsewhere.

@breedx-splk breedx-splk left a comment

Putting a preemptive change request here, based on size and approach.

The structure and content of the PR description lead me to believe that this PR was likely created with AI tooling, which I think would be helpful to disclose with full transparency. There is an OpenTelemetry policy about using these tools here: https://github.com/open-telemetry/community/blob/main/policies/genai.md TL;DR is that using AI tools is allowable, but I think it can be informative for reviewers to know ahead of time.

In addition to being too big, I am quite concerned about the approach presented here. We certainly don't want to create new data classes to hold existing telemetry data along with session information. Please review the current semantic convention for session here: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/session.md and note that the session.id and session.previous_id are both otel attributes -- attributes which can/will be present on spans and events. You wouldn't want to put session id on metric data points, because that would make them very high cardinality indeed.
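
To illustrate the cardinality concern, here is a minimal sketch that counts distinct time series for a single metric, assuming one series per unique attribute combination. The `network.connection.type` values and the session counts are made up for the example, and `countSeries` is a hypothetical helper rather than anything from the SDK.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CardinalitySketch {
    // Counts distinct series for one metric: one series per unique attribute combination.
    static int countSeries(List<String> connectionTypes, int sessions, boolean includeSessionId) {
        Set<String> series = new HashSet<>();
        for (int session = 0; session < sessions; session++) {
            for (String connectionType : connectionTypes) {
                String key = "network.connection.type=" + connectionType;
                if (includeSessionId) {
                    // Every session contributes its own copy of each series.
                    key += ",session.id=session-" + session;
                }
                series.add(key);
            }
        }
        return series.size();
    }
}
```

With three connection types, the series count stays at three no matter how many sessions report. Once `session.id` is part of the attribute set, the count grows linearly with sessions: 10,000 sessions yield 30,000 series for that one metric.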

@gregory-at-cvs
Author

gregory-at-cvs commented Nov 24, 2025

Thank you for your feedback, @breedx-splk and @fractalwrench .

Regarding the AI usage, thanks for the policy link.
I use AI tooling as a coding assistant. I designed the architecture, made all technical decisions, and chose the approach (including the metrics integration strategy). AI helped me implement faster and document more quickly, but I own the design. I review everything thoroughly and refine.

Regarding the PR sizes, I broke down my original changes into the three PRs and will continue to do so now that I know your preferences. That makes sense.

I can break it up by adding the changes to separate PRs for:

  1. Core Session Integration to bring the core session changes into the SDK
  2. ANR and Crash instrumentation
  3. Activity and fragment lifecycle instrumentation
  4. View click and compose click instrumentation
  5. Network, WebSocket, Slow Rendering instrumentation
  6. Startup instrumentation

If there is a different breakdown that you all would prefer or if you would like to discuss more, let’s chat.

Regarding the metrics approach, this is very helpful feedback. The opt-in approach addresses cardinality concerns by keeping it disabled by default. If you would prefer to completely remove the metrics implementation, I will do so. Which direction best works for OpenTelemetry?
I'll also simplify the approach based on your feedback.

Thanks again.

P.S. If you are ever curious, here is my personal OSS project, which shows my work from architectural design down to the code: Shuttle.

@gregory-at-cvs
Author

I will break down this PR into smaller ones.

@fractalwrench
Member

That approach for separate PRs sounds great. I don't have a preference for metrics as tbh I'm not that familiar with that part of OTel - @breedx-splk probably has more opinions than me.
