POC for go auditlog sdk by MJarmo · Pull Request #2 · apeirora/opentelemetry-go

MJarmo · 2025-10-05T17:38:21Z

POC: Go auditlog SDK (file-backed audit logging + processing)

Summary

This PR introduces a proof-of-concept audit logging SDK for Go that provides file-based storage and processing of audit log records for use with OpenTelemetry. It implements a file-backed log sink, record serialization, and basic processing/consumption primitives to enable reliable, append-only audit logging with options for batch processing and rotation.

Why

Audit logging requires append-only, durable storage and controlled processing pipelines.
This POC demonstrates a minimal, production-oriented approach for a Go auditlog SDK that can be adopted or adapted by instrumentation consumers and server-side components.

What changed (high-level)

Added file-backed audit log sink implementation for appending records to disk.
Added record serialization/deserialization and schema types for audit events.
Added processing components for reading batches, checkpointing, and safe consumption.
Added examples and a small CLI / test utilities for local experimentation.
Added tests and integration checks for core behaviors.

Design notes

The storage layer is append-only file(s) with structured record framing and checksums to ensure integrity.
A consumer/processor pattern supports reading batches and checkpointing to allow resilience and at-least-once processing semantics.
Configuration options cover file locations, rotation size/policy, consumer batch size, and durable checkpoint storage.
The POC focuses on simplicity and correctness; later work should align it with project APIs and conventions.

Compatibility & Migration

This is a new component; no breaking changes to existing APIs are expected.
Consumers must adopt SDK types and the file sink explicitly.
Configuration and migration guidance will be provided in follow-up docs once the API stabilizes.

Testing & validation

Unit tests cover serialization, basic append/read operations, and rotation.
Please add CI integration and more end-to-end tests as the API stabilizes.

Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

…mantics Relocate audit log implementation to sdk/auditlog and add integrity, status mapping, idempotency conflict checks, and explicit policy hooks for 401/403/413/429 outcomes. Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

Split storage, store, identity, and status concerns into dedicated sdk/auditlog subpackages and keep root-level API wrappers/aliases so existing call sites continue to compile while improving readability. Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

hilmarf · 2026-05-28T08:08:52Z

+type AuditLogger interface {
+	Emit(ctx context.Context, record AuditRecord) error
+	EmitWithResult(ctx context.Context, record AuditRecord) AuditEmitResult
+	Enabled(ctx context.Context, eventName string) bool


AuditLogger can't be disabled; when used, it's always enabled!

Suggested change

-	Enabled(ctx context.Context, eventName string) bool

hilmarf · 2026-05-28T08:10:53Z

+}
+
+type AuditRecordProcessor interface {
+	Enabled(ctx context.Context, param AuditEnabledParameters) bool


AuditRecords should ALWAYS be processed... so the AuditRecordProcessor is always enabled.

Suggested change

-	Enabled(ctx context.Context, param AuditEnabledParameters) bool

hilmarf · 2026-05-28T08:12:38Z

+	auditAttrActor         = "audit.actor"
+	auditAttrActorType     = "audit.actor_type"


Suggested change

-	auditAttrActor         = "audit.actor"
-	auditAttrActorType     = "audit.actor_type"
+	auditAttrActor         = "audit.actor.id"
+	auditAttrActorType     = "audit.actor.type"

hilmarf · 2026-05-28T08:14:10Z

+	auditAttrActor         = "audit.actor"
+	auditAttrActorType     = "audit.actor_type"
+	auditAttrAction        = "audit.action"
+	auditAttrResource      = "audit.resource"


maybe we should split this into two:

Suggested change

-	auditAttrResource      = "audit.resource"
+	auditAttrTargetID      = "audit.target.id"
+	auditAttrTargetType    = "audit.target.type"

hilmarf · 2026-05-28T08:14:38Z

+	auditAttrSourceIP      = "audit.source_ip"
+	auditAttrRecordID      = "audit.record_id"


Suggested change

-	auditAttrSourceIP      = "audit.source_ip"
-	auditAttrRecordID      = "audit.record_id"
+	auditAttrSourceID      = "audit.source.id"
+	auditAttrRecordID      = "audit.record.id"

hilmarf · 2026-05-28T08:14:57Z

+	auditAttrHash          = "audit.hash"
+	auditAttrSchemaVersion = "audit.schema_version"
+	auditAttrKeyID         = "audit.key_id"
+	auditAttrSequenceNo    = "audit.sequence_no"


Suggested change

-	auditAttrSequenceNo    = "audit.sequence_no"
+	auditAttrSequenceNo    = "audit.sequence.number"

hilmarf · 2026-05-28T08:15:38Z

+	auditAttrSignature     = "audit.signature"
+	auditAttrHMAC          = "audit.hmac"
+	auditAttrHash          = "audit.hash"
+	auditAttrSchemaVersion = "audit.schema_version"


Suggested change

-	auditAttrSchemaVersion = "audit.schema_version"
+	auditAttrSchemaVersion = "audit.schema.version"

hilmarf · 2026-05-28T08:21:53Z

+	if record.Body().Kind() == log.KindEmpty {
+		return newAuditStatusError(AuditErrorInvalidRequest, "audit body is required", false, nil)
+	}


I'm not 100% sure if we ALWAYS require a body... for some scenarios the auditAttr* may be sufficient. Let's allow empty bodies for now.

Suggested change

-	if record.Body().Kind() == log.KindEmpty {
-		return newAuditStatusError(AuditErrorInvalidRequest, "audit body is required", false, nil)
-	}
+	// if record.Body().Kind() == log.KindEmpty {
+	// 	return newAuditStatusError(AuditErrorInvalidRequest, "audit body is required", false, nil)
+	// }

hilmarf · 2026-05-28T08:23:28Z

+	if record.AttributesLen() == 0 {
+		return newAuditStatusError(AuditErrorInvalidRequest, "audit attributes are required", false, nil)
+	}


When is this check executed? Before or after otelRecord.AddAttributes(auditAttrs...)?

hilmarf · 2026-05-28T08:24:07Z

+	if record.SchemaVersion == "" {
+		return newAuditStatusError(AuditErrorInvalidRequest, "audit schema_version is required", false, nil)
+	}


should not be mandatory

Suggested change

-	if record.SchemaVersion == "" {
-		return newAuditStatusError(AuditErrorInvalidRequest, "audit schema_version is required", false, nil)
-	}

hilmarf · 2026-05-28T08:29:12Z

Why are these two files, sdk/log/go.mod and sdk/log/go.sum, removed?

hilmarf · 2026-05-28T13:50:01Z

+	Priority int
+}
+
+type PriorityQueue []PriorityRecord


The longer I think about this, the more I believe it's not a good idea to use any PriorityQueue. Receivers might get very confused when logs don't arrive in time sequence. Can we please stick with FIFO for now?

if we wanna be really flexible, then we can also keep the PriorityQueue, but allow users to inject the comparing function (something which compares two log-records)
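The two options discussed here (plain FIFO, or a queue whose ordering is injected by the user) can live behind one type. The sketch below is an illustration only: `Record`, `Queue`, `NewQueue`, and the comparator signature are hypothetical names, not the PR's API. With a nil comparator the queue is strictly FIFO, and ties under a custom comparator also fall back to arrival order.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Record is a hypothetical stand-in for the SDK's queued record type.
type Record struct {
	ID       string
	Priority int
}

// entry pairs a record with its arrival sequence number.
type entry struct {
	rec Record
	seq uint64
}

// recordHeap implements heap.Interface. With a nil less function it orders
// purely by arrival sequence, which makes it a FIFO queue.
type recordHeap struct {
	entries []entry
	less    func(a, b Record) bool
}

func (h *recordHeap) Len() int { return len(h.entries) }

func (h *recordHeap) Less(i, j int) bool {
	a, b := h.entries[i], h.entries[j]
	if h.less != nil {
		if h.less(a.rec, b.rec) {
			return true
		}
		if h.less(b.rec, a.rec) {
			return false
		}
	}
	return a.seq < b.seq // ties (and the nil case) fall back to arrival order
}

func (h *recordHeap) Swap(i, j int) { h.entries[i], h.entries[j] = h.entries[j], h.entries[i] }
func (h *recordHeap) Push(x any)    { h.entries = append(h.entries, x.(entry)) }

func (h *recordHeap) Pop() any {
	last := h.entries[len(h.entries)-1]
	h.entries = h.entries[:len(h.entries)-1]
	return last
}

// Queue is FIFO by default; pass a comparator to NewQueue to change the order.
type Queue struct {
	h    recordHeap
	next uint64
}

func NewQueue(less func(a, b Record) bool) *Queue {
	return &Queue{h: recordHeap{less: less}}
}

func (q *Queue) Enqueue(r Record) {
	heap.Push(&q.h, entry{rec: r, seq: q.next})
	q.next++
}

func (q *Queue) Dequeue() (Record, bool) {
	if q.h.Len() == 0 {
		return Record{}, false
	}
	return heap.Pop(&q.h).(entry).rec, true
}

// drain empties the queue and returns record IDs in dequeue order.
func drain(q *Queue) []string {
	var ids []string
	for r, ok := q.Dequeue(); ok; r, ok = q.Dequeue() {
		ids = append(ids, r.ID)
	}
	return ids
}

func main() {
	records := []Record{{"a", 1}, {"b", 5}, {"c", 1}}

	fifo := NewQueue(nil)
	byPriority := NewQueue(func(a, b Record) bool { return a.Priority > b.Priority })
	for _, r := range records {
		fifo.Enqueue(r)
		byPriority.Enqueue(r)
	}
	fmt.Println(drain(fifo))       // [a b c]
	fmt.Println(drain(byPriority)) // [b a c]
}
```

Keeping FIFO as the zero-configuration behavior means receivers see records in emit order unless a caller explicitly opts into something else.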

These files were deleted by mistake in the auditlog relocation commit. sdk/log remains a separate module; only sdk/auditlog was intended to be added as a new module. Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

…alidation Remove Enabled from AuditLogger and AuditRecordProcessor, rename exported audit attribute keys to dotted form, auto-generate record IDs, relax body and schema validation, export queued records in FIFO order, and update tests and docs. Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

Introduce sdk/auditlog/stresstest with a mock OTLP HTTP receiver, shared harness, and end-to-end tests for eventual delivery, file-store crash recovery, retry limits, FIFO ordering, wait-on-export semantics, storage write modes, and concurrent emit. Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

Strengthen rejected-record validation by asserting rejected record IDs never reach the sink, raise default stress volume to 200k, and clean AUDIT_LOG_README content so it matches the current auditlog implementation. Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

hilmarf · 2026-06-02T13:36:07Z

Code review

Found 6 issues (compliance with CLAUDE.md / otel-audit-logging spec):

1. emit() is not synchronous by default — violates at-least-once delivery contract

WaitOnExport defaults to false in the processor builder, so EmitWithResult returns immediately with status 202 "queued" without blocking for sink acknowledgement. CLAUDE.md §2.2 says "emit is synchronous by default: blocks until the sink acknowledges" and §4.1 step 6 says "Block until the exporter obtains a successful acknowledgement."

opentelemetry-go/sdk/auditlog/audit_logger_api.go

Lines 203 to 221 in 0762a93

    
    	}
    }
    if l.provider.shouldWaitOnExport() {
    	for _, p := range l.provider.processors {
    		if err := p.ForceFlush(ctx); err != nil {
    			mappedErr := newAuditStatusError(AuditErrorUnavailable, "processor_flush_failed", true, err)
    			result.StatusCode, result.Status, result.Reason = mapAuditError(mappedErr)
    			result.RetryAfter = time.Second
    			return result
    		}
    	}
    	result.StatusCode = 200
    	result.Status = "delivered"
    	result.SinkTimestamp = time.Now().UTC()
    } else {
    	result.StatusCode = 202
    	result.Status = "queued"
    	result.QueuedAt = queuedAt
    }

2. audit.target.id is enforced as mandatory — spec lists it as optional

validateRequiredAuditRecord returns a hard error when audit.target.id is empty. CLAUDE.md §3.2 (Mandatory Attributes) does not include audit.target.id; it appears only in §3.3 (Optional / Recommended). This breaks valid callers emitting system-level events with no target.

opentelemetry-go/sdk/auditlog/audit_logger_api.go

Lines 240 to 244 in 0762a93

    
    }
    targetID, _ := auditTargetFields(record)
    if targetID == "" {
    	return newAuditStatusError(AuditErrorInvalidRequest, "audit target id is required", false, nil)
    }

3. Non-spec integrity attribute names — breaks interoperability with the Collector

The SDK emits audit.signature, audit.hmac, audit.hash, audit.key_id, and audit.prev_hash (underscore). CLAUDE.md §3.3 defines a single attribute audit.integrity.value for the cryptographic proof, audit.integrity.certificate for the key reference (Resource-level), and audit.prev.hash (dot-separated) for hash chaining. A Collector or consumer following the spec will not find these attributes.

opentelemetry-go/sdk/auditlog/audit_logger_api.go

Lines 52 to 59 in 0762a93

    
    auditAttrRecordID      = "audit.record.id"
    auditAttrSignature     = "audit.signature"
    auditAttrHMAC          = "audit.hmac"
    auditAttrHash          = "audit.hash"
    auditAttrSchemaVersion = "audit.schema.version"
    auditAttrKeyID         = "audit.key_id"
    auditAttrSequenceNo    = "audit.sequence.number"
    auditAttrPrevHash      = "audit.prev_hash"

4. Records silently dropped when MaxAttempts is exhausted — no hard error, no dropped counter

In handleExportFailure, when the retry budget is exhausted (nextAttempt > maxAttempts), only ExceptionHandler.Handle is called and the function returns — the records are neither re-enqueued nor surfaced as a hard error to the emit caller. CLAUDE.md §4.5 says "Hard error — surface to caller when retry budget or buffer exhausted; MUST NOT silently drop" and §4.7 requires incrementing audit.records.dropped.

opentelemetry-go/sdk/auditlog/audit_processor.go

Lines 537 to 547 in 0762a93

    
    maxAttempts := p.config.RetryPolicy.MaxAttempts
    if maxAttempts > 0 && int(nextAttempt) > maxAttempts {
    	p.config.ExceptionHandler.Handle(&AuditException{
    		Message:    fmt.Sprintf("Failed to export audit log records after %d retry attempts", maxAttempts),
    		Cause:      cause,
    		Context:    context.Background(),
    		LogRecords: records,
    	})
    	return
    }

5. No SDK observability metrics implemented

The entire sdk/auditlog package has zero metric instrumentation. CLAUDE.md §4.7 requires five specific metrics: audit.records.emitted, audit.records.exported, audit.records.dropped, audit.queue.depth, audit.export.duration. The spec states a non-zero audit.records.dropped MUST trigger a critical operational alert — which is impossible without the counter.

opentelemetry-go/sdk/auditlog/audit_processor.go

Lines 519 to 548 in 0762a93

    
    func (p *AuditLogProcessor) handleExportFailure(records []Record, cause error) {
    	if p.config.StorageWriteMode == AuditStorageWriteOnError {
    		storeCtx := context.Background()
    		for _, record := range records {
    			recordCopy := record
    			if err := p.config.AuditLogStore.Save(storeCtx, &recordCopy); err != nil {
    				p.config.ExceptionHandler.Handle(&AuditException{
    					Message:    "Failed to save failed export record to audit store",
    					Cause:      err,
    					Context:    storeCtx,
    					LogRecords: []Record{recordCopy},
    				})
    			}
    		}
    	}
    	nextAttempt := p.currentRetryAttempt.Add(1)
    	p.lastRetryTimestamp.Store(time.Now().UnixMilli())
    	maxAttempts := p.config.RetryPolicy.MaxAttempts
    	if maxAttempts > 0 && int(nextAttempt) > maxAttempts {
    		p.config.ExceptionHandler.Handle(&AuditException{
    			Message:    fmt.Sprintf("Failed to export audit log records after %d retry attempts", maxAttempts),
    			Cause:      cause,
    			Context:    context.Background(),
    			LogRecords: records,
    		})
    		return
    	}

6. Default integrity config is deny-all — zero-config provider rejects every record

defaultRequiredIntegrity() returns AuditIntegrityHMAC | AuditIntegritySignature, so a provider created with no HMAC key and no signing certificate will fail satisfiesRequiredIntegrity on every record. CLAUDE.md §3.3 and §1 describe integrity proofs as optional. The correct default should be 0 (no integrity required), not deny-all.

opentelemetry-go/sdk/auditlog/audit_integrity_config.go

Lines 43 to 45 in 0762a93

    
    func defaultRequiredIntegrity() AuditIntegrityFields {
    	return AuditIntegrityHMAC | AuditIntegritySignature
    }
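The effect of the default is easy to see with a small bitmask model. The sketch below assumes that the integrity check means "every required bit must be present"; the type and function names are shortened, hypothetical versions of the ones in the PR, and the real `satisfiesRequiredIntegrity` may differ.

```go
package main

import "fmt"

// IntegrityFields is a bitmask of integrity proofs, modeled on the PR's
// AuditIntegrityFields. The bit values here are illustrative.
type IntegrityFields uint8

const (
	IntegrityHMAC IntegrityFields = 1 << iota
	IntegritySignature
)

// defaultRequired returns 0, so a zero-config provider requires no proofs.
func defaultRequired() IntegrityFields { return 0 }

// satisfies reports whether every required proof is present on the record.
// This "all required bits set" rule is an assumption about the real check.
func satisfies(required, present IntegrityFields) bool {
	return present&required == required
}

func main() {
	none := IntegrityFields(0)

	// Old default: both proofs required, so an unsigned record is rejected.
	fmt.Println(satisfies(IntegrityHMAC|IntegritySignature, none)) // false

	// Proposed default: nothing required, so the same record is accepted.
	fmt.Println(satisfies(defaultRequired(), none)) // true

	// Opting in to HMAC still enforces it.
	fmt.Println(satisfies(IntegrityHMAC, IntegritySignature)) // false
}
```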

🤖 Generated with Claude Code


hilmarf · 2026-06-02T14:27:52Z

+	auditAttrSchemaVersion = "audit.schema.version"
+	auditAttrKeyID         = "audit.key_id"
+	auditAttrSequenceNo    = "audit.sequence.number"
+	auditAttrPrevHash      = "audit.prev_hash"


Suggested change

-	auditAttrPrevHash      = "audit.prev_hash"
+	auditAttrPrevHash      = "audit.prev.hash"

hilmarf · 2026-06-02T14:45:24Z

+	targetID, _ := auditTargetFields(record)
+	if targetID == "" {
+		return newAuditStatusError(AuditErrorInvalidRequest, "audit target id is required", false, nil)
+	}


not sure if we'll have a target in all cases... e.g. user logs in - what's the target?
let's start with target being optional

Suggested change

-	targetID, _ := auditTargetFields(record)
-	if targetID == "" {
-		return newAuditStatusError(AuditErrorInvalidRequest, "audit target id is required", false, nil)
-	}

Introduce go.opentelemetry.io/otel/audit with AuditReceipt and SdkAuditProvider. Add JCS HMAC integrity attributes, synchronous export by default, OTLP /v1/audit exporter, export receipts, and spec alignment tests. Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

Emit spec audit.records.* metrics, warn on NTP/timestamp skew, clone records in the processor queue, and treat OTLP partial success as export failure. Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

…op errors. Make integrity proofs optional by default, export only spec-aligned integrity attributes, and surface hard errors to synchronous emit callers when retry budget is exhausted. Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

Co-authored-by: Hilmar Falkenberg <hilmar.falkenberg@sap.com> Signed-off-by: MJarmo <38920471+MJarmo@users.noreply.github.com>

Move integrity algorithm and certificate to resource attributes, validate UUID v4 record IDs, normalize field casing, omit severity on export, add SinkTimestampNanos to AuditReceipt, and use audit.integrity.value with IntegrityAlgorithm for all proofs (HMAC, hash, signature). Signed-off-by: MJarmo <michal.jarmolkiewicz@sap.com>

MJarmo force-pushed the AuditLog branch from be2e862 to 91b4b5e · October 5, 2025 18:15

hilmarf added this to OTel-Audit-Logging Oct 14, 2025

hilmarf moved this to In progress in OTel-Audit-Logging Oct 14, 2025

hilmarf added the DO-NOT-MERGE label Nov 25, 2025

hilmarf marked this pull request as ready for review November 25, 2025 12:31

MJarmo force-pushed the AuditLog branch 2 times, most recently from 060194f to dd4e6cc · May 13, 2026 09:27

hilmarf self-requested a review May 13, 2026 13:12

hilmarf self-assigned this May 13, 2026