V2 serialization format for wide columns with blob references
Introduce a new V2 serialization format for wide column entities that
supports storing individual column values in blob files. The V2 format
adds a column type section that marks each column as either inline or
blob-index, enabling per-column blob storage for large values.
Key additions:
- SerializeWithBlobIndices(): serialize entities with blob column references
- DeserializeColumns(): deserialize V2 entities extracting blob column metadata
- HasBlobColumns(): lightweight check for V2 blob references
- GetVersion(): return the format version of a serialized entity
- ResolveEntityBlobColumns(): resolve all blob references in an entity
- Updated Deserialize() and GetValueOfDefaultColumn() for V2 compatibility
CLAUDE.md: 36 additions & 1 deletion
@@ -24,12 +24,31 @@ This document provides guidance for generating and reviewing code in the RocksDB
### Performance Considerations
**⚠️ PERFORMANCE IS CRITICAL:** RocksDB is a high-performance storage engine where every CPU cycle and memory access matters. When writing code, always evaluate from a performance perspective. This is not optional—performance-aware coding is a fundamental requirement for all contributions.
**Benchmarking and Profiling:** Performance claims should be backed by empirical evidence. Use RocksDB's benchmarking tools (e.g., `db_bench`) to validate improvements. Reviewers will request benchmark results for changes that could impact performance.
**Memory Allocation:** Minimize dynamic memory allocations, especially in hot paths. Prefer stack allocation over heap allocation. Reuse buffers when possible. Consider using arena allocators or memory pools for frequent small allocations. Every `new`, `malloc`, or container resize has a cost.
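As a minimal sketch of the buffer-reuse point (illustrative code, not RocksDB's actual encoding helpers): taking a caller-provided scratch buffer lets a loop reuse one allocation instead of constructing a fresh string per call.

```cpp
#include <cstdint>
#include <string>

// Encode a fixed-width 32-bit value into a caller-provided scratch
// buffer. clear() keeps the existing capacity, so after the first
// call no further heap allocation happens in a tight loop.
void EncodeFixed32To(std::string* scratch, uint32_t value) {
  scratch->clear();
  for (int shift = 0; shift < 32; shift += 8) {
    scratch->push_back(static_cast<char>((value >> shift) & 0xff));
  }
}

// Decode the little-endian bytes written by EncodeFixed32To.
uint32_t DecodeFixed32From(const std::string& buf) {
  uint32_t value = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    value |= static_cast<uint32_t>(
                 static_cast<unsigned char>(buf[shift / 8]))
             << shift;
  }
  return value;
}
```

The same pattern appears throughout allocation-sensitive code: the caller owns the buffer, so allocation cost is amortized across calls.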
**Memory Copy:** Avoid unnecessary memory copies. Use move semantics, `std::string_view`, `Slice`, and pass-by-reference where appropriate. Be aware of implicit copies in STL containers and function returns. Prefer in-place operations over copy-and-modify patterns.
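A small sketch of the two copy-avoidance idioms mentioned above (the function names are illustrative, not RocksDB APIs): pass-by-view for read-only access, and move semantics for ownership transfer.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Pass-by-view: the caller's bytes are inspected in place; no copy
// of the underlying character data is made.
size_t CountChar(std::string_view data, char c) {
  size_t n = 0;
  for (char ch : data) {
    if (ch == c) ++n;
  }
  return n;
}

// Move instead of copy: the string's heap buffer is transferred into
// the container, rather than duplicated and then discarded.
void AppendOwned(std::vector<std::string>* out, std::string value) {
  out->push_back(std::move(value));
}
```

In RocksDB itself, `Slice` plays the same non-owning-view role as `std::string_view` here.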
**CPU Cache Efficiency:** Design data structures and access patterns to be cache-friendly. Keep frequently accessed data together (data locality). Prefer sequential memory access over random access. Be mindful of cache line sizes (typically 64 bytes) and avoid false sharing in concurrent code. Consider struct packing and field ordering to improve cache utilization.
**Loop Optimization:** Look for opportunities to collapse nested loops, reduce loop overhead, and minimize branch mispredictions. Hoist invariant computations out of loops. Consider loop unrolling for tight inner loops. Batch operations when possible to amortize per-operation overhead.
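A minimal sketch of invariant hoisting (illustrative function, not from RocksDB): the scale factor depends only on values that do not change per iteration, so computing it once removes a division from every pass through the loop.

```cpp
#include <cstddef>
#include <vector>

// Hot loop with the invariant hoisted. The un-hoisted version would
// recompute `limit / divisor` on every iteration:
//   total += values[i] * (limit / divisor);
long ScaleSum(const std::vector<int>& values, long limit, long divisor) {
  const long scale = limit / divisor;  // computed once, outside the loop
  long total = 0;
  for (size_t i = 0; i < values.size(); ++i) {
    total += values[i] * scale;
  }
  return total;
}
```

Compilers can often hoist such expressions themselves, but only when they can prove the operands are invariant; doing it explicitly also makes the intent clear to reviewers.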
**SIMD and Vectorization:** Leverage SIMD instructions (SSE, AVX) for data-parallel operations when appropriate. Structure data to enable auto-vectorization by the compiler. Consider explicit SIMD intrinsics for critical hot paths like checksum computation, encoding/decoding, and bulk data processing.
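As a sketch of the auto-vectorization point: a loop whose iterations are independent, over contiguous data, is the shape compilers can vectorize on their own (e.g. at `-O2`/`-O3` with AVX enabled); no intrinsics are needed in this simple case.

```cpp
#include <cstddef>
#include <cstdint>

// Independent iterations over a contiguous byte array: a classic
// auto-vectorization candidate. Each data[i] contributes to the sum
// with no cross-iteration dependency other than the reduction itself,
// which compilers handle with vector accumulators.
uint64_t SumBytes(const uint8_t* data, size_t n) {
  uint64_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += data[i];
  }
  return sum;
}
```

Explicit intrinsics become worth their maintenance cost only when profiling shows the compiler's generated code falls short on a genuine hot path.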
**Branch Prediction:** Minimize unpredictable branches in hot paths. Use `LIKELY`/`UNLIKELY` macros to hint branch prediction. Consider branchless alternatives for simple conditionals. Order switch cases and if-else chains by frequency.
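A standalone sketch of how `LIKELY`/`UNLIKELY` style hints are typically defined (RocksDB ships its own macros; this version uses the GCC/Clang `__builtin_expect` builtin and degrades to a no-op elsewhere):

```cpp
// Portable definition: on GCC/Clang, tell the compiler which branch
// outcome to optimize for; on other compilers, compile to nothing.
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) (__builtin_expect((x), 1))
#define UNLIKELY(x) (__builtin_expect((x), 0))
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif

// Hot-path check: the error case is rare, so hint that the common
// path falls through without a taken branch.
int ProcessOrFail(int value) {
  if (UNLIKELY(value < 0)) {
    return -1;  // rare error path
  }
  return value * 2;  // common fast path
}
```

The hint changes code layout (the unlikely branch is moved out of line), not correctness; it only pays off when the prediction actually matches runtime behavior.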
**Memory and Resource Management:** Be mindful of memory allocations, especially in hot paths. Use RAII patterns, smart pointers, and RocksDB's memory management utilities appropriately.
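A minimal RAII sketch (the `Resource` type is illustrative, not a RocksDB class): tying the resource's lifetime to a scope guarantees release on every exit path, including early returns and exceptions, with no manual `delete`.

```cpp
#include <memory>
#include <string>
#include <utility>

// Illustrative resource type; in real code this might wrap a file
// handle, an iterator, or a block of cache memory.
struct Resource {
  explicit Resource(std::string n) : name(std::move(n)) {}
  std::string name;
};

// Single-owner factory: ownership is explicit in the return type.
std::unique_ptr<Resource> OpenResource(const std::string& name) {
  return std::make_unique<Resource>(name);
}

bool UseResource(const std::string& name) {
  std::unique_ptr<Resource> r = OpenResource(name);
  if (r->name.empty()) {
    return false;  // early return: `r` is still destroyed automatically
  }
  return true;     // normal return: same guarantee
}
```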
**Hot Path Analysis:** When deciding how aggressively to optimize code, consider whether it's on a hot path:
- **Hot path** (executed thousands+ times, e.g., data access, iteration, compaction loops): Performance is paramount. Apply all optimization techniques—loop collapsing, SIMD, cache optimization, pre-allocation, etc. The cost of each operation is multiplied by execution frequency.
- **Cold path** (executed rarely, e.g., DB open, configuration parsing, error handling): Maintainability and clarity are more important. Prefer readable code over micro-optimizations. Complex optimizations here add maintenance burden with negligible performance benefit.
- **Warm path** (moderate frequency): Balance both concerns. Use profiling data to guide optimization decisions.
**Avoid Premature Optimization:** While performance is critical, focus on correctness first, then optimize based on profiling data. However, be performance-aware from the start—choosing the right algorithm and data structure upfront is not premature optimization. Use the hot path analysis above to decide how much optimization effort is warranted.
### API Design and Compatibility
**Backwards Compatibility:** RocksDB maintains strong backwards compatibility guarantees. Breaking changes are rare and require extensive justification. When deprecating features, follow the project's deprecation policy (typically spanning multiple releases).
@@ -209,6 +228,22 @@ The following patterns emerged as frequent sources of review feedback: