perf: replace std partition_point #112

Merged
merged 27 commits into from
Mar 29, 2025
27 commits
c9fe47e
perf: replace partition_point in block index and disjoint level
marvin-j97 Mar 6, 2025
69c50e8
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 6, 2025
70e81d9
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 6, 2025
8db7d04
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 6, 2025
941c3ad
Merge remote-tracking branch 'origin/perf/replace-partition-point' in…
marvin-j97 Mar 6, 2025
f1e460e
comment
marvin-j97 Mar 6, 2025
2f61faf
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 6, 2025
0466df8
benchmark custom partition_point
marvin-j97 Mar 7, 2025
ffeeea5
perf: replace partition_point everywhere
marvin-j97 Mar 7, 2025
26eb2f3
refactor
marvin-j97 Mar 8, 2025
d390753
Merge remote-tracking branch 'origin/main' into perf/replace-partitio…
marvin-j97 Mar 8, 2025
2d8686e
perf: get_unchecked in partition_point
marvin-j97 Mar 11, 2025
4982254
test: added partition_point fuzz test
marvin-j97 Mar 11, 2025
2ab305d
doc
marvin-j97 Mar 11, 2025
65264ef
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 12, 2025
0dca39e
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 15, 2025
455c94f
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 21, 2025
258d485
Update binary_search.rs
marvin-j97 Mar 23, 2025
bd2ff04
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 24, 2025
b7dfce3
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 27, 2025
8cf4cda
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 28, 2025
84b5011
Merge branch 'main' into perf/replace-partition-point
marvin-j97 Mar 29, 2025
16e0e57
Update partition_point.rs
marvin-j97 Mar 29, 2025
8163921
Merge remote-tracking branch 'origin/2.8.0' into perf/replace-partiti…
marvin-j97 Mar 29, 2025
acdd2e7
Merge branch 'feat/new-cache-api' into perf/replace-partition-point
marvin-j97 Mar 29, 2025
d23a0f1
fix
marvin-j97 Mar 29, 2025
1751e48
Merge branch '2.8.0' into perf/replace-partition-point
marvin-j97 Mar 29, 2025
3 changes: 0 additions & 3 deletions .gitignore
@@ -14,8 +14,5 @@ Cargo.lock
*.pdb

.lsm.data
.data
/old_*
.test*
.block_index_test
.bench
6 changes: 6 additions & 0 deletions Cargo.toml
@@ -101,3 +101,9 @@ name = "fd_table"
harness = false
path = "benches/fd_table.rs"
required-features = []

[[bench]]
name = "partition_point"
harness = false
path = "benches/partition_point.rs"
required-features = []
2 changes: 1 addition & 1 deletion README.md
@@ -19,7 +19,7 @@ A K.I.S.S. implementation of log-structured merge trees (LSM-trees/LSMTs) in Rus
This is the most feature-rich LSM-tree implementation in Rust! It features:

- Thread-safe BTreeMap-like API
- 100% safe & stable Rust
- [99.9% safe](./UNSAFE.md) & stable Rust
- Block-based tables with compression support
- Range & prefix searching with forward and reverse iteration
- Size-tiered, (concurrent) Leveled and FIFO compaction
5 changes: 5 additions & 0 deletions UNSAFE.md
@@ -0,0 +1,5 @@
# Unsafe usage

Currently, the project itself uses only one (**1**) unsafe block (ignoring dependencies, which are tested separately):

- https://github.com/fjall-rs/lsm-tree/blob/2d8686e873369bd9c4ff2b562ed988c1cea38331/src/binary_search.rs#L23-L25
23 changes: 23 additions & 0 deletions benches/partition_point.rs
@@ -0,0 +1,23 @@
use criterion::{criterion_group, criterion_main, Criterion};
use lsm_tree::binary_search::partition_point;

fn bench_partition_point(c: &mut Criterion) {
    let mut group = c.benchmark_group("partition_point");

    for item_count in [10, 100, 1_000, 10_000, 100_000, 1_000_000] {
        let items = (0..item_count).collect::<Vec<_>>();

        // TODO: replace search key with random integer

        group.bench_function(format!("native {item_count}"), |b| {
            b.iter(|| items.partition_point(|&x| x <= 5_000))
        });

        group.bench_function(format!("rewrite {item_count}"), |b| {
            b.iter(|| partition_point(&items, |&x| x <= 5_000))
        });
    }
}

criterion_group!(benches, bench_partition_point);
criterion_main!(benches);
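Note on the TODO above: with a fixed key of `5_000`, every element satisfies the predicate for the three smallest inputs, so those runs always walk to the far right of the slice. Below is a minimal sketch of one way to vary the search key without adding a dependency. `XorShift64`, `random_keys` and `lookup_all` are hypothetical names that are not part of this PR, and std's `partition_point` stands in for the function under test.

```rust
/// Hypothetical helper: a tiny xorshift64 generator so the sketch needs no extra crates.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so fall back to a fixed non-zero seed.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Pre-generates `n` search keys in `0..item_count` so that key generation
/// stays out of the timed loop. Assumes `item_count > 0`.
pub fn random_keys(item_count: u64, n: usize, seed: u64) -> Vec<u64> {
    let mut rng = XorShift64::new(seed);
    (0..n).map(|_| rng.next_u64() % item_count).collect()
}

/// Runs one lookup per key and sums the indices so the work cannot be optimized away.
pub fn lookup_all(items: &[u64], keys: &[u64]) -> usize {
    keys.iter()
        .map(|&key| std::hint::black_box(items).partition_point(|&x| x <= key))
        .sum()
}
```

Inside the Criterion closure this could be called as `b.iter(|| lookup_all(&items, &keys))`, with a second variant that swaps in the crate's `partition_point` for the "rewrite" case.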
1 change: 1 addition & 0 deletions fuzz/.gitignore
@@ -1 +1,2 @@
corpus
artifacts
19 changes: 19 additions & 0 deletions fuzz/Cargo.toml
@@ -0,0 +1,19 @@
[package]
name = "lsm-tree-fuzz"
version = "0.0.0"
publish = false
edition = "2021"

[package.metadata]
cargo-fuzz = true

[dependencies]
libfuzzer-sys = "0.4"
lsm-tree = { path = ".." }

[[bin]]
name = "partition_point"
path = "fuzz_targets/partition_point.rs"
test = false
doc = false
bench = false
19 changes: 19 additions & 0 deletions fuzz/fuzz_targets/partition_point.rs
@@ -0,0 +1,19 @@
#![no_main]
use libfuzzer_sys::{
    arbitrary::{Arbitrary, Unstructured},
    fuzz_target,
};
use lsm_tree::binary_search::partition_point;

fuzz_target!(|data: &[u8]| {
    let mut unstructured = Unstructured::new(data);

    if let Ok(mut items) = <Vec<u8> as Arbitrary>::arbitrary(&mut unstructured) {
        items.sort();
        items.dedup();

        let idx = partition_point(&items, |&x| x < 128);
        let std_pp_idx = items.partition_point(|&x| x < 128);
        assert_eq!(std_pp_idx, idx);
    }
});
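The fuzz target above always compares against the fixed threshold `128` and removes duplicates first. As a possible extension, the sketch below derives the threshold from the input as well and keeps duplicates, since a partition point is still well defined on a sorted slice that contains them. `check_case` is a hypothetical helper, written against plain `std` so it can run outside `cargo fuzz`. The function under test is passed in as `candidate`.

```rust
/// Hypothetical differential check: the first input byte becomes the threshold,
/// the remaining bytes become the (sorted) haystack.
/// Returns `true` if `candidate` agrees with std's `partition_point`.
pub fn check_case<F>(data: &[u8], candidate: F) -> bool
where
    F: Fn(&[u8], u8) -> usize,
{
    let Some((&threshold, rest)) = data.split_first() else {
        return true; // nothing to check for empty input
    };

    // Duplicates are kept on purpose; only the sort order matters.
    let mut items = rest.to_vec();
    items.sort_unstable();

    let expected = items.partition_point(|&x| x < threshold);
    candidate(&items, threshold) == expected
}
```

In a fuzz target, the body would reduce to something like `assert!(check_case(data, |items, t| partition_point(items, |&x| x < t)))`, assuming the crate's `partition_point` is in scope.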
91 changes: 91 additions & 0 deletions src/binary_search.rs
@@ -0,0 +1,91 @@
// Copyright (c) 2024-present, fjall-rs
// This source code is licensed under both the Apache 2.0 and MIT License
// (found in the LICENSE-* files in the repository)

/// Returns the index of the partition point according to the given predicate
/// (the index of the first element of the second partition).
///
/// This seems to be faster than std's partition_point: https://github.com/rust-lang/rust/issues/138796
pub fn partition_point<T, F>(slice: &[T], pred: F) -> usize
where
    F: Fn(&T) -> bool,
{
    let mut left = 0;
    let mut right = slice.len();

    if right == 0 {
        return 0;
    }

    while left < right {
        let mid = (left + right) / 2;

        // SAFETY: `left < right <= slice.len()` holds here, so `mid < right` is in bounds.
        // See https://github.com/rust-lang/rust/blob/ebf0cf75d368c035f4c7e7246d203bd469ee4a51/library/core/src/slice/mod.rs#L2834-L2836
        #[warn(unsafe_code)]
        let item = unsafe { slice.get_unchecked(mid) };

        if pred(item) {
            left = mid + 1;
        } else {
            right = mid;
        }
    }

    left
}

#[cfg(test)]
mod tests {
    use super::partition_point;
    use test_log::test;

    #[test]
    fn binary_search_first() {
        let items = [1, 2, 3, 4, 5];
        let idx = partition_point(&items, |&x| x < 1);
        assert_eq!(0, idx);

        let std_pp_idx = items.partition_point(|&x| x < 1);
        assert_eq!(std_pp_idx, idx);
    }

    #[test]
    fn binary_search_last() {
        let items = [1, 2, 3, 4, 5];
        let idx = partition_point(&items, |&x| x < 5);
        assert_eq!(4, idx);

        let std_pp_idx = items.partition_point(|&x| x < 5);
        assert_eq!(std_pp_idx, idx);
    }

    #[test]
    fn binary_search_middle() {
        let items = [1, 2, 3, 4, 5];
        let idx = partition_point(&items, |&x| x < 3);
        assert_eq!(2, idx);

        let std_pp_idx = items.partition_point(|&x| x < 3);
        assert_eq!(std_pp_idx, idx);
    }

    #[test]
    fn binary_search_none() {
        let items = [1, 2, 3, 4, 5];
        let idx = partition_point(&items, |&x| x < 10);
        assert_eq!(5, idx);

        let std_pp_idx = items.partition_point(|&x| x < 10);
        assert_eq!(std_pp_idx, idx);
    }

    #[test]
    fn binary_search_empty() {
        let items: [i32; 0] = [];
        let idx = partition_point(&items, |&x| x < 10);
        assert_eq!(0, idx);

        let std_pp_idx = items.partition_point(|&x| x < 10);
        assert_eq!(std_pp_idx, idx);
    }
}
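For comparison, here is a bounds-checked sketch of the same loop. It makes explicit the invariant that the `get_unchecked` call relies on: whenever the loop body runs, `left < right <= slice.len()`, so `mid` is a valid index. `partition_point_checked` is a hypothetical name and not part of this PR. It uses plain indexing for brevity, which the crate itself would reject because `src/lib.rs` denies `clippy::indexing_slicing`.

```rust
/// Bounds-checked sketch of the same binary search, for comparison with the
/// `get_unchecked` version.
pub fn partition_point_checked<T, F>(slice: &[T], pred: F) -> usize
where
    F: Fn(&T) -> bool,
{
    let mut left = 0;
    let mut right = slice.len();

    while left < right {
        // `left < right <= slice.len()` implies `mid < right`, so `mid` is in bounds.
        let mid = left + (right - left) / 2;
        debug_assert!(mid < slice.len());

        if pred(&slice[mid]) {
            left = mid + 1; // everything up to and including `mid` satisfies `pred`
        } else {
            right = mid; // `mid` and everything after it fails `pred`
        }
    }

    left
}
```

The predicate decides which boundary is found: `|&x| x < key` yields the index of the first element that is `>= key`, while `|&x| x <= key` yields the index of the first element that is `> key`. Both forms appear in the call sites changed by this PR.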
30 changes: 15 additions & 15 deletions src/level_manifest/level.rs
@@ -2,7 +2,9 @@
// This source code is licensed under both the Apache 2.0 and MIT License
// (found in the LICENSE-* files in the repository)

use crate::{segment::meta::SegmentId, HashSet, KeyRange, Segment, UserKey};
use crate::{
binary_search::partition_point, segment::meta::SegmentId, HashSet, KeyRange, Segment, UserKey,
};
use std::ops::Bound;

/// Level of an LSM-tree
@@ -175,13 +177,11 @@ pub struct DisjointLevel<'a>(&'a Level);
impl<'a> DisjointLevel<'a> {
/// Returns the segment that possibly contains the key.
pub fn get_segment_containing_key(&self, key: &[u8]) -> Option<Segment> {
let level = &self.0;

let idx = level
.segments
.partition_point(|x| x.metadata.key_range.max() < &key);
let idx = partition_point(&self.0.segments, |segment| {
segment.metadata.key_range.max() < &key
});

level
self.0
.segments
.get(idx)
.filter(|x| x.metadata.key_range.min() <= &key)
@@ -197,12 +197,12 @@ impl<'a> DisjointLevel<'a> {

let lo = match &key_range.0 {
Bound::Unbounded => 0,
Bound::Included(start_key) => {
level.partition_point(|segment| segment.metadata.key_range.max() < start_key)
}
Bound::Excluded(start_key) => {
level.partition_point(|segment| segment.metadata.key_range.max() <= start_key)
}
Bound::Included(start_key) => partition_point(level, |segment| {
segment.metadata.key_range.max() < start_key
}),
Bound::Excluded(start_key) => partition_point(level, |segment| {
segment.metadata.key_range.max() <= start_key
}),
};

if lo >= level.len() {
@@ -213,7 +213,7 @@ impl<'a> DisjointLevel<'a> {
Bound::Unbounded => level.len() - 1,
Bound::Included(end_key) => {
let idx =
level.partition_point(|segment| segment.metadata.key_range.min() <= end_key);
partition_point(level, |segment| segment.metadata.key_range.min() <= end_key);

if idx == 0 {
return None;
@@ -223,7 +223,7 @@ impl<'a> DisjointLevel<'a> {
}
Bound::Excluded(end_key) => {
let idx =
level.partition_point(|segment| segment.metadata.key_range.min() < end_key);
partition_point(level, |segment| segment.metadata.key_range.min() < end_key);

if idx == 0 {
return None;
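The lookup in `get_segment_containing_key` relies on the segments of a disjoint level being sorted and non-overlapping: the first segment whose maximum key is not below the search key is the only candidate, and it matches only if its minimum key is not above the search key. The sketch below illustrates that idea with a simplified, hypothetical `Range` type instead of the crate's `Segment` and `KeyRange`, and uses std's `partition_point` as a stand-in.

```rust
/// Simplified stand-in for a segment's key range (hypothetical type, not the crate's `KeyRange`).
#[derive(Debug, PartialEq, Eq)]
pub struct Range {
    pub min: Vec<u8>,
    pub max: Vec<u8>,
}

/// Finds the only range in a sorted, non-overlapping list that may contain `key`.
pub fn range_containing_key<'a>(ranges: &'a [Range], key: &[u8]) -> Option<&'a Range> {
    // First range whose max key is not smaller than `key`.
    let idx = ranges.partition_point(|r| r.max.as_slice() < key);

    // That range only matches if `key` is not below its min key (i.e. not in a gap).
    ranges.get(idx).filter(|r| r.min.as_slice() <= key)
}
```

A key that falls into the gap between two ranges, or beyond the last range, yields `None`.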
5 changes: 4 additions & 1 deletion src/lib.rs
@@ -90,7 +90,7 @@

#![doc(html_logo_url = "https://raw.githubusercontent.com/fjall-rs/lsm-tree/main/logo.png")]
#![doc(html_favicon_url = "https://raw.githubusercontent.com/fjall-rs/lsm-tree/main/logo.png")]
#![forbid(unsafe_code)]
#![deny(unsafe_code)]
#![deny(clippy::all, missing_docs, clippy::cargo)]
#![deny(clippy::unwrap_used)]
#![deny(clippy::indexing_slicing)]
@@ -124,6 +124,9 @@ mod any_tree;

mod r#abstract;

#[doc(hidden)]
pub mod binary_search;

#[doc(hidden)]
pub mod blob_tree;

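The switch from `forbid(unsafe_code)` to `deny(unsafe_code)` matters because a `forbid` level cannot be relaxed further down, whereas `deny` can be overridden locally. The sketch below shows the mechanism on a module-level attribute. The module `strict` and the function `first_byte` are hypothetical and only serve as illustration. Note that the PR itself uses `#[warn(unsafe_code)]` at the call site, which downgrades the error to a warning rather than silencing it.

```rust
// `deny` turns the lint into an error but still allows local overrides;
// with `forbid`, the `#[allow]` below would itself be a compile error.
#[deny(unsafe_code)]
pub mod strict {
    /// Returns the first byte without a second bounds check (illustrative only).
    pub fn first_byte(bytes: &[u8]) -> Option<u8> {
        if bytes.is_empty() {
            return None;
        }

        // SAFETY: the slice was checked to be non-empty, so index 0 is in bounds.
        #[allow(unsafe_code)]
        let byte = unsafe { *bytes.get_unchecked(0) };

        Some(byte)
    }
}
```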
9 changes: 5 additions & 4 deletions src/segment/block_index/mod.rs
@@ -12,6 +12,7 @@ use super::{
block::{offset::BlockOffset, Block},
value_block::CachePolicy,
};
use crate::binary_search::partition_point;
use block_handle::KeyedBlockHandle;
use full_index::FullBlockIndex;
use two_level_index::TwoLevelBlockIndex;
@@ -44,7 +45,7 @@ impl KeyedBlockIndex for [KeyedBlockHandle] {
key: &[u8],
_: CachePolicy,
) -> crate::Result<Option<&KeyedBlockHandle>> {
let idx = self.partition_point(|x| &*x.end_key < key);
let idx = partition_point(self, |item| item.end_key < key);
Ok(self.get(idx))
}

@@ -53,7 +54,7 @@ impl KeyedBlockIndex for [KeyedBlockHandle] {
key: &[u8],
_: CachePolicy,
) -> crate::Result<Option<&KeyedBlockHandle>> {
let idx = self.partition_point(|x| &*x.end_key <= key);
let idx = partition_point(self, |x| &*x.end_key <= key);

if idx == 0 {
return Ok(self.first());
@@ -129,10 +130,10 @@ pub enum BlockIndexImpl {
#[allow(clippy::expect_used)]
mod tests {
use super::*;
use crate::Slice;
use crate::{segment::block::offset::BlockOffset, UserKey};
use test_log::test;

fn bh<K: Into<Slice>>(end_key: K, offset: BlockOffset) -> KeyedBlockHandle {
fn bh<K: Into<UserKey>>(end_key: K, offset: BlockOffset) -> KeyedBlockHandle {
KeyedBlockHandle {
end_key: end_key.into(),
offset,
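The first lookup in this file searches a sorted list of block handles for the first block whose end key is not smaller than the search key. A minimal sketch of that pattern follows, assuming a simplified, hypothetical `Handle` type in place of `KeyedBlockHandle` and std's `partition_point` in place of the crate's function. The name `lowest_block_containing_key` is also an assumption, since the method names are not visible in this diff.

```rust
/// Hypothetical, simplified block handle: each block is described by its
/// last key and its file offset.
#[derive(Debug, PartialEq, Eq)]
pub struct Handle {
    pub end_key: Vec<u8>,
    pub offset: u64,
}

/// Returns the first block that may contain `key`, i.e. the first handle
/// whose `end_key >= key`, or `None` if `key` lies past the last block.
pub fn lowest_block_containing_key<'a>(index: &'a [Handle], key: &[u8]) -> Option<&'a Handle> {
    let idx = index.partition_point(|handle| handle.end_key.as_slice() < key);
    index.get(idx)
}
```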
10 changes: 5 additions & 5 deletions src/segment/value_block.rs
@@ -2,11 +2,11 @@
// This source code is licensed under both the Apache 2.0 and MIT License
// (found in the LICENSE-* files in the repository)

use super::{
block::{offset::BlockOffset, Block},
id::GlobalSegmentId,
use super::{block::Block, id::GlobalSegmentId};
use crate::{
binary_search::partition_point, descriptor_table::FileDescriptorTable,
segment::block::offset::BlockOffset, value::InternalValue, Cache,
};
use crate::{cache::Cache, descriptor_table::FileDescriptorTable, value::InternalValue};
use std::sync::Arc;

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
@@ -28,7 +28,7 @@ pub type ValueBlock = Block<InternalValue>;
impl ValueBlock {
#[must_use]
pub fn get_latest(&self, key: &[u8]) -> Option<&InternalValue> {
let idx = self.items.partition_point(|item| &*item.key.user_key < key);
let idx = partition_point(&self.items, |item| &*item.key.user_key < key);

self.items
.get(idx)
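The rest of `get_latest` is cut off in this view. The sketch below shows how such a lookup could work under two assumptions that the diff does not confirm: items are sorted by user key ascending and, within one key, by sequence number descending, and the result is filtered on key equality. `Item` is a hypothetical, simplified stand-in for `InternalValue`.

```rust
/// Hypothetical, simplified item: the crate's `InternalValue` carries more fields.
#[derive(Debug, PartialEq, Eq)]
pub struct Item {
    pub user_key: Vec<u8>,
    pub seqno: u64,
    pub value: Vec<u8>,
}

/// Returns the newest version of `key`, assuming `items` is sorted by
/// user key ascending and, within a key, by sequence number descending.
pub fn get_latest<'a>(items: &'a [Item], key: &[u8]) -> Option<&'a Item> {
    // Index of the first item whose user key is >= `key`.
    let idx = items.partition_point(|item| item.user_key.as_slice() < key);

    // Under the assumed ordering, that item is the newest version if the key matches.
    items.get(idx).filter(|item| item.user_key.as_slice() == key)
}
```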
6 changes: 3 additions & 3 deletions src/segment/value_block_consumer.rs
@@ -3,7 +3,7 @@
// (found in the LICENSE-* files in the repository)

use super::value_block::ValueBlock;
use crate::value::InternalValue;
use crate::{binary_search::partition_point, value::InternalValue};
use std::sync::Arc;

pub struct ValueBlockConsumer {
@@ -25,13 +25,13 @@ impl ValueBlockConsumer {
end_key: Option<&[u8]>,
) -> Self {
let mut lo = start_key.as_ref().map_or(0, |key| {
inner.items.partition_point(|x| &*x.key.user_key < *key)
partition_point(&inner.items, |x| &*x.key.user_key < *key)
});

let hi = end_key.as_ref().map_or_else(
|| inner.items.len() - 1,
|key| {
let idx = inner.items.partition_point(|x| &*x.key.user_key <= *key);
let idx = partition_point(&inner.items, |x| &*x.key.user_key <= *key);

if idx == 0 {
let first = inner