Flatbuffers impl #446

Open: wants to merge 7 commits into master

@boocmp (Collaborator) commented Mar 28, 2025

No description provided.

@github-actions bot left a comment

Rust Benchmark

| Benchmark suite | Current: 0238db7 | Previous: d56be21 | Ratio |
|---|---|---|---|
| rule-match-browserlike/brave-list | 2008705996 ns/iter (± 16871464) | 1727033355 ns/iter (± 11920489) | 1.16 |
| rule-match-first-request/brave-list | 1006441 ns/iter (± 8517) | 1005963 ns/iter (± 12764) | 1.00 |
| blocker_new/brave-list | 159390982 ns/iter (± 2576711) | 220140757 ns/iter (± 5159081) | 0.72 |
| memory-usage/brave-list-initial | 21457739 ns/iter (± 3) | 41408849 ns/iter (± 3) | 0.52 |
| memory-usage/brave-list-after-1000-requests | 24064706 ns/iter (± 3) | 44004875 ns/iter (± 3) | 0.55 |

This comment was automatically generated by workflow using github-action-benchmark.

@boocmp force-pushed the flatbuffers_impl branch from 91c5ec7 to cacfcc6 on April 9, 2025 00:55
bytes.as_ptr() as *const u16,
bytes.len() / std::mem::size_of::<u16>(),
)
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

Collaborator

It turns out that from_raw_parts is undefined behavior if bytes.as_ptr() isn't aligned to 2 bytes. We need to assert this.
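A minimal sketch of the assertion the reviewer asks for, assuming the cast lives in a standalone helper; the name `bytes_as_u16_slice` is hypothetical, not taken from the PR. `std::slice::from_raw_parts` requires a non-null pointer aligned for `u16` even when the length is zero, so empty input is handled separately and the alignment is checked before the cast. If adding a dependency is acceptable, `bytemuck::try_cast_slice` is a checked alternative that returns an error instead of panicking.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper (not from the PR): views a byte slice as `u16`s.
/// Panics instead of triggering undefined behavior if `bytes` is misaligned.
pub fn bytes_as_u16_slice(bytes: &[u8]) -> &[u16] {
    // An empty `&[u8]` may carry a dangling pointer with 1-byte alignment,
    // so return an empty `u16` slice without touching the pointer.
    if bytes.is_empty() {
        return &[];
    }
    // `from_raw_parts` requires the pointer to be aligned for `u16` (2 bytes).
    assert_eq!(
        bytes.as_ptr() as usize % align_of::<u16>(),
        0,
        "byte slice is not aligned for u16"
    );
    // SAFETY: alignment is checked above, every bit pattern is a valid `u16`,
    // and the length is rounded down so the view stays inside `bytes`
    // (an odd trailing byte is ignored, as in the original snippet).
    // The returned slice borrows `bytes`, so it cannot outlive the buffer.
    unsafe {
        std::slice::from_raw_parts(
            bytes.as_ptr() as *const u16,
            bytes.len() / size_of::<u16>(),
        )
    }
}
```

Panicking on misalignment keeps the failure loud and deterministic, whereas an unchecked misaligned read may appear to work on x86 and fail elsewhere.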

}

let filters_list =
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) };

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

Collaborator

@boocmp, use root_as_network_filter_list() + .expect() to remove the unsafe block.

unsafe {
self._tab
.get::<flatbuffers::ForwardsUOffset<&str>>(NetworkFilter::VT_RAW_LINE, None)
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

@boocmp force-pushed the flatbuffers_impl branch from cacfcc6 to a3df20d on April 9, 2025 01:05
@boocmp marked this pull request as ready for review on April 9, 2025 01:38
@boocmp requested review from atuchin-m and antonok-edm on April 9, 2025 01:38
Collaborator

@atuchin-m left a comment

LGTM with a few nits

}
}

fn get_or_insert(&mut self, h: &Hash) -> u16 {
Collaborator

nit: get_or_insert_unique_domain
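The renamed method can be sketched as a small interner that gives each distinct domain hash a dense `u16` index. This is a hypothetical reconstruction: only the method name and the `u16` return type come from the snippet above; the `UniqueDomainsBuilder` struct, its fields, and the `Hash = u64` alias are assumptions.

```rust
use std::collections::HashMap;

/// Assumed alias; the crate's real `Hash` type may differ.
type Hash = u64;

/// Hypothetical builder that deduplicates domain hashes.
#[derive(Default)]
struct UniqueDomainsBuilder {
    /// Maps a domain hash to its position in `unique_domains`.
    index_of: HashMap<Hash, u16>,
    /// Each distinct hash exactly once, in first-seen order.
    unique_domains: Vec<Hash>,
}

impl UniqueDomainsBuilder {
    /// Returns the existing index for `h`, or appends it and returns the new index.
    fn get_or_insert_unique_domain(&mut self, h: &Hash) -> u16 {
        if let Some(&index) = self.index_of.get(h) {
            return index;
        }
        // A u16 index caps the table at 65,536 distinct domains.
        let index = u16::try_from(self.unique_domains.len())
            .expect("too many unique domains for a u16 index");
        self.unique_domains.push(*h);
        self.index_of.insert(*h, index);
        index
    }
}
```

Filters can then store small `u16` indices instead of repeating full hashes, which is where the "unique domain" wording in the suggested name comes from.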

@@ -37,6 +37,18 @@ jobs:
- name: Bench memory usage
run: cargo bench --bench bench_memory -- --output-format bencher | tee -a output.txt

- name: Bench network filter matching (flat)
Collaborator

let's remove this in favor of enabling flatbuffers-storage by default (or in the tests above).

Collaborator Author

In the next PR.

None
};

let raw_line = network_filter
Collaborator

do we create this only in debug mode?

Collaborator Author

Yes, only in debug mode.
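A minimal sketch of keeping the raw rule text only when a debug flag is set, so non-debug storage does not carry one extra string per filter. The `debug` parameter, the `StoredFilter` type, and `store_filter` are hypothetical; the thread only confirms that the raw line is created in debug mode, not how that mode is selected. In a FlatBuffers table the equivalent is simply leaving the optional string field unset.

```rust
/// Hypothetical stored form of a parsed network filter.
#[derive(Debug, PartialEq)]
struct StoredFilter {
    mask: u32,
    /// Original rule text; only populated in debug mode.
    raw_line: Option<String>,
}

/// Copies the raw rule text only when `debug` is true, so non-debug
/// storage does not pay for one string per filter.
fn store_filter(raw: &str, mask: u32, debug: bool) -> StoredFilter {
    StoredFilter {
        mask,
        raw_line: debug.then(|| raw.to_owned()),
    }
}
```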

&mut self.builder,
&&fb::NetworkFilterListArgs {
network_filters: Some(filters),
unique_domains_hashes: Some(unique_domains),
Collaborator

We don't convert strings to hashes here, so let's rename unique_domains_hashes to unique_domains, or vice versa.

owner: &'a FlatNetworkFilterList,
fb_filter: &'a fb::NetworkFilter<'a>,

pub mask: NetworkFilterMask,
Collaborator

It's not clear why mask is pub here while the other fields are not.

Collaborator Author

Because it is used outside this type, in FlatNetworkFilterList::check.

Collaborator

should it be pub(crate)?

Collaborator Author

It could. Done.
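A small sketch of the visibility change: `pub(crate)` lets other modules in the crate (such as the code behind `FlatNetworkFilterList::check`) read `mask` while keeping it out of the public API. The module layout, the `pattern` field, the `has_flag` helper, and the `u32` stand-in for `NetworkFilterMask` are assumptions for illustration.

```rust
mod filters {
    /// Assumed stand-in for the crate's bitflags mask type.
    pub type NetworkFilterMask = u32;

    pub struct FlatNetworkFilter<'a> {
        /// Readable anywhere in this crate, invisible to downstream crates.
        pub(crate) mask: NetworkFilterMask,
        /// Private: only code inside `filters` can access it directly.
        pattern: &'a str,
    }

    impl<'a> FlatNetworkFilter<'a> {
        pub fn new(mask: NetworkFilterMask, pattern: &'a str) -> Self {
            Self { mask, pattern }
        }

        pub fn pattern(&self) -> &str {
            self.pattern
        }
    }
}

/// Crate-internal code in another module can read `mask` directly.
fn has_flag(filter: &filters::FlatNetworkFilter<'_>, flag: filters::NetworkFilterMask) -> bool {
    (filter.mask & flag) != 0
}
```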

}

#[inline(always)]
pub fn iter(&self) -> FlatPatternsIterator {
Collaborator

It's not entirely clear why we implement iterator traits instead of using the structure as-is.
If we do it so that a single check_pattern() serves both the "old" and the flat implementations, let's add a comment to inline this once the "old" implementation is gone.

Collaborator Author

Because I removed the check functions' direct dependency on any particular storage. I'm not planning to inline it.
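A sketch of the design described in this thread: the check function accepts any `ExactSizeIterator` of patterns, so both an owned `Vec<String>` and a flat buffer can feed it without the function knowing which storage it is reading. The flat layout below (one concatenated string plus end offsets), the `FlatPatterns` type, and `any_pattern_matches` with its substring semantics are assumptions standing in for the real FlatBuffers-backed types and matching logic.

```rust
/// Hypothetical flat storage: every pattern concatenated into one string,
/// plus the end offset of each pattern.
struct FlatPatterns {
    data: String,
    ends: Vec<usize>,
}

impl FlatPatterns {
    fn new(patterns: &[&str]) -> Self {
        let mut data = String::new();
        let mut ends = Vec::with_capacity(patterns.len());
        for pattern in patterns {
            data.push_str(pattern);
            ends.push(data.len());
        }
        Self { data, ends }
    }

    fn iter(&self) -> FlatPatternsIterator<'_> {
        FlatPatternsIterator { owner: self, pos: 0 }
    }
}

/// Yields each pattern as a `&str` borrowed from the flat storage.
struct FlatPatternsIterator<'a> {
    owner: &'a FlatPatterns,
    pos: usize,
}

impl<'a> Iterator for FlatPatternsIterator<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let owner = self.owner;
        let end = *owner.ends.get(self.pos)?;
        let start = if self.pos == 0 { 0 } else { owner.ends[self.pos - 1] };
        self.pos += 1;
        Some(&owner.data[start..end])
    }

    // An exact hint is what makes the empty ExactSizeIterator impl correct.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.owner.ends.len() - self.pos;
        (remaining, Some(remaining))
    }
}

impl ExactSizeIterator for FlatPatternsIterator<'_> {}

/// Storage-agnostic check: it only sees an iterator of `&str` patterns.
/// Substring matching is a placeholder for the crate's real pattern logic.
fn any_pattern_matches<'a, I>(mut patterns: I, url: &str) -> bool
where
    I: ExactSizeIterator<Item = &'a str>,
{
    // The known length allows an early exit before any iteration.
    if patterns.len() == 0 {
        return false;
    }
    patterns.any(|pattern| url.contains(pattern))
}
```

The same function accepts `owned.iter().map(String::as_str)` for a `Vec<String>`, since `Map` over a slice iterator is also an `ExactSizeIterator`.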

src/blocker.rs (outdated)
pub(crate) generic_hide: NetworkFilterList,
pub struct GenericBlocker<NetworkFilterListType>
where
NetworkFilterList: NetworkFilterListTrait,
Collaborator

this trait bound isn't doing anything useful since you're missing Type:

NetworkFilterListType: NetworkFilterListTrait
                 ^^^^
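A sketch of the corrected declaration. With `NetworkFilterList: NetworkFilterListTrait`, the where clause only constrains the concrete `NetworkFilterList` type, so the type parameter `NetworkFilterListType` stays unbounded and its trait methods cannot be called. Bounding the parameter itself fixes that. The trait's `len` method and the `VecFilterList` type are placeholders, not the PR's actual definitions.

```rust
/// Placeholder for the PR's `NetworkFilterListTrait`.
pub trait NetworkFilterListTrait {
    fn len(&self) -> usize;
}

/// Placeholder storage implementing the trait.
pub struct VecFilterList(pub Vec<String>);

impl NetworkFilterListTrait for VecFilterList {
    fn len(&self) -> usize {
        self.0.len()
    }
}

pub struct GenericBlocker<NetworkFilterListType>
where
    // Bound on the type parameter itself, as the review suggests.
    NetworkFilterListType: NetworkFilterListTrait,
{
    pub(crate) generic_hide: NetworkFilterListType,
}

impl<NetworkFilterListType> GenericBlocker<NetworkFilterListType>
where
    NetworkFilterListType: NetworkFilterListTrait,
{
    // Compiles only because the parameter is known to implement the trait.
    fn generic_hide_count(&self) -> usize {
        self.generic_hide.len()
    }
}
```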

Comment on lines +392 to +393
let mut disabled_directives: HashSet<String> = HashSet::new();
let mut enabled_directives: HashSet<String> = HashSet::new();
Collaborator

why not &str anymore?
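For comparison, a sketch of the borrowed variant the question refers to. A `HashSet<&str>` avoids one allocation per directive, but it only works while the strings it borrows from outlive the sets; owned `String`s lift that constraint at the cost of an allocation per entry. The `split_directives` function and its `(directive, is_exception)` input format are hypothetical, not the PR's CSP logic.

```rust
use std::collections::HashSet;

/// Hypothetical input: each CSP directive paired with whether the rule
/// is an exception (disables it) or enables it.
fn split_directives<'a>(
    rules: &[(&'a str, bool)],
) -> (HashSet<&'a str>, HashSet<&'a str>) {
    let mut disabled_directives: HashSet<&'a str> = HashSet::new();
    let mut enabled_directives: HashSet<&'a str> = HashSet::new();
    for &(directive, is_exception) in rules {
        // Borrowed entries: no allocation, but tied to the lifetime 'a.
        if is_exception {
            disabled_directives.insert(directive);
        } else {
            enabled_directives.insert(directive);
        }
    }
    (disabled_directives, enabled_directives)
}
```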

@@ -5,13 +5,16 @@
//! serialization/deserialization implementations and can automatically dispatch to the appropriate
//! one.

#![allow(dead_code)]
Collaborator

if we're not using these anymore, let's just remove them

}

pub trait EngineSerializer {
fn serialize_raw(&self) -> Result<Vec<u8>, SerializeError>;
Collaborator

let's call it just serialize.

Some context: we previously had serialize, which produced a compressed format. When we later needed an uncompressed format, I added serialize_compressed and serialize_raw while keeping backward compatibility. serialize remained a direct wrapper around serialize_compressed, but with a #[deprecated] marker explaining that it would later be removed and that the default behavior would change to serialize_raw.

serialize_compressed has been gone long enough that I think it's safe to switch back to the name serialize.
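The migration pattern described above can be sketched as a trait in which the new name is the primary method and the old name survives as a `#[deprecated]` forwarder, so downstream code keeps compiling with a warning. The trait name and the `serialize_raw` signature come from the snippet above; the shape of `SerializeError`, the `Engine` stand-in, and keeping `serialize_raw` as a default method are assumptions.

```rust
/// Assumed error type; the real `SerializeError` may carry more detail.
#[derive(Debug)]
pub struct SerializeError;

pub trait EngineSerializer {
    /// New primary name for the uncompressed format.
    fn serialize(&self) -> Result<Vec<u8>, SerializeError>;

    /// Old name kept as a thin forwarder; callers get a deprecation warning.
    #[deprecated(note = "use `serialize` instead")]
    fn serialize_raw(&self) -> Result<Vec<u8>, SerializeError> {
        self.serialize()
    }
}

/// Hypothetical engine holding already-encoded bytes.
pub struct Engine {
    bytes: Vec<u8>,
}

impl EngineSerializer for Engine {
    fn serialize(&self) -> Result<Vec<u8>, SerializeError> {
        Ok(self.bytes.clone())
    }
}
```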

Comment on lines +180 to +181
// Implement ExactSizeIterator for FilterPartIterator
impl<'a> ExactSizeIterator for FlatPatternsIterator<'a> {
Collaborator

this comment isn't adding any useful information and it is already outdated 😅

.iter()
.map(|x| self.get_or_insert_unique_domain_hash(x))
.collect();
o.sort_unstable();
Collaborator

We should be able to get rid of the sorting behavior. We used that previously to make differential updates of serialized buffers feasible, but that's no longer useful now that we distribute plaintext lists.

[puLL-Merge] - brave/adblock-rust@446

Description

This PR refactors the adblock-rust crate's serialization and deserialization and changes how network filters are stored. The primary change is a FlatBuffers-based storage for network filters, intended to improve performance and memory efficiency. In the benchmark run posted above (0238db7 vs. d56be21), the memory-usage benchmarks drop to roughly half, blocker_new takes about 28% less time, and rule-match-browserlike takes about 16% more. The PR also moves the serialization code into a dedicated trait and module.

Changes

  1. Cargo.toml:

    • Added "flatbuffers-storage" to the default features
  2. Benchmarks:

    • Moved serialization/deserialization benchmarks from bench_matching.rs to a new dedicated file bench_serialization.rs
    • Updated benchmark calls to use the new serialize() method instead of serialize_raw()
  3. Engine Serializer:

    • Introduced a new EngineSerializer trait in src/engine_serializer.rs
    • Moved serialization/deserialization logic from engine.rs to this new module
    • Renamed serialize_raw() to serialize() for cleaner API
  4. FlatBuffers Implementation:

    • Added a new module src/filters/fb_network.rs that implements FlatBuffers storage for network filters
    • Updated the FlatBuffers schema to include raw_line field
    • Updated the network filter list implementation to use FlatBuffers
  5. Network Filter Matching:

    • Refactored domain matching code to be more modular and efficient
    • Split domain checking functions into separate parts (checking included domains, excluded domains)
    • Added mapped versions that work with the FlatBuffers representation
  6. Blocker Implementation:

    • Removed some unused methods from Blocker like add_filter and filter_exists
    • Updated the CSP directive processing to use owned strings instead of references
  7. Examples and Tests:

    • Updated all examples and tests to use the new EngineSerializer trait
    • Fixed imports and method calls throughout the codebase
sequenceDiagram
    participant Client
    participant Engine
    participant EngineSerializer
    participant Blocker
    participant FlatNetworkFilter
    participant NetworkFilterList

    Client->>Engine: from_rules(rules)
    Engine->>Blocker: new(network_filters)
    Blocker->>NetworkFilterList: new(filters)
    NetworkFilterList->>FlatNetworkFiltersListBuilder: new()
    FlatNetworkFiltersListBuilder->>FlatNetworkFiltersListBuilder: add(filter)
    FlatNetworkFiltersListBuilder->>NetworkFilterList: finish()
    
    Client->>Engine: serialize()
    Engine->>EngineSerializer: serialize()
    EngineSerializer->>Client: serialized_data
    
    Client->>Engine: deserialize(serialized_data)
    Engine->>EngineSerializer: deserialize(serialized_data)
    EngineSerializer->>Engine: updated Engine
    
    Client->>Engine: check_network_request(request)
    Engine->>Blocker: check(request)
    Blocker->>NetworkFilterList: first_match(request)
    NetworkFilterList->>FlatNetworkFilter: matches(request)
    FlatNetworkFilter->>NetworkFilterList: CheckResult
    NetworkFilterList->>Blocker: result
    Blocker->>Engine: result
    Engine->>Client: BlockerResult

Possible Issues

  • Some public API methods (add_filter and filter_exists) have been removed from the Blocker struct, which could break consumers that call them directly.
  • The change from references to owned strings in CSP directive processing might introduce additional allocations.
  • Removing serialize_raw() in favor of serialize() is a breaking change that will require consumers to update their code.

Security Hotspots

  • The use of unsafe code in the FlatBuffers implementation for casting between slices of different types (in src/filters/fb_network.rs) should be carefully reviewed, although this appears to be standard practice when working with FlatBuffers.

if self.filter_map.is_empty() {
return None;
}

let filters_list =
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) };

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog


if self.filter_map.is_empty() {
return filters;
}

let filters_list =
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) };

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

@boocmp force-pushed the flatbuffers_impl branch 2 times, most recently from 6b08928 to 5087b75, on April 28, 2025 07:25
@boocmp force-pushed the flatbuffers_impl branch from 5087b75 to 0238db7 on April 28, 2025 07:31

5 participants