Flatbuffers impl #446
base: master
Added a `raw_line` field to the FlatBuffers schema for testing.
Rust Benchmark
Benchmark suite | Current: 0238db7 | Previous: d56be21 | Ratio |
---|---|---|---|
rule-match-browserlike/brave-list | 2008705996 ns/iter (± 16871464) | 1727033355 ns/iter (± 11920489) | 1.16 |
rule-match-first-request/brave-list | 1006441 ns/iter (± 8517) | 1005963 ns/iter (± 12764) | 1.00 |
blocker_new/brave-list | 159390982 ns/iter (± 2576711) | 220140757 ns/iter (± 5159081) | 0.72 |
memory-usage/brave-list-initial | 21457739 ns/iter (± 3) | 41408849 ns/iter (± 3) | 0.52 |
memory-usage/brave-list-after-1000-requests | 24064706 ns/iter (± 3) | 44004875 ns/iter (± 3) | 0.55 |
This comment was automatically generated by workflow using github-action-benchmark.
        bytes.as_ptr() as *const u16,
        bytes.len() / std::mem::size_of::<u16>(),
    )
}
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
It turns out that `from_raw_parts` is undefined behavior if `bytes.as_ptr()` isn't aligned to 2 bytes (the alignment of `u16`). We need to assert this.
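One way to enforce that requirement is to check alignment before calling `from_raw_parts` and refuse misaligned input instead of invoking undefined behavior. This is a minimal sketch, not the PR's code: the helper name `bytes_as_u16_slice` is hypothetical, and returning `None` (rather than panicking via `assert!`) is a design choice of the sketch.

```rust
/// Hypothetical helper: view a byte buffer as `u16` values without UB.
/// Returns `None` if the buffer is not 2-byte aligned.
fn bytes_as_u16_slice(bytes: &[u8]) -> Option<&[u16]> {
    let len = bytes.len() / std::mem::size_of::<u16>();
    if len == 0 {
        // Nothing to reinterpret; skip checking the pointer of a tiny or empty slice.
        return Some(&[]);
    }
    // `std::slice::from_raw_parts` requires the data pointer to be aligned for `u16`.
    if bytes.as_ptr() as usize % std::mem::align_of::<u16>() != 0 {
        return None;
    }
    // SAFETY: the pointer comes from a live slice, is aligned for `u16`
    // (checked above), and `len * 2 <= bytes.len()`, so every element is in
    // bounds and initialized. The result borrows `bytes`, so it cannot
    // outlive the buffer.
    Some(unsafe { std::slice::from_raw_parts(bytes.as_ptr() as *const u16, len) })
}
```

If misaligned buffers must still be accepted, copying with `bytes.chunks_exact(2).map(|c| u16::from_ne_bytes([c[0], c[1]]))` avoids `unsafe` entirely, at the cost of an allocation when collected.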
src/flat_network_filter_list.rs
}

let filters_list =
    unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) };
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
@boocmp use `root_as_network_filter_list()` + `.expect()` to remove the `unsafe`.
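What the suggestion buys is that a checked accessor validates the buffer and returns an error, which `.expect()` turns into a panic, instead of trusting the bytes. The toy below is a heavily simplified, hypothetical stand-in for a generated `root_as_...` function, written with only `std` so it is self-contained. It checks only that the leading 32-bit little-endian root offset points inside the buffer; a real FlatBuffers verifier also validates vtables, nested offsets, and strings.

```rust
#[derive(Debug, PartialEq)]
enum InvalidBuffer {
    TooShort,
    RootOutOfBounds,
}

/// Hypothetical, simplified "checked root": returns the offset of the root
/// table, or an error instead of reading out of bounds.
fn checked_root_offset(buf: &[u8]) -> Result<usize, InvalidBuffer> {
    // A FlatBuffer starts with a little-endian u32 offset to its root table.
    let head = buf.get(..4).ok_or(InvalidBuffer::TooShort)?;
    let offset = u32::from_le_bytes([head[0], head[1], head[2], head[3]]) as usize;
    // A table begins with a 4-byte offset to its vtable, so that must fit too.
    match offset.checked_add(4) {
        Some(end) if end <= buf.len() => Ok(offset),
        _ => Err(InvalidBuffer::RootOutOfBounds),
    }
}
```

A call site would then read `checked_root_offset(&memory).expect("corrupt filter list buffer")`, failing loudly on a corrupt buffer rather than reading out of bounds.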
unsafe {
    self._tab
        .get::<flatbuffers::ForwardsUOffset<&str>>(NetworkFilter::VT_RAW_LINE, None)
}
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
LGTM with a few nits
src/filters/fb_network.rs
    }
}

fn get_or_insert(&mut self, h: &Hash) -> u16 {
nit: rename to `get_or_insert_unique_domain`
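The method name suggests it interns each domain hash once and returns its index, so filters can store compact `u16` indices into a shared table. A minimal sketch of that pattern; the `UniqueDomains` struct, its fields, and `Hash = u64` are hypothetical, not taken from the PR.

```rust
use std::collections::HashMap;

// Assumption: domain hashes are 64-bit values.
type Hash = u64;

/// Hypothetical interning table for unique domain hashes.
#[derive(Default)]
struct UniqueDomains {
    index: HashMap<Hash, u16>, // hash -> position in `hashes`
    hashes: Vec<Hash>,         // table that would be written to the buffer
}

impl UniqueDomains {
    /// Returns the existing index for `h`, or appends it and returns the new index.
    fn get_or_insert_unique_domain(&mut self, h: &Hash) -> u16 {
        if let Some(&i) = self.index.get(h) {
            return i;
        }
        // Indices are u16, so at most 65536 distinct domains fit.
        assert!(self.hashes.len() <= u16::MAX as usize, "too many unique domains");
        let i = self.hashes.len() as u16;
        self.hashes.push(*h);
        self.index.insert(*h, i);
        i
    }
}
```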
.github/workflows/perf-ci.yml
@@ -37,6 +37,18 @@ jobs:
- name: Bench memory usage
  run: cargo bench --bench bench_memory -- --output-format bencher | tee -a output.txt

- name: Bench network filter matching (flat)
let's remove this in favor of enabling `flatbuffers-storage` by default (or in the tests above).
In the next PR.
    None
};

let raw_line = network_filter
do we create this only in debug mode?
Yes, only in debug.
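The thread doesn't show how the debug-only gating is implemented. A minimal sketch, assuming it keys off `debug_assertions`; the helper name `raw_line_for_storage` is hypothetical.

```rust
/// Hypothetical helper: keep a filter's original text only in debug builds,
/// assuming the gating is done with `debug_assertions`.
fn raw_line_for_storage(raw_line: Option<&str>) -> Option<String> {
    if cfg!(debug_assertions) {
        raw_line.map(String::from)
    } else {
        // Release builds skip storing the text to save memory.
        None
    }
}
```

Using `cfg!` (rather than `#[cfg(...)]`) keeps both branches compiled and type-checked in every build profile.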
src/filters/fb_network.rs
&mut self.builder,
&&fb::NetworkFilterListArgs {
    network_filters: Some(filters),
    unique_domains_hashes: Some(unique_domains),
We don't convert strings to hashes here, so let's rename `unique_domains_hashes` to `unique_domains`, or vice versa.
src/filters/fb_network.rs
owner: &'a FlatNetworkFilterList,
fb_filter: &'a fb::NetworkFilter<'a>,

pub mask: NetworkFilterMask,
It's not clear why `mask` is `pub` here but the other fields are not.
Because it is used outside, in `FlatNetworkFilterList::check`.
Should it be `pub(crate)`?
It could. Done.
}

#[inline(always)]
pub fn iter(&self) -> FlatPatternsIterator {
It's not entirely clear why we implement iterator traits instead of using the structure as-is. If we do it so that a single `check_pattern()` works for both the "old" and the flat implementation, let's add a comment to inline this once the "old" implementation is gone.
Because I removed the check functions' direct dependency on any particular storage. I'm not planning to inline it.
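A minimal sketch of that decoupling, under stated assumptions: `FlatPatterns` is a hypothetical stand-in for index-only flatbuffers storage, and `check_pattern` here is a toy substring check, not the crate's matcher. Implementing `Iterator` with an exact `size_hint`, plus `ExactSizeIterator`, lets one check function accept either storage.

```rust
/// Hypothetical stand-in for flatbuffers-backed pattern storage: it only
/// offers `len()` and indexed `get()`, like a generated vector accessor.
#[derive(Clone, Copy)]
struct FlatPatterns<'a> {
    data: &'a [&'a str],
}

impl<'a> FlatPatterns<'a> {
    fn len(&self) -> usize {
        self.data.len()
    }

    fn get(&self, i: usize) -> &'a str {
        self.data[i]
    }

    #[inline(always)]
    fn iter(&self) -> FlatPatternsIterator<'a> {
        FlatPatternsIterator { patterns: *self, index: 0 }
    }
}

struct FlatPatternsIterator<'a> {
    patterns: FlatPatterns<'a>,
    index: usize,
}

impl<'a> Iterator for FlatPatternsIterator<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<Self::Item> {
        if self.index < self.patterns.len() {
            let item = self.patterns.get(self.index);
            self.index += 1;
            Some(item)
        } else {
            None
        }
    }

    // An exact size_hint is what ExactSizeIterator::len() relies on.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.patterns.len() - self.index;
        (remaining, Some(remaining))
    }
}

impl<'a> ExactSizeIterator for FlatPatternsIterator<'a> {}

/// Toy, storage-agnostic check: does any pattern occur in `url`?
fn check_pattern<'a, I>(mut patterns: I, url: &str) -> bool
where
    I: Iterator<Item = &'a str>,
{
    patterns.any(|p| url.contains(p))
}
```

The same `check_pattern` then works for owned "old" storage too, e.g. `check_pattern(old.iter().map(String::as_str), url)` for a `Vec<String>`.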
src/blocker.rs
pub(crate) generic_hide: NetworkFilterList,
pub struct GenericBlocker<NetworkFilterListType>
where
    NetworkFilterList: NetworkFilterListTrait,
This trait bound isn't doing anything useful since you're missing `Type`:
NetworkFilterListType: NetworkFilterListTrait
                 ^^^^
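Concretely, a `where` clause naming the concrete type `NetworkFilterList` only states something about that type; it places no requirement on the generic parameter, so code generic over `NetworkFilterListType` still can't call trait methods on the field. A corrected sketch; the trait's `len` method and `VecFilterList` are hypothetical placeholders.

```rust
/// Hypothetical minimal trait; the real trait's methods aren't shown in the thread.
trait NetworkFilterListTrait {
    fn len(&self) -> usize;
}

/// Hypothetical storage used only to instantiate the generic type.
struct VecFilterList(Vec<String>);

impl NetworkFilterListTrait for VecFilterList {
    fn len(&self) -> usize {
        self.0.len()
    }
}

// The bound is on the type parameter itself, so it actually constrains it.
struct GenericBlocker<NetworkFilterListType>
where
    NetworkFilterListType: NetworkFilterListTrait,
{
    pub(crate) generic_hide: NetworkFilterListType,
}

// Struct bounds are not implied on impl blocks, so they are repeated here.
impl<NetworkFilterListType> GenericBlocker<NetworkFilterListType>
where
    NetworkFilterListType: NetworkFilterListTrait,
{
    fn generic_hide_len(&self) -> usize {
        // Compiles only because of the bound above.
        self.generic_hide.len()
    }
}
```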
let mut disabled_directives: HashSet<String> = HashSet::new();
let mut enabled_directives: HashSet<String> = HashSet::new();
Why not `&str` anymore?
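For reference, the trade-off behind the question: a `HashSet<&str>` borrows directive names from the input and allocates no per-entry `String`, but ties the sets' lifetime to that input, whereas a `HashSet<String>` owns its entries. A minimal sketch of the borrowing version, with a hypothetical `collect_directives` that takes `(name, enabled)` pairs.

```rust
use std::collections::HashSet;

/// Hypothetical: split `(directive, enabled)` pairs into two sets that
/// borrow from the caller's strings instead of allocating `String`s.
fn collect_directives<'a>(
    directives: &[(&'a str, bool)],
) -> (HashSet<&'a str>, HashSet<&'a str>) {
    let mut disabled_directives: HashSet<&'a str> = HashSet::new();
    let mut enabled_directives: HashSet<&'a str> = HashSet::new();
    for &(name, enabled) in directives {
        if enabled {
            enabled_directives.insert(name);
        } else {
            disabled_directives.insert(name);
        }
    }
    (disabled_directives, enabled_directives)
}
```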
@@ -5,13 +5,16 @@
//! serialization/deserialization implementations and can automatically dispatch to the appropriate
//! one.

#![allow(dead_code)]
if we're not using these anymore, let's just remove them
src/engine_serializer.rs
}

pub trait EngineSerializer {
    fn serialize_raw(&self) -> Result<Vec<u8>, SerializeError>;
Let's call it just `serialize`.

Some context: we previously had `serialize`, which serialized a compressed format. Then we needed an uncompressed format, so I added `serialize_compressed` and `serialize_raw` to maintain backward compatibility. `serialize` remained as a direct wrapper around `serialize_compressed`, but with a `#[deprecated]` marker explaining that it would later be removed and that the default behavior would change to `serialize_raw`.

`serialize_compressed` has been removed for long enough that I think it's safe to shift back to `serialize`.
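A minimal sketch of the proposed shape, assuming a placeholder `SerializeError` and a hypothetical `DummyEngine` implementor. The `#[deprecated]` forwarding method is optional: it mirrors the approach described above for the earlier rename and is only needed if existing callers still use `serialize_raw`.

```rust
/// Placeholder error type; the PR's real `SerializeError` isn't shown here.
#[derive(Debug)]
pub struct SerializeError;

pub trait EngineSerializer {
    /// Uncompressed serialization, i.e. what is currently `serialize_raw`.
    fn serialize(&self) -> Result<Vec<u8>, SerializeError>;

    /// Optional, hypothetical shim for existing callers of the old name.
    #[deprecated(note = "use `serialize` instead")]
    fn serialize_raw(&self) -> Result<Vec<u8>, SerializeError> {
        self.serialize()
    }
}

/// Hypothetical implementor used only for illustration.
struct DummyEngine {
    bytes: Vec<u8>,
}

impl EngineSerializer for DummyEngine {
    fn serialize(&self) -> Result<Vec<u8>, SerializeError> {
        Ok(self.bytes.clone())
    }
}
```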
// Implement ExactSizeIterator for FilterPartIterator
impl<'a> ExactSizeIterator for FlatPatternsIterator<'a> {
this comment isn't adding any useful information and it is already outdated 😅
    .iter()
    .map(|x| self.get_or_insert_unique_domain_hash(x))
    .collect();
o.sort_unstable();
We should be able to get rid of the sorting behavior. We used that previously to make differential updates of serialized buffers feasible, but that's no longer useful now that we distribute plaintext lists.
[puLL-Merge] - brave/adblock-rust@446
Description
This PR introduces a significant refactoring and optimization of the adblock-rust crate's serialization and deserialization functionality. The primary change is the implementation of FlatBuffers for storing network filters, which should provide better performance and memory efficiency. Additionally, the PR reorganizes the serialization code by moving it into a dedicated trait and module.
Changes
sequenceDiagram
participant Client
participant Engine
participant EngineSerializer
participant Blocker
participant FlatNetworkFilter
participant NetworkFilterList
Client->>Engine: from_rules(rules)
Engine->>Blocker: new(network_filters)
Blocker->>NetworkFilterList: new(filters)
NetworkFilterList->>FlatNetworkFiltersListBuilder: new()
FlatNetworkFiltersListBuilder->>FlatNetworkFiltersListBuilder: add(filter)
FlatNetworkFiltersListBuilder->>NetworkFilterList: finish()
Client->>Engine: serialize()
Engine->>EngineSerializer: serialize()
EngineSerializer->>Client: serialized_data
Client->>Engine: deserialize(serialized_data)
Engine->>EngineSerializer: deserialize(serialized_data)
EngineSerializer->>Engine: updated Engine
Client->>Engine: check_network_request(request)
Engine->>Blocker: check(request)
Blocker->>NetworkFilterList: first_match(request)
NetworkFilterList->>FlatNetworkFilter: matches(request)
FlatNetworkFilter->>NetworkFilterList: CheckResult
NetworkFilterList->>Blocker: result
Blocker->>Engine: result
Engine->>Client: BlockerResult
Possible Issues
Security Hotspots
if self.filter_map.is_empty() {
    return None;
}

let filters_list =
    unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) };
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
if self.filter_map.is_empty() {
    return filters;
}

let filters_list =
    unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) };
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
6b08928 to 5087b75
5087b75 to 0238db7