feat: add rolling bloom filter, reliability utils and protobuf #4

shash256 · 2025-01-13T13:42:29Z

This PR adds a rolling bloom filter, built on bloom.nim, to the SDS API. It also adds protobuf serialization and deserialization for messages and bloom filters, plus definitions and utility functions for the ReliabilityManager. The core functions and unit tests for ReliabilityManager will follow in a subsequent PR.

I would greatly appreciate Nim-related review and comments from the nwaku team!

This PR is part of the work towards the nim-sds API deliverable.

Ivansete-status

Thanks for it! Some comments so far :) Ping me if any comment is unclear or you need me to review again

src/protobuf.nim

Ivansete-status · 2025-01-15T10:57:25Z

src/protobuf.nim

+    var intArray = newSeq[int](bytes.len div sizeof(int))
+    for i in 0 ..< intArray.len:
+      let start = i * sizeof(int)
+      copyMem(addr intArray[i], unsafeAddr bytes[start], sizeof(int))


That looks beautiful but a couple of doubts:

Do we have a Nim module that already does that?

Should we care about the desired endianness?

wdyt @arnetheduck ?

Thanks for pointing this out.

For this use case, from what I saw, this seems to be an efficient built-in option, but I'm also curious whether there is anything else.

Updated to handle endianness. Does it look good now?
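One way to make the byte order explicit, independent of the host CPU, is to assemble each integer with shifts instead of `copyMem`. The following is a minimal sketch and not the code merged in this PR: the names `intsToBytes` and `bytesToInts` are hypothetical, the width is fixed at 64 bits rather than `sizeof(int)`, and little-endian is assumed as the serialized order.

```nim
# Sketch only: hypothetical helpers, not part of the PR.
# Assumes 64-bit values and a little-endian serialized form.

proc intsToBytes(values: openArray[int64]): seq[byte] =
  ## Encodes each value as 8 bytes, least significant byte first.
  result = newSeq[byte](values.len * 8)
  for i, v in values:
    let u = cast[uint64](v)
    for j in 0 ..< 8:
      result[i * 8 + j] = byte((u shr (8 * j)) and 0xff'u64)

proc bytesToInts(bytes: openArray[byte]): seq[int64] =
  ## Decodes groups of 8 little-endian bytes; trailing bytes are ignored.
  result = newSeq[int64](bytes.len div 8)
  for i in 0 ..< result.len:
    var u = 0'u64
    for j in 0 ..< 8:
      u = u or (uint64(bytes[i * 8 + j]) shl (8 * j))
    result[i] = cast[int64](u)

let original = @[1'i64, -2'i64, 258'i64]
let encoded = intsToBytes(original)
doAssert bytesToInts(encoded) == original
```

Because the shifts operate on values rather than on memory layout, the same bytes are produced on little-endian and big-endian hosts, and fixing the width avoids `sizeof(int)` differing between 32-bit and 64-bit builds.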

src/reliability_utils.nim

src/rolling_bloom_filter.nim

jm-clius

Generally looks good! I've requested a change wrt the rolling bloom filter design being based on duration - I think we'll get more predictable behaviour if we simply design it based on a maximum capacity with some variability.

jm-clius · 2025-01-15T12:57:16Z

src/rolling_bloom_filter.nim

+    for msg in rbf.messages:
+      if msg.timestamp > cutoff:
+        newMessages.add(msg)
+        newFilter.insert(msg.id)


What happens if the number of messages within the cutoff window is much higher than our capacity? From glancing at https://github.com/waku-org/nim-sds/blob/master/src/bloom.nim I don't think there will be any errors, but our false positive rate design will be way off.

To me it seems unnecessary to bring a time window consideration into the design here at all - we want SDS to work the same for a group communication at any rate. Perhaps therefore the rolling filter can operate on a "min" and "max" number of entries around the configured capacity? E.g. we can allow up to capacity + 20% of messages before triggering a clean, after which we add back the last capacity - 20% of messages? Something like that, but in any case avoiding the highly variable time element.

I agree with what you said. The time-based approach might introduce unnecessary variability. I've updated the implementation to use a capacity-based rolling filter with configurable min and max thresholds (currently set to 20%).
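The capacity-based design discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: a `HashSet` stands in for the bloom filter so the example needs only the standard library, and `RollingSet`, `initRollingSet`, `add` and `clean` are hypothetical names. The 0.8 and 1.2 factors mirror the 20% thresholds mentioned above.

```nim
# Sketch only: HashSet is a stand-in for the bloom filter from bloom.nim.
import std/sets

type
  RollingSet = object
    capacity, minCapacity, maxCapacity: int
    members: HashSet[string] # stand-in for the bloom filter
    messages: seq[string]    # message IDs in insertion order, oldest first

proc initRollingSet(capacity: int): RollingSet =
  RollingSet(
    capacity: capacity,
    minCapacity: int(capacity.float * 0.8),
    maxCapacity: int(capacity.float * 1.2),
    members: initHashSet[string](),
    messages: @[],
  )

proc clean(rs: var RollingSet) =
  ## Once more than maxCapacity IDs are tracked, rebuild the set from
  ## the newest minCapacity IDs and drop the rest.
  if rs.messages.len <= rs.maxCapacity:
    return
  let start = rs.messages.len - rs.minCapacity
  let kept = rs.messages[start .. ^1]
  rs.members = initHashSet[string]()
  for id in kept:
    rs.members.incl(id)
  rs.messages = kept

proc add(rs: var RollingSet, id: string) =
  rs.messages.add(id)
  rs.members.incl(id)
  rs.clean()

var rs = initRollingSet(10) # min 8, max 12
for i in 0 ..< 13:
  rs.add("m" & $i)
doAssert rs.messages.len == 8 # the 13th insert triggered a clean
```

The number of tracked entries therefore always stays between the min and max thresholds after the first clean, regardless of how fast messages arrive, which keeps the false positive rate close to its design value.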

src/reliability_utils.nim

jm-clius · 2025-01-20T13:58:45Z

Side note: I think it's a good idea to mention the deliverable that this forms part of in the PR description. :) waku-org/pm#194

jm-clius

Thanks! Minor comments below, but no need to rerequest approval once addressed. :)

src/message.nim

src/rolling_bloom_filter.nim

Ivansete-status

Thanks for it! Some more comments :)

src/private/probabilities.nim

src/protobuf.nim

src/rolling_bloom_filter.nim

Ivansete-status · 2025-02-03T14:40:38Z

src/rolling_bloom_filter.nim

+    var filterResult: Result[BloomFilter, string]
+    {.gcsafe.}:
+      filterResult = initializeBloomFilter(capacity, errorRate)
+
+    if filterResult.isOk:
+      logInfo("Successfully initialized bloom filter")
+      let targetCapacity = capacity
+      let minCapacity = (capacity.float * 0.8).int
+      let maxCapacity = (capacity.float * 1.2).int
+      return RollingBloomFilter(
+        filter: filterResult.get(),
+        capacity: targetCapacity,
+        minCapacity: minCapacity,
+        maxCapacity: maxCapacity,
+        messages: @[]
+      )
+    else:
+      logError("Failed to initialize bloom filter: " & filterResult.error)
+
+  except Exception:
+    logError("Failed to initialize bloom filter: " & getCurrentExceptionMsg())


Suggested change

-    var filterResult: Result[BloomFilter, string]
-    {.gcsafe.}:
-      filterResult = initializeBloomFilter(capacity, errorRate)
-
-    if filterResult.isOk:
-      logInfo("Successfully initialized bloom filter")
-      let targetCapacity = capacity
-      let minCapacity = (capacity.float * 0.8).int
-      let maxCapacity = (capacity.float * 1.2).int
-      return RollingBloomFilter(
-        filter: filterResult.get(),
-        capacity: targetCapacity,
-        minCapacity: minCapacity,
-        maxCapacity: maxCapacity,
-        messages: @[]
-      )
-    else:
-      logError("Failed to initialize bloom filter: " & filterResult.error)
-
-  except Exception:
-    logError("Failed to initialize bloom filter: " & getCurrentExceptionMsg())
+    let filter = BloomFilter.init(capacity, errorRate).valueOr:
+      return err("could not create bloom filter: " & $error)
+
+    let targetCapacity = capacity
+    let minCapacity = (capacity.float * 0.8).int
+    let maxCapacity = (capacity.float * 1.2).int
+
+    info "Successfully initialized bloom filter", targetCapacity, minCapacity, maxCapacity
+
+    return RollingBloomFilter(
+      filter: filter,
+      capacity: targetCapacity,
+      minCapacity: minCapacity,
+      maxCapacity: maxCapacity,
+      messages: @[]
+    )

Added a minor variation of this: if initialization with the given parameters fails, it falls back to initializing with the default parameters.

Ivansete-status

Thanks for it! Super insightful changes 🥳
Just adding some more nitpick comments
It is important to start using nph. Ping me anytime and I can explain how we use it in nwaku.
Besides, I think it would be good to start separating modules and avoid overly generic ones such as reliability_utils.nim, and I also encourage using private attributes as much as possible.

Thanks again for the outstanding and very enriching work!

Ivansete-status · 2025-02-06T19:46:29Z

src/reliability_utils.nim

+          rm.incomingBuffer.setLen(0)
+          rm.messageHistory.setLen(0)
+      except Exception:
+        error "Error during cleanup", msg = getCurrentExceptionMsg()


nitpick comment

Suggested change

-        error "Error during cleanup", msg = getCurrentExceptionMsg()
+        error "Error during cleanup", error = getCurrentExceptionMsg()

Ivansete-status · 2025-02-06T19:46:54Z

src/reliability_utils.nim

+    try:
+      rm.bloomFilter.clean()
+    except Exception:
+      error "Failed to clean bloom filter", msg = getCurrentExceptionMsg()


Suggested change

-      error "Failed to clean bloom filter", msg = getCurrentExceptionMsg()
+      error "Failed to clean bloom filter", error = getCurrentExceptionMsg()

Ivansete-status · 2025-02-06T19:49:47Z

src/rolling_bloom_filter.nim

+    let newFilterResult = initializeBloomFilter(rbf.maxCapacity, rbf.filter.errorRate)
+    if newFilterResult.isErr:
+      error "Failed to create new bloom filter", error = newFilterResult.error
+      return
+
+    var newFilter = newFilterResult.get()


Suggested change

-    let newFilterResult = initializeBloomFilter(rbf.maxCapacity, rbf.filter.errorRate)
-    if newFilterResult.isErr:
-      error "Failed to create new bloom filter", error = newFilterResult.error
-      return
-
-    var newFilter = newFilterResult.get()
+    var newFilter = initializeBloomFilter(rbf.maxCapacity, rbf.filter.errorRate).valueOr:
+      error "Failed to create new bloom filter", error = $error
+      return

Ivansete-status · 2025-02-06T19:58:09Z

src/message.nim

+  UnacknowledgedMessage* = object
+    message*: Message


Other possible option

Suggested change

-  UnacknowledgedMessage* = object
-    message*: Message
+  UnacknowledgedMessage* = object of Message
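For comparison, the two options look roughly like this in isolation. This is a hedged sketch with hypothetical type and field names (`BaseMessage`, `PlainMessage`, `messageId`, `resendAttempts`), not the PR's actual definitions. One thing worth noting is that `object of X` only compiles when `X` is itself inheritable, for example declared as `object of RootObj`, so the inheritance option would also require a change to the base type.

```nim
# Sketch only: all names here are hypothetical.
type
  # Inheritance: the base type must be inheritable.
  BaseMessage = object of RootObj
    messageId: string

  UnackedByInheritance = object of BaseMessage
    resendAttempts: int

  # Composition: the base type can stay a plain object.
  PlainMessage = object
    messageId: string

  UnackedByComposition = object
    message: PlainMessage
    resendAttempts: int

let inherited = UnackedByInheritance(messageId: "a", resendAttempts: 1)
let composed = UnackedByComposition(
  message: PlainMessage(messageId: "a"), resendAttempts: 1
)

# Inherited fields are accessed directly; composed ones through the member.
doAssert inherited.messageId == composed.message.messageId
```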

Ivansete-status · 2025-02-06T20:04:25Z

src/message.nim

+    lamportTimestamp*: int64
+    causalHistory*: seq[MessageID]
+    channelId*: ChannelID
+    content*: seq[byte]


What is the expected content and how are we going to link the WakuMessage with that Message type?

Ivansete-status · 2025-02-06T20:07:05Z

src/message.nim

+  MessageID* = seq[byte]
+  ChannelID* = seq[byte]
+
+  Message* = object


I think we need more explicit type names so that each object's purpose can be identified quickly.

Suggested change

-  MessageID* = seq[byte]
-  ChannelID* = seq[byte]
-
-  Message* = object
+  SdsMessageID* = seq[byte]
+  SdsChannelID* = seq[byte]
+
+  SdsMessage* = object

Ivansete-status · 2025-02-06T20:08:36Z

src/reliability_utils.nim

+
+proc cleanup*(rm: ReliabilityManager) {.raises: [].} =
+  if not rm.isNil():
+    {.gcsafe.}:


Out of curiosity, why is this needed?

feat - add rolling bloom filter, reliability utils and protobuf

6b0b9c3

shash256 self-assigned this Jan 13, 2025

shash256 requested review from vpavlin, chaitanyaprem, jm-clius, gabrielmer and Ivansete-status January 13, 2025 13:43

Ivansete-status reviewed Jan 15, 2025

jm-clius requested changes Jan 15, 2025

shash256 linked an issue Jan 27, 2025 that may be closed by this pull request

Create a nim library for SDS implementation from the API specification #5

chore: address review comments

bb9e89b

shash256 requested review from Ivansete-status and jm-clius January 30, 2025 17:20

jm-clius approved these changes Jan 31, 2025

Ivansete-status reviewed Feb 3, 2025

chore: address new comments

ac2e9c3

shash256 requested a review from Ivansete-status February 6, 2025 09:56

Ivansete-status approved these changes Feb 6, 2025
