Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Copy on Write (CoW) and use in event queue #436

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

JonoPrest
Copy link
Collaborator

@JonoPrest JonoPrest commented Jan 24, 2025

@JonoPrest JonoPrest requested a review from DZakh January 24, 2025 10:28
Comment on lines +1 to +45
type t<'a> = {
mutable data: 'a,
mutable isImmutable: bool,
copy: 'a => 'a,
}

let make = (data: 'a, ~copy): t<'a> => {data, isImmutable: true, copy}
let getDataRef = (self: t<'a>): 'a => self.data
let getData = (self: t<'a>): 'a =>
if self.isImmutable {
self.data
} else {
self.data->self.copy
}

let copy = ({isImmutable, data, copy} as self: t<'a>): t<'a> =>
if isImmutable {
data->make(~copy)
} else {
self.isImmutable = true
data->make(~copy)
}

let mutate = (self: t<'a>, fn) =>
if self.isImmutable {
self.isImmutable = false
self.data = self.data->self.copy
fn(self.data)
} else {
fn(self.data)
}

module Array = {
type t<'a> = t<array<'a>>
module InternalFunctions = {
let copy = arr => arr->Array.copy
let push = (arr, item) => arr->Js.Array2.push(item)->ignore
let pop = arr => arr->Js.Array2.pop
}
let make = (data: array<'a>): t<'a> => make(data, ~copy=InternalFunctions.copy)
let push = (self: t<'a>, item) => self->mutate(arr => arr->InternalFunctions.push(item))
let pop = (self: t<'a>) => self->mutate(InternalFunctions.pop)
let length = (self: t<'a>) => self->getDataRef->Array.length
let last = (self: t<'a>) => self->getDataRef->Utils.Array.last
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about CoW when reading a bit about the swift compiler and thought it would be useful here

@@ -62,7 +62,7 @@ type t = {

let shallowCopyPartition = (p: partition) => {
...p,
fetchedEventQueue: p.fetchedEventQueue->Array.copy,
fetchedEventQueue: p.fetchedEventQueue->Cow.copy,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the improvement happens. It won't copy the array until a mutation occurs

@DZakh
Copy link
Member

DZakh commented Jan 24, 2025

Could you measure the performance of the change? I was experimenting with profiling recently, and the only bottleneck of the JS execution was decoding.

@DZakh
Copy link
Member

DZakh commented Jan 24, 2025

Ideally I think it should be solved without a shallow copy of the FetchState in the first place. I have an idea about how to do it, but maybe in the next quoter. Might be a good improvement for now, just interested if it's actually an improvement.

@JonoPrest
Copy link
Collaborator Author

Could you measure the performance of the change? I was experimenting with profiling recently, and the only bottleneck of the JS execution was decoding.

Yeah it's a bit difficult to measure memory allocation in such detail with node js. I was hoping to just deploy an indexer and see if there's an average memory usage difference.

The create batch function already has a benchmark that gets tracked so it will be easy to see if there's speed improvement on average. But this doesn't even feature in terms of time in the system as a whole and won't unblock anything. Not sure what type of metric you're interested in?

@DZakh
Copy link
Member

DZakh commented Jan 24, 2025

If this is for memory, I'm not sure that preventing the recreation of an array of pointers will change something. When I was profiling our memory usage, the biggest consumers were address strings and event string fields. As for the number of allocations, this is Array.keepMap. My only problem with the PR is that it increases the cognitive complexity without a clear gain. But we can definitely merge it and try to measure the results, although it's better to do separately from the insert unnest PR, since there are also some improvements for converting entities to SQL params.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants