
Conversation

@ysbaddaden (Collaborator)

No description provided.

@ysbaddaden ysbaddaden added the rfc label Nov 22, 2024
@ysbaddaden ysbaddaden self-assigned this Nov 22, 2024
@ysbaddaden ysbaddaden changed the title from RFC 0000: timers to RFC 0012: timers Nov 22, 2024

- Go stores all timers into a min-heap (4-ary) but allocates timers in the GC
HEAP and merely marks cancelled timers on delete. I didn't investigate how
it deals with the tombstones.
@RX14 (Member) Nov 23, 2024

This sounds worth investigating further to me. I actually wrote a message above describing exactly this, but I deleted it when I read this part. Tombstones can probably be kept around until dequeue, though there may be other opportune times to delete them if we're scanning/moving entries anyway.

@ysbaddaden (Collaborator, Author)

It could be interesting to understand, but I wonder about the benefit.

Keeping the tombstones means they might stay around for seconds, minutes or hours despite having been cancelled. The occupancy would no longer be the number of currently active timers, but the total number of timers created in the last N seconds/minutes/hours.

They also increase the cost of delete-min: it must be repeated until we reach a non-cancelled timer (not cool).

We'd have to allocate the event in the GC heap (we currently allocate events on the stack) and it would stay allocated until it finally leaves the 4-heap.

We can probably clear the tombstones we meet as we swap items (cool), but that means dereferencing each pointer, which reduces the CPU cache benefit of the flat array...
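To make the trade-off concrete, here is a minimal sketch of what lazy deletion implies for delete-min; `Timer` and `TimerHeap` are hypothetical names for this example, not types from the RFC:

```crystal
# Hypothetical lazy-deletion scheme: cancel only marks a tombstone;
# the entry stays in the heap until it reaches the root.
class Timer
  getter wakes_at : Time::Span
  property? cancelled = false

  def initialize(@wakes_at)
  end
end

class TimerHeap
  @heap = [] of Timer

  def cancel(timer : Timer) : Nil
    timer.cancelled = true # O(1), but leaves a tombstone behind
  end

  # Delete-min must skip tombstones: in the worst case it pops several
  # cancelled entries (each one a full sift-down) before a live one.
  def pop_min? : Timer?
    while timer = pop_root?
      return timer unless timer.cancelled?
    end
  end

  private def pop_root? : Timer?
    # placeholder: a real 4-heap would move the last entry to the
    # root and sift it down to restore the heap order
    @heap.shift?
  end
end
```

Whether the occasional multi-pop beats an eager O(1) cancellation is exactly what benchmarking would have to show.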

@RX14 (Member) Nov 24, 2024

Yes, the practicalities of the solution might outweigh any performance benefit, but Go going that way is a signal to me that it's worth doing the benchmarking.

@ysbaddaden (Collaborator, Author)

It could be a cleanup during a GC collection 🤔

Co-authored-by: Johannes Müller <[email protected]>
@straight-shoota straight-shoota marked this pull request as ready for review November 24, 2024 10:58
straight-shoota added a commit to crystal-lang/crystal that referenced this pull request Nov 26, 2024
Related to [RFC #12](crystal-lang/rfcs#12).

Replaces the `Deque` used in #14996 with a min [Pairing Heap], which is a kind of [Mergeable Heap] and one of the best performing heaps in practical tests when arbitrary deletions are required (think cancelling a timeout); otherwise a [D-ary Heap] (e.g. a 4-heap) will usually perform better. See the [A Nearly-Tight Analysis of Multipass Pairing Heaps](https://epubs.siam.org/doi/epdf/10.1137/1.9781611973068.52) paper or the Wikipedia page for more details.

The implementation itself is based on the [Pairing Heaps: Experiments and Analysis](https://dl.acm.org/doi/pdf/10.1145/214748.214759) paper, and merely implements the recursive twopass algorithm (the auxiliary twopass variant might perform even better). The `Crystal::PointerPairingList(T)` type is generic and relies on intrusive nodes (the links live inside `T`) to avoid extra allocations for the nodes (same as `Crystal::PointerLinkedList(T)`). It also requires a `T#heap_compare` method, so the same type can be used for a min or max heap, or to build a more complex comparison.
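For illustration, an intrusive node might look like this (a minimal sketch: `TimerEvent` and its hand-written link fields are hypothetical; the actual type declares its own links):

```crystal
# Hypothetical intrusive node: the heap links live inside the value
# itself, so pushing it onto the heap never allocates.
struct TimerEvent
  getter wakes_at : Time::Span

  # Hand-sketched pairing heap links (previous/next sibling, first
  # child); the real implementation declares its own link fields.
  property heap_previous : Pointer(TimerEvent)?
  property heap_next : Pointer(TimerEvent)?
  property heap_child : Pointer(TimerEvent)?

  def initialize(@wakes_at)
  end

  # Returning true when `self` must sort above `other` yields a
  # min-heap on `wakes_at`; invert the comparison for a max-heap.
  def heap_compare(other : Pointer(TimerEvent)) : Bool
    wakes_at < other.value.wakes_at
  end
end
```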

Note: I also tried a 4-heap, and while it performs very well and only needs a flat array, arbitrary deletion (e.g. cancelling a timeout) needs a linear scan; its performance quickly plummets even at low occupancy and becomes painfully slow at higher occupancy (tens of microseconds on _each_ delete, while the pairing heap does it in tens of nanoseconds).
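To make that comparison concrete, here is roughly what arbitrary deletion costs in a flat-array heap (a sketch; `Timer` is hypothetical and the sift step is elided):

```crystal
# In a flat array the timer's index is unknown at cancel time (unless
# each timer memoizes it), so deletion starts with an O(n) scan:
def delete(heap : Array(Timer), timer : Timer) : Nil
  index = heap.index(timer) || return # linear scan
  heap.swap(index, heap.size - 1)
  heap.pop
  # then sift the swapped entry up or down to restore the heap order
end
```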

Follow up to #14996 

[Mergeable Heap]: https://en.wikipedia.org/wiki/Mergeable_heap
[Pairing Heap]: https://en.wikipedia.org/wiki/Pairing_heap
[D-ary Heap]: https://en.wikipedia.org/wiki/D-ary_heap

Co-authored-by: Linus Sellberg <[email protected]>
Co-authored-by: Johannes Müller <[email protected]>
Co-authored-by: Vlad Zarakovsky <[email protected]>
Comment on lines +23 to +24
- determine the next timer to expire, so we can decide for how long a process or
thread can be suspended (usually when there is nothing to do).
@yxhuvud (Contributor)

The need for this depends on the event loop. Is there any need at all for all this complexity if the event loop is driven by io_uring? Just emit a timeout event for each fiber that is waiting, and that's it. I'm all for sharing code between underlying event loops that are limited in what they can do, but is there any reason to lock event loops into more structure?

It also supports timing out IO operations.

@ysbaddaden (Collaborator, Author) Feb 11, 2025

io_uring supporting timeouts on IO operations => ❤️

Each event loop can use whatever it pleases, yet... there are still "select action timeouts" that can be cancelled, so even io_uring will need to support an arbitrary dequeue of timeouts.

Even with events to notify the blocking waits (which we use for epoll, kqueue and IOCP), we still need to rearm the timer after it triggers (for example) and to know when the next timer expires (sketched below). I don't think io_uring will be treated differently.

So far, my naive vision is for io_uring to notify an eventfd registered to an epoll instance, along with a timerfd (for precise timers), waiting on arbitrary fds (#wait_readable and #wait_writable) and eventually more niceties (e.g. signalfd and pidfd).
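A rough sketch of that rearm/next-expiration dance, as a method of a hypothetical event loop class; the `@timers` min-heap (with `peek?`/`shift?`) and `Timer#fiber` are assumptions for the example:

```crystal
# Run expired timers, then derive how long the blocking wait
# (epoll_wait, kevent, GetQueuedCompletionStatus, ...) may sleep.
# `@timers` is a hypothetical min-heap ordered by `wakes_at`.
def process_timers_and_compute_timeout : Time::Span?
  now = Time.monotonic

  while timer = @timers.peek?
    break if timer.wakes_at > now
    @timers.shift?
    timer.fiber.enqueue # resume the sleeping or timed-out fiber
  end

  # The next timer (if any) bounds the blocking wait;
  # nil means the loop may wait forever.
  @timers.peek?.try { |timer| timer.wakes_at - now }
end
```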

@yxhuvud (Contributor) Feb 11, 2025

> Each event loop can use whatever it pleases, yet... there are still "select action timeouts" that can be cancelled,

Yes. For example using the uring op TIMEOUT_REMOVE.

That said, it is considering what actually goes on in a select action loop that made me really dislike it in general. So much pointless teardown and rearming...

> so even io_uring will need to support an arbitrary dequeue of timeouts.

No, that does not follow. It may be an issue if we are not OK with waiting for the response to the timeout removal, I guess, and it also needs to handle the race condition where the timer is already triggering and executes before the actual timeout removal. But it is definitely doable without.

> timerfd (for precise timers),

FWIW, the uring timeout op also takes timespec structs as arguments, with the same precision as timerfd. What uring doesn't seem to support is the periodic part of the argument; instead there is a MULTISHOT flag if you want repeating triggers.

> So far, my naive vision is for io_uring to notify an eventfd registered to an epoll instance

I'd suggest not using epoll at all and instead using the uring POLL op, which does more or less the same thing but is a lot simpler.

But in any case I guess it doesn't matter too much, as it doesn't really impact the public interfaces, so in the end it can be changed when the need arises...

@ysbaddaden (Collaborator, Author)

Thanks for all the information! I'll have to dig much deeper into io_uring capabilities.

AFAIK all timeouts in the Linux kernel go into the timer wheel (tick based, low precision, loses precision over time) while timers go into hrtimer (high precision, tickless, nanosecond clock).

I'd expect io_uring timeouts to end up in the timer wheel, which is fine for timeouts, but I'd like to keep timerfd for sleep(seconds).

https://www.kernel.org/doc/html/latest/timers/highres.html
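For the sleep(seconds) case, here is a minimal timerfd sketch. The `LibC` declarations are hand-written for this example (they are not the stdlib's bindings), though timerfd_create/timerfd_settime themselves are real Linux syscalls:

```crystal
# Hand-written bindings for the example; the stdlib may not (or may
# differently) bind these, even though the syscalls are real.
lib LibC
  TFD_CLOEXEC = 0o2000000

  struct Itimerspec
    it_interval : Timespec # periodic part (left zeroed: one-shot)
    it_value : Timespec    # initial expiration (relative here)
  end

  fun timerfd_create(clockid : ClockidT, flags : Int) : Int
  fun timerfd_settime(fd : Int, flags : Int, new_value : Itimerspec*, old_value : Itimerspec*) : Int
end

# Arm a one-shot timer 5 seconds from now: the fd becomes readable on
# expiry, so it can be registered in an epoll instance (or polled from
# io_uring with the POLL op).
fd = LibC.timerfd_create(LibC::CLOCK_MONOTONIC, LibC::TFD_CLOEXEC)

value = LibC::Timespec.new
value.tv_sec = 5

spec = LibC::Itimerspec.new
spec.it_value = value

LibC.timerfd_settime(fd, 0, pointerof(spec), Pointer(LibC::Itimerspec).null)
```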

Co-authored-by: Vlad Zarakovsky <[email protected]>