RFC 0012: timers #12
Conversation
> - Go stores all timers into a min-heap (4-ary) but allocates timers in the GC
>   HEAP and merely marks cancelled timers on delete. I didn't investigate how
>   it deals with the tombstones.
This sounds worth investigating further to me. I actually wrote a message above describing doing exactly this, but deleted it when I read this part. Tombstones can probably be kept around until dequeue, though there may be other opportune times to delete them, e.g. when scanning or moving entries anyway.
It could be interesting to understand, but I wonder about the benefit.
Keeping the tombstones means they might stay for seconds, minutes or hours despite having been cancelled. The occupancy would no longer be how many active timers there are now, but the total number of timers created in the last N seconds/minutes/hours.
They also increase the cost of delete-min: it must be repeated multiple times until we reach a non-cancelled timer (not cool).
We'd have to allocate the events in the GC HEAP (we currently allocate events on the stack) and they'd stay allocated until they finally leave the 4-heap.
We can probably clear the tombstones we meet as we swap items (cool), but that means dereferencing each pointer, which reduces the CPU cache benefit of the flat array...
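For illustration, here is a minimal C sketch (not the code under discussion, and the names are made up) of the lazy-deletion approach in a flat 4-ary min-heap of timer pointers: cancellation only flips a tombstone flag, and delete-min keeps popping until it reaches a live timer, which is exactly the extra cost mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical timer: heap-allocated (Go keeps these in the GC heap), with a
 * tombstone flag set on cancellation instead of removing it from the heap */
struct timer {
    uint64_t wake_at_ns;
    bool cancelled;
};

struct heap4 {
    struct timer **items;  /* flat array of pointers, 4-ary min-heap by wake_at_ns */
    size_t size;
};

static void sift_down(struct heap4 *h, size_t i) {
    for (;;) {
        size_t smallest = i;
        for (size_t k = 1; k <= 4; k++) {            /* children of i: 4i+1 .. 4i+4 */
            size_t child = 4 * i + k;
            if (child < h->size &&
                h->items[child]->wake_at_ns < h->items[smallest]->wake_at_ns)
                smallest = child;
        }
        if (smallest == i) return;
        struct timer *tmp = h->items[i];
        h->items[i] = h->items[smallest];
        h->items[smallest] = tmp;
        i = smallest;
    }
}

/* cancelling is O(1): just mark the tombstone and leave the entry in place */
static void cancel(struct timer *t) { t->cancelled = true; }

/* delete-min has to skip tombstones: every cancelled root costs another
 * sift_down, and tombstones keep the timers alive until they are popped */
static struct timer *pop_min(struct heap4 *h) {
    while (h->size > 0) {
        struct timer *top = h->items[0];
        h->items[0] = h->items[--h->size];
        if (h->size > 0) sift_down(h, 0);
        if (!top->cancelled) return top;
        /* tombstone: drop it and try the next root */
    }
    return NULL;
}
```

This is the trade-off described above: cancellation becomes trivial, but occupancy and the cost of delete-min now reflect every timer created recently, not just the live ones.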
Yes, the practicalities of the solution might outweigh any performance benefit, but Go going that way is a signal to me that it's worth doing the benchmarking.
It could be a cleanup during a GC collection 🤔
Co-authored-by: Johannes Müller <[email protected]>
Related to [RFC #12](crystal-lang/rfcs#12). Replaces the `Deque` used in #14996 with a min [Pairing Heap], which is a kind of [Mergeable Heap] and one of the best performing heaps in practical tests when arbitrary deletions are required (think cancelling a timeout); otherwise a [D-ary Heap] (e.g. a 4-heap) will usually perform better. See the [A Nearly-Tight Analysis of Multipass Pairing Heaps](https://epubs.siam.org/doi/epdf/10.1137/1.9781611973068.52) paper or the Wikipedia page for more details.

The implementation itself is based on the [Pairing Heaps: Experiments and Analysis](https://dl.acm.org/doi/pdf/10.1145/214748.214759) paper, and merely implements the recursive two-pass algorithm (the auxiliary two-pass variant might perform even better).

The `Crystal::PointerPairingList(T)` type is generic and relies on intrusive nodes (the links are in `T`) to avoid extra allocations for the nodes (same as `Crystal::PointerLinkedList(T)`). It also requires a `T#heap_compare` method, so we can use the same type for a min or max heap, or to build a more complex comparison.

Note: I also tried a 4-heap, and while it performs very well and only needs a flat array, arbitrary deletion (e.g. cancelling a timeout) needs a linear scan; its performance quickly plummets even at low occupancy, and becomes painfully slow at higher occupancy (tens of microseconds on _each_ delete, while the pairing heap does it in tens of nanoseconds).

Follow up to #14996

[Mergeable Heap]: https://en.wikipedia.org/wiki/Mergeable_heap
[Pairing Heap]: https://en.wikipedia.org/wiki/Pairing_heap
[D-ary Heap]: https://en.wikipedia.org/wiki/D-ary_heap

Co-authored-by: Linus Sellberg <[email protected]>
Co-authored-by: Johannes Müller <[email protected]>
Co-authored-by: Vlad Zarakovsky <[email protected]>
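As a rough C analogue of the intrusive-node idea described above (not the actual `Crystal::PointerPairingList(T)` source; names are illustrative): the pairing-heap links live inside the timer itself, so pushing a timer never allocates a separate node, and the ordering is delegated to a `heap_compare`-style function on the element, which is what lets the same structure act as a min- or max-heap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* the pairing-heap links are embedded ("intrusive") in the timer itself,
 * so inserting never allocates a separate node */
struct timer {
    uint64_t wake_at_ns;
    struct timer *child;    /* first child */
    struct timer *next;     /* next sibling */
    struct timer *prev;     /* previous sibling or parent */
};

/* analogue of T#heap_compare: true if a must sit above b (min-heap on wake_at_ns) */
static bool heap_compare(const struct timer *a, const struct timer *b) {
    return a->wake_at_ns < b->wake_at_ns;
}

/* meld two heaps: the smaller root adopts the other root as its first child */
static struct timer *meld(struct timer *a, struct timer *b) {
    if (a == NULL) return b;
    if (b == NULL) return a;
    if (heap_compare(b, a)) { struct timer *t = a; a = b; b = t; }
    b->next = a->child;
    if (a->child) a->child->prev = b;
    b->prev = a;
    a->child = b;
    return a;
}

/* insert is just a meld with a one-element heap: O(1), no allocation */
static struct timer *heap_insert(struct timer *root, struct timer *t) {
    t->child = t->next = t->prev = NULL;
    return meld(root, t);
}
```

Delete-min (and arbitrary deletion, for cancelled timeouts) detaches a node and re-merges its children with the recursive two-pass pairing from the paper; that part is omitted here for brevity.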
> - determine the next timer to expire, so we can decide for how long a process or
>   thread can be suspended (usually when there is nothing to do).
The need for this depends on the event loop. Is there any need at all for all this complexity if the event loop is driven by io_uring? Just emit a timeout event for each fiber that is waiting, and that's it. I'm all for shared code between underlying event loops that are limited in what they can do, but is there any reason to lock event loops into more structure?
It also supports timeouts on IO operations.
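For reference, a hedged liburing sketch of what a timeout on an IO operation can look like, assuming a reasonably recent liburing (the helper name, fd, buffer and one-second deadline are placeholders): the read and its timeout are submitted as a linked pair, and whichever completes first cancels the other.

```c
#include <liburing.h>

/* sketch: submit a read with a 1s deadline using a linked timeout; if the
 * timeout fires first, the read completes with -ECANCELED (error handling
 * for io_uring_get_sqe returning NULL is omitted) */
static int read_with_timeout(struct io_uring *ring, int fd, void *buf, unsigned len) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_read(sqe, fd, buf, len, 0);
    sqe->flags |= IOSQE_IO_LINK;              /* link the next SQE to this one */
    io_uring_sqe_set_data64(sqe, 1);          /* tag: the read */

    struct __kernel_timespec ts = { .tv_sec = 1, .tv_nsec = 0 };
    struct io_uring_sqe *tsqe = io_uring_get_sqe(ring);
    io_uring_prep_link_timeout(tsqe, &ts, 0);
    io_uring_sqe_set_data64(tsqe, 2);         /* tag: the timeout */

    return io_uring_submit(ring);
}
```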
io_uring supporting timeouts on IO operations => ❤️
Each eventloop can use whatever it pleases, yet... there are still "select action timeouts" that can be cancelled, so even io_uring will need to support an arbitrary dequeue of timeouts.
Even with events to notify the blocking waits (which we use for epoll, kqueue and IOCP), we still need to rearm the timer after it triggers (for example) and to know when the next timer is expiring. I don't think io_uring will be treated differently.
So far, my naive vision is for io_uring to notify an eventfd registered to an epoll instance, along with a timerfd (for precise timers), waiting on arbitrary fds (#wait_readable and #wait_writable) and eventually more niceties (e.g. signalfd and pidfd).
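A hedged sketch of that wiring (the helper name is made up and error handling is omitted): completions on the ring signal an eventfd, which sits in the same epoll set as a timerfd used for precise timers, so one `epoll_wait` drives everything.

```c
#include <liburing.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/timerfd.h>
#include <time.h>

/* sketch: io_uring completions notify an eventfd which is polled by the same
 * epoll instance as a timerfd; epoll_wait then drives the event loop */
static int setup_loop(struct io_uring *ring, int *efd_out, int *tfd_out) {
    int epfd = epoll_create1(EPOLL_CLOEXEC);

    int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    io_uring_register_eventfd(ring, efd);    /* CQE postings signal efd */

    int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);

    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.fd = efd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
    ev.data.fd = tfd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);

    *efd_out = efd;
    *tfd_out = tfd;
    return epfd;
}
```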
> Each eventloop can use whatever it pleases, yet... there are still "select action timeouts" that can be cancelled,
Yes. For example using the uring op TIMEOUT_REMOVE.
That said, it was when considering what actually goes on in a select action loop that I really started to dislike it in general. So much pointless teardown and rearming...
> so even io_uring will need to support an arbitrary dequeue of timeouts.
No, that does not follow. It may be an issue if we are not OK waiting for the response to the timeout removal, I guess, and it also needs to handle the race condition where the timer is already triggering and executes before the actual timeout removal. But it is definitely doable without.
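A hedged sketch of that flow (the tag value, helper names and handling are placeholders): the timeout is armed with a user_data tag, TIMEOUT_REMOVE targets the same tag, and the race mentioned above shows up in which completions come back.

```c
#include <liburing.h>

enum { SELECT_TIMEOUT_TAG = 42 };   /* arbitrary tag for this example */

/* arm a cancellable timeout, tagged so a later TIMEOUT_REMOVE can find it */
static void arm_timeout(struct io_uring *ring, struct __kernel_timespec *ts) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_timeout(sqe, ts, 0, 0);
    io_uring_sqe_set_data64(sqe, SELECT_TIMEOUT_TAG);
}

/* cancel it: if the removal wins, the original timeout completes with
 * -ECANCELED; if the timer already fired (or is firing), the removal fails
 * with -ENOENT or -EBUSY and the timeout's own CQE must still be consumed */
static void cancel_timeout(struct io_uring *ring) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_timeout_remove(sqe, SELECT_TIMEOUT_TAG, 0);
}
```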
> timerfd (for precise timers),
FWIW, the uring timeout op also takes timespec structs as arguments, with the same precision as timerfd. What uring doesn't seem to support is the periodic part of the argument, but instead there is a MULTISHOT flag if you want repeating triggers.
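Assuming a kernel and liburing new enough to expose IORING_TIMEOUT_MULTISHOT, the repeating case might look like this sketch (the helper name and 10 ms interval are arbitrary):

```c
#include <liburing.h>

/* sketch: a repeating 10ms timer without timerfd; a count of 0 with the
 * multishot flag keeps posting completions until the timeout is removed */
static void arm_tick(struct io_uring *ring) {
    static struct __kernel_timespec interval = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_timeout(sqe, &interval, 0, IORING_TIMEOUT_MULTISHOT);
}
```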
> So far, my naive vision is for io_uring to notify an eventfd registered to an epoll instance
I'd suggest not using epoll at all and instead using the uring POLL op, which does more or less the same thing but is a lot simpler.
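A sketch of that alternative (the helper name echoes #wait_readable but is made up here): readiness on an arbitrary fd is requested from the ring itself rather than from epoll, and the result arrives on the same completion queue as everything else.

```c
#include <liburing.h>
#include <poll.h>

/* sketch: ask the ring (instead of epoll) to report readiness on an fd */
static void wait_readable(struct io_uring *ring, int fd) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_poll_add(sqe, fd, POLLIN);   /* one-shot readiness */
    /* io_uring_prep_poll_multishot(sqe, fd, POLLIN) keeps it armed, on recent kernels */
    io_uring_sqe_set_data64(sqe, (unsigned long long)fd);
}
```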
But in any case, I guess it doesn't matter too much, as it doesn't really impact the public interfaces, so in the end it can be changed when the need arises.
Thanks for all the information! I'll have to dig much deeper into io_uring capabilities.
AFAIK all timeouts in the Linux kernel go into the timer wheel (tick based, low precision, nowadays loses precision over time) while timers go into hrtimer (high precision, no ticks, nanosecond clock).
I'd expect io_uring timeouts to end up in the timer wheel, which is fine for timeouts, but I'd like to keep timerfd for sleep(seconds).
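For comparison, a sketch of the timerfd path for a precise sleep(seconds)-style wait (the helper name and 1.5 s value are arbitrary); timerfd is backed by hrtimers, so the expiry keeps nanosecond resolution.

```c
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* sketch: arm a one-shot 1.5s timer backed by hrtimers, then block on the fd
 * (it could just as well sit in an epoll set instead of a blocking read) */
static int precise_sleep(void) {
    int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
    if (tfd < 0) return -1;

    struct itimerspec its = {
        .it_value = { .tv_sec = 1, .tv_nsec = 500 * 1000 * 1000 },  /* 1.5 s */
        /* .it_interval left zeroed: one-shot */
    };
    timerfd_settime(tfd, 0, &its, NULL);

    uint64_t expirations = 0;
    ssize_t n = read(tfd, &expirations, sizeof(expirations));  /* blocks until expiry */
    close(tfd);
    return n < 0 ? -1 : 0;
}
```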
Co-authored-by: Vlad Zarakovsky <[email protected]>