Skip to content

Outlier detection and outlier exclusion#5

Closed
NickNeck wants to merge 6 commits intobencheeorg:mainfrom
hrzndhrn:outliers
Closed

Outlier detection and outlier exclusion#5
NickNeck wants to merge 6 commits intobencheeorg:mainfrom
hrzndhrn:outliers

Conversation

@NickNeck
Copy link
Contributor

@NickNeck NickNeck commented Jul 4, 2023

Hello @PragTob

This PR adds outlier handling to statistex. The goal of this changes is to use the new functionality in benchee to solve issue bencheeorg/benchee#382

A little test with benchee, this PR and the benchmark from the readme in benchee show the following effect:

outlier exclusion: none
Name                  ips        average  deviation         median         99th %
flat_map           4.35 K      229.88 μs     ±9.86%      226.75 μs      406.74 μs
map.flatten        2.43 K      410.68 μs    ±15.94%      410.71 μs      567.22 μs

outlier exclusion: once
Name                  ips        average  deviation         median         99th %
flat_map           4.41 K      226.95 μs     ±0.15%      226.92 μs      228.17 μs
map.flatten        2.42 K      413.41 μs     ±1.50%      412.21 μs      436.42 μs

outlier exclusion: repeatedly
Name                  ips        average  deviation         median         99th %
flat_map           4.40 K      227.21 μs     ±0.10%      227.21 μs      227.79 μs
map.flatten        2.43 K      411.59 μs     ±0.82%      411.42 μs      419.54 μs

I have also made a change in Statistex.Percentiles. The function percentiles expects now a sorted list and the Statistex module is responsible for doing the sort. This change is to prevent the list from being sorted more than once.

WDYT? and thanks for all the work on benchee.

@PragTob
Copy link
Member

PragTob commented Nov 10, 2023

👋 Thank you a ton, I have seen this and have some time these days and I'll get to this in the coming days I hope :) And sorry for the looong wait :(

Thanks for tackling this major issue and feature! 💚

IMG_20220602_080206

@NickNeck
Copy link
Contributor Author

NickNeck commented Jan 2, 2025

Happy New Year and a very quiet ping.

@PragTob
Copy link
Member

PragTob commented Jan 2, 2025

happy new year and very needed ping 🙈

I wanna ramp up back again on benchee so... let's say I wanna get this done within the next 2 weeks and if not please ping me :)

Copy link
Member

@PragTob PragTob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @NickNeck - first of all, thanks for the great work, pull requests and also all the really neat benchee plugins I know you built 💚 Seeing those really fills my heart with joy.

Second of all, as I said before - terribly sorry about all the delay here. I always want to make contributing to my OSS projects a great experience and here I failed miserably and I'm sorry. Life and such, and as you soon will notice I also knew it'd probably be some significant effort to get my head into that all again and understand it 😅
In that context, really thanks a lot for the ping 💚

Thirdly, I believe I have reviewed you a PR of yours before but just to be sure: A lot of my comments are optional and/or questions. I wanna understand what's going and what decisions were made and why - doubly so when it's statistics that I vaguely know and I "promise" users of this library that we know what the hell we're doing 😇 Plus, API things and naming and such.
I also tend to leave a shit ton of comments, that doesn't mean I think this isn't a good PR. In fact, I think it's a great PR and some really awesome work. I can also see you remedied some issues "past Tobi" left in there. I also really wanna thank you for tackling this, as it's something I really wanted to do for a long time but never dedicated time to get my head into it.

in this particular PR I think the main things to figure out are:

  • outlier_bounds - name, what does it mean and what is it used for? (a lot of comments touch on this)
  • the handling of sorted lists/who sorts/maybe get a sorted?: true/false flag going
  • the samples/rest coming back reversed from do_outliers if I'm not mistaken

Everything else (I think) are even more minor touch ups to not do at all or to do later.

Lastly, don't feel like you need to do any adjustments. Esp. the time break is entirely on me and so I'm fine to fix up this PR if you can't/don't want to 💚

Oh, and when a comment says "Future Tobi:" that's me coming back after having read the entire PR twice or thrice to add context. I sometimes like to (mostly) preserve my initial comment as I always think it's valuable to see what a first time reader of the code might/think or struggle with as pointers to what could maybe still improved.

IMG_20180201_140436-ANIMATION

@NickNeck
Copy link
Contributor Author

Wow, so much text for such a short PR 😉

Thanks for the review. I will have a look at all the points mentioned in the next few days. For me, the PR feels a bit like I'm reviewing someone else's PR (past Marcus).

@PragTob
Copy link
Member

PragTob commented Jan 13, 2025

No worries, that's entirely on me.

@PragTob
Copy link
Member

PragTob commented Jan 27, 2025

heyo, sorry for the delay - I've been sick the last couple of weeks just in time to start travel. I might get to that during travel, if not next week (assuming I stay healthy). Sorry for the inconvenience y'all :(

@NickNeck
Copy link
Contributor Author

@PragTob don't worry about delaying, it's more important that you stay healthy.

@PragTob
Copy link
Member

PragTob commented May 2, 2025

It is, and I shouldn't make any more promises/allusions to time lines 😬 Life has been... something 👀

I'll try to get this through this weekend though. Won't ask for adjustments, will just do them as it's clearly on me for leaving this lying around so often and introducing delay.

Feel free to comment/PR changes etc. :)

PragTob added a commit that referenced this pull request May 2, 2025
From: #5 (comment)

Need to think through it again/and or check some more samples
and test it against that. Getting different bounds/outliers
right now although I think they're right.
PragTob added a commit that referenced this pull request May 2, 2025
From: #5 (comment)

Need to think through it again/and or check some more samples
and test it against that. Getting different bounds/outliers
right now although I think they're right.
@PragTob
Copy link
Member

PragTob commented May 2, 2025

Trying to finish this out in #10 - I'll still use this PR for convos and trying to resolve comments I consider to be addressed :)

PragTob added a commit that referenced this pull request May 5, 2025
From: #5 (comment)

Need to think through it again/and or check some more samples
and test it against that. Getting different bounds/outliers
right now although I think they're right.
@PragTob
Copy link
Member

PragTob commented May 5, 2025

Merged as #10 - thank you!

IMG_20170507_172136

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants