Outlier detection and outlier exclusion#5
Conversation
|
Happy New Year and a very quiet |
|
happy new year and very needed ping 🙈 I wanna ramp up back again on benchee so... let's say I wanna get this done within the next 2 weeks and if not please ping me :) |
PragTob
left a comment
There was a problem hiding this comment.
Hello @NickNeck - first of all, thanks for the great work, pull requests and also all the really neat benchee plugins I know you built 💚 Seeing those really fills my heart with joy.
Second of all, as I said before - terribly sorry about all the delay here. I always want to make contributing to my OSS projects a great experience and here I failed miserably and I'm sorry. Life and such, and as you soon will notice I also knew it'd probably be some significant effort to get my head into that all again and understand it 😅
In that context, really thanks a lot for the ping 💚
Thirdly, I believe I have reviewed you a PR of yours before but just to be sure: A lot of my comments are optional and/or questions. I wanna understand what's going and what decisions were made and why - doubly so when it's statistics that I vaguely know and I "promise" users of this library that we know what the hell we're doing 😇 Plus, API things and naming and such.
I also tend to leave a shit ton of comments, that doesn't mean I think this isn't a good PR. In fact, I think it's a great PR and some really awesome work. I can also see you remedied some issues "past Tobi" left in there. I also really wanna thank you for tackling this, as it's something I really wanted to do for a long time but never dedicated time to get my head into it.
in this particular PR I think the main things to figure out are:
outlier_bounds- name, what does it mean and what is it used for? (a lot of comments touch on this)- the handling of sorted lists/who sorts/maybe get a
sorted?: true/falseflag going - the samples/rest coming back reversed from
do_outliersif I'm not mistaken
Everything else (I think) are even more minor touch ups to not do at all or to do later.
Lastly, don't feel like you need to do any adjustments. Esp. the time break is entirely on me and so I'm fine to fix up this PR if you can't/don't want to 💚
Oh, and when a comment says "Future Tobi:" that's me coming back after having read the entire PR twice or thrice to add context. I sometimes like to (mostly) preserve my initial comment as I always think it's valuable to see what a first time reader of the code might/think or struggle with as pointers to what could maybe still improved.
|
Wow, so much text for such a short PR 😉 Thanks for the review. I will have a look at all the points mentioned in the next few days. For me, the PR feels a bit like I'm reviewing someone else's PR (past Marcus). |
|
No worries, that's entirely on me. |
|
heyo, sorry for the delay - I've been sick the last couple of weeks just in time to start travel. I might get to that during travel, if not next week (assuming I stay healthy). Sorry for the inconvenience y'all :( |
|
@PragTob don't worry about delaying, it's more important that you stay healthy. |
|
It is, and I shouldn't make any more promises/allusions to time lines 😬 Life has been... something 👀 I'll try to get this through this weekend though. Won't ask for adjustments, will just do them as it's clearly on me for leaving this lying around so often and introducing delay. Feel free to comment/PR changes etc. :) |
From: #5 (comment) Need to think through it again/and or check some more samples and test it against that. Getting different bounds/outliers right now although I think they're right.
From: #5 (comment) Need to think through it again/and or check some more samples and test it against that. Getting different bounds/outliers right now although I think they're right.
|
Trying to finish this out in #10 - I'll still use this PR for convos and trying to resolve comments I consider to be addressed :) |
From: #5 (comment) Need to think through it again/and or check some more samples and test it against that. Getting different bounds/outliers right now although I think they're right.
|
Merged as #10 - thank you! |



Hello @PragTob
This PR adds outlier handling to
statistex. The goal of this changes is to use the new functionality inbencheeto solve issue bencheeorg/benchee#382A little test with
benchee, this PR and the benchmark from the readme inbencheeshow the following effect:I have also made a change in
Statistex.Percentiles. The functionpercentilesexpects now a sorted list and theStatistexmodule is responsible for doing the sort. This change is to prevent the list from being sorted more than once.WDYT? and thanks for all the work on
benchee.