-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add some hypothesis test functions #315
Conversation
Sorry that I've taken a bit to reply here. So far, these are great. We have mentioned the idea of a nan policy in regards to analytical functions (as opposed to empirical functions) and I think following the scipy approach is good because of developer expectations. We don't really have enough tests to have a sense of uniform API for tests and having the policy as an argument is useful for establishing that. I think the direction you're going in is good and I would approve this once I look into why the However, regarding license, to what degree would you say you referred to the scipy source? I don't wish to complicate the license we distribute with, nor do I want to use a license that's not typical for crates on crates.io |
Hey thanks for the response and feedback. In terms of the nightly piece. I found the same error locally. A day or so later I updated nightly and everything worked just fine, so it seems like it was an issue specific to nightly. In terms of how much I "referred to the scipy source", it's been a pretty loose reference for the most part but I'll provide some relevant links if you want to form your own opinion. one-way ANOVA F-test
My conclusion: commonality with scipy is mainly just the function signature as I leveraged a statsdirect page for logic One Sample t-test
My conclusion: again mainly just the function signature as I used the logic from this jpm page Mann Whitney UHere we'd probably want to look at the two main pieces of logic, being the different methods for calculating the test's statistic, separately Exact
My conclusion: These are very different from each other. The scipy version is doing a lot of 2-d array stuff and matrix operations that I didn't get into in my implementation. Asymptotic
My conclusion: This is probably the only case worth your review/thoughts. The scipy version is ~10 LOC and the version in this PR is basically a 1:1 copy of those lines. There isn't too much room for alternative implementation here, but happy to re-write it to avoid any potential issues. Also for reference, the scipy function references this section in the Mann Whitney U wiki article. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #315 +/- ##
==========================================
+ Coverage 93.81% 94.26% +0.45%
==========================================
Files 53 58 +5
Lines 11996 12943 +947
==========================================
+ Hits 11254 12201 +947
Misses 742 742 ☔ View full report in Codecov by Sentry. |
re: nightly Thanks for the well-structured reply. I took a look at each and I agree on your points regarding the implementations. To be fair, had you not done this I would still trust your assessment completely since the author has the best idea of what level they referred to sources. As a non-expert in licensing, I believe we're in the clear here, especially since it's laid out so clearly in the Wikipedia article. Regarding test cases, I'm not so sure here. I'll reach out to softwarefreedom.org and update you. |
src/stats_tests/mannwhitneyu.rs
Outdated
/// ranks data and accounts for ties to calculate the U statistic | ||
fn rankdata_mwu<T: PartialOrd>(xy: Vec<T>) -> Result<(Vec<f64>, Vec<usize>), MannWhitneyUError> { | ||
let mut j = (0..xy.len()).collect::<Vec<usize>>(); | ||
let mut y = xy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hoping this comes off less of a nitpick and more of an "in case you didn't know"
You can bind arguments to be mut
in the function header without affecting the signature and it can remove this kind of line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasn't aware of this. Still pretty new to Rust so appreciate any comments on the more idiomatic things I might be missing. Should be addressed in 50dcaab
In regards to this, I just added a couple of more tests. I'm happy to keep working on this PR adding more tests for a bit longer. Or I can just up a new PR for more additions. I don't have too much a preference, and it doesn't seem like you do either. |
I think I now prefer that the next things be a new PR since these tests could be used for sampling verification which I was working on a bit ago. I forgot that we have some Thanks for adding these hypothesis tests! |
Oh and I didn't get back to you on due diligence for attributing data. As much as possible, you should provide the source for the dataset where scipy is specifying what that source is; I believe it demonstrates good faith toward the primary source. But after that I'll definitely accept the PR. |
Sure thing. Are you looking for a full reference section? |
I think adding a line comment with an inline reference on the relevant tests would suffice. After all, scipy had their references within their docs, so consuming that information is more common. This is for appropriate attribution, so I think it's fine if it's a little hairy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the attribution changes!
Thank you for your contribution! Having tests will be great! |
Adds functions for
Functions are generally based on the scipy version. I tried to align with existing formatting/setup from the fisher test, but wanted to make sure I was on the right track (in terms of structure, level of documentation, desire for this capability, etc.) before thinking about adding more tests.