WIP, MAINT: remove dependence on graforvfl#9
Conversation
* `graforvfl` is still listed with a copyleft license, and its dependency stack is fairly problematic to install as noted in the code changes at gh-2. While it is only a test time dependency rather than a hard library runtime dependency, I suspect we won't be able to carry it (and its transitive deps) over to sklearn. Even independent of that, it is a bit of a pain for us to deal with long-term, so this patch attempts to remove our testing dependence on it (and its transitive deps) completely.
| scikit-learn>=1.2.1 | ||
| pandas | ||
| mealpy>=3.0.2 | ||
| permetrics>=2.0.0 |
There was a problem hiding this comment.
these are transitive deps (via grafo), so being removed here (assuming we agree to remove grafo usage..)
| ([10,], 2, "relu", "uniform", None, (20, 2), 0.5, 0.0444571694), | ||
| ([100,], 2, "tanh", "normal", None, (20, 2), 0.5, 0.02538905725), | ||
| ([10,], 5, "softmax", "lecun_uniform", None, (20, 5), | ||
| 0.186506112, 0.08469873), |
There was a problem hiding this comment.
I need help reconstituting a "reasonable" number of permutations "manually," to sample the parameter space that was previously being sampled automatically and compared against graforvfl.
It certainly isn't sustainable to restore all the of the cases from the pervious Cartesian product of cases, and that was probably overparametrizing anyway, but certainly what I've done here is not enough.
| # datasets | ||
| X, y = make_classification(n_classes=n_classes, | ||
| n_informative=8) | ||
| n_informative=8, random_state=0) |
There was a problem hiding this comment.
this was a problem with the old code--we should pretty much always pin random seeds in test case generation--it wasn't being caught because both our library and grafo were getting the same varied data...
src/gfdl/tests/test_regression.py
Outdated
| (100, 10, (100,), "relu", "glorot_normal", 10, (25, 10), | ||
| -29.31478018, -490.5751822), | ||
| (100, 10, (100,), "tanh", "uniform", 100, (25, 10), | ||
| -34.613165002, -327.82362807), |
There was a problem hiding this comment.
here also, I need help building up a "reasonable" number of "manual" regression test cases, obviously without fully substituting for the prior Cartesian product of cases that were being probed
|
It looks like the CI testing is still showing 100 % test line coverage. That's good, but obviously that metric isn't sensitive to the reduced case sampling I need help with. |
|
@tylerjereddy : Added a few more tests for regressor. No grafo deps are remaining. Ready for a final look. |
tylerjereddy
left a comment
There was a problem hiding this comment.
git grep -E -i "grafo" looks reasonable to me locally on this feature branch (remaining uses of the reference to that lib are fine) and the test line coverage still looks good. Of course, we're dropping our absolute number of test cases, but I think removing grafo.. is indeed worth it and we can build up the test cases manually over time as needed without the residual licensing concerns.
Thanks for pushing in a few more cases here.
I did have a few review comments, but they're non-blocking and can be handled in a follow up.
We've both contributed here, so it isn't clear who should do the final merge. We're not actually changing any source code outside tests, so probably not a big deal. I think I'll probably self-merge in this case since you've pushed more lines than I in the end...
| @@ -1,4 +1,5 @@ | |||
| import graforvfl | |||
| # tests/test_model.py | |||
There was a problem hiding this comment.
per gh-10, I think we more or less agreed to stop adding these comments for reasons related to maintainability; that's easy enough to remove
| actual_preds_median = np.median(actual_preds) | ||
| actual_preds_min = actual_preds.min() | ||
| actual_preds_r2 = r2_score(y_test, actual_preds) | ||
| np.testing.assert_allclose(actual_preds_shape, exp_preds_shape) |
There was a problem hiding this comment.
We could be a bit stricter here and for classifier test above with assert_array_equal since the shapes are by definition exact integers, but that can be done later, and in practice any reasonable bug would still cause a test failure as is.
graforvflis still listed with a copyleft license, and its dependency stack is fairly problematic to install as noted in the code changes at CI, BLD: support wheel builds, test in CI #2. While it is only a test time dependency rather than a hard library runtime dependency, I suspect we won't be able to carry it (and its transitive deps) over to sklearn. Even independent of that, it is a bit of a pain for us to deal with long-term, so this patch attempts to remove our testing dependence on it (and its transitive deps) completely.I could use help with:
git grepchecking to make sure all the remnants of grafo.. related things are gone?