Skip to content

Annotate metrics #630

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 2, 2020
Merged

Annotate metrics #630

merged 11 commits into from
Apr 2, 2020

Conversation

prasunanand
Copy link
Contributor

Changes made:

  • Add mypy to setup.cfg
  • Add ArrayLike type
  • Modify ci scripts
  • Add annotations to dask_ml.metrics

Partially fixes #607

Copy link
Member

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Gave a quick pass through. Will look more later.

Comment on lines +108 to +109
X: ArrayLike, Y: ArrayLike, precomputed: bool = False
) -> Tuple[ArrayLike, ArrayLike]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know typing well enough, but I think there's a way to say that the Type[X] will exactly match the type of the first element of the result tuple? So you know that if you put in a dask.Array you'll get back a dask.Array rather than an numpy.ndarray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this :/ .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to punt on it for now.

y_true: ArrayLike,
y_pred: ArrayLike,
sample_weight: Optional[ArrayLike] = None,
multioutput: Optional[str] = "uniform_average",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to check with scikit-learn what the actual supported types are. Can this be an array, which just isn't implemented yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger
Copy link
Member

If you merge master then the unrelated CI issues should be fixed.

@prasunanand
Copy link
Contributor Author

@TomAugspurger , Request your review.

Copy link
Member

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking close, thanks!

@TomAugspurger
Copy link
Member

Can you add mypy to the linting check in azure-pipelines/posix.yaml?

And there are some linting issues there which aren't failing the build. https://dev.azure.com/dask-dev/dask/_build/results?buildId=818&view=logs&j=34469798-9dec-5482-c96c-ad1eda9ec58d&t=3856b5af-eaa9-56ed-b4cc-e7b6fad51231&l=12 Can you fix those? I'll figure out why CI isn't failing.

@prasunanand
Copy link
Contributor Author

Thanks @TomAugspurger . I am working on the changes.

@TomAugspurger
Copy link
Member

Can you merge master? There will be a merge conflict with posix.yaml. You'll need to move your mypy check to ci/code_checks.sh.

@prasunanand
Copy link
Contributor Author

I think it should have worked.

dask_ml/metrics/pairwise.py:218: error: Invalid index type "Union[str, Callable[[Any, Any], float]]" for "Dict[str, Any]"; expected type "str"

https://dev.azure.com/dask-dev/dask/_build/results?buildId=834&view=logs&j=9dcbdf5b-840d-5ddd-a5ac-58624932ec86&t=96578b6e-0c8a-5a4c-e6c4-a33b762a5e3a&l=25

@TomAugspurger
Copy link
Member

TomAugspurger commented Apr 2, 2020

I think it should have worked.

I think PAIRWISE_KERNEL_FUNCTIONS is inferred as Dict[str, Any]. You could perhaps add an annotation to it.

Or... perhaps add an assert isinstance(metric, str) to ~L216. That will convince mypy that the only way we reach that code is when metric is a str, which I think is correct.


Just FYI @rth @thomasjpfan. IIRC you're investigating type annotations for scikit-learn. I'm not sure if there's a way for us to share types, since we have different definitions of "ArrayLike", but we can coordinate in places where we overlap.

@prasunanand
Copy link
Contributor Author

Thanks :)
assert isinstance(metric, str) worked

@prasunanand
Copy link
Contributor Author

@TomAugspurger , It's all good now :)

@TomAugspurger
Copy link
Member

Thanks, this looks good!

@TomAugspurger TomAugspurger merged commit e5a2f27 into dask:master Apr 2, 2020
@rth
Copy link
Contributor

rth commented Apr 2, 2020

I'm not sure if there's a way for us to share types, since we have different definitions of "ArrayLike", but we can coordinate in places where we overlap.

@TomAugspurger As far as I know, in scikit-learn we won't type array like objects for now precisely because it's unclear what the type for "ArrayLike" input should be. So at least there shouldn't be an issue of types being over-constrained on the scikit-learn side.

@mrocklin
Copy link
Member

mrocklin commented Apr 2, 2020

So at least there shouldn't be an issue of types being over-constrained on the scikit-learn side.

This is true of most things I think :)

chauhankaranraj pushed a commit to chauhankaranraj/dask-ml that referenced this pull request Apr 3, 2020
* Add annotations for metrics module
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add mild type annotations
4 participants