-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC and WIP: Add MypyTypeInferenceProvider #831
base: main
Are you sure you want to change the base?
Conversation
3e16e20
to
da2d4b7
Compare
This change is RFC and WIP (please read whole change message). Add `MypyTypeInferenceProvider` as an alternative for `TypeInferenceProvider`. The provider infers types using mypy as library. The only requirement for the usage is to have the latest mypy installed. Types inferred are mypy types, since mypy type system is well designed, to avoid the conversion, and also to keep it simple. For compatibility and extensibility reasons, these types are stored in separate field `MypyType.mypy_type`. Let's assume we have the following code in the file `x.py` which we want to inspect: ```python x = [42] s = set() from enum import Enum class E(Enum): f = "f" e = E.f ``` Then to get play with mypy types one should use the code like: ```python import libcst as cst from libcst.metadata import MypyTypeInferenceProvider filename = "x.py" module = cst.parse_module(open(filename).read()) cache = MypyTypeInferenceProvider.gen_cache(".", [filename])[filename] wrapper = cst.MetadataWrapper( module, cache={MypyTypeInferenceProvider: cache}, ) mypy_type = wrapper.resolve(MypyTypeInferenceProvider) x_name_node = wrapper.module.body[0].body[0].targets[0].target set_call_node = wrapper.module.body[1].body[0].value e_name_node = wrapper.module.body[-1].body[0].targets[0].target print(mypy_type[x_name_node]) # prints: builtins.list[builtins.int] print(mypy_type[x_name_node].fullname) # prints: builtins.list[builtins.int] print(mypy_type[x_name_node].mypy_type.type.fullname) # prints: builtins.list print(mypy_type[x_name_node].mypy_type.args) # prints: (builtins.int,) print(mypy_type[x_name_node].mypy_type.type.bases[0].type.fullname) # prints: typing.MutableSequence print(mypy_type[set_call_node]) # prints: builtins.set print("issuperset" in mypy_type[set_call_node].mypy_type.names) # prints: True print(mypy_type[set_call_node.func]) # prints: typing.Type[builtins.set] print(mypy_type[e_name_node].mypy_type.type.is_enum) # prints: True ``` Why? 1. `TypeInferenceProvider` requires pyre (`pyre-check` on PyPI) to be installed. mypy is more popular than pyre. If the organization uses mypy already (which is almost always the case), it may be difficult to assure colleagues (including security team) that "we need yet another type checker". 2. Even though it is possible to run pyre without watchman installation, this is not advertised. watchman installation is not always possible because of system requirements, or because of the security requirements like "we install only our favorite GNU/Linux distribution packages". 3. `TypeInferenceProvider` usage requires `pyre start` command to be run before the execution, and `pyre stop` - after the execution. This may be inconvenient, especially for the cases when pyre was not used before. 4. Types produced by pyre in `TypeInferenceProvider` are just strings. For example, it's not easily possible to infer that some variable is enum instance. `MypyTypeInferenceProvider` makes it easy, see the code above. Drawbacks: 1. Speed. mypy is slower than pyre, so is `MypyTypeInferenceProvider` comparing to `TypeInferenceProvider`. How to partially solve this: 1. Implement AST tree caching in mypy. It may be difficult, however this will lead to speed improvements for all the projects that use this functionality. 2. Implement inferred types caching inside LibCST. As far as I know, no caching at all is implemented inside LibCST, which is the prerequisite for inferred types caching, so the task is big. 3. Implement LibCST CST to mypy AST. I am not sure if this possible at all. Even if it is possible, the task is huge. 2. Two providers are doing similar things in LibCST will be present, this can potentially lead to the situation when there is a need install two type checkers to get all codemods from the library running. Alternatives considered: 1. Put `MypyTypeInferenceProvider` inside separate library (say, LibCST-mypy or `libcst-mypy` on PyPI). This will explicitly separate `MypyTypeInferenceProvider` from the rest of LibCST. Drawbacks: 1. The need to maintain separate library. 2. Limited fame (people need to know that the library exists). 3. Since some codemods cannot be implemented easily without the library, for example, `if-elif-else` to `match` converter (it needs powerful type inference), they are doomed to not be shipped with LibCST. 2. Implement base class for inferred type, which inherits from `str` (to keep the compatibility with the existing codebase) and the mechanism for dynamically selecting `TypeInferenceProvider` type checker (mypy or pyre; user can do this via environmental variable). If the code inside LibCST requires just shallow type information (so, just `str` is enough), then the code can run with any type checker. The remaining code (such as `if-elif-else` to `match` converter) will still require mypy. Misc: Code does not lint in my env, by some reason `pyre check` cannot find `mypy` library.
da2d4b7
to
78f4b1d
Compare
What do you think? Will you consider to merge this addition if I improve it and address the feedback? |
Hi @rominf, thanks so much for this contribution. I'm in favor of supporting a I think the right way forward here is to come up with a representation of Python types that both mypy and pyre results can be mapped onto, and have a |
I need this. I've released libcst-mypy, and gave credits to @rominf. If you are against, or you want the PyPI namespace, let me know. |
Just an update... The package I've created, it's just too slow to use. |
Summary
This change is RFC and WIP (please read whole change message).
Commit message is basically the same ATM, no need to read it for now.
Add
MypyTypeInferenceProvider
as an alternative forTypeInferenceProvider
. The provider infers types using mypy aslibrary. The only requirement for the usage is to have the latest mypy
installed. Types inferred are mypy types, since mypy type system is well
designed, to avoid the conversion, and also to keep it simple. For
compatibility and extensibility reasons, these types are stored in
separate field
MypyType.mypy_type
.Let's assume we have the following code in the file
x.py
which we wantto inspect:
Then to play with mypy types one should use the code like:
Why?
TypeInferenceProvider
requires pyre (pyre-check
on PyPI) to beinstalled. mypy is more popular than pyre. If the organization uses
mypy already (which is almost always the case), it may be difficult
to assure colleagues (including security team) that "we need yet
another type checker".
this is not advertised. watchman installation is not always possible
because of system requirements, or because of the security
requirements like "we install only our favorite GNU/Linux
distribution packages".
TypeInferenceProvider
usage requirespyre start
command to be runbefore the execution, and
pyre stop
- after the execution. This maybe inconvenient, especially for the cases when pyre was not used
before.
TypeInferenceProvider
are just strings.For example, it's not easily possible to infer that some variable is
enum instance.
MypyTypeInferenceProvider
makes it easy, see thecode above.
Drawbacks:
MypyTypeInferenceProvider
comparing to
TypeInferenceProvider
.How to partially solve this:
this will lead to speed improvements for all the projects that use
this functionality.
no caching at all is implemented inside LibCST, which is the
prerequisite for inferred types caching, so the task is big.
at all. Even if it is possible, the task is huge.
this can potentially lead to the situation when there is a need
install two type checkers to get all codemods from the library
running.
Alternatives considered:
MypyTypeInferenceProvider
inside separate library (say,LibCST-mypy or
libcst-mypy
on PyPI). This will explicitlyseparate
MypyTypeInferenceProvider
from the rest of LibCST.Drawbacks:
library, for example,
if-elif-else
tomatch
converter(it needs powerful type inference), they are doomed to not be
shipped with LibCST.
str
(to keep the compatibility with the existing codebase) and
the mechanism for dynamically selecting
TypeInferenceProvider
type checker (mypy or pyre; user can do this via environmental
variable). If the code inside LibCST requires just shallow type
information (so, just
str
is enough), then the code can run withany type checker. The remaining code (such as
if-elif-else
tomatch
converter) will still require mypy.Misc:
Code does not lint in my env, by some reason
pyre check
cannot findmypy
library.Related to:
Test Plan
Check the message above and unittests.