-
Notifications
You must be signed in to change notification settings - Fork 4
Finalize Type Inference #81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi @AlSchlo , I came across optd. I am interested in this issue. can i pick it up? |
Hi @Dharshan-K , sure any help is appreciated. I will detail the current type inference strategy in #79 (PR will be merged soon). The DSL type inference is pretty much separate to how the optimizer works (with the exception of Logical & Physical expressions which need to take into account whether they are stored in the memo or not -- which can be ignored for now). I will write design documentation & outstanding roadmap in a month about |
## Overview This PR brings in the long awaited type inference to the DSL. This not only makes rules more ergonomic to write, **but it is also needed for correctness in the optimizer**. The engine needs to know what types are `Logical` or `Physical`, as they behave differently in the `engine`. Thus, all expressions **must** be resolved to a type. Furthermore, type inference is also used to verify if field accesses and function calls are valid (as these are type-dependent checks). Some examples:     ## Strategy The type inference works in three phases. 1. In `from_ast`, during the initial `AST` -> `HIR<TypedSpan>` transformation, we create and add all implicit and explicit type information from the program (e.g. Literals like `1` or `"hello"`, function annotations, etc.). For types that are `Unknown`, we generate a new ID and assign the type to either `Descending` (for unknown closure parameters and map keys), or `Ascending` (for all the rest) — more details about these modes in (3). 2. `Constraints` are generated in the `generate.rs` file, which also performs scope-checking now (for convenience, as both code paths were extremely similar). Constraints indicate `subtype` relationship, `field_accesses`, and function `calls`. For example: `let a: Logical = expr` generates the constraint `Logical :> typeof(expr)` 3. The last step to resolve the unknown types is to use a solver (credit to @AarryaSaraf for the base algorithm idea). It works as follows: ``` // Pseudocode for the constraint solver function resolve(): anyChanged = true lastError = null // Keep iterating until we reach a fixed point (no more changes) while anyChanged: anyChanged = false lastError = null // Check each constraint and try to refine unknown types for each constraint in constraints: result = checkConstraint(constraint) if result is Ok(changed): anyChanged |= changed // Track if any types were refined else if result is Err(error, changed): anyChanged |= changed // Still track changes lastError = error // Remember the last error // After reaching fixed point, return error if any constraint failed return lastError ? Err(lastError) : Ok() ``` During type inference, the method refines unknown types to satisfy subtyping constraints according to their variance: - When an `UnknownAsc` type is encountered as a parent, it is updated to the least upper bound (LUB) of itself and the child type. These types start at `Nothing` and ascend up the type hierarchy as needed. - When an `UnknownDesc` type is encountered as a child, it is updated to the greatest lower bound (GLB) of itself and the parent type. These types start at `Universe` and descend down the type hierarchy as needed. - When an `UnknownAsc` type is encountered as a child, its resolved type is checked against the parent type. - When an `UnknownDesc` type is encountered as a parent, its resolved type is checked against the child type. This refinement process happens iteratively until the system reaches a stable state where no more unknown types can be refined. At that point, if any constraints remain unsatisfied, the solver reports the most relevant type error. The key insight of this algorithm is that it makes monotonic progress - each refinement step either: - Successfully resolves a constraint - Refines an unknown type to be more specific - Identifies a type error By tracking whether any types changed during each iteration and continuing until we reach a fixed point, we ensure all types are resolved as completely as possible before reporting any errors. ### Limitations While the algorithm is theoretically correct, it has the following limitations: 1. Its run time appears to be quadratic. This is not a problem for small programs but might become a compilation bottleneck in the future. Excellent heuristics exist to optimize the order in which constraints get applied. 2. As commented in the code, when resolving the constraint: `UnknownDesc <: UnknownAsc` We could either dump the left type to `Nothing`, or pump the right type to `Universe`. However, since pumping/dumping types cannot be undone later on (to avoid exponential run-time), it is possible that we over-dump/pump a specific type. A solution would be to ignore these constraints until all other constraints that can be safely applied have run out. This would result in much better empirical type inference. 3. We postpone as future work (#81): - Map `Concat` as the keys are contra-variant `let map: {Animal : I64} = {Dog : 3} ++ {Cat : 2}` would fail under the current type checker. - List pattern matching is still broken. - Generic functions are not yet supported. All these above points may be solved by adding specific constraints for each of these, like we did for `field_access` and `call`. 4. Some error messages are a bit confusing to understand, although a lot of effort has been done to make them already better (e.g. see examples above). There is no silver bullet here: probably improving the `span` handling from the `parser` is the correct way forward. ## Testing Given how hard (and bloated!) it is to test each point in isolation, we simply test whether fully written-out programs pass the type checker or not in `solver.rs`. Error reporting has been tested manually for a variety of programs, and is expected to improve as we start writing rules. ## Future work **Focus will be put on the final `HIR` compilation process, which needs to correctly encode identified `Logical` / `Physical` types** and reject ambiguous (albeit correctly) inferred types (i.e. `Nothing` or `Universe`). It is indeed better practice to enforce type annotations in these scenarios. ## Note to Reviewers Don't read the diff, just read the entire `analyzer/types` directory.
The text was updated successfully, but these errors were encountered: