Skip to content

Finalize Type Inference #81

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
AlSchlo opened this issue Apr 23, 2025 · 2 comments
Open

Finalize Type Inference #81

AlSchlo opened this issue Apr 23, 2025 · 2 comments

Comments

@AlSchlo
Copy link
Collaborator

AlSchlo commented Apr 23, 2025

  • List pattern matching
  • Generics
  • Concatenation for maps
@Dharshan-K
Copy link

Hi @AlSchlo , I came across optd. I am interested in this issue. can i pick it up?
How to setup optd locally.

@AlSchlo
Copy link
Collaborator Author

AlSchlo commented Apr 24, 2025

Hi @Dharshan-K , sure any help is appreciated.

I will detail the current type inference strategy in #79 (PR will be merged soon).
Details on how to compile programs is written in optd/dsl/cli/main.rs (along with example programs).
Note: you cannot run the programs yet as the optimizer component still needs to be merged and connected with the DSL. But this doesn't matter for the analyzer & type checker.

The DSL type inference is pretty much separate to how the optimizer works (with the exception of Logical & Physical expressions which need to take into account whether they are stored in the memo or not -- which can be ignored for now).

I will write design documentation & outstanding roadmap in a month about optd in general.

AlSchlo added a commit that referenced this issue Apr 24, 2025
## Overview

This PR brings in the long awaited type inference to the DSL. This not
only makes rules more ergonomic to write, **but it is also needed for
correctness in the optimizer**.

The engine needs to know what types are `Logical` or `Physical`, as they
behave differently in the `engine`. Thus, all expressions **must** be
resolved to a type.

Furthermore, type inference is also used to verify if field accesses and
function calls are valid (as these are type-dependent checks).

Some examples:


![image](https://github.com/user-attachments/assets/4abf4596-3dd1-4113-9e7a-a5ac16a82bb8)


![image](https://github.com/user-attachments/assets/8649c29d-90ce-432e-bfe4-8ef3c53b3d9f)


![image](https://github.com/user-attachments/assets/f9a3b6b5-758d-4699-9955-9ee88c4bee60)


![image](https://github.com/user-attachments/assets/2e480b62-82a9-4931-9525-546d74c49049)

## Strategy 

The type inference works in three phases.

1. In `from_ast`, during the initial `AST` -> `HIR<TypedSpan>`
transformation, we create and add all implicit and explicit type
information from the program (e.g. Literals like `1` or `"hello"`,
function annotations, etc.). For types that are `Unknown`, we generate a
new ID and assign the type to either `Descending` (for unknown closure
parameters and map keys), or `Ascending` (for all the rest) — more
details about these modes in (3).

2. `Constraints` are generated in the `generate.rs` file, which also
performs scope-checking now (for convenience, as both code paths were
extremely similar). Constraints indicate `subtype` relationship,
`field_accesses`, and function `calls`.

For example:

`let a: Logical = expr`
generates the constraint
`Logical :> typeof(expr)`

3. The last step to resolve the unknown types is to use a solver (credit
to @AarryaSaraf for the base algorithm idea). It works as follows:

```
// Pseudocode for the constraint solver
function resolve():
    anyChanged = true
    lastError = null

    // Keep iterating until we reach a fixed point (no more changes)
    while anyChanged:
        anyChanged = false
        lastError = null
        
        // Check each constraint and try to refine unknown types
        for each constraint in constraints:
            result = checkConstraint(constraint)
            
            if result is Ok(changed):
                anyChanged |= changed  // Track if any types were refined
            else if result is Err(error, changed):
                anyChanged |= changed  // Still track changes
                lastError = error      // Remember the last error
        
    // After reaching fixed point, return error if any constraint failed
    return lastError ? Err(lastError) : Ok()
```

During type inference, the method refines unknown types to satisfy
subtyping constraints according to their variance:

- When an `UnknownAsc` type is encountered as a parent, it is updated to
the least upper bound (LUB) of itself and the child type. These types
start at `Nothing` and ascend up the type hierarchy as needed.
- When an `UnknownDesc` type is encountered as a child, it is updated to
the greatest lower bound (GLB) of itself and the parent type. These
types start at `Universe` and descend down the type hierarchy as needed.
- When an `UnknownAsc` type is encountered as a child, its resolved type
is checked against the parent type.
- When an `UnknownDesc` type is encountered as a parent, its resolved
type is checked against the child type.

This refinement process happens iteratively until the system reaches a
stable state where no more unknown types can be refined. At that point,
if any constraints remain unsatisfied, the solver reports the most
relevant type error.

The key insight of this algorithm is that it makes monotonic progress -
each refinement step either:

- Successfully resolves a constraint
- Refines an unknown type to be more specific
- Identifies a type error

By tracking whether any types changed during each iteration and
continuing until we reach a fixed point, we ensure all types are
resolved as completely as possible before reporting any errors.

### Limitations

While the algorithm is theoretically correct, it has the following
limitations:

1. Its run time appears to be quadratic. This is not a problem for small
programs but might become a compilation bottleneck in the future.
Excellent heuristics exist to optimize the order in which constraints
get applied.

2. As commented in the code, when resolving the constraint:

`UnknownDesc <: UnknownAsc`

We could either dump the left type to `Nothing`, or pump the right type
to `Universe`. However, since pumping/dumping types cannot be undone
later on (to avoid exponential run-time), it is possible that we
over-dump/pump a specific type. A solution would be to ignore these
constraints until all other constraints that can be safely applied have
run out. This would result in much better empirical type inference.

3. We postpone as future work (#81):
- Map `Concat` as the keys are contra-variant `let map: {Animal : I64} =
{Dog : 3} ++ {Cat : 2}` would fail under the current type checker.
- List pattern matching is still broken.
- Generic functions are not yet supported.

All these above points may be solved by adding specific constraints for
each of these, like we did for `field_access` and `call`.

4. Some error messages are a bit confusing to understand, although a lot
of effort has been done to make them already better (e.g. see examples
above). There is no silver bullet here: probably improving the `span`
handling from the `parser` is the correct way forward.

## Testing

Given how hard (and bloated!) it is to test each point in isolation, we
simply test whether fully written-out programs pass the type checker or
not in `solver.rs`.

Error reporting has been tested manually for a variety of programs, and
is expected to improve as we start writing rules.

## Future work

**Focus will be put on the final `HIR` compilation process, which needs
to correctly encode identified `Logical` / `Physical` types** and reject
ambiguous (albeit correctly) inferred types (i.e. `Nothing` or
`Universe`). It is indeed better practice to enforce type annotations in
these scenarios.

## Note to Reviewers

Don't read the diff, just read the entire `analyzer/types` directory.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants