Skip to content

Smarter integer pattern matching #45

Description

@eduardoleon

One of ML's greatest strengths is pattern matching, which allows us to perform sophisticated case analyses without ever branching on a boolean expression. When it comes to case-analyzing inductively defined types, no other general-purpose language is better than ML.

Unfortunately, ML is far less good at case-analyzing integers. Just last week I wrote the following datatype:

datatype 'a tree
  = Pure of 'a
  | Two of int * 'a tree * 'a tree
  | ThreeA of int * 'a tree * 'a tree * 'a tree
  | ThreeB of int * 'a tree * 'a tree * 'a tree
  | Four of int * 'a tree * 'a tree * 'a tree * 'a tree

and the following helper functions:

fun part (m, Two (n, a, b)) = if m mod 2 = 0 andalso n = m + 1 then SOME (a, b) else NONE
  | part _ = NONE
  
fun many (n, ab, cd) =
  case (part (n, ab), part (n, cd)) of
      (NONE, NONE) => Two (n, ab, cd)
    | (NONE, SOME (c, d)) => ThreeA (n, ab, c, d)
    | (SOME (a, b), NONE) => ThreeB (n, a, b, cd)
    | (SOME (a, b), SOME (c, d)) => Four (n, a, b, c, d)

What I wish I could have written instead is

fun many (2*n, Two (2*n+1, a, b), Two (2*n+1, c, d)) = Four (2*n, a, b, c, d)
  | many (2*n, Two (2*n+1, a, b), c) = ThreeB (2*n, a, b, c)
  | many (2*n, a, Two (2*n+1, b, c)) = ThreeA (2*n, a, b, c)
  | many args = Two args

The patterns 2*n and 2*n+1 already take care of matching two consecutive integers, the greater of which is the odd one. More generally, what I would want to see is

  • Pattern matching affine combinations of integer variables with constant coefficients. For example,

    fun sameParity (x+y, x-y) = true
      | sameParity _ = false
  • Patterns that only match nonnegative integers. For example,

    fun lessOrEq (n, n + nonneg k) = true
      | lessOrEq _ = false
  • Of course, the type checker should reject patterns that can be matched in two or more ways for the same input. For example, the snippet

    fun wrong (2*x + 3*y, 4*x + 6*y) = ...
      | wrong _ = ...

    is wrong because the matrix [2 3; 4 6] (sorry, MATLAB notation) is not invertible, whereas the snippet

    fun okay (2*x + 3*y, 3*x + 2*y) = ...
      | okay _ = ...

    is not wrong because the matrix [2 3; 3 2] is invertible.

Does this seem like a reasonable feature to want?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions