
TM014 Module Loading

Joe Politz edited this page May 28, 2015 · 7 revisions

This is an incomplete draft

Module Compilation and Loading

This is a specification of the internal interfaces used for module loading. We should be able to use this to:

  • Plan and implement different kinds of module loading (e.g. shared-gdrive, my-gdrive, files, builtins, future things like import-from-url), and keep semantics consistent
  • Write user-facing docs to help folks understand in what order modules will load and run
  • Figure out when and how to report compilation failures and type errors (with e.g. information about the import path that got us to the module that failed to compile)
  • Have sensible semantics for when it is OK to cache compiled results and avoid recompilation

Note on Promises

All of the return types in this specification are explicitly labelled as Promise<T>. This is to emphasize that all of these interfaces need to work asynchronously, because they are close to the JavaScript core of Pyret, and may be run in:

  1. A context where information about a module requires going out over the Web
  2. A context where Pyret evaluation runs on the browser thread that it shares with the UI, so evaluation needs to be interruptible.

JavaScript implementations should use safeCall and pauseStack to work asynchronously. Types are given in terms of the Pyret interface.

Locators

The main abstraction in compilation is the Locator, a stateful interface to an abstract location that holds a Pyret module and can store a (serialized) compiled Pyret module. It can be used to get metadata about the module, such as its name, dependencies, and exports.

The identity of a module is based on the URI that its locator specifies. When there are multiple aliases for the same module (e.g. symlinks on disk), the URI of each locator decides whether the underlying modules should both be instantiated or treated as the same module.

  LocatedModule :: {

    // Could either have needs-compile be implicitly stateful, and cache
    // the most recent map, or use the explicit interface below
    needs-compile :: Map<Dependency, Provides> -> Promise<Bool>

    get-module :: -> Promise<PyretString>
    get-dependencies :: -> List<Dependency>
    get-provides :: -> Provides

    // Need something like one of the following:

    // e.g. create a new CompileContext that is at the base of the directory
    // this Locator is in.  The CC holds the current working directory
    update-compile-context :: CompileContext -> CompileContext

    uri :: -> URI
    name :: -> String

    // Note that CompileResults can contain both errors and successful
    // compilations
    set-compiled :: CompileResult -> Undef
    get-compiled :: -> Option<CompileResult>

    // tentative, more explicit interface than caching on needs-compile
    set-provides :: Map<Dependency, Provides> -> Promise<Undef>
    get-expected-provides :: -> Promise<Option<Map<Dependency, Provides>>>
  }
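For a JavaScript implementation, one possible TypeScript rendering of the LocatedModule interface, with a trivial in-memory implementation, is sketched below. It is an illustration under stated assumptions, not the Pyret codebase's API: the camelCase method names, the placeholder Provides and CompileResult shapes, keying the dependency map by a canonical string, and the InMemoryModule class are all hypothetical. update-compile-context and the tentative set-provides / get-expected-provides pair are left out. Following the needs-compile discussion further down, its needsCompile conservatively always answers true, which never causes an inconsistency.

```typescript
// Hypothetical TypeScript rendering of LocatedModule; the names and the
// Provides / CompileResult shapes are placeholders, not Pyret's real ones.
type Dependency =
  | { type: "file"; path: string }
  | { type: "google-drive-shared"; name: string; id: string };
type Provides = { values: string[]; types: string[] };
type CompileResult = { ok: boolean; code?: string; errors?: string[] };
// Stands in for Map<Dependency, Provides>, keyed by a canonical string per Dependency.
type DepMap = Map<string, Provides>;

interface LocatedModule {
  needsCompile(deps: DepMap): Promise<boolean>;
  getModule(): Promise<string>;
  getDependencies(): Dependency[];
  getProvides(): Provides;
  uri(): string;
  name(): string;
  setCompiled(result: CompileResult): void;
  getCompiled(): CompileResult | undefined; // Option<CompileResult>
}

// Minimal in-memory locator whose source and metadata are supplied up front.
class InMemoryModule implements LocatedModule {
  private compiled: CompileResult | undefined = undefined;
  private readonly moduleUri: string;
  private readonly source: string;
  private readonly deps: Dependency[];
  private readonly provides: Provides;

  constructor(moduleUri: string, source: string, deps: Dependency[], provides: Provides) {
    this.moduleUri = moduleUri;
    this.source = source;
    this.deps = deps;
    this.provides = provides;
  }

  // Conservative: always report that a recompile is needed.
  needsCompile(_deps: DepMap): Promise<boolean> {
    return Promise.resolve(true);
  }
  getModule(): Promise<string> {
    return Promise.resolve(this.source);
  }
  getDependencies(): Dependency[] {
    return this.deps;
  }
  getProvides(): Provides {
    return this.provides;
  }
  uri(): string {
    return this.moduleUri;
  }
  // Last path segment of the URI, e.g. "a.arr" for "file:///proj/a.arr".
  name(): string {
    const parts = this.moduleUri.split("/");
    return parts[parts.length - 1];
  }
  setCompiled(result: CompileResult): void {
    this.compiled = result;
  }
  getCompiled(): CompileResult | undefined {
    return this.compiled;
  }
}
```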

A Provide is as specified in the [REF] specification for module exports. It represents the structure of types and values exported by a module, and it needs to be updated if that structure changes on recompilation of the module. Provides can be determined simply from the source text of a module -- they don't rely on compilation to be generated.

A Dependency is an import specification; it corresponds to the part of an import statement between the import and as keywords. For example, a Dependency might be:

{
  type: "google-drive-shared",
  name: "interp-basic-definitions.arr",
  id: "jf09jfawocvh90q3"
}

from

import shared-gdrive("interp-basic-definitions.arr", "jf09jfawocvh90q3") as I

or

{
  type: "file",
  path: "./path/to/module.arr"
}

from

import file("./path/to/module.arr") as M

Dependencies don't contain the module's URI, and can always be generated just by looking at the source text of the program (no CompileContext required). Generating the URI may require more context (like the current working directory of the module loader), and requires input from find (see below).
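One practical consequence for a JavaScript implementation of Map<Dependency, Provides>: a JavaScript Map compares object keys by reference, so two structurally equal Dependency objects parsed at different times would be distinct keys. A minimal sketch, assuming Dependency objects shaped like the two examples above, keys such maps by a canonical string that, like the Dependency itself, depends only on the source text. The helper name dependencyKey and its output format are hypothetical.

```typescript
// Hypothetical encoding of the two Dependency shapes shown above.
type Dependency =
  | { type: "google-drive-shared"; name: string; id: string }
  | { type: "file"; path: string };

// JavaScript Maps compare object keys by reference, so build a canonical
// string key that mirrors the import syntax and depends only on source text.
function dependencyKey(dep: Dependency): string {
  if (dep.type === "file") {
    return `file(${JSON.stringify(dep.path)})`;
  }
  return `shared-gdrive(${JSON.stringify(dep.name)}, ${JSON.stringify(dep.id)})`;
}

// Two separately parsed but identical imports now find the same entry.
const providesByDep = new Map<string, string>();
providesByDep.set(dependencyKey({ type: "file", path: "./path/to/module.arr" }), "module.arr provides");
const found = providesByDep.get(dependencyKey({ type: "file", path: "./path/to/module.arr" }));
```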

Note on the argument to needs-compile: We need to handle this case with the Locator interface (B imports A, C imports A):

  A
 / \
B   C

Consider this sequence:

  1. Compile B (and its dependencies, including A)
  2. A changes its provide structure
  3. Compile C (and its dependencies, including A)
  4. Compile B

B needs to be recompiled. We need to decide how responsibility is split between A and B for remembering which Provides B was compiled against last time, so that B knows whether it needs to be recompiled or re-typechecked.

The needs-compile interface can handle this by taking a representation of the new dependency map and returning whether those dependencies demand a recompile. This would also suffice for caching compiled code: the Locator could store the most recent Map<Dependency, Provides> passed to needs-compile and check each new map against the old one. The question being asked is essentially "do you need to recompile if you are compiled against these dependencies?". This design also means a module can always report true from needs-compile without ever causing an inconsistency.
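The implicitly stateful variant can be sketched in a few lines of TypeScript. This is an illustration, not the planned implementation: the CachingLocator class is hypothetical, dependency keys are canonical strings, and each Provides is modelled as an opaque fingerprint string (in the spirit of the hash idea in the next paragraph), so comparing two maps is a key-by-key string comparison.

```typescript
// Sketch of an implicitly stateful needs-compile. Keys are canonical
// dependency strings; values are opaque Provides fingerprints (assumptions).
type DepMap = ReadonlyMap<string, string>;

// True when both maps name the same dependencies with identical Provides.
function sameDeps(a: DepMap, b: DepMap): boolean {
  if (a.size !== b.size) return false;
  for (const [dep, provides] of a) {
    if (b.get(dep) !== provides) return false;
  }
  return true;
}

// Hypothetical locator that only tracks what needs-compile needs.
class CachingLocator {
  // Most recent dependency map passed to needsCompile, if any.
  private lastDeps: DepMap | undefined = undefined;

  async needsCompile(deps: DepMap): Promise<boolean> {
    const stale = this.lastDeps === undefined || !sameDeps(this.lastDeps, deps);
    this.lastDeps = new Map(deps); // copy, so later mutation by the caller is harmless
    return stale;
  }
}
```

Replaying the sequence above for B: step 1 returns true because nothing is cached, and step 4 returns true because A's Provides changed; asking again with an unchanged map would return false.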

We may want to include some kind of magic number, hash, mtime, etc. in the Provides structure to make this check cheaper. From the point of view of this interface, that's "just" a representation detail for performance, though.

The alternative is a more explicit interface in which we get and set the whole provide map at compilation time. Either would work; the choice depends on how often we will actually need to get and set the provide map directly, versus using this interface only for compile consistency.

To map from programs and their metadata to actual code (compiled or not), we also need a function that knows which locator to use for each import statement within a compilation job. The CompileContext can hold things like the root directory of the project or the credentials of a user logged in to Google Drive, and can be combined with a Dependency (which is derived from the import statement) to get an unambiguous LocatedModule. The function find does this, and is the main entry point for externally configuring the compilation process.

find :: CompileContext x Dependency -> LocatedModule

Depending on the ultimate interface for updating the compile context, more than a LocatedModule may be returned -- it may be a LocatedModule and an updated CompileContext.
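A minimal sketch of find follows, assuming a CompileContext that holds only a working directory (plus an unused Drive credential), handling only the file and google-drive-shared dependency types, and following the variant that returns an updated CompileContext. Resolving a file path against the current directory produces the URI, and located modules are cached by URI, so lexical aliases such as ./lib/m.arr and ../src/lib/./m.arr from the same directory yield the same LocatedModule, in line with the URI-based identity described earlier. The shared-gdrive:// URI scheme, the Located shape, and the choice to move the context into the imported file's directory are all assumptions made for illustration.

```typescript
// Hypothetical sketch of find; types and URI schemes are illustrative only.
type Dependency =
  | { type: "file"; path: string }
  | { type: "google-drive-shared"; name: string; id: string };

interface CompileContext { cwd: string; driveToken?: string }
interface Located { uri: string }

// Resolve a path against an absolute directory ("/a/b" + "../c" => "/a/c").
function resolvePath(dir: string, rel: string): string {
  const parts = rel.startsWith("/") ? [] : dir.split("/").filter((p) => p !== "");
  for (const seg of rel.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") parts.pop();
    else parts.push(seg);
  }
  return "/" + parts.join("/");
}

function dirname(path: string): string {
  const i = path.lastIndexOf("/");
  return i <= 0 ? "/" : path.slice(0, i);
}

// One Located per URI, so aliases that resolve to the same URI coincide.
const byUri = new Map<string, Located>();

function find(ctx: CompileContext, dep: Dependency): [Located, CompileContext] {
  let uri: string;
  let next: CompileContext = ctx;
  if (dep.type === "file") {
    const abs = resolvePath(ctx.cwd, dep.path);
    uri = "file://" + abs;
    // Imports inside the found module resolve relative to its own directory.
    next = { ...ctx, cwd: dirname(abs) };
  } else {
    // The Drive id, not the display name, identifies the shared file.
    uri = "shared-gdrive://" + dep.id;
  }
  let located = byUri.get(uri);
  if (located === undefined) {
    located = { uri };
    byUri.set(uri, located);
  }
  return [located, next];
}
```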

Compiling

To compile a single module, we pass in a LocatedModule and a mapping from each Dependency to a Provides. The Provides in the map are trusted by the compiler, so no further compilation or checking is done for modules outside the one designated by the Locator.

compile-module :: LocatedModule x Map<Dependency, Provides> -> CompileResult

Further, compile-module assumes that the module designated by the Locator actually needs to be compiled, so it doesn't use needs-compile directly.

As an implementation note, the dependency map should probably be converted into (part of) the CompileEnv data structure that already exists in the compiler, enriched with type information.

To compile a program, which we distinguish from a module in that a program is a module along with all of its dependencies, we need to check the full dependency graph of modules to see if any need recompilation, and recompile dependents as necessary. First, we compute the worklist of things that need to be compiled:

ToCompile = { located :: LocatedModule, provide-map :: Map<Dependency, Provides>, from :: List<LocatedModule> }
compile-worklist :: Locator x CompileContext -> List<ToCompile>

This doesn't actually do any compilation. It simply generates a list of modules to (potentially) compile, in order, with their necessary Provides map and the path to this module from the root module passed to compile-worklist. It also doesn't determine whether each module should be compiled or not; it is simply a topological sort of the dependency graph of the user-defined modules that make up this program.

The resulting from field for each entry contains the path of Locators from the root to this module, which is designed to be useful in error reporting.
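The worklist step is an ordinary depth-first topological sort. The TypeScript below is a sketch under simplifying assumptions: dependencies are canonical strings, Provides is an opaque string, find is passed in as a parameter and does not thread an updated CompileContext, and import cycles are reported as an error (this draft does not say how cycles should be handled). The names Mod, Find, and compileWorklist are hypothetical. Each module is emitted after all of its dependencies, and from records the first path by which it was reached, including the root and the module itself.

```typescript
// Sketch of compile-worklist as a depth-first topological sort.
interface Mod {
  uri: string;
  getDependencies(): string[];
  getProvides(): string;
}
interface ToCompile {
  located: Mod;
  provideMap: Map<string, string>; // dependency -> Provides of the module it names
  from: Mod[]; // modules from the root down to (and including) this one
}
type Find<Ctx> = (ctx: Ctx, dep: string) => Mod;

function compileWorklist<Ctx>(root: Mod, ctx: Ctx, find: Find<Ctx>): ToCompile[] {
  const out: ToCompile[] = [];
  const done = new Set<string>(); // URIs already emitted
  const onPath = new Set<string>(); // URIs on the current DFS path (cycle check)

  function visit(m: Mod, path: Mod[]): void {
    if (done.has(m.uri)) return; // shared dependency: emit only once
    if (onPath.has(m.uri)) {
      throw new Error("import cycle: " + [...path, m].map((p) => p.uri).join(" -> "));
    }
    onPath.add(m.uri);
    const here = [...path, m];
    const provideMap = new Map<string, string>();
    for (const dep of m.getDependencies()) {
      const target = find(ctx, dep);
      visit(target, here); // dependencies are emitted before their dependents
      provideMap.set(dep, target.getProvides());
    }
    onPath.delete(m.uri);
    done.add(m.uri);
    out.push({ located: m, provideMap, from: here });
  }

  visit(root, []);
  return out;
}
```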

(Note: we need to handle failure when a module is not syntactically valid, since in that case its new Provides cannot be computed.)

A second function actually does the compilation:

Compiled = { located :: LocatedModule, from :: List<LocatedModule>, result :: CompileResult }
compile-program :: List<ToCompile> -> List<Compiled>

compile-program maps over all the modules to (potentially) compile. It has the correct dependency map for each module, so it can pass that map to needs-compile to check whether the Locator thinks the module needs to be recompiled (e.g. because the file changed on disk since it was last compiled). If needs-compile returns false, compile-program can use get-compiled and plug that result into the result field directly; otherwise it calls compile-module.
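The loop in compile-program might look like the following TypeScript sketch. It assumes a placeholder CompileResult shape, takes compile-module as a function argument, and stores fresh results back with setCompiled; that write-back is an assumption, since this draft only says that get-compiled can supply a cached result. needsCompile is always called first, so a stateful locator records the map it was asked about even on a first compile.

```typescript
// Sketch of compile-program over a worklist; all shapes are placeholders.
type DepMap = Map<string, string>;
interface CompileResult { ok: boolean; output: string }
interface Located {
  uri: string;
  needsCompile(deps: DepMap): Promise<boolean>;
  getCompiled(): CompileResult | undefined;
  setCompiled(r: CompileResult): void;
}
interface ToCompile { located: Located; provideMap: DepMap; from: Located[] }
interface Compiled { located: Located; from: Located[]; result: CompileResult }
type CompileModule = (m: Located, deps: DepMap) => Promise<CompileResult>;

async function compileProgram(work: ToCompile[], compileModule: CompileModule): Promise<Compiled[]> {
  const out: Compiled[] = [];
  // Sequential: the worklist is already in dependency order.
  for (const { located, provideMap, from } of work) {
    // Ask first, so a stateful locator always sees the current map.
    const stale = await located.needsCompile(provideMap);
    const cached = located.getCompiled();
    let result: CompileResult;
    if (!stale && cached !== undefined) {
      result = cached; // reuse the stored result, whether success or errors
    } else {
      result = await compileModule(located, provideMap);
      located.setCompiled(result);
    }
    out.push({ located, from, result });
  }
  return out;
}
```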

NOTE -- we are not sure whether we need to propagate information about the success or failure of type-checking forward statically, or how to configure whether a type-checking failure aborts compilation, gets reported at the toplevel, etc. Since compile-program returns the full list of CompileResults, the user-facing process can decide whether to report type errors, offer a "run anyway" button, or do something else.

Loading and Running

Once modules are compiled, their compiled code needs to be registered with the running system. Compiled modules refer to their dependencies by URI, and modules are loaded into the runtime by mapping each URI to its compiled code. Concretely, in JavaScript, this means defining modules (probably with RequireJS) whose names are URI strings and whose values are JavaScript closures created by calling eval on the compiled JavaScript source.

Registering a module does not cause its body (or the bodies of its dependencies) to be evaluated. It simply makes the module available for import; importing it instantiates the module, which runs its body.

To actually run a main module, a client of the runtime can call instantiate-module directly with the URI to run, and get back the result of evaluating that module's body (including test case results, exports for the REPL, etc.).

RunResult =
  | Failure(err)
  | Success(PyretModuleVal)

PyretModuleVal = {
  values: ... values ...
  types: ... types ...
  answer: ... answer ...
  check-results: ... check-results ...
}

register-module :: Runtime x URI x CompileResult -> Promise<Undef>
# Causes Runtime to link the CompileResult and store it at URI

instantiate-module :: Runtime x URI -> Promise<RunResult>
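To make the register/instantiate split concrete, here is a TypeScript sketch of a runtime with an in-memory registry. It is hypothetical throughout: a registered module is modelled as a list of dependency URIs plus a body function standing in for the eval'd JavaScript closure, the Runtime object supplies the Runtime argument via this, bodies run synchronously rather than through safeCall/pauseStack, and each URI is instantiated at most once per runtime. Registration only records the module; instantiation runs the dependencies' bodies before the importing module's body and wraps any error in a failure RunResult.

```typescript
// Sketch of register-module / instantiate-module; every shape is hypothetical.
interface PyretModuleVal { values: Record<string, unknown>; answer?: unknown }
type RunResult = { tag: "failure"; err: unknown } | { tag: "success"; val: PyretModuleVal };
interface Linked { deps: string[]; body: (deps: PyretModuleVal[]) => PyretModuleVal }

class Runtime {
  private registered = new Map<string, Linked>();
  private instances = new Map<string, PyretModuleVal>();

  // Only records the module; nothing runs until something instantiates it.
  async registerModule(uri: string, linked: Linked): Promise<void> {
    this.registered.set(uri, linked);
  }

  async instantiateModule(uri: string): Promise<RunResult> {
    try {
      return { tag: "success", val: this.instantiate(uri, []) };
    } catch (err) {
      return { tag: "failure", err };
    }
  }

  private instantiate(uri: string, stack: string[]): PyretModuleVal {
    const existing = this.instances.get(uri);
    if (existing !== undefined) return existing; // each URI runs at most once
    if (stack.indexOf(uri) !== -1) throw new Error("cyclic import at " + uri);
    const linked = this.registered.get(uri);
    if (linked === undefined) throw new Error("no module registered at " + uri);
    // Dependencies are instantiated (and their bodies run) first.
    const depVals = linked.deps.map((d) => this.instantiate(d, [...stack, uri]));
    const val = linked.body(depVals);
    this.instances.set(uri, val);
    return val;
  }
}
```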

Questions:

  • What is responsible for updating the compile context, if anything? Where does the act of "going into a subdirectory" happen?

  • Can we just implement all of the compile stuff in Pyret, and only have the registration and instantiation be handled by JavaScript?
