Vizier Architecture
The Vizier system consists of the components shown in the architecture diagram below.
- API: Vizier uses an API layer to manage notebook state and mediate between the components. The API may be accessed directly (e.g., by scripts) or via Vizier's UI.
- UI: Vizier relies on an HTML/JS-based frontend for most user interactions.
- Scheduler: A scheduler is responsible for evaluating dependencies between notebook cells and re-executing cells that are out-of-date (whether because the cell was updated or one of its inputs changed in a new notebook version).
- Datastore: Structured data (dataframes) and simple unstructured data (blobs) are stored in the Datastore layer. In addition to keeping track of this state, the datastore layer is responsible for managing fine-grained provenance relationships between data elements, and profiling dataset state.
- Filestore: Vizier uses a file storage layer to manage large unstructured data.
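The scheduler's job of deciding which cells are out-of-date can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each cell declares the names of the datasets it reads and writes and that cells run top to bottom. The `Cell` class and the `stale_cells` function are hypothetical and are not part of Vizier's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cell:
    """A hypothetical notebook cell and the dataset names it reads and writes."""
    name: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def stale_cells(cells: list[Cell], edited: int) -> list[str]:
    """Names of the cells that must re-run after cells[edited] is changed.

    Cells run in notebook order. A later cell is stale if it reads a dataset
    that some stale cell may have changed.
    """
    stale = [cells[edited].name]
    dirty = set(cells[edited].writes)  # datasets whose contents may now differ
    for cell in cells[edited + 1:]:
        if cell.reads & dirty:
            stale.append(cell.name)
            dirty |= cell.writes
        else:
            # An up-to-date cell that overwrites a dataset hides the stale version.
            dirty -= cell.writes
    return stale


cells = [
    Cell("load", writes={"sales"}),
    Cell("clean", reads={"sales"}, writes={"sales_clean"}),
    Cell("plot", reads={"sales_clean"}),
    Cell("other", reads={"weather"}),
]
print(stale_cells(cells, 0))  # ['load', 'clean', 'plot']
```

Editing `load` invalidates `clean` and `plot` but not `other`, which reads an unrelated dataset. A cell that overwrites a dataset without reading any stale input stops the invalidation from propagating past that point.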
There are two versions of Vizier under development: Scala and Python. Their architectures are related but different:
Vizier-Scala is a pure Scala implementation of the Vizier API.
- UI: Vizier-Scala uses Web-UI, a React application, as its UI layer. It relies on an underlying REST/HATEOAS API.
- API Layer: The API layer is implemented directly in Vizier-Scala itself. Notable elements include:
  - The Vizier API object manages the API layer, including spinning up a Jetty server to host it.
  - The api package contains handlers for every API call.
  - The routes file specifies all API routes.
  - The api.servlet package contains the servlets implementing the API.
  - The catalog package implements the API's state model.
- Scheduler: The Scheduler is implemented directly in Vizier-Scala itself, and in particular in the viztrails package. Notable elements include:
  - The Scheduler object manages workflow state and the workflow execution lifecycle.
  - The Provenance object contains methods for determining inter-cell dependencies and managing cell states.
- Datastore: The Datastore is implemented by Mimir as a layer over Apache Spark, supplemented with Mimir's Caveats package. Notable elements of the datastore include:
  - The Catalog maintains a record of all existing dataframes and persists a record of how to reconstruct them.
  - The api.request package provides a fixed API to access Datastore functionality.
- Filestore: The Filestore is implemented directly in Vizier-Scala itself, in the Filestore object.
  - The Mimir API maintains its own filestore.
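The Catalog's role of recording how to reconstruct each dataframe, rather than only its contents, can be sketched as follows. This is a hypothetical Python illustration, not Mimir's Scala code: `Recipe`, `Catalog`, and the list-of-dicts `Table` type are stand-ins, and the recipes are kept in memory rather than persisted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a Spark dataframe: a list of row dictionaries.
Table = list[dict]


@dataclass(frozen=True)
class Recipe:
    """How to rebuild one dataframe: an operation applied to named inputs."""
    op: Callable[..., Table]
    inputs: tuple[str, ...]


class Catalog:
    """Maps dataframe names to recipes and rebuilds dataframes on demand."""

    def __init__(self) -> None:
        self._recipes: dict[str, Recipe] = {}

    def register(self, name: str, op: Callable[..., Table], *inputs: str) -> None:
        # Record the derivation, not the data itself.
        self._recipes[name] = Recipe(op, inputs)

    def get(self, name: str) -> Table:
        # Rebuild every input first (recursively), then apply this step's op.
        recipe = self._recipes[name]
        return recipe.op(*[self.get(i) for i in recipe.inputs])


catalog = Catalog()
catalog.register("raw", lambda: [{"x": 1}, {"x": -2}, {"x": 3}])
catalog.register("positive", lambda rows: [r for r in rows if r["x"] > 0], "raw")
positive = catalog.get("positive")  # [{'x': 1}, {'x': 3}]
```

Because `get` always re-derives from the recipe, re-registering `raw` automatically changes what `positive` returns. A production catalog would likely also cache results, which this sketch omits.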
At present, Vizier-Scala's filestore is limited to the local filesystem. Our goal is to eventually merge the Mimir API into Vizier itself, which will allow the filestore to work over HDFS and S3 as well.
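Supporting more than the local filesystem is the kind of change that a scheme-dispatching interface makes straightforward. The following Python sketch assumes a hypothetical `FileStore` interface and an `open_store` helper that picks a backend from the URL scheme. Only a local backend is implemented, mirroring the current limitation, so `hdfs://` or `s3://` URLs are rejected until a backend is registered for them.

```python
from __future__ import annotations

import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from urllib.parse import urlparse


class FileStore(ABC):
    """Hypothetical interface that every filestore backend implements."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class LocalFileStore(FileStore):
    """Backend that stores files under a root directory on local disk."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _resolve(self, path: str) -> Path:
        # Treat every path as relative to the store's root directory.
        return self.root / path.lstrip("/")

    def write(self, path: str, data: bytes) -> None:
        target = self._resolve(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def read(self, path: str) -> bytes:
        return self._resolve(path).read_bytes()


def open_store(url: str, backends: dict[str, FileStore]) -> tuple[FileStore, str]:
    """Choose a backend by URL scheme; URLs without a scheme count as local files."""
    parsed = urlparse(url)
    scheme = parsed.scheme or "file"
    if scheme not in backends:
        raise NotImplementedError(f"no filestore backend for scheme {scheme!r}")
    return backends[scheme], parsed.netloc + parsed.path


with tempfile.TemporaryDirectory() as tmp:
    backends: dict[str, FileStore] = {"file": LocalFileStore(Path(tmp))}
    store, path = open_store("file:///blobs/example.bin", backends)
    store.write(path, b"hello")
    roundtrip = store.read(path)  # b'hello'
```

Adding HDFS or S3 support would then mean writing one more `FileStore` subclass and registering it under its scheme, without touching callers of `open_store`.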
Mimir is a system for tracking caveats and provenance of SQL queries. Any request for declarative access to a dataset from the workflow layer goes through Mimir, which uses Spark for storage and execution of data flows. Mimir also implements lenses, the data curation operations built into Vizier.
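The idea behind caveats and lenses can be illustrated with a small Python sketch: a lens repairs the data but marks every value it had to guess, and later computations stay marked if they depend on a guess. `Value`, `missing_value_lens`, and `caveated_sum` are hypothetical names, and filling gaps with the column mean is one simple repair strategy chosen for illustration; it is not a description of how Mimir's lenses are implemented.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Value:
    """A data value plus a flag recording whether it depends on a guess."""
    v: float
    caveated: bool = False


def missing_value_lens(column: list[float | None]) -> list[Value]:
    """Fill missing entries with the mean of the known ones and caveat them.

    Assumes the column has at least one non-missing value.
    """
    guess = mean([x for x in column if x is not None])
    return [Value(x) if x is not None else Value(guess, caveated=True) for x in column]


def caveated_sum(values: list[Value]) -> Value:
    """Sum the values; the result is caveated if any input was."""
    return Value(sum(x.v for x in values), any(x.caveated for x in values))


repaired = missing_value_lens([10.0, None, 30.0])
total = caveated_sum(repaired)  # Value(v=60.0, caveated=True)
clean_total = caveated_sum([x for x in repaired if not x.caveated])  # Value(v=40.0, caveated=False)
```

The point of the flag is that the user still gets an answer (`60.0`) but can see that it rests on a guessed value, while a result computed only from original data carries no caveat.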
Web API-Async is a pure Python implementation of the Vizier API.
- UI: Web API-Async uses Web-UI, a React application, as its UI layer. It relies on an underlying REST/HATEOAS API.
- API Layer: The API layer is implemented directly in Web API-Async itself. Notable elements include:
  - The api.webservice.server module implements the API itself.
  - The api.webservice module contains packages that implement accessors for the various kinds of state.
  - The viztrail module implements the API's state model.
- Scheduler: The Scheduler is implemented directly in Web API-Async itself. Notable elements include:
  - The engine module contains the scheduler's execution logic.
  - The viztrail.module.provenance module contains logic for determining inter-cell dependencies and figuring out which cells to run when.
- Datastore: Web API-Async has a modular data store implementation defined in the datastore package. Three implementations exist:
- Filestore: Web API-Async has a modular file store implementation defined in the filestore package. One implementation exists:
  - fs: A simple filesystem-based file store.
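Because Web API-Async's stores are described as modular, a backend only has to honor a common interface. A minimal Python sketch of that pattern for the datastore follows, using `typing.Protocol` so that any class with matching methods qualifies. `DataStore`, `MemoryDataStore`, and `row_count` are hypothetical names and do not reproduce the actual datastore package.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class Dataset:
    """An immutable table: column names plus rows."""
    columns: tuple[str, ...]
    rows: tuple[tuple[Any, ...], ...]


class DataStore(Protocol):
    """Interface a datastore backend must provide (hypothetical)."""

    def create_dataset(self, columns: list[str], rows: list[list[Any]]) -> str: ...

    def get_dataset(self, identifier: str) -> Dataset: ...


class MemoryDataStore:
    """Backend that keeps datasets in a dictionary; satisfies DataStore structurally."""

    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}

    def create_dataset(self, columns: list[str], rows: list[list[Any]]) -> str:
        for row in rows:
            if len(row) != len(columns):
                raise ValueError(f"row {row!r} does not match columns {columns!r}")
        identifier = uuid.uuid4().hex
        self._datasets[identifier] = Dataset(tuple(columns), tuple(tuple(r) for r in rows))
        return identifier

    def get_dataset(self, identifier: str) -> Dataset:
        return self._datasets[identifier]


def row_count(store: DataStore, identifier: str) -> int:
    """Client code depends only on the protocol, not on a concrete backend."""
    return len(store.get_dataset(identifier).rows)


store = MemoryDataStore()
ds_id = store.create_dataset(["name", "age"], [["Alice", 31], ["Bob", 27]])
n_rows = row_count(store, ds_id)  # 2
```

Swapping in another backend only requires a class with the same two methods; `row_count` needs no change.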