VISTA: Knowledge-Driven Vessel Trajectory Imputation with Repair Provenance

This is the official implementation of the paper "VISTA: Knowledge-Driven Vessel Trajectory Imputation with Repair Provenance".

Abstract

Repairing incomplete trajectory data is essential for downstream spatio-temporal applications. Yet, existing repair methods focus solely on reconstruction without documenting the reasoning behind repair decisions, undermining trust in safety-critical applications where repaired trajectories affect operational decisions, such as in maritime anomaly detection and route planning. We introduce repair provenance—structured, queryable metadata that documents the full reasoning chain behind each repair—which transforms imputation from pure data recovery into a task that supports downstream decision-making. We propose VISTA (knowledge-driven interpretable vessel trajectory imputation), a framework that reliably equips repaired trajectories with repair provenance by grounding LLM reasoning in data-verified knowledge. Specifically, we formalize Structured Data-derived Knowledge (SDK), a knowledge model whose data-verifiable components can be validated against real data and used to anchor and constrain LLM-generated explanations. We organize SDK in a Structured Data-derived Knowledge Graph (SD-KG) and establish a data-knowledge-data loop for extraction, validation, and incremental maintenance over large-scale AIS data. A workflow management layer with parallel scheduling, fault tolerance, and redundancy control ensures consistent and efficient end-to-end processing. Experiments on two large-scale AIS datasets show that VISTA achieves state-of-the-art accuracy, improving over baselines by 5–91% and reducing inference time by 51–93%, while producing repair provenance, whose interpretability is showcased via a case study and an interactive demo system (https://github.com/hyLiu1994/CLEAR).

Code Structure

VISTA/
├── config/
│   └── config.yaml             # Configuration file
├── data/                       # Data directory
│   ├── RawData/                # Original AIS data (unprocessed)
│   ├── CleanedFilteredData/    # Data after cleaning and filtering
│   └── ProcessedData/          # Data after preprocessing and feature extraction
├── results/                    # Experimental results, logs, and evaluation outputs
├── src/                        # Source code
│   ├── data/                   # Data loading, preprocessing, and handling
│   ├── modules/                # Core algorithmic components (e.g., StaticSpatialEncoder, BehaviorAbstraction)
│   ├── pipeline/               # End-to-end Pipelines
│   ├── utils/                  # Utility functions
│   └── main.py                 # Main entry point of the project
│
├── .gitignore
└── Readme.md

Framework Structure

VISTA forms a data–knowledge–data loop that transforms raw AIS data into structured maritime knowledge and reuses it to reconstruct missing trajectories with interpretable reasoning. As shown in the figure, the framework is built around four tightly connected components (AIS Data, SD-KG, SD-KG Construction, and Trajectory Imputation) and a coordinating Workflow Manager Layer.

SD-KG and AIS Data

At the center of the framework are AIS Data and the Structured Data-derived Knowledge Graph (SD-KG) — the two endpoints of the data–knowledge–data loop.

AIS Data (src/data/AISDataProcessor.py) provides raw vessel messages and stores reconstructed trajectories under results/[exp_name]/ImputationResults/. It serves as both the input for knowledge construction and the output repository for imputation results.
SD-KG (src/modules/M0_SDKG.py) acts as the central maritime knowledge repository, storing vessel attributes, behavior patterns, and validated imputation methods under results/[exp_name]/SDKG/. It connects both sides of the loop — continually updated during construction and reused during trajectory imputation.
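The storage layout described above can be sketched as follows. This is a minimal illustration that assumes only the two directory names given here; the helper name experiment_dirs and the experiment name "demo_run" are hypothetical and not part of the repository.

```python
from pathlib import Path


def experiment_dirs(exp_name: str, results_root: str = "results") -> dict:
    """Return the two per-experiment output directories described above.

    Hypothetical helper: only the directory names come from the README.
    """
    base = Path(results_root) / exp_name
    return {
        "imputation_results": base / "ImputationResults",  # reconstructed trajectories
        "sdkg": base / "SDKG",                             # stored knowledge graph
    }


dirs = experiment_dirs("demo_run")  # "demo_run" is a placeholder experiment name
for name, path in dirs.items():
    print(f"{name}: {path.as_posix()}")
```

Keeping both outputs under one experiment directory makes it easy to see which knowledge graph a given set of imputation results was produced from.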

SD-KG Construction

On the left side of the framework, the SD-KG Construction Workflow Manager (SDKG_Construction_Multithreading() in src/pipeline/pipeline.py) orchestrates parallel knowledge extraction from AIS data. It integrates three key modules corresponding to the blue blocks in the figure:

Static & Spatial Encoder (generate_vs() in M1_StaticSpatialEncoder.py): extracts vessel attributes and spatial motion cues.
Behavior Abstraction (generate_vb() in M2_BehaviorAbstraction.py): identifies canonical vessel behavior patterns from time-series trajectories.
Method Builder (generate_vf() in M3_MethodBuilder.py): generates and validates imputation functions, then inserts them into SD-KG as executable knowledge units.
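The three-stage extraction above can be pictured with the sketch below. It is not the repository's code: the real signatures of generate_vs(), generate_vb(), and generate_vf() are not documented in this README, so the toy_* functions, the behavior labels, the method names, and the segment fields are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


# Toy stand-ins for the three construction modules (assumed shapes only).
def toy_generate_vs(segment):
    # Static & spatial attributes: here, just the vessel type.
    return {"vessel_type": segment["vessel_type"]}


def toy_generate_vb(segment):
    # Behavior abstraction: a crude rule on the speed range of the segment.
    speeds = segment["speeds"]
    return "steady" if max(speeds) - min(speeds) < 1.0 else "maneuvering"


def toy_generate_vf(behavior):
    # Method builder: map a behavior label to an imputation method name.
    return "linear_interpolation" if behavior == "steady" else "spline_interpolation"


def extract_knowledge(segment):
    """Run the three stages in order for one trajectory segment."""
    static = toy_generate_vs(segment)
    behavior = toy_generate_vb(segment)
    method = toy_generate_vf(behavior)
    return segment["id"], {"static": static, "behavior": behavior, "method": method}


segments = [
    {"id": "seg-1", "vessel_type": "cargo", "speeds": [10.0, 10.2, 10.1]},
    {"id": "seg-2", "vessel_type": "tug", "speeds": [3.0, 7.5, 1.2]},
]

# Segments are independent of each other, so they can be processed in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    toy_sdkg = dict(pool.map(extract_knowledge, segments))

print(toy_sdkg["seg-1"]["behavior"])
print(toy_sdkg["seg-2"]["method"])
```

A plain dictionary stands in for SD-KG here; the point is only the ordering of the stages and the per-segment parallelism.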

Trajectory Imputation

On the right side of the framework, the Trajectory Imputation Workflow Manager (Trajectory_Imputation_Multithreading() in src/pipeline/pipeline.py) leverages SD-KG to reconstruct missing trajectory segments with interpretable reasoning. It includes three LLM-driven modules corresponding to the green blocks in the figure:

Behavior Estimator (behavior_estimator() in M4_BehaviorEstimator.py): infers missing motion patterns using SD-KG priors and vessel context.
Method Selector (method_selector() in M5_MethodSelector.py): chooses the most suitable imputation function based on graph-supported evidence.
Explanation Composer (explanation_composer() in M6_ExplanationComposer.py): generates concise, human-readable explanations linking reconstructed trajectories to maritime knowledge and operational logic.
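The chain of estimator, selector, and composer can be illustrated with the sketch below. In VISTA these modules are LLM-driven; the toy_* functions here replace the LLM calls with simple rules, and their names, the record fields, and the choice of linear interpolation are assumptions for illustration only.

```python
def toy_behavior_estimator(before, after):
    # Toy rule in place of an LLM call: similar boundary speeds -> "steady".
    # "sog" is speed over ground, a standard AIS field.
    return "steady" if abs(before["sog"] - after["sog"]) < 1.0 else "maneuvering"


def linear_interpolation(before, after, n_missing):
    """Fill n_missing evenly spaced points between two known positions."""
    points = []
    for i in range(1, n_missing + 1):
        w = i / (n_missing + 1)
        points.append((
            before["lat"] + w * (after["lat"] - before["lat"]),
            before["lon"] + w * (after["lon"] - before["lon"]),
        ))
    return points


TOY_METHODS = {"steady": linear_interpolation}


def toy_method_selector(behavior):
    # Fall back to linear interpolation when no behavior-specific method exists.
    return TOY_METHODS.get(behavior, linear_interpolation)


def toy_explanation_composer(behavior, method):
    return (f"Gap filled with {method.__name__} because the estimated "
            f"behavior is '{behavior}'.")


before = {"lat": 55.0, "lon": 10.0, "sog": 12.0}
after = {"lat": 55.5, "lon": 11.0, "sog": 12.3}

behavior = toy_behavior_estimator(before, after)
method = toy_method_selector(behavior)
repaired = method(before, after, n_missing=1)

# A small provenance record that travels with the repaired points.
provenance = {
    "behavior": behavior,
    "method": method.__name__,
    "explanation": toy_explanation_composer(behavior, method),
}
print(repaired)
print(provenance["explanation"])
```

The provenance dictionary is the part that distinguishes this flow from plain imputation: each repaired segment carries the estimated behavior, the chosen method, and a readable justification.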

Workflow Manager Layer

The Workflow Manager Layer (src/pipeline/pipeline.py) bridges construction and imputation, coordinating parallel execution, anomaly handling, and redundancy control through SDKG_Construction_Multithreading() and Trajectory_Imputation_Multithreading().
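A rough picture of what such coordination involves is given below. This is a generic sketch of parallel scheduling with retries and duplicate removal, not the code in src/pipeline/pipeline.py; run_with_retry, flaky_task, and the segment identifiers are hypothetical, and the retry semantics are an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_retry(task, arg, retry_times):
    """Call task(arg); on failure, retry up to retry_times more times."""
    last_error = None
    for _ in range(retry_times + 1):
        try:
            return task(arg)
        except Exception as exc:  # a real pipeline would catch narrower errors
            last_error = exc
    # Report the failure instead of crashing the whole batch.
    return {"id": arg, "status": "failed", "error": str(last_error)}


attempts = {}


def flaky_task(segment_id):
    # Toy task that fails on its first attempt for "seg-2" only.
    attempts[segment_id] = attempts.get(segment_id, 0) + 1
    if segment_id == "seg-2" and attempts[segment_id] == 1:
        raise RuntimeError("transient failure")
    return {"id": segment_id, "status": "ok"}


requested = ["seg-1", "seg-2", "seg-1", "seg-3"]
# Redundancy control: drop repeated requests while keeping the original order.
unique_ids = list(dict.fromkeys(requested))

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(
        lambda seg: run_with_retry(flaky_task, seg, retry_times=2),
        unique_ids,
    ))

print([r["status"] for r in results])
```

Returning a failure record rather than raising keeps one bad segment from stopping the other workers, which is one common way to get fault tolerance in a batch pipeline.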

Setup and Execution

Environment Setup

bash environment_install.sh

Dataset Preparation

The AIS-DK and AIS-US datasets can be downloaded automatically from their official sources, according to the dataset hyperparameter.

Detailed instructions for downloading, cleaning, and filtering the datasets are provided in ./src/data/Readme.md.

After preparing the dataset, update the data path in the configuration file ./config/config.yaml. For example:

raw_data_file: ./data/CleanedFilteredData/AIS_2024_04_01@15_filtered360_1000000000.csv

LLM Setup

VISTA lets you choose among LLM platforms such as OpenAI or Alibaba Cloud's DashScope and configure the corresponding API key for the selected model.

Step 1. Select the Platform

Open ./config/config.yaml and set base_url: to your service provider's Base URL (for example, 'https://api.openai.com/v1' for OpenAI or 'https://dashscope.aliyuncs.com/compatible-mode/v1' for Alibaba Cloud's DashScope).

Example:

base_url: 'https://dashscope.aliyuncs.com/compatible-mode/v1'

Step 2. Obtain and Set the API Key

Open ./config/config.yaml and paste your API key after llm_api_key:. You can obtain the key from your chosen platform (e.g., Alibaba Cloud's DashScope or OpenAI).

Example:

llm_api_key: sk-xxxxxxxx

Step 3. Choose the Model Types

Open ./config/config.yaml and specify the models you wish to use after mining_llm:, coding_llm:, and analysis_llm:. Available model names are listed on each platform (e.g., Alibaba Cloud's DashScope, OpenAI).

Example:

mining_llm: gpt-4.1-nano
coding_llm: gpt-3.5-turbo
analysis_llm: qwen-plus
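Before launching a run, it can help to confirm that all five LLM settings from the steps above are filled in. The sketch below is one way to do that; the helper missing_llm_settings is hypothetical, and the dictionary stands in for the parsed contents of ./config/config.yaml (the parsing itself is omitted).

```python
REQUIRED_KEYS = ("base_url", "llm_api_key", "mining_llm", "coding_llm", "analysis_llm")


def missing_llm_settings(config: dict) -> list:
    """Return the required LLM keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]


# Stand-in for the mapping obtained by parsing ./config/config.yaml.
config = {
    "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "llm_api_key": "sk-xxxxxxxx",  # placeholder, not a real key
    "mining_llm": "gpt-4.1-nano",
    "coding_llm": "gpt-3.5-turbo",
    "analysis_llm": "",            # deliberately left empty for the demo
}

print(missing_llm_settings(config))
```

An empty list means every setting is present; any names returned are the keys still to be filled in.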

Execution

Before running the pipeline, configure key hyperparameters such as retry_times, e_f, and top_k in the ./config/config.yaml file.

Then, execute the following command to start the process:

python src/main.py --config config.yaml
