
FFY0/LegoSketch_ICML


Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams

Lego Sketch uses a scalable memory-augmented neural network (MANN) to store a data stream compactly so that approximate item frequencies can be queried later, a task known as sketching streams. It belongs to the emerging family of neural sketches, which replace traditional rule-based, handcrafted sketches with models trained end to end in a self-supervised manner. Through its architectural improvements, Lego Sketch overcomes the scalability limits of earlier neural sketches, removes the need for frequent retraining, and achieves substantially lower estimation errors.
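To make the sketching task concrete, here is a minimal Python sketch of a classic rule-based handcrafted sketch, Count-Min, which is the kind of structure neural sketches aim to replace. It is not Lego Sketch and not code from this repository; the class name, the `blake2b`-based row hashes, and the width/depth values are illustrative choices.

```python
import hashlib


class CountMinSketch:
    """Classic handcrafted sketch: `depth` rows of `width` counters, one hash per row."""

    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salt the hash with the row number so each row behaves like a separate hash function.
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=row.to_bytes(16, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def insert(self, item: str, count: int = 1) -> None:
        # Update one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def query(self, item: str) -> int:
        # Collisions only add to counters, so the minimum over rows never underestimates.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


if __name__ == "__main__":
    sketch = CountMinSketch(width=256, depth=4)
    stream = ["a"] * 50 + ["b"] * 20 + [f"x{i}" for i in range(500)]
    for item in stream:
        sketch.insert(item)
    print(sketch.query("a"), sketch.query("b"))  # estimates are >= 50 and >= 20
```

Both the update and query rules here are fixed by hand; neural sketches such as Lego Sketch replace them with learned ones.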

Figure: Scalable Memory of Lego Sketch

Figure: Comparison of Estimation Errors

How to Train a Lego Brick

To train the Lego Brick model, navigate to the following directory and execute the command:

cd Lego/ExpCode/Lego_brick

python Lego_seed0.py
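Self-supervision here means the training labels come from the data itself. The following Python sketch shows one way to build a training episode for a frequency-estimating model: draw a synthetic stream, then count it to obtain exact frequencies as targets. It is a hypothetical illustration; `make_episode`, the Zipf-shaped sampling, and the per-episode skew range are assumptions, not necessarily what `Lego_seed0.py` does.

```python
import random
from collections import Counter


def make_episode(num_distinct: int, stream_len: int, rng: random.Random,
                 skew_range: tuple = (0.5, 1.5)):
    """Build one hypothetical self-supervised training episode.

    Returns (stream, queries, labels). Labels are exact frequencies obtained by
    counting the synthetic stream, so no manual annotation is required.
    """
    # Assumption: vary skewness across episodes so a model sees many distributions.
    skew = rng.uniform(*skew_range)
    # Fresh random item IDs each episode, so no specific ID can be memorized.
    ids = rng.sample(range(10**9), num_distinct)
    weights = [1.0 / (rank ** skew) for rank in range(1, num_distinct + 1)]
    stream = rng.choices(ids, weights=weights, k=stream_len)
    counts = Counter(stream)
    queries = list(counts)               # every item that actually appeared
    labels = [counts[q] for q in queries]
    return stream, queries, labels


if __name__ == "__main__":
    rng = random.Random(0)
    stream, queries, labels = make_episode(num_distinct=100, stream_len=1000, rng=rng)
    print(len(stream), len(queries), sum(labels))  # sum(labels) equals len(stream)
```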

Important note: for efficiency, training replaces hash-based addressing with a uniform random number generator, while evaluation uses the provided CUDA kernel that implements the actual hash function. If you adapt the Lego Sketch architecture to your own task, verify that this substitution does not introduce errors there.
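One way to sanity-check the substitution is to compare collision statistics. The Python sketch below contrasts deterministic hash addressing with drawing one uniform random slot per distinct item, and reports the largest slot load for each. `NUM_SLOTS`, `hash_address`, `random_addresses`, and `max_load` are hypothetical names, and `blake2b` merely stands in for whatever hash the provided CUDA kernel implements.

```python
import hashlib
import random
from collections import Counter

NUM_SLOTS = 64  # hypothetical memory width, for illustration only


def hash_address(item: int, seed: int, num_slots: int = NUM_SLOTS) -> int:
    """Deterministic hash addressing: the same item always maps to the same slot."""
    digest = hashlib.blake2b(
        item.to_bytes(8, "little"), digest_size=8, key=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % num_slots


def random_addresses(distinct_items, rng: random.Random, num_slots: int = NUM_SLOTS) -> dict:
    """Training-time substitute: draw one uniform slot per distinct item.

    Drawing once per distinct item (not once per occurrence) keeps addressing
    consistent across repeated arrivals of the same item, as a hash would.
    """
    return {item: rng.randrange(num_slots) for item in distinct_items}


def max_load(addresses) -> int:
    """Largest number of distinct items sharing one slot (a collision statistic)."""
    return max(Counter(addresses).values())


if __name__ == "__main__":
    items = range(10_000)
    hashed = [hash_address(i, seed=0) for i in items]
    sampled = list(random_addresses(items, random.Random(0)).values())
    print("max load (hash):   ", max_load(hashed))
    print("max load (uniform):", max_load(sampled))
```

Similar load statistics are a necessary but not sufficient check; a more direct test is to compare the estimation errors of a trained model under both addressing schemes.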

Evaluate Lego Sketch on Synthetic Data

To simplify accuracy testing, we provide synthetic datasets with different skewness levels and two trained Lego Brick instances (random seeds 0 and 1). Run the following command to test them directly; the results are saved in the Figure folder.

cd Lego/EvalDataset

python Test_Lego.py
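For readers who want to generate comparable data or score their own sketch, here is a minimal Python sketch of a Zipf-distributed stream and two error metrics commonly used for frequency estimation: average absolute error (AAE) and average relative error (ARE). We assume skewness corresponds to the Zipf exponent; the provided datasets and `Test_Lego.py` may define skewness and errors differently, and `zipf_stream` and `aae_are` are hypothetical helpers.

```python
import random
from collections import Counter


def zipf_stream(num_items: int, length: int, skew: float, seed: int = 0) -> list:
    """Draw item IDs whose frequencies follow a Zipf law with exponent `skew`."""
    rng = random.Random(seed)
    # Item k (0-based) has weight 1 / (k + 1) ** skew; larger skew = heavier head.
    weights = [1.0 / (rank ** skew) for rank in range(1, num_items + 1)]
    return rng.choices(range(num_items), weights=weights, k=length)


def aae_are(true_freq: dict, estimate) -> tuple:
    """Average absolute error and average relative error over the distinct items."""
    abs_errors = [abs(estimate(item) - f) for item, f in true_freq.items()]
    aae = sum(abs_errors) / len(abs_errors)
    are = sum(e / f for e, f in zip(abs_errors, true_freq.values())) / len(abs_errors)
    return aae, are


if __name__ == "__main__":
    for skew in (0.5, 1.0, 1.5):
        stream = zipf_stream(num_items=1000, length=100_000, skew=skew)
        true_freq = Counter(stream)
        # Trivial baseline: guess the mean frequency for every item.
        mean = len(stream) / len(true_freq)
        aae, are = aae_are(true_freq, lambda item: mean)
        print(f"skew={skew}: AAE={aae:.2f}, ARE={are:.2f}")
```

Any sketch can be scored by passing its query function as `estimate`.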

About

The Official Implementation of LegoSketch [ICML 2025]
