History

v0.14.2 - 2026-04-01

Internal

Define the get_instance_status() and get_job_status() methods of the BenchmarkLauncher - Issue #570 by @R-Palazzo
Define the terminate() method of the BenchmarkLauncher - Issue #568 by @R-Palazzo
Define workflows to be able to run from a config file or some given parameters - Issue #547 by @R-Palazzo
Add a script that launches a benchmark from a yaml file or a set of parameters - Issue #546 by @R-Palazzo
Move the current benchmark configs to yaml files - Issue #545 by @R-Palazzo

v0.14.1 - 2026-03-23

Internal

Update the "wins" computation to include the Pareto front - Issue #572 by @R-Palazzo

v0.14.0 - 2026-03-09

New Features

The ResultExplorer should look for latest run results by default - Issue #552 by @R-Palazzo
Update load_results to be able to filter on dataset or synthesizer - Issue #551 by @R-Palazzo

Bugs Fixed

OUTPUT_DESTINATION_AWS points to the wrong location - Issue #564 by @R-Palazzo

v0.13.1 - 2026-02-28

New Features

Include SDV-Enterprise in single-table benchmarks - Issue #549 by @R-Palazzo

Internal

Add extremity data points of the Pareto curve for the Quality–Speed Tradeoff plot - Issue #556 by @R-Palazzo
Internal benchmark results upload crashes if there's no error column in the result table - Issue #544 by @R-Palazzo

Maintenance

Update RELEASE guide to include conda-forge step - Issue #560 by @sarahmish
Support Python 3.14 - Issue #528 by @pvk-developer
Update license information in pyproject.toml to use new format - Issue #527 by @pvk-developer

Miscellaneous

Set the SDGym Slack alert to be posted on the sdgym channel. - Issue #555 by @R-Palazzo

v0.13.0 - 2026-01-30

New Features

Add Dataset Details and Model Details Excel sheets when uploading benchmark results - Issue #532 by @R-Palazzo
Add workflow to run SDGym multi-table benchmark monthly and publish results - Issue #516 by @R-Palazzo
Define internal single and multi table methods to run on GCP - Issue #515 by @R-Palazzo
Add multi table support to ResultsExplorer - Issue #488 by @fealho
Add benchmark_multi_table_aws - Issue #487 by @R-Palazzo
Add benchmark_multi_table function - Issue #486 by @pvk-developer
Add multi-table UniformSynthesizer - Issue #485 by @R-Palazzo

Bugs Fixed

Private S3 bucket access fails in benchmark_multi_table_aws despite valid credentials - Issue #525 by @R-Palazzo
RealTabFormer 0.2.4 causes integration to fail - Issue #523 by @R-Palazzo

Internal

Remove deprecated parameters - Issue #519 by @fealho

Miscellaneous

Update multi-table dataset list - Issue #535 by @R-Palazzo

v0.12.1 - 2025-12-05

New Features

If there are no datasets in the bucket, the DatasetExplorer should show a warning and return an empty table - Issue #475 by @fealho
Add input validation for the DatasetExplorer class and functions - Issue #474 by @fealho

Bugs Fixed

Record the train and sample times whenever an error occurs during a benchmark. - Issue #503 by @R-Palazzo

Maintenance

Workflow fails due to lack of space - Issue #511 by @rwedge

v0.12.0 - 2025-11-20

New Features

Rename create_sdv_synthesizer_variant to create_synthesizer_variant - Issue #491 by @R-Palazzo
SDGym should be able to automatically discover SDV Enterprise synthesizers - Issue #481 by @R-Palazzo
Incorporate the get_available_datasets functionality into the DatasetExplorer - Issue #473 by @fealho

Bugs Fixed

Update result aggregation logic in the ResultExplorer to match new naming schema - Issue #494 by @R-Palazzo
When running a benchmark locally, the additional_datasets_folder path should be the root path - Issue #484 by @fealho

v0.11.1 - 2025-11-03

Bugs Fixed

Missing dependency openpyxl - Issue #479 by @rwedge

v0.11.0 - 2025-10-31

New Features

Add a DatasetExplorer class that provides a summary of all datasets in a bucket (for a given modality) - Issue #469 by @pvk-developer
Update SDGym to use the new S3 bucket and bucket structure - Issue #468 by @pvk-developer
Update Pareto plot data generation to use the Adjusted Time and Quality score - Issue #462 by @R-Palazzo
The ResultsExplorer should allow programmatic access to all the saved artifacts from benchmarking - Issue #450 by @R-Palazzo
When performing multiple SDGym runs on the same day, save the artifacts with consistent naming - Issue #448 by @R-Palazzo
To simulate graceful degradation, fallback to using the results from the UniformSynthesizer - Issue #439 by @rwedge
Pip install sdgym released version on ec2 machines - Issue #437 by @pvk-developer
Add a fallback to UniformSynthesizer when an error occurs and improve the time tracker of the synthetic data generation - Issue #436 by @R-Palazzo
Make the synthesizer names consistent throughout SDGym - Issue #430 by @R-Palazzo
Simplify the import API for SDGym's results explorer - Issue #429 by @R-Palazzo
Add workflow to run SDGym monthly and publish results - Issue #425 by @R-Palazzo
Add benchmark_single_table_aws function - Issue #414 by @R-Palazzo
Add summarize function to SDGymResultsExplorer class - Issue #412 by @R-Palazzo
Add SDGymResultsExplorer class - Issue #411 by @R-Palazzo
Add ability to save synthesizers and data when running benchmark_single_table - Issue #410 by @R-Palazzo
Update REalTabFormer default parameters so that it runs on benchmarking - Issue #400 by @fealho
Add DCRBaseline Metric to single table report - Issue #397 by @gsheni

Bugs Fixed

Update link to s3 results in the Slack Alert message - Issue #464 by @R-Palazzo
EC2 instance not terminating after timeout - Issue #463 by @R-Palazzo
Adjusted time and quality score not aggregating correctly on EC2 - Issue #461 by @R-Palazzo
Update warning message for deprecated parameters - Issue #455 by @R-Palazzo
The UniformSynthesizer produces multiple UserWarning messages when run on a demo dataset - Issue #449 by @R-Palazzo
Always include UniformSynthesizer doesn't work on AWS - Issue #446 by @R-Palazzo
Fix minimum test version due to RealTabFormer and Torch releases - Issue #434 by @R-Palazzo
Add modality parameter to get_available_datasets function - Issue #403 by @gsheni
Update the EC2 instance used when run_on_ec2 is enabled - Issue #396 by @R-Palazzo
All bump-version commands are failing - Issue #391 by @amontanez24

Internal

To simulate graceful degradation, always run the UniformSynthesizer on all the requested datasets - Issue #438 by @rwedge

Maintenance

Remove support for Python 3.8 - Issue #457 by @fealho
Check pyproject for release candidate dependencies - Issue #406 by @rwedge
Update the library installation script for EC2 machines to install optional dependencies like RealTabFormer - Issue #388 by @R-Palazzo
Speed up test_benchmark_single_table_realtabformer_no_metrics integration test - Issue #379 by @fealho
Update python set up step in workflows to use latest python version - Issue #361 by @frances-h
Support Python 3.13 - Issue #355 by @rwedge

Miscellaneous

Add workflow to release SDGym on PyPI - Issue #418 by @gsheni

v0.10.0 - 2025-02-06

New Features

Add integration with 3rd party synthesizer (REalTabFormer) - Issue #347 by @cristid9
Add support for numpy 2.0.0 - Issue #315 by @R-Palazzo

Bugs Fixed

Minimum tests failing because of broken action - Issue #351 by @amontanez24
The ColumnSynthesizer should follow the sdtypes in the metadata (not the data's dtypes) - Issue #249 by @fealho

Maintenance

Minimum tests fail due to dependency version mismatch - Issue #376 by @amontanez24
Create Prepare Release workflow - Issue #364 by @R-Palazzo
Migrate SDV synthesizer to use Unified Metadata instead of legacy SingleTableMetadata - Issue #359 by @fealho
Update codecov and add flag for integration tests - Issue #354 by @amontanez24

v0.9.1 - 2024-08-29

Bugs Fixed

AttributeError when running custom synthesizer with timeout - Issue #335 by @fealho

v0.9.0 - 2024-08-07

This release enables the diagnostic score to be computed in a benchmarking run. It also renames the IndependentSynthesizer to ColumnSynthesizer. Finally, it fixes a bug so that the time for all metrics will now be used to compute the Evaluate_Time column in the results.

Bugs Fixed

Cap numpy to less than 2.0.0 until SDGym supports it - Issue #313 by @gsheni
The returned Evaluate_Time does not include results from all metrics - Issue #310 by @lajohn4747

New Features

Rename IndependentSynthesizer to ColumnSynthesizer - Issue #319 by @lajohn4747
Allow the ability to compute diagnostic score in a benchmarking run - Issue #311 by @lajohn4747

v0.8.0 - 2024-06-07

This release adds support for both Python 3.11 and 3.12! It also drops support for Python 3.7.

This release adds a new parameter to benchmark_single_table called run_on_ec2. When enabled, it will launch a t2.medium ec2 instance on the user's AWS account using the credentials they specify in environment variables. The benchmarking will then run on this instance. The output_filepath must be provided and must be in the format {s3_bucket_name}/{path_to_file} when run_on_ec2 is enabled.
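As a rough illustration of the `{s3_bucket_name}/{path_to_file}` format described above, the sketch below splits such a value into its bucket and file path. The helper `split_s3_output_filepath` is hypothetical and not part of SDGym; rejecting values that contain a URL scheme is also an assumption made for this example, not documented SDGym behavior.

```python
def split_s3_output_filepath(output_filepath):
    """Split '{s3_bucket_name}/{path_to_file}' into (bucket, path).

    Hypothetical helper for illustration only; SDGym's own validation
    of output_filepath may differ.
    """
    # Assumption: a URL scheme such as 's3://' is not part of the expected format.
    if '://' in output_filepath:
        raise ValueError('Expected {s3_bucket_name}/{path_to_file} without a URL scheme.')

    # Everything before the first '/' is the bucket; the rest is the file path.
    bucket, separator, path = output_filepath.partition('/')
    if not separator or not bucket or not path:
        raise ValueError('Expected {s3_bucket_name}/{path_to_file}.')

    return bucket, path


bucket, path = split_s3_output_filepath('my-bucket/results/run.csv')
```

A value such as `results.csv`, which has no bucket component, raises a `ValueError` in this sketch.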

Documentation

Docs for AWS integration are incorrect - Issue #304 by @srinify

Maintenance

Add support for Python 3.11 - Issue #250 by @fealho
Remove anyio usage - Issue #252 by @lajohn4747
Drop support for Python 3.7 - Issue #254 by @R-Palazzo
Switch default branch from master to main - Issue #257 by @R-Palazzo
Transition from using setup.py to pyproject.toml to specify project metadata - Issue #266 by @R-Palazzo
Remove bumpversion and use bump-my-version - Issue #267 by @R-Palazzo
Switch to using ruff for Python linting and code formatting - Issue #268 by @gsheni
Add dependency checker - Issue #277 by @lajohn4747
Add bandit workflow - Issue #282 by @R-Palazzo
Cleanup automated PR workflows - Issue #286 by @R-Palazzo
Add support for Python 3.12 - Issue #288 by @fealho
Only run unit and integration tests on oldest and latest python versions for macos - Issue #294 by @R-Palazzo
Bump versions of SDV, SDMetrics and RDT - Issue #298

Bugs Fixed

The UniformSynthesizer should follow the sdtypes in metadata (not the data's dtypes) - Issue #248 by @lajohn4747
Fix minimum version workflow when pointing to github branch - Issue #280 by @R-Palazzo
Passing synthesizer as string fails if run_on_ec2 is enabled - Issue #306 by @lajohn4747

New Features

Add run_on_ec2 flag to benchmark_single_table - Issue #265 by @lajohn4747
Remove FastML Synthesizer - Issue #292 by @lajohn4747

v0.7.0 - 2023-06-13

This release adds support for SDV 1.0 and PyTorch 2.0!

New Features

Add functions to top level import - Issue #229 by @fealho
Cleanup SDGym to the new SDV 1.0 metadata and synthesizers - Issue #212 by @fealho

Bugs Fixed

limit_dataset_size causes sdgym to crash - Issue #231 by @fealho
benchmark_single_table crashes with metadata dict - Issue #232 by @fealho
Passing None as synthesizers runs all of them - Issue #233 by @fealho
timeout parameter causes sdgym to crash - Issue #234 by @pvk-developer
SDGym is not working with latest torch - Issue #210 by @amontanez24
Fix sdgym --help - Issue #206 by @katxiao

Internal

Increase code style lint - Issue #123 by @fealho
Remove code support for synthesizers that are not strings/classes - PR #236 by @fealho
Code Refactoring - Issue #215 by @fealho

Maintenance

Remove pomegranate - Issue #230 by @amontanez24

v0.6.0 - 2023-02-01

This release introduces methods for benchmarking single table data and creating custom synthesizers, which can be based on existing SDGym-defined synthesizers or on user-defined functions. This release also adds support for Python 3.10 and drops support for Python 3.6.

New Features

Benchmarking progress bar should update on one line - Issue #204 by @katxiao
Support local additional datasets folder with zip files - Issue #186 by @katxiao
Enforce that each synthesizer is unique in benchmark_single_table - Issue #190 by @katxiao
Simplify the file names inside the detailed_results_folder - Issue #191 by @katxiao
Use SDMetrics silent report generation - Issue #179 by @katxiao
Remove arguments in get_available_datasets - Issue #197 by @katxiao
Accept metadata.json as valid metadata file - Issue #194 by @katxiao
Check if file or folder exists before writing benchmarking results - Issue #196 by @katxiao
Rename benchmarking argument "evaluate_quality" to "compute_quality_score" - Issue #195 by @katxiao
Add option to disable sdmetrics in benchmarking - Issue #182 by @katxiao
Prefix remote bucket with 's3' - Issue #183 by @katxiao
Benchmarking error handling - Issue #177 by @katxiao
Allow users to specify custom synthesizers' display names - Issue #174 by @katxiao
Update benchmarking results columns - Issue #172 by @katxiao
Allow custom datasets - Issue #166 by @katxiao
Use new datasets s3 bucket - Issue #161 by @katxiao
Create benchmark_single_table method - Issue #151 by @katxiao
Update summary metrics - Issue #134 by @katxiao
Benchmark individual methods - Issue #159 by @katxiao
Add method to create a sdv variant synthesizer - Issue #152 by @katxiao
Add method to generate a multi table synthesizer - Issue #149 by @katxiao
Add method to create single table synthesizers - Issue #148 by @katxiao
Updating existing synthesizers to new API - Issue #154 by @katxiao

Bug Fixes

Pip encounters dependency issues with ipython - Issue #187 by @katxiao
IndependentSynthesizer is printing out ConvergeWarning too many times - Issue #192 by @katxiao
Size values in benchmarking results seems inaccurate - Issue #184 by @katxiao
Import error in the example for benchmarking the synthesizers - Issue #139 by @katxiao
Updates and bugfixes - Issue #132 by @csala

Maintenance

Update README - Issue #203 by @katxiao
Support Python Versions >=3.7 and <3.11 - Issue #170 by @katxiao
SDGym Package Maintenance Updates documentation - Issue #163 by @katxiao
Remove YData - Issue #168 by @katxiao
Update to newest SDV - Issue #157 by @katxiao
Update slack invite link. - Issue #144 by @pvk-developer
updating workflows to work with windows - Issue #136 by @amontanez24
Update conda dependencies - Issue #130 by @katxiao

v0.5.0 - 2021-12-13

This release adds support for Python 3.9, and updates dependencies to accept the latest versions when possible.

Issues closed

Add support for Python 3.9 - Issue #127 by @katxiao
Add pip check workflow - Issue #124 by @pvk-developer
Fix meta.yaml dependencies - PR #119 by @fealho
Upgrade dependency ranges - Issue #118 by @katxiao

v0.4.1 - 2021-08-20

This release fixes a bug where passing a JSON file as configuration for a multi-table synthesizer crashed the model. It also adds a number of fixes and enhancements, including: (1) a function and CLI command to list the available synthesizer names, (2) a curated set of dependencies, with Gretel made an optional dependency, (3) updating Gretel to use temp directories, (4) using nvidia-smi to get the number of GPUs and (5) multiple Dockerfile updates to improve functionality.

Issues closed

Bug when using JSON configuration for multiple multi-table evaluation - Issue #115 by @pvk-developer
Use nvidia-smi to get number of gpus - PR #113 by @katxiao
List synthesizer names - Issue #82 by @fealho
Use nvidia base for dockerfile - PR #108 by @katxiao
Add Makefile target to install gretel and ydata - PR #107 by @katxiao
Curate dependencies and make Gretel optional - PR #106 by @csala
Update gretel checkpoints to use temp directory - PR #105 by @katxiao
Initialize variable before reference - PR #104 by @katxiao

v0.4.0 - 2021-06-17

This release adds new synthesizers for Gretel and ydata, and creates a Docker image for SDGym. It also includes enhancements to the accepted SDGym arguments, adds a summary command to aggregate metrics, and adds the normalized score to the benchmark results.

New Features

Add normalized score to benchmark results - Issue #102 by @katxiao
Add max rows and max columns args - Issue #96 by @katxiao
Automatically detect number of workers - Issue #97 by @katxiao
Add summary function and command - Issue #92 by @amontanez24
Allow jobs list/JSON to be passed - Issue #93 by @fealho
Add ydata to sdgym - Issue #90 by @fealho
Add dockerfile for sdgym - Issue #88 by @katxiao
Add Gretel to SDGym synthesizer - Issue #87 by @amontanez24

v0.3.1 - 2021-05-20

This release adds new features to store results and cache contents into an S3 bucket as well as a script to collect results from a cache dir and compile a single results CSV file.

Issues closed

Collect cached results from s3 bucket - Issue #85 by @katxiao
Store cache contents into an S3 bucket - Issue #81 by @katxiao
Store SDGym results into an S3 bucket - Issue #80 by @katxiao
Add a way to collect cached results - Issue #79 by @katxiao
Allow reading datasets from private s3 bucket - Issue #74 by @katxiao
Typos in the sdgym.run function docstring documentation - Issue #69 by @sbrugman

v0.3.0 - 2021-01-27

Major rework of the SDGym functionality to support a collection of new features:

Add relational and timeseries model benchmarking.
Use SDMetrics for model scoring.
Update datasets format to match SDV metadata based storage format.
Centralize default datasets collection in the sdv-datasets S3 bucket.
Add options to download and use datasets from different S3 buckets.
Rename synthesizers to baselines and adapt to the new metadata format.
Add model execution and metric computation time logging.
Add optional synthetic data and error traceback caching.

v0.2.2 - 2020-10-17

This version adds a rework of the benchmark function and a few new synthesizers.

New Features

New CLI with run, make-leaderboard and make-summary commands
Parallel execution via Dask or Multiprocessing
Download datasets without executing the benchmark
Support for Python 3.6 to 3.8

New Synthesizers

sdv.tabular.CTGAN
sdv.tabular.CopulaGAN
sdv.tabular.GaussianCopulaOneHot
sdv.tabular.GaussianCopulaCategorical
sdv.tabular.GaussianCopulaCategoricalFuzzy

v0.2.1 - 2020-05-12

New updated leaderboard and minor improvements.

New Features

Add parameters for PrivBNSynthesizer - Issue #37 by @csala

v0.2.0 - 2020-04-10

New Benchmark API and lots of improved documentation.

New Features

The benchmark function now returns a complete leaderboard instead of only one score
Class Synthesizers can be directly passed to the benchmark function

Bug Fixes

One hot encoding errors in the Independent, VEEGAN and Medgan Synthesizers.
Proper usage of the eval mode during sampling.
Fix improperly configured datasets.

v0.1.0 - 2019-08-07

First release to PyPI
