# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: metasyn
message: >-
  If you use this software, please also cite the associated
  software paper under preferred-citation in this file.
type: software
authors:
  - given-names: Raoul
    family-names: Schram
    email: [email protected]
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0001-6616-230X'
  - given-names: Samuel
    family-names: Spithorst
    email: [email protected]
    orcid: 'https://orcid.org/0009-0000-4140-0658'
    affiliation: Utrecht University
  - given-names: Erik-Jan
    name-particle: van
    family-names: Kesteren
    email: [email protected]
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0003-1548-1663'
identifiers:
  - type: doi
    value: 10.21105/joss.07099
    description: Journal of Open Source Software paper
  - type: doi
    value: 10.5281/zenodo.7696031
    description: Latest archived version of metasyn
repository-code: 'https://github.com/sodascience/metasyn'
url: 'https://metasyn.readthedocs.io/'
repository-artifact: 'https://pypi.org/project/metasyn/'
abstract: >-
  Synthetic data is a promising tool for improving the
  accessibility of datasets which are too sensitive to be
  shared publicly. To this end, we introduce metasyn, a
  Python package for generating synthetic data from tabular
  datasets. Unlike existing synthetic data generation
  software, metasyn is built on a simple generative model
  that omits multivariate information. This choice enables
  transparency and auditability, keeps information leakage
  to a minimum, and enables privacy guarantees through a
  plug-in system. While the analytical validity of the
  generated data is thus intentionally limited, its
  potential uses are broad, including exploratory analyses,
  code development and testing, and external communication
  and teaching.
keywords:
  - synthetic data
  - metadata
  - generative model
  - data science
  - data management
  - machine learning
  - statistics
license: MIT
preferred-citation:
  authors:
    - given-names: Raoul
      family-names: Schram
      email: [email protected]
      affiliation: Utrecht University
      orcid: 'https://orcid.org/0000-0001-6616-230X'
    - given-names: Samuel
      family-names: Spithorst
      email: [email protected]
      orcid: 'https://orcid.org/0009-0000-4140-0658'
      affiliation: Utrecht University
    - given-names: Erik-Jan
      name-particle: van
      family-names: Kesteren
      email: [email protected]
      affiliation: Utrecht University
      orcid: 'https://orcid.org/0000-0003-1548-1663'
  title: "Metasyn: Transparent Generation of Synthetic Tabular Data with Privacy Guarantees"
  journal: "Journal of Open Source Software"
  year: 2025
  doi: 10.21105/joss.07099
  volume: 10
  issue: 105
  pages: 7099
  type: article