
Generating

WiredSound edited this page Feb 3, 2022 · 2 revisions

Figure: diagram depicting the generation process.

Diagram accurate as of commit d9cbb1a299affed4d543eb06704b65679ab34d96, Synth version 0.6.4, January 11th 2022

Likely one of the first commands a new user of Synth will encounter is synth generate. As the name implies, this command generates synthetic data based on the JSON schema files found in the specified namespace. The generated data can then be written to STDOUT or sent directly to a database or file. To do this, input data passes through a few distinct stages:

  • The structopt crate is first used to parse command-line arguments. This extracts information such as the name of the namespace to generate from (essentially a directory path), where to output the generated data (a database or STDOUT), the PRNG seed, and so on.
  • JSON schema files are read from the namespace directory, and each is parsed into a synth_core::schema::Content instance (an abstract representation of a collection schema).
    • This parsing is done using the serde_json crate to parse the JSON, and the From<serde_json::Value> implementation for Content is used to obtain a final Content instance for each schema.
  • Next, an 'export strategy' is determined based on the --to parameter passed to the synth generate command.
    • There exists a trait synth::cli::ExportStrategy which represents some way of outputting data generated from a Synth namespace. Types that implement this trait include JsonStdoutExportStrategy, PostgresExportStrategy, and various others (see the submodules of synth::cli).
  • Each export strategy uses the 'sampler' to generate a SamplerOutput instance.
    • SamplerOutput stores synth_core::graph::Value instances which define generated data in a way that is not specific to any one export strategy.
    • In short, the sampler works by using the compiler to construct an acyclic graph (synth_core::Graph) from the namespace, and then traversing that graph to generate data.
  • Once data has been generated, each export strategy is responsible for converting the format-agnostic Values in SamplerOutput into the appropriate data types and then writing them to a database or STDOUT.
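The schema-parsing stage relies on a From<serde_json::Value> conversion into Content. The sketch below illustrates that conversion pattern without external crates, so it is not Synth's code: JsonValue is a hypothetical stand-in for serde_json::Value, Content is a drastically reduced, hypothetical version of synth_core::schema::Content, and the "type" and "content" keys are assumptions chosen for illustration rather than the full Synth schema format. For brevity it panics on malformed input, which a production conversion would avoid.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for serde_json::Value (only the variants needed here).
#[derive(Debug, Clone, PartialEq)]
pub enum JsonValue {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<JsonValue>),
    Object(BTreeMap<String, JsonValue>),
}

/// Hypothetical, heavily simplified stand-in for synth_core::schema::Content.
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    Null,
    Bool,
    Number,
    String,
    Array(Box<Content>),
    Object(BTreeMap<String, Content>),
}

impl From<JsonValue> for Content {
    fn from(value: JsonValue) -> Self {
        // In this sketch every schema node is a JSON object with a "type" key.
        let mut map = match value {
            JsonValue::Object(map) => map,
            other => panic!("expected a schema object, got {:?}", other),
        };
        let ty = match map.remove("type") {
            Some(JsonValue::String(s)) => s,
            other => panic!("missing or non-string \"type\": {:?}", other),
        };
        match ty.as_str() {
            "null" => Content::Null,
            "bool" => Content::Bool,
            "number" => Content::Number,
            "string" => Content::String,
            // Assumed: an array schema describes its elements under "content".
            "array" => {
                let inner = map.remove("content").expect("array schema needs \"content\"");
                Content::Array(Box::new(Content::from(inner)))
            }
            // Assumed: every remaining key of an object schema is a field schema.
            "object" => Content::Object(
                map.into_iter().map(|(k, v)| (k, Content::from(v))).collect(),
            ),
            other => panic!("unsupported type {:?}", other),
        }
    }
}
```

Because the conversion is recursive, nested objects and arrays are handled by the same From implementation that handles the top-level schema.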
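Choosing an export strategy from --to can be pictured as a function that returns a boxed trait object. This is a minimal sketch assuming a simplified trait: its hypothetical export method just returns the text that would be written, whereas the real synth::cli::ExportStrategy has a different signature and actually performs the output. The JSON-to-STDOUT default and the postgres:// prefix check are assumptions made for illustration.

```rust
/// Hypothetical simplified trait; the real synth::cli::ExportStrategy differs.
pub trait ExportStrategy {
    /// Returns a description of what would be written for the given records.
    fn export(&self, records: &[String]) -> String;
}

pub struct JsonStdoutExportStrategy;

pub struct PostgresExportStrategy {
    pub uri: String,
}

impl ExportStrategy for JsonStdoutExportStrategy {
    fn export(&self, records: &[String]) -> String {
        // Records are assumed to be already-rendered JSON fragments.
        format!("[{}]", records.join(","))
    }
}

impl ExportStrategy for PostgresExportStrategy {
    fn export(&self, records: &[String]) -> String {
        // A real strategy would open a connection and insert rows here.
        format!("would insert {} rows into {}", records.len(), self.uri)
    }
}

/// Picks a strategy from the value of --to (None means no --to was given).
pub fn strategy_for(to: Option<&str>) -> Result<Box<dyn ExportStrategy>, String> {
    match to {
        None => Ok(Box::new(JsonStdoutExportStrategy)),
        Some(uri) if uri.starts_with("postgres://") || uri.starts_with("postgresql://") => {
            Ok(Box::new(PostgresExportStrategy { uri: uri.to_string() }))
        }
        Some(other) => Err(format!("unsupported --to target: {}", other)),
    }
}
```

Returning a trait object keeps the rest of the command independent of the concrete output target: adding a new target means adding one implementation and one match arm.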
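The sampler's approach of building an acyclic graph and then traversing it can be illustrated with a toy graph in which some nodes depend on others. This is not synth_core::Graph: the Node enum, the sample and topo_order functions, and the XorShift generator are all hypothetical. The sketch uses Kahn's algorithm to find a dependency order, then generates one value per node from a seeded PRNG, so the same seed always yields the same output. In this sketch a cycle makes ordering impossible, and sample returns None.

```rust
use std::collections::{HashMap, VecDeque};

/// Tiny xorshift64 PRNG so output is deterministic for a given seed.
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        XorShift(seed.max(1)) // xorshift must not start at zero
    }
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical node kinds: a random integer in [lo, hi) (assumes lo < hi),
/// or the sum of other nodes (assumes valid indices).
pub enum Node {
    Range { lo: i64, hi: i64 },
    Sum(Vec<usize>),
}

/// Returns node indices in dependency order (Kahn's algorithm), or None on a cycle.
pub fn topo_order(nodes: &[Node]) -> Option<Vec<usize>> {
    let mut indegree = vec![0usize; nodes.len()];
    let mut dependents: HashMap<usize, Vec<usize>> = HashMap::new();
    for (i, node) in nodes.iter().enumerate() {
        if let Node::Sum(deps) = node {
            for &d in deps {
                indegree[i] += 1;
                dependents.entry(d).or_default().push(i);
            }
        }
    }
    let mut queue: VecDeque<usize> = (0..nodes.len()).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(nodes.len());
    while let Some(i) = queue.pop_front() {
        order.push(i);
        for &j in dependents.get(&i).map(|v| v.as_slice()).unwrap_or(&[]) {
            indegree[j] -= 1;
            if indegree[j] == 0 {
                queue.push_back(j);
            }
        }
    }
    if order.len() == nodes.len() { Some(order) } else { None }
}

/// Generates one value per node, visiting dependencies before dependents.
pub fn sample(nodes: &[Node], seed: u64) -> Option<Vec<i64>> {
    let order = topo_order(nodes)?;
    let mut rng = XorShift::new(seed);
    let mut values = vec![0i64; nodes.len()];
    for i in order {
        values[i] = match &nodes[i] {
            Node::Range { lo, hi } => lo + (rng.next_u64() % (hi - lo) as u64) as i64,
            Node::Sum(deps) => deps.iter().map(|&d| values[d]).sum(),
        };
    }
    Some(values)
}
```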
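The final conversion stage can be sketched as two renderers over one format-agnostic value type: one producing JSON text, as a STDOUT strategy might, and one producing SQL literals, as a database strategy might. The Value enum here is a hypothetical simplification of synth_core::graph::Value, and to_json and to_sql_literal are illustrative names, not Synth functions.

```rust
/// Hypothetical, simplified stand-in for synth_core::graph::Value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>), // ordered fields
}

/// Quotes and escapes a string for JSON output.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders a Value as JSON text, as a STDOUT strategy might.
pub fn to_json(v: &Value) -> String {
    match v {
        Value::Null => "null".to_string(),
        Value::Bool(b) => b.to_string(),
        Value::Int(n) => n.to_string(),
        Value::Str(s) => json_escape(s),
        Value::Array(items) => {
            format!("[{}]", items.iter().map(to_json).collect::<Vec<_>>().join(","))
        }
        Value::Object(fields) => format!(
            "{{{}}}",
            fields
                .iter()
                .map(|(k, v)| format!("{}:{}", json_escape(k), to_json(v)))
                .collect::<Vec<_>>()
                .join(",")
        ),
    }
}

/// Renders a scalar Value as a SQL literal, as a database strategy might.
/// Arrays and objects return None because this sketch does not map them to columns.
pub fn to_sql_literal(v: &Value) -> Option<String> {
    match v {
        Value::Null => Some("NULL".to_string()),
        Value::Bool(b) => Some(if *b { "TRUE" } else { "FALSE" }.to_string()),
        Value::Int(n) => Some(n.to_string()),
        Value::Str(s) => Some(format!("'{}'", s.replace('\'', "''"))),
        Value::Array(_) | Value::Object(_) => None,
    }
}
```

Production database code would normally pass values as bound query parameters rather than splicing escaped literals into SQL text; the literal form is used here only to make the per-target type conversion visible.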