Add csvw task. #120 (Closed)

Conversation
Issue #31:
- Update each pipeline to output CSVW to the specified directory instead of generating the output RDF directly.
- Update the exec task to write the CSVW to a temporary directory and write the RDF using csv2rdf.
- The cube pipeline now creates only a single metadata file instead of one for each element (e.g. dataset metadata, DSD etc.). Each metadata function in the cube namespace now returns a table definition instead of a full metadata definition, by removing the @context key.
- Add a new csvw task which writes the CSVW output for each pipeline to the specified directory.
- Update the output of the describe task to include an example invocation of the csvw command.
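A rough sketch of the consolidation described above (Python rather than the project's Clojure, with hypothetical function and column names): each pipeline step returns a bare table definition with no "@context", and the csvw task wraps them into a single CSVW metadata document.

```python
import json

def observations_table():
    # Hypothetical stand-in for a metadata function in the cube namespace:
    # it returns only a table definition, not a full metadata document.
    return {"url": "observations.csv",
            "tableSchema": {"columns": [{"name": "measure"}]}}

def dataset_table():
    return {"url": "dataset.csv",
            "tableSchema": {"columns": [{"name": "label"}]}}

def combine_tables(tables):
    """Wrap the bare table definitions in one metadata document,
    so the single @context appears exactly once at the top level."""
    return {"@context": "http://www.w3.org/ns/csvw",
            "tables": tables}

metadata = combine_tables([observations_table(), dataset_table()])
print(json.dumps(metadata, indent=2))
```

Keeping the @context out of the per-element definitions is what allows them to be collated into one document without producing nested or duplicated contexts.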
For some reason the printf call wasn't being flushed, despite ending with "%n". Appending a println or flush causes the message to be printed.
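The buffering behaviour can be illustrated in miniature (Python, not the project's Clojure): formatted output written to a buffered text stream can sit in the wrapper's buffer until something forces a flush, even when the written text ends in a newline.

```python
import io

raw = io.BytesIO()
# write_through=False (the default) means writes accumulate in the
# wrapper's buffer instead of reaching the underlying stream immediately.
out = io.TextIOWrapper(raw, encoding="utf-8", write_through=False)

out.write("progress: 50%\n")    # analogous to a printf ending in "%n"
assert raw.getvalue() == b""    # still buffered: nothing has reached `raw`

out.flush()                     # the fix: an explicit flush (or println)
assert raw.getvalue() == b"progress: 50%\n"
```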
This refactoring renames these functions for consistency.

Sequences of record-maps follow the convention "<type>-records":
- observation-records (unchanged)
- read-component-specifications -> component-specification-records
- codes -> code-records
- components -> component-records

The maps describing the metadata tableSchema follow the convention "<type>-schema":
- used-codes-codes-table -> used-codes-codes-schema
- used-codes-codelists-table -> used-codes-codelists-schema
- observations-table -> observations-schema
- data-structure-definition-table -> data-structure-definition-schema
- component-specification-metadata-table -> component-specification-schema
- dataset-metadata-table -> dataset-schema
- codelist-metadata -> codelist-schema
- components-metdata -> components-schema

The term "table" was confusing. A csvw:table is a key in the metadata specification for the table schema but, since we also deal with the tables of records themselves, it was unclear whether "table" meant the data or the metadata about it. The term "metadata" is also overloaded: it could mean the csv schema, or the notion that e.g. a qb:DataSet has metadata.
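The distinction the convention draws can be sketched as follows (Python, with illustrative names rather than the project's actual Clojure functions): "<type>-records" is the data itself, a sequence of row maps, while "<type>-schema" is the CSVW tableSchema fragment that describes those rows.

```python
# Data: a sequence of record-maps, one per csv row ("<type>-records").
code_records = [
    {"notation": "male",   "label": "Male"},
    {"notation": "female", "label": "Female"},
]

# Metadata: the tableSchema describing that data ("<type>-schema").
code_schema = {
    "columns": [
        {"name": "notation", "titles": "notation"},
        {"name": "label",    "titles": "label"},
    ]
}

# The schema's column names should cover exactly the keys of each record:
assert {c["name"] for c in code_schema["columns"]} == set(code_records[0])
```

With the old "-table" suffix both of these could plausibly have been called a "table"; the records/schema split removes that ambiguity.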
I can't change the base of this PR to master after having merged it (GH complains there's nothing to merge). GH insists there are unmerged commits from issue_31 -> issue_37, but I think this is fine since everything is in master.
This was referenced Apr 28, 2020

Robsteranium added a commit that referenced this pull request on May 11, 2020:
- URI configuration with an optional `--uri-templates` argument that expects an edn file. This introduces a small meta-templating language that allows you to include variables from outwith the csvw table (e.g. `base-uri`). See `docs/usage.md` and the `customising-uris` example for a demonstration of how to use this feature. See #103 for implementation details.
- A csv parser for declarative validation and coercion of the csv inputs. The validation checks for required or unknown columns, validates URI templates and checks XML datatypes. Violations cause exceptions to be thrown with messages that identify the offending cell by row and column index (`ex-data` is also provided). The parser also makes the specification of transformations and defaults clearer. See #102 for details.
- A csvw task which will output the csvw serialisation (csv and metadata json) to a specified directory. This also collates the metadata for the cube-pipeline into a single document. See #120 for details.
- The log4j dependency is removed, allowing consumers of the library to configure logging themselves.
- Test coverage for CLI tasks.
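The meta-templating idea in the first bullet can be sketched roughly as follows. This is a Python illustration with an invented `{@var}` placeholder syntax, not the edn format table2qb actually reads: variables that are not csvw columns (e.g. `base-uri`) are substituted up front, leaving an ordinary URI template whose remaining `{column}` references are expanded per row by csv2rdf.

```python
def pre_substitute(template, variables):
    """Expand out-of-band {@var} placeholders, leaving ordinary
    {column} URI-template references intact for per-row expansion."""
    for name, value in variables.items():
        template = template.replace("{@%s}" % name, value)
    return template

meta_template = "{@base-uri}/def/concept/gender/{notation}"
uri_template = pre_substitute(meta_template, {"base-uri": "http://example.org"})
assert uri_template == "http://example.org/def/concept/gender/{notation}"
```

The point of the two-stage expansion is that `base-uri` is configuration shared across a whole run, whereas `{notation}` varies per csv row.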