Add csvw task. #120

Closed
wants to merge 8 commits into from

lkitching
Contributor

Issue #31 - Update each pipeline to output CSVW to the specified
directory instead of generating the output RDF directly. Update
the exec task to write the CSVW to a temporary directory and
write the RDF using csv2rdf. The cube pipeline now only creates
a single metadata file instead of one for each element (e.g. dataset
metadata, DSD etc.). Update each metadata function in the cube
namespace to return a table definition instead of a full metadata
definition by removing the @context key.

Add new csvw task which writes the CSVW output for each pipeline
to the specified directory.

Update the output of the describe task to include an example
invocation of the csvw command.
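
As a rough illustration of the "single metadata file" idea described above (not the project's actual code), the sketch below strips the `@context` key from standalone metadata documents and collects the resulting table definitions into one CSVW table group. The function names and the example file names (`dataset.csv`, `observations.csv`) are hypothetical; only the `@context` value and the `tables` property come from the CSVW metadata vocabulary.

```python
import json

# The context URL defined by the W3C CSVW metadata vocabulary.
CSVW_CONTEXT = "http://www.w3.org/ns/csvw"


def as_table_definition(metadata):
    """Return a copy of a standalone metadata document without its @context key."""
    return {key: value for key, value in metadata.items() if key != "@context"}


def table_group(table_definitions):
    """Wrap table definitions in a single table group that carries one @context."""
    return {"@context": CSVW_CONTEXT, "tables": list(table_definitions)}


# Hypothetical standalone metadata documents, one per cube element.
dataset = {
    "@context": CSVW_CONTEXT,
    "url": "dataset.csv",
    "tableSchema": {"columns": [{"name": "label", "titles": "Label"}]},
}
observations = {
    "@context": CSVW_CONTEXT,
    "url": "observations.csv",
    "tableSchema": {"columns": [{"name": "value", "titles": "Value"}]},
}

combined = table_group(as_table_definition(m) for m in (dataset, observations))
print(json.dumps(combined, indent=2))
```

Keeping the `@context` only at the top level is what lets the individual metadata functions return plain table definitions that can be combined freely.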

Robsteranium and others added 7 commits April 20, 2020 17:26
This resolves a conflict between #108 (refactoring integrant) and #120 (where the key was just :function).
For some reason the printf call wasn't being flushed, despite ending with "%n". Appending a println or flush causes the message to be printed.
This refactoring renames these functions for consistency.

Sequences of record-maps follow the convention "<type>-records":

observation-records (unchanged)
read-component-specifications -> component-specification-records
codes -> code-records
components -> component-records

The maps describing the metadata tableSchema follow the convention "<type>-schema":

used-codes-codes-table -> used-codes-codes-schema
used-codes-codelists-table -> used-codes-codelists-schema
observations-table -> observations-schema
data-structure-definition-table -> data-structure-definition-schema
component-specification-metadata-table -> component-specification-schema
dataset-metadata-table -> dataset-schema
codelist-metadata -> codelist-schema
components-metdata -> components-schema

The term "table" was confusing. A csvw:table is a key in the metadata specification for the table schema, but we also deal with the tables of records themselves, so it was unclear whether "table" meant the data or the metadata about it.

The term "metadata" is also overloaded. It could mean the csv schema or the notion that, e.g., a qb:DataSet has metadata.
@Robsteranium
Contributor

I can't change the base of this PR to master after having merged it (as GH complains there's nothing to merge!).

GH insists there are unmerged commits from issue_31 -> issue_37 but I think this is fine since everything is in master.

@Robsteranium Robsteranium deleted the issue_31 branch April 27, 2020 09:50
Robsteranium added a commit that referenced this pull request May 11, 2020
- URI configuration with an optional `--uri-templates` argument that expects an edn file. This introduces a small meta-templating language that allows you to include variables from outwith the csvw table (e.g. `base-uri`). See `docs/usage.md` and the `customising-uris` example for a demonstration of how to use this feature. See #103 for implementation details.
- A csv parser for declarative validation and coercion of the csv inputs. The validation checks for required or unknown columns, validates URI templates and checks XML datatypes. Violations cause exceptions to be thrown with messages that identify the offending cell by row and column index (`ex-data` is also provided). The parser also makes the specification of transformations and defaults clearer. See #102 for details.
- A csvw task which will output the csvw serialisation (csv and metadata json) to a specified directory. This also collates the metadata for the cube-pipeline into a single document. See #120 for details.
- The log4j dependency is removed, allowing consumers of the library to configure logging themselves.
- Test coverage for CLI tasks
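
The declarative validation described in the parser entry above can be pictured with a minimal Python sketch. It is not the library's implementation: `CellError`, `parse_csv`, and the column specification format are hypothetical names chosen for illustration, and the indexing convention (rows counted from 1 after the header, columns from 0) is an assumption.

```python
import csv
import io


class CellError(ValueError):
    """Validation failure carrying structured data, loosely analogous to ex-data."""

    def __init__(self, message, data):
        super().__init__(message)
        self.data = data


def parse_csv(text, columns):
    """Parse CSV text against `columns`, a dict of column name -> coercion function."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Column-level checks: every declared column must exist, and nothing else may.
    missing = [name for name in columns if name not in header]
    unknown = [name for name in header if name not in columns]
    if missing or unknown:
        raise CellError(
            f"missing columns {missing}, unknown columns {unknown}",
            {"missing": missing, "unknown": unknown},
        )
    records = []
    for row_index, row in enumerate(reader, start=1):
        record = {}
        for column_index, (name, value) in enumerate(zip(header, row)):
            try:
                # Cell-level check: coerce the value, reporting its position on failure.
                record[name] = columns[name](value)
            except ValueError as err:
                raise CellError(
                    f"row {row_index}, column {column_index} ({name}): {err}",
                    {"row": row_index, "column": column_index, "value": value},
                ) from err
        records.append(record)
    return records


records = parse_csv("label,value\nA,1\nB,2\n", {"label": str, "value": int})
```

Declaring the expected columns and their coercions in one place keeps the validation rules and the transformations side by side, which is the clarity benefit the entry mentions.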