Add csvw task. #120 (Closed)

Conversation
Issue #31:
- Update each pipeline to output CSVW to the specified directory instead of generating the output RDF directly.
- Update the exec task to write the CSVW to a temporary directory and write the RDF using csv2rdf.
- The cube pipeline now creates only a single metadata file instead of one for each element (e.g. dataset metadata, DSD etc.). Each metadata function in the cube namespace now returns a table definition instead of a full metadata definition, by removing the @context key.
- Add a new csvw task which writes the CSVW output for each pipeline to the specified directory.
- Update the output of the describe task to include an example invocation of the csvw command.
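A rough sketch of the consolidation described above (Python rather than the project's Clojure, with hypothetical function and column names): each pipeline step returns a bare table definition with no "@context", and the csvw task wraps them into a single CSVW metadata document.

```python
import json

def observations_table():
    # Hypothetical stand-in for a metadata function in the cube namespace:
    # it returns only a table definition, not a full metadata document.
    return {"url": "observations.csv",
            "tableSchema": {"columns": [{"name": "measure"}]}}

def dataset_table():
    return {"url": "dataset.csv",
            "tableSchema": {"columns": [{"name": "label"}]}}

def combine_tables(tables):
    """Wrap the bare table definitions in one metadata document,
    so the single @context appears exactly once at the top level."""
    return {"@context": "http://www.w3.org/ns/csvw",
            "tables": tables}

metadata = combine_tables([observations_table(), dataset_table()])
print(json.dumps(metadata, indent=2))
```

Keeping the @context out of the per-element definitions is what allows them to be collated into one document without producing nested or duplicated contexts.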
For some reason the printf call wasn't being flushed, despite ending with "%n". Appending a println or flush causes the message to be printed.
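The buffering behaviour can be illustrated in miniature (Python, not the project's Clojure): formatted output written to a buffered text stream can sit in the wrapper's buffer until something forces a flush, even when the written text ends in a newline.

```python
import io

raw = io.BytesIO()
# write_through=False (the default) means writes accumulate in the
# wrapper's buffer instead of reaching the underlying stream immediately.
out = io.TextIOWrapper(raw, encoding="utf-8", write_through=False)

out.write("progress: 50%\n")    # analogous to a printf ending in "%n"
assert raw.getvalue() == b""    # still buffered: nothing has reached `raw`

out.flush()                     # the fix: an explicit flush (or println)
assert raw.getvalue() == b"progress: 50%\n"
```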
This refactoring renames these functions for consistency.

Sequences of record-maps follow the convention "<type>-records":
- observation-records (unchanged)
- read-component-specifications -> component-specification-records
- codes -> code-records
- components -> component-records

The maps describing the metadata tableSchema follow the convention "<type>-schema":
- used-codes-codes-table -> used-codes-codes-schema
- used-codes-codelists-table -> used-codes-codelists-schema
- observations-table -> observations-schema
- data-structure-definition-table -> data-structure-definition-schema
- component-specification-metadata-table -> component-specification-schema
- dataset-metadata-table -> dataset-schema
- codelist-metadata -> codelist-schema
- components-metdata -> components-schema

The term "table" was confusing. A csvw:table is a key in the metadata specification for the table schema but, since we also deal with the tables of records themselves, it was unclear whether "table" meant the data or the metadata about it. The term "metadata" is also overloaded: it could mean the csv schema, or the notion that e.g. a qb:DataSet has metadata.
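The distinction the convention draws can be sketched as follows (Python, with illustrative names rather than the project's actual Clojure functions): "<type>-records" is the data itself, a sequence of row maps, while "<type>-schema" is the CSVW tableSchema fragment that describes those rows.

```python
# Data: a sequence of record-maps, one per csv row ("<type>-records").
code_records = [
    {"notation": "male",   "label": "Male"},
    {"notation": "female", "label": "Female"},
]

# Metadata: the tableSchema describing that data ("<type>-schema").
code_schema = {
    "columns": [
        {"name": "notation", "titles": "notation"},
        {"name": "label",    "titles": "label"},
    ]
}

# The schema's column names should cover exactly the keys of each record:
assert {c["name"] for c in code_schema["columns"]} == set(code_records[0])
```

With the old "-table" suffix both of these could plausibly have been called a "table"; the records/schema split removes that ambiguity.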
I can't change the base of this PR to master after having merged it (GH complains there's nothing to merge). GH insists there are unmerged commits from issue_31 -> issue_37, but I think this is fine since everything is in master.
This was referenced Apr 28, 2020

Robsteranium added a commit that referenced this pull request on May 11, 2020:
- URI configuration with an optional `--uri-templates` argument that expects an edn file. This introduces a small meta-templating language that allows you to include variables from outwith the csvw table (e.g. `base-uri`). See `docs/usage.md` and the `customising-uris` example for a demonstration of how to use this feature. See #103 for implementation details.
- A csv parser for declarative validation and coercion of the csv inputs. The validation checks for required or unknown columns, validates URI templates and checks XML datatypes. Violations cause exceptions to be thrown with messages that identify the offending cell by row and column index (`ex-data` is also provided). The parser also makes the specification of transformations and defaults clearer. See #102 for details.
- A csvw task which will output the csvw serialisation (csv and metadata json) to a specified directory. This also collates the metadata for the cube-pipeline into a single document. See #120 for details.
- The log4j dependency is removed, allowing consumers of the library to configure logging themselves.
- Test coverage for CLI tasks.
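The meta-templating idea in the first bullet can be sketched roughly as follows. This is a Python illustration with an invented `{@var}` placeholder syntax, not the edn format table2qb actually reads: variables that are not csvw columns (e.g. `base-uri`) are substituted up front, leaving an ordinary URI template whose remaining `{column}` references are expanded per row by csv2rdf.

```python
def pre_substitute(template, variables):
    """Expand out-of-band {@var} placeholders, leaving ordinary
    {column} URI-template references intact for per-row expansion."""
    for name, value in variables.items():
        template = template.replace("{@%s}" % name, value)
    return template

meta_template = "{@base-uri}/def/concept/gender/{notation}"
uri_template = pre_substitute(meta_template, {"base-uri": "http://example.org"})
assert uri_template == "http://example.org/def/concept/gender/{notation}"
```

The point of the two-stage expansion is that `base-uri` is configuration shared across a whole run, whereas `{notation}` varies per csv row.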