The SciCat ingestor creates a raw dataset along with its metadata, using
wrdn messages and the SciCat API, whenever a new file is written by a file-writer.
git clone https://github.com/SciCatProject/scicat-ingestor.git
cd scicat-ingestor
pip install -e .  # It will allow you to use entry-points of the scripts,
                  # defined in ``pyproject.toml``, under ``[project.scripts]`` section.

All commands have the prefix scicat so that you can use auto-complete in a terminal.
Each command is connected to a free function in a module. The mapping is defined in pyproject.toml, under the [project.scripts] section.
All scripts parse the system arguments and configuration in the same way.
You can start the ingestor daemon with certain configurations.
It will continuously process wrdn messages and ingest the corresponding nexus files.
scicat_ingestor --logging.verbose -c PATH_TO_CONFIGURATION_FILE.yaml

A topic can contain non-wrdn messages, so the ingestor filters messages and ignores irrelevant message types.
See configuration for how to use configuration files.
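The message filtering mentioned above can be sketched as follows. This is a minimal illustration, assuming the messages are FlatBuffers-encoded buffers that carry a 4-byte file identifier at bytes 4 to 8, which is where FlatBuffers places it for buffers without a size prefix. The function names are hypothetical, and the actual ingestor may rely on a dedicated deserialization library instead.

```python
from collections.abc import Iterable

# Assumed schema identifier of "writing done" messages.
WRDN_ID = b"wrdn"


def schema_id(message: bytes) -> bytes:
    """Return the 4-byte FlatBuffers file identifier (bytes 4..8)."""
    return message[4:8]


def keep_wrdn(messages: Iterable[bytes]) -> list[bytes]:
    """Keep only messages that are long enough and carry the wrdn identifier."""
    return [m for m in messages if len(m) >= 8 and schema_id(m) == WRDN_ID]


if __name__ == "__main__":
    sample = [
        b"\x00\x00\x00\x00wrdn-payload",  # relevant message
        b"\x00\x00\x00\x00f144-payload",  # other schema, ignored
        b"short",                         # too short to hold an identifier
    ]
    print(len(keep_wrdn(sample)))
```

Checking the identifier before deserializing keeps unrelated traffic on a shared topic from reaching the ingestion logic.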
You can also run the ingestor file by file.
You need to know the path to the nexus file you want to ingest
and also the path to the done_writing_message_file as a json file.
scicat_background_ingestor \
--logging.verbose \
-c PATH_TO_CONFIGURATION_FILE.yaml \
--nexus-file PATH_TO_THE_NEXUS_FILE.nxs \
--done-writing-message-file PATH_TO_THE_MESSAGE_FILE.yml

You can add the --ingestion.dry-run flag for dry-run testing.

scicat_ingestor --logging.verbose -c PATH_TO_CONFIGURATION_FILE.yaml --ingestion.dry-run

scicat_background_ingestor \
--logging.verbose \
-c PATH_TO_CONFIGURATION_FILE.yaml \
--nexus-file PATH_TO_THE_NEXUS_FILE.nxs \
--done-writing-message-file PATH_TO_THE_MESSAGE_FILE.yml \
--ingestion.dry-run

You can use a YAML file to configure options.
There is a template, resources/config.sample.yml, that you can copy to make your own configuration file.
To change the available configuration options, update them in the scicat_configuration module.
The template file can be synchronized automatically with the scicat_synchronize_config command.
There is a unit test that checks if the online ingestor configuration dataclass is in sync with the resources/config.sample.yml.
You can validate a configuration file with the scicat_validate_ingestor_config command.

scicat_validate_ingestor_config

It tries to build the nested configuration dataclasses from the configuration file.
It will throw errors if the configuration is invalid.
Note that the validator is stricter than the ingestor itself: in operation, the ingestor ignores extra keywords that do not match the configuration dataclass arguments, whereas the validator throws an error for them.
This is part of CI tests.
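The difference between the lenient behaviour in operation and the strict behaviour of the validator can be sketched as below. This is only an illustration under stated assumptions: the dataclasses `LoggingOptions` and `IngestorConfig` and the helper `build_config` are hypothetical stand-ins, not the classes or functions defined in the scicat_configuration module.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any


@dataclass
class LoggingOptions:  # hypothetical nested section
    verbose: bool = False


@dataclass
class IngestorConfig:  # hypothetical top-level configuration
    dry_run: bool = False
    logging: LoggingOptions = field(default_factory=LoggingOptions)


def build_config(cls: type, data: dict[str, Any], *, strict: bool) -> Any:
    """Recursively build ``cls`` from ``data``.

    With ``strict=True`` unknown keys raise an error (validator-like);
    with ``strict=False`` they are silently ignored (operation-like).
    """
    known = {f.name: f for f in fields(cls)}
    extra = sorted(set(data) - set(known))
    if strict and extra:
        raise ValueError(f"Unknown keys for {cls.__name__}: {extra}")
    kwargs = {}
    for name, value in data.items():
        if name not in known:
            continue  # lenient mode: drop keys the dataclass does not define
        field_type = known[name].type
        if is_dataclass(field_type) and isinstance(value, dict):
            # Nested section: build the nested dataclass with the same policy.
            value = build_config(field_type, value, strict=strict)
        kwargs[name] = value
    return cls(**kwargs)


if __name__ == "__main__":
    raw = {"dry_run": True, "logging": {"verbose": True}, "typo_key": 1}
    print(build_config(IngestorConfig, raw, strict=False))
    try:
        build_config(IngestorConfig, raw, strict=True)
    except ValueError as err:
        print(err)
```

Running the strict variant in CI catches misspelled keys that the lenient variant would otherwise drop without notice.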