
v0.1

Pre-release
khituras released this 10 Jul 11:08

The JCoRe Pipeline Modules are a set of tools that facilitate creating and running NLP pipelines built from UIMA components. They focus on the JCoRe component repositories but can also incorporate other components. For this purpose, each component must be described by a JSON file that specifies key information about it, such as its name, its Maven coordinates and its UIMA descriptors. For examples of this JSON format, see e.g. jcore-base.
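To make the idea of a component description more concrete, here is a minimal Python sketch that loads such a JSON file and checks that it carries the information mentioned above. The key names (`name`, `maven-artifact`, `groupId`, `artifactId`, `version`, `descriptors`, `location`) and the example component are assumptions modeled loosely on jcore-base; the files in that repository are the authoritative reference for the real format.

```python
import json

# Hypothetical component description. The key names are assumptions,
# not a verified copy of the format used in jcore-base.
EXAMPLE = """
{
  "name": "Example Sentence Splitter",
  "maven-artifact": {
    "groupId": "de.julielab",
    "artifactId": "example-sentence-splitter",
    "version": "1.0.0"
  },
  "descriptors": [
    {"category": "ae", "location": "de.julielab.example.desc.example-ae"}
  ]
}
"""

REQUIRED_COORDINATES = ("groupId", "artifactId", "version")


def validate_component(meta: dict) -> list:
    """Return a list of problems found in a component description (empty if none)."""
    problems = []
    if not meta.get("name"):
        problems.append("missing component name")
    artifact = meta.get("maven-artifact", {})
    for key in REQUIRED_COORDINATES:
        if not artifact.get(key):
            problems.append(f"missing Maven coordinate: {key}")
    descriptors = meta.get("descriptors", [])
    if not descriptors:
        problems.append("no UIMA descriptors listed")
    for descriptor in descriptors:
        if not descriptor.get("location"):
            problems.append("descriptor without location")
    return problems


def maven_coordinates(meta: dict) -> str:
    """Format the (assumed) Maven coordinates as groupId:artifactId:version."""
    artifact = meta["maven-artifact"]
    return f"{artifact['groupId']}:{artifact['artifactId']}:{artifact['version']}"


if __name__ == "__main__":
    meta = json.loads(EXAMPLE)
    print(validate_component(meta))  # prints []
    print(maven_coordinates(meta))   # prints de.julielab:example-sentence-splitter:1.0.0
```

An empty description such as `{}` yields five problems: the missing name, the three missing Maven coordinates and the missing descriptor list.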

The starting point for creating a pipeline with these tools is the jcore-pipeline-builder-cli. A graphical UI for pipeline building also exists, but it is currently not fully functional, and it is unclear whether it ever will be.

Using the pipeline builder, components can be selected and configured interactively, and the resulting pipeline can be saved. Saving a pipeline creates a specific directory structure in the specified directory, containing the UIMA descriptors and all JARs required to run the pipeline. The pipeline builder can also load existing pipelines for further editing.
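Because a saved pipeline directory bundles the UIMA descriptors together with the JARs needed to run them, a quick sanity check is to list both kinds of files. The sketch below deliberately makes no assumption about the subdirectory names the builder actually uses: it walks the whole tree and treats `.xml` files as descriptor candidates and `.jar` files as libraries. The function name `inventory_pipeline` is hypothetical and not part of the JCoRe tools.

```python
from pathlib import Path


def inventory_pipeline(pipeline_dir: str) -> dict:
    """Collect descriptor candidates (*.xml) and libraries (*.jar) below pipeline_dir.

    Paths are returned relative to pipeline_dir and sorted for stable output.
    No particular subdirectory layout is assumed.
    """
    root = Path(pipeline_dir)
    if not root.is_dir():
        raise NotADirectoryError(pipeline_dir)
    descriptors = sorted(str(p.relative_to(root)) for p in root.rglob("*.xml"))
    jars = sorted(str(p.relative_to(root)) for p in root.rglob("*.jar"))
    return {"descriptors": descriptors, "jars": jars}
```

For example, `inventory_pipeline("my-pipeline")` (with `my-pipeline` standing in for wherever the builder saved the pipeline) returns the relative paths of all XML and JAR files found below that directory; an empty `jars` list would suggest the dependencies were not copied.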

The pipeline runner can then read such a directory structure. The runner requires an XML configuration file specifying the location of the pipeline and other parameters, such as the number of threads to run. To create such a configuration file, call the pipeline runner with a path to which a template configuration should be written. The template offers two ways to run pipelines: the CPE runner and the DUCC runner. Currently, only the CPE runner is usable. The DUCC runner will successfully submit a job to your DUCC cluster, but the job will not run successfully, and we do not yet know why. So stick to the CPE runner for the time being.
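Since the runner configuration is plain XML, it can be inspected with standard tools. The following Python sketch reads a configuration and rejects the DUCC runner, reflecting the advice above. All element and attribute names (`runnerConfiguration`, `pipelinePath`, `runner`, `name`, `threads`) are hypothetical placeholders; the template written by the pipeline runner is the authoritative reference for the real structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration. Element and attribute names are placeholders,
# NOT the actual format of the template written by the pipeline runner.
EXAMPLE_CONFIG = """
<runnerConfiguration>
  <pipelinePath>/path/to/saved/pipeline</pipelinePath>
  <runner name="cpe">
    <threads>4</threads>
  </runner>
</runnerConfiguration>
"""


def read_runner_config(xml_text: str) -> dict:
    """Extract pipeline path, runner name and thread count from the XML text."""
    root = ET.fromstring(xml_text.strip())
    path = root.findtext("pipelinePath", default="").strip()
    runner = root.find("runner")
    if runner is None:
        raise ValueError("no <runner> element found")
    name = runner.get("name", "").lower()
    # Default to a single thread if no thread count is given.
    threads = int(runner.findtext("threads", default="1"))
    if name == "ducc":
        # Per these release notes, DUCC jobs are accepted but do not run successfully.
        raise ValueError("the DUCC runner is not usable yet; use the CPE runner")
    return {"pipelinePath": path, "runner": name, "threads": threads}


if __name__ == "__main__":
    print(read_runner_config(EXAMPLE_CONFIG))
```

Failing early on a DUCC configuration is a design choice of this sketch: it surfaces the known limitation before a job is submitted rather than after it fails on the cluster.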