# JVM Runner Plugin for RDF Connect

The **JvmRunner** executes processors that run on the Java Virtual Machine (JVM).
It lets you integrate custom Java (or Kotlin, Scala, etc.) processors into an RDF Connect streaming pipeline. You only need to provide a JAR and a class name.

## Overview

- **Runner type**: `rdfc:JvmRunner`. It is defined by the runner's descriptor, which pipelines import with `owl:imports` (see below).
- **Processor definition**: Each processor is declared with `rdfc:javaImplementationOf rdfc:Processor`. It also names its JAR (`rdfc:jar`) and its fully qualified Java class (`rdfc:class`).
- **Implementation requirement**: All processors extend the abstract class `io.github.rdfc.Processor<T>` and implement its lifecycle methods.
- **Packaging requirement**: Processors must include a descriptor RDF file (e.g., `index.ttl`) inside their JAR.
- **Distribution option**: You can publish your processor JAR with [JitPack](https://jitpack.io) so that pipelines can include it as a dependency.

## Using Processors in a Pipeline

To use your JVM processor in a pipeline:

1. Import the JvmRunner:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#>.
<> owl:imports <https://javadoc.jitpack.io/com/github/rdf-connect/jvm-runner/runner/master-SNAPSHOT/runner-master-SNAPSHOT-index.jar>.
```

2. Link your processor to the runner:

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
<> a rdfc:Pipeline;
  rdfc:consistsOf [
    rdfc:instantiates rdfc:JvmRunner;
    rdfc:processor <myProcessor>;
  ].
<myProcessor> a rdfc:TestProcessor.
```

Here `rdfc:TestProcessor` is the processor class declared in the processor's description file (see the example below). The pipeline must also have access to that description file. In addition, `<myProcessor>` must supply the arguments that the SHACL shape requires.

## Implementing a New Processor

Processors must:

1. **Extend the abstract class** `io.github.rdfc.Processor<T>`, where `T` is an `Args` class containing the configuration fields.
2. **Implement the lifecycle methods**:
   - `init(Consumer<Void> callback)`: initialization.
   - `transform(Consumer<Void> callback)`: processes inputs from readers. The runner calls it on each processor before `produce`.
   - `produce(Consumer<Void> callback)`: produces data. This is useful for processors such as a file reader.

   Each method should call its `callback` once it has finished, to signal completion.
3. **Define an `Args` class** whose fields match the arguments declared in the processor's SHACL shape.

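A self-contained sketch can illustrate the lifecycle above. It does not use the real `io.github.rdfc` types. The `Processor<T>` class below, the `GreeterArgs`/`Greeter` processor and the `LifecycleDemo` driver are all hypothetical stand-ins. They copy only the method signatures listed above, to show the call order and how each method reports completion through its callback. A real processor extends `io.github.rdfc.Processor<T>` from the `types` dependency, and that class's constructor and fields may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical stand-in for io.github.rdfc.Processor<T>; the real class
// ships in the runner's `types` artifact and may differ.
abstract class Processor<T> {
    protected final T args;

    protected Processor(T args) {
        this.args = args;
    }

    public abstract void init(Consumer<Void> callback);
    public abstract void transform(Consumer<Void> callback);
    public abstract void produce(Consumer<Void> callback);
}

// Hypothetical Args class for this sketch.
class GreeterArgs {
    String additionalText;
}

// Example processor: records each lifecycle step, then signals completion.
class Greeter extends Processor<GreeterArgs> {
    final List<String> log = new ArrayList<>();

    Greeter(GreeterArgs args) {
        super(args);
    }

    @Override
    public void init(Consumer<Void> callback) {
        log.add("init");
        callback.accept(null); // initialization finished
    }

    @Override
    public void transform(Consumer<Void> callback) {
        // A real processor would consume its readers here.
        log.add("transform");
        callback.accept(null);
    }

    @Override
    public void produce(Consumer<Void> callback) {
        // Work may run asynchronously; the callback marks the actual end.
        CompletableFuture.runAsync(() -> log.add("produce: " + args.additionalText))
                .thenRun(() -> callback.accept(null));
    }
}

// Hypothetical driver: runs each step and waits for its callback.
class LifecycleDemo {
    static void awaitStep(Consumer<Consumer<Void>> step) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        step.accept(ignored -> done.complete(null));
        done.join(); // blocks until the step calls its callback
    }

    static List<String> run(Greeter p) {
        awaitStep(p::init);
        awaitStep(p::transform);
        awaitStep(p::produce);
        return p.log;
    }

    public static void main(String[] args) {
        GreeterArgs greeterArgs = new GreeterArgs();
        greeterArgs.additionalText = "hello";
        System.out.println(run(new Greeter(greeterArgs))); // [init, transform, produce: hello]
    }
}
```

This sketch's driver waits on each callback, so a method that never calls its callback would block the driver forever. `produce` shows that a method may also invoke the callback later, from another thread, once its asynchronous work completes.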
### Processor description file

Each processor must be accompanied by a description file, usually called `index.ttl`.

The file must contain the following fields:
* `rdfc:javaImplementationOf` with value `rdfc:Processor`, indicating that this processor is a Java processor;
* `rdfc:jar`, pointing to the resulting JAR. This is often `<>`, the JAR that contains the description file itself; the example below points to a local build output instead;
* `rdfc:class`, the fully qualified name of the processor class;
* a SHACL shape defining the required arguments.

For example, the following description file declares a processor with arguments `{ reader: Reader, writer: Writer, additionalText: string }`.
A matching implementation can be found on [GitHub](https://github.com/rdf-connect/template-processor-jvm/blob/main/src/main/java/org/example/Library.java).
```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix sh: <http://www.w3.org/ns/shacl#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

rdfc:TestProcessor rdfc:javaImplementationOf rdfc:Processor;
  rdfc:class "org.example.Library";
  rdfc:jar <file:./build/libs/my-processor-all.jar>.

[] a sh:NodeShape;
  sh:targetClass rdfc:TestProcessor;
  sh:property [
    sh:path rdfc:reader;
    sh:name "reader";
    sh:minCount 1;
    sh:maxCount 1;
    sh:class rdfc:Reader;
  ], [
    sh:path rdfc:writer;
    sh:name "writer";
    sh:minCount 1;
    sh:maxCount 1;
    sh:class rdfc:Writer;
  ], [
    sh:path rdfc:additionalText;
    sh:name "additionalText";
    sh:minCount 1;
    sh:maxCount 1;
    sh:datatype xsd:string;
  ].
```
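The sketch below shows how an `Args` class can line up with this shape. It declares one field per `sh:name` value and adds a small reflection check that the names match. `Reader`, `Writer`, `LibraryArgs` and `ArgsCheck` are hypothetical stand-ins. The real reader and writer types come from the runner's `types` dependency. The check is meant as a development-time aid; the README does not say the JvmRunner performs it.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical placeholders for the runner's reader/writer types.
interface Reader {}
interface Writer {}

// Args class for the shape above: one field per sh:name.
class LibraryArgs {
    Reader reader;          // sh:name "reader", sh:class rdfc:Reader
    Writer writer;          // sh:name "writer", sh:class rdfc:Writer
    String additionalText;  // sh:name "additionalText", sh:datatype xsd:string
}

class ArgsCheck {
    // Names of the non-static fields declared directly on the Args class.
    static Set<String> fieldNames(Class<?> argsClass) {
        return Arrays.stream(argsClass.getDeclaredFields())
                .filter(f -> !Modifier.isStatic(f.getModifiers()))
                .map(Field::getName)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // sh:name values with no matching field; an empty set means aligned.
    static Set<String> missingFields(Class<?> argsClass, Set<String> shNames) {
        Set<String> missing = new TreeSet<>(shNames);
        missing.removeAll(fieldNames(argsClass));
        return missing;
    }

    public static void main(String[] args) {
        Set<String> shNames = Set.of("reader", "writer", "additionalText");
        System.out.println("missing: " + missingFields(LibraryArgs.class, shNames)); // missing: []
    }
}
```

Keeping the `sh:name` values in one place, for example as constants shared with such a check, makes it harder for the shape and the `Args` class to drift apart.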

### Build Instructions

To build a JVM processor for use with the JvmRunner:

#### Dependencies

Your `build.gradle` should include:

```gradle
plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '8.1.1'
    id 'maven-publish'
}

repositories {
    mavenCentral()
    maven { url = 'https://jitpack.io' }
}

dependencies {
    implementation 'com.google.protobuf:protobuf-java:4.28.2'
    // the runner's types, including the abstract io.github.rdfc.Processor<T>
    implementation 'com.github.rdf-connect.jvm-runner:types:master-SNAPSHOT'
}
```

#### Fat JAR Packaging

Use the Shadow plugin to produce a fat JAR that contains your processor, its dependencies and its descriptor:

```gradle
tasks.named("shadowJar", Jar) {
    // add the processor descriptor (e.g., index.ttl) to the root of the JAR
    from("index.ttl") {
        into("")
    }
}
```

Build the fat JAR with `gradle shadowJar`, or with `./gradlew shadowJar` if you use the Gradle wrapper. By default, Shadow writes it to `build/libs/` with an `-all` suffix. This matches the `rdfc:jar` path in the description file example above.
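You can check that the descriptor actually ended up at the root of the fat JAR with the standard `java.util.jar` API. This is a sketch under assumptions you should adapt to your build: the class name `DescriptorCheck` is hypothetical, the entry name is `index.ttl`, and the default path in `main` is copied from the description file example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarFile;

class DescriptorCheck {
    // True if the JAR contains `entryName` at its root (no directory prefix).
    static boolean hasRootEntry(Path jar, String entryName) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the Shadow output path used in the descriptor example.
        Path jar = Path.of(args.length > 0 ? args[0] : "build/libs/my-processor-all.jar");
        if (!Files.isRegularFile(jar)) {
            System.err.println("JAR not found: " + jar);
            System.exit(1);
        }
        System.out.println(hasRootEntry(jar, "index.ttl")
                ? "index.ttl found at the JAR root"
                : "index.ttl missing: check the shadowJar configuration");
    }
}
```

On Java 11 or later, you can run it directly from source with the JAR path as its argument, e.g. `java DescriptorCheck.java build/libs/my-processor-all.jar`.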

#### Publishing with JitPack

To make your processor available as a dependency from GitHub via [JitPack](https://jitpack.io), add the following to your `build.gradle`:

```gradle
publishing {
    publications {
        maven(MavenPublication) {
            // publish the fat JAR produced by shadowJar
            artifact(tasks.shadowJar)
        }
    }
}
```

Then:

1. Push your code to GitHub.
2. Users can include your processor as a dependency:

```gradle
repositories {
    mavenCentral()
    maven { url = 'https://jitpack.io' }
}

dependencies {
    // version: master-SNAPSHOT, a commit hash, or a release tag
    implementation 'com.github.<your-github-user>:<your-repo>:master-SNAPSHOT'
}
```

## Notes

* `Args` class fields must align with the arguments declared in the SHACL shape.
* The descriptor file (e.g., `index.ttl`) must be packaged in the JAR.
* Fat JAR packaging bundles the processor's dependencies into one JAR, so they are available at runtime without extra classpath setup.
* Publishing with JitPack lets others use your processor directly from GitHub.