Commit 94f8411

committed
update README
1 parent f024d2b commit 94f8411

File tree

1 file changed

+162
-10
lines changed

README.md

Lines changed: 162 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,22 +1,174 @@
1-
# RDF Connect: JVM runner
1+
# JVM Runner Plugin for RDF Connect
22

3+
The **JvmRunner** executes processors implemented for the Java Virtual Machine (JVM).
4+
It allows you to integrate custom Java (or Kotlin, Scala, etc.) processors into an RDF Connect streaming pipeline by providing a JAR and a class name.
35

4-
## Build
5-
```sh
6-
gradle build
6+
## Overview
7+
8+
- **Runner type**: `rdfc:JvmRunner` (imported automatically).
9+
- **Processor definition**: Each processor is declared with `rdfc:javaImplementationOf`, together with its JAR (`rdfc:jar`) and fully qualified class name (`rdfc:class`).
10+
- **Implementation requirement**: All processors extend the abstract class `io.github.rdfc.Processor<T>` and provide lifecycle methods.
11+
- **Packaging requirement**: Processors must include a descriptor RDF file (e.g., `index.ttl`) inside their JAR.
12+
- **Distribution option**: You can publish your processor JAR with [JitPack](https://jitpack.io) so it can be included as a dependency in pipelines.
13+
14+
15+
## Using Processors in a Pipeline
16+
17+
To use your JVM processor in a pipeline:
18+
19+
1. Import the JvmRunner:
20+
21+
```turtle
22+
<> owl:imports <https://javadoc.jitpack.io/com/github/rdf-connect/jvm-runner/runner/master-SNAPSHOT/runner-master-SNAPSHOT-index.jar>.
23+
```
24+
25+
2. Link your processor to the runner:
26+
27+
```turtle
28+
@prefix rdfc: <https://w3id.org/rdf-connect#>.
29+
<> a rdfc:Pipeline;
30+
rdfc:consistsOf [
31+
rdfc:instantiates rdfc:JvmRunner;
32+
rdfc:processor <myProcessor>;
33+
].
34+
<myProcessor> a rdfc:TestProcessor.
35+
```
36+
37+
38+
## Implementing a new processor
39+
40+
Processors must:
41+
42+
1. **Extend the abstract class** `io.github.rdfc.Processor<T>` where `T` is an `Args` class containing configuration fields.
43+
2. **Implement lifecycle methods**:
44+
- `init(Consumer<Void> callback)`: initialization.
45+
- `transform(Consumer<Void> callback)`: processing of inputs from readers; called for each processor before `produce`.
46+
- `produce(Consumer<Void> callback)`: producing data; useful for processors such as a file reader.
47+
Each method must invoke its callback to signal that it has finished.
48+
3. **Define an `Args` class** with fields matching RDF properties defined in the SHACL shape.
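
The requirements above can be sketched in Java as follows. This is a minimal illustration, not the runner's actual API: the `Processor<T>` class below is a hypothetical stand-in for `io.github.rdfc.Processor<T>` (the real class ships in the runner's types jar and its constructor and members may differ), and `RecordingProcessor` and `LifecycleSketch` are made-up names. It only shows the three lifecycle methods, each invoking its callback when done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for io.github.rdfc.Processor<T>.
// The real abstract class may have a different constructor and extra members.
abstract class Processor<T> {
    protected final T arguments;

    protected Processor(T arguments) {
        this.arguments = arguments;
    }

    public abstract void init(Consumer<Void> callback);
    public abstract void transform(Consumer<Void> callback);
    public abstract void produce(Consumer<Void> callback);
}

// Illustrative processor that records which lifecycle methods ran, in order.
class RecordingProcessor extends Processor<RecordingProcessor.Args> {
    // Configuration fields; in a real processor these mirror the SHACL shape.
    static class Args {
        String additionalText;
    }

    final List<String> calls = new ArrayList<>();

    RecordingProcessor(Args args) {
        super(args);
    }

    @Override
    public void init(Consumer<Void> callback) {
        calls.add("init");
        callback.accept(null); // signal that initialization has finished
    }

    @Override
    public void transform(Consumer<Void> callback) {
        calls.add("transform");
        callback.accept(null); // signal that input handling is set up
    }

    @Override
    public void produce(Consumer<Void> callback) {
        calls.add("produce");
        callback.accept(null); // signal that producing has finished
    }
}

class LifecycleSketch {
    // Drives the lifecycle in the documented order: init, transform, produce.
    static List<String> runLifecycle() {
        RecordingProcessor.Args args = new RecordingProcessor.Args();
        args.additionalText = "hello";
        RecordingProcessor processor = new RecordingProcessor(args);

        processor.init(ignored -> { });
        processor.transform(ignored -> { });
        processor.produce(ignored -> { });
        return processor.calls;
    }

    public static void main(String[] args) {
        System.out.println(runLifecycle());
    }
}
```

In a real pipeline the runner, not your own code, calls these methods; the driver in `LifecycleSketch` exists only to make the sketch runnable on its own.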
49+
50+
### Processor description file
51+
52+
The processor should be accompanied by a description file, often called `index.ttl`.
53+
54+
The description file requires the following fields:
55+
* `rdfc:javaImplementationOf` with value `rdfc:Processor`, indicating that this processor is a JavaProcessor.
56+
* `rdfc:jar` pointing to the built JAR, often `<>`, which refers to the current JAR.
57+
* `rdfc:class` giving the fully qualified class name of the processor.
58+
* A SHACL shape defining the required arguments.
59+
60+
For example, the following description file declares a processor with arguments `{ reader: Reader, writer: Writer, additionalText: string }`.
61+
A matching implementation can be found on [GitHub](https://github.com/rdf-connect/template-processor-jvm/blob/main/src/main/java/org/example/Library.java).
62+
```turtle
63+
@prefix rdfc: <https://w3id.org/rdf-connect#>.
64+
@prefix sh: <http://www.w3.org/ns/shacl#>.
65+
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
66+
67+
rdfc:TestProcessor rdfc:javaImplementationOf rdfc:Processor;
68+
rdfc:class "org.example.Library";
69+
rdfc:jar <file:./build/libs/my-processor-all.jar>.
70+
71+
[] a sh:NodeShape;
72+
sh:targetClass rdfc:TestProcessor;
73+
sh:property [
74+
sh:path rdfc:reader;
75+
sh:name "reader";
76+
sh:minCount 1;
77+
sh:maxCount 1;
78+
sh:class rdfc:Reader;
79+
], [
80+
sh:path rdfc:writer;
81+
sh:name "writer";
82+
sh:minCount 1;
83+
sh:maxCount 1;
84+
sh:class rdfc:Writer;
85+
], [
86+
sh:path rdfc:additionalText;
87+
sh:name "additionalText";
88+
sh:minCount 1;
89+
sh:maxCount 1;
90+
sh:datatype xsd:string;
91+
].
792
```
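
The shape above suggests an `Args` class with one field per `sh:name`. The sketch below assumes that fields are matched by name, which the README implies but does not spell out. The `Reader` and `Writer` interfaces are hypothetical stand-ins with invented methods (`readAll`, `write`), not the runner's real channel types, and `ArgsSketch` is a made-up driver that only shows how the three fields would be used together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the runner's Reader and Writer types.
// The real types come from the runner's types jar and have a different API.
interface Reader {
    List<String> readAll();
}

interface Writer {
    void write(String message);
}

// One field per property in the SHACL shape, named after its sh:name.
class Args {
    Reader reader;          // sh:path rdfc:reader, sh:class rdfc:Reader
    Writer writer;          // sh:path rdfc:writer, sh:class rdfc:Writer
    String additionalText;  // sh:path rdfc:additionalText, sh:datatype xsd:string
}

class ArgsSketch {
    // Appends additionalText to every message from the reader and writes it out.
    static List<String> demo() {
        List<String> sink = new ArrayList<>();

        Args args = new Args();
        args.reader = () -> List.of("a", "b"); // fake input channel
        args.writer = sink::add;               // fake output channel
        args.additionalText = "!";

        for (String message : args.reader.readAll()) {
            args.writer.write(message + args.additionalText);
        }
        return sink;
    }

    public static void main(String[] unused) {
        System.out.println(demo());
    }
}
```

The linked template implementation is the authoritative reference for how the real `Reader` and `Writer` are used.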
893

94+
### Build Instructions
995

10-
## Use in a pipeline
96+
To build a JVM processor for use with the JvmRunner:
1197

12-
The file `index.ttl` suggests how to specify the JvmRunner.
13-
We currently face a problem of how to point correctly to the jar, here it is hard coded.
98+
#### Dependencies
1499

15-
For the test pipeline there is a test processor in `test-processor`.
16-
In their `build.gradle`, it points to the types jar from the runner, all Processors require this jar to implement against the expected `Processor<?>` abstract class.
100+
Your `build.gradle` should include:
17101

18102
```gradle
19-
implementation files('../../types/lib/build/libs/types.jar')
103+
plugins {
104+
id 'java'
105+
id 'com.github.johnrengelman.shadow' version '8.1.1'
106+
id 'maven-publish'
107+
}
108+
109+
repositories {
110+
mavenCentral()
111+
maven { url = 'https://jitpack.io' }
112+
}
113+
114+
dependencies {
115+
implementation 'com.google.protobuf:protobuf-java:4.28.2'
116+
implementation 'com.github.rdf-connect.jvm-runner:types:master-SNAPSHOT'
117+
}
118+
```
119+
120+
#### Fat JAR Packaging
121+
122+
Use the Shadow plugin to produce a fat JAR that includes your processor and its descriptor:
123+
```gradle
124+
tasks.named("shadowJar", Jar) {
125+
// add your processor descriptor (e.g., index.ttl) to the root of the jar
126+
from("index.ttl") {
127+
into("")
128+
}
129+
}
130+
```
131+
132+
The fat JAR is built with `gradle shadowJar`.
133+
134+
135+
#### Publishing with JitPack
136+
137+
To make your processor available as a dependency from GitHub via [JitPack](https://jitpack.io), add the following to your `build.gradle`.
138+
139+
```gradle
140+
publishing {
141+
publications {
142+
maven(MavenPublication) {
143+
// publish the fat JAR
144+
artifact(tasks.shadowJar)
145+
}
146+
}
147+
}
148+
```
149+
150+
Then:
151+
152+
1. Push your code to GitHub
153+
2. Users can then include your processor as a dependency like this:
154+
155+
```gradle
156+
repositories {
157+
mavenCentral()
158+
maven { url = 'https://jitpack.io' }
159+
}
160+
161+
dependencies {
162+
implementation 'com.github.<your-github-user>:<your-repo>:master-SNAPSHOT' // or a git hash or release
163+
}
20164
```
21165

22166

167+
168+
## Notes
169+
170+
* Args class fields must align with RDF properties defined in the SHACL shape.
171+
* Descriptor file (e.g., index.ttl) must be packaged in the JAR.
172+
* Fat JAR packaging bundles the processor's dependencies into the JAR, so they do not have to be resolved separately at runtime.
173+
* Publishing with JitPack allows others to use your processor directly via GitHub.
174+
