Commit f5e1dcd

docs: Update documentation in README
1 parent 941fc5b commit f5e1dcd

1 file changed: +142 -16 lines changed

README.md

Lines changed: 142 additions & 16 deletions
@@ -1,33 +1,159 @@
 # file-utils-processors-ts

-[![Bun CI](https://github.com/rdf-connect/file-utils-processors-ts/actions/workflows/build-test.yml/badge.svg)](https://github.com/rdf-connect/file-utils-processors-ts/actions/workflows/build-test.yml) [![npm](https://img.shields.io/npm/v/@rdfc/file-utils-processors-ts.svg?style=popout)](https://npmjs.com/package/@rdfc/file-utils-processors-ts)
+[![Build and tests with Node.js](https://github.com/rdf-connect/file-utils-processors-ts/actions/workflows/build-test.yml/badge.svg)](https://github.com/rdf-connect/file-utils-processors-ts/actions/workflows/build-test.yml)

-[RDF-Connect](https://rdf-connect.github.io/rdfc.github.io/) Typescript processors for handling file operations. It currently exposes 6 functions:
+This repository provides a set of processors for reading, transforming, and extracting files in RDF-Connect pipelines.
+It includes utilities for reading files from folders or glob patterns, substituting strings or environment variables, reading files on demand, and handling compressed files (zip/gzip).

-### [`js:GlobRead`](https://github.com/rdf-connect/file-utils-processors-ts/blob/main/processors.ttl#L10)
+These processors are designed to integrate seamlessly into RDF-Connect pipelines using the [rdfc:NodeRunner](https://github.com/rdf-connect/js-runner).

-This function relies on the [`glob`](https://www.npmjs.com/package/glob) library to select a set of files according to a shell expression and stream them out in a sequential fashion. A `wait` parameter can be defined to wait x milliseconds between file streaming operations.
+---

-### [`js:FolderRead`](https://github.com/rdf-connect/file-utils-processors-ts/blob/main/processors.ttl#L70)
+## Usage

-This function reads all the files present in a given folder and streams out their content in a sequential fashion. A `maxMemory` parameter can be given (in GB) to defined threshold of maximum used memory by the streaming process. When the threshold is exceeded, the streaming process will pause for as many milliseconds as defined by the `pause` parameter.
+To use these processors, import the package into your RDF-Connect pipeline configuration and reference the required processors.

-### [`js:Substitute`](https://github.com/rdf-connect/file-utils-processors-ts/blob/main/processors.ttl#L121)
+### Installation

-This function transform a stream by applying a given string substitution on each of the messages. The matching string can be a regex defined by the `source` property and setting the `regexp` property to `true`.
+```bash
+npm install
+npm run build
+```

-### [`js:Envsub`](https://github.com/rdf-connect/file-utils-processors-ts/blob/main/processors.ttl#L185)
+Or install from NPM:

-This function substitute all the defined environment variables on each of the elements of an input stream that have been labeled with a `${VAR_NAME}` pattern.
+```bash
+npm install @rdfc/file-utils-processors-ts
+```

-### [`js:ReadFile`](https://github.com/rdf-connect/file-utils-processors-ts/blob/main/processors.ttl#L220)
+Next, you can add the processors to your pipeline configuration as follows:

-This function can read on demand and push downstream the contents of a file located in a predefined folder. This processor is used mostly for testing and demonstrating pipeline implementations.
+```turtle
+@prefix rdfc: <https://w3id.org/rdf-connect#>.
+@prefix owl: <http://www.w3.org/2002/07/owl#>.

-### [`js:UnzipFile`](https://github.com/rdf-connect/file-utils-processors-ts/blob/main/processors.ttl#L265)
+### Import the processor definitions
+<> owl:imports <./node_modules/@rdfc/file-utils-processors-ts/processors.ttl>.

-This function can receive a zipped file in the form of a Buffer and stream out its decompressed contents.
+### Define the channels your processor needs
+<in> a rdfc:Reader, rdfc:Writer.
+<out> a rdfc:Reader, rdfc:Writer.

-### [`js:GunzipFile`](https://github.com/rdf-connect/file-utils-processors-ts/blob/main/processors.ttl#L310)
+### Attach the processor to the pipeline under the NodeRunner
+# Add the `rdfc:processor <folderReader>` statement under the `rdfc:consistsOf` statement of the `rdfc:NodeRunner`

-This function can receive a gzipped file in the form of a Buffer and stream out its decompressed contents.
+### Define and configure the processors
+<folderReader> a rdfc:FolderRead;
+    rdfc:folder_location "./data";
+    rdfc:file_stream <out>.
+```
+
+---
+
+## Processors and Configuration
+
+### 📂 `rdfc:GlobRead` – Glob-based File Reader
+Reads all files matching a given glob pattern.
+
+**Parameters:**
+- `rdfc:glob` (`string`, required): Glob pattern to select files.
+- `rdfc:output` (`rdfc:Writer`, required): Output channel to stream file contents.
+- `rdfc:wait` (`integer`, optional): Delay (ms) before reading files.
+- `rdfc:closeOnEnd` (`boolean`, optional): Whether to close the stream after finishing.
+- `rdfc:binary` (`boolean`, optional): If true, streams binary data instead of text.
+
+---
+
+### 📁 `rdfc:FolderRead` – Folder File Reader
+Reads all files inside a folder.
+
+**Parameters:**
+- `rdfc:folder_location` (`string`, required): Path to the folder.
+- `rdfc:file_stream` (`rdfc:Writer`, required): Output channel to stream file contents.
+- `rdfc:max_memory` (`double`, optional): Max memory usage allowed (in MB).
+- `rdfc:pause` (`integer`, optional): Pause duration (ms) between file reads.
+
+---
+
+### 🔄 `rdfc:Substitute` – String Substitution Processor
+Performs string substitution (supports regex) on messages in the stream.
+
+**Parameters:**
+- `rdfc:input` (`rdfc:Reader`, required): Input channel.
+- `rdfc:output` (`rdfc:Writer`, required): Output channel.
+- `rdfc:source` (`string`, required): Source string or regex to match.
+- `rdfc:replace` (`string`, required): Replacement string.
+- `rdfc:regexp` (`boolean`, optional): If true, treat `source` as a regex.
+
+---
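
As a rough illustration of how the `rdfc:source`, `rdfc:replace`, and `rdfc:regexp` parameters relate, the sketch below applies a literal or regex substitution to a single message. The `substitute` helper is hypothetical and is not taken from the package. That every occurrence is replaced, rather than only the first, is an assumption.

```typescript
// Hypothetical helper for illustration only; not the package's implementation.
function substitute(
  message: string,
  source: string,
  replace: string,
  regexp = false,
): string {
  if (regexp) {
    // Treat `source` as a regular expression; the "g" flag replaces every match.
    return message.replace(new RegExp(source, "g"), replace);
  }
  // Literal mode: split/join replaces every occurrence, so regex
  // metacharacters in `source` carry no special meaning.
  return message.split(source).join(replace);
}

console.log(substitute("Hello, World!", "World", "RDF-Connect")); // Hello, RDF-Connect!
console.log(substitute("a1b22c", "[0-9]+", "#", true)); // a#b#c
```

Keeping the literal path free of `RegExp` avoids having to escape characters such as `.` or `*` in the source string.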
+
+### 🌍 `rdfc:Envsub` – Environment Variable Substitution
+Substitutes environment variables in the stream with their values.
+
+**Parameters:**
+- `rdfc:input` (`rdfc:Reader`, required): Input channel.
+- `rdfc:output` (`rdfc:Writer`, required): Output channel.
+
+---
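
The previous README text described placeholders of the form `${VAR_NAME}`. The sketch below shows one way such a substitution could work on a single message. The `envsub` helper and its regular expression are hypothetical. Leaving undefined variables untouched is an assumption, not documented behaviour.

```typescript
// Hypothetical helper for illustration only; not the package's implementation.
function envsub(
  message: string,
  env: Record<string, string | undefined>,
): string {
  // Match ${NAME} where NAME looks like a typical environment variable name.
  return message.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    // Assumption: a variable that is not defined is left as-is in the output.
    (match: string, name: string) => env[name] ?? match,
  );
}

const example = envsub("http://${HOST}:${PORT}/data", {
  HOST: "example.org",
  PORT: "8080",
});
console.log(example); // http://example.org:8080/data
```

In a Node.js pipeline the second argument would typically be `process.env`. It is passed explicitly here so the function stays easy to test.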
+
+### 📄 `rdfc:ReadFile` – On-Demand File Reader
+Reads a requested file from a given folder.
+
+**Parameters:**
+- `rdfc:input` (`rdfc:Reader`, required): Input channel (file requests).
+- `rdfc:folderPath` (`string`, required): Path to the folder containing files.
+- `rdfc:output` (`rdfc:Writer`, required): Output channel for file contents.
+
+---
+
+### 📦 `rdfc:UnzipFile` – Zip File Extractor
+Unzips a compressed file and streams its content.
+
+**Parameters:**
+- `rdfc:input` (`rdfc:Reader`, required): Input channel (zip file).
+- `rdfc:output` (`rdfc:Writer`, required): Output channel (extracted contents).
+- `rdfc:outputAsBuffer` (`boolean`, optional): If true, outputs raw buffers instead of strings.
+
+---
+
+### 🗜️ `rdfc:GunzipFile` – Gzip File Extractor
+Gunzips a gzip-compressed file and streams out its content.
+
+**Parameters:**
+- `rdfc:input` (`rdfc:Reader`, required): Input channel (gzip file).
+- `rdfc:output` (`rdfc:Writer`, required): Output channel (extracted contents).
+- `rdfc:outputAsBuffer` (`boolean`, optional): If true, outputs raw buffers instead of strings.
+
+---
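
To make the effect of `rdfc:outputAsBuffer` concrete, the sketch below decompresses one gzip payload with Node's built-in `node:zlib` module and returns either the raw buffer or a string. The `gunzipMessage` helper is hypothetical, and decoding strings as UTF-8 is an assumption. The package's own implementation may differ.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Hypothetical helper for illustration only; not the package's implementation.
function gunzipMessage(
  data: Uint8Array,
  outputAsBuffer = false,
): Buffer | string {
  // Decompress the whole payload in one call.
  const raw = gunzipSync(data);
  // Assumption: string output is decoded as UTF-8.
  return outputAsBuffer ? raw : raw.toString("utf8");
}

// Round trip: compress a small string, then decompress it both ways.
const compressed = gzipSync("hello rdf-connect");
console.log(gunzipMessage(compressed)); // hello rdf-connect
console.log(Buffer.isBuffer(gunzipMessage(compressed, true))); // true
```

The synchronous zlib calls keep the sketch short. A processor handling large files would more likely use the streaming or promise-based variants.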
+
+## Example Pipelines
+
+### Example 1: Reading all `.txt` files in a folder and logging them
+```turtle
+<reader> a rdfc:GlobRead;
+    rdfc:glob "./data/*.txt";
+    rdfc:output <out>.
+
+<logger> a rdfc:LogProcessorJs;
+    rdfc:reader <out>;
+    rdfc:level "info";
+    rdfc:label "glob-reader".
+```
+
+### Example 2: Substituting strings in a stream
+```turtle
+<substitute> a rdfc:Substitute;
+    rdfc:input <in>;
+    rdfc:output <out>;
+    rdfc:source "World";
+    rdfc:replace "RDF-Connect";
+    rdfc:regexp false.
+```
+
+### Example 3: Reading and unzipping a file
+```turtle
+<unzipper> a rdfc:UnzipFile;
+    rdfc:input <in>;
+    rdfc:output <out>;
+    rdfc:outputAsBuffer true.
+```
