Clarify the RDF Message Stream definition #18
base: main
Changes from 1 commit
6c01530
760012e
3a1efb3
88853d4
4d1fd54
edaeccf
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -66,33 +66,50 @@ Note: This specification does not provide any mechanism for referring to an RDF | |
|
|
||
| ## RDF Message Streams ## {#rdf-message-streams} | ||
|
|
||
| An <dfn>RDF Message Stream</dfn> is an ordered, potentially unbounded sequence of [=RDF Messages=]. An [=RDF Message Stream=] carries [=RDF Messages=] from one specific producer to one specific consumer. | ||
| An <dfn>RDF Message Stream</dfn> is an ordered, potentially unbounded sequence of [=RDF Messages=]. | ||
|
|
||
| Note: This concept is different from an RDF quad stream that carries individual quads. | ||
| Note: This concept is different from an RDF quad stream, which is a stream of individual quads. | ||
|
|
||
| A <dfn>stream producer</dfn> makes available an [=RDF Message Stream=] using a stream protocol. | ||
| Note: This definition is intentionally abstract and simple. More details about implementing RDF Message Streams are provided in [[#producers-consumers]]. | ||
|
|
||
| A <dfn>stream consumer</dfn> consumes the [=RDF Messages=] in the [=RDF Message Stream=] using a stream protocol. | ||
| ## Scope of RDF Messages ## {#scope} | ||
|
|
||
| Issue: Add a diagram illustrating RDF Messages, an RDF Message Stream, stream producers, and stream consumers. | ||
| By default, we assume that [=RDF Messages=] in an [=RDF Message Stream=] are not in the same "world". In other words, what is asserted in one message, is not asserted in other messages. | ||
|
||
|
|
||
| Note: The underlying stream protocol is out of scope of this specification. It can be for example [[!WebSockets]], [[!LDN]], [[!EventSource]], [Linked Data Event Streams](https://w3id.org/ldes/specification), [Jelly gRPC](https://w3id.org/jelly/), [MQTT](https://mqtt.org/), or a programming language-specific stream interface that carries RDF Datasets, or a collection or stream of RDF Quads. | ||
| For example, if each message describes the state of a domestic cat at a certain point in time, one message may report that the cat is running, while another message reports that the cat is sleeping. This is not a contradiction, as the messages are by default separate "worlds" that should be interpreted independently. | ||
Ostrzyciel marked this conversation as resolved.
|
||
|
|
||
| Stream protocols used for [=RDF Message Streams=] may support any streaming semantics. For example: | ||
| [=RDF Message Stream Profiles=] can be used to indicate that messages should be interpreted in a broader scope. For example, a profile may indicate that all messages in a stream should be interpreted together. In this case, it could be concluded that the cat is running and sleeping at the same time, which is a contradiction. | ||
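The difference between the default scope and a broader, profile-defined scope can be sketched in a few lines of Python. This is only an illustration under stated assumptions: messages are modelled as sets of plain string triples, the `ex:` names are made up, and "interpreting together" is approximated by a set union. None of this is defined by the specification.

```python
# Each RDF Message is modelled as a frozenset of (subject, predicate, object)
# triples. Plain strings stand in for RDF terms (an assumption of this sketch).
msg1 = frozenset({("ex:cat", "ex:state", "ex:Running")})
msg2 = frozenset({("ex:cat", "ex:state", "ex:Sleeping")})
stream = [msg1, msg2]


def states(triples):
    """Return the set of states asserted for ex:cat in a set of triples."""
    return {o for s, p, o in triples if s == "ex:cat" and p == "ex:state"}


# Default scope: every message is its own "world" and is interpreted alone.
per_message = [states(m) for m in stream]

# Hypothetical profile: all messages in the stream are interpreted together.
merged = states(frozenset().union(*stream))

print(per_message)  # one state per message, no contradiction
print(merged)       # both states at once, which is contradictory
```

With the default scope, each message yields exactly one state for the cat. Under the merged interpretation, both states are asserted in the same scope.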
|
|
||
| - Delivery guarantees: at most once, at least once, exactly once. | ||
| - Ordering guarantees: ordered, unordered, partially ordered. While we assume that an [=RDF Message Stream=] is ordered, the order does not have to be the same for the producer and the consumer. | ||
| - Flow control: push-based, pull-based, or hybrid. | ||
| ## RDF Message Stream Producers and Consumers ## {#producers-consumers} | ||
|
|
||
| Issue: Find out and document the similarities/differences to the [RDF-JS Stream interface](https://rdf.js.org/stream-spec/) | ||
| An <dfn>RDF Message Stream Producer</dfn> can make an [=RDF Message Stream=] available to be consumed by an <dfn>RDF Message Stream Consumer</dfn> using a stream protocol. | ||
|
|
||
| ## Scope of RDF Messages ## {#scope} | ||
| The underlying stream protocol is out of scope of this specification. It can be, for example, [[!WebSockets]], [[!LDN]], [[!EventSource]], [Linked Data Event Streams](https://w3id.org/ldes/specification), [Jelly gRPC](https://w3id.org/jelly/), [MQTT](https://mqtt.org/), or a programming language-specific stream interface that carries RDF Datasets, or a collection or stream of RDF Quads. | ||
Ostrzyciel marked this conversation as resolved.
Outdated
|
||
|
|
||
| By default, we assume that [=RDF Messages=] in an [=RDF Message Stream=] are not in the same "world". In other words, what is asserted in one message, is not asserted in other messages. | ||
| Stream protocols used for [=RDF Message Streams=] may support any streaming semantics, such as delivery guarantees, ordering, and flow control (pull-based, push-based, etc.). | ||
|
|
||
| For example, if each message describes the state of a domestic cat at a certain point in time, one message may report that the cat is running, while another message that the cat is sleeping. This is not a contradiction, as the messages are by default separate "worlds" that should be interpreted independently. | ||
| Note: An RDF Message Stream can be created ad-hoc, and describes only one specific "instance" of a stream. This allows streaming protocols to have freedom in how they manage ordering, stream lifecycle, delivery guarantees, flow control, and other streaming semantics. See the examples below for more details. | ||
|
|
||
| [=RDF Message Stream Profiles=] can be used to indicate that messages should be interpreted in a broader scope. For example, a profile may indicate that all messages in a stream should be interpreted together. In this case, it could be concluded that the cat is running and sleeping at the same time, which is a contradiction. | ||
| <div class="example"> | ||
| An HTTP server exposes a file at `https://example.org/stream`. This file contains an [=RDF Message Log=] serialization of an [=RDF Message Stream=]. A client can consume the stream by sending an HTTP GET request to that URL, and parsing the response as an [=RDF Message Stream=]. | ||
|
|
||
| In this example, the server is the **stream producer**, and the client is the **stream consumer**. The stream protocol is HTTP. The RDF Message Stream only exists over the course of the HTTP request. | ||
|
If I understood you correctly, storing the HTTP GET result to disk and reading it back from the log results in a different stream instance, even though the two are equivalent, right?
Collaborator
Author
Yes, that is the idea. Should I add this clarification to the spec?
I think it's not immediately clear what
means in that context. Here is one attempt at explaining this differently:
I am not attached to any of these words. Maybe it can help clarify some things. What is actually the purpose of defining the producer/consumer in this context? Do we use that somewhere else? If we just say an RDF Message Stream is just a way of interpreting a sequence of "primitive messages" we could avoid the explanations about producers and consumers. |
||
| </div> | ||
|
|
||
| <div class="example"> | ||
| An MQTT broker ([[mqtt-5]]) hosts a topic `iot/temperature` to which RDF Messages are published by an IoT thermometer. Multiple clients, at different points in time, subscribe to that topic to consume the RDF Messages being published. Because of the Quality-of-Service settings used (QoS 0), some messages may be lost, resulting in different clients seeing different subsets of the messages published on the topic. | ||
|
|
||
| In this example: | ||
|
|
||
| - The IoT thermometer is a **stream producer** that produces an [=RDF Message Stream=] by publishing RDF Messages to the MQTT broker, which acts as the **stream consumer**. | ||
| - For each individual client, the MQTT broker is a **stream producer** that produces an [=RDF Message Stream=] by sending RDF Messages to the client, which acts as the **stream consumer**. | ||
|
|
||
| For example, for 5 clients subscribing to the topic, there would be 6 different [=RDF Message Streams=]: one from the IoT thermometer to the MQTT broker, and one from the MQTT broker to each of the 5 clients. Each of these streams may have different messages in them, due to the Quality-of-Service settings and the clients subscribing to the stream at different points in time. | ||
|
||
| </div> | ||
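The counting argument in the MQTT example can be checked with a small simulation. This is a minimal sketch, not an MQTT client: `deliver` and `simulate` are hypothetical helpers, message loss is modelled as a random drop with an assumed `loss_rate`, and late subscription is modelled as a random start offset.

```python
import random


def deliver(messages, loss_rate, rng):
    """Simulate an at-most-once hop: messages may be dropped, order is kept."""
    return [m for m in messages if rng.random() >= loss_rate]


def simulate(published, n_clients, loss_rate=0.2, seed=0):
    """Return one stream per producer/consumer pair, keyed by a label."""
    rng = random.Random(seed)
    # Stream 1: thermometer (producer) -> broker (consumer).
    at_broker = deliver(published, loss_rate, rng)
    streams = {"thermometer->broker": at_broker}
    # Streams 2..n+1: broker (producer) -> each client (consumer).
    for i in range(n_clients):
        # A client that subscribes late only sees messages from its offset on.
        start = rng.randrange(len(at_broker) + 1)
        streams[f"broker->client{i}"] = deliver(at_broker[start:], loss_rate, rng)
    return streams


published = [f"reading-{i}" for i in range(10)]
streams = simulate(published, n_clients=5)
for label, messages in streams.items():
    print(label, messages)
```

For 5 clients the dictionary holds 6 entries. Each client stream is an order-preserving subsequence of what reached the broker, but the streams need not contain the same messages.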
|
|
||
| Issue: Add a diagram illustrating RDF Messages, an RDF Message Stream, stream producers, and stream consumers. | ||
|
|
||
| Issue: Find out and document the similarities/differences to the [RDF-JS Stream interface](https://rdf.js.org/stream-spec/) | ||
|
|
||
| ## RDF Message Logs ## {#rdf-message-logs} | ||
|
|
||
|
|
@@ -131,7 +148,7 @@ Note: Blank node identifiers in RDF Message Streams and RDF Message Logs are sco | |
| # Serializing and parsing RDF Message Logs # {#rdf-message-logs-serialization} | ||
|
|
||
| In this specification we propose that all RDF serializations MUST implement a way to group quads into [=RDF Messages=]. | ||
| This way, a [=stream consumer=] can write the stream into an [=RDF Message Log=] that can be read again by a [=stream producer=] into an [=RDF Message Stream=]. | ||
| This way, an [=RDF Message Stream Consumer=] can write the stream into an [=RDF Message Log=] that can be read again by an [=RDF Message Stream Producer=] into an [=RDF Message Stream=]. | ||
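The round trip from stream to log and back can be sketched as follows. This is an illustration under assumptions: an in-memory Python list stands in for a serialized RDF Message Log, messages are sets of string quads, and `consume_to_log` and `produce_from_log` are hypothetical names, not part of any specified API.

```python
from typing import Iterable, Iterator, List

# A message is modelled as a frozenset of (s, p, o, g) string tuples.
Message = frozenset


def consume_to_log(stream: Iterable[Message]) -> List[Message]:
    """Consumer side: drain a finite stream into an in-memory log."""
    return list(stream)


def produce_from_log(log: List[Message]) -> Iterator[Message]:
    """Producer side: replay the log as a new stream instance."""
    yield from log


original = iter([
    frozenset({("ex:cat", "ex:state", "ex:Running", "")}),
    frozenset({("ex:cat", "ex:state", "ex:Sleeping", "")}),
])

log = consume_to_log(original)    # the original stream is now exhausted
replayed = produce_from_log(log)  # a different stream instance
```

The replayed stream carries the same messages in the same order as the one that was logged, yet it is a separate stream instance: the original iterator is already consumed, while the log can be replayed any number of times.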
|
|
||
| Note: While we do define content types for the RDF Message Log serialization formats, this does not imply that the serialization needs to be used over HTTP only. The use of alternative transport mechanisms is equally valid and encouraged. | ||
|
|
||
|
|
@@ -309,3 +326,24 @@ As an alternative, [Jelly-RDF](https://w3id.org/jelly) distributions are also av | |
| A [Nanopublication](https://nanopub.net/) is a small RDF dataset that contains an assertion, its provenance, and publication information. Nanopublications are stored and exchanged by a network of services (registries and query endpoints). Exchanging each Nanopublication individually leads to significant overhead, due to repeated HTTP requests necessitated by the lack of a format for grouping multiple Nanopublications together. This issue was resolved by using [Jelly](https://w3id.org/jelly/) to serialize multiple Nanopublications into a single byte stream, where each Nanopublication corresponds to a [Jelly frame](https://w3id.org/jelly/dev/user-guide/#stream-frames). | ||
|
|
||
| Using an [=RDF Message Log=] serialization to group multiple Nanopublications into a single file would also solve this problem, while still allowing each Nanopublication to be processed individually as an [=RDF Message=]. | ||
|
|
||
|
|
||
| <pre class=biblio> | ||
| { | ||
| "mqtt-5": { | ||
| "authors": [ | ||
| "Andrew Banks", | ||
| "Ed Briggs", | ||
| "Ken Borgendale", | ||
| "Rahul Gupta" | ||
| ], | ||
| "href": "https://docs.oasis-open.org/mqtt/mqtt/v5.0/mqtt-v5.0.html", | ||
| "title": "MQTT Version 5.0", | ||
| "status": "OASIS Standard", | ||
| "publisher": "OASIS", | ||
| "deliveredBy": [ | ||
| "https://www.oasis-open.org/committees/mqtt/" | ||
| ] | ||
| } | ||
| } | ||
| </pre> | ||