2 changes: 1 addition & 1 deletion README.md
@@ -33,7 +33,7 @@ pipx run bikeshed spec spec/<spec-name>.bs
You can also start bikeshed in watch mode to automatically rebuild the specs on changes, and have it serve the specs on a local web server:

```bash
pipx run bikeshed serve
pipx run bikeshed serve spec/<spec-name>.bs
```

To view, for example, `messages.bs`, open `http://localhost:8000/spec/messages.html` in your web browser.
70 changes: 54 additions & 16 deletions spec/messages.bs
@@ -66,33 +66,50 @@ Note: This specification does not provide any mechanism for referring to an RDF

## RDF Message Streams ## {#rdf-message-streams}

An <dfn>RDF Message Stream</dfn> is an ordered, potentially unbounded sequence of [=RDF Messages=]. An [=RDF Message Stream=] carries [=RDF Messages=] from one specific producer to one specific consumer.
An <dfn>RDF Message Stream</dfn> is an ordered, potentially unbounded sequence of [=RDF Messages=].

Note: This concept is different from an RDF quad stream that carries individual quads.
Note: This concept is different from an RDF quad stream that is a stream of individual quads.

A <dfn>stream producer</dfn> makes available an [=RDF Message Stream=] using a stream protocol.
Note: This definition is intentionally abstract and simple. More details about implementing RDF Message Streams are provided in [[#producers-consumers]].

A <dfn>stream consumer</dfn> consumes the [=RDF Messages=] in the [=RDF Message Stream=] using a stream protocol.
## Scope of RDF Messages ## {#scope}

Issue: Add a diagram illustrating RDF Messages, an RDF Message Stream, stream producers, and stream consumers.
By default, we assume that [=RDF Messages=] in an [=RDF Message Stream=] are not in the same "world". In other words, what is asserted in one message, is not asserted in other messages.
Collaborator

Giving this a bit more thought: this means we will never be able to fall back to a parser that does not support RDF Messages, as the semantics are different. If we did fall back, the contexts would of course be merged automatically, which is not desired. However, the current proposal is indeed to not allow fallbacks in the serializations, as was initially proposed by having a pragma in comments.

I think calling it a «world» is also not the most clear thing to do as it’s nowhere mentioned in the RDF semantics. I propose to refer to an implicit context:


Each RDF message has an implicit context.

Note: An RDF Message Stream Consumer can make this context explicit by putting the triples from the message in a named graph (e.g., with a blank node), and annotating the graph with the implicit context explicitly, such as when it was retrieved, a link to the conceptual stream, etc.
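
A minimal sketch of what the note above could look like on the consumer side. It is only an illustration, not something proposed in this PR: triples and quads are modeled as plain Python tuples, the message is assumed to contain only default-graph triples (named graphs inside a message are not handled), and `make_context_explicit`, `ex:retrievedAt`, and `ex:fromStream` are hypothetical names.

```python
from itertools import count
from typing import Iterable, Optional

Triple = tuple[str, str, str]
# The fourth element is the graph name; None stands for the default graph.
Quad = tuple[str, str, str, Optional[str]]

_ids = count()  # used to mint a fresh blank node label per message


def make_context_explicit(
    message: Iterable[Triple], retrieved_at: str, stream_iri: str
) -> list[Quad]:
    """Wrap one message's triples in a fresh named graph and annotate that graph."""
    graph = f"_:msg{next(_ids)}"  # blank node naming the graph (hypothetical scheme)
    quads: list[Quad] = [(s, p, o, graph) for (s, p, o) in message]
    # Annotations about the graph itself go into the default graph.
    quads.append((graph, "ex:retrievedAt", f'"{retrieved_at}"', None))
    quads.append((graph, "ex:fromStream", stream_iri, None))
    return quads


running = [("ex:cat", "ex:state", "ex:Running")]
sleeping = [("ex:cat", "ex:state", "ex:Sleeping")]
q1 = make_context_explicit(running, "2025-01-01T10:00:00Z", "ex:stream")
q2 = make_context_explicit(sleeping, "2025-01-01T11:00:00Z", "ex:stream")
```

Because each call mints a new graph name, the two cat statements end up in different named graphs and are not asserted side by side.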


I am not entirely up to date on this discussion but here are my 2 cents.

What about just calling it distinct datasets/messages?

By default, we assume that the [=RDF Messages=] in an [=RDF Message Stream=] are distinct and should therefore not be combined, unless an RDF Message Stream Profile overrules this default. This means that, by default, what is asserted in one message, is not asserted in other messages.

Cat Example

The implicit context is something to think about. How would we handle messages with named graphs in this scenario? Do we need an exception?

Collaborator Author

> The proposal however currently is to indeed not allow fallbacks in the serializations, as that was initially proposed by having a pragma in comments.

Yes, that is the case. The new serializations are to use completely new (non-backward compatible) syntax.

> Each RDF message has an implicit context.

I– I'm not sure if this is the right way. The whole named graphs thing pulls in a very large part of the RDF spec into the discussion here, that I'm not convinced is required to explain what we need to explain. It also invokes the demons of the ancient RDF 1.1 "dataset semantics" discussion. Let's apply Occam's razor here.

> What about just calling it distinct datasets/messages?

Pieter has a point in that we are inventing some new terms here, so I also double-checked what the existing specs / W3C notes say about this.

RDF 1.2 Semantics only discusses datasets and "interpretations". The phrase "interpretation scope" never occurs in this document, but in my opinion it would be at least understandable.

RDF 1.1: On Semantics of RDF Datasets uh... basically says nothing on the subject, at least I can't find anything.

So, I really don't have a better idea other than what you @tobixdev proposed. I will put that in the spec in a second.

Collaborator Author

Done – what do you think of it now?


For me this makes it clearer. Thanks!


Note: The underlying stream protocol is out of scope of this specification. It can be for example [[!WebSockets]], [[!LDN]], [[!EventSource]], [Linked Data Event Streams](https://w3id.org/ldes/specification), [Jelly gRPC](https://w3id.org/jelly/), [MQTT](https://mqtt.org/), or a programming language-specific stream interface that carries RDF Datasets, or a collection or stream of RDF Quads.
For example, if each message describes the state of a domestic cat at a certain point in time, one message may report that the cat is running, while another message that the cat is sleeping. This is not a contradiction, as the messages are by default separate "worlds" that should be interpreted independently.

Stream protocols used for [=RDF Message Streams=] may support any streaming semantics. For example:
[=RDF Message Stream Profiles=] can be used to indicate that messages should be interpreted in a broader scope. For example, a profile may indicate that all messages in a stream should be interpreted together. In this case, it could be concluded that the cat is running and sleeping at the same time, which is a contradiction.

- Delivery guarantees: at most once, at least once, exactly once.
- Ordering guarantees: ordered, unordered, partially ordered. While we assume that an [=RDF Message Stream=] is ordered, the order does not have to be the same for the producer and the consumer.
- Flow control: push-based, pull-based, or hybrid.
## RDF Message Stream Producers and Consumers ## {#producers-consumers}

Issue: Find out and document the similarities/differences to the [RDF-JS Stream interface](https://rdf.js.org/stream-spec/)
An <dfn>RDF Message Stream Producer</dfn> can make an [=RDF Message Stream=] available to be consumed by an <dfn>RDF Message Stream Consumer</dfn> using a stream protocol.

## Scope of RDF Messages ## {#scope}
The underlying stream protocol is out of scope of this specification. It can be for example [[!WebSockets]], [[!LDN]], [[!EventSource]], [Linked Data Event Streams](https://w3id.org/ldes/specification), [Jelly gRPC](https://w3id.org/jelly/), [MQTT](https://mqtt.org/), or a programming language-specific stream interface that carries RDF Datasets, or a collection or stream of RDF Quads.

By default, we assume that [=RDF Messages=] in an [=RDF Message Stream=] are not in the same "world". In other words, what is asserted in one message, is not asserted in other messages.
Stream protocols used for [=RDF Message Streams=] may support any streaming semantics, such as delivery guarantees, ordering, and flow control (pull-based, push-based, etc.).

For example, if each message describes the state of a domestic cat at a certain point in time, one message may report that the cat is running, while another message that the cat is sleeping. This is not a contradiction, as the messages are by default separate "worlds" that should be interpreted independently.
Note: An RDF Message Stream can be created ad-hoc, and describes only one specific "instance" of a stream. This allows streaming protocols to have freedom in how they manage ordering, stream lifecycle, delivery guarantees, flow control, and other streaming semantics. See the examples below for more details.

[=RDF Message Stream Profiles=] can be used to indicate that messages should be interpreted in a broader scope. For example, a profile may indicate that all messages in a stream should be interpreted together. In this case, it could be concluded that the cat is running and sleeping at the same time, which is a contradiction.
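
The difference between the default scope and a broader, profile-defined scope can be sketched as follows. This is an illustration under assumptions, not part of the specification: messages are modeled as Python sets of triples, `ex:state` is a hypothetical property, and the two states are assumed to be mutually exclusive in the vocabulary in use.

```python
Triple = tuple[str, str, str]


def states_of(triples: set[Triple], subject: str) -> set[str]:
    """Collect every ex:state value asserted for `subject` in one scope."""
    return {o for (s, p, o) in triples if s == subject and p == "ex:state"}


messages: list[set[Triple]] = [
    {("ex:cat", "ex:state", "ex:Running")},
    {("ex:cat", "ex:state", "ex:Sleeping")},
]

# Default: every message is its own scope, so each scope holds one state.
per_message = [states_of(m, "ex:cat") for m in messages]

# Hypothetical profile that says "interpret all messages together":
# the triples are merged and both states are asserted in the same scope.
merged: set[Triple] = set().union(*messages)
merged_states = states_of(merged, "ex:cat")
```

Under the default, no scope contains more than one state for the cat. After merging, a single scope contains both, which is where the contradiction described above would come from.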
<div class="example">
An HTTP server exposes a file at `https://example.org/stream`. This file contains an [=RDF Message Log=] serialization of an [=RDF Message Stream=]. A client can consume the stream by sending an HTTP GET request to that URL, and parsing the response as an [=RDF Message Stream=].

In this example, the server is the **stream producer**, and the client is the **stream consumer**. The stream protocol is HTTP. The RDF Message Stream only exists over the course of the HTTP request.

> The stream protocol is HTTP. The RDF Message Stream only exists over the course of the HTTP request.

If I understood you correctly, storing the HTTP GET result to disk and reading it from the log results in a different stream instance, even though they are equivalent, right?

Collaborator Author

Yes, that is the idea. Should I add this clarification to the spec?


I think it's not immediately clear what

> An RDF Message Stream can be created ad-hoc [...]

means in that context.

Here is one attempt at explaining this differently:

An [=RDF Message Stream=] is a way of modeling a sequence of [=RDF Messages=].
Defining how a stream between two parties is implemented is beyond the scope of this text.
We assume that these details are defined in an underlying stream protocol.
It can be for example [[!WebSockets]], [[!LDN]], [[!EventSource]], Linked Data Event Streams, Jelly gRPC, MQTT, or a programming language-specific stream interface that carries RDF Datasets, or a collection or stream of RDF Quads.
These stream protocols used for [=RDF Message Streams=] may support any streaming semantics, such as delivery guarantees, ordering, and flow control (pull-based, push-based, etc.).

Assuming the existence of a stream that adheres to any given stream protocol, an [=RDF Message Stream=] can be layered on top.
That is, the [=RDF Message Stream=] is just a way of interpreting the messages that are processed in the underlying stream protocol.

I am not attached to any of these words. Maybe it can help clarify some things.

What is actually the purpose of defining the producer/consumer in this context? Do we use that somewhere else? If we just say an RDF Message Stream is just a way of interpreting a sequence of "primitive messages" we could avoid the explanations about producers and consumers.

</div>

<div class="example">
An MQTT broker ([[mqtt-5]]) hosts a topic `iot/temperature` to which RDF Messages are published by an IoT thermometer. Multiple clients, at different points in time, subscribe to that topic to consume the RDF Messages being published. Because of the Quality-of-Service settings used (QoS 0), some messages may be lost, resulting in different clients seeing different subsets of the messages published on the topic.

In this example:

- The IoT thermometer is a **stream producer** that produces an [=RDF Message Stream=] by publishing RDF Messages to the MQTT broker, which acts as the **stream consumer**.
- For each individual client, the MQTT broker is a **stream producer** that produces an [=RDF Message Stream=] by sending RDF Messages to the client, which acts as the **stream consumer**.

For example, for 5 clients subscribing to the topic, there would be 6 different [=RDF Message Streams=]: one from the IoT thermometer to the MQTT broker, and one from the MQTT broker to each of the 5 clients. Each of these streams may have different messages in them, due to the Quality-of-Service settings and the clients subscribing to the stream at different points in time.

We're getting rather detailed here but I think this does not allow us to fully model the QoS 0 semantics of MQTT.

If messages get dropped during the communication, the producer has a different "view" of the stream than the consumer. The dropped message is included in the producer's view of the stream but not in the consumer's view of the stream. Using our definition of RDF Message Streams, even though it's the same stream, the consumer sees a different sequence of messages than the producer. To avoid this "two views" problem we would need to have two message streams for each connection, one modeling the view of the producer and one modeling the view of the consumer, but that gets complicated quickly.

I don't know if we should mention that but maybe it's something to keep in mind.


I think MQTT QoS 0 does not guarantee in-order delivery. This would be another point that leads to the "two views" of the same stream problem.

Collaborator Author

Yeah I was wondering about both of these issues, but tried not to overcomplicate the explanation. I added a note on both issues – could you have a look now?


Yep, the notes clarify this distinction.

</div>
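
The MQTT example above can be made concrete with a small simulation. This is a rough sketch under stated assumptions, not a model of real MQTT behavior: `deliver` is a hypothetical helper, messages are stand-in strings rather than RDF Messages, loss is drawn from a seeded random generator, and reordering is not modeled at all.

```python
import random


def deliver(
    stream: list[str], rng: random.Random, loss_rate: float, start: int = 0
) -> list[str]:
    """Return the stream as seen by a receiver that joins at index `start`
    and loses each message independently with probability `loss_rate`."""
    return [m for m in stream[start:] if rng.random() >= loss_rate]


rng = random.Random(42)  # seeded so that repeated runs give the same result
published = [f"reading-{i}" for i in range(10)]  # what the thermometer sends

# One stream from the thermometer to the broker ...
thermometer_to_broker = deliver(published, rng, loss_rate=0.1)

# ... and one stream from the broker to each of 5 clients, which subscribe
# at different points in time (the join indices are arbitrary).
join_points = [0, 0, 2, 3, 5]
broker_to_clients = [
    deliver(thermometer_to_broker, rng, loss_rate=0.2, start=j) for j in join_points
]

streams = [thermometer_to_broker, *broker_to_clients]
```

`streams` then holds the 6 stream instances from the example. Each client stream contains only messages that the broker received, in the same relative order, but the clients generally do not see the same messages.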

Issue: Add a diagram illustrating RDF Messages, an RDF Message Stream, stream producers, and stream consumers.

Issue: Find out and document the similarities/differences to the [RDF-JS Stream interface](https://rdf.js.org/stream-spec/)

## RDF Message Logs ## {#rdf-message-logs}

@@ -131,7 +148,7 @@ Note: Blank node identifiers in RDF Message Streams and RDF Message Logs are sco
# Serializing and parsing RDF Message Logs # {#rdf-message-logs-serialization}

In this specification we propose that all RDF serializations MUST implement a way to group quads into [=RDF Messages=].
This way, a [=stream consumer=] can write the stream into an [=RDF Message Log=] that can be read again by a [=stream producer=] into an [=RDF Message Stream=].
This way, an [=RDF Message Stream Consumer=] can write the stream into an [=RDF Message Log=] that can be read again by an [=RDF Message Stream Producer=] into an [=RDF Message Stream=].

Note: While we do define content types for the RDF Message Log serialization formats, this does not imply that the serialization needs to be used over HTTP only. The use of alternative transport mechanisms is equally valid and encouraged.
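
The consumer-writes, producer-reads round trip described above can be sketched as follows. Note the assumptions: one JSON array per line is used purely as a stand-in for a log format and is not one of the serializations this specification defines; `write_log` and `read_log` are hypothetical names; messages are modeled as sets of string triples, and blank node scoping is ignored.

```python
import io
import json
from typing import Iterable, Iterator, TextIO

Triple = tuple[str, str, str]
Message = set[Triple]


def write_log(stream: Iterable[Message], out: TextIO) -> None:
    """Consumer side: append each message of the stream to the log, one per line."""
    for message in stream:
        out.write(json.dumps(sorted(message)) + "\n")


def read_log(inp: TextIO) -> Iterator[Message]:
    """Producer side: replay the log as a new stream of messages."""
    for line in inp:
        yield {(s, p, o) for s, p, o in json.loads(line)}


original: list[Message] = [
    {("ex:cat", "ex:state", "ex:Running")},
    {("ex:cat", "ex:state", "ex:Sleeping")},
]

log = io.StringIO()  # in-memory stand-in for a file or an HTTP response body
write_log(iter(original), log)
log.seek(0)
replayed = list(read_log(log))
```

The replayed stream is a new stream instance that carries the same messages in the same order, with message boundaries preserved.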

@@ -309,3 +326,24 @@ As an alternative, [Jelly-RDF](https://w3id.org/jelly) distributions are also av
A [Nanopublication](https://nanopub.net/) is a small RDF dataset that contains an assertion, its provenance, and publication information. Nanopublications are stored and exchanged by a network of services (registries and query endpoints). Exchanging each Nanopublication individually leads to significant overhead, due to repeated HTTP requests necessitated by the lack of a format for grouping multiple Nanopublications together. This issue was resolved by using [Jelly](https://w3id.org/jelly/) to serialize multiple Nanopublications into a single byte stream, where each Nanopublication corresponds to a [Jelly frame](https://w3id.org/jelly/dev/user-guide/#stream-frames).

Using an [=RDF Message Log=] serialization to group multiple Nanopublications into a single file would also solve this problem, while still allowing each Nanopublication to be processed individually as an [=RDF Message=].


<pre class=biblio>
{
"mqtt-5": {
"authors": [
"Andrew Banks",
"Ed Briggs",
"Ken Borgendale",
"Rahul Gupta"
],
"href": "https://docs.oasis-open.org/mqtt/mqtt/v5.0/mqtt-v5.0.html",
"title": "MQTT Version 5.0",
"status": "OASIS Standard",
"publisher": "OASIS",
"deliveredBy": [
"https://www.oasis-open.org/committees/mqtt/"
]
}
}
</pre>