53 changes: 53 additions & 0 deletions spnl/semantics.md
@@ -0,0 +1,53 @@
# Details of Execution

The lifecycle of a span query includes:

- Input: what the client sends to the generate call
- Output: what the generate call returns to the client
- By-products: what the model server caches as a result of that generate call

## Input Concerns

What the client sends to the generate call.

### Messages

```
(system m): a message with role "system" and content m
(user m): a message with role "user" and content m
(assistant m): a message with role "assistant" and content m
```

### Terminology

In the notation below, uppercase letters denote messages (strings) and
lowercase letters denote the corresponding token sequences. We assume that
mapping `A` to `a` applies the chat template first and then the tokenizer.

```
A, B, C: these represent messages
a, b, c: these represent corresponding token sequences, with chat template applied
_: ensure that the preceding sequence both starts and ends on a block boundary
+: special token for begin span
x: special token for restoring cross attention
```
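
To make the `_` marker concrete, here is a minimal sketch of one way a span could be forced to start and end on a block boundary. It rests on assumptions the text above does not state: that a block is a fixed number of tokens, and that alignment is achieved by inserting a padding token. The names `align_to_blocks`, `block_size`, and `pad` are hypothetical.

```python
from typing import List


def align_to_blocks(prefix_len: int, span: List[int],
                    block_size: int = 4, pad: int = 0) -> List[int]:
    """Pad `span` so that it starts and ends on a block boundary.

    prefix_len: number of tokens already emitted before this span.
    block_size: tokens per block (assumed value, not from the spec).
    pad:        padding token id (assumed; the spec names no pad token).
    """
    # Leading padding moves the start of the span to the next boundary.
    lead = (-prefix_len) % block_size
    # Trailing padding rounds the span length up to a whole number of blocks.
    trail = (-len(span)) % block_size
    return [pad] * lead + span + [pad] * trail


# Five tokens precede a three-token span; blocks hold four tokens.
aligned = align_to_blocks(prefix_len=5, span=[1, 2, 3], block_size=4, pad=0)
print(aligned)  # [0, 0, 0, 1, 2, 3, 0]
```

In this example the span begins at offset 8 and ends at offset 12, both multiples of the assumed block size.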

### Rules

```
(seq A B C) -> abc
(plus A B C) -> (+a)_(+b)_(+c)_ meaning: prefix each element with + and ensure each (+...) starts and ends on a block boundary
(cross A B C) -> ab(xc)_ meaning: prefix the last element with x and ensure (xc) starts and ends on a block boundary
```

### Examples

```
(cross A (plus B C) D) -> a(+b)_(+c)_(xd)_
```
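
The rules and the example above can be checked mechanically. The following is a minimal sketch, not the project's implementation: it represents a query as nested Python tuples (a hypothetical encoding) and lowers it to the symbolic notation used in this section. How operators nest inside `plus` or as the last element of `cross` is not specified above, so the sketch simply applies the rules recursively, which is an assumption.

```python
from typing import Tuple, Union

# A query is either a message name such as "A" or a tuple (op, arg, ...).
Query = Union[str, Tuple]


def lower(expr: Query) -> str:
    """Lower a span query to the symbolic token notation (a, +, x, _)."""
    if isinstance(expr, str):
        # Message A maps to token sequence a.
        return expr.lower()
    op, *args = expr
    if not args:
        raise ValueError(f"{op!r} needs at least one argument")
    parts = [lower(a) for a in args]
    if op == "seq":
        # (seq A B C) -> abc
        return "".join(parts)
    if op == "plus":
        # (plus A B C) -> (+a)_(+b)_(+c)_
        return "".join(f"(+{p})_" for p in parts)
    if op == "cross":
        # (cross A B C) -> ab(xc)_
        return "".join(parts[:-1]) + f"(x{parts[-1]})_"
    raise ValueError(f"unknown operator: {op!r}")


print(lower(("cross", "A", ("plus", "B", "C"), "D")))  # a(+b)_(+c)_(xd)_
```

The printed result matches the example in this section; the same function reproduces the three rules when given `("seq", "A", "B", "C")`, `("plus", "A", "B", "C")`, and `("cross", "A", "B", "C")`.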

## By-products of Generate

What the model server caches as a result of that generate call.

TODO