Skip to content

Commit d4e477a

Browse files
committed
Daedalus notes and pdf prototype
1 parent 4c2ea9e commit d4e477a

File tree

6 files changed

+390
-1
lines changed

6 files changed

+390
-1
lines changed

Cargo.lock

Lines changed: 8 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
[workspace]
2-
members = [".", "generated/", "doodle-formats/"]
2+
members = [".", "generated/", "doodle-formats/", "experiments/doodle-pdf"]
33

44
[package]
55
name = "doodle"

Daedalus.md

Lines changed: 269 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,269 @@
1+
# Daedalus Notes
2+
3+
## 3.1 (Primitive Parsers)
4+
5+
* `^<value>` is achievable through `compute`
6+
* `Fail <msg>` requires a refactor we have discussed but never implemented
7+
* This could be a modification of `Fail` itself, or a wrapper using `StyleHint`
8+
* We don't have a notion of character classes for the parser `$[c]`, but we could emulate this with `ByteSet` constants
9+
* The closest thing to `Match <string>` is `is_bytes`, which stores an N-Tuple ([issue #260](https://github.com/yeslogic/doodle/issues/260) discusses this)
10+
11+
## 3.2 (Sequential Composition)
12+
13+
The syntax `{ P1; ...; Pn }` can be emulated with `LetFormat`/`MonadSeq`.
14+
15+
16+
### Example Translations
17+
18+
```ddl
19+
def Add2 =
20+
block
21+
let x = BEUInt64
22+
let y = BEUInt64
23+
^ x + y
24+
25+
def StatementSemi =
26+
block
27+
$$ = Statement
28+
$[';']
29+
```
30+
31+
```rust
32+
let add2 =
33+
chain(
34+
base.u64be(), "x",
35+
chain(
36+
base.u64be(),
37+
"y",
38+
compute(add(var("x"), var("y")))
39+
)
40+
);
41+
42+
let statement_semi =
43+
chain(
44+
statement, // defined elsewhere,
45+
"ret",
46+
monad_seq(
47+
is_byte(b';'),
48+
compute(var("ret"))
49+
)
50+
);
51+
```
52+
53+
## 3.3 (Parallel Composition)
54+
55+
There is currently no way of performing *unbiased* composition in `doodle`; all parallel compositions
56+
are first-come-first-served and will bias towards the first non-error result.
57+
58+
`<|` is therefore supported, while `|` is not.
59+
60+
## 3.4 (Repetition)
61+
62+
* Kleene Star - `Many <P>` is just `Repeat`, while `Many (1..) <P>` is `Repeat`
63+
* Kleene Star with State - At least some cases of `many (x = s) <P>` can be emulated with `Map(Repeat, LeftFold)`, while `AccumUntil` might be usable in other cases; there may be cases where neither are applicable, in which case a more bespoke `Format` may be required.
64+
* `map (k,v in c) <P>` and `for (x = s; k,v in c) <P>` could be emulated with `ForEach`, at least in certain instances.
65+
66+
## 3.5 (Branching)
67+
68+
* `case-of` (and, by extension, `if-then-else`) appear one-to-one with `Match`
69+
70+
## 3.6 (Coercions)
71+
72+
* There are no format-level types per-se, but various `Expr`s like `AsU64` and family accomplish much-the-same, albeit in a closed class rather with a constructive syntax.
73+
* `as?` (dynamic safe coercion) is closest to what we have, since `AsU{8,16,32,64}`/`AsChar` are runtime-checked; we currently have nothing of the sort of `as` (static safe) or `as!` (static lossy) coercion
74+
75+
## 5.1 (Bitdata)
76+
77+
Through helpers like `bit_fields_u8` and so forth, which can be defined as-needed, we have a plausible
78+
analogue to the `bitdata` declarations in Daedalus.
79+
80+
However, the current implementation of `BitFieldKind` is somewhat restrictive, in the following ways, compared to `bitdata`:
81+
82+
* It does not support type-coercions (e.g. u8 packed in a u16)
83+
* It does not support fixed-bits checking other than all-zero
84+
85+
These are features that could be added with various caveats, if necessary.
86+
87+
### Examples
88+
89+
```ddl
90+
bitdata Packed where
91+
x: uint 8
92+
0xFFFF
93+
y: uint 8
94+
95+
bitdata Uni where
96+
value = { get: Packed }
97+
null = 0
98+
```
99+
100+
```rust
101+
use doodle::helper::BitFieldKind::*;
102+
103+
let packed = bit_fields_u32([ // <- this is not yet defined
104+
BitsField { field_name: "x", bit_width: 8 },
105+
Reserved { bit_width: 16, check_zero: false },
106+
BitsField { field_name: "y", bit_width: 8 },
107+
]);
108+
109+
let uni = union_nondet([
110+
("null", is_bytes(&[0; 4])),
111+
("value", packed),
112+
]);
113+
```
114+
115+
This is more of a parse-level directive than a data-type declaration, however,
116+
as while in Daedalus the two are implicitly specified with the same declaration,
117+
in `doodle` the data-type is a synthetic implication of the parse declaration,
118+
and cannot be used in coercions; for that, we would need a separate declaration
119+
of a dependent `u32 -?-> Packed` computation that could then be fed in arbitrary
120+
arguments to interpret as `Packed`.
121+
122+
```rust
123+
use doodle::helper::BitFieldKind::*;
124+
let as_packed = module.define_format_args(
125+
"Packed-Coerce",
126+
[(Label::Borrowed("raw"), ValueType::Base(BaseType::U32))],
127+
cast_u32_bit_fields( // <- also not defined, but furthermore has no archetype
128+
var("raw"),
129+
[
130+
BitsField { field_name: "x", bit_width: 8 },
131+
Reserved { bit_width: 16, check_zero: false },
132+
BitsField { field_name: "y", bit_width: 8 },
133+
]
134+
)
135+
);
136+
```
137+
138+
## 5.2 (Automatic ADT Synthesis)
139+
140+
We have no first-class types in the specification language of `doodle`, and all
141+
types are implied through synthesis over the declared formats and expressions.
142+
As a result, type-ascriptions are syntactically unavailable.
143+
144+
Even currently, we can still at least ensure that two parsers have mutually
145+
compatible types, by defining a declaration-check marker-format that we run
146+
through type-checking but discard afterwards:
147+
148+
```rust
149+
let point = module.define_format("types.point", record([("x", base.u8()), ("y", base.u8())]));
150+
let point_x = module.define_format("types.point_x", record([("x", base.u8()), ("y", is_byte(0))]));
151+
152+
let __type_proof = module.define_format(
153+
"__TYPE_PROOF",
154+
monad_seq(
155+
union([point.call(), point_x.call()]),
156+
/* we can sequence other type-compatibility assertion-formats here as well */
157+
Format::EMPTY,
158+
)
159+
);
160+
```
161+
162+
Because every format needs a reified type for the module to be usable, but these
163+
type-ascriptions need not be a bijection, there would be no implicitly-defined
164+
alias `type PointX = Point` as would be synthesized by the corresponding
165+
Daedalus declarations; instead, whichever type-name is preferred would win, and
166+
both formats would receive verbatim-identical type-ascriptions.
167+
168+
169+
170+
While tagged unions in general are supportable, the example given for tagged
171+
unions cannot be constructed in `doodle` because of a lack of support for
172+
auto-recursive and mutually-recursive format-constructs. Implementing these is
173+
not *a priori* impossible, but would require a noticeable investment of effort
174+
into a design to support this, which would most notably require a
175+
termination-rule for otherwise infinitely-recursive type-checking.
176+
177+
## 6 (Lookahead and Stream Manipulation)
178+
179+
The concept of a `Stream` is not first-class within the `Format` model of `doodle`,
180+
though there are various combinators that interact with it.
181+
182+
* `GetStream` does not obviously have a one-to-one equivalent in `doodle`, though constructs that use it may be replicable in other ways
183+
* `Drop n s` does not properly exist as a first-class construction but can be emulated with some degree of ingenuity
184+
* `Take n s` itself is not quite analogous to anything in `doodle`
185+
* `SetStream` does not properly exist as a first-class construction but can be emulated with some degree of ingenuity
186+
* `Chunk n P` is equivalent to `Slice`
187+
* `Lookahead P` is equivalent to `Peek`
188+
* `WithStream` can be emulated using `DecodeBytes` up to a certain degree, where if the parser itself is responsible for determining where the stream ends, there may be issues in capturing the stream into a suitable buffer.
189+
190+
### Example
191+
192+
```ddl
193+
block
194+
let base = GetStream
195+
Many block
196+
let offset = Word64
197+
let here = GetStream
198+
SetStream (Drop offset base)
199+
$$ = ParseObject
200+
SetStream here
201+
```
202+
203+
```rust
204+
chain(
205+
Format::Pos,
206+
"base",
207+
repeat(
208+
chain(base.u64be(), "offset",
209+
with_relative_offset(Some(var("base")), var("offset"), parse_object)
210+
)
211+
)
212+
)
213+
```
214+
215+
## 7 (Eager vs. Lazy)
216+
217+
There is currently no support, at any layer, for parse-level laziness in the
218+
`doodle` processing model; there is some value-level laziness involving
219+
constructed sequences, but that is more of an representation-level optimization
220+
than a feature of the processing model, and aside from performance concerns,
221+
nothing would change if it were eliminated.
222+
223+
Multiple paths cannot be explored in parallel, both for unbounded/indefinite-length
224+
repetitions as well as for more explicit alternations over N branches. There are currently only two places where nondeterministic unions are used:
225+
226+
* At the top-level, for alternating between distinct data-formats (e.g. png, OpenType, gzip); and secondly, to allow fallback to uninterpreted bytes during speculative
227+
parsing of zlib-compressed UTF-8 in an `iTXt` PNG-chunk.
228+
229+
The latter usage is more of a band-aid against unwanted parse-failure, and due
230+
to the limitations of the error-propagation model whereby high-confidence
231+
partial-parses of a given top-format are nevertheless rejected altogether when
232+
even a single, possibly trivial component encounters uncaught parse-failure.
233+
Aside from mitigation within the operational model that would allow
234+
determinations such as 'correct format, malformed data', there are ways to get
235+
around this locally without adjusting the model, by using a separate construct
236+
from `UnionNondet` to avoid coupling one form of speculative parsing to the only
237+
version of something like that in the current implementation of `doodle`:
238+
239+
```rust
240+
fn try_with_fallback(f0: Format, f1: Format) -> Format {
241+
TryWithFallback(Box::new(f0), Box::new(f1))
242+
}
243+
244+
/* .. */
245+
246+
let zlib_utf8txt_or_raw =
247+
try_with_fallback(
248+
fmt_variant("valid", zlib_utf8text),
249+
fmt_variant("invalid", repeat(base.u8())),
250+
);
251+
```
252+
253+
This can be used in a broader sense, as a more generic 'permit local failure gracefully'
254+
construct:
255+
256+
```rust
257+
fn permit_fail(f: Format, dummy: Expr) -> Format {
258+
TryWithFallback(Box::new(f), Box::new(compute(dummy)))
259+
}
260+
261+
// vvv Usage Patterns vvv
262+
fn opt_success(f: Format) -> Format {
263+
permit_fail(fmt_some(f), expr_none())
264+
}
265+
266+
fn try_or_invalid(f: Format) -> Format {
267+
permit_fail(fmt_variant("valid", f), Expr::Variant("invalid".into(), Expr::UNIT))
268+
}
269+
```

experiments/doodle-pdf/Cargo.toml

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
[package]
2+
name = "doodle-pdf"
3+
version = "0.1.0"
4+
edition = "2024"
5+
6+
[dependencies]
7+
doodle = { path = "../../" }
8+
doodle-formats = { path = "../../doodle-formats/" }

experiments/doodle-pdf/src/main.rs

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
mod pdf;
2+
3+
fn main() {
4+
println!("Hello, world!");
5+
}

0 commit comments

Comments
 (0)