feat: add support for optional unixfs file link writer in create file #62

Open · wants to merge 6 commits into base: main
84 changes: 83 additions & 1 deletion README.md
@@ -129,6 +129,88 @@ const demo = async blob => {
}
```

### Collecting UnixFS FileLinks

You can optionally pass a `unixFsFileLinkWriter` stream to capture metadata for each file link as it is created (useful for indexing or for tracking the DAG layout).

```js
import {
  createWriter,
  createFileWriter,
} from '@ipld/unixfs'
import * as UnixFS from '@ipld/unixfs'
import * as raw from 'multiformats/codecs/raw'

import { withMaxChunkSize } from '@ipld/unixfs/file/chunker/fixed'
import { withWidth } from '@ipld/unixfs/file/layout/balanced'

const defaultSettings = UnixFS.configure({
  fileChunkEncoder: raw,
  smallFileEncoder: raw,
  chunker: withMaxChunkSize(1024 * 1024),
  fileLayout: withWidth(1024),
})

/**
* @param {Blob} blob
 * @returns {Promise<import('@ipld/unixfs').FileLink[]>}
*/
async function collectUnixFsFileLinks(blob) {
const fileLinks = []

// Create a stream to collect metadata (FileLinks)
const { readable, writable } = new TransformStream()

// Set up the main UnixFS writer (data goes nowhere here)
const unixfsWriter = createWriter({
writable: new WritableStream(), // Discard actual DAG output
settings: defaultSettings,
})

// Set up the file writer with link metadata writer
const unixFsFileLinkWriter = writable.getWriter()

const fileWriter = createFileWriter({
Collaborator commented:

I would personally suggest a different API design. Specifically, instead of passing another stream to write links to, I would simply define something like

```ts
interface FileFragment extends Block {
  link: UnixFS.FileLink
}
```

That way current consumers can continue using the API as is, while consumers interested in links could use `.link` to get them. Streams also have a `.tee()` method, allowing two different consumers if desired.
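For illustration, a minimal consumer-side sketch of that alternative (the `FileFragment` shape and `consume` helper are hypothetical, taken from the comment above; it also assumes `FileLink` is exported from `@ipld/unixfs` as this PR's `src/api.ts` change proposes):

```ts
import type { Block } from "multiformats"
import type { FileLink } from "@ipld/unixfs"

// Hypothetical fragment: the encoded block plus its UnixFS link metadata.
interface FileFragment extends Block {
  link: FileLink
}

// Block-only consumers keep treating fragments as blocks; link-aware
// consumers tee the stream and read `.link` from the second branch.
async function consume(
  fragments: ReadableStream<FileFragment>
): Promise<FileLink[]> {
  const [blocks, links] = fragments.tee()
  const collected: FileLink[] = []

  await Promise.all([
    blocks.pipeTo(
      new WritableStream<FileFragment>({
        write(fragment) {
          // e.g. hand fragment.bytes / fragment.cid to a CAR writer
        },
      })
    ),
    links.pipeTo(
      new WritableStream<FileFragment>({
        write(fragment) {
          collected.push(fragment.link)
        },
      })
    ),
  ])

  return collected
}
```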

Collaborator commented:

I kind of regret not just making the stream carry EncodedFile, in which case we would not have this problem in the first place, but perhaps switching to that as a breaking change could be another way to introduce this. You could still have something like `createFileBlockWriter` for old code, and new code could use `createFileWriter`.

Collaborator commented:

Actually, thinking about it more, I think it would be best to:

  1. Add `createEncodedFileWriter` that takes a `Writer<EncodedFile>`.
  2. Add `createFileBlockWriter` that basically uses `createEncodedFileWriter` and maps values to `.block`.
  3. Keep `createFileWriter` as an alias with a deprecation message suggesting one of the above two.

I think this would create a good migration path without having to introduce additional code paths or complexity.
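For illustration only, that migration path might look roughly like this (a sketch; `createEncodedFileWriter`, `createFileBlockWriter`, `EncodedFile` and the minimal `Writer` interface are hypothetical names taken from the comment above, not the current API):

```ts
import type { Block } from "multiformats"
import type { FileLink } from "@ipld/unixfs"

// Hypothetical value emitted by the new writer: the encoded block plus its link.
interface EncodedFile {
  block: Block
  link: FileLink
}

// Minimal writer interface standing in for the package's stream writer.
interface Writer<T> {
  write(value: T): unknown
}

// 1. New primary API: accepts a Writer<EncodedFile>.
function createEncodedFileWriter(options: { writer: Writer<EncodedFile> }) {
  // DAG building elided; this sketch only shows how the writers relate.
  return options
}

// 2. Compatibility wrapper: forwards only `.block`, so block-only consumers
//    (e.g. a CAR encoder) keep working unchanged.
function createFileBlockWriter(options: { writer: Writer<Block> }) {
  return createEncodedFileWriter({
    writer: { write: (file) => options.writer.write(file.block) },
  })
}

// 3. Deprecated alias that points callers at the two functions above.
/** @deprecated Use createEncodedFileWriter or createFileBlockWriter instead. */
const createFileWriter = createFileBlockWriter
```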

...unixfsWriter,
initOptions: {
unixFsFileLinkWriter,
},
})

// Start concurrent reading of the metadata stream
const fileLinkReader = readable.getReader()
const readLinks = (async () => {
while (true) {
const { done, value } = await fileLinkReader.read()
if (done) break
fileLinks.push(value)
}
})()

// Pipe the blob to the file writer
await blob.stream().pipeTo(
new WritableStream({
async write(chunk) {
await fileWriter.write(chunk)
},
})
)

// Finalize everything
await fileWriter.close()
await unixfsWriter.close()
await unixFsFileLinkWriter.close()

// Wait for all links to be read
await readLinks

return fileLinks
}

// Usage
const blob = new Blob(['Hello UnixFS links'])
const links = await collectUnixFsFileLinks(blob)
console.log(links)
```

## License

Licensed under either of
@@ -144,4 +226,4 @@ Unless you explicitly state otherwise, any contribution intentionally submitted
[readablestream]: https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream
[car]: https://ipld.io/specs/transport/car/carv1/
[`transformstream`]: https://developer.mozilla.org/en-US/docs/Web/API/TransformStream
[`writablestream`]: https://developer.mozilla.org/en-US/docs/Web/API/WritableStream
[`writablestream`]: https://developer.mozilla.org/en-US/docs/Web/API/WritableStream
3 changes: 2 additions & 1 deletion src/api.ts
@@ -23,7 +23,7 @@ import type {
Options as DirectoryWriterOptions,
State as DirectoryWriterState,
} from "./directory.js"
import { Metadata } from "./unixfs.js"
import { Metadata, FileLink } from "./unixfs.js"

export type {
WriterOptions,
@@ -47,6 +47,7 @@ export type {
MultihashHasher,
MultihashDigest,
Metadata,
FileLink,
}

/**
15 changes: 11 additions & 4 deletions src/file.js
@@ -28,7 +28,7 @@ export const defaults = () => ({
* @param {Partial<API.EncoderSettings<Layout>>} config
* @returns {API.EncoderSettings<Layout>}
*/
export const configure = config => ({
export const configure = (config) => ({
...defaults(),
...config,
})
@@ -50,8 +50,15 @@ export const UnixFSRawLeaf = {
* @param {API.Options<Layout>} options
* @returns {API.View<Layout>}
*/
export const create = ({ writer, metadata = {}, settings = defaults() }) =>
new FileWriterView(Writer.init(writer, metadata, configure(settings)))
export const create = ({
writer,
metadata = {},
settings = defaults(),
initOptions = {},
}) =>
new FileWriterView(
Writer.init(writer, metadata, configure(settings), initOptions)
)

/**
* @template T
@@ -98,7 +105,7 @@ export const close = async (
*/
const perform = (view, effect) =>
Task.fork(
Task.loop(effect, message => {
Task.loop(effect, (message) => {
const { state, effect } = Writer.update(message, view.state)
view.state = state
return effect
7 changes: 7 additions & 0 deletions src/file/api.ts
@@ -72,10 +72,17 @@ export interface EncoderSettings<Layout extends unknown = unknown> {
linker: Linker
}

export interface InitOptions {
unixFsFileLinkWriter?: UnixFsFileLinkWriter
}

export interface UnixFsFileLinkWriter extends StreamWriter<UnixFS.FileLink> {}

export interface Options<Layout = unknown> {
writer: BlockWriter
metadata?: UnixFS.Metadata
settings?: EncoderSettings<Layout>
initOptions?: InitOptions
}

export interface CloseOptions {
51 changes: 45 additions & 6 deletions src/file/writer.js
@@ -13,6 +13,7 @@ import * as Queue from "./layout/queue.js"
* readonly metadata: UnixFS.Metadata
* readonly config: API.EncoderSettings<Layout>
* readonly writer: API.BlockWriter
* readonly unixFsFileLinkWriter?: API.UnixFsFileLinkWriter
* chunker: Chunker.Chunker
* layout: Layout
* nodeQueue: Queue.Queue
@@ -25,6 +26,7 @@ import * as Queue from "./layout/queue.js"
* readonly metadata: UnixFS.Metadata
* readonly config: API.EncoderSettings<Layout>
* readonly writer: API.BlockWriter
* readonly unixFsFileLinkWriter?: API.UnixFsFileLinkWriter
* readonly rootID: Layout.NodeID
* readonly end?: Task.Fork<void, never>
* chunker?: null
@@ -39,6 +41,7 @@ import * as Queue from "./layout/queue.js"
* readonly metadata: UnixFS.Metadata
* readonly config: API.EncoderSettings<Layout>
* readonly writer: API.BlockWriter
* readonly unixFsFileLinkWriter?: API.UnixFsFileLinkWriter
* readonly link: Layout.Link
* chunker?: null
* layout?: null
@@ -63,6 +66,7 @@ import * as Queue from "./layout/queue.js"
* |{type:"write", bytes:Uint8Array}
* |{type:"link", link:API.EncodedFile}
* |{type:"block"}
* |{type:"fileLink"}
* |{type: "close"}
* |{type: "end"}
* } Message
@@ -82,6 +86,9 @@ export const update = (message, state) => {
/* c8 ignore next 2 */
case "block":
return { state, effect: Task.none() }
/* c8 ignore next 2 */
case "fileLink":
return { state, effect: Task.none() }
case "close":
return close(state)
case "end":
@@ -96,9 +103,10 @@
* @param {API.BlockWriter} writer
* @param {UnixFS.Metadata} metadata
* @param {API.EncoderSettings} config
* @param {API.InitOptions} [options]
* @returns {State<Layout>}
*/
export const init = (writer, metadata, config) => {
export const init = (writer, metadata, config, options = {}) => {
return {
status: "open",
metadata,
@@ -116,6 +124,7 @@ export const init = (writer, metadata, config) => {
// overhead.
// @see https://github.com/Gozala/vectrie
nodeQueue: Queue.mutable(),
unixFsFileLinkWriter: options.unixFsFileLinkWriter,
}
}
/**
@@ -188,11 +197,23 @@ export const link = (state, { id, link, block }) => {
? state.end.resume()
: Task.none()

if (!state.unixFsFileLinkWriter) {
return {
state: newState,
effect: Task.listen({
link: Task.effects(tasks),
block: writeBlock(state.writer, block),
end,
}),
}
}

return {
state: newState,
effect: Task.listen({
link: Task.effects(tasks),
block: writeBlock(state.writer, block),
fileLink: writeFileLink(state.unixFsFileLinkWriter, link),
Comment on lines +200 to +216
@Gozala (Collaborator) commented on Jun 4, 2025:

Suggested change (replacing the duplicated early-return branch on lines +200 to +216 with a single return):

```js
  return {
    state: newState,
    effect: Task.listen({
      link: Task.effects(tasks),
      block: writeBlock(state.writer, block),
      fileLink: state.unixFsFileLinkWriter ? writeFileLink(state.unixFsFileLinkWriter, link) : undefined,
```

Nit: This feels more straightforward to me personally.

P.S.: See my other comments, which make this remark obsolete.

end,
}),
}
@@ -203,7 +224,7 @@
* @param {State<Layout>} state
* @returns {Update<Layout>}
*/
export const close = state => {
export const close = (state) => {
if (state.status === "open") {
const { chunks } = Chunker.close(state.chunker)
const { layout, ...write } = state.config.fileLayout.write(
@@ -269,7 +290,7 @@ export const close = state => {
* @param {API.EncoderSettings} config
*/
const encodeLeaves = (leaves, config) =>
leaves.map(leaf => encodeLeaf(config, leaf, config.fileChunkEncoder))
leaves.map((leaf) => encodeLeaf(config, leaf, config.fileChunkEncoder))

/**
* @param {API.EncoderSettings} config
@@ -286,6 +307,7 @@ const encodeLeaf = function* ({ hasher, linker }, { id, content }, encoder) {
const link = /** @type {UnixFS.FileLink} */ ({
cid,
contentByteLength: content ? content.byteLength : 0,
contentByteOffset: content ? content.byteOffset : 0,
Member Author commented:

Propagates the offset so consumers can map a link back to its byte offset in the original content.
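For illustration, a consumer might combine the propagated offset with `contentByteLength` to recover the byte range a leaf covers (a hypothetical sketch; it assumes `contentByteOffset` refers to the leaf's position within the original file content and that `FileLink` is exported as this PR proposes):

```ts
import type { FileLink } from "@ipld/unixfs"

// Slice the original content down to the range that a single leaf link encodes.
// Falls back to offset 0 when contentByteOffset is absent (pre-existing links).
function sliceForLink(content: Uint8Array, link: FileLink): Uint8Array {
  const start = link.contentByteOffset ?? 0
  return content.subarray(start, start + link.contentByteLength)
}
```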

dagByteLength: bytes.byteLength,
})

@@ -297,7 +319,7 @@ const encodeLeaf = function* ({ hasher, linker }, { id, content }, encoder) {
* @param {API.EncoderSettings} config
*/
const encodeBranches = (nodes, config) =>
nodes.map(node => encodeBranch(config, node))
nodes.map((node) => encodeBranch(config, node))

/**
* @template Layout
@@ -338,13 +360,30 @@ export const writeBlock = function* (writer, block) {
writer.write(block)
}

/**
* @param {API.UnixFsFileLinkWriter} writer
* @param {Layout.Link} link
* @returns {Task.Task<void, never>}
*/

export const writeFileLink = function* (writer, link) {
/* c8 ignore next 3 */
if (!writer) {
return
}
if ((writer.desiredSize || 0) <= 0) {
yield* Task.wait(writer.ready)
}
writer.write(link)
}

/**
*
* @param {Uint8Array|Chunker.Chunk} buffer
* @returns
*/

const asUint8Array = buffer =>
const asUint8Array = (buffer) =>
buffer instanceof Uint8Array
? buffer
: buffer.copyTo(new Uint8Array(buffer.byteLength), 0)
@@ -353,4 +392,4 @@
* @param {Layout.Node} node
* @returns {node is Layout.Leaf}
*/
const isLeafNode = node => node.children == null
const isLeafNode = (node) => node.children == null
14 changes: 12 additions & 2 deletions src/unixfs.ts
@@ -6,10 +6,15 @@ import type {
Link as IPLDLink,
Version as LinkVersion,
Block as IPLDBlock,
BlockView as IPLDBlockView
BlockView as IPLDBlockView,
} from "multiformats"
import { Data, type IData } from "../gen/unixfs.js"
export type { MultihashHasher, MultibaseEncoder, MultihashDigest, BlockEncoder }
export type {
MultihashHasher,
MultibaseEncoder,
MultihashDigest,
BlockEncoder,
}
export * as Layout from "./file/layout/api"

import NodeType = Data.DataType
@@ -161,6 +166,11 @@ export interface ContentDAGLink<T> extends DAGLink<T> {
* Total number of bytes in the file
*/
readonly contentByteLength: number

/**
* Byte offset of the content within the file
*/
readonly contentByteOffset?: number
}

/**