feat: add support for optional unixfs file link writer in create file #62

Open
wants to merge 6 commits into
base: main

vasco-santos
@vasco-santos vasco-santos commented May 13, 2025

Feature: Add Support for Optional UnixFsFileLinkWriter in createFile

This PR adds support for passing an optional UnixFsFileLinkWriter to the createFile function. This lets consumers extract structured FileLink metadata while writing UnixFS files, so that content-addressable data can eventually be served without transforming the data that is already stored.

What’s New

  • New Option: initOptions.unixFsFileLinkWriter
    • Allows callers of createFile to pass a UnixFsFileLinkWriter that receives a stream of FileLink objects as the file is chunked and encoded.
    • This makes it possible to build content-addressed metadata indexes alongside file creation, without reprocessing or transforming data that is already stored. Storing just the bytes of the root block is then enough to know where to start traversing the DAG.

Why This Matters

  • Enables advanced use cases such as bringing your own bucket and serving it as content-addressable data, without requiring data transformation.
  • Allows implementers to use the "--nocopy" style supported by kubo.
  • Improves integration with trustless gateways and verifiable blob streaming tools like @hash-stream.
  • Keeps the core createFile behavior unchanged when the writer is not provided.

Usage snippet

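import * as UnixFS from '@vascosantos/unixfs'
import {
  createWriter,
  createFileWriter,
} from '@vascosantos/unixfs'
// Assumed source of the `raw` codec used in the settings below;
// the snippet as posted did not import it
import * as raw from 'multiformats/codecs/raw'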

import { withMaxChunkSize } from '@vascosantos/unixfs/file/chunker/fixed'
import { withWidth } from '@vascosantos/unixfs/file/layout/balanced'

const defaultSettings = UnixFS.configure({
  fileChunkEncoder: raw,
  smallFileEncoder: raw,
  chunker: withMaxChunkSize(1024 * 1024),
  fileLayout: withWidth(1024),
})

/**
 * @param {Blob} blob
 * @returns {Promise<import('@vascosantos/unixfs').FileLink[]>}
 */
async function collectUnixFsFileLinks(blob) {
  const fileLinks = []

  // Create a stream to collect metadata (FileLinks)
  const { readable, writable } = new TransformStream()

  // Set up the main UnixFS writer (data goes nowhere here)
  const unixfsWriter = createWriter({
    // Discard the actual DAG output for this example.
    // In practice one should keep at least the last (root) block,
    // which holds the DAG structure needed to start traversing it
    writable: new WritableStream(),
    settings: defaultSettings,
  })

  // Set up the file writer with link metadata writer
  const unixFsFileLinkWriter = writable.getWriter()

  const fileWriter = createFileWriter({
    ...unixfsWriter,
    initOptions: {
      unixFsFileLinkWriter,
    },
  })

  // Start concurrent reading of the metadata stream
  const fileLinkReader = readable.getReader()
  const readLinks = (async () => {
    while (true) {
      const { done, value } = await fileLinkReader.read()
      if (done) break
      fileLinks.push(value)
    }
  })()

  // Pipe the blob to the file writer
  await blob.stream().pipeTo(
    new WritableStream({
      async write(chunk) {
        await fileWriter.write(chunk)
      },
    })
  )

  // Finalize everything
  await fileWriter.close()
  await unixfsWriter.close()
  await unixFsFileLinkWriter.close()

  // Wait for all links to be read
  await readLinks

  return fileLinks
}

// Usage
const blob = new Blob(['Hello UnixFS links'])
const links = await collectUnixFsFileLinks(blob)
console.log(links)
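
Once the links are collected, they can be turned into a simple byte-range index. The following is only a minimal sketch: it assumes each link exposes `cid` and `contentByteLength` (both appear in the diff in this PR) and that the array holds only leaf links, in file order. `buildRangeIndex` and `findLeaf` are hypothetical helpers, not part of the library, and plain strings stand in for real CIDs. Offsets are derived by accumulating `contentByteLength` rather than read from `contentByteOffset`.

```javascript
// Hypothetical helper: turn leaf links into [start, end) byte ranges.
// Assumes `links` holds only leaf links, in file order.
function buildRangeIndex(links) {
  let offset = 0
  return links.map((link) => {
    const entry = {
      start: offset,
      end: offset + link.contentByteLength,
      cid: link.cid,
    }
    offset = entry.end
    return entry
  })
}

// Hypothetical helper: find the leaf that contains a given byte position.
function findLeaf(index, position) {
  return index.find(
    (entry) => position >= entry.start && position < entry.end
  )
}

// Stand-in data: real links carry CID objects, strings are used for brevity.
const exampleIndex = buildRangeIndex([
  { cid: 'leaf-a', contentByteLength: 4 },
  { cid: 'leaf-b', contentByteLength: 6 },
])
```

With an index like this, a byte-range request can be resolved to the leaf blocks that cover it without re-chunking the original file.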

@vasco-santos vasco-santos marked this pull request as draft May 13, 2025 11:28
@vasco-santos vasco-santos force-pushed the feat/expose-link-metadata-writer-in-create-file branch from 8d68b24 to 2e2c1b7 Compare May 16, 2025 10:04
@vasco-santos vasco-santos force-pushed the feat/expose-link-metadata-writer-in-create-file branch from 2e2c1b7 to a33ec24 Compare May 16, 2025 10:07
@vasco-santos vasco-santos changed the title feat: expose link metadata writer in create file feat: expose unixfs file link metadata writer in create file May 21, 2025
@vasco-santos vasco-santos changed the title feat: expose unixfs file link metadata writer in create file feat: add support for optional unixfs file link writer in create file May 21, 2025
@vasco-santos vasco-santos marked this pull request as ready for review May 21, 2025 16:31
@vasco-santos vasco-santos force-pushed the feat/expose-link-metadata-writer-in-create-file branch from a6023f4 to cb99618 Compare May 21, 2025 16:33
@@ -286,6 +307,7 @@ const encodeLeaf = function* ({ hasher, linker }, { id, content }, encoder) {
const link = /** @type {UnixFS.FileLink} */ ({
cid,
contentByteLength: content ? content.byteLength : 0,
contentByteOffset: content ? content.byteOffset : 0,

Propagates the offset so that each link can be mapped back to its byte offset in the original content.
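
For context, `byteOffset` on a typed array is measured from the start of its underlying `ArrayBuffer`. The sketch below shows only that standard behaviour; whether the chunker hands out views over one shared buffer or separate copies is not stated in this PR, so treat the mapping to file offsets as an assumption to verify.

```javascript
// One backing buffer holding 10 bytes of "file" content.
const buffer = new ArrayBuffer(10)

// Two chunks created as views over the same buffer (no copying).
const first = new Uint8Array(buffer, 0, 4)
const second = new Uint8Array(buffer, 4, 6)

// A copied chunk gets its own buffer, so its offset starts again at 0.
const copied = second.slice()

const offsets = {
  first: first.byteOffset,
  second: second.byteOffset,
  copied: copied.byteOffset,
}
```

If chunks are copies rather than views, `content.byteOffset` would not reflect the position in the original file, and the offset would need to be tracked separately.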

@vasco-santos

cc @alanshaw @Gozala can any of you please have a look here? 🙏🏼

@Gozala Gozala left a comment

I have provided my feedback, reflecting how I would prefer to go about this feature. Specifically, I think there is a simpler path that does not introduce extra complexity, and if I were maintaining this code I would request the proposed changes so that the project is easier to maintain. However, I'm not maintaining this code and therefore don't feel it's my call to make, so my comments should be treated as recommendations and nothing more.

Comment on lines +200 to +216
if (!state.unixFsFileLinkWriter) {
return {
state: newState,
effect: Task.listen({
link: Task.effects(tasks),
block: writeBlock(state.writer, block),
end,
}),
}
}

return {
state: newState,
effect: Task.listen({
link: Task.effects(tasks),
block: writeBlock(state.writer, block),
fileLink: writeFileLink(state.unixFsFileLinkWriter, link),

@Gozala Gozala Jun 4, 2025

Suggested change
- if (!state.unixFsFileLinkWriter) {
-   return {
-     state: newState,
-     effect: Task.listen({
-       link: Task.effects(tasks),
-       block: writeBlock(state.writer, block),
-       end,
-     }),
-   }
- }
- return {
-   state: newState,
-   effect: Task.listen({
-     link: Task.effects(tasks),
-     block: writeBlock(state.writer, block),
-     fileLink: writeFileLink(state.unixFsFileLinkWriter, link),
+ return {
+   state: newState,
+   effect: Task.listen({
+     link: Task.effects(tasks),
+     block: writeBlock(state.writer, block),
+     fileLink: state.unixFsFileLinkWriter ? writeFileLink(state.unixFsFileLinkWriter, link) : undefined,

Nit: This feels more straightforward to me personally.

P.S.: See my other comments, which make this remark obsolete.

// Set up the file writer with link metadata writer
const unixFsFileLinkWriter = writable.getWriter()

const fileWriter = createFileWriter({

I would personally suggest a different API design. Specifically, instead of passing another stream to write links to, I would simply define something like:

interface FileFragment extends Block {
  link: UnixFS.FileLink
}

That way current consumers can continue using the API as is, while consumers interested in links can read .link to get them. Streams also have a .tee() method, which allows two different consumers if desired.
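
A minimal sketch of this idea, assuming a hypothetical fragment shape in which each emitted block also carries its `link`. The `fragments` data, `collect`, and `splitFragments` are illustrative names only, and plain strings stand in for real CIDs and bytes. `ReadableStream.prototype.tee()` is the standard Web Streams method.

```javascript
// Hypothetical fragment shape: a block plus the link that describes it.
const fragments = [
  { cid: 'cid-1', bytes: 'aaaa', link: { cid: 'cid-1', contentByteLength: 4 } },
  { cid: 'cid-2', bytes: 'bb', link: { cid: 'cid-2', contentByteLength: 2 } },
]

// Drain a ReadableStream into an array, applying `pick` to each value.
async function collect(stream, pick) {
  const out = []
  const reader = stream.getReader()
  while (true) {
    const { done, value } = await reader.read()
    if (done) return out
    out.push(pick(value))
  }
}

async function splitFragments(items) {
  const source = new ReadableStream({
    start(controller) {
      for (const item of items) controller.enqueue(item)
      controller.close()
    },
  })
  // tee() yields two independent branches over the same fragments.
  const [forBlocks, forLinks] = source.tee()
  const [blocks, links] = await Promise.all([
    // Existing consumers keep treating each value as a plain block.
    collect(forBlocks, ({ cid, bytes }) => ({ cid, bytes })),
    // Consumers interested in metadata read only `.link`.
    collect(forLinks, (fragment) => fragment.link),
  ])
  return { blocks, links }
}
```

Both branches are read concurrently here so that neither one has to buffer the whole stream while waiting for the other.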

I kind of regret not simply making this a stream of EncodedFile, in which case we would not have this problem in the first place. Switching to that as a breaking change could be another way to introduce this. You could still have something like createFileBlockWriter for old code, and new code could use createFileWriter.

Actually, thinking more about it, I think it would be best to:

  1. Add createEncodedFileWriter, which takes a Writer<EncodedFile>.
  2. Add createFileBlockWriter, which basically uses createEncodedFileWriter and maps values to .block.
  3. Keep createFileWriter as an alias with a deprecation message suggesting the use of one of the two above.

I think this would create a good migration path without having to introduce additional code paths or complexity.
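
The three steps could be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: writers are reduced to hypothetical `{ write, close }` objects, `mapWriter` is an invented helper, and the "encoder" is a toy that fabricates string CIDs instead of hashing anything.

```javascript
// Hypothetical helper: transform each value before passing it downstream.
function mapWriter(writer, fn) {
  return {
    write: (value) => writer.write(fn(value)),
    close: () => writer.close(),
  }
}

// Step 1 (toy stand-in): emits one { block, link } value per chunk written.
function createEncodedFileWriter({ writer }) {
  let offset = 0
  return {
    async write(bytes) {
      const block = { cid: `cid-${offset}`, bytes }
      const link = { cid: block.cid, contentByteLength: bytes.length }
      offset += bytes.length
      await writer.write({ block, link })
    },
    close: () => writer.close(),
  }
}

// Step 2: same encoder, but the downstream writer only ever sees blocks.
function createFileBlockWriter({ writer }) {
  return createEncodedFileWriter({
    writer: mapWriter(writer, (file) => file.block),
  })
}

// Step 3: deprecated alias kept so existing callers continue to work.
function createFileWriter(options) {
  console.warn(
    'createFileWriter is deprecated: use createEncodedFileWriter or createFileBlockWriter'
  )
  return createFileBlockWriter(options)
}
```

The point of the shape is that only one encoding code path exists; the block-only writer is a thin mapping over it rather than a second branch.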
