recreate() opens remote sinks in r+b mode, breaking simplecache:: write caching #1581

@MoAly98

The problem

uproot.recreate() with a simplecache:: path writes data to the local cache but never uploads it to the remote server. From some digging, the cause appears to be that uproot opens the sink in r+b mode, while simplecache only triggers its upload-on-close mechanism for wb mode (see https://filesystem-spec.readthedocs.io/en/latest/features.html#remote-write-caching).

Reproducer

import uproot
import fsspec

path = "simplecache::root://server//store/user/test/output.root"
raw = path.replace("simplecache::", "")

# Write with simplecache — appears to succeed
with uproot.recreate(path) as f:
    f["Events"] = {"x": [1, 2, 3]}
print("WRITE: no exception")

# But data never reaches the remote storage, although a file is created
try:
    with uproot.open(f"{raw}:Events") as t:
        print(f"READ: {t.num_entries} entries")
except Exception as e:
    # Fails with a zero-byte read error
    print(f"READ: failed — {e}")

How I tried to disentangle the problem

I tried to figure out why the file gets created on the remote storage but ends up empty/corrupted. To do that, I traced calls to the methods that can create a file on the remote server without writing anything to it (namely mkdir/mkdirs/makedirs and touch), and used unittest.mock.patch to trace fsspec.open. It is hacky, but it was insightful:

import uproot
import fsspec
from unittest.mock import patch
from fsspec_xrootd.xrootd import XRootDFileSystem

path = "simplecache::root://server//store/user/test/trace_output.root"

# Trace XRootDFileSystem operations and fsspec.open
traced_methods = ["mkdirs", "touch", "mkdir", "makedirs"]
originals = {}

def make_traced(name, original):
    def traced(self, *args, **kwargs):
        path_arg = args[0] if args else ""
        print(f"  FS.{name}({path_arg!r})")
        return original(self, *args, **kwargs)
    return traced

for method_name in traced_methods:
    original = getattr(XRootDFileSystem, method_name, None)
    if original is not None:
        originals[method_name] = original
        setattr(XRootDFileSystem, method_name, make_traced(method_name, original))

original_fsspec_open = fsspec.open
def traced_fsspec_open(p, *args, **kwargs):
    mode = args[0] if args else kwargs.get("mode", "?")
    print(f"  fsspec.open({p!r}, mode={mode!r})")
    return original_fsspec_open(p, *args, **kwargs)

try:
    with patch("fsspec.open", traced_fsspec_open):
        with uproot.recreate(path) as f:
            f["Events"] = {"x": [1, 2, 3]}
finally:
    for method_name, original in originals.items():
        setattr(XRootDFileSystem, method_name, original)

# Output:
#   FS.mkdirs('/store/user/test') --> creates parent dir on remote
#   FS.touch('/store/user/test/trace_output.root') --> creates 0-byte file on remote
#   fsspec.open('simplecache::root://...', mode='r+b')  --> opens in r+b, not wb

uproot opens the sink in r+b mode. With simplecache, this opens a local cached copy for reading and writing. Writes go to the local file, but I think simplecache does not upload on close because it does not treat r+b as a write mode.
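
To make the suspected mode check concrete, here is a deliberately simplified model of it. This is not fsspec's code: `takes_upload_path` is a hypothetical helper, and the rule it encodes (any mode containing "r" is served from a local copy with no upload on close) is an assumption based on the trace above, not something confirmed against the simplecache source.

```python
# Simplified model of the suspected simplecache mode check.
# ASSUMPTION: a mode containing "r" opens a local cached copy and is never
# uploaded; a mode without "r" goes to a temp file that is uploaded on close.
# `takes_upload_path` is a hypothetical name used only for illustration.


def takes_upload_path(mode: str) -> bool:
    """Return True if the mode would be treated as a pure write."""
    return "r" not in mode


for mode in ("wb", "rb", "r+b"):
    if takes_upload_path(mode):
        branch = "temp file, uploaded on close"
    else:
        branch = "local cached copy, no upload"
    print(f"{mode!r}: {branch}")
```

Under this model, r+b lands in the same branch as rb, which would match the observed behaviour: the local copy is modified and the remote file stays at zero bytes.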

Expected behaviour

For simplecache:: paths, maybe recreate() should open in wb mode so that simplecache uploads on close?
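
Until that is decided, one possible workaround is to write the ROOT file to local disk first and then stream it into a sink that was opened in wb mode. The sketch below only covers the copy step and uses the standard library; `copy_to_sink` is a hypothetical helper, and the in-memory sink stands in for the remote file.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO


def copy_to_sink(local_path: str, sink: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Stream a finished local file into an open, writable binary sink.

    Returns the number of bytes read from the local file.
    """
    with open(local_path, "rb") as src:
        shutil.copyfileobj(src, sink, chunk_size)
        return src.tell()  # position is at end of file after the copy


# Demo: a throwaway local file and an in-memory sink as the "remote" side.
with tempfile.NamedTemporaryFile(suffix=".root", delete=False) as tmp:
    tmp.write(b"not a real ROOT file")

sink = io.BytesIO()
copied = copy_to_sink(tmp.name, sink)
os.remove(tmp.name)
print(copied, sink.getvalue())
```

In practice the local file would come from `uproot.recreate()` on a local path, and the sink from something like `fsspec.open("root://server//store/user/test/output.root", "wb")`. That combination is untested here, and whether it works over XRootD depends on write support in fsspec-xrootd.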

Some other concerns

Writing files to remote storage via simplecache:: silently produces empty files. No exception is raised, which made it a pain to debug. The empty file is visible on the remote server because uproot calls mkdirs() + touch() on the underlying remote filesystem before opening. These create a 0-byte file that is never overwritten with actual data since the upload never happens.

Note: for simplecache::root:// specifically, there might be a missing piece in fsspec-xrootd that needs to be implemented. Writes through S3 seem to work with simplecache, but writes through XRootD do not. I opened a separate issue there.

cc: @oshadura @alexander-held @nsmith- @ariostas

Labels: bug (unverified). The problem described would be a bug, but needs to be triaged.
