-
Notifications
You must be signed in to change notification settings - Fork 2
Add initial remote table registration implementation #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This worked as expected to produce a .tiledb file from my input .csv. I was able to use internal tooling to register this as an OMERO.table. This is a great workflow 🎉 ! In trying
I tried to pass in a dataframe instead using
I can fix both of the above by including a I do not get any links (to ROI or Image, IDs passed in above) within OMERO. I believe this is consistent with what you described:
So currently why would I use |
Thanks @erindiel
Yup, since we don't pre-scan tables in this mode we needed a default chunk size. That was the cause of both errors.
Yes
Realistically you wouldn't 😃 I anticipate that a later version of this functionality will perform the registration, so I've put the functionality in place ready for it. Strictly speaking people can import the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes resolved the issue with not specifying chunk_size
for upload_table()
. Looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of nitpicks. Worked for me on a couple of simple tests.
LOGGER.debug(f"Remote path would be {str(remote_path)}") | ||
if write_path.exists(): | ||
raise ValueError(f"Table file {write_path} already exists") | ||
# path.as_uri() exists but mangles any spaces in the path! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this comment means. Is it an explanation of why you use str(write_path)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. tiledb.from_pandas
' first argument is formally called uri
, but doesn't seem to handle escaped special characters like spaces. I wanted to head off people asking "why don't you just use the Pathlib to_uri
method?"
# Use a default chunk size if not set | ||
chunk_size = chunk_size or 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably just use a default value in the function signature instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd normally do this. However in this context we may receive chunk_size=None
from the higher level omero2pandas.upload_table
function. This is used to indicate that the chunk size should be calculated automatically. With local tiledb we don't need to worry about ice message size limits so we don't run the pre-scan that figures out chunk size like we do with normal table uploads, so this line simply grants a fallback default.
Nonetheless I'll add some defaults to this function's signature so that people using it in isolation have an easier time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One README suggestion to clarify the current scope of this extension. Otherwise, happy to get this in
This PR begins work towards a mechanism for registering tables without uploading the data to the OMERO Managed Repository, so as to allow users to store table data elsewhere.
Per internal discussions this initial version operates offline. omero2pandas will create a local tiledb file and return the path, which the user should register with OMERO using external tooling. Future work will add a system for doing this registration automatically.
I've tried to match the omero-plus schema as closely as possible. Some testing for compatibility needs to be done.
N.b. For any of this to be useful the OMERO server must have the TileDB OMERO.tables backend installed. PyTables is the default backend.
Installation
Usage
Due to constraints in the API this will require an OMERO login. To test offline do the following:
chunk_size
denotes how much data to load/save in a single operation,remote_path
will eventually be used to provide a server-visible version oflocal_path
when performing the upload from a different machine than the OMERO server.