suggestion for reading subcrates #244
I had a quick look and I quite like this implementation at first glance. Some extra suggestions:
Thanks for the quick feedback! I have pushed a few more commits to allow things like

For the logging, it seems it's not used in the codebase yet. Should I just use the logging package, initializing a default logger instance like

Now I am wondering if a

Happy to discuss this in the drop-in call tomorrow 😉

EDIT: I also committed a .pre-commit-config.yaml file to help enforce flake8 syntax; it could be removed before merging, of course.
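On the logging question above, a minimal sketch of the usual standard-library pattern for library code, offered as one option rather than what the PR does: a module-level logger with a NullHandler, leaving configuration to the application. The logger name `rocrate_sketch` and the helper `load_subcrate` are hypothetical and exist only for illustration.

```python
import logging

# Library side: a module-level logger. The name here is hypothetical;
# logging.getLogger(__name__) is the usual choice inside a package.
logger = logging.getLogger("rocrate_sketch")
# NullHandler keeps the library silent unless the application opts in.
logger.addHandler(logging.NullHandler())


def load_subcrate(path):
    """Hypothetical helper, only here to show where a log call would go."""
    logger.debug("Loading subcrate from %s", path)
    return {"path": path}


if __name__ == "__main__":
    # Application side: enable output. The library never calls basicConfig.
    logging.basicConfig(level=logging.DEBUG)
    load_subcrate("subcrate/")
```

With this layout, users who do not configure logging see no output, and users who do get the debug messages through their own handlers.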
I went through the code and did some testing, which exposed problems.

Without passing parse_subcrate:
>>> from rocrate.rocrate import ROCrate
>>> crate = ROCrate("test/test-data/crate_with_subcrate")
>>> d = crate.get("subcrate/")
>>> d
<subcrate/ Dataset>
>>> d.get("conformsTo")
'https://w3id.org/ro/crate/'
>>> crate.write("/tmp/crate")

With parse_subcrate=True:
>>> from rocrate.rocrate import ROCrate
>>> crate = ROCrate("test/test-data/crate_with_subcrate", parse_subcrate=True)
>>> d = crate.get("subcrate/")
>>> d
<subcrate/ Dataset>
>>> d.get("conformsTo") # this fails
>>> crate.write("/tmp/crate")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/simleo/git/ro-crate-py/rocrate/rocrate.py", line 567, in write
writable_entity.write(base_path)
File "/home/simleo/git/ro-crate-py/rocrate/model/metadata.py", line 97, in write
super()._write_from_stream(write_path)
File "/home/simleo/git/ro-crate-py/rocrate/model/file.py", line 63, in _write_from_stream
for _, chunk in self.stream():
File "/home/simleo/git/ro-crate-py/rocrate/model/metadata.py", line 90, in stream
yield self.id, str.encode(json.dumps(content, indent=4, sort_keys=True), encoding='utf-8')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/json/__init__.py", line 238, in dumps
**kw).encode(obj)
^^^^^^^^^^^
File "/usr/lib/python3.12/json/encoder.py", line 202, in encode
chunks = list(chunks)
^^^^^^^^^^^^
File "/usr/lib/python3.12/json/encoder.py", line 432, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/usr/lib/python3.12/json/encoder.py", line 406, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.12/json/encoder.py", line 326, in _iterencode_list
yield from chunks
File "/usr/lib/python3.12/json/encoder.py", line 406, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.12/json/encoder.py", line 326, in _iterencode_list
yield from chunks
File "/usr/lib/python3.12/json/encoder.py", line 439, in _iterencode
o = _default(o)
^^^^^^^^^^^
File "/usr/lib/python3.12/json/encoder.py", line 180, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type File is not JSON serializable

I think this has something to do with the hacking of
>>> from rocrate.rocrate import ROCrate
>>> crate = ROCrate("test/test-data/crate_with_subcrate", parse_subcrate=True)
>>> subcrate = crate.get("subcrate/")
>>> subcrate.subcrate
>>> subcrate.get("conformsTo")
>>> subcrate.subcrate
<rocrate.rocrate.ROCrate object at 0x7b6edf975160>
>>> subcrate.subcrate.subcrate_entities
[<subsubcrate/ Dataset>]
>>> subcrate.get("conformsTo")
>>>

The new structure looks very confusing. I have not had the time to work on it yet, but I have the feeling that subcrate support could / should be done without using a special

The main thing I would like to point out about the current solution is that it hacks critical sections of the code, so getting it right is harder than it looks.
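One possible reading of the traceback above, not verified against the PR code: entity objects end up inside the metadata where plain references are expected, and json.dumps cannot encode them. The stand-alone sketch below reproduces that class of error and shows one way around it. The `Entity` class and the `to_reference` helper are hypothetical and are not part of ro-crate-py.

```python
import json


class Entity:
    """Hypothetical stand-in for a crate entity; not the ro-crate-py class."""

    def __init__(self, identifier):
        self.id = identifier


def to_reference(value):
    """Recursively replace Entity objects with {"@id": ...} references."""
    if isinstance(value, Entity):
        return {"@id": value.id}
    if isinstance(value, list):
        return [to_reference(v) for v in value]
    if isinstance(value, dict):
        return {k: to_reference(v) for k, v in value.items()}
    return value


# A graph that still holds a live object under hasPart.
graph = [{"@id": "subcrate/", "hasPart": [Entity("subcrate/subfile.txt")]}]

try:
    json.dumps(graph)
except TypeError as exc:
    # Same kind of failure as in the traceback: the encoder has no rule
    # for arbitrary Python objects.
    error_message = str(exc)

# After converting objects to references, serialization succeeds.
serialized = json.dumps(to_reference(graph), sort_keys=True)
```

If something like this is the cause, the fix would belong wherever the subcrate entities are injected into the parent crate's properties, so that only references are stored there.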
also fix flake8 with precommit
also prevents directly accessing items listed in subcrate under hasPart e.g subcrate.get("subfile.txt")
Thanks for the feedback, Simone. I added a couple of changes. First, I made sure any attribute on the original dataset entity (such as

I also changed the behaviour a bit, such that the

The
Hi guys,
live from the deNBI hackathon here. I have been playing with reading an entity referencing a subcrate, i.e. traversing the graph from the top crate to a subcrate, as suggested in the 1.2 spec.
I am proposing a simple approach here, with a new Subcrate class extending the Dataset class. I defined this class in the main rocrate.py file, since in models.py it would cause circular dependencies.
This would allow things like
(see added tests too)
At this point I am mostly interested to know if you think that could be a viable approach before going further.
The implementation is such that the subcrate is only loaded when one of its attributes is accessed, to avoid potentially loading a large amount of metadata, as one purpose of the subcrate is also to reduce the amount of information in the main crate.
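The lazy-loading idea described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `Dataset` here is a bare stand-in for the library class, and `LazySubcrate`, `loaded`, `graph` and `demo` are hypothetical names. The only detail taken from the RO-Crate spec is the metadata file name, ro-crate-metadata.json.

```python
import json
import tempfile
from pathlib import Path


class Dataset:
    """Hypothetical stand-in for the library's Dataset class."""

    def __init__(self, source):
        self.source = Path(source)


class LazySubcrate(Dataset):
    """Dataset whose nested metadata is read only on first access."""

    def __init__(self, source):
        super().__init__(source)
        self._graph = None  # nothing is read from disk at construction time

    @property
    def loaded(self):
        return self._graph is not None

    @property
    def graph(self):
        # First access triggers the read; later accesses reuse the result.
        if self._graph is None:
            metadata_path = self.source / "ro-crate-metadata.json"
            with open(metadata_path, encoding="utf-8") as f:
                self._graph = json.load(f)["@graph"]
        return self._graph


def demo():
    """Build a tiny crate in a temp dir and show when loading happens."""
    with tempfile.TemporaryDirectory() as tmp:
        metadata = {"@graph": [{"@id": "./", "@type": "Dataset"}]}
        (Path(tmp) / "ro-crate-metadata.json").write_text(
            json.dumps(metadata), encoding="utf-8"
        )
        sub = LazySubcrate(tmp)
        before = sub.loaded                  # False: not read yet
        ids = [e["@id"] for e in sub.graph]  # triggers the read
        return before, sub.loaded, ids
```

Keeping the loaded state behind a dedicated property, rather than overriding generic attribute access, leaves the entity's own properties (such as conformsTo) untouched.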