-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add content reader to decouple NFS file reads [RHELDST-26339] #643
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -16,6 +16,8 @@ | |||||
optional, | ||||||
optional_str, | ||||||
) | ||||||
from ..reader import PushItemReader | ||||||
from ..utils.openers import open_src_local | ||||||
|
||||||
|
||||||
LOG = logging.getLogger("pushsource") | ||||||
|
@@ -178,6 +180,10 @@ def _default_build_info(self): | |||||
doesn't enforce this. | ||||||
""" | ||||||
|
||||||
opener = attr.ib(type=callable, default=open_src_local, repr=False) | ||||||
"""Callable that gets the content of this push item. The content could be | ||||||
retrived from `content()` method""" | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this also needs to be a bit more specific regarding the contract that an "opener" needs to satisfy. I would write something like: The opener, when given a push item, should return a file-like object suitable for reading this item's bytes. |
||||||
|
||||||
def with_checksums(self): | ||||||
"""Return a copy of this push item with checksums present. | ||||||
|
||||||
|
@@ -248,3 +254,12 @@ def with_checksums(self): | |||||
updated_sums[attribute] = hasher.hexdigest() | ||||||
|
||||||
return attr.evolve(self, **updated_sums) | ||||||
|
||||||
def content(self): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure this should be a property, I think a plain method fits better. It will work either way, but I think developers usually have an expectation that properties are cheap, and not subject to failure. It doesn't fit this case well, as this could potentially have to do HTTP requests or other slow and error-prone operations. It also returns a different object on each call, which is not how a property would usually work. For instance someone might write code like this, intending to read the content in chunks: while chunk := item.content.read():
# oops, this is an infinite loop because every iteration
# opens a new file and reads from the beginning! This code is wrong but it doesn't look wrong. I think the above would more clearly stand out as wrong if it were a method rather than a property. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for pointing this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried caching it but static failed with "W0642: Invalid assignment to self in method (self-cls-assignment) "
So, I though of using a weakref.WeakKeyDictionary() as cache and keep the reader for each instance but then it might return a new object if garbage collected and the cache size might grow too large. Hence, I left it as-is assuming it's implied that it will return a new object now that it's not an attribute and is not bound/expected to remain same. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I think the way you've got it now is fine. |
||||||
"""Returns a read-only, non-seekable content of this push item. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should probably elaborate on this a bit with an example, and also clarify what's expected for types of push items not designed to be read in this way. Here's my attempt at fleshing out the docs a bit:
|
||||||
|
||||||
Returns: | ||||||
:class:`~io.BufferedReader` | ||||||
A non-seekable object of the push item content | ||||||
""" | ||||||
return PushItemReader(self.opener(self)) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As mentioned in my proposed doc addition, I think we need to make this also support a "return None" case, given there are many types of push items where this doesn't make sense. My suggestion is that One thing I'm not sure of: should At the moment, I'm leaning towards saying it should default to |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
from io import BufferedReader, SEEK_SET, UnsupportedOperation | ||
|
||
|
||
class PushItemReader(BufferedReader): | ||
# Internal class to ensure that the file-like content object returned by | ||
# the push items are read-only and non-seekable with a name attribute. | ||
def __init__(self, raw, name=None, **kwargs): | ||
super(PushItemReader, self).__init__(raw, **kwargs) | ||
|
||
# Attempt to assign name from the raw object if none is provided | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that it might be better to drop this, and to just always force a The way you have it now, when combined with the current I believe a lot of io streams do not have the |
||
self._name = name or getattr(super(), "name", None) | ||
if not self._name: | ||
raise ValueError("'name' not provided or availble from 'raw' object") | ||
|
||
@property | ||
def name(self): | ||
return self._name or super().name() | ||
|
||
def seekable(self): | ||
return False | ||
|
||
def seek(self, offset, whence=SEEK_SET): | ||
raise UnsupportedOperation(f"Seek unsupported while reading {self.name}") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
def open_src_local(item): | ||
# default opener for the push items | ||
# assumes that the item's 'src' points to the | ||
# locally-accessible file | ||
return open(item.src, "rb") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to just filter these out rather than dumping them. (i.e. do not show
opener
field here, same as we don't show it inrepr
.)Firstly because there is no way
opener
can survive a round-trip anyway, since we only serialize the callable's name and not the code. I'm pretty sure prior to this change, you can take the output of "pushsource-ls" and paste it into a python file and it'll actually work to construct push items, but that will be broken by this.On top of that, this leaks some internal implementation details.
open_src_local
isn't public API, but its name is going to become visible here if we don't filter it out. If people actually use the serializedopener
field for something, that can potentially block us from renaming or refactoring it later.