Add content reader to decouple NFS file reads [RHELDST-26339] #643

Draft
wants to merge 2 commits into
base: master
5 changes: 5 additions & 0 deletions src/pushsource/_impl/list_cmd.py
Expand Up @@ -48,6 +48,10 @@ def represent_enum(cls, dumper: yaml.Dumper, value: enum.Enum):
# same as a plain string "foo".
return dumper.represent_data(value.value)

@classmethod
def represent_callable(cls, dumper: yaml.Dumper, value: callable):
return dumper.represent_data(getattr(value, "__name__", repr(value)))

@classmethod
def add_enum_representers(cls):
# Register our enum representer for any enum classes in the API.
Expand All @@ -59,6 +63,7 @@ def add_enum_representers(cls):

ItemDumper.add_enum_representers()
ItemDumper.add_representer(frozendict, ItemDumper.represent_dict)
ItemDumper.add_representer(type(lambda: None), ItemDumper.represent_callable)
Member

I would prefer to just filter these out rather than dumping them. (i.e. do not show opener field here, same as we don't show it in repr.)

Firstly because there is no way opener can survive a round-trip anyway, since we only serialize the callable's name and not the code. I'm pretty sure prior to this change, you can take the output of "pushsource-ls" and paste it into a python file and it'll actually work to construct push items, but that will be broken by this.

On top of that, this leaks some internal implementation details. open_src_local isn't public API, but its name is going to become visible here if we don't filter it out. If people actually use the serialized opener field for something, that can potentially block us from renaming or refactoring it later.
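
A minimal sketch of the filtering suggested above: drop callable-valued fields before the item reaches the dumper, so the opener never appears in the output. The `Item` dataclass and the `dumpable_fields` helper are hypothetical stand-ins for illustration (the real model uses attrs, and the actual dumper code may differ).

```python
import dataclasses
from typing import Any, Callable, Dict, Optional


def open_src_local(item):
    # Stand-in for the internal default opener added in this PR.
    return open(item.src, "rb")


@dataclasses.dataclass(frozen=True)
class Item:
    # Hypothetical, simplified push item; the real class uses attrs.
    name: str
    src: Optional[str] = None
    opener: Optional[Callable[..., Any]] = open_src_local


def dumpable_fields(item) -> Dict[str, Any]:
    # Keep only fields holding plain data, so internal callables such as
    # the opener are never serialized.
    return {
        f.name: getattr(item, f.name)
        for f in dataclasses.fields(item)
        if not callable(getattr(item, f.name))
    }
```

With this approach the baseline YAML files would not need an `opener:` line at all.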



def format_python(item):
Expand Down
15 changes: 15 additions & 0 deletions src/pushsource/_impl/model/base.py
Expand Up @@ -16,6 +16,8 @@
optional,
optional_str,
)
from ..reader import PushItemReader
from ..utils.openers import open_src_local


LOG = logging.getLogger("pushsource")
Expand Down Expand Up @@ -178,6 +180,10 @@ def _default_build_info(self):
doesn't enforce this.
"""

opener = attr.ib(type=callable, default=open_src_local, repr=False)
"""Callable that gets the content of this push item. The content could be
retrived from `content()` method"""
Member

Suggested change
retrived from `content()` method"""
retrieved from `content()` method"""

Member

I think this also needs to be a bit more specific regarding the contract that an "opener" needs to satisfy.

I would write something like:

The opener, when given a push item, should return a file-like object suitable for reading this item's bytes.
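
To illustrate that contract, here is a small sketch of an opener that does not touch the filesystem. `FakeItem` and `open_in_memory` are hypothetical names used only for this example; they are not part of pushsource.

```python
import io


class FakeItem:
    # Hypothetical stand-in for a push item; only `name` and `payload` are used.
    def __init__(self, name, payload):
        self.name = name
        self.payload = payload


def open_in_memory(item):
    # Satisfies the proposed contract: given a push item, return a
    # file-like object from which the item's bytes can be read.
    return io.BytesIO(item.payload)
```

Any callable with this shape (item in, readable binary stream out) would be usable as an opener under the wording proposed above.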


def with_checksums(self):
"""Return a copy of this push item with checksums present.

Expand Down Expand Up @@ -248,3 +254,12 @@ def with_checksums(self):
updated_sums[attribute] = hasher.hexdigest()

return attr.evolve(self, **updated_sums)

def content(self):
Member

I'm not sure this should be a property, I think a plain method fits better.

It will work either way, but I think developers usually have an expectation that properties are cheap, and not subject to failure. It doesn't fit this case well, as this could potentially have to do HTTP requests or other slow and error-prone operations.

It also returns a different object on each call, which is not how a property would usually work. For instance someone might write code like this, intending to read the content in chunks:

while chunk := item.content.read():
  # oops, this is an infinite loop because every iteration
  # opens a new file and reads from the beginning!

This code is wrong but it doesn't look wrong. I think the above would more clearly stand out as wrong if it were a method rather than a property.
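
For contrast, a sketch of what a correct chunked read might look like when `content` is a method: the stream is obtained once and the loop reads from that same object. The `Item` class here is a hypothetical stand-in whose `content()` returns a new stream on each call, mirroring the behavior discussed above.

```python
import io


class Item:
    # Hypothetical push item whose content() opens a fresh stream per call.
    def __init__(self, payload):
        self._payload = payload

    def content(self):
        return io.BytesIO(self._payload)


def read_in_chunks(item, chunk_size=4):
    # Call content() exactly once, then keep reading from that one stream.
    chunks = []
    with item.content() as stream:
        while chunk := stream.read(chunk_size):
            chunks.append(chunk)
    return chunks
```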

Contributor Author

Thanks for pointing this out. FileContentReader uses the same file object; however, I missed that this created a new FileContentReader every time.

Contributor Author

@rajulkumar rajulkumar Jan 17, 2025

I tried caching it, but the static check failed with "W0642: Invalid assignment to self in method (self-cls-assignment)":

if not getattr(self, "_content", None):
    self = attr.evolve(self, _content=PushItemReader(self.opener(self)))
return self._content

So, I thought of using a weakref.WeakKeyDictionary() as a cache to keep the reader for each instance, but then it might return a new object if garbage collected, and the cache might grow too large.

Hence, I left it as-is, assuming it's implied that it will return a new object now that it's not an attribute and is not bound/expected to remain the same.

Member

Yes, I think the way you've got it now is fine.

"""Returns a read-only, non-seekable content of this push item.
Member

I think we should probably elaborate on this a bit with an example, and also clarify what's expected for types of push items not designed to be read in this way.

Here's my attempt at fleshing out the docs a bit:

For push items representing a single file, content will obtain a stream for reading that file's bytes;
for example, RpmPushItem.content can be used to read the content of an RPM; VMIPushItem.content
can be used to read the content of a VMI and so on.

Not every type of push item can be read in this way. For example, a single ContainerPushItem
may represent any number of artifacts making up a container image, to be accessed from a container
image registry in the usual way - not using this method. Other types of push items such as
ErratumPushItem may be understood as metadata-only items and do not themselves have any content. For items such as these, this method will return None.


Returns:
:class:`~io.BufferedReader`
A non-seekable object of the push item content
"""
return PushItemReader(self.opener(self))
Member

As mentioned in my proposed doc addition, I think we need to make this also support a "return None" case, given there are many types of push items where this doesn't make sense.

My suggestion is that opener should be allowed to be None. If it is None, then content should also return None. Then set up non-openable types such as ContainerPushItem to initialize opener to None.

One thing I'm not sure of: should opener on the base class default to open_src_local and then be overridden to None in non-openable types? Or should it default to None in the base class and then be overridden to open_src_local in openable types?

At the moment, I'm leaning towards saying it should default to None in the base class.
The reason is that, when someone introduces a new PushItem subclass, I think it makes sense for the default behavior to be "don't support reading - unless the developer thinks about it and implements it". If it instead defaults to open_src_local on the base class then the default behavior will be "try to read, and crash if that doesn't make sense". This will probably lead to PushItem subclasses where content doesn't make sense but it has not been gracefully overridden to return None.
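
A rough sketch of the "default to None on the base class" option, to make the trade-off concrete. All classes here are simplified, hypothetical stand-ins built with dataclasses rather than attrs, and `open_fake` is an invented opener used only so the example runs without touching the filesystem.

```python
import dataclasses
import io
from typing import Any, Callable, Optional


def open_fake(item):
    # Invented opener for illustration; a real one would open item.src.
    return io.BytesIO(b"content of " + item.name.encode())


@dataclasses.dataclass(frozen=True)
class PushItem:
    # Hypothetical simplified base class: reading is unsupported by default.
    name: str
    opener: Optional[Callable[[Any], Any]] = None

    def content(self):
        # No opener means this item type has no readable content.
        if self.opener is None:
            return None
        return self.opener(self)


@dataclasses.dataclass(frozen=True)
class FilePushItem(PushItem):
    # Openable type: opts in to reading by overriding the default opener.
    opener: Optional[Callable[[Any], Any]] = open_fake


@dataclasses.dataclass(frozen=True)
class ContainerPushItem(PushItem):
    # Non-openable type: inherits opener=None, so content() returns None.
    pass
```

Under this layout a new subclass that never considers reading simply returns None from `content()` instead of crashing.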

23 changes: 23 additions & 0 deletions src/pushsource/_impl/reader.py
@@ -0,0 +1,23 @@
from io import BufferedReader, SEEK_SET, UnsupportedOperation


class PushItemReader(BufferedReader):
    # Internal class to ensure that the file-like content object returned by
    # the push items are read-only and non-seekable with a name attribute.
    def __init__(self, raw, name=None, **kwargs):
        super(PushItemReader, self).__init__(raw, **kwargs)

        # Attempt to assign name from the raw object if none is provided
Member

I think that it might be better to drop this, and to just always force a name to come from the caller. Every push item has a name attribute, so maybe that can just be used.

The way you have it now, when combined with the current content implementation, means that the contract of opener is not only "returns a file-like object" but the more complicated "returns a file-like object with a non-empty name attribute".

I believe a lot of io streams do not have the name attribute, including those we're likely to see used in the future, such as the stream of an HTTP response body. So, insisting that opener must return something with a name seems to make the API more difficult to use.
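
A sketch of what the reader could look like if the name is always supplied by the caller, as suggested above. `NamedReader` is a hypothetical name for this illustration, not the class in the PR; the point is that the raw stream (here an `io.BytesIO`, which has no `name` attribute) needs nothing beyond being readable.

```python
import io


class NamedReader(io.BufferedReader):
    # Hypothetical variant of the reader in which the caller always
    # supplies the name, so the raw stream needs no `name` attribute.
    def __init__(self, raw, name, **kwargs):
        super().__init__(raw, **kwargs)
        self._name = name

    @property
    def name(self):
        return self._name

    def seekable(self):
        return False

    def seek(self, offset, whence=io.SEEK_SET):
        raise io.UnsupportedOperation(f"Seek unsupported while reading {self.name}")
```

The `content()` method could then pass the push item's own `name` when constructing the reader, which keeps the opener contract at "returns a file-like object".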

        self._name = name or getattr(super(), "name", None)
        if not self._name:
            raise ValueError("'name' not provided or available from 'raw' object")

    @property
    def name(self):
        # BufferedReader.name is an attribute, not a method, so it is not called.
        return self._name or super().name

    def seekable(self):
        return False

    def seek(self, offset, whence=SEEK_SET):
        raise UnsupportedOperation(f"Seek unsupported while reading {self.name}")
5 changes: 5 additions & 0 deletions src/pushsource/_impl/utils/openers.py
@@ -0,0 +1,5 @@
def open_src_local(item):
    # default opener for the push items
    # assumes that the item's 'src' points to the
    # locally-accessible file
    return open(item.src, "rb")
1 change: 1 addition & 0 deletions tests/baseline/cases/direct-cgw.yml
Expand Up @@ -15,6 +15,7 @@ items:
- dest
md5sum: null
name: cgw.yaml
opener: open_src_local
origin: direct
sha256sum: null
signing_key: null
Expand Down
1 change: 1 addition & 0 deletions tests/baseline/cases/direct-comps.yml
Expand Up @@ -14,6 +14,7 @@ items:
dest: []
md5sum: null
name: mycomps.xml
opener: open_src_local
origin: direct
sha256sum: null
signing_key: null
Expand Down
1 change: 1 addition & 0 deletions tests/baseline/cases/direct-dir.yml
Expand Up @@ -15,6 +15,7 @@ items:
- /destdir
md5sum: null
name: srcdir
opener: open_src_local
origin: direct
sha256sum: null
signing_key: null
Expand Down
1 change: 1 addition & 0 deletions tests/baseline/cases/direct-file.yml
Expand Up @@ -18,6 +18,7 @@ items:
display_order: null
md5sum: null
name: custom-filename
opener: open_src_local
origin: direct
sha256sum: null
signing_key: null
Expand Down
1 change: 1 addition & 0 deletions tests/baseline/cases/direct-modulemd-src.yml
Expand Up @@ -14,6 +14,7 @@ items:
dest: []
md5sum: null
name: modules.src.txt
opener: open_src_local
origin: direct
sha256sum: null
signing_key: null
Expand Down
1 change: 1 addition & 0 deletions tests/baseline/cases/direct-modulemd.yml
Expand Up @@ -14,6 +14,7 @@ items:
dest: []
md5sum: null
name: my-best-module
opener: open_src_local
origin: direct
sha256sum: null
signing_key: null
Expand Down
1 change: 1 addition & 0 deletions tests/baseline/cases/direct-productid.yml
Expand Up @@ -17,6 +17,7 @@ items:
- repo3
md5sum: null
name: some-cert
opener: open_src_local
origin: direct
products:
- architecture:
Expand Down
1 change: 1 addition & 0 deletions tests/baseline/cases/direct-rpm.yml
Expand Up @@ -15,6 +15,7 @@ items:
md5sum: null
module_build: null
name: test.rpm
opener: open_src_local
origin: custom-origin
sha256sum: null
signing_key: A1B2C3
Expand Down