Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking Change: FileRequired #301

Closed
vpasdf opened this issue Nov 20, 2024 · 0 comments · Fixed by #298
Closed

Breaking Change: FileRequired #301

vpasdf opened this issue Nov 20, 2024 · 0 comments · Fixed by #298
Assignees

Comments

@vpasdf
Copy link
Collaborator

vpasdf commented Nov 20, 2024

TLDR: The FileRequired function will change it's interface. If you just use the core lib or cli, no change needed. If you use plugins directly or have your own plugins, you need to migrate.

before:

FileRequired(path string, fileinfo fs.FileInfo) bool

after:

FileRequired(api FileAPI) bool

type FileAPI interface {
	Stat() (fs.FileInfo, error)
	Path() string
}

Why?

Currently Stat is called on every file. FileRequired checks in most plugins 2 things: 1. if the path matches a certain pattern 2. if the filesize is within limits, as a safe guard against OOM and timeouts. Given that many files usually don't match any plugin, Stat is called, but the result is never used. Scalibr spends up to 50% in Stat calls. We can fix this by making Stat lazy. Thus FileRequired gets an API, which allows the plugin to call Stat() only when needed. The core library makes sure that Stat is actually only called once on the file.

History

FileRequired(path string) bool

At the beginning FileRequired just got the path.

FileRequired(path string, mode FileMode) bool

For binary extractors (e.g. gobinaries) Scalibr needs to know if the file has the executable bit set. Thus we added FileMode as a parameter. FileMode is in the DirEntry we get from the walk function.
This was a bug, since Walk does not set the executable bit in FileMode, only things like if the file is a dir or regular file are set. Thus the core lib started to call Stat on all files

FileRequired(path string, info FileInfo) bool

For production applications we need to make sure Scalibr does not consume too much resources, especially RAM and CPU. Thus we added a file size limit to the plugins which are used in one of those environments.
FileMode does not contain the file size, but FileInfo contains FileMode and the files size. Thus we changed the interface to accept FileInfo.

[alternative] FileRequired(path string, stat func() (fs.FileInfo, error)) bool

Scalibr now calls Stat on all files on the file system, but is only interested in certain files (determined by file name) and limits those by size. Those stat calls are responsible for up to 50% of the runtime of Scalibr. To fix this FileRequired gets access to the stat function, to be able to call Stat only when necessary. Additionally the Core lib caches the stat calls, such that if multiple plugins need to Stat the same file, there is only one Stat call to the os.

[this proposal] FileRequired(path string, api FileApi) bool

With growing adoption of Scalibr these API changes get expensive, thus we introduce an abstraction, such that future changes do not break the plugin.

@vpasdf vpasdf pinned this issue Nov 20, 2024
@vpasdf vpasdf self-assigned this Nov 25, 2024
@vpasdf vpasdf linked a pull request Nov 25, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant