Simple includes implementation #2483
base: main
Conversation
I would like to see if the approach makes sense, and hear any other considerations, before moving along with a more complete solution.
Thanks, in general this approach looks good to me. I like that it doesn't need any change to the installation process, as everything happens during build. Let me discuss some details.
I think we could place this file under
In principle I would prefer to make this invisible, doing it during build. Though if we make this completely invisible, we would have to check that everything uses the built packages; I am not sure now if field validation and pipeline tests work with built packages or with the source files. Maybe a third option is to keep some list of checksums of the imported files.
Another problem we may have is that CI in the integrations repository is aware of packages now, and on PRs builds are only executed for modified packages. We need to make CI aware of these includes, so that if some packages use a file from another package, changes to that file also trigger their builds.
Not sure about the package key. The configuration above could also be expressed like the following, which doesn't need to make any assumption about the structure of the repository, and is not so different:
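For illustration, a sketch of how such a path-based entry might look; the key names and paths below are assumptions, not the original snippet:

```yaml
# Sketch only: paths relative to the repository root, no package key.
- from: packages/system/_dev/shared/files/base-fields.yml
  to: data_stream/security/fields/base-fields.yml
```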
This would also allow including files located in other parts of the repository. For example, the APM package used to live in the repository of the APM server; they could have shared files with this approach. In any case we have to be very careful not to allow traversing paths out of the root of the repository. Maybe we can leverage the new OpenRoot API for this.
Thanks for taking a look!
I like this one 👍
I favored making it explicit, thinking of the dev experience. Generally we will use this with field definitions and pipelines (mostly). I can imagine some frustration figuring things out when they do not work as expected. On top of that, it felt kind of cumbersome to copy files on the fly or clean them up after packaging, and prone to leaving the repo polluted if things go wrong at some point. But it is mostly a personal preference, so I am not against any other option if it is the preferred approach.
Correct, I was thinking about adding a new command to elastic-package for this. About removing the package key, sounds good to me 👍. Will do the mentioned changes while we discuss the rest.
Thanks Marc, I really want something like this.
In addition to sharing field definitions between data streams, I would have liked to have this functionality for
- input configuration shared across data streams
- field definitions shared between source data stream and transform
I don't have any strong recommendations here, just some thoughts...
Inline vs whole file
Some of the issue discussion talks about inline includes, but the implementation here is on the whole file level. I think that's enough. It keeps things simple while allowing use with files of different types.
Dev experience positives and negatives
In terms of dev experience, I see these positives currently:
- copying the file into its final destination means you can read it in context or find its content with grep.
- showing diffs means you save some time figuring out the mismatch
However, if each data stream has a certain file and I want to modify it, I may not realize before my edit that it is supposed to stay in sync with the others, and I need to consult the includes.yml to know which of the files is the source and which are the destinations.
Currently there's no help with conflict resolution. I may overwrite my new changes to a destination file by rebuilding the package.
An alternative to includes.yml
An alternative would be to have the copied files only go into the built package, and instead of includes.yml, there could be filename.ext.link files wherever we want filename.ext (with the content of the .link files saying where the source is). It would make it clearer when the source content comes from elsewhere, but there are drawbacks too.
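For illustration, a minimal sketch of what such a .link file might contain, assuming a simple YAML body; the location and path are illustrative:

```yaml
# Hypothetical data_stream/security/fields/base-fields.yml.link
# The link only records where the source of truth lives; the path here is
# shown relative to the repository root.
path: packages/windows/_dev/shared/files/base-fields.yml
```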
Secure references
I think there's a potential security issue currently for something like this:
```yaml
- package: system
  from: ../../../../../../../../etc/passwd
  to: LICENSE.txt
```
It would be good to lock it down a bit more.
Use outside of the integrations repo
I think the functionality we settle on should work and be useful for packages that are developed outside of the integrations repo. Currently there is an assumption that sibling directories of the package root are other packages.
This should be enough for this too, as long as we do not enforce any particular file type.
I think this is a good middle ground, as it makes it clear what file to look at. Will give it a go and see how it looks 👍
This is addressed by the use of OpenRoot; in my last commit we already scope the topmost dir to the repository root.
I think this might be something we do in a second iteration; I'm not sure how to solve this generally (maybe by allowing references to git repositories, for example), since most current use cases fall into this initial case.
I made changes to the spec and elastic-package, so now you can add link files.
Considerations:
Made a last change so that during pipeline benchmarks/tests the included pipelines are expanded before installing them, so they can run properly.
We would need to modify at least the CI configuration. Maybe we should also add a check that requires a changelog entry for packages whose linked files have been modified, to avoid overlooking publication of fixes in dependent packages. Maybe we should store a checksum of the linked file apart from its path, so there is some track of modifications in the dependent packages. What do you think about this option?
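For illustration, a sketch of a link file that records a checksum next to the source path; the field names and hash value are illustrative assumptions:

```yaml
# Hypothetical base-fields.yml.link content
path: packages/windows/_dev/shared/files/base-fields.yml
# Checksum of the source file at the time of the last sync; if the source
# changes later, the mismatch can be detected in the dependent package.
checksum: 3f9a1c0d...  # illustrative value, truncated
```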
👍
This couples the feature to the structure of the integrations repository. We have packages in their own repositories, and packages in other paths in other repositories. I would only require the path to be under the root of the repository.
Added the link commands discussed above.
Overall the feature looks good to me as proposed. Added some questions and comments about the proposal.
One thing I see that could be a bit confusing is the mixed scope; maybe this feature should be agnostic of packages and always work at the global level, taking the current directory into account.
Looks good, we will have to wait for the change in package spec before merging.
Well, one thing we have to remember is the use of linked files in tests that don't use built packages, at least pipelines in pipeline tests.
This is already taken into account in https://github.com/elastic/elastic-package/pull/2483/files#diff-6d6115d2523865659d234b34e7eaf8cf1aa35ecf8378cfa32b1bb14d8781aaf9R75, where we copy them locally to install the pipelines and then remove them. Probably not the cleanest approach, but I could not think of anything simpler without implementing something more complete such as #1743, which seemed out of scope for this one.
I wonder, given that this can link files in the scope of a complete repository, does it really make sense to have it specified in the spec? Maybe it would be enough, and more versatile, to ignore these files in the spec and handle them only in elastic-package.
I like the direction.
It would be good to explain it in docs/howto/.
From what I can see, running elastic-package check won't check the links, but I think it should.
There are some things I'd like to confirm:
Target hash in the link file
Is the purpose of this to make the developer confirm the places where an updated source will have an effect, and to trigger checks for changes in certain paths in CI? I think those are good enough reasons.
File name scoping
File names are relative to the git repo, right? I think that's good.
But it's a new requirement that a package is inside a git repo, isn't it?
The link target can be outside _dev/shared/; that's just one possibility. Is that right?
Source vs build package
The link files are added in package-spec, but they won't be present in the built package. Is that right? I'm a bit confused, because I thought that package-spec defined the layout and content of the built package, but I see it also defines _dev, which is not copied during build. This current situation also means the package-spec definitions about e.g. field defs or ingest pipelines don't apply to the linked versions (at least according to the definition). I think your suggestion that it could be only in elastic-package makes sense. I guess ideally there would be package-spec-style definitions for both the package source and the built package. I may be missing something here about how it currently works.
Validations
What's the current status of validations? Does a link file get the same validations as a non-link file would in that location? This seems important.
It looks like currently, during package build, the copy function skips links, but then they are added in a separate build step. In the pipeline test part, it copies link targets to the ingest pipeline directory and deletes them later. This logic is split over package-spec and elastic-package.
I know it's a bigger change, but it seems like the best solution might be to have all file access go through one function that can resolve the links, so that validations or other processing could be handled exactly the same for linked and plain files. What do you think?
Correct, these are the main reasons.
I think the paths could be relative to the link file itself, for clarity, and to avoid adding this requirement at this point.
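For instance (an illustrative sketch only; the layout and syntax are assumptions), a link stored next to its destination could then point at the shared source with a relative path:

```yaml
# Hypothetical packages/windows/data_stream/security/fields/base-fields.yml.link
# The path is resolved relative to this link file rather than the repository root.
path: ../../../_dev/shared/files/base-fields.yml
```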
Yep, this is an open discussion; I also think we should have at some point different validators for source and built packages. There is an open issue about this: elastic/package-spec#549. In the meantime I think that we should make the spec aware of linked files, so validations also apply to source files, for example to check limits on the number of fields, or the format of some files. The latest changes in Marc's PR in package-spec go in this direction.
+1, Marc's latest changes in the spec PR go in this direction.
💔 Build Failed
This is an initial implementation for the elastic/package-spec#89 proposal.
The package-spec branch with the changes is at https://github.com/elastic/package-spec/compare/main...marc-gr:package-spec:feat/includes?expand=1
With the current changes it allows describing a _dev/shared/includes.yml file that lists files to copy from other packages/data streams at build time. Instead of making this an invisible process, I initially opted to commit the files explicitly, to ease debugging; elastic-package check takes care of noticing if they are out of sync, while build copies them. The initial layout of the file is quite naive and just to prove the point.
Summary:
- _dev/shared/files/* path to put arbitrary files that can be shared between different data streams, e.g. field definitions.
- check command to notice any out-of-date files.
- build command to copy the files.

Considerations:
Example usage:
With an includes.yml in windows/_dev/shared like the one below.
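A minimal sketch of what such an includes.yml could contain, following the package/from/to entry format shown earlier in the thread; the package name and paths are illustrative:

```yaml
- package: system
  from: _dev/shared/files/base-fields.yml
  to: data_stream/security/fields/base-fields.yml
```

With this in place, build would copy the listed file into the destination path inside the windows package, and check would report the copy if it drifts out of sync with its source.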