Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add transitive extraction for Maven pom.xml #399

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

cuixq
Copy link
Contributor

@cuixq cuixq commented Jan 17, 2025

This PR adds a new pomxmlnet extractor to first parse pom.xml then perform dependency resolution to extract both direct and transitive dependencies. Most code are migrated from OSV-Scanner but switch to use the virtual file system.

To achieve this goal, this PR also adds some internal packages:

  • mavenutil: Maven utilities to help parse and resolve dependencies
  • datasource: clients to talk to registries for package and version information
  • resolution: clients (and tests) needed for dependency resolution

We still need to figure out how to registry this extractor to the list and only enable it with some flags.

Copy link
Collaborator

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a quick pass.

internal/datasource/maven_registry.go Show resolved Hide resolved
@cuixq cuixq marked this pull request as ready for review January 21, 2025 00:04
@cuixq cuixq requested a review from erikvarga January 21, 2025 00:08
internal/datasource/cache.go Show resolved Hide resolved
extractor/filesystem/language/java/pomxmlnet/extractor.go Outdated Show resolved Hide resolved
extractor/filesystem/language/java/pomxmlnet/extractor.go Outdated Show resolved Hide resolved
internal/datasource/cache.go Show resolved Hide resolved
extractor/filesystem/language/java/pomxmlnet/extractor.go Outdated Show resolved Hide resolved
internal/datasource/fixtures/maven_settings/settings.xml Outdated Show resolved Hide resolved
AuthDigest
)

// HTTPAuthentication holds the information needed for general HTTP Authentication support.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have OSV-Scanner specific code in it? It sounds like something that'd be available in third-party libraries.
If so maybe we can add to the comment why we're using this instead of an existing lib.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually struggled to find third-party libraries that did this HTTP authentication for clients.

But regardless, this HTTPAuthentication struct was made to be configurable enough to be able to copy the how the native mvn/npm tooling approaches authentication. We could mention something to that effect in the comment.

internal/datasource/insights.go Show resolved Hide resolved
internal/datasource/insights.go Show resolved Hide resolved
extractor/filesystem/language/java/pomxmlnet/extractor.go Outdated Show resolved Hide resolved
@erikvarga
Copy link
Collaborator

Mostly had a couple of style-related comments (consistency with rest of OSV-SCALIBR and google-internal linters), I'll defer to Rex for the functional parts of the migration from the OSV-Scanner.

Copy link
Collaborator

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a bunch of comments, a lot of them are probably from a lack of domain knowledge about Maven, feel free to ignore the ones that don't make sense.

internal/mavenutil/maven.go Outdated Show resolved Hide resolved
internal/mavenutil/maven.go Outdated Show resolved Hide resolved
internal/mavenutil/maven.go Outdated Show resolved Hide resolved
internal/mavenutil/maven.go Outdated Show resolved Hide resolved
internal/mavenutil/maven.go Outdated Show resolved Hide resolved
internal/mavenutil/maven.go Outdated Show resolved Hide resolved
internal/mavenutil/maven.go Show resolved Hide resolved
return nil, fmt.Errorf("failed resolving %v: %w", root, err)
}
for i, e := range g.Edges {
e.Type = dep.Type{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here about why you are wiping e.Type here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was for comparing the resolved graph - we don't need this here.


// ToPURL converts an inventory created by this extractor into a PURL.
func (e Extractor) ToPURL(i *extractor.Inventory) *purl.PackageURL {
return &purl.PackageURL{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be missing some of the purl requirements.

https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#maven

  • Check name the format that purl expects?
  • Add repository_url parameter if neccessary
  • Maybe add type as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently classifier and type are not available in the node. I made #426 as a TODO for the future.

func (e Extractor) Requirements() *plugin.Capabilities {
return &plugin.Capabilities{
Network: true,
DirectFS: true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs DirectFS, as all fs operations are done through the VirtualFS interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused about when to use the VirtualFS and when to use DirectFS. There was another comment suggesting to mark DirectFS as true. Do you mind elaborating this more?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand you need DirectFS if you access the disk directly (e.g. via os.Open, or via a C library ...etc), if you only access the disk through our virtual filesystem, you don't need DirectFS, since it doesn't matter to the extractor whether we are actually reading from disk, streaming the file over a network connection, or reading a container for example.

@erikvarga can you double check that my understanding is correct?

Copy link
Collaborator

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants