Update the upload benchmark workflow to work with the BenchmarkLauncher #584

@R-Palazzo

Problem Description

Once #570 is resolved, we will have all the functionality needed to launch, manage, and terminate benchmarks. The goal of this issue is to update the logic for uploading internal benchmark results.

One of the main problems we want to address is the case where some instances run into out-of-memory (OOM) errors. Currently, when that happens, we have to manually add results for the missing jobs so that the workflow succeeds. We want to remove this manual step and ensure that benchmark results are uploaded after a defined amount of time, even if some instances did not finish or failed unexpectedly.

Expected behavior

  • When launching our internal benchmarks, we should save the BenchmarkLauncher instance used to launch the benchmark.
  • Update the logic used by the upload benchmark workflow. The idea is:
    • Load the saved BenchmarkConfig daily.
    • Check the status of each instance (Running / Completed / Stopped).
    • If all instances have completed, upload the results.
    • Otherwise, if some instances are still running but the deadline for uploading the results has passed (for example, the timeout plus one extra day), stop all remaining instances and update the results file accordingly for the missing jobs.
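
The daily check described above could be sketched roughly as follows. This is only an illustration of the decision logic, not the actual BenchmarkLauncher API: `InstanceStatus`, `Action`, `decide_action`, `instances_to_stop`, and the `grace` parameter are all hypothetical names, and the sketch assumes that the launch time and timeout are available from the saved object. The issue does not say what to do when some instances have stopped early and none are running; this sketch simply waits for the deadline in that case.

```python
from datetime import datetime, timedelta
from enum import Enum


class InstanceStatus(Enum):
    """Hypothetical status values mirroring the states named in the issue."""
    RUNNING = "Running"
    COMPLETED = "Completed"
    STOPPED = "Stopped"


class Action(Enum):
    """Hypothetical outcomes of one daily check."""
    UPLOAD = "upload"
    STOP_AND_UPLOAD = "stop_and_upload"
    WAIT = "wait"


def decide_action(statuses, launched_at, timeout, now, grace=timedelta(days=1)):
    """Decide what the daily upload check should do.

    statuses:    dict mapping instance name -> InstanceStatus
    launched_at: datetime at which the benchmark was launched (assumed known)
    timeout:     timedelta allowed for the benchmark (assumed known)
    grace:       extra time after the timeout before forcing the upload
    """
    # All instances finished: the results can be uploaded as they are.
    if statuses and all(s is InstanceStatus.COMPLETED for s in statuses.values()):
        return Action.UPLOAD
    # Deadline reached: stop whatever is still running, fill in the
    # missing jobs in the results file, then upload.
    if now >= launched_at + timeout + grace:
        return Action.STOP_AND_UPLOAD
    # Otherwise, check again on the next daily run.
    return Action.WAIT


def instances_to_stop(statuses):
    """Return the names of instances that are still running, sorted."""
    return sorted(
        name for name, s in statuses.items() if s is InstanceStatus.RUNNING
    )
```

With a three-day timeout and a one-day grace period, a benchmark launched on January 1 would be force-uploaded from January 5 onward, and only the instances returned by `instances_to_stop` would need to be terminated.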

Additional context

In this issue, we should also update the structure of the metainfo.yaml files that are currently saved during the benchmark. In particular, we should add the following key/value:

  • Instance Name

Labels: feature request, internal