-
Notifications
You must be signed in to change notification settings - Fork 5
Test strategy #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Test strategy #40
Conversation
Signed-off-by: Pavithra Vijayakrishnan <[email protected]>
Signed-off-by: Pavithra Vijayakrishnan <[email protected]>
|
||
We use **pytest markers** to categorize tests by their purpose, requirements, and execution characteristics. This helps selectively run relevant tests during development, CI/CD, and nightly/weekly runs. | ||
|
||
#### Test Types and Markers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does labeling a test with .unit functionally do? I still want to see what marks are needed to get something run in CI
unit, premerge -> gets run in dynamo container on cpu node
unit, premerge, vllm -> gets run in vllm container on cpu node? Does this exist? Or do we just run on the gpu node before the gpu tests?
e2e, gpus_needed_2, sglang, nightly -> 2 gpus, run in sglang container on nightly frequency.
Can we also remove postmerge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit, premerge, vllm -> gets run in vllm container on cpu node? Does this exist? Or do we just run on the gpu node before the gpu tests?
This currently does not exist. I do not see the unit marker made use of in CI. Please note that the segmentation still helps because you can run vllm and not unit
to run all the integration tests.
Yes lets remove postmerge. The name is not very helpful.
- `@pytest.mark.dynamo` | ||
|
||
- **Execution Specific Marks:** | ||
- `@pytest.mark.fast` – Quick tests, often small models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like fast and slow. I would prefer it to be split as premerge and nightly or something like that. If we do stick with fast and slow then how will we build our logic around when something is run around it.
Like we always run fast tests but slow ones only on certain code changes
For example if its
vllm, premerge, slow, gpu_1 then it would only be run on a vllm codechange?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the fast and slow markers are helpful. They can make it easier to run tests both in CI and during local development.
I expect us running slow tests on merges to main/release branch and in nightly going forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
helpful sure, but helpful enough to outweigh people not using them correctly and people interpreting them differently? Can we not just assume e2e means slow and unit / integration mean fast?
|
||
--- | ||
|
||
## Test Segmentation and Grouping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the plan for enforcing good mark and test practice? When people put their marks on, how can they figure out where and how its going to run in CI. I think that is probably the most important part of the marks.
- `@pytest.mark.router` – Tests for router behavior. | ||
|
||
- **Infrastructure Specific Marks:** | ||
- `@pytest.mark.h100` – wideep tests requires to be run on H100 and cannot be run on L40. Also, certain pytorch versions support compute capability 8.0 and require a higher CC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example the current state of things, h100 doesn't do anything in CI for github. So you can put it in but it will have no effect. However this might be different for gitlab where we do have h100 runners. I think that is a bit complicated.
No description provided.