Description
Recently I experimented with the torchvision tests on #6992 and found the results quite promising. I am creating this issue to summarize the experiment, propose separating the model tests (`test_models.py` and `test_backbone_utils.py`) from the other tests, and ask for feedback.
Highlights
By separating the model tests (`test_models.py` and `test_backbone_utils.py`) we get:
- Faster waiting time for developers: the wait for all tests to finish drops from 56 minutes to 38 minutes (~32% improvement).
- The sum of all test times (the machine time) drops from 617 minutes to 415 minutes (~33% reduction); moreover, the non-model tests can run on a machine type that is 2x cheaper.
- Some tests have too many input variations, which can be reduced for a further speedup (for example `test_augmix`).
Action plan:
After talking with @osalpekar, I will separate the model tests for the workflows that already use GHA, breaking the work down by OS.
- Separate model tests for linux-cpu in GHA [In progress]
- Separate model tests for macos-cpu in GHA [Waiting for the workflow to move to GHA]
- Separate model tests for windows-cpu in GHA [Waiting for the workflow to move to GHA]
- Discuss which tests we may be able to speed up by reducing the number of input variations
- Create a PR to speed up `test_augmix` by reducing input variations
Experiments
Analyze the test running time
The first thing I did was run the linux tests with Python 3.8 (both CPU and GPU) using `--durations 5000`, so pytest prints the duration of each test (here is the data on google sheet). In the raw data, each row corresponds to a test with specific arguments or inputs, for instance `test_classification_model` for the `resnet50` model. From this, I aggregated by test function and by test file to get a higher-level view.
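For reference, here is a minimal sketch of how such an aggregation can be reproduced. The `durations.csv` layout (columns `test_file`, `test_name`, `duration_sec`) is a hypothetical export format for illustration; the actual data is in the sheet linked above:

```python
# Aggregate pytest `--durations` output (exported to a hypothetical
# durations.csv) per test file, mirroring the sheet's aggregation.
import csv
from collections import defaultdict

per_file = defaultdict(float)

with open("durations.csv") as f:
    for row in csv.DictReader(f):
        # Each row is one parametrized test invocation, e.g.
        # test_classification_model[resnet50]; sum over all of them.
        per_file[row["test_file"]] += float(row["duration_sec"])

total = sum(per_file.values())
for path, dur in sorted(per_file.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{path}: {dur:.1f}s ({dur / total:.1%} of total)")
```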
Here are the interesting findings on this data:
- The two slowest test files on CPU are `test_models.py` and `test_backbone_utils.py`, which take 34.36% and 28.46% of the total test duration respectively. On GPU, `test_models.py` takes 61.08% of the total test duration.
- `test_transforms.py::test_augmix` took 145.6 seconds, whereas all of `test_transforms.py` took 172.7 seconds (~84.3% of the total `test_transforms.py` duration comes from `test_augmix` alone).
Experiment on separating the model and backbone_utils tests
Both `test_models.py` and `test_backbone_utils.py` are similar in the sense that they test models. Big models require high memory usage and can be quite slow. Since these files test models, the APIs under test are fairly high level, so we might not need to run them on every Python version (the low-level operators should already be tested on all Python versions).
From this reasoning, I think it would be beneficial to separate the model tests (I will refer to both `test_models.py` and `test_backbone_utils.py` as the model tests) and run them on one Python version only (we could consider running them on all Python versions only on the main and nightly branches, like we do for the GPU tests).
I experimented with separating them using a pytest marker. Here are the main changes (see the sketch after this list):
- Add the global variable `pytestmark = pytest.mark.slow` to `test_models.py` and `test_backbone_utils.py` to mark every test inside them as `pytest.mark.slow`
- Modify the `pytest.ini` addopts to include `-m "not slow"`, so by default the tests marked as slow are skipped
- Modify `.circleci/regenerate.py` and `.circleci/config.yml.in` to accept `pytest_additional_args` as a parameter, which we pass to the `run_test.sh` script to add `-m "slow"` (run only the model tests) or `-m "slow or not slow"` (run all tests)

(for more detail see #6992)
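As a minimal sketch of the marker mechanics (the actual changes are in #6992):

```python
# Top of test_models.py (and likewise test_backbone_utils.py).
import pytest

# A module-level `pytestmark` applies the marker to every test collected
# from this file, so the whole file can be selected or deselected with -m.
pytestmark = pytest.mark.slow
```

With `-m "not slow"` in the `pytest.ini` addopts, a plain pytest run skips the model tests, while the CI jobs that should run them override the selection with `-m "slow"` or `-m "slow or not slow"`. (If `--strict-markers` is enabled, the `slow` marker also needs to be registered under `markers` in `pytest.ini`.)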
After separating the model tests, I noticed that on linux GPU we have a relatively high overhead for installing the environment. Because of this overhead, we decided it is not worth separating the tests on GPU, so we modified `.circleci/regenerate.py` to only separate the model tests on CPU.
Experiment on reducing the machine type for non-model tests
After separating the model tests, we had the idea of reducing the machine type for the non-model tests. The intuition is that we previously needed a larger machine type because the model tests require a lot of memory; once the model tests are separated, the non-model tests can run on a lower-spec machine.
Initially we ran the tests on a `2xlarge+` machine for linux and a `large` machine for macos. We found that for the non-model tests we can use an `xlarge` machine for linux and a `medium` machine for macos instead, and since the model tests are the slowest part, this does not really affect the overall waiting time for all tests to finish! This is potentially a good cost saving.
For more detail on the experiment, see this doc.
Experiment on optimizing test_augmix
Once the model tests are separated, `test_augmix` becomes very significant among the non-model tests; to be precise, it occupies around 26% of the non-model test duration. Hence it would be good if we could speed up this test.
Looking at our data, `test_augmix` is called 97 times (96 times with different inputs, plus 1 setup call). The first thing we want to try is reducing the number of input variations for this test.
Looking at the code, the test currently has the following input parameters:
@pytest.mark.parametrize("fill", [None, 85, (128, 128, 128)])
@pytest.mark.parametrize("severity", [1, 10])
@pytest.mark.parametrize("mixture_width", [1, 2])
@pytest.mark.parametrize("chain_depth", [-1, 2])
@pytest.mark.parametrize("all_ops", [True, False])
@pytest.mark.parametrize("grayscale", [True, False])
Although each parameter only has 2 or 3 different values, because stacked `parametrize` decorators take the Cartesian product, the total number of different inputs quickly grows to `3 * 2 * 2 * 2 * 2 * 2 = 96`. The idea for reducing the total input variation is to sample from these 96 combinations such that each value of each parameter is chosen at least once. This should be okay, since I don't think we really need to test all 96 combinations. As done in #6992, here is the replacement for the input parameters:
```python
@pytest.mark.parametrize(
    "fill,severity,mixture_width,chain_depth,all_ops,grayscale",
    [
        (None, 1, 1, -1, True, True),
        (85, 10, 2, 2, False, False),
        ((128, 128, 128), 1, 2, -1, False, True),
        (None, 10, 1, 2, True, False),
        (85, 1, 1, -1, False, False),
        ((128, 128, 128), 10, 2, -1, True, False),
    ],
)
```
Now we only have 6 variations (and each parameter value still appears in at least one of them), which should speed up `test_augmix` by 16x (96 / 6).
Note that we can increase the number of samples if needed; this is up to the test owner to decide. I think this method is also applicable to any test that uses a large number of input parameter combinations.
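As a sketch of how that sampling could be automated, here is a hypothetical helper (not part of torchvision or #6992) that builds a reduced grid in which every value of every parameter appears at least once:

```python
import random
from typing import Optional


def covering_sample(params: dict, n_samples: Optional[int] = None, seed: int = 0):
    """`params` maps a parameter name to its list of values. Returns a list
    of tuples (in the key order of `params`) usable with parametrize.

    Each column repeats its values enough times to fill `n_samples` rows, so
    every value appears at least once whenever `n_samples` is at least the
    length of the longest value list; shuffling varies the pairings.
    """
    rng = random.Random(seed)
    longest = max(len(values) for values in params.values())
    n = max(n_samples or longest, longest)
    columns = []
    for values in params.values():
        col = (values * -(-n // len(values)))[:n]  # ceil-repeat, then trim
        rng.shuffle(col)
        columns.append(col)
    return list(zip(*columns))


# For example, a 6-row grid for test_augmix:
reduced = covering_sample(
    {
        "fill": [None, 85, (128, 128, 128)],
        "severity": [1, 10],
        "mixture_width": [1, 2],
        "chain_depth": [-1, 2],
        "all_ops": [True, False],
        "grayscale": [True, False],
    },
    n_samples=6,
)
```

The result can then be passed directly as the second argument of `@pytest.mark.parametrize("fill,severity,mixture_width,chain_depth,all_ops,grayscale", reduced)`.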