[CI] Separating model and backbone_utils test from the others #7000

@YosuaMichael

Description

Recently I experimented with the torchvision tests in #6992 and found the results quite promising. I am creating this issue to summarize the experiment, propose separating the model tests (test_models.py and test_backbone_utils.py) from the other tests, and ask for feedback.

Highlights

By separating the model tests (test_models.py and test_backbone_utils.py) we get:

  • Faster test waiting time for developers (waiting time drops from 56 minutes to 38 minutes, a ~32% improvement)
  • The sum of all test time (the machine time) drops from 617 minutes to 415 minutes (a ~33% reduction); moreover, the non-model tests can run on a machine type that is 2 times cheaper
  • Some tests have too many input variations and can be sped up by reducing them (for example: test_augmix)

Action plan:

After talking with @osalpekar, I will separate the model tests for the workflows that are already using GHA. The progress is broken down by OS:

  • Separate the model tests for linux-cpu in GHA [In progress]
  • Separate the model tests for macos-cpu in GHA [Waiting for the workflow to move to GHA]
  • Separate the model tests for windows-cpu in GHA [Waiting for the workflow to move to GHA]
  • Discuss which tests we may be able to speed up by reducing the number of input variations
  • Create a PR to speed up augmix by reducing input variations

Experiments

Analyzing the test running time

The first thing I did was to run the linux tests with python 3.8 (both cpu and gpu) using --durations 5000 so that pytest prints the duration of each test (here is the data on google sheet).
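
As an aside, the same profiling can be triggered programmatically; a minimal sketch, where the test path is illustrative and not the exact command used in CI:

import pytest

# Equivalent to `pytest --durations=5000 test/` on the command line:
# after the run, pytest reports the durations of the 5000 slowest tests.
pytest.main(["--durations=5000", "test/"])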

In the raw data, each row corresponds to a test with specific arguments or inputs, for instance test_classification_model for the resnet50 model. From this, I aggregated by test function and by test file to get a more aggregated view.
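
For illustration, a minimal sketch of how such an aggregation could be done, assuming the durations were exported to a CSV with hypothetical columns test_id and duration_seconds (the file name and column names are placeholders, not the actual sheet):

import pandas as pd

df = pd.read_csv("durations.csv")

# Derive the test file and test function from the parametrized test id,
# e.g. "test/test_models.py::test_classification_model[cpu-resnet50]".
df["test_file"] = df["test_id"].str.split("::").str[0]
df["test_function"] = df["test_id"].str.split("::").str[1].str.split("[").str[0]

# Total duration per test file and per test function, slowest first.
per_file = df.groupby("test_file")["duration_seconds"].sum().sort_values(ascending=False)
per_function = (
    df.groupby(["test_file", "test_function"])["duration_seconds"]
    .sum()
    .sort_values(ascending=False)
)

print(per_file.head())
print(per_function.head())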

Here are the interesting findings from this data:

  • The two slowest test files on cpu are test_models.py and test_backbone_utils.py, which take 34.36% and 28.46% of the total test duration respectively. On gpu, test_models.py takes 61.08% of the total test duration.
  • test_transforms.py::test_augmix took 145.6 seconds, whereas all of test_transforms.py took 172.7 seconds (~84.3% of the total test_transforms.py duration comes from test_augmix alone)

Experiment on separating the model and backbone_utils tests

test_models.py and test_backbone_utils.py are similar in the sense that they both test the models. Tests for big models require a lot of memory and can be pretty slow. Since these files test models, the APIs under test are quite high level, so we might not need to run them on every python version (the low-level operators should already be tested on every python version).

From this reasoning, I think it would be beneficial to separate the model tests (I will refer to both test_models.py and test_backbone_utils.py as the model tests) and run them on one python version only (we could consider running them on all python versions for the main and nightly branches only, like we do for the gpu tests).

I experimented with separating them using a pytest marker. Here are the main things I did to separate the tests (a minimal sketch follows below):

  • Add a module-level pytestmark = pytest.mark.slow in test_models.py and test_backbone_utils.py to mark every test inside them as pytest.mark.slow
  • Modify the pytest.ini addopts to include -m "not slow" so that by default the tests marked as slow are skipped
  • Modify .circleci/regenerate.py and .circleci/config.yml.in so that they accept pytest_additional_args as a parameter, which is passed to the run_test.sh scripts to add -m "slow" (run only the model tests) or -m "slow or not slow" (run all tests)

(for more detail see #6992)
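
For illustration, a minimal sketch of the marker mechanism described above; only the module-level marker is actual code, the CI wiring is simplified into comments:

# test_models.py / test_backbone_utils.py: a module-level pytestmark marks
# every test in the file as "slow".
import pytest

pytestmark = pytest.mark.slow

# pytest.ini (simplified):
#   [pytest]
#   markers = slow: tests that load full models and need a bigger machine
#   addopts = -m "not slow"        # skip the model tests by default
#
# CI jobs can then override the marker expression via pytest_additional_args:
#   pytest -m "slow"               # run only the model tests
#   pytest -m "slow or not slow"   # run everything

def test_resnet50_smoke():
    # placeholder body; in the real files these are the existing model tests
    assert True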

After separating the model tests, I noticed that on linux gpu we have a relatively high overhead for setting up the environment. Because of this overhead, we decided it is not worth separating the tests on gpu, so we modified .circleci/regenerate.py to only separate the model tests on cpu.

Experiment on reducing the machine type for non-model tests

After separating the model tests, we had the idea of reducing the machine type for the non-model tests. The intuition is that we previously needed a bigger machine type because the model tests require a lot of memory; once the model tests are separated, the non-model tests can run on a lower-spec machine.

Initially we ran the tests on a 2xlarge+ machine for linux and a large machine for macos. We found that for the non-model tests we could use an xlarge machine for linux and a medium machine for macos instead, and since the model tests are the slowest part this does not really affect the overall waiting time for all tests to finish! This is potentially a good cost saving.

For more detail on the experiment, see this doc.

Experiment on optimizing test_augmix

Once we separate the model tests, test_augmix becomes very significant among the non-model tests; to be precise, it occupies around 26% of the non-model test duration. Hence it would be good if we could speed up this test.

Looking at our data, test_augmix is called 97 times (96 different inputs, plus 1 setup call). The first thing we want to try is to reduce the number of input variations for this test.

Looking at the code, it currently has the following input parameters:

@pytest.mark.parametrize("fill", [None, 85, (128, 128, 128)])
@pytest.mark.parametrize("severity", [1, 10])
@pytest.mark.parametrize("mixture_width", [1, 2])
@pytest.mark.parametrize("chain_depth", [-1, 2])
@pytest.mark.parametrize("all_ops", [True, False])
@pytest.mark.parametrize("grayscale", [True, False])

Although each parameter only has 2 or 3 different values, because of the combinatorial nature the total number of different inputs quickly grows to 3 * 2 * 2 * 2 * 2 * 2 = 96. The idea for reducing the total number of input variations is to sample from these 96 combinations in a way that guarantees each value of each parameter is chosen at least once. This should be okay since I think we don't really need to test all 96 combinations. As done in #6992, here is the code replacement for the input parameters:

@pytest.mark.parametrize(
    "fill,severity,mixture_width,chain_depth,all_ops,grayscale",
    [
        (None, 1, 1, -1, True, True),
        (85, 10, 2, 2, False, False),
        ((128, 128, 128), 1, 2, -1, False, True),
        (None, 10, 1, 2, True, False),
        (85, 1, 1, -1, False, False),
        ((128, 128, 128), 10, 2, -1, True, False),
    ],
)

Now we only have 6 variations, which should speed up test_augmix by 16x (96 / 6 = 16).

Note that we can increase the number of samples; that is up to the test owner to decide. I also think this method is applicable to all tests that use a large number of varying input parameters.
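
As a sanity check, a minimal sketch of how to verify that a hand-picked sample still covers every individual parameter value at least once (the helper test name and structure are just for illustration):

from itertools import product

PARAM_VALUES = {
    "fill": [None, 85, (128, 128, 128)],
    "severity": [1, 10],
    "mixture_width": [1, 2],
    "chain_depth": [-1, 2],
    "all_ops": [True, False],
    "grayscale": [True, False],
}

SAMPLED_COMBINATIONS = [
    (None, 1, 1, -1, True, True),
    (85, 10, 2, 2, False, False),
    ((128, 128, 128), 1, 2, -1, False, True),
    (None, 10, 1, 2, True, False),
    (85, 1, 1, -1, False, False),
    ((128, 128, 128), 10, 2, -1, True, False),
]

def test_sample_covers_every_parameter_value():
    # The full cartesian product would be 3 * 2 * 2 * 2 * 2 * 2 = 96 combinations.
    assert len(list(product(*PARAM_VALUES.values()))) == 96
    # Every value of every parameter must appear in at least one sampled combination.
    for position, (name, values) in enumerate(PARAM_VALUES.items()):
        seen = {combo[position] for combo in SAMPLED_COMBINATIONS}
        missing = [v for v in values if v not in seen]
        assert not missing, f"parameter {name!r} is missing values: {missing}"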

cc @pmeier @datumbox @osalpekar
