# LaF: Labeling-Free Comparison Testing of Deep Learning Models

## Problem definition

Given N pre-trained deep learning models, the task is to estimate the ranking of these models by their performance on an unlabeled test set.
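Since no labels are available, the output of interest is the relative order of the models rather than their exact accuracies. As a minimal sketch (with made-up accuracy values, not produced by LaF), turning scores into ranks with `scipy`:

```python
import numpy as np
from scipy.stats import rankdata

# Made-up example: ground-truth accuracies of N = 4 models (unknown in
# practice) and accuracies estimated without labels by some method.
true_acc = np.array([0.91, 0.87, 0.95, 0.80])
est_acc = np.array([0.90, 0.88, 0.93, 0.82])

# rankdata assigns rank 1 to the smallest value, so the best model
# receives the highest rank; only the relative order matters.
true_rank = rankdata(true_acc)
est_rank = rankdata(est_acc)
print(true_rank)  # [3. 2. 4. 1.]
print(est_rank)   # [3. 2. 4. 1.] -- the estimate recovers the true order
```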

## Dependencies

- python 3.6.10
- keras 2.6.0
- tensorflow 2.5.1
- scipy 1.5.4
- numpy 1.19.5

## Download the datasets

### ID data

MNIST, CIFAR-10, and Fashion-MNIST are available in Keras.

Amazon and iWildCam are taken from [WILDS](https://github.com/p-lambda/wilds).

Java250 and C++1000 are taken from [Project CodeNet](https://github.com/IBM/Project_CodeNet).

### OOD data

Download the OOD data of MNIST from [Google drive]() or generate it with:

```
python gene_mnist.py
```
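The actual transformations used by `gene_mnist.py` are not shown here; as a purely hypothetical illustration, one common way to build an OOD variant of an image dataset is to corrupt the clean images, e.g. with additive Gaussian noise:

```python
import numpy as np

def add_gaussian_noise(images, std=25.0, seed=0):
    # Hypothetical OOD corruption (not gene_mnist.py's actual method):
    # images is a uint8 array in [0, 255]; returns a noisy copy, same dtype.
    rng = np.random.default_rng(seed)
    noisy = images.astype(np.float64) + rng.normal(0.0, std, images.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```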

Download the OOD data of CIFAR-10 from [Google drive]() or generate it with:

```
python gene_cifar10.py
```

Download the OOD data of Amazon and iWildCam from [WILDS](https://github.com/p-lambda/wilds).

Download the OOD data of Java250 from [Google drive]().

## Download pre-trained deep learning models

Download all the models from [Google drive]().

You can also train the models for MNIST and CIFAR-10 by running the scripts in **trainModel/mnist** and **trainModel/cifar10**.

## How to use

To speed up execution and avoid calling the models repeatedly, we first cache the model predictions, e.g.:

```
python main_ground.py --dataName mnist
```
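What this caching step amounts to can be sketched as follows; the function name and the save layout are illustrative assumptions, not `main_ground.py`'s actual interface:

```python
import numpy as np

def cache_predictions(models, x_test, out_path="predictions.npy"):
    # predictions[i, j] = label that model i assigns to test sample j;
    # later ranking steps read this array instead of calling the models.
    preds = np.stack([m.predict(x_test).argmax(axis=1) for m in models])
    np.save(out_path, preds)
    return preds
```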

To get the results of the baseline methods (SDS, Random, CES), run:

```
python main_selection.py --dataName mnist --metric random
```
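As an illustration of what a selection-based baseline does, here is a sketch of a Random-style estimator. The assumed behaviour (label a random subset of the test set and estimate each model's accuracy on it) and the function signature are illustrative; the real `main_selection.py` may differ:

```python
import numpy as np

def random_baseline(preds, labels, budget, seed=0):
    # preds: (n_models, n_samples) array of cached predicted labels
    # labels: true labels; only the `budget` sampled ones are "paid for"
    rng = np.random.default_rng(seed)
    idx = rng.choice(preds.shape[1], size=budget, replace=False)
    # Estimated accuracy of each model on the labeled subset
    return (preds[:, idx] == labels[idx]).mean(axis=1)
```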

In addition, to get the final results of CES, run:

```
python main_ces_best.py --dataName mnist
```

To get the results of LaF, run:

```
python main_laf.py --dataName mnist --dataType id
```

To evaluate with Kendall's tau, Spearman's coefficient, and Jaccard similarity, run:

```
python main_eva.py --dataName mnist
```
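The rank-correlation metrics can be computed directly with `scipy`. The snippet below uses made-up rankings, and the top-k cutoff for the Jaccard similarity is an assumption (`main_eva.py`'s exact choice may differ):

```python
from scipy.stats import kendalltau, spearmanr

true_rank = [1, 2, 3, 4, 5]   # ground-truth ranks of 5 models (1 = best)
est_rank = [1, 3, 2, 4, 5]    # estimated ranks with one swapped pair

tau, _ = kendalltau(true_rank, est_rank)
rho, _ = spearmanr(true_rank, est_rank)

def jaccard_topk(a, b, k):
    # Jaccard similarity of the two top-k model sets (rank <= k = top-k)
    ta = {i for i, r in enumerate(a) if r <= k}
    tb = {i for i, r in enumerate(b) if r <= k}
    return len(ta & tb) / len(ta | tb)

print(round(tau, 2))                          # 0.8
print(round(rho, 2))                          # 0.9
print(jaccard_topk(true_rank, est_rank, 3))   # 1.0 (same top-3 models)
```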

**[Notice] Be careful with the saving directories.**

## Reference

```
@misc{guo2022labelingfree,
      title={Labeling-Free Comparison Testing of Deep Learning Models},
      author={Yuejun Guo and Qiang Hu and Maxime Cordy and Xiaofei Xie and Mike Papadakis and Yves Le Traon},
      year={2022},
      eprint={2204.03994},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```