This repository contains data related to the paper "Metamorphic Testing of Deep Code Models: A Systematic Literature Review" authored by Ali Asgari, Milan de Koning, Pouria Derakhshanfar, and Annibale Panichella.
This study presents a systematic literature review (SLR) on metamorphic testing for deep code models, analyzing 45 papers published between 2019 and 2024. We provide a comprehensive list of metamorphic transformations and summarize the strategies used for robustness testing in deep code models. Additionally, we investigate and rank the most frequently evaluated code-related tasks, models, downstream languages, and datasets.
The following key aspects were extracted and analyzed from the reviewed papers:
- TransformationType: Categorization of metamorphic transformations applied in robustness testing.
- Strategy: The approach used to generate and evaluate adversarial or metamorphic test cases.
- Models: Deep learning and large language models evaluated in the studies.
- Dataset: The benchmark datasets used to train and test the models.
- Task: The specific software engineering or program analysis tasks targeted in each study.
- Languages: The programming languages involved in the evaluated tasks.
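As a rough illustration of how these fields could be tallied to reproduce a ranking (for example, the most frequently evaluated tasks), the following Python sketch may help. It assumes that the JSON data is an array of objects keyed by the field names listed above, and that a field value is either a list or a comma-separated string. Neither assumption is confirmed here, so check them against the actual file. The helper names (`as_list`, `rank_field`, `load_records`) and the sample records are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def as_list(value):
    """Normalize a field value to a list of strings.

    Assumed shapes: None, a list/tuple, or a single string that may hold
    several comma-separated values.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return [str(item).strip() for item in value if str(item).strip()]
    return [part.strip() for part in str(value).split(",") if part.strip()]


def rank_field(records, field):
    """Count how many papers mention each value of `field`, most frequent first."""
    counts = Counter()
    for record in records:
        # set() ensures a paper is counted once per distinct value.
        counts.update(set(as_list(record.get(field))))
    return counts.most_common()


def load_records(path):
    """Load the paper records; assumes the file holds a JSON array of objects."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical records for illustration only, not taken from the dataset.
sample = [
    {"Task": "code summarization", "Languages": ["Java", "Python"]},
    {"Task": "clone detection, code summarization", "Languages": ["Java"]},
]
ranking = rank_field(sample, "Task")
```

Once the structure of the real file has been confirmed, a call such as `rank_field(load_records("All_Papers_Final.json"), "Task")` would produce the corresponding ranking for the full dataset.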
- All_Papers_Final.json: This file contains the final list of papers included in the systematic literature review, capturing all relevant information extracted from each study.
- supplementary-V1.pdf: This file contains the initial dataset, encompassing all records, including some that were marked as doubtful. Some records were modified after discussion and double-checking by the authors for the final paper, so minor differences may exist between this dataset and the final tables presented in the published paper.
- PreFinal folder: This folder contains two .bib files (Scholar.bib and primarystudies.bib) with details about the pre-final papers considered in our literature review. Some of the papers listed in these files were later removed from the final version after further refinement and selection. However, these files document the intermediate stages of our data collection process.
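To get a quick sense of how many candidate papers each .bib file holds, the entries can be counted with a small script. The sketch below uses a simple regular expression rather than a full BibTeX parser, so it may miscount unusual files. The function name `citation_keys` and the sample entries are hypothetical.

```python
import re

# Matches the start of an entry such as "@article{key," and captures
# the entry type and the citation key.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")

# BibTeX blocks that are not bibliography entries.
NON_ENTRIES = {"comment", "string", "preamble"}


def citation_keys(bibtex_text):
    """Return the set of citation keys found in a BibTeX string."""
    keys = set()
    for entry_type, key in ENTRY_START.findall(bibtex_text):
        if entry_type.lower() not in NON_ENTRIES:
            keys.add(key)
    return keys


# Hypothetical entries for illustration only, not taken from the repository.
sample = """
@article{smith2020robust,
  title = {An Example Title},
  year = {2020}
}
@inproceedings{doe2021testing,
  title = {Another Example},
  year = {2021}
}
"""
keys = citation_keys(sample)
```

For a real file, read its text first, for example `citation_keys(Path("PreFinal/Scholar.bib").read_text(encoding="utf-8"))` with `from pathlib import Path`; the path is an assumption based on the folder description above. Comparing the key sets of the two files is only meaningful if both use the same citation keys for the same paper, which may not hold for exports from different sources.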
- Ali Asgari (Delft University of Technology) - Email
- Milan de Koning (Delft University of Technology, JetBrains Research) - Email
- Pouria Derakhshanfar (JetBrains Research) - Email
- Annibale Panichella (Delft University of Technology) - Email
This work was conducted as part of the AI for Software Engineering (AI4SE) collaboration between JetBrains and Delft University of Technology. The authors gratefully acknowledge the financial support provided by JetBrains, which made this research possible.