AixBench-L is a function-level Java code generation dataset, first proposed in the paper "SkCoder: A Sketch-based Approach for Automatic Code Generation". The dataset can be downloaded from this link.
We treat the original AixBench as the test data and collect natural language-code pairs from GitHub as the train and dev data.
Specifically, we mined open-source Java projects with at least 30 stars from GitHub and excluded projects that contain the test data. From the mined projects, we removed auto-generated functions and extracted functions that (i) have an English docstring and (ii) have fewer than 1024 tokens and more than one line. This yields 200,000 samples, which we randomly split into train data and dev data.
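The filtering and splitting steps above can be sketched roughly as follows. This is an illustrative sketch only, not the script used to build the dataset: the function names `keep_function` and `split_train_dev` are hypothetical, tokens are approximated by whitespace splitting, and "English docstring" is approximated by a non-empty, ASCII-only docstring.

```python
import random


def keep_function(docstring: str, code: str, max_tokens: int = 1024) -> bool:
    """Return True if a (docstring, code) pair passes the filters.

    Assumptions (not from the paper): an English docstring is approximated
    by a non-empty ASCII-only string, and tokens by whitespace splitting.
    """
    if not docstring.strip() or not docstring.isascii():
        return False
    n_tokens = len(code.split())
    # Count only non-blank lines of the function body.
    n_lines = len([line for line in code.splitlines() if line.strip()])
    return n_tokens < max_tokens and n_lines > 1


def split_train_dev(samples, dev_size, seed=0):
    """Shuffle the samples with a fixed seed and split them into (train, dev)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[dev_size:], shuffled[:dev_size]
```

With 200,000 filtered samples, `split_train_dev(samples, dev_size=10_000)` would give a 190,000 / 10,000 split.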
| Dataset | Size |
|---|---|
| Train | 190,000 |
| Dev | 10,000 |
| Test | 175 |
| Avg. tokens in description | 27.55 |
| Max. tokens in description | 3,752 |
| Avg. tokens in code | 170.74 |
| Max. tokens in code | 25,237 |
The train, dev, and test data are stored in train.jsonl, dev.jsonl, and test.jsonl, respectively. Each line in these files is a JSON object with the following fields:
- input: a natural language requirement and a Java signature
- input_token: a list of tokens of the input
- output: a Java function
- output_token: a list of tokens of the output
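As a minimal sketch of reading these files, the snippet below parses one JSON object per line using only the Python standard library. The helper names `parse_jsonl` and `load_jsonl` are hypothetical and not part of the dataset release; only the field names come from the list above.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


def load_jsonl(path):
    """Read a .jsonl file (e.g. train.jsonl) into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl(f)
```

For example, `load_jsonl("dev.jsonl")[0]["input"]` would return the requirement and signature of the first dev sample, and `["output"]` the corresponding Java function.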
We use Pass@1 and AvgPassRatio as evaluation metrics. The evaluation setup is described in AixBench. Results for several baselines are reported in our paper.
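For readers unfamiliar with the two metrics, here is a small sketch of how they are commonly computed. It assumes one generated function per test problem and a list of per-test-case pass/fail outcomes for each problem. The function names are hypothetical, and running the Java unit tests themselves is left to the AixBench evaluation setup.

```python
def pass_at_1(results):
    """Fraction of problems whose generated function passes all of its test cases.

    `results` holds one list of booleans per problem (one entry per test case).
    """
    solved = sum(1 for outcomes in results if outcomes and all(outcomes))
    return solved / len(results)


def avg_pass_ratio(results):
    """Mean, over problems, of the fraction of test cases passed."""
    ratios = [sum(outcomes) / len(outcomes) for outcomes in results]
    return sum(ratios) / len(ratios)
```

Pass@1 only credits fully correct functions, while AvgPassRatio also gives partial credit to functions that pass some of their test cases.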
If you use this dataset, please cite our paper:
@article{li2023skcoder,
title={SkCoder: A Sketch-based Approach for Automatic Code Generation},
author={Li, Jia and Li, Yongmin and Li, Ge and Jin, Zhi and Hao, Yiyang and Hu, Xing},
journal={arXiv preprint arXiv:2302.06144},
year={2023}
}