AixBench-L

AixBench-L is a Java function-level code generation dataset, first proposed in the paper "SkCoder: A Sketch-based Approach for Automatic Code Generation". The dataset can be downloaded from this link.

Data Collection

We treat the original AixBench as the test data and collect a large number of natural language-code pairs from GitHub as the train and dev data.

Specifically, we mined open-source Java projects with at least 30 stars from GitHub and excluded projects that contain the test data. From the mined projects, we removed autogenerated functions and extracted functions that (i) have an English docstring and (ii) contain fewer than 1,024 tokens and more than one line. In total, we obtained 200,000 samples and randomly split them into train data and dev data.
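The filtering and splitting steps above can be sketched roughly as follows. This is an illustration, not the authors' pipeline: the function names `keep_function` and `split_train_dev` are hypothetical, the "English docstring" check is approximated by an ASCII test, and the tokenizer that produces `code_tokens` is left to the caller because the README does not name one.

```python
import random


def keep_function(docstring, code_tokens, code, max_tokens=1024, min_lines=2):
    """Return True if a mined function passes the filters described above.

    Hypothetical helper: the real pipeline may detect English and count
    tokens differently.
    """
    # (i) English docstring, approximated here as non-empty ASCII-only text.
    has_english_doc = bool(docstring.strip()) and docstring.isascii()
    # (ii) fewer than max_tokens tokens and more than one line.
    short_enough = len(code_tokens) < max_tokens
    multi_line = len(code.strip().splitlines()) >= min_lines
    return has_english_doc and short_enough and multi_line


def split_train_dev(samples, dev_size, seed=0):
    """Shuffle the samples and return (train, dev) with dev_size dev items."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[dev_size:], shuffled[:dev_size]
```

With 200,000 filtered samples, `split_train_dev(samples, dev_size=10_000)` would yield the 190,000 / 10,000 split reported in the statistics.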

Data Statistics

Split	Samples
Train	190,000
Dev	10,000
Test	175

Statistic	Tokens
Avg. tokens in description	27.55
Max. tokens in description	3,752
Avg. tokens in code	170.74
Max. tokens in code	25,237

Data Format

The train, dev, and test data are stored in train.jsonl, dev.jsonl, and test.jsonl, respectively. Each line of a file is a JSON object with the following fields:

input: a natural language requirement and a Java signature
input_token: a list of tokens of the input
output: a Java function
output_token: a list of tokens of the output
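A minimal sketch of reading one of these files, assuming each line is a standalone JSON object with the four fields listed above. The helper names `parse_jsonl` and `load_split` are illustrative and not part of the dataset release.

```python
import json

# Field names taken from the list above.
EXPECTED_FIELDS = ("input", "input_token", "output", "output_token")


def parse_jsonl(lines):
    """Parse an iterable of JSON lines and check that each record has all fields."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [name for name in EXPECTED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        records.append(record)
    return records


def load_split(path):
    """Load one split, e.g. load_split("train.jsonl")."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)
```

For example, `load_split("dev.jsonl")[0]["input"]` would give the requirement and signature of the first dev sample.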

Evaluation

We use Pass@1 and AvgPassRatio as evaluation metrics. The evaluation setup is described in AixBench. Results for several baselines are reported in our paper.
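As a rough illustration of the two metrics, the sketch below assumes the common definitions: Pass@1 is the fraction of problems whose single generated function passes all of its test cases, and AvgPassRatio is the mean, over problems, of the fraction of test cases passed. The authoritative setup is the one in AixBench; the input layout here (one list of boolean test outcomes per problem) is an assumption made for the example.

```python
def pass_at_1(results):
    """Fraction of problems where every test case passed.

    results: list of lists of bools, one inner list per problem.
    """
    solved = sum(1 for tests in results if tests and all(tests))
    return solved / len(results)


def avg_pass_ratio(results):
    """Mean over problems of the fraction of test cases passed."""
    ratios = [sum(tests) / len(tests) if tests else 0.0 for tests in results]
    return sum(ratios) / len(ratios)


# Three problems: fully passing, half passing, fully failing.
outcomes = [[True, True], [True, False], [False, False]]
print(pass_at_1(outcomes))       # 1 of 3 problems solved
print(avg_pass_ratio(outcomes))  # (1.0 + 0.5 + 0.0) / 3
```

AvgPassRatio gives partial credit for functions that pass some tests, so it is always at least as large as Pass@1 under these definitions.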

Citation

If you use this dataset, please cite our paper:

@article{li2023skcoder,
  title={SkCoder: A Sketch-based Approach for Automatic Code Generation},
  author={Li, Jia and Li, Yongmin and Li, Ge and Jin, Zhi and Hao, Yiyang and Hu, Xing},
  journal={arXiv preprint arXiv:2302.06144},
  year={2023}
}
