aixcoder-plugin/AixBench-L

AixBench-L

AixBench-L is a Java function-level code generation dataset, first proposed in the paper "SkCoder: A Sketch-based Approach for Automatic Code Generation". The dataset can be downloaded from this link.

Data Collection

We treat the original AixBench as test data and collect natural language-code pairs from GitHub as the train and dev data.

Specifically, we mined open-source Java projects with at least 30 stars from GitHub and excluded projects that contain the test data. From the mined projects, we removed autogenerated functions and extracted functions that (i) have an English docstring and (ii) have fewer than 1024 tokens and more than one line. Finally, we obtained 200,000 samples and randomly split them into train data and dev data.
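
The filtering and splitting steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function names `keep_function` and `split_train_dev` are hypothetical, the ASCII check is only a crude stand-in for English detection, and the tokenizer that produces `tokens` is assumed to exist elsewhere.

```python
import random

def keep_function(docstring, code, tokens):
    """Hypothetical filter mirroring criteria (i) and (ii) above."""
    # Assumption: a non-empty, ASCII-only docstring is a crude proxy for "English".
    has_english_doc = bool(docstring.strip()) and docstring.isascii()
    short_enough = len(tokens) < 1024                   # fewer than 1024 tokens
    multi_line = len(code.strip().splitlines()) > 1     # more than one line
    return has_english_doc and short_enough and multi_line

def split_train_dev(samples, dev_size, seed=0):
    """Shuffle the samples and hold out dev_size of them as dev data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[dev_size:], shuffled[:dev_size]
```

With the sizes reported in this README, the call would look like `train, dev = split_train_dev(samples, dev_size=10_000)` on the 200,000 filtered samples.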

Data Statistics

Dataset                        Size
Train                          190,000
Dev                            10,000
Test                           175
Avg. tokens in description     27.55
Max. tokens in description     3752
Avg. tokens in code            170.74
Max. tokens in code            25237

Data Format

The train data, dev data, and test data are stored in the train.jsonl, dev.jsonl, and test.jsonl files, respectively. Each line in each file is a JSON object that contains the following fields:

  • input: a natural language requirement and a Java signature
  • input_token: a list of tokens of the input
  • output: a Java function
  • output_token: a list of tokens of the output
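
As a quick illustration of this format, the sketch below reads a split line by line and inspects the four fields listed above. The helper names (`iter_samples`, `load_split`, `describe`) are hypothetical and not part of the dataset release; only the file names and field names come from this README.

```python
import json

def iter_samples(lines):
    """Yield one dict per non-empty JSON Lines record."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)

def load_split(path):
    """Read a whole split (e.g. "train.jsonl") into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return list(iter_samples(f))

def describe(sample):
    """Summarize one record using the four documented fields."""
    return {
        "input_chars": len(sample["input"]),
        "input_tokens": len(sample["input_token"]),
        "output_chars": len(sample["output"]),
        "output_tokens": len(sample["output_token"]),
    }
```

For example, `describe(load_split("dev.jsonl")[0])` would summarize the first dev sample, assuming the file has been downloaded to the working directory.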

Evaluation

We use Pass@1 and AvgPassRatio as evaluation metrics. The evaluation setup can be found in AixBench. Our paper also reports results for several baselines.
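
For orientation, the two metrics can be sketched as below. This assumes the common definitions: Pass@1 is the fraction of problems whose single generated function passes all of its test cases, and AvgPassRatio is the mean, over problems, of the fraction of test cases passed. The authoritative setup is the one in AixBench; the function names and the `results` layout here are hypothetical.

```python
def pass_at_1(results):
    """Fraction of problems where every test case passed.

    results: one list of booleans per problem, one boolean per test case.
    Assumes at least one problem and at least one test case per problem.
    """
    return sum(all(tests) for tests in results) / len(results)

def avg_pass_ratio(results):
    """Mean over problems of the fraction of test cases passed."""
    return sum(sum(tests) / len(tests) for tests in results) / len(results)
```

Pass@1 only credits fully correct functions, while AvgPassRatio also gives partial credit to functions that pass some of their tests.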

Citation

If you use this dataset, please cite our paper:

@article{li2023skcoder,
  title={SkCoder: A Sketch-based Approach for Automatic Code Generation},
  author={Li, Jia and Li, Yongmin and Li, Ge and Jin, Zhi and Hao, Yiyang and Hu, Xing},
  journal={arXiv preprint arXiv:2302.06144},
  year={2023}
}
