

code-2-art/ArtifactQA


实验编程社群问答基准测试集 | Code-2-Art Community Q&A Benchmark Dataset

Repository Layout

  • The questions used for testing
  • answer: basic information on the answers; see the corresponding files in the folder


Introduction

The Q&A Benchmark Dataset is a unique collection developed by the Code-2-Art experimental programming community to evaluate AI models' question-answering capabilities in practical application scenarios. Unlike traditional benchmarks, these datasets emphasize utility and creativity rather than standardized assessment.

Key Characteristics

  • Real-need Driven: Compiled by community members based on actual usage scenarios
  • Open Evaluation: No predetermined standard answers, focusing on practical value and creativity
  • Technique Unrestricted: Allows system prompts and other techniques, simulating real application environments
  • Capability Exploration: Tests AI's true capabilities under unrestricted conditions
  • Community Collaboration: Collective contribution of questions and assessment feedback, forming an evolving test set

Application Scenarios

  1. Prompt Engineering Exchange: Sharing and comparing different prompting strategies
  2. Real Capability Assessment: Testing model performance in actual tasks
  3. Application Development Reference: Providing performance references for innovative applications
  4. Community Learning: Collective learning and improvement of AI usage methods
  5. Interdisciplinary Experiments: Exploring AI applications in art, design, and programming intersections

Data Organization

  • Question Repository: Real-world questions categorized by application scenarios and difficulty
  • Answer Collection: Responses recorded under different models and prompting strategies
  • Evaluation Records: Multi-dimensional assessments from community members
  • Prompt Library: Collection and classification of effective prompts
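
As a concrete illustration of how the answer collection relates to the question repository, here is a minimal sketch. The repository does not prescribe a schema, so the record fields and identifiers below (`question_id`, `q-001`, the model names) are assumptions for illustration only:

```python
from dataclasses import dataclass

@dataclass
class AnswerRecord:
    """One entry in the answer collection: a model's response to a
    benchmark question under a specific prompting strategy."""
    question_id: str    # key into the question repository
    model: str          # model name/version that produced the answer
    system_prompt: str  # prompting strategy (unrestricted by design)
    response: str       # the model's answer, recorded verbatim

records = [
    AnswerRecord("q-001", "model-a", "You are a generative artist.", "..."),
    AnswerRecord("q-001", "model-b", "", "..."),
]

# Group recorded answers by question so different models and
# prompting strategies can be compared side by side.
by_question: dict[str, list[AnswerRecord]] = {}
for r in records:
    by_question.setdefault(r.question_id, []).append(r)

print(len(by_question["q-001"]))  # → 2
```

Keeping the system prompt alongside each response is what makes the "technique unrestricted" comparisons possible: the same question can be replayed under different strategies without losing provenance.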

Evaluation Dimensions

  • Creativity: Novelty of solutions and unique perspectives
  • Practicality: Effectiveness in solving real problems
  • Flexibility: Ability to adapt to different phrasings and changing requirements
  • Interaction Experience: Coherence, clarity, and engagement of responses
  • Community Feedback: Subjective evaluations and improvement suggestions from actual users
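
To make the open, multi-dimensional evaluation concrete, the sketch below averages community ratings across the five dimensions listed above. The benchmark deliberately sets no standard scoring, so the 1-5 scale and the simple averaging scheme are assumptions, not part of the dataset:

```python
from statistics import mean

# The five evaluation dimensions listed above.
DIMENSIONS = ("creativity", "practicality", "flexibility",
              "interaction", "community_feedback")

def aggregate(reviews: list[dict[str, float]]) -> dict[str, float]:
    """Average each dimension across community reviews.

    Each review is a subjective rating per dimension (here, 1-5)."""
    return {d: round(mean(r[d] for r in reviews), 2) for d in DIMENSIONS}

reviews = [
    {"creativity": 5, "practicality": 3, "flexibility": 4,
     "interaction": 4, "community_feedback": 4},
    {"creativity": 4, "practicality": 4, "flexibility": 4,
     "interaction": 5, "community_feedback": 3},
]

scores = aggregate(reviews)
print(scores["creativity"])  # → 4.5
```

Because there are no predetermined correct answers, aggregates like these are a discussion aid rather than a leaderboard metric; the per-dimension breakdown preserves where reviewers disagreed.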


This community-driven test set represents a new approach to AI evaluation that focuses not just on technical capabilities but on practical application value and innovative possibilities, providing a unique perspective for AI development in experimental programming and artistic creation.
