
Problem with the story generation evaluation task #6


Description

@PkuRec4Dis

Hello author, I am very interested in your work. I ran into a problem when testing story generation evaluation: using your open-source model and the prompt from the open-source code, my test on 1000 stories from OpenMEVA yields 0.22, while the metric reported in the paper is approximately 0.5. Could there be an operational error on my part?

Additionally, I have another question. The fields below do not actually exist in the OpenMEVA dataset. How were these parameters designed, and how much impact does the example content have on the actual results?
{source_des}:\n
{source}\n
\n
{target_des}:\n
{target}\n

Task: Story Generation
Prompt:
PROMPT = (
    "###Instruction###\n"
    "Please act as an impartial and helpful evaluator for natural language generation (NLG), and the audience is an expert in the field.\n"
    "Your task is to evaluate the quality of {task} strictly based on the given evaluation criterion.\n"
    'Begin the evaluation by providing your analysis concisely and accurately, and then on the next line, start with "Rating:" followed by your rating on a Likert scale from 1 to 5 (higher means better).\n'
    "You MUST keep to the strict boundaries of the evaluation criterion and focus solely on the issues and errors involved; otherwise, you will be penalized.\n"
    "Make sure you read and understand these instructions, as well as the following evaluation criterion and example content, carefully.\n"
    "\n"
    "###Evaluation Criterion###\n"
    "{aspect}\n"
    "\n"
    "###Example###\n"
    "{source_des}:\n"
    "{source}\n"
    "\n"
    "{target_des}:\n"
    "{target}\n"
    "\n"
    "###Your Evaluation###\n"
)

Model:https://huggingface.co/PKU-ONELab/Themis
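
And this is roughly how I query the model and collect the ratings. Loading Themis as a plain causal LM and parsing the "Rating:" line are my own choices, and I compare the parsed ratings against the OpenMEVA human annotations with Spearman correlation (my guess at the metric). If the intended setup uses a chat template, different decoding settings, or a different correlation/aggregation, that may explain the gap:

import re
import torch
from scipy.stats import spearmanr
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: Themis loads as a standard causal LM via transformers;
# the decoding settings below are my guesses, not necessarily what the paper used.
tokenizer = AutoTokenizer.from_pretrained("PKU-ONELab/Themis")
model = AutoModelForCausalLM.from_pretrained(
    "PKU-ONELab/Themis", torch_dtype=torch.bfloat16, device_map="auto"
)

def rate(input_text):
    """Generate an evaluation and parse the 1-5 score from the "Rating:" line."""
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    match = re.search(r"Rating:\s*([1-5])", completion)
    return int(match.group(1)) if match else None

# model_ratings and human_scores are lists over the 1000 OpenMEVA stories:
# correlation = spearmanr(model_ratings, human_scores).correlation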
