This is the repo for our paper: FACE: A Fine-grained Reference Free Evaluator for Conversational Recommender Systems.
Specifically, the repository contains:
- The CRSArena-Eval dataset with human-annotated conversations and meta-evaluation scripts.
- The CRSArena-Eval interface for interactive meta-evaluation of your evaluator against baselines.
- The FACE results for reproducing the numbers reported in the paper.
- CRSArena-Eval is a meta-evaluation dataset of human-annotated conversations between users and 9 Conversational Recommender Systems (CRSs), designed for evaluating CRS evaluators.
- FACE is a Fine-grained, Aspect-based Conversation Evaluation method that provides evaluation scores for diverse turn-level and dialogue-level qualities of recommendation conversations.
The `dataset/` directory contains the CRSArena-Eval dataset.
This dataset is designed for meta-evaluation of CRS evaluators and is built on the CRSArena-Dial dataset.
- `crs_arena_eval.json`: The main dataset file containing 467 conversations with 4,473 utterances, annotated with both turn-level and dialogue-level quality scores by human evaluators.
Turn-level aspects:
- Relevance (0-3): Does the assistant's response make sense and meet the user's interests?
- Interestingness (0-2): Does the response make the user want to continue the conversation?
Dialogue-level aspects:
- Understanding (0-2): Does the assistant understand the user's request and try to fulfill it?
- Task Completion (0-2): Does the assistant make recommendations that the user finally accepts?
- Interest Arousal (0-2): Does the assistant try to spark the user's interest in something new?
- Efficiency (0-1): Does the assistant suggest items matching the user's interests within the first three interactions?
- Overall Impression (0-4): What is the overall impression of the assistant's performance?
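To make these scales concrete, here is a sketch of what one annotated conversation might look like. The field names (`utterances`, `turn_scores`, `dialogue_scores`, etc.) are hypothetical placeholders for illustration only; the actual schema is documented in `dataset/README.md`.

```python
# Hypothetical example of one annotated conversation; field names are
# illustrative only -- see dataset/README.md for the real schema.
example_conversation = {
    "conversation_id": "example-001",  # placeholder identifier
    "crs": "ExampleCRS",               # one of the 9 evaluated CRSs
    "utterances": [
        {"speaker": "user", "text": "Any feel-good comedies from the 90s?"},
        {
            "speaker": "assistant",
            "text": "You might enjoy Groundhog Day (1993).",
            "turn_scores": {"relevance": 3, "interestingness": 2},  # 0-3, 0-2
        },
    ],
    "dialogue_scores": {          # dialogue-level aspects and their ranges
        "understanding": 2,       # 0-2
        "task_completion": 2,     # 0-2
        "interest_arousal": 1,    # 0-2
        "efficiency": 1,          # 0-1
        "overall_impression": 3,  # 0-4
    },
}
```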
Table: General statistics of the CRSArena-Eval dataset.
| Statistic | Value |
|---|---|
| # Conversations | 467 |
| # Utterances | 4,473 |
| Avg. utterances per conversation | 9.58 |
| Avg. words per user utterance | 7.53 |
| Avg. words per system utterance | 15.18 |
| # Final labels (after aggregation) | 6,805 |
👉 For the detailed dataset schema and structure, see `dataset/README.md`.
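As a quick sanity check, a minimal loading sketch along these lines can recompute the conversation and utterance counts from the table above. It assumes the hypothetical structure sketched earlier (a top-level list of conversations, each with an `utterances` list), so adjust the key names to the actual schema.

```python
import json

# Load the CRSArena-Eval dataset (path relative to the repository root).
with open("dataset/crs_arena_eval.json", encoding="utf-8") as f:
    conversations = json.load(f)  # assumed: a list of conversation records

# Recompute the basic statistics from the table above.
n_conversations = len(conversations)
n_utterances = sum(len(c["utterances"]) for c in conversations)  # hypothetical key
print(f"# Conversations: {n_conversations}")
print(f"# Utterances: {n_utterances}")
print(f"Avg. utterances per conversation: {n_utterances / n_conversations:.2f}")
```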
The `dataset/run/` directory contains scripts and data for reproducing the evaluation results reported in the paper.
- `eval.py`: Evaluation script that computes Pearson and Spearman correlations between predictions and the CRSArena-Eval human annotations.
- `face_run.json`: FACE predictions for the CRSArena-Eval dataset in the standard run file format.
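Conceptually, the meta-evaluation reduces to correlating predicted scores with the human labels for each aspect. Below is a minimal sketch of that computation using `scipy.stats`; the paired score lists are dummy placeholders, and `eval.py` handles the actual run file parsing and per-aspect aggregation.

```python
from scipy.stats import pearsonr, spearmanr

# Placeholder paired scores for a single aspect (e.g., dialogue-level
# overall impression): one human and one predicted score per dialogue.
human_scores = [4, 2, 3, 1, 4, 0, 3, 2]
predicted_scores = [3.6, 1.9, 2.5, 1.2, 3.8, 0.4, 2.7, 2.1]

pearson, _ = pearsonr(human_scores, predicted_scores)
spearman, _ = spearmanr(human_scores, predicted_scores)
print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```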
We provide an easy-to-use meta-evaluation interface to evaluate your evaluator against the CRSArena-Eval dataset. Visit: https://informagus.nl/face/
We also provide a Python script to evaluate your evaluator on the CRSArena-Eval dataset.
👉 For the detailed run file format and evaluation instructions, see `dataset/run/README.md`.
The `face/` directory contains the implementation of the FACE evaluation method.
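Purely as an illustration of the kind of per-aspect output an evaluator needs to produce for the meta-evaluation, a hypothetical wrapper could look like the sketch below. The function and field names are made up and are not the FACE API; see `dataset/run/README.md` for the actual run file format.

```python
from typing import Dict, List

def score_conversation(conversation: Dict) -> Dict:
    """Hypothetical evaluator interface, NOT the FACE API.

    Returns turn- and dialogue-level aspect scores for one conversation,
    mirroring the aspects annotated in CRSArena-Eval; eval.py compares
    such predictions against the human labels.
    """
    turn_scores: List[Dict[str, float]] = []
    for utterance in conversation["utterances"]:  # hypothetical key
        if utterance.get("speaker") == "assistant":
            # Dummy constant scores; a real evaluator would compute these.
            turn_scores.append({"relevance": 2.5, "interestingness": 1.5})
    dialogue_scores = {
        "understanding": 1.8,
        "task_completion": 1.6,
        "interest_arousal": 1.2,
        "efficiency": 0.7,
        "overall_impression": 3.1,
    }
    return {"turn_scores": turn_scores, "dialogue_scores": dialogue_scores}
```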
@article{Joko:2025:FACE,
title={FACE: A Fine-grained Reference Free Evaluator for Conversational Recommender Systems},
author={Joko, Hideaki and Hasibi, Faegheh},
journal={arXiv preprint arXiv:2506.00314},
year={2025}
}
If you have any questions, please contact Hideaki Joko ([email protected]).