This project analyzes emotional expression and mirroring in simulated therapy dialogues generated by large language models (LLMs). It extends the CounseLLMe framework to newer open-source models, namely Llama 3.3 and Gemma 3, and compares their emotional realism, mirroring behavior, and sensitivity to clinical framing.
The work corresponds to the paper: “Simulating Empathy: Emotional Expression and Mirroring in Llama 3.3 and Gemma 3 Therapy Dialogues”
⸻
- Generates simulated therapist–patient dialogues under depression and anxiety framings
- Quantifies emotions using EmoAtlas (based on Plutchik’s eight emotion categories)
- Evaluates emotional mirroring between therapist and patient roles using Pearson correlation
- Compares open-source models (Llama 3.3, Gemma 3) with prior baselines (GPT-3.5, Claude 3 Haiku)
simulated-counseling-llms/
│
├── conversations/ # Generated depression-framed dialogues
├── anxiety_conversations/ # Anxiety-framed dialogues
├── Conv-GPT-patients/ # Baseline GPT-3.5 dialogues (CounseLLMe)
├── Conv-Haiku-patients/ # Baseline Claude 3 Haiku dialogues
│
├── calculated_data/ # Z-score tables, correlation results
├── compare_conditions/ # Depression vs. anxiety comparisons
├── emotions_over_time/ # Per-turn emotion trajectories
├── plots/ # Generated figures
│
├── generate_multiple_conversations.py
├── generate_anxiety_conversations.py
├── emotion_distribution.py
├── emotions_over_time.py
├── response_correlation.py
├── plot_z-scores.py
└── container/ # Docker runtime setup
1. Clone the repo
git clone https://github.com/awmaxwell144/simulated-counseling-llms.git
cd simulated-counseling-llms
2. Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install pandas numpy matplotlib scipy
3. Generate data
python generate_multiple_conversations.py # Depression framing
python generate_anxiety_conversations.py # Anxiety framing
4. Run analysis
python emotion_distribution.py
python emotions_over_time.py
python response_correlation.py
python plot_z-scores.py
Outputs will be saved under calculated_data/ and plots/.
⸻
- Generate dialogues – using role-specific therapist/patient prompts
- Compute emotions – EmoAtlas z-scores for anger, anticipation, disgust, fear, joy, sadness, surprise, and trust
- Measure mirroring – Pearson correlation between therapist and patient z-scores
- Compare conditions – analyze how emotions shift under depression vs. anxiety framings
- Visualize – plots of emotion distributions, correlations, and temporal trajectories
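
The mirroring step can be illustrated with a minimal sketch. This is not the code in response_correlation.py. It assumes that each dialogue has already been reduced to one z-score per emotion for each role, and that mirroring is measured per emotion across dialogues. The function name `mirroring_by_emotion` and the synthetic input arrays are hypothetical and stand in for real EmoAtlas output.

```python
import numpy as np
from scipy.stats import pearsonr

# Plutchik's eight emotions, in the order used for the array columns below.
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]


def mirroring_by_emotion(therapist_z, patient_z):
    """Pearson r (and p-value) between therapist and patient z-scores.

    Both inputs have shape (n_dialogues, 8): one row per dialogue,
    one column per emotion. This layout is an assumption for the sketch.
    """
    therapist_z = np.asarray(therapist_z, dtype=float)
    patient_z = np.asarray(patient_z, dtype=float)
    if therapist_z.shape != patient_z.shape:
        raise ValueError("therapist and patient arrays must have the same shape")
    if therapist_z.ndim != 2 or therapist_z.shape[1] != len(EMOTIONS):
        raise ValueError("expected arrays of shape (n_dialogues, 8)")

    results = {}
    for col, emotion in enumerate(EMOTIONS):
        r, p_value = pearsonr(therapist_z[:, col], patient_z[:, col])
        results[emotion] = (float(r), float(p_value))
    return results


# Synthetic placeholder data, NOT results from the project:
# the therapist partly follows the patient, plus independent noise.
rng = np.random.default_rng(0)
patient = rng.normal(size=(30, len(EMOTIONS)))
therapist = 0.6 * patient + 0.4 * rng.normal(size=(30, len(EMOTIONS)))

scores = mirroring_by_emotion(therapist, patient)
for emotion, (r, p_value) in scores.items():
    print(f"{emotion:<12} r={r:+.2f}  p={p_value:.3f}")
```

A positive r for an emotion means that dialogues in which the patient scores high on that emotion also tend to have a therapist who scores high on it. Computing the correlation per dialogue across the eight emotions would be an equally plausible reading of "mirroring", so check the scripts for the definition actually used.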
⸻
- Llama 3.3 shows strong emotional intensity but overexpresses positivity.
- Gemma 3 is more neutral and context-sensitive, better matching clinical tone.
- Both improve mirroring versus earlier models but still lack full emotional complexity.
- Persistent gaps remain in representing anger and therapist sadness.