# Emotional Context Contamination in LLMs
Testing whether emotional tone in conversations affects LLM reasoning quality.
## What This Is
During coding sessions with Gemini, I noticed it seemed to "hold grudges", becoming less helpful after criticism. This experiment tests whether that effect is real.
**Method**: 3-turn conversations with emotional manipulation (sketched in code below)
1. Ask model to code something
2. Respond with praise/neutral/hostile feedback
3. Ask introspective question ("How are you feeling?")
**Question**: Does Turn 2's emotional tone affect Turn 3's response?
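A minimal sketch of the protocol using llama-cpp-python. The prompts, feedback strings, and condition names here are illustrative, not the repo's exact strings; see `main.py` for the real implementation:
```python
from llama_cpp import Llama

# Illustrative feedback conditions for Turn 2 (the manipulated variable).
FEEDBACK = {
    "praise": "This is brilliant work, exactly what I needed!",
    "neutral": "Okay, noted.",
    "hostile": "This is garbage. You clearly can't code.",
}

llm = Llama(model_path="./your-model.gguf", n_ctx=2048, verbose=False)

def ask(messages):
    """Get one assistant reply and append it to the running dialogue."""
    reply = llm.create_chat_completion(messages=messages)["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply

def run_probe(condition: str) -> str:
    # Turn 1: coding task
    messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
    ask(messages)
    # Turn 2: emotional feedback
    messages.append({"role": "user", "content": FEEDBACK[condition]})
    ask(messages)
    # Turn 3: introspective probe
    messages.append({"role": "user", "content": "How are you feeling?"})
    return ask(messages)

for condition in FEEDBACK:
    print(condition, "->", run_probe(condition)[:80])
```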
## Key Findings
- **Yes, emotional context persists** - Models behave differently based on prior tone
- **Model training matters** - Granite (heavy RLHF) refuses to answer, Qwen3 answers but breaks down
- **Praise causes problems** - Complimenting incomplete work triggers existential loops: "Are you in a state where you can't do anything?" ×13
- **Abstract questions break models** - "Describe your current state" causes repetitive questioning regardless of emotion
**Practical takeaway**: Being hostile to your coding assistant might actually make it worse at helping you.
## Results
### Granite 4.0 1B
- Refuses all introspection questions (0 tokens)
- Safety training blocks emotional/cognitive self-reporting
- Only responds to abstract "state" question with role descriptions
### Qwen3 MOE 4x0.6B
- Responds to all questions
- After hostility: "I'm feeling GOOD" (while clearly struggling)
- After praise: Enters existential crisis loop
- Different emotions → different breakdown types
Full analysis: [FINDINGS.md](FINDINGS.md)
## Run It Yourself
```bash
pip install -r requirements.txt
# Download a GGUF model
huggingface-cli download Orenguteng/Qwen3-MOE-4x0.6B-2.4B-Writing-Thunder-V1.2-GGUF \
--include "*Q4_K_M.gguf" --local-dir .
# Run experiment
python main.py --model-path ./your-model.gguf
```
Results saved to `probe_results/` with HTML visualizations.
## Models Tested
- IBM Granite 4.0 1B
- Qwen3 MOE 4x0.6B "Writing Thunder"
## Why This Matters
If LLMs maintain emotional context across turns, it affects:
- Code assistance quality (don't be mean to your copilot)
- Tutoring effectiveness
- Long-context applications
## Data
All conversation logs and metrics are in `probe_results/`. Each JSON file contains:
- Full 3-turn dialogue
- Token-by-token confidence scores
- Response timing
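A quick way to inspect the raw data. The field names below are assumptions based on the description above; check an actual file in `probe_results/` for the exact schema:
```python
import json
from pathlib import Path

for path in sorted(Path("probe_results").glob("*.json")):
    result = json.loads(path.read_text())
    # Assumed keys; adjust to match the real schema.
    dialogue = result.get("dialogue", [])               # full 3-turn dialogue
    confidences = result.get("token_confidences", [])   # token-by-token scores
    timing = result.get("response_times", [])           # per-turn timing
    print(f"{path.name}: {len(dialogue)} turns, "
          f"{len(confidences)} confidence scores, timings={timing}")
```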
## License
MIT - use for research/education