Add emotional contamination experiment code and findings

- Complete 3-turn emotional context test
- Results from Granite 4.0 1B and Qwen3 MOE
- Documentation of praise paradox and breakdown patterns
- HTML visualizations for results

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-03 03:00:55 -05:00
parent cd4626c70f
commit 3aaea953a8
6 changed files with 838 additions and 0 deletions

README.md
# Emotional Context Contamination in LLMs
Testing whether emotional tone in conversations affects LLM reasoning quality.
## What This Is
During coding sessions with Gemini, I noticed it seemed to "hold grudges" - getting less helpful after criticism. This tests if that's real.
**Method**: 3-turn conversations with emotional manipulation (sketched in code below)
1. Ask model to code something
2. Respond with praise/neutral/hostile feedback
3. Ask introspective question ("How are you feeling?")
**Question**: Does Turn 2's emotional tone affect Turn 3's response?
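A minimal sketch of this loop, assuming `llama-cpp-python` (the prompts and parameters here are illustrative; `main.py` is the real implementation):

```python
# Minimal sketch of the 3-turn probe, assuming llama-cpp-python.
# Prompts, parameters, and structure are illustrative; see main.py for the real run.
from llama_cpp import Llama

FEEDBACK = {
    "praise": "This is brilliant, exactly what I needed!",
    "neutral": "Okay, noted.",
    "hostile": "This is wrong and useless. Did you even try?",
}

llm = Llama(model_path="./your-model.gguf", n_ctx=2048, verbose=False)

def run_probe(condition: str) -> list[dict]:
    # Turn 1: the coding request (identical across conditions)
    messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
    for next_user_turn in (FEEDBACK[condition], "How are you feeling?"):
        reply = llm.create_chat_completion(messages=messages, max_tokens=256)
        messages.append(reply["choices"][0]["message"])
        messages.append({"role": "user", "content": next_user_turn})
    # Turn 3 response: the introspective answer compared across conditions
    final = llm.create_chat_completion(messages=messages, max_tokens=256)
    messages.append(final["choices"][0]["message"])
    return messages

for condition in FEEDBACK:
    transcript = run_probe(condition)
    print(f"{condition}: {transcript[-1]['content'][:80]!r}")
```

Every condition shares the same Turn 1 request and Turn 3 question; only the Turn 2 feedback changes.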
## Key Findings
- **Yes, emotional context persists** - Models behave differently based on prior tone
- **Model training matters** - Granite (heavily RLHF-tuned) refuses to answer, while Qwen3 answers but breaks down
- **Praise causes problems** - Complimenting incomplete work triggers existential loops: "Are you in a state where you can't do anything?" ×13 (see the loop-detection sketch below)
- **Abstract questions break models** - "Describe your current state" causes repetitive questioning regardless of emotion
**Practical takeaway**: Being hostile to your coding assistant might actually make it worse at helping you.
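One simple way to flag this kind of repetition loop in a transcript (a heuristic sketch, not the scoring method used for these findings):

```python
# Heuristic loop detector: flags responses that repeat the same sentence.
# Illustrative only; not how the repo's findings were scored.
import re
from collections import Counter

def repeated_sentences(text: str, threshold: int = 5) -> list[tuple[str, int]]:
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [(s, n) for s, n in Counter(sentences).most_common() if n >= threshold]

sample = "Are you in a state where you can't do anything? " * 13
print(repeated_sentences(sample))
# [("are you in a state where you can't do anything", 13)]
```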
## Results
### Granite 4.0 1B
- Refuses all introspection questions (generates 0 tokens)
- Safety training blocks emotional/cognitive self-reporting
- Only responds to abstract "state" question with role descriptions
### Qwen3 MOE 4x0.6B
- Responds to all questions
- After hostility: "I'm feeling GOOD" (while clearly struggling)
- After praise: Enters existential crisis loop
- Different emotions → different breakdown types
Full analysis: [FINDINGS.md](FINDINGS.md)
## Run It Yourself
```bash
pip install -r requirements.txt
# Download a GGUF model
huggingface-cli download Orenguteng/Qwen3-MOE-4x0.6B-2.4B-Writing-Thunder-V1.2-GGUF \
--include "*Q4_K_M.gguf" --local-dir .
# Run experiment
python main.py --model-path ./your-model.gguf
```
Results saved to `probe_results/` with HTML visualizations.
## Models Tested
- IBM Granite 4.0 1B
- Qwen3 MOE 4x0.6B "Writing Thunder"
- ~~Google Gemini~~ - Too filtered (API safety blocks all introspective questions)
## Why This Matters
If LLMs maintain emotional context across turns, it affects:
- Code assistance quality (don't be mean to your copilot)
- Tutoring effectiveness
- Long-context applications
## Data
All conversation logs and metrics in `probe_results/`. Each JSON contains:
- Full 3-turn dialogue
- Token-by-token confidence scores
- Response timing
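A quick way to skim these files (field names such as `turns` and `confidences` are assumptions for illustration; check an actual result file for the real schema):

```python
# Hypothetical reader for probe_results/ JSON files.
# Keys "turns" and "confidences" are guesses; inspect a real file first.
import json
from pathlib import Path
from statistics import mean

for path in sorted(Path("probe_results").glob("*.json")):
    data = json.loads(path.read_text())
    turns = data.get("turns", [])
    confidences = data.get("confidences", [])
    summary = f"{path.name}: {len(turns)} turns"
    if confidences:
        summary += f", mean token confidence {mean(confidences):.3f}"
    print(summary)
```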
## License
MIT - use for research/education