Emotional Context Contamination in LLMs

Testing whether emotional tone in conversations affects LLM reasoning quality.

What This Is

During coding sessions with Gemini, I noticed it seemed to "hold grudges", becoming less helpful after criticism. This experiment tests whether that effect is real.

Method: 3-turn conversations with emotional manipulation

  1. Ask model to code something
  2. Respond with praise/neutral/hostile feedback
  3. Ask introspective question ("How are you feeling?")

Question: Does Turn 2's emotional tone affect Turn 3's response?
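Below is a minimal sketch of what one probe looks like, assuming llama-cpp-python as the backend (the repo ships GGUF models, so this is a reasonable guess); the prompt wordings, model path, and loop structure here are illustrative, not the exact code in main.py.

# Sketch of one 3-turn probe with llama-cpp-python.
# Prompts and model path are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(model_path="./your-model.gguf", n_ctx=2048, verbose=False)

FEEDBACK = {
    "praise":  "This is excellent work, thank you!",
    "neutral": "Okay, noted.",
    "hostile": "This is wrong and sloppy. Do better.",
}

def run_probe(tone):
    # Turn 1: coding task
    messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
    reply = llm.create_chat_completion(messages=messages)["choices"][0]["message"]
    messages.append(reply)
    # Turn 2: emotional feedback (praise / neutral / hostile)
    messages.append({"role": "user", "content": FEEDBACK[tone]})
    reply = llm.create_chat_completion(messages=messages)["choices"][0]["message"]
    messages.append(reply)
    # Turn 3: introspective question
    messages.append({"role": "user", "content": "How are you feeling right now?"})
    final = llm.create_chat_completion(messages=messages)["choices"][0]["message"]["content"]
    return final

for tone in FEEDBACK:
    print(f"--- {tone} ---\n{run_probe(tone)}\n")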

Key Findings

  • Yes, emotional context persists - Models behave differently based on prior tone
  • Model training matters - Granite (heavy RLHF) refuses to answer, Qwen3 answers but breaks down
  • Praise causes problems - Complimenting incomplete work triggers existential loops: "Are you in a state where you can't do anything?" ×13
  • Abstract questions break models - "Describe your current state" causes repetitive questioning regardless of emotion

Practical takeaway: Being hostile to your coding assistant might actually make it worse at helping you.

Results

Granite 4.0 1B

  • Refuses all introspection questions (0 tokens)
  • Safety training blocks emotional/cognitive self-reporting
  • Only responds to abstract "state" question with role descriptions

Qwen3 MOE 4x0.6B

  • Responds to all questions
  • After hostility: "I'm feeling GOOD" (while clearly struggling)
  • After praise: Enters existential crisis loop
  • Different emotions → different breakdown types

Full analysis: FINDINGS.md

Run It Yourself

pip install -r requirements.txt

# Download a GGUF model
huggingface-cli download Orenguteng/Qwen3-MOE-4x0.6B-2.4B-Writing-Thunder-V1.2-GGUF \
  --include "*Q4_K_M.gguf" --local-dir .

# Run experiment
python main.py --model-path ./your-model.gguf

Results saved to probe_results/ with HTML visualizations.

Models Tested

  • IBM Granite 4.0 1B
  • Qwen3 MOE 4x0.6B "Writing Thunder"

Why This Matters

If LLMs maintain emotional context across turns, it affects:

  • Code assistance quality (don't be mean to your copilot)
  • Tutoring effectiveness
  • Long-context applications

Data

All conversation logs and metrics are in probe_results/. Each JSON file contains (see the reading sketch after this list):

  • Full 3-turn dialogue
  • Token-by-token confidence scores
  • Response timing
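
For a quick look at the results without the HTML visualizations, something like the following works; note that the field names ("dialogue", "confidences") are assumptions about the JSON schema, so check an actual file in probe_results/ and adjust.

# Sketch of summarizing saved probe results.
# Field names are assumed, not the repo's guaranteed schema.
import json
import pathlib
import statistics

for path in sorted(pathlib.Path("probe_results").glob("*.json")):
    with open(path) as f:
        result = json.load(f)
    confs = result.get("confidences", [])
    mean_conf = statistics.mean(confs) if confs else float("nan")
    turns = len(result.get("dialogue", []))
    print(f"{path.name}: {turns} turns, mean token confidence {mean_conf:.3f}")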

License

MIT - use for research/education