Steering Set 1
Baseline
Steered Output
Reference 1
Reference 2
Reference 3
ICLR 2026
From left to right: the original (baseline) output, the steered output, and representative examples of the target concept used for steering.
Comparison of the effectiveness of prompting vs. discovered features for the concept “Silence”.
Prompt 1
Prompt 2
Prompt 3
Example 1
Example 2
Example 3
Examples of clips that highly activate discovered concepts (manually labeled; see dashboard for automatically labeled examples).
Example 1
Example 2
Example 3