6 Ways to Fight the Interpretability Illusion
Notes on using optimization and causal models for interpretability.
Michael Sklar
Nov 30, 2023
A catalog of several million tasks Pythia can do.
We’re sharing datasets that we hope will be useful for language model interpretability.
T. Ben Thompson, Michael Sklar
Jun 25, 2023