Mechanistic Interpretability
What is mechanistic interpretability?
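Mechanistic interpretability is a growing field that aims to understand how AI models "think" by examining their internal mechanisms at the level of individual neurons and circuits (groups of neurons that together carry out a computation).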
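As a rough illustration of what examining a model at the neuron level can look like, here is a minimal sketch in plain Python. The network, its hand-set weights, and the names `forward` and `ablation_effects` are all hypothetical, chosen so that the toy model computes XOR. The sketch records each hidden neuron's activation and then zeroes one neuron at a time (an "ablation") to see how the output changes. Real models are far larger and are studied with dedicated tooling, so this shows only the general idea.

```python
# Toy two-neuron network with hand-set weights (hypothetical values,
# chosen so that the network computes XOR on binary inputs).
W_HIDDEN = [[1.0, 1.0], [1.0, 1.0]]  # one row of input weights per hidden neuron
B_HIDDEN = [0.0, -1.0]               # one bias per hidden neuron
W_OUT = [1.0, -2.0]                  # output weight for each hidden neuron

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def relu(value):
    return max(0.0, value)


def forward(x, ablate=None):
    """Return (output, hidden activations).

    If `ablate` is a neuron index, that hidden neuron is set to zero
    before the output is computed.
    """
    hidden = [
        relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_HIDDEN, B_HIDDEN)
    ]
    if ablate is not None:
        hidden[ablate] = 0.0
    output = sum(w * h for w, h in zip(W_OUT, hidden))
    return output, hidden


def ablation_effects(neuron):
    """Change in output on each input when `neuron` is zeroed."""
    return [forward(x, ablate=neuron)[0] - forward(x)[0] for x in INPUTS]


if __name__ == "__main__":
    # Step 1: look at what each hidden neuron does on every input.
    for x in INPUTS:
        output, hidden = forward(x)
        print(x, "hidden:", hidden, "output:", output)

    # Step 2: remove one neuron at a time and measure the effect.
    for neuron in range(len(W_OUT)):
        print("ablate neuron", neuron, "->", ablation_effects(neuron))
```

In this toy setup, zeroing neuron 1 changes the output only for the input (1, 1), which suggests that neuron acts as a "both inputs on" detector that cancels neuron 0's contribution. Attributing a role to a component from observations like this is the kind of inference the field tries to make at much larger scale.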