mechanistic interpretability

an archive of posts in this category

Feb 06, 2025	Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?