automated-interpretability
This repository provides tools and code for generating and assessing explanations of neuron behavior in language models. It includes public datasets for GPT-2 XL and GPT-2 Small, covering neuron activations and their explanations. Statistical analysis and visualization tools help inspect neuron activity. The project also publishes updates and the methodologies used to compare neuron behaviors.
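
One way to make "assessing an explanation" concrete is to check how well activations predicted from an explanation track the neuron's recorded activations. The sketch below is an illustration under stated assumptions, not the repository's code: `correlation_score` is a hypothetical helper, the use of Pearson correlation as the score is an assumption, and the activation values are made-up placeholders rather than data from the published datasets.

```python
import math


def correlation_score(real, simulated):
    """Pearson correlation between recorded and predicted activations.

    Hypothetical helper for illustration; not part of the repository's API.
    """
    if len(real) != len(simulated):
        raise ValueError("activation sequences must have the same length")
    if len(real) < 2:
        raise ValueError("need at least two activations to correlate")

    n = len(real)
    mean_real = sum(real) / n
    mean_sim = sum(simulated) / n

    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((r - mean_real) * (s - mean_sim) for r, s in zip(real, simulated))
    var_real = sum((r - mean_real) ** 2 for r in real)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)

    # A constant sequence carries no signal; treat the score as 0 in that case.
    if var_real == 0 or var_sim == 0:
        return 0.0
    return cov / math.sqrt(var_real * var_sim)


# Placeholder per-token activations (invented values, for illustration only).
recorded = [0.0, 0.1, 2.3, 0.0, 1.8]
predicted = [0.0, 0.0, 2.0, 0.2, 1.5]
score = correlation_score(recorded, predicted)
print(f"explanation score: {score:.3f}")
```

A score near 1 would suggest the predicted activations follow the recorded ones closely, while a score near 0 would suggest the explanation captures little of the neuron's behavior.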