
automated-interpretability

Explore Automated Tools to Understand Neuron Behavior in Language Models

Product Description

The repository offers tools and code for generating and assessing explanations of neuron behavior in language models. It includes public datasets for GPT-2 XL and GPT-2 Small, covering neuron activations and explanations. Statistical analysis and visualization tools give insight into neuron activity. The project also shares updates and methodologies for comparing neuron behaviors.
Project Details