Watch or listen to this week's update on YouTube or as a podcast. Interpretability can be called “the neuroscience of AI”: we look into the “brain” of an AI system to understand why and how it produces certain outputs. AI safety work often focuses on the Circuits paradigm. However, a new survey of 300 interpretability papers shows 20 other paradigms within the field with similarly promising results.
Interpretability state-of-the-art W37