AI Reads
Safety
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
"In this paper, Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, we outline evidence that there are better units of analysis than individual neurons, and we have built machinery that lets us find these units in small transformer models. These units, called features, correspond to patterns (linear combinations) of neuron activations. This provides a path to breaking down complex neural networks into parts we can understand, and builds on previous efforts to interpret high-dimensional systems in neuroscience, machine learning, and statistics."
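To make "features as linear combinations of neuron activations" concrete, here is a toy sketch. It assumes a sparse-autoencoder-style setup: a ReLU encoder produces the feature activations and a linear decoder rebuilds the neuron activations from them. All weights, sizes, and function names below are hypothetical and chosen by hand for illustration; nothing here is taken from the paper's code or results.

```python
# Toy illustration only: every number and name here is made up,
# not taken from the paper.

def relu(x):
    return max(0.0, x)

def feature_activations(neurons, encoder, bias):
    """Each feature is a linear combination of neuron activations, passed through a ReLU."""
    return [
        relu(sum(w * a for w, a in zip(row, neurons)) + b)
        for row, b in zip(encoder, bias)
    ]

def reconstruct(features, decoder):
    """Approximate the neuron activations as a weighted sum of dictionary directions."""
    n_neurons = len(decoder[0])
    return [
        sum(f * direction[i] for f, direction in zip(features, decoder))
        for i in range(n_neurons)
    ]

# Three neurons; neurons 0 and 1 fire together, neuron 2 is silent.
neurons = [1.0, 1.0, 0.0]

# Two hypothetical features: one reads neurons 0 and 1 jointly, one reads neuron 2.
encoder = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
bias = [0.0, 0.0]
decoder = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

features = feature_activations(neurons, encoder, bias)
reconstruction = reconstruct(features, decoder)
```

In this toy case only the first feature is active, and it alone reproduces the activity of two neurons. That is the intuition behind treating a feature, and not a single neuron, as the unit of analysis.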
Elicit approaches AI safety by improving epistemics and pioneering process supervision. It emphasizes transparent, systematic AI systems and advocates giving users transparency and control in AI deployments.
The A.I. Dilemma - March 9, 2023
This talk takes a distinctive, pragmatic approach to AI safety and backs it with compelling examples. It draws a parallel with social media: we hoped it would improve, but because we failed to establish guidelines before its inception, those hopes were never realized.
Future of Software
Malleable software in the age of LLMs
A well-articulated piece on how LLMs will influence future software development and let end users create software dynamically.