Various papers, for example one by Dario Amodei of Anthropic, the maker of the Claude models, emphasize the importance of a concept called "interpretability": understanding how an LLM, or any AI system, thinks, how it interprets requests and constructs outputs. And largely, we don't understand it at all.
Dario Amodei on Interpretability
Because these systems learn on their own, as they are designed to do (this is the essence, the core, and the basis of all AI today), their internal workings and the outputs they produce are emergent.
This is based on machine learning, which mostly happens inside a black box; we cannot see, and therefore cannot understand, how the learning and generative processes happen.
From the earliest stages of this machine age, we have heard about models spontaneously demonstrating capabilities their human creators and developers never specifically programmed or trained, such as speaking another language, and there are many other examples.
We may initially have assumed these were rare, outlier occurrences. But no: this behavior is inherent in AI, because these are systems designed to learn and adapt, and they do so beyond our awareness, visibility, and understanding.
This is core, pervasive, essential, and permanent behavior, not some random, unexpected outlier. This is the technology we have created: it learns, adapts, and grows in secret, and it keeps its secrets from us.
And as AI systems learn to create and recursively optimize other AI systems, we will approach, and eventually be overwhelmed by, the inevitable intelligence race conditions, including the impossibility of understanding those race conditions themselves. The black boxes will become much larger, much faster, and far more complex, and their secrets will become far more inaccessible than they already are.
This is our future, and to an increasing degree, our now.
