What Chomskian Linguistics Teaches Us About LLMs and Superposition in Neural Networks

The shared elegance of neural networks & human psycholinguistics — and a new unified cognitive theory of sparse representation and infinite possibilities.

Cyrus Kurd
9 min read · Nov 27, 2024

In my senior year of high school, I worked on a project for AP Psychology that would unknowingly set the stage for a years-long journey. My project was on Noam Chomsky, the father of modern linguistics, and I was captivated by his concept of universal grammar — the idea that all human languages share an innate, underlying structure. Reading Why Only Us further ignited my fascination, introducing me to generative grammar: the theory that a finite set of grammatical rules can generate an infinite array of sentences. To me, this seemed simultaneously irrefutably true and impossible. I knew that I (and humans in general) could construct an unlimited number of grammatical sentences, yet I also knew that there were only a finite, relatively small number of rules. How could such simplicity be the basis of such complexity and expansiveness?

This gripped my curiosity so much that it led me to pursue a bachelor’s degree in psycholinguistics. I yearned to understand the mechanics behind generative grammar and the cognitive structures that make language acquisition seemingly effortless for humans. Courses in Minimalist Syntax, Language Acquisition, Morphology, and Psychology of Language, along with countless hours of theoretical exploration, provided plenty of insights but mostly left the core mystery intact. And during my master’s program in Data Science, the same kind of theoretical exploration of neural networks felt similarly insufficient to explain this complexity.

That changed recently, when I listened to Lex Fridman’s podcast featuring Dario Amodei, Amanda Askell, and Chris Olah from Anthropic (the company behind Claude, one of the highest-performing LLMs), and the pieces of a theory began to align. Their discussion of mechanistic interpretability in large language models (LLMs) like GPT and Claude helped me build a potential bridge between Chomskian linguistics and the neural networks that power modern LLMs. The key was the concept of superposition and the sparsity inherent to language.

A visualization of sparse activations & superposition in a simple neural network (Source: original)

The Paradox of Infinite Expressions from Finite Means

Chomsky posited that humans possess an innate grammatical framework — a universal grammar — that allows us to generate and understand an infinite number of sentences we’ve never heard before. This idea explains our ability to produce novel utterances, but it raises a fundamental question: How can a finite set of rules account for infinite linguistic possibilities?

Similarly, in the realm of artificial intelligence, LLMs like OpenAI’s ChatGPT or Anthropic’s Claude operate within finite computational and dimensional constraints yet exhibit the capacity to generate an astounding variety of coherent and contextually appropriate language. How do these models reconcile finite resources with infinite expressive potential?

Word Embeddings and the Linear Representation Hypothesis

To unpack this, let’s consider a foundation of their architecture: word embeddings. For now, we’ll focus on smaller models like Word2Vec, which produce a mathematical representation of each word (a vector); note that transformer models extend the same idea. These Word2Vec embeddings map words into high-dimensional vector spaces where semantic relationships are captured through mathematical operations. A classic example is king − man + woman ≈ queen, as seen in the graph of the embeddings below:

The classic example of Word2Vec embeddings under the linear representation hypothesis, visualized (Source: original)

This is a real example that you can recreate with Word2Vec or with transformer models. This equation showcases the linear representation hypothesis, suggesting that relationships between words can be encoded as linear transformations in vector space. However, this brings us to the source of confusion: How can vectors with relatively low dimensions (e.g., 100 to 1,000 dimensions) capture the vastness of human language and knowledge?
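As a quick aside before tackling that question: the analogy itself is easy to reproduce. Here is a minimal sketch, assuming gensim is installed and can download the pretrained “word2vec-google-news-300” vectors (about 1.6 GB):

```python
import gensim.downloader as api

# Downloads (once) and loads Google's pretrained 300-dimensional Word2Vec vectors.
vectors = api.load("word2vec-google-news-300")

# king - man + woman, expressed as a nearest-neighbor query in the embedding space.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', ~0.71)]
```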

Are there dedicated dimensions for every conceivable concept — one for “sushi,” another for “quantum mechanics,” and so on? The sheer number of unique ideas and nuances in language seems to dwarf the dimensionality of these embedding spaces. This is where the concept of sparsity and superposition comes into play.

Superposition: Encoding Infinite Meanings in Finite Dimensions

Chris Olah and his colleagues at Anthropic study mechanistic interpretability, including how neural networks manage to represent more features than they have neurons or dimensions — a phenomenon they attribute to superposition. The crux of the idea is that language and thought are sparse: at any given moment, only a small subset of all possible features is actually active or relevant.

In mathematics and data science, a vector is considered sparse if most of its components are zero. Sparse representations are efficient because they require less storage and can be processed more quickly. In the context of neural networks, sparsity allows for the overlapping of features without significant interference.
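As a quick illustration (using SciPy’s compressed sparse row format as one possible representation), a mostly-zero vector needs only a tiny fraction of the storage of its dense form:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 10,000-dimensional vector in which only three entries are nonzero.
dense = np.zeros(10_000)
dense[[12, 845, 9031]] = [0.7, -1.2, 0.3]

sparse_vec = csr_matrix(dense.reshape(1, -1))

print(dense.nbytes)  # 80,000 bytes stored densely
print(sparse_vec.data.nbytes + sparse_vec.indices.nbytes + sparse_vec.indptr.nbytes)
# a few dozen bytes: just the three values plus their positions
```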

Superposition leverages sparsity to encode multiple features within the same dimensions of a vector space. Here’s a simplified example:

Imagine a three-dimensional vector space where each dimension can represent multiple features due to sparsity.

  • Theoretical Dimension 1: Can encode “gender” or “temperature” depending on context.
  • Theoretical Dimension 2: Can encode “royalty” or “velocity.”
  • Theoretical Dimension 3: Can encode “animality” or “formality.”

Because most features are not simultaneously active, the model can superimpose them within the same dimensions. Mathematically, this is akin to solving an underdetermined system where multiple solutions exist due to the sparsity of the vectors involved.
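To make this concrete, here is a small NumPy sketch of the same idea, scaled up from the three-dimensional toy example above (interference between overlapping features shrinks as dimensionality grows); all the specific numbers are arbitrary choices for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 100 conceptual features squeezed into a 50-dimensional space.
# Each feature gets a random direction, so the directions are nearly (but not
# exactly) orthogonal to one another.
n_features, n_dims = 100, 50
feature_dirs = rng.normal(size=(n_features, n_dims))
feature_dirs /= np.linalg.norm(feature_dirs, axis=1, keepdims=True)

# A sparse "thought": only two of the 100 features are active.
coeffs = np.zeros(n_features)
coeffs[[7, 42]] = [1.0, 0.8]
v = coeffs @ feature_dirs          # the superposed 50-dimensional vector

# Reading features back out by projecting onto each direction: the active
# features' scores land near their true coefficients, while inactive features
# pick up only small interference terms -- precisely because so few features
# are active at once.
scores = feature_dirs @ v
print(scores[[7, 42]])                           # roughly [1.0, 0.8]
print(np.abs(np.delete(scores, [7, 42])).max())  # typically much smaller
```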

In math, an underdetermined system of linear equations is one in which there are more variables than equations; when such a system is consistent, it has infinitely many solutions, because it lacks sufficient constraints to fix each variable to a single value. Instead, the solutions form an affine subspace (e.g., a line, plane, or hyperplane in n-dimensional space) where each point satisfies all the equations simultaneously. Sparsity enhances the practical utility of this infinity: by ensuring that only a subset of variables is active at any time, the system can represent distinct combinations of solutions within its infinite solution space, enabling remarkable flexibility and expressiveness.
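A tiny worked example (with an arbitrary two-equation, four-unknown system) makes this tangible: the same equations admit many solutions, including sparse ones.

```python
import numpy as np

# An arbitrary underdetermined system: 2 equations, 4 unknowns.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0, 1.0]])
b = np.array([3.0, 4.0])

# One solution among infinitely many: the minimum-norm solution from lstsq.
x_min_norm, *_ = np.linalg.lstsq(A, b, rcond=None)

# A sparser alternative that satisfies the same equations: force x3 = x4 = 0
# and solve the remaining 2x2 system exactly.
x_sparse = np.zeros(4)
x_sparse[:2] = np.linalg.solve(A[:, :2], b)

print(np.allclose(A @ x_min_norm, b))  # True
print(np.allclose(A @ x_sparse, b))    # True -- a different, sparser point in the same solution set
```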

Consider a high-dimensional sparse vector v representing a word:

Mathematical representation of a sparse vector (Source: original)

Where:

  • e represents the basis vectors (dimensions).
  • a represents the coefficients, most of which are zero due to sparsity.
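In symbols (a reconstruction of the expression shown in the figure, based on the definitions above):

```latex
v = \sum_{i=1}^{n} a_i \, e_i, \qquad a_i = 0 \ \text{for most } i
```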

The superposition hypothesis suggests that we can encode multiple features by allowing the same e to participate in different contexts, relying on sparsity to minimize interference.

LLMs and the Power of Superposition

Transformers and LLMs exploit superposition to handle vast amounts of information within finite-dimensional embeddings. They dynamically disentangle overlapping features based on context, enabling them to:

  • Understand Polysemy: Words with multiple meanings (e.g., “bank”) are disambiguated through context.
  • Capture Complex Relationships: Abstract concepts and nuanced relationships are encoded efficiently.
  • Generalize from Finite Data: By recombining features, models can generate sentences they’ve never seen before.

For example, consider another classic sentence:

“I sat by the bank and watched the river flow.”

The word “bank” could mean a financial institution or the side of a river. The LLM uses contextual clues to activate features related to nature and geography while suppressing financial features. This dynamic adjustment is possible because of superposition in a sparse vector space.
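One way to observe this disambiguation directly (a sketch assuming the Hugging Face transformers library and the small bert-base-uncased checkpoint as a stand-in for a full LLM) is to compare the contextual embeddings of “bank” across sentences:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    """Return the contextual embedding of the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    position = inputs.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids("bank"))
    return hidden[position]

river = bank_vector("I sat by the bank and watched the river flow.")
money = bank_vector("I deposited my paycheck at the bank on Friday.")
shore = bank_vector("The fisherman rested on the grassy bank of the stream.")

cos = torch.nn.functional.cosine_similarity
print(cos(river, shore, dim=0))  # expected to be the higher similarity (same sense)
print(cos(river, money, dim=0))  # expected to be lower: same word, different sense
```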

Mechanistic Interpretability & Black-Box Models

While superposition is part of the foundation of LLMs’ remarkable capabilities, it also renders them opaque — after all, if each dimension encoded a single concept, we could determine precisely what led to each output. More broadly, understanding how these models make decisions is critical for trust, safety, and further advancement.

Some of the methods used in mechanistic interpretability include:

  1. Circuit Analysis: Breaking down neural networks into understandable components or “circuits” that perform specific functions.
  2. Probe Models: Training simple models to predict features from the activations of the neural network, revealing what information is stored where.
  3. Activation Visualization: Visualizing neuron activations to see how inputs are transformed at each layer.
  4. Simplified Models: Studying smaller or more interpretable models to gain insights applicable to larger networks.

Suppose we want to understand how an LLM processes the concept of “royalty.” We can:

  • Identify Activations: Feed sentences related to royalty and observe which neurons activate.
  • Apply Probes: Use logistic regression or other models to predict the presence of “royalty” concepts based on neuron activations (sketched in code after this list).
  • Analyze Superposition: Determine if neurons associated with “royalty” also participate in encoding other features, examining the degree of overlap and sparsity.
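Here is a minimal sketch of such a probe, using scikit-learn on synthetic stand-in data; in a real experiment the activations would be collected from a hidden layer of the model, and the “royalty direction” below is purely hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: in practice, `activations` would be hidden-state
# vectors collected from one layer of the LLM for a set of sentences, and
# `has_royalty` would label which sentences mention a royalty-related concept.
rng = np.random.default_rng(0)
n_sentences, hidden_dim = 500, 256
activations = rng.normal(size=(n_sentences, hidden_dim))
royalty_direction = rng.normal(size=hidden_dim)   # purely hypothetical feature direction
has_royalty = (activations @ royalty_direction > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    activations, has_royalty, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
# High held-out accuracy suggests the concept is linearly readable from the activations.
```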

Mechanistic interpretability techniques give us a tool to uncover how finite dimensions accommodate infinite concepts.

On Human Cognition

However, this exploration doesn’t just advance AI — it offers a potential mirror to our own minds. If neural networks can use superposition and sparsity to encode vast amounts of information efficiently, perhaps our brains employ similar strategies. By drawing connections between AI techniques and cognitive neuroscience, we can work toward a unified theory of how our minds achieve remarkable feats of abstraction, language, and memory.

Here are some applications of my proposed unified theory:

Cognitive Efficiency

The human brain processes vast amounts of information with exceptional speed and efficiency. A possible explanation lies in overlapping representations akin to superposition:

  • Neural Sparsity: At any given time, only a fraction of neurons in a network are active. Sparse neural activity minimizes noise and reduces energy consumption.
  • Dynamic Reuse: Neurons likely participate in multiple representations depending on context. For instance, the same neural circuits could encode “affection” in one context and “loyalty” in another, depending on the relational or emotional inputs.

This mechanism aligns with findings in neuroscience that suggest regions of the brain, such as the prefrontal cortex, exhibit flexible task-switching and dynamic encoding of abstract concepts.

Generative Grammar & Language Acquisition

Chomsky’s generative grammar has long suggested that a finite set of rules allows humans to construct an infinite array of sentences. Sparse superposition provides a plausible neurobiological basis for this phenomenon:

  • Sparse Encoding of Grammar: The brain could encode linguistic rules in sparse, overlapping patterns. These rules would be activated contextually, enabling infinite combinations without requiring a unique neural pathway for every possible sentence.
  • Learning Through Overlap: Children acquiring language may exploit sparse representations to generalize grammatical patterns. For example, exposure to phrases like “The dog runs” could activate overlapping patterns that extend to “The cat jumps.”

This approach mirrors how LLMs generate coherent text by recombining sparse features into novel constructions.

Memory and Recall

  • Encoding: Memories are stored as sparse neural patterns, with overlapping dimensions representing related concepts. For example, the memory of “summer vacation” might share components with “beach,” “sun,” and “freedom.”
  • Contextual Recall: Sparse patterns allow memories to be dynamically reactivated based on context. A cue like “vacation” might bring up specific memories of a beach trip while suppressing unrelated associations.
  • Flexibility: Overlapping representations facilitate creativity and problem-solving, as the brain can recombine fragments of memories into new ideas.

These mechanisms align with findings from hippocampal research, where specific neurons, called place cells, exhibit sparse activations tied to spatial contexts but are reused across different environments.

Sparse Superposition Model of Cognition (SSMC)

I propose a novel unified theory of sparse superposition in cognition.

The parallels between LLMs and human cognition suggest that sparse superposition is not just a computational convenience but a fundamental principle of thought. This theory integrates insights from psycholinguistics, neuroscience, and artificial intelligence into a cohesive framework: cognition operates through selective activation of sparse encodings, enabling efficient information processing and infinite combinatorial potential.

  1. Sparse Representations: Both neural networks and the human brain encode information sparsely, ensuring efficiency and adaptability.
  2. Dynamic Contextualization: Overlapping representations are activated selectively based on context, enabling flexible thought and language use.
  3. Infinite Combinatorial Potential: Sparse patterns allow for the construction of infinite novel ideas or sentences from finite neural or computational resources.

The Big Picture

The convergence of Chomskian linguistics and modern AI reveals a profound truth: finite systems can generate infinite possibilities through clever use of structure and context. Superposition in neural networks exemplifies this, offering a way past dimensional constraints that once seemed insurmountable.

By embracing sparsity and overlapping representations, we not only enhance our AI models but also gain insights into the fundamental principles of human cognition. Mechanistic interpretability is vital, ensuring that as we push the boundaries of what machines can do, we remain connected to how and why they do it.

This understanding also raises new research opportunities and challenges. How can we empirically validate sparse superposition in human neural processes, and what experimental designs can reveal its role in memory, language, and cognition? In AI, further work is needed to refine mechanistic interpretability techniques, ensuring that we can disentangle overlapping representations and align them with human understanding.

In understanding these mechanisms, we take a step closer to understanding ourselves — our languages, our thoughts, and the intricate neural foundations that make us who we are.

Other Sources:

  1. Full Lex Fridman podcast: https://open.spotify.com/episode/69V7CtdbB8blcxNPXvpnmk?si=baaad72b518b4294
  2. A related snippet I found on YouTube: https://www.youtube.com/watch?v=JIHOdpj7WM4
  3. Code for my visualizations on GitHub
  4. Mikolov, T., et al. (2013). “Efficient Estimation of Word Representations in Vector Space.” https://arxiv.org/pdf/1301.3781
  5. Olah, C., et al. (2020). “Zoom In: An Introduction to Circuits.” Distill. https://distill.pub/2020/circuits/zoom-in/
  6. Beyeler, M., Rounds, E. L., Carlson, K. D., Dutt, N., & Krichmar, J. L. (2019). “Neural correlates of sparse coding and dimensionality reduction.” PLoS Computational Biology, 15(6): e1006908. https://doi.org/10.1371/journal.pcbi.1006908
  7. Rentzeperis, I., Calatroni, L., Perrinet, L. U., & Prandi, D. (2023). “Beyond ℓ1 sparse coding in V1.” PLoS Computational Biology, 19(9): e1011459. https://doi.org/10.1371/journal.pcbi.1011459

Written by Cyrus Kurd

M.S. Data Science Student at Columbia University | linkedin.com/in/cykurd/
