Author(s): Marc Bara
Originally published on Towards AI.
We have exactly one example of general intelligence: the human brain. We are spending hundreds of billions trying to build another with AI. And we are not copying the one that works. Why? The answer has less to do with science than with which hardware happened to be available.
This is not obvious at first glance. Over the last few years, research comparing brains and large language models has revealed genuine convergence: both rely on prediction, hierarchical representations, error correction, compression. But they diverge sharply in how those principles are implemented. The human brain processes information sequentially, causally, under tight energy constraints, while modern AI systems rely on massive parallelism, frozen weights during inference, and hardware optimized for matrix multiplication at scale, the exact operation GPUs were built to accelerate.
This gap is not an accident, and it is not primarily scientific. It is the result of architectural choices shaped by hardware, tooling, and capital investment. Changing what models predict is comparatively easy. Changing how they compute, and what physical substrate supports that computation, is much harder. Understanding this mismatch is essential if we want to understand both the limits of today’s AI systems and the kinds of intelligence they are structurally capable of producing.
The science: where brains and LLMs converge and diverge
Research comparing LLMs to brains has produced findings that are both encouraging and sobering. A 2024 study in Nature Machine Intelligence found that as LLMs advance, they become more brain-like: models with better performance show greater correspondence with neural activity patterns, with alignment peaking in intermediate layers. This suggests convergent computational principles: prediction as a core operation, hierarchical representations, statistical learning, error-driven updates.

But the architectural differences matter. Unlike transformers, which process hundreds or thousands of words simultaneously, the brain's language areas analyze input serially, word by word, through recurrent, temporally extended processing. Human attention is guided by goals, emotions, and novelty; it fluctuates and is limited. LLM attention is purely algorithmic. The brain runs roughly 86 billion neurons on about 20 watts; training and serving LLMs requires megawatts. And a NeurIPS 2024 paper found that much of the neural encoding performance attributed to LLMs is driven by simple features like sentence length and position, urging caution before drawing strong conclusions about cognitive similarity. This is where most discussions stop. The next question is harder: if brains and LLMs use similar principles, why are they built so differently?
At this point, a common objection appears: intelligence may be achievable through different physical substrates. Biological brains evolved under constraints radically different from silicon, including energy budgets, material availability, and the need for continuous online learning. Perhaps transformers represent a valid alternative path to intelligence, one that trades biological elegance for brute parallel computation. This argument has merit. But it does not address efficiency, scalability to embodied agents, or the specific cognitive capacities that current architectures demonstrably lack. The question is not whether transformers can be intelligent in some sense, but whether they can be intelligent in the ways that matter for the problems we want to solve. That distinction matters once we move from abstract intelligence to systems that must act, learn, and adapt in the world.
Why transformers won: hardware, not biology
Transformers did not triumph because they resemble brains. They triumphed because they fit the hardware. The key advantage is parallelization: transformers have no recurrent units, so they can process entire sequences simultaneously during training. RNNs and LSTMs were more brain-like in important ways: sequential processing, state maintenance, temporal integration. But, as the Mamba paper notes, recurrence does not exploit modern GPUs, which were designed for parallel matrix operations. Training recurrent models was slow; training transformers was fast.
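To make the contrast concrete, here is a minimal NumPy sketch (made-up dimensions, not any production architecture). The recurrent model is forced to loop over the sequence one step at a time, while self-attention reduces to a few large matrix multiplications over the whole sequence at once, which is exactly the workload GPUs accelerate.

```python
import numpy as np

# Hypothetical dimensions, for illustration only
T, d = 512, 64                      # sequence length, model width
x = np.random.randn(T, d)           # one input sequence

# --- Recurrent processing: an inherently sequential loop -----------------
Wx, Wh = np.random.randn(d, d), np.random.randn(d, d)
h = np.zeros(d)
states = []
for t in range(T):                  # step t depends on step t-1,
    h = np.tanh(x[t] @ Wx + h @ Wh) # so the T steps cannot run in parallel
    states.append(h)

# --- Self-attention: one batched matrix computation -----------------------
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)       # all T x T token interactions at once
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ V                   # the whole sequence processed in parallel
```

The recurrent loop has to visit the hardware T times in order; the attention version is a handful of large matrix multiplies (at the price of the T x T score matrix, which is where the quadratic cost comes from). That difference, more than any cognitive argument, is why transformers train fast on GPUs.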
The brain does not need transformers because it operates under different constraints. It runs in real-time, causally integrated with the world. It has a 20-watt power budget. It performs many integrated tasks simultaneously: perception, action, homeostasis, emotion. It learns continuously during operation, not in separate training and inference phases. Transformers work despite being architecturally different from brains, compensating with scale: more parameters, more data, more compute.
LeCun’s critique: correct about objectives, silent about architecture
This distinction (between what models predict and how they compute) becomes clearer when applied to the most prominent critique of LLMs.
Yann LeCun’s scientific claim is straightforward: models whose core task is predicting the next token cannot achieve true understanding, reasoning, or human-like intelligence, regardless of scale. He has called autoregressive LLMs insufficient for human-level intelligence, or even cat-level intelligence. This claim is defensible. Token prediction is a weak training signal for world modeling, planning, and causal reasoning.
His proposed solution is JEPA: Joint Embedding Predictive Architecture. Instead of predicting tokens, JEPA predicts continuous embeddings (numerical vectors representing meaning) in a shared semantic space. Instead of reconstructing raw inputs, it predicts abstract representations. This is a meaningful change at the objective layer. JEPA learns to predict states of the world rather than words about the world.
But here is the architectural continuity that goes largely unnoticed. LeCun himself clarified: “JEPA is not an alternative to transformers. In fact, many JEPA systems use transformer modules. It is an alternative to Auto-Regressive Generative Architectures, regardless of whether they use transformers.” The technical details confirm this. I-JEPA consists of three Vision Transformers: context encoder, predictor, and target encoder. V-JEPA uses the same backbone. At the architecture layer, JEPA is still transformers with parallel attention, backpropagation, and GPU-optimized matrix operations.
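To see what changes and what does not, here is a rough PyTorch sketch of a JEPA-style training step. The encoders below are toy MLPs standing in for the Vision Transformer backbones, and every name and dimension is illustrative rather than Meta’s implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the ViT backbones used by I-JEPA; the point is the objective.
def make_encoder(dim=256):
    return nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, dim), nn.GELU(), nn.Linear(dim, dim))

context_encoder = make_encoder()
target_encoder  = make_encoder()           # in I-JEPA this is an EMA copy, never trained by gradients
predictor       = nn.Sequential(nn.Linear(256, 256), nn.GELU(), nn.Linear(256, 256))

context_patch = torch.randn(32, 1, 16, 16) # visible region of the image
target_patch  = torch.randn(32, 1, 16, 16) # masked region the model must "imagine"

z_context = context_encoder(context_patch)
with torch.no_grad():                      # targets come from the non-gradient branch
    z_target = target_encoder(target_patch)

# Predict the *embedding* of the hidden region, not its pixels and not a token.
loss = F.smooth_l1_loss(predictor(z_context), z_target)
loss.backward()                            # still end-to-end backprop on GPU matrix math
```

The loss target moves from tokens or pixels to embeddings. The machinery underneath, parallel encoders trained by backpropagation on GPUs, stays exactly where it was.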
A Slashdot commenter captured this precisely: “The block diagram for his JEPA solution is the same thing just predicting next floating latent space token instead of discrete word-token. Which is very powerful and cool but I mean it’s not like he is getting rid of backprop or convolution or even attention really.”
LeCun changes what the model predicts. He does not change how the model computes. If the architectural mismatch between transformers and brains matters, JEPA does not solve it.
Separate from the scientific and architectural questions, there is the organizational story at Meta. LeCun admitted that Llama 4’s published benchmarks were misleading: “Results were fudged a little bit. The team used different models for different benchmarks to give better results.” Zuckerberg reportedly lost confidence in the GenAI organization and sidelined it. LeCun left Meta at the end of 2025 to launch AMI Labs in Paris, seeking a valuation of 3 to 5 billion euros. None of this settles the scientific debate, but it helps explain why LeCun’s alternative has not gained traction inside Meta.
Who is actually attacking the architecture
If the architectural layer matters, we should look at researchers who are changing how models compute, not just what they predict. The approaches below differ substantially, but each breaks a core transformer assumption: that computation should be parallel rather than sequential, that weights should be frozen at inference, or that processing should happen on von Neumann hardware (conventional computers where memory and processing are separate). None of these approaches is a silver bullet. Each targets a different constraint imposed by current architectures.
State Space Models: Mamba and RWKV. Albert Gu at Carnegie Mellon and Tri Dao at Princeton developed Mamba, based on selective state space models. These models have linear rather than quadratic complexity (processing time grows in proportion to input length instead of exploding), can handle sequences of millions of tokens, and reintroduce a form of recurrence that transformers abandoned. State space models have their roots in continuous-time dynamical systems, a departure from discrete token processing toward continuous information flow. RWKV, led by Bo Peng at Recursal AI, achieves a similar result differently: it trains in parallel like a transformer but runs like an RNN during inference, maintaining constant memory regardless of sequence length. RWKV has been integrated into Windows updates reaching over a billion devices for power-sensitive applications. Both architectures address sequential, temporal processing with state memory, properties that transformers lack.
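A minimal NumPy sketch of the underlying recurrence, with the caveat that real Mamba makes the transition matrices input-dependent (“selective”) and trains with a parallel scan, and RWKV parameterizes things differently again. The skeleton below only shows why cost per token, and memory, stay constant at inference no matter how long the sequence gets.

```python
import numpy as np

# Illustrative dimensions; not the actual Mamba or RWKV parameterization.
d_state, d_model, T = 16, 64, 10_000

A = np.random.randn(d_state, d_state) * 0.01   # state transition
B = np.random.randn(d_state, d_model)          # input projection
C = np.random.randn(d_model, d_state)          # output projection

def ssm_step(state, x_t):
    """One inference step: cost and memory do not depend on how many tokens came before."""
    state = A @ state + B @ x_t
    return state, C @ state

state = np.zeros(d_state)
for t in range(T):                  # linear in sequence length,
    x_t = np.random.randn(d_model)  # with a fixed-size state instead of a growing KV cache
    state, y_t = ssm_step(state, x_t)
```

Compare this with attention at inference time, which must keep, and attend over, the keys and values of every previous token.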
Test-Time Training. Researchers at Stanford, UC San Diego, UC Berkeley and Meta proposed TTT layers, where the hidden state is itself a machine learning model that updates via self-supervised learning during inference. Unlike transformers, which apply frozen weights to new inputs, TTT models continue learning as they process. This is closer to how brains work: no rigid separation between learning and inference. TTT-Linear matches Mamba in wall-clock time while continuing to improve with longer context where Mamba plateaus.
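The following toy sketch is illustrative only, not the paper’s TTT-Linear implementation. It shows the core move: the layer’s hidden state is itself a weight matrix that receives a self-supervised gradient step on every token it sees, so learning continues during inference rather than stopping when training ends.

```python
import numpy as np

d = 32
W = np.zeros((d, d))        # the "hidden state" is a small model (a weight matrix), not a vector
lr = 0.01

def ttt_step(W, x_t):
    """Process one token and update the inner model with one self-supervised gradient step.

    Illustrative objective: reconstruct the clean token from a corrupted view of it
    (the actual paper learns the corruption and reconstruction views end to end).
    """
    x_corrupt = x_t * (np.random.rand(d) > 0.2)   # crude corruption, for illustration
    err = W @ x_corrupt - x_t                     # reconstruction error on this token
    W = W - lr * np.outer(err, x_corrupt)         # gradient step on the inner loss
    return W, W @ x_t                             # output uses the freshly updated weights

stream = np.random.randn(1000, d)
for x_t in stream:          # learning never stops: every token nudges the inner weights
    W, y_t = ttt_step(W, x_t)
```

A frozen transformer applies the same weights to token one and token one million; a TTT layer has, by construction, adapted to everything it has processed so far.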
Neuromorphic hardware. Intel’s Hala Point system uses Loihi 2 processors to implement brain-inspired computing: asynchronous spiking neural networks, integrated memory and computation, sparse and continuously changing connections. Hala Point supports 1.15 billion neurons and 128 billion synapses while consuming a maximum of 2,600 watts. Jason Eshraghian at UC Santa Cruz, with Intel Labs and the University of Groningen, deployed a MatMul-free LLM on Loihi 2, achieving 3x higher throughput with 2x less energy compared to transformer-based LLMs on edge GPUs. This is the first modern LLM architecture on neuromorphic hardware, attacking the problem at the hardware layer rather than just the objective or architecture.
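To make “spiking” concrete, here is a toy leaky integrate-and-fire neuron in plain Python, with arbitrary parameters. A neuron emits a binary spike only when accumulated input crosses a threshold, so when activity is sparse most timesteps involve almost no work, which is where the energy advantage of neuromorphic chips comes from.

```python
import numpy as np

def lif_neuron(inputs, beta=0.9, threshold=1.0):
    """Toy leaky integrate-and-fire neuron: the membrane potential leaks each step,
    and the neuron fires a binary spike (then resets) only when it crosses threshold."""
    v, spikes = 0.0, []
    for current in inputs:
        v = beta * v + current      # leaky integration of input current
        if v >= threshold:
            spikes.append(1)
            v = 0.0                 # reset after firing
        else:
            spikes.append(0)
    return spikes

# Mostly-silent input: sparse events rather than dense activations
drive = np.random.rand(100) * (np.random.rand(100) > 0.9)
print(sum(lif_neuron(drive)), "spikes out of 100 timesteps")
```

Communication by sparse, event-driven spikes, with memory sitting next to compute, is a different contract with the hardware than dense matrix multiplication, which is why these designs need chips like Loihi 2 rather than GPUs.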

Predictive coding. Karl Friston’s free energy framework describes how brains might implement Bayesian inference through hierarchical prediction error minimization. Networks pass prediction errors upward and predictions downward, with local learning based on the discrepancy between expectation and reality. A 2023 Nature Communications study demonstrated that free energy minimization quantitatively predicts the self-organization of neuronal networks in vitro. This theoretical framework suggests a fundamentally different computational organization, but it has not yet translated into competitive architectures for practical AI tasks. The gap is partly engineering: predictive coding requires bidirectional message passing and local learning rules that do not map cleanly onto current hardware. It is also partly empirical: no one has yet demonstrated predictive coding networks that match transformer performance on standard benchmarks. For now, the theory is compelling, but it remains an account of how brains work more than a recipe for building competitive AI systems.
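To make the scheme concrete, here is a toy two-level predictive coding loop, a caricature of the framework rather than Friston’s full model. The higher level sends a prediction down, the lower level sends the prediction error back up, and both the beliefs and the weights are updated from locally available signals, with no global backpropagation pass.

```python
import numpy as np

# Toy two-level predictive coding network (illustrative dimensions and learning rates).
d_hidden, d_obs = 8, 16
W = np.random.randn(d_obs, d_hidden) * 0.1   # generative weights: hidden cause -> predicted input
lr_state, lr_weights = 0.1, 0.01

def predictive_coding_step(W, observation, n_iters=20):
    mu = np.zeros(d_hidden)                  # belief about the hidden cause of this input
    for _ in range(n_iters):
        prediction = W @ mu                  # top-down prediction of the input
        error = observation - prediction     # bottom-up prediction error
        mu += lr_state * (W.T @ error)       # settle beliefs to reduce the error (local signal only)
    W += lr_weights * np.outer(error, mu)    # Hebbian-style local weight update
    return W, mu

for _ in range(100):                         # learning = settling beliefs, then nudging weights
    obs = np.random.randn(d_obs)
    W, mu = predictive_coding_step(W, obs)
```

Note what is missing: there is no separate backward pass and no global loss; every update uses only quantities available at that connection. That is the appeal, and also why it fits awkwardly on hardware built for big synchronized matrix multiplies.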
Why these alternatives remain marginal: infrastructure lock-in
If these alternatives offer genuine advantages, why do transformers still dominate? The answer is path dependency and infrastructure economics, not scientific weakness.
NVIDIA controls 80 to 90 percent of the AI accelerator market. CUDA has created a flywheel: more developers use CUDA, so more libraries and frameworks are built on it, so more developers use CUDA. The Transformer Engine in NVIDIA’s Hopper architecture delivers up to 9x faster training and 30x faster inference on LLMs compared to previous generations. As transformers became dominant, hardware was optimized for transformers, which made transformers faster, which made them more dominant. Once an ecosystem reaches this scale, technical superiority alone is rarely enough to dislodge it.
Switching to recurrent, neuromorphic, or spiking architectures would require not just new algorithms but new hardware infrastructure at massive scale. Companies that could benefit from more efficient architectures are trapped in an ecosystem where switching costs are prohibitive. The technical merits of alternatives matter less than the hundreds of billions already invested in transformer-optimized infrastructure.
What kind of intelligence are we building?
The current industry bet is clear: scale transformers on parallel hardware, optimize the training objective, and assume that intelligence will emerge. LeCun’s contribution is recognizing that the training objective matters. His limitation is not attacking the architecture or hardware.
A genuinely brain-like system would require:
- Recurrent processing with loops rather than pure feedforward computation
- Sparse activation, where only a small fraction of neurons fire for any given input
- Continuous learning during inference rather than frozen weights
- Causal temporal integration in real-time rather than processing entire sequences at once
- Multiple temporal scales for working memory, short-term retention, and long-term consolidation
JEPA offers none of these. Neither do current LLMs.
The researchers working on Mamba, RWKV, TTT, neuromorphic hardware, and predictive coding are attacking different pieces of this problem. They work at the margins not because their science is weak but because the infrastructure is locked in. Breaking path dependency requires not just better ideas but the willingness and resources to build new infrastructure from scratch.
Transformers are structurally biased toward a very specific kind of intelligence: massively parallel, context-window bounded, and frozen at inference time. They excel at pattern completion over static corpora. They struggle with continuous adaptation, causal reasoning over time, and integration with embodied action. This is not a limitation that scaling will fix. It is a consequence of architectural choices made to fit available hardware.
The question is not whether LLMs are like the brain. The question is whether the intelligence we are building can do what we actually need: adapt in real-time, reason causally, act in the world, and learn continuously from experience. Current architectures are optimized for a different problem. The industry has made its bet. The researchers at the margins are making a different one. Time, and constraints we have not yet hit, will decide which matters more.
Marc Bara is a project management consultant and educator with a PhD in Electrical Engineering. He writes about AI, work, and project management. Find him on LinkedIn.
Published via Towards AI