Evolution, brain architecture, and compute efficiency triptych

For years, the dominant narrative in artificial intelligence has been one of scale. The logic seems sound: bigger models, more data, and more GPUs will inevitably lead to greater intelligence. Eric Schmidt recently crystallized this view, stating AGI could arrive by 2030 through continued scaling.

I believe this fundamentally misunderstands the problem.

The real bottleneck isn’t computational scale; it’s an efficiency chasm that evolution crossed billions of years ago. In my last article, I quantified the staggering computational scale of the human evolutionary process. But raw FLOPS are only half the story. When we examine the energy required, the results are so stark they demand a new roadmap for AGI.

1. Revisiting the Evolutionary Processing Unit (EPU)

Let’s briefly recap the thought experiment. The Evolutionary Processing Unit (EPU) represents the cumulative computational effort of all human brains that have ever existed.

Using conservative, mid-range estimates:

  • Number of humans who have ever lived: 117 billion
  • Processing capacity per brain: ~500 petaFLOPS (5 × 10¹⁷ FLOPS)
  • Average lifespan: ~70 years (2.2 × 10⁹ seconds)

The calculation is straightforward:

Cumulative EPU Computation = 117e9 humans × 2.2e9 seconds × 5e17 FLOPS ≈ 1.3 × 10³⁸ FLOP (total floating-point operations, since FLOPS × seconds yields operations)

This number, ~1.3 × 10³⁸ floating-point operations, is so vast that a modern exascale supercomputer would need hundreds of times the current age of the universe to match it. But this is just the prelude. The real story begins when we ask about the energy bill.
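The arithmetic can be checked in a few lines of Python. The exascale comparison assumes a 1-exaFLOPS (10¹⁸ operations per second) machine; all other constants come from the estimates above:

```python
# Cumulative computation of the Evolutionary Processing Unit (EPU).
HUMANS_EVER_LIVED = 117e9   # people who have ever lived
AVG_LIFESPAN_S = 2.2e9      # ~70 years in seconds
BRAIN_FLOPS = 5e17          # ~500 petaFLOPS per brain

total_flop = HUMANS_EVER_LIVED * AVG_LIFESPAN_S * BRAIN_FLOPS
print(f"Cumulative EPU computation: {total_flop:.2e} FLOP")  # ~1.3e38

# How long would a 1-exaFLOPS supercomputer need to match it?
EXASCALE_FLOPS = 1e18        # 10^18 operations per second
AGE_OF_UNIVERSE_S = 4.35e17  # ~13.8 billion years in seconds

runtime_s = total_flop / EXASCALE_FLOPS
print(f"Exascale runtime: {runtime_s / AGE_OF_UNIVERSE_S:.0f}x the age of the universe")
```

Running it gives ~1.29 × 10³⁸ FLOP and a runtime of roughly 300 universe-ages for the exascale machine.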

2. The Brain’s Secret Weapon: Unmatched Efficiency

Digital computers and biological brains are fundamentally different architectures. But energy consumption is a universal, physical metric. It doesn’t care about architecture; it measures the fundamental cost of computation.

The human brain’s efficiency is its crowning achievement. It operates on a mere ~20 watts of power—less than a standard light bulb. If we accept its processing capacity of ~500 petaFLOPS, its efficiency is staggering:

Efficiency_brain = 5e17 FLOPS / 20 W = 2.5e16 FLOPS/W

That is 25 petaFLOPS per watt. Now, let’s calculate the total energy consumed by the EPU:

Energy_EPU = Total FLOP / Efficiency = (1.3e38 FLOP) / (2.5e16 FLOP/J) = 5.2e21 Joules

(Note that 1 FLOPS per watt is the same as 1 FLOP per joule, so dividing total operations by efficiency yields energy directly.)

To make this tangible, let’s convert Joules to Terawatt-hours (TWh), the unit used for national energy consumption:

Energy_EPU ≈ 5.2e21 J / 3.6e15 J/TWh ≈ **1.44 million TWh**

Let’s put that number in context. The entire United States consumes about 4,000 TWh of electricity annually. The cumulative energy of the EPU is equivalent to powering the entire modern US for over 360 years.
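The whole energy chain can be verified in one short script (constants from the figures above; 1 TWh = 3.6 × 10¹⁵ J, annual US electricity consumption taken as 4,000 TWh):

```python
BRAIN_FLOPS = 5e17        # ~500 petaFLOPS
BRAIN_POWER_W = 20.0      # watts
TOTAL_EPU_FLOP = 1.3e38   # cumulative EPU computation

efficiency = BRAIN_FLOPS / BRAIN_POWER_W   # FLOP per joule (= FLOPS per watt)
energy_j = TOTAL_EPU_FLOP / efficiency     # joules
energy_twh = energy_j / 3.6e15             # 1 TWh = 3.6e15 J

US_ANNUAL_TWH = 4_000
print(f"Brain efficiency: {efficiency:.1e} FLOP/J")                   # 2.5e16
print(f"EPU energy: {energy_twh:,.0f} TWh")                           # ~1.44 million
print(f"= {energy_twh / US_ANNUAL_TWH:.0f} years of US electricity")  # ~360
```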

3. The AI Counterpoint: A Drop in the Energy Ocean

Now, let’s contrast this with the training of a state-of-the-art AI model, like OpenAI’s GPT-4.

Expert estimates place GPT-4’s training energy at approximately 50-62 GWh. Using the lower-bound estimate of 50 GWh:

Energy_GPT-4 ≈ 0.05 TWh (50 GWh or 50,000 MWh)

This aligns with estimates that training GPT-4 consumed enough energy to power several thousand average US homes for a year. It’s a significant amount, rightly scrutinized for its environmental impact.

But when placed next to the EPU, the comparison becomes almost surreal.

| System | Total Computation (FLOP) | Total Energy Consumption |
| --- | --- | --- |
| Evolutionary Processing Unit (EPU) | ~1.3 × 10³⁸ | ~1,440,000 TWh |
| OpenAI GPT-4 (one training run) | ~2.2 × 10²⁵ | ~0.05 TWh (50 GWh) |

The energy used by the entire EPU is approximately 30 million times greater than the energy used to train GPT-4.

You could train a model like GPT-4 30 million times for the same energy cost evolution “spent” to architect the human brain.
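The ratio itself is a one-liner to confirm:

```python
EPU_ENERGY_TWH = 1.44e6    # cumulative EPU energy
GPT4_ENERGY_TWH = 0.05     # ~50 GWh, the lower-bound training estimate

ratio = EPU_ENERGY_TWH / GPT4_ENERGY_TWH
print(f"Evolution's energy budget ≈ {ratio:,.0f} GPT-4 training runs")  # ~28.8 million
```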

4. The “Pre-Loaded” Architecture: What We’re Really Missing

This 30-million-fold gap is not a challenge for engineers to overcome with more power plants. It is a definitive sign that we are on the wrong architectural path.

The energy of the EPU wasn’t spent on “training” one giant model. It was spent on evolution’s R&D to find the perfect starting architecture. A human infant’s brain is not a blank slate. It is a pre-evolved system that comes pre-loaded with the foundational software for intelligence:

  • Embodiment: The brain is part of a closed-loop system with a body, sensors, and actuators, learning through direct interaction with a physical world.
  • Innate Structure: Infants possess pre-configured “inductive biases” for causality, object permanence, and social cognition. They don’t learn these from scratch; they learn with them.
  • Subconscious Processing: The brain seamlessly manages a biological organism of tens of trillions of cells, a task of unimaginable complexity that occurs entirely below the level of consciousness.

Evolution invested aeons of energy to discover a learning algorithm that is data-efficient, energy-frugal, and inherently robust. We are trying to skip this R&D phase and jump straight to the final product with an architecture that is, by its very nature, spectacularly inefficient for the task.

4.5 Why This Actually Strengthens the EPU/BPU Argument

Some might argue: “If we can get impressive results with GPT-4 using just 50 GWh, doesn’t that prove scaling works?”

This misses the point entirely. GPT-4 is impressive at pattern matching and language generation, but it fundamentally lacks:

  • Causal reasoning - It can’t distinguish correlation from causation
  • Grounded understanding - It has no model of physical reality
  • Continuous learning - It’s frozen after training
  • Energy efficiency at inference - Serving GPT-4 at scale reportedly costs hundreds of thousands of dollars per day; your brain runs continuously on 20 watts

The 30-million-fold gap isn’t about training one model—it’s about the R&D cost of discovering the right architecture. Evolution didn’t just train one brain; it spent 1.44 million TWh discovering an architecture that:

  • Learns continuously from minimal data
  • Operates at 25 petaFLOPS per watt
  • Handles causal reasoning, not just correlation
  • Adapts to novel situations without retraining

We’re celebrating getting partial intelligence from 50 GWh, while ignoring that evolution discovered complete intelligence through an architecture that required 30 million times more energy to develop—but now runs on the power of a light bulb.

The efficiency gap proves we haven’t found the right architecture yet. We’re brute-forcing our way to narrow capabilities while evolution found a generalizable solution.

5. A New Roadmap for AGI: Learning from the EPU

The conclusion is inescapable. Attempting to achieve AGI by scaling current transformer-based architectures is not just impractical; it is a fundamental misreading of how intelligence arises.

The path forward is not to out-compute evolution, but to learn from its 4-billion-year-old blueprint. As I’ve argued before, we must shift our focus from scale to architecture. This means:

  1. Developmental AI: Building systems that learn in stages like a child, not in single, massive training runs.
  2. Embodied Cognition: Prioritizing AI that interacts with the real world through robotics and simulations, grounding its learning in physics and cause-and-effect.
  3. Modular & Orchestrated Systems: Moving beyond monolithic models to federations of specialized systems (for reasoning, memory, perception) coordinated by a central executive, much like the brain’s own structure.
  4. Innate Inductive Biases: Designing network architectures with built-in priors for physics, causality, and social reasoning, rather than expecting it all to be learned from data alone.
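The modular-and-orchestrated idea in point 3 can be sketched as a toy program. Everything here (the Orchestrator class, the module names, the string-returning stubs) is an illustrative assumption, not a reference design:

```python
from typing import Callable, Dict

class Orchestrator:
    """A minimal 'central executive' that routes tasks to specialist modules."""

    def __init__(self) -> None:
        self.modules: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, module: Callable[[str], str]) -> None:
        # Each specialist module is just a function from input to output here.
        self.modules[name] = module

    def handle(self, task_type: str, payload: str) -> str:
        # The executive's only job: pick the right specialist for the task.
        if task_type not in self.modules:
            raise KeyError(f"no module registered for {task_type!r}")
        return self.modules[task_type](payload)

# Stub specialists standing in for perception, memory, and reasoning systems.
brain = Orchestrator()
brain.register("perception", lambda x: f"percept({x})")
brain.register("memory",     lambda x: f"recall({x})")
brain.register("reasoning",  lambda x: f"infer({x})")

print(brain.handle("reasoning", "cause of rain"))  # infer(cause of rain)
```

The point of the sketch is the division of labor: no single monolithic model handles everything; a lightweight executive dispatches to narrow specialists, loosely echoing the brain’s own functional organization.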

Conclusion: The Blueprint, Not the Brute Force

The 30-million-fold efficiency gap is more than a staggering statistic. It’s a roadmap.

Evolution spent 1.44 million TWh—the equivalent of 360 years of total US energy consumption—not on training a single brain, but on R&D. On discovering the architectural principles that make intelligence possible: embodiment, modularity, causal grounding, continuous plasticity, and ruthless energy efficiency.

We’re taking shortcuts. We’re trying to skip the R&D phase and jump straight to AGI with an architecture (transformers) that was never designed for general intelligence. Yes, we’ve achieved remarkable things—GPT-4 is a technological marvel. But marveling at GPT-4’s language abilities while pursuing AGI through scaling is like celebrating that we can build faster cars and assuming that will eventually get us to Mars.

The path to AGI isn’t hidden in the next training run or the next data center. It’s encoded in the architecture that evolution spent four billion years optimizing. The future belongs not to those who can afford the most GPUs, but to those humble enough to learn from biology’s billion-year head start.

AGI will not be built by brute force. It will be reverse-engineered from the only blueprint that works: the one evolution already gave us.

The efficiency gap isn’t a problem to overcome—it’s a lesson to learn from.


What biological principle do you think is most critical for AI to learn next? Join the conversation on [Link to your LinkedIn Post].