A conceptual triptych showing the progression of focus, from a radio telescope to a microscope.

Why Intelligence, Artificial and Human, Depends on What We Choose to Ignore

“My experience is what I agree to attend to.” — William James

In 2017, a group of Google researchers published a paper with an audacious title: Attention Is All You Need.

It introduced the transformer architecture that powers every major AI system today: ChatGPT, Claude, GPT-N, all of them.

The breakthrough wasn’t more data or bigger models. It was selective focus: teaching systems to dynamically weight which parts of input matter most for the task at hand.

Yet here we are, eight years later, drowning in information we can’t process, distracted by notifications we can’t ignore, building AI systems that demand ever more of our fragmenting attention.

What if the 2017 paper got it right, not just for AI, but for us?

What if attention isn’t merely a technical mechanism but the fundamental constraint that shapes all intelligence, biological and artificial?


The Technical Insight

Before transformers, neural networks processed sequences one step at a time, like reading a sentence word by word with no ability to jump back or look ahead. They carried only a fixed, limited “memory” of what came before.

Transformers changed everything with a deceptively simple idea: let the model decide what to pay attention to.

When processing the word “bank,” does it mean riverbank or financial institution? The attention mechanism looks at surrounding context, “deposit,” “loan,” “account,” and weights those words more heavily. It doesn’t process all input equally. It prioritizes.
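
To make that concrete, here is a minimal sketch of the scaled dot-product attention at the heart of the paper. The word vectors are toy numbers invented for illustration; real models learn these embeddings and run many attention heads in parallel.

```python
# A minimal sketch of scaled dot-product attention (toy numbers, single head).
import numpy as np

def attention(query, keys, values):
    """Weight each value by how relevant its key is to the query."""
    scores = keys @ query / np.sqrt(query.shape[0])  # similarity of the query to each key
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax: the weights sum to 1
    return weights, weights @ values                 # weighted blend of the values

# Toy 4-dimensional vectors for the context words around "bank".
words = ["deposit", "loan", "account", "river"]
keys = np.array([
    [0.9, 0.1, 0.0, 0.0],   # deposit  (financial)
    [0.8, 0.2, 0.0, 0.0],   # loan     (financial)
    [0.7, 0.1, 0.1, 0.0],   # account  (financial)
    [0.0, 0.0, 0.1, 0.9],   # river    (geographic)
])
values = keys                                  # for the sketch, values mirror the keys
bank = np.array([0.8, 0.1, 0.0, 0.1])          # "bank" as it appears in this sentence

weights, blended = attention(bank, keys, values)
for word, w in zip(words, weights):
    print(f"{word:>8}: {w:.2f}")               # the financial words get most of the weight
```

With these toy numbers, “deposit,” “loan,” and “account” absorb nearly all of the weight, so the blended representation of “bank” leans financial. That is prioritization in miniature.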

This wasn’t just an incremental improvement. It was a paradigm shift, a step change that rewrote what was possible. Models could now handle longer contexts, understand deeper relationships, and scale to problems that were previously intractable.

But here’s what matters: the breakthrough came from architectural constraint, not raw power. By forcing the system to allocate limited attention strategically, the researchers discovered something profound.

Selective focus isn’t a limitation to overcome. It’s the mechanism that makes intelligence possible.


Nature’s Version

The human brain processes roughly 11 million bits of sensory information per second. Your eyes, ears, skin, and other sensory systems deliver a constant flood of data.

Your conscious awareness? About 50 bits per second.

That’s not a bug. That’s exquisite engineering.

The brain doesn’t try to process everything. It can’t. Instead, evolution built attention mechanisms, sophisticated filters that decide what’s signal and what’s noise, what gets through to conscious processing and what stays in the background.

Daniel Kahneman called this our attention budget. Donald Hoffman went further: evolution shaped perception not for truth, but for fitness. We notice what helps us survive, not necessarily what’s real.

When you cross a busy street, your attention system amplifies movement in your peripheral vision while suppressing the conversation you were just having. When you search for your keys, it highlights small metallic objects while dimming everything else.

This filtering isn’t passive selection. It’s active suppression. Your brain constantly inhibits irrelevant information, not just spotlighting what matters.

And here’s the critical part: attention is metabolically expensive. The prefrontal circuits that control attention consume enormous energy relative to their size. Evolution doesn’t invest in expensive mechanisms unless they are essential.

Why? Because unlimited processing would be catastrophic. An organism that tried to consciously attend to everything, every sound, every shadow, every internal sensation, would be paralyzed. Or eaten.

Attention evolved not despite limited capacity, but because of it. The constraint is the feature.


The Cost We’re Paying

Our attention mechanisms evolved in an environment of scarcity: limited threats, limited opportunities, limited information.

We now live in the opposite world.

Social-media platforms employ thousands of engineers whose sole job is to capture and hold your attention. Notification systems exploit the same dopamine circuits that once kept us alert to predators. News feeds leverage novelty bias, the same mechanism that made our ancestors notice rustling grass.

These aren’t accidents. They’re attention-extraction systems, engineered to hijack circuits that can’t tell a status update from a threat.

The result? We train our attention systems on trivia while complex, important problems go unexamined. We build neural pathways for distraction and atrophy the circuits for sustained focus.

And it’s getting worse. Studies of office workers find a task switch roughly every three minutes. Deep work, the sustained attention required for learning, creativity, and wisdom, is becoming rare.

This matters because attention isn’t just about focus in the moment. What we attend to literally shapes our neural architecture. The brain is plastic. It rewires based on use. Neurons that fire together wire together. If you spend hours scrolling, you’re training your brain for scrolling. If you spend hours in deep focus, you’re building capacity for deep focus.

Attention isn’t neutral. It’s formative. What we choose to notice reveals what we value. To attend to another person’s pain is compassion. To ignore it is indifference. Our repeated choices literally wire who we become.


What This Means for AI

Here’s where it gets interesting.

Both biological and artificial intelligence face the same fundamental constraint: limited attention in an unbounded information space.

The current AI race assumes more compute solves everything: bigger models, more parameters, more training data. But we’re already seeing diminishing returns. GPT-N is better than GPT-3, but not in proportion to the increase in compute.

Scaling up without selective focus is like trying to live wisely by doing everything at once. We can add data, energy, and speed, but without direction we simply amplify noise. Busyness is not mastery. Efficiency is not intelligence.

Why? Because the problem isn’t raw processing power. It’s attention allocation.

A system that processes everything equally is wasteful. A system that can’t prioritize can’t reason about complex, multi-step problems. A system without attention architecture can’t adapt to novelty.

The transformer breakthrough worked precisely because it imposed attention constraints. The next breakthrough won’t come from removing those constraints. It will come from making them smarter.

This is why I focus on AHI (Augmented Human Intelligence) rather than AGI (Artificial General Intelligence). The current approach, building monolithic models that try to do everything, is architecturally flawed. It’s the AI equivalent of trying to build a brain with more neurons but no specialized regions.

Better AI means modular systems where specialized components handle specific tasks, coordinated by attention mechanisms that route problems to the right processors and integrate results, just like the brain.
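
As a hypothetical sketch of that routing idea, the toy dispatcher below scores a few made-up components against an incoming task and hands the task to the best match. A real system would learn these relevance scores rather than count keywords; the component names and scoring here are assumptions for illustration only.

```python
# A hypothetical sketch of attention-style routing between specialized components.
from typing import Callable, Dict

# Specialized "processors": each one handles a single kind of task.
COMPONENTS: Dict[str, Callable[[str], str]] = {
    "math":      lambda task: f"[math module] solved: {task}",
    "retrieval": lambda task: f"[retrieval module] looked up: {task}",
    "summarize": lambda task: f"[summary module] condensed: {task}",
}

# Crude keyword relevance stands in for a learned attention/routing layer.
KEYWORDS = {
    "math":      ("calculate", "sum", "percent"),
    "retrieval": ("find", "look up", "who", "when"),
    "summarize": ("summarize", "condense", "recap"),
}

def route(task: str) -> str:
    """Score each component's relevance to the task and dispatch to the best match."""
    scores = {
        name: sum(word in task.lower() for word in words)
        for name, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # the highest-scoring component wins the task
    return COMPONENTS[best](task)

print(route("Calculate the percent change in revenue"))
print(route("Summarize this meeting transcript"))
```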

And critically, these systems should augment human attention, not compete for it.

Most current AI systems are attention thieves. They demand that we read their outputs, verify their results, prompt them effectively, manage their limitations. They fragment our focus across tools, screens, and contexts.

We should be building attention amplifiers: systems that handle routine processing so humans can focus on judgment, strategy, and meaning. Systems that compress noise so we can attend to signal.

This is Augmented Human Intelligence: a hybrid architecture of awareness where humans bring values and judgment, and AI brings endurance and pattern recognition. When AI filters noise, humans focus on meaning. When humans define meaning, AI aligns its attention accordingly.


Intelligence Is Attention Plus Architecture

The 2017 paper got it right, just incompletely.¹

Attention is all we have, all we need, but only if we understand what attention actually is.

It’s not merely a technical mechanism for weighting tokens in a sequence. It’s the fundamental principle that lets bounded systems operate in unbounded environments. It’s how intelligence navigates the gap between infinite possibility and finite capacity.

For AI, this means moving beyond brute-force scaling toward architectures that allocate attention strategically, dynamically, efficiently.

For humans, it means recognizing that our attention is both our most valuable resource and our most vulnerable point of influence. We can’t opt out of the attention economy, but we can build better defenses and invest in better tools.

This architectural thinking applies beyond AI systems. About twelve months ago, after a reduction in force, I caught myself pursuing a like-for-like role replacement. I was allocating attention reactively instead of deliberately. That realization led to the Mastery of Life framework, treating fulfillment as an optimization problem where competing domains (health, relationships, growth, contribution) vie for finite attention.
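
As an illustration only, with invented domains, weights, and a made-up weekly budget, the framing reduces to something like this:

```python
# A hypothetical sketch of the "finite attention budget" framing.
# The budget, domains, and weights are illustrative assumptions, not a prescription.
BUDGET_HOURS = 40.0        # discretionary attention available per week (assumed)

priorities = {             # how much each domain matters right now (sums to 1.0)
    "health": 0.30,
    "relationships": 0.25,
    "growth": 0.25,
    "contribution": 0.20,
}

allocation = {domain: BUDGET_HOURS * weight for domain, weight in priorities.items()}

for domain, hours in allocation.items():
    print(f"{domain:>13}: {hours:.1f} h/week")

# Raising one weight forces another down: attention granted to one domain is
# attention withdrawn from another. That trade-off is the point of the framing.
```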

The question isn’t whether we have enough attention to solve our problems. We never will.

The question is whether we can build systems, technological and social, that help us attend to what matters.

That’s the architecture challenge of our generation.


For a deeper exploration of how attention mechanisms constrain and enable intelligence across biological and artificial systems, see my whitepaper: Attention Is All We Have

  1. Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30. arxiv.org/abs/1706.03762