Here's a number that should make you stop and think:

80-90% of AI's lifetime costs come from inference, not training.

Let me say that again, because most people—including most investors, most tech executives, and most policymakers—don't understand this yet.

Training a frontier AI model is expensive. GPT-4 reportedly cost over $100 million to train. DeepSeek's models cost a fraction of that, but still millions. These numbers make headlines. They're big, scary, impressive.

But they're also one-time costs.

Inference—the process of actually using the model, of serving billions of queries to millions of users—is a continuous cost. And it scales with adoption.

The more successful your AI product, the more expensive it becomes to run.

This is the hidden economics of AI that nobody talks about. The inference economy. And it's about to become the single largest driver of data center buildout, energy consumption, and infrastructure investment over the next decade.

Here's why: inference now accounts for 55-65% of all AI compute, up from 33% in 2023. By 2028, it will hit 70-80%. The inference-optimized chip market alone is projected to exceed $50 billion in 2026.

And unlike training—which you do once per model version—inference happens every single time someone uses your product.

Every ChatGPT query. Every Midjourney image. Every Claude code completion. Every AI agent action.

That's inference.

And the bill never stops coming.

The $200-to-$10,000 Problem

Let me tell you a story that perfectly captures the inference trap.

A construction company built an AI predictive analytics tool. During development, it cost them under $200/month. Reasonable. Manageable.

Then they launched it to actual users.

First month's bill? $10,000.

They weren't hacked. They didn't misconfigure anything. Their users just… used it. And real-world usage at scale typically runs 40-60% more expensive than teams estimate when moving from development to production.

Here's the math that kills startups:

  • Development testing: Hundreds of queries per day

  • Production usage: Millions of queries per day

The pricing that looked reasonable in testing—$3 per million input tokens, $10 per million output tokens for GPT-4 (now succeeded by GPT-5.2)—suddenly becomes catastrophic at scale.

And here's the kicker: output tokens cost 2-5x more than input tokens because generation is computationally expensive.

So that innocent-looking chatbot that helps customers with support tickets? Every response it generates is burning money. The longer and more helpful the response, the more it costs.
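
To see how this bill materializes, here's a minimal cost model in Python. The query volumes and token counts are illustrative assumptions, not the construction company's actual numbers:

```python
# Back-of-the-envelope monthly inference cost: dev vs. production.
# All inputs are illustrative assumptions.

def monthly_cost(queries_per_day, input_tokens, output_tokens,
                 in_price_per_m=3.00, out_price_per_m=10.00):
    """Monthly API cost in dollars at GPT-4-era list prices."""
    daily = (queries_per_day * input_tokens / 1e6 * in_price_per_m
             + queries_per_day * output_tokens / 1e6 * out_price_per_m)
    return daily * 30

# Development: a few hundred test queries a day.
dev = monthly_cost(queries_per_day=300, input_tokens=500, output_tokens=400)

# Production: real users, longer conversations.
prod = monthly_cost(queries_per_day=50_000, input_tokens=1_500, output_tokens=800)

print(f"Dev:  ${dev:,.0f}/month")   # ~$50/month
print(f"Prod: ${prod:,.0f}/month")  # ~$19,000/month
```

Three hundred test queries a day costs pocket change. Fifty thousand real ones, at the exact same prices, is a five-figure monthly bill.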

Monthly production costs scale from:

  • $10-50 for personal projects

  • $1,000+ for small business applications

  • $10,000-100,000+ for enterprise applications

  • Millions for consumer-scale products

And this is just the API fees. If you're running your own infrastructure, you're paying for GPUs, electricity, cooling, and data center space 24/7, whether anyone is using your model or not.

The 1,000x Cost Collapse (That Still Isn't Cheap Enough)

Now, here's where it gets interesting.

Inference costs have dropped 1,000x in just three years.

In late 2022, running a GPT-4-class model cost approximately $20 per million tokens. In early 2026, equivalent frontier performance costs $0.40 per million tokens (a 50x drop), and budget providers like DeepSeek push it lower still. Hold the capability level fixed rather than chasing the frontier, and the cumulative decline approaches 1,000x.

That's one of the fastest cost declines in computing history.

What drove this collapse?

1. Hardware efficiency gains
Each GPU generation delivers 2-3x more inference throughput per dollar. The H100 processes roughly 3x more tokens per second than the A100 at a similar price point.

2. Software optimization
Inference frameworks like vLLM, TensorRT-LLM, and SGLang improved GPU utilization from 30-40% to 70-80% through continuous batching, PagedAttention, and speculative decoding.

3. Model architecture efficiency
Mixture-of-Experts (MoE) models like Mixtral and DeepSeek V3 activate only a fraction of total parameters per token, delivering frontier-quality output at 3-5x lower compute cost.

4. Quantization and distillation
Running models at INT8 or INT4 precision reduces memory and compute requirements by 2-4x with minimal quality loss.
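
To make the quantization point concrete, here's the weights-only memory arithmetic for a 70B-parameter model. This is a rough sketch; KV cache and activations add more on top:

```python
# Approximate GPU memory needed for model weights at various precisions.
PARAMS = 70e9  # 70B-parameter model

for name, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1024**3
    print(f"{name}: ~{gb:.0f} GB of weights")

# FP16: ~130 GB  -> needs two 80GB GPUs
# INT8: ~65 GB   -> fits on one 80GB GPU
# INT4: ~33 GB   -> fits on a 48GB L40S
```

Halving the bytes per parameter doesn't just save memory; it can halve the number of GPUs you need to rent.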

But here's the paradox: even with a 1,000x cost reduction, inference is still too expensive for most use cases to be profitable.

Why? Because demand is growing faster than costs are falling.

The Jevons Paradox of AI

There's an economic principle called the Jevons Paradox: when you make something more efficient, people use more of it, not less.

Coal-fired steam engines got more efficient in the 1800s. Did coal consumption fall? No. It exploded—because suddenly steam power was economical for a thousand new uses.

The same thing is happening with AI inference.

At $20 per million tokens, only the highest-value enterprise applications justified LLM deployment.

At $0.40 per million tokens, every SaaS product, internal tool, and consumer app can embed AI.

The addressable market expands by orders of magnitude.

So even though per-token costs are falling, total inference spending is growing. And it's growing fast.
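
The arithmetic is simple enough to sketch. These demand numbers are invented for illustration; the point is the direction:

```python
# Jevons in one calculation: per-token price falls, total spend rises.
# Demand figures are illustrative, not measured.
price_2022 = 20.00   # $/M tokens
price_2026 = 0.40    # $/M tokens (50x cheaper)

tokens_2022 = 1e6    # M tokens/day demanded at the old price
tokens_2026 = 200e6  # M tokens/day at the new price (demand grew 200x)

print(f"2022 spend: ${price_2022 * tokens_2022 / 1e6:,.1f}M/day")  # $20.0M/day
print(f"2026 spend: ${price_2026 * tokens_2026 / 1e6:,.1f}M/day")  # $80.0M/day
```

Price fell 50x. Demand grew 200x. Total spend quadrupled.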

Total GPU-hours consumed for inference are increasing, supporting rental rates on GPU marketplaces. The inference market is projected to exceed $50 billion in 2026, growing faster than training for the first time.

This is the inference economy.

And it's going to consume more electricity, more GPUs, more data center space, and more capital than training ever did.

The Energy Math That Doesn't Work

Let's talk about what this means for energy.

A single ChatGPT query uses roughly 10 times more electricity than a Google search, according to IEA estimates.

Doesn't sound like much, right?

Now multiply that by billions of queries per day.

ChatGPT alone handles an estimated 1+ billion queries daily, consuming roughly 300 megawatt-hours (MWh) of electricity per day and generating over 260,000 kilograms of CO₂ emissions per month.
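
You can sanity-check that figure yourself. The 0.3 Wh/query energy estimate is a commonly cited approximation, not a measured value:

```python
# Sanity-checking the daily energy claim.
queries_per_day = 1e9
wh_per_query = 0.3  # assumed energy per query, in watt-hours

mwh_per_day = queries_per_day * wh_per_query / 1e6
print(f"~{mwh_per_day:,.0f} MWh/day")  # ~300 MWh/day
```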

And that's just one AI product.

Add in:

  • Google Gemini (hundreds of millions of users)

  • Microsoft Copilot (integrated into Office for 400+ million users)

  • Claude, Midjourney, Stable Diffusion, and dozens of other consumer AI products

  • Enterprise AI applications (customer service, coding assistants, analytics)

  • AI agents (which generate 10-100x more tokens per task than simple Q&A)

And you start to see the scale of the problem.

By 2028, over half of data center electricity will be used for AI, with inference accounting for 80-90% of that.

At that point, AI alone could consume as much electricity annually as 22% of all US households.

And here's the kicker: the energy used for inference is growing faster than renewable energy capacity.

Data centers now use electricity that is 48% more carbon-intensive than the US average, because they're increasingly powered by natural gas to meet immediate demand.

The math doesn't work.

We're building an economy on top of a technology whose energy requirements are growing exponentially, while our ability to provide clean energy for it is growing linearly.

Why Inference Is Different From Training

Let me break down why inference economics are fundamentally different from training economics:

| Factor | Training | Inference |
| --- | --- | --- |
| Workload Duration | Days/weeks per run | Continuous 24/7 |
| Lifetime Cost Share | 10-20% | 80-90% |
| Scaling Pattern | Predictable | Variable demand |
| Hardware Utilization | High (batch) | Variable (request-driven) |
| Optimization Focus | Time-to-train | Cost-per-token |
| Competitive Landscape | NVIDIA dominant | More alternatives viable |

Training is CapEx. You spend a lot upfront, but it's a known, finite cost.

Inference is OpEx. It's a continuous burn that scales with success.

And here's the brutal reality: if your inference costs scale faster than your revenue, you go bankrupt.

This is why so many AI startups are struggling. They built products that users love, usage is growing, and they're… losing money on every query.

The unit economics don't work.

The Cost-Per-Token War

This brings us to the most important metric in AI today: cost-per-token.

If your inference cost per million tokens is $1.00 and you charge $2.00, your gross margin is 50%.

If inference costs drop to $0.40 per million tokens, the same pricing yields 80% margin—or you can cut prices to grow users.
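
That margin arithmetic, stated as code:

```python
def gross_margin(cost_per_m_tokens, price_per_m_tokens):
    """Gross margin as a fraction of revenue."""
    return 1 - cost_per_m_tokens / price_per_m_tokens

print(gross_margin(1.00, 2.00))  # 0.5 -> 50% margin
print(gross_margin(0.40, 2.00))  # 0.8 -> 80% margin
```

Every dollar shaved off cost-per-token is a dollar of margin, or a dollar of price cut you can use as a weapon.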

This is why a price war is brewing.

DeepSeek's models cost 20-50x less than OpenAI's equivalent, with inference at roughly $0.14 per million input tokens and $0.28 per million output tokens for DeepSeek-V3.

Compare that to:

  • GPT-4 (deprecated): $3.00 / $10.00 per million tokens

  • Claude 3.5 Sonnet (deprecated): $3.00 / $15.00 per million tokens

  • Gemini 1.5 Pro (deprecated): $1.25 / $5.00 per million tokens

That makes DeepSeek roughly 95% cheaper.

How? Through a combination of:

  • Mixture-of-Experts architecture (only 37B of 671B parameters active per token)

  • Aggressive quantization (4-bit and 8-bit precision)

  • Context caching (reusing repeated inputs, cutting costs by 75-90%; see the sketch after this list)

  • Open-source distribution (allowing self-hosting with no API fees)

  • Subsidized pricing (possibly loss-leading to gain market share)
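
Here's a rough sketch of how context caching changes the blended input price. The 90% cached-token discount and the hit rates are assumptions for illustration:

```python
# Effective input cost with context caching, under assumed numbers.
# A cache hit is billed at a steep discount to a fresh input token.
def effective_input_cost(base_price, hit_rate, cached_discount=0.90):
    """Blended $/M input tokens given a cache hit rate."""
    return base_price * (1 - hit_rate * cached_discount)

base = 0.14  # DeepSeek-V3 input price, $/M tokens
for hit_rate in (0.0, 0.5, 0.9):
    print(f"hit rate {hit_rate:.0%}: ${effective_input_cost(base, hit_rate):.3f}/M tokens")
# hit rate 0%:  $0.140/M tokens
# hit rate 50%: $0.077/M tokens
# hit rate 90%: $0.027/M tokens
```

Chatbots with long system prompts and repeated context hit the cache constantly. That's where the 75-90% savings come from.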

And here's the thing: OpenAI, Google, and Anthropic can't ignore this.

If a competitor offers 90% lower costs with comparable quality, enterprise customers will switch. Developers will switch. Consumers will switch.

So the incumbents have two choices:

  1. Cut prices (destroying margins)

  2. Increase efficiency (requiring massive R&D and infrastructure investment)

Most likely, they'll do both.

This is the inference cost war. And it's just beginning.

The Hardware Shift: Why H100s Are the Wrong Tool

Here's something most people don't understand: the H100 is optimized for training, not inference.

The H100 dominates AI training because training requires:

  • Maximum memory bandwidth

  • Inter-GPU communication (NVLink)

  • FP16/BF16 compute precision

Inference has different requirements:

  • Low latency (respond in milliseconds)

  • High throughput (serve many requests simultaneously)

  • Cost efficiency (maximize queries per dollar)

And for those requirements, the H100 is often overkill.

An L40S delivers better cost-per-token at one-quarter the price:

| GPU | MSRP | Inference Throughput (Llama 70B) | Cost per 1M Tokens |
| --- | --- | --- | --- |
| H100 80GB | $30,000 | ~2,800 tokens/sec | $0.30 |
| L40S 48GB | $7,500 | ~1,200 tokens/sec | $0.18 |
| L4 24GB | $2,500 | ~400 tokens/sec | $0.17 |
| A100 80GB | $10,000 (used) | ~1,600 tokens/sec | $0.17 |

For pure inference workloads, the L40S is the better choice. You get lower cost-per-token at a fraction of the capital cost.
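
The cost-per-token column falls out of a simple relationship: hourly GPU cost divided by tokens served per hour. The hourly costs below are assumptions chosen to be consistent with the table, not quoted rates:

```python
# Deriving cost per million tokens from hourly GPU cost and throughput.
def cost_per_m_tokens(hourly_cost, tokens_per_sec):
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_cost / tokens_per_hour * 1e6

# Assumed all-in hourly costs (amortized hardware + power), illustrative:
print(f"H100: ${cost_per_m_tokens(3.00, 2800):.2f}/M tokens")  # ~$0.30
print(f"L40S: ${cost_per_m_tokens(0.77, 1200):.2f}/M tokens")  # ~$0.18
```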

This is why the inference hardware market is fragmenting:

  • Google TPUs deliver 4.7x better price-performance for inference and 67% lower power consumption

  • AWS Trainium/Inferentia chips are purpose-built for inference

  • AMD Instinct accelerators are gaining traction as NVIDIA alternatives

The H100's dominance in training doesn't guarantee dominance in inference.

The hardware wars are just beginning.

The Inference Supply Chain: Cloud, Edge, and Everything In Between

Inference doesn't just happen in centralized data centers. It happens everywhere users need it.

Tier 1: Hyperscale Cloud (AWS, Azure, GCP)

  • Best for: High-volume, latency-tolerant workloads

  • Cost: $1.50-$6.98/hr per H100

  • Latency: 50-200ms

Tier 2: GPU Marketplaces (Vast.ai, RunPod, CoreWeave)

  • Best for: Cost-sensitive workloads

  • Cost: 40-60% lower than hyperscalers

  • Latency: 80-300ms

Tier 3: On-Premise / Co-located

  • Best for: Sustained workloads above 50,000 GPU-hours/month

  • Cost: Lowest per-token at scale (after upfront CapEx)

  • Latency: 10-50ms

Tier 4: Edge Inference (devices, mobile, local)

  • Best for: Latency-critical, privacy-sensitive applications

  • Cost: Highest per-token, but zero marginal cost after deployment

  • Latency: 5-20ms

The "right" deployment depends on latency requirements, not just cost.

A medical imaging AI that needs 10ms response time cannot use a remote cloud API, regardless of price.

An autonomous vehicle cannot wait for a round-trip to a data center to decide whether to brake.

The inference supply chain is stratifying by latency tier, creating distinct GPU demand for each.
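
If you were writing a placement policy, the core logic is just "cheapest tier that meets the latency budget." A minimal sketch, with latencies taken as midpoints of the ranges above and relative costs assumed (on-premise is omitted because it requires upfront CapEx):

```python
# A minimal placement rule: cheapest tier that meets the latency budget.
TIERS = [  # (name, typical_latency_ms, relative_cost_per_token)
    ("edge", 12, 3.0),
    ("hyperscale_cloud", 125, 1.0),
    ("gpu_marketplace", 190, 0.5),
]

def choose_tier(latency_budget_ms):
    eligible = [t for t in TIERS if t[1] <= latency_budget_ms]
    if not eligible:
        return "edge"  # nothing meets the budget; only on-device comes close
    return min(eligible, key=lambda t: t[2])[0]

print(choose_tier(10))   # edge (nothing else is fast enough)
print(choose_tier(150))  # hyperscale_cloud
print(choose_tier(500))  # gpu_marketplace (cheapest once latency allows)
```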

What This Means for the Next Decade

The shift from training-centric to inference-centric AI infrastructure is not just a technical trend. It's an economic and geopolitical restructuring.

Here's what I'm watching:

1. Data center design shifts
Engineers are redesigning data centers to use less copper per megawatt, with:

  • Higher-voltage distribution (reduces wiring copper)

  • Liquid cooling systems (more efficient)

  • Modular designs (minimize transmission distances)

2. Geographic distribution
Training concentrates in locations with the most compute. Inference benefits from distribution to reduce latency. Expect more regional data centers closer to population centers, not fewer megaclusters.

3. The profitability crisis
Most AI companies are not profitable on inference. OpenAI reportedly loses money on each ChatGPT query. This is unsustainable. One of three things has to give:

  • Prices rise (killing adoption)

  • Costs fall faster (requiring breakthrough efficiency gains)

  • Business models shift (from per-query to subscription/enterprise licensing)

4. The agent explosion
AI agents use 10-100x more tokens than simple Q&A chatbots because they:

  • Chain multiple reasoning steps

  • Query external tools and databases

  • Self-correct and iterate

  • Generate structured outputs

As AI shifts from chatbots to agents, inference costs will explode even as per-token prices fall.
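
A back-of-the-envelope sketch of why, with invented but representative step counts:

```python
# Why agents burn tokens: each task fans out into many model calls,
# and the accumulated context gets re-sent every step.
# All counts are illustrative assumptions.
chatbot_tokens = 500 + 400           # one prompt, one answer

steps = 8                            # reasoning/tool-use steps per agent task
context_per_step = 3_000             # accumulated context re-sent each step
output_per_step = 600                # reasoning + tool calls + structured output

agent_tokens = steps * (context_per_step + output_per_step)
print(f"Chatbot: {chatbot_tokens:,} tokens/task")  # 900
print(f"Agent:   {agent_tokens:,} tokens/task")    # 28,800 (~32x more)
```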

5. Energy becomes the bottleneck
We're already seeing this play out:

  • Data centers delayed due to grid capacity constraints

  • Tech companies buying nuclear reactors (see Article 3)

  • Inference workloads shifted to off-peak hours to reduce electricity costs

  • Energy cost per token becoming as important as GPU cost per token

By 2028, electricity may be the dominant variable cost for inference, not hardware.

The Abundance vs. Scarcity Framework: Applied to Inference

Let's apply my core analytical framework here.

What's becoming abundant:

  • Model capability - Frontier models getting cheaper and more accessible

  • Inference optimization techniques - Quantization, caching, batching all improving

  • Hardware alternatives - More GPU options, custom silicon, edge devices

  • Developer tools - Easier to build and deploy AI applications

What's becoming scarce:

  • Profitable unit economics - Harder to make money as costs fall but competition intensifies

  • Electrical capacity - Can't build inference infrastructure faster than grid capacity

  • Low-latency compute - Edge and near-user inference capacity constrained

  • Sustainable energy - Clean power growing slower than AI power demand

  • Attention/differentiation - As every product adds AI, competitive moats erode

The collision:

We're building an abundant AI capability on top of scarce physical infrastructure.

Every efficiency gain makes AI more accessible, which increases adoption, which increases total resource consumption, which hits physical limits (energy, cooling, space, materials).

This is the inference paradox.

The better AI gets, the more we use it. The more we use it, the more it costs—in aggregate—even as per-unit costs fall.

And eventually, we hit a wall. Not a software wall. A physics wall.

You can optimize code infinitely. You can't optimize the laws of thermodynamics.

Every computation generates heat. Every watt of compute becomes a watt of heat that must be removed. Every data center requires copper, concrete, steel, and rare earth elements.

The atoms matter.

And the inference economy is about to consume more atoms than any computing paradigm in history.

Second-Order Effects: What Happens When Inference Dominates

Let me walk through some non-obvious consequences of the shift to an inference-dominated AI economy:

1. The death of "AI for everything"
Right now, every SaaS product is adding AI features. But if inference costs don't fall fast enough, only high-value use cases will survive. We'll see a shakeout where:

  • Low-margin AI features get removed

  • Freemium AI products become unsustainable

  • Only applications with strong unit economics remain

2. The rise of inference-optimized models
Training will optimize for inference efficiency, not just benchmark performance. Expect:

  • Smaller, faster models that sacrifice 5% accuracy for 50% cost reduction

  • Mixture-of-Experts becoming the dominant architecture

  • Aggressive quantization and pruning as standard practice

  • Models designed specifically for edge deployment

3. The subscription model wins
Per-query pricing creates unpredictable costs for users and providers. Expect shift to:

  • Flat-rate subscriptions (ChatGPT Plus, Claude Pro)

  • Compute credits (buy tokens in bulk, use over time)

  • Enterprise site licenses (unlimited usage for fixed fee)

This shifts inference cost risk from users to providers—who can then optimize infrastructure to manage it.

4. The vertical integration of AI companies
To control inference costs, AI companies will:

  • Build their own data centers (OpenAI already doing this)

  • Design custom silicon (Google TPU, Amazon Trainium)

  • Negotiate direct power purchase agreements

  • Acquire GPU cloud providers

Inference cost control becomes a competitive moat.

5. The geopolitics of inference
Countries will compete to host inference infrastructure, not just training. Why?

  • Data sovereignty - EU, China want local inference for regulatory compliance

  • Latency - Can't serve users from the other side of the world

  • Economic capture - Inference spending is continuous, training is one-time

Expect national policies to incentivize domestic inference capacity.

6. The energy arbitrage opportunity
Inference workloads can be geographically distributed and time-shifted. This creates opportunities for:

  • Running inference in regions with cheap electricity (Iceland, Quebec, Middle East)

  • Shifting non-urgent inference to off-peak hours

  • Using curtailed renewable energy (solar/wind that would otherwise be wasted)

Energy-aware inference scheduling could reduce costs by 30-50%.
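
A minimal sketch of that scheduling idea, with invented spot prices:

```python
# Energy-aware batch scheduling: run deferrable inference where and when
# power is cheapest. Prices are illustrative $/MWh spot figures.
hourly_prices = {
    ("texas", 3): 22.0, ("texas", 14): 61.0,
    ("quebec", 3): 18.0, ("quebec", 14): 35.0,
}

def cheapest_slot(prices):
    return min(prices, key=prices.get)

region, hour = cheapest_slot(hourly_prices)
print(f"Run deferrable batch jobs in {region} at {hour:02d}:00 "
      f"(${hourly_prices[(region, hour)]}/MWh)")
# -> quebec at 03:00 ($18.0/MWh)
```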

7. The privacy/cost trade-off
Cloud inference is cheap but requires sending data to third parties. Edge inference is private but expensive. This creates a market segmentation:

  • Consumer apps → Cloud inference (cheap, convenient)

  • Enterprise apps → Hybrid (cloud for non-sensitive, on-prem for sensitive)

  • Regulated industries → On-premise/edge only (healthcare, finance, defense)

Privacy regulations will increase inference costs by forcing local deployment.

Winners and Losers in the Inference Economy

WINNERS:

1. Inference-optimized hardware companies

  • NVIDIA (still dominant, but share eroding)

  • AMD (Instinct MI300 gaining inference traction)

  • Google (TPU v5 designed for inference)

  • Amazon (Trainium/Inferentia custom chips)

  • Cerebras, Groq, SambaNova (purpose-built inference accelerators)

2. GPU cloud marketplaces

  • CoreWeave (went public on inference demand)

  • Lambda Labs, Vast.ai, RunPod (40-60% cheaper than hyperscalers)

  • Crusoe Energy (using stranded gas for cheap inference power)

3. Inference optimization software

  • vLLM, SGLang, TensorRT-LLM (open-source inference frameworks)

  • Modal, Baseten, Replicate (managed inference platforms)

  • Anyscale, Together.ai (inference-as-a-service)

4. Energy providers with 24/7 capacity

  • Nuclear operators (Constellation, EDF, Cameco)

  • Natural gas utilities (unfortunately, the current default)

  • Geothermal developers (Fervo Energy, Eavor)

5. Companies with pricing power

  • OpenAI, Anthropic, Google (can charge premium for quality)

  • Vertical AI companies (healthcare, legal, finance - where accuracy >> cost)

  • Enterprise AI platforms (where cost is amortized across many users)

LOSERS:

1. Undifferentiated AI wrappers
If you're just calling OpenAI's API and adding a thin UI layer, you have:

  • No pricing power (users can go direct to OpenAI)

  • No cost advantage (you're paying retail API prices)

  • No moat

Your margins will compress to zero.

2. Training-focused infrastructure

  • H100 clusters optimized for multi-GPU training (most inference serving doesn't need NVLink)

  • InfiniBand networking (inference traffic typically runs over standard Ethernet)

  • GPUs priced for peak training FLOPs (decode-phase inference is memory-bandwidth-bound, so cheaper cards with adequate bandwidth deliver better cost-per-token)

The infrastructure that powered the training boom is mismatched for the inference economy.

3. Freemium AI products with no path to profitability
Offering unlimited free AI usage was viable when inference was cheap and VCs were funding growth-at-all-costs. Now:

  • Inference costs are the dominant expense

  • Free users consume resources without generating revenue

  • Conversion rates to paid tiers are low (<5% typical)

Expect mass shutdowns of free AI tools or aggressive feature limitations.

4. AI companies in high-electricity-cost regions
If your inference infrastructure is in California, Germany, or Japan (expensive electricity), you're at a permanent cost disadvantage vs. competitors in:

  • Texas, Georgia, Ohio (cheap US electricity)

  • Quebec, Iceland, Norway (cheap hydropower)

  • Middle East (subsidized energy)

Geography is destiny in the inference economy.

5. Renewable-only data centers
Sounds counterintuitive, but here's why:

  • Solar/wind have 25-40% capacity factors (only generate power part-time)

  • Inference demand is 24/7

  • Battery storage is still too expensive for multi-day backup

  • Result: Either idle GPUs (terrible economics) or grid backup (often fossil fuels)

Baseload power (nuclear, geothermal, hydro) wins for inference.

The $10 Trillion Question: Can Inference Ever Be Profitable?

Let's do some uncomfortable math.

OpenAI's metrics (estimated from public reporting):

  • ~200 million ChatGPT users (100M+ weekly active)

  • ~1-2 billion queries per day

  • Average cost: ~$0.02-0.05 per query (fully loaded infrastructure cost, not just marginal compute)

  • Daily inference cost: $20-100 million

  • Annual inference cost: $7-36 billion

Revenue:

  • ChatGPT Plus: ~10 million subscribers × $20/month = $200M/month = $2.4B/year

  • Enterprise/API: Estimated $1-3B/year

  • Total revenue: ~$3-5 billion/year
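
Here's that envelope math, spelled out. Every input is an estimate:

```python
# Reproducing the back-of-the-envelope above. All inputs are estimates.
queries_per_day = (1e9, 2e9)
cost_per_query = (0.02, 0.05)

daily_low = queries_per_day[0] * cost_per_query[0] / 1e6   # $20M/day
daily_high = queries_per_day[1] * cost_per_query[1] / 1e6  # $100M/day
print(f"Daily inference cost: ${daily_low:.0f}M - ${daily_high:.0f}M")
print(f"Annual: ${daily_low * 365 / 1e3:.1f}B - ${daily_high * 365 / 1e3:.1f}B")
# Annual: $7.3B - $36.5B

subscription_revenue = 10e6 * 20 * 12 / 1e9  # $2.4B/year
print(f"ChatGPT Plus revenue: ${subscription_revenue:.1f}B/year")
```

Even the low end of the cost range swamps subscription revenue.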

Even with aggressive assumptions, OpenAI may be losing money on inference.

And that's with the most advanced infrastructure, custom optimizations, and economies of scale that no competitor can match.

If OpenAI can't make inference profitable, who can?

Here are the only paths I see:

Path 1: Inference costs fall 10x more
Requires breakthrough innovations:

  • New model architectures (sparse, mixture-of-experts)

  • New hardware (photonic computing, analog AI chips)

  • New algorithms (speculative decoding, early exit)

Possible, but not guaranteed.

Path 2: Willingness-to-pay increases
Users/enterprises pay more because AI becomes more valuable:

  • Replaces expensive human labor (customer service, coding, analysis)

  • Enables new revenue streams (personalization, automation)

  • Creates competitive necessity (everyone needs AI to compete)

Already happening in enterprise, less clear for consumers.

Path 3: Business model shifts
Move away from per-query pricing:

  • Subscriptions (flat-rate, predictable revenue)

  • Licensing (pay per seat, not per query)

  • Bundling (AI as part of larger platform, not standalone)

Most likely path for profitability.

Path 4: Consolidation
Only 2-3 companies survive with:

  • Massive scale (billions of queries → lowest per-token cost)

  • Vertical integration (own data centers, chips, power)

  • Pricing power (quality moat or lock-in)

Everyone else becomes a customer, not a competitor.

My bet? All four happen simultaneously.

Costs fall (but not fast enough). Prices rise (but not too much). Business models shift to subscriptions. Market consolidates to a few winners.

And even then, profit margins will be thin compared to traditional software, where 80%+ gross margins are the norm.

The inference economy will be a volume game, not a margin game.

What You Can Do

If you're building an AI product:

  • Model your inference costs at scale - Don't wait until you're in production to discover you're losing money on every query

  • Optimize for cost-per-token, not just quality - A 5% accuracy drop that cuts costs 50% is often worth it

  • Use caching aggressively - 75-90% cost reduction for repeated inputs

  • Consider smaller models for most queries - Route easy questions to cheap models, hard questions to expensive ones (see the routing sketch after this list)

  • Shift to subscription pricing - Flat-rate plans let you optimize infrastructure without punishing heavy users

  • Negotiate volume discounts - If you're doing >1M requests/day, you should be getting 30-50% off list prices

  • Explore GPU marketplaces - Vast.ai, RunPod often 40-60% cheaper than AWS/Azure for inference

  • Monitor your token efficiency - Shorter prompts, structured outputs, and early stopping can cut costs 20-40%
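
Here's what the model-routing item looks like in practice, as a minimal sketch. The classifier is a deliberately dumb stub (real routers use a small model or learned heuristics), and the prices are illustrative:

```python
# A minimal router: send easy queries to a cheap model, hard ones to a
# frontier model. Prices are illustrative $/M tokens.
CHEAP = {"name": "small-model", "price_per_m": 0.20}
FRONTIER = {"name": "frontier-model", "price_per_m": 10.00}

def route(query: str) -> dict:
    # Stub heuristic: long queries or "hard" keywords go to the big model.
    hard = len(query) > 500 or any(
        kw in query.lower() for kw in ("prove", "refactor", "multi-step"))
    return FRONTIER if hard else CHEAP

print(route("What are your support hours?")["name"])            # small-model
print(route("Refactor this module for thread safety")["name"])  # frontier-model
```

If 80% of traffic is easy and routes to the cheap model, the blended cost drops by an order of magnitude without touching the hard queries.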

If you're investing:

  • Inference infrastructure is the next wave - Training infrastructure (H100s, InfiniBand) is saturated; inference infrastructure (L40S, edge chips, distributed systems) is just starting

  • Energy is the bottleneck - Companies that solve the power problem (nuclear, geothermal, energy arbitrage) will win

  • Vertical AI beats horizontal AI - Domain-specific models with pricing power (healthcare, legal, finance) can sustain inference costs; general-purpose chatbots cannot

  • Hardware is fragmenting - NVIDIA's inference dominance is weaker than training dominance; AMD, Google, Amazon, and startups have real shots

  • Watch unit economics, not usage - A product with 10M users losing $0.10/query is worse than a product with 100K users making $1/query

If you're in policy:

  • Inference energy consumption needs regulation - Without it, AI will drive a fossil fuel buildout to meet 24/7 power demand

  • Support baseload clean energy - Nuclear, geothermal, hydro are the only realistic options for 24/7 AI power

  • Incentivize inference efficiency - Tax breaks or credits for companies that optimize inference costs

  • Require energy transparency - AI companies should disclose energy consumption per query (like nutrition labels)

  • Invest in grid infrastructure - Inference will add 100+ GW to the grid by 2030; transmission must keep pace

If you're a consumer:

  • Understand what you're paying for - "Unlimited" AI subscriptions are subsidized by light users; heavy users cost the company money

  • Expect price increases or usage caps - As inference costs become clearer, free tiers will shrink and paid tiers will add limits

  • Value privacy-preserving AI - On-device inference (Apple Intelligence, Google on-device AI) costs you nothing after purchase and keeps data local

  • Support sustainable AI - Ask companies about their energy sources; reward those using clean power

The Bottom Line

Here's the story in one paragraph:

AI's training costs are falling. AI's inference costs are falling too—but total inference spending is exploding because usage is growing faster than efficiency. By 2028, inference will consume 80-90% of AI's energy, 70%+ of AI compute, and the majority of AI infrastructure investment. This creates a paradox: the better and cheaper AI gets, the more we use it, and the more total resources it consumes. We're building an economy on a technology whose marginal costs are near-zero but whose aggregate costs are approaching the scale of entire industries.

The inference economy is here.

And it's going to reshape:

  • Energy markets (100+ GW of new 24/7 demand)

  • Data center design (distributed, latency-optimized, energy-efficient)

  • Hardware (inference-optimized chips, not training GPUs)

  • Business models (subscriptions, not per-query pricing)

  • Geopolitics (countries competing for inference infrastructure, not just training)

  • Profitability (thin margins, volume game, consolidation)

The companies that master inference economics will dominate the AI era.

The ones that don't will burn through billions and collapse, no matter how good their models are.

Because here's the hard truth:

Training a great model is impressive. Running it profitably at scale is what actually matters.

And right now, almost nobody has figured that out.

The race is on.

A Final Note

This is Part 4 of the Sterling Report on AI, infrastructure, and the physical constraints of the digital economy.

Next in the series: Part 5 - "Data Center Geopolitics: Why Nations Are Fighting Over AI Infrastructure"

If this made you think, share it with one person who needs to read it.

Follow @SloneSterling on X.com for daily research on AI, energy, commodities, and the collision of abundance and scarcity.

Precision in a world of noise.

Analysis by Slone Sterling
