Here's a number that should make you stop and think:
80-90% of AI's lifetime costs come from inference, not training.
Let me say that again, because most people—including most investors, most tech executives, and most policymakers—don't understand this yet.
Training a frontier AI model is expensive. GPT-4 reportedly cost over $100 million to train. DeepSeek's models cost a fraction of that, but still millions. These numbers make headlines. They're big, scary, impressive.
But they're also one-time costs.
Inference—the process of actually using the model, of serving billions of queries to millions of users—is a continuous cost. And it scales with adoption.
The more successful your AI product, the more expensive it becomes to run.
This is the hidden economics of AI that nobody talks about. The inference economy. And it's about to become the single largest driver of data center buildout, energy consumption, and infrastructure investment over the next decade.
Here's why: inference now accounts for 55-65% of all AI compute, up from 33% in 2023. By 2028, it will hit 70-80%. The inference-optimized chip market alone is projected to exceed $50 billion in 2026.
And unlike training—which you do once per model version—inference happens every single time someone uses your product.
Every ChatGPT query. Every Midjourney image. Every Claude code completion. Every AI agent action.
That's inference.
And the bill never stops coming.
The $200-to-$10,000 Problem
Let me tell you a story that perfectly captures the inference trap.
A construction company built an AI predictive analytics tool. During development, it cost them under $200/month. Reasonable. Manageable.
Then they launched it to actual users.
First month's bill? $10,000.
They weren't hacked. They didn't misconfigure anything. Their users just… used it. And usage at scale is 40-60% more expensive than most teams estimate when moving from development to production.
Here's the math that kills startups:
Development testing: Hundreds of queries per day
Production usage: Millions of queries per day
The pricing that looked reasonable in testing—$3 per million input tokens, $10 per million output tokens for GPT-4 (now succeeded by GPT-5.2)—suddenly becomes catastrophic at scale.
And here's the kicker: output tokens cost 2-5x more than input tokens because generation is computationally expensive.
So that innocent-looking chatbot that helps customers with support tickets? Every response it generates is burning money. The longer and more helpful the response, the more it costs.
Monthly production costs scale from:
$10-50 for personal projects
$1,000+ for small business applications
$10,000-100,000+ for enterprise applications
Millions for consumer-scale products
And this is just the API fees. If you're running your own infrastructure, you're paying for GPUs, electricity, cooling, and data center space 24/7, whether anyone is using your model or not.
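You can sketch that dev-to-production jump in a few lines. The per-million-token prices below are the GPT-4-era rates quoted above; the per-query token counts are my assumptions, so treat this as a back-of-envelope, not a quote:

```python
# Back-of-envelope inference cost model. Prices: $3 per 1M input tokens,
# $10 per 1M output tokens (the GPT-4-era rates cited above). Token
# counts per query are illustrative assumptions.

def monthly_cost(queries_per_day, in_tokens=500, out_tokens=300,
                 in_price=3.0, out_price=10.0):
    """Monthly API cost in dollars; prices are per 1M tokens."""
    tokens_in = queries_per_day * in_tokens * 30
    tokens_out = queries_per_day * out_tokens * 30
    return (tokens_in / 1e6) * in_price + (tokens_out / 1e6) * out_price

dev = monthly_cost(queries_per_day=300)         # development testing
prod = monthly_cost(queries_per_day=1_000_000)  # production traffic

print(f"dev:  ${dev:,.2f}/month")
print(f"prod: ${prod:,.2f}/month")
```

With these assumptions, the same app goes from roughly $40/month in testing to roughly $135,000/month in production — the construction-company story in miniature.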
The 1,000x Cost Collapse (That Still Isn't Cheap Enough)
Now, here's where it gets interesting.
Inference costs have dropped 1,000x in just three years.
In late 2022, running a GPT-4-class model cost approximately $20 per million tokens. In early 2026, equivalent performance costs $0.40 per million tokens—or less with providers like DeepSeek.
That's one of the fastest cost declines in computing history.
What drove this collapse?
1. Hardware efficiency gains
Each GPU generation delivers 2-3x more inference throughput per dollar. The H100 processes roughly 3x more tokens per second than the A100 at a similar price point.
2. Software optimization
Inference frameworks like vLLM, TensorRT-LLM, and SGLang improved GPU utilization from 30-40% to 70-80% through continuous batching, PagedAttention, and speculative decoding.
3. Model architecture efficiency
Mixture-of-Experts (MoE) models like Mixtral and DeepSeek V3 activate only a fraction of total parameters per token, delivering frontier-quality output at 3-5x lower compute cost.
4. Quantization and distillation
Running models at INT8 or INT4 precision reduces memory and compute requirements by 2-4x with minimal quality loss.
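The memory side of that is simple arithmetic. Here's the weights-only footprint of a hypothetical 70B-parameter model at each precision (KV cache and activation memory are ignored, so real deployments need more):

```python
# Weight-memory footprint of a hypothetical 70B-parameter model at
# different precisions. Illustrative only: ignores KV cache, activations,
# and serving overhead.
PARAMS = 70e9
BYTES = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for name, bytes_per_param in BYTES.items():
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{name}: {gb:.0f} GB of weights")
```

FP16 needs two 80GB GPUs just to hold the weights; INT4 fits on one mid-range card — which is exactly why quantization moves the cost needle.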
But here's the paradox: even with a 1,000x cost reduction, inference is still too expensive for most use cases to be profitable.
Why? Because demand is growing faster than costs are falling.
The Jevons Paradox of AI
There's an economic principle called the Jevons Paradox: when you make something more efficient, people use more of it, not less.
Coal-fired steam engines got more efficient in the 1800s. Did coal consumption fall? No. It exploded—because suddenly steam power was economical for a thousand new uses.
The same thing is happening with AI inference.
At $20 per million tokens, only the highest-value enterprise applications justified LLM deployment.
At $0.40 per million tokens, every SaaS product, internal tool, and consumer app can embed AI.
The addressable market expands by orders of magnitude.
So even though per-token costs are falling, total inference spending is growing. And it's growing fast.
Total GPU-hours consumed for inference are rising, propping up rental rates on GPU marketplaces. The inference market is projected to exceed $50 billion in 2026, growing faster than training for the first time.
This is the inference economy.
And it's going to consume more electricity, more GPUs, more data center space, and more capital than training ever did.
The Energy Math That Doesn't Work
Let's talk about what this means for energy.
A single ChatGPT query uses roughly 10 times more electricity than a Google search, according to IEA estimates.
Doesn't sound like much, right?
Now multiply that by billions of queries per day.
ChatGPT alone handles an estimated 1+ billion queries daily, consuming roughly 300 megawatt-hours (MWh) of electricity per day and producing over 260,000 kilograms of CO₂ emissions per month.
And that's just one AI product.
Add in:
Google Gemini (hundreds of millions of users)
Microsoft Copilot (integrated into Office for 400+ million users)
Claude, Midjourney, Stable Diffusion, and dozens of other consumer AI products
Enterprise AI applications (customer service, coding assistants, analytics)
AI agents (which generate 5-50x more tokens per task than simple Q&A)
And you start to see the scale of the problem.
By 2028, over half of data center electricity will be used for AI, with inference accounting for 80-90% of that.
At that point, AI alone could consume as much electricity annually as 22% of all US households.
And here's the kicker: the energy used for inference is growing faster than renewable energy capacity.
Data centers now use electricity that is 48% more carbon-intensive than the US average, because they're increasingly powered by natural gas to meet immediate demand.
The math doesn't work.
We're building an economy on top of a technology whose energy requirements are growing exponentially, while our ability to provide clean energy for it is growing linearly.
Why Inference Is Different From Training
Let me break down why inference economics are fundamentally different from training economics:
| Factor | Training | Inference |
|---|---|---|
| Workload Duration | Days/weeks per run | Continuous 24/7 |
| Lifetime Cost Share | 10-20% | 80-90% |
| Scaling Pattern | Predictable | Variable demand |
| Hardware Utilization | High (batch) | Variable (request-driven) |
| Optimization Focus | Time-to-train | Cost-per-token |
| Competitive Landscape | NVIDIA dominant | More alternatives viable |
Training is CapEx. You spend a lot upfront, but it's a known, finite cost.
Inference is OpEx. It's a continuous burn that scales with success.
And here's the brutal reality: if your inference costs scale faster than your revenue, you go bankrupt.
This is why so many AI startups are struggling. They built products that users love, usage is growing, and they're… losing money on every query.
The unit economics don't work.
The Cost-Per-Token War
This brings us to the most important metric in AI today: cost-per-token.
If your inference cost per million tokens is $1.00 and you charge $2.00, your gross margin is 50%.
If inference costs drop to $0.40 per million tokens, the same pricing yields 80% margin—or you can cut prices to grow users.
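That margin arithmetic, as a one-liner you can reuse:

```python
def gross_margin(cost_per_mtok, price_per_mtok):
    """Gross margin fraction on token revenue (costs and prices per 1M tokens)."""
    return (price_per_mtok - cost_per_mtok) / price_per_mtok

print(round(gross_margin(1.00, 2.00), 2))  # 0.5
print(round(gross_margin(0.40, 2.00), 2))  # 0.8
```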
This is why a price war is brewing.
DeepSeek's models cost 20-50x less than OpenAI's equivalent, with inference at roughly $0.14 per million input tokens and $0.28 per million output tokens for DeepSeek-V3.
Compare that to:
GPT-4 (deprecated): $3.00 / $10.00 per million tokens
Claude 3.5 Sonnet (deprecated): $3.00 / $15.00 per million tokens
Gemini 1.5 Pro (deprecated): $1.25 / $5.00 per million tokens
DeepSeek is more than 90% cheaper.
How? Through a combination of:
Mixture-of-Experts architecture (only 37B of 671B parameters active per query)
Aggressive quantization (4-bit and 8-bit precision)
Context caching (reusing repeated inputs, cutting costs by 75-90%)
Open-source distribution (allowing self-hosting with no API fees)
Subsidized pricing (possibly loss-leading to gain market share)
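The context-caching lever is easy to see in a toy model. The full and cached rates below are placeholders inspired by DeepSeek-style discounts, and hashing the whole prompt stands in for real prefix matching:

```python
# Toy model of context caching: the first time a prompt is seen it bills
# at the full input rate; repeats bill at a steep discount. Rates are
# placeholders, not any provider's actual prices.
import hashlib

seen = set()

def input_cost(prompt, n_tokens, full=0.28, cached=0.028):
    """Cost in dollars for n_tokens of input; rates are per 1M tokens."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    rate = cached if key in seen else full
    seen.add(key)
    return n_tokens / 1e6 * rate

system_prompt = "You are a helpful support agent..."  # reused every call
first = input_cost(system_prompt, 2_000)
repeat = input_cost(system_prompt, 2_000)
print(f"first call: ${first:.6f}, cached call: ${repeat:.6f}")
```

A long system prompt reused across millions of calls bills at the full rate exactly once — that's where the 75-90% savings come from.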
And here's the thing: OpenAI, Google, and Anthropic can't ignore this.
If a competitor offers 90% lower costs with comparable quality, enterprise customers will switch. Developers will switch. Consumers will switch.
So the incumbents have two choices:
Cut prices (destroying margins)
Increase efficiency (requiring massive R&D and infrastructure investment)
Most likely, they'll do both.
This is the inference cost war. And it's just beginning.
The Hardware Shift: Why H100s Are the Wrong Tool
Here's something most people don't understand: the H100 is optimized for training, not inference.
The H100 dominates AI training because training requires:
Maximum memory bandwidth
Inter-GPU communication (NVLink)
FP16/BF16 compute precision
Inference has different requirements:
Low latency (respond in milliseconds)
High throughput (serve many requests simultaneously)
Cost efficiency (maximize queries per dollar)
And for those requirements, the H100 is often overkill.
An L40S delivers comparable cost-per-token at one-quarter the price:
| GPU | MSRP | Inference Throughput (Llama 70B) | Cost per 1M Tokens |
|---|---|---|---|
| H100 80GB | $30,000 | ~2,800 tokens/sec | $0.30 |
| L40S 48GB | $7,500 | ~1,200 tokens/sec | $0.18 |
| L4 24GB | $2,500 | ~400 tokens/sec | $0.17 |
| A100 80GB | $10,000 (used) | ~1,600 tokens/sec | $0.17 |
For pure inference workloads, the L40S is the better choice. You get similar cost-per-token at a fraction of the capital cost.
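You can isolate just the hardware-amortization slice of those cost-per-token figures. This sketch assumes a 3-year depreciation window and 60% average utilization, and ignores power, hosting, and networking, so it won't reproduce the table exactly — but the ranking comes out the same:

```python
# Amortized hardware cost per 1M generated tokens. Assumptions: 3-year
# depreciation, 60% average utilization. Power, hosting, and networking
# are excluded, so absolute figures run lower than all-in costs.
SECONDS_3Y = 3 * 365 * 24 * 3600

def cost_per_mtok(gpu_price, tokens_per_sec, utilization=0.6):
    lifetime_tokens = tokens_per_sec * utilization * SECONDS_3Y
    return gpu_price / (lifetime_tokens / 1e6)

for name, price, tps in [("H100", 30_000, 2800),
                         ("L40S", 7_500, 1200),
                         ("L4",   2_500, 400)]:
    print(f"{name}: ${cost_per_mtok(price, tps):.3f} per 1M tokens")
```

Note what the model surfaces: the H100's extra throughput doesn't pay for its 4x price premium on inference alone.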
This is why the inference hardware market is fragmenting:
Google TPUs deliver 4.7x better price-performance for inference and 67% lower power consumption
AWS Trainium/Inferentia chips are purpose-built for inference
AMD Instinct accelerators are gaining traction as NVIDIA alternatives
The H100's dominance in training doesn't guarantee dominance in inference.
The hardware wars are just beginning.
The Inference Supply Chain: Cloud, Edge, and Everything In Between
Inference doesn't just happen in centralized data centers. It happens everywhere users need it.
Tier 1: Hyperscale Cloud (AWS, Azure, GCP)
Best for: High-volume, latency-tolerant workloads
Cost: $1.50-$6.98/hr per H100
Latency: 50-200ms
Tier 2: GPU Marketplaces (Vast.ai, RunPod, CoreWeave)
Best for: Cost-sensitive workloads
Cost: 40-60% lower than hyperscalers
Latency: 80-300ms
Tier 3: On-Premise / Co-located
Best for: Sustained workloads above 50,000 GPU-hours/month
Cost: Lowest per-token at scale (after upfront CapEx)
Latency: 10-50ms
Tier 4: Edge Inference (devices, mobile, local)
Best for: Latency-critical, privacy-sensitive applications
Cost: Highest per-token, but zero marginal cost after deployment
Latency: 5-20ms
The "right" deployment depends on latency requirements, not just cost.
A medical imaging AI that needs 10ms response time cannot use a remote cloud API, regardless of price.
An autonomous vehicle cannot wait for a round-trip to a data center to decide whether to brake.
The inference supply chain is stratifying by latency tier, creating distinct GPU demand for each.
What This Means for the Next Decade
The shift from training-centric to inference-centric AI infrastructure is not just a technical trend. It's an economic and geopolitical restructuring.
Here's what I'm watching:
1. Data center design shifts
Engineers are redesigning data centers to use less copper per megawatt, with:
Higher-voltage distribution (reduces wiring copper)
Liquid cooling systems (more efficient)
Modular designs (minimize transmission distances)
2. Geographic distribution
Training concentrates in locations with the most compute. Inference benefits from distribution to reduce latency. Expect more regional data centers closer to population centers, not fewer megaclusters.
3. The profitability crisis
Most AI companies are not profitable on inference. OpenAI reportedly loses money on each ChatGPT query. This is unsustainable. Something has to give:
Prices rise (killing adoption)
Costs fall faster (requiring breakthrough efficiency gains)
Business models shift (from per-query to subscription/enterprise licensing)
4. The agent explosion
AI agents use 10-100x more tokens than simple Q&A chatbots because they:
Chain multiple reasoning steps
Query external tools and databases
Self-correct and iterate
Generate structured outputs
As AI shifts from chatbots to agents, inference costs will explode even as per-token prices fall.
5. Energy becomes the bottleneck
We're already seeing this play out:
Data centers delayed due to grid capacity constraints
Tech companies buying nuclear reactors (see Article 3)
Inference workloads shifted to off-peak hours to reduce electricity costs
Energy cost per token becoming as important as GPU cost per token
By 2028, electricity may be the dominant variable cost for inference, not hardware.
The Abundance vs. Scarcity Framework: Applied to Inference
Let's apply my core analytical framework here.
What's becoming abundant:
Model capability - Frontier models getting cheaper and more accessible
Inference optimization techniques - Quantization, caching, batching all improving
Hardware alternatives - More GPU options, custom silicon, edge devices
Developer tools - Easier to build and deploy AI applications
What's becoming scarce:
Profitable unit economics - Harder to make money as costs fall but competition intensifies
Electrical capacity - Can't build inference infrastructure faster than grid capacity
Low-latency compute - Edge and near-user inference capacity constrained
Sustainable energy - Clean power growing slower than AI power demand
Attention/differentiation - As every product adds AI, competitive moats erode
The collision:
We're building an abundant AI capability on top of scarce physical infrastructure.
Every efficiency gain makes AI more accessible, which increases adoption, which increases total resource consumption, which hits physical limits (energy, cooling, space, materials).
This is the inference paradox.
The better AI gets, the more we use it. The more we use it, the more it costs—in aggregate—even as per-unit costs fall.
And eventually, we hit a wall. Not a software wall. A physics wall.
You can optimize code infinitely. You can't optimize the laws of thermodynamics.
Every computation generates heat. Every watt of compute requires a watt of cooling. Every data center requires copper, concrete, steel, and rare earth elements.
The atoms matter.
And the inference economy is about to consume more atoms than any computing paradigm in history.
Second-Order Effects: What Happens When Inference Dominates
Let me walk through some non-obvious consequences of the shift to an inference-dominated AI economy:
1. The death of "AI for everything"
Right now, every SaaS product is adding AI features. But if inference costs don't fall fast enough, only high-value use cases will survive. We'll see a shakeout where:
Low-margin AI features get removed
Freemium AI products become unsustainable
Only applications with strong unit economics remain
2. The rise of inference-optimized models
Training will optimize for inference efficiency, not just benchmark performance. Expect:
Smaller, faster models that sacrifice 5% accuracy for 50% cost reduction
Mixture-of-Experts becoming the dominant architecture
Aggressive quantization and pruning as standard practice
Models designed specifically for edge deployment
3. The subscription model wins
Per-query pricing creates unpredictable costs for users and providers. Expect shift to:
Flat-rate subscriptions (ChatGPT Plus, Claude Pro)
Compute credits (buy tokens in bulk, use over time)
Enterprise site licenses (unlimited usage for fixed fee)
This shifts inference cost risk from users to providers—who can then optimize infrastructure to manage it.
4. The vertical integration of AI companies
To control inference costs, AI companies will:
Build their own data centers (OpenAI already doing this)
Design custom silicon (Google TPU, Amazon Trainium)
Negotiate direct power purchase agreements
Acquire GPU cloud providers
Inference cost control becomes a competitive moat.
5. The geopolitics of inference
Countries will compete to host inference infrastructure, not just training. Why?
Data sovereignty - EU, China want local inference for regulatory compliance
Latency - Can't serve users from the other side of the world
Economic capture - Inference spending is continuous, training is one-time
Expect national policies to incentivize domestic inference capacity.
6. The energy arbitrage opportunity
Inference workloads can be geographically distributed and time-shifted. This creates opportunities for:
Running inference in regions with cheap electricity (Iceland, Quebec, Middle East)
Shifting non-urgent inference to off-peak hours
Using curtailed renewable energy (solar/wind that would otherwise be wasted)
Energy-aware inference scheduling could reduce costs by 30-50%.
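A deferrable batch-inference job makes the arbitrage concrete. The regions, hours, and $/MWh prices below are invented purely for illustration:

```python
# Toy energy-aware scheduler: run a deferrable batch-inference job in
# the region-hour with the cheapest electricity. Prices are made-up
# $/MWh figures for illustration only.
prices = {
    ("quebec", "03:00"): 25, ("quebec", "15:00"): 40,
    ("texas",  "03:00"): 30, ("texas",  "15:00"): 90,
}

region, hour = min(prices, key=prices.get)
print(f"schedule job in {region} at {hour} (${prices[(region, hour)]}/MWh)")
```

Real schedulers would weigh data-transfer costs and deadlines too, but the core move is exactly this: treat electricity price as an input to placement.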
7. The privacy/cost trade-off
Cloud inference is cheap but requires sending data to third parties. Edge inference is private but expensive. This creates a market segmentation:
Consumer apps → Cloud inference (cheap, convenient)
Enterprise apps → Hybrid (cloud for non-sensitive, on-prem for sensitive)
Regulated industries → On-premise/edge only (healthcare, finance, defense)
Privacy regulations will increase inference costs by forcing local deployment.
Winners and Losers in the Inference Economy
WINNERS:
1. Inference-optimized hardware companies
NVIDIA (still dominant, but share eroding)
AMD (Instinct MI300 gaining inference traction)
Google (TPU v5 designed for inference)
Amazon (Trainium/Inferentia custom chips)
Cerebras, Groq, SambaNova (purpose-built inference accelerators)
2. GPU cloud marketplaces
CoreWeave (went public on inference demand)
Lambda Labs, Vast.ai, RunPod (40-60% cheaper than hyperscalers)
Crusoe Energy (using stranded gas for cheap inference power)
3. Inference optimization software
vLLM, SGLang, TensorRT-LLM (open-source inference frameworks)
Modal, Baseten, Replicate (managed inference platforms)
Anyscale, Together.ai (inference-as-a-service)
4. Energy providers with 24/7 capacity
Nuclear operators (Constellation, EDF, Cameco)
Natural gas utilities (unfortunately, the current default)
Geothermal developers (Fervo Energy, Eavor)
5. Companies with pricing power
OpenAI, Anthropic, Google (can charge premium for quality)
Vertical AI companies (healthcare, legal, finance - where accuracy >> cost)
Enterprise AI platforms (where cost is amortized across many users)
LOSERS:
1. Undifferentiated AI wrappers
If you're just calling OpenAI's API and adding a thin UI layer, you have:
No pricing power (users can go direct to OpenAI)
No cost advantage (you're paying retail API prices)
No moat
Your margins will compress to zero.
2. Training-focused infrastructure
H100 clusters optimized for multi-GPU training (inference doesn't need NVLink)
InfiniBand networking (inference uses Ethernet)
High-memory-bandwidth flagship GPUs priced for training (decode-phase inference is memory-bandwidth-bound, but cheaper parts often deliver better cost-per-token)
The infrastructure that powered the training boom is mismatched for the inference economy.
3. Freemium AI products with no path to profitability
Offering unlimited free AI usage was viable when inference was cheap and VCs were funding growth-at-all-costs. Now:
Inference costs are the dominant expense
Free users consume resources without generating revenue
Conversion rates to paid tiers are low (<5% typical)
Expect mass shutdowns of free AI tools or aggressive feature limitations.
4. AI companies in high-electricity-cost regions
If your inference infrastructure is in California, Germany, or Japan (expensive electricity), you're at a permanent cost disadvantage vs. competitors in:
Texas, Georgia, Ohio (cheap US electricity)
Quebec, Iceland, Norway (cheap hydropower)
Middle East (subsidized energy)
Geography is destiny in the inference economy.
5. Renewable-only data centers
Sounds counterintuitive, but here's why:
Solar/wind have 25-40% capacity factors (only generate power part-time)
Inference demand is 24/7
Battery storage is still too expensive for multi-day backup
Result: Either idle GPUs (terrible economics) or grid backup (often fossil fuels)
Baseload power (nuclear, geothermal, hydro) wins for inference.
The $10 Trillion Question: Can Inference Ever Be Profitable?
Let's do some uncomfortable math.
OpenAI's reported metrics (estimated):
~200 million ChatGPT users (100M+ weekly active)
~1-2 billion queries per day
Average cost: ~$0.02-0.05 per query (including infrastructure, not just API cost)
Daily inference cost: $20-100 million
Annual inference cost: $7-36 billion
Revenue:
ChatGPT Plus: ~10 million subscribers × $20/month = $200M/month = $2.4B/year
Enterprise/API: Estimated $1-3B/year
Total revenue: ~$3-5 billion/year
Even with aggressive assumptions, OpenAI may be losing money on inference.
And that's with the most advanced infrastructure, custom optimizations, and economies of scale that no competitor can match.
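You can re-run that uncomfortable math directly; every input below is one of the estimates above, taken at its midpoint:

```python
# Reproducing the back-of-envelope above: daily queries x cost per query
# vs. subscription plus API revenue. All inputs are the article's
# estimates, taken at midpoints.
queries_per_day = 1.5e9        # midpoint of 1-2 billion
cost_per_query = 0.035         # midpoint of $0.02-0.05
annual_inference = queries_per_day * cost_per_query * 365

subs_revenue = 10e6 * 20 * 12  # 10M subscribers x $20/month
api_revenue = 2e9              # midpoint of $1-3B estimate
annual_revenue = subs_revenue + api_revenue

print(f"inference cost: ${annual_inference / 1e9:.1f}B/year")
print(f"revenue:        ${annual_revenue / 1e9:.1f}B/year")
```

At the midpoints, inference cost lands around $19B/year against roughly $4.4B in revenue — a multi-billion-dollar gap even before training, salaries, and R&D.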
If OpenAI can't make inference profitable, who can?
Here are the only paths I see:
Path 1: Inference costs fall 10x more
Requires breakthrough innovations:
New model architectures (sparse, mixture-of-experts)
New hardware (photonic computing, analog AI chips)
New algorithms (speculative decoding, early exit)
Possible, but not guaranteed.
Path 2: Willingness-to-pay increases
Users/enterprises pay more because AI becomes more valuable:
Replaces expensive human labor (customer service, coding, analysis)
Enables new revenue streams (personalization, automation)
Creates competitive necessity (everyone needs AI to compete)
Already happening in enterprise, less clear for consumers.
Path 3: Business model shifts
Move away from per-query pricing:
Subscriptions (flat-rate, predictable revenue)
Licensing (pay per seat, not per query)
Bundling (AI as part of larger platform, not standalone)
Most likely path for profitability.
Path 4: Consolidation
Only 2-3 companies survive with:
Massive scale (billions of queries → lowest per-token cost)
Vertical integration (own data centers, chips, power)
Pricing power (quality moat or lock-in)
Everyone else becomes a customer, not a competitor.
My bet? All four happen simultaneously.
Costs fall (but not fast enough). Prices rise (but not too much). Business models shift to subscriptions. Market consolidates to a few winners.
And even then, profit margins will be thin compared to traditional software's 80%+ gross margins.
The inference economy will be a volume game, not a margin game.
What You Can Do
If you're building an AI product:
Model your inference costs at scale - Don't wait until you're in production to discover you're losing money on every query
Optimize for cost-per-token, not just quality - A 5% accuracy drop that cuts costs 50% is often worth it
Use caching aggressively - 75-90% cost reduction for repeated inputs
Consider smaller models for most queries - Route easy questions to cheap models, hard questions to expensive ones
Shift to subscription pricing - Flat-rate plans let you optimize infrastructure without punishing heavy users
Negotiate volume discounts - If you're doing >1M requests/day, you should be getting 30-50% off list prices
Explore GPU marketplaces - Vast.ai, RunPod often 40-60% cheaper than AWS/Azure for inference
Monitor your token efficiency - Shorter prompts, structured outputs, and early stopping can cut costs 20-40%
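Point 4 — routing easy queries to cheap models — takes only a few lines to prototype. The model names and the length/keyword heuristic below are placeholders; a production router would use a learned classifier:

```python
# Minimal sketch of cost-tier routing: short, simple queries go to a
# cheap model; long or reasoning-heavy ones go to a premium model.
# Model names and thresholds are placeholders, not real endpoints.
CHEAP, PREMIUM = "small-model", "frontier-model"

def route(query: str) -> str:
    hard_markers = ("why", "prove", "analyze", "step by step")
    if len(query) > 400 or any(m in query.lower() for m in hard_markers):
        return PREMIUM
    return CHEAP

print(route("What are your support hours?"))                # small-model
print(route("Analyze our Q3 churn drivers step by step."))  # frontier-model
```

If 80% of traffic is routable to a model that costs 10x less, blended cost-per-query drops by roughly 70% — which is why every serious inference stack now has a router in front.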
If you're investing:
Inference infrastructure is the next wave - Training infrastructure (H100s, InfiniBand) is saturated; inference infrastructure (L40S, edge chips, distributed systems) is just starting
Energy is the bottleneck - Companies that solve the power problem (nuclear, geothermal, energy arbitrage) will win
Vertical AI beats horizontal AI - Domain-specific models with pricing power (healthcare, legal, finance) can sustain inference costs; general-purpose chatbots cannot
Hardware is fragmenting - NVIDIA's inference dominance is weaker than training dominance; AMD, Google, Amazon, and startups have real shots
Watch unit economics, not usage - A product with 10M users losing $0.10/query is worse than a product with 100K users making $1/query
If you're in policy:
Inference energy consumption needs regulation - Without it, AI will drive a fossil fuel buildout to meet 24/7 power demand
Support baseload clean energy - Nuclear, geothermal, hydro are the only realistic options for 24/7 AI power
Incentivize inference efficiency - Tax breaks or credits for companies that optimize inference costs
Require energy transparency - AI companies should disclose energy consumption per query (like nutrition labels)
Invest in grid infrastructure - Inference will add 100+ GW to the grid by 2030; transmission must keep pace
If you're a consumer:
Understand what you're paying for - "Unlimited" AI subscriptions are subsidized by light users; heavy users cost the company money
Expect price increases or usage caps - As inference costs become clearer, free tiers will shrink and paid tiers will add limits
Value privacy-preserving AI - On-device inference (Apple Intelligence, Google on-device AI) costs you nothing after purchase and keeps data local
Support sustainable AI - Ask companies about their energy sources; reward those using clean power
The Bottom Line
Here's the story in one paragraph:
AI's training costs are falling. AI's inference costs are falling too—but total inference spending is exploding because usage is growing faster than efficiency. By 2028, inference will consume 80-90% of AI's energy, 70%+ of AI compute, and the majority of AI infrastructure investment. This creates a paradox: the better and cheaper AI gets, the more we use it, and the more total resources it consumes. We're building an economy on a technology whose marginal costs are near-zero but whose aggregate costs are approaching the scale of entire industries.
The inference economy is here.
And it's going to reshape:
Energy markets (100+ GW of new 24/7 demand)
Data center design (distributed, latency-optimized, energy-efficient)
Hardware (inference-optimized chips, not training GPUs)
Business models (subscriptions, not per-query pricing)
Geopolitics (countries competing for inference infrastructure, not just training)
Profitability (thin margins, volume game, consolidation)
The companies that master inference economics will dominate the AI era.
The ones that don't will burn through billions and collapse, no matter how good their models are.
Because here's the hard truth:
Training a great model is impressive. Running it profitably at scale is what actually matters.
And right now, almost nobody has figured that out.
The race is on.
Sources & Further Reading
Primary Sources (Accessed March 2026):
ByteIota: "AI Inference Costs 55% of Cloud Spending in 2026" - https://byteiota.com/ai-inference-costs-55-of-cloud-spending-in-2026/
GPUNex: "AI Inference Economics 2026: Complete Analysis" - https://www.gpunex.com/blog/ai-inference-economics-2026/
MIT News: "Explained: Generative AI's Environmental Impact" - https://news.mit.edu/2025/explained-generative-ai-environmental-impact-0117
Carbon Credits: "ChatGPT vs Claude AI Carbon Footprints, Pentagon Deal and Energy Impact" - https://carboncredits.com/chatgpt-vs-claude-ai-carbon-footprints-pentagon-deal-and-energy-impact/
Reuters: "How Chinese Startup DeepSeek Has Shaken AI World" - https://www.reuters.com/technology/artificial-intelligence/how-chinese-startup-deepseek-has-shaken-ai-world-2025-01-28/
AIM Research: "AI Energy Consumption: Statistics & Trends" - https://aimultiple.com/ai-energy-consumption
Introl: "AI Inference vs Training: Infrastructure Economics Diverging" - https://introl.com/blog/ai-inference-vs-training-infrastructure-economics-diverging
A Final Note
This is Part 4 of the Sterling Report on AI, infrastructure, and the physical constraints of the digital economy.
Next in the series: Part 5 - "Data Center Geopolitics: Why Nations Are Fighting Over AI Infrastructure"
If this made you think, share it with one person who needs to read it.
Follow @SloneSterling on X.com for daily research on AI, energy, commodities, and the collision of abundance and scarcity.
Precision in a world of noise.

Analysis by Slone Sterling
