Jensen Huang Is Training His Own Replacement

The leather jacket has no clothes.

Jensen Huang stands on stage at GTC, basking in the adulation of an audience that treats product launches like religious experiences while NVIDIA’s market cap hovers north of four trillion dollars. The narrative being pushed is simple: that Jensen is a visionary genius, that NVIDIA is the essential infrastructure of the AI revolution, and that the future belongs to GPU compute.

I think this narrative is wrong: not because NVIDIA isn’t dominant today (it obviously is), but because the very things driving that dominance are simultaneously building the machinery of its decline. The key diversification that made NVIDIA what it is today is being undone, and what’s left is a company eating its own tail — and not just with circular financing: it’s a silicon ouroboros.

Betrayal

Let’s start with where NVIDIA came from: gaming GPUs. And let’s be honest about how they’re treating that market now.

The RTX 50-series launch tells you everything. The 5080 is a 4080 Super whose headline exclusive feature is Multi Frame Generation — the insertion of AI-generated fake frames that add latency and visual artefacts — at a higher price. Gamers Nexus found it outperforms the 4080 Super by as little as 2.5% in some titles, with Blender benchmarks showing just 8% over the 4080 Super. The 5060 Ti ships with 8GB of VRAM in 2026, which is genuinely insulting — so much so that German retailers report the 16GB model outselling the 8GB version 16:1, and NVIDIA didn’t even send the 8GB card to reviewers.

I play on a 5080, so I’ve had personal experience of this and have spent significant time experimenting with these settings. The introduction of things like frame generation is noticeable: things just feel off.

Each generation now delivers less actual hardware improvement and more software gimmicks: DLSS, Frame Generation, Neural Shaders. These are all presented as features, but they’re really admissions of failure: they exist because the raw silicon isn’t advancing fast enough to brute-force the problems they’re designed to mask. And meanwhile, prices go up.

But with no competitive alternative offering equivalent ray tracing performance, gamers have nowhere to go. They’re not convinced — they’re captive. And the gaming press, dependent on NVIDIA access and ad revenue, does the convincing for them, selling upscaled 1080p internal resolution at 4K output as though it were the same as native 4K. Follow independent gamers on YouTube and you’ll quickly come across people bemoaning that things now look messy, that we’ve lost the sharpness we used to have, and that modern games run worse for little to no visual improvement.

Access Media

Why does this narrative go largely unchallenged? Because the gaming media covering GPUs is structurally compromised. As is the press covering technology in general. It all depends on access.

Digital Foundry produces technically excellent analysis, but the editorial framing is selectively applied in ways that are hard to ignore. In 2023, for example, DF was publicly called out for an apparent double standard in how they covered NVIDIA’s frame generation versus AMD’s FSR3 — framing the same fundamental technology in markedly different terms depending on who made it. Their RTX 5080 DLSS 4 coverage arrived as an exclusive early preview on NVIDIA’s engineering sample hardware, before any independent testing was possible — effectively a first-look marketing vehicle presented as independent analysis. Then you’ve got the NVIDIA-sponsored content elephant in the room: DF has published multiple videos explicitly sponsored by NVIDIA. That’s when unconscious bias starts to creep in.

To be fair, it’s not just Digital Foundry - this is rampant across the entire games press, and it has long been a problem. I, for one, have always refused to use the word journalism with respect to games media for this reason.

There are exceptions, and today they can often be found on YouTube. Gamers Nexus, for example — Steve Burke seems genuinely happy to torch a relationship if the product deserves it. Both Gamers Nexus and Hardware Unboxed released videos detailing how NVIDIA was selectively granting access in exchange for favourable coverage built around inflated Multi Frame Generation benchmarks. NVIDIA didn’t send reviewers the 8GB card at all — Hardware Unboxed had to buy one themselves to reveal the performance problems. Outlets like GN are the minority, and NVIDIA knows it.

The Data Centre Gold Rush

So NVIDIA has been neglecting consumers in favour of the data centre AI gold rush and that’s fair enough — there’s money to be made in them there warehouses, but it’s worth looking at how the money is flowing.

A significant portion of NVIDIA’s data centre revenue comes from companies that NVIDIA itself has invested in, who then use that capital to buy NVIDIA hardware. NVIDIA has poured money into CoreWeave, Lambda, Nebius, xAI, and even OpenAI — all of whom are major GPU customers. In January 2026 it invested another $2 billion in CoreWeave on top of a $3.3 billion existing stake, while also committing to be the buyer of last resort for any unsold CoreWeave capacity through 2032. CoreWeave, for its part, had $18.8 billion in debt obligations as of September 2025 with much of it collateralised against NVIDIA GPUs.

This is all entirely legal and unremarkable as a business practice. But it has a structural consequence: when your investors and your customers overlap this heavily, revenue growth and capital deployment become hard to distinguish.

As Ed Zitron has extensively documented, NVIDIA isn’t Enron — but the deals it’s doing with neoclouds are, in his words, “dodgy and weird and unsustainable.” NVIDIA felt compelled to leak a seven-page internal memo insisting it was nothing like Enron — which is the kind of thing you do when you’re definitely not worried about people thinking you’re like Enron. Short sellers Jim Chanos and Michael Burry weren’t convinced. Chanos warned that layering “arcane financial structures on top of these money-losing entities” is the real vulnerability and Burry flagged what he called “suspicious revenue recognition” across multiple AI companies. Bloomberg described the CoreWeave deal plainly as “the latest example of the circular financing deals that have lifted valuations of AI companies and fueled concerns about a bubble.”

Gary Marcus, the NYU cognitive scientist and long-standing AI sceptic (at least of the claims of the LLM makers), has been tracking these dynamics on his Substack since mid-2025. He called the Oracle-OpenAI deal “peak bubble” and warned that the industry had entered “peak musical chairs.” In a more recent piece, he argued that NVIDIA’s stock plateau — up 1200% over five years but essentially flat for six months — marked the point at which Wall Street began losing confidence, driven in part by concerns about circular financing and the profitability of LLM companies. Marcus and Zitron have been two of the most persistent voices making this case while much of the financial press and many analysts were still writing breathless coverage.

This doesn’t make NVIDIA’s revenue fraudulent; it makes it fragile. Analyst models that project these growth rates forward are treating circular capital flows as though they represent independent, end-user-driven demand. And it’s worth remembering that many of the businesses analysts work for generate revenue from fees: they’re not inclined to be too critical. When the AI investment cycle corrects — and it will, because cycles always do — the revenue that was never anchored to external demand will be the first to evaporate.

An Emerging Threat

Meanwhile, the real threat to NVIDIA’s data centre dominance is emerging from below. Purpose-built ASICs for AI inference are starting to compete on price-performance — and in some cases, they’re not just competing, they’re winning.

According to TrendForce, custom ASIC shipments from cloud providers are projected to grow 44% in 2026, while GPU shipments grow at just 16%, and the share of ASICs in AI servers is expected to jump from around 21% in 2025 to nearly 28% this year. Some predict NVIDIA could fall from over 90% inference market share to 20-30% by 2028 as ASICs take over production inference workloads. That’s a heck of a fall, but even a less severe drop than that is going to cause NVIDIA issues.

And there’s recent precedent for this. During the cryptocurrency boom, miners bought GPUs in bulk for proof-of-work hashing — until purpose-built ASICs arrived that were orders of magnitude more efficient at the same task. GPUs became uncompetitive almost overnight and NVIDIA was left with excess inventory. The stock fell over 50% from its October 2018 peak while gaming revenue nearly halved in a single quarter. Jensen called it a “crypto hangover.” The pattern is straightforward: when a workload becomes well-defined enough to justify custom silicon, general-purpose hardware loses. AI inference is reaching exactly that threshold now. His “ASIC hangover” could be the stuff of nightmares.

The specific challengers tell the story. Google’s TPU Ironwood (7th generation) is considered technically on par with or superior to NVIDIA’s GPUs by some experts, including Chris Miller, author of Chip War. Anthropic trains its most advanced models on up to one million Google TPUs — not NVIDIA GPUs. Amazon is filling data centres with its own Trainium2 chips. OpenAI has committed to deploying 10 gigawatts of custom Broadcom ASICs starting in 2026 while Cerebras’ wafer-scale engine delivers inference at over 6× the speed of Groq’s LPU, which itself was already dramatically outperforming NVIDIA hardware on key benchmarks. SambaNova claims 16 of its chips can replace 320 GPUs for serving a 671-billion-parameter model.

Perhaps the most telling data point is what NVIDIA did in response. In late 2025, it paid $20 billion to acqui-hire Groq — the inference startup founded by one of the original architects of Google’s TPU. Groq’s Language Processing Units were delivering 2-3× speedups over NVIDIA hardware on inference benchmarks, and both AMD and Intel were reportedly bidding aggressively for the company. NVIDIA’s move was widely characterised as defensive, aimed at neutralising an emerging threat rather than buying growth. When you spend $20 billion to absorb a competitor whose entire value proposition is that they’re faster than your products, that’s not a position of strength: that’s weakness.

The standard counter-argument to this is CUDA lock-in: every ML team thinks in CUDA, every codebase is coupled to it, and the institutional cost of switching is enormous. This is a genuinely strong defence — or at least it was, until AI began collapsing technical moats.

Oh The Irony

The technology driving NVIDIA’s revenue boom — large language models — is also the thing that neutralises NVIDIA’s deepest competitive moat.

The CUDA lock-in argument rests on the assumption that migrating millions of lines of GPU-optimised code is prohibitively expensive and time-consuming, because porting has historically meant humans manually rewriting and retesting everything. That’s a huge job.

But if you can point an LLM at a CUDA codebase and say “port this to ROCm” or “retarget this for our custom ASIC instruction set” — and get the vast majority of the way there in days rather than months — the switching cost argument collapses. The economic calculation changes dramatically when migration drops from “eighteen months” to “a few weeks of validation and tuning.”

And the companies best positioned to do this are the exact hyperscalers who are also building or commissioning ASICs. Google, Amazon, and Microsoft — they all have both the AI capability to automate the migration and the strategic incentive to break free from NVIDIA dependency.

NVIDIA is selling the tools that will be used to escape its own ecosystem.

Why CUDA Is the Perfect Target

This isn’t hand-wavy speculation about AI maybe being able to port code someday: CUDA is almost comically well-suited to automated translation.

A typical CUDA kernel is effectively a pure function with explicit inputs, explicit outputs, and no hidden state. There’s no spooky action at a distance — no global mutable state leaking between calls, no side effects you need to trace through a dependency graph. The contract is right there in the function signature, so an agent can look at a single kernel in isolation, understand exactly what it does, rewrite it for a different target, and verify the output without needing to comprehend the entire codebase.

And the verification story is perfect for agentic iteration loops: the inputs and outputs are deterministic numerical data, so you can generate test cases from the CUDA version, run them on the ported version, and diff the results automatically. An agent doesn’t need to understand the mathematics — it just needs to confirm that the same inputs produce the same outputs within tolerance. Heck, you can even firewall the agent writing the new code from the agent writing its tests. That’s a tight, automatable feedback loop with no human judgement required.
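To make that concrete, here’s a minimal sketch of what such a verification harness could look like, assuming reference input/output pairs have been captured from the CUDA build into .npz files and the ported kernel is exposed to Python. The ported_build module, the run_ported_kernel wrapper, and the in_/out_ naming convention are all hypothetical illustrations, not anything NVIDIA or the hyperscalers actually ship.

```python
import numpy as np

# Hypothetical wrapper around the ported kernel (pybind11, ctypes, whatever
# the target stack provides) - this module does not actually exist.
from ported_build import run_ported_kernel

def verify_kernel(case_files, rtol=1e-5, atol=1e-6):
    """Replay reference cases captured from the CUDA build against the port."""
    failures = []
    for path in case_files:
        case = np.load(path)  # one .npz per test case: inputs + reference outputs
        inputs = {k: case[k] for k in case.files if k.startswith("in_")}
        expected = {k: case[k] for k in case.files if k.startswith("out_")}

        actual = run_ported_kernel(**inputs)  # run the ported implementation

        for name, ref in expected.items():
            got = actual[name.removeprefix("out_")]
            if not np.allclose(got, ref, rtol=rtol, atol=atol):
                failures.append((path, name, float(np.max(np.abs(got - ref)))))
    return failures

# An agent loop simply keeps editing the port until this returns an empty list.
```

The tolerance is doing real work there: floating-point results won’t be bit-identical across different silicon, which is exactly why “within tolerance” rather than exact equality is the right contract.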

But the real killer is the parallelisation. A CUDA codebase might contain thousands of kernels, but they’re largely independent units. So you spin up an orchestrator agent that inventories the codebase and builds a dependency graph, then fans the work out to N worker agents, each handling a kernel or module. Each worker rewrites its target, generates tests, and iterates until the output matches. A validation agent runs integration tests on the assembled result. The whole pipeline is embarrassingly parallel — the same property that made the code suitable for GPUs in the first place makes it suitable for parallel agentic translation, and the hyperscalers are literally built for this.
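As a sketch of what that fan-out could look like in practice: list_kernels, port_kernel, and passes_reference_tests below are stand-ins for the inventory step, an LLM-backed rewrite of a single kernel, and the verification harness above. None of them are real APIs; the shape of the loop is the point.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins: an inventory/dependency-graph step, an LLM-backed
# worker that rewrites one kernel, and the reference-test harness above.
from migration import list_kernels, port_kernel, passes_reference_tests

MAX_ATTEMPTS = 5

def port_one(kernel, target="rocm"):
    """Worker loop: rewrite a single kernel and iterate until its tests pass."""
    feedback = None
    for _ in range(MAX_ATTEMPTS):
        candidate = port_kernel(kernel, target=target, feedback=feedback)
        ok, feedback = passes_reference_tests(kernel, candidate)
        if ok:
            return kernel.name, candidate
    raise RuntimeError(f"{kernel.name}: still failing after {MAX_ATTEMPTS} attempts")

def migrate(repo_path, workers=64):
    """Orchestrator: kernels are largely independent, so fan out wide."""
    ported, escalations = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(port_one, k) for k in list_kernels(repo_path)]
        for fut in as_completed(futures):
            try:
                name, code = fut.result()
                ported[name] = code
            except RuntimeError as exc:
                escalations.append(str(exc))  # the minority that need a human
    return ported, escalations
```

The integration-test pass over the assembled result — the validation agent described above — would sit after migrate returns.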

Now, the obvious objection is that CUDA isn’t just kernels: there’s cuDNN, TensorRT, Nsight, NCCL, Thrust — an entire ecosystem of libraries, profiling tools, and multi-GPU communication primitives that teams have built years of workflow around. And all of that is true, but it’s a dependency graph, not magic. These libraries are themselves composed of well-documented APIs with known input-output contracts. The migration challenge is real, but it’s an engineering problem with a finite surface area, not an open research question.

And the hyperscalers aren’t starting from scratch — Google’s JAX ecosystem, AMD’s ROCm stack, and Intel’s oneAPI are all mature enough that the target platforms already have equivalents for most of this tooling. The gap isn’t “does an alternative exist” anymore, it’s “is the switching cost worth it” — and that cost is falling off a cliff precisely because the models NVIDIA’s hardware trained are now capable enough to automate the tedious parts of the migration.
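To illustrate how mechanical much of that mapping is, here’s a hypothetical fragment of the translation table a migration agent might consult. The CUDA-to-ROCm pairings themselves (cuBLAS to rocBLAS, cuDNN to MIOpen, NCCL to RCCL, Thrust to rocThrust) are real, publicly documented equivalents; the table and the helper around it are illustrative only.

```python
# Hypothetical fragment of a library-mapping table. The pairings are real
# CUDA/ROCm equivalents; the table and lookup helper are illustrative only.
CUDA_TO_ROCM = {
    "cublas": "rocBLAS",    # dense linear algebra
    "cudnn":  "MIOpen",     # neural network primitives
    "nccl":   "RCCL",       # multi-GPU collectives
    "thrust": "rocThrust",  # parallel algorithms
    "cufft":  "rocFFT",     # FFTs
    "curand": "rocRAND",    # random number generation
}

def suggest_replacement(header):
    """Map an include like 'cublas_v2.h' to its ROCm-side counterpart."""
    for cuda_lib, rocm_lib in CUDA_TO_ROCM.items():
        if header.lower().startswith(cuda_lib):
            return rocm_lib
    return None  # unknown dependency: flag it for a human or a worker agent

print(suggest_replacement("cublas_v2.h"))  # -> "rocBLAS"
```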

NVIDIA is just as vulnerable to the automation of software development as the rest of us, and with every quarter that passes, the moat gets shallower. And the hyperscalers have very, very, very big pumps.

The Narrative Arc

NVIDIA have a pretty simple story if you sum it up. Jensen built a great gaming GPU company: while 3dfx were flailing around he delivered great products that gamers wanted and consolidated his position through acquisition. Recognising that the lack of diversification was a risk, and looking for ways to make GPUs more broadly useful, he diversified into data centre compute: a smart and necessary move, but not the stroke of genius the press portrays. Solid execution. Business school 101. And then the transformer revolution landed in his lap, LLMs became the new hot thing, and his interesting GPU company became the behemoth we know today.

Being in the right place at the right time with the right product isn’t the same as having engineered the entire outcome, but the leather jacket mythology requires a visionary, and the tech press love a messianic story, so that’s what we got.

Now trace the arc forward. Gaming company becomes compute company becomes AI company becomes victim of AI.

The diversification that saved NVIDIA from being just a gaming company is collapsing back into a single dependency — data centre AI revenue — that is simultaneously propped up by a form of circular financing and threatened by the very technology it enables. The customers buying the GPUs are using those GPUs to train the models that will make it trivially cheap to migrate away from NVIDIA’s ecosystem onto cheaper, faster, custom silicon.

It’s the Ouroboros business model. The snake is going to eat its own tail, except the tail is a four-trillion-dollar market cap.

The Intel Parallel

The historical parallel is Intel in the mid-2010s: they had absolute market dominance and no real competition. AMD had been written off, so they got lazy and extractive, delivering incremental improvements at premium pricing, because where were you going to go? Everybody bought Intel. Then AMD came back with Zen and the whole thing unravelled faster than anyone expected. Look at Intel today.

NVIDIA is arguably more entrenched, but the dynamics are in many ways similar, and the arrogance that comes from unchallenged dominance eventually creates the opening for someone else. Whether that’s ASICs eating the data centre business, AMD getting serious about ray tracing, or Intel maturing their architecture — something will crack. And although perhaps more entrenched, their dediversification (is that even a word?) makes them extremely vulnerable: they have a single product and a handful of customers.

It’s not a question of whether NVIDIA’s position is vulnerable - it is. It’s got a single product line, a handful of customers with enormous leverage, and cheaper, more performant alternatives emerging. The question is whether Jensen recognises it before the correction arrives, and the GTC keynotes suggest a man who has started to believe his own mythology. That’s usually when there’s a fall.

You can see it in how he handles pressure. Just recently, the $100 billion OpenAI infrastructure deal — announced with great fanfare in September 2025 — quietly collapsed to $30 billion. The deal had been in trouble for months: NVIDIA’s own quarterly filings warned there was “no assurance” it would be completed, and Jensen himself fell back on this when challenged. The Wall Street Journal reported that Jensen had been privately criticising OpenAI’s business approach while the deal was supposedly “on track.” When the WSJ first reported the deal was stalling, Jensen called it “nonsense.” And yet weeks later, it was confirmed. Meanwhile, reports emerged that OpenAI was unhappy with NVIDIA’s inference capabilities and had been blaming weaknesses in its Codex product on NVIDIA hardware.

MIT Sloan professor Michael Cusumano described the original $100 billion arrangement to the Financial Times as “kind of a wash” — NVIDIA invests $100 billion in OpenAI stock, OpenAI spends $100 billion on NVIDIA chips. As TechCrunch noted, Jensen’s stated reason for pulling back — that OpenAI’s upcoming IPO closes the window — doesn’t square with how late-stage private investing actually works.

This is not the behaviour of someone operating from a position of strength. Dismissing credible reporting as nonsense, then being proven wrong. Blaming the other party when a deal falls through. Offering explanations that don’t withstand scrutiny. These are the tells of someone who feels the ground shifting and doesn’t like it.

NVIDIA’s pricing confidence tells you everything about how they see the competitive landscape. They believe there’s nowhere else to go. They’ve gotten comfortable in a market where that has recently been the case. History suggests that this kind of belief is the beginning of the end, and while NVIDIA’s rise to its present height has been meteoric, it’s possible its fall will be just as swift. And for gamers like me, not unwelcome.
