Thoughts after reading the DeepSeek V4 paper:
- NVIDIA really is something else. Remember how back in 2024 people were bashing Blackwell as overspec'd and dismissing FP4 as just marketing? Turns out it was all groundwork for the next generation of models. Maybe NVIDIA's moat is its ability to anticipate the trajectory of mainstream LLM technology and the new demands on accelerators 3–5 years out, plan and position accordingly, and bake that foresight into product design. Other GPU companies don't anticipate demand — they react to it.
- Have NVIDIA and DeepSeek been talking to each other? Take the 6144 FLOPs/Byte passage: I'd been wondering why NVIDIA was pushing HBM4 pin speeds up so aggressively, and it turns out that raising Rubin's HBM4 pin speed isn't "overkill" from the perspective of a model like V4. It's a precisely balanced design.
- NVIDIA is also working hard to crank up bandwidth on Rubin Ultra, which is telling: it implies that Rubin Ultra's FP4 compute is outpacing its HBM, and that memory bandwidth could once again become the bottleneck when training MoE models like DeepSeek-V4.
- Why is the next-gen NVIDIA chip scaling up the NVL domain? Why the move toward Kyber? You can read this as an attempt to push up the bandwidth density of the interconnect fabric, pulling compute, which has surged past what the fabric can feed, back into a communication-friendly balance.
- The upshot is that the DeepSeek paper is essentially telling us NVIDIA's current chip design lines up with DeepSeek's model patterns. If Blackwell alone produced this much evolution, imagine what Rubin and Feynman will deliver.
- NVIDIA's G3.5, unveiled this year, is almost uncanny. It's a new tier sitting between local NVMe SSDs and object/file storage — one that exists only for AI inference — meaning NVIDIA has created an entirely new memory-tier category exclusively for AI workloads. And in §3.6.2 of the V4 paper, DeepSeek argues that the KV cache can break out of GPU HBM's limits and be permanently offloaded to NVMe storage. That maps exactly onto the ICMS rack NVIDIA showed at CES. NVIDIA called it precisely — they saw that labs like DeepSeek would need this.
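For anyone puzzled by the 6144 FLOPs/Byte number above: it's a roofline-style balance point, the arithmetic intensity at which a chip flips from bandwidth-bound to compute-bound. A back-of-envelope sketch (the hardware numbers below are hypothetical placeholders I picked for illustration, not published Rubin specs):

```python
def balance_point(peak_flops: float, mem_bw_bytes: float) -> float:
    """FLOPs/Byte at which a chip shifts from bandwidth-bound to compute-bound."""
    return peak_flops / mem_bw_bytes

def bound_regime(model_intensity: float, chip_ridge: float) -> str:
    """Which resource limits a workload of the given arithmetic intensity."""
    return "compute-bound" if model_intensity >= chip_ridge else "bandwidth-bound"

# Hypothetical accelerator: 50 PFLOPS dense FP4, 8 TB/s HBM bandwidth.
peak_fp4 = 50e15
hbm_bw = 8e12

ridge = balance_point(peak_fp4, hbm_bw)
print(f"{ridge:.0f} FLOPs/Byte")          # → 6250 FLOPs/Byte
print(bound_regime(4000.0, ridge))        # → bandwidth-bound
```

If a model moves fewer FLOPs per byte of HBM traffic than the chip's ridge point, memory bandwidth, not FP4 throughput, sets the training speed, which is exactly why co-designing the two ratios matters.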
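The §3.6.2 idea of letting the KV cache break out of HBM can be pictured as a two-tier cache: hot sequences stay in HBM, cold ones spill to NVMe, and a miss faults the entry back in. A minimal sketch, where everything (class name, LRU policy, pickle-per-file format) is my own illustration, not DeepSeek's or NVIDIA's actual mechanism:

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: an LRU 'HBM' tier backed by an 'NVMe' spill dir."""

    def __init__(self, hbm_capacity: int, spill_dir: str):
        self.hbm = OrderedDict()      # LRU-ordered hot tier
        self.capacity = hbm_capacity
        self.spill_dir = spill_dir

    def _spill_path(self, key: str) -> str:
        return os.path.join(self.spill_dir, f"{key}.kv")

    def put(self, key: str, value) -> None:
        self.hbm[key] = value
        self.hbm.move_to_end(key)
        while len(self.hbm) > self.capacity:
            # Evict the least-recently-used entry to the NVMe tier.
            cold_key, cold_val = self.hbm.popitem(last=False)
            with open(self._spill_path(cold_key), "wb") as f:
                pickle.dump(cold_val, f)

    def get(self, key: str):
        if key in self.hbm:
            self.hbm.move_to_end(key)
            return self.hbm[key]
        # Miss: fault the entry back in from NVMe and promote it to HBM.
        path = self._spill_path(key)
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        self.put(key, value)
        return value

spill = tempfile.mkdtemp()
cache = TieredKVCache(hbm_capacity=2, spill_dir=spill)
cache.put("seq0", [0.1, 0.2])
cache.put("seq1", [0.3])
cache.put("seq2", [0.4])            # evicts seq0 to the spill dir
print(cache.get("seq0"))            # reloaded from the NVMe tier
```

The real system would obviously key on token blocks, use async DMA rather than synchronous file I/O, and track placement in the inference scheduler, but the tiering logic is the same shape.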