Nvidia's Blackwell AI GPU overheating issues are seemingly overhyped

Reports of Nvidia's GB200 NVL72 server racks overheating have purportedly been exaggerated. Business Insider reports that Blackwell's cooling design faults have already been addressed. Dylan Patel, chief analyst at Semianalysis, reportedly told Business Insider that the design issues, which have been present for months, have been largely resolved and that the overheating concerns are overblown.

Semianalysis' five analysts monitoring the semiconductor industry reported that the cooling system changes that triggered "reworks" from several suppliers were "minor." Blackwell's cooling faults have been specifically problematic with Nvidia's massive 72-chip server rack, which can consume up to 120kW. Flaws in the rack have forced Nvidia to reevaluate its design multiple times because the GPUs inside were overheating. This has set back shipments of Nvidia's GB200 hardware, with the required design changes causing additional delays.

Nvidia's B200 GPUs are the most potent processing chips for AI workloads. The GB200 superchip, for instance, has a configurable TDP in the thousands of watts, with a peak rating of up to 2,700 watts. These extremely high power figures make air cooling virtually impossible within the constraints of a standard rack-mount form factor.

This physics problem has forced Nvidia to require liquid cooling on its latest Blackwell GPUs. That, in turn, requires data centers to revamp their server farms to accommodate the infrastructure that liquid-cooled servers need.

Nvidia could solve this problem by creating slower air-cooled GPUs, which the GPU manufacturer still does in the form of GPUs such as the H200 NVL. However, to remain at the bleeding edge of the AI GPU arms race, Nvidia is prioritizing performance no matter the cost, which is why the company has opted to make GPUs that require thousands of watts of power at the expense of air cooling.

The good news is that the cooling issues with Nvidia's 72-chip Blackwell rack are apparently minor and have largely been addressed already. In addition, only Nvidia's flagship 72-chip server rack is affected.
