"If you know CFD, you know that we’ve been unable to get the most out of a CPU for years due to inadequate bandwidth, which has driven the transition to GPUs, but as of the launch of DDR5 for Power10, there’s now a CPU server on the market that can actually deliver data to the cores as fast as they can chew through it."
Interesting analogy regarding Computational Fluid Dynamics and pipelines of code/data (I think queueing theory applies, and is mathematically related). But the above sums up the problem: if you can't get your data and code into the CPU fast enough, your CPU idles, and you're wasting money on CPU capacity when an investment in memory bandwidth to the CPU would pay off better.
This brings to mind comparisons between PCs and mainframes, where mainframe I/O bandwidth far exceeded PC I/O bandwidth relative to CPU processing power, and where IBM offloaded a lot of the I/O processing to the I/O channels to minimize the CPU work spent on I/O. So I think you should have used "decades" rather than "years" for this problem, because it has existed since the introduction of the PC in 1981.
I think that for CFD, the balance between bandwidth and FLOPS started to get out of whack when AVX instructions and multi-core CPUs hit the scene. In fact, if a CFD code on x86 isn't using AVX at all, it is unlikely to fully saturate the memory bandwidth on a number of CPU models.
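To put rough numbers on that, here is a back-of-the-envelope roofline-style sketch in Python. Everything in it is an assumption for illustration: the core count, clock, FLOPs per cycle, memory bandwidth, and the kernel's arithmetic intensity are made-up values, not measurements of any real CPU or CFD code, and the function names are my own.

```python
# Roofline-style estimate: a kernel is limited either by peak compute
# or by memory bandwidth times its arithmetic intensity (FLOPs per byte).
# All machine and kernel numbers below are hypothetical.

def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Upper bound on sustained GFLOP/s under the simple roofline model."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

def bandwidth_fraction(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Fraction of memory bandwidth the kernel can consume at that bound."""
    achieved = attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte)
    return (achieved / flops_per_byte) / bandwidth_gbs

# Hypothetical 16-core, 3 GHz x86 part with 100 GB/s of memory bandwidth.
CORES, GHZ, BANDWIDTH_GBS = 16, 3.0, 100.0
SCALAR_PEAK = CORES * GHZ * 2    # assumed 2 FLOPs/cycle/core without SIMD
AVX_PEAK = CORES * GHZ * 16      # assumed 16 FLOPs/cycle/core with AVX + FMA
INTENSITY = 1.5                  # assumed FLOPs per byte for the kernel

for label, peak in (("scalar", SCALAR_PEAK), ("AVX", AVX_PEAK)):
    gflops = attainable_gflops(peak, BANDWIDTH_GBS, INTENSITY)
    frac = bandwidth_fraction(peak, BANDWIDTH_GBS, INTENSITY)
    print(f"{label}: {gflops:.0f} GFLOP/s attainable, "
          f"{frac:.0%} of memory bandwidth used")
```

With these assumed numbers the scalar build runs out of compute before it runs out of bandwidth, while the AVX build hits the bandwidth ceiling long before its compute peak. A kernel with lower arithmetic intensity would be bandwidth-bound in both cases, so the picture depends heavily on the code and the CPU model.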
"If you know CFD, you know that we’ve been unable to get the most out of a CPU for years due to inadequate bandwidth, which has driven the transition to GPUs, but as of the launch of DDR5 for Power10, there’s now a CPU server on the market that can actually deliver data to the cores as fast as they can chew through it."