One of the cool recent announcements we've seen is the Titan supercomputer,
which is now the fastest supercomputer in the world,
and it has NVIDIA processors at its core.
Can you talk a little bit about that process and how NVIDIA got to be involved
and why that's such an exciting thing for GPU computing?
Well, first of all, Titan is an awesome machine.
It has 18,688 Kepler K20 GPUs and is the fastest computer in the world at running high-performance LINPACK.
There's an interesting story there.
The story of Titan actually starts with a meeting that I had with Steve Scott,
who at the time was CTO of Cray at the Salishan Conference up on the Oregon Coast in 2009.
I was talking to Steve and trying to see how can we work together.
We really should get NVIDIA GPUs into Cray supercomputers
because we have the best compute per dollar, compute per watt,
which are the two things that matter in high performance computing,
of anybody in the world. They were actually going through a problem
because they had bet on a different vendor who cancelled a project on them,
and it left a hole in what they wanted to bid for this solicitation
that was out from Oak Ridge to build what they call their leadership-class computing facility,
which ultimately turned into Titan.
It was just a happy coincidence of timing that I was having this conversation
with him right at the point when there was a hole to be filled.
It turns out Kepler filled that hole wonderfully.
There were a lot of challenges along the way that, I think, really had to do with getting the people
in the National Labs to embrace the model of parallelism that CUDA presents.
I think that once they embraced it, they found it was actually easier
to write their programs that way, and the programs actually ran better across the board
once they were reorganized into that style of parallelism, launching CTAs and organizing things in that style.
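As a rough illustration of the style of parallelism being described (this sketch is mine, not from the interview): in CUDA, work is decomposed into CTAs (cooperative thread arrays, i.e. thread blocks), each of which owns a tile of the data, with the threads inside a CTA working on that tile in parallel.

```cuda
#include <cstdio>

// Each thread computes one element; the grid of CTAs covers the array.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    // Global index: which CTA we are in (blockIdx) times the CTA size
    // (blockDim), plus our position within the CTA (threadIdx).
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch enough 256-thread CTAs to cover the whole array.
    int ctas = (n + 255) / 256;
    saxpy<<<ctas, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Restructuring a node-local loop this way is what "organizing things into CTAs" means in practice: the decomposition into independent blocks is what lets the hardware scale the same program across GPUs of different sizes.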
But they had a very large body of legacy code, mostly in Fortran.
It was written as Fortran running on a single node, with MPI used to communicate between the nodes,
and it was a nontrivial exercise to really bring that software over
and get it to run well on a GPU-accelerated system.
And I think beyond the LINPACK number, which is a relatively easy number to get because it's one program,
what really is the success of Titan is the very large number
of basic energy science and defense codes that have been ported over very successfully
and get just tremendous performance on the K20s.