English subtitles

More Computing power - Intro to Parallel Programming

Showing Revision 6 created 05/25/2016 by Udacity Robot.

  1. One of the questions I ask the students in the classes that I teach is,
  2. "What are you going to do with 100 times more compute?"
  3. And sometimes that's a really hard question for them.
  4. There's a lot of head scratching both in terms of what can we do with a supercomputer that's 100 times more powerful
  5. and what can you do with something on your desk or in your pocket?
  6. -Where do you see us going in this direction?
    -Yeah, well, I have an insatiable appetite for FLOPs.
  7. I would have no trouble using 100 or even 1,000 or even 10,000 times more compute.
  8. A lot of what I do is designing computers,
  9. and a lot of that involves prototyping and simulating new computer designs.
  10. I'm always frustrated about how slowly those simulations run.
  11. So if I could run RTL simulations of a new computer 100 times faster,
  12. it would enable me to be much more productive in trying out new ideas for computer design.
  13. Same for circuit simulations.
  14. I spend a lot of time waiting for circuit simulation to converge.
  15. If I could run it 100 times faster, I could not just run one simulation but run whole parameter sweeps at once
  16. and do optimizations at the same time I'm simulating.
  17. Another thing is also you look at the computers in your car.
  18. I mean our Tegra processors are actually designed into lots of different automobiles,
  19. including the Tesla Model S, the Motor Trend Car of the Year,
  20. but also Audis and BMWs and all sorts of Fords have Tegras in them.
  21. And the applications people are starting to use for these mobile processors in cars
  22. involve having lots of computer vision to look at what people inside the car are doing,
  23. look at what people outside of the car are doing.
  24. And in many ways it makes your cars much safer by having the car aware of what's going on around it.
  25. It can in many ways compensate for the driver not being completely alert
  26. or perhaps texting or doing something they shouldn't be doing.
  27. And in mobile devices I think there are a lot of compelling applications
  28. in both computational photography and augmented reality.
  29. If your mobile device is constantly aware of what's around you, it can be informing you.
  30. Oh, I think you're hungry.
  31. Here's a place that has gyros that I know you like
  32. because I have your profile of your likes and dislikes. Maybe you should stop for lunch.
  33. Or a block away is this guy who you really don't like.
  34. Maybe you should turn right at this corner and avoid running into him.
  35. In many ways, I think it sort of evolved to having your computing devices becoming your personal assistant.
  36. I always liked Jarvis in the Iron Man movies.
  37. I would like to have a device I can kind of talk to that is aware
  38. of the environment around me and can be basically a brain amplifier for me.
  39. It can sort of remember things that I forget
  40. and tell me about things in my environment and basically assist me in going through my day,
  41. both on professional and personal bases.
  42. So one of the goals of the supercomputer industry is to get up to—the term they use is exascale
  43. that they'd like to do 10^18 FLOPs per second.
  44. Certainly, Nvidia is going to be interested in being in those computers. What are we going to use that for?
  45. Well, I think first of all, there's nothing magical about an exascale.
  46. It's like, you know, when we first made
  47. petascale machines, which is just a few years ago,
  48. it wasn't like breaking the sound barrier or anything really qualitatively changed,
  49. but enabled better science and there's always—
  50. You look at sort of the fidelity of simulations we're able to do today
  51. to, say, simulate a more efficient engine for automobiles to improve gas mileage,
  52. and we're making lots of approximations to fit them on the supercomputers we have today.
  53. As we can get to higher fidelity by resolving grids finer
  54. and modeling a bunch of effects like turbulence more directly
  55. rather than using macro models to model them, we'll get more accurate simulations.
  56. And that will enable a better understanding of combustion, in some of the, you know, biotech applications
  57. of how proteins fold, various other climate—
  58. -Climate modeling.
  59. Climate evolves.
  60. Basically as we get better computing capacity—
  61. and it's not that you reach a magic exascale and wonderful things happen,
  62. but at every step along the way, we get better science,
  63. we are able to design better products.
  64. And computing is a big driver of both scientific understanding
  65. and economic progress across the board.
  66. And I think it's very important that we maintain that steady march forward,
  67. and exascale is just one milestone along that march.
  68. And my understanding is that power is really an enormously crucial thing for them to get right
  69. to be able to enable exascale, in that we don't want machines
  70. that are going to cost $2 million a month just to plug in.
  71. It's really an economic argument.
  72. I mean if you really wanted an exascale machine today, you could build one.
  73. You just have to write a really big check and locate it right next to the nuclear power plant,
  74. the entire output of which it will consume.
  75. But I think if there was some application that was so compelling
  76. they were willing to really write the multi-billion dollar check required to do that, you would do it.
  77. I think that the real question of exascale is an economical exascale,
  78. because the power bill is a tremendous fraction of the total cost of ownership.
  79. So it's not actually an economical exascale machine
  80. unless you can do it at a reasonable power level,
  81. and the number that's been thrown out is 20 megawatts.
  82. So that's $20 million a year.
  83. Yeah, $20 million a year power bill if you're paying roughly 10 cents a kilowatt-hour.
  84. In fact, the bill actually winds up usually being a little bit higher than that
  85. because the cost of provisioning energy amortized over, say, a 30-year lifetime of the facility,
  86. usually is about equal to the annual bill for the energy.
  87. There is also something called the PUE (power usage effectiveness),
  88. which is basically the efficiency of providing the energy.
  89. Even for a very good installation today, it's maybe on the order of 1.1 to 1.2.
  90. So you pay another, say, 20% to run the air conditioners and fans and things like that in the facility,
  91. and basically that's energy you're consuming that isn't being consumed by the computer.
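
The power-bill arithmetic in this exchange can be checked with a few lines of Python. This is a minimal sketch, not anything shown in the interview: the function name `annual_energy_cost` is hypothetical, the price of $0.10 per kilowatt-hour is the rate that makes the quoted "$20 million a year" work out, and the PUE value of 1.2 is taken from the upper end of the range mentioned.

```python
# Rough yearly electricity bill for a machine drawing a fixed power.
# All names here are illustrative; the rate and PUE are assumptions.
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_energy_cost(power_mw, dollars_per_kwh, pue=1.0):
    """Return the yearly electricity cost in dollars.

    power_mw: power drawn by the computer itself, in megawatts
    dollars_per_kwh: electricity price in dollars per kilowatt-hour
    pue: total facility power divided by computer power (1.0 = no overhead)
    """
    kwh_per_year = power_mw * 1000 * HOURS_PER_YEAR
    return kwh_per_year * dollars_per_kwh * pue


if __name__ == "__main__":
    base = annual_energy_cost(20, 0.10)
    with_overhead = annual_energy_cost(20, 0.10, pue=1.2)
    print(f"computer only: ${base / 1e6:.1f}M per year")
    print(f"with PUE 1.2:  ${with_overhead / 1e6:.1f}M per year")
```

At 20 megawatts the computer alone comes to about $17.5 million a year, which rounds to the $20 million quoted, and a PUE of 1.2 pushes it to about $21 million. The sketch does not model the provisioning cost, which the speaker says roughly matches the annual energy bill.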
  92. But it's a big challenge for us to get from, say, Sandy Bridge today—
  93. that's 1.5 nanojoules per instruction—to if you wanted to do exa instructions per second,
  94. to do an exaFLOP you might have to do more than an exa instruction per second.
  95. But even if you take that as a thing at 20 megawatts,
  96. that's 20 picojoules per instruction.
  97. And that's not just the processor; that's everything.
  98. That's the memory system, that's the network, that's the I/O storage system.
  99. It's the whole ball of wax to do it.
  100. So you maybe get 10 picojoules per instruction to actually use in the processor.
  101. And so even Nvidia is not quite close enough to that.
  102. Yeah, well compared to Sandy Bridge, that's a factor of 150 down,
  103. and process isn't going to help you much.
  104. So that's why conventional CPUs are not going to get there.
  105. It's going to require a hybrid multi-core approach
  106. with most of the work being done in a GPU-like throughput processor to get there.
  107. But even we have a ways to go.
  108. We're probably close to an order of magnitude away, and we might get a factor of three from process.
  109. We need to be very clever to come up with the other factor of 3 or 4 that we need.
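
The energy-per-instruction budget discussed above follows from dividing power by instruction rate. The sketch below is only an illustration under stated assumptions: the function name `joules_per_instruction` is hypothetical, the one-instruction-per-FLOP rate is the simplification the speaker adopts, and the even split between processor and the rest of the system is inferred from the 20 and 10 picojoule figures quoted.

```python
# Energy budget per instruction for a 20 MW machine at 10^18 instructions/s.
# Figures come from the transcript; the 50/50 split is an assumption.

def joules_per_instruction(power_watts, instructions_per_second):
    """Energy available per instruction, in joules."""
    return power_watts / instructions_per_second


SYSTEM_BUDGET = joules_per_instruction(20e6, 1e18)  # whole machine
PROCESSOR_BUDGET = SYSTEM_BUDGET / 2                # assumed processor share
SANDY_BRIDGE = 1.5e-9                               # 1.5 nJ per instruction

# How far a conventional CPU is from the processor budget.
CPU_GAP = SANDY_BRIDGE / PROCESSOR_BUDGET

# Closing an order-of-magnitude gap: 3x from process times 3x to 4x
# from architecture gives a combined factor between 9 and 12.
COMBINED_LOW, COMBINED_HIGH = 3 * 3, 3 * 4

if __name__ == "__main__":
    print(f"system budget:    {SYSTEM_BUDGET * 1e12:.0f} pJ per instruction")
    print(f"processor budget: {PROCESSOR_BUDGET * 1e12:.0f} pJ per instruction")
    print(f"Sandy Bridge gap: {CPU_GAP:.0f}x")
    print(f"process x design: {COMBINED_LOW}x to {COMBINED_HIGH}x")
```

This reproduces the 20 picojoule system budget, the 10 picojoule processor share, and the factor of 150 relative to Sandy Bridge, and shows why a factor of three from process plus another factor of 3 or 4 adds up to roughly an order of magnitude.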
  110. -Titan does have CPUs in it, yes?
    -That's correct.
  111. So is there a vision where that won't even be the case?
  112. No I think there are always pieces of the code
  113. where you have a critical path, you have a piece of single-threaded code that you need to run very quickly.
  114. And so you always need a latency-optimized processor around to do that,
  115. but most of the work, it's one of these things, it's kind of like a cache memory, where most of your accesses
  116. are to this little memory that runs really fast but you still need the capacity
  117. of the big memory sitting behind it, right?
  118. And so it's the same thing on throughput versus latency.
  119. Most of your work is done in the throughput processors,
  120. but when you do have a latency-critical thing, you run it on the latency-optimized processors.
  121. And so you wind up getting the critical path performance of the CPU
  122. with the bulk of the energy consumption of the GPU.
  123. -And the bulk of the FLOPs in Titan is certainly going to the GPUs.
  124. The bulk of the FLOPs will be in the GPUs.