English subtitles

SLAC and Big Data - Intro to Computer Science


Showing Revision 5 created 05/24/2016 by Udacity Robot.

  1. We're here at SLAC National Accelerator Lab,
  2. and we're going to see how they use computing to understand the mysteries of the universe.
  3. [Spencer Gessner:] We're standing in the klystron gallery, formerly the longest building in the world.
  4. [Richard Mount:] You're here at SLAC National Accelerator Laboratory.
  5. This is a 50-year-old laboratory, as all the flags on the lampposts around the lab are telling you.
  6. It was founded to build a 2-mile-long linear accelerator.
  7. SLAC is an accelerator laboratory still.
  8. Its main science is based on accelerating particles and creating new states of matter
  9. or exploring the nature of matter with the accelerated particles.
  10. This always has generated a lot of data, a lot of information.
  11. It's very data-intensive experimental science.
  12. From the earliest days of SLAC,
  13. computing to analyze data has been a major part of the activity here.
  14. You really can only study the cosmos by studying it in a computer.
  15. You get one chance to look at it,
  16. but to understand how it evolved into the state it is now,
  17. you have to do all this in the computer.
  18. There are massive computations going on for that sort of simulation,
  19. massive computations in catalysis and material science
  20. and massive data analysis going on here as well.
  21. The particular particle physics experiment
  22. that I am involved in right now has some 300 petabytes of disk space--
  23. some 300,000 terabytes, some 300 million gigabytes of disk space
  24. around the world to do this analysis.
  25. Of course, we are far from understanding everything about the universe,
  26. but this is probably one of the most data-intensive activities in science today.
  27. The raw data rate coming out of the ATLAS detector that I'm involved in
  28. is about a petabyte a second.
  29. That's 1 million gigabytes a second.
  30. You can't store that with any budget known to man,
  31. so most of it is inspected on the fly and reduced to a much smaller, but still large, storable amount of data.
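The on-the-fly reduction Mount describes can be sketched as a stream filter: inspect each event as it arrives and store only the tiny fraction that passes a selection. Everything here is invented for illustration — the event fields, the exponential "energy" stand-in, and the threshold are not from the real ATLAS trigger.

```python
import random

def event_stream(n):
    """Yield n fake events; 'energy' is a random stand-in for detector data."""
    rng = random.Random(42)
    for i in range(n):
        yield {"id": i, "energy": rng.expovariate(1.0)}

def trigger(events, threshold):
    """Inspect events on the fly, keeping only those above the threshold."""
    for ev in events:
        if ev["energy"] > threshold:
            yield ev  # stored; everything else is discarded immediately

# Only a small fraction of the stream survives and needs disk space.
kept = list(trigger(event_stream(100_000), threshold=8.0))
print(f"stored {len(kept)} of 100000 events")
```

The point of the generator structure is that no full copy of the stream ever exists in memory, mirroring how a raw petabyte-per-second rate can be reduced without ever being stored.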
  32. Right now we are sifting through these many, many petabytes of data
  33. to look for signals of the Higgs boson, as no doubt people have heard in the news.
  34. There are tantalizing hints that I'm not holding my breath about at all right now,
  35. but this is the way we do it.
  36. You need to have those vast amounts of data
  37. just to pick out the things that will really revolutionize physics in there,
  38. and you need to understand all of it in detail, because what you're looking for
  39. is something slightly unusual compared with everything else.
  40. If you don't understand everything else perfectly then you don't understand anything.
  41. [Max Swiatlowski:] We're looking at one of the racks that contains
  42. the ATLAS PROOF cluster at SLAC.
  43. ATLAS is an experiment at the Large Hadron Collider in Geneva, Switzerland,
  44. that collides protons, fundamental building blocks of nature,
  45. traveling at very, very, very close to the speed of light
  46. with trillions of times the energy that they have at room temperature.
  47. You get many, many of these collisions happening at once
  48. in this enormous machine, which reads out trillions of data channels.
  49. At the end of the day, you have this enormous amount of data--petabytes of data--
  50. that you have to analyze, looking for very rare, very particular signatures inside of that.
  51. If I want to look for a rare signature--something that had a lot of energy
  52. and a lot of really strange particles at once--
  53. there are trillions and trillions of these events stored on this machine.
  54. To look for them in any reasonable amount of time,
  55. I have to do many searches at once.
  56. I have to use all the cores on the computers--
  57. the hundreds of cores on the machine all running at full-speed at the same time--
  58. to have any hope of doing it in any reasonable amount of time.
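Swiatlowski's "many searches at once" amounts to splitting the event sample into chunks and scanning the chunks concurrently. The sketch below is hypothetical: the rare-signature cut (high energy plus several strange particles) and the data are made up, and a thread pool stands in for the process-based workers a real CPU-bound scan would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Fake event sample; field values are arbitrary stand-ins.
events = [{"energy": e % 97, "strange_particles": e % 5}
          for e in range(100_000)]

def scan(chunk):
    """Count events in this chunk matching the rare signature."""
    return sum(1 for ev in chunk
               if ev["energy"] > 90 and ev["strange_particles"] >= 3)

# Split the sample into chunks and scan them all at the same time.
chunks = [events[i:i + 10_000] for i in range(0, len(events), 10_000)]
with ThreadPoolExecutor(max_workers=8) as pool:
    hits = sum(pool.map(scan, chunks))
print(f"matching events: {hits}")
```

Because each chunk is scanned independently, the result is identical to a single serial pass, but the wall-clock time can shrink toward the serial time divided by the number of cores.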
  59. [Richard Mount:] This isn't the sort of thing that search engines currently do.
  60. They're looking for text strings and indexing all the text strings that they find
  61. in some way like this.
  62. What we have is very, very structured.
  63. We know the structure of these data.
  64. We know exactly how to go to anything that we want to get to in these data,
  65. because the way in which everything is linked together is very well understood.
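The contrast Mount draws with text search engines is between scanning everything and jumping directly to a record through known structure. A minimal sketch, assuming a made-up (run, event) keying scheme: build an index once, then every lookup is a single dictionary access rather than a scan.

```python
# Fake structured records; the (run, event) key scheme is invented here.
records = [{"run": r, "event": e, "payload": r * 1000 + e}
           for r in range(10) for e in range(1000)]

# Build the index once; the structure of the data makes this possible.
index = {(rec["run"], rec["event"]): rec for rec in records}

# Direct access: no scan over the other 9,999 records.
rec = index[(7, 42)]
print(rec["payload"])  # → 7042
```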
  66. Things will go wrong all the time.
  67. You cannot assume you won't lose data from the disk.
  68. You send it by network from one computer center to another.
  69. You cannot assume it arrives undamaged.
  70. You cannot assume your computers don't die in the middle of calculations.
  71. Everything can go wrong, so the computing we do for the LHC
  72. has many layers of error correction and retry.
  73. Some of the basic failure rates are quite high,
  74. but by the time everything has been fairly automatically retried
  75. and errors have been corrected, we get high throughput and a high success rate.
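The "layers of error correction and retry" Mount describes can be sketched as verify-then-retry: checksum the data before sending, re-check on arrival, and resend on mismatch. The flaky channel and its 30% failure rate are simulated; with independent retries, a per-attempt failure rate that high still yields a near-certain end-to-end success (0.3¹⁰ ≈ 6×10⁻⁶ chance of exhausting ten attempts).

```python
import hashlib
import random

rng = random.Random(0)

def flaky_transfer(data: bytes) -> bytes:
    """Simulate a network copy that silently corrupts the payload 30% of the time."""
    if rng.random() < 0.3:
        return data[:-1] + b"?"  # silent corruption of the last byte
    return data

def transfer_with_retry(data: bytes, max_attempts: int = 10) -> bytes:
    """Retry until the received checksum matches the one computed before sending."""
    want = hashlib.sha256(data).hexdigest()
    for attempt in range(1, max_attempts + 1):
        got = flaky_transfer(data)
        if hashlib.sha256(got).hexdigest() == want:
            return got  # verified intact
    raise RuntimeError("transfer failed after retries")

payload = b"one block of detector data"
received = transfer_with_retry(payload)
```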