
11-21 Why Dynamic Parallel Quicksort is More Efficient

  • 0:00 - 0:04
    Okay, so, for more efficient partitioning, that is actually not true.
  • 0:04 - 0:07
    We have not touched the partition function; the partition function does not have
  • 0:07 - 0:11
    anything to do with the dynamic launches that let me do the recursive parallelism.
  • 0:11 - 0:13
    Launching on the fly, however, yes.
  • 0:13 - 0:20
    That does substantially contribute because I don't have to keep returning to the CPU to do my launches for me.
  • 0:20 - 0:23
    That means I'm communicating less data and it means that my
  • 0:23 - 0:26
    launch occurs immediately when I need it,
  • 0:26 - 0:30
    instead of waiting around until that particular wave of launches has finished.
  • 0:30 - 0:34
    Simpler code, while convenient and probably faster for me to maintain,
  • 0:34 - 0:37
    is not the reason why it actually runs any faster.
  • 0:37 - 0:42
    And finally, greater GPU utilization is probably the cause of the greatest speedup.
  • 0:42 - 0:49
    By launching on the fly, I'm making sure my GPU is always busy, so when one partial sort finishes,
  • 0:49 - 0:54
    it creates 2 more immediately, keeping my GPU fully stacked up and busy with work.
  • 0:54 - 0:59
    It streams more work for my GPU at one time, and my sort ends up faster end-to-end.
  • 0:59 - 1:05
    In fact, when I've written this program in dynamic-parallel form and then in host-launched form,
  • 1:05 - 1:09
    I see almost exactly a factor of 2 speedup between the two.
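
The launch-on-the-fly pattern described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the code used in the lecture: the names `quicksort_dp`, `selection_sort`, and `sort_on_gpu`, the constants `MAX_DEPTH` and `SMALL_RANGE`, and the choice of one thread per launch with a sequential partition are all assumptions made to keep the example short. The point it shows is that each partial sort launches its two children directly from the device, so control never returns to the CPU between levels of the recursion.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical tuning constants, chosen for illustration only.
#define MAX_DEPTH 16    // stop launching children past this nesting depth
#define SMALL_RANGE 32  // ranges this small are sorted directly

// In-place selection sort over data[left..right] (inclusive bounds).
__device__ void selection_sort(unsigned int *data, int left, int right) {
    for (int i = left; i <= right; ++i) {
        int min_idx = i;
        for (int j = i + 1; j <= right; ++j)
            if (data[j] < data[min_idx]) min_idx = j;
        unsigned int tmp = data[i];
        data[i] = data[min_idx];
        data[min_idx] = tmp;
    }
}

// One partial sort: partition data[left..right], then launch a child
// kernel for each side immediately, without returning to the host.
__global__ void quicksort_dp(unsigned int *data, int left, int right, int depth) {
    if (depth >= MAX_DEPTH || right - left <= SMALL_RANGE) {
        selection_sort(data, left, right);
        return;
    }

    // Sequential Hoare-style partition around the middle element.
    // The partition step is the same as in a host-launched version.
    int i = left, j = right;
    unsigned int pivot = data[left + (right - left) / 2];
    while (i <= j) {
        while (data[i] < pivot) ++i;
        while (data[j] > pivot) --j;
        if (i <= j) {
            unsigned int tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
            ++i;
            --j;
        }
    }

    // Each child gets its own non-blocking stream so that the two
    // halves can run concurrently on the GPU.
    if (left < j) {
        cudaStream_t s;
        cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);
        quicksort_dp<<<1, 1, 0, s>>>(data, left, j, depth + 1);
        cudaStreamDestroy(s);
    }
    if (i < right) {
        cudaStream_t s;
        cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);
        quicksort_dp<<<1, 1, 0, s>>>(data, i, right, depth + 1);
        cudaStreamDestroy(s);
    }
}

// Host wrapper: a single launch from the CPU starts the whole sort.
void sort_on_gpu(unsigned int *host_data, int n) {
    if (n <= 0) return;
    unsigned int *dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(unsigned int);

    cudaError_t err = cudaMalloc(&dev, bytes);
    assert(err == cudaSuccess);
    err = cudaMemcpy(dev, host_data, bytes, cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    quicksort_dp<<<1, 1>>>(dev, 0, n - 1, 0);

    // The parent grid is not complete until all of its children are,
    // so this waits for the entire recursion tree.
    err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    err = cudaMemcpy(host_data, dev, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dev);
}
```

Calling `sort_on_gpu(values, n)` on a host array sorts it in ascending order. Building this requires a GPU that supports dynamic parallelism (compute capability 3.5 or later) and relocatable device code, typically `nvcc -rdc=true file.cu -lcudadevrt`. A host-launched version would instead return to the CPU after every level of partitioning and launch the next wave from there, which is the round trip the transcript says the dynamic version avoids.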
Video Language:
English
Team:
Udacity
Project:
CS344 - Intro to Parallel Programming
Duration:
01:09
