Return to Video

Make Big Index - Intro to Computer Science

  • 0:00 - 0:04
    So we seen a few examples measuring the time of execution of
  • 0:04 - 0:07
    toy programs, of a loop that does nothing. What we care about though
  • 0:07 - 0:11
    is the time of execution of our indexing code, so what I'm
  • 0:11 - 0:15
    going to do next is write a program to be able to test the
  • 0:15 - 0:18
    time of execution of indexing code. To get a good test we
  • 0:18 - 0:20
    need to make a big index, we need to fill up an index
  • 0:20 - 0:22
    with lots and lots of words, and we can do that by hand;
  • 0:22 - 0:25
    that would take a lot of time and effort. So what I'm going to
  • 0:25 - 0:28
    do instead, is to find a procedure that makes a
  • 0:28 - 0:31
    big index. So what it does is take in, a size,
  • 0:31 - 0:34
    and then it's going to, fill up an index with that
  • 0:34 - 0:37
    number of words. To fill up the index with that number
  • 0:37 - 0:40
    of keywords, we need to generate different words. So what
  • 0:40 - 0:43
    I've done is created a variable called letters, that is initially
  • 0:43 - 0:47
    is all A's. And as we go through the loop, we
  • 0:47 - 0:50
    go through the loop size number of times. We keep making
  • 0:50 - 0:52
    a new string. We add that string to the next, I'll
  • 0:52 - 0:53
    explain what make_string does later. We add that string to the
  • 0:53 - 1:00
    index, then we change the letters. And we're
  • 1:00 - 1:06
    going to keep increasing the letter once we get to z, we wrap around. For now
  • 1:06 - 1:09
    it's not too important to understand everything in this code, but
  • 1:09 - 1:10
    I do want to walk through the code a little bit.
  • 1:10 - 1:13
    So what this loop is doing is going through all the
  • 1:13 - 1:16
    positions and the letters are A and we filled this up with
  • 1:16 - 1:19
    eight different elements and we're going to go through those elements starting
  • 1:19 - 1:23
    from the last one going backwards, so this range loop goes from
  • 1:23 - 1:26
    the length minus 1 to 0, stepping by negative 1. We
  • 1:26 - 1:30
    are going to check each letter. If the letter is less than z,
  • 1:30 - 1:32
    that means we can increase and we are going to increase it
  • 1:32 - 1:35
    using this code here and I'll talk soon about the code that
  • 1:35 - 1:38
    turns letters into numbers, but what this expression does is get
  • 1:38 - 1:41
    the next letter. So if the letter was an a the result
  • 1:41 - 1:45
    after this will be a b, and we're going to replace the
  • 1:45 - 1:48
    letter at that position in the letters list, with the next
  • 1:48 - 1:51
    letter. If the letter is a z, we don't want to go
  • 1:51 - 1:53
    beyond the alphabet, so instead, we're going to set that letter to
  • 1:53 - 1:56
    a, and we're going to go back through the loop to find
  • 1:56 - 1:59
    the next letter. Once we found one less than z, we
  • 1:59 - 2:02
    break, we only need to change one letter. What the make_string
  • 2:02 - 2:07
    procedure does, that we call here. Is just turn that array into
  • 2:07 - 2:10
    one string. So it's going through the elements of P which
  • 2:10 - 2:13
    is this list of letters that keeps changing. And concatenating all those
  • 2:13 - 2:16
    letters into one string. So the whole point of this is to
  • 2:16 - 2:19
    allow us to easily make big indexes so we can run tests
  • 2:19 - 2:23
    on different size indexes. So let's try this in the Python shell.
  • 2:23 - 2:26
    First I'll show you what the result is when we use make
  • 2:26 - 2:29
    big index. We'll start with a fairly small one, so I'm passing
  • 2:29 - 2:32
    three is the size. What make big index gives us is an
  • 2:32 - 2:35
    index with three keywords: aaaaaaaa, aaaaaaab, aaaaaaac and for
  • 2:35 - 2:40
    each of them there's one url which is the name
  • 2:40 - 2:44
    fake. If we're passing in a bigger value, this
  • 2:44 - 2:47
    will have an index with a hundred keywords, so we're
  • 2:47 - 2:49
    going to passing a hundred is the size, we get
  • 2:49 - 2:51
    this big index, and you can see it's starting to
  • 2:51 - 2:54
    change the second from the last letter. To make sure
  • 2:54 - 2:57
    that each word is a different word than the next.
  • 2:57 - 2:59
    So, what we want to do now is look at
  • 2:59 - 3:03
    how the time of executions take for different size indexes. So,
  • 3:03 - 3:07
    let's make a really big index. So what we're going to do,
  • 3:07 - 3:11
    we'll make an index of size 10,000. And remember, our concern
  • 3:11 - 3:13
    is the time for lookups. It's the operation that's going to happen
  • 3:13 - 3:16
    most frequently. So we're not timing the time to make the
  • 3:16 - 3:19
    big index. Let's see what the time is to do a
  • 3:19 - 3:22
    lookup, and so I'll time the execution of looking up in
  • 3:22 - 3:27
    index 10,000 the keyword, and the word makes
  • 3:27 - 3:29
    a difference. So, first let's try looking up
  • 3:29 - 3:33
    the word udacity. Which, sadly, is not in
  • 3:33 - 3:35
    our index. We'd need a much bigger index
  • 3:35 - 3:42
    to get up to udacity. And the time of execution is shown here. So, it's 0.0008
  • 3:42 - 3:45
    seconds. So, still getting close to a millisecond,
  • 3:45 - 3:48
    but still quite fast. Let's make a bigger index.
  • 3:48 - 3:51
    This time we have a 100,000 keywords. going to take
  • 3:51 - 3:53
    a little longer to make it, but we're not
  • 3:53 - 3:55
    worrying about the time to make it now. What
  • 3:55 - 3:57
    we care is the time to do a look up.
  • 3:59 - 4:03
    So that took a long time. Let's see how, many entries there are, so we can look
  • 4:03 - 4:09
    at the very last element in our index. And we can see got to aaaafryd,
  • 4:09 - 4:14
    which I don't know to pronounce. Another way to see that, which
  • 4:14 - 4:16
    we didn't talk about yet, we can actually index from the
  • 4:16 - 4:19
    back using negative numbers, so if you use negative 1 as
  • 4:19 - 4:21
    the index, that will give us the last entry in the
  • 4:21 - 4:24
    list. So, now we'll try doing a timed execution. We're going to
  • 4:24 - 4:27
    look up in the 10,000 size index. And we'll
  • 4:27 - 4:30
    see the time's pretty similar to what it was before,
  • 4:30 - 4:31
    that time might vary a little bit, let's try
  • 4:31 - 4:34
    it once more, and again, it's just under a millisecond.
  • 4:34 - 4:38
    So now, we'll try, instead of the 10,000 index,
  • 4:38 - 4:43
    looking up in the 100,000 length index, the same look-up,
  • 4:43 - 4:45
    and we see that the time is now 10
  • 4:45 - 4:49
    times that, so it's now about 8.6 milliseconds whereas before
  • 4:49 - 4:55
    it was 0.9 milliseconds. And let's, for consistency, try that again.
  • 4:55 - 4:57
    We'll note that these timings vary a little bit, each time
  • 4:57 - 5:00
    we do it. And there's lots of reasons why the timing
  • 5:00 - 5:03
    varies. We're running lots of other things on the computer at the
  • 5:03 - 5:07
    same time. So it's not the case that we have total
  • 5:07 - 5:10
    control over the processor and are running exactly the same thing every
  • 5:10 - 5:12
    time. Because all of the other programs might be doing other
  • 5:12 - 5:14
    things. The other reason the time can vary is where things are
  • 5:14 - 5:18
    in memory. Sometime it's very quick to look up a
  • 5:18 - 5:21
    value in memory, sometimes it takes longer. And we're not going to
  • 5:21 - 5:24
    talk about the details of that. What matters is that, that
  • 5:24 - 5:27
    the time's roughly the same, each time we execute it. And
  • 5:27 - 5:31
    it really depends on the size of the input, in this
  • 5:31 - 5:33
    case it's the size of the input table. So when we
  • 5:33 - 5:36
    increase the size of the table to have 100,000 entires, it's
  • 5:36 - 5:40
    about 10 times as slow as when we had 10,000 entries.
  • 5:40 - 5:43
    So now let's have a few quizzes to see if you can guess how these timings work.
タイトル:
Make Big Index - Intro to Computer Science
概説:

more » « less
Video Language:
English
Team:
Udacity
プロジェクト:
CS101 - Intro to Computer Science
Duration:
05:45

English subtitles

改訂 Compare revisions