Red-Black Trees (21 min)

  • 0:00 - 0:03
    So, in this video, we'll graduate beyond
    the domain of just vanilla binary search
  • 0:03 - 0:07
    trees, like we've been talking about
    before, and we'll start talking about
  • 0:07 - 0:10
    balanced binary search trees.
    These are the search trees you'd really
  • 0:10 - 0:14
    want to use when you want to have real
    time guarantees on your operation time.
  • 0:14 - 0:18
    Cuz they're search trees which are
    guaranteed to stay balanced, which means
  • 0:18 - 0:21
    the height is guaranteed to stay
    logarithmic, which means all of the
  • 0:21 - 0:25
    operations that search trees support, that
    we know and love, will also be logarithmic
  • 0:25 - 0:27
    in the number of keys that they're
    storing.
  • 0:27 - 0:30
    So, let's just quickly recap.
    What is the basic search tree property?
  • 0:30 - 0:34
    It should be the case that at every single
    node of your search tree, if you go to the
  • 0:34 - 0:38
    left, you'll only see keys that are
    smaller than where you started and if you
  • 0:38 - 0:42
    go to the right you only see keys that
    are bigger than where you started.
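
    A minimal Python sketch of that search tree
    property (illustrative code, not from the
    lecture): going left only ever reaches smaller
    keys, and going right only ever reaches bigger
    keys.

        # Hypothetical node for a plain (unbalanced) binary search tree.
        class Node:
            def __init__(self, key):
                self.key = key
                self.left = None    # subtree containing only smaller keys
                self.right = None   # subtree containing only bigger keys

        def search(root, key):
            # Walk down from the root, going left or right according to the
            # search tree property, until we find the key or hit a null pointer.
            node = root
            while node is not None:
                if key == node.key:
                    return node
                node = node.left if key < node.key else node.right
            return None  # unsuccessful search: we fell off the tree
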
  • 0:42 - 0:46
    And a really important observation is
    that, given a set of keys, there are
  • 0:46 - 0:49
    going to be lots and lots of legitimate,
    valid binary search trees with those
  • 0:49 - 0:51
    keys.
    So, we've been having these running
  • 0:51 - 0:54
    examples with the keys one, two, three,
    four, five.
  • 0:54 - 0:58
    On the one hand, you can have a nice and
    balanced search tree that has height only
  • 0:58 - 1:02
    two, with the keys one through five.
    On the other hand, you can also have these
  • 1:02 - 1:06
    crazy chains that basically devolve into linked
    lists, where the height, for N elements,
  • 1:06 - 1:09
    could be as high as N - 1.
    So, in general, you could have an
  • 1:09 - 1:13
    exponential difference in the height.
    It can be as small, in the best case, as
  • 1:13 - 1:16
    logarithmic and as big, in the worst case,
    as linear.
  • 1:16 - 1:20
    So, this obviously motivates search trees
    that have the additional property that you
  • 1:20 - 1:24
    never have to worry about their height.
    You know they're going to be well
  • 1:24 - 1:26
    balanced.
    You know they're going to have height
  • 1:26 - 1:28
    logarithmic.
    You're never worried about them having
  • 1:28 - 1:32
    this really lousy linear height.
    Remember, why it's so important to have a
  • 1:32 - 1:35
    small height?
    It's because the running time of all of
  • 1:35 - 1:39
    the operations of search trees depends on
    the height.
  • 1:39 - 1:43
    You want to do search, you want to do
    insertions, you want to find predecessors
  • 1:43 - 1:48
    or whatever, the height is going to be
    what governs the running time of all those
  • 1:48 - 1:50
    operations.
    So, the high level idea behind balanced search
  • 1:50 - 1:55
    trees is really exactly what you think,
    which is that, you know, the
  • 1:55 - 1:58
    height can't be any better than
    logarithmic in the number of things you're
  • 1:58 - 2:02
    storing. That's because the trees are
    binary, so the number of nodes can only
  • 2:02 - 2:05
    double at each level, so you need a
    logarithmic number of levels to
  • 2:05 - 2:07
    accommodate everything that you are
    storing.
  • 2:07 - 2:11
    But since it's got to be logarithmic anyway,
    let's make sure it stays logarithmic all the time,
  • 2:11 - 2:16
    even as we do insertions and deletions.
    If we can do that, then we get a very rich
  • 2:16 - 2:21
    collection of supported operations all
    running in logarithmic time.
  • 2:21 - 2:25
    As usual, n denotes the number of
    keys being stored in the tree.
  • 2:25 - 2:29
    There are many, many, many different
    balanced search trees.
  • 2:29 - 2:33
    Most of them are not
    super different from each other.
  • 2:33 - 2:37
    I'm going to talk about one of the more
    popular ones which are called Red Black
  • 2:37 - 2:40
    Trees.
    So, these were invented back in the '70s.
  • 2:40 - 2:45
    These were not the first balanced binary
    search tree data structures, that honor
  • 2:45 - 2:49
    belongs to AVL trees, which again are not
    very different from red black trees,
  • 2:49 - 2:52
    though the invariants are slightly
    different.
  • 2:52 - 2:56
    Another thing you might want to look up
    and read about is a very cool data
  • 2:56 - 2:58
    structure called splay trees, due to
    Sleator and Tarjan.
  • 2:58 - 3:02
    Unlike red black trees and AVL
    trees, which are only modified on
  • 3:02 - 3:06
    insertions and deletions, which, if you
    think about it, is sort of what you'd
  • 3:06 - 3:09
    expect,
    splay trees modify themselves, even when
  • 3:09 - 3:12
    you're doing lookups, even when you're
    doing searches.
  • 3:12 - 3:16
    So, they're sometimes called
    self-adjusting trees for that reason.
  • 3:16 - 3:21
    And they're super simple, but they still have
    kind of amazing guarantees.
  • 3:21 - 3:25
    And then finally, going beyond just
    the binary tree paradigm, many of you might
  • 3:25 - 3:28
    want to look up examples of B trees or
    also B+ trees.
  • 3:28 - 3:31
    These are very relevant for implementing
    databases.
  • 3:31 - 3:35
    Here the idea is, in a given node
    you're going to have not just one key but
  • 3:35 - 3:40
    many keys, and from a node, you have
    multiple branches that you can take,
  • 3:40 - 3:44
    depending on where what you're searching for falls
    with respect to the multiple keys that are
  • 3:44 - 3:47
    at that node.
    The motivation in a database context for
  • 3:47 - 3:51
    going beyond the binary paradigm is to
    have a better match with the memory
  • 3:51 - 3:53
    hierarchy.
    So, that's also very important, although a
  • 3:53 - 3:57
    little bit out of scope here.
    That said, for what we discuss about red-black
  • 3:57 - 4:01
    trees, much of the intuition will
    translate to all of these other balanced
  • 4:01 - 4:06
    tree data structures, if you ever find
    yourself in a position where you need to
  • 4:06 - 4:09
    learn more about them.
    So, red black trees are just the same as
  • 4:09 - 4:13
    binary search trees, except they also
    always maintain a number of additional
  • 4:13 - 4:16
    invariants.
    And so, what I'm going to focus on in this
  • 4:16 - 4:20
    video is, first of all, what the
    invariants are, and then how the
  • 4:20 - 4:23
    invariants guarantee that the height will
    be logarithmic.
  • 4:23 - 4:27
    Time permitting, at some point, there will
    be optional videos more about the guts,
  • 4:27 - 4:32
    more about the implementation of red
    black trees, namely how you maintain
  • 4:32 - 4:34
    these invariants under insertions and
    deletions.
  • 4:34 - 4:38
    That's quite a bit more complicated, so
    that's appropriate for optional
  • 4:38 - 4:41
    material.
    But understanding what the invariants are
  • 4:41 - 4:45
    and what role they play in controlling the
    height is very accessible, and it's
  • 4:45 - 4:47
    something I think every programmer should
    know.
  • 4:47 - 4:51
    So, there, I'm going to write down four
    invariants and really, the bite comes from
  • 4:51 - 4:54
    the second two, okay, from the third and
    the fourth invariant.
  • 4:54 - 4:58
    The first two invariants, you know, are
    just really cosmetic.
  • 4:58 - 5:02
    So, the first one: we're going to store one
    additional bit of information at each
  • 5:02 - 5:07
    node, beyond just the key, and we're going to
    think of this bit as indicating whether it's a
  • 5:07 - 5:10
    red or a black node.
    You might be wondering, you know, why red
  • 5:10 - 5:12
    black?
    Well, I asked my colleague, Leo Guibas
  • 5:12 - 5:16
    about that a few years ago.
    And he told me that when he and Professor
  • 5:16 - 5:21
    Sedgewick were writing up this article, the
    journals had just gotten access to a
  • 5:21 - 5:25
    certain kind of new printing technology
    that allowed very limited color in the
  • 5:25 - 5:28
    printed copies of the journals.
    And so, they were eager to use it, and so
  • 5:28 - 5:32
    they named the data structure red black,
    so they could have these nice red and
  • 5:32 - 5:36
    black pictures in the journal article.
    Unfortunately, there was then some snafu,
  • 5:36 - 5:40
    and at the end of the day, that technology
    wasn't actually available, so it wasn't
  • 5:40 - 5:44
    actually printed the way they were
    envisioning it, but the name has stuck.
  • 5:44 - 5:48
    So, that's the rather idiosyncratic reason
    why these data structures got the name
  • 5:48 - 5:52
    that they did, red black trees.
    So, secondly we're going to maintain the
  • 5:52 - 5:56
    invariant that the root of the search
    tree is always black; it can never be red.
  • 5:56 - 5:58
    Okay.
    So, with the superficial pair of
  • 5:58 - 6:02
    invariants out of the way, let's go to the
    two main ones.
  • 6:02 - 6:06
    So, first of all, we're never going to
    allow two reds in a row.
  • 6:06 - 6:13
    By which, I mean, if you have a red node
    in the search tree, then its children must
  • 6:13 - 6:16
    be black.
    If you think about it for a second, you
  • 6:16 - 6:20
    realize this also implies that if a node is
    red, and it has a parent, then that
  • 6:20 - 6:24
    parent has to be a black node.
    So, in that sense, there are no two red
  • 6:24 - 6:30
    nodes in a row anywhere in the tree.
    And the final invariant which is also
  • 6:30 - 6:37
    rather severe is that every path you might
    take from a root to a null pointer, passes
  • 6:37 - 6:41
    through exactly the same number of black
    nodes.
  • 6:41 - 6:46
    So, to be clear on what I mean by a root
    null path, what you should think about is an
  • 6:46 - 6:49
    unsuccessful search, right?
    So, what happens in an unsuccessful
  • 6:49 - 6:53
    search: you start at the root, and depending on
    whether you need to go smaller or bigger,
  • 6:53 - 6:57
    you go left or right, respectively.
    You keep going left or right as appropriate
  • 6:57 - 7:01
    until eventually you hit a null pointer.
    So, I want you to think about the process
  • 7:01 - 7:05
    by which you start at the root and then,
    eventually, fall off the end of the tree.
  • 7:05 - 7:07
    In doing so, you traverse some number of
    nodes.
  • 7:07 - 7:11
    Some of those nodes will be black, some of
    those nodes will be red.
  • 7:11 - 7:15
    And I want you to keep track of the number
    of black nodes. And the constraint that a
  • 7:15 - 7:20
    red black tree, by definition, must
    satisfy is that no matter what path you
  • 7:20 - 7:24
    take through the tree, starting from the
    root and terminating at a null pointer, the
  • 7:24 - 7:27
    number of black nodes traversed has to be
    exactly the same.
  • 7:27 - 7:32
    It cannot depend on the path, it has to be
    exactly the same on every single root null path.
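
    To make the invariants concrete, here is a
    minimal Python sketch (illustrative, not the
    lecture's code) of a node carrying the extra
    color bit, plus a checker for invariants two
    through four; invariant one is just the
    presence of the color field.

        RED, BLACK = "red", "black"

        class RBNode:
            def __init__(self, key, color, left=None, right=None):
                self.key, self.color = key, color
                self.left, self.right = left, right

        def satisfies_invariants(root):
            # Invariant 2: the root must be black.
            if root is not None and root.color != BLACK:
                return False
            ok = True

            def black_height(node):
                # Number of black nodes on every path from `node` to a null
                # pointer; flags a violation of invariant 3 or 4 along the way.
                nonlocal ok
                if node is None:
                    return 0
                for child in (node.left, node.right):
                    # Invariant 3: a red node may not have a red child.
                    if node.color == RED and child is not None and child.color == RED:
                        ok = False
                left_bh = black_height(node.left)
                right_bh = black_height(node.right)
                # Invariant 4: all root-null paths see the same black count.
                if left_bh != right_bh:
                    ok = False
                return left_bh + (1 if node.color == BLACK else 0)

            black_height(root)
            return ok

    For instance, the perfectly balanced, all-black
    tree on the keys three, five, and seven that
    comes up later in the video passes this check:

        balanced = RBNode(5, BLACK, RBNode(3, BLACK), RBNode(7, BLACK))
        assert satisfies_invariants(balanced)
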
  • 7:32 - 7:35

    Let's move on to some examples.
  • 7:35 - 7:38
    So, here's a claim.
    And this is meant to, kind of, whet your
  • 7:38 - 7:43
    appetite for the idea that red black trees
    must be pretty balanced.
  • 7:43 - 7:46
    They have to have height, basically
    logarithmic.
  • 7:46 - 7:49
    So, remember, what's the most unbalanced
    search tree?
  • 7:49 - 7:54
    Well, that's these chains.
    So, the claim is, even a chain with three
  • 7:54 - 7:58
    nodes cannot be a red black tree.
    So, what's the proof?
  • 7:58 - 8:05
    Well, consider such a search tree.
    So, maybe, with the key values one, two and three.
  • 8:05 - 8:08
    So, the question that we're asking is, is
  • 8:08 - 8:13
    there a way to color these three
    nodes red and black so that all four of
  • 8:13 - 8:17
    the invariants are satisfied?
    So, we need to color each red or black.
  • 8:17 - 8:20
    Remember, invariant two says the root, the
    one, has to be black.
  • 8:20 - 8:24
    So, we have four possibilities for how to
    color the two and the three.
  • 8:24 - 8:27
    But really, because of the third
    invariant, we only have three possibilities.
  • 8:27 - 8:30
    We can't color two and three both red, cuz
  • 8:30 - 8:34
    then we'd have two reds in a row.
    So, we can either make two red, three
  • 8:34 - 8:37
    black, two black, three red, or both two
    and three black.
  • 8:37 - 8:41
    And all of the cases are the same.
    Just to give one example, suppose that we
  • 8:41 - 8:44
    colored the node two red, and one and
    three black.
  • 8:44 - 8:48
    The claim is invariant four has been
    broken and invariant four is going to be
  • 8:48 - 8:53
    broken no matter how we try to color two
    and three red and black.
  • 8:53 - 8:57
    What does invariant four say?
    It says, really on any unsuccessful
  • 8:57 - 9:00
    search, you pass through the same number
    of black nodes.
  • 9:00 - 9:03
    And so, one unsuccessful search would be,
    you search for zero.
  • 9:03 - 9:07
    And if you search for a zero, you go to
    the root, you immediately go left to hit a
  • 9:07 - 9:10
    null pointer.
    So, you see exactly one black node.
  • 9:10 - 9:12
    Namely one.
    On the other hand, suppose you searched
  • 9:12 - 9:15
    for four, then you'd start at the root,
    and you'd go right, and you go to two,
  • 9:15 - 9:19
    you'd go right, and you go to three, you'd
    go right again, and only then will you get
  • 9:19 - 9:22
    a null pointer.
    And on that, unsuccessful search, you'd
  • 9:22 - 9:24
    encounter two black nodes, both the one
    and the three.
  • 9:24 - 9:28
    So, it's a violation of the fourth
    invariant, therefore, this would not be a
  • 9:28 - 9:30
    red black tree.
    I'll leave that for you to check, that no
  • 9:30 - 9:34
    matter how you try to color two and three
    red or black, you're going to break one of
  • 9:34 - 9:37
    the invariants.
    If they're both red, you'd break the third
  • 9:37 - 9:39
    invariant.
    If at most one is red, you'd break the
  • 9:39 - 9:42
    fourth invariant.
    So, that's a non-example of a red-black tree.
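
    That exhaustive check is small enough to spell
    out in Python (a hypothetical sketch, with the
    chain hard-coded as one, two, three going down
    to the right and the root forced black by
    invariant two):

        from itertools import product

        def chain_is_red_black(colors):
            # colors maps each key of the chain 1 -> 2 -> 3 to "red" or "black".
            # Invariant 3: no red node may have a red child along the chain.
            if any(colors[k] == "red" and colors[k + 1] == "red" for k in (1, 2)):
                return False
            # Invariant 4: every root-null path must see the same number of
            # black nodes; the paths here pass through {1}, {1, 2}, {1, 2, 3}.
            counts = {sum(colors[k] == "black" for k in path)
                      for path in ([1], [1, 2], [1, 2, 3])}
            return len(counts) == 1

        # The root (key 1) is black; try all four colorings of keys 2 and 3.
        assert not any(chain_is_red_black({1: "black", 2: c2, 3: c3})
                       for c2, c3 in product(["red", "black"], repeat=2))
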
  • 9:42 - 9:44
    So, let's look at an example of a red-black tree.
  • 9:44 - 9:48
    That is, a search tree where you can actually
  • 9:48 - 9:52
    color the nodes red or black so that all
    four invariants are maintained.
  • 9:52 - 9:56
    So, one search tree which is very easy to
    make red black is a perfectly balanced one.
  • 9:56 - 9:59
    So, for example, let's consider this three
  • 9:59 - 10:03
    node search tree with the keys three,
    five, and seven, and let's suppose the five
  • 10:03 - 10:06
    is the root.
    So, it has one child on each side, the
  • 10:06 - 10:09
    three and the seven.
    So, can this be made a red black tree?
  • 10:09 - 10:11
    So, remember what that question really
    means.
  • 10:11 - 10:16
    It's asking can we color these three
    nodes some combination of red and black so
  • 10:16 - 10:19
    that all four of the invariants are
    satisfied?
  • 10:19 - 10:24
    If you think about it a little bit, you
    realize, yeah, you can definitely color
  • 10:24 - 10:27
    these nodes red or black to
    satisfy all four of the invariants.
  • 10:27 - 10:31
    In particular, suppose we color all three
    of the nodes, black.
  • 10:31 - 10:34
    We've satisfied invariant number one, we've
    colored all the nodes.
  • 10:34 - 10:37
    We've satisfied invariant number two, in
    particular, the root is black.
  • 10:37 - 10:41
    We've satisfied invariant number three.
    There's no reds at all, so there's
  • 10:41 - 10:45
    certainly no two reds in a row.
    And, if you think about it, we've
  • 10:45 - 10:48
    satisfied invariant four because this tree
    is perfectly balanced.
  • 10:48 - 10:52
    No matter what you unsuccessfully search
    for, you're going to encounter two black
  • 10:52 - 10:55
    nodes.
    If you search for, say, one, you're going
  • 10:55 - 10:59
    to encounter three and five.
    If you search for, say, six, you're going
  • 10:59 - 11:02
    to encounter five and seven.
    So, all root null paths have exactly
  • 11:02 - 11:05
    two black nodes and invariant number four is
    also satisfied.
  • 11:05 - 11:08
    So, that's great.
    But, of course, the whole point of having
  • 11:08 - 11:11
    a binary search tree data structure is that
    you want it to be dynamic.
  • 11:11 - 11:13
    You want to accommodate insertions and
    deletions.
  • 11:13 - 11:17
    Every time you have an insertion or a
    deletion in a red black tree, the tree
  • 11:17 - 11:19
    changes.
    Let's say it's an insertion: you get a new
  • 11:19 - 11:23
    node, and you have to color it something.
    And now, all of a sudden, you've got to worry
  • 11:23 - 11:25
    about breaking one of these four
    invariants.
  • 11:25 - 11:29
    So, let me just show you some easy cases
    where you can accommodate insertions
  • 11:29 - 11:32
    without too much work.
    Time permitting we will include some
  • 11:32 - 11:36
    optional videos on the notion of
    rotations, which do more fundamental
  • 11:36 - 11:41
    restructuring of search trees so that they
    can maintain the four invariants, and stay
  • 11:41 - 11:44
    nearly perfectly balanced.
    So, if we have this red black tree where
  • 11:44 - 11:48
    everything's black, and we insert, say,
    six, that's going to get inserted down
  • 11:48 - 11:51
    here.
    Now, if we try to color it black, it's no
  • 11:51 - 11:55
    longer going to be a red black tree.
    And that's because, if we do an
  • 11:55 - 12:00
    unsuccessful search now for, say, 5.5,
    we're going to encounter three black
  • 12:00 - 12:03
    nodes, whereas if we do an unsuccessful
    search for one, we only encounter two
  • 12:03 - 12:06
    black nodes.
    So, that's not going to work.
  • 12:06 - 12:10
    But the way we can fix it is instead of
    coloring the six black, we color it red.
  • 12:10 - 12:14
    And now, this six is basically invisible
    to invariant number four.
  • 12:14 - 12:16
    It doesn't add to the black count of any
    root null path.
  • 12:16 - 12:21
    So, because you had two black nodes on
    all root null paths before the
  • 12:21 - 12:24
    six was there, that's still true now that
    you have this red six.
  • 12:24 - 12:29
    So, all four invariants are satisfied once
    you insert the six and color it red.
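
    Here's a sketch of that easy case in Python
    (hypothetical code, simplified): do an
    ordinary search tree insertion and color the
    new node red, so invariant four is untouched;
    a real red-black insertion would also have to
    detect two reds in a row and repair the tree
    with recolorings and rotations.

        RED, BLACK = "red", "black"

        class RBNode:
            def __init__(self, key, color):
                self.key, self.color = key, color
                self.left = self.right = None

        def insert(root, key):
            node = RBNode(key, RED)       # new nodes start out red
            if root is None:
                node.color = BLACK        # invariant 2: the root is black
                return node
            parent = root
            while True:                   # ordinary search tree descent
                side = "left" if key < parent.key else "right"
                child = getattr(parent, side)
                if child is None:
                    setattr(parent, side, node)
                    break
                parent = child
            # If the new node's parent is also red, invariant 3 is now broken
            # and a full implementation would restructure the tree here (omitted).
            return root

    In the video's example, inserting six into the
    all-black tree on three, five, and seven makes
    the red six the left child of the black seven,
    and all four invariants still hold.
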
  • 12:29 - 12:33
    If we then insert, say, an eight, we can
    pull exactly the same trick, we can color
  • 12:33 - 12:37
    the eight red.
    Again, it doesn't participate in invariant
  • 12:37 - 12:42
    four at all so we haven't broken it.
    Moreover, we still don't have two reds in
  • 12:42 - 12:45
    a row, so we haven't broken invariant
    number three either.
  • 12:45 - 12:50
    So, this is yet another red black tree.
    In fact, this is not the unique way to
  • 12:50 - 12:54
    color the nodes of this search tree, so
    that it satisfies all four of the
  • 12:54 - 12:57
    invariants.
    If we, instead, recolor six and eight
  • 12:57 - 13:02
    black, but at the same time, recolor the
    node seven, red, we're also golden.
  • 13:02 - 13:05
    Clearly, the first three invariants are
    all satisfied.
  • 13:05 - 13:09
    But also, in pushing the red upward,
    consolidating the red at six and eight,
  • 13:09 - 13:13
    and putting it at seven instead, we
    haven't changed the number of black nodes
  • 13:13 - 13:16
    on any given path.
    Any path that previously went
  • 13:16 - 13:20
    through six went through seven, anything
    that went through eight went through
  • 13:20 - 13:25
    seven, so there's exactly the same number
    of red and black nodes on each such path
  • 13:25 - 13:28
    as there was before.
    So, all paths still have an equal number of
  • 13:28 - 13:31
    black nodes and invariant four remains
    satisfied.
  • 13:31 - 13:37
    As I said, I've shown you here only simple
    examples, where you don't have to do much
  • 13:37 - 13:40
    work on an insertion to retain the red
    black properties.
  • 13:40 - 13:44
    In general, if you keep inserting more and
    more stuff and certainly if you do the
  • 13:44 - 13:49
    deletions, you have to work much harder to
    maintain those four invariants.
  • 13:49 - 13:53
    Time permitting, we'll cover just a taste
    of it in some optional videos.
  • 13:53 - 13:58
    So, what's the point of these seemingly
    arbitrary four invariants of a red black
  • 13:58 - 14:00
    tree?
    Well, the whole point is that if you
  • 14:00 - 14:05
    satisfy these four invariants in your
    search tree, then your height is going to
  • 14:05 - 14:08
    be small.
    And because your height's going to be
  • 14:08 - 14:10
    small, all your operations are going to be
    fast.
  • 14:10 - 14:16
    So, let me give you a proof that if a
    search tree satisfies the four invariants,
  • 14:16 - 14:20
    then it has super small height.
    In fact, no more than double the absolute
  • 14:20 - 14:24
    minimum that we could conceivably have, roughly
    two times log base two of N.
  • 14:24 - 14:31
    So, the formal claim is that every
    red-black tree with N nodes has height O
  • 14:31 - 14:37
    of log N, more precisely, at most two
    times log base two of (N + 1).
  • 14:37 - 14:42
    So, here's the proof.
    And what's nice about this proof is that
  • 14:42 - 14:46
    the role played by invariants three and
    four is very obvious.
  • 14:46 - 14:52
    Essentially, what the invariants guarantee
    is that, a red black tree has to look like
  • 14:52 - 14:57
    a perfectly balanced tree with at most a
    sort of factor two inflation.
  • 14:57 - 15:01
    So, let's see exactly what I mean.
    So, let's begin with an observation.
  • 15:01 - 15:03
    And this has nothing to do with red
    black trees.
  • 15:03 - 15:08
    Forget about the colors for a moment, and
    just think about the structure of binary
  • 15:08 - 15:10
    trees.
    And let's suppose we have a lower bound on
  • 15:10 - 15:14
    how long root null paths are in the tree.
    So, for some parameter k, and go ahead and
  • 15:14 - 15:18
    think of k as, like, ten if you want.
    Suppose we have a tree where if you start
  • 15:18 - 15:22
    from the root, and no matter how it is you
    navigate left and right, child pointers
  • 15:22 - 15:26
    until you terminate in a null pointer.
    No matter how you do it, you have no
  • 15:26 - 15:29
    choice but to see at least k nodes along
    the way.
  • 15:29 - 15:34
    If that hypothesis is satisfied, then if
    you think about it, the top of this tree
  • 15:34 - 15:39
    has to be totally filled in.
    So, the top of this tree has to include a
  • 15:39 - 15:43
    perfectly balanced binary
    tree of depth k - 1.
  • 15:43 - 15:46
    So, let me draw a picture here of the case
    of k = three.
  • 15:46 - 15:50
    So, if no matter how you go from the root
    to a null pointer, you have to see at
  • 15:50 - 15:54
    least three nodes along the way.
    That means the top three levels of this
  • 15:54 - 15:57
    tree have to be full.
    So, you have to have the root.
  • 15:57 - 16:01
    It has to have both of its children.
    It has to have all four of its
  • 16:01 - 16:04
    grandchildren.
    The proof of this observation is by
  • 16:04 - 16:08
    contradiction.
    If, in fact, you were missing some nodes
  • 16:08 - 16:13
    in any of these top k levels, well,
    that would give you a way of hitting
  • 16:13 - 16:18
    a null pointer while seeing fewer than k nodes.
    So, what's the point? The point is, this
  • 16:18 - 16:23
    gives us a lower bound on the population
    of a search tree as a function of the
  • 16:23 - 16:28
    lengths of its root null paths.
    So, the size N of the tree must include at
  • 16:28 - 16:34
    least the number of nodes in a perfectly
    balanced tree of depth k - 1, which is
  • 16:34 - 16:40
    2^k - 1. So, for example,
    when k = 3, it's 2^3 - 1,
  • 16:40 - 16:45
    or 7. That's just a basic fact about trees,
  • 16:45 - 16:50
    nothing about red black trees.
    So, let's now combine that with a red
  • 16:50 - 16:55
    black tree invariant to see why red black
    trees have to have small height.
  • 16:55 - 16:59
    So again, to recap where we got to on the
    previous slide.
  • 16:59 - 17:04
    The size N, the number of nodes in a tree,
    is at least 2^k - 1, where k is
  • 17:04 - 17:09
    the fewest number of nodes you will ever
    see on a root null path.
  • 17:09 - 17:14
    So, let's rewrite this a little bit and
    let's actually say, instead of having a
  • 17:14 - 17:18
    lower bound on N in terms of k, let's have
    an upper bound on k in terms of N.
  • 17:18 - 17:23
    So, the minimum length of a root null
  • 17:23 - 17:27
    path is bounded above by log base two of
    the quantity N + 1.
  • 17:27 - 17:31
    This is just adding one to both sides and
    taking the logarithm base two.
  • 17:31 - 17:35
    So, what does this buy us?
    Well, now, let's start thinking about red
  • 17:35 - 17:38
    black trees.
    So now, consider a red black tree with N nodes.
  • 17:38 - 17:42
    What does this say?
    This says that the number of nodes, forget
  • 17:42 - 17:48
    about red or black, just the number of
    nodes on some root null path has to be at
  • 17:48 - 17:53
    most log base two of N + 1.
    At most, all of those are black.
    Maybe some of them are red, but in
  • 17:53 - 17:57
    the extreme case, all of them are
  • 17:57 - 18:00
    black.
    So, we can write: in a red black tree with
  • 18:00 - 18:06
    N nodes, there is a root null path with at
    most log base two of N + 1 black nodes.
  • 18:06 - 18:11
    This is an even weaker statement than what
    we just proved.
  • 18:11 - 18:16
    We proved that some path must
    have at most log base two of N + 1 total
  • 18:16 - 18:18
    nodes.
    So, certainly, that path has at most log
  • 18:18 - 18:22
    base two of N + 1 black nodes.
    Now, let's apply the two
  • 18:22 - 18:26
    knockout punches of our two invariants.
    Alright, so fundamentally, what is the
  • 18:26 - 18:29
    fourth invariant telling us?
    It's telling us that if we look at a path
  • 18:29 - 18:33
    in our red black tree, we go from the
    root, we think about, let's say, that's an
  • 18:33 - 18:35
    unsuccessful search, we go down to a null
    pointer.
  • 18:35 - 18:39
    It says, if we think of the red nodes as
    invisible, if we don't count them in our
  • 18:39 - 18:43
    tally, then, since every root null path has
    the same number of black nodes, we're only
    going to see, basically, a logarithmic number of nodes.
  • 18:43 - 18:47
    But when we care about the height of the
    red black tree, of course, we care about
  • 18:47 - 18:50
    all of the nodes, the red nodes and the
    black nodes.
  • 18:50 - 18:54
    So, so far we know that if we only count
    black nodes, then we're good; we only have
  • 18:54 - 18:57
    log base two of N + 1 nodes that we need
    to count.
  • 18:57 - 18:59
    So, here's where the third invariant comes
    in.
  • 18:59 - 19:04
    It says, well actually, black nodes are a
    majority of the nodes on any path.
  • 19:04 - 19:08
    In a strong sense, there are no two reds
    in a row, on any path.
  • 19:08 - 19:12
    So, if we know the number of black nodes
    is small, then because you can't have two
  • 19:12 - 19:17
    reds in a row, the number of total nodes
    on the path is at most twice as large.
  • 19:17 - 19:21
    In the worst case, you have a black root,
    then red, then black, then red, then
  • 19:21 - 19:26
    black, then red, then black, et cetera.
    At the worst case, the number of red nodes
  • 19:26 - 19:30
    is equal to the number of black nodes,
    which doubles the length of the path once
  • 19:30 - 19:35
    you start counting the red nodes as well.
    And this is exactly what it means for a
  • 19:35 - 19:39
    tree to have a logarithmic depth.
    So, this, in fact, proves the claim, if
  • 19:39 - 19:43
    the search tree satisfies the invariants
    one through four, in particular if there's
  • 19:43 - 19:47
    no two reds in a row and all root null
    paths have an equal number of black nodes,
  • 19:47 - 19:51
    then, knowing nothing else about this
    search tree, it's got to be almost
  • 19:51 - 19:54
    balanced.
    It's perfectly balanced up to a factor of
  • 19:54 - 19:56
    two.
    And again, the point then is that
  • 19:56 - 20:00
    operations on the search tree are
    going to run in logarithmic
  • 20:00 - 20:04
    time, because the height is what governs
    the running time of those operations.
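
    To have the whole argument in one place, here
    is the chain of inequalities it rests on, with
    k the minimum number of nodes on any root-null
    path of the tree:

        % The top of the tree contains a perfectly balanced binary
        % tree of depth k - 1, so
        \[ n \ge 2^{k} - 1 \quad\Longrightarrow\quad k \le \log_2(n+1). \]
        % By invariant 4, every root-null path has the same number of black
        % nodes as the shortest one, hence at most k of them:
        \[ \#\{\text{black nodes on any root-null path}\} \le k \le \log_2(n+1). \]
        % By invariant 3 (no two reds in a row, black root), the total number
        % of nodes on a path is at most twice the black count, so
        \[ \text{height} \le 2\log_2(n+1) = O(\log n). \]
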
  • 20:04 - 20:09
    Now, in some sense, I've only told you the
    easy part which is if it just so happens
  • 20:09 - 20:12
    that your search tree satisfies these
    four invariants, then you're good.
  • 20:12 - 20:16
    The height is guaranteed to be small so
    the operations are guaranteed to be fast.
  • 20:16 - 20:19
    Clearly that's exactly what you want from
    this data structure.
  • 20:19 - 20:23
    But for the poor soul who has to actually
    implement this data structure, the hard
  • 20:23 - 20:27
    work is maintaining these invariants even
    as the data structure changes.
  • 20:27 - 20:31
    Remember, the point here is to be dynamic,
    to accommodate insertions and deletions.
  • 20:31 - 20:36
    And insertions and deletions can disrupt
    these four invariants and then one has to
  • 20:36 - 20:40
    actually change the tree to make sure
    they're satisfied again, so that the tree
  • 20:40 - 20:44
    stays balanced, has low height, even under
    arbitrary sequences of insertions and
  • 20:44 - 20:47
    deletions.
    So, we're not going to cover that in this
  • 20:47 - 20:49
    video.
    It can be done, without significantly
  • 20:49 - 20:53
    slowing down any of the operations.
    It's pretty tricky, takes some nice ideas.
  • 20:53 - 20:57
    There's a couple well-known algorithms
    textbooks that cover those details.
  • 20:57 - 21:00
    Or if you look at open source
    implementations of balanced search trees, you
  • 21:00 - 21:03
    can look at code that implements
    all of that.
  • 21:03 - 21:07
    But, because it can be done in a practical
    way, and because red black trees support
  • 21:07 - 21:12
    such a rich array of operations,
    you will find them used in a
  • 21:12 - 21:15
    number of practical applications.
    That's why balanced search trees should be
  • 21:15 - 21:17
    part of your programmer's toolbox.