< Return to Video

07-04-multivalued-dependencies.mp4

  • 0:00 - 0:01
    This video continues the topic
  • 0:01 - 0:03
    of relational design, talking specifically
  • 0:03 - 0:07
    about multivalued dependencies and Fourth Normal Form.
  • 0:07 - 0:08
    I know I've reminded you many times
  • 0:08 - 0:10
    about relational designed by decomposition,
  • 0:10 - 0:12
    so let me do it very quickly.
  • 0:12 - 0:14
    The designer specifies mega relations
  • 0:14 - 0:15
    with all the information they
  • 0:15 - 0:18
    want to store and properties of the data.
  • 0:18 - 0:19
    The system decomposes the mega
  • 0:19 - 0:21
    relations into smaller relations that
  • 0:21 - 0:25
    have good properties -- no anomalies and no lost information.
  • 0:25 - 0:27
    When we have functional dependencies, as
  • 0:27 - 0:28
    are properties of the data, we
  • 0:28 - 0:30
    get Boyce-Codd Normal form, and
  • 0:30 - 0:32
    then when we add to the
  • 0:32 - 0:34
    functional dependencies multi-value dependencies
  • 0:34 - 0:36
    we get fourth normal form.
  • 0:36 - 0:38
    And the specification of multi-
  • 0:38 - 0:40
    value dependencies and decomposition into
  • 0:40 - 0:43
    Fourth Normal Form is the topic of this video.
  • 0:43 - 0:44
    As a reminder from earlier,
  • 0:44 - 0:46
    Fourth Normal Form is stronger
  • 0:46 - 0:48
    than Boyce-Codd Normal Form so
  • 0:48 - 0:50
    if we have here all of
  • 0:50 - 0:51
    the relations that are in Boyce-
  • 0:51 - 0:54
    Codd normal form, some subset
  • 0:54 - 0:59
    of those are also in fourth normal form.
  • 0:59 - 1:01
    When we have functional dependencies, we can guarantee Boyce Codd normal
  • 1:01 - 1:03
    form and then when we add
  • 1:03 - 1:04
    multi-value dependencies that's what
  • 1:04 - 1:06
    allows us to narrow down to
  • 1:06 - 1:08
    the stronger property of fourth normal form.
  • 1:08 - 1:10
    So let's start with an example.
  • 1:10 - 1:13
    We have information about students applying to colleges.
  • 1:13 - 1:15
    The student is identified by their social security number.
  • 1:15 - 1:17
    They may apply to several colleges
  • 1:17 - 1:18
    and in this video we're not going
  • 1:18 - 1:20
    to have college states, just college names.
  • 1:20 - 1:22
    We'll assume they're unique.
  • 1:22 - 1:24
    And then the student may have hobbies.
  • 1:24 - 1:25
    And they may apply, as I've said,
  • 1:25 - 1:26
    to several colleges and have several
  • 1:26 - 1:29
    hobbies, but let's assume for now those are independent.
  • 1:29 - 1:33
    So do we have any functional dependencies for this relation?
  • 1:33 - 1:35
    Actually we don't have any all.
  • 1:35 - 1:37
    The social security number does
  • 1:37 - 1:38
    not determine the college name
  • 1:38 - 1:41
    or the hobby or anything in the other direction.
  • 1:41 - 1:43
    With no functional dependencies, the
  • 1:43 - 1:44
    only key for the relation
  • 1:44 - 1:47
    then is all attributes of the relation.
  • 1:47 - 1:50
    So is this relation in Boyce-Codd Normal Form?
  • 1:50 - 1:51
    As you might remember from the
  • 1:51 - 1:52
    previous video, Boyce-Codd
  • 1:52 - 1:54
    Normal Form says all functional
  • 1:54 - 1:56
    dependencies have a key on the left hand side.
  • 1:56 - 1:59
    Well, since we have no functional dependencies then the answer is yes.
  • 1:59 - 2:02
    It is in Boyce-Codd normal form.
  • 2:02 - 2:04
    On the other hand do we think this is a good design?
  • 2:04 - 2:08
    I'm going to say no this is not a good design Why not?
  • 2:08 - 2:09
    Well, let's suppose that somebody
  • 2:09 - 2:12
    applies to five colleges
  • 2:12 - 2:14
    and they have, say, six hobbies.
  • 2:14 - 2:16
    Then to have all
  • 2:16 - 2:18
    combinations of colleges and
  • 2:18 - 2:19
    hobbies that would yield 30
  • 2:19 - 2:22
    tuples in the relation and clearly that's not a good idea.
  • 2:22 - 2:24
    We'd rather separate the college
  • 2:24 - 2:27
    and hobbies if they are independent.
  • 2:27 - 2:29
    So the separation of independent facts
  • 2:29 - 2:31
    is what fourth normal form is about.
  • 2:31 - 2:34
    And now let's get a little bit more formal.
  • 2:34 - 2:36
    Like functional dependencies, multivalued dependencies
  • 2:36 - 2:38
    are specified based on knowledge
  • 2:38 - 2:40
    of the real world constraints on
  • 2:40 - 2:42
    the data being captured and all
  • 2:42 - 2:43
    instances of a relation
  • 2:43 - 2:47
    with a multivalued dependency must adhere to the dependency.
  • 2:47 - 2:50
    Now let's define exactly what a multi value dependency is.
  • 2:50 - 2:52
    For relation R we write
  • 2:52 - 2:54
    a multi value dependency using
  • 2:54 - 2:56
    a double headed arrow and
  • 2:56 - 2:58
    we say 'A' multi determines 'B'.
  • 2:58 - 3:00
    In this case, again, with 'A'
  • 3:00 - 3:03
    and 'B' possibly being sets of attributes, so that would be A
  • 3:03 - 3:05
    one through A N
  • 3:05 - 3:06
    and B one through B
  • 3:06 - 3:10
    M, which I'm abbreviating with A bar and B bar.
  • 3:10 - 3:12
    So let me write the formal definition of A multi
  • 3:12 - 3:14
    determines B. Again using first
  • 3:14 - 3:15
    order logic similarly to what
  • 3:15 - 3:19
    we did with functional dependencies but this one's a bit more complicated.
  • 3:19 - 3:21
    It says for all tuples T and
  • 3:21 - 3:23
    U that are in relation
  • 3:23 - 3:25
    R, if T with
  • 3:25 - 3:28
    the attributes A of T equal U
  • 3:28 - 3:31
    for the attributes A of U. Again these are lists of attributes.
  • 3:31 - 3:33
    So if the two tuples agree on
  • 3:33 - 3:36
    their A values then, and
  • 3:36 - 3:37
    remember for functional dependencies it was
  • 3:37 - 3:40
    simple we just said they agreed on their B values.
  • 3:40 - 3:41
    But now it gets more complicated.
  • 3:41 - 3:44
    We're going to say that there exists
  • 3:44 - 3:48
    a third tuple V in R that has the following properties.
  • 3:48 - 3:50
    V has the same A
  • 3:50 - 3:51
    values as T and
  • 3:51 - 3:53
    U. So V sub A equals
  • 3:53 - 3:57
    T sub A, furthermore V
  • 3:57 - 3:59
    has its B value,
  • 3:59 - 4:03
    okay, drawn from T
  • 4:03 - 4:06
    so it's equal there.
  • 4:06 - 4:08
    And finally it has its
  • 4:08 - 4:10
    rest, so those are
  • 4:10 - 4:11
    all the attributes other than
  • 4:11 - 4:15
    A and B equal to U rest.
  • 4:15 - 4:19
    Okay, so that's a mouthful but let's look at that pictorially.
  • 4:19 - 4:21
    So here's our relation R and
  • 4:21 - 4:22
    we'll have the set of
  • 4:22 - 4:23
    attributes A, the set of
  • 4:23 - 4:26
    attributes B and the rest of the attributes.
  • 4:26 - 4:28
    And now let's make some tuples.
  • 4:28 - 4:29
    So let's say that this
  • 4:29 - 4:31
    is tuple T and this
  • 4:31 - 4:32
    is tuple U. And we
  • 4:32 - 4:33
    said that T and U
  • 4:33 - 4:35
    agree on their A values.
  • 4:35 - 4:37
    So they have the same A values
  • 4:37 - 4:39
    and then they don't have to have the same B values.
  • 4:39 - 4:41
    So we'll call the first one B-1
  • 4:41 - 4:43
    and the second one B-2 and
  • 4:43 - 4:45
    then for the rest we'll call this R-1 and R-2.
  • 4:45 - 4:48
    So what the multi -value dependency
  • 4:48 - 4:49
    says is that we have a
  • 4:49 - 4:51
    third tuple, V and
  • 4:51 - 4:53
    V again has the same
  • 4:53 - 4:55
    A and it has
  • 4:55 - 4:57
    its B value from tuple
  • 4:57 - 4:59
    T. So it has B-1,
  • 4:59 - 5:01
    but it has its rest value
  • 5:01 - 5:03
    from tuple U, so then
  • 5:03 - 5:05
    it has R-2 here.
  • 5:05 - 5:06
    So again what we're saying is
  • 5:06 - 5:08
    that if we have these first
  • 5:08 - 5:09
    two tuples T and U,
  • 5:09 - 5:11
    then we also have tuple
  • 5:11 - 5:13
    V. Now let me do something a little tricky.
  • 5:13 - 5:14
    Let me swap the roles of
  • 5:14 - 5:16
    T and U and show
  • 5:16 - 5:18
    that we also with this definition,
  • 5:18 - 5:19
    are guaranteed to have a
  • 5:19 - 5:21
    fourth tuple and we'll call
  • 5:21 - 5:23
    that fourth tuple W. By
  • 5:23 - 5:24
    swapping the roles of
  • 5:24 - 5:28
    T and U, W has again the same A value.
  • 5:28 - 5:29
    Now it will take its B
  • 5:29 - 5:30
    value from U and that
  • 5:30 - 5:32
    will give us B2, and
  • 5:32 - 5:35
    we'll take its rest
  • 5:35 - 5:37
    value from T and that gives us R1.
  • 5:37 - 5:40
    So what we can see
  • 5:40 - 5:41
    here is that when we
  • 5:41 - 5:43
    have the first two tuples
  • 5:43 - 5:45
    that have this particular combination of
  • 5:45 - 5:47
    B values and rest values,
  • 5:47 - 5:48
    it tells us we
  • 5:48 - 5:51
    must have the other combinations as well.
  • 5:51 - 5:54
    We must have B1 with R2,
  • 5:54 - 5:55
    and B2 with R1.
  • 5:55 - 5:56
    What it's really saying
  • 5:56 - 5:58
    is those B values and
  • 5:58 - 6:02
    the rest values are independent of each other and we'll have all combinations.
  • 6:02 - 6:03
    So that might get you thinking
  • 6:03 - 6:06
    back to our colleges and hobbies.
  • 6:06 - 6:09
    Incidentally, sometimes multi-value dependencies
  • 6:09 - 6:12
    are called tuple generating dependencies.
  • 6:12 - 6:14
    And that's because the definition is
  • 6:14 - 6:16
    is about having additional tuples
  • 6:16 - 6:18
    when you have some existing tuples,
  • 6:18 - 6:20
    unlike functional dependencies which
  • 6:20 - 6:23
    just talk about the relationships among existing tuples.
  • 6:23 - 6:25
    So let's go back to our example.
  • 6:25 - 6:28
    Now we have students applying to colleges and having hobbies.
  • 6:28 - 6:31
    Those are independent facts about the student.
  • 6:31 - 6:33
    We'll write our multi-value dependency
  • 6:33 - 6:36
    as 'social security number multi
  • 6:36 - 6:37
    determine C name' and
  • 6:37 - 6:39
    now lets use some example
  • 6:39 - 6:42
    data to see our definition and how it works here.
  • 6:42 - 6:44
    Here's our apply relation with the
  • 6:44 - 6:46
    social security number, the college
  • 6:46 - 6:48
    name and the hobby.
  • 6:48 - 6:51
    Let's suppose that we have
  • 6:51 - 6:53
    a student, 123 who's applied
  • 6:53 - 6:57
    to Stanford and plays the trumpet.
  • 6:57 - 6:59
    Now, let's suppose that same
  • 6:59 - 7:01
    student, 123, has applied
  • 7:01 - 7:04
    to Berkeley and plays tennis.
  • 7:04 - 7:07
    So what our multivalued dependency
  • 7:07 - 7:08
    says, and let's make this
  • 7:08 - 7:10
    tuple T and tuple U,
  • 7:10 - 7:12
    is that there's a further tuple
  • 7:12 - 7:14
    V. V takes the
  • 7:14 - 7:15
    same social security number and
  • 7:15 - 7:18
    it takes the first value
  • 7:18 - 7:20
    for the college name and
  • 7:20 - 7:22
    the second for the hobby.
  • 7:22 - 7:24
    It says if we have
  • 7:24 - 7:26
    a 123 playing trumpet at Stanford
  • 7:26 - 7:28
    and tennis at Berkeley, then that
  • 7:28 - 7:31
    same person will be playing tennis at Stanford.
  • 7:31 - 7:34
    Furthermore, I show that the same definition will generate automatically.
  • 7:34 - 7:36
    A fourth tuple with the
  • 7:36 - 7:39
    other combination which would be Berkley and Trumpet.
  • 7:39 - 7:40
    By the way one thing you
  • 7:40 - 7:41
    might notice here is that
  • 7:41 - 7:43
    we also have the multivalued
  • 7:43 - 7:47
    dependency, social security number multi determines hobby.
  • 7:47 - 7:48
    This is actually one of
  • 7:48 - 7:51
    the rules for multivalued dependency saying
  • 7:51 - 7:53
    that when you have A determines
  • 7:53 - 7:55
    B, then you, A multidetermines
  • 7:55 - 7:57
    B, then you also have A
  • 7:57 - 7:58
    multi determines rest and we'll
  • 7:58 - 8:01
    see some rules for multivalued dependencies later.
  • 8:01 - 8:03
    Let's look quickly at a
  • 8:03 - 8:04
    modification of our example where
  • 8:04 - 8:07
    the real world assumptions about the data are different.
  • 8:07 - 8:10
    So we still have exactly the same relation with the same attributes.
  • 8:10 - 8:12
    But let's suppose that we don't
  • 8:12 - 8:14
    want to reveal every hobby to every college.
  • 8:14 - 8:15
    Maybe we'll decide that we don't
  • 8:15 - 8:16
    want Stanford to know that
  • 8:16 - 8:18
    we're a surfer or Berkeley
  • 8:18 - 8:20
    to know that we're on the speech and debate team.
  • 8:20 - 8:22
    So if that's the case, then
  • 8:22 - 8:25
    what multivalued dependencies do we have in this relation?
  • 8:25 - 8:27
    We actually have none.
  • 8:27 - 8:29
    And we don't have any functional dependencies either by the way.
  • 8:29 - 8:31
    And is this a good design?
  • 8:31 - 8:33
    Well, actually I would argue yes.
  • 8:33 - 8:35
    In this case, this design
  • 8:35 - 8:36
    is a good one because
  • 8:36 - 8:38
    we're not going to have that
  • 8:38 - 8:40
    multiplicative effect of information.
  • 8:40 - 8:41
    Every tuple that we have
  • 8:41 - 8:43
    in the applied relation will
  • 8:43 - 8:46
    be an independent piece of important information.
  • 8:46 - 8:48
    Let's look at one more example
  • 8:48 - 8:51
    before we go on to talk about properties of multivalued dependencies.
  • 8:51 - 8:53
    I've extended the apply relation
  • 8:53 - 8:55
    now to not only include colleges
  • 8:55 - 8:56
    and hobbies but also the
  • 8:56 - 8:58
    date of application to a
  • 8:58 - 9:01
    college, and the major or majors that are being applied for.
  • 9:01 - 9:04
    Let's continue to assume that hobbies are revealed to college selectively.
  • 9:04 - 9:06
    We don't need to have same
  • 9:06 - 9:10
    hobbies for each college that a student applies to.
  • 9:10 - 9:12
    Secondly, lets assume that
  • 9:12 - 9:13
    we restrict students to apply
  • 9:13 - 9:15
    only once to each college,
  • 9:15 - 9:15
    but what I what we mean
  • 9:15 - 9:17
    by that is just on one day.
  • 9:17 - 9:19
    A student can still apply
  • 9:19 - 9:21
    to multiple majors at a
  • 9:21 - 9:24
    single college and to different majors at different colleges.
  • 9:24 - 9:26
    Let's also assume that majors
  • 9:26 - 9:29
    are independent of hobbies, which seems to make sense.
  • 9:29 - 9:30
    It takes some thinking to come
  • 9:30 - 9:32
    up with the right functional and
  • 9:32 - 9:34
    multivalued dependencies to capture these
  • 9:34 - 9:36
    constraints, but here they are.
  • 9:36 - 9:37
    The first one when we
  • 9:37 - 9:38
    say that we reveal hobbies to
  • 9:38 - 9:40
    college selectively is actually
  • 9:40 - 9:42
    the absence of a multivalued
  • 9:42 - 9:45
    dependency on hobbies and colleges.
  • 9:45 - 9:46
    The second one says as
  • 9:46 - 9:47
    we apply once to each
  • 9:47 - 9:49
    college, or on one particular
  • 9:49 - 9:51
    day to each college, so
  • 9:51 - 9:52
    that would say that when
  • 9:52 - 9:54
    we have a particular student
  • 9:54 - 9:57
    and a particular college that always
  • 9:57 - 9:59
    going to have the same date,
  • 9:59 - 10:00
    so any two tuples for
  • 10:00 - 10:04
    a student and college combination will be on the same date.
  • 10:04 - 10:06
    The last dependency that
  • 10:06 - 10:08
    we will have involves the independence
  • 10:08 - 10:09
    of the majors that are
  • 10:09 - 10:11
    being applied for and the
  • 10:11 - 10:13
    hobbies that a student has, so
  • 10:13 - 10:14
    we'll write that as the
  • 10:14 - 10:16
    multivalue dependency social security
  • 10:16 - 10:18
    number, plus college name,
  • 10:18 - 10:22
    plus date, multidetermines major,
  • 10:22 - 10:24
    and remember what that's saying
  • 10:24 - 10:25
    is that major, for a
  • 10:25 - 10:27
    given student, college, and
  • 10:27 - 10:29
    date the majors that
  • 10:29 - 10:31
    they apply for are independent
  • 10:31 - 10:33
    of what we call the rest,
  • 10:33 - 10:36
    which in this case is the hobbies.
  • 10:36 - 10:37
    So, you might take some time to
  • 10:37 - 10:38
    look at the formal definitions of
  • 10:38 - 10:41
    functional dependencies, multivalue dependencies, and
  • 10:41 - 10:42
    maybe write out some sample
  • 10:42 - 10:44
    data to convince yourself that
  • 10:44 - 10:45
    these are the dependencies that
  • 10:45 - 10:49
    are capturing the assumptions that we make about the real world.
  • 10:49 - 10:51
    Like with functional dependencies we
  • 10:51 - 10:53
    have a notion of trivial dependency
  • 10:53 - 10:55
    those that always hold we
  • 10:55 - 10:57
    also have some rule for multi valued dependencies.
  • 10:57 - 10:59
    The definition for a trivial
  • 10:59 - 11:01
    multi valued dependency A multi
  • 11:01 - 11:03
    determines B is in
  • 11:03 - 11:05
    this case, that either B
  • 11:05 - 11:08
    is a subset of A,
  • 11:08 - 11:10
    or A union B
  • 11:10 - 11:12
    are all attributes, a multi-value
  • 11:12 - 11:16
    dependency is non-trivial if that's not the case.
  • 11:16 - 11:17
    So let's take the look at
  • 11:17 - 11:20
    why these multi-value dependencies are trivial.
  • 11:20 - 11:22
    So let's start with the first
  • 11:22 - 11:24
    case where we have our
  • 11:24 - 11:26
    attributes A and the rest
  • 11:26 - 11:28
    and then attributes B are a
  • 11:28 - 11:29
    subset of A so lets
  • 11:29 - 11:31
    say that these are attributes B.
  • 11:31 - 11:33
    So what are definition of multi-valued
  • 11:33 - 11:34
    dependencies says that when we
  • 11:34 - 11:37
    have the same values for
  • 11:37 - 11:39
    A in two tuples, so
  • 11:39 - 11:40
    here A and A, then
  • 11:40 - 11:42
    we have every combination of the
  • 11:42 - 11:44
    B values and the rest,
  • 11:44 - 11:45
    well obviously we do since
  • 11:45 - 11:47
    the B's are subsets of the
  • 11:47 - 11:49
    A's here, the B values
  • 11:49 - 11:50
    are going to be the same as
  • 11:50 - 11:52
    well and we clearly have every combination.
  • 11:52 - 11:55
    For the other case of trivial multi-value dependencies.
  • 11:55 - 11:57
    We have A and B together
  • 11:57 - 11:58
    being all attributes of the
  • 11:58 - 11:59
    relation, so in that
  • 11:59 - 12:02
    case, there's no rest, so
  • 12:02 - 12:04
    clearly we have every combination
  • 12:04 - 12:06
    of values of A and
  • 12:06 - 12:09
    B and rest, because there's no rest to combine with.
  • 12:09 - 12:11
    Like with functional dependencies there are
  • 12:11 - 12:14
    a whole bunch of rules that hold for multi-valued dependencies.
  • 12:14 - 12:15
    We will just talk about three of
  • 12:15 - 12:16
    them, and the first one is
  • 12:16 - 12:18
    the most important and interesting,
  • 12:18 - 12:20
    and that's the rule that says
  • 12:20 - 12:21
    if we have a functional dependency
  • 12:21 - 12:23
    from A to B then we
  • 12:23 - 12:26
    also have a multi-valued dependency
  • 12:26 - 12:27
    from A to B. And I'm
  • 12:27 - 12:29
    gonna go ahead and prove that rule
  • 12:29 - 12:32
    for you again because this is an important one.
  • 12:32 - 12:33
    I'm going do this proof using a
  • 12:33 - 12:35
    template for the relation similar to
  • 12:35 - 12:37
    the what I did with rules for functional dependencies.
  • 12:37 - 12:38
    So let's say we have our
  • 12:38 - 12:40
    A attributes, our B attributes,
  • 12:40 - 12:42
    and our rest, and what
  • 12:42 - 12:43
    we need to prove, to prove
  • 12:43 - 12:46
    the multi-value dependencies, is when
  • 12:46 - 12:46
    there are tuples T and
  • 12:46 - 12:48
    U with a certain form,
  • 12:48 - 12:51
    there exists a tuple V of another form.
  • 12:51 - 12:54
    So let's fill in some values first for the tuples.
  • 12:54 - 12:55
    So Let's say that we
  • 12:55 - 12:56
    have A and A here,
  • 12:56 - 12:58
    that's what we need for the
  • 12:58 - 13:00
    equality of the A values.
  • 13:00 - 13:02
    Then we have B1 and R1,
  • 13:02 - 13:05
    and we have B2 and
  • 13:05 - 13:06
    R2, and in order
  • 13:06 - 13:09
    to prove this multi-value dependency,
  • 13:09 - 13:10
    I need to prove that there
  • 13:10 - 13:12
    exists a tuple V that has
  • 13:12 - 13:13
    the same A value that it
  • 13:13 - 13:16
    has B1 from tuple T
  • 13:16 - 13:18
    and R2 from Tuple U,
  • 13:18 - 13:20
    and what I have in order
  • 13:20 - 13:21
    to prove that is the fact
  • 13:21 - 13:23
    that we have a functional dependency from
  • 13:23 - 13:25
    A to B. Because we
  • 13:25 - 13:27
    have the functional dependencies and because
  • 13:27 - 13:29
    T and U have the same A value.
  • 13:29 - 13:30
    What that tells us is
  • 13:30 - 13:33
    that B1 equals B2 here.
  • 13:33 - 13:35
    And so if B1 equals B2
  • 13:35 - 13:37
    then we know that this value
  • 13:37 - 13:39
    B1 here is equivalent
  • 13:39 - 13:41
    to B2 and in order
  • 13:41 - 13:43
    to prove the existence of this
  • 13:43 - 13:44
    tuple well we have that tuple
  • 13:44 - 13:46
    here already and we're done.
  • 13:46 - 13:47
    So you might check that again,
  • 13:47 - 13:48
    but what that says is
  • 13:48 - 13:50
    using the knowledge of a
  • 13:50 - 13:52
    functional dependency we can prove
  • 13:52 - 13:53
    that we always have a corresponding
  • 13:53 - 13:56
    multivalued dependency there are
  • 13:56 - 13:57
    a couple more rules for
  • 13:57 - 13:58
    multivalued dependencies that you can
  • 13:58 - 14:01
    prove for yourself if you're so inclined.
  • 14:01 - 14:02
    The first one is the intersection
  • 14:02 - 14:03
    rule, it says that if
  • 14:03 - 14:06
    we have A multi determines
  • 14:06 - 14:08
    B and A multi determines
  • 14:08 - 14:10
    C, then we have
  • 14:10 - 14:13
    A multi determines B
  • 14:13 - 14:15
    intersects C. The transitive
  • 14:15 - 14:18
    rule is slightly different than from exact transitivity.
  • 14:18 - 14:19
    What it says is if
  • 14:19 - 14:22
    we have A multi determine B,
  • 14:22 - 14:24
    and we have B multi determines
  • 14:24 - 14:26
    C then we have
  • 14:26 - 14:28
    A multi determined not
  • 14:28 - 14:31
    C exactly but C minus
  • 14:31 - 14:32
    B.
    And you might work
  • 14:32 - 14:33
    some examples because it yourself
  • 14:33 - 14:35
    why we don't have just
  • 14:35 - 14:36
    A multi determines B and
  • 14:36 - 14:40
    to subtract the attributes for B, although it's fairly complicated.
  • 14:40 - 14:41
    So again these rules can
  • 14:41 - 14:42
    be proven and there are many
  • 14:42 - 14:44
    other rules of multivalued dependencies
  • 14:44 - 14:48
    that you can read about in any of the readings provided on our website.
  • 14:48 - 14:50
    By the way, regarding rules, let's
  • 14:50 - 14:51
    come back to the fact
  • 14:51 - 14:54
    that every functional dependency is a multivalued dependency.
  • 14:54 - 14:55
    So we can use another Venn diagram.
  • 14:55 - 14:57
    This is different than our previous one.
  • 14:57 - 14:59
    We can list all of our
  • 14:59 - 15:01
    multivalued dependencies here and the
  • 15:01 - 15:02
    functional dependencies are a
  • 15:02 - 15:04
    subset of those, so what
  • 15:04 - 15:06
    that tells us is if we
  • 15:06 - 15:07
    ever have a rule that applies
  • 15:07 - 15:10
    for multivalued dependencies here, that
  • 15:10 - 15:12
    will cover the entire Ven diagram
  • 15:12 - 15:15
    and so that rule will apply for functional dependencies as well.
  • 15:15 - 15:16
    So every rule for MVDs is
  • 15:16 - 15:19
    also a rule for functional dependencies.
  • 15:19 - 15:21
    On the other hand if we
  • 15:21 - 15:22
    have a rule that applies
  • 15:22 - 15:24
    for functional dependencies that rule
  • 15:24 - 15:26
    does not necessarily have to
  • 15:26 - 15:29
    apply all multivalued dependencies because
  • 15:29 - 15:32
    it might be specialized just for this portion of the Venn diagram.
  • 15:32 - 15:35
    So an example of such a rule is the splitting rule.
  • 15:35 - 15:36
    The splitting rule is a
  • 15:36 - 15:38
    rule that applies to functional dependencies,
  • 15:38 - 15:41
    but does not always apply to multivalued dependencies.
  • 15:41 - 15:45
    And again you could work an example to convince yourself of that fact.
  • 15:45 - 15:45
    Woo.
  • 15:45 - 15:47
    So after all that set up
  • 15:47 - 15:48
    of multivalue dependencies, we're finally
  • 15:48 - 15:51
    ready to talk about fourth normal form.
  • 15:51 - 15:52
    The definition of fourth normal
  • 15:52 - 15:55
    form looks very similar to the one for Boyce-Codd normal form.
  • 15:55 - 15:56
    Says we take a relation and
  • 15:56 - 15:57
    we take now a set of
  • 15:57 - 15:59
    multivalued dependencies for that
  • 15:59 - 16:01
    relation and the relation
  • 16:01 - 16:03
    is in fourth normal form if
  • 16:03 - 16:04
    every non-trivial
  • 16:04 - 16:07
    multivalued dependency has on
  • 16:07 - 16:09
    it's left hand side a key.
  • 16:09 - 16:10
    Remember for functional dependencies it
  • 16:10 - 16:12
    looks exactly the same except
  • 16:12 - 16:16
    we have the functional dependency all here instead of multivalued dependencies.
  • 16:16 - 16:17
    So, let's see exactly what fourth
  • 16:17 - 16:20
    normal form telling us and why it's a good thing.
  • 16:20 - 16:21
    So we have A, B, and
  • 16:21 - 16:23
    rest as usual and let's
  • 16:23 - 16:27
    suppose that we have a non trivial multivalued dependency.
  • 16:27 - 16:28
    So that's telling us that
  • 16:28 - 16:30
    if we have 2 tuples, T
  • 16:30 - 16:32
    and U and we'll
  • 16:32 - 16:33
    put in some values for B
  • 16:33 - 16:35
    and the rest, then we're going
  • 16:35 - 16:37
    to have the combination of those, as well.
  • 16:37 - 16:39
    So, that's kind of the proliferation of
  • 16:39 - 16:40
    tuples we get when we
  • 16:40 - 16:43
    squish independent facts in the same relation.
  • 16:43 - 16:44
    But, if the left
  • 16:44 - 16:47
    side is a key, so if
  • 16:47 - 16:48
    the A attributes are
  • 16:48 - 16:50
    a key here then we won't have
  • 16:50 - 16:51
    those 2 tuples and will
  • 16:51 - 16:54
    never have to worry about the proliferation.
  • 16:54 - 16:55
    Now, remember that I said fourth
  • 16:55 - 16:58
    normal form implies Boyce-Codd Normal Form.
  • 16:58 - 16:59
    Or if you prefer it in
  • 16:59 - 17:02
    Venn diagram format, Fourth
  • 17:02 - 17:04
    Normal Form is stronger than
  • 17:04 - 17:07
    Boyce-Codd Normal Form and let's see why that's the case.
  • 17:07 - 17:09
    If we have a fourth
  • 17:09 - 17:10
    normal form and we want
  • 17:10 - 17:11
    to show that we're in Boyce-Codd normal
  • 17:11 - 17:13
    form, then we have to
  • 17:13 - 17:14
    show that if we have
  • 17:14 - 17:17
    a functional dependency then
  • 17:17 - 17:19
    the left hand side A is a key.
  • 17:19 - 17:21
    That would tell us we're in Boyce-Codd normal form.
  • 17:21 - 17:22
    Well, if we have a functional
  • 17:22 - 17:24
    dependency, we had a rule
  • 17:24 - 17:25
    that tells us we also have
  • 17:25 - 17:27
    the multivalued dependency and then
  • 17:27 - 17:30
    since we're in fourth normal form, we get that A as a key.
  • 17:30 - 17:32
    So again, fourth normal form
  • 17:32 - 17:35
    implies Boyce-Codd normal form.
  • 17:35 - 17:35
    Now let's take a look at
  • 17:35 - 17:38
    the decomposition of algorithm into fourth normal form.
  • 17:38 - 17:40
    It's extremely similar to the
  • 17:40 - 17:42
    BCNF decomposition algorithm.
  • 17:42 - 17:43
    The input is a relation.
  • 17:43 - 17:45
    A set of functional dependencies
  • 17:45 - 17:46
    and multi value dependencies and we
  • 17:46 - 17:48
    need to separate them because
  • 17:48 - 17:50
    we use the functional dependencies to find keys.
  • 17:50 - 17:52
    The output is a decomposition
  • 17:52 - 17:54
    of R into good relations,
  • 17:54 - 17:56
    in this case fourth normal form,
  • 17:56 - 17:58
    and it's a good decomposition in
  • 17:58 - 18:02
    the sense that reassembling the relations gives you back the original.
  • 18:02 - 18:03
    As with Boyce-Codd normal form
  • 18:03 - 18:05
    we start by computing keys using
  • 18:05 - 18:07
    the functional dependencies, and then
  • 18:07 - 18:08
    we repeat the decomposition process
  • 18:08 - 18:11
    until all of our relations are in fourth normal
  • 18:11 - 18:12
    form.
  • 18:12 - 18:14
    Just as with functional dependencies
  • 18:14 - 18:15
    in BCNF, we pick a relation
  • 18:15 - 18:17
    that has a violating dependency, in
  • 18:17 - 18:19
    this case a multi-value dependency, and
  • 18:19 - 18:22
    we split the relation based on that dependency.
  • 18:22 - 18:24
    So we create one relation that
  • 18:24 - 18:25
    has the attributes of the dependency
  • 18:25 - 18:27
    and another relation that has
  • 18:27 - 18:30
    the left-hand side of the dependency and the rest of the attributes.
  • 18:30 - 18:31
    After that, we need to
  • 18:31 - 18:33
    compute the functional dependencies for
  • 18:33 - 18:35
    the decomposed relation and the
  • 18:35 - 18:39
    multi-value dependencies for it, and then we can compute the keys.
  • 18:39 - 18:41
    Now finding these multi-value
  • 18:41 - 18:45
    dependencies is actually a fairly complex process.
  • 18:45 - 18:47
    Usually it's very intuitive,
  • 18:47 - 18:48
    but I'm going to refer you
  • 18:48 - 18:50
    to the readings to read about the algorithm itself.
  • 18:50 - 18:51
    And in fact, it can be
  • 18:51 - 18:53
    so complicated in the general
  • 18:53 - 18:56
    case that some of the readings don't even provide the algorithm.
  • 18:56 - 18:59
    But again, in general, it's very intuitive.
  • 18:59 - 19:01
    Our first example is going to be very fast to do.
  • 19:01 - 19:03
    As you remember, this example has
  • 19:03 - 19:05
    one multi-value dependency - social
  • 19:05 - 19:07
    security number determines college name,
  • 19:07 - 19:09
    multi determines college name -
  • 19:09 - 19:12
    and it has no keys other than all of the attributes.
  • 19:12 - 19:13
    So obviously, this is a
  • 19:13 - 19:16
    violating multi value dependency,
  • 19:16 - 19:18
    and so we decompose into two
  • 19:18 - 19:21
    relations, we'll call them A1 and A2.
  • 19:21 - 19:23
    The first one has the attributes
  • 19:23 - 19:24
    of the multivalue dependency, the
  • 19:24 - 19:26
    social security number and
  • 19:26 - 19:28
    the college name, and the second
  • 19:28 - 19:29
    one has the left hand
  • 19:29 - 19:31
    side multivalued dependency as well
  • 19:31 - 19:35
    as all the remaining attributes, which in this case is the hobby.
  • 19:35 - 19:37
    These two decomposed relations actually
  • 19:37 - 19:39
    have no FDs and no
  • 19:39 - 19:41
    MVDs so in that
  • 19:41 - 19:42
    case we're definitely in 4th
  • 19:42 - 19:44
    normal form and we're done
  • 19:44 - 19:46
    with the decomposition and I think
  • 19:46 - 19:47
    we can agree that this looks like
  • 19:47 - 19:49
    a good schema for the data at hand.
  • 19:49 - 19:52
    Our second example is quite a bit more complicated.
  • 19:52 - 19:54
    Remember in this example we
  • 19:54 - 19:55
    have that the social
  • 19:55 - 19:57
    security number and college name
  • 19:57 - 19:59
    functionally determine date.
  • 19:59 - 20:01
    That means we have each student
  • 20:01 - 20:03
    applies to each college on a specific date.
  • 20:03 - 20:05
    And secondly, we assume that
  • 20:05 - 20:08
    majors that were being applied for were independent of hobbies.
  • 20:08 - 20:10
    So we have social security number,
  • 20:10 - 20:13
    college name and date
  • 20:13 - 20:15
    multi determines the major.
  • 20:15 - 20:19
    And incidentally that would mean it multi determines the hobby too.
  • 20:19 - 20:21
    Once again, we have no keys
  • 20:21 - 20:24
    for the relation, except for all attributes.
  • 20:24 - 20:26
    So we have both a violating
  • 20:26 - 20:28
    functional dependency in this case
  • 20:28 - 20:31
    and we have a violating multivalue dependency.
  • 20:31 - 20:33
    Let's use the multivalue dependency for
  • 20:33 - 20:35
    our first decomposition step.
  • 20:35 - 20:38
    So we'll create A1 and A2.
  • 20:38 - 20:43
    A1 will contain all the attributes of our multivalued dependency.
  • 20:43 - 20:45
    And then A2 will contain
  • 20:45 - 20:47
    all the remaining attributes along with
  • 20:47 - 20:51
    the left hand side of our multivalue dependency.
  • 20:51 - 20:55
    And that turns out to be all of the attributes except the major.
  • 20:55 - 20:56
    Now let's look at our
  • 20:56 - 20:57
    decomposed relations and see
  • 20:57 - 20:58
    what we have in terms of
  • 20:58 - 21:02
    functional dependencies and multi-value dependencies for them.
  • 21:02 - 21:03
    So after the decomposition, we don't
  • 21:03 - 21:05
    have any more multivalued dependency but
  • 21:05 - 21:07
    our functional dependency actually applies
  • 21:07 - 21:09
    to both of the decomposed
  • 21:09 - 21:11
    relations and we still
  • 21:11 - 21:13
    don't have a key on the left hand side.
  • 21:13 - 21:17
    So we need to decompose further based on the first functional dependency.
  • 21:17 - 21:19
    So let's start by decomposing A1.
  • 21:19 - 21:22
    We'll turn A1 into A3
  • 21:22 - 21:24
    and A4, and A3 will have
  • 21:24 - 21:28
    the functional dependency, all three attributes.
  • 21:28 - 21:29
    And then A4 will have the
  • 21:29 - 21:30
    left side of the functional
  • 21:30 - 21:32
    dependency and the remaining
  • 21:32 - 21:35
    attributes, which in this case is the major.
  • 21:35 - 21:37
    So now we're finished with A1
  • 21:37 - 21:39
    and we have a similar problem with A2.
  • 21:39 - 21:41
    And so we decompose A2
  • 21:41 - 21:43
    similarly, although we'll discover
  • 21:43 - 21:45
    that A3 is the same relation in
  • 21:45 - 21:48
    the decomposition of A2 as we got with A1.
  • 21:48 - 21:50
    So we actually only add one
  • 21:50 - 21:52
    more relation now, which is A5.
  • 21:52 - 21:54
    That contains the social
  • 21:54 - 21:55
    security number and the college
  • 21:55 - 21:56
    name from the left side of
  • 21:56 - 21:58
    the functional dependency and the
  • 21:58 - 22:00
    hobby, which is the remaining attribute.
  • 22:00 - 22:04
    And then we cross out A2.
  • 22:04 - 22:06
    Now the only functional dependencies are multi-value dependencies we have
  • 22:06 - 22:08
    left do have a key on the left-hand side.
  • 22:08 - 22:10
    I'll let you verify that for yourself.
  • 22:10 - 22:11
    And these three relations are our
  • 22:11 - 22:14
    final decomposition into 4th normal form.
  • 22:14 - 22:15
    And I think you will agree
  • 22:15 - 22:17
    that this is a good design
  • 22:17 - 22:19
    again for the data at hand.
  • 22:19 - 22:21
    So let's wrap up this long
  • 22:21 - 22:24
    unit on on dependencies and normal forms with a quick summary.
  • 22:24 - 22:27
    If we have a relation RABC, a
  • 22:27 - 22:29
    functional dependency from A
  • 22:29 - 22:30
    to B tells us that
  • 22:30 - 22:32
    when we have the same A values,
  • 22:32 - 22:34
    we have the same B values,
  • 22:34 - 22:36
    and the Boyce-Codd normal form tells
  • 22:36 - 22:38
    us to factor that...those attributes
  • 22:38 - 22:40
    into their own relation so that
  • 22:40 - 22:43
    we don't repeat that relationship over and over.
  • 22:43 - 22:45
    For multi-value dependencies, let's say
  • 22:45 - 22:47
    we have the relation RABCD,
  • 22:47 - 22:49
    and if we have
  • 22:49 - 22:51
    the multi-value dependency A multi
  • 22:51 - 22:53
    determines B, what that tells
  • 22:53 - 22:55
    us is that we have every combination
  • 22:55 - 22:57
    for a given A of B
  • 22:57 - 22:59
    values and CD values
  • 22:59 - 23:01
    - we called those rest earlier -
  • 23:01 - 23:03
    and when we have that multiplicative effect
  • 23:03 - 23:05
    of combinations, again, we take
  • 23:05 - 23:07
    the A and B attributes and
  • 23:07 - 23:08
    we put them in a separate
  • 23:08 - 23:09
    relation so that we can
  • 23:09 - 23:10
    separate out those facts from
  • 23:10 - 23:15
    the independent fact of A and its CD values.
  • 23:15 - 23:17
    Finally, in the design process,
  • 23:17 - 23:19
    multi-value dependencies are something
  • 23:19 - 23:22
    we add to functional dependencies, only they're stronger.
  • 23:22 - 23:23
    So fourth normal form is
  • 23:23 - 23:27
    a stronger property than Boyce-Codd normal form.
  • 23:27 - 23:29
    Now usually this design process
  • 23:29 - 23:30
    works very well and is
  • 23:30 - 23:34
    very intuitive for many schemas, I hope for the examples that I gave here.
  • 23:34 - 23:35
    But there are actually a few
  • 23:35 - 23:37
    shortcomings to using Boyce-Codd
  • 23:37 - 23:38
    Normal Form or Fourth Normal Form
  • 23:38 -
    and we'll cover those in the next video.
Title:
07-04-multivalued-dependencies.mp4
Video Language:
English
Duration:
23:42
Amara Bot edited English subtitles for 07-04-multivalued-dependencies.mp4
Amara Bot added a translation

English subtitles

Revisions