07-04-multivalued-dependencies.mp4

Edit subtitles

0:00 - 0:01

This video continues the topic
0:01 - 0:03

of relational design, talking specifically
0:03 - 0:07

about multivalued dependencies and Fourth Normal Form.
0:07 - 0:08

I know I've reminded you many times
0:08 - 0:10

about relational designed by decomposition,
0:10 - 0:12

so let me do it very quickly.
0:12 - 0:14

The designer specifies mega relations
0:14 - 0:15

with all the information they
0:15 - 0:18

want to store and properties of the data.
0:18 - 0:19

The system decomposes the mega
0:19 - 0:21

relations into smaller relations that
0:21 - 0:25

have good properties -- no anomalies and no lost information.
0:25 - 0:27

When we have functional dependencies, as
0:27 - 0:28

are properties of the data, we
0:28 - 0:30

get Boyce-Codd Normal form, and
0:30 - 0:32

then when we add to the
0:32 - 0:34

functional dependencies multi-value dependencies
0:34 - 0:36

we get fourth normal form.
0:36 - 0:38

And the specification of multi-
0:38 - 0:40

value dependencies and decomposition into
0:40 - 0:43

Fourth Normal Form is the topic of this video.
0:43 - 0:44

As a reminder from earlier,
0:44 - 0:46

Fourth Normal Form is stronger
0:46 - 0:48

than Boyce-Codd Normal Form so
0:48 - 0:50

if we have here all of
0:50 - 0:51

the relations that are in Boyce-
0:51 - 0:54

Codd normal form, some subset
0:54 - 0:59

of those are also in fourth normal form.
0:59 - 1:01

When we have functional dependencies, we can guarantee Boyce Codd normal
1:01 - 1:03

form and then when we add
1:03 - 1:04

multi-value dependencies that's what
1:04 - 1:06

allows us to narrow down to
1:06 - 1:08

the stronger property of fourth normal form.
1:08 - 1:10

So let's start with an example.
1:10 - 1:13

We have information about students applying to colleges.
1:13 - 1:15

The student is identified by their social security number.
1:15 - 1:17

They may apply to several colleges
1:17 - 1:18

and in this video we're not going
1:18 - 1:20

to have college states, just college names.
1:20 - 1:22

We'll assume they're unique.
1:22 - 1:24

And then the student may have hobbies.
1:24 - 1:25

And they may apply, as I've said,
1:25 - 1:26

to several colleges and have several
1:26 - 1:29

hobbies, but let's assume for now those are independent.
1:29 - 1:33

So do we have any functional dependencies for this relation?
1:33 - 1:35

Actually we don't have any all.
1:35 - 1:37

The social security number does
1:37 - 1:38

not determine the college name
1:38 - 1:41

or the hobby or anything in the other direction.
1:41 - 1:43

With no functional dependencies, the
1:43 - 1:44

only key for the relation
1:44 - 1:47

then is all attributes of the relation.
1:47 - 1:50

So is this relation in Boyce-Codd Normal Form?
1:50 - 1:51

As you might remember from the
1:51 - 1:52

previous video, Boyce-Codd
1:52 - 1:54

Normal Form says all functional
1:54 - 1:56

dependencies have a key on the left hand side.
1:56 - 1:59

Well, since we have no functional dependencies then the answer is yes.
1:59 - 2:02

It is in Boyce-Codd normal form.
2:02 - 2:04

On the other hand do we think this is a good design?
2:04 - 2:08

I'm going to say no this is not a good design Why not?
2:08 - 2:09

Well, let's suppose that somebody
2:09 - 2:12

applies to five colleges
2:12 - 2:14

and they have, say, six hobbies.
2:14 - 2:16

Then to have all
2:16 - 2:18

combinations of colleges and
2:18 - 2:19

hobbies that would yield 30
2:19 - 2:22

tuples in the relation and clearly that's not a good idea.
2:22 - 2:24

We'd rather separate the college
2:24 - 2:27

and hobbies if they are independent.
2:27 - 2:29

So the separation of independent facts
2:29 - 2:31

is what fourth normal form is about.
2:31 - 2:34

And now let's get a little bit more formal.
2:34 - 2:36

Like functional dependencies, multivalued dependencies
2:36 - 2:38

are specified based on knowledge
2:38 - 2:40

of the real world constraints on
2:40 - 2:42

the data being captured and all
2:42 - 2:43

instances of a relation
2:43 - 2:47

with a multivalued dependency must adhere to the dependency.
2:47 - 2:50

Now let's define exactly what a multi value dependency is.
2:50 - 2:52

For relation R we write
2:52 - 2:54

a multi value dependency using
2:54 - 2:56

a double headed arrow and
2:56 - 2:58

we say 'A' multi determines 'B'.
2:58 - 3:00

In this case, again, with 'A'
3:00 - 3:03

and 'B' possibly being sets of attributes, so that would be A
3:03 - 3:05

one through A N
3:05 - 3:06

and B one through B
3:06 - 3:10

M, which I'm abbreviating with A bar and B bar.
3:10 - 3:12

So let me write the formal definition of A multi
3:12 - 3:14

determines B. Again using first
3:14 - 3:15

order logic similarly to what
3:15 - 3:19

we did with functional dependencies but this one's a bit more complicated.
3:19 - 3:21

It says for all tuples T and
3:21 - 3:23

U that are in relation
3:23 - 3:25

R, if T with
3:25 - 3:28

the attributes A of T equal U
3:28 - 3:31

for the attributes A of U. Again these are lists of attributes.
3:31 - 3:33

So if the two tuples agree on
3:33 - 3:36

their A values then, and
3:36 - 3:37

remember for functional dependencies it was
3:37 - 3:40

simple we just said they agreed on their B values.
3:40 - 3:41

But now it gets more complicated.
3:41 - 3:44

We're going to say that there exists
3:44 - 3:48

a third tuple V in R that has the following properties.
3:48 - 3:50

V has the same A
3:50 - 3:51

values as T and
3:51 - 3:53

U. So V sub A equals
3:53 - 3:57

T sub A, furthermore V
3:57 - 3:59

has its B value,
3:59 - 4:03

okay, drawn from T
4:03 - 4:06

so it's equal there.
4:06 - 4:08

And finally it has its
4:08 - 4:10

rest, so those are
4:10 - 4:11

all the attributes other than
4:11 - 4:15

A and B equal to U rest.
4:15 - 4:19

Okay, so that's a mouthful but let's look at that pictorially.
4:19 - 4:21

So here's our relation R and
4:21 - 4:22

we'll have the set of
4:22 - 4:23

attributes A, the set of
4:23 - 4:26

attributes B and the rest of the attributes.
4:26 - 4:28

And now let's make some tuples.
4:28 - 4:29

So let's say that this
4:29 - 4:31

is tuple T and this
4:31 - 4:32

is tuple U. And we
4:32 - 4:33

said that T and U
4:33 - 4:35

agree on their A values.
4:35 - 4:37

So they have the same A values
4:37 - 4:39

and then they don't have to have the same B values.
4:39 - 4:41

So we'll call the first one B-1
4:41 - 4:43

and the second one B-2 and
4:43 - 4:45

then for the rest we'll call this R-1 and R-2.
4:45 - 4:48

So what the multi -value dependency
4:48 - 4:49

says is that we have a
4:49 - 4:51

third tuple, V and
4:51 - 4:53

V again has the same
4:53 - 4:55

A and it has
4:55 - 4:57

its B value from tuple
4:57 - 4:59

T. So it has B-1,
4:59 - 5:01

but it has its rest value
5:01 - 5:03

from tuple U, so then
5:03 - 5:05

it has R-2 here.
5:05 - 5:06

So again what we're saying is
5:06 - 5:08

that if we have these first
5:08 - 5:09

two tuples T and U,
5:09 - 5:11

then we also have tuple
5:11 - 5:13

V. Now let me do something a little tricky.
5:13 - 5:14

Let me swap the roles of
5:14 - 5:16

T and U and show
5:16 - 5:18

that we also with this definition,
5:18 - 5:19

are guaranteed to have a
5:19 - 5:21

fourth tuple and we'll call
5:21 - 5:23

that fourth tuple W. By
5:23 - 5:24

swapping the roles of
5:24 - 5:28

T and U, W has again the same A value.
5:28 - 5:29

Now it will take its B
5:29 - 5:30

value from U and that
5:30 - 5:32

will give us B2, and
5:32 - 5:35

we'll take its rest
5:35 - 5:37

value from T and that gives us R1.
5:37 - 5:40

So what we can see
5:40 - 5:41

here is that when we
5:41 - 5:43

have the first two tuples
5:43 - 5:45

that have this particular combination of
5:45 - 5:47

B values and rest values,
5:47 - 5:48

it tells us we
5:48 - 5:51

must have the other combinations as well.
5:51 - 5:54

We must have B1 with R2,
5:54 - 5:55

and B2 with R1.
5:55 - 5:56

What it's really saying
5:56 - 5:58

is those B values and
5:58 - 6:02

the rest values are independent of each other and we'll have all combinations.
6:02 - 6:03

So that might get you thinking
6:03 - 6:06

back to our colleges and hobbies.
6:06 - 6:09

Incidentally, sometimes multi-value dependencies
6:09 - 6:12

are called tuple generating dependencies.
6:12 - 6:14

And that's because the definition is
6:14 - 6:16

is about having additional tuples
6:16 - 6:18

when you have some existing tuples,
6:18 - 6:20

unlike functional dependencies which
6:20 - 6:23

just talk about the relationships among existing tuples.
6:23 - 6:25

So let's go back to our example.
6:25 - 6:28

Now we have students applying to colleges and having hobbies.
6:28 - 6:31

Those are independent facts about the student.
6:31 - 6:33

We'll write our multi-value dependency
6:33 - 6:36

as 'social security number multi
6:36 - 6:37

determine C name' and
6:37 - 6:39

now lets use some example
6:39 - 6:42

data to see our definition and how it works here.
6:42 - 6:44

Here's our apply relation with the
6:44 - 6:46

social security number, the college
6:46 - 6:48

name and the hobby.
6:48 - 6:51

Let's suppose that we have
6:51 - 6:53

a student, 123 who's applied
6:53 - 6:57

to Stanford and plays the trumpet.
6:57 - 6:59

Now, let's suppose that same
6:59 - 7:01

student, 123, has applied
7:01 - 7:04

to Berkeley and plays tennis.
7:04 - 7:07

So what our multivalued dependency
7:07 - 7:08

says, and let's make this
7:08 - 7:10

tuple T and tuple U,
7:10 - 7:12

is that there's a further tuple
7:12 - 7:14

V. V takes the
7:14 - 7:15

same social security number and
7:15 - 7:18

it takes the first value
7:18 - 7:20

for the college name and
7:20 - 7:22

the second for the hobby.
7:22 - 7:24

It says if we have
7:24 - 7:26

a 123 playing trumpet at Stanford
7:26 - 7:28

and tennis at Berkeley, then that
7:28 - 7:31

same person will be playing tennis at Stanford.
7:31 - 7:34

Furthermore, I show that the same definition will generate automatically.
7:34 - 7:36

A fourth tuple with the
7:36 - 7:39

other combination which would be Berkley and Trumpet.
7:39 - 7:40

By the way one thing you
7:40 - 7:41

might notice here is that
7:41 - 7:43

we also have the multivalued
7:43 - 7:47

dependency, social security number multi determines hobby.
7:47 - 7:48

This is actually one of
7:48 - 7:51

the rules for multivalued dependency saying
7:51 - 7:53

that when you have A determines
7:53 - 7:55

B, then you, A multidetermines
7:55 - 7:57

B, then you also have A
7:57 - 7:58

multi determines rest and we'll
7:58 - 8:01

see some rules for multivalued dependencies later.
8:01 - 8:03

Let's look quickly at a
8:03 - 8:04

modification of our example where
8:04 - 8:07

the real world assumptions about the data are different.
8:07 - 8:10

So we still have exactly the same relation with the same attributes.
8:10 - 8:12

But let's suppose that we don't
8:12 - 8:14

want to reveal every hobby to every college.
8:14 - 8:15

Maybe we'll decide that we don't
8:15 - 8:16

want Stanford to know that
8:16 - 8:18

we're a surfer or Berkeley
8:18 - 8:20

to know that we're on the speech and debate team.
8:20 - 8:22

So if that's the case, then
8:22 - 8:25

what multivalued dependencies do we have in this relation?
8:25 - 8:27

We actually have none.
8:27 - 8:29

And we don't have any functional dependencies either by the way.
8:29 - 8:31

And is this a good design?
8:31 - 8:33

Well, actually I would argue yes.
8:33 - 8:35

In this case, this design
8:35 - 8:36

is a good one because
8:36 - 8:38

we're not going to have that
8:38 - 8:40

multiplicative effect of information.
8:40 - 8:41

Every tuple that we have
8:41 - 8:43

in the applied relation will
8:43 - 8:46

be an independent piece of important information.
8:46 - 8:48

Let's look at one more example
8:48 - 8:51

before we go on to talk about properties of multivalued dependencies.
8:51 - 8:53

I've extended the apply relation
8:53 - 8:55

now to not only include colleges
8:55 - 8:56

and hobbies but also the
8:56 - 8:58

date of application to a
8:58 - 9:01

college, and the major or majors that are being applied for.
9:01 - 9:04

Let's continue to assume that hobbies are revealed to college selectively.
9:04 - 9:06

We don't need to have same
9:06 - 9:10

hobbies for each college that a student applies to.
9:10 - 9:12

Secondly, lets assume that
9:12 - 9:13

we restrict students to apply
9:13 - 9:15

only once to each college,
9:15 - 9:15

but what I what we mean
9:15 - 9:17

by that is just on one day.
9:17 - 9:19

A student can still apply
9:19 - 9:21

to multiple majors at a
9:21 - 9:24

single college and to different majors at different colleges.
9:24 - 9:26

Let's also assume that majors
9:26 - 9:29

are independent of hobbies, which seems to make sense.
9:29 - 9:30

It takes some thinking to come
9:30 - 9:32

up with the right functional and
9:32 - 9:34

multivalued dependencies to capture these
9:34 - 9:36

constraints, but here they are.
9:36 - 9:37

The first one when we
9:37 - 9:38

say that we reveal hobbies to
9:38 - 9:40

college selectively is actually
9:40 - 9:42

the absence of a multivalued
9:42 - 9:45

dependency on hobbies and colleges.
9:45 - 9:46

The second one says as
9:46 - 9:47

we apply once to each
9:47 - 9:49

college, or on one particular
9:49 - 9:51

day to each college, so
9:51 - 9:52

that would say that when
9:52 - 9:54

we have a particular student
9:54 - 9:57

and a particular college that always
9:57 - 9:59

going to have the same date,
9:59 - 10:00

so any two tuples for
10:00 - 10:04

a student and college combination will be on the same date.
10:04 - 10:06

The last dependency that
10:06 - 10:08

we will have involves the independence
10:08 - 10:09

of the majors that are
10:09 - 10:11

being applied for and the
10:11 - 10:13

hobbies that a student has, so
10:13 - 10:14

we'll write that as the
10:14 - 10:16

multivalue dependency social security
10:16 - 10:18

number, plus college name,
10:18 - 10:22

plus date, multidetermines major,
10:22 - 10:24

and remember what that's saying
10:24 - 10:25

is that major, for a
10:25 - 10:27

given student, college, and
10:27 - 10:29

date the majors that
10:29 - 10:31

they apply for are independent
10:31 - 10:33

of what we call the rest,
10:33 - 10:36

which in this case is the hobbies.
10:36 - 10:37

So, you might take some time to
10:37 - 10:38

look at the formal definitions of
10:38 - 10:41

functional dependencies, multivalue dependencies, and
10:41 - 10:42

maybe write out some sample
10:42 - 10:44

data to convince yourself that
10:44 - 10:45

these are the dependencies that
10:45 - 10:49

are capturing the assumptions that we make about the real world.
10:49 - 10:51

Like with functional dependencies we
10:51 - 10:53

have a notion of trivial dependency
10:53 - 10:55

those that always hold we
10:55 - 10:57

also have some rule for multi valued dependencies.
10:57 - 10:59

The definition for a trivial
10:59 - 11:01

multi valued dependency A multi
11:01 - 11:03

determines B is in
11:03 - 11:05

this case, that either B
11:05 - 11:08

is a subset of A,
11:08 - 11:10

or A union B
11:10 - 11:12

are all attributes, a multi-value
11:12 - 11:16

dependency is non-trivial if that's not the case.
11:16 - 11:17

So let's take the look at
11:17 - 11:20

why these multi-value dependencies are trivial.
11:20 - 11:22

So let's start with the first
11:22 - 11:24

case where we have our
11:24 - 11:26

attributes A and the rest
11:26 - 11:28

and then attributes B are a
11:28 - 11:29

subset of A so lets
11:29 - 11:31

say that these are attributes B.
11:31 - 11:33

So what are definition of multi-valued
11:33 - 11:34

dependencies says that when we
11:34 - 11:37

have the same values for
11:37 - 11:39

A in two tuples, so
11:39 - 11:40

here A and A, then
11:40 - 11:42

we have every combination of the
11:42 - 11:44

B values and the rest,
11:44 - 11:45

well obviously we do since
11:45 - 11:47

the B's are subsets of the
11:47 - 11:49

A's here, the B values
11:49 - 11:50

are going to be the same as
11:50 - 11:52

well and we clearly have every combination.
11:52 - 11:55

For the other case of trivial multi-value dependencies.
11:55 - 11:57

We have A and B together
11:57 - 11:58

being all attributes of the
11:58 - 11:59

relation, so in that
11:59 - 12:02

case, there's no rest, so
12:02 - 12:04

clearly we have every combination
12:04 - 12:06

of values of A and
12:06 - 12:09

B and rest, because there's no rest to combine with.
12:09 - 12:11

Like with functional dependencies there are
12:11 - 12:14

a whole bunch of rules that hold for multi-valued dependencies.
12:14 - 12:15

We will just talk about three of
12:15 - 12:16

them, and the first one is
12:16 - 12:18

the most important and interesting,
12:18 - 12:20

and that's the rule that says
12:20 - 12:21

if we have a functional dependency
12:21 - 12:23

from A to B then we
12:23 - 12:26

also have a multi-valued dependency
12:26 - 12:27

from A to B. And I'm
12:27 - 12:29

gonna go ahead and prove that rule
12:29 - 12:32

for you again because this is an important one.
12:32 - 12:33

I'm going do this proof using a
12:33 - 12:35

template for the relation similar to
12:35 - 12:37

the what I did with rules for functional dependencies.
12:37 - 12:38

So let's say we have our
12:38 - 12:40

A attributes, our B attributes,
12:40 - 12:42

and our rest, and what
12:42 - 12:43

we need to prove, to prove
12:43 - 12:46

the multi-value dependencies, is when
12:46 - 12:46

there are tuples T and
12:46 - 12:48

U with a certain form,
12:48 - 12:51

there exists a tuple V of another form.
12:51 - 12:54

So let's fill in some values first for the tuples.
12:54 - 12:55

So Let's say that we
12:55 - 12:56

have A and A here,
12:56 - 12:58

that's what we need for the
12:58 - 13:00

equality of the A values.
13:00 - 13:02

Then we have B1 and R1,
13:02 - 13:05

and we have B2 and
13:05 - 13:06

R2, and in order
13:06 - 13:09

to prove this multi-value dependency,
13:09 - 13:10

I need to prove that there
13:10 - 13:12

exists a tuple V that has
13:12 - 13:13

the same A value that it
13:13 - 13:16

has B1 from tuple T
13:16 - 13:18

and R2 from Tuple U,
13:18 - 13:20

and what I have in order
13:20 - 13:21

to prove that is the fact
13:21 - 13:23

that we have a functional dependency from
13:23 - 13:25

A to B. Because we
13:25 - 13:27

have the functional dependencies and because
13:27 - 13:29

T and U have the same A value.
13:29 - 13:30

What that tells us is
13:30 - 13:33

that B1 equals B2 here.
13:33 - 13:35

And so if B1 equals B2
13:35 - 13:37

then we know that this value
13:37 - 13:39

B1 here is equivalent
13:39 - 13:41

to B2 and in order
13:41 - 13:43

to prove the existence of this
13:43 - 13:44

tuple well we have that tuple
13:44 - 13:46

here already and we're done.
13:46 - 13:47

So you might check that again,
13:47 - 13:48

but what that says is
13:48 - 13:50

using the knowledge of a
13:50 - 13:52

functional dependency we can prove
13:52 - 13:53

that we always have a corresponding
13:53 - 13:56

multivalued dependency there are
13:56 - 13:57

a couple more rules for
13:57 - 13:58

multivalued dependencies that you can
13:58 - 14:01

prove for yourself if you're so inclined.
14:01 - 14:02

The first one is the intersection
14:02 - 14:03

rule, it says that if
14:03 - 14:06

we have A multi determines
14:06 - 14:08

B and A multi determines
14:08 - 14:10

C, then we have
14:10 - 14:13

A multi determines B
14:13 - 14:15

intersects C. The transitive
14:15 - 14:18

rule is slightly different than from exact transitivity.
14:18 - 14:19

What it says is if
14:19 - 14:22

we have A multi determine B,
14:22 - 14:24

and we have B multi determines
14:24 - 14:26

C then we have
14:26 - 14:28

A multi determined not
14:28 - 14:31

C exactly but C minus
14:31 - 14:32

B.
And you might work
14:32 - 14:33

some examples because it yourself
14:33 - 14:35

why we don't have just
14:35 - 14:36

A multi determines B and
14:36 - 14:40

to subtract the attributes for B, although it's fairly complicated.
14:40 - 14:41

So again these rules can
14:41 - 14:42

be proven and there are many
14:42 - 14:44

other rules of multivalued dependencies
14:44 - 14:48

that you can read about in any of the readings provided on our website.
14:48 - 14:50

By the way, regarding rules, let's
14:50 - 14:51

come back to the fact
14:51 - 14:54

that every functional dependency is a multivalued dependency.
14:54 - 14:55

So we can use another Venn diagram.
14:55 - 14:57

This is different than our previous one.
14:57 - 14:59

We can list all of our
14:59 - 15:01

multivalued dependencies here and the
15:01 - 15:02

functional dependencies are a
15:02 - 15:04

subset of those, so what
15:04 - 15:06

that tells us is if we
15:06 - 15:07

ever have a rule that applies
15:07 - 15:10

for multivalued dependencies here, that
15:10 - 15:12

will cover the entire Ven diagram
15:12 - 15:15

and so that rule will apply for functional dependencies as well.
15:15 - 15:16

So every rule for MVDs is
15:16 - 15:19

also a rule for functional dependencies.
15:19 - 15:21

On the other hand if we
15:21 - 15:22

have a rule that applies
15:22 - 15:24

for functional dependencies that rule
15:24 - 15:26

does not necessarily have to
15:26 - 15:29

apply all multivalued dependencies because
15:29 - 15:32

it might be specialized just for this portion of the Venn diagram.
15:32 - 15:35

So an example of such a rule is the splitting rule.
15:35 - 15:36

The splitting rule is a
15:36 - 15:38

rule that applies to functional dependencies,
15:38 - 15:41

but does not always apply to multivalued dependencies.
15:41 - 15:45

And again you could work an example to convince yourself of that fact.
15:45 - 15:45

Woo.
15:45 - 15:47

So after all that set up
15:47 - 15:48

of multivalue dependencies, we're finally
15:48 - 15:51

ready to talk about fourth normal form.
15:51 - 15:52

The definition of fourth normal
15:52 - 15:55

form looks very similar to the one for Boyce-Codd normal form.
15:55 - 15:56

Says we take a relation and
15:56 - 15:57

we take now a set of
15:57 - 15:59

multivalued dependencies for that
15:59 - 16:01

relation and the relation
16:01 - 16:03

is in fourth normal form if
16:03 - 16:04

every non-trivial
16:04 - 16:07

multivalued dependency has on
16:07 - 16:09

it's left hand side a key.
16:09 - 16:10

Remember for functional dependencies it
16:10 - 16:12

looks exactly the same except
16:12 - 16:16

we have the functional dependency all here instead of multivalued dependencies.
16:16 - 16:17

So, let's see exactly what fourth
16:17 - 16:20

normal form telling us and why it's a good thing.
16:20 - 16:21

So we have A, B, and
16:21 - 16:23

rest as usual and let's
16:23 - 16:27

suppose that we have a non trivial multivalued dependency.
16:27 - 16:28

So that's telling us that
16:28 - 16:30

if we have 2 tuples, T
16:30 - 16:32

and U and we'll
16:32 - 16:33

put in some values for B
16:33 - 16:35

and the rest, then we're going
16:35 - 16:37

to have the combination of those, as well.
16:37 - 16:39

So, that's kind of the proliferation of
16:39 - 16:40

tuples we get when we
16:40 - 16:43

squish independent facts in the same relation.
16:43 - 16:44

But, if the left
16:44 - 16:47

side is a key, so if
16:47 - 16:48

the A attributes are
16:48 - 16:50

a key here then we won't have
16:50 - 16:51

those 2 tuples and will
16:51 - 16:54

never have to worry about the proliferation.
16:54 - 16:55

Now, remember that I said fourth
16:55 - 16:58

normal form implies Boyce-Codd Normal Form.
16:58 - 16:59

Or if you prefer it in
16:59 - 17:02

Venn diagram format, Fourth
17:02 - 17:04

Normal Form is stronger than
17:04 - 17:07

Boyce-Codd Normal Form and let's see why that's the case.
17:07 - 17:09

If we have a fourth
17:09 - 17:10

normal form and we want
17:10 - 17:11

to show that we're in Boyce-Codd normal
17:11 - 17:13

form, then we have to
17:13 - 17:14

show that if we have
17:14 - 17:17

a functional dependency then
17:17 - 17:19

the left hand side A is a key.
17:19 - 17:21

That would tell us we're in Boyce-Codd normal form.
17:21 - 17:22

Well, if we have a functional
17:22 - 17:24

dependency, we had a rule
17:24 - 17:25

that tells us we also have
17:25 - 17:27

the multivalued dependency and then
17:27 - 17:30

since we're in fourth normal form, we get that A as a key.
17:30 - 17:32

So again, fourth normal form
17:32 - 17:35

implies Boyce-Codd normal form.
17:35 - 17:35

Now let's take a look at
17:35 - 17:38

the decomposition of algorithm into fourth normal form.
17:38 - 17:40

It's extremely similar to the
17:40 - 17:42

BCNF decomposition algorithm.
17:42 - 17:43

The input is a relation.
17:43 - 17:45

A set of functional dependencies
17:45 - 17:46

and multi value dependencies and we
17:46 - 17:48

need to separate them because
17:48 - 17:50

we use the functional dependencies to find keys.
17:50 - 17:52

The output is a decomposition
17:52 - 17:54

of R into good relations,
17:54 - 17:56

in this case fourth normal form,
17:56 - 17:58

and it's a good decomposition in
17:58 - 18:02

the sense that reassembling the relations gives you back the original.
18:02 - 18:03

As with Boyce-Codd normal form
18:03 - 18:05

we start by computing keys using
18:05 - 18:07

the functional dependencies, and then
18:07 - 18:08

we repeat the decomposition process
18:08 - 18:11

until all of our relations are in fourth normal
18:11 - 18:12

form.
18:12 - 18:14

Just as with functional dependencies
18:14 - 18:15

in BCNF, we pick a relation
18:15 - 18:17

that has a violating dependency, in
18:17 - 18:19

this case a multi-value dependency, and
18:19 - 18:22

we split the relation based on that dependency.
18:22 - 18:24

So we create one relation that
18:24 - 18:25

has the attributes of the dependency
18:25 - 18:27

and another relation that has
18:27 - 18:30

the left-hand side of the dependency and the rest of the attributes.
18:30 - 18:31

After that, we need to
18:31 - 18:33

compute the functional dependencies for
18:33 - 18:35

the decomposed relation and the
18:35 - 18:39

multi-value dependencies for it, and then we can compute the keys.
18:39 - 18:41

Now finding these multi-value
18:41 - 18:45

dependencies is actually a fairly complex process.
18:45 - 18:47

Usually it's very intuitive,
18:47 - 18:48

but I'm going to refer you
18:48 - 18:50

to the readings to read about the algorithm itself.
18:50 - 18:51

And in fact, it can be
18:51 - 18:53

so complicated in the general
18:53 - 18:56

case that some of the readings don't even provide the algorithm.
18:56 - 18:59

But again, in general, it's very intuitive.
18:59 - 19:01

Our first example is going to be very fast to do.
19:01 - 19:03

As you remember, this example has
19:03 - 19:05

one multi-value dependency - social
19:05 - 19:07

security number determines college name,
19:07 - 19:09

multi determines college name -
19:09 - 19:12

and it has no keys other than all of the attributes.
19:12 - 19:13

So obviously, this is a
19:13 - 19:16

violating multi value dependency,
19:16 - 19:18

and so we decompose into two
19:18 - 19:21

relations, we'll call them A1 and A2.
19:21 - 19:23

The first one has the attributes
19:23 - 19:24

of the multivalue dependency, the
19:24 - 19:26

social security number and
19:26 - 19:28

the college name, and the second
19:28 - 19:29

one has the left hand
19:29 - 19:31

side multivalued dependency as well
19:31 - 19:35

as all the remaining attributes, which in this case is the hobby.
19:35 - 19:37

These two decomposed relations actually
19:37 - 19:39

have no FDs and no
19:39 - 19:41

MVDs so in that
19:41 - 19:42

case we're definitely in 4th
19:42 - 19:44

normal form and we're done
19:44 - 19:46

with the decomposition and I think
19:46 - 19:47

we can agree that this looks like
19:47 - 19:49

a good schema for the data at hand.
19:49 - 19:52

Our second example is quite a bit more complicated.
19:52 - 19:54

Remember in this example we
19:54 - 19:55

have that the social
19:55 - 19:57

security number and college name
19:57 - 19:59

functionally determine date.
19:59 - 20:01

That means we have each student
20:01 - 20:03

applies to each college on a specific date.
20:03 - 20:05

And secondly, we assume that
20:05 - 20:08

majors that were being applied for were independent of hobbies.
20:08 - 20:10

So we have social security number,
20:10 - 20:13

college name and date
20:13 - 20:15

multi determines the major.
20:15 - 20:19

And incidentally that would mean it multi determines the hobby too.
20:19 - 20:21

Once again, we have no keys
20:21 - 20:24

for the relation, except for all attributes.
20:24 - 20:26

So we have both a violating
20:26 - 20:28

functional dependency in this case
20:28 - 20:31

and we have a violating multivalue dependency.
20:31 - 20:33

Let's use the multivalue dependency for
20:33 - 20:35

our first decomposition step.
20:35 - 20:38

So we'll create A1 and A2.
20:38 - 20:43

A1 will contain all the attributes of our multivalued dependency.
20:43 - 20:45

And then A2 will contain
20:45 - 20:47

all the remaining attributes along with
20:47 - 20:51

the left hand side of our multivalue dependency.
20:51 - 20:55

And that turns out to be all of the attributes except the major.
20:55 - 20:56

Now let's look at our
20:56 - 20:57

decomposed relations and see
20:57 - 20:58

what we have in terms of
20:58 - 21:02

functional dependencies and multi-value dependencies for them.
21:02 - 21:03

So after the decomposition, we don't
21:03 - 21:05

have any more multivalued dependency but
21:05 - 21:07

our functional dependency actually applies
21:07 - 21:09

to both of the decomposed
21:09 - 21:11

relations and we still
21:11 - 21:13

don't have a key on the left hand side.
21:13 - 21:17

So we need to decompose further based on the first functional dependency.
21:17 - 21:19

So let's start by decomposing A1.
21:19 - 21:22

We'll turn A1 into A3
21:22 - 21:24

and A4, and A3 will have
21:24 - 21:28

the functional dependency, all three attributes.
21:28 - 21:29

And then A4 will have the
21:29 - 21:30

left side of the functional
21:30 - 21:32

dependency and the remaining
21:32 - 21:35

attributes, which in this case is the major.
21:35 - 21:37

So now we're finished with A1
21:37 - 21:39

and we have a similar problem with A2.
21:39 - 21:41

And so we decompose A2
21:41 - 21:43

similarly, although we'll discover
21:43 - 21:45

that A3 is the same relation in
21:45 - 21:48

the decomposition of A2 as we got with A1.
21:48 - 21:50

So we actually only add one
21:50 - 21:52

more relation now, which is A5.
21:52 - 21:54

That contains the social
21:54 - 21:55

security number and the college
21:55 - 21:56

name from the left side of
21:56 - 21:58

the functional dependency and the
21:58 - 22:00

hobby, which is the remaining attribute.
22:00 - 22:04

And then we cross out A2.
22:04 - 22:06

Now the only functional dependencies are multi-value dependencies we have
22:06 - 22:08

left do have a key on the left-hand side.
22:08 - 22:10

I'll let you verify that for yourself.
22:10 - 22:11

And these three relations are our
22:11 - 22:14

final decomposition into 4th normal form.
22:14 - 22:15

And I think you will agree
22:15 - 22:17

that this is a good design
22:17 - 22:19

again for the data at hand.
22:19 - 22:21

So let's wrap up this long
22:21 - 22:24

unit on on dependencies and normal forms with a quick summary.
22:24 - 22:27

If we have a relation RABC, a
22:27 - 22:29

functional dependency from A
22:29 - 22:30

to B tells us that
22:30 - 22:32

when we have the same A values,
22:32 - 22:34

we have the same B values,
22:34 - 22:36

and the Boyce-Codd normal form tells
22:36 - 22:38

us to factor that...those attributes
22:38 - 22:40

into their own relation so that
22:40 - 22:43

we don't repeat that relationship over and over.
22:43 - 22:45

For multi-value dependencies, let's say
22:45 - 22:47

we have the relation RABCD,
22:47 - 22:49

and if we have
22:49 - 22:51

the multi-value dependency A multi
22:51 - 22:53

determines B, what that tells
22:53 - 22:55

us is that we have every combination
22:55 - 22:57

for a given A of B
22:57 - 22:59

values and CD values
22:59 - 23:01

- we called those rest earlier -
23:01 - 23:03

and when we have that multiplicative effect
23:03 - 23:05

of combinations, again, we take
23:05 - 23:07

the A and B attributes and
23:07 - 23:08

we put them in a separate
23:08 - 23:09

relation so that we can
23:09 - 23:10

separate out those facts from
23:10 - 23:15

the independent fact of A and its CD values.
23:15 - 23:17

Finally, in the design process,
23:17 - 23:19

multi-value dependencies are something
23:19 - 23:22

we add to functional dependencies, only they're stronger.
23:22 - 23:23

So fourth normal form is
23:23 - 23:27

a stronger property than Boyce-Codd normal form.
23:27 - 23:29

Now usually this design process
23:29 - 23:30

works very well and is
23:30 - 23:34

very intuitive for many schemas, I hope for the examples that I gave here.
23:34 - 23:35

But there are actually a few
23:35 - 23:37

shortcomings to using Boyce-Codd
23:37 - 23:38

Normal Form or Fourth Normal Form
23:38 -

and we'll cover those in the next video.

Title:: 07-04-multivalued-dependencies.mp4
Video Language:: English
Duration:: 23:42

	Amara Bot edited English subtitles for 07-04-multivalued-dependencies.mp4
	Amara Bot added a translation

English subtitles

Revisions

Revision 1

Amara Bot

07-04-multivalued-dependencies.mp4

Revisions

Our website uses cookies

Operating cookies (Required)