This video continues the topic

of relational design, talking specifically

about multivalued dependencies and Fourth Normal Form.

I know I've reminded you many times

about relational designed by decomposition,

so let me do it very quickly.

The designer specifies mega relations

with all the information they

want to store and properties of the data.

The system decomposes the mega

relations into smaller relations that

have good properties -- no anomalies and no lost information.

When we have functional dependencies, as

are properties of the data, we

get Boyce-Codd Normal form, and

then when we add to the

functional dependencies multi-value dependencies

we get fourth normal form.

And the specification of multi-

value dependencies and decomposition into

Fourth Normal Form is the topic of this video.

As a reminder from earlier,

Fourth Normal Form is stronger

than Boyce-Codd Normal Form so

if we have here all of

the relations that are in Boyce-

Codd normal form, some subset

of those are also in fourth normal form.

When we have functional dependencies, we can guarantee Boyce Codd normal

form and then when we add

multi-value dependencies that's what

allows us to narrow down to

the stronger property of fourth normal form.

So let's start with an example.

We have information about students applying to colleges.

The student is identified by their social security number.

They may apply to several colleges

and in this video we're not going

to have college states, just college names.

We'll assume they're unique.

And then the student may have hobbies.

And they may apply, as I've said,

to several colleges and have several

hobbies, but let's assume for now those are independent.

So do we have any functional dependencies for this relation?

Actually we don't have any all.

The social security number does

not determine the college name

or the hobby or anything in the other direction.

With no functional dependencies, the

only key for the relation

then is all attributes of the relation.

So is this relation in Boyce-Codd Normal Form?

As you might remember from the

previous video, Boyce-Codd

Normal Form says all functional

dependencies have a key on the left hand side.

Well, since we have no functional dependencies then the answer is yes.

It is in Boyce-Codd normal form.

On the other hand do we think this is a good design?

I'm going to say no this is not a good design Why not?

Well, let's suppose that somebody

applies to five colleges

and they have, say, six hobbies.

Then to have all

combinations of colleges and

hobbies that would yield 30

tuples in the relation and clearly that's not a good idea.

We'd rather separate the college

and hobbies if they are independent.

So the separation of independent facts

is what fourth normal form is about.

And now let's get a little bit more formal.

Like functional dependencies, multivalued dependencies

are specified based on knowledge

of the real world constraints on

the data being captured and all

instances of a relation

with a multivalued dependency must adhere to the dependency.

Now let's define exactly what a multi value dependency is.

For relation R we write

a multi value dependency using

a double headed arrow and

we say 'A' multi determines 'B'.

In this case, again, with 'A'

and 'B' possibly being sets of attributes, so that would be A

one through A N

and B one through B

M, which I'm abbreviating with A bar and B bar.

So let me write the formal definition of A multi

determines B. Again using first

order logic similarly to what

we did with functional dependencies but this one's a bit more complicated.

It says for all tuples T and

U that are in relation

R, if T with

the attributes A of T equal U

for the attributes A of U. Again these are lists of attributes.

So if the two tuples agree on

their A values then, and

remember for functional dependencies it was

simple we just said they agreed on their B values.

But now it gets more complicated.

We're going to say that there exists

a third tuple V in R that has the following properties.

V has the same A

values as T and

U. So V sub A equals

T sub A, furthermore V

has its B value,

okay, drawn from T

so it's equal there.

And finally it has its

rest, so those are

all the attributes other than

A and B equal to U rest.

Okay, so that's a mouthful but let's look at that pictorially.

So here's our relation R and

we'll have the set of

attributes A, the set of

attributes B and the rest of the attributes.

And now let's make some tuples.

So let's say that this

is tuple T and this

is tuple U. And we

said that T and U

agree on their A values.

So they have the same A values

and then they don't have to have the same B values.

So we'll call the first one B-1

and the second one B-2 and

then for the rest we'll call this R-1 and R-2.

So what the multi -value dependency

says is that we have a

third tuple, V and

V again has the same

A and it has

its B value from tuple

T. So it has B-1,

but it has its rest value

from tuple U, so then

it has R-2 here.

So again what we're saying is

that if we have these first

two tuples T and U,

then we also have tuple

V. Now let me do something a little tricky.

Let me swap the roles of

T and U and show

that we also with this definition,

are guaranteed to have a

fourth tuple and we'll call

that fourth tuple W. By

swapping the roles of

T and U, W has again the same A value.

Now it will take its B

value from U and that

will give us B2, and

we'll take its rest

value from T and that gives us R1.

So what we can see

here is that when we

have the first two tuples

that have this particular combination of

B values and rest values,

it tells us we

must have the other combinations as well.

We must have B1 with R2,

and B2 with R1.

What it's really saying

is those B values and

the rest values are independent of each other and we'll have all combinations.

So that might get you thinking

back to our colleges and hobbies.

Incidentally, sometimes multi-value dependencies

are called tuple generating dependencies.

And that's because the definition is

is about having additional tuples

when you have some existing tuples,

unlike functional dependencies which

just talk about the relationships among existing tuples.

So let's go back to our example.

Now we have students applying to colleges and having hobbies.

Those are independent facts about the student.

We'll write our multi-value dependency

as 'social security number multi

determine C name' and

now lets use some example

data to see our definition and how it works here.

Here's our apply relation with the

social security number, the college

name and the hobby.

Let's suppose that we have

a student, 123 who's applied

to Stanford and plays the trumpet.

Now, let's suppose that same

student, 123, has applied

to Berkeley and plays tennis.

So what our multivalued dependency

says, and let's make this

tuple T and tuple U,

is that there's a further tuple

V. V takes the

same social security number and

it takes the first value

for the college name and

the second for the hobby.

It says if we have

a 123 playing trumpet at Stanford

and tennis at Berkeley, then that

same person will be playing tennis at Stanford.

Furthermore, I show that the same definition will generate automatically.

A fourth tuple with the

other combination which would be Berkley and Trumpet.

By the way one thing you

might notice here is that

we also have the multivalued

dependency, social security number multi determines hobby.

This is actually one of

the rules for multivalued dependency saying

that when you have A determines

B, then you, A multidetermines

B, then you also have A

multi determines rest and we'll

see some rules for multivalued dependencies later.

Let's look quickly at a

modification of our example where

the real world assumptions about the data are different.

So we still have exactly the same relation with the same attributes.

But let's suppose that we don't

want to reveal every hobby to every college.

Maybe we'll decide that we don't

want Stanford to know that

we're a surfer or Berkeley

to know that we're on the speech and debate team.

So if that's the case, then

what multivalued dependencies do we have in this relation?

We actually have none.

And we don't have any functional dependencies either by the way.

And is this a good design?

Well, actually I would argue yes.

In this case, this design

is a good one because

we're not going to have that

multiplicative effect of information.

Every tuple that we have

in the applied relation will

be an independent piece of important information.

Let's look at one more example

before we go on to talk about properties of multivalued dependencies.

I've extended the apply relation

now to not only include colleges

and hobbies but also the

date of application to a

college, and the major or majors that are being applied for.

Let's continue to assume that hobbies are revealed to college selectively.

We don't need to have same

hobbies for each college that a student applies to.

Secondly, lets assume that

we restrict students to apply

only once to each college,

but what I what we mean

by that is just on one day.

A student can still apply

to multiple majors at a

single college and to different majors at different colleges.

Let's also assume that majors

are independent of hobbies, which seems to make sense.

It takes some thinking to come

up with the right functional and

multivalued dependencies to capture these

constraints, but here they are.

The first one when we

say that we reveal hobbies to

college selectively is actually

the absence of a multivalued

dependency on hobbies and colleges.

The second one says as

we apply once to each

college, or on one particular

day to each college, so

that would say that when

we have a particular student

and a particular college that always

going to have the same date,

so any two tuples for

a student and college combination will be on the same date.

The last dependency that

we will have involves the independence

of the majors that are

being applied for and the

hobbies that a student has, so

we'll write that as the

multivalue dependency social security

number, plus college name,

plus date, multidetermines major,

and remember what that's saying

is that major, for a

given student, college, and

date the majors that

they apply for are independent

of what we call the rest,

which in this case is the hobbies.

So, you might take some time to

look at the formal definitions of

functional dependencies, multivalue dependencies, and

maybe write out some sample

data to convince yourself that

these are the dependencies that

are capturing the assumptions that we make about the real world.

Like with functional dependencies we

have a notion of trivial dependency

those that always hold we

also have some rule for multi valued dependencies.

The definition for a trivial

multi valued dependency A multi

determines B is in

this case, that either B

is a subset of A,

or A union B

are all attributes, a multi-value

dependency is non-trivial if that's not the case.

So let's take the look at

why these multi-value dependencies are trivial.

So let's start with the first

case where we have our

attributes A and the rest

and then attributes B are a

subset of A so lets

say that these are attributes B.

So what are definition of multi-valued

dependencies says that when we

have the same values for

A in two tuples, so

here A and A, then

we have every combination of the

B values and the rest,

well obviously we do since

the B's are subsets of the

A's here, the B values

are going to be the same as

well and we clearly have every combination.

For the other case of trivial multi-value dependencies.

We have A and B together

being all attributes of the

relation, so in that

case, there's no rest, so

clearly we have every combination

of values of A and

B and rest, because there's no rest to combine with.

Like with functional dependencies there are

a whole bunch of rules that hold for multi-valued dependencies.

We will just talk about three of

them, and the first one is

the most important and interesting,

and that's the rule that says

if we have a functional dependency

from A to B then we

also have a multi-valued dependency

from A to B. And I'm

gonna go ahead and prove that rule

for you again because this is an important one.

I'm going do this proof using a

template for the relation similar to

the what I did with rules for functional dependencies.

So let's say we have our

A attributes, our B attributes,

and our rest, and what

we need to prove, to prove

the multi-value dependencies, is when

there are tuples T and

U with a certain form,

there exists a tuple V of another form.

So let's fill in some values first for the tuples.

So Let's say that we

have A and A here,

that's what we need for the

equality of the A values.

Then we have B1 and R1,

and we have B2 and

R2, and in order

to prove this multi-value dependency,

I need to prove that there

exists a tuple V that has

the same A value that it

has B1 from tuple T

and R2 from Tuple U,

and what I have in order

to prove that is the fact

that we have a functional dependency from

A to B. Because we

have the functional dependencies and because

T and U have the same A value.

What that tells us is

that B1 equals B2 here.

And so if B1 equals B2

then we know that this value

B1 here is equivalent

to B2 and in order

to prove the existence of this

tuple well we have that tuple

here already and we're done.

So you might check that again,

but what that says is

using the knowledge of a

functional dependency we can prove

that we always have a corresponding

multivalued dependency there are

a couple more rules for

multivalued dependencies that you can

prove for yourself if you're so inclined.

The first one is the intersection

rule, it says that if

we have A multi determines

B and A multi determines

C, then we have

A multi determines B

intersects C. The transitive

rule is slightly different than from exact transitivity.

What it says is if

we have A multi determine B,

and we have B multi determines

C then we have

A multi determined not

C exactly but C minus

B. 
And you might work

some examples because it yourself

why we don't have just

A multi determines B and

to subtract the attributes for B, although it's fairly complicated.

So again these rules can

be proven and there are many

other rules of multivalued dependencies

that you can read about in any of the readings provided on our website.

By the way, regarding rules, let's

come back to the fact

that every functional dependency is a multivalued dependency.

So we can use another Venn diagram.

This is different than our previous one.

We can list all of our

multivalued dependencies here and the

functional dependencies are a

subset of those, so what

that tells us is if we

ever have a rule that applies

for multivalued dependencies here, that

will cover the entire Ven diagram

and so that rule will apply for functional dependencies as well.

So every rule for MVDs is

also a rule for functional dependencies.

On the other hand if we

have a rule that applies

for functional dependencies that rule

does not necessarily have to

apply all multivalued dependencies because

it might be specialized just for this portion of the Venn diagram.

So an example of such a rule is the splitting rule.

The splitting rule is a

rule that applies to functional dependencies,

but does not always apply to multivalued dependencies.

And again you could work an example to convince yourself of that fact.

Woo.

So after all that set up

of multivalue dependencies, we're finally

ready to talk about fourth normal form.

The definition of fourth normal

form looks very similar to the one for Boyce-Codd normal form.

Says we take a relation and

we take now a set of

multivalued dependencies for that

relation and the relation

is in fourth normal form if

every non-trivial

multivalued dependency has on

it's left hand side a key.

Remember for functional dependencies it

looks exactly the same except

we have the functional dependency all here instead of multivalued dependencies.

So, let's see exactly what fourth

normal form telling us and why it's a good thing.

So we have A, B, and

rest as usual and let's

suppose that we have a non trivial multivalued dependency.

So that's telling us that

if we have 2 tuples, T

and U and we'll

put in some values for B

and the rest, then we're going

to have the combination of those, as well.

So, that's kind of the proliferation of

tuples we get when we

squish independent facts in the same relation.

But, if the left

side is a key, so if

the A attributes are

a key here then we won't have

those 2 tuples and will

never have to worry about the proliferation.

Now, remember that I said fourth

normal form implies Boyce-Codd Normal Form.

Or if you prefer it in

Venn diagram format, Fourth

Normal Form is stronger than

Boyce-Codd Normal Form and let's see why that's the case.

If we have a fourth

normal form and we want

to show that we're in Boyce-Codd normal

form, then we have to

show that if we have

a functional dependency then

the left hand side A is a key.

That would tell us we're in Boyce-Codd normal form.

Well, if we have a functional

dependency, we had a rule

that tells us we also have

the multivalued dependency and then

since we're in fourth normal form, we get that A as a key.

So again, fourth normal form

implies Boyce-Codd normal form.

Now let's take a look at

the decomposition of algorithm into fourth normal form.

It's extremely similar to the

BCNF decomposition algorithm.

The input is a relation.

A set of functional dependencies

and multi value dependencies and we

need to separate them because

we use the functional dependencies to find keys.

The output is a decomposition

of R into good relations,

in this case fourth normal form,

and it's a good decomposition in

the sense that reassembling the relations gives you back the original.

As with Boyce-Codd normal form

we start by computing keys using

the functional dependencies, and then

we repeat the decomposition process

until all of our relations are in fourth normal

form.

Just as with functional dependencies

in BCNF, we pick a relation

that has a violating dependency, in

this case a multi-value dependency, and

we split the relation based on that dependency.

So we create one relation that

has the attributes of the dependency

and another relation that has

the left-hand side of the dependency and the rest of the attributes.

After that, we need to

compute the functional dependencies for

the decomposed relation and the

multi-value dependencies for it, and then we can compute the keys.

Now finding these multi-value

dependencies is actually a fairly complex process.

Usually it's very intuitive,

but I'm going to refer you

to the readings to read about the algorithm itself.

And in fact, it can be

so complicated in the general

case that some of the readings don't even provide the algorithm.

But again, in general, it's very intuitive.

Our first example is going to be very fast to do.

As you remember, this example has

one multi-value dependency - social

security number determines college name,

multi determines college name -

and it has no keys other than all of the attributes.

So obviously, this is a

violating multi value dependency,

and so we decompose into two

relations, we'll call them A1 and A2.

The first one has the attributes

of the multivalue dependency, the

social security number and

the college name, and the second

one has the left hand

side multivalued dependency as well

as all the remaining attributes, which in this case is the hobby.

These two decomposed relations actually

have no FDs and no

MVDs so in that

case we're definitely in 4th

normal form and we're done

with the decomposition and I think

we can agree that this looks like

a good schema for the data at hand.

Our second example is quite a bit more complicated.

Remember in this example we

have that the social

security number and college name

functionally determine date.

That means we have each student

applies to each college on a specific date.

And secondly, we assume that

majors that were being applied for were independent of hobbies.

So we have social security number,

college name and date

multi determines the major.

And incidentally that would mean it multi determines the hobby too.

Once again, we have no keys

for the relation, except for all attributes.

So we have both a violating

functional dependency in this case

and we have a violating multivalue dependency.

Let's use the multivalue dependency for

our first decomposition step.

So we'll create A1 and A2.

A1 will contain all the attributes of our multivalued dependency.

And then A2 will contain

all the remaining attributes along with

the left hand side of our multivalue dependency.

And that turns out to be all of the attributes except the major.

Now let's look at our

decomposed relations and see

what we have in terms of

functional dependencies and multi-value dependencies for them.

So after the decomposition, we don't

have any more multivalued dependency but

our functional dependency actually applies

to both of the decomposed

relations and we still

don't have a key on the left hand side.

So we need to decompose further based on the first functional dependency.

So let's start by decomposing A1.

We'll turn A1 into A3

and A4, and A3 will have

the functional dependency, all three attributes.

And then A4 will have the

left side of the functional

dependency and the remaining

attributes, which in this case is the major.

So now we're finished with A1

and we have a similar problem with A2.

And so we decompose A2

similarly, although we'll discover

that A3 is the same relation in

the decomposition of A2 as we got with A1.

So we actually only add one

more relation now, which is A5.

That contains the social

security number and the college

name from the left side of

the functional dependency and the

hobby, which is the remaining attribute.

And then we cross out A2.

Now the only functional dependencies are multi-value dependencies we have

left do have a key on the left-hand side.

I'll let you verify that for yourself.

And these three relations are our

final decomposition into 4th normal form.

And I think you will agree

that this is a good design

again for the data at hand.

So let's wrap up this long

unit on on dependencies and normal forms with a quick summary.

If we have a relation RABC, a

functional dependency from A

to B tells us that

when we have the same A values,

we have the same B values,

and the Boyce-Codd normal form tells

us to factor that...those attributes

into their own relation so that

we don't repeat that relationship over and over.

For multi-value dependencies, let's say

we have the relation RABCD,

and if we have

the multi-value dependency A multi

determines B, what that tells

us is that we have every combination

for a given A of B

values and CD values

- we called those rest earlier -

and when we have that multiplicative effect

of combinations, again, we take

the A and B attributes and

we put them in a separate

relation so that we can

separate out those facts from

the independent fact of A and its CD values.

Finally, in the design process,

multi-value dependencies are something

we add to functional dependencies, only they're stronger.

So fourth normal form is

a stronger property than Boyce-Codd normal form.

Now usually this design process

works very well and is

very intuitive for many schemas, I hope for the examples that I gave here.

But there are actually a few

shortcomings to using Boyce-Codd

Normal Form or Fourth Normal Form

and we'll cover those in the next video.