-
This video continues the topic
-
of relational design, talking specifically
-
about multivalued dependencies and Fourth Normal Form.
-
I know I've reminded you many times
-
about relational designed by decomposition,
-
so let me do it very quickly.
-
The designer specifies mega relations
-
with all the information they
-
want to store and properties of the data.
-
The system decomposes the mega
-
relations into smaller relations that
-
have good properties -- no anomalies and no lost information.
-
When we have functional dependencies, as
-
are properties of the data, we
-
get Boyce-Codd Normal form, and
-
then when we add to the
-
functional dependencies multi-value dependencies
-
we get fourth normal form.
-
And the specification of multi-
-
value dependencies and decomposition into
-
Fourth Normal Form is the topic of this video.
-
As a reminder from earlier,
-
Fourth Normal Form is stronger
-
than Boyce-Codd Normal Form so
-
if we have here all of
-
the relations that are in Boyce-
-
Codd normal form, some subset
-
of those are also in fourth normal form.
-
When we have functional dependencies, we can guarantee Boyce Codd normal
-
form and then when we add
-
multi-value dependencies that's what
-
allows us to narrow down to
-
the stronger property of fourth normal form.
-
So let's start with an example.
-
We have information about students applying to colleges.
-
The student is identified by their social security number.
-
They may apply to several colleges
-
and in this video we're not going
-
to have college states, just college names.
-
We'll assume they're unique.
-
And then the student may have hobbies.
-
And they may apply, as I've said,
-
to several colleges and have several
-
hobbies, but let's assume for now those are independent.
-
So do we have any functional dependencies for this relation?
-
Actually we don't have any all.
-
The social security number does
-
not determine the college name
-
or the hobby or anything in the other direction.
-
With no functional dependencies, the
-
only key for the relation
-
then is all attributes of the relation.
-
So is this relation in Boyce-Codd Normal Form?
-
As you might remember from the
-
previous video, Boyce-Codd
-
Normal Form says all functional
-
dependencies have a key on the left hand side.
-
Well, since we have no functional dependencies then the answer is yes.
-
It is in Boyce-Codd normal form.
-
On the other hand do we think this is a good design?
-
I'm going to say no this is not a good design Why not?
-
Well, let's suppose that somebody
-
applies to five colleges
-
and they have, say, six hobbies.
-
Then to have all
-
combinations of colleges and
-
hobbies that would yield 30
-
tuples in the relation and clearly that's not a good idea.
-
We'd rather separate the college
-
and hobbies if they are independent.
-
So the separation of independent facts
-
is what fourth normal form is about.
-
And now let's get a little bit more formal.
-
Like functional dependencies, multivalued dependencies
-
are specified based on knowledge
-
of the real world constraints on
-
the data being captured and all
-
instances of a relation
-
with a multivalued dependency must adhere to the dependency.
-
Now let's define exactly what a multi value dependency is.
-
For relation R we write
-
a multi value dependency using
-
a double headed arrow and
-
we say 'A' multi determines 'B'.
-
In this case, again, with 'A'
-
and 'B' possibly being sets of attributes, so that would be A
-
one through A N
-
and B one through B
-
M, which I'm abbreviating with A bar and B bar.
-
So let me write the formal definition of A multi
-
determines B. Again using first
-
order logic similarly to what
-
we did with functional dependencies but this one's a bit more complicated.
-
It says for all tuples T and
-
U that are in relation
-
R, if T with
-
the attributes A of T equal U
-
for the attributes A of U. Again these are lists of attributes.
-
So if the two tuples agree on
-
their A values then, and
-
remember for functional dependencies it was
-
simple we just said they agreed on their B values.
-
But now it gets more complicated.
-
We're going to say that there exists
-
a third tuple V in R that has the following properties.
-
V has the same A
-
values as T and
-
U. So V sub A equals
-
T sub A, furthermore V
-
has its B value,
-
okay, drawn from T
-
so it's equal there.
-
And finally it has its
-
rest, so those are
-
all the attributes other than
-
A and B equal to U rest.
-
Okay, so that's a mouthful but let's look at that pictorially.
-
So here's our relation R and
-
we'll have the set of
-
attributes A, the set of
-
attributes B and the rest of the attributes.
-
And now let's make some tuples.
-
So let's say that this
-
is tuple T and this
-
is tuple U. And we
-
said that T and U
-
agree on their A values.
-
So they have the same A values
-
and then they don't have to have the same B values.
-
So we'll call the first one B-1
-
and the second one B-2 and
-
then for the rest we'll call this R-1 and R-2.
-
So what the multi -value dependency
-
says is that we have a
-
third tuple, V and
-
V again has the same
-
A and it has
-
its B value from tuple
-
T. So it has B-1,
-
but it has its rest value
-
from tuple U, so then
-
it has R-2 here.
-
So again what we're saying is
-
that if we have these first
-
two tuples T and U,
-
then we also have tuple
-
V. Now let me do something a little tricky.
-
Let me swap the roles of
-
T and U and show
-
that we also with this definition,
-
are guaranteed to have a
-
fourth tuple and we'll call
-
that fourth tuple W. By
-
swapping the roles of
-
T and U, W has again the same A value.
-
Now it will take its B
-
value from U and that
-
will give us B2, and
-
we'll take its rest
-
value from T and that gives us R1.
-
So what we can see
-
here is that when we
-
have the first two tuples
-
that have this particular combination of
-
B values and rest values,
-
it tells us we
-
must have the other combinations as well.
-
We must have B1 with R2,
-
and B2 with R1.
-
What it's really saying
-
is those B values and
-
the rest values are independent of each other and we'll have all combinations.
-
So that might get you thinking
-
back to our colleges and hobbies.
-
Incidentally, sometimes multi-value dependencies
-
are called tuple generating dependencies.
-
And that's because the definition is
-
is about having additional tuples
-
when you have some existing tuples,
-
unlike functional dependencies which
-
just talk about the relationships among existing tuples.
-
So let's go back to our example.
-
Now we have students applying to colleges and having hobbies.
-
Those are independent facts about the student.
-
We'll write our multi-value dependency
-
as 'social security number multi
-
determine C name' and
-
now lets use some example
-
data to see our definition and how it works here.
-
Here's our apply relation with the
-
social security number, the college
-
name and the hobby.
-
Let's suppose that we have
-
a student, 123 who's applied
-
to Stanford and plays the trumpet.
-
Now, let's suppose that same
-
student, 123, has applied
-
to Berkeley and plays tennis.
-
So what our multivalued dependency
-
says, and let's make this
-
tuple T and tuple U,
-
is that there's a further tuple
-
V. V takes the
-
same social security number and
-
it takes the first value
-
for the college name and
-
the second for the hobby.
-
It says if we have
-
a 123 playing trumpet at Stanford
-
and tennis at Berkeley, then that
-
same person will be playing tennis at Stanford.
-
Furthermore, I show that the same definition will generate automatically.
-
A fourth tuple with the
-
other combination which would be Berkley and Trumpet.
-
By the way one thing you
-
might notice here is that
-
we also have the multivalued
-
dependency, social security number multi determines hobby.
-
This is actually one of
-
the rules for multivalued dependency saying
-
that when you have A determines
-
B, then you, A multidetermines
-
B, then you also have A
-
multi determines rest and we'll
-
see some rules for multivalued dependencies later.
-
Let's look quickly at a
-
modification of our example where
-
the real world assumptions about the data are different.
-
So we still have exactly the same relation with the same attributes.
-
But let's suppose that we don't
-
want to reveal every hobby to every college.
-
Maybe we'll decide that we don't
-
want Stanford to know that
-
we're a surfer or Berkeley
-
to know that we're on the speech and debate team.
-
So if that's the case, then
-
what multivalued dependencies do we have in this relation?
-
We actually have none.
-
And we don't have any functional dependencies either by the way.
-
And is this a good design?
-
Well, actually I would argue yes.
-
In this case, this design
-
is a good one because
-
we're not going to have that
-
multiplicative effect of information.
-
Every tuple that we have
-
in the applied relation will
-
be an independent piece of important information.
-
Let's look at one more example
-
before we go on to talk about properties of multivalued dependencies.
-
I've extended the apply relation
-
now to not only include colleges
-
and hobbies but also the
-
date of application to a
-
college, and the major or majors that are being applied for.
-
Let's continue to assume that hobbies are revealed to college selectively.
-
We don't need to have same
-
hobbies for each college that a student applies to.
-
Secondly, lets assume that
-
we restrict students to apply
-
only once to each college,
-
but what I what we mean
-
by that is just on one day.
-
A student can still apply
-
to multiple majors at a
-
single college and to different majors at different colleges.
-
Let's also assume that majors
-
are independent of hobbies, which seems to make sense.
-
It takes some thinking to come
-
up with the right functional and
-
multivalued dependencies to capture these
-
constraints, but here they are.
-
The first one when we
-
say that we reveal hobbies to
-
college selectively is actually
-
the absence of a multivalued
-
dependency on hobbies and colleges.
-
The second one says as
-
we apply once to each
-
college, or on one particular
-
day to each college, so
-
that would say that when
-
we have a particular student
-
and a particular college that always
-
going to have the same date,
-
so any two tuples for
-
a student and college combination will be on the same date.
-
The last dependency that
-
we will have involves the independence
-
of the majors that are
-
being applied for and the
-
hobbies that a student has, so
-
we'll write that as the
-
multivalue dependency social security
-
number, plus college name,
-
plus date, multidetermines major,
-
and remember what that's saying
-
is that major, for a
-
given student, college, and
-
date the majors that
-
they apply for are independent
-
of what we call the rest,
-
which in this case is the hobbies.
-
So, you might take some time to
-
look at the formal definitions of
-
functional dependencies, multivalue dependencies, and
-
maybe write out some sample
-
data to convince yourself that
-
these are the dependencies that
-
are capturing the assumptions that we make about the real world.
-
Like with functional dependencies we
-
have a notion of trivial dependency
-
those that always hold we
-
also have some rule for multi valued dependencies.
-
The definition for a trivial
-
multi valued dependency A multi
-
determines B is in
-
this case, that either B
-
is a subset of A,
-
or A union B
-
are all attributes, a multi-value
-
dependency is non-trivial if that's not the case.
-
So let's take the look at
-
why these multi-value dependencies are trivial.
-
So let's start with the first
-
case where we have our
-
attributes A and the rest
-
and then attributes B are a
-
subset of A so lets
-
say that these are attributes B.
-
So what are definition of multi-valued
-
dependencies says that when we
-
have the same values for
-
A in two tuples, so
-
here A and A, then
-
we have every combination of the
-
B values and the rest,
-
well obviously we do since
-
the B's are subsets of the
-
A's here, the B values
-
are going to be the same as
-
well and we clearly have every combination.
-
For the other case of trivial multi-value dependencies.
-
We have A and B together
-
being all attributes of the
-
relation, so in that
-
case, there's no rest, so
-
clearly we have every combination
-
of values of A and
-
B and rest, because there's no rest to combine with.
-
Like with functional dependencies there are
-
a whole bunch of rules that hold for multi-valued dependencies.
-
We will just talk about three of
-
them, and the first one is
-
the most important and interesting,
-
and that's the rule that says
-
if we have a functional dependency
-
from A to B then we
-
also have a multi-valued dependency
-
from A to B. And I'm
-
gonna go ahead and prove that rule
-
for you again because this is an important one.
-
I'm going do this proof using a
-
template for the relation similar to
-
the what I did with rules for functional dependencies.
-
So let's say we have our
-
A attributes, our B attributes,
-
and our rest, and what
-
we need to prove, to prove
-
the multi-value dependencies, is when
-
there are tuples T and
-
U with a certain form,
-
there exists a tuple V of another form.
-
So let's fill in some values first for the tuples.
-
So Let's say that we
-
have A and A here,
-
that's what we need for the
-
equality of the A values.
-
Then we have B1 and R1,
-
and we have B2 and
-
R2, and in order
-
to prove this multi-value dependency,
-
I need to prove that there
-
exists a tuple V that has
-
the same A value that it
-
has B1 from tuple T
-
and R2 from Tuple U,
-
and what I have in order
-
to prove that is the fact
-
that we have a functional dependency from
-
A to B. Because we
-
have the functional dependencies and because
-
T and U have the same A value.
-
What that tells us is
-
that B1 equals B2 here.
-
And so if B1 equals B2
-
then we know that this value
-
B1 here is equivalent
-
to B2 and in order
-
to prove the existence of this
-
tuple well we have that tuple
-
here already and we're done.
-
So you might check that again,
-
but what that says is
-
using the knowledge of a
-
functional dependency we can prove
-
that we always have a corresponding
-
multivalued dependency there are
-
a couple more rules for
-
multivalued dependencies that you can
-
prove for yourself if you're so inclined.
-
The first one is the intersection
-
rule, it says that if
-
we have A multi determines
-
B and A multi determines
-
C, then we have
-
A multi determines B
-
intersects C. The transitive
-
rule is slightly different than from exact transitivity.
-
What it says is if
-
we have A multi determine B,
-
and we have B multi determines
-
C then we have
-
A multi determined not
-
C exactly but C minus
-
B.
And you might work
-
some examples because it yourself
-
why we don't have just
-
A multi determines B and
-
to subtract the attributes for B, although it's fairly complicated.
-
So again these rules can
-
be proven and there are many
-
other rules of multivalued dependencies
-
that you can read about in any of the readings provided on our website.
-
By the way, regarding rules, let's
-
come back to the fact
-
that every functional dependency is a multivalued dependency.
-
So we can use another Venn diagram.
-
This is different than our previous one.
-
We can list all of our
-
multivalued dependencies here and the
-
functional dependencies are a
-
subset of those, so what
-
that tells us is if we
-
ever have a rule that applies
-
for multivalued dependencies here, that
-
will cover the entire Ven diagram
-
and so that rule will apply for functional dependencies as well.
-
So every rule for MVDs is
-
also a rule for functional dependencies.
-
On the other hand if we
-
have a rule that applies
-
for functional dependencies that rule
-
does not necessarily have to
-
apply all multivalued dependencies because
-
it might be specialized just for this portion of the Venn diagram.
-
So an example of such a rule is the splitting rule.
-
The splitting rule is a
-
rule that applies to functional dependencies,
-
but does not always apply to multivalued dependencies.
-
And again you could work an example to convince yourself of that fact.
-
Woo.
-
So after all that set up
-
of multivalue dependencies, we're finally
-
ready to talk about fourth normal form.
-
The definition of fourth normal
-
form looks very similar to the one for Boyce-Codd normal form.
-
Says we take a relation and
-
we take now a set of
-
multivalued dependencies for that
-
relation and the relation
-
is in fourth normal form if
-
every non-trivial
-
multivalued dependency has on
-
it's left hand side a key.
-
Remember for functional dependencies it
-
looks exactly the same except
-
we have the functional dependency all here instead of multivalued dependencies.
-
So, let's see exactly what fourth
-
normal form telling us and why it's a good thing.
-
So we have A, B, and
-
rest as usual and let's
-
suppose that we have a non trivial multivalued dependency.
-
So that's telling us that
-
if we have 2 tuples, T
-
and U and we'll
-
put in some values for B
-
and the rest, then we're going
-
to have the combination of those, as well.
-
So, that's kind of the proliferation of
-
tuples we get when we
-
squish independent facts in the same relation.
-
But, if the left
-
side is a key, so if
-
the A attributes are
-
a key here then we won't have
-
those 2 tuples and will
-
never have to worry about the proliferation.
-
Now, remember that I said fourth
-
normal form implies Boyce-Codd Normal Form.
-
Or if you prefer it in
-
Venn diagram format, Fourth
-
Normal Form is stronger than
-
Boyce-Codd Normal Form and let's see why that's the case.
-
If we have a fourth
-
normal form and we want
-
to show that we're in Boyce-Codd normal
-
form, then we have to
-
show that if we have
-
a functional dependency then
-
the left hand side A is a key.
-
That would tell us we're in Boyce-Codd normal form.
-
Well, if we have a functional
-
dependency, we had a rule
-
that tells us we also have
-
the multivalued dependency and then
-
since we're in fourth normal form, we get that A as a key.
-
So again, fourth normal form
-
implies Boyce-Codd normal form.
-
Now let's take a look at
-
the decomposition of algorithm into fourth normal form.
-
It's extremely similar to the
-
BCNF decomposition algorithm.
-
The input is a relation.
-
A set of functional dependencies
-
and multi value dependencies and we
-
need to separate them because
-
we use the functional dependencies to find keys.
-
The output is a decomposition
-
of R into good relations,
-
in this case fourth normal form,
-
and it's a good decomposition in
-
the sense that reassembling the relations gives you back the original.
-
As with Boyce-Codd normal form
-
we start by computing keys using
-
the functional dependencies, and then
-
we repeat the decomposition process
-
until all of our relations are in fourth normal
-
form.
-
Just as with functional dependencies
-
in BCNF, we pick a relation
-
that has a violating dependency, in
-
this case a multi-value dependency, and
-
we split the relation based on that dependency.
-
So we create one relation that
-
has the attributes of the dependency
-
and another relation that has
-
the left-hand side of the dependency and the rest of the attributes.
-
After that, we need to
-
compute the functional dependencies for
-
the decomposed relation and the
-
multi-value dependencies for it, and then we can compute the keys.
-
Now finding these multi-value
-
dependencies is actually a fairly complex process.
-
Usually it's very intuitive,
-
but I'm going to refer you
-
to the readings to read about the algorithm itself.
-
And in fact, it can be
-
so complicated in the general
-
case that some of the readings don't even provide the algorithm.
-
But again, in general, it's very intuitive.
-
Our first example is going to be very fast to do.
-
As you remember, this example has
-
one multi-value dependency - social
-
security number determines college name,
-
multi determines college name -
-
and it has no keys other than all of the attributes.
-
So obviously, this is a
-
violating multi value dependency,
-
and so we decompose into two
-
relations, we'll call them A1 and A2.
-
The first one has the attributes
-
of the multivalue dependency, the
-
social security number and
-
the college name, and the second
-
one has the left hand
-
side multivalued dependency as well
-
as all the remaining attributes, which in this case is the hobby.
-
These two decomposed relations actually
-
have no FDs and no
-
MVDs so in that
-
case we're definitely in 4th
-
normal form and we're done
-
with the decomposition and I think
-
we can agree that this looks like
-
a good schema for the data at hand.
-
Our second example is quite a bit more complicated.
-
Remember in this example we
-
have that the social
-
security number and college name
-
functionally determine date.
-
That means we have each student
-
applies to each college on a specific date.
-
And secondly, we assume that
-
majors that were being applied for were independent of hobbies.
-
So we have social security number,
-
college name and date
-
multi determines the major.
-
And incidentally that would mean it multi determines the hobby too.
-
Once again, we have no keys
-
for the relation, except for all attributes.
-
So we have both a violating
-
functional dependency in this case
-
and we have a violating multivalue dependency.
-
Let's use the multivalue dependency for
-
our first decomposition step.
-
So we'll create A1 and A2.
-
A1 will contain all the attributes of our multivalued dependency.
-
And then A2 will contain
-
all the remaining attributes along with
-
the left hand side of our multivalue dependency.
-
And that turns out to be all of the attributes except the major.
-
Now let's look at our
-
decomposed relations and see
-
what we have in terms of
-
functional dependencies and multi-value dependencies for them.
-
So after the decomposition, we don't
-
have any more multivalued dependency but
-
our functional dependency actually applies
-
to both of the decomposed
-
relations and we still
-
don't have a key on the left hand side.
-
So we need to decompose further based on the first functional dependency.
-
So let's start by decomposing A1.
-
We'll turn A1 into A3
-
and A4, and A3 will have
-
the functional dependency, all three attributes.
-
And then A4 will have the
-
left side of the functional
-
dependency and the remaining
-
attributes, which in this case is the major.
-
So now we're finished with A1
-
and we have a similar problem with A2.
-
And so we decompose A2
-
similarly, although we'll discover
-
that A3 is the same relation in
-
the decomposition of A2 as we got with A1.
-
So we actually only add one
-
more relation now, which is A5.
-
That contains the social
-
security number and the college
-
name from the left side of
-
the functional dependency and the
-
hobby, which is the remaining attribute.
-
And then we cross out A2.
-
Now the only functional dependencies are multi-value dependencies we have
-
left do have a key on the left-hand side.
-
I'll let you verify that for yourself.
-
And these three relations are our
-
final decomposition into 4th normal form.
-
And I think you will agree
-
that this is a good design
-
again for the data at hand.
-
So let's wrap up this long
-
unit on on dependencies and normal forms with a quick summary.
-
If we have a relation RABC, a
-
functional dependency from A
-
to B tells us that
-
when we have the same A values,
-
we have the same B values,
-
and the Boyce-Codd normal form tells
-
us to factor that...those attributes
-
into their own relation so that
-
we don't repeat that relationship over and over.
-
For multi-value dependencies, let's say
-
we have the relation RABCD,
-
and if we have
-
the multi-value dependency A multi
-
determines B, what that tells
-
us is that we have every combination
-
for a given A of B
-
values and CD values
-
- we called those rest earlier -
-
and when we have that multiplicative effect
-
of combinations, again, we take
-
the A and B attributes and
-
we put them in a separate
-
relation so that we can
-
separate out those facts from
-
the independent fact of A and its CD values.
-
Finally, in the design process,
-
multi-value dependencies are something
-
we add to functional dependencies, only they're stronger.
-
So fourth normal form is
-
a stronger property than Boyce-Codd normal form.
-
Now usually this design process
-
works very well and is
-
very intuitive for many schemas, I hope for the examples that I gave here.
-
But there are actually a few
-
shortcomings to using Boyce-Codd
-
Normal Form or Fourth Normal Form
-
and we'll cover those in the next video.