This video continues the topic
of relational design, talking specifically
about multivalued dependencies and Fourth Normal Form.
I know I've reminded you many times
about relational designed by decomposition,
so let me do it very quickly.
The designer specifies mega relations
with all the information they
want to store and properties of the data.
The system decomposes the mega
relations into smaller relations that
have good properties -- no anomalies and no lost information.
When we have functional dependencies, as
are properties of the data, we
get Boyce-Codd Normal form, and
then when we add to the
functional dependencies multi-value dependencies
we get fourth normal form.
And the specification of multi-
value dependencies and decomposition into
Fourth Normal Form is the topic of this video.
As a reminder from earlier,
Fourth Normal Form is stronger
than Boyce-Codd Normal Form so
if we have here all of
the relations that are in Boyce-
Codd normal form, some subset
of those are also in fourth normal form.
When we have functional dependencies, we can guarantee Boyce Codd normal
form and then when we add
multi-value dependencies that's what
allows us to narrow down to
the stronger property of fourth normal form.
So let's start with an example.
We have information about students applying to colleges.
The student is identified by their social security number.
They may apply to several colleges
and in this video we're not going
to have college states, just college names.
We'll assume they're unique.
And then the student may have hobbies.
And they may apply, as I've said,
to several colleges and have several
hobbies, but let's assume for now those are independent.
So do we have any functional dependencies for this relation?
Actually we don't have any all.
The social security number does
not determine the college name
or the hobby or anything in the other direction.
With no functional dependencies, the
only key for the relation
then is all attributes of the relation.
So is this relation in Boyce-Codd Normal Form?
As you might remember from the
previous video, Boyce-Codd
Normal Form says all functional
dependencies have a key on the left hand side.
Well, since we have no functional dependencies then the answer is yes.
It is in Boyce-Codd normal form.
On the other hand do we think this is a good design?
I'm going to say no this is not a good design Why not?
Well, let's suppose that somebody
applies to five colleges
and they have, say, six hobbies.
Then to have all
combinations of colleges and
hobbies that would yield 30
tuples in the relation and clearly that's not a good idea.
We'd rather separate the college
and hobbies if they are independent.
So the separation of independent facts
is what fourth normal form is about.
And now let's get a little bit more formal.
Like functional dependencies, multivalued dependencies
are specified based on knowledge
of the real world constraints on
the data being captured and all
instances of a relation
with a multivalued dependency must adhere to the dependency.
Now let's define exactly what a multi value dependency is.
For relation R we write
a multi value dependency using
a double headed arrow and
we say 'A' multi determines 'B'.
In this case, again, with 'A'
and 'B' possibly being sets of attributes, so that would be A
one through A N
and B one through B
M, which I'm abbreviating with A bar and B bar.
So let me write the formal definition of A multi
determines B. Again using first
order logic similarly to what
we did with functional dependencies but this one's a bit more complicated.
It says for all tuples T and
U that are in relation
R, if T with
the attributes A of T equal U
for the attributes A of U. Again these are lists of attributes.
So if the two tuples agree on
their A values then, and
remember for functional dependencies it was
simple we just said they agreed on their B values.
But now it gets more complicated.
We're going to say that there exists
a third tuple V in R that has the following properties.
V has the same A
values as T and
U. So V sub A equals
T sub A, furthermore V
has its B value,
okay, drawn from T
so it's equal there.
And finally it has its
rest, so those are
all the attributes other than
A and B equal to U rest.
Okay, so that's a mouthful but let's look at that pictorially.
So here's our relation R and
we'll have the set of
attributes A, the set of
attributes B and the rest of the attributes.
And now let's make some tuples.
So let's say that this
is tuple T and this
is tuple U. And we
said that T and U
agree on their A values.
So they have the same A values
and then they don't have to have the same B values.
So we'll call the first one B-1
and the second one B-2 and
then for the rest we'll call this R-1 and R-2.
So what the multi -value dependency
says is that we have a
third tuple, V and
V again has the same
A and it has
its B value from tuple
T. So it has B-1,
but it has its rest value
from tuple U, so then
it has R-2 here.
So again what we're saying is
that if we have these first
two tuples T and U,
then we also have tuple
V. Now let me do something a little tricky.
Let me swap the roles of
T and U and show
that we also with this definition,
are guaranteed to have a
fourth tuple and we'll call
that fourth tuple W. By
swapping the roles of
T and U, W has again the same A value.
Now it will take its B
value from U and that
will give us B2, and
we'll take its rest
value from T and that gives us R1.
So what we can see
here is that when we
have the first two tuples
that have this particular combination of
B values and rest values,
it tells us we
must have the other combinations as well.
We must have B1 with R2,
and B2 with R1.
What it's really saying
is those B values and
the rest values are independent of each other and we'll have all combinations.
So that might get you thinking
back to our colleges and hobbies.
Incidentally, sometimes multi-value dependencies
are called tuple generating dependencies.
And that's because the definition is
is about having additional tuples
when you have some existing tuples,
unlike functional dependencies which
just talk about the relationships among existing tuples.
So let's go back to our example.
Now we have students applying to colleges and having hobbies.
Those are independent facts about the student.
We'll write our multi-value dependency
as 'social security number multi
determine C name' and
now lets use some example
data to see our definition and how it works here.
Here's our apply relation with the
social security number, the college
name and the hobby.
Let's suppose that we have
a student, 123 who's applied
to Stanford and plays the trumpet.
Now, let's suppose that same
student, 123, has applied
to Berkeley and plays tennis.
So what our multivalued dependency
says, and let's make this
tuple T and tuple U,
is that there's a further tuple
V. V takes the
same social security number and
it takes the first value
for the college name and
the second for the hobby.
It says if we have
a 123 playing trumpet at Stanford
and tennis at Berkeley, then that
same person will be playing tennis at Stanford.
Furthermore, I show that the same definition will generate automatically.
A fourth tuple with the
other combination which would be Berkley and Trumpet.
By the way one thing you
might notice here is that
we also have the multivalued
dependency, social security number multi determines hobby.
This is actually one of
the rules for multivalued dependency saying
that when you have A determines
B, then you, A multidetermines
B, then you also have A
multi determines rest and we'll
see some rules for multivalued dependencies later.
Let's look quickly at a
modification of our example where
the real world assumptions about the data are different.
So we still have exactly the same relation with the same attributes.
But let's suppose that we don't
want to reveal every hobby to every college.
Maybe we'll decide that we don't
want Stanford to know that
we're a surfer or Berkeley
to know that we're on the speech and debate team.
So if that's the case, then
what multivalued dependencies do we have in this relation?
We actually have none.
And we don't have any functional dependencies either by the way.
And is this a good design?
Well, actually I would argue yes.
In this case, this design
is a good one because
we're not going to have that
multiplicative effect of information.
Every tuple that we have
in the applied relation will
be an independent piece of important information.
Let's look at one more example
before we go on to talk about properties of multivalued dependencies.
I've extended the apply relation
now to not only include colleges
and hobbies but also the
date of application to a
college, and the major or majors that are being applied for.
Let's continue to assume that hobbies are revealed to college selectively.
We don't need to have same
hobbies for each college that a student applies to.
Secondly, lets assume that
we restrict students to apply
only once to each college,
but what I what we mean
by that is just on one day.
A student can still apply
to multiple majors at a
single college and to different majors at different colleges.
Let's also assume that majors
are independent of hobbies, which seems to make sense.
It takes some thinking to come
up with the right functional and
multivalued dependencies to capture these
constraints, but here they are.
The first one when we
say that we reveal hobbies to
college selectively is actually
the absence of a multivalued
dependency on hobbies and colleges.
The second one says as
we apply once to each
college, or on one particular
day to each college, so
that would say that when
we have a particular student
and a particular college that always
going to have the same date,
so any two tuples for
a student and college combination will be on the same date.
The last dependency that
we will have involves the independence
of the majors that are
being applied for and the
hobbies that a student has, so
we'll write that as the
multivalue dependency social security
number, plus college name,
plus date, multidetermines major,
and remember what that's saying
is that major, for a
given student, college, and
date the majors that
they apply for are independent
of what we call the rest,
which in this case is the hobbies.
So, you might take some time to
look at the formal definitions of
functional dependencies, multivalue dependencies, and
maybe write out some sample
data to convince yourself that
these are the dependencies that
are capturing the assumptions that we make about the real world.
Like with functional dependencies we
have a notion of trivial dependency
those that always hold we
also have some rule for multi valued dependencies.
The definition for a trivial
multi valued dependency A multi
determines B is in
this case, that either B
is a subset of A,
or A union B
are all attributes, a multi-value
dependency is non-trivial if that's not the case.
So let's take the look at
why these multi-value dependencies are trivial.
So let's start with the first
case where we have our
attributes A and the rest
and then attributes B are a
subset of A so lets
say that these are attributes B.
So what are definition of multi-valued
dependencies says that when we
have the same values for
A in two tuples, so
here A and A, then
we have every combination of the
B values and the rest,
well obviously we do since
the B's are subsets of the
A's here, the B values
are going to be the same as
well and we clearly have every combination.
For the other case of trivial multi-value dependencies.
We have A and B together
being all attributes of the
relation, so in that
case, there's no rest, so
clearly we have every combination
of values of A and
B and rest, because there's no rest to combine with.
Like with functional dependencies there are
a whole bunch of rules that hold for multi-valued dependencies.
We will just talk about three of
them, and the first one is
the most important and interesting,
and that's the rule that says
if we have a functional dependency
from A to B then we
also have a multi-valued dependency
from A to B. And I'm
gonna go ahead and prove that rule
for you again because this is an important one.
I'm going do this proof using a
template for the relation similar to
the what I did with rules for functional dependencies.
So let's say we have our
A attributes, our B attributes,
and our rest, and what
we need to prove, to prove
the multi-value dependencies, is when
there are tuples T and
U with a certain form,
there exists a tuple V of another form.
So let's fill in some values first for the tuples.
So Let's say that we
have A and A here,
that's what we need for the
equality of the A values.
Then we have B1 and R1,
and we have B2 and
R2, and in order
to prove this multi-value dependency,
I need to prove that there
exists a tuple V that has
the same A value that it
has B1 from tuple T
and R2 from Tuple U,
and what I have in order
to prove that is the fact
that we have a functional dependency from
A to B. Because we
have the functional dependencies and because
T and U have the same A value.
What that tells us is
that B1 equals B2 here.
And so if B1 equals B2
then we know that this value
B1 here is equivalent
to B2 and in order
to prove the existence of this
tuple well we have that tuple
here already and we're done.
So you might check that again,
but what that says is
using the knowledge of a
functional dependency we can prove
that we always have a corresponding
multivalued dependency there are
a couple more rules for
multivalued dependencies that you can
prove for yourself if you're so inclined.
The first one is the intersection
rule, it says that if
we have A multi determines
B and A multi determines
C, then we have
A multi determines B
intersects C. The transitive
rule is slightly different than from exact transitivity.
What it says is if
we have A multi determine B,
and we have B multi determines
C then we have
A multi determined not
C exactly but C minus
B.
And you might work
some examples because it yourself
why we don't have just
A multi determines B and
to subtract the attributes for B, although it's fairly complicated.
So again these rules can
be proven and there are many
other rules of multivalued dependencies
that you can read about in any of the readings provided on our website.
By the way, regarding rules, let's
come back to the fact
that every functional dependency is a multivalued dependency.
So we can use another Venn diagram.
This is different than our previous one.
We can list all of our
multivalued dependencies here and the
functional dependencies are a
subset of those, so what
that tells us is if we
ever have a rule that applies
for multivalued dependencies here, that
will cover the entire Ven diagram
and so that rule will apply for functional dependencies as well.
So every rule for MVDs is
also a rule for functional dependencies.
On the other hand if we
have a rule that applies
for functional dependencies that rule
does not necessarily have to
apply all multivalued dependencies because
it might be specialized just for this portion of the Venn diagram.
So an example of such a rule is the splitting rule.
The splitting rule is a
rule that applies to functional dependencies,
but does not always apply to multivalued dependencies.
And again you could work an example to convince yourself of that fact.
Woo.
So after all that set up
of multivalue dependencies, we're finally
ready to talk about fourth normal form.
The definition of fourth normal
form looks very similar to the one for Boyce-Codd normal form.
Says we take a relation and
we take now a set of
multivalued dependencies for that
relation and the relation
is in fourth normal form if
every non-trivial
multivalued dependency has on
it's left hand side a key.
Remember for functional dependencies it
looks exactly the same except
we have the functional dependency all here instead of multivalued dependencies.
So, let's see exactly what fourth
normal form telling us and why it's a good thing.
So we have A, B, and
rest as usual and let's
suppose that we have a non trivial multivalued dependency.
So that's telling us that
if we have 2 tuples, T
and U and we'll
put in some values for B
and the rest, then we're going
to have the combination of those, as well.
So, that's kind of the proliferation of
tuples we get when we
squish independent facts in the same relation.
But, if the left
side is a key, so if
the A attributes are
a key here then we won't have
those 2 tuples and will
never have to worry about the proliferation.
Now, remember that I said fourth
normal form implies Boyce-Codd Normal Form.
Or if you prefer it in
Venn diagram format, Fourth
Normal Form is stronger than
Boyce-Codd Normal Form and let's see why that's the case.
If we have a fourth
normal form and we want
to show that we're in Boyce-Codd normal
form, then we have to
show that if we have
a functional dependency then
the left hand side A is a key.
That would tell us we're in Boyce-Codd normal form.
Well, if we have a functional
dependency, we had a rule
that tells us we also have
the multivalued dependency and then
since we're in fourth normal form, we get that A as a key.
So again, fourth normal form
implies Boyce-Codd normal form.
Now let's take a look at
the decomposition of algorithm into fourth normal form.
It's extremely similar to the
BCNF decomposition algorithm.
The input is a relation.
A set of functional dependencies
and multi value dependencies and we
need to separate them because
we use the functional dependencies to find keys.
The output is a decomposition
of R into good relations,
in this case fourth normal form,
and it's a good decomposition in
the sense that reassembling the relations gives you back the original.
As with Boyce-Codd normal form
we start by computing keys using
the functional dependencies, and then
we repeat the decomposition process
until all of our relations are in fourth normal
form.
Just as with functional dependencies
in BCNF, we pick a relation
that has a violating dependency, in
this case a multi-value dependency, and
we split the relation based on that dependency.
So we create one relation that
has the attributes of the dependency
and another relation that has
the left-hand side of the dependency and the rest of the attributes.
After that, we need to
compute the functional dependencies for
the decomposed relation and the
multi-value dependencies for it, and then we can compute the keys.
Now finding these multi-value
dependencies is actually a fairly complex process.
Usually it's very intuitive,
but I'm going to refer you
to the readings to read about the algorithm itself.
And in fact, it can be
so complicated in the general
case that some of the readings don't even provide the algorithm.
But again, in general, it's very intuitive.
Our first example is going to be very fast to do.
As you remember, this example has
one multi-value dependency - social
security number determines college name,
multi determines college name -
and it has no keys other than all of the attributes.
So obviously, this is a
violating multi value dependency,
and so we decompose into two
relations, we'll call them A1 and A2.
The first one has the attributes
of the multivalue dependency, the
social security number and
the college name, and the second
one has the left hand
side multivalued dependency as well
as all the remaining attributes, which in this case is the hobby.
These two decomposed relations actually
have no FDs and no
MVDs so in that
case we're definitely in 4th
normal form and we're done
with the decomposition and I think
we can agree that this looks like
a good schema for the data at hand.
Our second example is quite a bit more complicated.
Remember in this example we
have that the social
security number and college name
functionally determine date.
That means we have each student
applies to each college on a specific date.
And secondly, we assume that
majors that were being applied for were independent of hobbies.
So we have social security number,
college name and date
multi determines the major.
And incidentally that would mean it multi determines the hobby too.
Once again, we have no keys
for the relation, except for all attributes.
So we have both a violating
functional dependency in this case
and we have a violating multivalue dependency.
Let's use the multivalue dependency for
our first decomposition step.
So we'll create A1 and A2.
A1 will contain all the attributes of our multivalued dependency.
And then A2 will contain
all the remaining attributes along with
the left hand side of our multivalue dependency.
And that turns out to be all of the attributes except the major.
Now let's look at our
decomposed relations and see
what we have in terms of
functional dependencies and multi-value dependencies for them.
So after the decomposition, we don't
have any more multivalued dependency but
our functional dependency actually applies
to both of the decomposed
relations and we still
don't have a key on the left hand side.
So we need to decompose further based on the first functional dependency.
So let's start by decomposing A1.
We'll turn A1 into A3
and A4, and A3 will have
the functional dependency, all three attributes.
And then A4 will have the
left side of the functional
dependency and the remaining
attributes, which in this case is the major.
So now we're finished with A1
and we have a similar problem with A2.
And so we decompose A2
similarly, although we'll discover
that A3 is the same relation in
the decomposition of A2 as we got with A1.
So we actually only add one
more relation now, which is A5.
That contains the social
security number and the college
name from the left side of
the functional dependency and the
hobby, which is the remaining attribute.
And then we cross out A2.
Now the only functional dependencies are multi-value dependencies we have
left do have a key on the left-hand side.
I'll let you verify that for yourself.
And these three relations are our
final decomposition into 4th normal form.
And I think you will agree
that this is a good design
again for the data at hand.
So let's wrap up this long
unit on on dependencies and normal forms with a quick summary.
If we have a relation RABC, a
functional dependency from A
to B tells us that
when we have the same A values,
we have the same B values,
and the Boyce-Codd normal form tells
us to factor that...those attributes
into their own relation so that
we don't repeat that relationship over and over.
For multi-value dependencies, let's say
we have the relation RABCD,
and if we have
the multi-value dependency A multi
determines B, what that tells
us is that we have every combination
for a given A of B
values and CD values
- we called those rest earlier -
and when we have that multiplicative effect
of combinations, again, we take
the A and B attributes and
we put them in a separate
relation so that we can
separate out those facts from
the independent fact of A and its CD values.
Finally, in the design process,
multi-value dependencies are something
we add to functional dependencies, only they're stronger.
So fourth normal form is
a stronger property than Boyce-Codd normal form.
Now usually this design process
works very well and is
very intuitive for many schemas, I hope for the examples that I gave here.
But there are actually a few
shortcomings to using Boyce-Codd
Normal Form or Fourth Normal Form
and we'll cover those in the next video.