This video continues the topic of relational design, talking specifically about multivalued dependencies and Fourth Normal Form. I know I've reminded you many times about relational designed by decomposition, so let me do it very quickly. The designer specifies mega relations with all the information they want to store and properties of the data. The system decomposes the mega relations into smaller relations that have good properties -- no anomalies and no lost information. When we have functional dependencies, as are properties of the data, we get Boyce-Codd Normal form, and then when we add to the functional dependencies multi-value dependencies we get fourth normal form. And the specification of multi- value dependencies and decomposition into Fourth Normal Form is the topic of this video. As a reminder from earlier, Fourth Normal Form is stronger than Boyce-Codd Normal Form so if we have here all of the relations that are in Boyce- Codd normal form, some subset of those are also in fourth normal form. When we have functional dependencies, we can guarantee Boyce Codd normal form and then when we add multi-value dependencies that's what allows us to narrow down to the stronger property of fourth normal form. So let's start with an example. We have information about students applying to colleges. The student is identified by their social security number. They may apply to several colleges and in this video we're not going to have college states, just college names. We'll assume they're unique. And then the student may have hobbies. And they may apply, as I've said, to several colleges and have several hobbies, but let's assume for now those are independent. So do we have any functional dependencies for this relation? Actually we don't have any all. The social security number does not determine the college name or the hobby or anything in the other direction. With no functional dependencies, the only key for the relation then is all attributes of the relation. So is this relation in Boyce-Codd Normal Form? As you might remember from the previous video, Boyce-Codd Normal Form says all functional dependencies have a key on the left hand side. Well, since we have no functional dependencies then the answer is yes. It is in Boyce-Codd normal form. On the other hand do we think this is a good design? I'm going to say no this is not a good design Why not? Well, let's suppose that somebody applies to five colleges and they have, say, six hobbies. Then to have all combinations of colleges and hobbies that would yield 30 tuples in the relation and clearly that's not a good idea. We'd rather separate the college and hobbies if they are independent. So the separation of independent facts is what fourth normal form is about. And now let's get a little bit more formal. Like functional dependencies, multivalued dependencies are specified based on knowledge of the real world constraints on the data being captured and all instances of a relation with a multivalued dependency must adhere to the dependency. Now let's define exactly what a multi value dependency is. For relation R we write a multi value dependency using a double headed arrow and we say 'A' multi determines 'B'. In this case, again, with 'A' and 'B' possibly being sets of attributes, so that would be A one through A N and B one through B M, which I'm abbreviating with A bar and B bar. So let me write the formal definition of A multi determines B. Again using first order logic similarly to what we did with functional dependencies but this one's a bit more complicated. It says for all tuples T and U that are in relation R, if T with the attributes A of T equal U for the attributes A of U. Again these are lists of attributes. So if the two tuples agree on their A values then, and remember for functional dependencies it was simple we just said they agreed on their B values. But now it gets more complicated. We're going to say that there exists a third tuple V in R that has the following properties. V has the same A values as T and U. So V sub A equals T sub A, furthermore V has its B value, okay, drawn from T so it's equal there. And finally it has its rest, so those are all the attributes other than A and B equal to U rest. Okay, so that's a mouthful but let's look at that pictorially. So here's our relation R and we'll have the set of attributes A, the set of attributes B and the rest of the attributes. And now let's make some tuples. So let's say that this is tuple T and this is tuple U. And we said that T and U agree on their A values. So they have the same A values and then they don't have to have the same B values. So we'll call the first one B-1 and the second one B-2 and then for the rest we'll call this R-1 and R-2. So what the multi -value dependency says is that we have a third tuple, V and V again has the same A and it has its B value from tuple T. So it has B-1, but it has its rest value from tuple U, so then it has R-2 here. So again what we're saying is that if we have these first two tuples T and U, then we also have tuple V. Now let me do something a little tricky. Let me swap the roles of T and U and show that we also with this definition, are guaranteed to have a fourth tuple and we'll call that fourth tuple W. By swapping the roles of T and U, W has again the same A value. Now it will take its B value from U and that will give us B2, and we'll take its rest value from T and that gives us R1. So what we can see here is that when we have the first two tuples that have this particular combination of B values and rest values, it tells us we must have the other combinations as well. We must have B1 with R2, and B2 with R1. What it's really saying is those B values and the rest values are independent of each other and we'll have all combinations. So that might get you thinking back to our colleges and hobbies. Incidentally, sometimes multi-value dependencies are called tuple generating dependencies. And that's because the definition is is about having additional tuples when you have some existing tuples, unlike functional dependencies which just talk about the relationships among existing tuples. So let's go back to our example. Now we have students applying to colleges and having hobbies. Those are independent facts about the student. We'll write our multi-value dependency as 'social security number multi determine C name' and now lets use some example data to see our definition and how it works here. Here's our apply relation with the social security number, the college name and the hobby. Let's suppose that we have a student, 123 who's applied to Stanford and plays the trumpet. Now, let's suppose that same student, 123, has applied to Berkeley and plays tennis. So what our multivalued dependency says, and let's make this tuple T and tuple U, is that there's a further tuple V. V takes the same social security number and it takes the first value for the college name and the second for the hobby. It says if we have a 123 playing trumpet at Stanford and tennis at Berkeley, then that same person will be playing tennis at Stanford. Furthermore, I show that the same definition will generate automatically. A fourth tuple with the other combination which would be Berkley and Trumpet. By the way one thing you might notice here is that we also have the multivalued dependency, social security number multi determines hobby. This is actually one of the rules for multivalued dependency saying that when you have A determines B, then you, A multidetermines B, then you also have A multi determines rest and we'll see some rules for multivalued dependencies later. Let's look quickly at a modification of our example where the real world assumptions about the data are different. So we still have exactly the same relation with the same attributes. But let's suppose that we don't want to reveal every hobby to every college. Maybe we'll decide that we don't want Stanford to know that we're a surfer or Berkeley to know that we're on the speech and debate team. So if that's the case, then what multivalued dependencies do we have in this relation? We actually have none. And we don't have any functional dependencies either by the way. And is this a good design? Well, actually I would argue yes. In this case, this design is a good one because we're not going to have that multiplicative effect of information. Every tuple that we have in the applied relation will be an independent piece of important information. Let's look at one more example before we go on to talk about properties of multivalued dependencies. I've extended the apply relation now to not only include colleges and hobbies but also the date of application to a college, and the major or majors that are being applied for. Let's continue to assume that hobbies are revealed to college selectively. We don't need to have same hobbies for each college that a student applies to. Secondly, lets assume that we restrict students to apply only once to each college, but what I what we mean by that is just on one day. A student can still apply to multiple majors at a single college and to different majors at different colleges. Let's also assume that majors are independent of hobbies, which seems to make sense. It takes some thinking to come up with the right functional and multivalued dependencies to capture these constraints, but here they are. The first one when we say that we reveal hobbies to college selectively is actually the absence of a multivalued dependency on hobbies and colleges. The second one says as we apply once to each college, or on one particular day to each college, so that would say that when we have a particular student and a particular college that always going to have the same date, so any two tuples for a student and college combination will be on the same date. The last dependency that we will have involves the independence of the majors that are being applied for and the hobbies that a student has, so we'll write that as the multivalue dependency social security number, plus college name, plus date, multidetermines major, and remember what that's saying is that major, for a given student, college, and date the majors that they apply for are independent of what we call the rest, which in this case is the hobbies. So, you might take some time to look at the formal definitions of functional dependencies, multivalue dependencies, and maybe write out some sample data to convince yourself that these are the dependencies that are capturing the assumptions that we make about the real world. Like with functional dependencies we have a notion of trivial dependency those that always hold we also have some rule for multi valued dependencies. The definition for a trivial multi valued dependency A multi determines B is in this case, that either B is a subset of A, or A union B are all attributes, a multi-value dependency is non-trivial if that's not the case. So let's take the look at why these multi-value dependencies are trivial. So let's start with the first case where we have our attributes A and the rest and then attributes B are a subset of A so lets say that these are attributes B. So what are definition of multi-valued dependencies says that when we have the same values for A in two tuples, so here A and A, then we have every combination of the B values and the rest, well obviously we do since the B's are subsets of the A's here, the B values are going to be the same as well and we clearly have every combination. For the other case of trivial multi-value dependencies. We have A and B together being all attributes of the relation, so in that case, there's no rest, so clearly we have every combination of values of A and B and rest, because there's no rest to combine with. Like with functional dependencies there are a whole bunch of rules that hold for multi-valued dependencies. We will just talk about three of them, and the first one is the most important and interesting, and that's the rule that says if we have a functional dependency from A to B then we also have a multi-valued dependency from A to B. And I'm gonna go ahead and prove that rule for you again because this is an important one. I'm going do this proof using a template for the relation similar to the what I did with rules for functional dependencies. So let's say we have our A attributes, our B attributes, and our rest, and what we need to prove, to prove the multi-value dependencies, is when there are tuples T and U with a certain form, there exists a tuple V of another form. So let's fill in some values first for the tuples. So Let's say that we have A and A here, that's what we need for the equality of the A values. Then we have B1 and R1, and we have B2 and R2, and in order to prove this multi-value dependency, I need to prove that there exists a tuple V that has the same A value that it has B1 from tuple T and R2 from Tuple U, and what I have in order to prove that is the fact that we have a functional dependency from A to B. Because we have the functional dependencies and because T and U have the same A value. What that tells us is that B1 equals B2 here. And so if B1 equals B2 then we know that this value B1 here is equivalent to B2 and in order to prove the existence of this tuple well we have that tuple here already and we're done. So you might check that again, but what that says is using the knowledge of a functional dependency we can prove that we always have a corresponding multivalued dependency there are a couple more rules for multivalued dependencies that you can prove for yourself if you're so inclined. The first one is the intersection rule, it says that if we have A multi determines B and A multi determines C, then we have A multi determines B intersects C. The transitive rule is slightly different than from exact transitivity. What it says is if we have A multi determine B, and we have B multi determines C then we have A multi determined not C exactly but C minus B. And you might work some examples because it yourself why we don't have just A multi determines B and to subtract the attributes for B, although it's fairly complicated. So again these rules can be proven and there are many other rules of multivalued dependencies that you can read about in any of the readings provided on our website. By the way, regarding rules, let's come back to the fact that every functional dependency is a multivalued dependency. So we can use another Venn diagram. This is different than our previous one. We can list all of our multivalued dependencies here and the functional dependencies are a subset of those, so what that tells us is if we ever have a rule that applies for multivalued dependencies here, that will cover the entire Ven diagram and so that rule will apply for functional dependencies as well. So every rule for MVDs is also a rule for functional dependencies. On the other hand if we have a rule that applies for functional dependencies that rule does not necessarily have to apply all multivalued dependencies because it might be specialized just for this portion of the Venn diagram. So an example of such a rule is the splitting rule. The splitting rule is a rule that applies to functional dependencies, but does not always apply to multivalued dependencies. And again you could work an example to convince yourself of that fact. Woo. So after all that set up of multivalue dependencies, we're finally ready to talk about fourth normal form. The definition of fourth normal form looks very similar to the one for Boyce-Codd normal form. Says we take a relation and we take now a set of multivalued dependencies for that relation and the relation is in fourth normal form if every non-trivial multivalued dependency has on it's left hand side a key. Remember for functional dependencies it looks exactly the same except we have the functional dependency all here instead of multivalued dependencies. So, let's see exactly what fourth normal form telling us and why it's a good thing. So we have A, B, and rest as usual and let's suppose that we have a non trivial multivalued dependency. So that's telling us that if we have 2 tuples, T and U and we'll put in some values for B and the rest, then we're going to have the combination of those, as well. So, that's kind of the proliferation of tuples we get when we squish independent facts in the same relation. But, if the left side is a key, so if the A attributes are a key here then we won't have those 2 tuples and will never have to worry about the proliferation. Now, remember that I said fourth normal form implies Boyce-Codd Normal Form. Or if you prefer it in Venn diagram format, Fourth Normal Form is stronger than Boyce-Codd Normal Form and let's see why that's the case. If we have a fourth normal form and we want to show that we're in Boyce-Codd normal form, then we have to show that if we have a functional dependency then the left hand side A is a key. That would tell us we're in Boyce-Codd normal form. Well, if we have a functional dependency, we had a rule that tells us we also have the multivalued dependency and then since we're in fourth normal form, we get that A as a key. So again, fourth normal form implies Boyce-Codd normal form. Now let's take a look at the decomposition of algorithm into fourth normal form. It's extremely similar to the BCNF decomposition algorithm. The input is a relation. A set of functional dependencies and multi value dependencies and we need to separate them because we use the functional dependencies to find keys. The output is a decomposition of R into good relations, in this case fourth normal form, and it's a good decomposition in the sense that reassembling the relations gives you back the original. As with Boyce-Codd normal form we start by computing keys using the functional dependencies, and then we repeat the decomposition process until all of our relations are in fourth normal form. Just as with functional dependencies in BCNF, we pick a relation that has a violating dependency, in this case a multi-value dependency, and we split the relation based on that dependency. So we create one relation that has the attributes of the dependency and another relation that has the left-hand side of the dependency and the rest of the attributes. After that, we need to compute the functional dependencies for the decomposed relation and the multi-value dependencies for it, and then we can compute the keys. Now finding these multi-value dependencies is actually a fairly complex process. Usually it's very intuitive, but I'm going to refer you to the readings to read about the algorithm itself. And in fact, it can be so complicated in the general case that some of the readings don't even provide the algorithm. But again, in general, it's very intuitive. Our first example is going to be very fast to do. As you remember, this example has one multi-value dependency - social security number determines college name, multi determines college name - and it has no keys other than all of the attributes. So obviously, this is a violating multi value dependency, and so we decompose into two relations, we'll call them A1 and A2. The first one has the attributes of the multivalue dependency, the social security number and the college name, and the second one has the left hand side multivalued dependency as well as all the remaining attributes, which in this case is the hobby. These two decomposed relations actually have no FDs and no MVDs so in that case we're definitely in 4th normal form and we're done with the decomposition and I think we can agree that this looks like a good schema for the data at hand. Our second example is quite a bit more complicated. Remember in this example we have that the social security number and college name functionally determine date. That means we have each student applies to each college on a specific date. And secondly, we assume that majors that were being applied for were independent of hobbies. So we have social security number, college name and date multi determines the major. And incidentally that would mean it multi determines the hobby too. Once again, we have no keys for the relation, except for all attributes. So we have both a violating functional dependency in this case and we have a violating multivalue dependency. Let's use the multivalue dependency for our first decomposition step. So we'll create A1 and A2. A1 will contain all the attributes of our multivalued dependency. And then A2 will contain all the remaining attributes along with the left hand side of our multivalue dependency. And that turns out to be all of the attributes except the major. Now let's look at our decomposed relations and see what we have in terms of functional dependencies and multi-value dependencies for them. So after the decomposition, we don't have any more multivalued dependency but our functional dependency actually applies to both of the decomposed relations and we still don't have a key on the left hand side. So we need to decompose further based on the first functional dependency. So let's start by decomposing A1. We'll turn A1 into A3 and A4, and A3 will have the functional dependency, all three attributes. And then A4 will have the left side of the functional dependency and the remaining attributes, which in this case is the major. So now we're finished with A1 and we have a similar problem with A2. And so we decompose A2 similarly, although we'll discover that A3 is the same relation in the decomposition of A2 as we got with A1. So we actually only add one more relation now, which is A5. That contains the social security number and the college name from the left side of the functional dependency and the hobby, which is the remaining attribute. And then we cross out A2. Now the only functional dependencies are multi-value dependencies we have left do have a key on the left-hand side. I'll let you verify that for yourself. And these three relations are our final decomposition into 4th normal form. And I think you will agree that this is a good design again for the data at hand. So let's wrap up this long unit on on dependencies and normal forms with a quick summary. If we have a relation RABC, a functional dependency from A to B tells us that when we have the same A values, we have the same B values, and the Boyce-Codd normal form tells us to factor that...those attributes into their own relation so that we don't repeat that relationship over and over. For multi-value dependencies, let's say we have the relation RABCD, and if we have the multi-value dependency A multi determines B, what that tells us is that we have every combination for a given A of B values and CD values - we called those rest earlier - and when we have that multiplicative effect of combinations, again, we take the A and B attributes and we put them in a separate relation so that we can separate out those facts from the independent fact of A and its CD values. Finally, in the design process, multi-value dependencies are something we add to functional dependencies, only they're stronger. So fourth normal form is a stronger property than Boyce-Codd normal form. Now usually this design process works very well and is very intuitive for many schemas, I hope for the examples that I gave here. But there are actually a few shortcomings to using Boyce-Codd Normal Form or Fourth Normal Form and we'll cover those in the next video.