0:00:00.120,0:00:05.631 I'm Uta Francke. I'm a human geneticist, originally from Germany, as you can hear. 0:00:05.631,0:00:08.031 And I'm an Emeritus Professor here at 0:00:08.031,0:00:11.800 Stanford and also Senior Medical Director at 23andMe. 0:00:11.800,0:00:14.652 >> So, we have three billion base pairs in just one copy 0:00:14.652,0:00:18.061 of the human genome. And we've learned that, you know, we have 0:00:18.061,0:00:21.077 two copies of the human genome, of the nuclear genome, in our 0:00:21.077,0:00:25.240 cells. And we've been learning a bit about the coding regions, the protein 0:00:25.240,0:00:30.460 coding regions. What can you tell us about the relative size of 0:00:30.460,0:00:34.210 the protein coding regions of the genome versus the sort of noncoding regions? 0:00:34.210,0:00:37.890 >> The big question was how many protein coding genes 0:00:37.890,0:00:42.320 are there in the genome? And initially, before the genome project, 0:00:42.320,0:00:44.860 the number that was being kicked around was like 100,000. 0:00:44.860,0:00:49.960 Sort of a wild guess. And then, between 2000 and 2003, 0:00:49.960,0:00:53.110 there was an official betting game going on 0:00:53.110,0:00:55.310 where people could put down a dollar and a 0:00:55.310,0:01:00.290 number. And then years later, it was $5 and then it was $20 because in a way, it 0:01:00.290,0:01:03.800 became more difficult. And everybody could only put 0:01:03.800,0:01:06.500 down one bet. About 150 people put down a 0:01:06.500,0:01:09.920 bet, and the range of their guesses 0:01:09.920,0:01:15.109 went from, you wouldn't believe it, 25,000 to 150,000. 0:01:15.109,0:01:15.460 >> Wow. 0:01:15.460,0:01:19.051 >> So nobody really had any good idea. 0:01:19.051,0:01:23.080 The mean was around 61,000 genes. And when the 0:01:23.080,0:01:27.435 first draft was published, they said, well, we had a brief look at it and we think 0:01:27.435,0:01:30.235 it's between 30 and 35,000.
And when it 0:01:30.235,0:01:32.881 was finally finished, the number had come down to 0:01:32.881,0:01:37.092 20 to 25,000. And many people were surprised because 0:01:37.092,0:01:40.260 this is hardly any more than the 0:01:40.260,0:01:44.290 roundworm or the fruit fly. Humans are so much 0:01:44.290,0:01:48.558 more complex, shouldn't they have more protein coding genes? So the 0:01:48.558,0:01:52.846 complexity cannot be immediately deduced from the number of genes 0:01:52.846,0:01:56.440 when you don't know what these genes can be used for. 0:01:56.440,0:01:58.230 >> So what percentage of the genome 0:01:58.230,0:02:00.900 is actually made up of coding region, roughly? 0:02:00.900,0:02:05.759 >> It's only 1 to 2%. >> And so all this other, 0:02:05.759,0:02:10.539 you know, 98 to 99% then. I mean, a lot of people who, you know, are 0:02:10.539,0:02:12.290 just learning about the genome may be wondering, 0:02:12.290,0:02:14.840 what exactly is the rest of that sequence doing? 0:02:14.840,0:02:18.340 >> You see, originally people thought it was just junk. 0:02:18.340,0:02:21.230 It was just a virus getting in and replicating itself. 0:02:21.230,0:02:22.041 >> Mm-hm. 0:02:22.041,0:02:26.340 >> And in recent years, people started to look at how much of that 0:02:26.340,0:02:28.430 sequence is being made into RNA, how 0:02:28.430,0:02:30.970 much is being transcribed. And to everyone's 0:02:30.970,0:02:38.060 surprise, more than 80% is actually made into a copy of RNA. And these RNAs have 0:02:38.060,0:02:43.830 all kinds of interesting functions. For example, to regulate the activity of 0:02:43.830,0:02:49.540 other genes, or to regulate the activity of messenger RNAs, how they are being 0:02:49.540,0:02:54.270 translated. There are many different functions that the 0:02:54.270,0:02:57.102 RNAs have. Some of them are structural. 0:02:57.102,0:03:00.306 There are RNAs that act as structural components 0:03:00.306,0:03:03.533 of the ribosome.
And otherwise, outside of the 0:03:03.533,0:03:06.269 coding sequence are control regions that are 0:03:06.269,0:03:09.370 important to regulate the activity of each gene. 0:03:09.370,0:03:14.180 >> I see. So, we learn in this lesson about messenger RNA. 0:03:14.180,0:03:14.721 >> Mm-hm. 0:03:14.721,0:03:17.080 >> mRNA. And so what you're saying is, there are actually other kinds of 0:03:17.080,0:03:21.500 RNAs that can be made besides mRNA that don't get turned into protein? 0:03:21.500,0:03:22.520 >> That's right. 0:03:22.520,0:03:25.330 And what was found out recently, you remember 0:03:25.330,0:03:27.655 there are two strands in the DNA, and the messenger 0:03:27.655,0:03:30.070 RNA is only made from one strand, which gives the 0:03:30.070,0:03:34.080 information for the protein. But what is being found out 0:03:34.080,0:03:37.540 now is that there are antisense RNAs, that actually 0:03:37.540,0:03:40.160 the other strand of DNA can also be made into 0:03:40.160,0:03:43.390 RNA. It goes in the opposite direction. It has 0:03:43.390,0:03:48.310 no coding function for proteins, usually not, sometimes it does. 0:03:48.310,0:03:52.380 But it has a regulatory function. And you can just imagine, if a gene 0:03:52.380,0:03:55.230 is transcribed in the other direction, then 0:03:55.230,0:03:57.450 the transcription of the messenger RNA has 0:03:57.450,0:04:02.204 a problem. You know, it's a train wreck, right? So 0:04:02.204,0:04:07.590 there is regulation of gene activity by antisense RNA; that's one mechanism. 0:04:07.590,0:04:08.890 >> Right. So it sounds like maybe some of the 0:04:08.890,0:04:13.500 complexity of different species or complexity of cells is in part 0:04:13.500,0:04:16.560 not so much the sheer content, the number of genes you have, 0:04:16.560,0:04:22.265 but maybe how you regulate all of those genes together to create something.
0:04:22.265,0:04:25.880 >> What we are finding out now is that the genomic 0:04:25.880,0:04:30.910 regions communicate with each other. Like, you can have an enhancer region 0:04:30.910,0:04:34.890 that is downstream, away from the gene, or even in an intron 0:04:34.890,0:04:39.930 of another gene, that then folds over, communicates with the promoter and 0:04:39.930,0:04:44.200 sets in motion the messenger RNA synthesis. So the 0:04:44.200,0:04:47.850 whole genome is three-dimensional. It's not just a 0:04:47.850,0:04:51.866 one-dimensional series of letters. There's a lot of three-dimensional 0:04:51.866,0:04:55.790 arrangement and interaction that's very important for its function.