1 00:00:00,120 --> 00:00:05,631 I'm Uta Francke. I'm a human geneticist, originally from Germany, as you can hear. 2 00:00:05,631 --> 00:00:08,031 And I'm an Emeritus Professor here at 3 00:00:08,031 --> 00:00:11,800 Stanford and also Senior Medical Director at 23andMe. 4 00:00:11,800 --> 00:00:14,652 >> So, we have three billion base pairs in just one copy 5 00:00:14,652 --> 00:00:18,061 of the human genome. And we, we've learned that, you know, we have 6 00:00:18,061 --> 00:00:21,077 two copies of the human genome, of the nuclear genome, in, in our 7 00:00:21,077 --> 00:00:25,240 cells. And we've been learning a bit about the coding regions, the protein 8 00:00:25,240 --> 00:00:30,460 coding regions. What, what can you tell us about the relative size of 9 00:00:30,460 --> 00:00:34,210 the protein coding regions of the genome versus the sort of non-coding regions? 10 00:00:34,210 --> 00:00:37,890 >> The big question was how many protein coding genes 11 00:00:37,890 --> 00:00:42,320 are there in the genome? And initially, before the genome project, 12 00:00:42,320 --> 00:00:44,860 the, the number that's been kicked around was like 100,000. 13 00:00:44,860 --> 00:00:49,960 Sort of a wild guess. And then, between 2000 and 2003, 14 00:00:49,960 --> 00:00:53,110 there was an official betting game going on 15 00:00:53,110 --> 00:00:55,310 where people could put down a dollar and a 16 00:00:55,310 --> 00:01:00,290 number. And then years later, it was $5 and then it was $20 because, in a way, it 17 00:01:00,290 --> 00:01:03,800 became more difficult. And everybody could only put 18 00:01:03,800 --> 00:01:06,500 down one bet. About 150 people put down a 19 00:01:06,500 --> 00:01:09,920 bet, and their range, the ranges of their guesses 20 00:01:09,920 --> 00:01:15,109 went from, you wouldn't believe it, 25,000 to 150,000. 21 00:01:15,109 --> 00:01:15,460 >> Wow. 22 00:01:15,460 --> 00:01:19,051 >> So nobody had really any good idea. 
23 00:01:19,051 --> 00:01:23,080 The mean was around 61,000 genes. And when the 24 00:01:23,080 --> 00:01:27,435 first draft was published they said, well, we had a brief look at it and we think 25 00:01:27,435 --> 00:01:30,235 it's between 30 and 35,000. And when it 26 00:01:30,235 --> 00:01:32,881 was finally finished, the number had come down to 27 00:01:32,881 --> 00:01:37,092 20 to 25,000. And many people were surprised because 28 00:01:37,092 --> 00:01:40,260 this is hardly any more than the, the roundworm 29 00:01:40,260 --> 00:01:44,290 or the fruit fly. So the humans are so much 30 00:01:44,290 --> 00:01:48,558 more complex, shouldn't they have more protein coding genes? So the 31 00:01:48,558 --> 00:01:52,846 complexity cannot be immediately dedu, deduced from the number of genes, 32 00:01:52,846 --> 00:01:56,440 when you don't know what these, these genes can be used for. 33 00:01:56,440 --> 00:01:58,230 >> So what percentage of the genome 34 00:01:58,230 --> 00:02:00,900 is actually made up of coding region, roughly? 35 00:02:00,900 --> 00:02:05,759 >> It's only 1 to 2%. >> And so all this other, 36 00:02:05,759 --> 00:02:10,539 you know, 90, 98 to 99% then. I mean, a lot of people who, you know, are 37 00:02:10,539 --> 00:02:12,290 just learning about the genome may be wondering, 38 00:02:12,290 --> 00:02:14,840 what exactly is the rest of that sequence doing? 39 00:02:14,840 --> 00:02:18,340 >> You see, originally people thought it was just junk. 40 00:02:18,340 --> 00:02:21,230 It was just a virus getting in and replicating itself. 41 00:02:21,230 --> 00:02:22,041 >> Mm-hm. 42 00:02:22,041 --> 00:02:26,340 >> And, in recent years, people started to look at how much of that 43 00:02:26,340 --> 00:02:28,430 sequence is being made into RNA. How 44 00:02:28,430 --> 00:02:30,970 much is being transcribed. And to everyone's 45 00:02:30,970 --> 00:02:38,060 surprise, more than 80% is actually made into a copy of RNA. 
And these RNAs have 46 00:02:38,060 --> 00:02:43,830 all kinds of interesting functions. For example, to regulate activity of 47 00:02:43,830 --> 00:02:49,540 other genes. To regulate the activity of messenger RNAs, how they are being 48 00:02:49,540 --> 00:02:54,270 translated. And many different functions that the 49 00:02:54,270 --> 00:02:57,102 RNAs have. Some of them are structural. 50 00:02:57,102 --> 00:03:00,306 There are RNAs that are structural components 51 00:03:00,306 --> 00:03:03,533 of the ribosome. And otherwise, outside of the 52 00:03:03,533 --> 00:03:06,269 coding sequence are control regions that are 53 00:03:06,269 --> 00:03:09,370 important to regulate the activity of each gene. 54 00:03:09,370 --> 00:03:14,180 >> I see. So, we learn in this lesson about messenger RNA. 55 00:03:14,180 --> 00:03:14,721 >> Mm-hm. 56 00:03:14,721 --> 00:03:17,080 >> mRNA. And so what you're saying is, there are actually other kinds of 57 00:03:17,080 --> 00:03:21,500 RNAs that can be made besides mRNA, that don't get turned into protein? 58 00:03:21,500 --> 00:03:22,520 >> That's right. 59 00:03:22,520 --> 00:03:25,330 And what was found out recently, you remember 60 00:03:25,330 --> 00:03:27,655 there are two strands in the DNA, and the messenger 61 00:03:27,655 --> 00:03:30,070 RNA is only made from one strand, which gives the 62 00:03:30,070 --> 00:03:34,080 information for the protein. But what is being found out 63 00:03:34,080 --> 00:03:37,540 now is that there are antisense RNAs, that actually 64 00:03:37,540 --> 00:03:40,160 the other strand of DNA can also be made into 65 00:03:40,160 --> 00:03:43,390 RNA. It goes in the opposite direction. It has 66 00:03:43,390 --> 00:03:48,310 no coding function for proteins, usually not, sometimes it does. 67 00:03:48,310 --> 00:03:52,380 But it has regulatory function. 
And you can just imagine, if a gene 68 00:03:52,380 --> 00:03:55,230 is transcribed in the other direction, then 69 00:03:55,230 --> 00:03:57,450 the transcription of the messenger RNA has 70 00:03:57,450 --> 00:04:02,204 a problem. You know, it runs into, it's a train wreck, right? So 71 00:04:02,204 --> 00:04:07,590 there is a regulation of gene activity by antisense RNA; that's one mechanism. 72 00:04:07,590 --> 00:04:08,890 >> Right. So it sounds like maybe some of the 73 00:04:08,890 --> 00:04:13,500 complexity of different species or complexity of cells is in part 74 00:04:13,500 --> 00:04:16,560 not so much the, the sheer content, the number of genes you have, 75 00:04:16,560 --> 00:04:22,265 but maybe how you regulate all of those genes together to, to create something. 76 00:04:22,265 --> 00:04:25,880 >> What we are finding out now is that the genomic 77 00:04:25,880 --> 00:04:30,910 regions communicate with each other. Like you can have an enhancer region 78 00:04:30,910 --> 00:04:34,890 that is downstream, away from the gene, or even in an intron 79 00:04:34,890 --> 00:04:39,930 of another gene, that then folds over, communicates with the promoter and 80 00:04:39,930 --> 00:04:44,200 sets in motion the messenger RNA synthesis. So the 81 00:04:44,200 --> 00:04:47,850 whole genome is three-dimensional. It's not just a 82 00:04:47,850 --> 00:04:51,866 one-dimensional series of letters. There's a lot of three-dimensional 83 00:04:51,866 --> 00:04:55,790 arrangement and interaction that's very important for its function.