WEBVTT 00:00:00.000 --> 00:00:19.090 36c3 preroll music 00:00:19.090 --> 00:00:24.929 Herald: Our next talk will be "The ultimate Acorn Archimedes Talk", in which 00:00:24.929 --> 00:00:28.819 everything about the Archimedes computer will be covered. There's a 00:00:28.819 --> 00:00:33.360 promise in advance that there will be no eureka jokes in there. Give a warm 00:00:33.360 --> 00:00:35.483 welcome to Matt Evans. 00:00:35.483 --> 00:00:40.790 applause 00:00:40.790 --> 00:00:48.060 Matt Evans: Thank you. Okay. Little bit of retro computing first thing in the 00:00:48.060 --> 00:00:54.949 morning, sort of. Welcome. My name is Matt Evans. The Acorn Archimedes was my 00:00:54.949 --> 00:00:59.379 favorite computer when I was a small hacker and I'm privileged to be able to 00:00:59.379 --> 00:01:04.780 talk a little bit about it with you today. Let's start with: What is an Acorn 00:01:04.780 --> 00:01:08.720 Archimedes? So I'd like an interactive session, I'm afraid. Please indulge me, 00:01:08.720 --> 00:01:15.130 like a show of hands. Who's heard of the Acorn Archimedes before? Ah, OK, maybe 50, 00:01:15.130 --> 00:01:23.090 60%. Who has used one? Maybe 10%, maybe. Okay. Who has programs - 00:01:23.090 --> 00:01:30.139 who has coded on an Archimedes? Maybe half? Two, three people. Great. Okay. 00:01:30.139 --> 00:01:34.180 Three. laughs Okay, so a small percentage. I don't see these machines as 00:01:34.180 --> 00:01:39.650 being as famous as say the Apple Macintosh or IBM PC. And certainly outside of Europe 00:01:39.650 --> 00:01:44.030 they were not that common. So this is kind of interesting just how many people here 00:01:44.030 --> 00:01:49.840 have seen this. So it was the first ARM-based computer. This is an astonishingly 00:01:49.840 --> 00:01:55.530 1980s - I think one of them is drawing, actually. But they're not just the first 00:01:55.530 --> 00:02:01.439 ARM-based machine, but the machine that the ARM was originally designed to drive. 
00:02:01.439 --> 00:02:07.230 It's a... Is that a comment for me? Mic? 00:02:07.230 --> 00:02:13.750 I'm being heckled already. It's only slide two. Let's see how this goes. So it's a 00:02:13.750 --> 00:02:18.849 two box computer. It looks a bit like a Mega S.T. ... to me. Its main unit with 00:02:18.849 --> 00:02:26.480 the processor and disks and expansion cards and so on. Now this is an A3000. 00:02:26.480 --> 00:02:30.519 This is mine, in fact, and I didn't bother to clean it before taking the photo. And 00:02:30.519 --> 00:02:33.335 now it's on this huge screen. That was a really bad idea. You can see all the 00:02:33.335 --> 00:02:37.429 disgusting muck in the keyboard. It has a bit of ink on it, I don't know why. But 00:02:37.429 --> 00:02:41.660 this machine is 30 years old. And this was luckily my machine, as I said, as 00:02:41.660 --> 00:02:45.069 a small hacker. And this is why I'm doing the talk today. This had a big influence 00:02:45.069 --> 00:02:52.540 on me. I'd like to say as a person, but more as an engineer. In terms of what my 00:02:52.540 --> 00:02:57.170 programing experience when I was learning to program and so on. So I live and work 00:02:57.170 --> 00:03:02.040 in Cambridge in the U.K., where this machine was designed. And through the 00:03:02.040 --> 00:03:05.470 funny sort of turn of events, I ended up there and actually work in the building 00:03:05.470 --> 00:03:09.310 next to the building where this was designed. And a bunch of the people that 00:03:09.310 --> 00:03:13.720 were on that original team that designed this system are still around and 00:03:13.720 --> 00:03:18.280 relatively contactable. And I thought this is a good opportunity to get on the phone 00:03:18.280 --> 00:03:21.760 and call them up or go for a beer with a couple of them and ask them: Why are 00:03:21.760 --> 00:03:25.280 things the way they are? There's all sorts of weird quirks to this machine. 
I was 00:03:25.280 --> 00:03:28.901 always wondering this, for 20 years. Can you please tell me - why did you do it 00:03:28.901 --> 00:03:33.330 this way? And they were a really good bunch of people. So I talked to Steve Furber, 00:03:33.330 --> 00:03:37.790 who led the hardware design, Sophie Wilson, who was the same with software. 00:03:37.790 --> 00:03:43.350 Tudor Brown, who did the video system. Mike Muller, the IO system. John Biggs and 00:03:43.350 --> 00:03:46.489 Jamie Urquhart, who did the silicon design. I spoiled one of the 00:03:46.489 --> 00:03:50.140 surprises here. There's been some silicon design that's gone on in building this 00:03:50.140 --> 00:03:55.060 Acorn. And they were all wonderful people that gave me their time and told me a 00:03:55.060 --> 00:03:59.550 bunch of anecdotes that I will pass on to you. So I'm going to talk about the 00:03:59.550 --> 00:04:04.520 classic Arc. There's a bunch of different machines that Acorn built into the 1990s. 00:04:04.520 --> 00:04:08.960 But the ones I'm talking about started in 1987. There were 2 models, effectively a 00:04:08.960 --> 00:04:14.970 low end and a high end. One had an option for a hard disk, 20 megabytes, 2300 00:04:14.970 --> 00:04:20.700 pounds, up to 4MB of RAM. They all share the same basic architecture, they're all 00:04:20.700 --> 00:04:25.820 basically the same. So the A3000 that I just showed you came out in 1989. That was 00:04:25.820 --> 00:04:29.600 the machine I had. That's, again, the same. It had the memory controller slightly 00:04:29.600 --> 00:04:35.970 updated, was slightly faster. They all had an ARM 2. This was the released version of 00:04:35.970 --> 00:04:40.910 the ARM processor designed for this machine, at 8 MHz. And then finally in 00:04:40.910 --> 00:04:46.250 1990, what I call the last of the classic Arc, Archimedes, is the A540. 
This was the 00:04:46.250 --> 00:04:50.720 top end machine - could have up to 16 MB of memory, which is a fair bit 00:04:50.720 --> 00:04:57.600 even in 1990. It had a 30 MHz ARM 3. The ARM 3 was the evolution of the ARM 2, but 00:04:57.600 --> 00:05:02.130 with a cache and a lot faster. So this talk will be centered around how these 00:05:02.130 --> 00:05:08.820 machines work, not the more modern machines. So around 1987, what else 00:05:08.820 --> 00:05:13.760 was available? This is a random selection of machines. Apologies if your favorite 00:05:13.760 --> 00:05:18.490 machine is not on this list. It wouldn't fit on the slide otherwise. So at the 00:05:18.490 --> 00:05:22.110 start of the 80s, we had the exotic things like the Apple Lisa and the Apple Mac. 00:05:22.110 --> 00:05:28.720 Very expensive machines. The Amiga - I had to put in here. Started off relatively 00:05:28.720 --> 00:05:32.530 expensive because the Amiga 500 was, you know, very good value for money, very 00:05:32.530 --> 00:05:37.160 capable machine. But I'm comparing this more to PCs and Macs, because that was the 00:05:37.160 --> 00:05:41.950 sort of, you know, market it was going for. And although it was an expensive 00:05:41.950 --> 00:05:46.790 machine compared to Macintosh, it was pretty cheap. Even put NeXT Cube on there, 00:05:46.790 --> 00:05:49.890 I figured that... I'd heard that they were incredibly expensive. And actually 00:05:49.890 --> 00:05:53.640 compared to the Macintosh, they're not that expensive at all. Well I don't know 00:05:53.640 --> 00:05:57.930 which one I would have preferred. So the first question I asked them - the first 00:05:57.930 --> 00:06:02.970 thing they told me: Why was it built? I've used them in school and as I said, had one 00:06:02.970 --> 00:06:08.560 at home. But I was never really quite sure what it was for. And I think a lot of the 00:06:08.560 --> 00:06:11.850 Acorn marketing wasn't quite sure what it was for either. 
They told me it was the 00:06:11.850 --> 00:06:15.940 successor to the BBC Micro, this 8 bit machine. Lovely 6502 machine, incredibly 00:06:15.940 --> 00:06:20.100 popular, especially in the UK. And the goal was to make a machine that was 10 00:06:20.100 --> 00:06:23.770 times the performance of this. The successor would be 10 times faster at the 00:06:23.770 --> 00:06:29.680 same price. And the thing I didn't know is they had been inspired. The team at Acorn had 00:06:29.680 --> 00:06:35.620 seen the Apple Lisa and the Xerox Star, which comes from the famous Xerox Alto, 00:06:35.620 --> 00:06:41.140 Xerox PARC, first GUI workstation in the 70s, monumental machine. They'd been 00:06:41.140 --> 00:06:44.690 inspired by these machines and they wanted to make something very similar. So this is 00:06:44.690 --> 00:06:49.190 the same story as the Macintosh. They wanted to make something that was a desktop 00:06:49.190 --> 00:06:52.310 machine for business, for office automation, desktop publishing and that 00:06:52.310 --> 00:06:56.270 kind of thing. But I never really understood this before. So this 00:06:56.270 --> 00:07:01.650 inspiration came from the Xerox machines. It was supposed to be obviously a lot more 00:07:01.650 --> 00:07:06.680 affordable and a lot faster. So this is what happens when Acorn marketing gets 00:07:06.680 --> 00:07:12.020 hold of this vision. So Xerox Star on the left is this nice, sensible business 00:07:12.020 --> 00:07:15.212 machine. Someone's wearing a nice, crisp suit - bumps microphone - banging their 00:07:15.212 --> 00:07:20.470 microphone - and it gets turned into the very Cambridge Tweed version on the right. 00:07:20.470 --> 00:07:24.410 It's apparently illegal to program one of these if you're not wearing a top hat. But 00:07:24.410 --> 00:07:28.850 no one told me that when I was a kid. And my court case comes up next week. So 00:07:28.850 --> 00:07:32.240 Cambridge is a bit of a funny place. 
And for those that have been there, this picture on 00:07:32.240 --> 00:07:38.680 the right sums it all up. So they began Project A, which was to build this new 00:07:38.680 --> 00:07:43.240 machine. And they looked at the alternatives. They looked at the 00:07:43.240 --> 00:07:49.560 processors that were available at that time, the 286, the 68K, the NatSemi 00:07:49.560 --> 00:07:55.056 32016, which was an early 32 bit machine, a bit of a weird processor. And 00:07:55.056 --> 00:07:58.030 they all had something in common: they're ridiculously expensive and, in 00:07:58.030 --> 00:08:02.760 Tudor's words, a bit crap. They weren't a lot faster than the BBC Micro. They're a 00:08:02.760 --> 00:08:06.620 lot more expensive. They're much more complicated in terms of the processor 00:08:06.620 --> 00:08:10.490 itself. But also the system around them was very complicated. They need lots of 00:08:10.490 --> 00:08:15.400 weird support chips. This just drove the price up of the system and it wasn't going 00:08:15.400 --> 00:08:20.400 to hit that 10 times performance, let alone at the same price point. They'd 00:08:20.400 --> 00:08:24.100 visited a couple of other companies designing their own custom silicon. They 00:08:24.100 --> 00:08:28.090 got this idea in about 1983. They were looking at some of the RISC papers coming 00:08:28.090 --> 00:08:31.330 out of Berkeley and they were quite impressed by what a bunch of grad students 00:08:31.330 --> 00:08:38.070 were doing. They managed to get a working RISC processor and they went to Western 00:08:38.070 --> 00:08:42.140 Design Center and looked at 6502 successors being designed there. They had a 00:08:42.140 --> 00:08:45.210 positive experience. They saw a bunch of high school kids with Apple IIs doing 00:08:45.210 --> 00:08:48.930 silicon layout. And they thought "OK, well". They'd never designed a CPU before 00:08:48.930 --> 00:08:53.310 at Acorn. 
Acorn hadn't done any custom silicon to this degree, but they were 00:08:53.310 --> 00:08:57.160 buoyed by this and they thought, okay, well, maybe RISC is the secret and we can 00:08:57.160 --> 00:09:02.250 do this. And this was not really the done thing in this timeframe and not for a 00:09:02.250 --> 00:09:05.890 company the size of Acorn, but they designed their computer from scratch. They 00:09:05.890 --> 00:09:09.200 designed all of the major pieces of silicon in this machine. And it wasn't 00:09:09.200 --> 00:09:12.380 about designing the ARM chip. Hey, we've got a processor core. What should we do 00:09:12.380 --> 00:09:16.000 with it? But it was about designing the machine that ARM and the history of that 00:09:16.000 --> 00:09:20.310 company has kind of benefited from. But this is all about designing the machine as 00:09:20.310 --> 00:09:26.710 a whole. They're a tiny team. They're a handful of people - about a dozen...ish 00:09:26.710 --> 00:09:30.780 that did the hardware design, a similar sort of order for software and operating 00:09:30.780 --> 00:09:36.210 systems on top, which is orders of magnitude different from IBM and Motorola 00:09:36.210 --> 00:09:40.950 and so forth that were designing computers at this time. RISC was the key. They 00:09:40.950 --> 00:09:44.323 needed to be incredibly simple. One of the other experiences they had was they went 00:09:44.323 --> 00:09:48.820 to a CISC processor design center. They had a team of a couple of hundred people 00:09:48.820 --> 00:09:52.650 and they were on revision H and it still had bugs and it was just this unwieldy, 00:09:52.650 --> 00:09:58.160 complex machine. So RISC was the secret. Steve Furber has an interview somewhere. 00:09:58.160 --> 00:10:03.470 He jokes about Acorn management giving him two things. The special sauce was two things 00:10:03.470 --> 00:10:07.810 that no one else had: He had no people and no money. So it had to be incredibly 00:10:07.810 --> 00:10:14.710 simple. 
It had to be built on a shoestring, as Jamie said to me. So there 00:10:14.710 --> 00:10:18.460 are lots of corners cut, but in the right way. I would say "corners cut", that 00:10:18.460 --> 00:10:23.220 sounds ungenerous. There's some very shrewd design decisions, always weighing 00:10:23.220 --> 00:10:30.210 up cost versus benefit. And I think they erred on the correct side for all of them. 00:10:30.210 --> 00:10:34.480 So Steve sent me this picture. That's he's got a cameo here. That's the outline of 00:10:34.480 --> 00:10:39.180 him in the reflection on the glass there. He's got this up in his office. So he 00:10:39.180 --> 00:10:43.630 led the hardware design of all of these chips at ACORN. Across the top, we've got 00:10:43.630 --> 00:10:49.450 the original ARM, the ARM 1, ARM 2 and the ARM 3 - guess the naming scheme - and the 00:10:49.450 --> 00:10:53.090 video controller, memory controller and IO controller. Think, sort of see their 00:10:53.090 --> 00:10:57.320 relative sizes and it's kind of pretty. This was also on a processor where you 00:10:57.320 --> 00:11:00.930 could really point at that and say, "oh, that's the register file and you can see 00:11:00.930 --> 00:11:07.210 the cache over there". You can't really do that nowadays with modern processors. So 00:11:07.210 --> 00:11:11.080 the bit about the specification, what it could do, the end product. So I mentioned 00:11:11.080 --> 00:11:16.850 they all had this ARM 2 8MHz, up to four MB of RAM, 26-bit addresses, remember 00:11:16.850 --> 00:11:21.670 that. That's weird. So a lot of 32-bit machines, had 32-bit addresses or the ones 00:11:21.670 --> 00:11:25.550 that we know today do. That wasn't the case here. And I'll explain why in a 00:11:25.550 --> 00:11:32.610 minute. The A540 had a updated CPU. The memory controller had an MMU, which was 00:11:32.610 --> 00:11:39.350 unusual for machines of the mid 80s. 
So it could support, the hardware would support 00:11:39.350 --> 00:11:45.620 virtual memory, page faults and so on. It had decent sound, it had 8-channel sound, 00:11:45.620 --> 00:11:49.460 hardware mixed and stereo. It was 8 bit, but it was logarithmic - so it was a bit 00:11:49.460 --> 00:11:53.240 like u-law, if anyone knows that - instead of PCM, so you got more precision at the 00:11:53.240 --> 00:11:58.300 low end and it sounded to me a little bit like 12 bit PCM sound. So this is quite 00:11:58.300 --> 00:12:04.840 good. Storage wise, it's the same floppy controller as the Atari S.T.. It's fairly 00:12:04.840 --> 00:12:09.690 boring. Hard disk controller was a horrible standard called ST506, MFM 00:12:09.690 --> 00:12:16.420 drives, which were very, very crude compared to disks we have today. Keyboard 00:12:16.420 --> 00:12:19.980 and mouse, nothing to write home about. I mean, it was a normal keyboard. It was 00:12:19.980 --> 00:12:23.430 nothing special going on there. And printer port, serial port and some 00:12:23.430 --> 00:12:29.380 expansion slots which, I'll outline later on. The thing I really liked 00:12:29.380 --> 00:12:32.650 about the ARC was the graphics capabilities. It's fairly capable, 00:12:32.650 --> 00:12:37.800 especially for a machine of that era and of the price. It just had a flat frame 00:12:37.800 --> 00:12:42.170 buffer so it didn't have sprites, which is unfortunate. It didn't have a blitter and 00:12:42.170 --> 00:12:47.270 a bitplanes and so forth. But the upshot of that is dead simple to program. It had 00:12:47.270 --> 00:12:52.320 a 256 color mode, 8 bits per pixel, so it's a byte, and it's all just laid out as 00:12:52.320 --> 00:12:55.890 a linear string of bytes. So it was dead easy to just write some really nice 00:12:55.890 --> 00:12:59.910 optimized code to just blit stuff to the screen. Part of the reason why there isn't 00:12:59.910 --> 00:13:05.090 a blitter is actually the CPU was so good at doing this. 
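To illustrate why a flat frame buffer with one byte per pixel is so easy to program, here is a minimal Python sketch. It is not code from the talk: the 320 by 256 screen size and the names `pixel_offset` and `blit` are assumptions made up for this example, and a real routine on the machine would have been hand-written ARM code.

```python
# Illustrative sketch only: a flat frame buffer with 8 bits (one byte) per pixel.
# WIDTH and HEIGHT are assumed values chosen for the example, not quoted in the talk.
WIDTH, HEIGHT = 320, 256


def pixel_offset(x, y, width=WIDTH):
    """Byte offset of pixel (x, y): rows are simply laid end to end."""
    return y * width + x


def blit(screen, sprite, sprite_w, sprite_h, dst_x, dst_y, width=WIDTH):
    """Copy a rectangular block of bytes into the frame buffer, row by row."""
    for row in range(sprite_h):
        src = row * sprite_w
        dst = pixel_offset(dst_x, dst_y + row, width)
        # Each row of the sprite is one contiguous run of bytes on screen.
        screen[dst:dst + sprite_w] = sprite[src:src + sprite_w]


screen = bytearray(WIDTH * HEIGHT)   # the whole screen is one linear string of bytes
sprite = bytes([7]) * (4 * 2)        # a 4x2 block of palette entry 7
blit(screen, sprite, 4, 2, 10, 5)
```

Because each row is a contiguous run of bytes, the inner copy is a plain block move, which is the kind of work the CPU itself could do quickly.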
Colorwise, it's got 00:13:05.090 --> 00:13:10.620 paletted modes out of a 4096 color palette, same as the Amiga. It has this 00:13:10.620 --> 00:13:16.350 256 color mode, which is different. The big high end machines, the top end 00:13:16.350 --> 00:13:21.290 machines, the A540 and the A400 series could also do this very high res 1152 by 00:13:21.290 --> 00:13:24.235 900, which was more of a workstation resolution. If you bought a Sun 00:13:24.235 --> 00:13:28.140 workstation a Sun 3 in those days, could do this and some higher resolutions. But 00:13:28.140 --> 00:13:32.890 this is really not seen on computers that might have in the office or school or 00:13:32.890 --> 00:13:36.370 education at the end of the market. And it's quite clever the way they did that. 00:13:36.370 --> 00:13:40.450 I'll come back to that in a sec. But for me, the thing about the ARC: For the 00:13:40.450 --> 00:13:45.920 money, it was the fastest machine around. It was definitely faster than 386s and all 00:13:45.920 --> 00:13:49.548 the stuff that Motorola was doing at the time by quite a long way. It is almost 00:13:49.548 --> 00:13:53.580 eight times faster than a 68k at about the same clock speed. And it's to do with it's 00:13:53.580 --> 00:13:57.020 pipelineing and to do with it having a 32 bit word and a couple of other tricks 00:13:57.020 --> 00:14:01.070 again. I'll show you later on what the secret to that performance was. About 00:14:01.070 --> 00:14:04.850 minicomputer speed and compared to some of the other RISC machines at the time, it 00:14:04.850 --> 00:14:09.450 wasn't the first RISC in the world, it was the first cheap RISC and the first RISC 00:14:09.450 --> 00:14:14.020 machine that people could feasibly buy and have on their desks at work or in 00:14:14.020 --> 00:14:19.222 education. And if you compare it to something like the MIPS or the SPARC, it 00:14:19.222 --> 00:14:25.300 was not as fast as a MIPS or SPARC chip. It was also a lot smaller, a lot cheaper. 
00:14:25.300 --> 00:14:29.240 Both of those other processors had very big dies. They needed other support chips. 00:14:29.240 --> 00:14:33.040 They had huge packages, lots of pins, lots of cooling requirements. So all this 00:14:33.040 --> 00:14:36.180 really added up. So I priced up a Sun 4 workstation at the time and 00:14:36.180 --> 00:14:40.050 it was well over four times the price of one of these machines. And that was before 00:14:40.050 --> 00:14:44.400 you add on extras such as disks and network interfaces and things like that. 00:14:44.400 --> 00:14:47.480 So it's very good, very competitive for the money. And if you think about building 00:14:47.480 --> 00:14:50.140 a cluster, then you could get a lot more throughput, you could network them 00:14:50.140 --> 00:14:56.980 together. So this is about as far as I got when I was a youngster, I wasn't brave 00:14:56.980 --> 00:15:03.230 enough to really take the machine apart and poke around. Fortunately, now it's 30 00:15:03.230 --> 00:15:07.180 years old and I'm fine. I'm qualified for doing this. I'm going to take it apart. 00:15:07.180 --> 00:15:12.089 Here's the motherboard. Quite a nice clean design. This was built in Wales, for anyone 00:15:12.089 --> 00:15:17.510 that's been to the UK. Very unusual these days, for anything to be built in the UK. It's 00:15:17.510 --> 00:15:23.420 got several main sections around these four chips. Remember the Steve photo 00:15:23.420 --> 00:15:29.470 earlier on? This is the chip set: the ARM, MEMC, VIDC, IOC. So the IOC side of things 00:15:29.470 --> 00:15:34.090 happens over on the left, video and sound in the top right. And the memory and the 00:15:34.090 --> 00:15:38.399 processor in the middle. It's got a megabyte onboard and you can plug in an 00:15:38.399 --> 00:15:43.640 expansion for 4 MB. So memory map from the software view. 
I mentioned this 00:15:43.640 --> 00:15:46.930 26-bit addressing and I think this is one of the key characteristics of one of these 00:15:46.930 --> 00:15:52.210 machines. So you have a 64MB address space, it's quite packed. That's quite a 00:15:52.210 --> 00:15:56.980 lot of stuff shoehorned into here. So there's the memory. The bottom half of the 00:15:56.980 --> 00:16:02.040 address space, 32MB of that is the processor. It's got user space and 00:16:02.040 --> 00:16:08.100 privilege mode. It's got a concept of privilege within the processor execution. 00:16:08.100 --> 00:16:11.851 So when you're in user mode, you only get to see the bottom half and that's the 00:16:11.851 --> 00:16:16.250 virtual maps. There's the MMU, that will map pages into that space and then when 00:16:16.250 --> 00:16:18.980 you're in supervisor mode, you get to see the whole of the rest of the memory, 00:16:18.980 --> 00:16:23.380 including the physical memory and various registers up the top. The thing to notice 00:16:23.380 --> 00:16:27.460 here is: there's stuff hidden behind the ROM, this address space is very packed 00:16:27.460 --> 00:16:31.390 together. So there's a requirement for control registers, for the memory 00:16:31.390 --> 00:16:34.770 controller, for the video controller and so on, and they're write-only registers in 00:16:34.770 --> 00:16:39.700 ROM, basically. So you write to the ROM and you get to hit these registers. Kind of 00:16:39.700 --> 00:16:43.730 weird when you first see it, but it was quite a clever way to fit this stuff into 00:16:43.730 --> 00:16:50.810 the address space. So we'll start with the ARM1. So Sophie Wilson designed the 00:16:50.810 --> 00:16:59.070 instruction set late 1983, Steve took the instruction set and designed the top 00:16:59.070 --> 00:17:02.880 level, the block, the microarchitecture of this processor. So this is the data 00:17:02.880 --> 00:17:08.140 path and how the control logic works. 
And then the VLSI team then implemented this, 00:17:08.140 --> 00:17:12.420 did their own custom cells. There's a custom data path and custom logic 00:17:12.420 --> 00:17:18.179 throughout this. It took them about a year, all in. Well, 1984, that sort of... 00:17:18.179 --> 00:17:23.832 This Project A really kicked off early 1984. And this taped out first thing 00:17:23.832 --> 00:17:34.690 early 1985. The design process: the guys gave me a little bit of... So Jamie 00:17:34.690 --> 00:17:40.800 Urquhart and John Biggs gave me a bit of an insight into how they worked on the 00:17:40.800 --> 00:17:46.870 VLSI side of things. So they had an Apollo workstation, just one Apollo workstation, 00:17:46.870 --> 00:17:51.760 the DN600. This is a 68K based washing machine, as Jamie described it. It's this 00:17:51.760 --> 00:17:56.180 huge thing. It cost about £50,000. It's incredibly expensive. And they 00:17:56.180 --> 00:18:00.220 designed all of this with just one of these workstations. Jamie got in at 5:00 00:18:00.220 --> 00:18:04.060 a.m., worked until the afternoon and then let someone else on the machine. So they 00:18:04.060 --> 00:18:06.760 shared the workstation, they worked shifts so that they could design this 00:18:06.760 --> 00:18:10.020 whole thing on one workstation. So this comes back to that: it was designed on a 00:18:10.020 --> 00:18:13.660 bit of a shoestring budget. When they got a couple of other workstations later on in 00:18:13.660 --> 00:18:17.760 the project, there was an allegation that the software might not have been licensed 00:18:17.760 --> 00:18:21.950 initially on the other workstations and the CAD software might have been. I can 00:18:21.950 --> 00:18:28.450 neither confirm nor deny whether that's true. So Steve wrote a BBC Basic 00:18:28.450 --> 00:18:33.300 simulator for this when he was designing this block-level microarchitecture, and it ran on 00:18:33.300 --> 00:18:38.750 his BBC Micro. So this could then run real software. 
There could be a certain amount 00:18:38.750 --> 00:18:41.570 of software development, but then they could also validate that the design was 00:18:41.570 --> 00:18:46.820 correct. There's no cache on this. This is quite a large chip. 50 square 00:18:46.820 --> 00:18:52.290 millimeters was the economic limit of those days for this part of the market. 00:18:52.290 --> 00:18:56.100 There's no cache. That also would have been far too complicated. So this was 00:18:56.100 --> 00:19:03.120 also, I think, quite a big risk, no pun intended. They were doing this 00:19:03.120 --> 00:19:07.620 with such a small team, and they're all very clever people. But they hadn't all 00:19:07.620 --> 00:19:11.490 got experience in building chips before. And I think they knew what they were up 00:19:11.490 --> 00:19:15.100 against. And so not having a cache or complicated things like that was the right 00:19:15.100 --> 00:19:20.910 choice to make. I'll show you later that that didn't actually affect things. So 00:19:20.910 --> 00:19:24.810 this was a RISC machine. If anyone has not programmed ARM in this room then get out 00:19:24.810 --> 00:19:29.400 at once. But if you have programmed ARM, this is quite familiar, with some 00:19:29.400 --> 00:19:36.210 differences. It's a classical three operand RISC, it's got a free shift on one of 00:19:36.210 --> 00:19:38.790 the operands for most of the instructions. So you can do things like static 00:19:38.790 --> 00:19:43.820 multiplies quite easily. It's not purist RISC though. It does have load or store 00:19:43.820 --> 00:19:47.980 multiple instructions. So these will, as the name implies, load or store a 00:19:47.980 --> 00:19:51.460 number of registers in one go. So one register per cycle, but it's all done 00:19:51.460 --> 00:19:54.970 through one instruction. This is not RISC. Again, there's a good reason for doing 00:19:54.970 --> 00:19:59.300 that. 
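To make the "free shift gives you cheap static multiplies" point concrete, here is a small Python model of the idea. It is only a sketch: `add_shifted`, `times_5` and `times_10` are hypothetical names invented for this illustration, and the 32-bit masking stands in for register width. Each call to `add_shifted` corresponds to a single add instruction whose second operand is shifted left on the way in.

```python
# Illustrative model only: an add whose second operand passes through a shifter first.
MASK32 = 0xFFFFFFFF  # results wrap to 32 bits, like a register


def add_shifted(a, b, shift):
    """Return a + (b << shift), truncated to 32 bits."""
    return (a + ((b << shift) & MASK32)) & MASK32


def times_5(x):
    # x * 5 == x + (x << 2): one add with a shifted operand
    return add_shifted(x, x, 2)


def times_10(x):
    # x * 10 == (x * 5) << 1: one more shifted operation
    return add_shifted(0, times_5(x), 1)
```

So multiplying by a constant such as 10 becomes two shift-and-add steps rather than a general multiply.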
So ARM1 comes back and it gets plugged into a board that looks a bit like 00:19:59.300 --> 00:20:07.400 this. This is called the A2P, the ARM second processor. It plugs into a BBC Micro. 00:20:07.400 --> 00:20:11.280 Basically there's a thing called the Tube, which is sort of a FIFO like arrangement. 00:20:11.280 --> 00:20:15.230 The BBC Micro can send messages one way and this can send messages back. And the 00:20:15.230 --> 00:20:20.250 BBC Micro has the discs, it has the I/O, keyboard and so on. And that's used as the 00:20:20.250 --> 00:20:23.960 host to then download code into one megabyte of RAM up here and then you 00:20:23.960 --> 00:20:29.010 can run the code on the ARM. So this was the initial system, 6 MHz. The 00:20:29.010 --> 00:20:32.350 thing I found quite interesting about this, I mentioned that Steve had built 00:20:32.350 --> 00:20:37.200 this BBC Basic simulation, one of the early bits of software that could run on 00:20:37.200 --> 00:20:41.870 this. So he'd ported BBC Basic to ARM and written an ARM version of it. The Basic 00:20:41.870 --> 00:20:47.780 interpreter was very fast, very lean, and it was running on this board early on. 00:20:47.780 --> 00:20:51.750 They then built a simulator called ASIM, which was an event based simulator for 00:20:51.750 --> 00:20:55.240 doing logic design, and all of the other chips in the chipset 00:20:55.240 --> 00:20:59.020 were simulated using ASIM on ARM1, which is quite nice. So this was the fastest 00:20:59.020 --> 00:21:02.480 machine that they had around. They didn't have, you know, the thousands of machines 00:21:02.480 --> 00:21:07.730 in the cluster like you'd have in a modern company doing EDA. They had 00:21:07.730 --> 00:21:11.370 a very small number of machines and these were the fastest ones they had about. So 00:21:11.370 --> 00:21:17.450 ARM2 was simulated on ARM1, and all the rest of the chipset. So then ARM2 comes along. 
00:21:17.450 --> 00:21:21.590 So it's a year later, this is a shrink of the design. It's based on the same basic 00:21:21.590 --> 00:21:26.000 microarchitecture but has a multiplier now. It's a Booth multiplier, so it is at 00:21:26.000 --> 00:21:32.090 worst case a 16 cycle multiply, just two bits per clock. Again, no cache. But one 00:21:32.090 --> 00:21:36.950 thing they did add in ARM2 is banked registers. Some of the processor modes I 00:21:36.950 --> 00:21:42.940 mentioned, there's an interrupt mode. Next slide, some of the processor modes will 00:21:42.940 --> 00:21:47.960 basically give you a different view on registers, which is very useful. These 00:21:47.960 --> 00:21:51.090 were all validated at 8 MHz. So the product was designed for 8 MHz. 00:21:51.090 --> 00:21:54.020 The company that built them said, okay, put the stamp on the outside 00:21:54.020 --> 00:21:57.681 saying 8 MHz. There's two versions of this chip and I think they're 00:21:57.681 --> 00:22:01.390 actually the same silicon. I've got a suspicion that they're the same. They just 00:22:01.390 --> 00:22:05.420 tested this batch saying that works at 10 or 12. So on my project list is 00:22:05.420 --> 00:22:12.270 overclocking my A3000 to see how fast it'll go and see if I can get it to 12 MHz. 00:22:12.270 --> 00:22:18.559 Okay. So the banking of the registers. ARM has got, even in modern 32 bit ARM, two 00:22:18.559 --> 00:22:25.060 types of interrupts: an IRQ, pronounced "erk" in English, and FIQ, 00:22:25.060 --> 00:22:28.559 pronounced "fic" in English. I appreciate it doesn't mean quite the same thing in 00:22:28.559 --> 00:22:34.290 German. So I'll call it FIQ from here on in. And FIQ mode has this property where 00:22:34.290 --> 00:22:37.830 the top half of the registers are effectively different registers when you get into 00:22:37.830 --> 00:22:42.670 this mode. So this lets you, first of all, not have to back up those registers 00:22:42.670 --> 00:22:47.950 in your FIQ handler. 
And secondly if you can write an FIQ handler 00:22:47.950 --> 00:22:51.970 using just those registers and there's enough for doing most basic tasks, you 00:22:51.970 --> 00:22:55.940 don't have to save and restore anything when you get an interrupt. So this is 00:22:55.940 --> 00:23:02.510 designed specifically to be very, very low overhead interrupt mode. So I'm coming to 00:23:02.510 --> 00:23:07.890 why there's a 26 bit address space. And so I found this link very unintuitive. So 00:23:07.890 --> 00:23:13.520 unlike 32 bit ARM, the more modern 1990s onwards ARMs, the program counter 00:23:13.520 --> 00:23:17.020 register 15 doesn't just contain the program counter, but also contains the 00:23:17.020 --> 00:23:20.420 status flags and processor mode and effectively all of the machine state is 00:23:20.420 --> 00:23:24.200 packed in there as well. So I asked the question, well why, why 64 megabytes of 00:23:24.200 --> 00:23:27.700 address space? What's special about 64. And Mike told me, well, you're asking the 00:23:27.700 --> 00:23:31.980 wrong question. It's the other way round. What we wanted was this property that all 00:23:31.980 --> 00:23:35.990 of the machine state is in one register. So this means you just have to save one 00:23:35.990 --> 00:23:40.000 register. Well, you know, what's the harm in saving two registers? And he reminded 00:23:40.000 --> 00:23:43.490 me of this FIQ mode. Well, if you're already in a state where you've really 00:23:43.490 --> 00:23:47.890 optimized your interrupt handler so that you don't need any other registers to deal 00:23:47.890 --> 00:23:51.390 with, you're not saving restoring anything apart from your PC, then saving another 00:23:51.390 --> 00:23:56.000 register is 50 percent overhead on that operation. So that was the prime motivator 00:23:56.000 --> 00:24:00.500 was to keep all of the state in one word. 
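As a rough illustration of "all of the machine state in one word", here is a Python sketch of the combined register. The bit positions used here (six flag bits at the top, two mode bits at the bottom, a word-aligned program counter in between) follow the published 26-bit ARM register layout and are not quoted in the talk; `pack_r15` and `unpack_r15` are hypothetical helper names for this example.

```python
# Sketch only: the combined PC/status word of the 26-bit ARM architecture.
# Bit positions are taken from ARM architecture documentation, not from the talk.
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}
PC_MASK = 0x03FFFFFC    # bits 25..2: word-aligned program counter
MODE_MASK = 0x00000003  # bits 1..0: processor mode


def pack_r15(pc, mode, flags=()):
    """Combine program counter, mode and flags into one 32-bit word."""
    if pc & ~PC_MASK:
        raise ValueError("pc must be word aligned and below 64 MB")
    word = pc | (mode & MODE_MASK)
    for name in flags:
        word |= 1 << FLAG_BITS[name]
    return word


def unpack_r15(word):
    """Split the combined word back into (pc, mode, set of flag names)."""
    pc = word & PC_MASK
    mode = word & MODE_MASK
    flags = {name for name, bit in FLAG_BITS.items() if (word >> bit) & 1}
    return pc, mode, flags
```

Saving or restoring this single word captures the return address, the condition flags, the interrupt masks and the mode in one go, which is the property the interrupt path relies on.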
And then once you take all of the flags 00:24:00.500 --> 00:24:04.600 away, you're left with 24 bits for a word aligned program counter, which leads to 00:24:04.600 --> 00:24:09.799 26 bit addressing. And that was then seen as well, 64 MB is enough. There were 00:24:09.799 --> 00:24:14.690 machines in 1985 that, you know, could conceivably have more memory than that. 00:24:14.690 --> 00:24:18.260 But for a desktop that was still seen as a very large, very expensive amount of 00:24:18.260 --> 00:24:24.450 memory. The other thing, you don't need to reinvent another instruction to do 00:24:24.450 --> 00:24:28.170 return from exception so you can return using one of your existing instructions. 00:24:28.170 --> 00:24:32.740 In this case, it's the subtract into PC which looks a bit strange, but trust me, 00:24:32.740 --> 00:24:39.030 that does the right thing. So the memory controller. This is - I mentioned the 00:24:39.030 --> 00:24:43.040 address translation, so this has an MMU in it. In fact, the thing directly on the 00:24:43.040 --> 00:24:46.080 left hand side. I was worried that these slides actually might 00:24:46.080 --> 00:24:49.520 not be the right resolution and they might be sort of too small for people to see 00:24:49.520 --> 00:24:53.570 this. And in fact, it's the size of a house is really useful here. So the left 00:24:53.570 --> 00:24:58.500 hand side of this chip is the MMU. This chip is the same size as ARM2. Yeah, 00:24:58.500 --> 00:25:02.380 pretty much. So that's part of the reason why the MMU is on another chip ARM2 was 00:25:02.380 --> 00:25:06.610 as big as they could make it to fit the price as you don't have anyone here done 00:25:06.610 --> 00:25:10.810 silicon design. But as the area goes up effectively your yield goes down and 00:25:10.810 --> 00:25:14.690 the price it's a non-linear effect on price. So the MMU had to be on a separate 00:25:14.690 --> 00:25:19.910 chip and it's half the size of that as well. 
MEMC does most mundane things 00:25:19.910 --> 00:25:23.920 like it drives DRAM, it does refresh for DRAM and it converts from linear addresses 00:25:23.920 --> 00:25:33.799 into row and column addresses which DRAM takes. So the key thing about this 00:25:33.799 --> 00:25:39.090 ARM and MEMC binding is the key factor of performance is making use of memory 00:25:39.090 --> 00:25:43.740 bandwidth. When the team had looked at all the other processors in Project A before 00:25:43.740 --> 00:25:49.380 designing their own, one of the things they looked at was how well they utilized 00:25:49.380 --> 00:25:56.320 DRAM and 68K and the semi chips made very, very poor use of DRAM bandwidth. 00:25:56.320 --> 00:25:59.940 Steve said, well, okay. The DRAM is the most expensive component of any of these 00:25:59.940 --> 00:26:04.280 machines and they're making poor use of it. And I think a key insight here is if 00:26:04.280 --> 00:26:07.740 you maximize that use of the DRAM, then you're going to be able to get much higher 00:26:07.740 --> 00:26:13.490 performance in those machines. And so it's 32 bits wide. The ARM is pipelined, so it can 00:26:13.490 --> 00:26:18.730 do a 32 bit word every cycle. And it also indicates whether it's sequential or non 00:26:18.730 --> 00:26:25.250 sequential addressing. This then lets your MEMC 00:26:25.250 --> 00:26:31.200 decide whether to do an N cycle or an S cycle. So there's a fast one and a slow 00:26:31.200 --> 00:26:35.220 one basically. So when you access a new random address and DRAM, you have to open 00:26:35.220 --> 00:26:40.710 that row and that takes twice the time. It's a 4 MHz cycle. But then once 00:26:40.710 --> 00:26:45.150 you've access that address and then once you're accessing linearly ahead of that 00:26:45.150 --> 00:26:49.599 address, you can do fast page mode accesses, which are 8 MHz cycles. 00:26:49.599 --> 00:26:54.030 So ultimately, that's the reason why these load store multiples exist. 
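The N-cycle and S-cycle timings above can be turned into a rough throughput estimate for a store-multiple. This is only a back-of-envelope sketch: the 125 ns and 250 ns figures come from the 8 MHz and 4 MHz cycles quoted in the talk, while the number of non-sequential cycles charged per instruction is an assumption, and `stm_bandwidth_mb_s` is a hypothetical helper name.

```python
# Rough STM throughput estimate from the N-cycle / S-cycle timings in the talk.
# The cycle mix per instruction is an assumption, not a datasheet figure.

S_CYCLE_NS = 125.0   # sequential (fast page mode) access, 8 MHz
N_CYCLE_NS = 250.0   # non-sequential access, 4 MHz
BYTES_PER_WORD = 4


def stm_bandwidth_mb_s(registers, n_cycles=1):
    """Return MB/s (10^6 bytes/s) for storing `registers` words in one burst.

    `n_cycles` non-sequential cycles are charged per burst; the remaining
    (registers - 1) transfers are assumed to be sequential.
    """
    time_ns = n_cycles * N_CYCLE_NS + (registers - 1) * S_CYCLE_NS
    return registers * BYTES_PER_WORD / time_ns * 1000.0


peak = BYTES_PER_WORD / S_CYCLE_NS * 1000.0
print(f"peak: {peak:.1f} MB/s")
print(f"14 registers, 1 N-cycle:  {stm_bandwidth_mb_s(14):.1f} MB/s")
print(f"14 registers, 2 N-cycles: {stm_bandwidth_mb_s(14, 2):.1f} MB/s")
print(f"1 register,   1 N-cycle:  {stm_bandwidth_mb_s(1):.1f} MB/s")
```

Under these assumptions a 14-register burst lands in the mid-to-high 20s of MB/s, the same ballpark as the figure given in the talk, while single-word accesses get only about half the peak.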
The 00:26:54.030 --> 00:26:57.820 non-RISC instructions, they're there so that you can stream out registers and back 00:26:57.820 --> 00:27:03.100 in and make use of this DRAM bandwidth. So store multiple. This is just a simple 00:27:03.100 --> 00:27:07.860 calculation for 14 registers, you're hitting about 25 megabytes a second out of 00:27:07.860 --> 00:27:13.083 30. So this is it's not 100%, but it's way more than a 10th or an 8th. 00:27:13.083 --> 00:27:16.880 Which a lot of the other processors were using. So this was really good. This 00:27:16.880 --> 00:27:21.170 is the prime factor of why this machine was so fast. It's effectively the load store 00:27:21.170 --> 00:27:28.069 multiple instructions and being able to access the stuff linearly. So the MMU is 00:27:28.069 --> 00:27:36.980 weird. It's not TLB in the traditional sense, so TLB's today, if you take your 00:27:36.980 --> 00:27:43.040 MIPS chip or something where the TLB is visible to software, it will map a virtual 00:27:43.040 --> 00:27:47.760 address into a chosen physical address and you'll have some number of entries and you 00:27:47.760 --> 00:27:53.880 more or less arbitrarily, you know, poke an entry and with the set mapping in it. 00:27:53.880 --> 00:27:57.789 The MEMC does it upside down. So it says it's got a fixed number of entries for every 00:27:57.789 --> 00:28:02.380 page in DRAM. And then for each of those entries, it checks an incoming address to 00:28:02.380 --> 00:28:08.600 see whether it matches. So it has all of those entries that we've showed on the 00:28:08.600 --> 00:28:13.500 chip diagram a couple of slides ago. That big left hand side had that big array. All 00:28:13.500 --> 00:28:16.831 of those effectively just storing a virtual address and then matching it and 00:28:16.831 --> 00:28:20.030 have a comparator. And then one of them lights up and says yes, it's mine. 
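The "upside down" lookup described here can be sketched as a toy model: one entry per physical page, each remembering which logical page it currently backs. The sizes (128 entries, 32 KB pages) are the figures the talk gives for a 4 MB machine; the class and method names are illustrative, and the loop merely stands in for the parallel comparators on the chip.

```python
# Toy model of MEMC-style translation: the physical page "claims" a logical page.
# Names are illustrative; the real hardware compares all entries in parallel.

PAGE_SIZE = 32 * 1024
NUM_PHYSICAL_PAGES = 128


class InvertedPageTable:
    def __init__(self):
        # entries[physical_page] = logical page number, or None if unmapped
        self.entries = [None] * NUM_PHYSICAL_PAGES

    def map(self, physical_page, logical_page):
        self.entries[physical_page] = logical_page

    def translate(self, logical_address):
        """Return the physical address, or None to signal an abort."""
        logical_page, offset = divmod(logical_address, PAGE_SIZE)
        for physical_page, owner in enumerate(self.entries):
            if owner == logical_page:      # "yes, it's mine"
                return physical_page * PAGE_SIZE + offset
        return None


table = InvertedPageTable()
table.map(physical_page=5, logical_page=0)
print(hex(table.translate(0x1234)))   # logical page 0 is backed by physical page 5
print(table.translate(0x8000))        # logical page 1 is unmapped
```

Because the table is indexed by physical page, its size grows with the amount of RAM rather than with the size of the virtual address space, which is the limit discussed next.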
So 00:28:20.030 --> 00:28:24.551 effectively, the physical page says that virtual address is mine instead of the 00:28:24.551 --> 00:28:30.030 other way round. So this also limits your memory. If you're saying I have to have 00:28:30.030 --> 00:28:34.480 one of these entries on chip per page of physical memory and you don't want pages 00:28:34.480 --> 00:28:40.720 to be enormous. The 32K, if you do the maths, is 4 MB over 128 pages, it's a 00:28:40.720 --> 00:28:44.460 32K page. If you don't want the page to get much bigger than that, and trust me you 00:28:44.460 --> 00:28:47.890 don't, then you need to add more of these entries and it's already half the size of 00:28:47.890 --> 00:28:52.540 the chip. So effectively, this is one of the limits of why you can only have 4 MB 00:28:52.540 --> 00:28:58.360 on one of these memory controller chips. OK. So VIDC is the core 00:28:58.360 --> 00:29:05.230 of the video and sound system. It's a set of FIFOs and a set of digital-to-analog 00:29:05.230 --> 00:29:09.970 converters for doing video and sound. You stream stuff into the FIFOs and it does 00:29:09.970 --> 00:29:14.850 the display timing and palette lookup and so forth. It has an 8 bit mode I 00:29:14.850 --> 00:29:21.210 mentioned. It's slightly strange. It also has an output for a transparency bit. So in 00:29:21.210 --> 00:29:23.830 your palette you can set 12 bits of color, but you can set a bit of 00:29:23.830 --> 00:29:31.580 transparency as well so you can do video genlocking quite easily with this. So 00:29:31.580 --> 00:29:36.701 there was a revision later on. Tudor explained that the very first one had a bit 00:29:36.701 --> 00:29:41.230 of crosstalk between the video and the sound, so you'd get sound with noise on 00:29:41.230 --> 00:29:45.480 it. That was basically video noise and it's quite hard to get rid of. And so they 00:29:45.480 --> 00:29:50.000 did this revision and the way he fixed it was quite cool. 
They shuffled the power 00:29:50.000 --> 00:29:53.690 supply around and did all the sensible engineering things. But he also filtered 00:29:53.690 --> 00:29:58.050 out a bit of the noise that is being output on the sound. He 00:29:58.050 --> 00:30:02.630 inverted it and then fed that back in as the reference current for the DACs. So that 00:30:02.630 --> 00:30:06.090 was sort of self-compensating and took the noise out, a bit like noise-canceling 00:30:06.090 --> 00:30:13.239 headphones. It was kind of a nice hack. And that was VIDC1. OK, the final 00:30:13.239 --> 00:30:17.700 one, I'm going to stop showing you chip plots after this, unfortunately, but just 00:30:17.700 --> 00:30:20.980 get your fill while we're here. And again, I'm really glad this is enormous for the 00:30:20.980 --> 00:30:25.590 people in the room and maybe those zooming in online. There's a cool little 00:30:25.590 --> 00:30:29.510 Illuminati eye logo in the bottom left corner. So I feared that you weren't gonna 00:30:29.510 --> 00:30:34.010 be able to see and I didn't have time to do a zoomed-in version, but. Okay. So IOC 00:30:34.010 --> 00:30:37.720 is the center of the IO system. As much of the IO system as possible, all the random 00:30:37.720 --> 00:30:41.030 bits of glue logic to do things like timing (some peripherals are slower than 00:30:41.030 --> 00:30:47.309 others), lives in IOC. It contains a UART for the keyboard, so the keyboard is 00:30:47.309 --> 00:30:52.020 looked after by an 8051 microcontroller. Just nice and easy, you don't have to do scanning 00:30:52.020 --> 00:30:57.429 in software. This microcontroller just sends stuff up a serial port to this chip. So 00:30:57.429 --> 00:31:02.039 KART: keyboard asynchronous receiver and transmitter. It was at one point called 00:31:02.039 --> 00:31:06.080 the fast asynchronous receiver and transmitter. Mike got forced to change the 00:31:06.080 --> 00:31:11.900 name. 
Not everyone has a 12 year old sense of humor, but I admire his spirit. So the 00:31:11.900 --> 00:31:15.630 other thing it does is interrupts all the interrupts go into IOC and it's got masks 00:31:15.630 --> 00:31:20.341 and consolidates them effectively for sending an interrupt up to the on the ARM. 00:31:20.341 --> 00:31:24.690 The ARM can then check the status and do fast response to it. So the eye of providence 00:31:24.690 --> 00:31:27.540 there, the little logo I pointed out, Mike said he put that in for future 00:31:27.540 --> 00:31:35.799 archaeologists to wonder about. Okay. That was it. I was hoping there'd be 00:31:35.799 --> 00:31:39.440 this big back story about, you know, he was in the Illuminati or something. Maybe 00:31:39.440 --> 00:31:44.690 he is, but not allowed to say anyway. So just like the other dev board I showed you so 00:31:44.690 --> 00:31:49.930 this one's A 500 2P, it's still a second processor that plugs into a BBC Micro. 00:31:49.930 --> 00:31:54.460 It's still got this host having disk drives and so forth attached to it and 00:31:54.460 --> 00:32:00.289 pushing stuff down the tube into the memory here. But now, finally 00:32:00.289 --> 00:32:04.730 all of this, the chip set now assembled in one place. So this is 00:32:04.730 --> 00:32:08.100 starting to look like an Archimedes. It got video out. It's got keyboard 00:32:08.100 --> 00:32:11.620 interface. It's got some expansion stuff. So this is bring up an early software 00:32:11.620 --> 00:32:17.720 headstart. But very shortly afterwards, we got the a five A500 internal to Acorn. And 00:32:17.720 --> 00:32:21.460 this is really the first Archimedes. This is the prototype Archimedes. Actually got 00:32:21.460 --> 00:32:27.300 a gorgeous gray brick sort of look to it, kind of concrete. It weighs like concrete, 00:32:27.300 --> 00:32:31.480 too, but it has all the hallmarks. It's got the IO interfaces, it's got the 00:32:31.480 --> 00:32:36.810 expansion slots. 
You can see at the back. It runs the same operating 00:32:36.810 --> 00:32:39.550 system. Now, this was used for the OS development. There's only a couple of 00:32:39.550 --> 00:32:44.540 hundred of these made. Well, this is serial 222. So this is one of the last, 00:32:44.540 --> 00:32:50.730 I think. But yeah. Only internal to Acorn. There are lots of nice tweaks to this 00:32:50.730 --> 00:32:55.700 machine. So the hardware team had designed this, Tudor designed this as well as the 00:32:55.700 --> 00:33:01.390 video system. And he said, well, his A500 was the special one in that he had a video 00:33:01.390 --> 00:33:05.409 controller. He'd hand-picked one of the VIDCs so that instead of running 00:33:05.409 --> 00:33:10.855 at 24 MHz it would run at 56; there's some silicon variation in manufacture. So he found a 00:33:10.855 --> 00:33:16.169 56 MHz part so he could do, I think it was, 1024 x 768, which is way out 00:33:16.169 --> 00:33:22.400 of spec for the rest of the Archimedes. So he had the really, really cool machine. 00:33:22.400 --> 00:33:26.050 They also ran some of them at 12 MHz as well instead of 8. This is a massive 00:33:26.050 --> 00:33:30.500 performance improvement. I think it used expensive memory, which is kind of out of 00:33:30.500 --> 00:33:37.180 reach for the product. Right. So believe me, this is the simplified 00:33:37.180 --> 00:33:41.240 circuit diagram. The technical reference manuals are available online if anyone wants 00:33:41.240 --> 00:33:47.969 the complicated one. The main parts of the display are ARM, MEMC, VIDC and some RAM 00:33:47.969 --> 00:33:52.049 and we'll have a little walk through them. So the clocks are generated actually by the 00:33:52.049 --> 00:33:56.815 memory controller. The memory controller gives the clocks to the ARM. The main reason for 00:33:56.815 --> 00:34:00.327 this is that the memory controller has to do some slow things now and then. 
It has 00:34:00.327 --> 00:34:05.860 to open pages of DRAMs, do refresh cycles and things. So it generates 00:34:05.860 --> 00:34:11.559 the clock and it pauses the CPU by stopping that clock from time to time. 00:34:11.559 --> 00:34:15.929 When you do a DRAM access, your address bus is along the top. The ARM outputs an 00:34:15.929 --> 00:34:19.720 address that goes into the MEMC. The MEMC then converts that, it does an address 00:34:19.720 --> 00:34:23.339 translation and then it converts that into row and column addresses suitable for 00:34:23.339 --> 00:34:27.139 DRAM. And then if you're doing a read, the DRAM outputs the data 00:34:27.139 --> 00:34:33.419 onto the data bus, which the ARM then sees. MEMC is the critical path on 00:34:33.419 --> 00:34:37.109 this, but the address flows through MEMC effectively. Notice that MEMC is not on 00:34:37.109 --> 00:34:41.329 the data bus. It just gets addresses flowing through it, this is important later 00:34:41.329 --> 00:34:44.892 on. ROM is another slow thing. 00:34:44.892 --> 00:34:49.204 Another reason why MEMC might slow down the access from the CPU, it works in a 00:34:49.204 --> 00:34:54.099 similar sort of way. There is also a permission check done when you're doing 00:34:54.099 --> 00:35:00.259 the address translation: user permission versus OS, or supervisor. 00:35:00.259 --> 00:35:05.356 And so this information is output as part of the cycle when the ARM does that access. 00:35:05.356 --> 00:35:09.730 If you miss in that translation, you get a page fault or permission fault, then an 00:35:09.730 --> 00:35:13.391 abort signal comes back and you take an exception. 00:35:13.391 --> 00:35:17.410 And the ARM deals with that in software. 00:35:17.410 --> 00:35:22.289 The data bus is a critical path, and so the IO stuff is buffered, it is kept away 00:35:22.289 --> 00:35:27.599 from that. 
So the IO bus is 16 bits and not a lot 32 bit peripherals were around 00:35:27.599 --> 00:35:32.599 in those days. All the peripherals 8 or 16 bits. So that's the right thing to do. 00:35:32.599 --> 00:35:36.150 The IOC decodes that and there's a handshake with MEMC. If it needs more 00:35:36.150 --> 00:35:39.809 time, if it's accessing one of the expansion cards and the expansion card 00:35:39.809 --> 00:35:47.691 has something slow on it then that's dealt with in the IOC. So I mentioned the 00:35:47.691 --> 00:35:53.680 interrupt status that gets funneled into IOC and then back out again. There's a 00:35:53.680 --> 00:35:57.599 VSync interrupt, but not an HSync interrupt. You have to use timers for that, 00:35:57.599 --> 00:36:01.500 really annoyingly. There's one timer and there's a 2 MHz timer available. I 00:36:01.500 --> 00:36:05.199 think I had that in a previous slide, forgot to mention it. So if you want to 00:36:05.199 --> 00:36:09.730 do funny palette switching stuff or copper bars or something - that's possible with the 00:36:09.730 --> 00:36:13.400 timers, it's also simple hardware mod to make a real HSync interrupt as well. 00:36:13.400 --> 00:36:18.529 There's some spare interrupt inputs on the IOC as an exercise for you . So the bit I 00:36:18.529 --> 00:36:23.440 really like about this system, I mentioned that MEMC is not on the data bus. The VIDC 00:36:23.440 --> 00:36:28.079 is only on the data bus and it doesn't have an address bus either. The VIDC is the 00:36:28.079 --> 00:36:31.200 thing responsible for turning the frame buffer into video, reading that frame 00:36:31.200 --> 00:36:35.509 buffer out of RAM, so on. So how does it actually do that RAM read without the 00:36:35.509 --> 00:36:40.780 address? Well, the MEMC contains all of the registers for doing this DMA: the 00:36:40.780 --> 00:36:44.970 start of the frame buffer, the current position and size, and so on. They all 00:36:44.970 --> 00:36:51.410 live in the MEMC. 
So there's a handshake where VIDC sends a request up to the MEMC 00:36:51.410 --> 00:36:55.239 when its FIFO gets low. The MEMC then actually generates the address into the 00:36:55.239 --> 00:37:01.102 DRAM, DRAM outputs that data and then the MEMC gives an acknowledge 00:37:01.102 --> 00:37:05.509 to the ARM... excuse me, too many chips. The MEMC gives an acknowledge to 00:37:05.509 --> 00:37:11.210 VIDC, which then latches that data into the FIFO. So this partitioning is 00:37:11.210 --> 00:37:16.710 quite neat. The video DMA stuff all lives in MEMC and 00:37:16.710 --> 00:37:20.799 there's this kind of split across the two chips. The sound one: I've just 00:37:20.799 --> 00:37:24.839 highlighted one interrupt that comes from MEMC. Sound works exactly the same way, 00:37:24.839 --> 00:37:27.730 except there's a double buffering scheme that goes on. And when one half of it 00:37:27.730 --> 00:37:32.359 becomes empty, you get an interrupt so you can refill that so you don't glitch your 00:37:32.359 --> 00:37:39.700 sound. So this all works really very smoothly. So finally the high-res mono 00:37:39.700 --> 00:37:44.509 thing that I mentioned before: it's quite a novel way they did that. Tudor had realized 00:37:44.509 --> 00:37:49.931 that with one external component, a shift register running very fast, he 00:37:49.931 --> 00:37:53.400 could implement this very high resolution mode without really affecting the rest of 00:37:53.400 --> 00:37:59.276 the chip. So VIDC still runs at 24 MHz, for sort of VGA resolution. It 00:37:59.276 --> 00:38:05.290 outputs on a digital bus that was a test port, originally. It outputs 4 bits. So 4 00:38:05.290 --> 00:38:09.420 pixels in one chunk at 24 MHz and this external component then shifts 00:38:09.420 --> 00:38:13.880 through that at 4 times the speed. There's one component. I mean, this is a 00:38:13.880 --> 00:38:17.569 very cheap way of doing this. 
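The split of video DMA across the two chips can be sketched as a toy model: the memory controller owns the frame buffer pointers and generates addresses, while the video chip only has a FIFO and a request line and never sees an address. The FIFO threshold and burst size below are made-up numbers, and the class names are illustrative rather than anything from Acorn's documentation.

```python
# Toy model of the VIDC/MEMC video DMA handshake described in the talk.
# Threshold and burst size are invented for illustration.
from collections import deque


class Memc:
    def __init__(self, dram, start, end):
        self.dram, self.start, self.end = dram, start, end
        self.pointer = start              # current position in the frame buffer

    def service_request(self, burst):
        """Drive `burst` sequential word addresses; DRAM puts the data on the bus."""
        words = []
        for _ in range(burst):
            words.append(self.dram[self.pointer])
            self.pointer += 1
            if self.pointer == self.end:  # wrap at the end of the frame buffer
                self.pointer = self.start
        return words


class Vidc:
    def __init__(self, memc, threshold=4, burst=4):
        self.memc, self.threshold, self.burst = memc, threshold, burst
        self.fifo = deque()

    def next_word(self):
        # Raise a request when the FIFO runs low, then latch what comes back.
        while len(self.fifo) < self.threshold:
            self.fifo.extend(self.memc.service_request(self.burst))
        return self.fifo.popleft()


dram = list(range(100, 116))              # a 16-word "frame buffer"
vidc = Vidc(Memc(dram, start=0, end=16))
stream = [vidc.next_word() for _ in range(20)]
print(stream)
```

The point of the sketch is the division of labour: `Vidc` holds no address state at all, which mirrors why the real chip needs no address pins.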
And as I said, this high-res mode is very 00:38:17.569 --> 00:38:23.009 unusual for machines of this era. I've got a feeling an A500 the top end 00:38:23.009 --> 00:38:26.979 machine, if anyone's got one of these and wants to try this trick then please get in 00:38:26.979 --> 00:38:31.080 touch, I've got a feeling an A500 will do 1280 x 1024 by 00:38:31.080 --> 00:38:35.750 overclocking this. I think all of the parts survive it. But for some reason, 00:38:35.750 --> 00:38:40.369 Acorn didn't support that on the board. And finally, clock selection: VIDC on 00:38:40.369 --> 00:38:44.839 some of the machines has quite a flexible set of clocks for different resolutions, 00:38:44.839 --> 00:38:51.170 basically. So MEMC is not on the data bus. How do we program it? It's got registers 00:38:51.170 --> 00:38:55.259 for DMA and it's got all this address translation. So the memory map I showed 00:38:55.259 --> 00:39:00.909 before has an 8 MB space reserved for the address translation registers. It 00:39:00.909 --> 00:39:04.690 doesn't have 8 MB of it. I mean, it doesn't have two million 32 bit registers 00:39:04.690 --> 00:39:09.819 behind there, which is a hint of what's going on here. So what you do is you write 00:39:09.819 --> 00:39:14.410 any value to this space and you encode the information that you want to put into one 00:39:14.410 --> 00:39:19.539 of these registers in the address. So this address, the top three bits are 1 - it's 00:39:19.539 --> 00:39:25.230 in the top 8 MB of the 64 MB address space and you format your 00:39:25.230 --> 00:39:28.999 logical physical page information in this address and then you write any byte 00:39:28.999 --> 00:39:35.479 effectively. This sort of feels really dirty, but it's also really a very nice 00:39:35.479 --> 00:39:39.779 way of doing it because there's no other space in the address map. And this leads 00:39:39.779 --> 00:39:45.069 to the price balance. 
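The "data in the address" trick can be illustrated with a small sketch. The base address follows from the talk (the top three of 26 address bits set means 0x3800000, the start of the top 8 MB), but the field positions and widths below are hypothetical and chosen only to show the idea; they are not the real MEMC bit layout, and the function names are invented.

```python
# Sketch of programming a chip that is not on the data bus: the value to store
# is packed into the address of a write, and the data written is ignored.
# Field layout is HYPOTHETICAL, not the real MEMC encoding.

TRANSLATION_BASE = 0x3800000          # top three of 26 address bits set
PHYS_SHIFT, PHYS_BITS = 0, 7          # hypothetical: 128 physical pages
LOGICAL_SHIFT, LOGICAL_BITS = 7, 10   # hypothetical: 1024 logical pages


def encode_mapping_write(physical_page, logical_page):
    """Return the address to write (with any data) to set one mapping."""
    if not 0 <= physical_page < (1 << PHYS_BITS):
        raise ValueError("physical page out of range")
    if not 0 <= logical_page < (1 << LOGICAL_BITS):
        raise ValueError("logical page out of range")
    return (TRANSLATION_BASE
            | (logical_page << LOGICAL_SHIFT)
            | (physical_page << PHYS_SHIFT))


def decode_mapping_write(address):
    """What the controller could recover from the address lines alone."""
    if address & TRANSLATION_BASE != TRANSLATION_BASE:
        return None                   # not in the register window
    physical_page = (address >> PHYS_SHIFT) & ((1 << PHYS_BITS) - 1)
    logical_page = (address >> LOGICAL_SHIFT) & ((1 << LOGICAL_BITS) - 1)
    return physical_page, logical_page


addr = encode_mapping_write(physical_page=5, logical_page=3)
print(hex(addr), decode_mapping_write(addr))
```

Whatever the exact layout, the consequence is the same: the controller needs only the address pins it already has, so no data bus connection is required just to program it.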
So it's not worth having an address bus going into 00:39:45.069 --> 00:39:49.809 MEMC costing 32 more pins just to write these registers as opposed to playing this 00:39:49.809 --> 00:39:55.849 sort of trick. If you have that address bus just for that data bus, just for 00:39:55.849 --> 00:39:59.990 that, then you have to get to a more expensive package. And this was 00:39:59.990 --> 00:40:05.140 really in their minds: a 68 pin chip versus an 84 pin chip. It was a big deal. 00:40:05.140 --> 00:40:08.719 So everything they really strived to make sure it was in the very smallest 00:40:08.719 --> 00:40:13.250 package possible. And this system partitioning effort led to these sorts of 00:40:13.250 --> 00:40:22.890 tricks to then program it. So on the A540, we get multiple MEMCs. Each one is 00:40:22.890 --> 00:40:27.329 assigned a colored stripe here of the physical address space. So you have a 00:40:27.329 --> 00:40:31.049 16 MB space, each one looks after 4 MB of it. But then when you do a 00:40:31.049 --> 00:40:36.039 virtual access in the bottom half of the user space, regular program access, all of 00:40:36.039 --> 00:40:39.362 them light up and all of them will translate that address in parallel. And 00:40:39.362 --> 00:40:43.663 one of them hopefully will translate and then energize the RAM to do the read, for 00:40:43.663 --> 00:40:49.930 example. When you put an ARM 3 in this system, the ARM 3 has its cache and then 00:40:49.930 --> 00:40:54.420 the address leads into the MEMC. So then that means that the address is being 00:40:54.420 --> 00:40:58.240 translated outside of the cache or after the cache. So your caching virtual 00:40:58.240 --> 00:41:02.900 addresses and as we all know, this is kind of bad for performance because whenever 00:41:02.900 --> 00:41:07.459 you change that virtual address space, you have to invalidate your cache. Or tag it, 00:41:07.459 --> 00:41:11.459 but they didn't do that. There's other ways of solving this problem. 
Basically on this 00:41:11.459 --> 00:41:14.950 machine, what you need to do is invalidate the whole cache. It's quite a quick 00:41:14.950 --> 00:41:23.540 operation, but it's still not good for performance to have an empty cache. The 00:41:23.540 --> 00:41:28.393 only DMA present in the system is for the video, for the video and sound. I/O 00:41:28.393 --> 00:41:32.569 doesn't have any DMA at all. And this is another area where as younger engineer 00:41:32.569 --> 00:41:35.969 "crap, why didn't they have DMA? That would be way better." DMA is the solution 00:41:35.969 --> 00:41:40.989 to everyone's problems, as we all know. And I think the quote on the right 00:41:40.989 --> 00:41:47.390 ties in with the ACORN team's discovery that all of these other processes needed 00:41:47.390 --> 00:41:51.969 quite complex chipsets, quite expensive support chips. So the quote on the right 00:41:51.969 --> 00:41:56.539 says that if you've got some chips, that vendors will be charging more for their 00:41:56.539 --> 00:42:03.259 DMA devices even than the CPU. So not having dedicated DMA engine on board is a 00:42:03.259 --> 00:42:08.930 massive cost saving. The comment I made on the previous 2 slides about the system 00:42:08.930 --> 00:42:14.440 partitioning, putting a lot of attention into how many pins were on one chip versus 00:42:14.440 --> 00:42:19.380 another, how many buses were going around the place. Not having IOC having to access 00:42:19.380 --> 00:42:25.019 memory was a massive saving in cost for the number of pins and the system as a 00:42:25.019 --> 00:42:33.539 whole. The other thing is the FIQ mode was effectively the means for doing IO. 00:42:33.539 --> 00:42:37.999 Therefore, FIQ Mode was designed to be an incredibly low overhead way of doing 00:42:37.999 --> 00:42:44.010 programed IO, having the CPU do the IO. 
So this was saying that the CPU is 00:42:44.010 --> 00:42:48.850 going to be doing all of the IO stuff, but lets just optimize it, let's make it make 00:42:48.850 --> 00:42:53.930 it as good as it could be and that's what led to the programmed IO. I also 00:42:53.930 --> 00:42:57.849 remember ARM 2 didn't have a cache. If you don't have a cache on your CPU then 00:42:57.849 --> 00:43:03.099 DMA is going to hold up the CPU anyway, so no cycles. DMA is not any 00:43:03.099 --> 00:43:06.960 performance gain. You may as well get the CPU to do it and then get the CPU to 00:43:06.960 --> 00:43:13.029 do it in the lowest overhead way as possible. I think this can be summarized as bringing 00:43:13.029 --> 00:43:17.410 the "RISC principles" to the system. So the RISC principle, say for your CPU, you 00:43:17.410 --> 00:43:21.420 don't put anything in the CPU that you can do in software and this is saying, okay, 00:43:21.420 --> 00:43:26.789 we'll actually software can do the IO just as well without a cache as the DMA 00:43:26.789 --> 00:43:29.799 system. So let's get software to do that. And I think this is a kind of a nice way 00:43:29.799 --> 00:43:34.339 of seeing it. This is part of the cost optimization for really very little 00:43:34.339 --> 00:43:39.910 degradation in performance compared to doing in hardware. So this is an IO card. 00:43:39.910 --> 00:43:43.380 The euro cards then nice and easy. The only thing I wanted to say here was this 00:43:43.380 --> 00:43:48.839 is my SCSI card and it has a ROM on the left hand side. And so. This is the 00:43:48.839 --> 00:43:53.731 expansion ROM basically many, many years before PCI made this popular. Your drivers 00:43:53.731 --> 00:43:58.950 are on this ROM. This is a SCSI disc plugging into this and you can plug this 00:43:58.950 --> 00:44:02.990 card in and then boot off the disk. You don't need any other software to make it 00:44:02.990 --> 00:44:07.670 work. So this is just a very nice user experience. 
There is no messing around 00:44:07.670 --> 00:44:11.690 with configuring IO windows or interrupts or any of the ISA sort of stuff that was 00:44:11.690 --> 00:44:17.869 going on at the time. So to summarize some of the hardware stuff that we've seen, 00:44:17.869 --> 00:44:21.950 the ARM is pipelined and it has the load-store-multiple instructions which make 00:44:21.950 --> 00:44:27.950 for a very high bandwidth utilization. That's what gives it its high performance. 00:44:27.950 --> 00:44:32.670 The machine was really simple. So attention to detail about separating, 00:44:32.670 --> 00:44:37.239 partitioning the work between the chips and reducing the chip cost as much as 00:44:37.239 --> 00:44:44.569 possible. Keeping that balanced was really a good idea. The machine was designed when 00:44:44.569 --> 00:44:49.400 memory and CPUs were about the same speed. So this is before that kind of flipped 00:44:49.400 --> 00:44:52.910 over. An 8 MHz ARM 2 was designed to use 8 MHz memory. 00:44:52.910 --> 00:44:56.509 There's no need to have a cache at all on there. These days it sounds really crazy 00:44:56.509 --> 00:45:01.410 not to have a cache on the CPU, but if your memory is not that much slower, then this 00:45:01.410 --> 00:45:07.809 is a huge cost saving, but it is also a risk saving. This was the first real proper CPU. 00:45:07.809 --> 00:45:11.670 If we don't count ARM 1, so say ARM 1 was a test, but ARM 2 is, you know, the 00:45:11.670 --> 00:45:16.490 first product CPU. And having a cache on that would have been a huge risk for a 00:45:16.490 --> 00:45:20.640 design team that hadn't dealt with structures that complicated at that 00:45:20.640 --> 00:45:22.599 point. So that was the right thing to do, I think, 00:45:22.599 --> 00:45:25.569 and I talked about DMA. I'm actually 00:45:25.569 --> 00:45:28.636 a convert on this. I thought this was crap. And actually, I think this was a really 00:45:28.636 --> 00:45:33.319 good example of balanced design. 
What's the right tool for the job? Software is 00:45:33.319 --> 00:45:37.757 going to do the IO, so let's make sure that FIQ mode, it makes sure that 00:45:37.757 --> 00:45:44.640 there's low overhead as possible. We talked about system partitioning. The MMU. 00:45:44.640 --> 00:45:49.299 I still think it's weird and backward. I think there is a 00:45:49.299 --> 00:45:56.029 strong argument though that a more familiar TLB is a massively complicated 00:45:56.029 --> 00:45:59.339 compared to what they did here. And I think the main drive here was not just 00:45:59.339 --> 00:46:06.120 area on the chip, but also to make it much simpler to implement. So it worked. And I 00:46:06.120 --> 00:46:09.450 think this was they really didn't have that many shots of doing this. This wasn't 00:46:09.450 --> 00:46:14.779 a company or a team that could afford to have many goes at this product. And I 00:46:14.779 --> 00:46:20.660 think that says it all. I think they did a great job. Okay. So the OS story is a 00:46:20.660 --> 00:46:24.599 little bit more complicated. Remember, it's gonna be this office automation 00:46:24.599 --> 00:46:28.920 machine a bit like a Xerox star. Was going to have this wonderful high res mono mode 00:46:28.920 --> 00:46:33.729 and people gonna be laser printing from it. So just like Xerox PARC, Acorn started 00:46:33.729 --> 00:46:37.911 Palo Alto based research center. Californians and beanbags writing an 00:46:37.911 --> 00:46:43.319 operating system using a micro kernel in Modula-2 all of the trendy boxes ticked 00:46:43.319 --> 00:46:49.400 here for the mid 80s. It was by the sounds a very advanced operating system and it 00:46:49.400 --> 00:46:54.029 did virtual memory and so on, is very resource hungry, though. And it was never 00:46:54.029 --> 00:47:00.130 really very performant. Ultimately, the hardware got done quicker than the 00:47:00.130 --> 00:47:05.403 software. And after a year or two. Management got the jitters. 
Hardware was 00:47:05.403 --> 00:47:09.320 looming and said, well, next year we're going to have the computer ready. Where's 00:47:09.320 --> 00:47:13.170 the operating system? And the project got canned. And this is a real shame. I'd love 00:47:13.170 --> 00:47:16.599 to know more about this operating system. Virtually nothing is documented outside of 00:47:16.599 --> 00:47:21.569 Acorn. Even the people, I spoke to, didn't work on this. A bunch of people in 00:47:21.569 --> 00:47:25.250 California that kind of disappeared with it. So if anyone has this software 00:47:25.250 --> 00:47:29.259 archived anywhere, then get in touch. Computer Museum around the corner from me 00:47:29.259 --> 00:47:35.369 is raring to go on that. That'll be really cool thing to archive. So anyway, they 00:47:35.369 --> 00:47:39.979 had now a desperate situation. They had to go to Plan B, which was in under a year write 00:47:39.979 --> 00:47:43.239 an operating system for the machine that was on its way to being delivered. 00:47:43.239 --> 00:47:48.260 And it kind of shows Arthur was I mean, I think the team did a really good job in 00:47:48.260 --> 00:47:53.160 getting something out of the door in half a year, but it was a little bit flaky. 00:47:53.160 --> 00:47:57.160 RISC OS then a year later, developed from Arthur. I don't know if anyone's 00:47:57.160 --> 00:48:01.609 heard of RISC OS, but Arthur is very, very niche and basically got 00:48:01.609 --> 00:48:07.170 completely replaced by RISC OS because it was a bit less usable than RISC OS. 00:48:07.170 --> 00:48:12.059 Another really strong point that this had it's quite a big ROM. So 2 MB going 00:48:12.059 --> 00:48:17.400 up...sorry, 0,5 MB in the 80s going up to 2 MB in the early 90s. 00:48:17.400 --> 00:48:21.739 There's a lot of stuff in ROM. One of those things is BBC Basic 5. I know 00:48:21.739 --> 00:48:29.289 it's 2019, and I know Basic is basic, but BBC Basic is actually quite good. 
It has 00:48:29.289 --> 00:48:32.859 procedures and it's got support for all the graphics and sound. You could write GUI 00:48:32.859 --> 00:48:36.660 applications in Basic and a lot of people did. It's also very fast. So Sophie Wilson 00:48:36.660 --> 00:48:42.920 wrote this very, very optimized Basic interpreter. I talked about the modules 00:48:42.920 --> 00:48:45.589 and podules. These are the expansion ROM things. And a really great user 00:48:45.589 --> 00:48:50.589 experience there. But speaking of user experience, this was Arthur. I never used 00:48:50.589 --> 00:48:57.969 Arthur. I just dug out a ROM and had a play with it. It's bloody horrible. So that 00:48:57.969 --> 00:49:03.819 went away quickly. At the time also. So part of this emergency plan B was to take 00:49:03.819 --> 00:49:08.210 the Acornsoft team who were supposed to be writing applications for this and get 00:49:08.210 --> 00:49:12.079 them to quickly knock out an operating system. So at launch, basically, this is 00:49:12.079 --> 00:49:15.750 one of the only things that you could do with the machine. It had a great demo called 00:49:15.750 --> 00:49:20.569 Lander, of a great game called Zarch, which is 3D space. You could fly around. 00:49:20.569 --> 00:49:27.029 It didn't have serious business applications. And, you know, 00:49:27.029 --> 00:49:31.079 there was not much you could do with this really expensive machine at launch and 00:49:31.079 --> 00:49:35.450 that really hurt it, I think. So then we get RISC OS 2 in 1988 and this is now 00:49:35.450 --> 00:49:42.219 looking less like a vomit sort of thing, a much nicer machine. And then eventually 00:49:42.219 --> 00:49:46.749 RISC OS 3. It has drag and drop between applications. It's all multitasking, 00:49:46.749 --> 00:49:52.849 does outline font anti-aliasing and so on.
So just lastly, I want to 00:49:52.849 --> 00:49:55.769 quickly touch on the really interesting operating systems. Acorn had a Unix 00:49:55.769 --> 00:49:59.079 operating system. So as well as being a geek, I'm also a Unix geek and I've always 00:49:59.079 --> 00:50:04.609 been fascinated by RISC iX. These machines are astonishingly expensive. They were 00:50:04.609 --> 00:50:08.191 the existing Archimedes machines with a different sticker on. So that's an A540 with 00:50:08.191 --> 00:50:14.850 a sticker on the front. And this OS was developed after the Archimedes was 00:50:14.850 --> 00:50:18.359 already designed. So 00:50:18.359 --> 00:50:20.950 there's a lot of stuff about the hardware that wasn't quite right for a Unix 00:50:20.950 --> 00:50:26.230 operating system. 32K page size on a 4 megabyte machine really, really killed you 00:50:26.230 --> 00:50:29.900 in terms of your page cache and that kind of thing. They turned this into a bit 00:50:29.900 --> 00:50:35.089 of an opportunity. At least they made good on some of this. There was quite a novel 00:50:35.089 --> 00:50:42.380 online decompression scheme for you to demand-page text from a binary 00:50:42.380 --> 00:50:46.170 and it would decompress into your 32K page, but it was stored in a 00:50:46.170 --> 00:50:53.659 sparse way on disk. So actually on-disk use was a lot less than you'd expect. It was the 00:50:53.659 --> 00:50:56.638 only way it fit on some of the smaller machines. 00:50:56.638 --> 00:51:02.160 Also, Acorn TechL, the department that designed the Cybertruck, it turns out. 00:51:02.160 --> 00:51:06.228 This was their view of the A680, which is an unreleased workstation. 00:51:06.228 --> 00:51:08.940 I love this picture. I like that piece of cheese or 00:51:08.940 --> 00:51:13.379 cake as the mouse. That's my favorite part. But this is the real machine.
So 00:51:13.379 --> 00:51:18.730 this is an unreleased prototype I found at the computer museum. It's notable, and 00:51:18.730 --> 00:51:22.130 it's got 2 MEMCs. It's got 8 MB of RAM. It's only designed to run RISC iX, 00:51:22.130 --> 00:51:26.099 the Unix operating system, and has a high-res monitor only, doesn't have color. It was 00:51:26.099 --> 00:51:30.279 designed to run FrameMaker and drive laser printers and be a kind of desktop 00:51:30.279 --> 00:51:35.249 publishing workstation. I've always been fascinated by RISC iX, as I said. A while 00:51:35.249 --> 00:51:41.450 ago I hacked around on ArcEm for a while. I got it booting in ArcEm. I'd never seen 00:51:41.450 --> 00:51:46.640 this before. I never used a RISC iX machine. So there we go, it boots, it is 00:51:46.640 --> 00:51:51.130 multi-user. But wait, there's more. It has a really cool X server, a very fast one. I 00:51:51.130 --> 00:51:54.730 think Sophie Wilson again worked on the X server here. So it's very well 00:51:54.730 --> 00:51:58.019 optimized and very fast for a machine of its era. And it makes quite a nice little 00:51:58.019 --> 00:52:02.900 Unix workstation. It's quite a cool little system. By the way, Tudor, the guy that 00:52:02.900 --> 00:52:07.099 designed the VIDC and the IO system, called me a sado for getting this working in 00:52:07.099 --> 00:52:14.150 there. That's my claim to fame. Finally, and I want to leave some time for 00:52:14.150 --> 00:52:19.510 questions. There's a lot of useful stuff in ROM. One of them is BBC Basic. Basic 00:52:19.510 --> 00:52:23.009 has an assembler so you can walk up to this machine with a floppy disk and write 00:52:23.009 --> 00:52:29.529 assembler. It has a special bit of syntax there and then you can just call it. And 00:52:29.529 --> 00:52:32.460 so this is really powerful. So at school or something with the floppy disk, you can 00:52:32.460 --> 00:52:37.199 do something that's a bit more than basic programming.
Bizarrely, I mostly wrote that 00:52:37.199 --> 00:52:41.420 with only two or three tiny syntax errors after about 20 years away from this. It's 00:52:41.420 --> 00:52:46.059 in there somewhere. Legacy wise, the machine didn't sell very many, under a 00:52:46.059 --> 00:52:50.930 hundred thousand easily. I don't think it really made a massive impact. PCs had 00:52:50.930 --> 00:52:54.640 already taken off by then. The ARM processor, I'm not going to go on about the 00:52:54.640 --> 00:52:58.920 company. It's clear that that obviously has changed the world in many 00:52:58.920 --> 00:53:04.140 ways. The thing I really took away from this exercise was that a handful of smart 00:53:04.140 --> 00:53:10.089 people, not that many, on the order of a dozen, designed multiple chips, designed a custom 00:53:10.089 --> 00:53:14.599 computer from scratch, got it working. And it was quite good. And I think that this 00:53:14.599 --> 00:53:17.380 really turned people's heads. It made people think differently, that people 00:53:17.380 --> 00:53:21.160 that were not Motorola and IBM, really, really big companies with enormous 00:53:21.160 --> 00:53:27.479 resources, could do this and could make it work. I think actually that led to the 00:53:27.479 --> 00:53:30.809 thinking that people could design their systems on a chip in the 90s and that 00:53:30.809 --> 00:53:35.309 market taking off. So I think this is really key in getting people thinking that 00:53:35.309 --> 00:53:40.420 way. It was possible to design your own silicon. And finally, I just want to thank 00:53:40.420 --> 00:53:45.279 the people I spoke to, and Adrian and Jason at the Centre for Computing History in 00:53:45.279 --> 00:53:49.049 Cambridge. If you're in Cambridge, then please visit there. It's a really cool 00:53:49.049 --> 00:53:56.270 museum. And with that, I'll wrap up. If there's any time for questions, then... I'm 00:53:56.270 --> 00:53:58.356 getting a blank look. No time for questions?
00:53:58.356 --> 00:54:01.890 Herald: There's about 5 minutes left for questions. 00:54:01.890 --> 00:54:07.880 Matt: Fantastic! Or come up to me afterwards. I'm happy to chat more about this. 00:54:07.880 --> 00:54:18.940 applause Herald: The first question is for the 00:54:18.940 --> 00:54:29.799 Internet. Signal Angel, will you? Well, grab your microphone, and we'll get the 00:54:29.799 --> 00:54:36.700 first from the audience in the room here. There, at that microphone, please ask a question. 00:54:36.700 --> 00:54:44.130 Mic1: You mentioned that the system is making good use of the memory, but how is 00:54:44.130 --> 00:54:50.459 that actually not completely being stalled on memory? Having no cache and the 00:54:50.459 --> 00:54:55.450 same cycle time for the memory as for the CPU. 00:54:55.450 --> 00:55:01.000 M: Good question. So how is it not always stalled on memory? I mean. Well, it's 00:55:01.000 --> 00:55:04.390 sometimes stalled on memory when you do something that's non-sequential. You have 00:55:04.390 --> 00:55:08.869 to take one of the slow cycles. This was the N cycle. The key is you try and 00:55:08.869 --> 00:55:11.469 maximize the amount of time that you're doing sequential stuff. 00:55:11.469 --> 00:55:16.220 So on the ARM 2 you wanted to unroll loops as much as possible. So you're fetching 00:55:16.220 --> 00:55:19.799 your instructions sequentially, right? You wanted to make as much use of load-store 00:55:19.799 --> 00:55:24.290 multiples as possible. You could load single registers with an individual register load, but it 00:55:24.290 --> 00:55:28.710 was much more efficient to pay that cost just once at the start of the instruction and 00:55:28.710 --> 00:55:33.619 then stream stuff sequentially. So you're right that it is still stalled sometimes, 00:55:33.619 --> 00:55:37.141 but that was still a good tradeoff, I think, for a system that 00:55:37.141 --> 00:55:40.549 didn't have a cache for other reasons. M1: Thanks.
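NOTE The tradeoff in the answer above (pay the slow non-sequential N cycle once, then stream sequential S cycles) can be sketched as a toy cost model. This is only an illustration under stated assumptions: the cycle lengths and the per-instruction cycle counts below are assumed values for an 8 MHz ARM2-style system and are not figures from the talk, and the function names are made up for this sketch.
```python
# Toy cost model for single-register loads versus one load-multiple.
# ASSUMPTIONS (not from the talk): cycle lengths and cycle counts below.
S_NS = 125  # assumed length of a sequential (S) cycle, in nanoseconds
N_NS = 250  # assumed length of a non-sequential (N) cycle
I_NS = 125  # assumed length of an internal (I) cycle
def single_loads_ns(n):
    """Time for n separate single-register loads, each assumed to cost 1 S + 1 N + 1 I."""
    return n * (S_NS + N_NS + I_NS)
def load_multiple_ns(n):
    """Time for one load-multiple of n registers, assumed to cost n S + 1 N + 1 I."""
    return n * S_NS + N_NS + I_NS
if __name__ == "__main__":
    for n in (1, 4, 8):
        print(n, single_loads_ns(n), load_multiple_ns(n))
```
Under these assumed numbers the two approaches cost the same for one register, and the load-multiple pulls ahead as the register count grows, because the N-cycle penalty is paid once per instruction instead of once per register.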
00:55:40.549 --> 00:55:45.140 Herald: Next question is for the Internet. Signal Angel: Are there any Acorns on 00:55:45.140 --> 00:55:49.839 sale right now or if you want to get into this kind of hardware where do you get it? 00:55:49.839 --> 00:55:52.810 Herald: Can you repeat the first sentence, please? Sorry, the first part. 00:55:52.810 --> 00:55:56.259 S: If you want to get into this kind of hardware right now, if you want to buy it 00:55:56.259 --> 00:55:58.839 right now. M: Yeah, good question. How do you 00:55:58.839 --> 00:56:06.359 get hold of one? Drive prices up on eBay, I guess. I hate to say it. Might be fun to 00:56:06.359 --> 00:56:09.170 play around in emulators. I always prefer to hack around on the 00:56:09.170 --> 00:56:12.309 real thing. Emulators always feel a bit strange. There are a bunch of really good 00:56:12.309 --> 00:56:19.180 emulators out there. Quite complete. Yeah, I think I would just go on 00:56:19.180 --> 00:56:23.260 auction sites and try and find one. Fortunately, they're not completely 00:56:23.260 --> 00:56:27.829 rare. I mean that's the thing, they did sell. Not quite sure of the exact figure, 00:56:27.829 --> 00:56:31.500 but you know, there were tens and tens of thousands of these things made. So I would 00:56:31.500 --> 00:56:35.130 look also in Britain more than elsewhere. Although I do understand that Germany had 00:56:35.130 --> 00:56:40.170 quite a few. If you can get a hold of one, though, I do suggest doing so. I think 00:56:40.170 --> 00:56:46.259 they're really fun to play with. Herald: OK, next question. 00:56:46.259 --> 00:56:51.860 M2: So I found myself looking at the documentation for the LDM/STM instructions 00:56:51.860 --> 00:56:58.049 while debugging something on ARM just last week. And I just wonder, what's your 00:56:58.049 --> 00:57:04.029 thought?
Are there any quirks of the Archimedes that have crept into the modern 00:57:04.029 --> 00:57:06.900 ARM design and instruction set that you are aware of? 00:57:06.900 --> 00:57:13.449 M: Most of them got purged. So there is the 26-bit addressing. There were a 00:57:13.449 --> 00:57:19.409 couple of strange uses of, there is an XOR instruction into PC for changing flags. So 00:57:19.409 --> 00:57:25.160 there was a great purge when the ARM 6 was designed, and the ARM 6, I should know 00:57:25.160 --> 00:57:31.559 this, ARM v3, that's got 32-bit addressing and lost this. These weirdnesses 00:57:31.559 --> 00:57:35.690 got moved out. I can't think of any, aside from just the 00:57:35.690 --> 00:57:40.619 resulting ARM 32 instruction set being quite quirky and having a lot of good 00:57:40.619 --> 00:57:46.789 quirks. The shifted register is sort of a free thing you can do. For example, you 00:57:46.789 --> 00:57:52.059 can add one register to a shifted register in one cycle. I think that's a good quirk. 00:57:52.059 --> 00:57:55.119 So in terms of inheriting that instruction set and not changing those 00:57:55.119 --> 00:58:05.959 things. Maybe that counts? Herald: Any further questions? Internet, 00:58:05.959 --> 00:58:11.439 any new questions? No? Okay, so in that case one round of applause for Matt Evans. 00:58:11.439 --> 00:58:13.579 M: Thank you. 00:58:13.579 --> 00:58:21.142 applause 00:58:21.142 --> 00:58:27.679 postroll music 00:58:27.679 --> 00:58:43.658 Subtitles created by c3subtitles.de in the year 2021. Join, and help us!