36c3 preroll music Herald: Our next talk will be "The Ultimate Acorn Archimedes Talk", which will cover everything about the Archimedes computer. There's a promise in advance that there will be no eureka jokes in there. Give a warm welcome to Matt Evans. applause Matt Evans: Thank you. Okay. A little bit of retro computing first thing in the morning, sort of. Welcome. My name is Matt Evans. The Acorn Archimedes was my favorite computer when I was a small hacker, and I'm privileged to be able to talk a little bit about it with you today. Let's start with: What is an Acorn Archimedes? So I'd like an interactive session, I'm afraid. Please indulge me with a show of hands. Who's heard of the Acorn Archimedes before? Ah, OK, maybe 50, 60%. Who has used one? Maybe 10%, maybe. Okay. Who has programmed - who has coded on an Archimedes? Maybe half? Two, three people. Great. Okay. Three. laughs Okay, so a small percentage. I don't see these machines as being as famous as, say, the Apple Macintosh or IBM PC. And certainly outside of Europe they were not that common. So this is kind of interesting, just how many people here have seen this. So it was the first ARM-based computer. This is an astonishingly 1980s picture - I think one of them is drawing, actually. But it's not just the first ARM-based machine, but the machine that the ARM was originally designed to drive. It's a... Is that a comment for me? Mic? I'm being heckled already. It's only slide two. Let's see how this goes. So it's a two-box computer. It looks a bit like a Mega ST... to me. There's a main unit with the processor and disks and expansion cards and so on. Now this is an A3000. This is mine, in fact, and I didn't bother to clean it before taking the photo. And now it's on this huge screen. That was a really bad idea. You can see all the disgusting muck in the keyboard. It has a bit of ink on it, I don't know why. But this machine is 30 years old.
And this was luckily my machine, as I said, as a small hacker. And this is why I'm doing the talk today. This had a big influence on me - I'd like to say as a person, but more as an engineer, in terms of my programming experience when I was learning to program and so on. So I live and work in Cambridge in the UK, where this machine was designed. And through a funny sort of turn of events, I ended up there and actually work in the building next to the building where this was designed. And a bunch of the people that were on that original team that designed this system are still around and relatively contactable. And I thought this is a good opportunity to get on the phone and call them up, or go for a beer with a couple of them, and ask them: Why are things the way they are? There are all sorts of weird quirks to this machine. I was always wondering this, for 20 years. Can you please tell me - why did you do it this way? And they were a really good bunch of people. So I talked to Steve Furber, who led the hardware design; Sophie Wilson, who was the same with software; Tudor Brown, who did the video system; Mike Muller, the IO system; and John Biggs and Jamie Urquhart, who did the silicon design. I've spoiled one of the surprises here: there's been some silicon design that's gone on in building this Acorn. And they were all wonderful people that gave me their time and told me a bunch of anecdotes that I will pass on to you. So I'm going to talk about the classic Arc. There's a bunch of different machines that Acorn built into the 1990s, but the ones I'm talking about started in 1987. There were 2 models, effectively a low end and a high end. One had an option for a hard disk, 20 megabytes, 2300 pounds, up to 4MB of RAM. They all share the same basic architecture; they're all basically the same. So the A3000 that I just showed you came out in 1989. That was the machine I had. Those, again, are the same. It had the memory controller slightly updated and was slightly faster.
They all had an ARM2. This was the released version of the ARM processor designed for this machine, at 8 MHz. And then finally in 1990, what I call the last of the classic Archimedes, the A540. This was the top end machine - it could have up to 16 MB of memory, which is a fair bit even in 1990. It had a 30 MHz ARM3. The ARM3 was the evolution of the ARM2, but with a cache, and a lot faster. So this talk will be centered around how these machines work, not the more modern machines. So around 1987, what else was available? This is a random selection of machines - apologies if your favorite machine is not on this list; it wouldn't fit on the slide otherwise. So at the start of the 80s, we had the exotic things like the Apple Lisa and the Apple Mac. Very expensive machines. The Amiga I had to put in here. It started off relatively expensive, but the Amiga 500 was, you know, very good value for money, a very capable machine. But I'm comparing this more to PCs and Macs, because that was the sort of, you know, market it was going for. And although it was an expensive machine, compared to the Macintosh it was pretty cheap. I even put the NeXT Cube on there. I'd heard that they were incredibly expensive, and actually, compared to the Macintosh, they're not that expensive at all. Well, I don't know which one I would have preferred. So the first question I asked them - the first thing they told me: why was it built? I'd used them in school and, as I said, had one at home. But I was never really quite sure what it was for. And I think a lot of the Acorn marketing wasn't quite sure what it was for either. They told me it was the successor to the BBC Micro, this 8-bit machine. Lovely 6502 machine, incredibly popular, especially in the UK. And the goal was to make a machine that was 10 times the performance of this. The successor would be 10 times faster at the same price. And the thing I didn't know is they had been inspired.
The team at Acorn had seen the Apple Lisa and the Xerox Star, which descends from the famous Xerox Alto from Xerox PARC, the first GUI workstation, in the 70s - a monumental machine. They'd been inspired by these machines and they wanted to make something very similar. So this is the same story as the Macintosh. They wanted to make a desktop machine for business, for office automation, desktop publishing and that kind of thing. But I never really understood this before. So this inspiration came from the Xerox machines. It was supposed to be obviously a lot more affordable and a lot faster. So this is what happens when Acorn marketing gets hold of this vision. So the Xerox Star on the left is this nice, sensible business machine - someone's wearing a nice, crisp suit bangs microphone - and it gets turned into the very Cambridge tweed version on the right. It's apparently illegal to program one of these if you're not wearing a top hat. But no one told me that when I was a kid. And my court case comes up next week. So Cambridge is a bit of a funny place. And for those that have been there, this picture on the right sums it all up. So they began Project A, which was to build this new machine. And they looked at the alternatives. They looked at the processors that were available at that time: the 286, the 68K, the National Semiconductor 32016, which was an early 32-bit machine, a bit of a weird processor. And they all had something in common: they were ridiculously expensive and, in Tudor's words, a bit crap. They weren't a lot faster than the BBC Micro. They were a lot more expensive. They were much more complicated in terms of the processor itself, but also the system around them was very complicated. They needed lots of weird support chips. This just drove up the price of the system, and it wasn't going to hit that 10 times performance, let alone at the same price point. They'd visited a couple of other companies designing their own custom silicon.
They got this idea in about 1983. They were looking at some of the RISC papers coming out of Berkeley and they were quite impressed by what a bunch of grad students were doing - they'd managed to get a working RISC processor. And they went to the Western Design Center and looked at the 6502 successors being designed there. They had a positive experience. They saw a bunch of high school kids with Apple IIs doing silicon layout. And they thought, "OK, well". They'd never designed a CPU before at Acorn. Acorn hadn't done any custom silicon to this degree, but they were buoyed by this and they thought, okay, well, maybe RISC is the secret and we can do this. And this was not really the done thing in this timeframe, and not for a company the size of Acorn, but they designed their computer from scratch. They designed all of the major pieces of silicon in this machine. And it wasn't about designing the ARM chip - hey, we've got a processor core, what should we do with it? - which is what ARM and the history of that company has kind of benefited from. This was all about designing the machine as a whole. They were a tiny team. A handful of people - about a dozen...ish - did the hardware design, a similar sort of order for software and operating systems on top, which is orders of magnitude different from IBM and Motorola and so forth that were designing computers at this time. RISC was the key. It needed to be incredibly simple. One of the other experiences they had was they went to a CISC processor design center. They had a team of a couple of hundred people, and they were on revision H and it still had bugs, and it was just this unwieldy, complex machine. So RISC was the secret. Steve Furber has an interview somewhere where he jokes about Acorn management giving him the special sauce - two things that no one else had: he had no people and no money. So it had to be incredibly simple. It had to be built on a shoestring, as Jamie said to me.
So there are lots of corners cut, but in the right way. Well, "corners cut" sounds ungenerous - there are some very shrewd design decisions, always weighing up cost versus benefit. And I think they erred on the correct side for all of them. So Steve sent me this picture. He's got a cameo here - that's the outline of him in the reflection on the glass there. He's got this up in his office. So he led the hardware design of all of these chips at Acorn. Across the top, we've got the original ARM, the ARM1, ARM2 and the ARM3 - guess the naming scheme - and the video controller, memory controller and IO controller. You can sort of see their relative sizes, and it's kind of pretty. This was also an era of processor where you could really point at the plot and say, "oh, that's the register file, and you can see the cache over there". You can't really do that nowadays with modern processors. So, a bit about the specification, what it could do, the end product. So I mentioned they all had this ARM2 at 8MHz, up to four MB of RAM, 26-bit addresses - remember that. That's weird. A lot of 32-bit machines had 32-bit addresses, or the ones that we know today do. That wasn't the case here, and I'll explain why in a minute. The A540 had an updated CPU. The memory controller had an MMU, which was unusual for machines of the mid 80s. So the hardware would support virtual memory, page faults and so on. It had decent sound: 8-channel sound, hardware mixed, in stereo. It was 8 bit, but it was logarithmic - so it was a bit like u-law, if anyone knows that - instead of linear PCM, so you got more precision at the low end, and it sounded to me a little bit like 12-bit PCM sound. So this is quite good. Storage-wise, it's the same floppy controller as the Atari ST. It's fairly boring. The hard disk controller was a horrible standard called ST506, MFM drives, which were very, very crude compared to the disks we have today. Keyboard and mouse, nothing to write home about.
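The point about logarithmic samples can be sketched in a few lines of Python. This is a hedged illustration using the standard mu-law curve (mu = 255) rather than VIDC's actual law, which differed in detail - but the shape of the idea is the same: the quantisation steps are small near zero, where most audio energy lives, which is why 8 log bits sound closer to 12 linear bits.

```python
import math

# Mu-law-style companding: encode a float sample in [-1, 1] to a
# signed 8-bit code, and decode it back. The VIDC used its own log
# law, so the exact curve here is an assumption for illustration.
MU = 255.0

def encode(x):
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round(y * 127)          # signed 8-bit code

def decode(code):
    y = code / 127.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

A quiet sample like 0.01 round-trips with an error far smaller than an 8-bit linear quantiser's step of ~0.008, at the cost of coarser steps near full scale.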
I mean, it was a normal keyboard. There was nothing special going on there. And a printer port, serial port and some expansion slots, which I'll outline later on. The thing I really liked about the Arc was the graphics capabilities. It's fairly capable, especially for a machine of that era and at that price. It just had a flat frame buffer, so it didn't have sprites, which is unfortunate. It didn't have a blitter or bitplanes and so forth. But the upshot of that is it's dead simple to program. It had a 256 color mode, 8 bits per pixel - so a pixel is a byte, and it's all just laid out as a linear string of bytes. So it was dead easy to just write some really nice optimized code to blit stuff to the screen. Part of the reason why there isn't a blitter is actually that the CPU was so good at doing this. Colorwise, it's got paletted modes out of a 4096 color palette, same as the Amiga, and it has this 256 color mode, which is different. The big high end machines, the A540 and the A400 series, could also do this very high res 1152 by 900, which was more of a workstation resolution. If you bought a Sun workstation, a Sun 3, in those days, it could do this and some higher resolutions. But this was really not seen on computers that you might have in the office or in school, at the education end of the market. And it's quite clever the way they did that. I'll come back to that in a sec. But for me, the thing about the Arc: for the money, it was the fastest machine around. It was definitely faster than 386s and all the stuff that Motorola was doing at the time, by quite a long way. It is almost eight times faster than a 68K at about the same clock speed. And it's to do with its pipelining, and to do with it having a 32-bit word, and a couple of other tricks. Again, I'll show you later on what the secret to that performance was.
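A flat 8 bpp frame buffer really is as simple as described. Here's a minimal Python sketch - the 320x256 geometry is an assumption for illustration. The pixel address is just base + y*stride + x, and a rectangle fill is one contiguous write per row, which is exactly the linear access pattern this hardware rewards.

```python
# Archimedes-style flat frame buffer: 8 bits per pixel, one byte
# per pixel, rows laid out one after another in memory.
WIDTH, HEIGHT = 320, 256
screen = bytearray(WIDTH * HEIGHT)      # whole display as flat bytes

def plot(x, y, colour):
    screen[y * WIDTH + x] = colour      # address = base + y*stride + x

def fill_rect(x, y, w, h, colour):
    # one contiguous slice write per row - pure linear access
    for row in range(y, y + h):
        start = row * WIDTH + x
        screen[start:start + w] = bytes([colour]) * w

fill_rect(10, 10, 50, 20, 42)
```

No bitplane shuffling, no sprite registers: every drawing primitive reduces to byte arithmetic like this.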
It was about minicomputer speed. And compared to some of the other RISC machines at the time - it wasn't the first RISC in the world, but it was the first cheap RISC and the first RISC machine that people could feasibly buy and have on their desks at work or in education. And if you compare it to something like the MIPS or the SPARC: it was not as fast as a MIPS or SPARC chip, but it was also a lot smaller, a lot cheaper. Both of those other processors had very big dies. They needed other support chips. They had huge packages, lots of pins, lots of cooling requirements. So all this really added up. So I priced up a Sun 4 workstation at the time, and it was well over four times the price of one of these machines. And that was before you add on extras such as disks and network interfaces and things like that. So it's very good, very competitive for the money. And if you think about building a cluster, then you could get a lot more throughput; you could network them together. So this is about as far as I got when I was a youngster - I wasn't brave enough to really take the machine apart and poke around. Fortunately, now it's 30 years old and I'm, I feel, qualified to be doing this. I'm going to take it apart. Here's the motherboard. Quite a nice clean design. This was built in Wales, for anyone that's been to the UK - very unusual these days for anything to be built in the UK. It's got several main sections around these four chips. Remember the Steve photo earlier on? This is the chipset: the ARM, MEMC, VIDC and IOC. So the IO side of things happens over on the left, video and sound in the top right, and the memory and the processor in the middle. It's got a megabyte onboard and you can plug in an expansion for 4 MB. So, the memory map from the software view. I mentioned this 26-bit addressing, and I think this is one of the key characteristics of one of these machines. So you have a 64MB address space, and it's quite packed. That's quite a lot of stuff shoehorned into here. So there's the memory.
The bottom half of the address space, 32MB of that, is what the processor sees in user mode. It's got user mode and privileged modes - a concept of privilege within the processor execution. So when you're in user mode, you only get to see the bottom half, and that's virtually mapped: the MMU will map pages into that space. And then when you're in supervisor mode, you get to see the whole of the rest of the memory, including the physical memory and various registers up the top. The thing to notice here is: there's stuff hidden behind the ROM; this address space is very packed together. So there's a requirement for control registers for the memory controller, for the video controller and so on, and they're write-only registers behind the ROM, basically. So you write to the ROM addresses and you get to hit these registers. Kind of weird when you first see it, but it was quite a clever way to fit this stuff into the address space. So it all started with the ARM1. Sophie Wilson designed the instruction set in late 1983. Steve took the instruction set and designed the top level - the block-level microarchitecture of this processor, so the data path and how the control logic works. And then the VLSI team implemented this and did their own custom cells. There's a custom data path and custom logic throughout this. It took them about a year, all in. Well, 1984, that sort of... This Project A really kicked off early 1984, and this taped out first thing early 1985. The design process - the guys gave me a little bit of... So Jamie Urquhart and John Biggs gave me a bit of an insight into how they worked on the VLSI side of things. So they had an Apollo workstation - just one Apollo workstation, the DN600. This is a 68K-based washing machine, as Jamie described it. It's this huge thing. It cost about £50,000. It's incredibly expensive. And they designed all of this with just one of these workstations. Jamie got in at 5:00 a.m., worked until the afternoon and then let someone else on the machine.
So they shared the workstation; they worked shifts so that they could design this whole thing on one workstation. So this comes back to: it was designed on a bit of a shoestring budget. When they got a couple of other workstations later on in the project, there was an allegation that the CAD software might not have been licensed initially on the other workstations. I can neither confirm nor deny whether that's true. So Steve wrote a BBC Basic simulator for this when he was designing this block-level microarchitecture, run on his BBC Micro. So this could then run real software. There could be a certain amount of software development, but then they could also validate that the design was correct. There's no cache on this. This is quite a large chip - 50 square millimeters was the economic limit in those days for this part of the market. There's no cache; that also would have been far too complicated. So this was also, I think, quite a big risk, no pun intended, to aim at doing this with such a small team. They're all very clever people, but they hadn't all got experience in building chips before. And I think they knew what they were up against. And so not having a cache or complicated things like that was the right choice to make. I'll show you later that that didn't actually affect things. So this was a RISC machine. If anyone has not programmed ARM in this room, then get out at once. But if you have programmed ARM, this is quite familiar, with some differences. It's a classical three-operand RISC, and it's got a shift on one of the operands for most of the instructions. So you can do things like static multiplies quite easily. It's not purist RISC, though. It does have load and store multiple instructions. These will, as the name implies, load or store a number of registers in one go. So it's one register per cycle, but it's all done through one instruction. This is not RISC. Again, there's a good reason for doing that.
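The "static multiply" trick works because the barrel shift on the second operand is free. As a made-up example, x*10 on ARM can be two data-processing instructions with no multiply at all; the same arithmetic in Python, with the illustrative ARM sequences in the comments:

```python
# Constant multiplies built from shifts and adds, the way the free
# operand shift makes cheap. The ARM mnemonics in the comments are
# illustrative, not taken from any particular program.
def times_ten(x):
    x5 = x + (x << 2)            # ADD r1, r0, r0, LSL #2  ; 5x
    return x5 << 1               # MOV r1, r1, LSL #1      ; 10x

def times_320(x):
    # 320 = 5 * 64, handy for a 320-byte screen line:
    # ADD r1, r0, r0, LSL #2 ; MOV r1, r1, LSL #6
    return (x + (x << 2)) << 6
```

Two cycles instead of a long multiply sequence - which mattered, since the ARM1 had no multiplier at all.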
So the ARM1 comes back and it gets plugged into a board that looks a bit like this. This is called the A2P, the ARM second processor. It plugs into a BBC Micro. There's a thing called the Tube, which is sort of a FIFO-like arrangement: the BBC Micro can send messages one way and this can send messages back. And the BBC Micro has the discs, it has the IO, keyboard and so on. And that's used as the host to then download code into one megabyte of RAM up here, and then you can run the code on the ARM. So this was the initial system, 6 MHz. The thing I found quite interesting about this: I mentioned that Steve had built this BBC Basic simulation, and one of the early bits of software that could run on this was BBC Basic itself - Sophie had ported BBC Basic to ARM and written an ARM version of it. The Basic interpreter was very fast, very lean, and it was running on this board early on. They then built a simulator called ASIM, which was an event-based simulator for doing logic design, and all of the other chips in the chipset were simulated using ASIM on the ARM1, which is quite nice. So this was the fastest machine that they had around. They didn't have, you know, the thousands of machines in a cluster like you'd have in a modern company doing EDA. They had a very small number of machines, and these were the fastest ones they had about. So the ARM2 was simulated on the ARM1, and so was all the rest of the chipset. So then ARM2 comes along. It's a year later; this is a shrink of the design. It's based on the same basic microarchitecture but has a multiplier now. It's a Booth multiplier, so it's at worst case a 16-cycle multiply, doing two bits per clock. Again, no cache. But one thing they did add on ARM2 is banked registers. Some of the processor modes - I mentioned there's an interrupt mode; next slide - some of the processor modes will basically give you a different view on the registers, which is very useful. These were all validated at 8 MHz. So the product was designed for 8 MHz.
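To make "two bits per clock, 16 cycles worst case" concrete, here's a simplified shift-add model in Python. Real Booth recoding handles signed values with ±1x/±2x terms; this sketch just consumes two multiplier bits per step and terminates early, which is enough to show where the cycle count comes from.

```python
def mul2(a, b):
    """Multiply 32-bit unsigned a*b, retiring two multiplier bits
    per 'clock'. Returns (product mod 2**32, cycles used).
    A simplified stand-in for ARM2's Booth multiplier."""
    product, shift, cycles = 0, 0, 0
    while b != 0:
        product += (b & 3) * (a << shift)   # add 0x..3x multiplicand
        b >>= 2                             # two bits consumed
        shift += 2
        cycles += 1                         # worst case: 32/2 = 16
    return product & 0xFFFFFFFF, cycles
```

Small multipliers finish in a handful of cycles; only a multiplier with its top bits set costs the full 16.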
The company that built them said, okay, put the stamp on the outside saying 8 MHz. There are two versions of this chip, and I think they're actually the same silicon. I've got a suspicion that they're the same; they just tested one batch and said it works at 10 or 12. So on my project list is overclocking my A3000 to see how fast it'll go and see if I can get it to 12 MHz. Okay. So, the banking of the registers. ARM - even modern 32-bit ARM - has got these two types of interrupts: an IRQ, pronounced "erk" in English, and FIQ, pronounced "fic" in English. I appreciate it doesn't mean quite the same thing in German. So I'll call it FIQ from here on in. And FIQ mode has this property where the top half of the registers are effectively different registers when you get into this mode. So, first of all, you don't have to back up those registers in your FIQ handler. And secondly, if you can write an FIQ handler using just those registers - and there are enough for doing most basic tasks - you don't have to save and restore anything when you get an interrupt. So this is designed specifically to be a very, very low overhead interrupt mode. So I'm coming to why there's a 26-bit address space, and I found this link very unintuitive. So unlike 32-bit ARM - the more modern, 1990s-onwards ARMs - the program counter, register 15, doesn't just contain the program counter but also contains the status flags and processor mode; effectively, all of the machine state is packed in there as well. So I asked the question: well, why 64 megabytes of address space? What's special about 64? And Mike told me: well, you're asking the wrong question. It's the other way round. What we wanted was this property that all of the machine state is in one register. So this means you just have to save one register. Well, you know, what's the harm in saving two registers? And he reminded me of this FIQ mode.
Well, if you're already in a state where you've really optimized your interrupt handler so that you don't need any other registers to deal with - you're not saving or restoring anything apart from your PC - then saving another register is 50 percent overhead on that operation. So the prime motivator was to keep all of the state in one word. And then once you take all of the flags away, you're left with 24 bits for a word-aligned program counter, which leads to 26-bit addressing. And that was then seen as: well, 64 MB is enough. There were machines in 1985 that, you know, could conceivably have more memory than that. But for a desktop, that was still seen as a very large, very expensive amount of memory. The other thing: you don't need to invent another instruction to do a return from exception, so you can return using one of your existing instructions. In this case, it's a subtract into the PC, which looks a bit strange, but trust me, that does the right thing. So, the memory controller. I mentioned the address translation - this has an MMU in it; in fact, it's the thing directly on the left hand side. I was worried that these slides actually might not be the right resolution and they might be sort of too small for people to see this. And in fact, it's the size of a house, which is really useful here. So the left hand side of this chip is the MMU. This chip is the same size as the ARM2, pretty much. So that's part of the reason why the MMU is on another chip: the ARM2 was as big as they could make it to fit the price. Has anyone here done silicon design? As the area goes up, effectively your yield goes down, and it's a non-linear effect on price. So the MMU had to be on a separate chip, and it's half the size of that one as well. MEMC does the most mundane things: it drives DRAM, it does refresh for DRAM, and it converts from linear addresses into the row and column addresses which DRAM takes.
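The packing can be sketched directly. The field layout below follows the documented 26-bit ARM programmer's model - N/Z/C/V in bits 31-28, IRQ and FIQ disable in bits 27-26, the word-aligned PC in bits 25-2, and the processor mode in bits 1-0 - so everything a handler needs to preserve really is one word:

```python
# The 26-bit ARM's R15: PC, flags and mode all in one register.
def pack_r15(pc, n, z, c, v, i, f, mode):
    assert pc % 4 == 0 and pc < (1 << 26)   # word-aligned, 64 MB reach
    return (n << 31) | (z << 30) | (c << 29) | (v << 28) | \
           (i << 27) | (f << 26) | pc | mode

def unpack_pc(r15):
    # mask off flags and mode: 24 usable PC bits -> 26-bit addresses
    return r15 & 0x03FFFFFC
```

Since the PC is word-aligned, its bottom two bits are always zero, which is what frees them up to hold the mode.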
So the key thing about this ARM and MEMC pairing - the key factor in performance - is making use of memory bandwidth. When the team had looked at all the other processors in Project A before designing their own, one of the things they looked at was how well they utilized DRAM, and the 68K and the National Semiconductor chips made very, very poor use of DRAM bandwidth. Steve said, well, okay, the DRAM is the most expensive component of any of these machines and they're making poor use of it. And I think a key insight here is: if you maximize the use of the DRAM, then you're going to be able to get much higher performance than those machines. And so it's 32 bits wide. The ARM is pipelined, so it can do a 32-bit word every cycle. And it also indicates whether it's doing sequential or non-sequential addressing. This then lets the MEMC decide whether to do an N cycle or an S cycle - so there's a fast one and a slow one, basically. When you access a new random address in DRAM, you have to open that row, and that takes twice the time: it's a 4 MHz cycle. But once you've accessed that address and you're accessing linearly ahead of it, you can do fast page mode accesses, which are 8 MHz cycles. So ultimately, that's the reason why these load and store multiples exist. The non-RISC instructions are there so that you can stream registers out and back in and make use of this DRAM bandwidth. So, store multiple: this is just a simple calculation - for 14 registers, you're hitting about 25 megabytes a second out of 30. So it's not 100%, but it's way more than a 10th or an 8th, which a lot of the other processors were getting. So this was really good. This is the prime factor in why this machine was so fast: it's effectively the load and store multiple instructions and being able to access stuff linearly. So, the MMU is weird.
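That calculation is easy to reproduce. Under the timings just given - an N cycle at 4 MHz (250 ns) to open a row, then S cycles at 8 MHz (125 ns) for each sequential word - a 14-register store multiple bursts at just under 30 MB/s against a 32 MB/s all-sequential ceiling. The sketch below ignores instruction fetch and loop overhead around the STM, which is presumably why it lands a little above the ~25 MB/s sustained figure quoted.

```python
# Back-of-envelope DRAM bandwidth for one STM of 14 registers.
N_CYCLE_NS = 250.0                  # non-sequential access, 4 MHz
S_CYCLE_NS = 125.0                  # sequential fast-page access, 8 MHz
REGS = 14

bytes_moved = REGS * 4              # 56 bytes per STM
burst_ns = N_CYCLE_NS + (REGS - 1) * S_CYCLE_NS
burst_mb_s = bytes_moved * 1000.0 / burst_ns    # ns -> MB/s, ~29.9

peak_mb_s = 4 * 1000.0 / S_CYCLE_NS             # all-S ceiling: 32 MB/s
```

Compare a processor doing one isolated 250 ns access per word: 16 MB/s at best, and far less once its own multi-cycle bus protocol is counted - which is the "a 10th or an 8th" in the talk.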
It's not a TLB in the traditional sense. TLBs today - if you take your MIPS chip or something where the TLB is visible to software - will map a virtual address into a chosen physical address. You'll have some number of entries, and you more or less arbitrarily, you know, poke an entry with a set mapping in it. The MEMC does it upside down. It's got a fixed number of entries, one for every page of DRAM, and for each of those entries, it checks an incoming address to see whether it matches. So it has all of those entries that we showed on the chip diagram a couple of slides ago - that big left hand side had that big array. All of those are effectively just storing a virtual address and matching it with a comparator. And then one of them lights up and says: yes, it's mine. So effectively, the physical page says "that virtual address is mine", instead of the other way round. So this also limits your memory. You have to have one of these entries on chip per page of physical memory, and you don't want pages to be enormous. If you do the maths, 4 MB over 128 pages is a 32K page. If you don't want the page to get much bigger than that - and trust me, you don't - then you need to add more of these entries, and it's already half the size of the chip. So effectively, this is one of the limits of why you can only have 4 MB on one of these memory controller chips. OK. So, VIDC is the core of the video and sound system. It's a set of FIFOs and a set of digital-to-analog converters for doing video and sound. You stream stuff into the FIFOs and it does the display timing and palette lookup and so forth. It has the 8-bit mode I mentioned, which is slightly strange. It also has an output for a transparency bit. So in your palette you can set 12 bits of color, but you can set a bit of transparency as well, so you can do video genlock-style overlays quite easily with this.
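The upside-down translation is easier to see in code than in prose. A sketch with the MEMC's real numbers (128 entries, 32 KB pages, 4 MB of physical memory) but otherwise hypothetical naming: each entry belongs to a physical page and holds the virtual page it currently claims, and translation is a parallel compare across all entries - modelled here as a linear scan.

```python
PAGE_SIZE = 32 * 1024      # 4 MB / 128 entries -> 32 KB pages
NUM_PAGES = 128            # one entry per page of physical DRAM

class InvertedMMU:
    def __init__(self):
        # entry i guards physical page i; None means unmapped
        self.virt_tag = [None] * NUM_PAGES

    def map_page(self, phys_page, virt_page):
        self.virt_tag[phys_page] = virt_page

    def translate(self, vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        # in hardware every comparator fires at once; we scan
        for phys_page, tag in enumerate(self.virt_tag):
            if tag == vpage:
                return phys_page * PAGE_SIZE + offset
        raise MemoryError("page fault")  # abort line -> exception
```

The 4 MB ceiling drops straight out of the structure: more physical memory means more comparators on a chip that's already half comparator array, or bigger pages.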
So there was a revision later on. Tudor explains that the very first one had a bit of crosstalk between the video and the sound, so you'd get sound with noise on it that was basically video noise, and it's quite hard to get rid of. And so they did this revision, and the way he fixed it was quite cool. They shuffled the power supply around and did all the sensible engineering things, but he also filtered out a bit of the noise that was being output on the sound, inverted it, and then fed that back in as the reference current for the DACs. So it's sort of self-compensating and took out the noise, a bit like noise-canceling headphones. It was kind of a nice hack. And that was VIDC1. OK, the final one - I'm going to stop showing you chip plots after this, unfortunately, so just get your fill while we're here. And again, I'm really glad this is enormous, for the people in the room and maybe those zooming in online. There's a cool little Illuminati eye logo in the bottom left corner. I feared that you weren't going to be able to see it and I didn't have time to do a zoomed-in version, but okay. So IOC is the center of the IO system. As much of the IO system as possible - all the random bits of glue logic to do things like timing, because some peripherals are slower than others - lives in IOC. It contains a UART for the keyboard. The keyboard is looked after by an 8051 microcontroller - nice and easy, you don't have to do scanning in software. This microcontroller just sends stuff up a serial port to this chip. So, UART: universal asynchronous receiver and transmitter. It was at one point called the fast asynchronous receiver and transmitter. Mike got forced to change the name. Not everyone has a 12-year-old's sense of humor, but I admire his spirit. So the other thing it does is interrupts: all the interrupts go into IOC, and it's got masks and consolidates them, effectively, for sending an interrupt up to the ARM.
The ARM can then check the status and do a fast response to it. So, the eye of providence there, the little logo I pointed out: Mike said he put that in for future archaeologists to wonder about. Okay. That was it. I was hoping there'd be this big back story about, you know, him being in the Illuminati or something. Maybe he is, but he's not allowed to say. Anyway. So, just like the other dev board I showed you, this one's the A500 2P. It's still a second processor that plugs into a BBC Micro. It's still got this host with disk drives and so forth attached to it, pushing stuff down the Tube into the memory here. But now, finally, all of the chipset is assembled in one place. So this is starting to look like an Archimedes. It's got video out. It's got a keyboard interface. It's got some expansion stuff. So this is for bring-up and an early software headstart. But very shortly afterwards, we get the A500, internal to Acorn. And this is really the first Archimedes. This is the prototype Archimedes. It's actually got a gorgeous gray brick sort of look to it, kind of concrete. It weighs like concrete, too. But it has all the hallmarks: it's got the IO interfaces, it's got the expansion slots you can see at the back, and it runs the same operating system. Now, this was used for the OS development. There were only a couple of hundred of these made. Well, this is serial 222, so this is one of the last, I think. But yeah - only internal to Acorn. There are lots of nice tweaks to this machine. So the hardware team had designed this - Tudor designed this as well as the video system. And he said, well, his A500 was the special one: he'd hand-picked one of the VIDCs so that instead of running at 24 MHz it ran at 56 - there's some silicon variation in manufacturing, so he found a 56 MHz part. So he could do, I think it was 1024 x 768, which is way beyond the rest of the Archimedes. So he had the really, really cool machine.
They also ran some of them at 12 MHz instead of 8, a massive performance improvement, but I think that used expensive memory, which was out of reach for the product. Right. So, believe me, this is the simplified circuit diagram; the technical reference manuals are available online if anyone wants the complicated one. The main parts of the diagram are the ARM, MEMC, VIDC and some RAM, and we'll have a little walk through them. The clocks are generated, actually, by the memory controller, which gives the clocks to the ARM. The main reason for this is that the memory controller has to do some slow things now and then: it has to open pages of DRAM, do refresh cycles and so on. So it generates the clock, and it pauses the CPU by stopping that clock from time to time. When you do a DRAM access, the ARM outputs an address on the bus along the top, and that goes into the MEMC. The MEMC does an address translation and converts the result into row and column addresses suitable for DRAM. Then, if you're doing a read, the DRAM outputs the data onto the data bus. The address flows through MEMC, effectively, and that's the critical path on this kind of access. Notice that MEMC is not on the data bus; it just has addresses flowing through it. That becomes important later on. ROM is another slow thing, another reason why MEMC might slow down an access from the CPU, and it works in a similar sort of way. There's also a permission check done when you're doing the address translation (user versus supervisor), and this information is output as part of the cycle when an access is done. If you miss in that translation, you get a page fault or permission fault: an abort signal comes back, you take an exception, and the OS deals with that in software. The data bus is a critical path, and so the IO stuff is buffered and kept away from it.
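The row/column conversion MEMC does for the DRAM can be sketched like this. The geometry below (10 column bits, 10 row bits, 32-bit words) is assumed purely for illustration and is not the actual MEMC/DRAM configuration; a real MEMC also does the logical-to-physical translation first, whereas here we just split a physical address.

```python
# Sketch of DRAM address multiplexing: one physical address becomes a
# row address (strobed first) and a column address (strobed second).

ROW_BITS = 10   # hypothetical geometry, for illustration only
COL_BITS = 10

def dram_row_col(phys_addr):
    word = phys_addr >> 2                              # 32-bit bus: word index
    col = word & ((1 << COL_BITS) - 1)                 # low bits -> column
    row = (word >> COL_BITS) & ((1 << ROW_BITS) - 1)   # next bits -> row
    return row, col

# Sequential words share a row: only the column changes, which is why
# sequential (S) cycles are faster than random (N) cycles on this memory.
r0, c0 = dram_row_col(0x8000)
r1, c1 = dram_row_col(0x8004)
print(r0 == r1, c1 - c0)   # True 1
```

This same row/column structure is what makes sequential access cheap, a point the talk returns to in the Q&A at the end.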
So the IO bus is 16 bits; there weren't a lot of 32-bit peripherals around in those days, peripherals were 8 or 16 bits, so that's the right thing to do. The IOC decodes that, and there's a handshake with MEMC if it needs more time, for example if it's accessing one of the expansion cards and the expansion card is doing something slow; that's dealt with in the IOC. I mentioned the interrupt status that gets funnelled into IOC and then back out again. There's a VSync interrupt but not an HSync interrupt; you have to use timers for that, really annoyingly. There are timers available, clocked at 2 MHz, which I don't think I mentioned on a previous slide. So if you want to do funny palette-switching stuff or copper bars or something, that's possible with the timers. It's also a simple hardware mod to make a real HSync interrupt: there are some spare interrupt inputs on the IOC, as an exercise for you. Now, the bit I really like about this system. I mentioned that MEMC is not on the data bus. The VIDC is only on the data bus, and it doesn't have an address bus at all. The VIDC is the thing responsible for turning the frame buffer into video, reading that frame buffer out of RAM and so on. So how does it actually do that RAM read without an address? Well, the MEMC contains all of the registers for doing this DMA: the start of the frame buffer, the current position and size, and so on all live in the MEMC. There's a handshake where VIDC sends a request up to the MEMC when its FIFO gets low. The MEMC then actually generates the address into the DRAM, the DRAM outputs the data, and the MEMC gives an acknowledge to VIDC (excuse me, too many chips), which then latches that data into its FIFO. This partitioning is quite neat: all the video DMA stuff lives in MEMC, so there's this kind of split across the two chips.
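A toy model of that request/acknowledge split might look like this. The FIFO depth, threshold, and interface names are made up; the point is only that VIDC holds the data FIFO while MEMC holds the addresses, so VIDC never needs address pins.

```python
# Toy model of the VIDC/MEMC partitioning: VIDC owns only a data FIFO
# and a "request" wire; MEMC owns the DMA address counters.

FIFO_DEPTH = 8   # made-up numbers for illustration
THRESHOLD = 4

class Vidc:
    def __init__(self):
        self.fifo = []
    def wants_data(self):            # the request line to MEMC
        return len(self.fifo) < THRESHOLD
    def latch(self, word):           # on MEMC's acknowledge
        self.fifo.append(word)
    def shift_pixel(self):           # the display side drains the FIFO
        return self.fifo.pop(0)

class Memc:
    def __init__(self, ram, start, length):
        self.ram, self.ptr, self.end = ram, start, start + length
    def service(self, vidc):
        while vidc.wants_data() and self.ptr < self.end:
            vidc.latch(self.ram[self.ptr])   # MEMC drives the address,
            self.ptr += 1                    # RAM drives the data

ram = list(range(100))
vidc, memc = Vidc(), Memc(ram, start=10, length=16)
memc.service(vidc)
print(vidc.fifo)   # first words of the "frame buffer": [10, 11, 12, 13]
```

In hardware this is continuous, of course: as the display drains the FIFO below the threshold, the request line re-asserts and MEMC refills it.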
For sound, I've just highlighted the one interrupt that comes from MEMC. Sound works exactly the same way, except there's a double-buffering scheme: when one half of the buffer becomes empty, you get an interrupt, so you can refill in time and you don't glitch your sound. This all works very smoothly. So finally, the high-res mono thing that I mentioned before: the way they did that is quite novel. Tudor had realized that with one external component, a shift register running very fast, he could implement this very high resolution mode without really affecting the rest of the chip. So VIDC still runs at 24 MHz, sort of VGA resolution, and outputs on a digital bus that was originally a test port. It outputs 4 bits, so 4 pixels in one chunk, at 24 MHz, and this external component then shifts that out at 4 times the speed. It's one component; this is a very cheap way of doing it. And as I said, this high-res mode is very unusual for machines of this era. I've got a feeling the A540, the top-end machine (if anyone's got one of these and wants to try this trick, please get in touch) will do 1280 x 1024 by overclocking this. I think all of the parts will survive it, but for some reason Acorn didn't support that on the board. And finally, clock selection: the VIDC on some of the machines has quite a flexible set of clocks, for different resolutions, basically. So, MEMC is not on the data bus. How do we program it? It's got registers for DMA and it's got all this address translation. Well, the memory map I showed before has an 8 MB space reserved for the address translation registers. It doesn't have two million 32-bit registers behind it, which is a hint of what's going on here. What you do is write any value to this space, and you encode the information that you want to put into one of these registers in the address.
So this address: the top three bits are 1, meaning it's in the top 8 MB of the 64 MB address space, you format your logical-to-physical page information into this address, and then you write any byte to it, effectively. This feels really dirty, but it's also a very nice way of doing it, because it takes no other space in the address map, and it relates to the price balance: it's not worth having a data bus going into MEMC, costing 32 more pins, just to write these registers, as opposed to playing this sort of trick. If you had that data bus just for that, you'd have to go to a more expensive package. And this was really in their minds: a 68-pin chip versus an 84-pin chip was a big deal. They really strived to make sure everything was in the smallest package possible, and this system partitioning effort led to these sorts of tricks to program it. On the A540, we get multiple MEMCs. Each one is assigned a stripe (coloured here) of the physical address space: you have a 16 MB space and each one looks after 4 MB of it. But when you do a virtual access in the bottom half of the user space, a regular program access, all of them light up and all of them translate that address in parallel, and one of them, hopefully, will translate it and energize the RAM to do the read, for example. When you put an ARM3 in this system, the ARM3 has its cache, and then the address feeds into the MEMC. That means the address is being translated after the cache, so you're caching virtual addresses, and as we all know, this is kind of bad for performance, because whenever you change that virtual address space, you have to invalidate your cache. You could tag the cache to avoid this, but they didn't do that; there are other ways of solving the problem. Basically, on this machine, what you need to do is invalidate the whole cache. It's quite a quick operation, but it's still not good for performance to have an empty cache.
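The MEMC programming trick described above, encoding the register data in the address and writing any byte, can be sketched like this. The field layout below is hypothetical and deliberately simplified; the real MEMC1 encoding depends on the configured page size and differs in detail.

```python
# Sketch of the MEMC trick: the chip sees only the address bus, so you
# program a page-mapping "register" by encoding the data IN the address.
# The packing below is invented for illustration, NOT the real MEMC1 layout.

TRANSLATION_BASE = 0x3800000   # top 8 MB of the 64 MB address map

def mapping_write_address(logical_page, physical_page, protection):
    # hypothetical packing: the point is that all the information fits
    # in address bits, leaving the data bus entirely unused
    return (TRANSLATION_BASE
            | (logical_page << 10)
            | (physical_page << 2)
            | protection)

def decode(addr):
    # what a MEMC-like chip could recover from its address pins alone
    assert addr & TRANSLATION_BASE == TRANSLATION_BASE
    return ((addr >> 10) & 0x1FFF, (addr >> 2) & 0xFF, addr & 0x3)

addr = mapping_write_address(logical_page=42, physical_page=7, protection=1)
print(decode(addr))   # (42, 7, 1)
# the CPU side is then just a byte store of any value to `addr`
```

The design payoff is exactly the pin count argument above: the "register file" costs zero data-bus pins, because the write data is irrelevant.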
The only DMA present in the system is for the video and sound; IO doesn't have any DMA at all. And this is another area where, as a younger engineer, I thought: crap, why didn't they have DMA? That would be way better. DMA is the solution to everyone's problems, as we all know. I think the quote on the right ties in with the Acorn team's discovery that all of these other processors needed quite complex chipsets, quite expensive support chips. The quote says that some vendors were charging more for their DMA devices even than the CPU, so not having a dedicated DMA engine on board is a massive cost saving. It's the same point I made on the previous two slides about system partitioning: putting a lot of attention into how many pins were on each chip and how many buses were going around the place. Not having IOC access memory was a massive saving in pins and cost for the system as a whole. The other thing is that FIQ mode was effectively the means for doing IO. FIQ mode was designed to be an incredibly low-overhead way of doing programmed IO, by having the CPU do the IO. So this was saying: the CPU is going to be doing all of the IO stuff, but let's optimize it, let's make it as good as it can be, and that's what led to the programmed IO. Also remember that ARM2 didn't have a cache. If you don't have a cache on your CPU, DMA is going to hold up the CPU anyway, stealing its cycles, so DMA doesn't gain you any performance. You may as well get the CPU to do it, and get the CPU to do it in the lowest-overhead way possible. I think this can be summarized as bringing the "RISC principles" to the system. The RISC principle says, for your CPU, don't put anything in the CPU that you can do in software; this is saying, OK, software can actually do the IO just as well, without a cache, as a DMA system could. So let's get software to do that.
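A back-of-envelope way to see why FIQ makes programmed IO cheap: a normal interrupt pays for a branch from the vector table plus saving and restoring working registers, while a FIQ handler can sit directly at the last vector and keep its state in the banked registers R8 to R14. The cycle numbers below are illustrative only, not measured ARM2 timings.

```python
# Toy overhead model (illustrative numbers, NOT real ARM2 cycle counts)
# comparing a conventional IRQ handler with a FIQ handler per IO event.

def irq_event_cost(branch=2, save=6, work=4, restore=6):
    # IRQ: branch out of the vector table, push working registers,
    # do the transfer, pop the registers again
    return branch + save + work + restore

def fiq_event_cost(work=4):
    # FIQ: handler body lives at the vector (no branch) and its state
    # lives in the banked R8-R14 (no save/restore)
    return work

print(irq_event_cost(), fiq_event_cost())   # 18 4
```

Whatever the exact numbers, the fixed per-event overhead is what dominates byte-at-a-time programmed IO, which is why shaving it was worth dedicating banked registers to.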
And I think this is a nice way of seeing it: it's part of the cost optimization, for really very little degradation in performance compared to doing it in hardware. So this is an IO card. They're Eurocards, nice and easy. The only thing I wanted to say here is that this is my SCSI card, and it has a ROM on the left-hand side. This is an expansion ROM, basically, many years before PCI made this popular. Your drivers are on this ROM, so with a SCSI disc plugged into this, you can plug the card in and then boot off the disk. You don't need any other software to make it work. It's just a very nice user experience: there's no messing around with configuring IO windows or interrupts or any of the ISA sort of stuff that was going on at the time. So, to summarize some of the hardware stuff we've seen: the ARM is pipelined, and it has the load/store-multiple instructions, which make for very high bandwidth utilization. That's what gives it its high performance. The machine was really simple, with attention to detail about partitioning the work between the chips and reducing the chip cost as much as possible; keeping that balanced was really a good idea. The machine was designed when memory and CPUs were about the same speed, before that flipped over: an 8 MHz ARM2 was designed to use 8 MHz memory, so there's no need to have a cache at all. These days it sounds really crazy not to have a cache on the CPU, but if your memory is not that much slower, this is a huge cost saving. It was also risk saving: this was the first real product CPU, if we don't count ARM1, which was a test chip, and having a cache on it would have been a huge risk for a design team that hadn't dealt with structures that complicated at that point. So that was the right thing to do, I think. And the lack of DMA: I've actually been converted on this. I thought this was crap.
And actually, I think this was a really good example of balanced design: what's the right tool for the job? Software is going to do the IO, so let's make sure that FIQ mode has as low an overhead as possible. We talked about system partitioning. The MMU: I still think it's weird and backward, but there's a strong argument that a more familiar TLB is massively complicated compared to what they did here, and I think the main driver was not just area on the chip, but also making it much simpler to implement. And it worked. They really didn't have that many shots at doing this; this wasn't a company or a team that could afford many goes at this product. I think that says it all. They did a great job. Okay. So the ARX story is a little bit more complicated. Remember, this was going to be an office automation machine, a bit like a Xerox Star. It was going to have this wonderful high-res mono mode and people were going to be laser printing from it. So, just like Xerox PARC, Acorn started a Palo Alto based research centre: Californians on beanbags writing an operating system using a microkernel, in Modula-2, all of the trendy boxes ticked for the mid-80s. It sounds like a very advanced operating system, and it did virtual memory and so on, but it was very resource hungry and it never really performed. Ultimately, the hardware got done quicker than the software, and after a year or two management got the jitters. The hardware was looming: next year we're going to have the computer ready, where's the operating system? And the project got canned. This is a real shame. I'd love to know more about this operating system; virtually nothing is documented outside of Acorn. Even the people I spoke to didn't work on it; a bunch of people in California kind of disappeared with it. So if anyone has this software archived anywhere, then get in touch.
The computer museum around the corner from me is raring to go on that; it would be a really cool thing to archive. So anyway, they now had a desperate situation and had to go to plan B, which was: in under a year, write an operating system for the machine that was on its way to being delivered. And it kind of shows. Arthur... I think the team did a really good job in getting something out of the door in half a year, but it was a little bit flaky. RISC OS then developed from Arthur a year later. I don't know if anyone's heard of RISC OS, but Arthur is very, very niche and got completely replaced by RISC OS, because it was a bit less usable. Another really strong point is that this is quite a big ROM: 0.5 MB in the 80s, going up to 2 MB in the early 90s. There's a lot of stuff in ROM. One of those things is BBC BASIC V. I know it's 2019, and BASIC is basic, but BBC BASIC is actually quite good: it has procedures, and it's got support for all the graphics and sound. You could write GUI applications in BASIC, and a lot of people did. It's also very fast; Sophie Wilson wrote a very, very optimized BASIC interpreter. I talked about the modules and podules: that expansion ROM thing is a really great user experience. But speaking of user experience, this was Arthur. I never used Arthur; I just dug it out to play with for this talk. It is bloody horrible. So that went away quickly. Part of this emergency plan B was to take the Acornsoft team, who were supposed to be writing applications for this machine, and get them to quickly knock out an operating system. So at launch, this was basically one of the only things you could do with the machine: there was a great demo called Lander and a great game called Zarch, 3D space you could fly around. It didn't have serious business applications.
And, you know, there was not much you could do with this really expensive machine at launch, and that really hurt it, I think. Then we get to RISC OS 2 in 1988, and this is now looking less vomit-coloured, a much nicer machine. And then eventually RISC OS 3: drag and drop between applications, it's all multitasking, it does anti-aliased outline fonts and so on. So, lastly, I want to quickly touch on a really interesting operating system: Acorn had a Unix operating system. As well as being an Acorn geek, I'm also a Unix geek, and I've always been fascinated by RISC iX. These machines were astonishingly expensive. They were the existing Archimedes machines with a different sticker on; that's an A540 with a sticker on the front. This system was developed after the Archimedes was designed, so there's a lot of stuff about the hardware that wasn't quite right for a Unix operating system. A 32 KB page size on a 4 MB machine really, really killed you in terms of your page cache and that kind of thing. They turned this into a bit of an opportunity, though; at least they made good on some of it. There was a quite novel online decompression scheme: the text of a binary was stored compressed, in a sparse way, on disk, and decompressed into your address space as you demand-paged it in. So the on-disk usage was a lot less than you'd expect; that's the only way it fit on some of the smaller machines. This, it turns out, is their concept picture of the 680, which is an unreleased workstation. I love this picture; I like that the piece of cheese or cake is the mouse. That's my favorite part. But this is the real machine, an unreleased prototype I found at the computer museum. Notably, it's got two MEMCs and 8 MB of RAM, and it's only designed to run
RISC iX, the Unix operating system. It has a high-res mono monitor only, no color, and it was designed to run FrameMaker, drive laser printers, and be a kind of desktop publishing workstation. I've always been fascinated by RISC iX, as I said, so a while ago I hacked around on it and got it booting. I'd never seen this before; I'd never used a RISC iX machine. So there we go: it boots, it is multi-user. But wait, there's more. It has a really cool X server, a very fast one. Sophie Wilson, again, worked on the server here, so it's very, very well optimized and very fast for a machine of its era, and it makes quite a nice little Unix workstation. By the way, Tudor, the guy that designed the VIDC and the IO system, called me a sado for getting this working. That's my claim to fame. Finally, and I want to leave some time for questions: there's a lot of useful stuff in ROM. One of those things is BBC BASIC, and BASIC has an assembler, so you can walk up to this machine with a floppy disk and write assembler (there's a special bit of syntax for it) and then just call it. This is really powerful: at school, with a floppy disk, you could do something that's a bit more than BASIC programming. Bizarrely, I mostly wrote this example with only two or three tiny syntax errors, after about 20 years away from it. It's in there somewhere. Legacy-wise, the machine didn't sell very many: under a hundred thousand, easily. I don't think it really made a massive impact; PCs had already taken off by then. The ARM processor and ARM the company are another matter: that has obviously changed the world in many ways. The thing I really took away from this exercise was that a handful of smart people, not that many, on the order of a dozen, designed multiple chips, designed a custom computer from scratch, got it working, and it was quite good. And I think that this really turned people's heads.
It made people think differently: people who were not Motorola or IBM, really big companies with enormous resources, could do this and could make it work. I think that led to the thinking, in the 90s, that people could design their own systems on chip, and to that market taking off. So I think this was really key in getting people thinking that way: it was possible to design your own silicon. And finally, I just want to thank the people I spoke to, and Adrian and Jason at the Centre for Computing History in Cambridge. If you're in Cambridge, then please visit; it's a really cool museum. And with that, I'll wrap up. If there's any time for questions... I'm getting a blank look. No time for questions? There's about 5 minutes left. OK, or come up to me afterwards; I'm happy to talk more about this. Applause Herald: The first question is from the Internet. Signal angel, will you? Meanwhile, get to the microphones here in the room. Microphone one, please ask your question. Mic1: You mentioned that the system is making good use of the memory, but how is it actually not completely stalled on memory, having no cache and the same cycle time for the memory as for the CPU? M: Good question. So, how is it not always stalled on memory? Well, it is sometimes stalled on memory: when you do something non-sequential, you have to take one of the slow cycles, the N-cycle. The key is that you try and maximize the amount of time you're doing sequential stuff. So on the ARM2 you wanted to unroll loops as much as possible, so you're fetching your instructions sequentially, right? And you wanted to make as much use as possible of load/store multiples. You could load single registers with individual register loads, but it was much more efficient to pay that cost just once, at the start of the instruction, and then stream stuff sequentially.
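That tradeoff (one expensive non-sequential access, then cheap sequential ones) can be put into a toy cycle model. The N-cycle and S-cycle costs below are illustrative, not exact ARM2/MEMC timings.

```python
# Rough cycle-count model of why LDM beats individual loads on a
# cacheless ARM2-style memory system: one non-sequential access up
# front, then the rest stream sequentially within the open DRAM row.
# Numbers are illustrative, not measured hardware timings.

N_CYCLE = 2   # non-sequential access: new row must be opened
S_CYCLE = 1   # sequential access within the already-open row

def ldm_cost(nregs):
    # load-multiple: one N-cycle for the first word, S-cycles after
    return N_CYCLE + (nregs - 1) * S_CYCLE

def ldr_loop_cost(nregs):
    # separate loads to scattered addresses each pay the full N-cycle
    return nregs * N_CYCLE

print(ldm_cost(8), ldr_loop_cost(8))   # 9 16
```

The same logic applies to instruction fetch, which is why unrolled loops (long sequential runs of instructions) paid off so well on this machine.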
So you're right that it does still stall sometimes, but it was still a good tradeoff, I think, for a system that didn't have a cache for other reasons. M1: Thanks. Herald: The next question is from the Internet. Signal Angel (S): Where can you get hold of an Acorn right now, if you want to get into this kind of retro computing? Herald: Can you repeat the first sentence, please? S: Sorry. Where would you get an Acorn, if you want to get into this retro vibe right now? M: Yeah, good question: how do you get hold of one? Drive the prices up on eBay, I guess. I hate to say it, but it might be fun to play around in emulators, although purists will want to hack around on the real thing; emulators always feel a bit strange. There are a bunch of really good emulators out there, quite complete ones. But I would just go on auction sites and try and find one. Fortunately, they're not completely rare. That's the thing, they did sell: I'm not quite sure of the exact figure, but there were tens and tens of thousands of these things made. I would look in Britain more than elsewhere, although I do understand that Germany had quite a few. If you can get hold of one, though, I do suggest doing so; I think they're really fun to play with. Herald: OK, next question. M2: So I found myself looking at the documentation for the LDM/STM instructions while developing something on ARM just last week, and I wondered: what's your thought, are there any quirks of the Archimedes that have crept into the modern ARM design and instruction set that you're aware of? M: Most of them got purged. There was the 26-bit addressing; there were a couple of strange uses, like an exclusive-OR type instruction writing to the PC for changing the flags. So there was a great purge when the ARM6 was designed. The ARM6, I should note, was ARMv3; that's what first did 32-bit addressing, and those weirdnesses got moved out.
Aside from the resulting ARM 32-bit instruction set being quite quirky, and having a lot of good quirks, I can't think of anything. The shifted register is sort of a free thing you get: for example, you can add one register to a shifted register in one cycle. I think that's a good quirk. So in terms of inheriting that instruction set and not changing those things, maybe that counts. Herald: Any further questions from the Internet? And if you have questions here... No? Okay. In that case, one round of applause. M: Thank you. Applause postroll music Subtitles created by c3subtitles.de in the year 2020. Join, and help us!