36C3 - The Ultimate Acorn Archimedes talk

  • 0:00 - 0:19
    36c3 preroll music
  • 0:19 - 0:25
    Herald: Our next talk will be "The
    ultimate Acorn Archimedes Talk", in which
  • 0:25 - 0:29
    we will hear everything there is to know
    about the Archimedes computer. There's a
  • 0:29 - 0:33
    promise in advance that there will be no
    eureka jokes in there. Give a warm
  • 0:33 - 0:35
    welcome to Matt Evans.
  • 0:35 - 0:41
    applause
  • 0:41 - 0:48
    Matt Evans: Thank you. Okay. Little bit of
    retro computing first thing in the
  • 0:48 - 0:55
    morning, sort of. Welcome. My name is Matt
    Evans. The Acorn Archimedes was my
  • 0:55 - 0:59
    favorite computer when I was a small
    hacker and I'm privileged to be able to
  • 0:59 - 1:05
    talk a little bit about it with you
    today. Let's start with: What is an Acorn
  • 1:05 - 1:09
    Archimedes? So I'd like an interactive
    session, I'm afraid. Please indulge me,
  • 1:09 - 1:15
    like a show of hands. Who's heard of the
    Acorn Archimedes before? Ah, OK, maybe 50,
  • 1:15 - 1:23
    60%. Who has used one? Maybe 10%,
    maybe. Okay. Who has programs -
  • 1:23 - 1:30
    who has coded on an Archimedes? Maybe
    half? Two, three people. Great. Okay.
  • 1:30 - 1:34
    Three. laughs Okay, so a small
    percentage. I don't see these machines as
  • 1:34 - 1:40
    being as famous as say the Apple Macintosh
    or IBM PC. And certainly outside of Europe
  • 1:40 - 1:44
    they were not that common. So this is kind
    of interesting just how many people here
  • 1:44 - 1:50
    have seen this. So it was the first ARM-
    based computer. This is an astonishingly
  • 1:50 - 1:56
    1980s - I think one of them is drawing,
    actually. But they're not just the first
  • 1:56 - 2:01
    ARM-based machine, but the machine that
    the ARM was originally designed to drive.
  • 2:01 - 2:07
    It's a... Is that a comment for me?
    Mic?
  • 2:07 - 2:14
    I'm being heckled already. It's only slide
    two. Let's see how this goes. So it's a
  • 2:14 - 2:19
    two box computer. It looks a bit like a
    Mega ST ... to me. It's a main unit with
  • 2:19 - 2:26
    the processor and disks and expansion
    cards and so on. Now this is an A3000.
  • 2:26 - 2:31
    This is mine, in fact, and I didn't bother
    to clean it before taking the photo. And
  • 2:31 - 2:33
    now it's on this huge screen. That was a
    really bad idea. You can see all the
  • 2:33 - 2:37
    disgusting muck in the keyboard. It has a
    bit of ink on it, I don't know why. But
  • 2:37 - 2:42
    this machine is 30 years old. And
    this was luckily my machine, as I said, as
  • 2:42 - 2:45
    a small hacker. And this is why I'm doing
    the talk today. This had a big influence
  • 2:45 - 2:53
    on me. I'd like to say as a person, but
    more as an engineer, in terms of my
  • 2:53 - 2:57
    programming experience when I was learning
    to program and so on. So I live and work
  • 2:57 - 3:02
    in Cambridge in the U.K., where this
    machine was designed. And through a
  • 3:02 - 3:05
    funny sort of turn of events, I ended up
    there and actually work in the building
  • 3:05 - 3:09
    next to the building where this was
    designed. And a bunch of the people that
  • 3:09 - 3:14
    were on that original team that designed
    this system are still around and
  • 3:14 - 3:18
    relatively contactable. And I thought this
    is a good opportunity to get on the phone
  • 3:18 - 3:22
    and call them up or go for a beer with a
    couple of them and ask them: Why are
  • 3:22 - 3:25
    things the way they are? There's all sorts
    of weird quirks to this machine. I was
  • 3:25 - 3:29
    always wondering this, for 20 years. Can
    you please tell me - why did you do it
  • 3:29 - 3:33
    this way? And they were a really good bunch
    of people. So I talked to Steve Furber,
  • 3:33 - 3:38
    who led the hardware design, Sophie
    Wilson, who was the same with software.
  • 3:38 - 3:43
    Tudor Brown, who did the video system.
    Mike Muller, the IO system. John Biggs and
  • 3:43 - 3:46
    Jamie Urquhart, who did the silicon
    design. I've spoiled one of the
  • 3:46 - 3:50
    surprises here. There's been some silicon
    design that's gone on in building this
  • 3:50 - 3:55
    Acorn. And they were all wonderful people
    that gave me their time and told me a
  • 3:55 - 4:00
    bunch of anecdotes that I will pass on to
    you. So I'm going to talk about the
  • 4:00 - 4:05
    classic Arc. There's a bunch of different
    machines that Acorn built into the 1990s.
  • 4:05 - 4:09
    But the ones I'm talking about started in
    1987. There were 2 models, effectively a
  • 4:09 - 4:15
    low end and a high end. One had an option
    for a hard disk, 20 megabytes, 2300
  • 4:15 - 4:21
    pounds, up to 4MB of RAM. They all share
    the same basic architecture, they're all
  • 4:21 - 4:26
    basically the same. So the A3000 that I
    just showed you came out in 1989. That was
  • 4:26 - 4:30
    the machine I had. Those again, the same.
    It had the memory controller slightly
  • 4:30 - 4:36
    updated, was slightly faster. They all had
    an ARM 2. This was the released version of
  • 4:36 - 4:41
    the ARM processor designed for this
    machine, at 8 MHz. And then finally in
  • 4:41 - 4:46
    1990, what I call the last of the classic
    Arc, Archimedes, is the A540. This was the
  • 4:46 - 4:51
    top end machine - could have up to
    16 MB of memory, which is a fair bit
  • 4:51 - 4:58
    even in 1990. It had a 30 MHz ARM 3. The
    ARM 3 was the evolution of the ARM 2, but
  • 4:58 - 5:02
    with a cache and a lot faster. So this
    talk will be centered around how these
  • 5:02 - 5:09
    machines work, not the more modern
    machines. So around 1987, what else
  • 5:09 - 5:14
    was available? This is a random selection
    of machines. Apologies if your favorite
  • 5:14 - 5:18
    machine is not on this list. It wouldn't
    fit on the slide otherwise. So at the
  • 5:18 - 5:22
    start of the 80s, we had the exotic things
    like the Apple Lisa and the Apple Mac.
  • 5:22 - 5:29
    Very expensive machines. The Amiga - I had
    to put in here. Started off relatively
  • 5:29 - 5:33
    expensive, but the Amiga 500 was, you
    know, very good value for money, very
  • 5:33 - 5:37
    capable machine. But I'm comparing this
    more to PCs and Macs, because that was the
  • 5:37 - 5:42
    sort of, you know, market it was going
    for. And although it was an expensive
  • 5:42 - 5:47
    machine compared to Macintosh, it was
    pretty cheap. Even put NeXT Cube on there,
  • 5:47 - 5:50
    I figured that... I'd heard that they were
    incredibly expensive. And actually
  • 5:50 - 5:54
    compared to the Macintosh, they're not
    that expensive at all. Well I don't know
  • 5:54 - 5:58
    which one I would have preferred. So the
    first question I asked them - the first
  • 5:58 - 6:03
    thing they told me: Why was it built? I've
    used them in school and as I said, had one
  • 6:03 - 6:09
    at home. But I was never really quite sure
    what it was for. And I think a lot of the
  • 6:09 - 6:12
    Acorn marketing wasn't quite sure what it
    was for either. They told me it was the
  • 6:12 - 6:16
    successor to the BBC Micro, this 8 bit
    machine. Lovely 6502 machine, incredibly
  • 6:16 - 6:20
    popular, especially in the UK. And the
    goal was to make a machine that was 10
  • 6:20 - 6:24
    times the performance of this. The
    successor would be 10 times faster at the
  • 6:24 - 6:30
    same price. And the thing I didn't know is
    they had been inspired. The team at Acorn had
  • 6:30 - 6:36
    seen the Apple Lisa and the Xerox Star,
    which comes from the famous Xerox Alto,
  • 6:36 - 6:41
    Xerox PARC, first GUI workstation in the
    70s, monumental machine. They'd been
  • 6:41 - 6:45
    inspired by these machines and they wanted
    to make something very similar. So this is
  • 6:45 - 6:49
    the same story as the Macintosh. They
    wanted to make something that was a desktop
  • 6:49 - 6:52
    machine for business, for office
    automation, desktop publishing and that
  • 6:52 - 6:56
    kind of thing. But I never really
    understood this before. So this
  • 6:56 - 7:02
    inspiration came from the Xerox machines.
    It was supposed to be obviously a lot more
  • 7:02 - 7:07
    affordable and a lot faster. So this is
    what happens when Acorn marketing gets
  • 7:07 - 7:12
    hold of this vision. So Xerox Star on the
    left is this nice, sensible business
  • 7:12 - 7:15
    machine. Someone's wearing a nice, crisp
    suit bumps microphone banging their
  • 7:15 - 7:20
    microphone - and it gets turned into the
    very Cambridge Tweed version on the right.
  • 7:20 - 7:24
    It's apparently illegal to program one of
    these if you're not wearing a top hat. But
  • 7:24 - 7:29
    no one told me that when I was a kid. And
    my court case comes up next week. So
  • 7:29 - 7:32
    Cambridge is a bit of a funny place. And
    for those that have been there, this picture on
  • 7:32 - 7:39
    the right sums it all up. So they began
    Project A, which was to build this new
  • 7:39 - 7:43
    machine. And they looked at the
    alternatives. They looked at the
  • 7:43 - 7:50
    processors that were available at that
    time, the 286, the 68K, the Nat Semi
  • 7:50 - 7:55
    32016, which was an early 32 bit
    machine, a bit of a weird processor. And
  • 7:55 - 7:58
    they all had something in common that
    they're ridiculously expensive and in
  • 7:58 - 8:03
    Tudor's words a bit crap. They weren't a
    lot faster than the BBC Micro. They're a
  • 8:03 - 8:07
    lot more expensive. They're much more
    complicated in terms of the processor
  • 8:07 - 8:10
    itself. But also the system around them
    was very complicated. They need lots of
  • 8:10 - 8:15
    weird support chips. This just drove the
    price up of the system and it wasn't going
  • 8:15 - 8:20
    to hit that 10 times performance, let
    alone at the same price point. They'd
  • 8:20 - 8:24
    visited a couple of other companies
    designing their own custom silicon. They
  • 8:24 - 8:28
    got this idea in about 1983. They were
    looking at some of the RISC papers coming
  • 8:28 - 8:31
    out of Berkeley and they were quite
    impressed by what a bunch of grad students
  • 8:31 - 8:38
    were doing. They managed to get a working
    RISC processor and they went to Western
  • 8:38 - 8:42
    Design Center and looked at 6502
    successors being designed there. They had a
  • 8:42 - 8:45
    positive experience. They saw a bunch of
    high school kids with Apple IIs doing
  • 8:45 - 8:49
    silicon layout. And they thought "OK,
    well". They'd never designed a CPU before
  • 8:49 - 8:53
    at Acorn. Acorn hadn't done any custom
    silicon to this degree, but they were
  • 8:53 - 8:57
    buoyed by this and they thought, okay,
    well, maybe RISC is the secret and we can
  • 8:57 - 9:02
    do this. And this was not really the done
    thing in this timeframe and not for a
  • 9:02 - 9:06
    company the size of Acorn, but they
    designed their computer from scratch. They
  • 9:06 - 9:09
    designed all of the major pieces of
    silicon in this machine. And it wasn't
  • 9:09 - 9:12
    about designing the ARM chip. Hey, we've
    got a processor core. What should we do
  • 9:12 - 9:16
    with it? But it was about designing the
    machine that ARM and the history of that
  • 9:16 - 9:20
    company has kind of benefited from. But
    this is all about designing the machine as
  • 9:20 - 9:27
    a whole. They're a tiny team. They're a
    handful of people - about a dozen...ish
  • 9:27 - 9:31
    that did the hardware design, a similar
    sort of order for software and operating
  • 9:31 - 9:36
    systems on top, which is orders of
    magnitude different from IBM and Motorola
  • 9:36 - 9:41
    and so forth that were designing computers
    at this time. RISC was the key. They
  • 9:41 - 9:44
    needed to be incredibly simple. One of the
    other experiences they had was they went
  • 9:44 - 9:49
    to a CISC processor design center. They
    had a team of a couple of hundred people
  • 9:49 - 9:53
    and they were on revision H and it still
    had bugs and it was just this unwieldy,
  • 9:53 - 9:58
    complex machine. So RISC was the secret.
    Steve Furber has an interview somewhere.
  • 9:58 - 10:03
    He jokes about Acorn management giving him
    two things. The special sauce was two things
  • 10:03 - 10:08
    that no one else had: He'd no people and
    no money. So it had to be incredibly
  • 10:08 - 10:15
    simple. It had to be built on a
    shoestring, as Jamie said to me. So there
  • 10:15 - 10:18
    are lots of corners cut, but in the right
    way. I would say "corners cut", that
  • 10:18 - 10:23
    sounds ungenerous. There's some very
    shrewd design decisions, always weighing
  • 10:23 - 10:30
    up cost versus benefit. And I think they
    erred on the correct side for all of them.
  • 10:30 - 10:34
    So Steve sent me this picture. He's
    got a cameo here. That's the outline of
  • 10:34 - 10:39
    him in the reflection on the glass there.
    He's got this up in his office. So he
  • 10:39 - 10:44
    led the hardware design of all of these
    chips at Acorn. Across the top, we've got
  • 10:44 - 10:49
    the original ARM, the ARM 1, ARM 2 and the
    ARM 3 - guess the naming scheme - and the
  • 10:49 - 10:53
    video controller, memory controller and IO
    controller. You can sort of see their
  • 10:53 - 10:57
    relative sizes and it's kind of pretty.
    This was also on a processor where you
  • 10:57 - 11:01
    could really point at that and say, "oh,
    that's the register file and you can see
  • 11:01 - 11:07
    the cache over there". You can't really do
    that nowadays with modern processors. So
  • 11:07 - 11:11
    a bit about the specification, what it
    could do, the end product. So I mentioned
  • 11:11 - 11:17
    they all had this ARM 2 8MHz, up to four
    MB of RAM, 26-bit addresses, remember
  • 11:17 - 11:22
    that. That's weird. So a lot of 32-bit
    machines had 32-bit addresses, or the ones
  • 11:22 - 11:26
    that we know today do. That wasn't the
    case here. And I'll explain why in a
  • 11:26 - 11:33
    minute. The A540 had an updated CPU. The
    memory controller had an MMU, which was
  • 11:33 - 11:39
    unusual for machines of the mid 80s. So it
    could support, the hardware would support
  • 11:39 - 11:46
    virtual memory, page faults and so on. It
    had decent sound, it had 8-channel sound,
  • 11:46 - 11:49
    hardware mixed and stereo. It was 8 bit,
    but it was logarithmic - so it was a bit
  • 11:49 - 11:53
    like u-law, if anyone knows that - instead
    of PCM, so you got more precision at the
  • 11:53 - 11:58
    low end and it sounded to me a little bit
    like 12 bit PCM sound. So this is quite
  • 11:58 - 12:05
    good. Storage wise, it's the same floppy
    controller as the Atari ST. It's fairly
  • 12:05 - 12:10
    boring. Hard disk controller was a
    horrible standard called ST506, MFM
  • 12:10 - 12:16
    drives, which were very, very crude
    compared to disks we have today. Keyboard
  • 12:16 - 12:20
    and mouse, nothing to write home about. I
    mean, it was a normal keyboard. There was
  • 12:20 - 12:23
    nothing special going on there. And
    printer port, serial port and some
  • 12:23 - 12:29
    expansion slots, which I'll
    outline later on. The thing I really liked
  • 12:29 - 12:33
    about the Arc was the graphics
    capabilities. It's fairly capable,
  • 12:33 - 12:38
    especially for a machine of that era and
    of the price. It just had a flat frame
  • 12:38 - 12:42
    buffer so it didn't have sprites, which is
    unfortunate. It didn't have a blitter and
  • 12:42 - 12:47
    bitplanes and so forth. But the upshot
    of that is it's dead simple to program. It had
  • 12:47 - 12:52
    a 256 color mode, 8 bits per pixel, so
    it's a byte, and it's all just laid out as
  • 12:52 - 12:56
    a linear string of bytes. So it was dead
    easy to just write some really nice
  • 12:56 - 13:00
    optimized code to just blit stuff to the
    screen. Part of the reason why there isn't
  • 13:00 - 13:05
    a blitter is actually the CPU was so good
    at doing this. Colorwise, it's got
  • 13:05 - 13:11
    paletted modes out of a 4096 color
    palette, same as the Amiga. It has this
  • 13:11 - 13:16
    256 color mode, which is different. The
    big high end machines, the top end
  • 13:16 - 13:21
    machines, the A540 and the A400 series
    could also do this very high res 1152 by
  • 13:21 - 13:24
    900, which was more of a workstation
    resolution. If you bought a Sun
  • 13:24 - 13:28
    workstation, a Sun 3, in those days, it could
    do this and some higher resolutions. But
  • 13:28 - 13:33
    this was really not seen on computers that you
    might have in the office or school or
  • 13:33 - 13:36
    the education end of the market. And
    it's quite clever the way they did that.
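As an illustration of why a flat, one-byte-per-pixel frame buffer is so easy to program, here is a minimal sketch in Python. The 320 by 256 dimensions and all function names are assumptions made for the example, not details taken from the talk; the only property relied on is the one described above, that the screen is a linear string of bytes.

```python
# Sketch of a flat 8-bits-per-pixel frame buffer: one byte per pixel,
# rows stored one after another. WIDTH and HEIGHT are assumed values.
WIDTH, HEIGHT = 320, 256


def make_framebuffer(width=WIDTH, height=HEIGHT):
    """Return a zeroed linear frame buffer of width * height bytes."""
    return bytearray(width * height)


def plot(fb, x, y, colour, width=WIDTH):
    """Set one pixel: the byte offset is simply y * width + x."""
    fb[y * width + x] = colour & 0xFF


def blit(fb, sprite, sprite_w, sprite_h, dest_x, dest_y, width=WIDTH):
    """Copy a rectangular block of bytes into the frame buffer, row by row.

    No clipping is done; the caller must keep the block on screen.
    """
    for row in range(sprite_h):
        src = row * sprite_w
        dst = (dest_y + row) * width + dest_x
        fb[dst:dst + sprite_w] = sprite[src:src + sprite_w]
```

With no bitplanes to interleave, a software blit is just a row-by-row memory copy, which is the kind of loop the talk says the CPU handled well on its own.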
  • 13:36 - 13:40
    I'll come back to that in a sec. But for
    me, the thing about the Arc: For the
  • 13:40 - 13:46
    money, it was the fastest machine around.
    It was definitely faster than 386s and all
  • 13:46 - 13:50
    the stuff that Motorola was doing at the
    time by quite a long way. It is almost
  • 13:50 - 13:54
    eight times faster than a 68k at about the
    same clock speed. And it's to do with its
  • 13:54 - 13:57
    pipelining and to do with it having a 32
    bit word and a couple of other tricks.
  • 13:57 - 14:01
    Again, I'll show you later on what the
    secret to that performance was. About
  • 14:01 - 14:05
    minicomputer speed and compared to some of
    the other RISC machines at the time, it
  • 14:05 - 14:09
    wasn't the first RISC in the world, it was
    the first cheap RISC and the first RISC
  • 14:09 - 14:14
    machine that people could feasibly buy and
    have on their desks at work or in
  • 14:14 - 14:19
    education. And if you compare it to
    something like the MIPS or the SPARC, it
  • 14:19 - 14:25
    was not as fast as a MIPS or SPARC chip.
    It was also a lot smaller, a lot cheaper.
  • 14:25 - 14:29
    Both of those other processors had very
    big dies. They needed other support chips.
  • 14:29 - 14:33
    They had huge packages, lots of pins, lots
    of cooling requirements. So all this
  • 14:33 - 14:36
    really added up. So I priced up
    a Sun 4 workstation at the time and
  • 14:36 - 14:40
    it was well over four times the price of
    one of these machines. And that was before
  • 14:40 - 14:44
    you add on extras such as disks and
    network interfaces and things like that.
  • 14:44 - 14:47
    So it's very good, very competitive for
    the money. And if you think about building
  • 14:47 - 14:50
    a cluster, then you could get a lot more
    throughput, you could network them
  • 14:50 - 14:57
    together. So this is about as far as I got
    when I was a youngster, I wasn't brave
  • 14:57 - 15:03
    enough to really take the machine apart
    and poke around. Fortunately, now it's 30
  • 15:03 - 15:07
    years old and I'm fine. I'm qualified and
    doing this. I'm going to take it apart.
  • 15:07 - 15:12
    Here's the motherboard. Quite a nice clean
    design. This was built in Wales for anyone
  • 15:12 - 15:18
    that's been to the UK. Very unusual these
    days for anything to be built in the UK. It's
  • 15:18 - 15:23
    got several main sections around these
    four chips. Remember the Steve photo
  • 15:23 - 15:29
    earlier on? This is the chipset: the ARM,
    MEMC, VIDC, IOC. So the IOC side of things
  • 15:29 - 15:34
    happens over on the left, video and sound
    in the top right. And the memory and the
  • 15:34 - 15:38
    processor in the middle. It's got a
    megabyte onboard and you can plug in an
  • 15:38 - 15:44
    expansion for 4 MB. So memory map
    from the software view. I mentioned this
  • 15:44 - 15:47
    26-bit addressing and I think this is one
    of the key characteristics of one of these
  • 15:47 - 15:52
    machines. So you have a 64MB address
    space, it's quite packed. That's quite a
  • 15:52 - 15:57
    lot of stuff shoehorned into here. So
    there's the memory. The bottom half of the
  • 15:57 - 16:02
    address space, 32MB of that is the
    processor. It's got user space and
  • 16:02 - 16:08
    privilege mode. It's got a concept of
    privilege within the processor execution.
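The address-space arithmetic mentioned here can be checked with a short sketch. It only encodes what the talk states (26-bit addresses, a 64 MB space, a 32 MB bottom half); the constant and function names are invented for the example.

```python
# Sketch of the 26-bit address space arithmetic. Names are illustrative.
ADDRESS_BITS = 26
MB = 1 << 20

ADDRESS_SPACE = 1 << ADDRESS_BITS   # 2**26 bytes
BOTTOM_HALF = ADDRESS_SPACE // 2    # the lower 32 MB region


def in_bottom_half(address):
    """Return True if a 26-bit address lies in the bottom 32 MB."""
    if not 0 <= address < ADDRESS_SPACE:
        raise ValueError("address does not fit in 26 bits")
    return address < BOTTOM_HALF


print(ADDRESS_SPACE // MB, "MB total,", BOTTOM_HALF // MB, "MB in the bottom half")
```

Because the split is exactly in the middle, a single address bit (bit 25) is enough to tell the two halves apart.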
  • 16:08 - 16:12
    So when you're in user mode, you only get
    to see the bottom half and that's the
  • 16:12 - 16:16
    virtual maps. There's the MMU, that will
    map pages into that space and then when
  • 16:16 - 16:19
    you're in supervisor mode, you get to see
    the whole of the rest of the memory,
  • 16:19 - 16:23
    including the physical memory and various
    registers up the top. The thing to notice
  • 16:23 - 16:27
    here is: there's stuff hidden behind the
    ROM, this address space is very packed
  • 16:27 - 16:31
    together. So there's a requirement
    for control registers, for the memory
  • 16:31 - 16:35
    controller, for the video controller and
    so on, and they're write-only registers in
  • 16:35 - 16:40
    ROM space, basically. So you write to the ROM and
    you get to hit these registers. Kind of
  • 16:40 - 16:44
    weird when you first see it, but it was
    quite a clever way to fit this stuff into
  • 16:44 - 16:51
    the address space. So we'll start with
    the ARM1. So Sophie Wilson designed the
  • 16:51 - 16:59
    instruction set late 1983, Steve took the
    instruction set and designed the top
  • 16:59 - 17:03
    level, the block, the micro architecture
    of this processor. So this is the data
  • 17:03 - 17:08
    path and how the control logic works. And
    then the VLSI team implemented this,
  • 17:08 - 17:12
    did their own custom cells. There's a
    custom data path and custom logic
  • 17:12 - 17:18
    throughout this. It took them about a
    year, all in. Well, 1984, that sort of...
  • 17:18 - 17:24
    This Project A really kicked off early
    1984. And this taped out first thing
  • 17:24 - 17:35
    early 1985. The design process the guys
    gave me a little bit of... So Jamie
  • 17:35 - 17:41
    Urquhart and John Biggs gave me a bit of
    an insight into how they worked on the
  • 17:41 - 17:47
    VLSI side of things. So they had an Apollo
    workstation, just one Apollo workstation,
  • 17:47 - 17:52
    the DN600. This is a 68K based washing
    machine, as Jamie described it. It's this
  • 17:52 - 17:56
    huge thing. It cost about £50,000.
    It's incredibly expensive. And they
  • 17:56 - 18:00
    designed all of this with just one of
    these workstations. Jamie got in at 5:00
  • 18:00 - 18:04
    a.m., worked until the afternoon and then
    let someone else on the machine. So they
  • 18:04 - 18:07
    shared the workstation, they worked
    shifts so that they could design this
  • 18:07 - 18:10
    whole thing on one workstation. So this
    comes back to that. It was designed on a
  • 18:10 - 18:14
    bit of a shoestring budget. When they got
    a couple of other workstations later on in
  • 18:14 - 18:18
    the project, there was an allegation that
    the software might not have been licensed
  • 18:18 - 18:22
    initially on the other workstations and
    the CAD software might have been. I can
  • 18:22 - 18:28
    neither confirm nor deny whether that's
    true. So Steve wrote a BBC Basic
  • 18:28 - 18:33
    simulator for this while he was designing
    this block-level microarchitecture. It ran on
  • 18:33 - 18:39
    his BBC Micro. So this could then run real
    software. There could be a certain amount
  • 18:39 - 18:42
    of software development, but then they
    could also validate that the design was
  • 18:42 - 18:47
    correct. There's no cache on this. This is
    quite a large chip. 50 square
  • 18:47 - 18:52
    millimeters was the economic limit of
    those days for this part of the market.
  • 18:52 - 18:56
    There's no cache. That also would have
    been far too complicated. So this was
  • 18:56 - 19:03
    also, I think, quite a big risk, no pun
    intended. The aim of doing this
  • 19:03 - 19:08
    with such a small team that they're all
    very clever people. But they hadn't all
  • 19:08 - 19:11
    got experience in building chips before.
    And I think they knew what they were up
  • 19:11 - 19:15
    against. And so not having a cache or
    complicated things like that was the right
  • 19:15 - 19:21
    choice to make. I'll show you later that
    that didn't actually affect things. So
  • 19:21 - 19:25
    this was a RISC machine. If anyone has not
    programmed ARM in this room then get out
  • 19:25 - 19:29
    at once. But if you have programmed ARM,
    this is quite familiar, with some
  • 19:29 - 19:36
    differences. It's a classical three-
    operand RISC. It's got a free shift on one of
  • 19:36 - 19:39
    the operands for most of the instructions.
    So you can do things like static
  • 19:39 - 19:44
    multiplies quite easily. It's not purist
    RISC though. It does have load or store
  • 19:44 - 19:48
    multiple instructions. So these will, as
    the name implies, load or store multiple
  • 19:48 - 19:51
    number of registers in one go. So one
    register per cycle, but it's all done
  • 19:51 - 19:55
    through one instruction. This is not RISC.
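To make the point about the shift on one operand concrete, here is a small Python model of how a multiply by a constant collapses into one or two shift-and-add steps. The helper names are made up for this sketch, and the ARM mnemonics in the comments are only there to indicate the kind of instruction each helper stands in for.

```python
# Sketch: multiplying by small constants with a shifted second operand.
# All arithmetic is truncated to 32 bits, like a 32-bit register.
MASK32 = 0xFFFFFFFF


def add_shifted(rn, rm, shift):
    """Model of an add with a shifted operand: rn + (rm << shift).

    Roughly what "ADD Rd, Rn, Rm, LSL #shift" computes.
    """
    return (rn + (rm << shift)) & MASK32


def rsb_shifted(rn, rm, shift):
    """Model of a reverse subtract: (rm << shift) - rn.

    Roughly what "RSB Rd, Rn, Rm, LSL #shift" computes.
    """
    return ((rm << shift) - rn) & MASK32


def times5(x):
    return add_shifted(x, x, 2)      # x + 4x


def times7(x):
    return rsb_shifted(x, x, 3)      # 8x - x


def times10(x):
    return (times5(x) << 1) & MASK32  # (x + 4x) * 2
```

Each helper corresponds to a single data-processing step, so multiplying by 5 or 7 costs one such step in this model and multiplying by 10 costs two.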
    Again, there's a good reason for doing
  • 19:55 - 19:59
    that. So ARM1 comes back and it gets
    plugged into a board that looks a bit like
  • 19:59 - 20:07
    this. This is called the A2P, the ARM second
    processor. It plugs into a BBC Micro.
  • 20:07 - 20:11
    Basically, there's a thing called the Tube,
    which is sort of a FIFO like arrangement.
  • 20:11 - 20:15
    The BBC Micro can send messages one way
    and this can send messages back. And the
  • 20:15 - 20:20
    BBC Micro has the discs, it has the I/O,
    keyboard and so on. And that's used as the
  • 20:20 - 20:24
    host to then download code into one
    megabyte of RAM up here and then you
  • 20:24 - 20:29
    can run the code on the ARM. So this was
    the initial system, 6 MHz. The
  • 20:29 - 20:32
    thing I found quite interesting about
    this, I mentioned that Steve had built
  • 20:32 - 20:37
    this BBC Basic simulation, one of the
    early bits of software that could run on
  • 20:37 - 20:42
    this. Sophie had ported BBC Basic to ARM and
    written an ARM version of it. The Basic
  • 20:42 - 20:48
    interpreter was very fast, very lean, and
    it was running on this board early on.
  • 20:48 - 20:52
    They then built a simulator called ASIM,
    which was an event based simulator for
  • 20:52 - 20:55
    doing logic design, and all of the other
    chips in the chipset
  • 20:55 - 20:59
    were simulated using ASIM on ARM1, which is
    quite nice. So this was the fastest
  • 20:59 - 21:02
    machine that they had around. They didn't
    have, you know, the thousands of machines
  • 21:02 - 21:08
    in the cluster like you'd have in a
    modern company doing EDA. They had
  • 21:08 - 21:11
    a very small number of machines and these
    were the fastest ones they had about. So
  • 21:11 - 21:17
    ARM2 was simulated on ARM1 and all the
    other chips. So then ARM2 comes along.
  • 21:17 - 21:22
    So it's a year later, this is a shrink of
    the design. It's based on the same basic
  • 21:22 - 21:26
    micro architecture but has a multiplier
    now. It's a Booth multiplier, so it is at
  • 21:26 - 21:32
    worst case a 16-cycle multiply, just two
    bits per clock. Again, no cache. But one
  • 21:32 - 21:37
    thing they did add in ARM2 is banked
    registers. Some of the processor modes I
  • 21:37 - 21:43
    mentioned there's an interrupt mode. Next
    slide, some of the processor modes will
  • 21:43 - 21:48
    basically give you a different view of
    registers, which is very useful. These
  • 21:48 - 21:51
    were all validated at 8 MHz. So
    the product was designed for 8 MHz.
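Going back to the multiplier for a moment: the 16-cycle worst case quoted earlier is what you get from retiring two bits of a 32-bit operand per clock (32 / 2 = 16). The sketch below shows a textbook radix-4 Booth recoding in Python that does exactly that. It is an illustration of the general technique under that assumption, not a description of the actual ARM2 circuit, and it always runs all 16 steps.

```python
# Sketch: textbook radix-4 (two bits per step) Booth multiplication,
# producing the low 32 bits of the product. Not the real ARM2 datapath.
MASK32 = 0xFFFFFFFF
STEPS = 32 // 2  # two multiplier bits per step gives 16 steps


def booth_multiply(a, b):
    """Return (a * b) mod 2**32 using 16 radix-4 Booth steps."""
    a &= MASK32
    b &= MASK32
    acc = 0
    prev_bit = 0  # the bit just below the current pair, initially 0
    for step in range(STEPS):
        pair = (b >> (2 * step)) & 0b11
        low, high = pair & 1, pair >> 1
        # Booth digit in {-2, -1, 0, 1, 2} for this pair of bits.
        digit = low + prev_bit - 2 * high
        acc = (acc + digit * (a << (2 * step))) & MASK32
        prev_bit = high
    return acc
```

The recoding treats the multiplier as signed, but since only the low 32 bits of the product are kept, the result is the same for signed and unsigned operands.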
  • 21:51 - 21:54
    The company that built them
    said, okay, put the stamp on the outside
  • 21:54 - 21:58
    saying 8 MHz. There's two
    versions of this chip and I think they're
  • 21:58 - 22:01
    actually the same silicon. I've got a
    suspicion that they're the same. They just
  • 22:01 - 22:05
    tested this batch saying that works at 10
    or 12. So on my project list is
  • 22:05 - 22:12
    overclocking my A3000 to see how fast
    it'll go and see if I can get it to 12 MHz.
  • 22:12 - 22:19
    Okay. So the banking of the registers.
    ARM has got, even in modern 32-bit ARM, two
  • 22:19 - 22:25
    types of interrupt: an IRQ,
    pronounced "erk" in English and FIQ
  • 22:25 - 22:29
    pronounced "fic" in English. I appreciate it
    doesn't mean quite the same thing in
  • 22:29 - 22:34
    German. So I'll call it FIQ from here on in,
    and FIQ mode has this property where
  • 22:34 - 22:38
    the top half of the registers are effectively
    different registers when you get into
  • 22:38 - 22:43
    this mode. So this means, first of all,
    you don't have to back up those registers
  • 22:43 - 22:48
    in your FIQ handler. And
    secondly if you can write an FIQ handler
  • 22:48 - 22:52
    using just those registers and there's
    enough for doing most basic tasks, you
  • 22:52 - 22:56
    don't have to save and restore anything
    when you get an interrupt. So this is
  • 22:56 - 23:03
    designed specifically to be a very, very low
    overhead interrupt mode. So I'm coming to
  • 23:03 - 23:08
    why there's a 26 bit address space. And so
    I found this link very unintuitive. So
  • 23:08 - 23:14
    unlike 32 bit ARM, the more modern
    1990s onwards ARMs, the program counter
  • 23:14 - 23:17
    register 15 doesn't just contain the
    program counter, but also contains the
  • 23:17 - 23:20
    status flags and processor mode and
    effectively all of the machine state is
  • 23:20 - 23:24
    packed in there as well. So I asked the
    question, well why, why 64 megabytes of
  • 23:24 - 23:28
    address space? What's special about 64?
    And Mike told me, well, you're asking the
  • 23:28 - 23:32
    wrong question. It's the other way round.
    What we wanted was this property that all
  • 23:32 - 23:36
    of the machine state is in one register.
    So this means you just have to save one
  • 23:36 - 23:40
    register. Well, you know, what's the harm
    in saving two registers? And he reminded
  • 23:40 - 23:43
    me of this FIQ mode. Well, if you're
    already in a state where you've really
  • 23:43 - 23:48
    optimized your interrupt handler so that
    you don't need any other registers to deal
  • 23:48 - 23:51
    with, you're not saving or restoring anything
    apart from your PC, then saving another
  • 23:51 - 23:56
    register is 50 percent overhead on that
    operation. So the prime motivator
  • 23:56 - 24:00
    was to keep all of the state in one word.
    And then once you take all of the flags
  • 24:00 - 24:05
    away, you're left with 24 bits for a word
    aligned program counter, which leads to
  • 24:05 - 24:10
    26 bit addressing. And that was then seen
    as, well, 64 MB is enough. There were
  • 24:10 - 24:15
    machines in 1985 that, you know, could
    conceivably have more memory than that.
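The packing argument can be sketched in a few lines of Python. The talk only says that the flags and processor mode share the register with a 24-bit word-aligned program counter; the exact bit positions used below (six flag bits at the top, two mode bits at the bottom) follow the commonly documented 26-bit ARM layout and should be read as an assumption of this example, as should the function names.

```python
# Sketch: one 32-bit word holding flags, a word-aligned PC and the mode.
# Assumed layout: bits 31..26 flags, bits 25..2 PC, bits 1..0 mode.
PC_MASK = 0x03FFFFFC     # 24 bits of word-aligned program counter
FLAG_SHIFT = 26
FLAG_MASK = 0x3F         # six flag bits
MODE_MASK = 0x3          # two mode bits


def pack_r15(pc, flags, mode):
    """Combine PC, flags and mode into a single 32-bit value."""
    if pc & ~PC_MASK:
        raise ValueError("PC must be word aligned and fit in 26 bits")
    return ((flags & FLAG_MASK) << FLAG_SHIFT) | pc | (mode & MODE_MASK)


def unpack_r15(r15):
    """Split the combined word back into (pc, flags, mode)."""
    return r15 & PC_MASK, (r15 >> FLAG_SHIFT) & FLAG_MASK, r15 & MODE_MASK
```

Since the PC is word aligned, its two low bits are always zero, so 24 stored bits cover 2**26 bytes: that is where the 64 MB limit comes from, and saving this one word saves the whole machine state.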
  • 24:15 - 24:18
    But for a desktop that was still seen as a
    very large, very expensive amount of
  • 24:18 - 24:24
    memory. The other thing is you don't need to
    invent another instruction to do a
  • 24:24 - 24:28
    return from exception, so you can return
    using one of your existing instructions.
  • 24:28 - 24:33
    In this case, it's the subtract into PC
    which looks a bit strange, but trust me,
  • 24:33 - 24:39
    that does the right thing. So the memory
    controller. This is - I mentioned the
  • 24:39 - 24:43
    address translation, so this has an MMU in
    it. In fact, the thing directly on the
  • 24:43 - 24:46
    left hand side. I was
    worried that these slides actually might
  • 24:46 - 24:50
    not be the right resolution and they might
    be sort of too small for people to see
  • 24:50 - 24:54
    this. And in fact, the fact it's the size of a
    house is really useful here. So the left
  • 24:54 - 24:58
    hand side of this chip is the MMU. This
    chip is the same size as ARM2. Yeah,
  • 24:58 - 25:02
    pretty much. So that's part of the reason
    why the MMU is on another chip. ARM2 was
  • 25:02 - 25:07
    as big as they could make it to fit the
    price. I don't know if anyone here has done
  • 25:07 - 25:11
    silicon design, but as the area goes
    up, effectively your yield goes down and
  • 25:11 - 25:15
    the price goes up; it's a non-linear effect on
    price. So the MMU had to be on a separate
  • 25:15 - 25:20
    chip and it's half the size of that as
    well. MEMC also does more mundane things
  • 25:20 - 25:24
    like it drives DRAM, it does refresh for
    DRAM and it converts from linear addresses
  • 25:24 - 25:34
    into row and column addresses which DRAM
    takes. So the key thing about this
  • 25:34 - 25:39
    ARM and MEMC binding is that the key factor in
    performance is making use of memory
  • 25:39 - 25:44
    bandwidth. When the team had looked at all
    the other processors in Project A before
  • 25:44 - 25:49
    designing their own, one of the things
    they looked at was how well they utilized
  • 25:49 - 25:56
    DRAM, and 68K and the NatSemi chips made very,
    very poor use of DRAM bandwidth.
  • 25:56 - 26:00
    Steve said, well, okay. The DRAM is the
    most expensive component of any of these
  • 26:00 - 26:04
    machines and they're making poor use of
    it. And I think a key insight here is if
  • 26:04 - 26:08
    you maximize that use of the DRAM, then
    you're going to be able to get much higher
  • 26:08 - 26:13
    performance in those machines. And so it's
    32 bits wide. The ARM is pipelined, so it can
  • 26:13 - 26:19
    do a 32 bit word every cycle. And it also
    indicates whether it's sequential or
  • 26:19 - 26:25
    non-sequential addressing. This
    then lets your MEMC
  • 26:25 - 26:31
    decide whether to do an N cycle or an S
    cycle. So there's a fast one and a slow
  • 26:31 - 26:35
    one basically. So when you access a new
    random address in DRAM, you have to open
  • 26:35 - 26:41
    that row and that takes twice the time.
    It's a 4 MHz cycle. But then once
  • 26:41 - 26:45
    you've accessed that address and
    you're accessing linearly ahead of that
  • 26:45 - 26:50
    address, you can do fast page mode
    accesses, which are 8 MHz cycles.
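The two cycle types can be turned into a rough bandwidth estimate. This is only a sketch: it assumes a store-multiple costs two slow N-cycles plus one fast S-cycle for each register after the first, and it ignores instruction fetch and refresh. The function name and the cycle-count assumption are mine, not from the talk.

```python
S_CYCLE_NS = 125.0   # sequential (fast page mode) access at 8 MHz
N_CYCLE_NS = 250.0   # non-sequential access, row must be opened: 4 MHz
BYTES_PER_WORD = 4


def stm_bandwidth_mb_s(registers, n_cycles=2):
    """Approximate MB/s while storing `registers` words in one burst.

    Assumption: the burst costs `n_cycles` N-cycles plus
    (registers - 1) S-cycles in total.
    """
    total_ns = n_cycles * N_CYCLE_NS + (registers - 1) * S_CYCLE_NS
    return registers * BYTES_PER_WORD / total_ns * 1000.0  # bytes/ns -> MB/s


peak_mb_s = BYTES_PER_WORD / S_CYCLE_NS * 1000.0  # every cycle a fast one

for count in (1, 4, 14):
    print(count, round(stm_bandwidth_mb_s(count), 1), "of", peak_mb_s)
```

Under these assumptions a 14-register burst lands in the mid-20s of MB/s against a peak of about 32, which is in the same region as the figure quoted in the talk, while a single-word access gets only a fraction of that.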
  • 26:50 - 26:54
    So ultimately, that's the reason
    why these load store multiples exist. The
  • 26:54 - 26:58
    non-RISC instructions, they're there so
    that you can stream out registers and back
  • 26:58 - 27:03
    in and make use of this DRAM bandwidth. So
    store multiple. This is just a simple
  • 27:03 - 27:08
    calculation for 14 registers, you're
    hitting about 25 megabytes a second out of
  • 27:08 - 27:13
    30. So it's not 100%, but it's way
    more than a 10th or an 8th,
  • 27:13 - 27:17
    which a lot of the other processors
    were using. So this was really good. This
  • 27:17 - 27:21
    is the prime factor of why this machine
    was so fast. It's effectively the load store
  • 27:21 - 27:28
    multiple instructions and being able to
    access the stuff linearly. So the MMU is
  • 27:28 - 27:37
    weird. It's not a TLB in the traditional
    sense. So with TLBs today, if you take your
  • 27:37 - 27:43
    MIPS chip or something where the TLB is
    visible to software, it will map a virtual
  • 27:43 - 27:48
    address into a chosen physical address and
    you'll have some number of entries and you
  • 27:48 - 27:54
    more or less arbitrarily, you know, poke
    an entry and with the set mapping in it.
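The next part of the talk describes the MEMC's inverted arrangement: one entry per physical page, each holding the logical page it currently claims, all compared against the incoming address. A minimal Python sketch of that idea follows. The class and method names are invented for illustration, and the loop stands in for comparators that the hardware would evaluate in parallel.

```python
PAGE_SIZE = 32 * 1024      # 4 MB of DRAM divided over 128 entries
NUM_PHYS_PAGES = 128


class InvertedMap:
    """One slot per physical page; each slot stores the logical page it claims."""

    def __init__(self):
        self.entries = [None] * NUM_PHYS_PAGES

    def map_page(self, phys_page, logical_page):
        self.entries[phys_page] = logical_page

    def translate(self, logical_addr):
        logical_page, offset = divmod(logical_addr, PAGE_SIZE)
        # Hardware checks every entry at once; this loop models that search.
        for phys_page, claimed in enumerate(self.entries):
            if claimed == logical_page:
                return phys_page * PAGE_SIZE + offset
        raise LookupError("abort: no physical page claims this address")


mmu = InvertedMap()
mmu.map_page(phys_page=5, logical_page=100)
print(hex(mmu.translate(100 * PAGE_SIZE + 0x10)))
```

Because the table is indexed by physical page, its size grows with the amount of RAM rather than with the size of the logical address space, which is why the entry count ends up limiting how much memory one controller can serve.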
  • 27:54 - 27:58
    The MEMC does it upside down. So it's
    got a fixed number of entries, one for every
  • 27:58 - 28:02
    page in DRAM. And then for each of those
    entries, it checks an incoming address to
  • 28:02 - 28:09
    see whether it matches. So it has all of
    those entries that we showed on the
  • 28:09 - 28:14
    chip diagram a couple of slides ago. That
    big left hand side had that big array. All
  • 28:14 - 28:17
    of those are effectively just storing a
    virtual address and then matching it with
  • 28:17 - 28:20
    a comparator. And then one of them
    lights up and says yes, it's mine. So
  • 28:20 - 28:25
    effectively, the physical page says that
    virtual address is mine instead of the
  • 28:25 - 28:30
    other way round. So this also limits your
    memory. If you're saying I have to have
  • 28:30 - 28:34
    one of these entries on chip per page of
    physical memory, and you don't want pages
  • 28:34 - 28:41
    to be enormous... If you do the
    maths, 4 MB over 128 pages is a
  • 28:41 - 28:44
    32K page. If you don't want the page to
    get much bigger than that and trust me you
  • 28:44 - 28:48
    don't, then you need to add more of these
    entries and it's already half the size of
  • 28:48 - 28:53
    the chip. So effectively, this is one of
    the limits of why you can only have 4 MB
  • 28:53 - 28:58
    on one of these memory
    controller chips. OK. So VIDC is the core
  • 28:58 - 29:05
    of the video and sound system. It's a set
    of FIFOs and a set of digital-to-analog
  • 29:05 - 29:10
    converters for doing video and sound. You
    stream stuff into the FIFOs and it does
  • 29:10 - 29:15
    the display timing and palette lookup and
    so forth. It has an 8 bit mode I
  • 29:15 - 29:21
    mentioned. It's slightly strange. It also
    has an output for a transparency bit. So in
  • 29:21 - 29:24
    your palette you can set 12 bits of
    color, but you can set a bit of
  • 29:24 - 29:32
    transparency as well so you can do video
    genlocking quite easily with this. So
  • 29:32 - 29:37
    there was a revision later on. Tudor
    explained that the very first one had a bit
  • 29:37 - 29:41
    of crosstalk between the video and the
    sound, so you'd get sound with noise on
  • 29:41 - 29:45
    it. That was basically video noise and
    it's quite hard to get rid of. And so they
  • 29:45 - 29:50
    did this revision and the way he fixed it
    was quite cool. They shuffled the power
  • 29:50 - 29:54
    supply around and did all the sensible
    engineering things. But he also filtered
  • 29:54 - 29:58
    out a bit of the noise that is being
    output on the sound. He
  • 29:58 - 30:03
    inverted it and then fed that back in as
    the reference current for the DACs. So that's
  • 30:03 - 30:06
    sort of self-compensating and took the
    noise out, a bit like noise-canceling
  • 30:06 - 30:13
    headphones. It was kind of a nice hack.
    And that was VIDC1. OK, the final
  • 30:13 - 30:18
    one, I'm going to stop showing you chip
    plots after this, unfortunately, but just
  • 30:18 - 30:21
    get your fill while we're here. And again,
    I'm really glad this is enormous for the
  • 30:21 - 30:26
    people in the room and maybe those zooming
    in online. There's a cool little
  • 30:26 - 30:30
    Illuminati eye logo in the bottom left
    corner. So I feared that you weren't gonna
  • 30:30 - 30:34
    be able to see and I didn't have time to
    do a zoomed-in version, but... Okay. So IOC
  • 30:34 - 30:38
    is the center of the IO system. As much of
    the IO system as possible, all the random
  • 30:38 - 30:41
    bits of glue logic to do things like
    timing (some peripherals are slower than
  • 30:41 - 30:47
    others), lives in IOC. It contains a UART
    for the keyboard, so the keyboard is
  • 30:47 - 30:52
    looked after by an 8051 microcontroller. Just
    nice and easy, you don't have to do scanning
  • 30:52 - 30:57
    in software. This microcontroller just sends
    stuff up a serial port to this chip. So
  • 30:57 - 31:02
    KART: keyboard asynchronous receiver and
    transmitter. It was at one point called
  • 31:02 - 31:06
    the fast asynchronous receiver and
    transmitter. Mike got forced to change the
  • 31:06 - 31:12
    name. Not everyone has a 12 year old sense
    of humor, but I admire his spirit. So the
  • 31:12 - 31:16
    other thing it does is interrupts: all the
    interrupts go into IOC and it's got masks
  • 31:16 - 31:20
    and consolidates them effectively for
    sending an interrupt up to the ARM.
  • 31:20 - 31:25
    The ARM can then check the status and do
    fast response to it. So the eye of providence
  • 31:25 - 31:28
    there, the little logo I pointed out, Mike
    said he put that in for future
  • 31:28 - 31:36
    archaeologists to wonder about. Okay.
    That was it. I was hoping there'd be
  • 31:36 - 31:39
    this big back story about, you know, he
    was in the Illuminati or something. Maybe
  • 31:39 - 31:45
    he is, but not allowed to say. Anyway, so just
    like the other dev board I showed you,
  • 31:45 - 31:50
    this one's the A500 2P. It's still a second
    processor that plugs into a BBC Micro.
  • 31:50 - 31:54
    It's still got this host having disk
    drives and so forth attached to it and
  • 31:54 - 32:00
    pushing stuff down the tube into the
    memory here. But now, finally,
  • 32:00 - 32:05
    all of this, the chipset, is now
    assembled in one place. So this is
  • 32:05 - 32:08
    starting to look like an Archimedes. It's
    got video out. It's got a keyboard
  • 32:08 - 32:12
    interface. It's got some expansion stuff.
    So this is for bring-up and an early software
  • 32:12 - 32:18
    head start. But very shortly afterwards, we
    got the A500, internal to Acorn. And
  • 32:18 - 32:21
    this is really the first Archimedes. This
    is the prototype Archimedes. It's actually got
  • 32:21 - 32:27
    a gorgeous gray brick sort of look to it,
    kind of concrete. It weighs like concrete,
  • 32:27 - 32:31
    too, but it has all the hallmarks. It's
    got the IO interfaces, it's got the
  • 32:31 - 32:37
    expansion slots you can see at the back.
    It runs the same operating
  • 32:37 - 32:40
    system. Now, this was used for the OS
    development. There were only a couple of
  • 32:40 - 32:45
    hundred of these made. Well, this is
    serial 222, so this is one of the last,
  • 32:45 - 32:51
    I think. But yeah, only internal to
    Acorn. There are lots of nice tweaks to this
  • 32:51 - 32:56
    machine. So the hardware team had designed
    this, Tudor designed this as well as the
  • 32:56 - 33:01
    video system. And he said, well, his A500
    was the special one, in that he had a video
  • 33:01 - 33:05
    controller... He'd hand-picked one
    of the VIDCs so that instead of running
  • 33:05 - 33:11
    at 24 MHz it would run at 56; there's some silicon
    variation in manufacture. So he found a
  • 33:11 - 33:16
    56 MHz part so he could do, I
    think it was, 1024 x 768, which is way out
  • 33:16 - 33:22
    of spec for the rest of the Archimedes.
    So he had the really, really cool machine.
  • 33:22 - 33:26
    They also ran some of them at 12 MHz
    as well instead of 8. This is a massive
  • 33:26 - 33:30
    performance improvement. I think it used
    expensive memory, which is kind of out of
  • 33:30 - 33:37
    reach for the product. Right. So
    believe me, this is the simplified
  • 33:37 - 33:41
    circuit diagram. The technical reference
    manuals are available online if anyone wants
  • 33:41 - 33:48
    the complicated one. The main parts of the
    display are ARM, MEMC, VIDC and some RAM
  • 33:48 - 33:52
    and we'll have a little walk through them. So
    the clocks are generated actually by the
  • 33:52 - 33:57
    memory controller. Memory controller gives
    the clocks to the ARM. The main reason for
  • 33:57 - 34:00
    this is that the memory controller has to
    do some slow things now and then. It has
  • 34:00 - 34:06
    to open pages of DRAM, do refresh cycles and
    things. So it stops the CPU and generates
  • 34:06 - 34:12
    the clock and it pauses the CPU by
    stopping that clock from time to time.
  • 34:12 - 34:16
    When you do a DRAM access, your address
    bus is along the top: the ARM outputs an
  • 34:16 - 34:20
    address that goes into the MEMC. The
    MEMC then converts that, it does an address
  • 34:20 - 34:23
    translation and then it converts that into
    row and column addresses suitable for
  • 34:23 - 34:27
    DRAM. And then if you're doing a read,
    DRAM outputs the data
  • 34:27 - 34:33
    onto the data bus, which ARM then sees.
    MEMC is in the critical path on
  • 34:33 - 34:37
    this, but the address flows through MEMC
    effectively. Notice that MEMC is not on
  • 34:37 - 34:41
    the data bus. It just gets addresses
    flowing through it, this is important later
  • 34:41 - 34:45
    on. ROM is another slow thing.
  • 34:45 - 34:49
    Another reason why MEMC might slow down
    the access from the CPU. It works in a
  • 34:49 - 34:54
    similar sort of way. There is also a
    permission check done when you're doing
  • 34:54 - 35:00
    the address translation: user
    permission versus OS, or supervisor.
  • 35:00 - 35:05
    And so this information is output as part
    of the cycle when the ARM does that access.
  • 35:05 - 35:10
    If you miss in that translation, you get
    a page fault or permission fault, then an
  • 35:10 - 35:13
    abort signal comes back and you
    take an exception.
  • 35:13 - 35:17
    And the ARM deals with that in software.
  • 35:17 - 35:22
    The data bus is a critical path, and so
    the IO stuff is buffered, it is kept away
  • 35:22 - 35:28
    from that. So the IO bus is 16 bits and
    not a lot of 32 bit peripherals were around
  • 35:28 - 35:33
    in those days. All the peripherals were 8 or
    16 bits. So that's the right thing to do.
  • 35:33 - 35:36
    The IOC decodes that and there's a
    handshake with MEMC. If it needs more
  • 35:36 - 35:40
    time, if it's accessing one of the
    expansion cards and the expansion card
  • 35:40 - 35:48
    has something slow on it then that's dealt
    with in the IOC. So I mentioned the
  • 35:48 - 35:54
    interrupt status that gets funneled into
    IOC and then back out again. There's a
  • 35:54 - 35:58
    VSync interrupt, but not an HSync
    interrupt. You have to use timers for that,
  • 35:58 - 36:02
    really annoyingly. There's one timer and
    there's a 2 MHz timer available. I
  • 36:02 - 36:05
    think I had that in a previous slide,
    forgot to mention it. So if you want to
  • 36:05 - 36:10
    do funny palette switching stuff or copper
    bars or something - that's possible with the
  • 36:10 - 36:13
    timers. It's also a simple hardware mod to
    make a real HSync interrupt as well.
  • 36:13 - 36:19
    There's some spare interrupt inputs on the
    IOC, as an exercise for you. So the bit I
  • 36:19 - 36:23
    really like about this system, I mentioned
    that MEMC is not on the data bus. The VIDC
  • 36:23 - 36:28
    is only on the data bus and it doesn't
    have an address bus either. The VIDC is the
  • 36:28 - 36:31
    thing responsible for turning the frame
    buffer into video, reading that frame
  • 36:31 - 36:36
    buffer out of RAM, so on. So how does it
    actually do that RAM read without the
  • 36:36 - 36:41
    address? Well, the MEMC contains all of
    the registers for doing this DMA: the
  • 36:41 - 36:45
    start of the frame buffer, the current
    position and size, and so on. They all
  • 36:45 - 36:51
    live in the MEMC. So there's a handshake
    where VIDC sends a request up to the MEMC
  • 36:51 - 36:55
    when its FIFO gets low. The MEMC then
    actually generates the address into the
  • 36:55 - 37:01
    DRAM, DRAM outputs that data and
    then the MEMC gives an acknowledge
  • 37:01 - 37:06
    to the ARM... Excuse me - too many
    chips. The MEMC gives an acknowledge to
  • 37:06 - 37:11
    VIDC, which then latches that data
    into the FIFO. So this partitioning is
  • 37:11 - 37:17
    quite neat. A lot of the video DMA...
    The video DMA stuff all lives in MEMC and
  • 37:17 - 37:21
    there's this kind of split across the two
    chips. For the sound one, I've just
  • 37:21 - 37:25
    highlighted one interrupt that comes from
    MEMC. Sound works exactly the same way,
  • 37:25 - 37:28
    except there's a double buffering scheme
    that goes on. And when one half of it
  • 37:28 - 37:32
    becomes empty, you get an interrupt so you
    can refill that so you don't glitch your
  • 37:32 - 37:40
    sound. So this all works really very
    smoothly. So finally, the high-res mono
  • 37:40 - 37:45
    thing that I mentioned before: it's quite a
    novel way they did that. Tudor had realized
  • 37:45 - 37:50
    that with one external component, a
    shift register running very fast, he
  • 37:50 - 37:53
    could implement this very high resolution
    mode without really affecting the rest of
  • 37:53 - 37:59
    the chip. So VIDC still runs at
    24 MHz, the sort of VGA resolution rate. It
  • 37:59 - 38:05
    outputs on a digital bus that was a test
    port, originally. It outputs 4 bits. So 4
  • 38:05 - 38:09
    pixels in one chunk at 24 MHz and
    this external component then shifts
  • 38:09 - 38:14
    through that at 4 times the speed. There's
    one component. I mean, this is a
  • 38:14 - 38:18
    very cheap way of doing this. And as I
    said, this high-res mode is very
  • 38:18 - 38:23
    unusual for machines of this era.
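To make the arithmetic concrete, here is a small Python sketch of the external shift register idea: the video chip hands over 4 pixels per tick at 24 MHz, and the external part clocks them out one at a time at four times that rate. The bit order (most significant bit first) and the names used are assumptions for illustration only.

```python
VIDC_CLOCK_MHZ = 24
PIXELS_PER_CHUNK = 4


def serialize(chunks):
    """Yield 1-bit pixels from 4-bit chunks, most significant bit first (assumed)."""
    for chunk in chunks:
        for bit in range(PIXELS_PER_CHUNK - 1, -1, -1):
            yield (chunk >> bit) & 1


pixel_clock_mhz = VIDC_CLOCK_MHZ * PIXELS_PER_CHUNK

print(pixel_clock_mhz, list(serialize([0b1010, 0b0011])))
```

The video chip itself never has to run faster than its normal clock; only the one external component sees the full pixel rate.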
    I've got a feeling an A500 the top end
  • 38:23 - 38:27
    machine, if anyone's got one of these and
    wants to try this trick, then please get in
  • 38:27 - 38:31
    touch, I've got a feeling an
    A500 will do 1280 x 1024 by
  • 38:31 - 38:36
    overclocking this. I think all of the
    parts survive it. But for some reason,
  • 38:36 - 38:40
    Acorn didn't support that on the board.
    And finally, clock selection: VIDC, on
  • 38:40 - 38:45
    some of the machines, has quite a flexible set
    of clocks for different resolutions,
  • 38:45 - 38:51
    basically. So MEMC is not on the data bus.
    How do we program it? It's got registers
  • 38:51 - 38:55
    for DMA and it's got all this address
    translation. So the memory map I showed
  • 38:55 - 39:01
    before has an 8 MB space reserved for
    the address translation registers. It
  • 39:01 - 39:05
    doesn't have 8 MB of it. I mean, it
    doesn't have two million 32 bit registers
  • 39:05 - 39:10
    behind there, which is a hint of what's
    going on here. So what you do is you write
  • 39:10 - 39:14
    any value to this space and you encode the
    information that you want to put into one
  • 39:14 - 39:20
    of these registers in the address. So in this
    address, the top three bits are 1 - it's
  • 39:20 - 39:25
    in the top 8 MB of the 64 MB
    address space - and you format your
  • 39:25 - 39:29
    logical-to-physical page information in this
    address and then you write any byte
  • 39:29 - 39:35
    effectively. This sort of feels
    really dirty, but also really a very nice
  • 39:35 - 39:40
    way of doing it because there's no other
    space in the address map. And this leads
  • 39:40 - 39:45
    to the price balance. So it's not
    worth having a data bus going into
  • 39:45 - 39:50
    MEMC costing 32 more pins just to write
    these registers as opposed to playing this
  • 39:50 - 39:56
    sort of trick. If you have that address
    bus... that data bus, just for
  • 39:56 - 40:00
    that, then you have to go to a more
    expensive package. And this was
  • 40:00 - 40:05
    really in their minds: a 68 pin chip
    versus an 84 pin chip. It was a big deal.
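The "write anything, the address is the data" trick can be sketched in a few lines of Python. The base of the window follows from the talk (top three bits of a 26-bit address set), but the field layout below is hypothetical and chosen only for illustration; it is not the real MEMC bit assignment, and the function names are invented.

```python
ADDRESS_BITS = 26
WINDOW_SHIFT = ADDRESS_BITS - 3
TRANSLATOR_BASE = 0b111 << WINDOW_SHIFT   # top 8 MB of the 64 MB space

# Hypothetical payload layout, for illustration only:
PHYS_PAGE_BITS = 7                        # 128 physical pages
LOGICAL_PAGE_SHIFT = PHYS_PAGE_BITS       # logical page sits above it


def encode_mapping(phys_page, logical_page):
    """Build the address to write to; the value written is ignored."""
    if not 0 <= phys_page < (1 << PHYS_PAGE_BITS):
        raise ValueError("physical page out of range")
    payload = (logical_page << LOGICAL_PAGE_SHIFT) | phys_page
    if payload >= (1 << WINDOW_SHIFT):
        raise ValueError("mapping does not fit in the address payload")
    return TRANSLATOR_BASE | payload


def decode_mapping(address):
    """What the controller would latch from the address bus, or None."""
    if address >> WINDOW_SHIFT != 0b111:
        return None                       # not in the register window
    payload = address & ((1 << WINDOW_SHIFT) - 1)
    return payload & ((1 << PHYS_PAGE_BITS) - 1), payload >> LOGICAL_PAGE_SHIFT


addr = encode_mapping(phys_page=5, logical_page=100)
print(hex(addr), decode_mapping(addr))
```

The point of the design is visible in decode_mapping: the controller only ever needs to look at the address lines it already has, so no data pins are spent on programming it.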
  • 40:05 - 40:09
    So with everything, they really strove
    to make sure it was in the very smallest
  • 40:09 - 40:13
    package possible. And this system
    partitioning effort led to these sorts of
  • 40:13 - 40:23
    tricks to then program it. So on the
    A540, we get multiple MEMCs. Each one is
  • 40:23 - 40:27
    assigned a colored stripe here of the
    physical address space. So you have a
  • 40:27 - 40:31
    16 MB space, each one looks after
    4 MB of it. But then when you do a
  • 40:31 - 40:36
    virtual access in the bottom half of the
    user space, regular program access, all of
  • 40:36 - 40:39
    them light up and all of them will
    translate that address in parallel. And
  • 40:39 - 40:44
    one of them hopefully will translate and
    then energize the RAM to do the read, for
  • 40:44 - 40:50
    example. When you put an ARM 3 in this
    system, the ARM 3 has its cache and then
  • 40:50 - 40:54
    the address leads into the MEMC. So then
    that means that the address is being
  • 40:54 - 40:58
    translated outside of the cache or after
    the cache. So your caching virtual
  • 40:58 - 41:03
    addresses and as we all know, this is kind
    of bad for performance because whenever
  • 41:03 - 41:07
    you change that virtual address space, you
    have to invalidate your cache. Or tag it,
  • 41:07 - 41:11
    but they didn't do that. There's other ways
    of solving this problem. Basically on this
  • 41:11 - 41:15
    machine, what you need to do is invalidate
    the whole cache. It's quite a quick
  • 41:15 - 41:24
    operation, but it's still not good for
    performance to have an empty cache. The
  • 41:24 - 41:28
    only DMA present in the system is for the
    video, for the video and sound. I/O
  • 41:28 - 41:33
    doesn't have any DMA at all. And this is
    another area where, as a younger engineer, I thought:
  • 41:33 - 41:36
    "crap, why didn't they have DMA? That
    would be way better." DMA is the solution
  • 41:36 - 41:41
    to everyone's problems, as we all know.
    And I think the quote on the right
  • 41:41 - 41:47
    ties in with the Acorn team's discovery
    that all of these other processors needed
  • 41:47 - 41:52
    quite complex chipsets, quite expensive
    support chips. So the quote on the right
  • 41:52 - 41:57
    says that, for some chips, the
    vendors will be charging more for their
  • 41:57 - 42:03
    DMA devices even than for the CPU. So not
    having a dedicated DMA engine on board is a
  • 42:03 - 42:09
    massive cost saving. The comment I made on
    the previous 2 slides about the system
  • 42:09 - 42:14
    partitioning, putting a lot of attention
    into how many pins were on one chip versus
  • 42:14 - 42:19
    another, how many buses were going around
    the place. Not having IOC having to access
  • 42:19 - 42:25
    memory was a massive saving in cost for
    the number of pins and the system as a
  • 42:25 - 42:34
    whole. The other thing is the FIQ mode
    was effectively the means for doing IO.
  • 42:34 - 42:38
    Therefore, FIQ Mode was designed to be an
    incredibly low overhead way of doing
  • 42:38 - 42:44
    programmed IO, having the CPU do the
    IO. So this was saying that the CPU is
  • 42:44 - 42:49
    going to be doing all of the IO stuff, but
    let's just optimize it, let's make
  • 42:49 - 42:54
    it as good as it could be and that's
    what led to the programmed IO. Also,
  • 42:54 - 42:58
    remember ARM 2 didn't have a cache. If you
    don't have a cache on your CPU then
  • 42:58 - 43:03
    DMA is going to hold up the CPU anyway,
    so no cycles. DMA is not any
  • 43:03 - 43:07
    performance gain. You may as well get
    the CPU to do it and then get the CPU to
  • 43:07 - 43:13
    do it in the lowest overhead way possible.
    I think this can be summarized as bringing
  • 43:13 - 43:17
    the "RISC principles" to the system. So
    the RISC principles say, for your CPU, you
  • 43:17 - 43:21
    don't put anything in the CPU that you can
    do in software and this is saying, okay,
  • 43:21 - 43:27
    well, actually software can do the IO just
    as well without a cache as the DMA
  • 43:27 - 43:30
    system. So let's get software to do that.
    And I think this is a kind of a nice way
  • 43:30 - 43:34
    of seeing it. This is part of the cost
    optimization for really very little
  • 43:34 - 43:40
    degradation in performance compared to
    doing it in hardware. So this is an IO card.
  • 43:40 - 43:43
    They're Eurocards, nice and easy. The
    only thing I wanted to say here was this
  • 43:43 - 43:49
    is my SCSI card and it has a ROM on the
    left hand side. And so this is the
  • 43:49 - 43:54
    expansion ROM, basically, many, many years
    before PCI made this popular. Your drivers
  • 43:54 - 43:59
    are on this ROM. This is a SCSI disc
    plugging into this and you can plug this
  • 43:59 - 44:03
    card in and then boot off the disk. You
    don't need any other software to make it
  • 44:03 - 44:08
    work. So this is just a very nice user
    experience. There is no messing around
  • 44:08 - 44:12
    with configuring IO windows or interrupts
    or any of the ISA sort of stuff that was
  • 44:12 - 44:18
    going on at the time. So to summarize some
    of the hardware stuff that we've seen,
  • 44:18 - 44:22
    the ARM is pipelined and it has the
    load/store-multiple instructions, which make
  • 44:22 - 44:28
    for a very high bandwidth utilization.
    That's what gives it its high performance.
  • 44:28 - 44:33
    The machine was really simple. So
    attention to detail about separating,
  • 44:33 - 44:37
    partitioning the work between the chips
    and reducing the chip cost as much as
  • 44:37 - 44:45
    possible. Keeping that balanced was really
    a good idea. The machine was designed when
  • 44:45 - 44:49
    memory and CPUs were about the same speed.
    So this is before that kind of flipped
  • 44:49 - 44:53
    over. An 8 MHz ARM 2 was
    designed to use 8 MHz memory.
  • 44:53 - 44:57
    There's no need to have a cache at all on
    there. These days it sounds really crazy
  • 44:57 - 45:01
    not to have a cache on the CPU, but if your
    memory is not that much slower, then this
  • 45:01 - 45:08
    is a huge cost saving, but it is also a risk
    saving. This was the first real proper CPU,
  • 45:08 - 45:12
    if we don't count ARM 1. Let's say ARM 1 was a
    test, but ARM 2 is, you know, the
  • 45:12 - 45:16
    first product CPU. And having a cache on
    that would have been a huge risk for a
  • 45:16 - 45:21
    design team that hadn't dealt with
    structures that complicated at that
  • 45:21 - 45:23
    point. So that was the right
    thing to do, I think.
  • 45:23 - 45:26
    And I talked about DMA. I'm actually a
  • 45:26 - 45:29
    convert on this. I thought this was crap.
    And actually, I think this was a really
  • 45:29 - 45:33
    good example of balanced design. What's
    the right tool for the job? Software is
  • 45:33 - 45:38
    going to do the IO, so let's make sure
    that FIQ mode makes sure that
  • 45:38 - 45:45
    there's as low overhead as possible. We
    talked about system partitioning. The MMU.
  • 45:45 - 45:49
    I still think it's weird and
    backward. I think there is a
  • 45:49 - 45:56
    strong argument though that a more
    familiar TLB is massively complicated
  • 45:56 - 45:59
    compared to what they did here. And I
    think the main drive here was not just
  • 45:59 - 46:06
    area on the chip, but also to make it much
    simpler to implement. So it worked. And I
  • 46:06 - 46:09
    think they really didn't have
    that many shots at doing this. This wasn't
  • 46:09 - 46:15
    a company or a team that could afford to
    have many goes at this product. And I
  • 46:15 - 46:21
    think that says it all. I think they did a
    great job. Okay. So the OS story is a
  • 46:21 - 46:25
    little bit more complicated. Remember,
    it was gonna be this office automation
  • 46:25 - 46:29
    machine, a bit like a Xerox Star. It was going
    to have this wonderful high res mono mode
  • 46:29 - 46:34
    and people were gonna be laser printing from
    it. So just like Xerox PARC, Acorn started a
  • 46:34 - 46:38
    Palo Alto-based research center:
    Californians and beanbags, writing an
  • 46:38 - 46:43
    operating system using a microkernel in
    Modula-2. All of the trendy boxes ticked
  • 46:43 - 46:49
    here for the mid 80s. It was, by the sounds of it,
    a very advanced operating system and it
  • 46:49 - 46:54
    did virtual memory and so on. It was very
    resource hungry, though. And it was never
  • 46:54 - 47:00
    really very performant. Ultimately, the
    hardware got done quicker than the
  • 47:00 - 47:05
    software. And after a year or two,
    management got the jitters. Hardware was
  • 47:05 - 47:09
    looming and they said, well, next year we're
    going to have the computer ready. Where's
  • 47:09 - 47:13
    the operating system? And the project got
    canned. And this is a real shame. I'd love
  • 47:13 - 47:17
    to know more about this operating system.
    Virtually nothing is documented outside of
  • 47:17 - 47:22
    Acorn. Even the people I spoke to didn't
    work on this. It was a bunch of people in
  • 47:22 - 47:25
    California who kind of disappeared with
    it. So if anyone has this software
  • 47:25 - 47:29
    archived anywhere, then get in touch.
    The computer museum around the corner from me
  • 47:29 - 47:35
    is raring to go on that. That'll be a really
    cool thing to archive. So anyway, they
  • 47:35 - 47:40
    had now a desperate situation. They had to
    go to Plan B, which was in under a year write
  • 47:40 - 47:43
    an operating system for the machine
    that was on its way to being delivered.
  • 47:43 - 47:48
    And it kind of shows. Arthur was... I mean, I
    think the team did a really good job in
  • 47:48 - 47:53
    getting something out of the door in half
    a year, but it was a little bit flaky.
  • 47:53 - 47:57
    RISC OS then, a year later, developed
    from Arthur. I don't know if anyone's
  • 47:57 - 48:02
    heard of RISC OS, but Arthur is
    very, very niche and basically got
  • 48:02 - 48:07
    completely replaced by RISC OS because
    it was a bit less usable than RISC OS.
  • 48:07 - 48:12
    Another really strong point that this
    had: it's quite a big ROM. So 2 MB going
  • 48:12 - 48:17
    up... sorry, 0.5 MB in the 80s going
    up to 2 MB in the early 90s.
  • 48:17 - 48:22
    There's a lot of stuff in ROM. One of
    those things is BBC Basic 5. I know
  • 48:22 - 48:29
    it's 2019, and I know Basic is basic, but
    BBC Basic is actually quite good. It has
  • 48:29 - 48:33
    procedures and it's got support for all
    the graphics and sound. You could write GUI
  • 48:33 - 48:37
    applications in Basic and a lot of people
    did. It's also very fast. So Sophie Wilson
  • 48:37 - 48:43
    wrote this very, very optimized Basic
    interpreter. I talked about the modules
  • 48:43 - 48:46
    and podules. These are the expansion
    ROM things. And a really great user
  • 48:46 - 48:51
    experience there. But speaking of user
    experience, this was Arthur. I never used
  • 48:51 - 48:58
    Arthur. I just dug out a ROM and had a
    play with it. It's bloody horrible. So that
  • 48:58 - 49:04
    went away quickly. At the time also. So
    part of this emergency plan B was to take
  • 49:04 - 49:08
    the Acornsoft team who were supposed to
    be writing applications for this and get
  • 49:08 - 49:12
    them to quickly knock out an operating
    system. So at launch, basically, this is
  • 49:12 - 49:16
    one of the only things that you could do
    with the machine. Had a great demo called
  • 49:16 - 49:21
    Lander, of a great game called Zarch,
    which is 3D space. You could fly around.
  • 49:21 - 49:27
    It didn't have serious business
    applications. And, you know,
  • 49:27 - 49:31
    there was not much you could do with this
    really expensive machine at launch and
  • 49:31 - 49:35
    that really hurt it, I think. So let me
    get RISC OS 2 in 1988 and this is now
  • 49:35 - 49:42
    looking less like a vomit sort of thing,
    much nicer machine. And then eventually
  • 49:42 - 49:47
    RISC OS 3. It was drag and drop between
    applications. It's all multitasking,
  • 49:47 - 49:53
    does outline font anti-aliasing
    and so on. So just lastly, I want to
  • 49:53 - 49:56
    quickly touch on the really interesting
    operating systems that Acorn had: a Unix
  • 49:56 - 49:59
    operating system. So as well as being a
    geek, I'm also a Unix geek and I've always
  • 49:59 - 50:05
    been fascinated by RISC iX. These machines
    are astonishingly expensive. They were
  • 50:05 - 50:08
    the existing Archimedes machines with a
    different sticker on. So that's A540 with
  • 50:08 - 50:15
    a sticker on the front. And this OS
    was developed after the Archimedes was
  • 50:15 - 50:18
    already designed at that point when this
    OS was being developed. So
  • 50:18 - 50:21
    there's a lot of stuff about the hardware
    that wasn't quite right for a Unix
  • 50:21 - 50:26
    operating system. 32K page size on a 4
    megabyte machine really, really killed you
  • 50:26 - 50:30
    in terms of your page cache and that
    kind of thing. They turned this into a bit
  • 50:30 - 50:35
    of an opportunity. At least they made good
    on some of this. There was quite a novel
  • 50:35 - 50:42
    online decompression scheme for you to
    demand-page text from a binary
  • 50:42 - 50:46
    and it would decompress into your 32K
    page, but it was stored in a
  • 50:46 - 50:54
    sparse way on disk. So actually on-disk
    use was a lot less than you'd expect. It was the
  • 50:54 - 50:57
    only way it fit on some of the
    smaller machines.
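
Editor's sketch of the idea just described: with 32K pages, a 4 MB machine only has 128 pages, so keeping program text compressed per page and decompressing on demand saves disk space. This is a minimal illustration and not the actual RISC iX scheme; `zlib` stands in for whatever compressor was really used, and the names `compress_pages` and `page_in` are hypothetical.

```python
import zlib

PAGE_SIZE = 32 * 1024        # 32K page size mentioned in the talk
RAM_SIZE = 4 * 1024 * 1024   # 4 MB machine mentioned in the talk


def pages_in_ram(ram=RAM_SIZE, page=PAGE_SIZE):
    """Number of physical pages available at this page size."""
    return ram // page


def compress_pages(text):
    """Split program text into page-sized pieces and compress each one
    independently, so any single page can be recovered on its own."""
    return [zlib.compress(text[i:i + PAGE_SIZE])
            for i in range(0, len(text), PAGE_SIZE)]


def page_in(stored, n):
    """Simulate a page fault on page n: decompress only that page and
    pad it with zeros up to a full page."""
    return zlib.decompress(stored[n]).ljust(PAGE_SIZE, b"\0")


# Hypothetical program text: 102400 bytes of a repeating pattern.
text = bytes(range(256)) * 400
stored = compress_pages(text)
on_disk = sum(len(p) for p in stored)
```

Because every page is compressed separately, a fault on one page never needs the others, which is what makes demand paging from the compressed image possible.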
  • 50:57 - 51:02
    Also, Acorn TechL, the department that
    designed the Cybertruck, it turns out.
  • 51:02 - 51:06
    This was their view of the A680,
    which is an unreleased workstation.
  • 51:06 - 51:09
    I love this picture.
    I like that piece of cheese or
  • 51:09 - 51:13
    cake as the mouse. That's my favorite
    part. But this is the real machine. So
  • 51:13 - 51:19
    this is an unreleased prototype I found at
    the computer museum. It's notable. And
  • 51:19 - 51:22
    it's got 2 MEMCs. It's got 8 MB of
    RAM. It's only designed to run RISC iX,
  • 51:22 - 51:26
    the Unix operating system, and has a high-res
    monitor only, doesn't have color. It was
  • 51:26 - 51:30
    designed to run FrameMaker and drive
    laser printers and be a kind of desktop
  • 51:30 - 51:35
    publishing workstation. I've always been
    fascinated by RISC iX, as I said. A while
  • 51:35 - 51:41
    ago I hacked around on ArcEm for a while.
    I got it booting in ArcEm. I'd never seen
  • 51:41 - 51:47
    this before. I never used a RISC iX
    machine. So there we go, it boots, it is
  • 51:47 - 51:51
    multi-user. But wait, there's more. It has
    a really cool X server, a very fast one. I
  • 51:51 - 51:55
    think Sophie Wilson again worked on
    the X server here. So it's very well
  • 51:55 - 51:58
    optimized and very fast for a machine of
    its era. And it makes quite a nice little
  • 51:58 - 52:03
    Unix workstation. It's quite a cool little
    system. By the way, Tudor, the guy that
  • 52:03 - 52:07
    designed the VIDC and the IO system, called
    me a sado for getting this working in
  • 52:07 - 52:14
    there. That's my claim to fame. Finally,
    and I want to leave some time for
  • 52:14 - 52:20
    questions. There's a lot of useful stuff
    in ROM. One of them is BBC Basic. Basic
  • 52:20 - 52:23
    has an assembler so you can walk up to
    this machine with a floppy disk and write
  • 52:23 - 52:30
    assembler. It has a special bit of syntax
    there, and then you can just call it. And
  • 52:30 - 52:32
    so this is really powerful. So at school
    or something with the floppy disk, you can
  • 52:32 - 52:37
    do something that's a bit more than basic
    programming. Bizarrely, I mostly wrote that
  • 52:37 - 52:41
    with only two or three tiny syntax errors
    after about 20 years away from this. It's
  • 52:41 - 52:46
    in there somewhere. Legacy-wise, the
    machine didn't sell very many: under a
  • 52:46 - 52:51
    hundred thousand, easily. I don't think it
    really made a massive impact. PCs had
  • 52:51 - 52:55
    already taken off by then. The ARM
    processor, not going to go on about the
  • 52:55 - 52:59
    company. It's clear that that
    obviously has changed the world in many
  • 52:59 - 53:04
    ways. The thing I really took away from
    this exercise was that a handful of smart
  • 53:04 - 53:10
    people, not that many, you know, order of a dozen,
    designed multiple chips, designed a custom
  • 53:10 - 53:15
    computer from scratch, got it working. And
    it was quite good. And I think that this
  • 53:15 - 53:17
    really turned people's heads. It made
    people think differently: that people
  • 53:17 - 53:21
    that were not Motorola and IBM, really,
    really big companies with enormous
  • 53:21 - 53:27
    resources, could do this and could make it
    work. I think actually that led to the
  • 53:27 - 53:31
    thinking that people could design their
    systems on the chip in the 90s and that
  • 53:31 - 53:35
    market taking off. So I think this is
    really key in getting people thinking that
  • 53:35 - 53:40
    way. It was possible to design your own
    silicon. And finally, I just want to thank
  • 53:40 - 53:45
    the people I spoke to, and Adrian and
    Jason at the Centre for Computing History in
  • 53:45 - 53:49
    Cambridge. If you're in Cambridge, then
    please visit there. It's a really cool
  • 53:49 - 53:56
    museum. And with that, I'll wrap up. If
    there's any time for questions, then I'm
  • 53:56 - 53:58
    getting a blank look. No time for
    questions?
  • 53:58 - 54:02
    Herald: There's about 5 minutes left for
    questions.
  • 54:02 - 54:08
    Matt: Fantastic! Or come up to me afterwards.
    I'm happy to chat more about this.
  • 54:08 - 54:19
    applause
    Herald: The first question is for the
  • 54:19 - 54:30
    Internet. Signal Angel, will you?
    Well, grab your microphones and get the
  • 54:30 - 54:37
    first of the audience in the room here. There,
    that microphone, please ask a question.
  • 54:37 - 54:44
    Mic1: You mentioned that the system is
    making good use of the memory, but how is
  • 54:44 - 54:50
    that actually not completely being
    stalled on memory? Having no cache and
  • 54:50 - 54:55
    same cycle time for the cache- for the
    memory as for the CPU.
  • 54:55 - 55:01
    M: Good question. So how is it not always
    stalled on memory? I mean, well, it's
  • 55:01 - 55:04
    sometimes stalled on memory when you do
    something that's non-sequential. You have
  • 55:04 - 55:09
    to take one of the slow cycles. This was
    the N cycle. The key is you try and
  • 55:09 - 55:11
    maximize the amount of time that you're
    doing sequential stuff.
  • 55:11 - 55:16
    So on the ARM 2 you wanted to unroll loops
    as much as possible. So you're fetching
  • 55:16 - 55:20
    your instructions sequentially, right? You
    wanted to make as much use of load-store
  • 55:20 - 55:24
    multiples. You could load single registers
    with an individual register load, but it
  • 55:24 - 55:29
    was much more efficient to pay that cost
    just once at the start of the instruction and
  • 55:29 - 55:34
    then stream stuff sequentially. So you're
    right that it is still stalled sometimes,
  • 55:34 - 55:37
    but that was still a good
    tradeoff, I think, for a system that
  • 55:37 - 55:41
    didn't have a cache for other reasons.
    M1: Thanks.
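
Editor's sketch of the trade-off described in this answer: non-sequential (N) accesses are slow, sequential (S) ones are fast, so one load-multiple beats a run of single loads. The tick costs and the accounting below are simplifying assumptions for illustration, not ARM2 datasheet timings, and the function names are hypothetical.

```python
N_CYCLE = 2  # assumed cost of a non-sequential access, in arbitrary ticks
S_CYCLE = 1  # assumed cost of a sequential access, in arbitrary ticks


def cost_single_loads(n_regs):
    """Each single load makes one non-sequential data access, and the
    instruction fetch that follows it is non-sequential again."""
    return n_regs * (N_CYCLE + N_CYCLE)


def cost_load_multiple(n_regs):
    """One load-multiple: the first data access is non-sequential, the
    remaining ones are sequential, then one non-sequential fetch to get
    back to the instruction stream."""
    return N_CYCLE + (n_regs - 1) * S_CYCLE + N_CYCLE


for regs in (1, 4, 8):
    print(regs, cost_single_loads(regs), cost_load_multiple(regs))
```

Under these assumptions the two approaches cost the same for one register, and the load-multiple pulls ahead as soon as more than one register is transferred, because the non-sequential penalty is paid once rather than per register.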
  • 55:41 - 55:45
    Herald: Next question is for the Internet.
    Signal Angel: Are there any Acorns on
  • 55:45 - 55:50
    sale right now or if you want to get into
    this kind of hardware where do you get it?
  • 55:50 - 55:53
    Herald: Can you repeat the first sentence,
    please? Sorry, the first part.
  • 55:53 - 55:56
    S: If you want to get into this kind of
    hardware right now, if you want to buy it
  • 55:56 - 55:59
    right now.
    M: Yeah, good question. How do you
  • 55:59 - 56:06
    get hold of one? Drive prices up on eBay, I
    guess, I hate to say it. Might be fun to
  • 56:06 - 56:09
    play around in emulators. I always
    prefer to hack around on the
  • 56:09 - 56:12
    real thing. Emulators always feel a bit
    strange. There are a bunch of really good
  • 56:12 - 56:19
    emulators out there. Quite complete. Yeah,
    I think I would just go on
  • 56:19 - 56:23
    auction sites and try and find one.
    Unfortunately, they're not completely
  • 56:23 - 56:28
    rare. I mean, that's the thing, they
    did sell. Not quite sure of the exact figure,
  • 56:28 - 56:32
    but you know, there were tens and tens of
    thousands of these things made. So I would
  • 56:32 - 56:35
    look also in Britain more than elsewhere.
    Although I do understand that Germany had
  • 56:35 - 56:40
    quite a few. If you can get a hold of one,
    though, I do suggest doing so. I think
  • 56:40 - 56:46
    they're really fun to play with.
    Herald: OK, next question.
  • 56:46 - 56:52
    M2: So I found myself looking at the
    documentation for the LDM/STM instructions
  • 56:52 - 56:58
    while debugging something on ARM just last
    week. And just maybe wonder what's your
  • 56:58 - 57:04
    thought? Are there any quirks of the
    Archimedes that have crept into the modern
  • 57:04 - 57:07
    ARM design and instruction set that you
    are aware of?
  • 57:07 - 57:13
    M: Most of them got purged. So there is
    the 26-bit addressing. There were a
  • 57:13 - 57:19
    couple of strange uses of... there is an XOR
    instruction into PC for changing flags. So
  • 57:19 - 57:25
    there was a great purge when the ARM 6 was
    designed, and the ARM 6... I should know
  • 57:25 - 57:32
    this... ARMv3. That's got 32-bit addressing
    and lost this. These weirdnesses
  • 57:32 - 57:36
    got moved out.
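
Editor's sketch of the 26-bit quirk mentioned here: on the original ARM, register 15 held the program counter and the status flags in one 32-bit word, which is why writing to the PC could also change flags. The bit positions follow the 26-bit ARM layout; the helper names `pack_r15` and `unpack_r15` are hypothetical.

```python
# 26-bit ARM R15 layout: flags in the top six bits, word-aligned PC in
# bits 25..2, processor mode in bits 1..0.
PC_MASK = 0x03FFFFFC
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}
MODES = {0: "USR", 1: "FIQ", 2: "IRQ", 3: "SVC"}


def pack_r15(pc, flags, mode_bits):
    """Combine a PC, a dict of flag values and a mode into one R15 word."""
    value = pc & PC_MASK
    for name, bit in FLAG_BITS.items():
        value |= (flags.get(name, 0) & 1) << bit
    return value | (mode_bits & 3)


def unpack_r15(r15):
    """Split an R15 word back into (pc, flags, mode name)."""
    pc = r15 & PC_MASK
    flags = {name: (r15 >> bit) & 1 for name, bit in FLAG_BITS.items()}
    return pc, flags, MODES[r15 & 3]


r15 = pack_r15(0x8000, {"Z": 1, "C": 1}, 3)
pc, flags, mode = unpack_r15(r15)
```

Only 24 bits of word address fit alongside the flags, which is where the 64 MB address limit of these machines comes from.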
    I can't think of any, aside from just the
  • 57:36 - 57:41
    resulting ARM 32 instruction set being
    quite quirky and having a lot of good
  • 57:41 - 57:47
    quirks. The shifted register is sort of a
    free thing you can do. For example, you
  • 57:47 - 57:52
    can add one register to a shifted register
    in one cycle. I think that's a good quirk.
  • 57:52 - 57:55
    So in terms of inheriting that
    instruction set and not changing those
  • 57:55 - 58:06
    things. Maybe that counts?
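
Editor's sketch of the "good quirk" from this answer: an ARM data-processing instruction such as `ADD r0, r1, r2, LSL #2` shifts its second operand as part of the same instruction. The model below only reproduces the arithmetic with 32-bit wraparound; `add_shifted` is a hypothetical helper name.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK32


def add_shifted(rn, rm, shift):
    """Model of ADD Rd, Rn, Rm, LSL #shift: Rd = Rn + (Rm << shift)."""
    return (rn + lsl(rm, shift)) & MASK32


# Address of element 5 in an array of 4-byte words starting at 0x1000.
element_address = add_shifted(0x1000, 5, 2)

# Multiply by 5 with a single add: x + (x << 2).
times_five = add_shifted(7, 7, 2)
```

Both uses, array indexing and cheap multiplication by small constants, need a separate shift instruction on architectures without a shifted operand.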
    Herald: Any further questions? Internet,
  • 58:06 - 58:11
    any new questions? No? Okay, so in that
    case one round of applause for Matt Evans.
  • 58:11 - 58:14
    M: Thank you.
  • 58:14 - 58:21
    applause
  • 58:21 - 58:28
    postroll music
  • 58:28 - 58:44
    Subtitles created by c3subtitles.de
    in the year 2021. Join, and help us!
Title:
36C3 - The Ultimate Acorn Archimedes talk
Video Language:
English
Duration:
58:48