
Garden City Ruby 2014 - Ruby memory Model by Hari Krishnan

  • 0:25 - 0:26
    HARI KRISHNAN: So, thank you very much
  • 0:26 - 0:28
    for being here on a Saturday evening, this
    late.
  • 0:28 - 0:30
    My talk got pushed to the last, but I
  • 0:30 - 0:35
    appreciate you being here, first. My name's
    Hari. I
  • 0:35 - 0:37
    work at MavenHive. So this is a talk about
  • 0:37 - 0:44
    Ruby memory model. So before I start, how
    many
  • 0:44 - 0:47
    of you have heard about memory model and know
  • 0:47 - 0:52
    what it is? Show of hands, please. OK. Let's
  • 0:52 - 0:55
    see where this talk goes. So why did
  • 0:55 - 0:59
    I come up with this talk topic. So I
  • 0:59 - 1:02
    started my career with Java, and I spent a
  • 1:02 - 1:05
    lot many years with Java, and Java has a
  • 1:05 - 1:09
    very clearly documented memory model. And
    it kind of
  • 1:09 - 1:10
    gets to you because with all that, you don't
  • 1:10 - 1:14
    feel safe enough doing multi-threaded programming
    at all. So
  • 1:14 - 1:18
    with Ruby, we've always been talking about,
    you know,
  • 1:18 - 1:21
    doing multi-process for multi-process parallelism,
  • 1:21 - 1:24
    rather than multi-threaded parallelism,
  • 1:24 - 1:29
    even though the language actually supports,
    you know, multi-threading
  • 1:29 - 1:31
    semantics. Of course we know it's called single-threaded
    and
  • 1:31 - 1:34
    all that, but I just got curious, like, what
  • 1:34 - 1:36
    is the real memory model behind Ruby, and
    I
  • 1:36 - 1:39
    just wanted to figure that out. So this talk
  • 1:39 - 1:42
    is all about my learnings as I went through,
  • 1:42 - 1:46
    like, various literatures, and figured out,
    and I tried
  • 1:46 - 1:48
    to combine, like, get a gist of the whole
  • 1:48 - 1:51
    thing. And cram it into some twenty minutes
    so
  • 1:51 - 1:52
    that I could, like, probably give you a very
  • 1:52 - 1:56
    useful session, like, from which you can further
    do
  • 1:56 - 2:01
    more digging on this, right. So when I talked
  • 2:01 - 2:03
    to my friends about memory model, the first
    thing
  • 2:03 - 2:06
    that comes up to their mind is probably this
  • 2:06 - 2:10
    - heap, heap, non-heap, stack, whatever. I'm
    not gonna
  • 2:10 - 2:14
    talk about that. I'm not gonna talk about
    this
  • 2:14 - 2:17
    either. It's not about, you know, optimizing
    your memory,
  • 2:17 - 2:21
    or searching for memory leaks, or garbage collection.
    This talk
  • 2:21 - 2:23
    is not about that either. So what the hell
  • 2:23 - 2:27
    am I gonna talk about? First, a quick exercise.
  • 2:27 - 2:31
    So let's start with this and see where it
  • 2:31 - 2:36
    goes. Simple code. Not much to process late
    in
  • 2:36 - 2:39
    the day. There's a shared variable called
    'n', and
  • 2:39 - 2:42
    there are thousand threads over that, and
    each of
  • 2:42 - 2:45
    those threads want to increment that shared
    variable hundred
  • 2:45 - 2:49
    times, right. And what is the expected output?
    I'm
  • 2:49 - 2:51
    not gonna question you, I'm just gonna give
    it
  • 2:51 - 2:55
    away. It's 100,000. It's fairly straightforward
    code. I'm sure
  • 2:55 - 2:57
    all of you have done this, and it's no
  • 2:57 - 3:02
    big deal. So what's the real output? MRI is
  • 3:02 - 3:05
    very faithful, it gives you what you expected.
    100,000,
  • 3:05 - 3:09
    right. So what happens next? I'm running it
    on
  • 3:09 - 3:13
    Rubinius. This is what you see. And it's always
  • 3:13 - 3:16
    going to be a different number every time
    you
  • 3:16 - 3:19
    run it. And that's JRuby. It gives you a
  • 3:19 - 3:23
    lower number. Some of you may be guessing
    already,
  • 3:23 - 3:24
    and you probably know it, why it gives you
  • 3:24 - 3:28
    a lower number. So why all this basic stupid
  • 3:28 - 3:31
    code and some stupid counter over here, right?
    So
  • 3:31 - 3:34
    I just wanted to get a really basic example
  • 3:34 - 3:36
    to explain the concept of increment is not
    a
  • 3:36 - 3:40
    single instruction, right. The reason why
    I'm talking about
  • 3:40 - 3:43
    this is, I love Ruby because the syntax is
  • 3:43 - 3:47
    so terse, and it's so simple, it's so readable,
  • 3:47 - 3:49
    right. But it does not mean every single instruction
  • 3:49 - 3:52
    on the screen is going to be executed straight
  • 3:52 - 3:55
    away, right. So at least, to my junior self,
  • 3:55 - 3:57
    this is the first advice I would give, when
  • 3:57 - 4:01
    I started, you know, multi-threaded programming.
    So at least
  • 4:01 - 4:06
    three steps. Load, increment, store, right.
    That's true, even for a
  • 4:06 - 4:10
    really simple piece of code like, you know,
    a
  • 4:10 - 4:13
    plus equals to, right. So this is what we
  • 4:13 - 4:16
    really want to happen. You have a count, you
  • 4:16 - 4:18
    loaded it, you increment it, you stored it.
    Then
  • 4:18 - 4:21
    the next thread comes along. It loads it,
    increments
  • 4:21 - 4:23
    it, stores it. You have the next result which
  • 4:23 - 4:26
    is what you expect, right. But we live in
  • 4:26 - 4:28
    a world where threads don't want to be our
  • 4:28 - 4:31
    friend. They do this. One guy comes along,
    reads
  • 4:31 - 4:34
    it, increments it. The other guy also reads
    the
  • 4:34 - 4:37
    older value, increments it. And both of them
    go
  • 4:37 - 4:40
    and save the same value, right. So this is
  • 4:40 - 4:42
    a classic case of lost update. I'm sure most
  • 4:42 - 4:44
    of you have seen it in the database world.
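The slide's code isn't reproduced in the transcript, but the description (a shared n, a thousand threads, a hundred increments each) corresponds to roughly this sketch; the variable names are assumptions:

```ruby
# Sketch of the talk's counter example (exact slide code not shown in
# the transcript; names are guesses).
n = 0
threads = 1000.times.map do
  Thread.new do
    # `n += 1` is not one instruction: it loads n, adds 1, stores n.
    # On MRI you can see the separate steps with:
    #   puts RubyVM::InstructionSequence.compile("n = 0; n += 1").disasm
    100.times { n += 1 }
  end
end
threads.each(&:join)
puts n  # the talk observed 100,000 on MRI, smaller numbers on JRuby/Rubinius
```

Note that lost updates can only ever lower the count, never raise it: every store writes some previously loaded value plus one.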
  • 4:44 - 4:47
    But this pretty much happens a lot in the
  • 4:47 - 4:49
    multi-threading world, right. But why did
    it not happen
  • 4:49 - 4:52
    with MRI? And why did you see the right
  • 4:52 - 4:53
    result? That, I'm sure a lot
    of you
  • 4:53 - 4:56
    know, but let's step, let's park that question
    and
  • 4:56 - 5:00
    just move a little ahead. So, as you observed
  • 5:00 - 5:04
    earlier, a lot of reordering happening in
    instructions, right.
  • 5:04 - 5:07
    Like, the threads were context-switching,
    and they were reordering
  • 5:07 - 5:11
    statements. So where does this reordering
    happen? Reordering can
  • 5:11 - 5:15
    happen at multiple levels. So start from the
    top.
  • 5:15 - 5:18
    You have the compiler, which can do simple
    optimizations
  • 5:18 - 5:21
    like look closer?? [00:05:20]. Even that can
    change the
  • 5:21 - 5:24
    order of your statements in your code, right.
    Next,
  • 5:24 - 5:28
    when the code gets translated to, you know,
    machine-level
  • 5:28 - 5:31
    language, goes to the core, and your CPU cores
    are
  • 5:31 - 5:34
    at liberty, again, to reorder them for performance.
    And
  • 5:34 - 5:37
    next comes the memory system, right. The memory
    system
  • 5:37 - 5:40
    is like the combined global memory, which
    all the
  • 5:40 - 5:42
    CPUs can read, and also their individual
    caches. But
  • 5:42 - 5:46
    why do CPUs have caches? They want to, memory
  • 5:46 - 5:48
    is slow, so they want to load, reload all
  • 5:48 - 5:50
    the values, prefetch them, keep them in the cache,
  • 5:50 - 5:53
    again improve performance. So even the memory
    system can
  • 5:53 - 5:56
    conspire against you and reorder the loads
    and stores
  • 5:56 - 5:59
    after the memory registers. And that can cause
    reordering,
  • 5:59 - 6:03
    right. So this is really, really crazy. Like,
    I'm
  • 6:03 - 6:08
    a very stupid programmer, who works at the
    programming
  • 6:08 - 6:11
    language level. I don't really understand
    the structure of
  • 6:11 - 6:13
    the hardware and things like that. So how
    do
  • 6:13 - 6:16
    I keep myself abstracted from all this, you
    know,
  • 6:16 - 6:22
    really crazy stuff? So that's essentially
    a memory model.
  • 6:22 - 6:24
    So what, what is a memory model? A memory
  • 6:24 - 6:27
    model describes the interactions of threads
    through memory and
  • 6:27 - 6:29
    their shared use of data. So this is straight
  • 6:29 - 6:31
    out of Wikipedia, right. So if you just read
  • 6:31 - 6:35
    it first, either you're gonna think it's really
    simple,
  • 6:35 - 6:38
    and probably even looks stupid, but otherwise
    you might
  • 6:38 - 6:41
    not even understand. So I was in the second category.
  • 6:41 - 6:44
    So what does this all mean? So when there
  • 6:44 - 6:49
    are so many complications with the reordering,
    the reads
  • 6:49 - 6:51
    and writes of memory and things like that,
    as
  • 6:51 - 6:55
    a programmer you need certain guarantees from
    the programming
  • 6:55 - 6:57
    language, and the virtual machine you're working
    on top
  • 6:57 - 7:01
    of, to say this is how multi-threaded shared,
    I
  • 7:01 - 7:04
    mean, multi-threaded access to shared memory
    is going to
  • 7:04 - 7:06
    work. These are the basic guarantees and these
    are
  • 7:06 - 7:09
    the simple rules of how the system works.
    So
  • 7:09 - 7:13
    you can reliably write code against that, right. So
    So
  • 7:13 - 7:15
    in, in effect, a memory model is just a
  • 7:15 - 7:21
    specification. Any Java programmers here,
    in the house? Great.
  • 7:21 - 7:26
    So how many of you know about JSR 133?
  • 7:26 - 7:31
    The memory model, double-checked locking - OK.
    Some
  • 7:31 - 7:37
    people. Single term issue? OK - some more
    hands.
  • 7:37 - 7:40
    So Java was the first programming language
    which came
  • 7:40 - 7:43
    up with a concept called memory model, right.
    Because,
  • 7:43 - 7:46
    the first thing is, write once, run
  • 7:46 - 7:48
    anywhere. It had to be predictable across
    platforms, across
  • 7:48 - 7:52
    reimplementations, and things like that. So
    the, there had
  • 7:52 - 7:55
    to be a JSR which specified what is the
  • 7:55 - 7:57
    memory model that you can code against so that
  • 7:57 - 8:02
    your multi-threaded code works predictably,
    and deterministically across platforms
  • 8:02 - 8:09
    and across virtual machines. Right? So essentially
    that's where
  • 8:09 - 8:11
    my, you know, whole thing started. I had gone
  • 8:11 - 8:15
    through the Java memory model, and was pretty
    much
  • 8:15 - 8:17
    really happy that someone had taken the pain
    to
  • 8:17 - 8:19
    write it down in clear terms so that you
  • 8:19 - 8:26
    don't have to worry about multi-threading.
    Hold on, sorry.
  • 8:28 - 8:35
    Sorry about that. Cool. So. Memory model gives
    you
  • 8:35 - 8:41
    rules at three broad levels. Atomicity, visibility
    and ordering.
  • 8:41 - 8:43
    So atomicity is as simple as, you know, variable
  • 8:43 - 8:47
    assignment. Is a variable assignment an indivisible
    unit of
  • 8:47 - 8:50
    work, or not? The rules around that, and it
  • 8:50 - 8:52
    also talks about rules around, can you assign
    hashes,
  • 8:52 - 8:55
    and arrays indivisibly and things like that.
    These rules
  • 8:55 - 8:58
    can change based on every language version, and things
    and things
  • 8:58 - 9:02
    like that. Next is visibility. So in that
    example
  • 9:02 - 9:05
    which you talked about, I mean, we saw two
  • 9:05 - 9:07
    threads trying to read the same value. Essentially
    they
  • 9:07 - 9:09
    are spying on each other. And it was not
  • 9:09 - 9:12
    clear at what point the data had to become
  • 9:12 - 9:15
    visible to each of those threads. So essentially
    visibility
  • 9:15 - 9:18
    is about that. And that is ensured through
    memory
  • 9:18 - 9:22
    barriers and ordering, which is the next thing.
    So
  • 9:22 - 9:25
    ordering is about how the loads and stores
    are
  • 9:25 - 9:29
    sequenced, or, you know, let's say you want
    to
  • 9:29 - 9:31
    write a piece of code, critical section as
    you
  • 9:31 - 9:33
    call it. And you don't want the compiler to
  • 9:33 - 9:36
    do any crazy things to improve performance.
    So you
  • 9:36 - 9:38
    say, I make it synchronized, and it has to
  • 9:38 - 9:40
    behave in a, behave in a nice serial
  • 9:40 - 9:45
    manner. So that serial manner is ensured by ordering.
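In Ruby terms, the "synchronized, serial" behaviour described here is what Mutex#synchronize gives you; a minimal sketch, with invented names:

```ruby
# The critical section: the mutex serializes the updates (ordering)
# and publishes each result to the other threads (visibility).
lock = Mutex.new
balance = 0

threads = 10.times.map do
  Thread.new do
    1_000.times { lock.synchronize { balance += 1 } }
  end
end
threads.each(&:join)
puts balance  # => 10000 on any implementation, GIL or no GIL
```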
  • 9:45 - 9:48
    Ordering is a really complex area. It talks
    about
  • 9:48 - 9:51
    causality, logical clocks and all that. I
    won't go
  • 9:51 - 9:54
    into those details. But I've been worrying
    you with
  • 9:54 - 9:58
    all this, you know, computer science basics
    and all
  • 9:58 - 10:00
    this. Why the hell am I talking about it
  • 10:00 - 10:02
    in a Ruby conference? Ruby is single-threaded,
    anyway. Why
  • 10:02 - 10:06
    the hell should I care about it, right? OK.
  • 10:06 - 10:09
    Do you really think languages like Ruby are
    thread
  • 10:09 - 10:15
    safe? Show of hands, anyone? So thread safety,
    I'm
  • 10:15 - 10:19
    talking only about Ruby - maybe Python. GIL
    based
  • 10:19 - 10:26
    languages. Are they thread safe? No? OK. In
    fact
  • 10:26 - 10:31
    they're not. Being single-threaded does not
    mean it's thread-safe,
  • 10:31 - 10:34
    right. Threads can switch context, and based
    on how
  • 10:34 - 10:36
    the language has been implemented and how
    often the
  • 10:36 - 10:39
    threads can switch context, and at what point
    they
  • 10:39 - 10:44
    can switch, things can go wrong, right. And
    another
  • 10:44 - 10:46
    pretty popular myth - I don't think many people
  • 10:46 - 10:49
    believe it here, in this audience at least.
    I
  • 10:49 - 10:52
    don't have concurrency problems because I'm
    running on single
  • 10:52 - 10:56
    core. Not true. Again, threads can switch
    context and
  • 10:56 - 10:59
    run on the same core and still have dirty
  • 10:59 - 11:03
    reads and things like that. So concurrency
    is all
  • 11:03 - 11:06
    about interleavings, right. Again, goes back
    to reordering. I
  • 11:06 - 11:08
    think I've been talking about this too often.
    And
  • 11:08 - 11:12
    let's not, again, worry with that. It's about
    interleavings.
  • 11:12 - 11:16
    We'll leave it at that. So let's, before we
  • 11:16 - 11:19
    understand more about, you know, the memory
    model and
  • 11:19 - 11:21
    what it has to do with Ruby, let's just
  • 11:21 - 11:25
    understand a little bit about threading in
    Ruby. So
  • 11:25 - 11:28
    all of you know, green threads, as of 1.8,
  • 11:28 - 11:31
    there was only one OS thread, which was being
    being
  • 11:31 - 11:35
    multiplexed with multiple Ruby threads, which
    were being scheduled
  • 11:35 - 11:39
    on it through global interpreter lock. 1.9
    comes along,
  • 11:39 - 11:41
    there is a one to one mapping between the
  • 11:41 - 11:44
    Ruby thread and OS thread, but still the Ruby
  • 11:44 - 11:47
    thread cannot use the OS thread unless it
    has
  • 11:47 - 11:51
    the global VM lock, the GVL as it's called now,
  • 11:51 - 11:56
    acquired. So does having a Global Interpreter
    Lock
  • 11:56 - 12:01
    make you thread safe? It depends. It does
    make
  • 12:01 - 12:03
    you thread safe in a way, but let's see.
  • 12:03 - 12:05
    So how does GIL work? This is a very
  • 12:05 - 12:09
    simplistic representation of how GIL works.
    So you have
  • 12:09 - 12:12
    two threads here. One is already holding the
    GIL.
  • 12:12 - 12:16
    So it's, it's working with the OS thread.
    And
  • 12:16 - 12:19
    now when there is another thread waiting on
    it,
  • 12:19 - 12:21
    waiting on the GIL to do its work, it
  • 12:21 - 12:23
    sends a, it wakes up the timer thread. The timer
  • 12:23 - 12:27
    thread is, again, another Ruby thread. The
    timer thread
  • 12:27 - 12:30
    now goes and interrupts the thread holding
    the GIL,
  • 12:30 - 12:32
    and if the GIL, if the thread holding the
  • 12:32 - 12:35
    GIL is done with whatever it's doing - I'll
  • 12:35 - 12:37
    get to it in a bit - it just
  • 12:37 - 12:40
    releases the lock, and now thread two can
    take
  • 12:40 - 12:43
    over and do its thing. Well this is the
  • 12:43 - 12:48
    basic working that at least I understood about
    GIL.
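The hand-off just described can be pictured with a toy model. This is not MRI's actual C implementation, just an illustration: one global lock that a thread must hold to run a slice of work, with a forced yield standing in for the timer-thread interrupt:

```ruby
# Toy model of the GIL hand-off (illustrative only, not MRI internals).
gil = Mutex.new
log = Queue.new  # Queue is one of Ruby's built-in thread-safe structures

workers = %w[t1 t2].map do |name|
  Thread.new do
    5.times do
      gil.synchronize { log << name }  # run one slice while holding the "GIL"
      Thread.pass  # stand-in for the timer thread forcing a hand-off
    end
  end
end
workers.each(&:join)
puts log.size  # 10 slices ran in total, but never two at once
```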
  • 12:48 - 12:50
    But there are details to this, right. It's
    not
  • 12:50 - 12:57
    as simple as what we saw. So, when you
  • 12:58 - 13:01
    initialize a thread, or create a thread in
    Ruby,
  • 13:01 - 13:03
    you pass it a block of code. So how
  • 13:03 - 13:06
    does that work? You take a block of code,
  • 13:06 - 13:08
    you put it inside the thread. What the thread
  • 13:08 - 13:10
    does is usually it acquires the GVL on the
  • 13:10 - 13:14
    block. It executes the block
    of code. It
  • 13:14 - 13:17
    releases the, returns and releases the lock,
    right. So
  • 13:17 - 13:19
    essentially this is how it works. So during
    that
  • 13:19 - 13:22
    period of execution of the block, no other
    thread
  • 13:22 - 13:24
    is allowed to work. So that makes you almost
  • 13:24 - 13:28
    thread safe, right? But not really. If that's
    how
  • 13:28 - 13:31
    it's going to work, what if that thread is
  • 13:31 - 13:34
    going to hog the GIL, and not allow any
  • 13:34 - 13:36
    other thread to work? So there has to be
  • 13:36 - 13:38
    some kind of lock fairness, right. So that's
    where
  • 13:38 - 13:41
    the timer thread comes in and interrupts it.
    OK.
  • 13:41 - 13:43
    Does that mean the thread holding the GIL
    immediately
  • 13:43 - 13:45
    gives it up, and says here you go, you
  • 13:45 - 13:49
    can start and work with it? Not really. Again
  • 13:49 - 13:51
    the thread holding the GIL will only release
    the
  • 13:51 - 13:54
    GIL if it is at a context-switch
  • 13:54 - 13:57
    boundary. What that is, is fairly complicated.
    I don't
  • 13:57 - 14:00
    want to go into the details. I think people
  • 14:00 - 14:03
    who here know a lot better C than me,
  • 14:03 - 14:05
    and are deep C divers really, they can probably
  • 14:05 - 14:09
    tell you, you know, how, at what point the GIL
  • 14:09 - 14:11
    can get released. If a C thread, a C
  • 14:11 - 14:13
    code makes a call to Ruby code, can it
  • 14:13 - 14:15
    or can it not release the GIL? All those
  • 14:15 - 14:18
    things are there, right. So all these complexities
    are
  • 14:18 - 14:21
    really, really hard to deal with. I came across
  • 14:21 - 14:25
    this blog by Jesse Storimer. It's excellent
    and I
  • 14:25 - 14:27
    strongly encourage you to go through the two-part
    blog
  • 14:27 - 14:31
    about, you know, nobody understands GIL. It's
    really, really
  • 14:31 - 14:34
    important, if you're trying to do any sort
    of
  • 14:34 - 14:40
    multi-threaded programming in Ruby. So do
    you still think
  • 14:40 - 14:43
    Ruby is thread safe because it's got GIL?
    I'm
  • 14:43 - 14:49
    talking about MRI, essentially. So the thing
    is, we
  • 14:49 - 14:52
    can't depend on GIL, right. GIL is not documented
  • 14:52 - 14:54
    anywhere that this is exactly how it works.
    This
  • 14:54 - 14:56
    is when the timer thread wakes up. These are
  • 14:56 - 14:59
    the time slices allotted to the thread acquiring
    the
  • 14:59 - 15:03
    GVL. There is no documentation around at what
    point
  • 15:03 - 15:05
    the GIL can be released, can it not be
  • 15:05 - 15:07
    released, and things like that. There's no,
    it's not
  • 15:07 - 15:10
    predictable, and if you depend on it, what
    could
  • 15:10 - 15:13
    also happen is even within MRI, when you're
    moving
  • 15:13 - 15:16
    from version to version, if something changes
    in GIL,
  • 15:16 - 15:22
    your code will behave nondeterministically.
    And what about
  • 15:22 - 15:25
    Ruby implementations that don't even have
    a GIL?
  • 15:25 - 15:27
    So obviously that's the big problem, right.
    If you
  • 15:27 - 15:30
    write a gem or something which has to be
  • 15:30 - 15:32
    multi-threaded, and if you're depending on
    the GIL to
  • 15:32 - 15:35
    do its thing to keep you safe, then obviously
  • 15:35 - 15:39
    it cannot work on Rubinius and JRuby. Let
    that
  • 15:39 - 15:41
    alone, even, even if you give that up, even
  • 15:41 - 15:44
    with MRI, it's not entirely correct to say
    that
  • 15:44 - 15:47
    you're thread safe just because there is a GIL
    that
  • 15:47 - 15:53
    will ensure that only one thread is running.
    So
  • 15:53 - 15:55
    what did I find out? Ruby really does not
  • 15:55 - 15:57
    have a documented memory model. It's pretty
    much similar
  • 15:57 - 16:00
    to Python. It doesn't have a clearly documented
    memory
  • 16:00 - 16:05
    model. What is the implication of that? So
    as
  • 16:05 - 16:08
    I mentioned previously, a memory model is
    like a
  • 16:08 - 16:11
    specification. This is exactly how the system
    has to
  • 16:11 - 16:15
    provide a certain minimum guarantee to the
    users of
  • 16:15 - 16:18
    the language, right, regarding multi threaded
    access to shared
  • 16:18 - 16:22
    memory. Now, basically if I don't have a written
  • 16:22 - 16:24
    down memory model, and I am going to write
  • 16:24 - 16:27
    a Ruby implementation tomorrow, I have the liberty
    liberty
  • 16:27 - 16:30
    to choose whatever memory model I want. So
    the
  • 16:30 - 16:33
    code, if you're writing against MRI, may not
    essentially
  • 16:33 - 16:37
    work right on my, you know, my implementation
    of
  • 16:37 - 16:41
    Ruby. That's the big implication, right. So
    Ruby right
  • 16:41 - 16:46
    now depends on underlying virtual machines.
    Even after 1.9,
  • 16:46 - 16:48
    you have bytecode compilation, so even MRI
    is
  • 16:48 - 16:51
    almost like a VM. So that has no specification
  • 16:51 - 16:53
    for a memory model, but it does have something,
  • 16:53 - 16:55
    right, internally. If you have to go through
    the
  • 16:55 - 16:58
    C code and understand. It's not guaranteed
    to remain
  • 16:58 - 17:01
    the same from version to version, as I understand,
  • 17:01 - 17:05
    right. And obviously JRuby and Rubinius, they
    depend on
  • 17:05 - 17:08
    JVM and LLVM respectively. And they all have
    a
  • 17:08 - 17:12
    clearly documented memory model. You could
    have a read
  • 17:12 - 17:15
    at it. And the only thing is, if Ruby
  • 17:15 - 17:18
    had an implementation - sorry, a specification
    for a
  • 17:18 - 17:22
    memory model, it could be, you know, implemented
    using
  • 17:22 - 17:28
    the constructs available on JVM and LLVM.
    But this
  • 17:28 - 17:29
    is what we have. We don't have much to
  • 17:29 - 17:33
    do. What do we do under the circumstances?
    We
  • 17:33 - 17:37
    have to engineer our code for thread safety.
    We
  • 17:37 - 17:40
    can't bask under the safety that, there is
    a
  • 17:40 - 17:42
    GIL and so it's going to help me keep
  • 17:42 - 17:45
    my code thread safe. So even I can write
  • 17:45 - 17:48
    multiple, you know, multi threaded code without
    actually worrying
  • 17:48 - 17:51
    about serious synchronization issues and things
    like that. It's
  • 17:51 - 17:54
    totally not the right thing to do. I think
  • 17:54 - 17:57
    any which way, Ruby is a language I love,
  • 17:57 - 18:00
    and I'm sure all of you love, so. And
  • 18:00 - 18:03
    it's progressing by leaps and bounds, and
    eventually we're
  • 18:03 - 18:05
    going to write more and more complex systems
    with
  • 18:05 - 18:09
    Ruby. And who knows, we might have true parallelism
  • 18:09 - 18:14
    very soon, right. So why, still, stay in the
  • 18:14 - 18:17
    same mental block that we don't want to write,
  • 18:17 - 18:20
    you know, thread safe code that's anyway single
    threaded.
  • 18:20 - 18:22
    We might as well get into the mindset of
  • 18:22 - 18:26
    writing proper thread safe code, and try and
    probably
  • 18:26 - 18:30
    come up with a memory model, right. But I
  • 18:30 - 18:32
    think for now we just start engineering code
    for
  • 18:32 - 18:37
    thread safety. Simple Mutex, I'm sure all
    of you
  • 18:37 - 18:40
    know, but it's really, really important for
    even a
  • 18:40 - 18:44
    stupid operation like a plus equals two. So
    simple
  • 18:44 - 18:47
    things which are noticed in Ruby code bases
    and
  • 18:47 - 18:51
    Rails code bases as well, like generally,
    is, there
  • 18:51 - 18:53
    is like a synchronized, you know, a section
    of
  • 18:53 - 18:56
    the code has lots of synchronization and everything.
    It's
  • 18:56 - 18:59
    really safe. But we leave an innocent accessor
    lying
  • 18:59 - 19:01
    around, and that causes a lot of, you know,
  • 19:01 - 19:04
    pain, like debugging those issues. And general
    issues like,
  • 19:04 - 19:08
    you know, state mutations, inside methods
    is really a
  • 19:08 - 19:10
    bad idea. So if you're looking for issues
    around
  • 19:10 - 19:12
    multi threading, this might be a good place
    to
  • 19:12 - 19:14
    start. So I just listed a few of them
  • 19:14 - 19:16
    here. I didn't want to make a really dense
  • 19:16 - 19:19
    talk with all the details. You can always
    catch
  • 19:19 - 19:21
    me offline and I can tell you some of
  • 19:21 - 19:24
    my experiences and probably even listen to
    you and
  • 19:24 - 19:26
    learn from you about some of the issues that
  • 19:26 - 19:29
    we can solve by actually writing proper thread
    safe
  • 19:29 - 19:33
    code in Ruby. I came across a few gems
  • 19:33 - 19:35
    which were really, really nice. Both of them
    happen
  • 19:35 - 19:39
    to be written by headius. The first one is
  • 19:39 - 19:41
    atomic. Atomic is almost trying to give you
    the
  • 19:41 - 19:45
    similar constructs like the java.util.concurrent
    package. It
  • 19:45 - 19:51
    tries to, it's kind of compatible across MRI,
    JRuby,
  • 19:51 - 19:54
    and Rubinius, which is also a really nice
    thing.
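Conceptually, atomic's update is a compare-and-swap retry loop. A self-contained sketch of the idea (the ToyAtomic class is invented here, and it simulates CAS with a Mutex where the real gem uses native atomics):

```ruby
# Conceptual sketch of an atomic reference with update-and-retry.
class ToyAtomic
  def initialize(value)
    @value = value
    @lock  = Mutex.new
  end

  def value
    @lock.synchronize { @value }
  end

  # CAS: install new_value only if the value is still `expected`.
  def compare_and_swap(expected, new_value)
    @lock.synchronize do
      return false unless @value == expected
      @value = new_value
      true
    end
  end

  # Retry until our CAS wins, like atomic's update { |v| v + 1 }.
  def update
    loop do
      old = value
      new_value = yield(old)
      return new_value if compare_and_swap(old, new_value)
    end
  end
end

counter = ToyAtomic.new(0)
threads = 10.times.map do
  Thread.new { 1_000.times { counter.update { |v| v + 1 } } }
end
threads.each(&:join)
puts counter.value  # => 10000
```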
  • 19:54 - 19:57
    So you have atomic integers and atomic floats,
    which
  • 19:57 - 20:00
    do increments actually in an atomic way, which
    is
  • 20:00 - 20:02
    excellent. And then there is thread_safe library,
    which also
  • 20:02 - 20:05
    has a few thread safe data structures. I'm
    trying
  • 20:05 - 20:07
    to play around with these libraries right
    now, but
  • 20:07 - 20:09
    they may be a good, you know, starting point
  • 20:09 - 20:11
    if you are trying to do higher level constructs
  • 20:11 - 20:16
    for concurrency. And that's pretty much it.
    I'm open
  • 20:16 - 20:22
    to take questions. Thank you. And before anything
    I
  • 20:22 - 20:23
    really would like to thank you all, again
    for
  • 20:23 - 20:27
    being here for the talk, and thank the GCRC
  • 20:27 - 20:31
    organizers, you know, they've done a great
    job with
  • 20:31 - 20:38
    this conference. A big shout out to them.
  • 20:46 - 20:47
    V.O.: Any questions?
  • 20:47 - 20:47
    H.K.: Yeah?
  • 20:47 - 20:47
    QUESTION: Hey.
  • 20:47 - 20:47
    H.K.: Hi.
  • 20:47 - 20:48
    QUESTION: If, for example, if a Ruby code
    is running
  • 20:48 - 20:52
    in the JVM, in JRuby, how does, because none
  • 20:52 - 20:54
    of the Ruby code is written in a thread
  • 20:54 - 20:57
    safe way. How do, how does it internally manage
  • 20:57 - 20:59
    - does it actually, yeah, yesterday Yogi talked
    about
  • 20:59 - 21:01
    the point that ActiveRecord is not actually
    thread safe.
  • 21:01 - 21:04
    Can you explain it in detail like in a
  • 21:04 - 21:04
    theoretical way?
  • 21:04 - 21:07
    H.K.: OK. What is thread safety in
  • 21:07 - 21:09
    general, right? Thread safety is about how
    the data
  • 21:09 - 21:13
    is consistently maintained after multi-threaded
    access to that shared
  • 21:13 - 21:17
    data, right. So Ruby essentially has a GIL
    because
  • 21:17 - 21:20
    internal implementations are not thread safe,
    right. That's why
  • 21:20 - 21:22
    you want to have a GIL to protect you
  • 21:22 - 21:26
    from those problems. But as far as JRuby is
  • 21:26 - 21:29
    concerned, or Rubinius is concerned, the implementation
    itself is
  • 21:29 - 21:32
    not written in C. JRuby is written in Java
  • 21:32 - 21:34
    again, I mean JRuby itself, and Rubinius is
    written
  • 21:34 - 21:38
    in Ruby. And some of these actual internal
    constructs
  • 21:38 - 21:41
    are thread safe when compared to MRI. I haven't
  • 21:41 - 21:43
    actually taken a look in detail into the code
  • 21:43 - 21:48
    of these code bases, but if they are implemented
  • 21:48 - 21:50
    properly, you can be thread safe - internally,
    at
  • 21:50 - 21:53
    least - so, which means, the base code of
  • 21:53 - 21:56
    JRuby itself might be thread safe. It's only
    not
  • 21:56 - 21:58
    thread safe because the gems on top of it,
  • 21:58 - 22:01
    which are trying to run. They may have, like,
  • 22:01 - 22:05
    thread safety issues, right. Does that answer
    your question,
  • 22:05 - 22:06
    like, or- ?
  • 22:06 - 22:08
    QUESTION: About thread safety?? [00:22:09].
  • 22:08 - 22:12
    H.K.: Sure, sure. So those gems will not work.
    That's
  • 22:12 - 22:14
    the point. Like what I want to convey here,
  • 22:14 - 22:17
    is whatever gems we are offering, and whatever
    code
  • 22:17 - 22:19
    we are writing, we might get it - it's
  • 22:19 - 22:20
    a good idea to get into the habit of
  • 22:20 - 22:23
    writing thread safe code, so that we can actually
  • 22:23 - 22:25
    encourage a truly parallel Ruby, right. We
    don't, we
  • 22:25 - 22:28
    don't have to stay in the same paradigm of
  • 22:28 - 22:32
    OK we have to be single threaded.
  • 22:32 - 22:37
    QUESTION: So Mutex based thread management
    is one way.
  • 22:37 - 22:40
    There's also like actors and futures and things
    like that.
  • 22:40 - 22:42
    And there's a gem called Celluloid-
  • 22:42 - 22:43
    H.K.: Yup.
  • 22:43 - 22:45
    QUESTION: That, combined with something called
    Hamster,
  • 22:45 - 22:46
    which makes everything immutable-
  • 22:46 - 22:47
    H.K.: Yup.
  • 22:47 - 22:48
    QUESTION: Is another way to do it.
  • 22:48 - 22:48
    H.K.: Yup.
  • 22:48 - 22:49
    QUESTION: Have you done it or like,
  • 22:49 - 22:50
    what's your experience with that?
  • 22:50 - 22:53
    H.K.: Yeah, I have tried out actors, with
    revactor,
  • 22:53 - 22:54
    and lockless concurrency is
  • 22:54 - 22:57
    something I definitely agree is a good idea.
    But
  • 22:57 - 23:01
    I'm specifically talking about, you know,
    lock-based concurrency, like,
  • 23:01 - 23:05
    Mutex-based concurrency. This area is also
    important because it's
  • 23:05 - 23:08
    not like shared mutable state is bad. It is,
  • 23:08 - 23:11
    it is actually applicable in certain scenarios.
    When we
  • 23:11 - 23:13
    are working in this particular paradigm, we
    still need
  • 23:13 - 23:19
    the safety of a memory model. Any other questions?
  • 23:19 - 23:26
    QUESTION: Thanks for the talk Hari. It was
    really
  • 23:28 - 23:29
    good.
  • 23:29 - 23:30
    H.K.: Thanks.
  • 23:30 - 23:31
    QUESTION: Is there a way that
  • 23:31 - 23:35
    you would recommend to test if you have done
  • 23:35 - 23:38
    threading properly or not? I mean, I know,
    bugs
  • 23:38 - 23:38
    that come out-
  • 23:38 - 23:39
    H.K.: Right.
  • 23:39 - 23:39
    QUESTION: Like I have
  • 23:39 - 23:42
    written bugs that come out of badly written,
    you
  • 23:42 - 23:44
    know, not thread safe code, as.
  • 23:44 - 23:45
    H.K.: So-
  • 23:45 - 23:47
    QUESTION: Like, ?? [00:23:46] so, you catch
    them.
  • 23:47 - 23:52
    H.K.: At least, my opinion, and a lot of people
    have
  • 23:52 - 23:54
    done research in this area, their opinion
    also is
  • 23:54 - 23:58
    that it's not possible to write tests against
    multi
  • 23:58 - 24:00
    threaded code where there is shared data.
    Because it's
  • 24:00 - 24:04
    nondeterministic and nonrepeatable. The kind
    of results you get,
  • 24:04 - 24:07
    you can only test it against a heuristic.
    For
  • 24:07 - 24:09
    example, if you have a deterministic use case
    at
  • 24:09 - 24:12
    the top level, you can probably test it against
  • 24:12 - 24:14
    that. But exact test cases can never be written
  • 24:14 - 24:16
    for this.
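That heuristic approach can be sketched like this: run the racy code repeatedly and assert only the invariants that must hold on every run. The method name and bounds below are illustrative:

```ruby
# Heuristic testing of racy code: assert invariants, not exact results.
def racy_count(threads: 50, increments: 100)
  n = 0
  threads.times.map { Thread.new { increments.times { n += 1 } } }
         .each(&:join)
  n
end

5.times do
  result = racy_count
  # Invariant: lost updates can only lower the count, never raise it,
  # and at least one increment always lands.
  raise "broken invariant: #{result}" unless result.between?(1, 5_000)
end
puts "invariants held"
```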
  • 24:16 - 24:19
    V.O.: Any more questions?
  • 24:19 - 24:26
    H.K.: Cool. All right. Thank you so much.
Duration:
24:56
