
Garden City Ruby 2014 - Ruby memory Model by Hari Krishnan

  • 0:25 - 0:26
    HARI KRISHNAN: So, thank you very much
  • 0:26 - 0:28
    for being here on a Saturday evening, this
    late.
  • 0:28 - 0:30
    My talk got pushed to the last, but I
  • 0:30 - 0:35
    appreciate you being here, first. My name's
    Hari. I
  • 0:35 - 0:37
    work at MavenHive. So this is a talk about
  • 0:37 - 0:44
    Ruby memory model. So before I start, how
    many
  • 0:44 - 0:47
    of you have heard about memory model and know
  • 0:47 - 0:52
    what it is? Show of hands, please. OK. Let's
  • 0:52 - 0:55
    see where this talk goes. So why did
  • 0:55 - 0:59
    I come up with this talk topic. So I
  • 0:59 - 1:02
    started my career with Java, and I spent a
  • 1:02 - 1:05
    lot many years with Java, and Java has a
  • 1:05 - 1:09
    very clearly documented memory model. And
    it kind of
  • 1:09 - 1:10
    gets to you because with all that, you don't
  • 1:10 - 1:14
    feel safe enough doing multi-threaded programming
    at all. So
  • 1:14 - 1:18
    with Ruby, we've always been talking about,
    you know,
  • 1:18 - 1:21
    doing multi-process for multi-process parallelism,
  • 1:21 - 1:24
    rather than multi-threaded parallelism,
  • 1:24 - 1:29
    even though the language actually supports,
    you know, multi-threading
  • 1:29 - 1:31
    semantics. Of course we know it's called single-threaded
    and
  • 1:31 - 1:34
    all that, but I just got curious, like, what
  • 1:34 - 1:36
    is the real memory model behind Ruby, and
    I
  • 1:36 - 1:39
    just wanted to figure that out. So this talk
  • 1:39 - 1:42
    is all about my learnings as I went through,
  • 1:42 - 1:46
    like, various literatures, and figured out,
    and I tried
  • 1:46 - 1:48
    to combine, like, get a gist of the whole
  • 1:48 - 1:51
    thing. And cram it into some twenty minutes
    so
  • 1:51 - 1:52
    that I could, like, probably give you a very
  • 1:52 - 1:56
    useful session, like, from which you can further
    do
  • 1:56 - 2:01
    more digging on this, right. So when I talked
  • 2:01 - 2:03
    to my friends about memory model, the first
    thing
  • 2:03 - 2:06
    that comes up to their mind is probably this
  • 2:06 - 2:10
    - heap, heap, non-heap, stack, whatever. I'm
    not gonna
  • 2:10 - 2:14
    talk about that. I'm not gonna talk about
    this
  • 2:14 - 2:17
    either. It's not about, you know, optimizing
    your memory,
  • 2:17 - 2:21
    or searching for memory leaks, or garbage collection.
    This talk
  • 2:21 - 2:23
    is not about that either. So what the hell
  • 2:23 - 2:27
    am I gonna talk about? First, a quick exercise.
  • 2:27 - 2:31
    So let's start with this and see where it
  • 2:31 - 2:36
    goes. Simple code. Not much to process late
    in
  • 2:36 - 2:39
    the day. There's a shared variable called
    'n', and
  • 2:39 - 2:42
    there are thousand threads over that, and
    each of
  • 2:42 - 2:45
    those threads want to increment that shared
    variable hundred
  • 2:45 - 2:49
    times, right. And what is the expected output?
    I'm
  • 2:49 - 2:51
    not gonna question you, I'm just gonna give
    it
  • 2:51 - 2:55
    away. It's 100,000. It's fairly straightforward
    code. I'm sure
  • 2:55 - 2:57
    all of you have done this, and it's no
  • 2:57 - 3:02
    big deal. So what's the real output? MRI is
  • 3:02 - 3:05
    very faithful, it gives you what you expected.
    100,000,
  • 3:05 - 3:09
    right. So what happens next? I'm running it
    on
  • 3:09 - 3:13
    Rubinius. This is what you see. And it's always
  • 3:13 - 3:16
    going to be a different number every time
    you
  • 3:16 - 3:19
    run it. And that's JRuby. It gives you a
  • 3:19 - 3:23
    lower number. Some of you may be guessing
    already,
  • 3:23 - 3:24
    and you probably know it, why it gives you
  • 3:24 - 3:28
    a lower number. So why all this basic stupid
  • 3:28 - 3:31
    code and some stupid counter over here, right?
    So
  • 3:31 - 3:34
    I just wanted to get a really basic example
  • 3:34 - 3:36
    to explain the concept of increment is not
    a
  • 3:36 - 3:40
    single instruction, right. The reason why
    I'm talking about
  • 3:40 - 3:43
    this is, I love Ruby because the syntax is
  • 3:43 - 3:47
    so terse, and it's so simple, it's so readable,
  • 3:47 - 3:49
    right. But it does not mean every single instruction
  • 3:49 - 3:52
    on the screen is going to be executed straight
  • 3:52 - 3:55
    away, right. So at least, to my junior self,
  • 3:55 - 3:57
    this is the first advice I would give, when
  • 3:57 - 4:01
    I started, you know, multi-threaded programming.
    So at least
  • 4:01 - 4:06
    three steps. Load, increment, store, right.
    That's true, even for a
  • 4:06 - 4:10
    really simple piece of code like, you know,
    a
  • 4:10 - 4:13
    plus equals to, right. So this is what we
  • 4:13 - 4:16
    really want to happen. You have a count, you
  • 4:16 - 4:18
    loaded it, you increment it, you stored it.
    Then
  • 4:18 - 4:21
    the next thread comes along. It loads it,
    increments
  • 4:21 - 4:23
    it, stores it. You have the next result which
  • 4:23 - 4:26
    is what you expect, right. But we live in
  • 4:26 - 4:28
    a world where threads don't want to be our
  • 4:28 - 4:31
    friend. They do this. One guy comes along,
    reads
  • 4:31 - 4:34
    it, increments it. The other guy also reads
    the
  • 4:34 - 4:37
    older value, increments it. And both of them
    go
  • 4:37 - 4:40
    and save the same value, right. So this is
  • 4:40 - 4:42
    a classic case of lost update. I'm sure most
  • 4:42 - 4:44
    of you have seen it in the database world.
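The slide's code isn't reproduced in the transcript, but the description (a shared n, a thousand threads, a hundred increments each) corresponds to roughly this sketch; the variable names are assumptions:

```ruby
# Sketch of the talk's counter example (exact slide code not shown in
# the transcript; names are guesses).
n = 0
threads = 1000.times.map do
  Thread.new do
    # `n += 1` is not one instruction: it loads n, adds 1, stores n.
    # On MRI you can see the separate steps with:
    #   puts RubyVM::InstructionSequence.compile("n = 0; n += 1").disasm
    100.times { n += 1 }
  end
end
threads.each(&:join)
puts n  # the talk observed 100,000 on MRI, smaller numbers on JRuby/Rubinius
```

Note that lost updates can only ever lower the count, never raise it: every store writes some previously loaded value plus one.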
  • 4:44 - 4:47
    But this pretty much happens a lot in the
  • 4:47 - 4:49
    multi-threading world, right. But why did
    it not happen
  • 4:49 - 4:52
    with MRI? And why did you see the right
  • 4:52 - 4:53
    result? That, I'm sure a lot
    of you
  • 4:53 - 4:56
    know, but let's step, let's park that question
    and
  • 4:56 - 5:00
    just move a little ahead. So, as you observed
  • 5:00 - 5:04
    earlier, a lot of reordering happening in
    instructions, right.
  • 5:04 - 5:07
    Like, the threads were context-switching,
    and they were reordering
  • 5:07 - 5:11
    statements. So where does this reordering
    happen? Reordering can
  • 5:11 - 5:15
    happen at multiple levels. So start from the
    top.
  • 5:15 - 5:18
    You have the compiler, which can do simple
    optimizations
  • 5:18 - 5:21
    like look closer?? [00:05:20]. Even that can
    change the
  • 5:21 - 5:24
    order of your statements in your code, right.
    Next,
  • 5:24 - 5:28
    when the code gets translated to, you know,
    machine-level
  • 5:28 - 5:31
    language, goes to the core, and your CPU cores
    are
  • 5:31 - 5:34
    at liberty, again, to reorder them for performance.
    And
  • 5:34 - 5:37
    next comes the memory system, right. The memory
    system
  • 5:37 - 5:40
    is like the combined global memory, which
    all the
  • 5:40 - 5:42
    CPUs can read, and also their individual
    caches. But
  • 5:42 - 5:46
    why do CPUs have caches? They want to, memory
  • 5:46 - 5:48
    is slow, so they want to load, reload all
  • 5:48 - 5:50
    the values, prefetch them, keep them in the cache,
  • 5:50 - 5:53
    again improve performance. So even the memory
    system can
  • 5:53 - 5:56
    conspire against you and reorder the loads
    and stores
  • 5:56 - 5:59
    after the memory registers. And that can cause
    reordering,
  • 5:59 - 6:03
    right. So this is really, really crazy. Like,
    I'm
  • 6:03 - 6:08
    a very stupid programmer, who works at the
    programming
  • 6:08 - 6:11
    language level. I don't really understand
    the structure of
  • 6:11 - 6:13
    the hardware and things like that. So how
    do
  • 6:13 - 6:16
    I keep myself abstracted from all this, you
    know,
  • 6:16 - 6:22
    really crazy stuff? So that's essentially
    a memory model.
  • 6:22 - 6:24
    So what, what is a memory model? A memory
  • 6:24 - 6:27
    model describes the interactions of threads
    through memory and
  • 6:27 - 6:29
    their shared use of data. So this is straight
  • 6:29 - 6:31
    out of Wikipedia, right. So if you just read
  • 6:31 - 6:35
    it first, either you're gonna think it's really
    simple,
  • 6:35 - 6:38
    and probably even looks stupid, but otherwise
    you might
  • 6:38 - 6:41
    not even understand. So I was in the second category.
  • 6:41 - 6:44
    So what does this all mean? So when there
  • 6:44 - 6:49
    are so many complications with the reordering,
    the reads
  • 6:49 - 6:51
    and writes of memory and things like that,
    as
  • 6:51 - 6:55
    a programmer you need certain guarantees from
    the programming
  • 6:55 - 6:57
    language, and the virtual machine you're working
    on top
  • 6:57 - 7:01
    of, to say this is how multi-threaded shared,
    I
  • 7:01 - 7:04
    mean, multi-threaded access to shared memory
    is going to
  • 7:04 - 7:06
    work. These are the basic guarantees and these
    are
  • 7:06 - 7:09
    the simple rules of how the system works.
    So
  • 7:09 - 7:13
    you can reliably write code against that, right. So
    So
  • 7:13 - 7:15
    in, in effect, a memory model is just a
  • 7:15 - 7:21
    specification. Any Java programmers here,
    in the house? Great.
  • 7:21 - 7:26
    So how many of you know about JSR 133?
  • 7:26 - 7:31
    The memory model, double-checked locking - OK.
    Some
  • 7:31 - 7:37
    people. Single term issue? OK - some more
    hands.
  • 7:37 - 7:40
    So Java was the first programming language
    which came
  • 7:40 - 7:43
    up with a concept called memory model, right.
    Because,
  • 7:43 - 7:46
    the first thing is, write once, run
  • 7:46 - 7:48
    anywhere. It had to be predictable across
    platforms, across
  • 7:48 - 7:52
    reimplementations, and things like that. So
    the, there had
  • 7:52 - 7:55
    to be a JSR which specified what is the
  • 7:55 - 7:57
    memory model that you can code against so that
  • 7:57 - 8:02
    your multi-threaded code works predictably,
    and deterministically across platforms
  • 8:02 - 8:09
    and across virtual machines. Right? So essentially
    that's where
  • 8:09 - 8:11
    my, you know, whole thing started. I had gone
  • 8:11 - 8:15
    through the Java memory model, and was pretty
    much
  • 8:15 - 8:17
    really happy that someone had taken the pain
    to
  • 8:17 - 8:19
    write it down in clear terms so that you
  • 8:19 - 8:26
    don't have to worry about multi-threading.
    Hold on, sorry.
  • 8:28 - 8:35
    Sorry about that. Cool. So. Memory model gives
    you
  • 8:35 - 8:41
    rules at three broad levels. Atomicity, visibility
    and ordering.
  • 8:41 - 8:43
    So atomicity is as simple as, you know, variable
  • 8:43 - 8:47
    assignment. Is a variable assignment an indivisible
    unit of
  • 8:47 - 8:50
    work, or not? The rules around that, and it
  • 8:50 - 8:52
    also talks about rules around, can you assign
    hashes,
  • 8:52 - 8:55
    and arrays indivisibly and things like that.
    These rules
  • 8:55 - 8:58
    can change based on every language version, and things
    and things
  • 8:58 - 9:02
    like that. Next is visibility. So in that
    example
  • 9:02 - 9:05
    which you talked about, I mean, we saw two
  • 9:05 - 9:07
    threads trying to read the same value. Essentially
    they
  • 9:07 - 9:09
    are spying on each other. And it was not
  • 9:09 - 9:12
    clear at what point the data had to become
  • 9:12 - 9:15
    visible to each of those threads. So essentially
    visibility
  • 9:15 - 9:18
    is about that. And that is ensured through
    memory
  • 9:18 - 9:22
    barriers and ordering, which is the next thing.
    So
  • 9:22 - 9:25
    ordering is about how the loads and stores
    are
  • 9:25 - 9:29
    sequenced, or, you know, let's say you want
    to
  • 9:29 - 9:31
    write a piece of code, critical section as
    you
  • 9:31 - 9:33
    call it. And you don't want the compiler to
  • 9:33 - 9:36
    do any crazy things to improve performance.
    So you
  • 9:36 - 9:38
    say, I make it synchronized, and it has to
  • 9:38 - 9:40
    behave in a, behave in a nice serial
  • 9:40 - 9:45
    manner. So that serial manner is ensured by ordering.
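In Ruby terms, the "synchronized, serial" behaviour described here is what Mutex#synchronize gives you; a minimal sketch, with invented names:

```ruby
# The critical section: the mutex serializes the updates (ordering)
# and publishes each result to the other threads (visibility).
lock = Mutex.new
balance = 0

threads = 10.times.map do
  Thread.new do
    1_000.times { lock.synchronize { balance += 1 } }
  end
end
threads.each(&:join)
puts balance  # => 10000 on any implementation, GIL or no GIL
```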
  • 9:45 - 9:48
    Ordering is a really complex area. It talks
    about
  • 9:48 - 9:51
    causality, logical clocks and all that. I
    won't go
  • 9:51 - 9:54
    into those details. But I've been worrying
    you with
  • 9:54 - 9:58
    all this, you know, computer science basics
    and all
  • 9:58 - 10:00
    this. Why the hell am I talking about it
  • 10:00 - 10:02
    in a Ruby conference? Ruby is single-threaded,
    anyway. Why
  • 10:02 - 10:06
    the hell should I care about it, right? OK.
  • 10:06 - 10:09
    Do you really think languages like Ruby are
    thread
  • 10:09 - 10:15
    safe? Show of hands, anyone? So thread safety,
    I'm
  • 10:15 - 10:19
    talking only about Ruby - maybe Python. GIL
    based
  • 10:19 - 10:26
    languages. Are they thread safe? No? OK. In
    fact
  • 10:26 - 10:31
    they're not. Being single-threaded does not
    mean it's thread-safe,
  • 10:31 - 10:34
    right. Threads can switch context, and based
    on how
  • 10:34 - 10:36
    the language has been implemented and how
    often the
  • 10:36 - 10:39
    threads can switch context, and at what point
    they
  • 10:39 - 10:44
    can switch, things can go wrong, right. And
    another
  • 10:44 - 10:46
    pretty popular myth - I don't think many people
  • 10:46 - 10:49
    believe it here, in this audience at least.
    I
  • 10:49 - 10:52
    don't have concurrency problems because I'm
    running on single
  • 10:52 - 10:56
    core. Not true. Again, threads can switch
    context and
  • 10:56 - 10:59
    run on the same core and still have dirty
  • 10:59 - 11:03
    reads and things like that. So concurrency
    is all
  • 11:03 - 11:06
    about interleavings, right. Again, goes back
    to reordering. I
  • 11:06 - 11:08
    think I've been talking about this too often.
    And
  • 11:08 - 11:12
    let's not, again, worry with that. It's about
    interleavings.
  • 11:12 - 11:16
    We'll leave it at that. So let's, before we
  • 11:16 - 11:19
    understand more about, you know, the memory
    model and
  • 11:19 - 11:21
    what it has to do with Ruby, let's just
  • 11:21 - 11:25
    understand a little bit about threading in
    Ruby. So
  • 11:25 - 11:28
    all of you know, green threads, as of 1.8,
  • 11:28 - 11:31
    there was only one OS thread, which was being
    being
  • 11:31 - 11:35
    multiplexed with multiple Ruby threads, which
    were being scheduled
  • 11:35 - 11:39
    on it through global interpreter lock. 1.9
    comes along,
  • 11:39 - 11:41
    there is a one to one mapping between the
  • 11:41 - 11:44
    Ruby thread and OS thread, but still the Ruby
  • 11:44 - 11:47
    thread cannot use the OS thread unless it
    has
  • 11:47 - 11:51
    the global VM lock, the GVL as it's called now,
  • 11:51 - 11:56
    acquired. So does having a Global Interpreter
    Lock
  • 11:56 - 12:01
    make you thread safe? It depends. It does
    make
  • 12:01 - 12:03
    you thread safe in a way, but let's see.
  • 12:03 - 12:05
    So how does GIL work? This is a very
  • 12:05 - 12:09
    simplistic representation of how GIL works.
    So you have
  • 12:09 - 12:12
    two threads here. One is already holding the
    GIL.
  • 12:12 - 12:16
    So it's, it's working with the OS thread.
    And
  • 12:16 - 12:19
    now when there is another thread waiting on
    it,
  • 12:19 - 12:21
    waiting on the GIL to do its work, it
  • 12:21 - 12:23
    sends a, it wakes up the timer thread. The timer
  • 12:23 - 12:27
    thread is, again, another Ruby thread. The
    timer thread
  • 12:27 - 12:30
    now goes and interrupts the thread holding
    the GIL,
  • 12:30 - 12:32
    and if the GIL, if the thread holding the
  • 12:32 - 12:35
    GIL is done with whatever it's doing - I'll
  • 12:35 - 12:37
    get to it in a bit - it just
  • 12:37 - 12:40
    releases the lock, and now thread two can
    take
  • 12:40 - 12:43
    over and do its thing. Well this is the
  • 12:43 - 12:48
    basic working that at least I understood about
    GIL.
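The hand-off just described can be pictured with a toy model. This is not MRI's actual C implementation, just an illustration: one global lock that a thread must hold to run a slice of work, with a forced yield standing in for the timer-thread interrupt:

```ruby
# Toy model of the GIL hand-off (illustrative only, not MRI internals).
gil = Mutex.new
log = Queue.new  # Queue is one of Ruby's built-in thread-safe structures

workers = %w[t1 t2].map do |name|
  Thread.new do
    5.times do
      gil.synchronize { log << name }  # run one slice while holding the "GIL"
      Thread.pass  # stand-in for the timer thread forcing a hand-off
    end
  end
end
workers.each(&:join)
puts log.size  # 10 slices ran in total, but never two at once
```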
  • 12:48 - 12:50
    But there are details to this, right. It's
    not
  • 12:50 - 12:57
    as simple as what we saw. So, when you
  • 12:58 - 13:01
    initialize a thread, or create a thread in
    Ruby,
  • 13:01 - 13:03
    you pass it a block of code. So how
  • 13:03 - 13:06
    does that work? You take a block of code,
  • 13:06 - 13:08
    you put it inside the thread. What the thread
  • 13:08 - 13:10
    does is usually it acquires the GVL on the
  • 13:10 - 13:14
    block. It executes the block
    of code. It
  • 13:14 - 13:17
    releases the, returns and releases the lock,
    right. So
  • 13:17 - 13:19
    essentially this is how it works. So during
    that
  • 13:19 - 13:22
    period of execution of the block, no other
    thread
  • 13:22 - 13:24
    is allowed to work. So that makes you almost
  • 13:24 - 13:28
    thread safe, right? But not really. If that's
    how
  • 13:28 - 13:31
    it's going to work, what if that thread is
  • 13:31 - 13:34
    going to hog the GIL, and not allow any
  • 13:34 - 13:36
    other thread to work? So there has to be
  • 13:36 - 13:38
    some kind of lock fairness, right. So that's
    where
  • 13:38 - 13:41
    the timer thread comes in and interrupts it.
    OK.
  • 13:41 - 13:43
    Does that mean the thread holding the GIL
    immediately
  • 13:43 - 13:45
    gives it up, and says here you go, you
  • 13:45 - 13:49
    can start and work with it? Not really. Again
  • 13:49 - 13:51
    the thread holding the GIL will only release
    the
  • 13:51 - 13:54
    GIL if it is at a context-switch
  • 13:54 - 13:57
    boundary. What that is, is fairly complicated.
    I don't
  • 13:57 - 14:00
    want to go into the details. I think people
  • 14:00 - 14:03
    who here know a lot better C than me,
  • 14:03 - 14:05
    and are deep C divers really, they can probably
  • 14:05 - 14:09
    tell you, you know, how, at what point the GIL
  • 14:09 - 14:11
    can get released. If a C thread, a C
  • 14:11 - 14:13
    code makes a call to Ruby code, can it
  • 14:13 - 14:15
    or can it not release the GIL? All those
  • 14:15 - 14:18
    things are there, right. So all these complexities
    are
  • 14:18 - 14:21
    really, really hard to deal with. I came across
  • 14:21 - 14:25
    this blog by Jesse Storimer. It's excellent
    and I
  • 14:25 - 14:27
    strongly encourage you to go through the two-part
    blog
  • 14:27 - 14:31
    about, you know, nobody understands GIL. It's
    really, really
  • 14:31 - 14:34
    important, if you're trying to do any sort
    of
  • 14:34 - 14:40
    multi-threaded programming in Ruby. So do
    you still think
  • 14:40 - 14:43
    Ruby is thread safe because it's got GIL?
    I'm
  • 14:43 - 14:49
    talking about MRI, essentially. So the thing
    is, we
  • 14:49 - 14:52
    can't depend on GIL, right. GIL is not documented
  • 14:52 - 14:54
    anywhere that this is exactly how it works.
    This
  • 14:54 - 14:56
    is when the timer thread wakes up. These are
  • 14:56 - 14:59
    the time slices allotted to the thread acquiring
    the
  • 14:59 - 15:03
    GVL. There is no documentation around at what
    point
  • 15:03 - 15:05
    the GIL can be released, can it not be
  • 15:05 - 15:07
    released, and things like that. There's no,
    it's not
  • 15:07 - 15:10
    predictable, and if you depend on it, what
    could
  • 15:10 - 15:13
    also happen is even within MRI, when you're
    moving
  • 15:13 - 15:16
    from version to version, if something changes
    in GIL,
  • 15:16 - 15:22
    your code will behave nondeterministically.
    And what about
  • 15:22 - 15:25
    Ruby implementations that don't even have
    a GIL?
  • 15:25 - 15:27
    So obviously that's the big problem, right.
    If you
  • 15:27 - 15:30
    write a gem or something which has to be
  • 15:30 - 15:32
    multi-threaded, and if you're depending on
    the GIL to
  • 15:32 - 15:35
    do its thing to keep you safe, then obviously
  • 15:35 - 15:39
    it cannot work on Rubinius and JRuby. Let
    that
  • 15:39 - 15:41
    alone, even, even if you give that up, even
  • 15:41 - 15:44
    with MRI, it's not entirely correct to say
    that
  • 15:44 - 15:47
    you're thread safe just because there is a GIL
    that
  • 15:47 - 15:53
    will ensure that only one thread is running.
    So
  • 15:53 - 15:55
    what did I find out? Ruby really does not
  • 15:55 - 15:57
    have a documented memory model. It's pretty
    much similar
  • 15:57 - 16:00
    to Python. It doesn't have a clearly documented
    memory
  • 16:00 - 16:05
    model. What is the implication of that? So
    as
  • 16:05 - 16:08
    I mentioned previously, a memory model is
    like a
  • 16:08 - 16:11
    specification. This is exactly how the system
    has to
  • 16:11 - 16:15
    provide a certain minimum guarantee to the
    users of
  • 16:15 - 16:18
    the language, right, regarding multi threaded
    access to shared
  • 16:18 - 16:22
    memory. Now, basically if I don't have a written
  • 16:22 - 16:24
    down memory model, and I am going to write
  • 16:24 - 16:27
    a Ruby implementation tomorrow, I have the liberty
    liberty
  • 16:27 - 16:30
    to choose whatever memory model I want. So
    the
  • 16:30 - 16:33
    code, if you're writing against MRI, may not
    essentially
  • 16:33 - 16:37
    work right on my, you know, my implementation
    of
  • 16:37 - 16:41
    Ruby. That's the big implication, right. So
    Ruby right
  • 16:41 - 16:46
    now depends on underlying virtual machines.
    Even after 1.9,
  • 16:46 - 16:48
    you have bytecode compilation, so even MRI
    is
  • 16:48 - 16:51
    almost like a VM. So that has no specification
  • 16:51 - 16:53
    for a memory model, but it does have something,
  • 16:53 - 16:55
    right, internally. If you have to go through
    the
  • 16:55 - 16:58
    C code and understand. It's not guaranteed
    to remain
  • 16:58 - 17:01
    the same from version to version, as I understand,
  • 17:01 - 17:05
    right. And obviously JRuby and Rubinius, they
    depend on
  • 17:05 - 17:08
    JVM and LLVM respectively. And they all have
    a
  • 17:08 - 17:12
    clearly documented memory model. You could
    have a read
  • 17:12 - 17:15
    at it. And the only thing is, if Ruby
  • 17:15 - 17:18
    had an implementation - sorry, a specification
    for a
  • 17:18 - 17:22
    memory model, it could be, you know, implemented
    using
  • 17:22 - 17:28
    the constructs available on JVM and LLVM.
    But this
  • 17:28 - 17:29
    is what we have. We don't have much to
  • 17:29 - 17:33
    do. What do we do under the circumstances?
    We
  • 17:33 - 17:37
    have to engineer our code for thread safety.
    We
  • 17:37 - 17:40
    can't bask under the safety that, there is
    a
  • 17:40 - 17:42
    GIL and so it's going to help me keep
  • 17:42 - 17:45
    my code thread safe. So even I can write
  • 17:45 - 17:48
    multiple, you know, multi threaded code without
    actually worrying
  • 17:48 - 17:51
    about serious synchronization issues and things
    like that. It's
  • 17:51 - 17:54
    totally not the right thing to do. I think
  • 17:54 - 17:57
    any which way, Ruby is a language I love,
  • 17:57 - 18:00
    and I'm sure all of you love, so. And
  • 18:00 - 18:03
    it's progressing by leaps and bounds, and
    eventually we're
  • 18:03 - 18:05
    going to write more and more complex systems
    with
  • 18:05 - 18:09
    Ruby. And who knows, we might have true parallelism
  • 18:09 - 18:14
    very soon, right. So why, still, stay in the
  • 18:14 - 18:17
    same mental block that we don't want to write,
  • 18:17 - 18:20
    you know, thread safe code that's anyway single
    threaded.
  • 18:20 - 18:22
    We might as well get into the mindset of
  • 18:22 - 18:26
    writing proper thread safe code, and try and
    probably
  • 18:26 - 18:30
    come up with a memory model, right. But I
  • 18:30 - 18:32
    think for now we just start engineering code
    for
  • 18:32 - 18:37
    thread safety. Simple Mutex, I'm sure all
    of you
  • 18:37 - 18:40
    know, but it's really, really important for
    even a
  • 18:40 - 18:44
    stupid operation like a plus equals two. So
    simple
  • 18:44 - 18:47
    things which are noticed in Ruby code bases
    and
  • 18:47 - 18:51
    Rails code bases as well, like generally,
    is, there
  • 18:51 - 18:53
    is like a synchronized, you know, a section
    of
  • 18:53 - 18:56
    the code has lots of synchronization and everything.
    It's
  • 18:56 - 18:59
    really safe. But we leave an innocent accessor
    lying
  • 18:59 - 19:01
    around, and that causes a lot of, you know,
  • 19:01 - 19:04
    pain, like debugging those issues. And general
    issues like,
  • 19:04 - 19:08
    you know, state mutations, inside methods
    is really a
  • 19:08 - 19:10
    bad idea. So if you're looking for issues
    around
  • 19:10 - 19:12
    multi threading, this might be a good place
    to
  • 19:12 - 19:14
    start. So I just listed a few of them
  • 19:14 - 19:16
    here. I didn't want to make a really dense
  • 19:16 - 19:19
    talk with all the details. You can always
    catch
  • 19:19 - 19:21
    me offline and I can tell you some of
  • 19:21 - 19:24
    my experiences and probably even listen to
    you and
  • 19:24 - 19:26
    learn from you about some of the issues that
  • 19:26 - 19:29
    we can solve by actually writing proper thread
    safe
  • 19:29 - 19:33
    code in Ruby. I came across a few gems
  • 19:33 - 19:35
    which were really, really nice. Both of them
    happen
  • 19:35 - 19:39
    to be written by headius. The first one is
  • 19:39 - 19:41
    atomic. Atomic is almost trying to give you
    the
  • 19:41 - 19:45
    similar constructs like the java.util.concurrent
    package. It
  • 19:45 - 19:51
    tries to, it's kind of compatible across MRI,
    JRuby,
  • 19:51 - 19:54
    and Rubinius, which is also a really nice
    thing.
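Conceptually, atomic's update is a compare-and-swap retry loop. A self-contained sketch of the idea (the ToyAtomic class is invented here, and it simulates CAS with a Mutex where the real gem uses native atomics):

```ruby
# Conceptual sketch of an atomic reference with update-and-retry.
class ToyAtomic
  def initialize(value)
    @value = value
    @lock  = Mutex.new
  end

  def value
    @lock.synchronize { @value }
  end

  # CAS: install new_value only if the value is still `expected`.
  def compare_and_swap(expected, new_value)
    @lock.synchronize do
      return false unless @value == expected
      @value = new_value
      true
    end
  end

  # Retry until our CAS wins, like atomic's update { |v| v + 1 }.
  def update
    loop do
      old = value
      new_value = yield(old)
      return new_value if compare_and_swap(old, new_value)
    end
  end
end

counter = ToyAtomic.new(0)
threads = 10.times.map do
  Thread.new { 1_000.times { counter.update { |v| v + 1 } } }
end
threads.each(&:join)
puts counter.value  # => 10000
```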
  • 19:54 - 19:57
    So you have atomic integers and atomic floats,
    which
  • 19:57 - 20:00
    do increments actually in an atomic way, which
    is
  • 20:00 - 20:02
    excellent. And then there is thread_safe library,
    which also
  • 20:02 - 20:05
    has a few thread safe data structures. I'm
    trying
  • 20:05 - 20:07
    to play around with these libraries right
    now, but
  • 20:07 - 20:09
    they may be a good, you know, starting point
  • 20:09 - 20:11
    if you are trying to do higher level constructs
  • 20:11 - 20:16
    for concurrency. And that's pretty much it.
    I'm open
  • 20:16 - 20:22
    to take questions. Thank you. And before anything
    I
  • 20:22 - 20:23
    really would like to thank you all, again
    for
  • 20:23 - 20:27
    being here for the talk, and thank the GCRC
  • 20:27 - 20:31
    organizers, you know, they've done a great
    job with
  • 20:31 - 20:38
    this conference. A big shout out to them.
  • 20:46 - 20:47
    V.O.: Any questions?
  • 20:47 - 20:47
    H.K.: Yeah?
  • 20:47 - 20:47
    QUESTION: Hey.
  • 20:47 - 20:47
    H.K.: Hi.
  • 20:47 - 20:48
    QUESTION: If, for example, if a Ruby code
    is running
  • 20:48 - 20:52
    in the JVM, in JRuby, how does, because none
  • 20:52 - 20:54
    of the Ruby code is written in a thread
  • 20:54 - 20:57
    safe way. How do, how does it internally manage
  • 20:57 - 20:59
    - does it actually, yeah, yesterday Yogi talked
    about
  • 20:59 - 21:01
    the point that ActiveRecord is not actually
    thread safe.
  • 21:01 - 21:04
    Can you explain it in detail like in a
  • 21:04 - 21:04
    theoretical way?
  • 21:04 - 21:07
    H.K.: OK. What is thread safety in
  • 21:07 - 21:09
    general, right? Thread safety is about how
    the data
  • 21:09 - 21:13
    is consistently maintained after multi-threaded
    access to that shared
  • 21:13 - 21:17
    data, right. So Ruby essentially has a GIL
    because
  • 21:17 - 21:20
    internal implementations are not thread safe,
    right. That's why
  • 21:20 - 21:22
    you want to have a GIL to protect you
  • 21:22 - 21:26
    from those problems. But as far as JRuby is
  • 21:26 - 21:29
    concerned, or Rubinius is concerned, the implementation
    itself is
  • 21:29 - 21:32
    not written in C. JRuby is written in Java
  • 21:32 - 21:34
    again, I mean JRuby itself, and Rubinius is
    written
  • 21:34 - 21:38
    in Ruby. And some of these actual internal
    constructs
  • 21:38 - 21:41
    are thread safe when compared to MRI. I haven't
  • 21:41 - 21:43
    actually taken a look in detail into the code
  • 21:43 - 21:48
    of these code bases, but if they are implemented
  • 21:48 - 21:50
    properly, you can be thread safe - internally,
    at
  • 21:50 - 21:53
    least - so, which means, the base code of
  • 21:53 - 21:56
    JRuby itself might be thread safe. It's only
    not
  • 21:56 - 21:58
    thread safe because the gems on top of it,
  • 21:58 - 22:01
    which are trying to run. They may have, like,
  • 22:01 - 22:05
    thread safety issues, right. Does that answer
    your question,
  • 22:05 - 22:06
    like, or- ?
  • 22:06 - 22:08
    QUESTION: About thread safety?? [00:22:09].
  • 22:08 - 22:12
    H.K.: Sure, sure. So those gems will not work.
    That's
  • 22:12 - 22:14
    the point. Like what I want to convey here,
  • 22:14 - 22:17
    is whatever gems we are offering, and whatever
    code
  • 22:17 - 22:19
    we are writing, we might get it - it's
  • 22:19 - 22:20
    a good idea to get into the habit of
  • 22:20 - 22:23
    writing thread safe code, so that we can actually
  • 22:23 - 22:25
    encourage a truly parallel Ruby, right. We
    don't, we
  • 22:25 - 22:28
    don't have to stay in the same paradigm of
  • 22:28 - 22:32
    OK we have to be single threaded.
  • 22:32 - 22:37
    QUESTION: So Mutex based thread management
    is one way.
  • 22:37 - 22:40
    There's also like actors and futures and things
    like that.
  • 22:40 - 22:42
    And there's a gem called Celluloid-
  • 22:42 - 22:43
    H.K.: Yup.
  • 22:43 - 22:45
    QUESTION: That, combined with something called
    Hamster,
  • 22:45 - 22:46
    which makes everything immutable-
  • 22:46 - 22:47
    H.K.: Yup.
  • 22:47 - 22:48
    QUESTION: Is another way to do it.
  • 22:48 - 22:48
    H.K.: Yup.
  • 22:48 - 22:49
    QUESTION: Have you done it or like,
  • 22:49 - 22:50
    what's your experience with that?
  • 22:50 - 22:53
    H.K.: Yeah, I have tried out actors, with
    revactor,
  • 22:53 - 22:54
    and lockless concurrency is
  • 22:54 - 22:57
    something I definitely agree is a good idea.
    But
  • 22:57 - 23:01
    I'm specifically talking about, you know,
    lock-based concurrency, like,
  • 23:01 - 23:05
    Mutex-based concurrency. This area is also
    important because it's
  • 23:05 - 23:08
    not like shared mutable state is bad. It is,
  • 23:08 - 23:11
    it is actually applicable in certain scenarios.
    When we
  • 23:11 - 23:13
    are working in this particular paradigm, we
    still need
  • 23:13 - 23:19
    the safety of a memory model. Any other questions?
  • 23:19 - 23:26
    QUESTION: Thanks for the talk Hari. It was
    really
  • 23:28 - 23:29
    good.
  • 23:29 - 23:30
    H.K.: Thanks.
  • 23:30 - 23:31
    QUESTION: Is there a way that
  • 23:31 - 23:35
    you would recommend to test if you have done
  • 23:35 - 23:38
    threading properly or not? I mean, I know,
    bugs
  • 23:38 - 23:38
    that come out-
  • 23:38 - 23:39
    H.K.: Right.
  • 23:39 - 23:39
    QUESTION: Like I have
  • 23:39 - 23:42
    written bugs that come out of badly written,
    you
  • 23:42 - 23:44
    know, not thread safe code, as.
  • 23:44 - 23:45
    H.K.: So-
  • 23:45 - 23:47
    QUESTION: Like, ?? [00:23:46] so, you catch
    them.
  • 23:47 - 23:52
    H.K.: At least, my opinion, and a lot of people
    have
  • 23:52 - 23:54
    done research in this area, their opinion
    also is
  • 23:54 - 23:58
    that it's not possible to write tests against
    multi
  • 23:58 - 24:00
    threaded code where there is shared data.
    Because it's
  • 24:00 - 24:04
    nondeterministic and nonrepeatable. The kind
    of results you get,
  • 24:04 - 24:07
    you can only test it against a heuristic.
    For
  • 24:07 - 24:09
    example, if you have a deterministic use case
    at
  • 24:09 - 24:12
    the top level, you can probably test it against
  • 24:12 - 24:14
    that. But exact test cases can never be written
  • 24:14 - 24:16
    for this.
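That heuristic approach can be sketched like this: run the racy code repeatedly and assert only the invariants that must hold on every run. The method name and bounds below are illustrative:

```ruby
# Heuristic testing of racy code: assert invariants, not exact results.
def racy_count(threads: 50, increments: 100)
  n = 0
  threads.times.map { Thread.new { increments.times { n += 1 } } }
         .each(&:join)
  n
end

5.times do
  result = racy_count
  # Invariant: lost updates can only lower the count, never raise it,
  # and at least one increment always lands.
  raise "broken invariant: #{result}" unless result.between?(1, 5_000)
end
puts "invariants held"
```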
  • 24:16 - 24:19
    V.O.: Any more questions?
  • 24:19 - 24:26
    H.K.: Cool. All right. Thank you so much.
Duration:
24:56
