-
Yeah, so, hi everyone.
-
Like Swanand said, I'm also part of the team
that's organized this,
-
so I want to just say it's really awesome
to see, like,
-
such a huge crowd on a Friday morning in Bangalore.
-
So, what I'm speaking about today is native
extensions in Ruby.
-
The title of my talk is Native Extensions
Served 3 Ways.
-
About myself: I'm Tejas, Tejas Dinkar.
-
I am a partner at Nilenso Software, which
is an employee-owned
-
collective, and it's a really fun place to
work.
-
If you want to know more, catch me tomorrow.
-
I'm on Twitter. I am tdinkar, and on GitHub
I am GJA.
-
You can find most of my open source contributions
over there.
-
So, about my talk, this is actually a pretty
technical talk,
-
so expect to see lots of code. I hope to have
five minutes for questions.
-
I'll try to beat ?? and beg for it (?? @ 00:01:18).
-
But if I don't have time, just catch me in
the hallway
-
and I can try to answer whatever I can.
-
So I'll mostly be covering C extensions, FFI
and Swig.
-
Let's talk about first why you would ever
want to build
-
a native extension for Ruby. There's a bunch
of different reasons.
-
Number one is to maybe integrate with new
libraries.
-
Say like a new database has come out, like,
for example LibDrizzle,
-
which is a new library that came out to work
with
-
MySQL and the Drizzle database.
-
You might want to port that over to Ruby.
-
You might want to improve performance of critical
code.
-
There's different ways of doing this, of course.
-
You could try doing JRuby, look at different
caches,
-
but sometimes you have an algorithm that
you
-
just want to implement in native code,
-
or there's already a great library that implements
it.
-
Someone has given me an example of CSAT,
-
which I think Bundler could have actually
used,
-
or could use, to kind of resolve gem dependencies,
-
and that's written in C++.
-
Sometimes you wanna just move that there to
-
improve performance, and of course you want
to
-
write code that works across different
languages,
-
and in general it's a lot of fun.
-
You know like, real hackers program in C
-
and all that stuff.
-
So you could just feel super elite by doing
this.
-
So before we talk about native extensions,
-
I'm just gonna take a small segue and ask-
-
So let's talk about Python a little bit.
-
Like how many people in the house are like,
are Pythonistas?
-
How many big Python fans?
-
Well, that's actually surprisingly small.
-
I thought there would be more.
-
So I've figured out the best way to write
Python code,
-
and I'm gonna tell you this right now and-
-
Yes, of course, I am trolling you.
-
So over here I have a Python interpreter open
-
and I'm gonna say import Ruby.
-
Yeah, that seems innocent enough.
-
I'm gonna say Ruby dot eval foobar dot size.
-
Hmm, that seems to return six.
-
So that seems to work. Let's try something
more complicated.
-
I'm gonna define a method, factorial, and,
def factorial,
-
partial, N equals zero, partial, factorial,
N minus one,
-
et cetera.
-
Yes it's still recursive, I know.
-
And finally I'm gonna call it,
-
and I'm gonna say Ruby dot eval factorial
five.
-
Yet again you'll see that I actually get the
result, right.
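For reference, the Ruby being evaluated in this demo could look roughly like the sketch below. The exact strings typed on stage are not in the transcript, so the body of `factorial` is an assumption, and Ruby's own `eval` stands in here for the `ruby.eval` call made from Python.

```ruby
# The two snippets from the demo, evaluated with Ruby's own Kernel#eval.
# TOPLEVEL_BINDING is used so the method definition lands at the top level.

size = eval("'foobar'.size", TOPLEVEL_BINDING)

# Assumed definition: a plain recursive factorial.
eval(<<~RUBY, TOPLEVEL_BINDING)
  def factorial(n)
    n.zero? ? 1 : n * factorial(n - 1)
  end
RUBY

result = eval("factorial(5)", TOPLEVEL_BINDING)

puts size    # 6
puts result  # 120
```

Everything interesting happens inside the evaluated strings; the host language only passes text in and gets a value back.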
-
So wow, what's this? Have I implemented Python
in Ruby?
-
Let's look at the code which actually makes
this happen.
-
How can I actually make this work?
-
Well, surprisingly, it's about a dozen lines
of code.
-
I know this might be a bit hard to read from
the back,
-
so I'm just gonna read out the interesting
bits line by line.
-
So it's about a dozen lines of code,
-
but let's remove all the Python stuff and
let's just
-
look at the Ruby parts of this. It's even
a lot less,
-
yeah, so let's go through this one by one.
-
I start off by including Ruby dot H.
-
Ruby dot H, for those of you who are familiar
-
with C and C++, dot H files are header files.
-
They basically have the definitions of various
-
constructs that Ruby or, like, whatever,
-
that your library exposes, so that you can,
-
so that your compiler knows what definitions
exist in the first place.
-
So any Ruby extension that you write,
-
you'll need to include Ruby dot H,
-
and most of the time this is completely sufficient.
-
You don't need to include anything else.
-
So let's start looking at the actual code.
-
I have a method called Python Ruby eval.
-
Yeah, and it accepts one parameter,
-
which is a string. OK, nothing complicated
here.
-
What I do next is I take that string and
-
I pass it to a function called R-B eval string,
-
right, which stands for Ruby eval string.
-
Nothing really magical over here.
-
All I'm doing is I'm calling Ruby's eval,
-
and you'll notice that R-B eval string returns
an object of the type value.
-
This value is actually very important.
-
The same way in Ruby every single object inherits
from object,
-
the corresponding construct in the CRuby extensions
is the value object.
-
Value object is used to represent every single
Ruby object
-
from nil to true to false to every single
custom object.
-
Fine, so I have this value object. What am
I gonna do next?
-
Well, what I'm gonna do is I'm gonna
-
switch on the type of the object.
-
Yeah, there's a macro called type, and if
it's a fix num,
-
fix nums are numbers in Ruby,
-
I'm gonna call fix num to int,
-
to just convert that to an integer.
-
If it's a string I'm gonna call string value
pointer,
-
which converts from a Ruby string into a C
string.
-
And if it's some other type, like if it's
an array, yeah,
-
I could have implemented this,
-
but yeah I'm too lazy right now,
-
so I'm just gonna say return none,
-
which is the Python version of nil.
-
So yeah that's pretty much it.
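The code on the slide is C, but the shape of its logic can be sketched in plain Ruby. This is only a mirror of the control flow described above: `convert_result` and `ruby_eval` are hypothetical names, and the comments note what the C side does at each branch.

```ruby
# Ruby-only mirror of the C switch described in the talk.
def convert_result(value)
  case value
  when Integer then value      # C side: turn the Fixnum into a C integer
  when String  then value.dup  # C side: get a C string out of the Ruby string
  else nil                     # anything else becomes None on the Python side
  end
end

# Evaluate a string of Ruby, then convert whatever comes back.
def ruby_eval(source)
  convert_result(eval(source, TOPLEVEL_BINDING))
end

p ruby_eval("'foobar'.size")  # 6
p ruby_eval("'foo' + 'bar'")  # "foobar"
p ruby_eval("[1, 2, 3]")      # nil
```

Supporting arrays or hashes would mean adding more branches, which is the part the speaker says he skipped.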
-
That's all it really took to kind of build
-
a small kind of Ruby module which would extend
-
some of the functionality of Ruby into Python.
-
Let's look at something that's like another
construct.
-
I don't want to build all my code
-
and I don't want to eval all of it.
-
Let's say I want to require a file.
-
It's pretty much the same.
-
All I need to do is, I accept the file
-
and I do R-B require on that file, yeah,
-
so in general, yay!
-
Actually in those twelve lines of code
-
you really have built your first Ruby extension
-
and your first Python extension.
-
So what I'm really trying to call out,
-
is it really is very simple, like,
-
as Ruby developers we always have a lot of
fears,
-
like, oh this very simple thing in Ruby.
-
How could I even do it in a C extension?
-
It turns out that the Ruby C extensions are
great,
-
because they expose almost everything
-
you would ever want to do in Ruby,
-
it exposes the same thing in C.
-
So let's look at, what are,
-
why aren't there more,
-
why aren't we doing it more?
-
Well, the biggest common fear every time
-
somebody mentions C to a Ruby or in general
-
high-level programmer is, what about memory allocation?
-
Like how do I handle this?
-
Well, as it turns out, it's really not as
difficult
-
as you might think, and since you are still
programming
-
in the Ruby world, you actually have
-
a lot of things that can actually help you.
-
In particular there are two, there are these
two macros, right.
-
The first one basically takes a C pointer
and stuffs it inside a Ruby object.
-
You just tell it which class you want that
Ruby object
-
to be in and it will magically be created
with your pointer.
-
And the last, the second one gets the pointer.
-
What actually happens internally is that your
memory
-
that's allocated has been tied to this Ruby
object,
-
and when this Ruby object gets garbage collected,
-
so does your pointer.
-
So in many ways you're basically just re-using
-
Ruby's GC to build a, you know,
-
to manage your native code as well.
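The two macros are part of the C API and are not named in the transcript. As a Ruby-side analogue of the same idea, natively allocated memory whose lifetime follows a Ruby object, here is a minimal sketch using Fiddle from the standard library. Fiddle is a swapped-in tool for illustration, not what the slide shows, and `allocate_greeting` is a hypothetical helper.

```ruby
require 'fiddle'

# Allocate 16 bytes of native memory and tie them to a Ruby object.
# Passing Fiddle::RUBY_FREE asks Ruby to free the memory when the
# Fiddle::Pointer object itself is garbage collected.
def allocate_greeting
  buffer = Fiddle::Pointer.malloc(16, Fiddle::RUBY_FREE)
  buffer[0, 5] = "hello"  # write five bytes into the native buffer
  buffer
end

pointer = allocate_greeting
puts pointer[0, 5]  # read the same five bytes back
puts pointer.size
```

There is no explicit free call: once nothing references the pointer object, the GC reclaims the native buffer along with it.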
-
Right, no batteries included is the next big
fear.
-
But just keep in mind that since you are
-
programming in the Ruby extension,
-
in Ruby extensions with C,
-
you actually have access to every single
-
basic functionality that Ruby can provide
you.
-
There are methods to manipulate arrays,
-
strings, hashes, you name it.
-
It's all very easy to manipulate even in the
C extension.
-
And of course portability.
-
I have no idea what this comic is about,
-
it was on Geek, but it was the first thing
-
that I found for portability.
-
So most C extensions work only in MRI,
-
except they sort of work in Rubinius.
-
Like Rubinius has tried to make the sort of
API compatible,
-
but it sometimes works, it sometimes doesn't.
-
So basically all you can trust if you're writing
-
against the C API is that your gem is gonna work
in MRI.
-
So what about, what if you do want to-
-
OK, so, the last concern I always see is,
-
how do I even get started?
-
What are the best practices,
-
how do I build a C extension for this?
-
I've always found that the Ruby source code
-
itself is probably the best documentation
-
for how to build a C extension.
-
It's actually very simple,
-
very easy to understand, very nice to read.
-
So over here I've actually,
-
I'm actually showing you string dot C,
-
and I'm gonna walk through a few lines of
this code now, OK.
-
So the first line is a method called init
string, right.
-
This is the equivalent of main for your Ruby
extension.
-
Whenever your gem is required it is gonna
call this function.
-
So if there was a gem called string,
-
and I said require string,
-
it would call this method init underscore
string, yeah.
-
The first thing I'm gonna do over there
-
is I'm gonna say R-B define class.
-
I'm gonna define a class called string,
-
which inherits from object, right.
-
That's exactly equivalent to class string,
-
less than sign object. Nothing complicated
there.
-
I'm storing this in a variable called
-
R-B underscore C string, yeah,
-
and then I'm gonna define a method
-
on R-B underscore C string called E-Q-L question
mark, right.
-
What I'm gonna tell it is that this method,
-
when somebody calls E-Q-L question mark on
any string,
-
call this C function, which is
-
R-B underscore S-T-R underscore equal, right.
-
Still nothing complicated over there.
-
And the last thing says that I expect one
-
extra parameter to be there.
-
Self is always passed, but I want one extra
parameter.
-
So those are the four simple parameters to
this.
-
There is your class name, your function name,
-
the C function to call, and the number of
parameters.
-
Still nothing really complicated.
-
Let's look at the actual implementation of
the function.
-
Really simple. If self is equal to S-T-R of
2, return true.
-
Yes, they're the same object, because the
two of them
-
are the same object ID. They have the same
object ID.
-
They're actually the same object.
-
Similarly, if the second one is not a string,
-
return false, simple enough.
-
And the last line kind of delegates
-
to the old Ruby equal,
-
which will do the algorithm most of us learned
-
in high school, where you
-
compare string by string, character by
character
-
to figure out, are these two strings equal?
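What the C code sets up corresponds closely to ordinary Ruby. The sketch below mirrors the three steps described (same object, not a string, compare contents) in plain Ruby. `MyString` is a hypothetical stand-in so that the real `String` class is left untouched.

```ruby
# Ruby-level mirror of the class and method the C code defines.
class MyString < Object
  attr_reader :chars

  def initialize(text)
    @chars = text.chars
  end

  # One argument besides self, matching the arity given to the C API.
  def eql?(other)
    return true if equal?(other)               # literally the same object
    return false unless other.is_a?(MyString)  # not a string at all
    chars == other.chars                       # element-by-element comparison
  end
end

first = MyString.new("ruby")
puts first.eql?(first)                 # true
puts first.eql?(MyString.new("ruby"))  # true
puts first.eql?("ruby")                # false
```

The cheap checks come first, so the full comparison only runs when it is actually needed.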
-
So as you can see there's really nothing
-
very complicated in building a C extension.
-
And most of the time your architecture sort
of looks like this.
-
You have the Ruby, you have the native code
-
on the left. This is the code you kind of
want to run,
-
and you have Ruby code on the right,
-
and this is the code that you want to consume,
-
that code that you've somehow built.
-
In between, in purple, is a Ruby-aware native
code.
-
And why do I say Ruby-aware native code?
-
Because you've still written this as native
code.
-
It's still written in C.
-
It's still compiled down to a dot S-O file
or a dot Dylib file on Mac.
-
But it's Ruby-aware.
-
It knows how things work in Ruby.
-
Compared to this FFI kind of does the opposite.
-
Instead of Ruby-aware native code,
-
what you have is native-aware Ruby code, right.
-
So what this means is with FFI you're basically
-
working purely in Ruby, which somehow understands
-
how the native architecture of the system-
-
OK, so FFI is a Ruby D-S-L.
-
It's really easy to implement.
-
It's even easier than MRI, like than the C
extension.
-
It actually works across all Ruby implementations.
-
I would actually say that all the Ruby implementors
-
got together one day and said, how can we
make something
-
that'll make it easy for us to integrate with
libraries,
-
so it works on JRuby, it works on MRI,
-
it even works on MacRuby, Mac L?, Rubinius,
-
you name it, it works.
-
And it basically converts to and from C primitives
for you directly.
-
So let's look at an example.
-
I'm just taking the example straight out of
GitHub.
-
This one's not complicated at all. All I'm
doing is I'm saying require FFI.
-
I'm saying this is an FFI library,
-
my module is FFI library,
-
and I'm saying it attaches to lib C.
-
And by lib C that doesn't imply
-
that the library is written in C.
-
What that means is the C standard
-
library that you want to connect to.
-
And I'm creating a method called puts and
-
I'm saying puts takes one argument - it's
a string,
-
and it returns one argument, which is a,
-
one value which is an integer.
-
Really nothing complicated over there.
-
This creates a static class method,
-
or a module method on this module,
-
so I can say my lib dot puts, hello world,
using lib C.
-
It's very, very easy to attach to a function
using FFI.
-
And let's quickly look at another example.
-
This one I'm attaching pow, which takes two
doubles
-
and returns a double, and you can see it in
action over there.
-
It works.
-
And here I'm attaching to lib M, which
is the math library.
-
So actually FFI supports a lot of built-in
types.
-
It supports integers, characters,
-
and for everything else,
-
every single pointer type that you would use
in C,
-
it supports, well, a pointer.
-
So in general FFI is probably your best solution
for everything.
-
If you're trying to build a new gem and
-
you want people to use your gem, and
-
you're not just doing it for fun,
-
you probably want to build it using FFI.
-
It also lets you do your modeling in Ruby,
-
which means the deployment's also a little
bit easier.
-
You don't need to struggle with Makefiles
-
and other stuff to kind of build that extension.
-
Unfortunately one piece of misinformation
that
-
seems to be out there is that FFI,
-
if you build it with FFI,
-
you do not need to worry about garbage collection.
-
I'll show you an example in the next slide,
-
and unfortunately with FFI there is no
-
C++ support without wrapping.
-
So you could see over here that these
-
functions that we attached to are all static
functions in C.
-
They kind of are not attached to any object.
-
They take a fixed number of parameters, so
-
it's not possible to wrap C++ functions
directly.
-
You could write a thin shim, which kind of
-
takes static functions which you can use to
call your C++ functions.
-
But it still starts getting to be more effort
and you need to write that in C or C++.
-
So you do still have to worry about the garbage
collection,
-
however, with FFI, and I'll show you a quick
example where it matters.
-
Most people will write code that looks
-
something like this and not worry about it.
-
So I have a def run query which will crash,
-
and say D-B connection is equal to my
-
FFI module database connection local host,
-
and to my FFI module database query I'm passing
that D-B connection
-
and I'm saying select star from users.
-
This will probably work most of the time.
-
But in reality that D-B connection will eventually
get GC'd.
-
And internally in C your cursor will probably
hold
-
a pointer to your D-B connection,
-
even though this has not been exposed to you
via the API.
-
So when your D-B connection gets GC'd,
-
or like when the Ruby object gets GC'd,
-
the pointer is gonna get GC'd in memory,
-
and then when your cursor tries to access
the D-B connection,
-
it will crash.
-
Yeah, so the standard pattern for solving
something
-
like this is to make sure that these
-
two objects are aware of each other in the
Ruby world.
-
The most, in general what I've seen happen
a lot
-
is you save the database cursor and you kind
of just say,
-
cursor dot D-B connection is equal to the
other connection,
-
so that this has a pointer in Ruby as well
-
which corresponds to the C pointer.
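The pattern can be sketched without any real database library. In the sketch below `Connection` and `Cursor` are hypothetical stand-ins for Ruby objects that each own a native pointer, and the only point being shown is the extra Ruby-level reference.

```ruby
# Hypothetical wrappers; imagine each one holds a native pointer.
class Connection
  attr_reader :host

  def initialize(host)
    @host = host
  end
end

class Cursor
  attr_reader :sql
  # Holding the connection in an instance variable keeps it reachable,
  # so the GC cannot collect it while the cursor is still alive.
  attr_accessor :db_connection

  def initialize(sql)
    @sql = sql
  end
end

def run_query(sql)
  connection = Connection.new("localhost")
  cursor = Cursor.new(sql)
  cursor.db_connection = connection  # the fix described in the talk
  cursor
end

cursor = run_query("SELECT * FROM users")
puts cursor.db_connection.host
```

Without the assignment inside `run_query`, nothing in Ruby would reference the connection once the method returns, which is exactly the situation that leads to the crash.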
-
So it's not as if you can just blindly
-
take the library and, just looking at the
APIs, do this.
-
Although, granted, with the very primitive
types,
-
when you're looking at things on the left
side,
-
the characters, the strings, you're less likely
to fall,
-
face big memory problems.
-
So that's mostly all I'm gonna speak about
FFI.
-
If you look at the progression we've made,
-
we've kind of, we've started with gems that
work in MRI.
-
We've moved onto gems that work in all Ruby
and (00:18:15 - ??),
-
now let's talk about gems that work on all
programming languages.
-
And soon we'll talk about taking over the
world.
-
So SWIG is sort of the answer to that.
-
SWIG stands for the Simplified Wrapper and
Interface Generator -
-
which is a big tongue-twister.
-
Basically what SWIG does is it lets you
-
annotate your C and C++ header files.
-
The architecture is sort of like this -
-
there's native code over there, there's some
magic in between,
-
and then magically you get Ruby code and Python
code out of it.
-
Let's look a little bit about how this magic
works.
-
So FFI, sorry, SWIG works off an interface
file.
-
What it basically is is it's an annotated
header file
-
and it auto-generates code to make it work
in your various languages.
-
And how it auto-generates that code depends
on every single language.
-
So for Ruby it builds a C extension.
-
So maybe it won't work in JRuby.
-
But for Ruby it actually generates C code
which will call your library.
-
For Python it's actually a C and a dot py
file.
-
For Java it builds a JNI interface.
-
And of course you still will have the same
GC problem
-
that you had while we were discussing FFI.
-
But in general SWIG actually works pretty
well.
-
There are a couple of Ruby gems out there
-
that are built using SWIG.
-
I've seen it actually used in practice for
-
like a large company which had an algorithm
-
it wanted to share across different programming
languages.
-
They had a Python, a Java, and a Ruby sort
of front-end,
-
so what they did was build their code in C++,
-
exposed it via SWIG and were able to use it
in
-
all these three different languages. It's
really simple.
-
So over here we have a class called rectangle
-
which has a length and a breadth
-
and a constructor and an int called area -
-
and I'm sure most of you would know the implementation
of this.
-
All I need to do is add some junk at the top
and the bottom and yeah,
-
no, that's it, and magically it will kind
of work.
-
So since I said this is a module called shapes,
-
that translates in Ruby directly to a name
space,
-
I just need to say require shapes.
-
Rectangle equals shapes dot rectangle dot
new, and so on and so forth.
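From the Ruby side, using the generated wrapper would look something like the sketch below. Since the compiled extension is not available here, a pure-Ruby stand-in for the `Shapes` module is defined first; with the real extension that whole definition would be replaced by `require 'shapes'`. The constructor arguments and accessor names are assumptions.

```ruby
# Pure-Ruby stand-in for the module SWIG would generate from the header.
module Shapes
  class Rectangle
    attr_reader :length, :breadth

    def initialize(length, breadth)
      @length = length
      @breadth = breadth
    end

    def area
      length * breadth
    end
  end
end

# Consuming code, written as it would be against the real wrapper.
rectangle = Shapes::Rectangle.new(3, 4)
puts rectangle.area
```

The consuming code does not need to know that the class is backed by C++ at all.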
-
So with SWIG it's very easy to quickly kind
-
of generate interfaces across multiple different
-
languages really fast, and you can do this if
you are,
-
especially if you're the maintainer of the
actual native library.
-
So if you are the maintainer of the
-
actual native library I would recommend going
with SWIG.
-
There are other options as well.
-
Ruby has no shortage of ways to include native
things.
-
I think an old one which has been around
-
for a long time is dynamic load that is basically
-
the port of C's DL open into Ruby. It's been
around since forever,
-
but I've heard a lot of reports of it
-
being really buggy and in general I think
both that and Fiddle,
-
which is now - Fiddle is actually coming in
Ruby 2 or Ruby 2.1 -
-
that is again another way of introducing native
libraries.
-
Both of these kind of work in concept
-
very similar to how FFI works,
-
so I'm not gonna spend a lot of time covering
them.
-
I think Fiddle may start becoming
-
popular soon as more and more people start
using Ruby 2.
-
I don't know if FFI and Fiddle are someday
-
going to start merging together,
-
but in general these are the other two options.
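To give a feel for how close Fiddle is to FFI in concept, here is a minimal sketch that calls C's `strlen` through Fiddle, which ships with Ruby. `c_strlen` is a hypothetical helper name, and looking the symbol up through `Fiddle::Handle::DEFAULT` assumes the C library is already loaded into the Ruby process, which is normally the case.

```ruby
require 'fiddle'

# Wrap C's strlen: one pointer argument in, a size_t out.
def c_strlen(text)
  strlen = Fiddle::Function.new(
    Fiddle::Handle::DEFAULT['strlen'],  # look the symbol up in the process
    [Fiddle::TYPE_VOIDP],               # argument: pointer to a C string
    Fiddle::TYPE_SIZE_T                 # return type: size_t
  )
  strlen.call(text)
end

puts c_strlen("hello world")
```

As with FFI, the signature is declared in Ruby and the conversion between Ruby values and C primitives is done for you.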
-
So, TL;DR.
-
Native extensions are fun, really easy to
build.
-
The three big tools are C extensions,
FFI and SWIG.
-
You probably want to choose FFI if you don't
maintain the library,
-
even if it's too easy to write the code for
it,
-
but SWIG may be better if you actually maintain
-
the library and you want to expose it to a
number of people.
-
OK, thank you. I think I actually have time
for questions.
-
How much time do I have for questions?
-
V.O.: Ten minutes.
-
T.D.: OK, so does anyone have any questions?
-
Yes?
-
QUERANT: So when you actually write the native
code,
-
right, do you have to take it off during GIL
acquir-
-
acquiring GIL and using it yourself?
-
T.D.: So actually that's kind of an interesting
question.
-
I think that like when it actually calls the
-
native extensions that's all the code you
write,
-
would be considered a single Ruby call.
-
So I'm not actually sure if the GIL is held
for the entire time.
-
I think by default the GIL would be held
-
for the entire time your code is being executed.
-
QUERANT: OK
-
T.D.: Unless you say do something to create
a thread and move out.
-
QUERANT: Right, but when you're writing,
-
especially things like the database connection-
-
T.D.: Right.
-
QUERANT: Right, when you have that kind of-
-
T.D.: OK, the database connection.
-
QUERANT: Yeah, if you have- anyway,
-
when you have that kind of a code, right,
-
it's not very- you would assume that somebody
might
-
have done a thread dot new and, you know,
-
gone ahead and called the still lines of code.
-
T.D.: Right.
-
QUERANT: Which means, like, if you haven't
taken
-
the global interpreter lock yourself,
-
then the chances are the same problem that
you
-
said with GC might occur.
-
You might get pre-empted and
-
then horrible things might have happened.
-
So, does that mean every line,
-
every native extension that you write,
-
needs to take GIL because it's
-
an obscure case of someone doing it in a new
thread?
-
T.D.: So, actually, so, in theory, I think
yes.
-
When the code, so, OK, so most native extensions...
-
let me just go all the way back.
-
QUERANT: So go to the C code.
-
T.D.: Yeah, this one, yeah?
-
QUERANT: Right.
-
T.D.: Right, so this is what Ruby calls automatically.
-
When I say string will equal some other string,
-
and as long as the code is in this method,
-
you will be holding the GIL.
-
I don't think anyone else can execute code
during this time, unless...
-
Actually, I'll need to get back to you.
-
QUERANT: All right.
-
T.D.: Let me check.
-
QUERANT: Yeah, I mean, sure. Yeah, this was
something that I-
-
T.D.: OK, sure.
-
QUERANT: Yeah, OK. Yeah, so, like on the same
note, actually,
-
I just want to add, your C extension,
-
you only acquire the GIL if, in your extension,
-
you're going to run some long-running
thing.
-
But you don't want to,
-
you don't want the control to return to Ruby.
-
For example, let's say you,
-
your C extension takes an image,
-
and it does some image processing, actually,
-
and you don't want, you just want the C extension
-
to write the file to the disk and call it
a day.
-
You don't want to return that something to-
-
Then you acquire a GIL in your extension
-
and then that thread will run completely separate
-
from your- which Ruby VM is it running at?
-
Actually, so that's the only time when
-
you will acquire a lock, if any,
-
if you're passing any data back to Ruby.
-
Or anyway, you pass it back, control,
-
the control back to this thing,
-
then you don't want to acquire the GIL yourself
actually.
-
There are constructs for that, but generally
not recommended.
-
T.D.: Yeah, so I believe what he's saying
is correct.
-
I was slightly mistaken, I think the GIL is
only acquired
-
when you enter the function.
-
As soon as you enter the function the GIL
is released,
-
unless I'm mistaken, that's correct, right?
-
QUERANT: That's correct.
-
T.D.: Yeah. So, yeah. So I guess if you actually
call
-
anything that's a Ruby construct from here,
-
you can actually call a method from within
your C function body.
-
I think at that point you'll need to re-acquire
the global interpreter lock.
-
But you're correct that GIL is only caught
when you enter the method.
-
V.O.: We have time for more questions.
-
QUERANT: Hey Tejas, how do you test native
extensions?
-
T.D.: OK, so, it's like I said.
-
The architecture of most of your things is
sort of like this,
-
where you have native code, Ruby-aware native
code,
-
and your actual Ruby code. Presumably you're
doing something
-
like Google Test or something, to test your native
code,
-
and to test your actual Ruby code depending
on what your library is.
-
It will vary drastically.
-
So for example, if you're writing something
-
that connects to a database,
-
you may want to actually stub out the things
that actually call.
-
Say if you're implementing with FFI or even
with a native extension,
-
if you're making something like a database
call
-
you may actually want to mock out or stub out the
out the
-
actual implementation that connects the two.
-
But if you're doing something that's maybe
not so intensive,
-
maybe something like a JSON parsing library,
-
what I would recommend at this level is actually
writing an integration test,
-
actually parse some JSON and make sure it
actually
-
returns the actual, you know, representation
that you expected.
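An integration-style check of that kind can be very small. The sketch below uses Ruby's standard `json` library as a stand-in for whatever native-backed parser is under test; `parses_as_expected?` is a hypothetical helper and the sample document is made up for illustration.

```ruby
require 'json'

# Feed real input through the parser and compare the whole result
# with the structure we expect to get back.
def parses_as_expected?(source, expected)
  JSON.parse(source) == expected
end

source   = '{"name": "ruby", "tags": ["c", "ffi", "swig"], "stable": true}'
expected = { "name" => "ruby", "tags" => ["c", "ffi", "swig"], "stable" => true }

raise "unexpected parse result" unless parses_as_expected?(source, expected)
puts "parsed as expected"
```

Nothing is stubbed here: the test exercises the Ruby wrapper and the native code together, which is the point of testing at this level.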
-
So the answer to how do you test is actually
it varies,
-
very, very drastically, and I've seen like
different
-
maturities of tests across like, all of them.
Does that answer your question?
-
QUERANT: Maybe, yeah.
-
T.D.: Anything else?
-
OK then, I guess I-
-
That's it, like, thank you very much. And
yeah.