Onto the second talk
Steve Capper is going to tell us
about the good bits of Java
They do exist
[Audience] Could this have been a
lightning talk? [Audience laughter]
Believe it or not we've got some
good stuff here.
I was as skeptical as you guys
when I first looked.
First apologies for not attending this
mini-conf last year
I was unfortunately ill on the day
I was due to give this talk.
Let me figure out how to use a computer.
Sorry about this.
There we go; it's because
I've not woken up.
Last year I worked at Linaro in the
Enterprise group and we performed analysis
on 'Big Data' applications sets.
As many of you know quite a lot of these
big data applications are written in Java.
I'm from ARM and we were very interested
in 64bit ARM support.
So this is mainly AArch64 examples
for things like assembler
but most of the messages are
pertinent for any architecture.
These good bits are shared between
most if not all the architectures.
Whilst trying to optimise a lot of
these big data applications
I stumbled across quite a few things in
the JVM and I thought
'actually that's really clever;
that's really cool'
So I thought that would make a good
basis for a talk.
This talk is essentially some of the
clever things I found in the
Java Virtual Machine; these
optimisations are in Open JDK.
The source is available, it's all there,
readily available and in play now.
I'm going to finish with some of the
optimisation work we did with Java.
People who know me will know
I'm not a Java zealot.
I don't particularly believe in
programming in one language over another.
So to make it clear from the outset
I'm not attempting to convert
anyone into a Java programmer.
I'm just going to highlight a few salient
things in the Java Virtual Machine
which I found to be quite clever and
interesting
and I'll try and talk through them
with my understanding of them.
Let's jump straight in and let's
start with an example.
This is a minimal example for
computing a SHA1 sum of a file.
I've elided some of the checking at the
beginning of the function,
the command line parsing and that sort of
thing.
I've highlighted the salient points in red.
Essentially we instantiate a SHA1
crypto message service digest.
And we do the equivalent in
Java of an mmap.
Get it all in memory.
And then we just put this data straight
into the crypto engine.
And eventually at the end of the
program we'll spit out the SHA1 hash.
It's a very simple program.
It's basically mmap, SHA1, output
the hash afterwards.
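The code on the slide isn't captured in the transcript; a minimal sketch of such a program (the class name and structure here are my own, not the slide's) might look like this:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;

public class Sha1Sum {
    static String sha1Hex(String path) throws Exception {
        // Instantiate a SHA-1 crypto message service digest
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (RandomAccessFile f = new RandomAccessFile(path, "r");
             FileChannel ch = f.getChannel()) {
            long size = ch.size(), pos = 0;
            // Java's equivalent of mmap; a single MappedByteBuffer
            // tops out at 2GB, so map large files in chunks
            while (pos < size) {
                long len = Math.min(size - pos, 1 << 30);
                MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // Put the mapped data straight into the crypto engine
                md.update(buf);
                pos += len;
            }
        }
        // Spit out the SHA-1 hash as hex
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha1Hex(args[0]));
    }
}
```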
In order to concentrate on the CPU
aspect rather than worry about IO
I decided to cheat a little by
setting this up.
I decided to use a sparse file. As many of
you know, a sparse file is a file whose
contents are not necessarily all stored
on disc. The assumption is that the bits
that aren't stored are zero. For instance
on Linux you can create a 20TB sparse file
on a 10MB file system and use it as
normal.
Just don't write too much to it otherwise
you're going to run out of space.
The idea behind using a sparse file is I'm
just focusing on the computational aspects
of the SHA1 sum. I'm not worried about
the file system or anything like that.
I don't want to worry about the IO. I
just want to focus on the actual compute.
In order to set up a sparse file I used
the following runes.
The important point is that you seek
and the other important point
is you set a count otherwise you'll fill your disc up.
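The exact runes aren't captured in the transcript, but a dd invocation along these lines does the job (the file name and the 20GB size are illustrative):

```shell
# Seek past 20GB and write a single byte, with count=1 so we
# don't actually fill the disc; the skipped range stays unallocated
dd if=/dev/zero of=sparse.img bs=1 count=1 seek=20G

# du shows almost no blocks allocated; ls shows the apparent size
du -h sparse.img
ls -lh sparse.img
```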
I decided to run this against, firstly,
the native SHA1 sum command
that's built into Linux, and let's normalise these results and say that's 1.0.
I used an older version of the Open
JDK and ran the Java program,
and that's 1.09 times slower than the
reference command. That's quite good.
Then I used the new Open JDK, this is now
the current JDK as this is a year on.
And that took 0.21. It's significantly faster.
I've stressed that I've done nothing
surreptitious in the Java program.
It is mmap, compute, spit result out.
But the Open JDK has essentially got
some more context information.
I'll talk about that as we go through.
When I started with Java I had a very
simplistic view of it.
Traditionally Java is taught as a virtual
machine that runs byte code.
Now when you compile a Java program it
compiles into byte code.
The older versions of the Java Virtual
Machine would interpret this byte code
and then run through. Newer versions would
employ a just-in-time engine and try and
compile this byte code into native machine code.
That is not the only thing that goes on
when you run a Java program.
There are some extra optimisations as well.
So this alone would not account for
the newer version of the SHA1
sum being significantly faster
than the distro-supplied one.
Java knows about context. It has a class
library and these class libraries
have reasonably well defined purposes.
We have classes that provide
crypto services.
We have sun.misc.Unsafe, which every
single project seems to pull into their
project when they're not supposed to.
These have well defined meanings.
These do not necessarily have to be
written in Java.
They come as Java classes,
they come supplied.
But most JVMs now have a notion
of a virtual machine intrinsic
And the virtual machine intrinsic says ok
please do a SHA1 in the best possible way
that your implementation allows. This is
something done automatically by the JVM.
You don't ask for it. If the JVM knows
what it's running on and it's reasonably
recent this will just happen
for you for free.
And there's quite a few classes
that do this.
There's quite a few clever things with
atomics, there's crypto,
there's mathematical routines as well.
Most of these routines in the
class library have a well defined notion
of a virtual machine intrinsic
and they do run reasonably optimally.
They are a subject of continuous
optimisation as well.
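A few examples of class-library calls that HotSpot typically backs with intrinsics; the exact list varies by JVM version and architecture, so treat these as illustrations rather than guarantees:

```java
import java.util.concurrent.atomic.AtomicLong;

public class IntrinsicExamples {
    public static void main(String[] args) {
        // Atomics: typically compiled down to the CPU's
        // atomic read-modify-write instructions
        AtomicLong counter = new AtomicLong();
        counter.incrementAndGet();

        // Bit-twiddling: usually a single popcount/clz instruction
        System.out.println(Integer.bitCount(0xFF));         // 8
        System.out.println(Long.numberOfLeadingZeros(1L));  // 63

        // Maths routines such as Math.sqrt map to hardware instructions
        System.out.println(Math.sqrt(256.0));               // 16.0
    }
}
```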
We've got some runes that are
presented on the slides here.
These are quite useful if you
are interested in
how these intrinsics are made.
You can ask the JVM to print out a lot of
the just-in-time compiled code.
You can ask the JVM to print out the
native methods as well as these intrinsics
and in this particular case after sifting
through about 5MB of text
I've come across this particular SHA1 sum
implementation.
This is AArch64. This is employing the
cryptographic extensions
in the architecture. So it's essentially
using the CPU instructions which
would explain why it's faster. But again
it's done all this automatically.
This did not require any specific runes
or anything to activate.
We'll see a bit later on how you can
more easily find the hot spots
rather than sifting through a lot
of assembler.
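The runes from the slides aren't captured in the transcript; diagnostic flags along these lines make HotSpot show its generated code (Sha1Sum stands in for the demo program, and PrintAssembly additionally needs the hsdis disassembler plugin on the JVM's library path):

```shell
# Unlock diagnostic options, then ask HotSpot to disassemble the
# just-in-time compiled code it generates at run time
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Sha1Sum sparse.img > jit.txt

# Log which class-library calls were replaced by VM intrinsics
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics Sha1Sum sparse.img
```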
I've mentioned that the cryptographic
engine is employed and again
this routine was generated at run
time as well.
This is one of the important things about
managed execution environments like Java.
You don't have to know everything at
compile time.
You know a lot more information at
run time and you can use that
in theory to optimise.
You can switch off these clever routines.
For instance I've deactivated them
here and we get back to the
slower performance we expected.
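The slide's exact deactivation rune isn't captured; in current OpenJDK the relevant product flags look like this (Sha1Sum again stands in for the demo program):

```shell
# Fall back to the pure-Java SHA-1 implementation
java -XX:-UseSHA1Intrinsics Sha1Sum sparse.img

# Or switch off all the SHA intrinsics at once
java -XX:-UseSHA Sha1Sum sparse.img
```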
Again, this particular set of routines is
present in Open JDK,
I think for all the architectures that support it.
We get this optimisation for free on X86
and others as well.
It works quite well.
That was one surprise I came across:
the intrinsics.
One thing I thought it would be quite
good to do would be to go through
a slightly more complicated example.
And use this example to explain
a lot of other things that happen
in the JVM as well.
I will spend a bit of time going through
this example
and explain roughly the notion of what
it's supposed to be doing.
This is an imaginary method that I've
contrived to demonstrate a lot of points
in the fewest possible lines of code.
I'll start with what it's meant to do.
This is meant to be a routine that gets a
reference to something and lets you know
whether or not it's an image in a
hypothetical cache.
I'll start with the important thing
here the weak reference.
In Java and other garbage collected
languages we have the notion of references.
Most of the time when you are running a
Java program you have something like a
variable name in the current execution
context; that is referred to as a
strong reference to the object. In other
words: I can see it. I am using it.
Please don't get rid of it.
Bad things will happen if you do.
So the garbage collector knows
not to get rid of it.
In Java and other languages you also
have the notion of a weak reference.
This is essentially the programmer saying
to the virtual machine
"Look I kinda care about this but
just a little bit."
"If you want to get rid of it feel free
to but please let me know."
This is why it's useful for a cache class.
For instance the JVM in this particular
case could decide that it's running quite
low on memory, this particular xMB image
has not been used for a while, so it can
garbage collect it.
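A hypothetical cache of the kind being described might hold its entries through WeakReference; the class below is my own invention, not the slide's code:

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// A toy image cache: it holds its entries only weakly, so the
// garbage collector is free to reclaim them when memory runs low
class ImageCache {
    private final Map<String, WeakReference<byte[]>> cache = new HashMap<>();

    void put(String key, byte[] image) {
        // Wrap the image in a WeakReference: "I kinda care about
        // this, but just a little bit"
        cache.put(key, new WeakReference<>(image));
    }

    // Returns null if the entry was never cached, or if the GC
    // has since reclaimed it ("please let me know")
    byte[] get(String key) {
        WeakReference<byte[]> ref = cache.get(key);
        return ref == null ? null : ref.get();
    }
}
```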
The important thing is how we go about
expressing this in the language.
We can't just have a reference to the
object because that's a strong reference
and the JVM will know it can't get
rid of this because the program
can see it actively.
So we have a level of indirection which is known as