-
Not Synced
On to the second talk.
-
Not Synced
Steve Capper is going to tell us
about the good bits of Java
-
Not Synced
They do exist
-
Not Synced
[Audience] Could this have been a
lightning talk? [Audience laughter]
-
Not Synced
Believe it or not we've got some
good stuff here.
-
Not Synced
I was as skeptical as you guys
when I first looked.
-
Not Synced
First, apologies for not attending this
mini-conf last year.
-
Not Synced
I was unfortunately ill on the day
I was due to give this talk.
-
Not Synced
Let me figure out how to use a computer.
-
Not Synced
Sorry about this.
-
Not Synced
There we go; it's because
I've not woken up.
-
Not Synced
Last year I worked at Linaro in the
Enterprise group and we performed analysis
-
Not Synced
on 'Big Data' application sets.
-
Not Synced
As many of you know quite a lot of these
big data applications are written in Java.
-
Not Synced
I'm from ARM and we were very interested
in 64-bit ARM support.
-
Not Synced
So these are mainly AArch64 examples
for things like assembler,
-
Not Synced
but most of the messages are
pertinent for any architecture.
-
Not Synced
These good bits are shared between
most if not all the architectures.
-
Not Synced
Whilst trying to optimise a lot of
these big data applications
-
Not Synced
I stumbled across quite a few things in
the JVM and I thought
-
Not Synced
'actually that's really clever;
that's really cool'
-
Not Synced
So I thought that would make a good
basis for a talk.
-
Not Synced
This talk is essentially some of the
clever things I found in the
-
Not Synced
Java Virtual Machine; these
optimisations are in OpenJDK.
-
Not Synced
The source is available; it's all there,
readily available and in play now.
-
Not Synced
I'm going to finish with some of the
optimisation work we did with Java.
-
Not Synced
People who know me will know
I'm not a Java zealot.
-
Not Synced
I don't particularly believe in
programming in a language over another one
-
Not Synced
So to make it clear from the outset
I'm not attempting to convert
-
Not Synced
anyone into a Java programmer.
-
Not Synced
I'm just going to highlight a few salient
things in the Java Virtual Machine
-
Not Synced
which I found to be quite clever and
interesting
-
Not Synced
and I'll try and talk through them
with my understanding of them.
-
Not Synced
Let's jump straight in and let's
start with an example.
-
Not Synced
This is a minimal example for
computing a SHA1 sum of a file.
-
Not Synced
I've elided some of the checking at the
beginning of the function, such as
-
Not Synced
command line parsing and that sort of
thing.
-
Not Synced
I've highlighted the salient points in red.
-
Not Synced
Essentially we instantiate a SHA1
crypto message digest.
-
Not Synced
And we do the equivalent in
Java of an mmap.
-
Not Synced
Get it all in memory.
-
Not Synced
And then we just put this data straight
into the crypto engine.
-
Not Synced
And eventually at the end of the
program we'll spit out the SHA1 hash.
-
Not Synced
It's a very simple program.
-
Not Synced
It's basically mmap, SHA1, then output
the hash afterwards.
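The slide code itself is not reproduced in this transcript. As a rough sketch of the same mmap-then-digest idea, assuming `java.security.MessageDigest` as the SHA-1 engine and `FileChannel.map` as Java's equivalent of mmap, it might look like this. The class name `Sha1Sum` and the 1 GiB mapping window are illustrative choices, not the speaker's; the window exists only because a single `MappedByteBuffer` cannot exceed `Integer.MAX_VALUE` bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Sum {
    // Map at most 1 GiB at a time: a MappedByteBuffer is int-indexed,
    // so one mapping cannot cover more than Integer.MAX_VALUE bytes.
    private static final long CHUNK = 1L << 30;

    static String sha1Hex(Path path) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                md.update(buf); // feed the mapped bytes straight into the digest
            }
        }
        // Render the 20-byte digest as lowercase hex, like sha1sum does
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Sha1Sum <file>");
            System.exit(1);
        }
        System.out.println(sha1Hex(Paths.get(args[0])) + "  " + args[0]);
    }
}
```

Run as `java Sha1Sum somefile`; the output line mirrors the `hash  filename` layout of the native command, so the two can be compared directly.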
-
Not Synced
In order to concentrate on the CPU
aspect rather than worry about IO
-
Not Synced
I decided to cheat a little by
setting this up.
-
Not Synced
I decided to use a sparse file. As many of
you know, a sparse file is a file whose contents
-
Not Synced
are not all necessarily stored
on disc. The assumption is that the bits
-
Not Synced
that aren't stored are zero. For instance
on Linux you can create a 20TB sparse file
-
Not Synced
on a 10MB file system and use it as
normal.
-
Not Synced
Just don't write too much to it otherwise
you're going to run out of space.
-
Not Synced
The idea behind using a sparse file is I'm
just focusing on the computational aspects
-
Not Synced
of the SHA1 sum. I'm not worried about
the file system or anything like that.
-
Not Synced
I don't want to worry about the IO. I
just want to focus on the actual compute.
-
Not Synced
In order to set up a sparse file I used
the following runes.
-
Not Synced
The important point is that you seek
and the other important point
-
Not Synced
is that you set a count; otherwise you'll fill your disc up.
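The exact runes are on the slide and are not reproduced here (the seek and count wording suggests `dd`). As a hedged Java analogue of the same idea, seeking far past the end and then writing only a single byte should produce a sparse file on file systems that support holes. The class name `MakeSparse` and the 1 GiB default size are assumptions for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MakeSparse {
    // Create (or overwrite) `path` with an apparent size of `sizeBytes` while
    // writing only one byte. On file systems that support holes (e.g. ext4, XFS)
    // the skipped region is not allocated on disc and reads back as zeros.
    static void makeSparse(Path path, long sizeBytes) throws IOException {
        if (sizeBytes < 1) {
            throw new IllegalArgumentException("size must be at least 1 byte");
        }
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.setLength(0);         // discard any existing contents
            raf.seek(sizeBytes - 1);  // the "seek": jump over the hole
            raf.write(0);             // the small "count": write a single byte
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get(args.length > 0 ? args[0] : "sparse.bin");
        long size = args.length > 1 ? Long.parseLong(args[1]) : (1L << 30); // 1 GiB default
        makeSparse(p, size);
        System.out.println(p + ": apparent size " + Files.size(p) + " bytes");
    }
}
```

Comparing `ls -l` (apparent size) with `du` (allocated blocks) on the result is a quick way to confirm the file really is sparse.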
-
Not Synced
I decided to run this against, firstly,
the native sha1sum command
-
Not Synced
that's built into Linux; let's normalise these results and say that takes 1.0.
-
Not Synced
I used an older version of OpenJDK
and ran the Java program,
-
Not Synced
and that took 1.09 times as long as the
reference command. That's quite good.
-
Not Synced
Then I used the newer OpenJDK; this is now
the current JDK, as this is a year on.
-
Not Synced
And that takes 0.21. It's significantly faster.
-
Not Synced
I stress that I've done nothing
surreptitious in the Java program.
-
Not Synced
It is mmap, compute, spit result out.
-
Not Synced
But OpenJDK has essentially got
some more context information.
-
Not Synced
I'll talk about that as we go through.
-
Not Synced
Before, when I started with Java, I had a very
simplistic view of it.
-
Not Synced
Traditionally Java is taught as a virtual
machine that runs byte code.
-
Not Synced
Now when you compile a Java program it
compiles into byte code.
-
Not Synced
The older versions of the Java Virtual
Machine would interpret this byte code
-
Not Synced
and then run through. Newer versions would
employ a just-in-time engine and try and
-
Not Synced
compile this byte code into native machine code.
-
Not Synced
That is not the only thing that goes on
when you run a Java program.
-
Not Synced
There are some extra optimisations as well.
So this alone would not account for
-
Not Synced
the newer version of the SHA1
sum being significantly faster
-
Not Synced
than the distro-supplied one.
-
Not Synced
Java knows about context. It has a class
library and these class libraries
-
Not Synced
have reasonably well defined purposes.
-
Not Synced
We have classes that provide
crypto services.
-
Not Synced
We have sun.misc.Unsafe, which every
single project seems to pull into their
-
Not Synced
project when they're not supposed to.
-
Not Synced
These have well defined meanings.
-
Not Synced
These do not necessarily have to be
written in Java.
-
Not Synced
They come as Java classes,
they come supplied.
-
Not Synced
But most JVMs now have a notion
of a virtual machine intrinsic
-
Not Synced
And the virtual machine intrinsic says, OK,
please do a SHA1 in the best possible way
-
Not Synced
that your implementation allows. This is
something done automatically by the JVM.
-
Not Synced
You don't ask for it. If the JVM knows
what it's running on and it's reasonably
-
Not Synced
recent this will just happen
for you for free.
-
Not Synced
And there's quite a few classes
that do this.
-
Not Synced
There's quite a few clever things with
atomics, there's crypto,
-
Not Synced
there's mathematical routines as well.
Most of these routines in the
-
Not Synced
class library have a well defined notion
of a virtual machine intrinsic
-
Not Synced
and they do run reasonably optimally.
-
Not Synced
They are the subject of continuous
optimisation as well.
-
Not Synced
We've got some runes that are
presented on the slides here.
-
Not Synced
These are quite useful if you
are interested in
-
Not Synced
how these intrinsics are made.
-
Not Synced
You can ask the JVM to print out a lot of
the just-in-time compiled code.
-
Not Synced
You can ask the JVM to print out the
native methods as well as these intrinsics
-
Not Synced
and in this particular case after sifting
through about 5MB of text
-
Not Synced
I've come across this particular SHA1 sum
implementation.
-
Not Synced
This is AArch64. This is employing the
cryptographic extensions
-
Not Synced
in the architecture. So it's essentially
using the CPU instructions which
-
Not Synced
would explain why it's faster. But again
it's done all this automatically.
-
Not Synced
This did not require any specific runes
or anything to activate.
-
Not Synced
We'll see a bit later on how you can
more easily find the hot spots
-
Not Synced
rather than sifting through a lot
of assembler.
-
Not Synced
I've mentioned that the cryptographic
engine is employed and again
-
Not Synced
this routine was generated at run
time as well.
-
Not Synced
This is one of the important things about
certain execution environments like Java.
-
Not Synced
You don't have to know everything at
compile time.
-
Not Synced
You know a lot more information at
run time and you can use that
-
Not Synced
in theory to optimise.
-
Not Synced
You can switch off these clever routines.
-
Not Synced
For instance, I've deactivated them
here and we get back to the
-
Not Synced
slower performance we expected.
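To reproduce this kind of comparison, a small timing harness along the following lines could be run twice, once normally and once with the intrinsic disabled. It is a sketch, not the speaker's benchmark: the class name `Sha1Bench`, the 1 GiB default and the warm-up sizes are assumptions. On HotSpot, `-XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics` should list intrinsics as the JIT applies them, and `-XX:+UnlockDiagnosticVMOptions -XX:-UseSHA1Intrinsics` should switch the SHA-1 one off (whether the unlock flag is needed varies between JDK versions), e.g. `java -XX:+UnlockDiagnosticVMOptions -XX:-UseSHA1Intrinsics Sha1Bench`.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Bench {
    private static final int BLOCK = 1 << 20; // 1 MiB per update() call

    // Digest `totalBytes` zero bytes, mimicking the all-zero contents of a sparse file.
    static byte[] sha1OfZeros(long totalBytes) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] block = new byte[BLOCK];
        long remaining = totalBytes;
        while (remaining > 0) {
            int n = (int) Math.min(BLOCK, remaining);
            md.update(block, 0, n);
            remaining -= n;
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        long total = args.length > 0 ? Long.parseLong(args[0]) : (1L << 30); // 1 GiB default
        // Warm up so the JIT has a chance to compile (and intrinsify) the hot path
        for (int i = 0; i < 5; i++) {
            sha1OfZeros(64L << 20);
        }
        long start = System.nanoTime();
        byte[] digest = sha1OfZeros(total);
        double secs = (System.nanoTime() - start) / 1e9;
        // Print part of the digest so the work is observably used
        System.out.printf("%d bytes in %.3f s (%.1f MB/s), digest starts %02x%n",
                total, secs, total / secs / 1e6, digest[0]);
    }
}
```

A single timed run like this is only indicative; for anything more careful a harness such as JMH would be the usual choice.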
-
Not Synced
Again, this particular set of routines is
present in OpenJDK,
-
Not Synced
I think for all the architectures that support it.
-
Not Synced
We get this optimisation for free on x86
and others as well.
-
Not Synced
It works quite well.
-
Not Synced
That was one surprise I came across:
the intrinsics.
-
Not Synced
One thing I thought it would be quite
good to do would be to go through
-
Not Synced
a slightly more complicated example.
And use this example to explain
-
Not Synced
a lot of other things that happen
in the JVM as well.
-
Not Synced
I will spend a bit of time going through
this example
-
Not Synced
and explain roughly the notion of what
it's supposed to be doing.
-
Not Synced
This is an imaginary method that I've
contrived to demonstrate a lot of points
-
Not Synced
in the fewest possible lines of code.
-
Not Synced
I'll start with what it's meant to do.
-
Not Synced
This is meant to be a routine that gets a
reference to something and lets you know
-
Not Synced
whether or not it's an image and in a
hypothetical cache.
-
Not Synced
I'll start with the important thing
here the weak reference.
-
Not Synced
In Java and other garbage collected
languages we have the notion of references.
-
Not Synced
Most of the time when you are running a
Java program you have something like a
-
Not Synced
variable name and that is in the current
execution context that is referred to as a
-
Not Synced
strong reference to the object. In other
words I can see it. I am using it.
-
Not Synced
Please don't get rid of it.
Bad things will happen if you do.
-
Not Synced
So the garbage collector knows
not to get rid of it.
-
Not Synced
In Java and other languages you also
have the notion of a weak reference.
-
Not Synced
This is essentially the programmer saying
to the virtual machine
-
Not Synced
"Look I kinda care about this but
just a little bit."
-
Not Synced
"If you want to get rid of it feel free
to but please let me know."
-
Not Synced
This is why it's useful for a cache class.
For instance the JVM in this particular
-
Not Synced
case could decide that it's running quite
low on memory; this particular xMB image
-
Not Synced
has not been used for a while, so it can
garbage collect it.
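The method on the slide is not reproduced in this transcript, so here is a separate, hypothetical sketch of the weak-reference cache idea using the standard `java.lang.ref.WeakReference` and `ReferenceQueue` classes. `ImageCache` and `Image` are invented names for illustration. The map holds only weak references, so the garbage collector may reclaim an image nobody else is using; `get()` on the reference then returns `null`, and the `ReferenceQueue` is how the collector "lets us know", allowing stale entries to be purged.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class ImageCache {
    // Hypothetical stand-in for a large decoded image
    static final class Image {
        final String name;
        final byte[] pixels;
        Image(String name, int size) { this.name = name; this.pixels = new byte[size]; }
    }

    // Weak reference that remembers its key, so a cleared entry can be removed from the map
    private static final class Entry extends WeakReference<Image> {
        final String key;
        Entry(String key, Image img, ReferenceQueue<Image> q) { super(img, q); this.key = key; }
    }

    private final Map<String, Entry> map = new HashMap<>();
    private final ReferenceQueue<Image> cleared = new ReferenceQueue<>();

    void put(String key, Image img) {
        purge();
        map.put(key, new Entry(key, img, cleared));
    }

    // Returns the cached image, or null if absent or already garbage collected
    Image get(String key) {
        purge();
        Entry e = map.get(key);
        return e == null ? null : e.get(); // get() hands back a strong reference (or null)
    }

    // The GC enqueues references whose referents it has collected; drop those entries
    private void purge() {
        Entry e;
        while ((e = (Entry) cleared.poll()) != null) {
            map.remove(e.key, e); // only remove if the key still maps to this stale entry
        }
    }
}
```

While a caller holds the returned `Image` in a local variable, that is a strong reference and the entry cannot be collected; once all strong references are gone, the cache alone does not keep the image alive.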
-
Not Synced
The important thing is how we go about
expressing this in the language.
-
Not Synced
We can't just have a reference to the
object because that's a strong reference
-
Not Synced
and the JVM will know it can't get
rid of this because the program
-
Not Synced
can see it actively.
-
Not Synced
So we have a level of indirection which is known as