-
Not Synced
Onto the second talk
-
Not Synced
Steve Capper is going to tell us
about the good bits of Java
-
Not Synced
They do exist
-
Not Synced
[Audience] Could this have been a
lightning talk? [Audience laughter]
-
Not Synced
Believe it or not we've got some
good stuff here.
-
Not Synced
I was as skeptical as you guys
when I first looked.
-
Not Synced
First apologies for not attending this
mini-conf last year
-
Not Synced
I was unfortunately ill on the day
I was due to give this talk.
-
Not Synced
Let me figure out how to use a computer.
-
Not Synced
Sorry about this.
-
Not Synced
There we go; it's because
I've not woken up.
-
Not Synced
Last year I worked at Linaro in the
Enterprise group and we performed analysis
-
Not Synced
on 'Big Data' application sets.
-
Not Synced
As many of you know quite a lot of these
big data applications are written in Java.
-
Not Synced
I'm from ARM and we were very interested
in 64bit ARM support.
-
Not Synced
So this is mainly AArch64 examples
for things like assembler
-
Not Synced
but most of the messages are
pertinent for any architecture.
-
Not Synced
These good bits are shared between
most if not all the architectures.
-
Not Synced
Whilst trying to optimise a lot of
these big data applications
-
Not Synced
I stumbled across quite a few things in
the JVM and I thought
-
Not Synced
'actually that's really clever;
that's really cool'
-
Not Synced
So I thought that would make a good
basis for a talk.
-
Not Synced
This talk is essentially some of the
clever things I found in the
-
Not Synced
Java Virtual Machine; these
optimisations are in OpenJDK.
-
Not Synced
The source is available; it's all there,
readily available and in play now.
-
Not Synced
I'm going to finish with some of the
optimisation work we did with Java.
-
Not Synced
People who know me will know
I'm not a Java zealot.
-
Not Synced
I don't particularly believe in
programming in a language over another one
-
Not Synced
So to make it clear from the outset
I'm not attempting to convert
-
Not Synced
anyone into a Java programmer.
-
Not Synced
I'm just going to highlight a few salient
things in the Java Virtual Machine
-
Not Synced
which I found to be quite clever and
interesting
-
Not Synced
and I'll try and talk through them
with my understanding of them.
-
Not Synced
Let's jump straight in and let's
start with an example.
-
Not Synced
This is a minimal example for
computing a SHA1 sum of a file.
-
Not Synced
I've elided some of the checking at the
beginning of the function, such as
-
Not Synced
command line parsing and that sort of
thing.
-
Not Synced
I've highlighted the salient points in red.
-
Not Synced
Essentially we instantiate a SHA1
crypto message service digest.
-
Not Synced
And we do the equivalent in
Java of an mmap.
-
Not Synced
Get it all in memory.
-
Not Synced
And then we just put this data straight
into the crypto engine.
-
Not Synced
And eventually at the end of the
program we'll spit out the SHA1 hash.
-
Not Synced
It's a very simple program.
-
Not Synced
It's basically mmap, SHA1, then output
the hash afterwards.
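[Editor's sketch] The slide code is not captured in this transcript. A minimal sketch of the program as described (map the file, feed it to a SHA-1 digest, print the hash) might look like the following. The class name Sha1Mmap, the method name sha1Hex and the 1 GiB window size are assumptions made for illustration; the windowing is there only because a single Java mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha1Mmap {
    // Assumption: 1 GiB windows. One mapping is limited to Integer.MAX_VALUE bytes,
    // so large files are mapped piece by piece.
    private static final long WINDOW = 1L << 30;

    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long offset = 0; offset < size; offset += WINDOW) {
                long length = Math.min(WINDOW, size - offset);
                // The Java equivalent of mmap for this window of the file.
                MappedByteBuffer window =
                        channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
                // Feed the mapped bytes straight into the digest.
                digest.update(window);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Sha1Mmap.java <file>");
            return;
        }
        System.out.println(sha1Hex(Paths.get(args[0])));
    }
}
```

Nothing in this source asks for hardware acceleration; whether the digest runs as interpreted code, JIT-compiled code or a VM intrinsic is decided by the JVM at run time.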
-
Not Synced
In order to concentrate on the CPU
aspect rather than worry about IO
-
Not Synced
I decided to cheat a little by
setting this up.
-
Not Synced
I decided to use a sparse file. As many of
you know a sparse file is a file that not
-
Not Synced
all the contents are necessarily stored
on disc. The assumption is that the bits
-
Not Synced
that aren't stored are zero. For instance
on Linux you can create a 20TB sparse file
-
Not Synced
on a 10MB file system and use it as
normal.
-
Not Synced
Just don't write too much to it otherwise
you're going to run out of space.
-
Not Synced
The idea behind using a sparse file is I'm
just focusing on the computational aspects
-
Not Synced
of the SHA1 sum. I'm not worried about
the file system or anything like that.
-
Not Synced
I don't want to worry about the IO. I
just want to focus on the actual compute.
-
Not Synced
In order to set up a sparse file I used
the following runes.
-
Not Synced
The important point is that you seek
and the other important point
-
Not Synced
is you set a count otherwise you'll fill your disc up.
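[Editor's sketch] The runes on the slide are not in the transcript. As a stand-in, the same "seek, then write very little" idea can be sketched in Java. The class and method names here are made up for illustration, and whether the file really ends up sparse depends on the operating system and file system; the SPARSE option is only a hint.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class SparseFile {
    // Creates a file of sizeBytes that is (on file systems that support it) mostly a hole.
    static void create(Path file, long sizeBytes) throws IOException {
        if (sizeBytes <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE,
                StandardOpenOption.SPARSE)) {
            // "Seek" to the last byte and write a single zero there.
            // Only this one byte is actually written; the rest is left unallocated.
            channel.write(ByteBuffer.wrap(new byte[] {0}), sizeBytes - 1);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SparseFile.java <file> <size-in-bytes>");
            return;
        }
        create(Paths.get(args[0]), Long.parseLong(args[1]));
    }
}
```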
-
Not Synced
First, as a baseline, I ran
the native sha1sum command
-
Not Synced
that's built into Linux and let's normalise these results and say that's 1.0.
-
Not Synced
I used an older version of
OpenJDK and ran the Java program
-
Not Synced
and that's 1.09 times slower than the
reference command. That's quite good.
-
Not Synced
Then I used the new Open JDK, this is now
the current JDK as this is a year on.
-
Not Synced
And that took 0.21. It's significantly faster.
-
Not Synced
I'd stress that I've done nothing
surreptitious in the Java program.
-
Not Synced
It is mmap, compute, spit result out.
-
Not Synced
But OpenJDK has essentially got
some more context information.
-
Not Synced
I'll talk about that as we go through.
-
Not Synced
When I started with Java I had a very
simplistic view of Java.
-
Not Synced
Traditionally Java is taught as a virtual
machine that runs byte code.
-
Not Synced
Now when you compile a Java program it
compiles into byte code.
-
Not Synced
The older versions of the Java Virtual
Machine would interpret this byte code
-
Not Synced
and then run through. Newer versions would
employ a just-in-time engine and try and
-
Not Synced
compile this byte code into native machine code.
-
Not Synced
That is not the only thing that goes on
when you run a Java program.
-
Not Synced
There are some extra optimisations as well.
So this alone would not account for
-
Not Synced
the newer version of the SHA1
sum being significantly faster
-
Not Synced
than the distro-supplied one.
-
Not Synced
Java knows about context. It has a class
library and these class libraries
-
Not Synced
have reasonably well defined purposes.
-
Not Synced
We have classes that provide
crypto services.
-
Not Synced
We have sun.misc.Unsafe, which every
single project seems to pull into their
-
Not Synced
project when they're not supposed to.
-
Not Synced
These have well defined meanings.
-
Not Synced
These do not necessarily have to be
written in Java.
-
Not Synced
They come as Java classes,
they come supplied.
-
Not Synced
But most JVMs now have a notion
of a virtual machine intrinsic
-
Not Synced
And the virtual machine intrinsic says ok
please do a SHA1 in the best possible way
-
Not Synced
that your implementation allows. This is
something done automatically by the JVM.
-
Not Synced
You don't ask for it. If the JVM knows
what it's running on and it's reasonably
-
Not Synced
recent this will just happen
for you for free.
-
Not Synced
And there's quite a few classes
that do this.
-
Not Synced
There's quite a few clever things with
atomics, there's crypto,
-
Not Synced
there's mathematical routines as well.
Most of these routines in the
-
Not Synced
class library have a well defined notion
of a virtual machine intrinsic
-
Not Synced
and they do run reasonably optimally.
-
Not Synced
They are a subject of continuous
optimisation as well.
-
Not Synced
We've got some runes that are
presented on the slides here.
-
Not Synced
These are quite useful if you
are interested in
-
Not Synced
how these intrinsics are made.
-
Not Synced
You can ask the JVM to print out a lot of
the just-in-time compiled code.
-
Not Synced
You can ask the JVM to print out the
native methods as well as these intrinsics
-
Not Synced
and in this particular case after sifting
through about 5MB of text
-
Not Synced
I've come across this particular SHA1 sum
implementation.
-
Not Synced
This is AArch64. This is employing the
cryptographic extensions
-
Not Synced
in the architecture. So it's essentially
using the CPU instructions which
-
Not Synced
would explain why it's faster. But again
it's done all this automatically.
-
Not Synced
This did not require any specific runes
or anything to activate.
-
Not Synced
We'll see a bit later on how you can
more easily find the hot spots
-
Not Synced
rather than sifting through a lot
of assembler.
-
Not Synced
I've mentioned that the cryptographic
engine is employed and again
-
Not Synced
this routine was generated at run
time as well.
-
Not Synced
This is one of the important things about
certain execution environments like Java.
-
Not Synced
You don't have to know everything at
compile time.
-
Not Synced
You know a lot more information at
run time and you can use that
-
Not Synced
in theory to optimise.
-
Not Synced
You can switch off these clever routines.
-
Not Synced
For instance I've got a deactivate
here and we get back to the
-
Not Synced
slower performance we expected.
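[Editor's sketch] The exact deactivate switch from the slide is not in the transcript. On HotSpot-based JVMs the SHA-1 intrinsic is, to the editor's knowledge, controlled by a flag named UseSHA1Intrinsics (turned off with -XX:-UseSHA1Intrinsics, which on some releases may also need -XX:+UnlockDiagnosticVMOptions). Treat that flag name as an assumption and check it against your JDK. The sketch below only asks the running JVM what it reports for a flag; the class name VmFlag is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

class VmFlag {
    // Returns the current value of a HotSpot VM flag, or empty if the flag
    // is unknown or not visible on this JVM.
    static Optional<String> value(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return Optional.empty(); // no HotSpot diagnostic bean on this JVM
        }
        try {
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // flag does not exist or is locked
        }
    }

    public static void main(String[] args) {
        // Flag names are assumptions about HotSpot, not taken from the talk.
        for (String flag : new String[] {"UseSHA", "UseSHA1Intrinsics"}) {
            System.out.println(flag + " = " + value(flag).orElse("(not reported)"));
        }
    }
}
```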
-
Not Synced
Again, this particular set of routines is
present in OpenJDK,
-
Not Synced
I think for all the architectures that support it.
-
Not Synced
We get this optimisation for free on x86
and others as well.
-
Not Synced
It works quite well.
-
Not Synced
That was one surprise I came across:
the intrinsics.
-
Not Synced
One thing I thought it would be quite
good to do would be to go through
-
Not Synced
a slightly more complicated example.
And use this example to explain
-
Not Synced
a lot of other things that happen
in the JVM as well.
-
Not Synced
I will spend a bit of time going through
this example
-
Not Synced
and explain roughly the notion of what
it's supposed to be doing.
-
Not Synced
This is an imaginary method that I've
contrived to demonstrate a lot of points
-
Not Synced
in the fewest possible lines of code.
-
Not Synced
I'll start with what it's meant to do.
-
Not Synced
This is meant to be a routine that gets a
reference to something and lets you know
-
Not Synced
whether or not it's an image and in a
hypothetical cache.
-
Not Synced
I'll start with the important thing
here the weak reference.
-
Not Synced
In Java and other garbage collected
languages we have the notion of references.
-
Not Synced
Most of the time when you are running a
Java program you have something like a
-
Not Synced
variable name and that is in the current
execution context that is referred to as a
-
Not Synced
strong reference to the object. In other
words I can see it. I am using it.
-
Not Synced
Please don't get rid of it.
Bad things will happen if you do.
-
Not Synced
So the garbage collector knows
not to get rid of it.
-
Not Synced
In Java and other languages you also
have the notion of a weak reference.
-
Not Synced
This is essentially the programmer saying
to the virtual machine
-
Not Synced
"Look I kinda care about this but
just a little bit."
-
Not Synced
"If you want to get rid of it feel free
to but please let me know."
-
Not Synced
This is why this is for a cache class.
For instance the JVM in this particular
-
Not Synced
case could decide that it's running quite
low on memory; this particular xMB image
-
Not Synced
has not been used for a while, so it can
garbage collect it.
-
Not Synced
The important thing is how we go about
expressing this in the language.
-
Not Synced
We can't just have a reference to the
object because that's a strong reference
-
Not Synced
and the JVM will know it can't get
rid of this because the program
-
Not Synced
can see it actively.
-
Not Synced
So we have a level of indirection which is
known as a weak reference.
-
Not Synced
We have this hypothetical CacheClass
that I've devised.
-
Not Synced
At this point it is a weak reference.
-
Not Synced
Then we get it. This is calling the weak
reference routine.
-
Not Synced
Now it becomes a strong reference so
it's not going to be garbage collected.
-
Not Synced
When we get to the return path it becomes
a weak reference again
-
Not Synced
because our strong reference
has disappeared.
-
Not Synced
The salient points in this example are:
-
Not Synced
We're employing a method to get
a reference.
-
Not Synced
We're checking an item to see if
it's null.
-
Not Synced
So let's say that the JVM decided to
garbage collect this
-
Not Synced
before we executed the method.
-
Not Synced
The weak reference class is still valid
because we've got a strong reference to it
-
Not Synced
but the actual object behind this is gone.
-
Not Synced
If we're too late and the garbage
collector has killed it
-
Not Synced
it will be null and we return.
-
Not Synced
So it's a level of indirection to see
does this still exist
-
Not Synced
if so can I please have it and then
operate on it as normal
-
Not Synced
and then return becomes weak
reference again.
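[Editor's sketch] The method on the slide is not in the transcript. Going by the description (get the referent, check it for null, ask whether it is an image, return), it might look roughly like this. CacheClass is the speaker's hypothetical class; its isImage method and everything else named here are assumptions made for illustration.

```java
import java.lang.ref.WeakReference;

class WeakCacheExample {
    // Stand-in for the talk's hypothetical CacheClass.
    static class CacheClass {
        private final boolean image;

        CacheClass(boolean image) {
            this.image = image;
        }

        boolean isImage() {
            return image;
        }
    }

    static boolean isCachedImage(WeakReference<CacheClass> ref) {
        // get() hands back a strong reference, so the object cannot be
        // collected while this method is using it.
        CacheClass item = ref.get();
        if (item == null) {
            return false; // too late: the collector already reclaimed it
        }
        return item.isImage();
        // On return the local strong reference goes away and only the
        // weak reference remains.
    }

    public static void main(String[] args) {
        CacheClass strong = new CacheClass(true);
        WeakReference<CacheClass> ref = new WeakReference<>(strong);
        System.out.println(isCachedImage(ref));
        ref.clear(); // simulate the collector having cleared the reference
        System.out.println(isCachedImage(ref));
        System.out.println(strong.isImage()); // keeps 'strong' in use until here
    }
}
```

Calling clear() in main is only a deterministic way to imitate what the garbage collector may do on its own under memory pressure.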
-
Not Synced
This example program is quite useful when
we look at how it's implemented in the JVM
-
Not Synced
and we'll go through a few things now.
-
Not Synced
First off we'll go through the byte code.
-
Not Synced
The only point of this slide is to
show it's roughly
-
Not Synced
the same as this.
-
Not Synced
We get our variable.
-
Not Synced
We use our getter.
-
Not Synced
This bit is extra, this checkcast.
The reason that bit is extra is
-
Not Synced
because we're using the equivalent of
a template in Java.
-
Not Synced
And the way that's implemented in Java is
it just basically casts everything to an
-
Not Synced
object so that requires extra
compiler information.
-
Not Synced
And this is the extra check.
-
Not Synced
The rest of this we load the reference,
we check to see if it is null,
-
Not Synced
If it's not null we invoke a virtual
function - is it the image?
-
Not Synced
and we return as normal.
-
Not Synced
Essentially the point I'm trying to make
is when we compile this to byte code
-
Not Synced
this execution happens.
-
Not Synced
This null check happens.
-
Not Synced
This execution happens.
-
Not Synced
And we return.
-
Not Synced
In the actual Java class files we've not
lost anything.
-
Not Synced
This is what it looks like when it's
been JIT'd.
-
Not Synced
Now we've lost lots of things.
-
Not Synced
The JIT has done quite a few clever things
which I'll talk about.
-
Not Synced
First off if we look down here there's
a single branch here.
-
Not Synced
And this is only if our check cast failed
-
Not Synced
We've got comments on the
right hand side.
-
Not Synced
Our get method has been in-lined so
we're no longer calling.
-
Not Synced
We seem to have lost our null check,
that's just gone.
-
Not Synced
And again we've got a get field as well.
-
Not Synced
That's no longer a method,
that's been in-lined as well
-
Not Synced
We've also got some other cute things.
-
Not Synced
Those more familiar with AArch64 will
understand that the pointers we're using
-
Not Synced
are 32bit not 64bit.
-
Not Synced
What we're doing is getting a pointer
and shifting it left 3
-
Not Synced
and widening it to a 64bit pointer.
-
Not Synced
We've also got 32bit pointers on a
64bit system as well.
-
Not Synced
So that's saving a reasonable amount
of memory and cache.
-
Not Synced
To summarise. We don't have any
branches or function calls
-
Not Synced
and we've got a lot of in-lining.
-
Not Synced
We did have function calls in the
class file so it's the JVM
-
Not Synced
it's the JIT that has done this.
-
Not Synced
We've got no null checks either and I'm
going to talk through this now.
-
Not Synced
The null check elimination is quite a
clever feature in Java and other programs.
-
Not Synced
The idea behind null check elimination is
-
Not Synced
most of the time this object is not
going to be null.
-
Not Synced
If this object is null the operating
system knows this quite quickly.
-
Not Synced
So if you try to de-reference a null
pointer you'll get either a SIGSEGV or
-
Not Synced
a SIGBUS depending on a
few circumstances.
-
Not Synced
That goes straight back to the JVM
-
Not Synced
and the JVM knows where the null
exception took place.
-
Not Synced
Because it knows where the exception took
place it can look this up
-
Not Synced
and unwind it as part of an exception.
-
Not Synced
Those null checks just go.
Completely gone.
-
Not Synced
Most of the time this works and you are
saving a reasonable amount of execution.
-
Not Synced
I'll talk about when it doesn't work
in a second.
-
Not Synced
That's reasonably clever. We have similar
programming techniques in other places
-
Not Synced
even the Linux kernel for instance when
you copy data to and from user space
-
Not Synced
it does pretty much the identical
thing. It has an exception unwind table
-
Not Synced
and it knows if it catches a page fault on
this particular program counter
-
Not Synced
it can deal with it because it knows
the program counter and it knows
-
Not Synced
conceptually what it was doing.
-
Not Synced
In a similar way the JIT knows what it's
doing to a reasonable degree.
-
Not Synced
It can handle the null check elimination.
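[Editor's sketch] The fault handling itself cannot be observed from Java source, but the contract it preserves can. In the sketch below (all names are made up for illustration) the source contains no null test, yet a null argument still surfaces as a NullPointerException, whether the JVM implements that with an explicit compare or, as described above, by catching the fault and unwinding.

```java
class ImplicitNullCheck {
    // No explicit null test here; dereferencing s is the only "check".
    static int length(String s) {
        return s.length();
    }

    // Reports whether a null argument is turned into a NullPointerException.
    static boolean throwsOnNull() {
        try {
            length(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Warm up so the method is likely (not guaranteed) to be JIT-compiled.
        long total = 0;
        for (int i = 0; i < 100_000; i++) {
            total += length("warm-up");
        }
        System.out.println("warm-up total: " + total);
        System.out.println("null still throws: " + throwsOnNull());
    }
}
```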
-
Not Synced
I mentioned the sneaky one. We've got
essentially 32bit pointers
-
Not Synced
on a 64bit system.
-
Not Synced
Most of the time in Java people typically
specify heap size smaller than 32GB.
-
Not Synced
Which is perfect if you want to use 32bit
pointers and left shift 3.
-
Not Synced
Because that gives you 32GB of
addressable memory.
-
Not Synced
That's a significant memory saving because
otherwise a lot of things would double up.
-
Not Synced
There's a significant number of pointers
in Java.
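[Editor's sketch] The arithmetic behind the 32GB figure can be written out. A 32-bit value shifted left by 3 addresses 2^32 slots of 8 bytes each, which is 2^35 bytes. The decode below is a simplified model, not the JVM's code; the heapBase parameter and all names are assumptions made for illustration.

```java
class CompressedPointer {
    static final int SHIFT = 3; // objects assumed to be 8-byte aligned

    // Widen a 32-bit compressed pointer to a 64-bit address.
    // heapBase is an assumption; with a zero base this is just a shift.
    static long decode(long heapBase, int narrow) {
        return heapBase + (Integer.toUnsignedLong(narrow) << SHIFT);
    }

    // Total number of bytes reachable through a 32-bit compressed pointer.
    static long maxAddressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        long gib = maxAddressableBytes() >> 30;
        System.out.println("addressable: " + gib + " GiB");
        System.out.println("decode(0, 1) = " + decode(0L, 1));
    }
}
```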
-
Not Synced
The one that should make people
jump out of their seat is
-
Not Synced
the fact that most methods in Java are
actually virtual.
-
Not Synced
So what the JVM has actually done is
in-lined a virtual function.
-
Not Synced
A virtual function is essentially a
function where you don't know where
-
Not Synced
you're going until run time.
-
Not Synced
You can have several different classes
and they share the same virtual function
-
Not Synced
in the base class and dependent upon
which specific class you're running
-
Not Synced
different virtual functions will
get executed.
-
Not Synced
In C++ that will be a read from a vtable
and then you know where to go.
-
Not Synced
The JVM's in-lined it.
-
Not Synced
We've saved a memory load.
-
Not Synced
We've saved a branch as well
-
Not Synced
The reason the JVM can in-line it is
because the JVM knows
-
Not Synced
every single class that has been loaded.
-
Not Synced
So it knows that although this looks
polymorphic to the casual programmer
-
Not Synced
It is actually monomorphic.
The JVM knows this.
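[Editor's sketch] A small illustration of a call that looks polymorphic in the source but is monomorphic at run time. All names are made up. With only one concrete subclass loaded, a JIT that tracks the loaded class hierarchy is free to inline sides() into the loop; whether it actually does so depends on the JVM, and it has to be able to undo the decision if another subclass is loaded later.

```java
class MonomorphicCall {
    abstract static class Shape {
        abstract int sides(); // virtual: the target is chosen at run time
    }

    // The only concrete Shape that is ever loaded in this program.
    static final class Triangle extends Shape {
        @Override
        int sides() {
            return 3;
        }
    }

    static long totalSides(Shape[] shapes) {
        long total = 0;
        for (Shape shape : shapes) {
            total += shape.sides(); // a virtual call as far as the source is concerned
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1_000];
        for (int i = 0; i < shapes.length; i++) {
            shapes[i] = new Triangle();
        }
        long total = 0;
        // Repeat so the loop is hot enough to be worth compiling.
        for (int round = 0; round < 10_000; round++) {
            total += totalSides(shapes);
        }
        System.out.println(total);
    }
}
```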