-
Not Synced
Onto the second talk of the day.
-
Not Synced
Steve Capper is going to tell us
about the good bits of Java
-
Not Synced
They do exist
-
Not Synced
[Audience] Could this have been a
lightning talk? [Audience laughter]
-
Not Synced
Believe it or not we've got some
good stuff here.
-
Not Synced
I was as skeptical as you guys
when I first looked.
-
Not Synced
First many apologies for not attending
the mini-conf last year
-
Not Synced
I was unfortunately ill on the day
I was due to give this talk.
-
Not Synced
Let me figure out how to use a computer.
-
Not Synced
Sorry about this.
-
Not Synced
There we go; it's because
I've not woken up.
-
Not Synced
Last year I worked at Linaro in the
Enterprise group and we performed analysis
-
Not Synced
on so-called 'Big Data' application sets.
-
Not Synced
As many of you know quite a lot of these
big data applications are written in Java.
-
Not Synced
I'm from ARM and we were very interested
in 64-bit ARM support.
-
Not Synced
So this is mainly AArch64 examples
for things like assembler
-
Not Synced
but most of the messages are
pertinent for any architecture.
-
Not Synced
These good bits are shared between
most if not all the architectures.
-
Not Synced
Whilst trying to optimise a lot of
these big data applications
-
Not Synced
I stumbled across quite a few things in
the JVM and I thought
-
Not Synced
'actually that's really clever;
that's really cool'
-
Not Synced
So I thought that would make a good
basis for an interesting talk.
-
Not Synced
This talk is essentially some of the
clever things I found in the
-
Not Synced
Java Virtual Machine; these
optimisations are in OpenJDK.
-
Not Synced
The source is available; it's all there,
readily available and in play now.
-
Not Synced
I'm going to finish with some of the
optimisation work we did with Java.
-
Not Synced
People who know me will know
I'm not a Java zealot.
-
Not Synced
I don't particularly believe in
programming in a language over another one
-
Not Synced
So to make it clear from the outset
I'm not attempting to convert
-
Not Synced
anyone into a Java programmer.
-
Not Synced
I'm just going to highlight a few salient
things in the Java Virtual Machine
-
Not Synced
which I found to be quite clever and
interesting
-
Not Synced
and I'll try and talk through them
with my understanding of them.
-
Not Synced
Let's jump straight in and let's
start with an example.
-
Not Synced
This is a minimal example for
computing a SHA1 sum of a file.
-
Not Synced
I've omitted some of the checking in the
beginning of the function, such as
-
Not Synced
command line parsing and that sort of
thing.
-
Not Synced
I've highlighted the salient
points in red.
-
Not Synced
Essentially we instantiate a SHA1
crypto message service digest.
-
Not Synced
And we do the equivalent in
Java of an mmap.
-
Not Synced
Get it all in memory.
-
Not Synced
And then we just put this data straight
into the crypto engine.
-
Not Synced
And eventually at the end of the
program we'll spit out the SHA1 hash.
-
Not Synced
It's a very simple program.
-
Not Synced
It's basically mmap, SHA1, output
the hash afterwards.
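
A minimal sketch of the kind of program described here, not the code from the slide. The class name Sha1Mmap and the method sha1Hex are hypothetical, and the chunked mapping is an assumption added because a single FileChannel.map call is limited to Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: map a file, feed it to SHA-1, print the hash.
final class Sha1Mmap {
    // One mapping cannot exceed Integer.MAX_VALUE bytes, so map 1 GiB at a time.
    private static final long CHUNK = 1L << 30;

    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                // The Java equivalent of an mmap of this region of the file.
                MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                digest.update(map);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha1Hex(Paths.get(args[0])));
    }
}
```

Run it with the path of the file to hash as the only argument.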
-
Not Synced
In order to concentrate on the CPU
aspect rather than worry about IO
-
Not Synced
I decided to cheat a little bit by
setting this up.
-
Not Synced
I decided to use a sparse file. As many of
you know, a sparse file is a file whose
-
Not Synced
contents are not all necessarily stored
on disc. The assumption is that the bits
-
Not Synced
that aren't stored are zero. For instance
on Linux you can create a 20TB sparse file
-
Not Synced
on a 10MB file system and use it as
normal.
-
Not Synced
Just don't write too much to it otherwise
you're going to run out of space.
-
Not Synced
The idea behind using a sparse file is I'm
just focusing on the computational aspects
-
Not Synced
of the SHA1 sum. I'm not worried about
the file system or anything like that.
-
Not Synced
I don't want to worry about the IO. I
just want to focus on the actual compute.
-
Not Synced
In order to set up a sparse file I used
the following runes.
-
Not Synced
The important point is that you seek
and the other important point
-
Not Synced
is you set a count otherwise you'll
fill your disc up.
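
The runes on the slide are not in the transcript. As a stand-in, here is a sketch of a different route to the same result from Java: extending an empty file with RandomAccessFile.setLength. The names are hypothetical. Whether the result is really sparse depends on the file system (on Linux this is typically the case), and the Javadoc leaves the contents of the extended region undefined, so treat the all-zero assumption as platform behaviour rather than a guarantee.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: create a large file without writing its contents.
final class SparseFileSketch {
    static void createSparse(Path file, long sizeBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Sets the length without writing data; on file systems that
            // support holes, no blocks are allocated for the new region.
            raf.setLength(sizeBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SparseFileSketch <path> <size-in-bytes>
        createSparse(Paths.get(args[0]), Long.parseLong(args[1]));
    }
}
```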
-
Not Synced
I decided to run this against firstly
let's get the native SHA1 sum command
-
Not Synced
that's built into Linux and let's
normalise these results and say that's 1.0
-
Not Synced
I used an older version of the OpenJDK
and ran the Java program
-
Not Synced
and that's 1.09 times slower than the
reference command. That's quite good.
-
Not Synced
Then I used the new OpenJDK, this is now
the current JDK as this is a year on.
-
Not Synced
And that took 0.21. It's significantly faster.
-
Not Synced
I'd stress that I've done nothing
surreptitious in the Java program.
-
Not Synced
It is mmap, compute, spit result out.
-
Not Synced
But the OpenJDK has essentially got
some more context information.
-
Not Synced
I'll talk about that as we go through.
-
Not Synced
Before I started with Java I had a very
simplistic view of Java.
-
Not Synced
Traditionally Java is taught as a virtual
machine that runs bytecode.
-
Not Synced
Now when you compile a Java program it
compiles into bytecode.
-
Not Synced
The older versions of the Java Virtual
Machine would interpret this bytecode
-
Not Synced
and then run through. Newer versions
would employ a just-in-time engine
-
Not Synced
and try and compile this bytecode
into native machine code.
-
Not Synced
That is not the only thing that goes on
when you run a Java program.
-
Not Synced
There are some extra optimisations as well.
So this alone would not account for
-
Not Synced
the newer version of the SHA1
sum being significantly faster
-
Not Synced
than the distro supplied one.
-
Not Synced
Java knows about context. It has a class
library and these class libraries
-
Not Synced
have reasonably well defined purposes.
-
Not Synced
We have classes that provide
crypto services.
-
Not Synced
We have sun.misc.Unsafe, which every
single project seems to pull into their
-
Not Synced
project when they're not supposed to.
-
Not Synced
These have well defined meanings.
-
Not Synced
These do not necessarily have to be
written in Java.
-
Not Synced
They come as Java classes,
they come supplied.
-
Not Synced
But most JVMs now have a notion
of a virtual machine intrinsic.
-
Not Synced
And the virtual machine intrinsic says ok
please do a SHA1 in the best possible way
-
Not Synced
that your implementation allows. This is
something done automatically by the JVM.
-
Not Synced
You don't ask for it. If the JVM knows
what it's running on and it's reasonably
-
Not Synced
recent this will just happen
for you for free.
-
Not Synced
And there's quite a few classes
that do this.
-
Not Synced
There's quite a few clever things with
atomics, there's crypto,
-
Not Synced
there's mathematical routines as well.
Most of these routines in the
-
Not Synced
class library have a well defined notion
of a virtual machine intrinsic
-
Not Synced
and they do run reasonably optimally.
-
Not Synced
They are a subject of continuous
optimisation as well.
-
Not Synced
We've got some runes that are
presented on the slides here.
-
Not Synced
These are quite useful if you
are interested in
-
Not Synced
how these intrinsics are made.
-
Not Synced
You can ask the JVM to print out a lot of
the just-in-time compiled code.
-
Not Synced
You can ask the JVM to print out the
native methods as well as these intrinsics
-
Not Synced
and in this particular case after sifting
through about 5MB of text
-
Not Synced
I've come across this particular SHA1 sum
implementation.
-
Not Synced
This is AArch64. This is employing the
cryptographic extensions
-
Not Synced
in the architecture.
-
Not Synced
So it's essentially using the CPU
instructions which would explain why
-
Not Synced
it's faster. But again it's done
all this automatically.
-
Not Synced
This did not require any specific runes
or anything to activate.
-
Not Synced
We'll see a bit later on how you can
more easily find the hot spots
-
Not Synced
rather than sifting through a lot
of assembler.
-
Not Synced
I've mentioned that the cryptographic
engine is employed and again
-
Not Synced
this routine was generated at run
time as well.
-
Not Synced
This is one of the important things about
certain execution environments like Java.
-
Not Synced
You don't have to know everything at
compile time.
-
Not Synced
You know a lot more information at
run time and you can use that
-
Not Synced
in theory to optimise.
-
Not Synced
You can switch off these clever routines.
-
Not Synced
For instance I've got a deactivate
here and we get back to the
-
Not Synced
slower performance we expected.
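
The exact switch used on the slide is not in the transcript. HotSpot does have a UseSHA1Intrinsics flag, so one plausible way to do this is -XX:-UseSHA1Intrinsics (newer JDKs may also need -XX:+UnlockDiagnosticVMOptions). The sketch below, with hypothetical names, asks the running JVM what value such a flag has, and reports "unavailable" if the flag is not exposed.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: query a HotSpot flag from inside the running JVM.
final class IntrinsicFlagSketch {
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the option does not exist or is not exposed
            // (for example a locked diagnostic flag, or a non-HotSpot JVM).
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        // Assumption: this is the flag that governs the SHA-1 intrinsic.
        System.out.println("UseSHA1Intrinsics = " + flagValue("UseSHA1Intrinsics"));
    }
}
```

Running it once normally and once with the flag disabled should show the two values, on JVMs that expose the flag.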
-
Not Synced
Again, this particular set of routines is
present in OpenJDK,
-
Not Synced
I think for all the architectures that
support it.
-
Not Synced
We get this optimisation for free on X86
and others as well.
-
Not Synced
It works quite well.
-
Not Synced
That was one surprise I came across:
the intrinsics.
-
Not Synced
One thing I thought it would be quite
good to do would be to go through
-
Not Synced
a slightly more complicated example.
And use this example to explain
-
Not Synced
a lot of other things that happen
in the JVM as well.
-
Not Synced
I will spend a bit of time going through
this example
-
Not Synced
and explain roughly the notion of what
it's supposed to be doing.
-
Not Synced
This is an imaginary method that I've
contrived to demonstrate a lot of points
-
Not Synced
in the fewest possible lines of code.
-
Not Synced
I'll start with what it's meant to do.
-
Not Synced
This is meant to be a routine that gets a
reference to something and lets you know
-
Not Synced
whether or not it's an image and in a
hypothetical cache.
-
Not Synced
I'll start with the important thing
here the weak reference.
-
Not Synced
In Java and other garbage collected
languages we have the notion of references
-
Not Synced
Most of the time when you are running a
Java program you have something like a
-
Not Synced
variable name, and that is in the current
execution context. That is referred to as a
-
Not Synced
strong reference to the object. In other
words I can see it. I am using it.
-
Not Synced
Please don't get rid of it.
Bad things will happen if you do.
-
Not Synced
So the garbage collector knows
not to get rid of it.
-
Not Synced
In Java and other languages you also
have the notion of a weak reference.
-
Not Synced
This is essentially the programmer saying
to the virtual machine
-
Not Synced
"Look I kinda care about this but
just a little bit."
-
Not Synced
"If you want to get rid of it feel free
to but please let me know."
-
Not Synced
This is why this is for a CacheClass.
For instance the JVM in this particular
-
Not Synced
case could decide that it's running quite
low on memory and, as this particular x MB image
-
Not Synced
has not been used for a while, it can
garbage collect it.
-
Not Synced
The important thing is how we go about
expressing this in the language.
-
Not Synced
We can't just have a reference to the
object because that's a strong reference
-
Not Synced
and the JVM will know it can't get
rid of this because the program
-
Not Synced
can see it actively.
-
Not Synced
So we have a level of indirection which
is known as a weak reference.
-
Not Synced
We have this hypothetical CacheClass
that I've devised.
-
Not Synced
At this point it is a weak reference.
-
Not Synced
Then we get it. This is calling the weak
reference routine.
-
Not Synced
Now it becomes a strong reference so
it's not going to be garbage collected.
-
Not Synced
When we get to the return path it becomes
a weak reference again
-
Not Synced
because our strong reference
has disappeared.
-
Not Synced
The salient points in this example are:
-
Not Synced
We're employing a method to get
a reference.
-
Not Synced
We're checking an item to see if
it's null.
-
Not Synced
So let's say that the JVM decided to
garbage collect this
-
Not Synced
before we executed the method.
-
Not Synced
The weak reference class is still valid
because we've got a strong reference to it
-
Not Synced
but the actual object behind this is gone.
-
Not Synced
If we're too late and the garbage
collector has killed it
-
Not Synced
it will be null and we return.
-
Not Synced
So it's a level of indirection to see
does this still exist
-
Not Synced
if so can I please have it and then
operate on it as normal
-
Not Synced
and then on return it becomes a weak
reference again.
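
The slide's method is not in the transcript, so this is a reconstruction under assumptions. CacheClass is the name used in the talk; CachedItem, isImage and hasImage are hypothetical. It shows the pattern described: get a strong reference through the weak one, null check, then a virtual call.

```java
import java.lang.ref.WeakReference;

// Hypothetical reconstruction of the contrived cache example.
final class WeakCacheSketch {
    static class CachedItem {
        private final boolean image;
        CachedItem(boolean image) { this.image = image; }
        boolean isImage() { return image; }   // virtual: subclasses may override
    }

    static final class CacheClass<T> {
        private final WeakReference<T> ref;
        CacheClass(T item) { ref = new WeakReference<>(item); }
        T get() { return ref.get(); }         // null once the referent is cleared
        void clear() { ref.clear(); }         // simulates the collector clearing it
    }

    static boolean hasImage(CacheClass<CachedItem> cache) {
        CachedItem item = cache.get();        // strong reference from here on
        if (item == null) {
            return false;                     // too late: already collected
        }
        return item.isImage();
    }                                         // strong reference dies on return

    public static void main(String[] args) {
        CachedItem strong = new CachedItem(true);
        CacheClass<CachedItem> cache = new CacheClass<>(strong);
        System.out.println(hasImage(cache) && strong.isImage());
        cache.clear();
        System.out.println(hasImage(cache));
    }
}
```

The clear() call stands in for the garbage collector, because forcing a real collection at a chosen moment is not reliable.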
-
Not Synced
This example program is quite useful when
we look at how it's implemented in the JVM
-
Not Synced
and we'll go through a few things now.
-
Not Synced
First off we'll go through the bytecode.
-
Not Synced
The only point of this slide is to
show it's roughly
-
Not Synced
the same as this.
-
Not Synced
We get our variable.
-
Not Synced
We use our getter.
-
Not Synced
This bit is extra, this checkcast.
The reason that bit is extra is
-
Not Synced
because we're using the equivalent of
a template in Java.
-
Not Synced
And the way that's implemented in Java is
it just basically casts everything to an
-
Not Synced
object so that requires extra
compiler information.
-
Not Synced
And this is the extra check.
-
Not Synced
The rest of this we load the reference,
we check to see if it is null,
-
Not Synced
If it's not null we invoke a virtual
function - is it the image?
-
Not Synced
and we return as normal.
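
A small sketch of why the checkcast is there, with hypothetical names. Generic type parameters are erased, so a generic getter returns Object at run time and the compiler inserts a cast at the call site. Deliberately putting the wrong type in through a raw type makes that hidden cast fail.

```java
// Hypothetical sketch: the compiler-inserted checkcast made visible.
final class ErasureSketch {
    static final class Box<T> {
        private final T value;
        Box(T value) { this.value = value; }
        T get() { return value; }            // erased to: Object get()
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean castFailsAtCallSite() {
        Box raw = new Box(Integer.valueOf(42));
        Box<String> box = raw;               // unchecked: really holds an Integer
        try {
            String s = box.get();            // checkcast String happens here
            return s.length() < 0;           // not reached
        } catch (ClassCastException e) {
            return true;                     // the inserted cast rejected the Integer
        }
    }

    public static void main(String[] args) {
        System.out.println(castFailsAtCallSite());
    }
}
```

Disassembling the class file with javap -c should show the checkcast instruction after the call to get.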
-
Not Synced
Essentially the point I'm trying to make
is when we compile this to bytecode
-
Not Synced
this execution happens.
-
Not Synced
This null check happens.
-
Not Synced
This execution happens.
-
Not Synced
And we return.
-
Not Synced
In the actual Java class files we've not
lost anything.
-
Not Synced
This is what it looks like when it's
been JIT'd.
-
Not Synced
Now we've lost lots of things.
-
Not Synced
The JIT has done quite a few clever things
which I'll talk about.
-
Not Synced
First off if we look down here there's
a single branch here.
-
Not Synced
And this is only if our checkcast failed.
-
Not Synced
We've got comments on the
right hand side.
-
Not Synced
Our get method has been inlined so
we're no longer calling.
-
Not Synced
We seem to have lost our null check,
that's just gone.
-
Not Synced
And again we've got a get field as well.
-
Not Synced
That's no longer a method,
that's been inlined as well.
-
Not Synced
We've also got some other cute things.
-
Not Synced
Those more familiar with AArch64
will understand
-
Not Synced
that the pointers we're using
are 32-bit not 64-bit.
-
Not Synced
What we're doing is getting a pointer
and shifting it left 3
-
Not Synced
and widening it to a 64-bit pointer.
-
Not Synced
We've also got 32-bit pointers on a
64-bit system as well.
-
Not Synced
So that's saving a reasonable amount
of memory and cache.
-
Not Synced
To summarise. We don't have any
branches or function calls
-
Not Synced
and we've got a lot of inlining.
-
Not Synced
We did have function calls in the
class file so it's the JVM;
-
Not Synced
it's the JIT that has done this.
-
Not Synced
We've got no null checks either and I'm
going to talk through this now.
-
Not Synced
The null check elimination is quite a
clever feature in Java and other programs.
-
Not Synced
The idea behind null check elimination is
-
Not Synced
most of the time this object is not
going to be null.
-
Not Synced
If this object is null the operating
system knows this quite quickly.
-
Not Synced
So if you try to dereference a null
pointer you'll get either a SIGSEGV or
-
Not Synced
a SIGBUS depending on a
few circumstances.
-
Not Synced
That goes straight back to the JVM
-
Not Synced
and the JVM knows where the null
exception took place.
-
Not Synced
Because it knows where it took
place it can look this up
-
Not Synced
and unwind it as part of an exception.
-
Not Synced
Those null checks just go.
Completely gone.
-
Not Synced
Most of the time this works and you are
saving a reasonable amount of execution.
-
Not Synced
I'll talk about when it doesn't work
in a second.
-
Not Synced
That's reasonably clever. We have similar
programming techniques in other places
-
Not Synced
even the Linux kernel for instance when
you copy data to and from user space
-
Not Synced
it does pretty much exactly
the same thing.
-
Not Synced
It has an exception unwind table and it
knows if it catches a page fault on
-
Not Synced
this particular program counter
it can deal with it because it knows
-
Not Synced
the program counter and it knows
conceptually what it was doing.
-
Not Synced
In a similar way the JIT knows what it's
doing to a reasonable degree.
-
Not Synced
It can handle the null check elimination.
-
Not Synced
I mentioned the sneaky one. We've got
essentially 32-bit pointers
-
Not Synced
on a 64-bit system.
-
Not Synced
Most of the time in Java people typically
specify heap size smaller than 32GB.
-
Not Synced
Which is perfect if you want to use 32-bit
pointers and left shift 3.
-
Not Synced
Because that gives you 32GB of
addressable memory.
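
The arithmetic can be sketched as follows. The names are hypothetical and the explicit heap base is an assumption for illustration; a real JVM picks the base and shift itself. A 32-bit value shifted left by 3 covers 2^32 x 8 bytes, which is 32 GiB.

```java
// Hypothetical sketch of 32-bit compressed pointers with a shift of 3.
final class CompressedOopsSketch {
    static final int SHIFT = 3;  // objects assumed aligned to 8 bytes

    // Widen a 32-bit compressed pointer back to a 64-bit address.
    static long decode(long heapBase, int narrow) {
        return heapBase + ((narrow & 0xFFFFFFFFL) << SHIFT);
    }

    // Compress an 8-byte-aligned address inside the heap to 32 bits.
    static int encode(long heapBase, long address) {
        return (int) ((address - heapBase) >>> SHIFT);
    }

    static long maxAddressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        System.out.println(maxAddressableBytes() / (1L << 30) + " GiB");
    }
}
```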
-
Not Synced
That's a significant memory saving because
otherwise a lot of things would double up.
-
Not Synced
There's a significant number of pointers
in Java.
-
Not Synced
The one that should make people
jump out of their seat is
-
Not Synced
the fact that most methods in Java are
actually virtual.
-
Not Synced
So what the JVM has actually done is
inlined a virtual function.
-
Not Synced
A virtual function is essentially a
function where you don't know where
-
Not Synced
you're going until run time.
-
Not Synced
You can have several different classes
and they share the same virtual function
-
Not Synced
in the base class and dependent upon
which specific class you're running
-
Not Synced
different virtual functions will
get executed.
-
Not Synced
In C++ that will be a read from a V table
and then you know where to go.
-
Not Synced
The JVM's inlined it.
-
Not Synced
We've saved a memory load.
-
Not Synced
We've saved a branch as well
-
Not Synced
The reason the JVM can inline it is
because the JVM knows
-
Not Synced
every single class that has been loaded.
-
Not Synced
So it knows that although this looks
polymorphic to the casual programmer
-
Not Synced
It actually is monomorphic.
The JVM knows this.
-
Not Synced
Because it knows this it can be clever.
And this is really clever.
-
Not Synced
That's a significant cost saving.
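
A sketch of a call site that looks polymorphic in the source but is monomorphic at run time, with hypothetical names. Only one subclass of Item is ever loaded here, so a JIT is free to inline isImage, and would have to deoptimise if a second subclass turned up later. Whether a given JVM actually inlines this is not something the code can show by itself.

```java
// Hypothetical sketch: a virtual call with a single loaded implementation.
final class MonomorphicSketch {
    static abstract class Item {
        abstract boolean isImage();          // virtual: target chosen at run time
    }

    static final class Image extends Item {
        @Override boolean isImage() { return true; }
    }

    static int countImages(Item[] items) {
        int n = 0;
        for (Item item : items) {
            if (item.isImage()) {            // the call site in question
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Item[] items = new Item[1000];
        java.util.Arrays.fill(items, new Image());
        long total = 0;
        for (int i = 0; i < 20_000; i++) {   // enough iterations to get JIT'd
            total += countImages(items);
        }
        System.out.println(total);
    }
}
```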
-
Not Synced
This is all great. I've already mentioned
the null check elimination.
-
Not Synced
We're taking a signal; as most of you know,
if we do that a lot it's going to be slow.
-
Not Synced
Jumping into kernel, into user,
bouncing around.
-
Not Synced
The JVM also has a notion of
'OK I've been a bit too clever now;
-
Not Synced
I need to back off a bit'
-
Not Synced
Also there's nothing stopping the user
loading more classes
-
Not Synced
and rendering the monomorphic
assumption invalid.
-
Not Synced
So the JVM needs to have a notion of
backpedalling and go
-
Not Synced
'OK I've gone too far and need to
deoptimise'
-
Not Synced
The JVM has the ability to deoptimise.
-
Not Synced
In other words it essentially knows that
for certain code paths everything's OK.
-
Not Synced
But for certain new objects it can't get
away with these tricks.
-
Not Synced
By the time the new objects are executed
they are going to be safe.
-
Not Synced
There are ramifications for this.
This is the important thing to consider
-
Not Synced
with something like Java and other
languages and other virtual machines.
-
Not Synced
If you're trying to profile this it means
there is a very significant ramification.
-
Not Synced
You can have the same class and
method JIT'd multiple ways
-
Not Synced
and executed at the same time.
-
Not Synced
So if you're trying to find a hot spot
the program counter's not enough.
-
Not Synced
Because you can refer to the same thing
in several different ways.
-
Not Synced
This is quite common as well, as
deoptimisation does take place.
-
Not Synced
That's something to bear in mind with JVM
and similar runtime environments.
-
Not Synced
You can get a notion of what the JVM's
trying to do.
-
Not Synced
You can ask it nicely and add a print
compilation option
-
Not Synced
and it will tell you what it's doing.
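
The option referred to is presumably HotSpot's -XX:+PrintCompilation. As a complement, here is a sketch with hypothetical names that watches JIT activity from inside the program through the standard CompilationMXBean. It returns -1 where the JVM does not support the measurement.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: observe how much time the JIT has spent compiling.
final class JitActivitySketch {
    static long jitMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;                       // no JIT, or no timing support
        }
        return jit.getTotalCompilationTime();
    }

    // Some work for the JIT to get excited about.
    static long busyWork(int rounds) {
        long sum = 0;
        for (int i = 0; i < rounds; i++) {
            sum += Integer.bitCount(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("JIT ms before: " + jitMillis());
        System.out.println("result: " + busyWork(50_000_000));
        System.out.println("JIT ms after:  " + jitMillis());
    }
}
```

Running the same program with -XX:+PrintCompilation gives the per-method log described above.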
-
Not Synced
This is reasonably verbose.
-
Not Synced
Typically what happens is the JVM gets
excited JIT'ing everything
-
Not Synced
and optimising everything then
it settles down.
-
Not Synced
Until you load something new
and it gets excited again.
-
Not Synced
There's a lot of logs. This is mainly
useful for debugging but
-
Not Synced
it gives you an appreciation that it's
doing a lot of work.
-
Not Synced
You can go even further with a log
compilation option.
-
Not Synced
That produces a lot of XML and that is
useful for people debugging the JVM as well.
-
Not Synced
It's quite handy to get an idea of
what's going on.
-
Not Synced
If that is not enough information you
also have the ability to go even further.
-
Not Synced
This is beyond the limit of my
understanding.
-
Not Synced
I've gone into this little bit just to
show you what can be done.
-
Not Synced
You have release builds of OpenJDK
and they have debug builds of OpenJDK.
-
Not Synced
The release builds will by default turn
off a lot of the diagnostic options.
-
Not Synced
You can switch them back on again.
-
Not Synced
When you do you can also gain insight
into the actual, it's colloquially
-
Not Synced
referred to as the C2 JIT,
the compiler there.
-
Not Synced
You can see, for instance, objects in
timelines and visualize them
-
Not Synced
as they're being optimised at various
stages and various things.
-
Not Synced
So this is based on a master's thesis
by Thomas Würthinger.
-
Not Synced
This is something you can play with as
well and see how far the optimiser goes.
-
Not Synced
And it's also good for people hacking
with the JVM.
-
Not Synced
I'll move onto some stuff we did.
-
Not Synced
Last year we were working on the
big data. Relatively new architecture
-
Not Synced
ARM64, it's called AArch64 in OpenJDK
land but ARM64 in Debian land.
-
Not Synced
We were a bit concerned because
everything's all shiny and new.
-
Not Synced
Has it been optimised correctly?
-
Not Synced
Are there any obvious things
we need to optimise?
-
Not Synced
And we're also interested because
everything was so shiny and new
-
Not Synced
in the whole system.
-
Not Synced
Not just the JVM but the glibc and
the kernel as well.
-
Not Synced
So how do we get a view of all of this?
-
Not Synced
I gave a quick talk before at the Debian
mini-conf before last [2014] about perf
-
Not Synced
so decided we could try and do some
clever things with Linux perf
-
Not Synced
and see if we could get some actual useful
debugging information out.
-
Not Synced
We have the flame graphs that are quite
well known.
-
Not Synced
We also have some previous work, Johannes
had a special perf map agent that
-
Not Synced
could basically hook into perf and it
would give you a nice way of running
-
Not Synced
perf-top for want of a better expression
and viewing the top Java function names.
-
Not Synced
This is really good work and it's really
good for a particular use case
-
Not Synced
if you just want to do a quick snap shot
once and see in that snap shot
-
Not Synced
where the hotspots were.
-
Not Synced
For a prolonged work load with all
the functions being JIT'd multiple ways
-
Not Synced
with the optimisation going on and
everything moving around
-
Not Synced
it requires a little bit more information
to be captured.
-
Not Synced
I decided to do a little bit of work on a
very similar thing to perf-map-agent
-
Not Synced
but an agent that would capture it over
a prolonged period of time.
-
Not Synced
Here's an example Flame graph, these are
all over the internet.
-
Not Synced
This is the SHA1 computation example that
I gave at the beginning.
-
Not Synced
As expected the VM intrinsic SHA1 is the
top one.
-
Not Synced
Not expected by me was this quite
significant chunk of CPU execution time.
-
Not Synced
And there was a significant amount of
time being spent copying memory
-
Not Synced
from the mmapped memory
region into a heap
-
Not Synced
and then that was passed to
the crypto engine.
-
Not Synced
So we're doing a ton of memory copies for
no good reason.
-
Not Synced
That essentially highlighted an example.
-
Not Synced
That was an assumption I made about Java
to begin with which was if you do
-
Not Synced
the equivalent of mmap it should just
work like mmap right?
-
Not Synced
You should just be able to address the
memory. That is not the case.
-
Not Synced
If you've got a file mapping object and
you try to address it it has to be copied
-
Not Synced
into safe heap memory first. And that is
what was slowing down the programs.
-
Not Synced
If that was omitted you could make
the SHA1 computation even quicker.
-
Not Synced
So that would be the logical target you
would want to optimise.
-
Not Synced
I wanted to extend Johannes' work
with something called a
-
Not Synced
Java Virtual Machine Tools Interface
profiling agent.
-
Not Synced
This is part of the Java Virtual Machine
standard as you can make a special library
-
Not Synced
and then hook this into the JVM.
-
Not Synced
And the JVM can expose quite a few
things to the library.
-
Not Synced
It exposes a reasonable amount of
information as well.
-
Not Synced
Perf as well has the ability to look
at map files natively.
-
Not Synced
If you are profiling JavaScript, or
something similar, I think the
-
Not Synced
Google V8 JavaScript engine will write
out a special map file that says
-
Not Synced
these program counter addresses correspond
to these function names.
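
For reference, perf looks for such a file at /tmp/perf-<pid>.map, with one line per symbol: start address and size in hexadecimal, then the name. The sketch below only formats those lines. The class and method names are hypothetical, and it is not the agent described in this talk.

```java
// Hypothetical sketch: format entries for a perf map file.
final class PerfMapSketch {
    // Where perf expects the map for a given process.
    static String mapFileName(long pid) {
        return "/tmp/perf-" + pid + ".map";
    }

    // One line per JIT'd symbol: START SIZE name, with START and SIZE in hex.
    static String entry(long start, long size, String symbol) {
        return Long.toHexString(start) + " " + Long.toHexString(size) + " " + symbol;
    }

    public static void main(String[] args) {
        System.out.println(mapFileName(1234));
        System.out.println(entry(0x7f001000L, 0x40L, "Example::method"));
    }
}
```

A real agent would append a line like this each time the JVM reports a newly compiled method.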
-
Not Synced
I decided to use that in a similar way to
what Johannes did for the extended
-
Not Synced
profiling agent but I also decided to
capture some more information as well.
-
Not Synced
I decided to capture the disassembly
so when we run perf annotate
-
Not Synced
we can see the actual JVM bytecode
in our annotation.
-
Not Synced
We can see how it was JIT'd at the
time when it was JIT'd.
-
Not Synced
We can see where the hotspots were.
-
Not Synced
And that's good. But we can go
even better.
-
Not Synced
We can run an annotated trace that
contains the Java class,
-
Not Synced
the Java method and the bytecode all in
one place at the same time.
-
Not Synced
You can see everything from the JVM
at the same place.
-
Not Synced
This works reasonably well because the
perf interface is extremely extensible.
-
Not Synced
And again we can do entire
system optimisation.
-
Not Synced
The bits in red here are the Linux kernel.
-
Not Synced
Then we got into libraries.
-
Not Synced
And then we got into Java and more
libraries as well.
-
Not Synced
So we can see everything from top to
bottom in one fell swoop.
-
Not Synced
This is just a quick slide showing the
mechanisms employed.
-
Not Synced
Essentially we have this agent which is
a shared object file.
-
Not Synced
And this will spit out useful files here
in a standard way.
-
Not Synced
And the Linux perf basically just records
the perf data dump file as normal.
-
Not Synced
We have 2 sets of recording going on.
-
Not Synced
To report it it's very easy to do
normal reporting with the PID map.
-
Not Synced
This is just out of the box, works with
the Google V8 engine as well.
-
Not Synced
If you want to do very clever annotations
perf has the ability to have
-
Not Synced
Python scripts passed to it.
-
Not Synced
So you can craft quite a dodgy Python
script and that can interface
-
Not Synced
with the perf annotation output.
-
Not Synced
That's how I was able to get the extra
Java information in the same annotation.
-
Not Synced
And this is really easy to do; it's quite
easy to knock the script up.
-
Not Synced
And again the only thing we do for this
profiling is we hook in the profiling
-
Not Synced
agent which dumps out various things.
-
Not Synced
We preserve the frame pointer because
that makes things considerably easier
-
Not Synced
on unwinding. This will affect
performance a little bit.
-
Not Synced
And again when we're reporting we just
hook in a Python script.
-
Not Synced
It's really easy to hook everything in
and get it working.
-
Not Synced
At the moment we have a JVMTI agent. It's
actually on http://git.linaro.org now.
-
Not Synced
Since I gave this talk Google have
extended perf anyway so it will do
-
Not Synced
quite a lot of similar things out of the
box anyway.
-
Not Synced
It's worth having a look at the
latest perf.
-
Not Synced
These techniques in this slide deck can be
used obviously in other JITs quite easily.
-
Not Synced
The fact that perf is so easy to extend
with scripts can be useful
-
Not Synced
for other things.
-
Not Synced
And OpenJDK has a significant amount of
cleverness associated with it that
-
Not Synced
I thought was very surprising and good.
So that's what I covered in the talk.
-
Not Synced
These are basically references to things
like command line arguments
-
Not Synced
and the Flame graphs and stuff like that.
-
Not Synced
If anyone is interested in playing with
OpenJDK on ARM64 I'd suggest going here:
-
Not Synced
http://openjdk.linaro.org
Where the most recent builds are.
-
Not Synced
Obviously fixes are going in upstream and
they're going into distributions as well.
-
Not Synced
They're included in OpenJDK so it should
be good as well.
-
Not Synced
I've run through quite a few fundamental
things reasonably quickly.
-
Not Synced
I'd be happy to accept any questions
or comments
-
Not Synced
And if you want to talk to me privately
about Java afterwards feel free to
-
Not Synced
when no-one's looking.
-
Not Synced
[Audience] Applause
-
Not Synced
[Audience] It's not really a question so
much as a comment.
-
Not Synced
Last mini-DebConf we had a talk about
using the JVM with other languages.
-
Not Synced
And it seems to me that all this would
apply even if you hate the Java programming
-
Not Synced
language and want to write in, I don't
know, Lisp or something instead
-
Not Synced
if you've got a Lisp system that can
generate JVM bytecode.
-
Not Synced
Yeah, totally. And the other
big data language we looked at was Scala.
-
Not Synced
It uses the JVM back end but a completely
different language on the front.
-
Not Synced
Cheers guys.