Java_the_good_bits.webm

0:02 - 0:03

Onto the second talk of the day.
0:11 - 0:14

Steve Capper is going to tell us
about the good bits of Java
0:15 - 0:17

They do exist
0:17 - 0:21

[Audience] Could this have been a
lightning talk? [Audience laughter]
0:23 - 0:26

Believe it or not we've got some
good stuff here.
0:27 - 0:30

I was as skeptical as you guys
when I first looked.
0:31 - 0:35

First many apologies for not attending
the mini-conf last year
0:35 - 0:40

I was unfortunately ill on the day
I was due to give this talk.
0:44 - 0:46

Let me figure out how to use a computer.
1:01 - 1:03

Sorry about this.
1:13 - 1:16

There we go; it's because
I've not woken up.
1:20 - 1:27

Last year I worked at Linaro in the
Enterprise group and we performed analysis
1:28 - 1:32

on so called 'Big Data' application sets.
1:32 - 1:37

As many of you know quite a lot of these
big data applications are written in Java.
1:38 - 1:43

I'm from ARM and we were very interested
in 64bit ARM support.
1:43 - 1:47

So this is mainly AArch64 examples
for things like assembler
1:48 - 1:53

but most of the messages are
pertinent for any architecture.
1:54 - 1:58

These good bits are shared between
most if not all the architectures.
1:59 - 2:03

Whilst trying to optimise a lot of
these big data applications
2:03 - 2:06

I stumbled across quite a few things in
the JVM and I thought
2:06 - 2:11

'actually that's really clever;
that's really cool'
2:13 - 2:17

So I thought that would make a good
basis for an interesting talk.
2:17 - 2:20

This talk is essentially some of the
clever things I found in the
2:20 - 2:25

Java Virtual Machine; these
optimisations are in OpenJDK.
2:26 - 2:32

The source is available; it's all there,
readily available and in play now.
2:33 - 2:38

I'm going to finish with some of the
optimisation work we did with Java.
2:38 - 2:43

People who know me will know
I'm not a Java zealot.
2:43 - 2:48

I don't particularly believe in
programming in a language over another one
2:48 - 2:51

So to make it clear from the outset
I'm not attempting to convert
2:51 - 2:54

anyone to Java programmers.
2:54 - 2:57

I'm just going to highlight a few salient
things in the Java Virtual Machine
2:57 - 3:00

which I found to be quite clever and
interesting
3:00 - 3:04

and I'll try and talk through them
with my understanding of them.
3:04 - 3:09

Let's jump straight in and let's
start with an example.
3:10 - 3:14

This is a minimal example for
computing a SHA1 sum of a file.
3:15 - 3:20

I've omitted some of the checking in the
beginning of the function, such as
3:20 - 3:22

command line parsing and that sort of
thing.
3:22 - 3:25

I've highlighted the salient
points in red.
3:25 - 3:30

Essentially we instantiate a SHA1
crypto message service digest.
3:30 - 3:35

And we do the equivalent in
Java of an mmap.
3:36 - 3:38

Get it all in memory.
3:38 - 3:42

And then we just put this data straight
into the crypto engine.
3:42 - 3:47

And eventually at the end of the
program we'll spit out the SHA1 hash.
3:47 - 3:49

It's a very simple program.
3:49 - 3:53

It's basically mmap, SHA1, output
the hash afterwards.
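
[Editor's sketch] The slide code is not part of this transcript. The following is a minimal reconstruction of the "mmap, SHA1, output the hash" idea, assuming the standard `MessageDigest` and `FileChannel.map` APIs. The class and method names (`Sha1Sum`, `sha1Hex`) are hypothetical, and this is not necessarily how the speaker's program was written.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Sum {
    // Hypothetical helper: map the file read-only and feed it to a SHA-1 digest.
    public static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            long pos = 0;
            // A single mapping is limited to Integer.MAX_VALUE bytes, so map in windows.
            while (pos < size) {
                long len = Math.min(Integer.MAX_VALUE, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                md.update(buf);
                pos += len;
            }
        }
        // Render the 20-byte digest as lowercase hex.
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Sha1Sum <file>");
            return;
        }
        System.out.println(sha1Hex(Paths.get(args[0])));
    }
}
```

Mapping in windows is a choice made for this sketch so that files larger than 2 GiB still work; the talk does not say how the original handled that.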
3:56 - 4:03

In order to concentrate on the CPU
aspect rather than worry about IO
4:04 - 4:07

I decided to cheat a little bit by
setting this up.
4:08 - 4:15

I decided to use a sparse file. As many of
you know a sparse file is a file that not
4:15 - 4:20

all the contents are stored necessarily
on disc. The assumption is that the bits
4:20 - 4:26

that aren't stored are zero. For instance
on Linux you can create a 20TB sparse file
4:26 - 4:31

on a 10MB file system and use it as
normal.
4:31 - 4:34

Just don't write too much to it otherwise
you're going to run out of space.
4:34 - 4:41

The idea behind using a sparse file is I'm
just focusing on the computational aspects
4:41 - 4:45

of the SHA1 sum. I'm not worried about
the file system or anything like that.
4:45 - 4:49

I don't want to worry about the IO. I
just want to focus on the actual compute.
4:49 - 4:53

In order to set up a sparse file I used
the following runes.
4:53 - 4:57

The important point is that you seek
and the other important point
4:57 - 5:01

is you set a count otherwise you'll
fill your disc up.
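
[Editor's sketch] The commands from the slide are not captured in this transcript. As a rough Java equivalent of "seek, then write a bounded amount", the following creates a file of a given logical size by writing a single byte at the end. `SparseFile` and `create` are hypothetical names, and `StandardOpenOption.SPARSE` is only a hint: whether the result is really sparse depends on the file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SparseFile {
    // Hypothetical helper: create a new file whose logical size is `size` bytes.
    public static void create(Path file, long size) throws IOException {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1 byte");
        }
        try (SeekableByteChannel ch = Files.newByteChannel(file,
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE,
                StandardOpenOption.SPARSE)) {
            // Seek past the end, then write exactly one zero byte (the bounded "count").
            ch.position(size - 1);
            ch.write(ByteBuffer.wrap(new byte[] {0}));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SparseFile <path> <size-in-bytes>");
            return;
        }
        create(Paths.get(args[0]), Long.parseLong(args[1]));
    }
}
```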
5:03 - 5:09

I decided to run this against firstly
let's get the native SHA1 sum command
5:09 - 5:15

that's built into Linux and let's
normalise these results and say that's 1.0
5:17 - 5:21

I used an older version of the OpenJDK
and ran the Java program
5:21 - 5:28

and that's 1.09 times slower than the
reference command. That's quite good.
5:30 - 5:39

Then I used the new OpenJDK, this is now
the current JDK as this is a year on.
5:39 - 5:45

And that took 0.21 of the time. It's significantly faster.
5:46 - 5:51

I'd stress that I've done nothing
surreptitious in the Java program.
5:51 - 5:54

It is mmap, compute, spit result out.
5:56 - 6:01

But the OpenJDK has essentially got
some more context information.
6:01 - 6:04

I'll talk about that as we go through.
6:06 - 6:11

Before when I started Java I had a very
simplistic view of Java.
6:11 - 6:17

Traditionally Java is taught as a virtual
machine that runs bytecode.
6:17 - 6:21

Now when you compile a Java program it
compiles into bytecode.
6:21 - 6:25

The older versions of the Java Virtual
Machine would interpret this bytecode
6:25 - 6:32

and then run through. Newer versions
would employ a just-in-time engine
6:32 - 6:38

and try and compile this bytecode
into native machine code.
6:39 - 6:43

That is not the only thing that goes on
when you run a Java program.
6:43 - 6:47

There are some extra optimisations as well.
So this alone would not account for
6:47 - 6:52

the newer version of the SHA1
sum being significantly faster
6:52 - 6:56

than the distro supplied one.
6:56 - 7:01

Java knows about context. It has a class
library and these class libraries
7:01 - 7:04

have reasonably well defined purposes.
7:04 - 7:08

We have classes that provide
crypto services.
7:08 - 7:11

We have sun.misc.Unsafe that every
single project seems to pull into their
7:11 - 7:13

project when they're not supposed to.
7:13 - 7:17

These have well defined meanings.
7:17 - 7:21

These do not necessarily have to be
written in Java.
7:21 - 7:24

They come as Java classes,
they come supplied.
7:24 - 7:29

But most JVMs now have a notion
of a virtual machine intrinsic.
7:29 - 7:35

And the virtual machine intrinsic says ok
please do a SHA1 in the best possible way
7:35 - 7:39

that your implementation allows. This is
something done automatically by the JVM.
7:39 - 7:43

You don't ask for it. If the JVM knows
what it's running on and it's reasonably
7:43 - 7:48

recent this will just happen
for you for free.
7:48 - 7:50

And there's quite a few classes
that do this.
7:50 - 7:54

There's quite a few clever things with
atomics, there's crypto,
7:54 - 7:58

there's mathematical routines as well.
Most of these routines in the
7:58 - 8:03

class library have a well defined notion
of a virtual machine intrinsic
8:03 - 8:07

and they do run reasonably optimally.
8:07 - 8:11

They are a subject of continuous
optimisation as well.
8:12 - 8:16

We've got some runes that are
presented on the slides here.
8:17 - 8:21

These are quite useful if you
are interested in
8:21 - 8:24

how these intrinsics are made.
8:24 - 8:29

You can ask the JVM to print out a lot of
the just-in-time compiled code.
8:29 - 8:35

You can ask the JVM to print out the
native methods as well as these intrinsics
8:35 - 8:40

and in this particular case after sifting
through about 5MB of text
8:40 - 8:45

I've come across this particular SHA1 sum
implementation.
8:45 - 8:52

This is AArch64. This is employing the
cryptographic extensions
8:52 - 8:54

in the architecture.
8:54 - 8:57

So it's essentially using the CPU
instructions which would explain why
8:57 - 9:00

it's faster. But again it's done
all this automatically.
9:00 - 9:06

This did not require any specific runes
or anything to activate.
9:08 - 9:12

We'll see a bit later on how you can
more easily find the hot spots
9:12 - 9:15

rather than sifting through a lot
of assembler.
9:15 - 9:19

I've mentioned that the cryptographic
engine is employed and again
9:19 - 9:23

this routine was generated at run
time as well.
9:23 - 9:28

This is one of the important things about
certain execution environments like Java.
9:28 - 9:31

You don't have to know everything at
compile time.
9:31 - 9:35

You know a lot more information at
run time and you can use that
9:35 - 9:37

in theory to optimise.
9:37 - 9:40

You can switch off these clever routines.
9:40 - 9:43

For instance I've got a deactivate
here and we get back to the
9:43 - 9:47

slower performance we expected.
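
[Editor's sketch] The exact option shown on the slide is not in the transcript. One way to see the effect is to time the same SHA-1 workload twice, once with default settings and once with the intrinsic disabled. The HotSpot switch assumed here is `-XX:-UseSHA1Intrinsics`; on some JDK versions it may be a diagnostic option that also needs `-XX:+UnlockDiagnosticVMOptions`. `Sha1Bench` and `hashZeros` are hypothetical names, and this is a crude timing loop, not a rigorous benchmark.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Bench {
    // Hash `megabytes` MiB of zeros `rounds` times and return the best time in nanoseconds.
    public static long hashZeros(int megabytes, int rounds) throws NoSuchAlgorithmException {
        byte[] block = new byte[1 << 20]; // 1 MiB of zeros, reused for every update
        long best = Long.MAX_VALUE;
        for (int r = 0; r < rounds; r++) {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            long start = System.nanoTime();
            for (int i = 0; i < megabytes; i++) {
                md.update(block);
            }
            md.digest();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) throws Exception {
        long nanos = hashZeros(256, 5);
        System.out.printf("best of 5: %.1f ms for 256 MiB%n", nanos / 1e6);
    }
}
```

Running `java Sha1Bench` and then `java -XX:-UseSHA1Intrinsics Sha1Bench` on the same machine should give a rough feel for the difference, if the JVM in use has the intrinsic for that architecture at all.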
9:47 - 9:53

Again, this particular set of routines is
present in OpenJDK,
9:53 - 9:57

I think for all the architectures that
support it.
9:57 - 10:01

We get this optimisation for free on X86
and others as well.
10:01 - 10:03

It works quite well.
10:03 - 10:08

That was one surprise I came across:
the intrinsics.
10:08 - 10:13

One thing I thought it would be quite
good to do would be to go through
10:13 - 10:18

a slightly more complicated example.
And use this example to explain
10:18 - 10:21

a lot of other things that happen
in the JVM as well.
10:21 - 10:24

I will spend a bit of time going through
this example
10:24 - 10:30

and explain roughly the notion of what
it's supposed to be doing.
10:33 - 10:39

This is an imaginary method that I've
contrived to demonstrate a lot of points
10:39 - 10:43

in the fewest possible lines of code.
10:43 - 10:45

I'll start with what it's meant to do.
10:45 - 10:51

This is meant to be a routine that gets a
reference to something and lets you know
10:51 - 10:56

whether or not it's an image and in a
hypothetical cache.
10:58 - 11:02

I'll start with the important thing
here the weak reference.
11:02 - 11:09

In Java and other garbage collected
languages we have the notion of references
11:09 - 11:13

Most of the time when you are running a
Java program you have something like a
11:13 - 11:19

variable name and that is in the current
execution context that is referred to as a
11:19 - 11:24

strong reference to the object. In other
words I can see it. I am using it.
11:24 - 11:27

Please don't get rid of it.
Bad things will happen if you do.
11:27 - 11:31

So the garbage collector knows
not to get rid of it.
11:31 - 11:36

In Java and other languages you also
have the notion of a weak reference.
11:36 - 11:40

This is essentially the programmer saying
to the virtual machine
11:40 - 11:44

"Look I kinda care about this but
just a little bit."
11:44 - 11:49

"If you want to get rid of it feel free
to but please let me know."
11:49 - 11:54

This is why this is for a CacheClass.
For instance the JVM in this particular
11:54 - 12:01

case could decide that it's running quite
low on memory this particular xMB image
12:01 - 12:04

has not been used for a while it can
garbage collect it.
12:04 - 12:09

The important thing is how we go about
expressing this in the language.
12:09 - 12:13

We can't just have a reference to the
object because that's a strong reference
12:13 - 12:18

and the JVM will know it can't get
rid of this because the program
12:18 - 12:19

can see it actively.
12:19 - 12:24

So we have a level of indirection which
is known as a weak reference.
12:25 - 12:29

We have this hypothetical CacheClass
that I've devised.
12:29 - 12:32

At this point it is a weak reference.
12:32 - 12:36

Then we get it. This is calling the weak
reference routine.
12:36 - 12:41

Now it becomes a strong reference so
it's not going to be garbage collected.
12:41 - 12:45

When we get to the return path it becomes
a weak reference again
12:45 - 12:48

because our strong reference
has disappeared.
12:48 - 12:51

The salient points in this example are:
12:51 - 12:54

We're employing a method to get
a reference.
12:54 - 12:57

We're checking an item to see if
it's null.
12:57 - 13:01

So let's say that the JVM decided to
garbage collect this
13:02 - 13:04

before we executed the method.
13:04 - 13:09

The weak reference class is still valid
because we've got a strong reference to it
13:09 - 13:12

but the actual object behind this is gone.
13:12 - 13:15

If we're too late and the garbage
collector has killed it
13:15 - 13:18

it will be null and we return.
13:18 - 13:22

So it's a level of indirection to see
does this still exist
13:22 - 13:28

if so can I please have it and then
operate on it as normal
13:28 - 13:31

and then return becomes weak
reference again.
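
[Editor's sketch] The contrived method from the slide is not in the transcript. The following reconstructs its shape from the description: take a weak reference, get a strong reference from it, return early on null, otherwise call a virtual method. The names `WeakCacheExample`, `CacheClass`, `isImage` and `isCachedImage` are assumptions made for illustration.

```java
import java.lang.ref.WeakReference;

public class WeakCacheExample {
    // Hypothetical cache entry; only the isImage() query matters for the example.
    public static class CacheClass {
        private final boolean image;

        public CacheClass(boolean image) {
            this.image = image;
        }

        public boolean isImage() {
            return image;
        }
    }

    public static boolean isCachedImage(WeakReference<CacheClass> ref) {
        // get() hands back a strong reference, or null if the referent was collected.
        CacheClass item = ref.get();
        if (item == null) {
            return false; // too late: the garbage collector got there first
        }
        // While `item` is live the object cannot be collected.
        return item.isImage();
        // On return the local strong reference disappears; only the weak one remains.
    }
}
```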
13:31 - 13:37

This example program is quite useful when
we look at how it's implemented in the JVM
13:37 - 13:40

and we'll go through a few things now.
13:40 - 13:44

First off we'll go through the bytecode.
13:44 - 13:49

The only point of this slide is to
show it's roughly
13:49 - 13:54

the same as this.
13:54 - 13:56

We get our variable.
13:56 - 13:59

We use our getter.
13:59 - 14:04

This bit is extra this checkcast.
The reason that bit is extra is
14:04 - 14:15

because we're using the equivalent of
a template in Java.
14:15 - 14:19

And the way that's implemented in Java is
it just basically casts everything to an
14:19 - 14:23

object so that requires extra
compiler information.
14:23 - 14:25

And this is the extra check.
14:25 - 14:31

The rest of this we load the reference,
we check to see if it is null,
14:31 - 14:35

If it's not null we invoke a virtual
function - is it the image?
14:35 - 14:38

and we return as normal.
14:38 - 14:43

Essentially the point I'm trying to make
is when we compile this to bytecode
14:43 - 14:45

this execution happens.
14:45 - 14:47

This null check happens.
14:47 - 14:48

This execution happens.
14:48 - 14:50

And we return.
14:50 - 14:55

In the actual Java class files we've not
lost anything.
14:55 - 14:58

This is what it looks like when it's
been JIT'd.
14:58 - 15:01

Now we've lost lots of things.
15:01 - 15:06

The JIT has done quite a few clever things
which I'll talk about.
15:06 - 15:11

First off if we look down here there's
a single branch here.
15:11 - 15:15

And this is only taken if our checkcast failed.
15:17 - 15:20

We've got comments on the
right hand side.
15:20 - 15:26

Our get method has been inlined so
we're no longer calling.
15:27 - 15:31

We seem to have lost our null check,
that's just gone.
15:32 - 15:36

And again we've got a get field as well.
15:36 - 15:40

That's no longer a method,
that's been inlined as well.
15:40 - 15:42

We've also got some other cute things.
15:42 - 15:46

Those more familiar with AArch64
will understand
15:46 - 15:50

that the pointers we're using
are 32bit not 64bit.
15:50 - 15:54

What we're doing is getting a pointer
and shifting it left 3
15:54 - 15:57

and widening it to a 64bit pointer.
15:57 - 16:02

We've also got 32bit pointers on a
64bit system as well.
16:02 - 16:06

So that's saving a reasonable amount
of memory and cache.
16:06 - 16:10

To summarise. We don't have any
branches or function calls
16:10 - 16:13

and we've got a lot of inlining.
16:13 - 16:16

We did have function calls in the
class file so it's the JVM;
16:16 - 16:18

it's the JIT that has done this.
16:18 - 16:22

We've got no null checks either and I'm
going to talk through this now.
16:24 - 16:29

The null check elimination is quite a
clever feature in Java and other programs.
16:30 - 16:33

The idea behind null check elimination is
16:33 - 16:37

most of the time this object is not
going to be null.
16:38 - 16:43

If this object is null the operating
system knows this quite quickly.
16:43 - 16:48

So if you try to dereference a null
pointer you'll get either a SIGSEGV or
16:48 - 16:51

a SIGBUS depending on a
few circumstances.
16:51 - 16:53

That goes straight back to the JVM
16:53 - 16:58

and the JVM knows where the null
exception took place.
16:58 - 17:02

Because it knows where it took
place it can look this up
17:02 - 17:05

and unwind it as part of an exception.
17:05 - 17:10

Those null checks just go.
Completely gone.
17:10 - 17:15

Most of the time this works and you are
saving a reasonable amount of execution.
17:16 - 17:20

I'll talk about when it doesn't work
in a second.
17:20 - 17:24

That's reasonably clever. We have similar
programming techniques in other places
17:24 - 17:28

even the Linux kernel for instance when
you copy data to and from user space
17:28 - 17:31

it does pretty much exactly
the same thing.
17:31 - 17:36

It has an exception unwind table and it
knows if it catches a page fault on
17:36 - 17:40

this particular program counter
it can deal with it because it knows
17:40 - 17:44

the program counter and it knows
conceptually what it was doing.
17:44 - 17:48

In a similar way the JIT knows what it's
doing to a reasonable degree.
17:48 - 17:52

It can handle the null check elimination.
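
[Editor's sketch] At the Java source level, the behaviour described above looks like a method with no explicit null test that still produces a well-defined `NullPointerException`. Whether a given JVM implements this with a trapped signal rather than a compare-and-branch is an implementation detail that this code cannot observe. `ImplicitNullCheck` and `lengthOrMinusOne` are hypothetical names.

```java
public class ImplicitNullCheck {
    // No `if (s == null)` appears here. The language still guarantees an NPE
    // on a null receiver, and the JIT is free to get that from a hardware
    // fault on the dereference instead of emitting a compare and branch.
    public static int lengthOrMinusOne(String s) {
        try {
            return s.length();
        } catch (NullPointerException e) {
            // The slow path: taken only when s really was null.
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOrMinusOne("abc"));
        System.out.println(lengthOrMinusOne(null));
    }
}
```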
17:53 - 17:57

I mentioned the sneaky one. We've got
essentially 32bit pointers
17:57 - 17:59

on a 64bit system.
17:59 - 18:05

Most of the time in Java people typically
specify heap size smaller than 32GB.
18:05 - 18:10

Which is perfect if you want to use 32bit
pointers and left shift 3.
18:10 - 18:13

Because that gives you 32GB of
addressable memory.
18:13 - 18:19

That's a significant memory saving because
otherwise a lot of things would double up.
18:19 - 18:23

There's a significant number of pointers
in Java.
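
[Editor's sketch] The arithmetic behind "32-bit pointers, left shift 3, 32GB" can be written out as follows. This only models the address calculation; the names are hypothetical and the heap base is treated as a plain number, which is a simplification of what a real JVM does.

```java
public class CompressedPointer {
    // Objects are assumed to be 8-byte aligned, so the low 3 address bits are always zero.
    static final int SHIFT = 3;

    // Widen a 32-bit compressed reference to a 64-bit address.
    public static long decode(long heapBase, int narrow) {
        long unsigned = narrow & 0xFFFFFFFFL; // treat the 32 bits as unsigned
        return heapBase + (unsigned << SHIFT);
    }

    // 2^32 distinct references times 8 bytes of alignment.
    public static long addressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        System.out.println(addressableBytes() / (1L << 30) + " GiB addressable");
    }
}
```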
18:23 - 18:29

The one that should make people
jump out of their seat is
18:29 - 18:32

the fact that most methods in Java are
actually virtual.
18:32 - 18:37

So what the JVM has actually done is
inlined a virtual function.
18:37 - 18:42

A virtual function is essentially a
function where you don't know where
18:42 - 18:43

you're going until run time.
18:43 - 18:47

You can have several different classes
and they share the same virtual function
18:47 - 18:51

in the base class and dependent upon
which specific class you're running
18:51 - 18:54

different virtual functions will
get executed.
18:54 - 19:00

In C++ that will be a read from a V table
and then you know where to go.
19:01 - 19:03

The JVM's inlined it.
19:03 - 19:05

We've saved a memory load.
19:05 - 19:08

We've saved a branch as well
19:08 - 19:12

The reason the JVM can inline it is
because the JVM knows
19:12 - 19:14

every single class that has been loaded.
19:14 - 19:20

So it knows that although this looks
polymorphic to the casual programmer
19:20 - 19:26

It actually is monomorphic.
The JVM knows this.
19:26 - 19:31

Because it knows this it can be clever.
And this is really clever.
19:31 - 19:35

That's a significant cost saving.
19:35 - 19:41

This is all great. I've already mentioned
the null check elimination.
19:41 - 19:47

We're taking a signal as most of you know
if we do that a lot it's going to be slow.
19:47 - 19:51

Jumping into kernel, into user,
bouncing around.
19:51 - 19:56

The JVM also has a notion of
'OK I've been a bit too clever now;
19:56 - 19:58

I need to back off a bit'
19:58 - 20:02

Also there's nothing stopping the user
loading more classes
20:02 - 20:07

and rendering the monomorphic
assumption invalid.
20:07 - 20:10

So the JVM needs to have a notion of
backpedalling and go
20:10 - 20:14

'OK I've gone too far and need to
deoptimise'
20:14 - 20:17

The JVM has the ability to deoptimise.
20:17 - 20:23

In other words it essentially knows that
for certain code paths everything's OK.
20:23 - 20:27

But for certain new objects it can't get
away with these tricks.
20:27 - 20:32

By the time the new objects are executed
they are going to be safe.
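
[Editor's sketch] The monomorphic assumption and its later invalidation can be illustrated with a call site that sees only one subclass for a while and then meets a second one. All names here are hypothetical. Whether a particular JVM actually inlines the call in the first phase and deoptimises in the second is up to that JVM; running with `-XX:+PrintCompilation` is one way to look for evidence.

```java
import java.util.Arrays;

public class MonomorphicCall {
    public abstract static class Shape {
        public abstract double area();
    }

    public static final class Square extends Shape {
        private final double side;

        public Square(double side) {
            this.side = side;
        }

        @Override
        public double area() {
            return side * side;
        }
    }

    public static final class Circle extends Shape {
        private final double radius;

        public Circle(double radius) {
            this.radius = radius;
        }

        @Override
        public double area() {
            return Math.PI * radius * radius;
        }
    }

    public static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area(); // virtual call: the target depends on the runtime class
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1000];
        Arrays.fill(shapes, new Square(2.0));
        double sink = 0;
        // Phase 1: only Square has been loaded, so area() has a single possible target.
        for (int i = 0; i < 20_000; i++) {
            sink += totalArea(shapes);
        }
        // Phase 2: Circle is loaded on first use, which breaks the one-target assumption.
        shapes[0] = new Circle(1.0);
        for (int i = 0; i < 20_000; i++) {
            sink += totalArea(shapes);
        }
        System.out.println(sink);
    }
}
```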
20:32 - 20:35

There are ramifications for this.
This is the important thing to consider
20:35 - 20:40

with something like Java and other
languages and other virtual machines.
20:40 - 20:46

If you're trying to profile this it means
there is a very significant ramification.
20:46 - 20:51

You can have the same class and
method JIT'd multiple ways
20:52 - 20:55

and executed at the same time.
20:55 - 21:00

So if you're trying to find a hot spot
the program counter alone is not enough.
21:01 - 21:04

Because you can refer to the same thing
in several different ways.
21:04 - 21:08

This is quite common as well as
deoptimisation does take place.
21:09 - 21:14

That's something to bear in mind with JVM
and similar runtime environments.
21:16 - 21:19

You can get a notion of what the JVM's
trying to do.
21:19 - 21:22

You can ask it nicely and add a print
compilation option
21:22 - 21:25

and it will tell you what it's doing.
21:25 - 21:27

This is reasonably verbose.
21:27 - 21:30

Typically what happens is the JVM gets
excited JIT'ing everything
21:30 - 21:32

and optimising everything then
it settles down.
21:32 - 21:35

Until you load something new
and it gets excited again.
21:35 - 21:38

There's a lot of logs. This is mainly
useful for debugging but
21:38 - 21:42

it gives you an appreciation that it's
doing a lot of work.
21:42 - 21:45

You can go even further with a log
compilation option.
21:45 - 21:50

That produces a lot of XML and that is
useful for people debugging the JVM as well.
21:51 - 21:54

It's quite handy to get an idea of
what's going on.
21:57 - 22:03

If that is not enough information you
also have the ability to go even further.
22:05 - 22:07

This is beyond the limit of my
understanding.
22:07 - 22:11

I've gone into this little bit just to
show you what can be done.
22:11 - 22:17

You have release builds of OpenJDK
and you have debug builds of OpenJDK.
22:17 - 22:24

The release builds will by default turn
off a lot of the diagnostic options.
22:25 - 22:28

You can switch them back on again.
22:28 - 22:33

When you do you can also gain insight
into the actual, it's colloquially
22:33 - 22:37

referred to as the C2 JIT,
the compiler there.
22:37 - 22:42

You can see, for instance, objects in
timelines and visualize them
22:42 - 22:45

as they're being optimised at various
stages and various things.
22:45 - 22:52

So this is based on a master's thesis
by Thomas Würthinger.
22:54 - 22:58

This is something you can play with as
well and see how far the optimiser goes.
23:00 - 23:03

And it's also good for people hacking
with the JVM.
23:05 - 23:08

I'll move onto some stuff we did.
23:10 - 23:16

Last year we were working on the
big data. Relatively new architecture
23:17 - 23:22

ARM64, it's called AArch64 in OpenJDK
land but ARM64 in Debian land.
23:24 - 23:27

We were a bit concerned because
everything's all shiny and new.
23:27 - 23:29

Has it been optimised correctly?
23:29 - 23:31

Are there any obvious things
we need to optimise?
23:31 - 23:34

And we're also interested because
everything was so shiny and new
23:34 - 23:35

in the whole system.
23:35 - 23:39

Not just the JVM but the glibc and
the kernel as well.
23:39 - 23:42

So how do we get a view of all of this?
23:42 - 23:49

I gave a quick talk before at the Debian
mini-conf before last [2014] about perf
23:50 - 23:53

so decided we could try and do some
clever things with Linux perf
23:53 - 23:58

and see if we could get some actual useful
debugging information out.
23:58 - 24:02

We have the flame graphs that are quite
well known.
24:02 - 24:08

We also have some previous work, Johannes
had a special perf map agent that
24:08 - 24:13

could basically hook into perf and it
would give you a nice way of running
24:13 - 24:20

perf-top for want of a better expression
and viewing the top Java function names.
24:22 - 24:25

This is really good work and it's really
good for a particular use case
24:25 - 24:29

if you just want to do a quick snap shot
once and see in that snap shot
24:30 - 24:32

where the hotspots were.
24:32 - 24:38

For a prolonged work load with all
the functions being JIT'd multiple ways
24:38 - 24:42

with the optimisation going on and
everything moving around
24:42 - 24:47

it requires a little bit more information
to be captured.
24:47 - 24:51

I decided to do a little bit of work on a
very similar thing to perf-map-agent
24:51 - 24:56

but an agent that would capture it over
a prolonged period of time.
24:56 - 24:59

Here's an example Flame graph, these are
all over the internet.
24:59 - 25:05

This is the SHA1 computation example that
I gave at the beginning.
25:05 - 25:10

As expected the VM intrinsic SHA1 is the
top one.
25:10 - 25:17

Not expected by me was this quite
significant chunk of CPU execution time.
25:17 - 25:21

And there was a significant amount of
time being spent copying memory
25:21 - 25:28

from the mmapped memory
region into a heap
25:28 - 25:31

and then that was passed to
the crypto engine.
25:31 - 25:35

So we're doing a ton of memory copies for
no good reason.
25:35 - 25:39

That essentially highlighted an example.
25:39 - 25:42

That was an assumption I made about Java
to begin with which was if you do
25:42 - 25:45

the equivalent of mmap it should just
work like mmap right?
25:45 - 25:48

You should just be able to address the
memory. That is not the case.
25:48 - 25:54

If you've got a file mapping object and
you try to address it it has to be copied
25:54 - 25:59

into safe heap memory first. And that is
what was slowing down the programs.
25:59 - 26:05

If that was omitted you could make
the SHA1 computation even quicker.
26:05 - 26:09

So that would be the logical target you
would want to optimise.
26:09 - 26:12

I wanted to extend Johannes' work
with something called a
26:12 - 26:16

Java Virtual Machine Tools Interface
profiling agent.
26:17 - 26:23

This is part of the Java Virtual Machine
standard: you can make a special library
26:23 - 26:25

and then hook this into the JVM.
26:25 - 26:28

And the JVM can expose quite a few
things to the library.
26:28 - 26:32

It exposes a reasonable amount of
information as well.
26:32 - 26:39

Perf as well has the ability to look
at map files natively.
26:40 - 26:44

If you are profiling JavaScript, or
something similar, I think the
26:44 - 26:48

Google V8 JavaScript engine will write
out a special map file that says
26:48 - 26:53

these program counter addresses correspond
to these function names.
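
[Editor's sketch] The map file perf looks for is a plain text file, conventionally `/tmp/perf-<pid>.map`, with one line per JIT'd symbol: start address and size in hex, then the symbol name. The following shows only how such a line is formatted. A real agent would be native code driven by JVM events, so `PerfMapEntry` and its methods are hypothetical illustration, not the agent described in the talk.

```java
public class PerfMapEntry {
    // Conventional location perf checks for a process with the given PID.
    public static String mapFileName(long pid) {
        return "/tmp/perf-" + pid + ".map";
    }

    // One map line: "<start-hex> <size-hex> <symbol name>", without 0x prefixes.
    public static String mapLine(long start, long size, String symbol) {
        return String.format("%x %x %s", start, size, symbol);
    }

    public static void main(String[] args) {
        // Made-up address, size and name, purely to show the layout.
        System.out.println(mapFileName(1234));
        System.out.println(mapLine(0x7f0000001000L, 0x40, "Sha1Sum.sha1Hex"));
    }
}
```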
26:53 - 26:57

I decided to use that in a similar way to
what Johannes did for the extended
26:57 - 27:02

profiling agent but I also decided to
capture some more information as well.
27:05 - 27:10

I decided to capture the disassembly
so when we run perf annotate
27:10 - 27:13

we can see the actual JVM bytecode
in our annotation.
27:13 - 27:17

We can see how it was JIT'd at the
time when it was JIT'd.
27:17 - 27:20

We can see where the hotspots were.
27:20 - 27:23

And that's good. But we can go
even better.
27:23 - 27:29

We can run an annotated trace that
contains the Java class,
27:29 - 27:34

the Java method and the bytecode all in
one place at the same time.
27:34 - 27:39

You can see everything from the JVM
at the same place.
27:39 - 27:44

This works reasonably well because the
perf interface is extremely extensible.
27:44 - 27:48

And again we can do entire
system optimisation.
27:48 - 27:52

The bits in red here are the Linux kernel.
27:52 - 27:55

Then we got into libraries.
27:55 - 27:58

And then we got into Java and more
libraries as well.
27:58 - 28:02

So we can see everything from top to
bottom in one fell swoop.
28:04 - 28:08

This is just a quick slide showing the
mechanisms employed.
28:08 - 28:12

Essentially we have this agent which is
a shared object file.
28:12 - 28:16

And this will spit out useful files here
in a standard way.
28:16 - 28:26

And the Linux perf basically just records
the perf data dump file as normal.
28:27 - 28:30

We have 2 sets of recording going on.
28:30 - 28:35

To report it it's very easy to do
normal reporting with the PID map.
28:35 - 28:41

This is just out of the box, works with
the Google V8 engine as well.
28:41 - 28:45

If you want to do very clever annotations
perf has the ability to have
28:45 - 28:48

Python scripts passed to it.
28:48 - 28:54

So you can craft quite a dodgy Python
script and that can interface
28:54 - 28:55

with the perf annotation output.
28:55 - 29:00

That's how I was able to get the extra
Java information in the same annotation.
29:00 - 29:05

And this is really easy to do; it's quite
easy to knock the script up.
29:05 - 29:10

And again the only thing we do for this
profiling is we hook in the profiling
29:10 - 29:13

agent which dumps out various things.
29:13 - 29:18

We preserve the frame pointer because
that makes things considerably easier
29:18 - 29:21

when unwinding. This will affect
performance a little bit.
29:21 - 29:26

And again when we're reporting we just
hook in a Python script.
29:26 - 29:30

It's really easy to hook everything in
and get it working.
29:33 - 29:37

At the moment we have a JVMTI agent. It's
actually on http://git.linaro.org now.
29:38 - 29:42

Since I gave this talk Google have
extended perf anyway so it will do
29:42 - 29:45

quite a lot of similar things out of the
box anyway.
29:45 - 29:50

It's worth having a look at the
latest perf.
29:50 - 29:54

These techniques in this slide deck can be
used obviously in other JITs quite easily.
29:54 - 29:59

The fact that perf is so easy to extend
with scripts can be useful
29:59 - 30:01

for other things.
30:01 - 30:06

And OpenJDK has a significant amount of
cleverness associated with it that
30:06 - 30:10

I thought was very surprising and good.
So that's what I covered in the talk.
30:13 - 30:18

These are basically references to things
like command line arguments
30:18 - 30:20

and the Flame graphs and stuff like that.
30:20 - 30:26

If anyone is interested in playing with
OpenJDK on ARM64 I'd suggest going here:
30:26 - 30:31

http://openjdk.linaro.org
Where the most recent builds are.
30:31 - 30:36

Obviously fixes are going in upstream and
they're going into distributions as well.
30:36 - 30:40

They're included in OpenJDK so it should
be good as well.
30:41 - 30:45

I've run through quite a few fundamental
things reasonably quickly.
30:45 - 30:48

I'd be happy to accept any questions
or comments
30:54 - 30:57

And if you want to talk to me privately
about Java afterwards feel free to
30:57 - 30:59

when no-one's looking.
31:07 - 31:13

[Audience] Applause
31:13 - 31:19

[Audience] It's not really a question so
much as a comment.
31:19 - 31:26

At the last mini-DebConf we had a talk about
31:27 - 31:32

using the JVM with other languages.
31:32 - 31:36

And it seems to me that all this would
apply even if you hate Java programming
31:36 - 31:39

language and want to write in, I don't
know, Lisp or something instead
31:39 - 31:42

if you've got a Lisp system that can
generate JVM bytecode.
31:42 - 31:48

[Presenter] Yeah, totally. And the other
big data language we looked at was Scala.
31:49 - 31:53

It uses the JVM back end but a completely
different language on the front.
32:04 - 32:08

Cheers guys.

Title:: Java_the_good_bits.webm
Video Language:: English
Team:: Debconf
Project:: 2016_miniconf-cambridge16
Duration:: 32:13
