
Java_the_good_bits.webm

  • Not Synced
    Onto the second talk of the day.
  • Not Synced
    Steve Capper is going to tell us
    about the good bits of Java
  • Not Synced
    They do exist
  • Not Synced
    [Audience] Could this have been a
    lightning talk? [Audience laughter]
  • Not Synced
    Believe it or not we've got some
    good stuff here.
  • Not Synced
    I was as skeptical as you guys
    when I first looked.
  • Not Synced
    First many apologies for not attending
    the mini-conf last year
  • Not Synced
    I was unfortunately ill on the day
    I was due to give this talk.
  • Not Synced
    Let me figure out how to use a computer.
  • Not Synced
    Sorry about this.
  • Not Synced
    There we go; it's because
    I've not woken up.
  • Not Synced
    Last year I worked at Linaro in the
    Enterprise group and we performed analysis
  • Not Synced
    on so-called 'Big Data' application sets.
  • Not Synced
    As many of you know quite a lot of these
    big data applications are written in Java.
  • Not Synced
    I'm from ARM and we were very interested
    in 64bit ARM support.
  • Not Synced
    So these are mainly AArch64 examples
    for things like assembler
  • Not Synced
    but most of the messages are
    pertinent for any architecture.
  • Not Synced
    These good bits are shared between
    most if not all the architectures.
  • Not Synced
    Whilst trying to optimise a lot of
    these big data applications
  • Not Synced
    I stumbled across quite a few things in
    the JVM and I thought
  • Not Synced
    'actually that's really clever;
    that's really cool'
  • Not Synced
    So I thought that would make a good
    basis for an interesting talk.
  • Not Synced
    This talk is essentially some of the
    clever things I found in the
  • Not Synced
    Java Virtual Machine; these
    optimisations are in OpenJDK.
  • Not Synced
    The source is available; it's all there,
    readily available and in play now.
  • Not Synced
    I'm going to finish with some of the
    optimisation work we did with Java.
  • Not Synced
    People who know me will know
    I'm not a Java zealot.
  • Not Synced
    I don't particularly believe in
    programming in one language over another.
  • Not Synced
    So to make it clear from the outset
    I'm not attempting to convert
  • Not Synced
    anyone to Java programming.
  • Not Synced
    I'm just going to highlight a few salient
    things in the Java Virtual Machine
  • Not Synced
    which I found to be quite clever and
    interesting
  • Not Synced
    and I'll try and talk through them
    with my understanding of them.
  • Not Synced
    Let's jump straight in and let's
    start with an example.
  • Not Synced
    This is a minimal example for
    computing a SHA1 sum of a file.
  • Not Synced
    I've omitted some of the checking at the
    beginning of the function, such as
  • Not Synced
    command line parsing and that sort of
    thing.
  • Not Synced
    I've highlighted the salient
    points in red.
  • Not Synced
    Essentially we instantiate a SHA1
    crypto message service digest.
  • Not Synced
    And we do the equivalent in
    Java of an mmap.
  • Not Synced
    Get it all in memory.
  • Not Synced
    And then we just put this data straight
    into the crypto engine.
  • Not Synced
    And eventually at the end of the
    program we'll spit out the SHA1 hash.
  • Not Synced
    It's a very simple program.
  • Not Synced
    It's basically mmap, SHA1, output
    the hash afterwards.
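    The exact source from the slide isn't reproduced in these subtitles; a
    minimal sketch along those lines, assuming the standard MessageDigest
    and FileChannel.map APIs, might look like this (it maps the whole file
    in one go, which only works for files under 2GB):

        import java.io.RandomAccessFile;
        import java.nio.MappedByteBuffer;
        import java.nio.channels.FileChannel;
        import java.security.MessageDigest;

        public class Sha1File {
            public static void main(String[] args) throws Exception {
                // Instantiate a SHA1 crypto message service digest.
                MessageDigest md = MessageDigest.getInstance("SHA-1");
                try (RandomAccessFile f = new RandomAccessFile(args[0], "r");
                     FileChannel ch = f.getChannel()) {
                    // The Java equivalent of mmap: map the file read-only.
                    MappedByteBuffer buf =
                        ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                    // Put the data straight into the crypto engine.
                    md.update(buf);
                }
                // Spit out the SHA1 hash at the end.
                StringBuilder hex = new StringBuilder();
                for (byte b : md.digest()) hex.append(String.format("%02x", b));
                System.out.println(hex);
            }
        }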
  • Not Synced
    In order to concentrate on the CPU
    aspect rather than worry about IO
  • Not Synced
    I decided to cheat a little bit by
    setting this up.
  • Not Synced
    I decided to use a sparse file. As many of
    you know, a sparse file is a file whose
  • Not Synced
    contents are not necessarily all stored
    on disc. The assumption is that the bits
  • Not Synced
    that aren't stored are zero. For instance
    on Linux you can create a 20TB sparse file
  • Not Synced
    on a 10MB file system and use it as
    normal.
  • Not Synced
    Just don't write too much to it otherwise
    you're going to run out of space.
  • Not Synced
    The idea behind using a sparse file is I'm
    just focusing on the computational aspects
  • Not Synced
    of the SHA1 sum. I'm not worried about
    the file system or anything like that.
  • Not Synced
    I don't want to worry about the IO. I
    just want to focus on the actual compute.
  • Not Synced
    In order to set up a sparse file I used
    the following runes.
  • Not Synced
    The important point is that you seek
    and the other important point
  • Not Synced
    is you set a count otherwise you'll
    fill your disc up.
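    The dd runes themselves aren't captured in the subtitles; as an
    illustration of the same idea in Java (a swapped-in technique, not the
    speaker's command), seeking far into a new file and writing a single
    byte gives you a sparse file on most Linux filesystems:

        import java.io.RandomAccessFile;

        public class MakeSparse {
            public static void main(String[] args) throws Exception {
                long size = 20L << 40;  // nominally 20TB
                try (RandomAccessFile f = new RandomAccessFile("sparse.dat", "rw")) {
                    f.seek(size - 1);   // the 'seek': jump almost to the end
                    f.write(0);         // write one byte so the length is set
                }
                // Unwritten blocks read back as zeros and take no disc space.
            }
        }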
  • Not Synced
    I decided to run this against, firstly,
    the native SHA1 sum command
  • Not Synced
    that's built into Linux, and let's
    normalise these results and say that's 1.0
  • Not Synced
    I used an older version of the OpenJDK
    and ran the Java program
  • Not Synced
    and that's 1.09 times slower than the
    reference command. That's quite good.
  • Not Synced
    Then I used the new OpenJDK, this is now
    the current JDK as this is a year on.
  • Not Synced
    And this one took 0.21. It's significantly faster.
  • Not Synced
    I've stressed that I've done nothing
    surreptitious in the Java program.
  • Not Synced
    It is mmap, compute, spit result out.
  • Not Synced
    But the OpenJDK has essentially got
    some more context information.
  • Not Synced
    I'll talk about that as we go through.
  • Not Synced
    Before, when I started with Java, I had a
    very simplistic view of Java.
  • Not Synced
    Traditionally Java is taught as a virtual
    machine that runs bytecode.
  • Not Synced
    Now when you compile a Java program it
    compiles into bytecode.
  • Not Synced
    The older versions of the Java Virtual
    Machine would interpret this bytecode
  • Not Synced
    and then run through. Newer versions
    would employ a just-in-time engine
  • Not Synced
    and try and compile this bytecode
    into native machine code.
  • Not Synced
    That is not the only thing that goes on
    when you run a Java program.
  • Not Synced
    There are some extra optimisations as well.
    So this alone would not account for
  • Not Synced
    the newer version of the SHA1
    sum being significantly faster
  • Not Synced
    than the distro supplied one.
  • Not Synced
    Java knows about context. It has a class
    library and these class libraries
  • Not Synced
    have reasonably well defined purposes.
  • Not Synced
    We have classes that provide
    crypto services.
  • Not Synced
    We have sun.misc.Unsafe, which every
    single project seems to pull into their
  • Not Synced
    project when they're not supposed to.
  • Not Synced
    These have well defined meanings.
  • Not Synced
    These do not necessarily have to be
    written in Java.
  • Not Synced
    They come as Java classes,
    they come supplied.
  • Not Synced
    But most JVMs now have a notion
    of a virtual machine intrinsic.
  • Not Synced
    And the virtual machine intrinsic says ok
    please do a SHA1 in the best possible way
  • Not Synced
    that your implementation allows. This is
    something done automatically by the JVM.
  • Not Synced
    You don't ask for it. If the JVM knows
    what it's running on and it's reasonably
  • Not Synced
    recent this will just happen
    for you for free.
  • Not Synced
    And there's quite a few classes
    that do this.
  • Not Synced
    There's quite a few clever things with
    atomics, there's crypto,
  • Not Synced
    there's mathematical routines as well.
    Most of these routines in the
  • Not Synced
    class library have a well defined notion
    of a virtual machine intrinsic
  • Not Synced
    and they do run reasonably optimally.
  • Not Synced
    They are a subject of continuous
    optimisation as well.
  • Not Synced
    We've got some runes that are
    presented on the slides here.
  • Not Synced
    These are quite useful if you
    are interested in
  • Not Synced
    how these intrinsics are made.
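    The slide's runes aren't reproduced in these subtitles; the usual
    HotSpot diagnostic flags for this are along the following lines
    (-XX:+PrintAssembly additionally needs the hsdis disassembler plugin,
    and the class name here is just the sketch from earlier):

        java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Sha1File big.img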
  • Not Synced
    You can ask the JVM to print out a lot of
    the just-in-time compiled code.
  • Not Synced
    You can ask the JVM to print out the
    native methods as well as these intrinsics
  • Not Synced
    and in this particular case after sifting
    through about 5MB of text
  • Not Synced
    I've come across this particular SHA1 sum
    implementation.
  • Not Synced
    This is AArch64. This is employing the
    cryptographic extensions
  • Not Synced
    in the architecture.
  • Not Synced
    So it's essentially using the CPU
    instructions which would explain why
  • Not Synced
    it's faster. But again it's done
    all this automatically.
  • Not Synced
    This did not require any specific runes
    or anything to activate.
  • Not Synced
    We'll see a bit later on how you can
    more easily find the hot spots
  • Not Synced
    rather than sifting through a lot
    of assembler.
  • Not Synced
    I've mentioned that the cryptographic
    engine is employed and again
  • Not Synced
    this routine was generated at run
    time as well.
  • Not Synced
    This is one of the important things about
    execution environments like Java.
  • Not Synced
    You don't have to know everything at
    compile time.
  • Not Synced
    You know a lot more information at
    run time and you can use that
  • Not Synced
    in theory to optimise.
  • Not Synced
    You can switch off these clever routines.
  • Not Synced
    For instance I've got a deactivate
    here and we get back to the
  • Not Synced
    slower performance we expected.
  • Not Synced
    Again, this particular set of routines is
    present in OpenJDK,
  • Not Synced
    I think for all the architectures that
    support it.
  • Not Synced
    We get this optimisation for free on X86
    and others as well.
  • Not Synced
    It works quite well.
  • Not Synced
    That was one surprise I came across:
    the intrinsics.
  • Not Synced
    One thing I thought it would be quite
    good to do would be to go through
  • Not Synced
    a slightly more complicated example.
    And use this example to explain
  • Not Synced
    a lot of other things that happen
    in the JVM as well.
  • Not Synced
    I will spend a bit of time going through
    this example
  • Not Synced
    and explain roughly the notion of what
    it's supposed to be doing.
  • Not Synced
    This is an imaginary method that I've
    contrived to demonstrate a lot of points
  • Not Synced
    in the fewest possible lines of code.
  • Not Synced
    I'll start with what it's meant to do.
  • Not Synced
    This is meant to be a routine that gets a
    reference to something and lets you know
  • Not Synced
    whether or not it's an image in a
    hypothetical cache.
  • Not Synced
    I'll start with the important thing
    here the weak reference.
  • Not Synced
    In Java and other garbage collected
    languages we have the notion of references
  • Not Synced
    Most of the time when you are running a
    Java program you have something like a
  • Not Synced
    variable name, and if that is in the current
    execution context it is referred to as a
  • Not Synced
    strong reference to the object. In other
    words I can see it. I am using it.
  • Not Synced
    Please don't get rid of it.
    Bad things will happen if you do.
  • Not Synced
    So the garbage collector knows
    not to get rid of it.
  • Not Synced
    In Java and other languages you also
    have the notion of a weak reference.
  • Not Synced
    This is essentially the programmer saying
    to the virtual machine
  • Not Synced
    "Look I kinda care about this but
    just a little bit."
  • Not Synced
    "If you want to get rid of it feel free
    to but please let me know."
  • Not Synced
    This is why this is for a CacheClass.
    For instance the JVM in this particular
  • Not Synced
    case could decide that it's running quite
    low on memory, this particular xMB image
  • Not Synced
    has not been used for a while, so it can
    garbage collect it.
  • Not Synced
    The important thing is how we go about
    expressing this in the language.
  • Not Synced
    We can't just have a reference to the
    object because that's a strong reference
  • Not Synced
    and the JVM will know it can't get
    rid of this because the program
  • Not Synced
    can see it actively.
  • Not Synced
    So we have a level of indirection which
    is known as a weak reference.
  • Not Synced
    We have this hypothetical CacheClass
    that I've devised.
  • Not Synced
    At this point it is a weak reference.
  • Not Synced
    Then we get it. This is calling the weak
    reference routine.
  • Not Synced
    Now it becomes a strong reference so
    it's not going to be garbage collected.
  • Not Synced
    When we get to the return path it becomes
    a weak reference again
  • Not Synced
    because our strong reference
    has disappeared.
  • Not Synced
    The salient points in this example are:
  • Not Synced
    We're employing a method to get
    a reference.
  • Not Synced
    We're checking an item to see if
    it's null.
  • Not Synced
    So let's say that the JVM decided to
    garbage collect this
  • Not Synced
    before we executed the method.
  • Not Synced
    The weak reference class is still valid
    because we've got a strong reference to it
  • Not Synced
    but the actual object behind this is gone.
  • Not Synced
    If we're too late and the garbage
    collector has killed it
  • Not Synced
    it will be null and we return.
  • Not Synced
    So it's a level of indirection to see:
    does this still exist?
  • Not Synced
    If so, can I please have it, and then
    operate on it as normal,
  • Not Synced
    and then on return it becomes a weak
    reference again.
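    The method itself isn't quoted in the subtitles; a hedged
    reconstruction of that kind of routine, with made-up class and method
    names, looks roughly like this:

        import java.lang.ref.WeakReference;

        class CacheClass {
            boolean isImage() { return true; }
        }

        class Cache {
            // Hypothetical names; the point is the get() / null-check pattern.
            static boolean isImageInCache(WeakReference<CacheClass> ref) {
                CacheClass c = ref.get();   // weak -> strong reference
                if (c == null) {
                    return false;           // too late, already collected
                }
                return c.isImage();         // strong reference lives only here
            }
        }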
  • Not Synced
    This example program is quite useful when
    we look at how it's implemented in the JVM
  • Not Synced
    and we'll go through a few things now.
  • Not Synced
    First off we'll go through the bytecode.
  • Not Synced
    The only point of this slide is to
    show it's roughly
  • Not Synced
    the same as this.
  • Not Synced
    We get our variable.
  • Not Synced
    We use our getter.
  • Not Synced
    This bit is extra: this checkcast.
    The reason that bit is extra is
  • Not Synced
    because we're using the equivalent of
    a template in Java.
  • Not Synced
    And the way that's implemented in Java is
    it just basically casts everything to an
  • Not Synced
    object so that requires extra
    compiler information.
  • Not Synced
    And this is the extra check.
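    For illustration (not from the slides): generics are erased, so the
    getter effectively returns Object and javac inserts the checkcast when
    the value is used at its declared type. Using the hypothetical
    CacheClass from the sketch above:

        static CacheClass cached(java.lang.ref.WeakReference<CacheClass> ref) {
            // Reference.get() is erased to return java.lang.Object, so the
            // compiler emits a checkcast to CacheClass here.
            return ref.get();
        }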
  • Not Synced
    The rest of this we load the reference,
    we check to see if it is null,
  • Not Synced
    If it's not null we invoke a virtual
    function - is it the image?
  • Not Synced
    and we return as normal.
  • Not Synced
    Essentially the point I'm trying to make
    is when we compile this to bytecode
  • Not Synced
    this execution happens.
  • Not Synced
    This null check happens.
  • Not Synced
    This execution happens.
  • Not Synced
    And we return.
  • Not Synced
    In the actual Java class files we've not
    lost anything.
  • Not Synced
    This is what it looks like when it's
    been JIT'd.
  • Not Synced
    Now we've lost lots of things.
  • Not Synced
    The JIT has done quite a few clever things
    which I'll talk about.
  • Not Synced
    First off if we look down here there's
    a single branch here.
  • Not Synced
    And this is only if our checkcast failed.
  • Not Synced
    We've got comments on the
    right hand side.
  • Not Synced
    Our get method has been inlined so
    we're no longer calling it.
  • Not Synced
    We seem to have lost our null check,
    that's just gone.
  • Not Synced
    And again we've got a get field as well.
  • Not Synced
    That's no longer a method,
    that's been inlined as well.
  • Not Synced
    We've also got some other cute things.
  • Not Synced
    Those more familiar with AArch64
    will understand
  • Not Synced
    that the pointers we're using
    are 32bit not 64bit.
  • Not Synced
    What we're doing is getting a pointer
    and shifting it left 3
  • Not Synced
    and widening it to a 64bit pointer.
  • Not Synced
    We've also got 32bit pointers on a
    64bit system as well.
  • Not Synced
    So that's saving a reasonable amount
    of memory and cache.
  • Not Synced
    To summarise. We don't have any
    branches or function calls
  • Not Synced
    and we've got a lot of inlining.
  • Not Synced
    We did have function calls in the
    class file so it's the JVM;
  • Not Synced
    it's the JIT that has done this.
  • Not Synced
    We've got no null checks either and I'm
    going to talk through this now.
  • Not Synced
    The null check elimination is quite a
    clever feature in Java and other programs.
  • Not Synced
    The idea behind null check elimination is
  • Not Synced
    most of the time this object is not
    going to be null.
  • Not Synced
    If this object is null the operating
    system knows this quite quickly.
  • Not Synced
    So if you try to dereference a null
    pointer you'll get either a SIGSEGV or
  • Not Synced
    a SIGBUS depending on a
    few circumstances.
  • Not Synced
    That goes straight back to the JVM
  • Not Synced
    and the JVM knows where the null
    exception took place.
  • Not Synced
    Because it knows where it took
    place it can look this up
  • Not Synced
    and unwind it as part of an exception.
  • Not Synced
    Those null checks just go.
    Completely gone.
  • Not Synced
    Most of the time this works and you are
    saving a reasonable amount of execution.
  • Not Synced
    I'll talk about when it doesn't work
    in a second.
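    A concrete illustration (not from the talk) of where this applies:

        static int length(String s) {
            // The JVM must throw NullPointerException if s is null, but the
            // JIT emits no compare-and-branch here: it just loads through s,
            // and if s is null the resulting SIGSEGV is turned back into the
            // exception by the runtime's unwind information.
            return s.length();
        }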
  • Not Synced
    That's reasonably clever. We have similar
    programming techniques in other places
  • Not Synced
    even the Linux kernel for instance when
    you copy data to and from user space
  • Not Synced
    it does pretty much exactly
    the same thing.
  • Not Synced
    It has an exception unwind table and it
    knows if it catches a page fault on
  • Not Synced
    this particular program counter
    it can deal with it because it knows
  • Not Synced
    the program counter and it knows
    conceptually what it was doing.
  • Not Synced
    In a similar way the JIT knows what it's
    doing to a reasonable degree.
  • Not Synced
    It can handle the null check elimination.
  • Not Synced
    I mentioned the sneaky one. We've got
    essentially 32bit pointers
  • Not Synced
    on a 64bit system.
  • Not Synced
    Most of the time in Java people typically
    specify a heap size smaller than 32GB,
  • Not Synced
    which is perfect if you want to use 32bit
    pointers and left shift 3,
  • Not Synced
    because that gives you 32GB of
    addressable memory.
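    Spelled out: 2^32 distinct 32bit values, each scaled by the 8-byte
    object alignment (the left shift by 3), covers 2^35 bytes = 32GB of
    heap. In HotSpot this scheme is known as compressed oops.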
  • Not Synced
    That's a significant memory saving because
    otherwise a lot of things would double up.
  • Not Synced
    There's a significant number of pointers
    in Java.
  • Not Synced
    The one that should make people
    jump out of their seat is
  • Not Synced
    the fact that most methods in Java are
    actually virtual.
  • Not Synced
    So what the JVM has actually done is
    inlined a virtual function.
  • Not Synced
    A virtual function is essentially a
    function where you don't know where
  • Not Synced
    you're going until run time.
  • Not Synced
    You can have several different classes
    and they share the same virtual function
  • Not Synced
    in the base class and dependent upon
    which specific class you're running
  • Not Synced
    different virtual functions will
    get executed.
  • Not Synced
    In C++ that will be a read from a vtable
    and then you know where to go.
  • Not Synced
    The JVM's inlined it.
  • Not Synced
    We've saved a memory load.
  • Not Synced
    We've saved a branch as well
  • Not Synced
    The reason the JVM can inline it is
    because the JVM knows
  • Not Synced
    every single class that has been loaded.
  • Not Synced
    So it knows that although this looks
    polymorphic to the casual programmer
  • Not Synced
    It actually is monomorphic.
    The JVM knows this.
  • Not Synced
    Because it knows this it can be clever.
    And this is really clever.
  • Not Synced
    That's a significant cost saving.
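    A small illustration of the idea (not from the talk): the call below
    looks polymorphic, but while only one implementing class has ever been
    loaded the JIT can treat the call site as monomorphic and inline it;
    loading a second subclass later forces it to deoptimise.

        class Shape { double area() { return 0.0; } }

        class Circle extends Shape {
            double r;
            Circle(double r) { this.r = r; }
            @Override double area() { return Math.PI * r * r; }
        }

        class Sum {
            static double total(Shape[] shapes) {
                double sum = 0.0;
                for (Shape s : shapes) {
                    sum += s.area();   // virtual call, inlined while monomorphic
                }
                return sum;
            }
        }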
  • Not Synced
    This is all great. I've already mentioned
    the null check elimination.
  • Not Synced
    We're taking a signal as most of you know
    if we do that a lot it's going to be slow.
  • Not Synced
    Jumping into kernel, into user,
    bouncing around.
  • Not Synced
    The JVM also has a notion of
    'OK I've been a bit too clever now;
  • Not Synced
    I need to back off a bit'
  • Not Synced
    Also there's nothing stopping the user
    loading more classes
  • Not Synced
    and rendering the monomorphic
    assumption invalid.
  • Not Synced
    So the JVM needs to have a notion of
    backpedalling and go
  • Not Synced
    'Ok I've gone too far and need to
    deoptimise'
  • Not Synced
    The JVM has the ability to deoptimise.
  • Not Synced
    In other words it essentially knows that
    for certain code paths everything's OK.
  • Not Synced
    But for certain new objects it can't get
    away with these tricks.
  • Not Synced
    By the time the new objects are executed
    they are going to be safe.
  • Not Synced
    There are ramifications for this.
    This is the important thing to consider
  • Not Synced
    with something like Java and other
    languages and other virtual machines.
  • Not Synced
    If you're trying to profile this it means
    there is a very significant ramification.
  • Not Synced
    You can have the same class and
    method JIT'd multiple ways
  • Not Synced
    and executed at the same time.
  • Not Synced
    So if you're trying to find a hot spot
    the program counter alone is not enough.
  • Not Synced
    Because you can refer to the same thing
    in several different ways.
  • Not Synced
    This is quite common as well, as
    deoptimisation does take place.
  • Not Synced
    That's something to bear in mind with JVM
    and similar runtime environments.
  • Not Synced
    You can get a notion of what the JVM's
    trying to do.
  • Not Synced
    You can ask it nicely and add a print
    compilation option
  • Not Synced
    and it will tell you what it's doing.
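    That option is HotSpot's -XX:+PrintCompilation flag; a typical
    invocation, with MyApplication as a placeholder class name, is:

        java -XX:+PrintCompilation MyApplication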
  • Not Synced
    This is reasonably verbose.
  • Not Synced
    Typically what happens is the JVM gets
    excited JIT'ing everything
  • Not Synced
    and optimising everything then
    it settles down.
  • Not Synced
    Until you load something new
    and it gets excited again.
  • Not Synced
    There's a lot of logs. This is mainly
    useful for debugging but
  • Not Synced
    it gives you an appreciation that it's
    doing a lot of work.
  • Not Synced
    You can go even further with a log
    compilation option.
  • Not Synced
    That produces a lot of XML and that is
    useful for people debugging the JVM as well.
  • Not Synced
    It's quite handy to get an idea of
    what's going on.
  • Not Synced
    If that is not enough information you
    also have the ability to go even further.
  • Not Synced
    This is beyond the limit of my
    understanding.
  • Not Synced
    I've gone into this little bit just to
    show you what can be done.
  • Not Synced
    There are release builds of OpenJDK
    and there are debug builds of OpenJDK.
  • Not Synced
    The release builds will by default turn
    off a lot of the diagnostic options.
  • Not Synced
    You can switch them back on again.
  • Not Synced
    When you do, you can also gain insight
    into the actual compiler there,
  • Not Synced
    which is colloquially referred to
    as the C2 JIT.
  • Not Synced
    You can see, for instance, objects in
    timelines and visualize them
  • Not Synced
    as they're being optimised at various
    stages and various things.
  • Not Synced
    So this is based on a master's thesis
    by Thomas Würthinger.
  • Not Synced
    This is something you can play with as
    well and see how far the optimiser goes.
  • Not Synced
    And it's also good for people hacking
    with the JVM.
  • Not Synced
    I'll move onto some stuff we did.
  • Not Synced
    Last year we were working on big data
    on a relatively new architecture:
  • Not Synced
    ARM64. It's called AArch64 in OpenJDK
    land but ARM64 in Debian land.
  • Not Synced
    We were a bit concerned because
    everything's all shiny and new.
  • Not Synced
    Has it been optimised correctly?
  • Not Synced
    Are there any obvious things
    we need to optimise?
  • Not Synced
    And we're also interested because
    everything was so shiny and new
  • Not Synced
    in the whole system.
  • Not Synced
    Not just the JVM but the glibc and
    the kernel as well.
  • Not Synced
    So how do we get a view of all of this?
  • Not Synced
    I gave a quick talk before at the Debian
    mini-conf before last [2014] about perf
  • Not Synced
    so decided we could try and do some
    clever things with Linux perf
  • Not Synced
    and see if we could get some actual useful
    debugging information out.
  • Not Synced
    We have the flame graphs that are quite
    well known.
  • Not Synced
    We also have some previous work: Johannes
    had a special perf-map-agent that
  • Not Synced
    could basically hook into perf and it
    would give you a nice way of running
  • Not Synced
    perf-top for want of a better expression
    and viewing the top Java function names.
  • Not Synced
    This is really good work and it's really
    good for a particular use case
  • Not Synced
    if you just want to do a quick snap shot
    once and see in that snap shot
  • Not Synced
    where the hotspots were.
  • Not Synced
    For a prolonged work load with all
    the functions being JIT'd multiple ways
  • Not Synced
    with the optimisation going on and
    everything moving around
  • Not Synced
    it requires a little bit more information
    to be captured.
  • Not Synced
    I decided to do a little bit of work on a
    very similar thing to perf-map-agent
  • Not Synced
    but an agent that would capture it over
    a prolonged period of time.
  • Not Synced
    Here's an example Flame graph, these are
    all over the internet.
  • Not Synced
    This is the SHA1 computation example that
    I gave at the beginning.
  • Not Synced
    As expected the VM intrinsic SHA1 is the
    top one.
  • Not Synced
    Not expected by me was this quite
    significant chunk of CPU execution time.
  • Not Synced
    And there was a significant amount of
    time being spent copying memory
  • Not Synced
    from the mmapped memory
    region into a heap
  • Not Synced
    and then that was passed to
    the crypto engine.
  • Not Synced
    So we're doing a ton of memory copies for
    no good reason.
  • Not Synced
    That essentially highlighted an example.
  • Not Synced
    That was an assumption I made about Java
    to begin with which was if you do
  • Not Synced
    the equivalent of mmap it should just
    work like mmap right?
  • Not Synced
    You should just be able to address the
    memory. That is not the case.
  • Not Synced
    If you've got a file mapping object and
    you try to address it it has to be copied
  • Not Synced
    into safe heap memory first. And that is
    what was slowing down the programs.
  • Not Synced
    If that was omitted you could make
    the SHA1 computation even quicker.
  • Not Synced
    So that would be the logical target you
    would want to optimise.
  • Not Synced
    I wanted to extend Johannes' work
    with something called a
  • Not Synced
    Java Virtual Machine Tools Interface
    profiling agent.
  • Not Synced
    This is part of the Java Virtual Machine
    standard; you can make a special library
  • Not Synced
    and then hook this into the JVM.
  • Not Synced
    And the JVM can expose quite a few
    things to the library.
  • Not Synced
    It exposes a reasonable amount of
    information as well.
  • Not Synced
    Perf as well has the ability to look
    at map files natively.
  • Not Synced
    If you are profiling JavaScript, or
    something similar, I think the
  • Not Synced
    Google V8 JavaScript engine will write
    out a special map file that says
  • Not Synced
    these program counter addresses correspond
    to these function names.
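    The conventional format of those map files (/tmp/perf-<pid>.map) is one
    line per JIT'd symbol: start address and size in hex, then the name.
    With made-up addresses and names, it looks like:

        7f3a2c001000 200 Lcom/example/Sha1File;::main
        7f3a2c004000 180 Ljava/security/MessageDigest;::update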
  • Not Synced
    I decided to use that in a similar way to
    what Johannes did for the extended
  • Not Synced
    profiling agent but I also decided to
    capture some more information as well.
  • Not Synced
    I decided to capture the disassembly
    so when we run perf annotate
  • Not Synced
    we can see the actual JVM bytecode
    in our annotation.
  • Not Synced
    We can see how it was JIT'd at the
    time when it was JIT'd.
  • Not Synced
    We can see where the hotspots were.
  • Not Synced
    And that's good. But we can go
    even better.
  • Not Synced
    We can run an annotated trace that
    contains the Java class,
  • Not Synced
    the Java method and the bytecode all in
    one place at the same time.
  • Not Synced
    You can see everything from the JVM
    at the same place.
  • Not Synced
    This works reasonably well because the
    perf interface is extremely extensible.
  • Not Synced
    And again we can do entire
    system optimisation.
  • Not Synced
    The bits in red here are the Linux kernel.
  • Not Synced
    Then we got into libraries.
  • Not Synced
    And then we got into Java and more
    libraries as well.
  • Not Synced
    So we can see everything from top to
    bottom in one fell swoop.
  • Not Synced
    This is just a quick slide showing the
    mechanisms employed.
  • Not Synced
    Essentially we have this agent which is
    a shared object file.
  • Not Synced
    And this will spit out useful files here
    in a standard way.
  • Not Synced
    And the Linux perf basically just records
    the perf data dump file as normal.
  • Not Synced
    We have 2 sets of recording going on.
  • Not Synced
    To report it it's very easy to do
    normal reporting with the PID map.
  • Not Synced
    This is just out of the box, works with
    the Google V8 engine as well.
  • Not Synced
    If you want to do very clever annotations
    perf has the ability to have
  • Not Synced
    Python scripts passed to it.
  • Not Synced
    So you can craft quite a dodgy Python
    script and that can interface
  • Not Synced
    with the perf annotation output.
  • Not Synced
    That's how I was able to get the extra
    Java information in the same annotation.
  • Not Synced
    And this is really easy to do; it's quite
    easy to knock the script up.
  • Not Synced
    And again the only thing we do for this
    profiling is we hook in the profiling
  • Not Synced
    agent which dumps out various things.
  • Not Synced
    We preserve the frame pointer because
    that makes things considerably easier
  • Not Synced
    when unwinding. This will affect
    performance a little bit.
  • Not Synced
    And again when we're reporting we just
    hook in a Python script.
  • Not Synced
    It's really easy to hook everything in
    and get it working.
  • Not Synced
    At the moment we have a JVMTI agent. It's
    actually on http://git.linaro.org now.
  • Not Synced
    Since I gave this talk Google have
    extended perf anyway so it will do
  • Not Synced
    quite a lot of similar things out of the
    box anyway.
  • Not Synced
    It's worth having a look at the
    latest perf.
  • Not Synced
    These techniques in this slide deck can be
    used obviously in other JITs quite easily.
  • Not Synced
    The fact that perf is so easy to extend
    with scripts can be useful
  • Not Synced
    for other things.
  • Not Synced
    And OpenJDK has a significant amount of
    cleverness associated with it that
  • Not Synced
    I thought was very surprising and good.
    So that's what I covered in the talk.
  • Not Synced
    These are basically references to things
    like command line arguments
  • Not Synced
    and the Flame graphs and stuff like that.
  • Not Synced
    If anyone is interested in playing with
    OpenJDK on ARM64 I'd suggest going here:
  • Not Synced
    http://openjdk.linaro.org
    Where the most recent builds are.
  • Not Synced
    Obviously fixes are going in upstream and
    they're going into distributions as well.
  • Not Synced
    They're included in OpenJDK so it should
    be good as well.
  • Not Synced
    I've run through quite a few fundamental
    things reasonably quickly.
  • Not Synced
    I'd be happy to accept any questions
    or comments
  • Not Synced
    And if you want to talk to me privately
    about Java afterwards feel free to
  • Not Synced
    when no-one's looking.
  • Not Synced
    [Applause]
  • Not Synced
    [Audience] It's not really a question so
    much as a comment.
  • Not Synced
    Last mini-DebConf we had a talk about
    using the JVM with other languages.
  • Not Synced
    And it seems to me that all this would
    apply even if you hate Java programming
  • Not Synced
    language and want to write in, I don't
    know, lisp or something instead
  • Not Synced
    if you've got a lisp system that can
    generate JVM bytecode.
  • Not Synced
    Yeah, totally. And the other
    big data language we looked at was Scala.
  • Not Synced
    It uses the JVM back end but a completely
    different language on the front.
  • Not Synced
    Cheers guys.
Title:
Java_the_good_bits.webm
Video Language:
English
Team:
Debconf
Project:
2016_miniconf-cambridge16
Duration:
32:13
