
Java_the_good_bits.webm

  • Not Synced
    Onto the second talk
  • Not Synced
    Steve Capper is going to tell us
    about the good bits of Java
  • Not Synced
    They do exist
  • Not Synced
    [Audience] Could this have been a
    lightning talk? [Audience laughter]
  • Not Synced
    Believe it or not we've got some
    good stuff here.
  • Not Synced
    I was as skeptical as you guys
    when I first looked.
  • Not Synced
    First apologies for not attending this
    mini-conf last year
  • Not Synced
    I was unfortunately ill on the day
    I was due to give this talk.
  • Not Synced
    Let me figure out how to use a computer.
  • Not Synced
    Sorry about this.
  • Not Synced
    There we go; it's because
    I've not woken up.
  • Not Synced
    Last year I worked at Linaro in the
    Enterprise group and we performed analysis
  • Not Synced
    on 'Big Data' applications sets.
  • Not Synced
    As many of you know quite a lot of these
    big data applications are written in Java.
  • Not Synced
    I'm from ARM and we were very interested
    in 64bit ARM support.
  • Not Synced
    So this is mainly AArch64 examples
    for things like assembler
  • Not Synced
    but most of the messages are
    pertinent for any architecture.
  • Not Synced
    These good bits are shared between
    most if not all the architectures.
  • Not Synced
    Whilst trying to optimise a lot of
    these big data applications
  • Not Synced
    I stumbled across quite a few things in
    the JVM and I thought
  • Not Synced
    'actually that's really clever;
    that's really cool'
  • Not Synced
    So I thought that would make a good
    basis for a talk.
  • Not Synced
    This talk is essentially some of the
    clever things I found in the
  • Not Synced
    Java Virtual Machine; these
    optimisations are in OpenJDK.
  • Not Synced
    The source is available; it's all there,
    readily available and in play now.
  • Not Synced
    I'm going to finish with some of the
    optimisation work we did with Java.
  • Not Synced
    People who know me will know
    I'm not a Java zealot.
  • Not Synced
    I don't particularly believe in
    programming in a language over another one
  • Not Synced
    So to make it clear from the outset
    I'm not attempting to convert
  • Not Synced
    anyone into a Java programmer.
  • Not Synced
    I'm just going to highlight a few salient
    things in the Java Virtual Machine
  • Not Synced
    which I found to be quite clever and
    interesting
  • Not Synced
    and I'll try and talk through them
    with my understanding of them.
  • Not Synced
    Let's jump straight in and let's
    start with an example.
  • Not Synced
    This is a minimal example for
    computing a SHA1 sum of a file.
  • Not Synced
    I've elided some of the checking at the
    beginning of the function, such as
  • Not Synced
    command line parsing and that sort of
    thing.
  • Not Synced
    I've highlighted the salient points in red.
  • Not Synced
    Essentially we instantiate a SHA1
    crypto message digest service.
  • Not Synced
    And we do the equivalent in
    Java of an mmap.
  • Not Synced
    Get it all in memory.
  • Not Synced
    And then we just pass this data straight
    into the crypto engine.
  • Not Synced
    And eventually at the end of the
    program we'll spit out the SHA1 hash.
  • Not Synced
    It's a very simple program.
  • Not Synced
    It's basically mmap, SHA1, then output
    the hash afterwards.
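The slide with the program is not reproduced in this transcript. As a rough sketch of the same shape (map the file, feed it to a SHA-1 digest, print the hash), something like the following would do. The class name `Sha1Sketch`, the method `sha1Hex` and the 1GB chunk size are assumptions made for illustration, not the speaker's code. The chunking is there because a single `FileChannel.map` call cannot map more than `Integer.MAX_VALUE` bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: mmap a file and compute its SHA-1 sum.
class Sha1Sketch {
    // Assumed chunk size; each mapping must stay below Integer.MAX_VALUE bytes.
    private static final long CHUNK = 1L << 30;

    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            long pos = 0;
            while (pos < size) {
                long len = Math.min(size - pos, CHUNK);
                // The Java equivalent of an mmap of this region of the file.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // Hand the mapped bytes straight to the digest.
                md.update(buf);
                pos += len;
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha1Hex(Paths.get(args[0])));
    }
}
```

Nothing in this sketch asks for a fast SHA-1 implementation; it only uses the standard class library.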
  • Not Synced
    In order to concentrate on the CPU
    aspect rather than worry about IO
  • Not Synced
    I decided to cheat a little by
    setting this up.
  • Not Synced
    I decided to use a sparse file. As many of
    you know a sparse file is a file where not
  • Not Synced
    all the contents are necessarily stored
    on disc. The assumption is that the bits
  • Not Synced
    that aren't stored are zero. For instance
    on Linux you can create a 20TB sparse file
  • Not Synced
    on a 10MB file system and use it as
    normal.
  • Not Synced
    Just don't write too much to it otherwise
    you're going to run out of space.
  • Not Synced
    The idea behind using a sparse file is I'm
    just focusing on the computational aspects
  • Not Synced
    of the SHA1 sum. I'm not worried about
    the file system or anything like that.
  • Not Synced
    I don't want to worry about the IO. I
    just want to focus on the actual compute.
  • Not Synced
    In order to set up a sparse file I used
    the following runes.
  • Not Synced
    The important point is that you seek
    and the other important point
  • Not Synced
    is you set a count otherwise you'll fill your disc up.
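The commands on the slide are not captured in this transcript. Purely as an illustration of the same idea from Java, a file can be extended to a given length without writing any data; on Linux file systems that support sparse files the unwritten range usually takes no disc space, though that depends on the file system. The class `SparseFileSketch` and the method `createSparse` are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: create a large file of zeros without writing its contents.
class SparseFileSketch {
    static void createSparse(Path file, long sizeBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Extends the file to the requested length; no data blocks are written here.
            // Whether the result is actually sparse on disc is up to the file system.
            raf.setLength(sizeBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SparseFileSketch <path> <size in bytes>
        createSparse(Paths.get(args[0]), Long.parseLong(args[1]));
    }
}
```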
  • Not Synced
    I decided to run this, firstly, against
    the native sha1sum command
  • Not Synced
    that's built into Linux; let's normalise these results and say that's 1.0.
  • Not Synced
    I used an older version of OpenJDK
    and ran the Java program
  • Not Synced
    and that took 1.09 times as long as the
    reference command. That's quite good.
  • Not Synced
    Then I used the new OpenJDK, this is now
    the current JDK as this is a year on.
  • Not Synced
    And that came in at 0.21. It's significantly faster.
  • Not Synced
    I'd stress that I've done nothing
    surreptitious in the Java program.
  • Not Synced
    It is mmap, compute, spit result out.
  • Not Synced
    But OpenJDK has essentially got
    some more context information.
  • Not Synced
    I'll talk about that as we go through.
  • Not Synced
    Before, when I started with Java, I had a very
    simplistic view of Java.
  • Not Synced
    Traditionally Java is taught as a virtual
    machine that runs byte code.
  • Not Synced
    Now when you compile a Java program it
    compiles into byte code.
  • Not Synced
    The older versions of the Java Virtual
    Machine would interpret this byte code
  • Not Synced
    and then run through. Newer versions would
    employ a just-in-time engine and try and
  • Not Synced
    compile this byte code into native machine code.
  • Not Synced
    That is not the only thing that goes on
    when you run a Java program.
  • Not Synced
    There are some extra optimisations as well.
    So this alone would not account for
  • Not Synced
    the newer version of the SHA1
    sum being significantly faster
  • Not Synced
    than the distro-supplied one.
  • Not Synced
    Java knows about context. It has a class
    library and these class libraries
  • Not Synced
    have reasonably well defined purposes.
  • Not Synced
    We have classes that provide
    crypto services.
  • Not Synced
    We have sun.misc.Unsafe, which every
    single project seems to pull into their
  • Not Synced
    project when they're not supposed to.
  • Not Synced
    These have well defined meanings.
  • Not Synced
    These do not necessarily have to be
    written in Java.
  • Not Synced
    They come as Java classes,
    they come supplied.
  • Not Synced
    But most JVMs now have a notion
    of a virtual machine intrinsic
  • Not Synced
    And the virtual machine intrinsic says ok
    please do a SHA1 in the best possible way
  • Not Synced
    that your implementation allows. This is
    something done automatically by the JVM.
  • Not Synced
    You don't ask for it. If the JVM knows
    what it's running on and it's reasonably
  • Not Synced
    recent this will just happen
    for you for free.
  • Not Synced
    And there's quite a few classes
    that do this.
  • Not Synced
    There's quite a few clever things with
    atomics, there's crypto,
  • Not Synced
    there's mathematical routines as well.
    Most of these routines in the
  • Not Synced
    class library have a well defined notion
    of a virtual machine intrinsic
  • Not Synced
    and they do run reasonably optimally.
  • Not Synced
    They are a subject of continuous
    optimisation as well.
  • Not Synced
    We've got some runes that are
    presented on the slides here.
  • Not Synced
    These are quite useful if you
    are interested in
  • Not Synced
    how these intrinsics are made.
  • Not Synced
    You can ask the JVM to print out a lot of
    the just-in-time compiled code.
  • Not Synced
    You can ask the JVM to print out the
    native methods as well as these intrinsics
  • Not Synced
    and in this particular case after sifting
    through about 5MB of text
  • Not Synced
    I've come across this particular SHA1 sum
    implementation.
  • Not Synced
    This is AArch64. This is employing the
    cryptographic extensions
  • Not Synced
    in the architecture. So it's essentially
    using the CPU instructions which
  • Not Synced
    would explain why it's faster. But again
    it's done all this automatically.
  • Not Synced
    This did not require any specific runes
    or anything to activate.
  • Not Synced
    We'll see a bit later on how you can
    more easily find the hot spots
  • Not Synced
    rather than sifting through a lot
    of assembler.
  • Not Synced
    I've mentioned that the cryptographic
    engine is employed and again
  • Not Synced
    this routine was generated at run
    time as well.
  • Not Synced
    This is one of the important things about
    certain execution environments like Java.
  • Not Synced
    You don't have to know everything at
    compile time.
  • Not Synced
    You know a lot more information at
    run time and you can use that
  • Not Synced
    in theory to optimise.
  • Not Synced
    You can switch off these clever routines.
  • Not Synced
    For instance I've got a deactivate
    here and we get back to the
  • Not Synced
    slower performance we expected.
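The exact option used on the slide is not in this transcript. On HotSpot-based JVMs the SHA-1 intrinsic is, to my knowledge, governed by a flag named `UseSHA1Intrinsics`, which can be turned off with `-XX:-UseSHA1Intrinsics`; treat that name as an assumption and check it against your own JDK. A minimal sketch for asking a running JVM what a flag is set to, using the hypothetical names `VmFlagSketch` and `flagValue`:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: report the value of a HotSpot VM option at run time.
class VmFlagSketch {
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable"; // not a HotSpot-style JVM
            }
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when this JVM does not know the option at all.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        // Assumed flag name; older or non-HotSpot JVMs may not have it.
        System.out.println("UseSHA1Intrinsics = " + flagValue("UseSHA1Intrinsics"));
    }
}
```

Running it once normally and once with the flag switched off shows whether the option took effect before comparing timings.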
  • Not Synced
    Again, this particular set of routines is
    present in OpenJDK,
  • Not Synced
    I think for all the architectures that support it.
  • Not Synced
    We get this optimisation for free on X86
    and others as well.
  • Not Synced
    It works quite well.
  • Not Synced
    That was one surprise I came across:
    the intrinsics.
  • Not Synced
    One thing I thought it would be quite
    good to do would be to go through
  • Not Synced
    a slightly more complicated example.
    And use this example to explain
  • Not Synced
    a lot of other things that happen
    in the JVM as well.
  • Not Synced
    I will spend a bit of time going through
    this example
  • Not Synced
    and explain roughly the notion of what
    it's supposed to be doing.
  • Not Synced
    This is an imaginary method that I've
    contrived to demonstrate a lot of points
  • Not Synced
    in the fewest possible lines of code.
  • Not Synced
    I'll start with what it's meant to do.
  • Not Synced
    This is meant to be a routine that gets a
    reference to something and lets you know
  • Not Synced
    whether or not it's an image in a
    hypothetical cache.
  • Not Synced
    I'll start with the important thing
    here, the weak reference.
  • Not Synced
    In Java and other garbage collected
    languages we have the notion of references.
  • Not Synced
    Most of the time when you are running a
    Java program you have something like a
  • Not Synced
    variable name in the current execution
    context, and that is referred to as a
  • Not Synced
    strong reference to the object. In other
    words I can see it. I am using it.
  • Not Synced
    Please don't get rid of it.
    Bad things will happen if you do.
  • Not Synced
    So the garbage collector knows
    not to get rid of it.
  • Not Synced
    In Java and other languages you also
    have the notion of a weak reference.
  • Not Synced
    This is essentially the programmer saying
    to the virtual machine
  • Not Synced
    "Look I kinda care about this but
    just a little bit."
  • Not Synced
    "If you want to get rid of it feel free
    to but please let me know."
  • Not Synced
    This is why this is for a cache class.
    For instance the JVM in this particular
  • Not Synced
    case could decide that it's running quite
    low on memory and, as this particular xMB image
  • Not Synced
    has not been used for a while, it can
    garbage collect it.
  • Not Synced
    The important thing is how we go about
    expressing this in the language.
  • Not Synced
    We can't just have a reference to the
    object because that's a strong reference
  • Not Synced
    and the JVM will know it can't get
    rid of this because the program
  • Not Synced
    can see it actively.
  • Not Synced
    So we have a level of indirection which is
    known as a weak reference.
  • Not Synced
    We have this hypothetical CacheClass
    that I've devised.
  • Not Synced
    At this point it is a weak reference.
  • Not Synced
    Then we get it. This is calling the weak
    reference routine.
  • Not Synced
    Now it becomes a strong reference so
    it's not going to be garbage collected.
  • Not Synced
    When we get to the return path it becomes
    a weak reference again
  • Not Synced
    because our strong reference
    has disappeared.
  • Not Synced
    The salient points in this example are:
  • Not Synced
    We're employing a method to get
    a reference.
  • Not Synced
    We're checking an item to see if
    it's null.
  • Not Synced
    So let's say that the JVM decided to
    garbage collect this
  • Not Synced
    before we executed the method.
  • Not Synced
    The weak reference class is still valid
    because we've got a strong reference to it
  • Not Synced
    but the actual object behind this is gone.
  • Not Synced
    If we're too late and the garbage
    collector has killed it
  • Not Synced
    it will be null and we return.
  • Not Synced
    So it's a level of indirection to ask:
    does this still exist?
  • Not Synced
    If so, can I please have it, and then
    operate on it as normal,
  • Not Synced
    and then on return it becomes a weak
    reference again.
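The method on the slide is not reproduced in this transcript, so the following is only a reconstruction of its described shape: get the referent from a weak reference, check it for null, then call a virtual method on it. `CacheClass` is the name used in the talk; its `isImage` method, the field, and the wrapper `WeakRefSketch.isCachedImage` are assumptions.

```java
import java.lang.ref.WeakReference;

// Hypothetical stand-in for the cached object described in the talk.
class CacheClass {
    private final boolean image;

    CacheClass(boolean image) {
        this.image = image;
    }

    boolean isImage() {
        return image;
    }
}

class WeakRefSketch {
    static boolean isCachedImage(WeakReference<CacheClass> ref) {
        // get() hands back a strong reference, or null if the
        // garbage collector has already cleared the referent.
        CacheClass item = ref.get();
        if (item == null) {
            return false; // too late, the object is gone
        }
        // While 'item' is live the object cannot be collected.
        return item.isImage();
    }

    public static void main(String[] args) {
        CacheClass picture = new CacheClass(true);
        WeakReference<CacheClass> ref = new WeakReference<>(picture);
        System.out.println(isCachedImage(ref));
        ref.clear(); // simulate the collector clearing the reference
        System.out.println(isCachedImage(ref));
        // Touch 'picture' here so it stays strongly reachable until this point.
        System.out.println(picture.isImage());
    }
}
```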
  • Not Synced
    This example program is quite useful when
    we look at how it's implemented in the JVM
  • Not Synced
    and we'll go through a few things now.
  • Not Synced
    First off we'll go through the byte code.
  • Not Synced
    The only point of this slide is to
    show it's roughly
  • Not Synced
    the same as this.
  • Not Synced
    We get our variable.
  • Not Synced
    We use our getter.
  • Not Synced
    This bit is extra, this checkcast.
    The reason that bit is extra is
  • Not Synced
    because we're using the equivalent of
    a template in Java.
  • Not Synced
    And the way that's implemented in Java is
    it just basically casts everything to an
  • Not Synced
    object so that requires extra
    compiler information.
  • Not Synced
    And this is the extra check.
  • Not Synced
    The rest of this we load the reference,
    we check to see if it is null,
  • Not Synced
    If it's not null we invoke a virtual
    function - is it the image?
  • Not Synced
    and we return as normal.
  • Not Synced
    Essentially the point I'm trying to make
    is when we compile this to byte code
  • Not Synced
    this execution happens.
  • Not Synced
    This null check happens.
  • Not Synced
    This execution happens.
  • Not Synced
    And we return.
  • Not Synced
    In the actual Java class files we've not
    lost anything.
  • Not Synced
    This is what it looks like when it's
    been JIT'd.
  • Not Synced
    Now we've lost lots of things.
  • Not Synced
    The JIT has done quite a few clever things
    which I'll talk about.
  • Not Synced
    First off if we look down here there's
    a single branch here.
  • Not Synced
    And this is only if our check cast failed
  • Not Synced
    We've got comments on the
    right hand side.
  • Not Synced
    Our get method has been in-lined so
    we're no longer calling.
  • Not Synced
    We seem to have lost our null check,
    that's just gone.
  • Not Synced
    And again we've got a get field as well.
  • Not Synced
    That's no longer a method,
    that's been in-lined as well
  • Not Synced
    We've also got some other cute things.
  • Not Synced
    Those more familiar with AArch64 will
    understand that the pointers we're using
  • Not Synced
    are 32bit not 64bit.
  • Not Synced
    What we're doing is getting a pointer
    and shifting it left 3
  • Not Synced
    and widening it to a 64bit pointer.
  • Not Synced
    We've also got 32bit pointers on a
    64bit system as well.
  • Not Synced
    So that's saving a reasonable amount
    of memory and cache.
  • Not Synced
    To summarise. We don't have any
    branches or function calls
  • Not Synced
    and we've got a lot of in-lining.
  • Not Synced
    We did have function calls in the
    class file so it's the JVM,
  • Not Synced
    it's the JIT, that has done this.
  • Not Synced
    We've got no null checks either and I'm
    going to talk through this now.
  • Not Synced
    The null check elimination is quite a
    clever feature in Java and other programs.
  • Not Synced
    The idea behind null check elimination is
  • Not Synced
    most of the time this object is not
    going to be null.
  • Not Synced
    If this object is null the operating
    system knows this quite quickly.
  • Not Synced
    So if you try to de-reference a null
    pointer you'll get either a SIGSEGV or
  • Not Synced
    a SIGBUS depending on a
    few circumstances.
  • Not Synced
    That goes straight back to the JVM
  • Not Synced
    and the JVM knows where the null
    exception took place.
  • Not Synced
    Because it knows where the exception took
    place it can look this up
  • Not Synced
    and unwind it as part of an exception.
  • Not Synced
    Those null checks just go.
    Completely gone.
  • Not Synced
    Most of the time this works and you are
    saving a reasonable amount of execution.
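From the program's point of view nothing changes when the explicit check disappears: dereferencing null still surfaces as a `NullPointerException`. A minimal sketch of that visible behaviour, with the hypothetical names `NullCheckSketch` and `lengthOrMinusOne`; whether the JIT really implements this particular method with a trapped signal is up to the JVM and is not something the code can show.

```java
// Hypothetical sketch: no explicit null test in the source, yet null is still caught.
class NullCheckSketch {
    static int lengthOrMinusOne(String s) {
        try {
            // No 'if (s == null)' here; the JVM is responsible for
            // turning a null dereference into an exception.
            return s.length();
        } catch (NullPointerException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOrMinusOne("abc"));
        System.out.println(lengthOrMinusOne(null));
    }
}
```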
  • Not Synced
    I'll talk about when it doesn't work
    in a second.
  • Not Synced
    That's reasonably clever. We have similar
    programming techniques in other places
  • Not Synced
    even the Linux kernel for instance when
    you copy data to and from user space
  • Not Synced
    it does pretty much the same
    thing. It has an exception unwind table
  • Not Synced
    and it knows if it catches a page fault on
    this particular program counter
  • Not Synced
    it can deal with it because it knows
    the program counter and it knows
  • Not Synced
    conceptually what it was doing.
  • Not Synced
    In a similar way the JIT knows what it's
    doing to a reasonable degree.
  • Not Synced
    It can handle the null check elimination.
  • Not Synced
    I mentioned the sneaky one. We've got
    essentially 32bit pointers
  • Not Synced
    on a 64bit system.
  • Not Synced
    Most of the time in Java people typically
    specify heap size smaller than 32GB.
  • Not Synced
    Which is perfect if you want to use 32bit
    pointers and left shift 3.
  • Not Synced
    Because that gives you 32GB of
    addressable memory.
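The arithmetic behind that 32GB figure can be sketched as follows. This is a simplified model, assuming objects are aligned to 8 bytes and the compressed value is an offset from a heap base; the names `CompressedOopSketch`, `heapBase`, `encode` and `decode` are hypothetical, and the real JVM may choose a different base or shift.

```java
// Hypothetical model of 32-bit compressed pointers on a 64-bit system.
class CompressedOopSketch {
    // 8-byte alignment means the low 3 bits of an address are always zero.
    static final int SHIFT = 3;

    // Widen a 32-bit compressed value to a 64-bit address.
    static long decode(long heapBase, int narrow) {
        return heapBase + ((narrow & 0xFFFFFFFFL) << SHIFT);
    }

    // Compress an 8-byte-aligned address that lies within range of the base.
    static int encode(long heapBase, long address) {
        return (int) ((address - heapBase) >>> SHIFT);
    }

    // 2^32 distinct values, each naming an 8-byte slot: 2^35 bytes.
    static long maxAddressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        System.out.println(maxAddressableBytes() / (1024L * 1024 * 1024) + " GB");
    }
}
```

Each stored pointer takes 4 bytes instead of 8, at the cost of a shift when it is loaded.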
  • Not Synced
    That's a significant memory saving because
    otherwise a lot of things would double up.
  • Not Synced
    There's a significant number of pointers
    in Java.
  • Not Synced
    The one that should make people
    jump out of their seat is
  • Not Synced
    the fact that most methods in Java are
    actually virtual.
  • Not Synced
    So what the JVM has actually done is
    in-lined a virtual function.
  • Not Synced
    A virtual function is essentially a
    function where you don't know where
  • Not Synced
    you're going until run time.
  • Not Synced
    You can have several different classes
    and they share the same virtual function
  • Not Synced
    in the base class and dependent upon
    which specific class you're running
  • Not Synced
    different virtual functions will
    get executed.
  • Not Synced
    In C++ that will be a read from a V table
    and then you know where to go.
  • Not Synced
    The JVM's in-lined it.
  • Not Synced
    We've saved a memory load.
  • Not Synced
    We've saved a branch as well
  • Not Synced
    The reason the JVM can in-line it is
    because the JVM knows
  • Not Synced
    every single class that has been loaded.
  • Not Synced
    So it knows that although this looks
    polymorphic to the casual programmer
  • Not Synced
    It is actually monomorphic.
    The JVM knows this.
  • Not Synced
    Because it knows this it can be clever.
    And this is really clever.
  • Not Synced
    That's a significant cost saving.
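To make the monomorphic case concrete, here is a small sketch with made-up names (`Shape`, `Square`, `MonomorphicSketch`). The call to `area()` is virtual in the source, but while `Square` is the only subclass the JVM has loaded there is only one possible target, which is the situation where a JIT can consider in-lining it. Whether it actually does so for this code is the JVM's decision.

```java
// Hypothetical base class with a virtual method.
abstract class Shape {
    abstract double area();
}

// The only implementation loaded in this sketch.
class Square extends Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    double area() {
        return side * side;
    }
}

class MonomorphicSketch {
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            // Looks polymorphic in the source; only ever reaches Square.area() here.
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Square(3) };
        System.out.println(totalArea(shapes));
    }
}
```

If a second subclass of `Shape` were loaded later, the single-target assumption at that call site would no longer hold.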
  • Not Synced
    This is all great. I've already mentioned
    the null check elimination.
  • Not Synced
    We're taking a signal and, as most of you know,
    if we do that a lot it's going to be slow.
  • Not Synced
    Jumping into kernel, into user,
    bouncing around.
  • Not Synced
    The JVM also has a notion of
    'OK I've been a bit too clever now;
  • Not Synced
    I need to back off a bit'
  • Not Synced
    Also there's nothing stopping the user
    loading more classes
  • Not Synced
    and rendering the monomorphic
    assumption invalid.
  • Not Synced
    So the JVM needs to have a notion of
    backpedalling and go
  • Not Synced
    'OK I've gone too far and need to
    de-optimise'
  • Not Synced
    The JVM has the ability to de-optimise.
  • Not Synced
    In other words it essentially knows that for
    certain code paths everything's OK.
  • Not Synced
    But for certain new objects it can't get
    away with these tricks.
  • Not Synced
    By the time the new objects are executed
    they are going to be safe.
  • Not Synced
    There are ramifications for this.
    This is the important thing to consider
  • Not Synced
    with something like Java and other
    languages and other virtual machines.
  • Not Synced
    If you're trying to profile this it means
    there is a very significant ramification.
  • Not Synced
    You can have the same class and
    method JITd multiple ways
  • Not Synced
    and executed at the same time.
  • Not Synced
    So if you're trying to find a hot spot
    the program counter's not enough.
  • Not Synced
    Because you can refer to the same thing
    in several different ways.
  • Not Synced
    This is quite common as well as
    de-optimisation does take place.
  • Not Synced
    That's something to bear in mind with JVM
    and similar runtime environments.
  • Not Synced
    You can get a notion of what the JVM's
    trying to do.
  • Not Synced
    You can ask it nicely and add a print
    compilation option and it will tell you
  • Not Synced
    what it's doing.
    This is reasonably verbose.
  • Not Synced
    Typically what happens is the JVM gets
    excited JITing everything and optimising
  • Not Synced
    everything then it settles down.
  • Not Synced
    Until you load something new
    and it gets excited again.
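On HotSpot the option referred to here is, as far as I know, `-XX:+PrintCompilation`. As a rough way of seeing the same burst of activity from inside a program, the standard management API exposes the accumulated JIT compilation time. This is only a sketch; `JitActivitySketch`, `jitMillis` and `busyWork` are hypothetical names, and the numbers will vary from run to run.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: watch accumulated JIT time grow while code warms up.
class JitActivitySketch {
    // Returns total JIT compilation time in milliseconds, or -1 if unsupported.
    static long jitMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    // Some arbitrary arithmetic to give the JIT something hot to compile.
    static long busyWork(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i ^ (acc >>> 3);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println("JIT ms before: " + jitMillis());
        long sink = 0;
        for (int round = 0; round < 200; round++) {
            sink += busyWork(1_000_000);
        }
        System.out.println("JIT ms after:  " + jitMillis() + " (sink " + sink + ")");
    }
}
```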
  • Not Synced
    There's a lot of logs. This is mainly
    useful for de-bugging but
  • Not Synced
    it gives you an appreciation that it's
    doing a lot of work.
  • Not Synced
    You can go even further with a log
    compilation option.
  • Not Synced
    That produces a lot of XML and that is
    useful for people debugging the JVM as well.
  • Not Synced
    It's quite handy to get an idea of
    what's going on.
  • Not Synced
    If that is not enough information you
    also have the ability to go even further.
  • Not Synced
    This is beyond the limit of my
    understanding. I've gone into this little
  • Not Synced
    bit just to show you what can be done.
  • Not Synced
    You have release builds of OpenJDK
    and you have debug builds of OpenJDK.
  • Not Synced
    The release builds will by default turn
    off a lot of the diagnostic options.
  • Not Synced
    You can switch them back on again.
  • Not Synced
    When you do you can also gain insight
    into the actual, it's colloquially
  • Not Synced
    referred to as the C2 JIT, the compiler there.
  • Not Synced
    You can see, for instance, objects in
    timelines and visualize them
  • Not Synced
    as they're being optimised at various
    stages and various things.
  • Not Synced
    So this is based on a masters thesis
    by Thomas Würthinger.
  • Not Synced
    This is something you can play with as
    well and see how far the optimiser goes.
  • Not Synced
    And it's also good for people hacking
    with the JVM.
  • Not Synced
    I'll move onto some stuff we did.
  • Not Synced
    Last year we were working on the big data
Title:
Java_the_good_bits.webm
Video Language:
English
Team:
Debconf
Project:
2016_miniconf-cambridge16
Duration:
32:13
