Onto the second talk. Steve Capper is going to tell us about the good bits of Java. They do exist.

[Audience] Could this have been a lightning talk? [Audience laughter]

Believe it or not, we've got some good stuff here. I was as skeptical as you guys when I first looked. First, apologies for not attending this mini-conf last year; I was unfortunately ill on the day I was due to give this talk. Let me figure out how to use a computer. Sorry about this. There we go; it's because I've not woken up.

Last year I worked at Linaro in the Enterprise group and we performed analysis on 'Big Data' application sets. As many of you know, quite a lot of these big data applications are written in Java. I'm from ARM and we were very interested in 64-bit ARM support. So this is mainly AArch64 examples for things like assembler, but most of the messages are pertinent to any architecture; these good bits are shared between most, if not all, of the architectures. Whilst trying to optimise a lot of these big data applications I stumbled across quite a few things in the JVM and I thought 'actually, that's really clever; that's really cool'. So I thought that would make a good basis for a talk.
This talk is essentially some of the clever things I found in the Java Virtual Machine; these optimisations are in OpenJDK. The source is all there, readily available and in play now. I'm going to finish with some of the optimisation work we did with Java. People who know me will know I'm not a Java zealot; I don't particularly believe in one programming language over another. So, to make it clear from the outset, I'm not attempting to convert anyone into Java programmers. I'm just going to highlight a few salient things in the Java Virtual Machine which I found to be quite clever and interesting, and I'll try and talk through them with my understanding of them.

Let's jump straight in and start with an example. This is a minimal example for computing the SHA-1 sum of a file. I've elided some of the checking at the beginning of the function, the command line parsing and that sort of thing, and highlighted the salient points in red. Essentially we instantiate a SHA-1 crypto message service digest, and we do the equivalent in Java of an mmap to get it all in memory. Then we just put this data straight into the crypto engine, and eventually, at the end of the program, we spit out the SHA-1 hash.
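The slide's source code isn't reproduced in the transcript; a minimal sketch of the same flow (instantiate the digest, map the file, feed the mapped region straight into the crypto engine) might look like this, with the class name `Sha1File` being my own invention:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;

public class Sha1File {
    // Hash a memory-mapped file and return the digest as lowercase hex.
    static String sha1Hex(Path path) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // The Java equivalent of mmap: map the whole file read-only.
            ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            md.update(buf);   // feed the mapped region into the crypto engine
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha1Hex(Paths.get(args[0])));
    }
}
```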
It's a very simple program: it's basically mmap, SHA-1, output the hash afterwards. In order to concentrate on the CPU aspect rather than worry about IO, I decided to cheat a little by using a sparse file. As many of you know, a sparse file is a file where not all of the contents are necessarily stored on disc; the assumption is that the bits that aren't stored are zero. For instance, on Linux you can create a 20TB sparse file on a 10MB file system and use it as normal. Just don't write too much to it, otherwise you're going to run out of space. The idea behind using a sparse file is that I'm just focusing on the computational aspects of the SHA-1 sum; I'm not worried about the file system or anything like that. I don't want to worry about the IO, I just want to focus on the actual compute. To set up a sparse file I used the following runes. The important point is that you seek, and the other important point is that you set a count, otherwise you'll fill your disc up.

I ran this against, firstly, the native sha1sum command that's built into Linux, and normalised those results to 1.0. I used an older version of OpenJDK to run the Java program, and that was 1.09 times slower than the reference command. That's quite good. Then I used the new OpenJDK; this is now the current JDK, as this is a year on.
And this one took 0.21: significantly faster. I should stress that I've done nothing surreptitious in the Java program; it is mmap, compute, spit the result out. But OpenJDK has essentially got some more context information, and I'll talk about that as we go through.

Before I started with Java I had a very simplistic view of it. Traditionally, Java is taught as a virtual machine that runs byte code: when you compile a Java program it compiles into byte code. Older versions of the Java Virtual Machine would interpret this byte code and run through it; newer versions employ a just-in-time engine and try to compile this byte code into native machine code. But that is not the only thing that goes on when you run a Java program; there are some extra optimisations as well. This alone would not account for the newer version of the SHA-1 sum being significantly faster than the distro-supplied one.

Java knows about context. It has a class library, and these class libraries have reasonably well-defined purposes. We have classes that provide crypto services. We have sun.misc.Unsafe, which every single project seems to pull in when they're not supposed to. These have well-defined meanings, and they do not necessarily have to be written in Java.
They come supplied as Java classes, but most JVMs now have the notion of a virtual machine intrinsic. A virtual machine intrinsic says: OK, please do a SHA-1 in the best possible way that your implementation allows. This is something done automatically by the JVM; you don't ask for it. If the JVM knows what it's running on, and it's reasonably recent, this will just happen for you for free. And there are quite a few classes that do this. There are quite a few clever things with atomics, there's crypto, there are mathematical routines as well. Most of these routines in the class library have a well-defined notion of a virtual machine intrinsic, and they run reasonably optimally. They are the subject of continuous optimisation as well.

We've got some runes presented on the slides here; these are quite useful if you are interested in how these intrinsics are made. You can ask the JVM to print out a lot of the just-in-time compiled code, and you can ask it to print out the native methods as well as these intrinsics. In this particular case, after sifting through about 5MB of text, I came across this particular SHA-1 sum implementation. This is AArch64, and it's employing the cryptographic extensions in the architecture. So it's essentially using the CPU instructions, which would explain why it's faster.
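The slide's runes aren't in the transcript; on stock HotSpot a plausible equivalent would be the following, with `SHA1Sum big.img` standing in for the example program (note that PrintAssembly additionally needs the hsdis disassembler plugin on the JVM's library path, and the switch-off flag mentioned later is only available on builds that support the intrinsic):

```shell
# Dump the JIT-compiled code, including the generated intrinsic stubs:
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly SHA1Sum big.img > jit-asm.txt

# Log which class-library calls were replaced by VM intrinsics:
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics SHA1Sum big.img

# Switch the SHA-1 intrinsic off again, falling back to the pure-Java path:
java -XX:-UseSHA1Intrinsics SHA1Sum big.img
```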
But again, it's done all this automatically; this did not require any specific runes or anything to activate. We'll see a bit later on how you can more easily find the hot spots, rather than sifting through a lot of assembler. I've mentioned that the cryptographic engine is employed, and this routine was generated at run time as well. This is one of the important things about runtime environments like Java: you don't have to know everything at compile time. You know a lot more information at run time, and you can use that, in theory, to optimise.

You can switch off these clever routines. For instance, I've deactivated it here and we get back to the slower performance we expected. Again, this particular set of routines is present in OpenJDK, I think for all the architectures that support it. We get this optimisation for free on x86 and others as well, and it works quite well. That was one surprise I came across: the intrinsics.

One thing I thought would be quite good would be to go through a slightly more complicated example, and use it to explain a lot of other things that happen in the JVM as well. I will spend a bit of time going through this example and explain roughly what it's supposed to be doing.
This is an imaginary method that I've contrived to demonstrate a lot of points in the fewest possible lines of code. I'll start with what it's meant to do. This is meant to be a routine that gets a reference to something and lets you know whether or not it's an image in a hypothetical cache.

I'll start with the important thing here: the weak reference. In Java and other garbage-collected languages we have the notion of references. Most of the time, when you are running a Java program, you have something like a variable name, and in the current execution context that is referred to as a strong reference to the object. In other words: I can see it, I am using it, please don't get rid of it; bad things will happen if you do. So the garbage collector knows not to get rid of it.

In Java and other languages you also have the notion of a weak reference. This is essentially the programmer saying to the virtual machine: "Look, I kinda care about this, but just a little bit. If you want to get rid of it, feel free, but please let me know." This is why it is used for a cache class. For instance, the JVM in this particular case could decide that it's running quite low on memory and that this particular multi-megabyte image has not been used for a while, so it can garbage collect it. The important thing is how we go about expressing this in the language.
We can't just have a reference to the object, because that's a strong reference, and the JVM will know it can't get rid of it because the program can actively see it. So we have a level of indirection, which is known as a weak reference. We have this hypothetical CacheClass that I've devised. At this point it is a weak reference. Then we get it; this is calling the weak reference's get routine. Now it becomes a strong reference, so it's not going to be garbage collected. When we get to the return path it becomes a weak reference again, because our strong reference has disappeared.

The salient points in this example are: we're employing a method to get a reference, and we're checking the item to see if it's null. Let's say the JVM decided to garbage collect this before we executed the method. The weak reference class is still valid, because we've got a strong reference to it, but the actual object behind it is gone. If we're too late and the garbage collector has killed it, the get will return null and we return. So it's a level of indirection: does this still exist? If so, can I please have it; then operate on it as normal, and on return it becomes a weak reference again.
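The contrived method itself isn't in the transcript; a sketch of the pattern just described, with hypothetical class and method names of my own, might look like:

```java
import java.lang.ref.WeakReference;

// Stand-in for the cached object; the name and method are illustrative only.
class CachedImage {
    boolean isImage() { return true; }
}

public class WeakCacheExample {
    // The hypothetical routine: promote the weak reference, null-check it,
    // operate on it, and let the strong reference lapse on return.
    static boolean isImageInCache(WeakReference<CachedImage> ref) {
        CachedImage img = ref.get();   // now a strong reference: safe from the GC
        if (img == null) {
            return false;              // too late: the collector got there first
        }
        return img.isImage();          // operate on it as normal
    }                                  // 'img' goes out of scope; weak again

    public static void main(String[] args) {
        CachedImage img = new CachedImage();
        System.out.println(isImageInCache(new WeakReference<>(img)));
    }
}
```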
This example program is quite useful when we look at how it's implemented in the JVM, and we'll go through a few things now. First off, we'll go through the byte code. The only point of this slide is to show that it's roughly the same as the source. We get our variable; we use our getter. This checkcast bit is extra. The reason it's extra is that we're using the equivalent of a template in Java, and the way that's implemented is that everything is basically cast to an Object, which requires extra compiler information; this is the extra check. For the rest of it: we load the reference, we check to see if it is null, and if it's not null we invoke a virtual function (is it the image?) and return as normal. Essentially, the point I'm trying to make is that when we compile this to byte code, this execution happens, this null check happens, this execution happens, and we return. In the actual Java class files we've not lost anything.

This is what it looks like when it's been JITed. Now we've lost lots of things. The JIT has done quite a few clever things, which I'll talk about. First off, if we look down here, there's a single branch here.
And that is only taken if our checkcast failed. We've got comments on the right hand side. Our get method has been inlined, so we're no longer calling it. We seem to have lost our null check; that's just gone. And we've got a getfield as well: that's no longer a method call, that's been inlined too. We've also got some other cute things. Those more familiar with AArch64 will notice that the pointers we're using are 32-bit, not 64-bit: what we're doing is getting a pointer, shifting it left by 3 and widening it to a 64-bit pointer. So we've got 32-bit pointers on a 64-bit system, and that saves a reasonable amount of memory and cache.

To summarise: we don't have any branches or function calls, and we've got a lot of inlining. We did have function calls in the class file, so it's the JVM, the JIT, that has done this. We've got no null checks either, and I'm going to talk through that now.

Null check elimination is quite a clever feature in Java and other programs. The idea behind null check elimination is that most of the time this object is not going to be null, and if it is null the operating system knows quite quickly: if you try to dereference a null pointer you'll get either a SIGSEGV or a SIGBUS, depending on a few circumstances.
That goes straight back to the JVM, and the JVM knows where the null exception took place. Because it knows where the exception took place, it can look this up and unwind it as part of an exception. Those null checks just go; completely gone. Most of the time this works and you save a reasonable amount of execution. I'll talk about when it doesn't work in a second. That's reasonably clever. We have similar programming techniques in other places, even the Linux kernel: for instance, when you copy data to and from user space it does pretty much the same thing. It has an exception unwind table, and if it catches a page fault on a particular program counter it can deal with it, because it knows the program counter and it knows conceptually what it was doing. In a similar way, the JIT knows what it's doing to a reasonable degree, so it can handle the null check elimination.

I mentioned the sneaky one: we've got essentially 32-bit pointers on a 64-bit system. Most of the time in Java, people specify a heap size smaller than 32GB, which is perfect if you want to use 32-bit pointers left-shifted by 3, because that gives you 32GB of addressable memory. That's a significant memory saving, because otherwise a lot of things would double up; there are a significant number of pointers in Java.
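This is HotSpot's compressed ordinary object pointers ("compressed oops"): a 32-bit offset scaled by 8 (the left shift by 3) reaches 2^32 × 8 bytes = 32GB of heap. A quick way to see the heap-size threshold in action, assuming a reasonably recent HotSpot JDK:

```shell
# Under ~32GB of heap, compressed oops are on by default:
java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops

# Ask for more than ~32GB and the JVM has to fall back to full 64-bit pointers:
java -Xmx33g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
```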
The one that should make people jump out of their seats is the fact that most methods in Java are actually virtual. So what the JVM has actually done is inline a virtual function. A virtual function is essentially a function where you don't know where you're going until run time. You can have several different classes sharing the same virtual function in the base class, and depending upon which specific class you're running, different virtual functions will get executed. In C++ that would be a read from a vtable, and then you know where to go. The JVM has inlined it: we've saved a memory load, and we've saved a branch as well. The reason the JVM can inline it is that the JVM knows every single class that has been loaded. So it knows that although this looks polymorphic to the casual programmer, it is actually monomorphic. Because it knows this, it can be clever, and this is really clever; that's a significant cost saving.

This is all great. I've already mentioned the null check elimination: we're taking a signal, and as most of you know, if we do that a lot it's going to be slow, jumping into kernel, into user, bouncing around.
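The monomorphic-call situation just described can be sketched as follows; the class names are illustrative, not from the talk. The call through the base class looks polymorphic, but if `Square` is the only subclass ever loaded, the JIT can devirtualise and inline it:

```java
// The call site 's.area()' looks polymorphic to the casual programmer...
abstract class Shape {
    abstract double area();
}

// ...but if this is the only implementation the JVM has loaded, the call is
// monomorphic, and the JIT can inline it: no vtable load, no branch.
class Square extends Shape {
    final double side;
    Square(double side) { this.side = side; }
    @Override double area() { return side * side; }
}

public class Monomorphic {
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area(); // virtual call, inlineable here
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total(new Shape[] { new Square(2), new Square(3) }));
    }
}
```

Loading a second `Shape` subclass later would invalidate that assumption, which is exactly the de-optimisation case discussed next.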
The JVM also has a notion of 'OK, I've been a bit too clever now; I need to back off a bit'. Also, there's nothing stopping the user loading more classes and rendering the monomorphic assumption invalid. So the JVM needs a notion of backpedalling: 'OK, I've gone too far and need to de-optimise'. The JVM has the ability to de-optimise. In other words, it knows that for certain code paths everything's OK, but for certain new objects it can't get away with these tricks, so by the time the new objects are executed they are going to be safe.

There are ramifications to this. This is the important thing to consider with something like Java, and with other languages and virtual machines: if you're trying to profile it, there is a very significant ramification. You can have the same class and method JITed multiple ways and executed at the same time. So if you're trying to find a hot spot, the program counter's nodding off, because you can refer to the same thing in several different ways. This is quite common, as de-optimisation does take place. That's something to bear in mind with the JVM and similar runtime environments.

You can get a notion of what the JVM's trying to do: you can ask it nicely, with a print compilation option, and it will tell you what it's doing. This is reasonably verbose.
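The options referred to here are standard HotSpot flags; a sketch, again using `SHA1Sum big.img` as a stand-in for the workload:

```shell
# Print each method as it gets JIT-compiled (and de-optimised):
java -XX:+PrintCompilation SHA1Sum big.img

# Much more detail, as XML, mainly for people debugging the JVM itself:
java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation SHA1Sum big.img
```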
Typically what happens is that the JVM gets excited, JITing everything and optimising everything, and then it settles down, until you load something new and it gets excited again. There's a lot of logging; this is mainly useful for debugging, but it gives you an appreciation that it's doing a lot of work. You can go even further with a log compilation option. That produces a lot of XML, which is useful for people debugging the JVM as well. It's quite handy to get an idea of what's going on.

If that is not enough information, you have the ability to go even further. This is beyond the limit of my understanding; I've gone into it a little bit just to show you what can be done. There are release builds of OpenJDK and there are debug builds of OpenJDK. The release builds will by default turn off a lot of the diagnostic options, but you can switch them back on again. When you do, you can also gain insight into the compiler there, colloquially referred to as the C2 JIT. You can see, for instance, objects in timelines, and visualise them as they're being optimised at various stages. This is based on a master's thesis by Thomas Würthinger. It's something you can play with as well, to see how far the optimiser goes, and it's also good for people hacking on the JVM.
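A sketch of what this looks like in practice, under the assumption that you have a debug (non-release) build of HotSpot, where the Ideal Graph dump flags are compiled in:

```shell
# Dump C2's intermediate representation at successive optimisation phases;
# these flags only exist on debug builds of HotSpot:
java -XX:PrintIdealGraphLevel=2 -XX:PrintIdealGraphFile=ideal.xml SHA1Sum big.img

# ideal.xml can then be loaded into the Ideal Graph Visualizer, the tool
# descended from Thomas Würthinger's thesis work.
```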
I'll move on to some stuff we did. Last year we were working on big data, on a relatively new architecture: ARM64 (it's called AArch64 in OpenJDK land but arm64 in Debian land). We were a bit concerned because everything was all shiny and new. Had it been optimised correctly? Were there any obvious things we needed to optimise? And we were also interested because everything was so shiny and new in the whole system, not just the JVM but glibc and the kernel as well. So how do we get a view of all of this?

I gave a quick talk at the Debian mini-conf before last [2014] about perf, so we decided we could try and do some clever things with Linux perf and see if we could get some actually useful debugging information out. We have the flame graphs that are quite well known. We also have some previous work: Johannes had a special perf-map-agent that could basically hook into perf, and it would give you a nice way of running perf top, for want of a better expression, and viewing the top Java function names. This is really good work, and it's really good for a particular use case: if you just want to take a snapshot once and see, in that snapshot, where the hotspots were.
For a prolonged workload, with all the functions being JITed multiple ways, with the optimisation going on and everything moving around, it requires a little bit more information to be captured. So I decided to do a little bit of work on something very similar to perf-map-agent, but an agent that would capture this over a prolonged period of time.

Here's an example flame graph; these are all over the internet. This is the SHA-1 computation example that I gave at the beginning. As expected, the VM intrinsic SHA-1 is the top one. Not expected by me was this quite significant chunk of CPU execution time: a significant amount of time was being spent copying memory from the mmapped memory region into a heap buffer, which was then passed to the crypto engine. So we were doing a ton of memory copies for no good reason.

That essentially highlighted an assumption I had made about Java to begin with, which was: if you do the equivalent of mmap, it should just work like mmap, right? You should just be able to address the memory. That is not the case. If you've got a file mapping object and you try to address it, it has to be copied into safe heap memory first, and that is what was slowing down the program. If that copy were elided you could make the SHA-1 computation even quicker. So that would be the logical target to optimise.
I wanted to extend Johannes' work with something called a Java Virtual Machine Tool Interface (JVMTI) profiling agent. This is part of the Java Virtual Machine standard: you can make a special library and hook it into the JVM, and the JVM can expose quite a few things to the library; it exposes a reasonable amount of information. Perf also has the ability to look at map files natively. If you are profiling JavaScript, or something similar, I think the Google V8 JavaScript engine will write out a special map file that says these program counter addresses correspond to these function names. I decided to use that in a similar way to what Johannes did for the extended profiling agent, but I also decided to capture some more information. I decided to capture the disassembly, so when we run perf annotate we can see the actual JVM byte code in our annotation. We can see how it was JITed and when it was JITed; we can see where the hotspots were. And that's good, but we can go even better: we can run an annotated trace that contains the Java class, the Java method and the byte code, all in one place at the same time. You can see everything from the JVM in the same place. This works reasonably well because the perf interface is extremely extensible. And again, we can do entire-system optimisation.
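The map files perf reads natively have a very simple line format: a start address, a size and a symbol name, with the addresses in hex. A hypothetical excerpt (the symbol names here are made up, one per JITed method) might look like:

```
# /tmp/perf-<PID>.map
# start-address  size  symbol-name
ffff7c021000 340 Lcom/example/CacheClass;::getImage
ffff7c022000 1a8 Lcom/example/CacheClass;::isImageInCache
```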
99:59:59.999 --> 99:59:59.999 The bits in red here are the Linux kernel. 99:59:59.999 --> 99:59:59.999 Then we get into libraries. 99:59:59.999 --> 99:59:59.999 And then we get into Java and more libraries as well. 99:59:59.999 --> 99:59:59.999 So we can see everything from top to bottom in one fell swoop. 99:59:59.999 --> 99:59:59.999 This is just a quick slide showing the mechanisms employed. 99:59:59.999 --> 99:59:59.999 Essentially we have this agent, which is a shared object file. 99:59:59.999 --> 99:59:59.999 And this will spit out useful files here in a standard way. 99:59:59.999 --> 99:59:59.999 And Linux perf basically just records the perf data dump file as normal. 99:59:59.999 --> 99:59:59.999 We have two sets of recording going on, and to report it it's very easy to do 99:59:59.999 --> 99:59:59.999 normal reporting with the PID map. 99:59:59.999 --> 99:59:59.999 This just works out of the box with the Google V8 engine as well. 99:59:59.999 --> 99:59:59.999 If you want to do very clever annotations perf has the ability to have 99:59:59.999 --> 99:59:59.999 Python scripts passed to it. 99:59:59.999 --> 99:59:59.999 So you can craft quite a dodgy Python script and that can interface 99:59:59.999 --> 99:59:59.999 with the perf annotation output. 99:59:59.999 --> 99:59:59.999 That's how I was able to get the extra Java information in the same annotation. 99:59:59.999 --> 99:59:59.999 And this is really easy to do; it's quite easy to knock the script up. 99:59:59.999 --> 99:59:59.999 And again the only thing we do for this profiling is hook in the profiling 99:59:59.999 --> 99:59:59.999 agent, which dumps out various things. 99:59:59.999 --> 99:59:59.999 We preserve the frame pointer because that makes unwinding considerably easier. 99:59:59.999 --> 99:59:59.999 This will affect performance a little bit. 99:59:59.999 --> 99:59:59.999 And again when we're reporting we just hook in a Python script.
99:59:59.999 --> 99:59:59.999 It's really easy to hook everything in and get it working. 99:59:59.999 --> 99:59:59.999 At the moment we have a JVMTI agent. It's actually on http://git.linaro.org now. 99:59:59.999 --> 99:59:59.999 Since I gave this talk Google have extended perf so it will do 99:59:59.999 --> 99:59:59.999 quite a lot of similar things out of the box anyway. 99:59:59.999 --> 99:59:59.999 It's worth having a look at the latest perf. 99:59:59.999 --> 99:59:59.999 The techniques in this slide deck can obviously be used in other JITs quite easily. 99:59:59.999 --> 99:59:59.999 The fact that perf is so easy to extend with scripts can be useful 99:59:59.999 --> 99:59:59.999 for other things. 99:59:59.999 --> 99:59:59.999 And Open JDK has a significant amount of cleverness associated with it that 99:59:59.999 --> 99:59:59.999 I thought was very surprising and good. So that's what I covered in the talk. 99:59:59.999 --> 99:59:59.999 These are basically references to things like command line arguments 99:59:59.999 --> 99:59:59.999 and the Flame graphs and stuff like that. 99:59:59.999 --> 99:59:59.999 If anyone is interested in playing with Open JDK on ARM64 I'd suggest going here: 99:59:59.999 --> 99:59:59.999 http://openjdk.linaro.org where the most recent builds are. 99:59:59.999 --> 99:59:59.999 Obviously fixes are going in upstream and they're going into distributions as well. 99:59:59.999 --> 99:59:59.999 They're included in Open JDK so it should be good as well. 99:59:59.999 --> 99:59:59.999 I've run through quite a few fundamental things reasonably quickly. 99:59:59.999 --> 99:59:59.999 I'd be happy to accept any questions or comments. 99:59:59.999 --> 99:59:59.999 And if you want to talk to me privately about Java afterwards feel free to, 99:59:59.999 --> 99:59:59.999 when no-one's looking. 99:59:59.999 --> 99:59:59.999 [Audience applause] 99:59:59.999 --> 99:59:59.999 [Audience] It's not really a question so much as a comment.
99:59:59.999 --> 99:59:59.999 Last mini-DebConf we had a talk about using the JVM with other languages. 99:59:59.999 --> 99:59:59.999 And it seems to me that all this would apply even if you hate the Java programming 99:59:59.999 --> 99:59:59.999 language and want to write in, I don't know, Lisp or something instead, if you've 99:59:59.999 --> 99:59:59.999 got a Lisp system that can generate JVM byte code. 99:59:59.999 --> 99:59:59.999 Yeah, totally. And the other big data language we looked at was Scala. 99:59:59.999 --> 99:59:59.999 It uses the JVM back end but a completely different language on the front. 99:59:59.999 --> 99:59:59.999 Cheers guys.