9:59:59.000,9:59:59.000 Onto the second talk 9:59:59.000,9:59:59.000 Steve Capper is going to tell us[br]about the good bits of Java 9:59:59.000,9:59:59.000 They do exist 9:59:59.000,9:59:59.000 [Audience] Could this have been a [br]lightning talk? [Audience laughter] 9:59:59.000,9:59:59.000 Believe it or not we've got some [br]good stuff here. 9:59:59.000,9:59:59.000 I was as skeptical as you guys [br]when I first looked. 9:59:59.000,9:59:59.000 First apologies for not attending this[br]mini-conf last year 9:59:59.000,9:59:59.000 I was unfortunately ill on the day [br]I was due to give this talk. 9:59:59.000,9:59:59.000 Let me figure out how to use a computer. 9:59:59.000,9:59:59.000 Sorry about this. 9:59:59.000,9:59:59.000 There we go; it's because [br]I've not woken up. 9:59:59.000,9:59:59.000 Last year I worked at Linaro in the [br]Enterprise group and we performed analysis 9:59:59.000,9:59:59.000 on 'Big Data' application sets. 9:59:59.000,9:59:59.000 As many of you know quite a lot of these [br]big data applications are written in Java. 9:59:59.000,9:59:59.000 I'm from ARM and we were very interested[br]in 64bit ARM support. 9:59:59.000,9:59:59.000 So this is mainly AArch64 examples [br]for things like assembler 9:59:59.000,9:59:59.000 but most of the messages are [br]pertinent for any architecture. 9:59:59.000,9:59:59.000 These good bits are shared between [br]most if not all the architectures. 9:59:59.000,9:59:59.000 Whilst trying to optimise a lot of [br]these big data applications 9:59:59.000,9:59:59.000 I stumbled across quite a few things in [br]the JVM and I thought 9:59:59.000,9:59:59.000 'actually that's really clever; [br]that's really cool' 9:59:59.000,9:59:59.000 So I thought that would make a good [br]basis for a talk. 9:59:59.000,9:59:59.000 This talk is essentially some of the [br]clever things I found in the 9:59:59.000,9:59:59.000 Java Virtual Machine; these [br]optimisations are in OpenJDK. 
9:59:59.000,9:59:59.000 The source is all there, [br]readily available and in play now. 9:59:59.000,9:59:59.000 I'm going to finish with some of the [br]optimisation work we did with Java. 9:59:59.000,9:59:59.000 People who know me will know [br]I'm not a Java zealot. 9:59:59.000,9:59:59.000 I don't particularly believe in [br]programming in one language over another. 9:59:59.000,9:59:59.000 So to make it clear from the outset [br]I'm not attempting to convert 9:59:59.000,9:59:59.000 anyone into a Java programmer. 9:59:59.000,9:59:59.000 I'm just going to highlight a few salient [br]things in the Java Virtual Machine 9:59:59.000,9:59:59.000 which I found to be quite clever and [br]interesting 9:59:59.000,9:59:59.000 and I'll try and talk through them [br]with my understanding of them. 9:59:59.000,9:59:59.000 Let's jump straight in and let's [br]start with an example. 9:59:59.000,9:59:59.000 This is a minimal example for [br]computing a SHA1 sum of a file. 9:59:59.000,9:59:59.000 I've elided some of the checking at the [br]beginning of the function, such as 9:59:59.000,9:59:59.000 command line parsing and that sort of [br]thing. 9:59:59.000,9:59:59.000 I've highlighted the salient points in red. 9:59:59.000,9:59:59.000 Essentially we instantiate a SHA1 [br]crypto message service digest. 9:59:59.000,9:59:59.000 And we do the equivalent in [br]Java of an mmap. 9:59:59.000,9:59:59.000 Get it all in memory. 9:59:59.000,9:59:59.000 And then we just put this data straight [br]into the crypto engine. 9:59:59.000,9:59:59.000 And eventually at the end of the [br]program we'll spit out the SHA1 hash. 9:59:59.000,9:59:59.000 It's a very simple programme 9:59:59.000,9:59:59.000 It's basically mmap, SHA1, output [br]the hash afterwards. 9:59:59.000,9:59:59.000 In order to concentrate on the CPU [br]aspect rather than worry about IO 9:59:59.000,9:59:59.000 I decided to cheat a little by [br]setting this up. 9:59:59.000,9:59:59.000 I decided to use a sparse file. 
As many of[br]you know a sparse file is a file where not 9:59:59.000,9:59:59.000 all the contents are necessarily stored [br]on disc. The assumption is that the bits 9:59:59.000,9:59:59.000 that aren't stored are zero. For instance[br]on Linux you can create a 20TB sparse file 9:59:59.000,9:59:59.000 on a 10MB file system and use it as [br]normal. 9:59:59.000,9:59:59.000 Just don't write too much to it otherwise [br]you're going to run out of space. 9:59:59.000,9:59:59.000 The idea behind using a sparse file is I'm[br]just focusing on the computational aspects 9:59:59.000,9:59:59.000 of the SHA1 sum. I'm not worried about [br]the file system or anything like that. 9:59:59.000,9:59:59.000 I don't want to worry about the IO. I [br]just want to focus on the actual compute. 9:59:59.000,9:59:59.000 In order to set up a sparse file I used [br]the following runes. 9:59:59.000,9:59:59.000 The important point is that you seek[br]and the other important point 9:59:59.000,9:59:59.000 is you set a count otherwise you'll fill your disc up. 9:59:59.000,9:59:59.000 To compare, firstly [br]let's take the native sha1sum command 9:59:59.000,9:59:59.000 that comes with Linux, normalise the results and say that's 1.0. 9:59:59.000,9:59:59.000 I used an older version of the [br]OpenJDK and ran the Java programme 9:59:59.000,9:59:59.000 and that takes 1.09 times as long as the [br]reference command. That's quite good. 9:59:59.000,9:59:59.000 Then I used the new OpenJDK, this is now[br]the current JDK as this is a year on. 9:59:59.000,9:59:59.000 And it took 0.21. It's significantly faster. 9:59:59.000,9:59:59.000 I stress that I've done nothing [br]surreptitious in the Java program. 9:59:59.000,9:59:59.000 It is mmap, compute, spit result out. 9:59:59.000,9:59:59.000 But the OpenJDK has essentially got [br]some more context information. 9:59:59.000,9:59:59.000 I'll talk about that as we go through. 
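The slide with the program is not part of this transcript, so what follows is only a minimal sketch of the "mmap, SHA1, output the hash" program described above. The class name `Sha1Mmap`, the helper `sha1Hex` and the 1 GiB chunk size are assumptions made for illustration; the chunking is there because a single Java file mapping cannot exceed `Integer.MAX_VALUE` bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Mmap {
    // Assumed chunk size: one mapping is limited to Integer.MAX_VALUE bytes,
    // so large (for example sparse) files are mapped piece by piece.
    private static final long CHUNK = 1L << 30;

    public static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        // Instantiate the SHA-1 message digest from the class library.
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long offset = 0; offset < size; offset += CHUNK) {
                long length = Math.min(CHUNK, size - offset);
                // The Java equivalent of mmap for this region of the file.
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
                // Feed the mapped data straight into the digest.
                digest.update(buffer);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Sha1Mmap <file>");
            System.exit(1);
        }
        System.out.println(sha1Hex(Paths.get(args[0])));
    }
}
```

Nothing in this sketch asks for a fast SHA-1; any speed-up comes from whatever the JVM running it decides to do with the `MessageDigest` calls.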
9:59:59.000,9:59:59.000 When I started with Java I had a very [br]simplistic view of it. 9:59:59.000,9:59:59.000 Traditionally Java is taught as a virtual [br]machine that runs byte code. 9:59:59.000,9:59:59.000 Now when you compile a Java program it [br]compiles into byte code. 9:59:59.000,9:59:59.000 The older versions of the Java Virtual [br]Machine would interpret this byte code 9:59:59.000,9:59:59.000 and then run through. Newer versions would[br]employ a just-in-time engine and try and 9:59:59.000,9:59:59.000 compile this byte code into native machine code. 9:59:59.000,9:59:59.000 That is not the only thing that goes on[br]when you run a Java program. 9:59:59.000,9:59:59.000 There are some extra optimisations as well.[br]So this alone would not account for 9:59:59.000,9:59:59.000 the newer version of the SHA1 [br]sum being significantly faster 9:59:59.000,9:59:59.000 than the distro-supplied one. 9:59:59.000,9:59:59.000 Java knows about context. It has a class [br]library and these class libraries 9:59:59.000,9:59:59.000 have reasonably well defined purposes. 9:59:59.000,9:59:59.000 We have classes that provide [br]crypto services. 9:59:59.000,9:59:59.000 We have sun.misc.Unsafe, which every [br]single project seems to pull into their 9:59:59.000,9:59:59.000 project when they're not supposed to. 9:59:59.000,9:59:59.000 These have well defined meanings. 9:59:59.000,9:59:59.000 These do not necessarily have to be [br]written in Java. 9:59:59.000,9:59:59.000 They come as Java classes, [br]they come supplied. 9:59:59.000,9:59:59.000 But most JVMs now have a notion [br]of a virtual machine intrinsic 9:59:59.000,9:59:59.000 And the virtual machine intrinsic says ok [br]please do a SHA1 in the best possible way 9:59:59.000,9:59:59.000 that your implementation allows. This is [br]something done automatically by the JVM. 9:59:59.000,9:59:59.000 You don't ask for it. 
If the JVM knows[br]what it's running on and it's reasonably 9:59:59.000,9:59:59.000 recent this will just happen [br]for you for free. 9:59:59.000,9:59:59.000 And there's quite a few classes [br]that do this. 9:59:59.000,9:59:59.000 There's quite a few clever things with [br]atomics, there's crypto, 9:59:59.000,9:59:59.000 there's mathematical routines as well. [br]Most of these routines in the 9:59:59.000,9:59:59.000 class library have a well defined notion [br]of a virtual machine intrinsic 9:59:59.000,9:59:59.000 and they do run reasonably optimally. 9:59:59.000,9:59:59.000 They are a subject of continuous [br]optimisation as well. 9:59:59.000,9:59:59.000 We've got some runes that are [br]presented on the slides here. 9:59:59.000,9:59:59.000 These are quite useful if you [br]are interested in 9:59:59.000,9:59:59.000 how these intrinsics are made. 9:59:59.000,9:59:59.000 You can ask the JVM to print out a lot of[br]the just-in-time compiled code. 9:59:59.000,9:59:59.000 You can ask the JVM to print out the [br]native methods as well as these intrinsics 9:59:59.000,9:59:59.000 and in this particular case after sifting [br]through about 5MB of text 9:59:59.000,9:59:59.000 I've come across this particular SHA1 sum[br]implementation. 9:59:59.000,9:59:59.000 This is AArch64. This is employing the [br]cryptographic extensions 9:59:59.000,9:59:59.000 in the architecture. So it's essentially [br]using the CPU instructions which 9:59:59.000,9:59:59.000 would explain why it's faster. But again [br]it's done all this automatically. 9:59:59.000,9:59:59.000 This did not require any specific runes [br]or anything to activate. 9:59:59.000,9:59:59.000 We'll see a bit later on how you can [br]more easily find the hot spots 9:59:59.000,9:59:59.000 rather than sifting through a lot [br]of assembler. 9:59:59.000,9:59:59.000 I've mentioned that the cryptographic [br]engine is employed and again 9:59:59.000,9:59:59.000 this routine was generated at run [br]time as well. 
9:59:59.000,9:59:59.000 This is one of the important things about [br]execution environments like Java. 9:59:59.000,9:59:59.000 You don't have to know everything at [br]compile time. 9:59:59.000,9:59:59.000 You know a lot more information at [br]run time and you can use that 9:59:59.000,9:59:59.000 in theory to optimise. 9:59:59.000,9:59:59.000 You can switch off these clever routines. 9:59:59.000,9:59:59.000 For instance I've got a deactivate [br]here and we get back to the 9:59:59.000,9:59:59.000 slower performance we expected. 9:59:59.000,9:59:59.000 Again, this particular set of routines is [br]present in OpenJDK, 9:59:59.000,9:59:59.000 I think for all the architectures that support it. 9:59:59.000,9:59:59.000 We get this optimisation for free on X86 [br]and others as well. 9:59:59.000,9:59:59.000 It works quite well. 9:59:59.000,9:59:59.000 That was one surprise I came across: [br]the intrinsics. 9:59:59.000,9:59:59.000 One thing I thought it would be quite [br]good to do would be to go through 9:59:59.000,9:59:59.000 a slightly more complicated example. [br]And use this example to explain 9:59:59.000,9:59:59.000 a lot of other things that happen [br]in the JVM as well. 9:59:59.000,9:59:59.000 I will spend a bit of time going through [br]this example 9:59:59.000,9:59:59.000 and explain roughly the notion of what [br]it's supposed to be doing. 9:59:59.000,9:59:59.000 This is an imaginary method that I've [br]contrived to demonstrate a lot of points 9:59:59.000,9:59:59.000 in the fewest possible lines of code. 9:59:59.000,9:59:59.000 I'll start with what it's meant to do. 9:59:59.000,9:59:59.000 This is meant to be a routine that gets a[br]reference to something and lets you know 9:59:59.000,9:59:59.000 whether or not it's an image and in a [br]hypothetical cache. 9:59:59.000,9:59:59.000 I'll start with the important thing [br]here, the weak reference. 
9:59:59.000,9:59:59.000 In Java and other garbage collected [br]languages we have the notion of references. 9:59:59.000,9:59:59.000 Most of the time when you are running a [br]Java program you have something like a 9:59:59.000,9:59:59.000 variable name and that is in the current [br]execution context that is referred to as a 9:59:59.000,9:59:59.000 strong reference to the object. In other [br]words I can see it. I am using it. 9:59:59.000,9:59:59.000 Please don't get rid of it. [br]Bad things will happen if you do. 9:59:59.000,9:59:59.000 So the garbage collector knows [br]not to get rid of it. 9:59:59.000,9:59:59.000 In Java and other languages you also [br]have the notion of a weak reference. 9:59:59.000,9:59:59.000 This is essentially the programmer saying[br]to the virtual machine 9:59:59.000,9:59:59.000 "Look I kinda care about this but [br]just a little bit." 9:59:59.000,9:59:59.000 "If you want to get rid of it feel free [br]to but please let me know." 9:59:59.000,9:59:59.000 This is why this is for a cache class. [br]For instance the JVM in this particular 9:59:59.000,9:59:59.000 case could decide that it's running quite [br]low on memory this particular xMB image 9:59:59.000,9:59:59.000 has not been used for a while it can [br]garbage collect it. 9:59:59.000,9:59:59.000 The important thing is how we go about [br]expressing this in the language. 9:59:59.000,9:59:59.000 We can't just have a reference to the [br]object because that's a strong reference 9:59:59.000,9:59:59.000 and the JVM will know it can't get [br]rid of this because the program 9:59:59.000,9:59:59.000 can see it actively. 9:59:59.000,9:59:59.000 So we have a level of indirection which is [br]known as a weak reference. 9:59:59.000,9:59:59.000 We have this hypothetical CacheClass [br]that I've devised. 9:59:59.000,9:59:59.000 At this point it is a weak reference. 9:59:59.000,9:59:59.000 Then we get it. This is calling the weak [br]reference routine. 
9:59:59.000,9:59:59.000 Now it becomes a strong reference so [br]it's not going to be garbage collected. 9:59:59.000,9:59:59.000 When we get to the return path it becomes [br]a weak reference again 9:59:59.000,9:59:59.000 because our strong reference [br]has disappeared. 9:59:59.000,9:59:59.000 The salient points in this example are: 9:59:59.000,9:59:59.000 We're employing a method to get [br]a reference. 9:59:59.000,9:59:59.000 We're checking an item to see if [br]it's null. 9:59:59.000,9:59:59.000 So let's say that the JVM decided to [br]garbage collect this 9:59:59.000,9:59:59.000 before we executed the method. 9:59:59.000,9:59:59.000 The weak reference class is still valid [br]because we've got a strong reference to it 9:59:59.000,9:59:59.000 but the actual object behind this is gone. 9:59:59.000,9:59:59.000 If we're too late and the garbage [br]collector has killed it 9:59:59.000,9:59:59.000 it will be null and we return. 9:59:59.000,9:59:59.000 So it's a level of indirection to see [br]does this still exist 9:59:59.000,9:59:59.000 if so can I please have it and then [br]operate on it as normal 9:59:59.000,9:59:59.000 and then return becomes weak [br]reference again. 9:59:59.000,9:59:59.000 This example program is quite useful when[br]we look at how it's implemented in the JVM 9:59:59.000,9:59:59.000 and we'll go through a few things now. 9:59:59.000,9:59:59.000 First off we'll go through the byte code. 9:59:59.000,9:59:59.000 The only point of this slide is to [br]show it's roughly 9:59:59.000,9:59:59.000 the same as this. 9:59:59.000,9:59:59.000 We get our variable. 9:59:59.000,9:59:59.000 We use our getter. 9:59:59.000,9:59:59.000 This bit is extra this checkcast. [br]The reason that bit is extra is 9:59:59.000,9:59:59.000 because we're using the equivalent of [br]a template in Java. 
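The contrived method on the slide is not reproduced in this transcript. The following is a hedged reconstruction of its shape only: every name here (`WeakCacheSketch`, `CacheClass`, `getReference`, `Item`, `Image`, `isImageInCache`) is a hypothetical stand-in chosen for illustration. It shows the weak reference being turned into a strong one, the null check, and the virtual call that the talk walks through.

```java
import java.lang.ref.WeakReference;

public class WeakCacheSketch {
    // Hypothetical item types; isImage() is an ordinary virtual method.
    public static class Item {
        public boolean isImage() { return false; }
    }

    public static class Image extends Item {
        @Override
        public boolean isImage() { return true; }
    }

    // Hypothetical cache entry: a level of indirection around the real object.
    public static class CacheClass<T> {
        private final WeakReference<T> ref;

        public CacheClass(T referent) {
            this.ref = new WeakReference<>(referent);
        }

        // Returns the object, or null if the garbage collector has cleared it.
        public T getReference() {
            return ref.get();
        }
    }

    public static boolean isImageInCache(CacheClass<Item> cache) {
        // From here on 'item' is a strong reference, so it cannot be collected.
        Item item = cache.getReference();
        if (item == null) {
            return false; // too late, the collector already got rid of it
        }
        return item.isImage();
        // After returning, only the weak reference remains.
    }

    public static void main(String[] args) {
        Image image = new Image(); // strong reference held by the caller
        CacheClass<Item> cache = new CacheClass<>(image);
        System.out.println(isImageInCache(cache));
        // Touch 'image' afterwards so it stays strongly reachable during the call.
        System.out.println(image.isImage());
    }
}
```

Because `CacheClass` is generic, the compiled byte code for the caller contains a `checkcast` after `getReference()`, which is the extra instruction mentioned above.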
9:59:59.000,9:59:59.000 And the way that's implemented in Java is [br]it just basically casts everything to an 9:59:59.000,9:59:59.000 object so that requires extra [br]compiler information. 9:59:59.000,9:59:59.000 And this is the extra check. 9:59:59.000,9:59:59.000 The rest of this we load the reference, [br]we check to see if it is null, 9:59:59.000,9:59:59.000 If it's not null we invoke a virtual [br]function - is it the image? 9:59:59.000,9:59:59.000 and we return as normal. 9:59:59.000,9:59:59.000 Essentially the point I'm trying to make [br]is when we compile this to byte code 9:59:59.000,9:59:59.000 this execution happens. 9:59:59.000,9:59:59.000 This null check happens. 9:59:59.000,9:59:59.000 This execution happens. 9:59:59.000,9:59:59.000 And we return. 9:59:59.000,9:59:59.000 In the actual Java class files we've not [br]lost anything. 9:59:59.000,9:59:59.000 This is what it looks like when it's [br]been JIT'd. 9:59:59.000,9:59:59.000 Now we've lost lots of things. 9:59:59.000,9:59:59.000 The JIT has done quite a few clever things[br]which I'll talk about. 9:59:59.000,9:59:59.000 First off if we look down here there's [br]a single branch here. 9:59:59.000,9:59:59.000 And this is only if our check cast failed 9:59:59.000,9:59:59.000 If we've got comments on the [br]right hand side. 9:59:59.000,9:59:59.000 Our get method has been in-lined so [br]we're no longer calling. 9:59:59.000,9:59:59.000 We seem to have lost our null check,[br]that's just gone. 9:59:59.000,9:59:59.000 And again we've got a get field as well. 9:59:59.000,9:59:59.000 That's no longer a method, [br]that's been in-lined as well 9:59:59.000,9:59:59.000 We've also got some other cute things. 9:59:59.000,9:59:59.000 Those more familiar with AArch64 will [br]understand that the pointers we're using 9:59:59.000,9:59:59.000 are 32bit not 64bit. 9:59:59.000,9:59:59.000 What we're doing is getting a pointer [br]and shifting it left 3 9:59:59.000,9:59:59.000 and widening it to a 64bit pointer. 
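The shift-left-by-3 and widening step can be written out as plain arithmetic. This is only an illustration of the address calculation, not something Java code does itself (the JIT emits it in machine code), and the names `CompressedPointerSketch`, `decode`, `encode` and the `heapBase` parameter are assumptions introduced for the sketch.

```java
public class CompressedPointerSketch {
    // Objects are assumed to be 8-byte aligned, so the low 3 address bits are always zero.
    static final int SHIFT = 3;

    // Widen a 32-bit compressed pointer to a 64-bit address.
    public static long decode(long heapBase, int narrow) {
        long unsigned = narrow & 0xFFFFFFFFL; // treat the 32 bits as unsigned
        return heapBase + (unsigned << SHIFT);
    }

    // Compress a 64-bit address inside the heap back to 32 bits.
    public static int encode(long heapBase, long address) {
        return (int) ((address - heapBase) >>> SHIFT);
    }

    public static void main(String[] args) {
        // 2^32 distinct compressed values, each naming an 8-byte slot.
        long addressable = (1L << 32) << SHIFT;
        System.out.println(addressable + " bytes = " + (addressable >> 30) + " GiB");
    }
}
```

With a shift of 3, 32 bits cover 2^35 bytes, which is where the 32GB heap limit for this scheme comes from.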
9:59:59.000,9:59:59.000 We've also got 32bit pointers on a [br]64bit system as well. 9:59:59.000,9:59:59.000 So that's saving a reasonable amount [br]of memory and cache. 9:59:59.000,9:59:59.000 To summarise. We don't have any [br]branches or function calls 9:59:59.000,9:59:59.000 and we've got a lot of in-lining. 9:59:59.000,9:59:59.000 We did have function calls in the [br]class file so it's the JVM, 9:59:59.000,9:59:59.000 it's the JIT that has done this. 9:59:59.000,9:59:59.000 We've got no null checks either and I'm [br]going to talk through this now. 9:59:59.000,9:59:59.000 Null check elimination is quite a [br]clever feature in Java and other runtimes. 9:59:59.000,9:59:59.000 The idea behind null check elimination is 9:59:59.000,9:59:59.000 most of the time this object is not [br]going to be null. 9:59:59.000,9:59:59.000 If this object is null the operating [br]system knows this quite quickly. 9:59:59.000,9:59:59.000 So if you try to de-reference a null [br]pointer you'll get either a SIGSEGV or 9:59:59.000,9:59:59.000 a SIGBUS depending on a [br]few circumstances. 9:59:59.000,9:59:59.000 That goes straight back to the JVM 9:59:59.000,9:59:59.000 and the JVM knows where the null [br]exception took place. 9:59:59.000,9:59:59.000 Because it knows where the exception took [br]place it can look this up 9:59:59.000,9:59:59.000 and unwind it as part of an exception. 9:59:59.000,9:59:59.000 Those null checks just go.[br]Completely gone. 9:59:59.000,9:59:59.000 Most of the time this works and you are [br]saving a reasonable amount of execution. 9:59:59.000,9:59:59.000 I'll talk about when it doesn't work [br]in a second. 9:59:59.000,9:59:59.000 That's reasonably clever. We have similar [br]programming techniques in other places, 9:59:59.000,9:59:59.000 even the Linux kernel for instance: when [br]you copy data to and from user space 9:59:59.000,9:59:59.000 it does pretty much the same [br]thing. 
It has an exception unwind table 9:59:59.000,9:59:59.000 and it knows if it catches a page fault on[br]this particular program counter 9:59:59.000,9:59:59.000 it can deal with it because it knows [br]the program counter and it knows 9:59:59.000,9:59:59.000 conceptually what it was doing. 9:59:59.000,9:59:59.000 In a similar way the JIT knows what it's [br]doing to a reasonable degree. 9:59:59.000,9:59:59.000 It can handle the null check elimination. 9:59:59.000,9:59:59.000 I mentioned the sneaky one. We've got[br]essentially 32bit pointers 9:59:59.000,9:59:59.000 on a 64bit system. 9:59:59.000,9:59:59.000 Most of the time in Java people typically [br]specify heap size smaller than 32GB. 9:59:59.000,9:59:59.000 Which is perfect if you want to use 32bit [br]pointers and left shift 3. 9:59:59.000,9:59:59.000 Because that gives you 32GB of [br]addressable memory. 9:59:59.000,9:59:59.000 That's a significant memory saving because[br]otherwise a lot of things would double up. 9:59:59.000,9:59:59.000 There's a significant number of pointers [br]in Java. 9:59:59.000,9:59:59.000 The one that should make people [br]jump out of their seat is 9:59:59.000,9:59:59.000 the fact that most methods in Java are [br]actually virtual. 9:59:59.000,9:59:59.000 So what the JVM has actually done is [br]in-lined a virtual function. 9:59:59.000,9:59:59.000 A virtual function is essentially a [br]function where you don't know where 9:59:59.000,9:59:59.000 you're going until run time. 9:59:59.000,9:59:59.000 You can have several different classes [br]and they share the same virtual function 9:59:59.000,9:59:59.000 in the base class and dependent upon [br]which specific class you're running 9:59:59.000,9:59:59.000 different virtual functions will [br]get executed. 9:59:59.000,9:59:59.000 In C++ that will be a read from a vtable[br]and then you know where to go. 9:59:59.000,9:59:59.000 The JVM's in-lined it. 9:59:59.000,9:59:59.000 We've saved a memory load. 
9:59:59.000,9:59:59.000 We've saved a branch as well 9:59:59.000,9:59:59.000 The reason the JVM can in-line it is [br]because the JVM knows 9:59:59.000,9:59:59.000 every single class that has been loaded. 9:59:59.000,9:59:59.000 So it knows that although this looks [br]polymorphic to the casual programmer 9:59:59.000,9:59:59.000 It is actually monomorphic.[br]The JVM knows this.
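To make "looks polymorphic, is actually monomorphic" concrete, here is a small sketch with hypothetical names (`MonomorphicSketch`, `Shape`, `Square`, `totalSides`). The call to `sides()` is a virtual call in the source and in the byte code, but only one implementation is ever loaded, so a JIT that tracks loaded classes is free to treat the call site as having a single target. Whether it actually in-lines here depends on the JVM and is not something this code can show by itself.

```java
public class MonomorphicSketch {
    // Base class with a virtual method; on paper any subclass could override it.
    public abstract static class Shape {
        public abstract int sides();
    }

    // The only subclass that is ever loaded in this program.
    public static class Square extends Shape {
        @Override
        public int sides() { return 4; }
    }

    public static long totalSides(Shape[] shapes) {
        long total = 0;
        for (Shape s : shapes) {
            total += s.sides(); // virtual call site with one possible target
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1000];
        for (int i = 0; i < shapes.length; i++) {
            shapes[i] = new Square();
        }
        long total = 0;
        // Call repeatedly so the method becomes hot enough to be compiled.
        for (int round = 0; round < 10_000; round++) {
            total = totalSides(shapes);
        }
        System.out.println(total);
    }
}
```

If a second `Shape` subclass were loaded later, the single-target assumption would no longer hold; HotSpot handles that case by discarding the compiled code and recompiling.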