1 99:59:59,999 --> 99:59:59,999 Onto the second talk. 2 99:59:59,999 --> 99:59:59,999 Steve Capper is going to tell us about the good bits of Java. 3 99:59:59,999 --> 99:59:59,999 They do exist. 4 99:59:59,999 --> 99:59:59,999 [Audience] Could this have been a lightning talk? [Audience laughter] 5 99:59:59,999 --> 99:59:59,999 Believe it or not we've got some good stuff here. 6 99:59:59,999 --> 99:59:59,999 I was as skeptical as you guys when I first looked. 7 99:59:59,999 --> 99:59:59,999 First, apologies for not attending this mini-conf last year. 8 99:59:59,999 --> 99:59:59,999 I was unfortunately ill on the day I was due to give this talk. 9 99:59:59,999 --> 99:59:59,999 Let me figure out how to use a computer. 10 99:59:59,999 --> 99:59:59,999 Sorry about this. 11 99:59:59,999 --> 99:59:59,999 There we go; it's because I've not woken up. 12 99:59:59,999 --> 99:59:59,999 Last year I worked at Linaro in the Enterprise group and we performed analysis 13 99:59:59,999 --> 99:59:59,999 on 'Big Data' application sets. 14 99:59:59,999 --> 99:59:59,999 As many of you know, quite a lot of these big data applications are written in Java. 15 99:59:59,999 --> 99:59:59,999 I'm from ARM and we were very interested in 64-bit ARM support. 16 99:59:59,999 --> 99:59:59,999 So this is mainly AArch64 examples for things like assembler 17 99:59:59,999 --> 99:59:59,999 but most of the messages are pertinent for any architecture. 18 99:59:59,999 --> 99:59:59,999 These good bits are shared between most if not all the architectures. 19 99:59:59,999 --> 99:59:59,999 Whilst trying to optimise a lot of these big data applications 20 99:59:59,999 --> 99:59:59,999 I stumbled across quite a few things in the JVM and I thought 21 99:59:59,999 --> 99:59:59,999 'actually that's really clever; that's really cool' 22 99:59:59,999 --> 99:59:59,999 So I thought that would make a good basis for a talk.
23 99:59:59,999 --> 99:59:59,999 This talk is essentially some of the clever things I found in the 24 99:59:59,999 --> 99:59:59,999 Java Virtual Machine; these optimisations are in OpenJDK. 25 99:59:59,999 --> 99:59:59,999 The source is all there, readily available and in play now. 26 99:59:59,999 --> 99:59:59,999 I'm going to finish with some of the optimisation work we did with Java. 27 99:59:59,999 --> 99:59:59,999 People who know me will know I'm not a Java zealot. 28 99:59:59,999 --> 99:59:59,999 I don't particularly believe in programming in one language over another. 29 99:59:59,999 --> 99:59:59,999 So to make it clear from the outset I'm not attempting to convert 30 99:59:59,999 --> 99:59:59,999 anyone into a Java programmer. 31 99:59:59,999 --> 99:59:59,999 I'm just going to highlight a few salient things in the Java Virtual Machine 32 99:59:59,999 --> 99:59:59,999 which I found to be quite clever and interesting 33 99:59:59,999 --> 99:59:59,999 and I'll try and talk through them with my understanding of them. 34 99:59:59,999 --> 99:59:59,999 Let's jump straight in and start with an example. 35 99:59:59,999 --> 99:59:59,999 This is a minimal example for computing a SHA1 sum of a file. 36 99:59:59,999 --> 99:59:59,999 I've elided some of the checking at the beginning of the function, such as 37 99:59:59,999 --> 99:59:59,999 command line parsing and that sort of thing. 38 99:59:59,999 --> 99:59:59,999 I've highlighted the salient points in red. 39 99:59:59,999 --> 99:59:59,999 Essentially we instantiate a SHA1 crypto message digest service. 40 99:59:59,999 --> 99:59:59,999 And we do the equivalent in Java of an mmap. 41 99:59:59,999 --> 99:59:59,999 Get it all in memory. 42 99:59:59,999 --> 99:59:59,999 And then we just put this data straight into the crypto engine. 43 99:59:59,999 --> 99:59:59,999 And eventually at the end of the program we'll spit out the SHA1 hash.
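[Editor's note] The slide code is not part of this transcript. The following is only a minimal sketch of the kind of program described above (instantiate a SHA-1 digest, memory-map the file, feed the mapping to the digest, print the hash). The class name `Sha1Mmap`, the helper `sha1Hex` and the 1 GiB chunk size are assumptions made for illustration; the chunking is there because a single `FileChannel.map` call cannot map more than `Integer.MAX_VALUE` bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical reconstruction, not the speaker's actual slide code.
class Sha1Mmap {
    // Assumed chunk size: one mapping is limited to Integer.MAX_VALUE bytes,
    // so large files are mapped 1 GiB at a time.
    private static final long CHUNK = 1L << 30;

    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        // Instantiate the SHA-1 message digest service.
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                // The Java equivalent of an mmap of this region of the file.
                MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // Hand the mapped bytes straight to the crypto engine.
                digest.update(buf);
            }
        }
        // Render the 20-byte digest as lowercase hexadecimal.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Sha1Mmap <file>");
            System.exit(1);
        }
        System.out.println(sha1Hex(Paths.get(args[0])));
    }
}
```

Nothing in the sketch asks for an optimised SHA-1; it only uses the standard `MessageDigest` class from the class library.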
44 99:59:59,999 --> 99:59:59,999 It's a very simple program. 45 99:59:59,999 --> 99:59:59,999 It's basically mmap, SHA1, output the hash afterwards. 46 99:59:59,999 --> 99:59:59,999 In order to concentrate on the CPU aspect rather than worry about IO 47 99:59:59,999 --> 99:59:59,999 I decided to cheat a little in setting this up. 48 99:59:59,999 --> 99:59:59,999 I decided to use a sparse file. As many of you know a sparse file is a file where not 49 99:59:59,999 --> 99:59:59,999 all the contents are necessarily stored on disc. The assumption is that the bits 50 99:59:59,999 --> 99:59:59,999 that aren't stored are zero. For instance on Linux you can create a 20TB sparse file 51 99:59:59,999 --> 99:59:59,999 on a 10MB file system and use it as normal. 52 99:59:59,999 --> 99:59:59,999 Just don't write too much to it otherwise you're going to run out of space. 53 99:59:59,999 --> 99:59:59,999 The idea behind using a sparse file is I'm just focusing on the computational aspects 54 99:59:59,999 --> 99:59:59,999 of the SHA1 sum. I'm not worried about the file system or anything like that. 55 99:59:59,999 --> 99:59:59,999 I don't want to worry about the IO. I just want to focus on the actual compute. 56 99:59:59,999 --> 99:59:59,999 In order to set up a sparse file I used the following runes. 57 99:59:59,999 --> 99:59:59,999 The important point is that you seek and the other important point 58 99:59:59,999 --> 99:59:59,999 is you set a count otherwise you'll fill your disc up. 59 99:59:59,999 --> 99:59:59,999 I decided to run this, firstly, against the native SHA1 sum command 60 99:59:59,999 --> 99:59:59,999 that's built into Linux; let's normalise these results and say that's 1.0. 61 99:59:59,999 --> 99:59:59,999 I used an older version of OpenJDK and ran the Java program 62 99:59:59,999 --> 99:59:59,999 and that took 1.09 times as long as the reference command. That's quite good.
63 99:59:59,999 --> 99:59:59,999 Then I used the new OpenJDK; this is now the current JDK as this is a year on. 64 99:59:59,999 --> 99:59:59,999 And it took 0.21 of the time. It's significantly faster. 65 99:59:59,999 --> 99:59:59,999 I stress that I've done nothing surreptitious in the Java program. 66 99:59:59,999 --> 99:59:59,999 It is mmap, compute, spit result out. 67 99:59:59,999 --> 99:59:59,999 But OpenJDK has essentially got some more context information. 68 99:59:59,999 --> 99:59:59,999 I'll talk about that as we go through. 69 99:59:59,999 --> 99:59:59,999 Before, when I started with Java, I had a very simplistic view of Java. 70 99:59:59,999 --> 99:59:59,999 Traditionally Java is taught as a virtual machine that runs byte code. 71 99:59:59,999 --> 99:59:59,999 Now when you compile a Java program it compiles into byte code. 72 99:59:59,999 --> 99:59:59,999 The older versions of the Java Virtual Machine would interpret this byte code 73 99:59:59,999 --> 99:59:59,999 and then run through. Newer versions would employ a just-in-time engine and try and 74 99:59:59,999 --> 99:59:59,999 compile this byte code into native machine code. 75 99:59:59,999 --> 99:59:59,999 That is not the only thing that goes on when you run a Java program. 76 99:59:59,999 --> 99:59:59,999 There are some extra optimisations as well. So this alone would not account for 77 99:59:59,999 --> 99:59:59,999 the newer version of the SHA1 sum being significantly faster 78 99:59:59,999 --> 99:59:59,999 than the distro-supplied one. 79 99:59:59,999 --> 99:59:59,999 Java knows about context. It has a class library and these class libraries 80 99:59:59,999 --> 99:59:59,999 have reasonably well defined purposes. 81 99:59:59,999 --> 99:59:59,999 We have classes that provide crypto services. 82 99:59:59,999 --> 99:59:59,999 We have some misc unsafe that every single project seems to pull into their 83 99:59:59,999 --> 99:59:59,999 project when they're not supposed to.
84 99:59:59,999 --> 99:59:59,999 These have well defined meanings. 85 99:59:59,999 --> 99:59:59,999 These do not necessarily have to be written in Java. 86 99:59:59,999 --> 99:59:59,999 They come as Java classes, they come supplied. 87 99:59:59,999 --> 99:59:59,999 But most JVMs now have a notion of a virtual machine intrinsic. 88 99:59:59,999 --> 99:59:59,999 And the virtual machine intrinsic says OK, please do a SHA1 in the best possible way 89 99:59:59,999 --> 99:59:59,999 that your implementation allows. This is something done automatically by the JVM. 90 99:59:59,999 --> 99:59:59,999 You don't ask for it. If the JVM knows what it's running on and it's reasonably 91 99:59:59,999 --> 99:59:59,999 recent this will just happen for you for free. 92 99:59:59,999 --> 99:59:59,999 And there's quite a few classes that do this. 93 99:59:59,999 --> 99:59:59,999 There's quite a few clever things with atomics, there's crypto, 94 99:59:59,999 --> 99:59:59,999 there's mathematical routines as well. Most of these routines in the 95 99:59:59,999 --> 99:59:59,999 class library have a well defined notion of a virtual machine intrinsic 96 99:59:59,999 --> 99:59:59,999 and they do run reasonably optimally. 97 99:59:59,999 --> 99:59:59,999 They are subject to continuous optimisation as well. 98 99:59:59,999 --> 99:59:59,999 We've got some runes that are presented on the slides here. 99 99:59:59,999 --> 99:59:59,999 These are quite useful if you are interested in 100 99:59:59,999 --> 99:59:59,999 how these intrinsics are made. 101 99:59:59,999 --> 99:59:59,999 You can ask the JVM to print out a lot of the just-in-time compiled code. 102 99:59:59,999 --> 99:59:59,999 You can ask the JVM to print out the native methods as well as these intrinsics 103 99:59:59,999 --> 99:59:59,999 and in this particular case after sifting through about 5MB of text 104 99:59:59,999 --> 99:59:59,999 I've come across this particular SHA1 sum implementation.
105 99:59:59,999 --> 99:59:59,999 This is AArch64. This is employing the cryptographic extensions 106 99:59:59,999 --> 99:59:59,999 in the architecture. So it's essentially using the CPU instructions which 107 99:59:59,999 --> 99:59:59,999 would explain why it's faster. But again it's done all this automatically. 108 99:59:59,999 --> 99:59:59,999 This did not require any specific runes or anything to activate. 109 99:59:59,999 --> 99:59:59,999 We'll see a bit later on how you can more easily find the hot spots 110 99:59:59,999 --> 99:59:59,999 rather than sifting through a lot of assembler. 111 99:59:59,999 --> 99:59:59,999 I've mentioned that the cryptographic engine is employed and again 112 99:59:59,999 --> 99:59:59,999 this routine was generated at run time as well. 113 99:59:59,999 --> 99:59:59,999 This is one of the important things about certain execution environments like Java. 114 99:59:59,999 --> 99:59:59,999 You don't have to know everything at compile time. 115 99:59:59,999 --> 99:59:59,999 You know a lot more information at run time and you can use that 116 99:59:59,999 --> 99:59:59,999 in theory to optimise. 117 99:59:59,999 --> 99:59:59,999 You can switch off these clever routines. 118 99:59:59,999 --> 99:59:59,999 For instance I've got a deactivate here and we get back to the 119 99:59:59,999 --> 99:59:59,999 slower performance we expected. 120 99:59:59,999 --> 99:59:59,999 Again, this particular set of routines is present in OpenJDK, 121 99:59:59,999 --> 99:59:59,999 I think for all the architectures that support it. 122 99:59:59,999 --> 99:59:59,999 We get this optimisation for free on x86 and others as well. 123 99:59:59,999 --> 99:59:59,999 It works quite well. 124 99:59:59,999 --> 99:59:59,999 That was one surprise I came across: the intrinsics. 125 99:59:59,999 --> 99:59:59,999 One thing I thought it would be quite good to do would be to go through 126 99:59:59,999 --> 99:59:59,999 a slightly more complicated example.
And use this example to explain 127 99:59:59,999 --> 99:59:59,999 a lot of other things that happen in the JVM as well. 128 99:59:59,999 --> 99:59:59,999 I will spend a bit of time going through this example 129 99:59:59,999 --> 99:59:59,999 and explain roughly the notion of what it's supposed to be doing. 130 99:59:59,999 --> 99:59:59,999 This is an imaginary method that I've contrived to demonstrate a lot of points 131 99:59:59,999 --> 99:59:59,999 in the fewest possible lines of code. 132 99:59:59,999 --> 99:59:59,999 I'll start with what it's meant to do. 133 99:59:59,999 --> 99:59:59,999 This is meant to be a routine that gets a reference to something and lets you know 134 99:59:59,999 --> 99:59:59,999 whether or not it's an image and in a hypothetical cache. 135 99:59:59,999 --> 99:59:59,999 I'll start with the important thing here, the weak reference. 136 99:59:59,999 --> 99:59:59,999 In Java and other garbage collected languages we have the notion of references. 137 99:59:59,999 --> 99:59:59,999 Most of the time when you are running a Java program you have something like a 138 99:59:59,999 --> 99:59:59,999 variable name in the current execution context, and that is referred to as a 139 99:59:59,999 --> 99:59:59,999 strong reference to the object. In other words I can see it. I am using it. 140 99:59:59,999 --> 99:59:59,999 Please don't get rid of it. Bad things will happen if you do. 141 99:59:59,999 --> 99:59:59,999 So the garbage collector knows not to get rid of it. 142 99:59:59,999 --> 99:59:59,999 In Java and other languages you also have the notion of a weak reference. 143 99:59:59,999 --> 99:59:59,999 This is essentially the programmer saying to the virtual machine 144 99:59:59,999 --> 99:59:59,999 "Look I kinda care about this but just a little bit." 145 99:59:59,999 --> 99:59:59,999 "If you want to get rid of it feel free to but please let me know." 146 99:59:59,999 --> 99:59:59,999 This is why this is for a cache class.
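[Editor's note] As a rough illustration of the strong versus weak distinction just described, here is a small sketch using `java.lang.ref.WeakReference`. It is not the contrived method from the slide; the class `WeakImageCache`, its methods and the use of `byte[]` as a stand-in for an image are all assumptions made for this example.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache, for illustration only; byte[] stands in for image data.
class WeakImageCache {
    // The map holds only weak references, so the cache alone
    // does not keep any image alive.
    private final Map<String, WeakReference<byte[]>> entries = new HashMap<>();

    void put(String name, byte[] image) {
        entries.put(name, new WeakReference<>(image));
    }

    // Returns the cached image, or null if it was never cached
    // or the garbage collector has already reclaimed it.
    byte[] get(String name) {
        WeakReference<byte[]> ref = entries.get(name);
        if (ref == null) {
            return null;
        }
        byte[] image = ref.get(); // a strong reference from here on, or null
        if (image == null) {
            entries.remove(name); // the JVM "let us know": drop the stale entry
        }
        return image;
    }

    public static void main(String[] args) {
        WeakImageCache cache = new WeakImageCache();
        byte[] logo = new byte[1024]; // strong reference held by this method
        cache.put("logo", logo);
        System.gc(); // only a hint; 'logo' is still strongly reachable below
        System.out.println(cache.get("logo") == logo);
    }
}
```

While a strong reference such as `logo` is in scope, the entry cannot be collected. Once no strong reference remains, a later `get` may return `null`, and the caller has to cope with that.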
For instance the JVM in this particular 147 99:59:59,999 --> 99:59:59,999 case could decide that it's running quite low on memory and, if this particular xMB image 148 99:59:59,999 --> 99:59:59,999 has not been used for a while, it can garbage collect it. 149 99:59:59,999 --> 99:59:59,999 The important thing is how we go about expressing this in the language. 150 99:59:59,999 --> 99:59:59,999 We can't just have a reference to the object because that's a strong reference 151 99:59:59,999 --> 99:59:59,999 and the JVM will know it can't get rid of this because the program 152 99:59:59,999 --> 99:59:59,999 can see it actively. 153 99:59:59,999 --> 99:59:59,999 So we have a level of indirection which is known as