1 99:59:59,999 --> 99:59:59,999 (Offscreen) So welcome to the talk from Niels 2 99:59:59,999 --> 99:59:59,999 about dependency resolution in polynomial time. 3 99:59:59,999 --> 99:59:59,999 (Neils) I'm Neils, I'm doing my second talk today. 4 99:59:59,999 --> 99:59:59,999 This time about something slightly more technical. 5 99:59:59,999 --> 99:59:59,999 I wanted to do dependency resolution in deterministic polynomial time, 6 99:59:59,999 --> 99:59:59,999 which is what this is all about. 7 99:59:59,999 --> 99:59:59,999 I didn't get as far with this as I wanted. 8 99:59:59,999 --> 99:59:59,999 I wanted to have a tool ready and have some results, which I didn't quite finish. 9 99:59:59,999 --> 99:59:59,999 So these will be my findings work in progress notes. 10 99:59:59,999 --> 99:59:59,999 We have six topics I have to cover. 11 99:59:59,999 --> 99:59:59,999 A quick introduction. 12 99:59:59,999 --> 99:59:59,999 Then something about the hard problem. 13 99:59:59,999 --> 99:59:59,999 Then the part the part where it actually turns out to be highly tractable 14 99:59:59,999 --> 99:59:59,999 in practice, if you don't think too much about it. 15 99:59:59,999 --> 99:59:59,999 And then there is some installing, upgrading, and installing again 16 99:59:59,999 --> 99:59:59,999 for how to improve your solvers. 17 99:59:59,999 --> 99:59:59,999 Starting with introduction. 18 99:59:59,999 --> 99:59:59,999 I'm hoping mostly to debunk the myth that dependency resolution is a very 19 99:59:59,999 --> 99:59:59,999 hard problem that we cannot solve, 20 99:59:59,999 --> 99:59:59,999 and we should remove packages in the hopes that they will somehow keep the archive 21 99:59:59,999 --> 99:59:59,999 from exploding, and the dependency resolvers to break, and apt to die, 22 99:59:59,999 --> 99:59:59,999 and god knows what. 23 99:59:59,999 --> 99:59:59,999 I've been working on Britney, which is a different problem to solve, but it 24 99:59:59,999 --> 99:59:59,999 involves the same techniques, or uses some of the same techniques. 25 99:59:59,999 --> 99:59:59,999 Some will be directly applicable to apt, some will not. 26 99:59:59,999 --> 99:59:59,999 There's not a one size fits all solution, sorry. 27 99:59:59,999 --> 99:59:59,999 And I much less haven't wrote it yet, even if there was. 28 99:59:59,999 --> 99:59:59,999 So defining the problem. 29 99:59:59,999 --> 99:59:59,999 When you try to install a package on your system, 30 99:59:59,999 --> 99:59:59,999 there's actually two problems being solved. 31 99:59:59,999 --> 99:59:59,999 One is the part where apt figures out if I have to install Eclipse. 32 99:59:59,999 --> 99:59:59,999 You'll be needing a lot of Java packages, you'll be needing a lot of other stuff. 33 99:59:59,999 --> 99:59:59,999 And so it figures out which packages are needed for this to make sense. 34 99:59:59,999 --> 99:59:59,999 The separate part of the problem, the second problem is, 35 99:59:59,999 --> 99:59:59,999 once you have this list, they have to be unpackaged and configured in a certain 36 99:59:59,999 --> 99:59:59,999 order for this all to make sense. 37 99:59:59,999 --> 99:59:59,999 They are in theory two distinct problems. 38 99:59:59,999 --> 99:59:59,999 I'll be talking about the first problem, 39 99:59:59,999 --> 99:59:59,999 because that's actually dependency resolution. 40 99:59:59,999 --> 99:59:59,999 The other thing is just ordering. 41 99:59:59,999 --> 99:59:59,999 The ordering thing is certainly also very important part. 42 99:59:59,999 --> 99:59:59,999 I don't want to dismiss that, it's just 43 99:59:59,999 --> 99:59:59,999 it is in fact theoretically a polynomial time problem. 44 99:59:59,999 --> 99:59:59,999 So to solve the ordering problem, you basically compute all the action- 45 99:59:59,999 --> 99:59:59,999 Given a set of actions that need to be done, 46 99:59:59,999 --> 99:59:59,999 then there are some constraints between some of them, 47 99:59:59,999 --> 99:59:59,999 like you unpack things before you configure it. 48 99:59:59,999 --> 99:59:59,999 Maybe you deconfigure something else before that. 49 99:59:59,999 --> 99:59:59,999 So you end up with a list of partial ?? constraints. 50 99:59:59,999 --> 99:59:59,999 From that you build a graph with ordering. 51 99:59:59,999 --> 99:59:59,999 It's fairly simple to do, in practice, if you have the right tools for it. 52 99:59:59,999 --> 99:59:59,999 Then without cycles, this is a trivial things where it just orders it 53 99:59:59,999 --> 99:59:59,999 and gives you this is the order, go fix. 54 99:59:59,999 --> 99:59:59,999 When there's cycles you will get a single action consisting of multiple things 55 99:59:59,999 --> 99:59:59,999 to do at the same time, which is really impossible to do. 56 99:59:59,999 --> 99:59:59,999 It turns out that the way we defined this whole thing, 57 99:59:59,999 --> 99:59:59,999 if you have a package that does not have a post install script, 58 99:59:59,999 --> 99:59:59,999 it does not need to be configured separately. 59 99:59:59,999 --> 99:59:59,999 That means that it's just unpacked or not unpacked. 60 99:59:59,999 --> 99:59:59,999 That tends to break most cylcles. 61 99:59:59,999 --> 99:59:59,999 If you want, there's problems with ordering constraints. 62 99:59:59,999 --> 99:59:59,999 Help us to find a way to get rid of most of the post install scripts, 63 99:59:59,999 --> 99:59:59,999 such as the ones just running lvconfig and nothing else. 64 99:59:59,999 --> 99:59:59,999 That could solve some of the cycles, 65 99:59:59,999 --> 99:59:59,999 and otherwise it's just feeding it to dpkg and it works. 66 99:59:59,999 --> 99:59:59,999 If you have a cycle, it's a separate problem fixing that. 67 99:59:59,999 --> 99:59:59,999 I won't be covering that. 68 99:59:59,999 --> 99:59:59,999 As I said, the runtime is polynomial. 69 99:59:59,999 --> 99:59:59,999 It's something like the size of the package, a regular graph. 70 99:59:59,999 --> 99:59:59,999 Which is fairly polynomial. 71 99:59:59,999 --> 99:59:59,999 This even covers finding cycles. 72 99:59:59,999 --> 99:59:59,999 Again breaking cycles is just removing post install scripts. 73 99:59:59,999 --> 99:59:59,999 And of course if you think this is too simple, 74 99:59:59,999 --> 99:59:59,999 you can always implement more features on top of it, 75 99:59:59,999 --> 99:59:59,999 such as trying to optimize for minimal downtime of services and all that. 76 99:59:59,999 --> 99:59:59,999 You can make any trivial problem very hard very soon with that. 77 99:59:59,999 --> 99:59:59,999 So, the hard problem. 78 99:59:59,999 --> 99:59:59,999 First of all, the players. 79 99:59:59,999 --> 99:59:59,999 We've got apt, aptitude, cupt, whatever. 80 99:59:59,999 --> 99:59:59,999 We've got Britney. 81 99:59:59,999 --> 99:59:59,999 We've got DOSE, edos, whatever its name is today. 82 99:59:59,999 --> 99:59:59,999 They solve the hard problem. 83 99:59:59,999 --> 99:59:59,999 Or they should be trying to do. 84 99:59:59,999 --> 99:59:59,999 Notable tools not affected by it. 85 99:59:59,999 --> 99:59:59,999 dpkg. 86 99:59:59,999 --> 99:59:59,999 So dpkg surely figures out if you are missing dependencies. It says, 87 99:59:59,999 --> 99:59:59,999 "I cannot resolve this, and I don't know where to fetch it, so I'm just giving up." 88 99:59:59,999 --> 99:59:59,999 And that is the only thing you can do. 89 99:59:59,999 --> 99:59:59,999 Which means it only verifies the solution which is known to be polynomial, 90 99:59:59,999 --> 99:59:59,999 and is therefore is not a game player. 91 99:59:59,999 --> 99:59:59,999 DAK rm is also doing a polynomial time check, it just happens to be so 92 99:59:59,999 --> 99:59:59,999 for other reasons. 93 99:59:59,999 --> 99:59:59,999 That's basically the known players. 94 99:59:59,999 --> 99:59:59,999 There are probably others. 95 99:59:59,999 --> 99:59:59,999 But we're moving onto the hard problem itself 96 99:59:59,999 --> 99:59:59,999 So the problem we actually have is, 97 99:59:59,999 --> 99:59:59,999 we've got version dependencies, we've got alternative choices, 98 99:59:59,999 --> 99:59:59,999 we've got virtual packages. 99 99:59:59,999 --> 99:59:59,999 All three makes this problem hard. 100 99:59:59,999 --> 99:59:59,999 You basically have to remove all possible alternative choices, guesses, whatever, 101 99:59:59,999 --> 99:59:59,999 to make this simple. 102 99:59:59,999 --> 99:59:59,999 It just so happens if you move version dependencies among other things, 103 99:59:59,999 --> 99:59:59,999 things become really fun. 104 99:59:59,999 --> 99:59:59,999 ?? to solve the problem, but upgrading gets really really spicy. 105 99:59:59,999 --> 99:59:59,999 So how do we ensure that dpkg is configured before your new package, 106 99:59:59,999 --> 99:59:59,999 use a feature new dev package is installed. 107 99:59:59,999 --> 99:59:59,999 That's sort of impossible. 108 99:59:59,999 --> 99:59:59,999 It becomes simple, but we end up with a broken dependency ??, 109 99:59:59,999 --> 99:59:59,999 so it's not really an option. 110 99:59:59,999 --> 99:59:59,999 Technically there's also a way to make it simple by having multiple versions 111 99:59:59,999 --> 99:59:59,999 of things, and removing negative dependencies, 112 99:59:59,999 --> 99:59:59,999 but that's not necessarily easy to do either. 113 99:59:59,999 --> 99:59:59,999 Presumably file configs will get in our way or something. 114 99:59:59,999 --> 99:59:59,999 We have to redefine where we place all of the files, that's not going to be 115 99:59:59,999 --> 99:59:59,999 something we're doing anytime soon. 116 99:59:59,999 --> 99:59:59,999 So yeah, it's not really an option. 117 99:59:59,999 --> 99:59:59,999 To give you a better understanding of the problem, please consider this example. 118 99:59:59,999 --> 99:59:59,999 If we have coreutils at some version, depending on either this version 119 99:59:59,999 --> 99:59:59,999 of libc or a newer one. 120 99:59:59,999 --> 99:59:59,999 From this can we immediately conclude 121 99:59:59,999 --> 99:59:59,999 that if libc 2.19 is known to be installable, 122 99:59:59,999 --> 99:59:59,999 that coreutils will also be installable? 123 99:59:59,999 --> 99:59:59,999 How many people think we can immediately conclude something from this? 124 99:59:59,999 --> 99:59:59,999 How many people think we cannot conclude anything from this? 125 99:59:59,999 --> 99:59:59,999 We'll that's good. 126 99:59:59,999 --> 99:59:59,999 So it turns out that with negative dependencies and without negative 127 99:59:59,999 --> 99:59:59,999 dependencies, it doesn't really matter. 128 99:59:59,999 --> 99:59:59,999 With negative dependencies, libc or whatever it depends on could 129 99:59:59,999 --> 99:59:59,999 just conflict with coreutils, we're broken, it's out. 130 99:59:59,999 --> 99:59:59,999 And without negative dependencies, you could have one of the things where 131 99:59:59,999 --> 99:59:59,999 it depends on a version that's newer or older, and no. 132 99:59:59,999 --> 99:59:59,999 It's not really trivial to do. 133 99:59:59,999 --> 99:59:59,999 You can't conclude something locally, in theory. 134 99:59:59,999 --> 99:59:59,999 That's something that can become a problem. 135 99:59:59,999 --> 99:59:59,999 Anyway, it is highly tractable in practice. 136 99:59:59,999 --> 99:59:59,999 Because if you do have a break, conflicts, or otherwise negative dependency, 137 99:59:59,999 --> 99:59:59,999 it tends to be upper bound. 138 99:59:59,999 --> 99:59:59,999 So if the previous version of coreutils was installable with that version, 139 99:59:59,999 --> 99:59:59,999 the new version will also probably be. 140 99:59:59,999 --> 99:59:59,999 Likewise, most dependencies, circular or otherwise, tend to be upper bound. 141 99:59:59,999 --> 99:59:59,999 There are cases of version ranges which is a lower bound and upper bound 142 99:59:59,999 --> 99:59:59,999 at the same time. 143 99:59:59,999 --> 99:59:59,999 They exist, they are just not so common yet. 144 99:59:59,999 --> 99:59:59,999 And that's a good thing. 145 99:59:59,999 --> 99:59:59,999 The number of possible solutions to any clause 146 99:59:59,999 --> 99:59:59,999 actually tends to be fairly limited. 147 99:59:59,999 --> 99:59:59,999 Of course there are the exceptions. 148 99:59:59,999 --> 99:59:59,999 Packages from unstable. 149 99:59:59,999 --> 99:59:59,999 They might just be missing, in which case it's really hard to solve the dependency. 150 99:59:59,999 --> 99:59:59,999 You've got mutually exclusive packages all the sendmail stuff, MTAs. 151 99:59:59,999 --> 99:59:59,999 The version ranges I mentioned before. 152 99:59:59,999 --> 99:59:59,999 And then of course the strictly equal versions, which we see inside 153 99:59:59,999 --> 99:59:59,999 packages built from the same source packages. 154 99:59:59,999 --> 99:59:59,999 The redeeming feature here: same source. 155 99:59:59,999 --> 99:59:59,999 Because that is actually very simple to figure out 156 99:59:59,999 --> 99:59:59,999 if they are from the same source. 157 99:59:59,999 --> 99:59:59,999 So they tend to be upgraded ?? anyhow. 158 99:59:59,999 --> 99:59:59,999 The new versions tend to be available at the same time sorta thing. 159 99:59:59,999 --> 99:59:59,999 The problem made hard in a nutshell, is this contrived example for example. 160 99:59:59,999 --> 99:59:59,999 You have a package you want to try, and it depends on any number of foos. 161 99:59:59,999 --> 99:59:59,999 Each of the foos can either be solved by 162 99:59:59,999 --> 99:59:59,999 picking up the bar package or the good one. 163 99:59:59,999 --> 99:59:59,999 Which might be a hint to which one we would choose if we want to solve it. 164 99:59:59,999 --> 99:59:59,999 And the bar packages, depends on a bad package that does not work 165 99:59:59,999 --> 99:59:59,999 with the starting-package, and we are broken. 166 99:59:59,999 --> 99:59:59,999 And the good package just doesn't have any dependencies and is therefore good. 167 99:59:59,999 --> 99:59:59,999 In the perfect example you saw, 168 99:59:59,999 --> 99:59:59,999 you try foo, good, foo1, and then good and be done. 169 99:59:59,999 --> 99:59:59,999 In practice, if you don't have some way of ordering these so you know 170 99:59:59,999 --> 99:59:59,999 one of them is better than the other. 171 99:59:59,999 --> 99:59:59,999 You now have an n times m trace, where you have to guess up and down, 172 99:59:59,999 --> 99:59:59,999 which one am I doing, backtrack back and forth and all, and it's going to be horrible. 173 99:59:59,999 --> 99:59:59,999 And this contrived example can of course be solved by some solvers that can see 174 99:59:59,999 --> 99:59:59,999 the pattern, but you can always a more contrived pattern 175 99:59:59,999 --> 99:59:59,999 that's harder to understand. 176 99:59:59,999 --> 99:59:59,999 So I tried to keep it somewhat simple. 177 99:59:59,999 --> 99:59:59,999 The good thing is, very few people write software like this. 178 99:59:59,999 --> 99:59:59,999 And that is the most redeeming feature about the package ??. 179 99:59:59,999 --> 99:59:59,999 We do have exceptions. 180 99:59:59,999 --> 99:59:59,999 Aspell-dictionaries, ispell-dictionaries. 181 99:59:59,999 --> 99:59:59,999 Basically if you depend on ispell-dictionary, 182 99:59:59,999 --> 99:59:59,999 which is a virtual package provided by 50 different ispell packages, 183 99:59:59,999 --> 99:59:59,999 or aspell packages, and then they depend on something else after that. 184 99:59:59,999 --> 99:59:59,999 So in theory they can be a part of creating this problem. 185 99:59:59,999 --> 99:59:59,999 Fortunately, they themselves are fairly simple once you've passed them. 186 99:59:59,999 --> 99:59:59,999 You also have multiarch foreign, or multiarch allowed with 187 99:59:59,999 --> 99:59:59,999 package interdependencies. 188 99:59:59,999 --> 99:59:59,999 So if you have multiple architectures, you in theory can have any number 189 99:59:59,999 --> 99:59:59,999 of things satisfying the same clause. 190 99:59:59,999 --> 99:59:59,999 This gives extra unnecssary work for most resolvers, 191 99:59:59,999 --> 99:59:59,999 if you enable them to do multiarch and cross things. 192 99:59:59,999 --> 99:59:59,999 We have 12 of them, but I suspect most users of multiarch are somewhat limited 193 99:59:59,999 --> 99:59:59,999 to maybe 3. 194 99:59:59,999 --> 99:59:59,999 Possible exception being people like writing resolvers and trying them out, 195 99:59:59,999 --> 99:59:59,999 torture test them. 196 99:59:59,999 --> 99:59:59,999 The real problem, if you had to do this in the archive you would need to do, 197 99:59:59,999 --> 99:59:59,999 for example something like, write indifferent awk implementations. 198 99:59:59,999 --> 99:59:59,999 We have three. 199 99:59:59,999 --> 99:59:59,999 And then afterwards, for example, have m different distinct, 200 99:59:59,999 --> 99:59:59,999 but equally valid libc implementations. 201 99:59:59,999 --> 99:59:59,999 It's not something we're likely to do in the near future. 202 99:59:59,999 --> 99:59:59,999 And you have to do this on multiple levels, 203 99:59:59,999 --> 99:59:59,999 because you can just presolve the essential set most of the time. 204 99:59:59,999 --> 99:59:59,999 The real problem: if you had to do this in the archive, you would need to do, 205 99:59:59,999 --> 99:59:59,999 for example, something like write N different awk implementations. 206 99:59:59,999 --> 99:59:59,999 We have 3. 207 99:59:59,999 --> 99:59:59,999 Then afterwards for example, have M different distinct but equally valid 208 99:59:59,999 --> 99:59:59,999 libc implementations. 209 99:59:59,999 --> 99:59:59,999 It's not something we're likely to do in the near future. 210 99:59:59,999 --> 99:59:59,999 And you have to do this on multiple levels. 211 99:59:59,999 --> 99:59:59,999 Because you can just presolve the essential set most of the time, 212 99:59:59,999 --> 99:59:59,999 this particular thing is not very interesting. 213 99:59:59,999 --> 99:59:59,999 And the data packages can in theory blow up, 214 99:59:59,999 --> 99:59:59,999 when you have truly interchangable data, like we do with the aspell packages. 215 99:59:59,999 --> 99:59:59,999 But they tend to either not have any dependencies after the aspell, 216 99:59:59,999 --> 99:59:59,999 after the data package, or they have a simple loopback into what you came from. 217 99:59:59,999 --> 99:59:59,999 So the blowup blowup tends to be limited to one level. 218 99:59:59,999 --> 99:59:59,999 If you have enough of those, it's still going to be a problem, 219 99:59:59,999 --> 99:59:59,999 but it keeps it simple. 220 99:59:59,999 --> 99:59:59,999 Onto installing. 221 99:59:59,999 --> 99:59:59,999 The easier case, as mentioned is when you have a single suit 222 99:59:59,999 --> 99:59:59,999 and a single architecture. 223 99:59:59,999 --> 99:59:59,999 All things in the graph collapse into what looks like a regular graph, 224 99:59:59,999 --> 99:59:59,999 and therefore become simple, trivial. 225 99:59:59,999 --> 99:59:59,999 This actually happens so much that if you take Lintian, 226 99:59:59,999 --> 99:59:59,999 it has 26 distinct dependency clauses. 227 99:59:59,999 --> 99:59:59,999 If you look at each of them, 228 99:59:59,999 --> 99:59:59,999 when you only have a single suite and architecture, there's at most 229 99:59:59,999 --> 99:59:59,999 one package directly solving each clause locally. 230 99:59:59,999 --> 99:59:59,999 Then further down the graph somewhere you depend on for example, debconf, 231 99:59:59,999 --> 99:59:59,999 or debconf-2.0 which is one of these alternatives, giving rise to its ?? 232 99:59:59,999 --> 99:59:59,999 Linitan itself is not subject to becoming a backtrack point. 233 99:59:59,999 --> 99:59:59,999 There's no guesswork when you reach Lintian, which of its dependencies 234 99:59:59,999 --> 99:59:59,999 should I pick, which version of it and all. 235 99:59:59,999 --> 99:59:59,999 That's not there, you have to go further down to see that.