9:59:59.000,9:59:59.000 (Offscreen) So welcome to the talk from Niels 9:59:59.000,9:59:59.000 about dependency resolution in[br]polynomial time. 9:59:59.000,9:59:59.000 (Neils) I'm Neils, I'm doing my second[br]talk today. 9:59:59.000,9:59:59.000 This time about something slightly[br]more technical. 9:59:59.000,9:59:59.000 I wanted to do dependency resolution[br]in deterministic polynomial time, 9:59:59.000,9:59:59.000 which is what this is all about. 9:59:59.000,9:59:59.000 I didn't get as far with this as I wanted. 9:59:59.000,9:59:59.000 I wanted to have a tool ready and have[br]some results, which I didn't quite finish. 9:59:59.000,9:59:59.000 So these will be my findings work in[br]progress notes. 9:59:59.000,9:59:59.000 We have six topics I have to cover. 9:59:59.000,9:59:59.000 A quick introduction. 9:59:59.000,9:59:59.000 Then something about the hard problem. 9:59:59.000,9:59:59.000 Then the part the part where it actually[br]turns out to be highly tractable 9:59:59.000,9:59:59.000 in practice, if you don't think too much[br]about it. 9:59:59.000,9:59:59.000 And then there is some installing,[br]upgrading, and installing again 9:59:59.000,9:59:59.000 for how to improve your solvers. 9:59:59.000,9:59:59.000 Starting with introduction. 9:59:59.000,9:59:59.000 I'm hoping mostly to debunk the myth[br]that dependency resolution is a very 9:59:59.000,9:59:59.000 hard problem that we cannot solve, 9:59:59.000,9:59:59.000 and we should remove packages in the hopes[br]that they will somehow keep the archive 9:59:59.000,9:59:59.000 from exploding, and the dependency[br]resolvers to break, and apt to die, 9:59:59.000,9:59:59.000 and god knows what. 9:59:59.000,9:59:59.000 I've been working on Britney, which is a[br]different problem to solve, but it 9:59:59.000,9:59:59.000 involves the same techniques, or uses some[br]of the same techniques. 9:59:59.000,9:59:59.000 Some will be directly applicable to apt,[br]some will not. 9:59:59.000,9:59:59.000 There's not a one size fits all solution, sorry. 9:59:59.000,9:59:59.000 And I much less haven't wrote it yet,[br]even if there was. 9:59:59.000,9:59:59.000 So defining the problem. 9:59:59.000,9:59:59.000 When you try to install a package on your[br]system, 9:59:59.000,9:59:59.000 there's actually two problems[br]being solved. 9:59:59.000,9:59:59.000 One is the part where apt figures out if I[br]have to install Eclipse. 9:59:59.000,9:59:59.000 You'll be needing a lot of Java packages,[br]you'll be needing a lot of other stuff. 9:59:59.000,9:59:59.000 And so it figures out which packages are[br]needed for this to make sense. 9:59:59.000,9:59:59.000 The separate part of the problem,[br]the second problem is, 9:59:59.000,9:59:59.000 once you have this list, they have to be[br]unpackaged and configured in a certain 9:59:59.000,9:59:59.000 order for this all to make sense. 9:59:59.000,9:59:59.000 They are in theory two distinct problems. 9:59:59.000,9:59:59.000 I'll be talking about the first problem, 9:59:59.000,9:59:59.000 because that's actually[br]dependency resolution. 9:59:59.000,9:59:59.000 The other thing is just ordering. 9:59:59.000,9:59:59.000 The ordering thing is certainly also very[br]important part. 9:59:59.000,9:59:59.000 I don't want to dismiss that, it's just 9:59:59.000,9:59:59.000 it is in fact theoretically a polynomial[br]time problem. 9:59:59.000,9:59:59.000 So to solve the ordering problem, you[br]basically compute all the action- 9:59:59.000,9:59:59.000 Given a set of actions that need[br]to be done, 9:59:59.000,9:59:59.000 then there are some constraints between[br]some of them, 9:59:59.000,9:59:59.000 like you unpack things before you[br]configure it. 9:59:59.000,9:59:59.000 Maybe you deconfigure something else[br]before that. 9:59:59.000,9:59:59.000 So you end up with a list of partial ??[br]constraints. 9:59:59.000,9:59:59.000 From that you build a graph with ordering. 9:59:59.000,9:59:59.000 It's fairly simple to do, in practice,[br]if you have the right tools for it. 9:59:59.000,9:59:59.000 Then without cycles, this is a trivial[br]things where it just orders it 9:59:59.000,9:59:59.000 and gives you this is the order,[br]go fix. 9:59:59.000,9:59:59.000 When there's cycles you will get a single[br]action consisting of multiple things 9:59:59.000,9:59:59.000 to do at the same time, which is really[br]impossible to do. 9:59:59.000,9:59:59.000 It turns out that the way we defined this[br]whole thing, 9:59:59.000,9:59:59.000 if you have a package that does not have a[br]post install script, 9:59:59.000,9:59:59.000 it does not need to be configured separately. 9:59:59.000,9:59:59.000 That means that it's just unpacked or[br]not unpacked. 9:59:59.000,9:59:59.000 That tends to break most cylcles. 9:59:59.000,9:59:59.000 If you want, there's problems with[br]ordering constraints. 9:59:59.000,9:59:59.000 Help us to find a way to get rid of most[br]of the post install scripts, 9:59:59.000,9:59:59.000 such as the ones just running lvconfig[br]and nothing else. 9:59:59.000,9:59:59.000 That could solve some of the cycles, 9:59:59.000,9:59:59.000 and otherwise it's just feeding it to dpkg[br]and it works. 9:59:59.000,9:59:59.000 If you have a cycle, it's a separate[br]problem fixing that. 9:59:59.000,9:59:59.000 I won't be covering that. 9:59:59.000,9:59:59.000 As I said, the runtime is polynomial. 9:59:59.000,9:59:59.000 It's something like the size of the[br]package, a regular graph. 9:59:59.000,9:59:59.000 Which is fairly polynomial. 9:59:59.000,9:59:59.000 This even covers finding cycles. 9:59:59.000,9:59:59.000 Again breaking cycles is just removing[br]post install scripts. 9:59:59.000,9:59:59.000 And of course if you think this is too[br]simple, 9:59:59.000,9:59:59.000 you can always implement more features[br]on top of it, 9:59:59.000,9:59:59.000 such as trying to optimize for minimal[br]downtime of services and all that. 9:59:59.000,9:59:59.000 You can make any trivial problem very hard[br]very soon with that. 9:59:59.000,9:59:59.000 So, the hard problem. 9:59:59.000,9:59:59.000 First of all, the players. 9:59:59.000,9:59:59.000 We've got apt, aptitude, cupt, whatever. 9:59:59.000,9:59:59.000 We've got Britney. 9:59:59.000,9:59:59.000 We've got DOSE, edos, whatever its[br]name is today. 9:59:59.000,9:59:59.000 They solve the hard problem. 9:59:59.000,9:59:59.000 Or they should be trying to do. 9:59:59.000,9:59:59.000 Notable tools not affected by it. 9:59:59.000,9:59:59.000 dpkg. 9:59:59.000,9:59:59.000 So dpkg surely figures out if you are[br]missing dependencies. It says, 9:59:59.000,9:59:59.000 "I cannot resolve this, and I don't know[br]where to fetch it, so I'm just giving up." 9:59:59.000,9:59:59.000 And that is the only thing you can do. 9:59:59.000,9:59:59.000 Which means it only verifies the solution[br]which is known to be polynomial, 9:59:59.000,9:59:59.000 and is therefore is not a game player. 9:59:59.000,9:59:59.000 DAK rm is also doing a polynomial time[br]check, it just happens to be so 9:59:59.000,9:59:59.000 for other reasons. 9:59:59.000,9:59:59.000 That's basically the known players. 9:59:59.000,9:59:59.000 There are probably others. 9:59:59.000,9:59:59.000 But we're moving onto the[br]hard problem itself 9:59:59.000,9:59:59.000 So the problem we actually have is, 9:59:59.000,9:59:59.000 we've got version dependencies,[br]we've got alternative choices, 9:59:59.000,9:59:59.000 we've got virtual packages. 9:59:59.000,9:59:59.000 All three makes this problem hard. 9:59:59.000,9:59:59.000 You basically have to remove all possible[br]alternative choices, guesses, whatever, 9:59:59.000,9:59:59.000 to make this simple. 9:59:59.000,9:59:59.000 It just so happens if you move version[br]dependencies among other things, 9:59:59.000,9:59:59.000 things become really fun. 9:59:59.000,9:59:59.000 ?? to solve the problem, but upgrading[br]gets really really spicy. 9:59:59.000,9:59:59.000 So how do we ensure that dpkg[br]is configured before your new package, 9:59:59.000,9:59:59.000 use a feature new dev package[br]is installed. 9:59:59.000,9:59:59.000 That's sort of impossible. 9:59:59.000,9:59:59.000 It becomes simple, but we end up with a[br]broken dependency ??, 9:59:59.000,9:59:59.000 so it's not really an option. 9:59:59.000,9:59:59.000 Technically there's also a way to make[br]it simple by having multiple versions 9:59:59.000,9:59:59.000 of things, and removing negative[br]dependencies, 9:59:59.000,9:59:59.000 but that's not necessarily easy to do either. 9:59:59.000,9:59:59.000 Presumably file configs will get in our[br]way or something. 9:59:59.000,9:59:59.000 We have to redefine where we place all of[br]the files, that's not going to be 9:59:59.000,9:59:59.000 something we're doing anytime soon. 9:59:59.000,9:59:59.000 So yeah, it's not really an option. 9:59:59.000,9:59:59.000 To give you a better understanding of the[br]problem, please consider this example. 9:59:59.000,9:59:59.000 If we have coreutils at some version,[br]depending on either this version 9:59:59.000,9:59:59.000 of libc or a newer one. 9:59:59.000,9:59:59.000 From this can we immediately conclude 9:59:59.000,9:59:59.000 that if libc 2.19 is known to be[br]installable, 9:59:59.000,9:59:59.000 that coreutils will also be installable? 9:59:59.000,9:59:59.000 How many people think we can immediately[br]conclude something from this? 9:59:59.000,9:59:59.000 How many people think we cannot[br]conclude anything from this? 9:59:59.000,9:59:59.000 We'll that's good. 9:59:59.000,9:59:59.000 So it turns out that with negative[br]dependencies and without negative 9:59:59.000,9:59:59.000 dependencies, it doesn't really matter. 9:59:59.000,9:59:59.000 With negative dependencies, libc or[br]whatever it depends on could 9:59:59.000,9:59:59.000 just conflict with coreutils, we're[br]broken, it's out. 9:59:59.000,9:59:59.000 And without negative dependencies, you[br]could have one of the things where 9:59:59.000,9:59:59.000 it depends on a version that's newer or[br]older, and no. 9:59:59.000,9:59:59.000 It's not really trivial to do. 9:59:59.000,9:59:59.000 You can't conclude something locally,[br]in theory. 9:59:59.000,9:59:59.000 That's something that can[br]become a problem. 9:59:59.000,9:59:59.000 Anyway, it is highly tractable[br]in practice. 9:59:59.000,9:59:59.000 Because if you do have a break, conflicts,[br]or otherwise negative dependency, 9:59:59.000,9:59:59.000 it tends to be upper bound. 9:59:59.000,9:59:59.000 So if the previous version of coreutils[br]was installable with that version, 9:59:59.000,9:59:59.000 the new version will also probably be. 9:59:59.000,9:59:59.000 Likewise, most dependencies, circular or[br]otherwise, tend to be upper bound. 9:59:59.000,9:59:59.000 There are cases of version ranges which[br]is a lower bound and upper bound 9:59:59.000,9:59:59.000 at the same time. 9:59:59.000,9:59:59.000 They exist, they are just not so[br]common yet. 9:59:59.000,9:59:59.000 And that's a good thing. 9:59:59.000,9:59:59.000 The number of possible solutions to[br]any clause 9:59:59.000,9:59:59.000 actually tends to be fairly limited. 9:59:59.000,9:59:59.000 Of course there are the exceptions. 9:59:59.000,9:59:59.000 Packages from unstable. 9:59:59.000,9:59:59.000 They might just be missing, in which case[br]it's really hard to solve the dependency. 9:59:59.000,9:59:59.000 You've got mutually exclusive packages[br]all the sendmail stuff, MTAs. 9:59:59.000,9:59:59.000 The version ranges I mentioned before. 9:59:59.000,9:59:59.000 And then of course the strictly equal[br]versions, which we see inside 9:59:59.000,9:59:59.000 packages built from the same[br]source packages. 9:59:59.000,9:59:59.000 The redeeming feature here: same source. 9:59:59.000,9:59:59.000 Because that is actually very simple[br]to figure out 9:59:59.000,9:59:59.000 if they are from the same source. 9:59:59.000,9:59:59.000 So they tend to be upgraded ??[br]anyhow. 9:59:59.000,9:59:59.000 The new versions tend to be available[br]at the same time sorta thing. 9:59:59.000,9:59:59.000 The problem made hard in a nutshell, is[br]this contrived example for example. 9:59:59.000,9:59:59.000 You have a package you want to try,[br]and it depends on any number of foos. 9:59:59.000,9:59:59.000 Each of the foos can either be solved by 9:59:59.000,9:59:59.000 picking up the bar package[br]or the good one. 9:59:59.000,9:59:59.000 Which might be a hint to which one we[br]would choose if we want to solve it. 9:59:59.000,9:59:59.000 And the bar packages, depends on a[br]bad package that does not work 9:59:59.000,9:59:59.000 with the starting-package,[br]and we are broken. 9:59:59.000,9:59:59.000 And the good package just doesn't have any[br]dependencies and is therefore good. 9:59:59.000,9:59:59.000 In the perfect example you saw, 9:59:59.000,9:59:59.000 you try foo, good, foo1, and then good[br]and be done. 9:59:59.000,9:59:59.000 In practice, if you don't have some way[br]of ordering these so you know 9:59:59.000,9:59:59.000 one of them is better than the other. 9:59:59.000,9:59:59.000 You now have an n times m trace,[br]where you have to guess up and down, 9:59:59.000,9:59:59.000 which one am I doing, backtrack back and[br]forth and all, and it's going to be horrible. 9:59:59.000,9:59:59.000 And this contrived example can of course[br]be solved by some solvers that can see 9:59:59.000,9:59:59.000 the pattern, but you can always a[br]more contrived pattern 9:59:59.000,9:59:59.000 that's harder to understand. 9:59:59.000,9:59:59.000 So I tried to keep it somewhat simple. 9:59:59.000,9:59:59.000 The good thing is, very few people write[br]software like this. 9:59:59.000,9:59:59.000 And that is the most redeeming feature[br]about the package ??. 9:59:59.000,9:59:59.000 We do have exceptions. 9:59:59.000,9:59:59.000 Aspell-dictionaries, ispell-dictionaries. 9:59:59.000,9:59:59.000 Basically if you depend on[br]ispell-dictionary, 9:59:59.000,9:59:59.000 which is a virtual package provided by 50[br]different ispell packages, 9:59:59.000,9:59:59.000 or aspell packages, and then they depend[br]on something else after that. 9:59:59.000,9:59:59.000 So in theory they can be a part of[br]creating this problem. 9:59:59.000,9:59:59.000 Fortunately, they themselves are fairly[br]simple once you've passed them. 9:59:59.000,9:59:59.000 You also have multiarch foreign, or[br]multiarch allowed with 9:59:59.000,9:59:59.000 package interdependencies. 9:59:59.000,9:59:59.000 So if you have multiple architectures,[br]you in theory can have any number 9:59:59.000,9:59:59.000 of things satisfying the same clause. 9:59:59.000,9:59:59.000 This gives extra unnecssary work for most[br]resolvers, 9:59:59.000,9:59:59.000 if you enable them to do[br]multiarch and cross things. 9:59:59.000,9:59:59.000 We have 12 of them, but I suspect most[br]users of multiarch are somewhat limited 9:59:59.000,9:59:59.000 to maybe 3. 9:59:59.000,9:59:59.000 Possible exception being people like[br]writing resolvers and trying them out, 9:59:59.000,9:59:59.000 torture test them. 9:59:59.000,9:59:59.000 The real problem, if you had to do this in[br]the archive you would need to do, 9:59:59.000,9:59:59.000 for example something like, write[br]indifferent awk implementations. 9:59:59.000,9:59:59.000 We have three. 9:59:59.000,9:59:59.000 And then afterwards, for example,[br]have m different distinct, 9:59:59.000,9:59:59.000 but equally valid libc implementations. 9:59:59.000,9:59:59.000 It's not something we're likely to do in[br]the near future. 9:59:59.000,9:59:59.000 And you have to do this on multiple[br]levels, 9:59:59.000,9:59:59.000 because you can just presolve the[br]essential set most of the time. 9:59:59.000,9:59:59.000 The real problem: if you had to do this in[br]the archive, you would need to do, 9:59:59.000,9:59:59.000 for example, something like write[br]N different awk implementations. 9:59:59.000,9:59:59.000 We have 3. 9:59:59.000,9:59:59.000 Then afterwards for example, have[br]M different distinct but equally valid 9:59:59.000,9:59:59.000 libc implementations. 9:59:59.000,9:59:59.000 It's not something we're likely to do[br]in the near future. 9:59:59.000,9:59:59.000 And you have to do this on multiple[br]levels. 9:59:59.000,9:59:59.000 Because you can just presolve the[br]essential set most of the time, 9:59:59.000,9:59:59.000 this particular thing is not very[br]interesting. 9:59:59.000,9:59:59.000 And the data packages can in theory[br]blow up, 9:59:59.000,9:59:59.000 when you have truly interchangable data,[br]like we do with the aspell packages. 9:59:59.000,9:59:59.000 But they tend to either not have any[br]dependencies after the aspell, 9:59:59.000,9:59:59.000 after the data package, or they have a[br]simple loopback into what you came from. 9:59:59.000,9:59:59.000 So the blowup blowup tends to be limited[br]to one level. 9:59:59.000,9:59:59.000 If you have enough of those, it's still[br]going to be a problem, 9:59:59.000,9:59:59.000 but it keeps it simple. 9:59:59.000,9:59:59.000 Onto installing. 9:59:59.000,9:59:59.000 The easier case, as mentioned is when you[br]have a single suit 9:59:59.000,9:59:59.000 and a single architecture. 9:59:59.000,9:59:59.000 All things in the graph collapse into[br]what looks like a regular graph, 9:59:59.000,9:59:59.000 and therefore become simple,[br]trivial. 9:59:59.000,9:59:59.000 This actually happens so much that[br]if you take Lintian, 9:59:59.000,9:59:59.000 it has 26 distinct dependency clauses. 9:59:59.000,9:59:59.000 If you look at each of them, 9:59:59.000,9:59:59.000 when you only have a single suite[br]and architecture, there's at most 9:59:59.000,9:59:59.000 one package directly solving each[br]clause locally. 9:59:59.000,9:59:59.000 Then further down the graph somewhere[br]you depend on for example, debconf, 9:59:59.000,9:59:59.000 or debconf-2.0 which is one of these[br]alternatives, giving rise to its ?? 9:59:59.000,9:59:59.000 Linitan itself is not subject to becoming[br]a backtrack point. 9:59:59.000,9:59:59.000 There's no guesswork when you reach[br]Lintian, which of its dependencies 9:59:59.000,9:59:59.000 should I pick, which version of[br]it and all. 9:59:59.000,9:59:59.000 That's not there, you have to go further[br]down to see that.