9:59:59.000,9:59:59.000 (Offscreen) So welcome to the talk[br]from Niels 9:59:59.000,9:59:59.000 about dependency resolution in[br]polynomial time. 9:59:59.000,9:59:59.000 (Neils) I'm Neils, I'm doing my second[br]talk today. 9:59:59.000,9:59:59.000 This time about something slightly[br]more technical. 9:59:59.000,9:59:59.000 I wanted to do dependency resolution[br]in deterministic polynomial time, 9:59:59.000,9:59:59.000 which is what this is all about. 9:59:59.000,9:59:59.000 I didn't get as far with this as I wanted. 9:59:59.000,9:59:59.000 I wanted to have a tool ready and have[br]some results, which I didn't quite finish. 9:59:59.000,9:59:59.000 So these will be my findings work in[br]progress notes. 9:59:59.000,9:59:59.000 We have six topics I have to cover. 9:59:59.000,9:59:59.000 A quick introduction. 9:59:59.000,9:59:59.000 Then something about the hard problem. 9:59:59.000,9:59:59.000 Then the part the part where it actually[br]turns out to be highly tractable 9:59:59.000,9:59:59.000 in practice, if you don't think too much[br]about it. 9:59:59.000,9:59:59.000 And then there is some installing,[br]upgrading, and installing again 9:59:59.000,9:59:59.000 for how to improve your solvers. 9:59:59.000,9:59:59.000 Starting with introduction. 9:59:59.000,9:59:59.000 I'm hoping mostly to debunk the myth[br]that dependency resolution is a very 9:59:59.000,9:59:59.000 hard problem that we cannot solve, 9:59:59.000,9:59:59.000 and we should remove packages in the hopes[br]that they will somehow keep the archive 9:59:59.000,9:59:59.000 from exploding, and the dependency[br]resolvers to break, and apt to die, 9:59:59.000,9:59:59.000 and god knows what. 9:59:59.000,9:59:59.000 I've been working on Britney, which is a[br]different problem to solve, but it 9:59:59.000,9:59:59.000 involves the same techniques, or uses some[br]of the same techniques. 9:59:59.000,9:59:59.000 Some will be directly applicable to apt,[br]some will not. 9:59:59.000,9:59:59.000 There's not a one size fits all solution,[br]sorry. 9:59:59.000,9:59:59.000 And I much less haven't wrote it yet,[br]even if there was. 9:59:59.000,9:59:59.000 So defining the problem. 9:59:59.000,9:59:59.000 When you try to install a package on your[br]system, 9:59:59.000,9:59:59.000 there's actually two problems[br]being solved. 9:59:59.000,9:59:59.000 One is the part where apt figures out if I[br]have to install Eclipse. 9:59:59.000,9:59:59.000 You'll be needing a lot of Java packages,[br]you'll be needing a lot of other stuff. 9:59:59.000,9:59:59.000 And so it figures out which packages are[br]needed for this to make sense. 9:59:59.000,9:59:59.000 The separate part of the problem,[br]the second problem is, 9:59:59.000,9:59:59.000 once you have this list, they have to be[br]unpackaged and configured in a certain 9:59:59.000,9:59:59.000 order for this all to make sense. 9:59:59.000,9:59:59.000 They are in theory two distinct problems. 9:59:59.000,9:59:59.000 I'll be talking about the first problem, 9:59:59.000,9:59:59.000 because that's actually[br]dependency resolution. 9:59:59.000,9:59:59.000 The other thing is just ordering. 9:59:59.000,9:59:59.000 The ordering thing is certainly also very[br]important part. 9:59:59.000,9:59:59.000 I don't want to dismiss that, it's just 9:59:59.000,9:59:59.000 it is in fact theoretically a polynomial[br]time problem. 9:59:59.000,9:59:59.000 So to solve the ordering problem, you[br]basically compute all the action- 9:59:59.000,9:59:59.000 Given a set of actions that need[br]to be done, 9:59:59.000,9:59:59.000 then there are some constraints between[br]some of them, 9:59:59.000,9:59:59.000 like you unpack things before you[br]configure it. 9:59:59.000,9:59:59.000 Maybe you deconfigure something else[br]before that. 9:59:59.000,9:59:59.000 So you end up with a list of partial ??[br]constraints. 9:59:59.000,9:59:59.000 From that you build a graph with ordering. 9:59:59.000,9:59:59.000 It's fairly simple to do, in practice,[br]if you have the right tools for it. 9:59:59.000,9:59:59.000 Then without cycles, this is a trivial[br]things where it just orders it 9:59:59.000,9:59:59.000 and gives you this is the order,[br]go fix. 9:59:59.000,9:59:59.000 When there's cycles you will get a single[br]action consisting of multiple things 9:59:59.000,9:59:59.000 to do at the same time, which is really[br]impossible to do. 9:59:59.000,9:59:59.000 It turns out that the way we defined this[br]whole thing, 9:59:59.000,9:59:59.000 if you have a package that does not have a[br]post install script, 9:59:59.000,9:59:59.000 it does not need to be[br]configured separately. 9:59:59.000,9:59:59.000 That means that it's just unpacked or[br]not unpacked. 9:59:59.000,9:59:59.000 That tends to break most cylcles. 9:59:59.000,9:59:59.000 If you want, there's problems with[br]ordering constraints. 9:59:59.000,9:59:59.000 Help us to find a way to get rid of most[br]of the post install scripts, 9:59:59.000,9:59:59.000 such as the ones just running lvconfig[br]and nothing else. 9:59:59.000,9:59:59.000 That could solve some of the cycles, 9:59:59.000,9:59:59.000 and otherwise it's just feeding it to dpkg[br]and it works. 9:59:59.000,9:59:59.000 If you have a cycle, it's a separate[br]problem fixing that. 9:59:59.000,9:59:59.000 I won't be covering that. 9:59:59.000,9:59:59.000 As I said, the runtime is polynomial. 9:59:59.000,9:59:59.000 It's something like the size of the[br]package, a regular graph. 9:59:59.000,9:59:59.000 Which is fairly polynomial. 9:59:59.000,9:59:59.000 This even covers finding cycles. 9:59:59.000,9:59:59.000 Again breaking cycles is just removing[br]post install scripts. 9:59:59.000,9:59:59.000 And of course if you think this is too[br]simple, 9:59:59.000,9:59:59.000 you can always implement more features[br]on top of it, 9:59:59.000,9:59:59.000 such as trying to optimize for minimal[br]downtime of services and all that. 9:59:59.000,9:59:59.000 You can make any trivial problem very hard[br]very soon with that. 9:59:59.000,9:59:59.000 So, the hard problem. 9:59:59.000,9:59:59.000 First of all, the players. 9:59:59.000,9:59:59.000 We've got apt, aptitude, cupt, whatever. 9:59:59.000,9:59:59.000 We've got Britney. 9:59:59.000,9:59:59.000 We've got DOSE, edos, whatever its[br]name is today. 9:59:59.000,9:59:59.000 They solve the hard problem. 9:59:59.000,9:59:59.000 Or they should be trying to do. 9:59:59.000,9:59:59.000 Notable tools not affected by it. 9:59:59.000,9:59:59.000 dpkg. 9:59:59.000,9:59:59.000 So dpkg surely figures out if you are[br]missing dependencies. It says, 9:59:59.000,9:59:59.000 "I cannot resolve this, and I don't know[br]where to fetch it, so I'm just giving up." 9:59:59.000,9:59:59.000 And that is the only thing you can do. 9:59:59.000,9:59:59.000 Which means it only verifies the solution[br]which is known to be polynomial, 9:59:59.000,9:59:59.000 and is therefore is not a game player. 9:59:59.000,9:59:59.000 DAK rm is also doing a polynomial time[br]check, it just happens to be so 9:59:59.000,9:59:59.000 for other reasons. 9:59:59.000,9:59:59.000 That's basically the known players. 9:59:59.000,9:59:59.000 There are probably others. 9:59:59.000,9:59:59.000 But we're moving onto the[br]hard problem itself 9:59:59.000,9:59:59.000 So the problem we actually have is, 9:59:59.000,9:59:59.000 we've got version dependencies,[br]we've got alternative choices, 9:59:59.000,9:59:59.000 we've got virtual packages. 9:59:59.000,9:59:59.000 All three makes this problem hard. 9:59:59.000,9:59:59.000 You basically have to remove all possible[br]alternative choices, guesses, whatever, 9:59:59.000,9:59:59.000 to make this simple. 9:59:59.000,9:59:59.000 It just so happens if you move version[br]dependencies among other things, 9:59:59.000,9:59:59.000 things become really fun. 9:59:59.000,9:59:59.000 ?? to solve the problem, but upgrading[br]gets really really spicy. 9:59:59.000,9:59:59.000 So how do we ensure that dpkg[br]is configured before your new package, 9:59:59.000,9:59:59.000 use a feature new dev package[br]is installed. 9:59:59.000,9:59:59.000 That's sort of impossible. 9:59:59.000,9:59:59.000 It becomes simple, but we end up with a[br]broken dependency ??, 9:59:59.000,9:59:59.000 so it's not really an option. 9:59:59.000,9:59:59.000 Technically there's also a way to make[br]it simple by having multiple versions 9:59:59.000,9:59:59.000 of things, and removing negative[br]dependencies, 9:59:59.000,9:59:59.000 but that's not necessarily[br]easy to do either. 9:59:59.000,9:59:59.000 Presumably file configs will get in our[br]way or something. 9:59:59.000,9:59:59.000 We have to redefine where we place all of[br]the files, that's not going to be 9:59:59.000,9:59:59.000 something we're doing anytime soon. 9:59:59.000,9:59:59.000 So yeah, it's not really an option. 9:59:59.000,9:59:59.000 To give you a better understanding of the[br]problem, please consider this example. 9:59:59.000,9:59:59.000 If we have coreutils at some version,[br]depending on either this version 9:59:59.000,9:59:59.000 of libc or a newer one. 9:59:59.000,9:59:59.000 From this can we immediately conclude 9:59:59.000,9:59:59.000 that if libc 2.19 is known to be[br]installable, 9:59:59.000,9:59:59.000 that coreutils will also be installable? 9:59:59.000,9:59:59.000 How many people think we can immediately[br]conclude something from this? 9:59:59.000,9:59:59.000 How many people think we cannot[br]conclude anything from this? 9:59:59.000,9:59:59.000 We'll that's good. 9:59:59.000,9:59:59.000 So it turns out that with negative[br]dependencies and without negative 9:59:59.000,9:59:59.000 dependencies, it doesn't really matter. 9:59:59.000,9:59:59.000 With negative dependencies, libc or[br]whatever it depends on could 9:59:59.000,9:59:59.000 just conflict with coreutils, we're[br]broken, it's out. 9:59:59.000,9:59:59.000 And without negative dependencies, you[br]could have one of the things where 9:59:59.000,9:59:59.000 it depends on a version that's newer or[br]older, and no. 9:59:59.000,9:59:59.000 It's not really trivial to do. 9:59:59.000,9:59:59.000 You can't conclude something locally,[br]in theory. 9:59:59.000,9:59:59.000 That's something that can[br]become a problem. 9:59:59.000,9:59:59.000 Anyway, it is highly tractable[br]in practice. 9:59:59.000,9:59:59.000 Because if you do have a break, conflicts,[br]or otherwise negative dependency, 9:59:59.000,9:59:59.000 it tends to be upper bound. 9:59:59.000,9:59:59.000 So if the previous version of coreutils[br]was installable with that version, 9:59:59.000,9:59:59.000 the new version will also probably be. 9:59:59.000,9:59:59.000 Likewise, most dependencies, circular or[br]otherwise, tend to be upper bound. 9:59:59.000,9:59:59.000 There are cases of version ranges which[br]is a lower bound and upper bound 9:59:59.000,9:59:59.000 at the same time. 9:59:59.000,9:59:59.000 They exist, they are just not so[br]common yet. 9:59:59.000,9:59:59.000 And that's a good thing. 9:59:59.000,9:59:59.000 The number of possible solutions to[br]any clause 9:59:59.000,9:59:59.000 actually tends to be fairly limited. 9:59:59.000,9:59:59.000 Of course there are the exceptions. 9:59:59.000,9:59:59.000 Packages from unstable. 9:59:59.000,9:59:59.000 They might just be missing, in which case[br]it's really hard to solve the dependency. 9:59:59.000,9:59:59.000 You've got mutually exclusive packages[br]all the sendmail stuff, MTAs. 9:59:59.000,9:59:59.000 The version ranges I mentioned before. 9:59:59.000,9:59:59.000 And then of course the strictly equal[br]versions, which we see inside 9:59:59.000,9:59:59.000 packages built from the same[br]source packages. 9:59:59.000,9:59:59.000 The redeeming feature here: same source. 9:59:59.000,9:59:59.000 Because that is actually very simple[br]to figure out 9:59:59.000,9:59:59.000 if they are from the same source. 9:59:59.000,9:59:59.000 So they tend to be upgraded ??[br]anyhow. 9:59:59.000,9:59:59.000 The new versions tend to be available[br]at the same time sorta thing. 9:59:59.000,9:59:59.000 The problem made hard in a nutshell, is[br]this contrived example for example. 9:59:59.000,9:59:59.000 You have a package you want to try,[br]and it depends on any number of foos. 9:59:59.000,9:59:59.000 Each of the foos can either be solved by 9:59:59.000,9:59:59.000 picking up the bar package[br]or the good one. 9:59:59.000,9:59:59.000 Which might be a hint to which one we[br]would choose if we want to solve it. 9:59:59.000,9:59:59.000 And the bar packages, depends on a[br]bad package that does not work 9:59:59.000,9:59:59.000 with the starting-package,[br]and we are broken. 9:59:59.000,9:59:59.000 And the good package just doesn't have any[br]dependencies and is therefore good. 9:59:59.000,9:59:59.000 In the perfect example you saw, 9:59:59.000,9:59:59.000 you try foo, good, foo1, and then good[br]and be done. 9:59:59.000,9:59:59.000 In practice, if you don't have some way[br]of ordering these so you know 9:59:59.000,9:59:59.000 one of them is better than the other. 9:59:59.000,9:59:59.000 You now have an N times M trace, where[br]you have to guess up and down, which 9:59:59.000,9:59:59.000 one am I doing, backtrack back and forth[br]and all, and it's going to be horrible. 9:59:59.000,9:59:59.000 And this contrived example can of course[br]be solved by some solvers that can see 9:59:59.000,9:59:59.000 the pattern, but you can always a[br]more contrived pattern 9:59:59.000,9:59:59.000 that's harder to understand. 9:59:59.000,9:59:59.000 So I tried to keep it somewhat simple. 9:59:59.000,9:59:59.000 The good thing is, very few people write[br]software like this. 9:59:59.000,9:59:59.000 And that is the most redeeming feature[br]about the package ??. 9:59:59.000,9:59:59.000 We do have exceptions. 9:59:59.000,9:59:59.000 Aspell-dictionaries, ispell-dictionaries. 9:59:59.000,9:59:59.000 Basically if you depend on[br]ispell-dictionary, 9:59:59.000,9:59:59.000 which is a virtual package provided by 50[br]different ispell packages, 9:59:59.000,9:59:59.000 or aspell packages, and then they depend[br]on something else after that. 9:59:59.000,9:59:59.000 So in theory they can be a part of[br]creating this problem. 9:59:59.000,9:59:59.000 Fortunately, they themselves are fairly[br]simple once you've passed them. 9:59:59.000,9:59:59.000 You also have multiarch foreign, or[br]multiarch allowed with 9:59:59.000,9:59:59.000 package interdependencies. 9:59:59.000,9:59:59.000 So if you have multiple architectures,[br]you in theory can have any number 9:59:59.000,9:59:59.000 of things satisfying the same clause. 9:59:59.000,9:59:59.000 This gives extra unnecssary work for most[br]resolvers, 9:59:59.000,9:59:59.000 if you enable them to do[br]multiarch and cross things. 9:59:59.000,9:59:59.000 We have 12 of them, but I suspect most[br]users of multiarch are somewhat limited 9:59:59.000,9:59:59.000 to maybe 3. 9:59:59.000,9:59:59.000 Possible exception being people like[br]writing resolvers and trying them out, 9:59:59.000,9:59:59.000 torture test them. 9:59:59.000,9:59:59.000 The real problem, if you had to do this in[br]the archive you would need to do, 9:59:59.000,9:59:59.000 for example something like, write[br]indifferent awk implementations. 9:59:59.000,9:59:59.000 We have three. 9:59:59.000,9:59:59.000 And then afterwards, for example,[br]have m different distinct, 9:59:59.000,9:59:59.000 but equally valid libc implementations. 9:59:59.000,9:59:59.000 It's not something we're likely to do in[br]the near future. 9:59:59.000,9:59:59.000 And you have to do this on multiple[br]levels, 9:59:59.000,9:59:59.000 because you can just presolve the[br]essential set most of the time. 9:59:59.000,9:59:59.000 The real problem: if you had to do this in[br]the archive, you would need to do, 9:59:59.000,9:59:59.000 for example, something like write[br]N different awk implementations. 9:59:59.000,9:59:59.000 We have 3. 9:59:59.000,9:59:59.000 Then afterwards for example, have[br]M different distinct but equally valid 9:59:59.000,9:59:59.000 libc implementations. 9:59:59.000,9:59:59.000 It's not something we're likely to do[br]in the near future. 9:59:59.000,9:59:59.000 And you have to do this on multiple[br]levels. 9:59:59.000,9:59:59.000 Because you can just presolve the[br]essential set most of the time, 9:59:59.000,9:59:59.000 this particular thing is not very[br]interesting. 9:59:59.000,9:59:59.000 And the data packages can in theory[br]blow up, 9:59:59.000,9:59:59.000 when you have truly interchangable data,[br]like we do with the aspell packages. 9:59:59.000,9:59:59.000 But they tend to either not have any[br]dependencies after the aspell, 9:59:59.000,9:59:59.000 after the data package, or they have a[br]simple loopback into what you came from. 9:59:59.000,9:59:59.000 So the blowup blowup tends to be limited[br]to one level. 9:59:59.000,9:59:59.000 If you have enough of those, it's still[br]going to be a problem, 9:59:59.000,9:59:59.000 but it keeps it simple. 9:59:59.000,9:59:59.000 Onto installing. 9:59:59.000,9:59:59.000 The easier case, as mentioned is when you[br]have a single suit 9:59:59.000,9:59:59.000 and a single architecture. 9:59:59.000,9:59:59.000 All things in the graph collapse into[br]what looks like a regular graph, 9:59:59.000,9:59:59.000 and therefore become simple,[br]trivial. 9:59:59.000,9:59:59.000 This actually happens so much that[br]if you take Lintian, 9:59:59.000,9:59:59.000 it has 26 distinct dependency clauses. 9:59:59.000,9:59:59.000 If you look at each of them, 9:59:59.000,9:59:59.000 when you only have a single suite[br]and architecture, there's at most 9:59:59.000,9:59:59.000 one package directly solving each[br]clause locally. 9:59:59.000,9:59:59.000 Then further down the graph somewhere[br]you depend on for example, debconf, 9:59:59.000,9:59:59.000 or debconf-2.0 which is one of these[br]alternatives, giving rise to its ?? 9:59:59.000,9:59:59.000 Linitan itself is not subject to becoming[br]a backtrack point. 9:59:59.000,9:59:59.000 There's no guesswork when you reach[br]Lintian, which of its dependencies 9:59:59.000,9:59:59.000 should I pick, which version of[br]it and all. 9:59:59.000,9:59:59.000 That's not there, you have to go further[br]down to see that. 9:59:59.000,9:59:59.000 Actually, but the time you reach the[br]debconf thing, 9:59:59.000,9:59:59.000 it has already been solved by[br]something else, as I recall. 9:59:59.000,9:59:59.000 Oh, no wait. The dependencies of debconf[br]is already covered by the time you're 9:59:59.000,9:59:59.000 forced to resolve this choice,[br]so it's not really a big issue, either. 9:59:59.000,9:59:59.000 You just pick the debconf one, c-debconf[br]currently depends on debconf, 9:59:59.000,9:59:59.000 so it's trivial that way. 9:59:59.000,9:59:59.000 The question is, when do we have[br]these conditions. 9:59:59.000,9:59:59.000 We do that a lot actually. 9:59:59.000,9:59:59.000 If you do builts in unstable for example. 9:59:59.000,9:59:59.000 ??, only one suite only one architecture,[br]piece of cake. 9:59:59.000,9:59:59.000 If you install packages of a pure[br]Jessie system, with multiarch, 9:59:59.000,9:59:59.000 you have the single suite one,[br]which limits a lot of them, 9:59:59.000,9:59:59.000 and then you have not the single[br]architecture. 9:59:59.000,9:59:59.000 That's what it is, but in pure Squeeze,[br]you won't have multiarch because 9:59:59.000,9:59:59.000 because it didn't work back then. 9:59:59.000,9:59:59.000 So that's even simpler. 9:59:59.000,9:59:59.000 Also you can do unstable + experimental,[br]or stable + backports if you fiddle a bit 9:59:59.000,9:59:59.000 with the end resolver and say that it is[br]not allowed to pick experimental, 9:59:59.000,9:59:59.000 unless it is explicitly requested to pick[br]experimental, 9:59:59.000,9:59:59.000 and from there it only picks[br]what it absolutely needs to solve it. 9:59:59.000,9:59:59.000 You basically get the same result with a[br]single suite restriction, 9:59:59.000,9:59:59.000 but that requires you to actually code[br]something for it. 9:59:59.000,9:59:59.000 And then there's Britney, with the testing[br]migration thing, and that's because 9:59:59.000,9:59:59.000 she basically takes things, moves it into[br]a new version of testing, 9:59:59.000,9:59:59.000 and tries to test that this is still[br]the same solution. 9:59:59.000,9:59:59.000 So she forces it into a single[br]suite currently. 9:59:59.000,9:59:59.000 So these are common cases where[br]it happens. 9:59:59.000,9:59:59.000 This is not everywhere where it happens, 9:59:59.000,9:59:59.000 so the stable upgrades are still funny[br]enough, not a single suite and all that. 9:59:59.000,9:59:59.000 It happens because there is only one[br]package and architecture. 9:59:59.000,9:59:59.000 Only one instance of package, you only[br]have one version, one architecture that 9:59:59.000,9:59:59.000 solves that particular dependency. 9:59:59.000,9:59:59.000 Whereas with multiarch you could have,[br]i386 and the amd64 version. 9:59:59.000,9:59:59.000 If you do an upgrade you could have the[br]old version and the new version 9:59:59.000,9:59:59.000 that may or may not satisfy it. 9:59:59.000,9:59:59.000 Also we have this thing where we[br]don't like libraries to have 9:59:59.000,9:59:59.000 alternative implementations, it sort of[br]breaks things horribly. 9:59:59.000,9:59:59.000 Especially when they are not actually[br]agreeing on the interface they provide. 9:59:59.000,9:59:59.000 There's the social aspect of it that we[br]don't really bother having 9:59:59.000,9:59:59.000 200 interchangeable implementations of[br]everything. 9:59:59.000,9:59:59.000 I think the record is something like 9:59:59.000,9:59:59.000 five or six different versions[br]of sendmail, implemenatations.[br] 9:59:59.000,9:59:59.000 ?? 9:59:59.000,9:59:59.000 I don't think so, but ?? might have. 9:59:59.000,9:59:59.000 And even when you do have one of these[br]explosions, 9:59:59.000,9:59:59.000 it has to actually hide the breakage[br]beneath one of these choice 9:59:59.000,9:59:59.000 explosion alternative explosions for[br]it to be a problem. 9:59:59.000,9:59:59.000 You might have aspell over here on the[br]right hand side, 9:59:59.000,9:59:59.000 and you have a breakage over here which[br]?? to find out. 9:59:59.000,9:59:59.000 So if you use the resolver to just the[br]first explosion to solve the other part, 9:59:59.000,9:59:59.000 it realizes this is not working,[br]I'll bail out. 9:59:59.000,9:59:59.000 There's a couple things that we do to[br]make this easier. 9:59:59.000,9:59:59.000 The mos interesting thing is of course,[br]can we still make this simple without 9:59:59.000,9:59:59.000 the single suite and single[br]architecture restriction. 9:59:59.000,9:59:59.000 We can to some extent. 9:59:59.000,9:59:59.000 And that's where we move to upgrading, 9:59:59.000,9:59:59.000 in deterministic polynomial time. 9:59:59.000,9:59:59.000 This is where I spent most of my[br]time working. 9:59:59.000,9:59:59.000 So to start with, when we do an upgrade[br]from stable to stable, 9:59:59.000,9:59:59.000 and here I'm taking pure stable, no[br]backports and all. 9:59:59.000,9:59:59.000 The general recommended way to do it is[br]to replace Wheezy with Jessie, 9:59:59.000,9:59:59.000 hit apt-get update, and[br]upgrade afterwards, 9:59:59.000,9:59:59.000 and apt replaces all the old packages with[br]the new version, 9:59:59.000,9:59:59.000 if there is a new version. 9:59:59.000,9:59:59.000 We also want all the packages from Wheezy[br]that did not have any version in Jessie 9:59:59.000,9:59:59.000 to be removed, and there might be a new[br]essential package somewhere. 9:59:59.000,9:59:59.000 So long story short, upgrade if it's[br]present, remove if it is not present, 9:59:59.000,9:59:59.000 and then install the new essential[br]packages, if any. 9:59:59.000,9:59:59.000 They don't tend to change that often. 9:59:59.000,9:59:59.000 I'll be ignoring the installing part[br]for now. 9:59:59.000,9:59:59.000 It is of course valid and interesting,[br]but we'll skip it. 9:59:59.000,9:59:59.000 So let's take a simple example, somewhat[br]contrived, somewhat made up the numbers. 9:59:59.000,9:59:59.000 They are not too far from reality, but[br]they are not exact either. 9:59:59.000,9:59:59.000 We have have a system that we are[br]upgrading from Wheezy to Jessie. 9:59:59.000,9:59:59.000 I claim that there are 30 thousand[br]packages in Wheezy, 9:59:59.000,9:59:59.000 we've got 35 thousand in Jessie. 9:59:59.000,9:59:59.000 The system we are working with has[br]two thousand packages installed. 9:59:59.000,9:59:59.000 Mine here has something like 1200. 9:59:59.000,9:59:59.000 If you do a simple dpkg -L 9:59:59.000,9:59:59.000 and then do a line count, you can see[br]how many you have on your system, 9:59:59.000,9:59:59.000 plus-minus five or so, to give you an idea[br]of how many packages 9:59:59.000,9:59:59.000 you're actually working with. 9:59:59.000,9:59:59.000 We assume that all Wheezy package got a[br]new version in Jessie, 9:59:59.000,9:59:59.000 because I'm about to do a pop quiz! 9:59:59.000,9:59:59.000 So with these numbers, what is the maximum[br]problem size of this upgrade. 9:59:59.000,9:59:59.000 Any takers? 9:59:59.000,9:59:59.000 Is it 30? 9:59:59.000,9:59:59.000 Anyone for 35? 9:59:59.000,9:59:59.000 37? 9:59:59.000,9:59:59.000 One for 65. 9:59:59.000,9:59:59.000 One for 67.5 thousand. 9:59:59.000,9:59:59.000 And who believes that I was an asshole[br]and did not include the right answer? 9:59:59.000,9:59:59.000 Oh, so little faith. 9:59:59.000,9:59:59.000 I'm glad you believe me. 9:59:59.000,9:59:59.000 The right answer is 35 thousand. 9:59:59.000,9:59:59.000 The trick is, when we do the upgrade, we[br]replace Wheezy with Jessie, 9:59:59.000,9:59:59.000 we do an update, so all the Wheezy[br]packages not installed 9:59:59.000,9:59:59.000 on our system disappears. 9:59:59.000,9:59:59.000 Then you get all of the Jessie packages,[br]with the assumption that 9:59:59.000,9:59:59.000 no Jessie packages have a new version[br]compared to Wheezy. 9:59:59.000,9:59:59.000 You've got the two thousand from[br]Wheezy on your system, 9:59:59.000,9:59:59.000 plus the 35 thousand[br]packages from Jessie. 9:59:59.000,9:59:59.000 So that means your average[br]stable-to-stable upgrade, 9:59:59.000,9:59:59.000 if you do it this way, is actually only[br]about 40% of the worst case 9:59:59.000,9:59:59.000 that it could've been if you kept the[br]old stable as well. 9:59:59.000,9:59:59.000 There's also another awesome feature with[br]removing the old stable repository. 9:59:59.000,9:59:59.000 It means that every time you[br]upgrade a package, 9:59:59.000,9:59:59.000 your effective problem size[br]decreased by one. 9:59:59.000,9:59:59.000 That's not always useful, it's not[br]always awesome, 9:59:59.000,9:59:59.000 and it's not always the thing that solves[br]your problem, 9:59:59.000,9:59:59.000 but it means that if you can actually make[br]apt upgrade a single thing 9:59:59.000,9:59:59.000 every now and then, eventually it might[br]be able to figure out the rest of 9:59:59.000,9:59:59.000 the upgrade path, after ??[br]200 packages or so. 9:59:59.000,9:59:59.000 As we upgrade, we end up to moving towards[br]a single suite situation, 9:59:59.000,9:59:59.000 so the more things we upgrade, the more we[br]get to a single suite situation. 9:59:59.000,9:59:59.000 Again, most ?? pure stable to stable[br]upgrades. 9:59:59.000,9:59:59.000 And of course the right answer would've[br]been 65 thousand packages 9:59:59.000,9:59:59.000 if you kept your stable. 9:59:59.000,9:59:59.000 So that would've been a possible[br]answer as well. 9:59:59.000,9:59:59.000 Now upgrading. 9:59:59.000,9:59:59.000 This should be as easy as mark new[br]essential packages for install, 9:59:59.000,9:59:59.000 Mark everything that has a new version for[br]upgrade, remove stuff, 9:59:59.000,9:59:59.000 and then figure out if something is[br]missing, then install that. 9:59:59.000,9:59:59.000 This solves upgrading by installing, so[br]I'm not really interested in doing this, 9:59:59.000,9:59:59.000 because installing is hard. 9:59:59.000,9:59:59.000 We can do something smarter than[br]that for upgrading. 9:59:59.000,9:59:59.000 When we do upgrades in a polynomial[br]deterministic time, 9:59:59.000,9:59:59.000 I'm not going to give you a 100% solution,[br]that's not going to work. 9:59:59.000,9:59:59.000 If I could, I would be very very rich. 9:59:59.000,9:59:59.000 So, I intended this to sort of find some[br]easy low hanging fruits 9:59:59.000,9:59:59.000 that can be solved cheaply, and then you can[br]throw a general purpose resolver at 9:59:59.000,9:59:59.000 the rest of the problem, after you've done[br]the simple parts. 9:59:59.000,9:59:59.000 Or you can feed your solver a partial[br]solution and ask it to fix it up 9:59:59.000,9:59:59.000 so it would be a slightly[br]smaller problem size. 9:59:59.000,9:59:59.000 It relies on two things. 9:59:59.000,9:59:59.000 One, it exploits a lot of common patterns[br]on how we do dependencies. 9:59:59.000,9:59:59.000 And the other thing is, if you have a[br]valid system state, 9:59:59.000,9:59:59.000 so if you have a system where all of your[br]packages are in fact installable, 9:59:59.000,9:59:59.000 which dpkg tends to enforce, 9:59:59.000,9:59:59.000 you can verify that is true[br]in polynomial time. 9:59:59.000,9:59:59.000 You can also verify a change to it in[br]polynomial time. 9:59:59.000,9:59:59.000 So here I'm going to do a bit of theory,[br]and a bit of English inbetween. 9:59:59.000,9:59:59.000 We start with a not-broken system,[br]that is we have a set of installable 9:59:59.000,9:59:59.000 packages that are mutually[br]co-installable and all that. 9:59:59.000,9:59:59.000 We call that 'I'. 9:59:59.000,9:59:59.000 Then we can add things to I, we can remove[br]things from this set, we can take something 9:59:59.000,9:59:59.000 in the set and replace it with something else,[br]basically add and remove. 9:59:59.000,9:59:59.000 That can be done in linear time. 9:59:59.000,9:59:59.000 If I take a random package, mash it in,[br]that's constant, maybe, 9:59:59.000,9:59:59.000 depending on your implementation. 9:59:59.000,9:59:59.000 The theory is we can compute a new[br]set, where we remove something, 9:59:59.000,9:59:59.000 add something else, we get a new set. 9:59:59.000,9:59:59.000 Then we can verify this new set is indeed[br]still a valid solution in polynomial time. 9:59:59.000,9:59:59.000 So, a constant time modification[br]can be verified in polynomial time. 9:59:59.000,9:59:59.000 The issue of the day is, randomly feeding[br]packages in and out of the set 9:59:59.000,9:59:59.000 is not going to work very well. 9:59:59.000,9:59:59.000 So we need something smarter than[br]just random modifications. 9:59:59.000,9:59:59.000 Ah well, somebody might actually write[br]something that works with that but anyway. 9:59:59.000,9:59:59.000 So the first thing I did,[br]as I mentioned earlier, 9:59:59.000,9:59:59.000 we have these exclusively equal[br]dependencies inside binaries, 9:59:59.000,9:59:59.000 ?? binaries from the same source. 9:59:59.000,9:59:59.000 So I grouped binaries from the same source. 9:59:59.000,9:59:59.000 If I was to mark one of them for upgrade, 9:59:59.000,9:59:59.000 I would immediately pull the rest of[br]them as well, if they were installed. 9:59:59.000,9:59:59.000 It also happens to sort out a very[br]common case of breaks-replaces, 9:59:59.000,9:59:59.000 when you move files between two binaries[br]in the same source. 9:59:59.000,9:59:59.000 Then you just try to upgrade each of these[br]groups, in some order. 9:59:59.000,9:59:59.000 Preferably deterministically,[br]but I didn't get that far. 9:59:59.000,9:59:59.000 And if the result of one of these[br]modifications is upgradable, 9:59:59.000,9:59:59.000 we can test that fairly cheap, we commit[br]it, and you just rinse-and-repeat 9:59:59.000,9:59:59.000 until you run out of things to test,[br]that leads to something new. 9:59:59.000,9:59:59.000 This works largely depending[br]on what you have installed, 9:59:59.000,9:59:59.000 unsurpisingly. 9:59:59.000,9:59:59.000 So if you have very few packages,[br]it may work better than if you have more, 9:59:59.000,9:59:59.000 And sometimes it works better when you[br]have more, and that's really exciting. 9:59:59.000,9:59:59.000 So I did an example Wheezy installation[br]based on what I have installed 9:59:59.000,9:59:59.000 on my machine. 9:59:59.000,9:59:59.000 Not quite, but close, well related. 9:59:59.000,9:59:59.000 So libc for example was immediately[br]upgradable by this procedure. 9:59:59.000,9:59:59.000 As well as some Java package,[br]worked fine. 9:59:59.000,9:59:59.000 man-db, eclipse, xterm, then I had to do a[br]couple packages before that, 9:59:59.000,9:59:59.000 which were all upgradable. 9:59:59.000,9:59:59.000 I think libc was primarily ?? and maybe[br]some libc linux thing. 9:59:59.000,9:59:59.000 But there are actually a set of packages[br]you can upgrade with this. 9:59:59.000,9:59:59.000 This is of course not testing on[br]configurations. 9:59:59.000,9:59:59.000 In this particular configuration, I could[br]upgrade dpkg like this, 9:59:59.000,9:59:59.000 but I don't expect you to be able[br]to do that in all cases. 9:59:59.000,9:59:59.000 In fact I find it highly unlikely[br]because in Wheezy to Jessie, 9:59:59.000,9:59:59.000 dpkg has tons of breaks for[br]all sorts of things, 9:59:59.000,9:59:59.000 so if you have the right thing there,[br]you break something else, 9:59:59.000,9:59:59.000 and that has to be upgraded[br]at the same time. 9:59:59.000,9:59:59.000 And that is sure to build[br]a loop eventually. 9:59:59.000,9:59:59.000 So basically what we are exploiting here[br]is the greater-than-equal version, 9:59:59.000,9:59:59.000 which is the common way of doing things. 9:59:59.000,9:59:59.000 We rebuild a package, it gets the new[br]dependencies on a higher 9:59:59.000,9:59:59.000 version of things. 9:59:59.000,9:59:59.000 That means the foo in stable ??. 9:59:59.000,9:59:59.000 Right so, everything depending on foo is[br]equally happy with the version 9:59:59.000,9:59:59.000 in Jessie as well, cause it's a lower[br]bound, and Jessie has a newer version. 9:59:59.000,9:59:59.000 So that works very well. 9:59:59.000,9:59:59.000 This is the common case for libraries[br]without ABI transitions. 9:59:59.000,9:59:59.000 Apparently including libc 9:59:59.000,9:59:59.000 and enables you to upgrade from the inside[br]out, from the core of the onion and out, 9:59:59.000,9:59:59.000 if you think of the graph as an onion. 9:59:59.000,9:59:59.000 The algorithm is fairly dumb,[br]unsurprisingly. 9:59:59.000,9:59:59.000 It cannot solve Perl, it can't talk Python[br]because they tend to involve 9:59:59.000,9:59:59.000 pulling in a new package so you have to[br]install that and all that. 9:59:59.000,9:59:59.000 In Perl it could technically solve if it[br]would merge groups together 9:59:59.000,9:59:59.000 into larger groups, but anyway the runtime[br]of this is something like I2 * I+E. 9:59:59.000,9:59:59.000 Which is upper bound by[br]I to the power of 4. 9:59:59.000,9:59:59.000 So it's fairly cheap, it is fairly polynomial. 9:59:59.000,9:59:59.000 It's very trivial to do. 9:59:59.000,9:59:59.000 We can do better than that as mentioned,[br]if we can have it figure out that 9:59:59.000,9:59:59.000 dpkg needs a new version of libc,[br]we could upgrade those together. 9:59:59.000,9:59:59.000 That's not a problem on my system,[br]but it might be on others. 9:59:59.000,9:59:59.000 Then there's the part where dpkg is[br]breaking all of this stuff, 9:59:59.000,9:59:59.000 so you might have to upgrade it[br]at the same time, or before dpkg. 9:59:59.000,9:59:59.000 Then end up with some sort of big loop or[br]three structure of groups you have to 9:59:59.000,9:59:59.000 merge together or upgrade together,[br]and it should just work. 9:59:59.000,9:59:59.000 There's also the part were we have to[br]handle rename packages. 9:59:59.000,9:59:59.000 This comes into ?? basically. 9:59:59.000,9:59:59.000 They become an API bump, which will be[br]very useful for stretch due to dcc5. 9:59:59.000,9:59:59.000 Basically if you want to do a same[br]restriction on this you could do something 9:59:59.000,9:59:59.000 like, we have a clause that is not[br]satisfied, but the thing we need for it 9:59:59.000,9:59:59.000 is a package introduced in the new source,[br]then we pull that. 9:59:59.000,9:59:59.000 If it has the same dependencies, we've[br]already solved it, that sort of thing. 9:59:59.000,9:59:59.000 There's also the part where people rename[br]the package from foo to foo replacement 9:59:59.000,9:59:59.000 or something like that. 9:59:59.000,9:59:59.000 Again, if it is from the same source, we[br]might do some magic here, 9:59:59.000,9:59:59.000 some heuristics here. 9:59:59.000,9:59:59.000 And also at some point you will end up[br]needing to install stuff, 9:59:59.000,9:59:59.000 that is actually required for[br]Wheezy to Jessie 9:59:59.000,9:59:59.000 because it pulls in a new a system. 9:59:59.000,9:59:59.000 If it had trivial no-guess solutions, 9:59:59.000,9:59:59.000 by which I mean the first thing is of the[br]alternative choices is a real package, 9:59:59.000,9:59:59.000 and there is only one of them, 9:59:59.000,9:59:59.000 you could solve this automatically[br]in a deterministic way. 9:59:59.000,9:59:59.000 And otherwise, give up and try[br]something else. 9:59:59.000,9:59:59.000 So this is the basic algorithm, and the[br]basic ideas I've put down, 9:59:59.000,9:59:59.000 and I've been fiddling a bit with,[br]and didn't get very far with so far. 9:59:59.000,9:59:59.000 Installing part 2. 9:59:59.000,9:59:59.000 So after we've learned this with[br]upgrading, we can now, 9:59:59.000,9:59:59.000 with single suite single architecture[br]it is trivial. 9:59:59.000,9:59:59.000 Upgrading we can usually ?? it,[br]reduce the problem size to some extent. 9:59:59.000,9:59:59.000 We can do better on the[br]installing part as well. 9:59:59.000,9:59:59.000 We can look at these[br]aspell packages again, 9:59:59.000,9:59:59.000 because they are one of the big problems. 9:59:59.000,9:59:59.000 We have a couple packages that appear[br]to be almost identical, 9:59:59.000,9:59:59.000 and then we also have packages that[br]are clearly superior to others. 9:59:59.000,9:59:59.000 Packages that are identical,[br]shows up like this. 9:59:59.000,9:59:59.000 So we ?? long enough in the aspell[br]package from Russia and the one 9:59:59.000,9:59:59.000 from Ukraine, something[br]very special happens. 9:59:59.000,9:59:59.000 You will notice that primary difference[br]between the fields I've selected are 9:59:59.000,9:59:59.000 the package name and the version. 9:59:59.000,9:59:59.000 Exiting, because that means that if I[br]can install aspell-uk, 9:59:59.000,9:59:59.000 that's the only thing on the system, 9:59:59.000,9:59:59.000 then the same solution is valid[br]for the Russian one. 9:59:59.000,9:59:59.000 So ?? they are different, but semantically[br]they differ only by name and version. 9:59:59.000,9:59:59.000 Sometimes you can have version[br]dependencies that always 9:59:59.000,9:59:59.000 satisfied ?? for example, then that's[br]the same game actually. 9:59:59.000,9:59:59.000 This is a simplified view because in[br]theory you could have a package 9:59:59.000,9:59:59.000 that refuses to work with the Ukranian[br]one, but not the Russian one, 9:59:59.000,9:59:59.000 and vice versa, so they actually become[br]distinct solutions. 9:59:59.000,9:59:59.000 But the general use of aspell dictionaries[br]tends to be any one of them, 9:59:59.000,9:59:59.000 and I really don't care[br]which one you pick. 9:59:59.000,9:59:59.000 So we find those by saying they have[br]the same effective dependencies clauses. 9:59:59.000,9:59:59.000 We can also look at the negative[br]dependencies, that's fairly important too, 9:59:59.000,9:59:59.000 they have to have the same there. 9:59:59.000,9:59:59.000 Here we need to remember that negative[br]dependencies, we do not think of them 9:59:59.000,9:59:59.000 as just directed, they're not. 9:59:59.000,9:59:59.000 So it really doesn't matter if foo breaks[br]bar or bar breaks foo, 9:59:59.000,9:59:59.000 the important part is they[br]don't work together. 9:59:59.000,9:59:59.000 Then we have assert that they satisfy[br]the same clause. 9:59:59.000,9:59:59.000 Both of them are a valid solution to the[br]same clause as the other one. 9:59:59.000,9:59:59.000 And this becomes interesting when you have[br]dependency ranges, 9:59:59.000,9:59:59.000 which is not truly a dependency range,[br]but you have two clauses that says, 9:59:59.000,9:59:59.000 I will need foo greater than[br]version 1, and I need it strictly 9:59:59.000,9:59:59.000 less than version 2, then one of them[br]contains both of them, 9:59:59.000,9:59:59.000 and the other one does not, when you[br]are doing upgrades for example. 9:59:59.000,9:59:59.000 It has to satisfy the same[br]clauses as well. 9:59:59.000,9:59:59.000 A little trick thing that I discovered[br]a little too late, but it's fun. 9:59:59.000,9:59:59.000 These things things can generally be done[br]in polynomial time, that ballpark. 9:59:59.000,9:59:59.000 I haven't really computed[br]the actual results. 9:59:59.000,9:59:59.000 But we can go further than that.