
Debian_dependency_resolution_in_polynomial_time.webm

  • Not Synced
    (Offscreen) So welcome to the talk
    from Niels
  • Not Synced
    about dependency resolution in
    polynomial time.
  • Not Synced
    (Niels) I'm Niels, I'm doing my second
    talk today.
  • Not Synced
    This time about something slightly
    more technical.
  • Not Synced
    I wanted to do dependency resolution
    in deterministic polynomial time,
  • Not Synced
    which is what this is all about.
  • Not Synced
    I didn't get as far with this as I wanted.
  • Not Synced
    I wanted to have a tool ready and have
    some results, which I didn't quite finish.
  • Not Synced
    So these will be my findings,
    work-in-progress notes.
  • Not Synced
    We have six topics I have to cover.
  • Not Synced
    A quick introduction.
  • Not Synced
    Then something about the hard problem.
  • Not Synced
    Then the part where it actually
    turns out to be highly tractable
  • Not Synced
    in practice, if you don't think too much
    about it.
  • Not Synced
    And then there is some installing,
    upgrading, and installing again
  • Not Synced
    on how to improve your solvers.
  • Not Synced
    Starting with the introduction.
  • Not Synced
    I'm hoping mostly to debunk the myth
    that dependency resolution is a very
  • Not Synced
    hard problem that we cannot solve,
  • Not Synced
    and that we should remove packages in the
    hope that this will somehow keep the
  • Not Synced
    archive from exploding, the dependency
    resolvers from breaking, apt from dying,
  • Not Synced
    and god knows what.
  • Not Synced
    I've been working on Britney, which solves
    a different problem, but it
  • Not Synced
    involves the same techniques, or uses some
    of the same techniques.
  • Not Synced
    Some will be directly applicable to apt,
    some will not.
  • Not Synced
    There's not a one size fits all solution,
    sorry.
  • Not Synced
    And even if there was, I haven't
    written it yet.
  • Not Synced
    So defining the problem.
  • Not Synced
    When you try to install a package on your
    system,
  • Not Synced
    there are actually two problems
    being solved.
  • Not Synced
    One is the part where apt figures out
    that if I have to install Eclipse,
  • Not Synced
    I'll be needing a lot of Java packages,
    and a lot of other stuff.
  • Not Synced
    And so it figures out which packages are
    needed for this to make sense.
  • Not Synced
    The separate part of the problem,
    the second problem is,
  • Not Synced
    once you have this list, they have to be
    unpacked and configured in a certain
  • Not Synced
    order for this all to make sense.
  • Not Synced
    They are in theory two distinct problems.
  • Not Synced
    I'll be talking about the first problem,
  • Not Synced
    because that's actually
    dependency resolution.
  • Not Synced
    The other thing is just ordering.
  • Not Synced
    The ordering thing is certainly also a
    very important part.
  • Not Synced
    I don't want to dismiss that, it's just
  • Not Synced
    that it is in fact theoretically a
    polynomial time problem.
  • Not Synced
    So to solve the ordering problem, you
    basically compute all the action-
  • Not Synced
    Given a set of actions that need
    to be done,
  • Not Synced
    then there are some constraints between
    some of them,
  • Not Synced
    like you unpack things before you
    configure them.
  • Not Synced
    Maybe you deconfigure something else
    before that.
  • Not Synced
    So you end up with a list of partial
    ordering constraints.
  • Not Synced
    From that you build a graph with ordering.
  • Not Synced
    It's fairly simple to do, in practice,
    if you have the right tools for it.
  • Not Synced
    Then without cycles, this is a trivial
    thing where it just orders it
  • Not Synced
    and gives you: this is the order,
    go fix it.
  • Not Synced
    When there are cycles, you will get a single
    action consisting of multiple things
  • Not Synced
    to do at the same time, which is really
    impossible to do.
  • Not Synced
    It turns out that the way we defined this
    whole thing,
  • Not Synced
    if you have a package that does not have a
    post install script,
  • Not Synced
    it does not need to be
    configured separately.
  • Not Synced
    That means that it's just unpacked or
    not unpacked.
  • Not Synced
    That tends to break most cycles.
  • Not Synced
    If you want, there are problems with
    ordering constraints:
  • Not Synced
    help us to find a way to get rid of most
    of the post install scripts,
  • Not Synced
    such as the ones just running ldconfig
    and nothing else.
  • Not Synced
    That could solve some of the cycles,
  • Not Synced
    and otherwise it's just feeding it to dpkg
    and it works.
  • Not Synced
    If you have a cycle, it's a separate
    problem fixing that.
  • Not Synced
    I won't be covering that.
  • Not Synced
    As I said, the runtime is polynomial.
  • Not Synced
    It's something like the size of the
    package graph, a regular graph problem.
  • Not Synced
    Which is fairly polynomial.
  • Not Synced
    This even covers finding cycles.
  • Not Synced
    Again breaking cycles is just removing
    post install scripts.
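
To make the ordering part concrete, here is a minimal sketch of that step: actions plus "must happen before" constraints form a directed graph, and a topological sort yields a valid order. The action names below are invented for illustration; this is not the actual apt or dpkg code.

```python
# Minimal sketch of the ordering step, with invented action names.
from graphlib import TopologicalSorter, CycleError

# Map each action to the set of actions that must happen before it.
constraints = {
    "unpack:libfoo": set(),
    "unpack:foo": set(),
    "configure:libfoo": {"unpack:libfoo"},
    "configure:foo": {"unpack:foo", "configure:libfoo"},
}

try:
    order = list(TopologicalSorter(constraints).static_order())
    print("run in this order:", order)
except CycleError as err:
    # A cycle means several actions would have to happen "at the same time".
    # In Debian terms, this is where dropping a postinst (so the package needs
    # no separate configure step) tends to break the cycle.
    print("cycle detected:", err.args[1])
```
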
  • Not Synced
    And of course if you think this is too
    simple,
  • Not Synced
    you can always implement more features
    on top of it,
  • Not Synced
    such as trying to optimize for minimal
    downtime of services and all that.
  • Not Synced
    You can make any trivial problem very hard
    very soon with that.
  • Not Synced
    So, the hard problem.
  • Not Synced
    First of all, the players.
  • Not Synced
    We've got apt, aptitude, cupt, whatever.
  • Not Synced
    We've got Britney.
  • Not Synced
    We've got DOSE, edos, whatever its
    name is today.
  • Not Synced
    They solve the hard problem.
  • Not Synced
    Or that is what they should be trying to do.
  • Not Synced
    Notable tools not affected by it.
  • Not Synced
    dpkg.
  • Not Synced
    So dpkg surely figures out if you are
    missing dependencies. It says,
  • Not Synced
    "I cannot resolve this, and I don't know
    where to fetch it, so I'm just giving up."
  • Not Synced
    And that is the only thing it can do.
  • Not Synced
    Which means it only verifies the solution
    which is known to be polynomial,
  • Not Synced
    and is therefore not a game player.
  • Not Synced
    DAK rm is also doing a polynomial time
    check, it just happens to be so
  • Not Synced
    for other reasons.
  • Not Synced
    That's basically the known players.
  • Not Synced
    There are probably others.
  • Not Synced
    But we're moving on to the
    hard problem itself.
  • Not Synced
    So the problem we actually have is,
  • Not Synced
    we've got version dependencies,
    we've got alternative choices,
  • Not Synced
    we've got virtual packages.
  • Not Synced
    All three make this problem hard.
  • Not Synced
    You basically have to remove all possible
    alternative choices, guesses, whatever,
  • Not Synced
    to make this simple.
  • Not Synced
    It just so happens if you remove version
    dependencies among other things,
  • Not Synced
    things become really fun.
  • Not Synced
    It gets simpler to solve the problem, but
    upgrading gets really really spicy.
  • Not Synced
    So how do we ensure that dpkg
    is configured before your new package,
  • Not Synced
    which uses a feature of the new dpkg,
    is installed?
  • Not Synced
    That's sort of impossible.
  • Not Synced
    It becomes simple, but we end up with a
    broken dependency ??,
  • Not Synced
    so it's not really an option.
  • Not Synced
    Technically there's also a way to make
    it simple by having multiple versions
  • Not Synced
    of things, and removing negative
    dependencies,
  • Not Synced
    but that's not necessarily
    easy to do either.
  • Not Synced
    Presumably file conflicts will get in our
    way or something.
  • Not Synced
    We have to redefine where we place all of
    the files, that's not going to be
  • Not Synced
    something we're doing anytime soon.
  • Not Synced
    So yeah, it's not really an option.
  • Not Synced
    To give you a better understanding of the
    problem, please consider this example.
  • Not Synced
    If we have coreutils at some version,
    depending on either this version
  • Not Synced
    of libc or a newer one.
  • Not Synced
    From this can we immediately conclude
  • Not Synced
    that if libc 2.19 is known to be
    installable,
  • Not Synced
    that coreutils will also be installable?
  • Not Synced
    How many people think we can immediately
    conclude something from this?
  • Not Synced
    How many people think we cannot
    conclude anything from this?
  • Not Synced
    Well, that's good.
  • Not Synced
    So it turns out that with negative
    dependencies and without negative
  • Not Synced
    dependencies, it doesn't really matter.
  • Not Synced
    With negative dependencies, libc or
    whatever it depends on could
  • Not Synced
    just conflict with coreutils, we're
    broken, it's out.
  • Not Synced
    And without negative dependencies, you
    could have one of the things where
  • Not Synced
    it depends on a version that's newer or
    older, and no.
  • Not Synced
    It's not really trivial to do.
  • Not Synced
    You can't conclude something locally,
    in theory.
  • Not Synced
    That's something that can
    become a problem.
  • Not Synced
    Anyway, it is highly tractable
    in practice.
  • Not Synced
    Because if you do have a Breaks, Conflicts,
    or other negative dependency,
  • Not Synced
    it tends to be upper bounded.
  • Not Synced
    So if the previous version of coreutils
    was installable with that version,
  • Not Synced
    the new version will also probably be.
  • Not Synced
    Likewise, most dependencies, circular or
    otherwise, tend to be upper bound.
  • Not Synced
    There are cases of version ranges, which
    are a lower bound and an upper bound
  • Not Synced
    at the same time.
  • Not Synced
    They exist, they are just not so
    common yet.
  • Not Synced
    And that's a good thing.
  • Not Synced
    The number of possible solutions to
    any clause
  • Not Synced
    actually tends to be fairly limited.
  • Not Synced
    Of course there are the exceptions.
  • Not Synced
    Packages from unstable.
  • Not Synced
    They might just be missing, in which case
    it's really hard to solve the dependency.
  • Not Synced
    You've got mutually exclusive packages:
    all the sendmail stuff, the MTAs.
  • Not Synced
    The version ranges I mentioned before.
  • Not Synced
    And then of course the strictly equal
    versions, which we see inside
  • Not Synced
    packages built from the same
    source packages.
  • Not Synced
    The redeeming feature here: same source.
  • Not Synced
    Because that is actually very simple
    to figure out
  • Not Synced
    if they are from the same source.
  • Not Synced
    So they tend to be upgraded together
    anyhow.
  • Not Synced
    The new versions tend to be available
    at the same time sorta thing.
  • Not Synced
    The problem made hard, in a nutshell, is
    this contrived example.
  • Not Synced
    You have a package you want to try,
    and it depends on any number of foos.
  • Not Synced
    Each of the foos can either be solved by
  • Not Synced
    picking up the bar package
    or the good one.
  • Not Synced
    Which might be a hint to which one we
    would choose if we want to solve it.
  • Not Synced
    And the bar packages depend on a
    bad package that does not work
  • Not Synced
    with the starting-package,
    and we are broken.
  • Not Synced
    And the good package just doesn't have any
    dependencies and is therefore good.
  • Not Synced
    In the perfect example you saw,
  • Not Synced
    you try foo, good, foo1, and then good
    and be done.
  • Not Synced
    In practice, if you don't have some way
    of ordering these so you know
  • Not Synced
    one of them is better than the other.
  • Not Synced
    You now have an N times M trace, where
    you have to guess up and down, which
  • Not Synced
    one am I doing, backtrack back and forth
    and all, and it's going to be horrible.
  • Not Synced
    And this contrived example can of course
    be solved by some solvers that can see
  • Not Synced
    the pattern, but you can always make a
    more contrived pattern
  • Not Synced
    that's harder to understand.
  • Not Synced
    So I tried to keep it somewhat simple.
  • Not Synced
    The good thing is, very few people write
    software like this.
  • Not Synced
    And that is the most redeeming feature
    about the package ??.
  • Not Synced
    We do have exceptions.
  • Not Synced
    Aspell-dictionaries, ispell-dictionaries.
  • Not Synced
    Basically if you depend on
    ispell-dictionary,
  • Not Synced
    which is a virtual package provided by 50
    different ispell packages,
  • Not Synced
    or aspell packages, and then they depend
    on something else after that.
  • Not Synced
    So in theory they can be a part of
    creating this problem.
  • Not Synced
    Fortunately, they themselves are fairly
    simple once you've passed them.
  • Not Synced
    You also have multiarch foreign, or
    multiarch allowed with
  • Not Synced
    package interdependencies.
  • Not Synced
    So if you have multiple architectures,
    you in theory can have any number
  • Not Synced
    of things satisfying the same clause.
  • Not Synced
    This gives extra unnecessary work for most
    resolvers,
  • Not Synced
    if you enable them to do
    multiarch and cross things.
  • Not Synced
    We have 12 of them, but I suspect most
    users of multiarch are somewhat limited
  • Not Synced
    to maybe 3.
  • Not Synced
    Possible exception being people like
    writing resolvers and trying them out,
  • Not Synced
    torture test them.
  • Not Synced
    The real problem: if you had to do this in
    the archive, you would need to do,
  • Not Synced
    for example, something like write
    N different awk implementations.
  • Not Synced
    We have 3.
  • Not Synced
    Then afterwards for example, have
    M different distinct but equally valid
  • Not Synced
    libc implementations.
  • Not Synced
    It's not something we're likely to do
    in the near future.
  • Not Synced
    And you have to do this on multiple
    levels.
  • Not Synced
    Because you can just presolve the
    essential set most of the time,
  • Not Synced
    this particular thing is not very
    interesting.
  • Not Synced
    And the data packages can in theory
    blow up,
  • Not Synced
    when you have truly interchangeable data,
    like we do with the aspell packages.
  • Not Synced
    But they tend to either not have any
    dependencies after the aspell,
  • Not Synced
    after the data package, or they have a
    simple loopback into what you came from.
  • Not Synced
    So the blowup tends to be limited
    to one level.
  • Not Synced
    If you have enough of those, it's still
    going to be a problem,
  • Not Synced
    but it keeps it simple.
  • Not Synced
    Onto installing.
  • Not Synced
    The easier case, as mentioned, is when you
    have a single suite
  • Not Synced
    and a single architecture.
  • Not Synced
    All things in the graph collapse into
    what looks like a regular graph,
  • Not Synced
    and therefore become simple,
    trivial.
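
A small sketch of why that collapse is trivial, using an invented and heavily simplified mini-universe: when every clause has exactly one candidate, resolution is just a graph walk with no guessing.

```python
from collections import deque

# Invented, heavily simplified universe: every dependency clause has exactly
# one candidate, so "resolving" degenerates into following edges.
depends = {
    "lintian": ["perl", "libdpkg-perl"],
    "libdpkg-perl": ["perl", "dpkg"],
    "perl": ["perl-base"],
    "dpkg": [],
    "perl-base": [],
}

def closure(root):
    """Breadth-first walk: every package reachable from root must be installed."""
    seen, queue = {root}, deque([root])
    while queue:
        for dep in depends[queue.popleft()]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

print(sorted(closure("lintian")))
```
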
  • Not Synced
    This actually happens so much that
    if you take Lintian,
  • Not Synced
    it has 26 distinct dependency clauses.
  • Not Synced
    If you look at each of them,
  • Not Synced
    when you only have a single suite
    and architecture, there's at most
  • Not Synced
    one package directly solving each
    clause locally.
  • Not Synced
    Then further down the graph somewhere
    you depend on for example, debconf,
  • Not Synced
    or debconf-2.0 which is one of these
    alternatives, giving rise to its ??
  • Not Synced
    Lintian itself is not subject to becoming
    a backtrack point.
  • Not Synced
    There's no guesswork when you reach
    Lintian, which of its dependencies
  • Not Synced
    should I pick, which version of
    it and all.
  • Not Synced
    That's not there, you have to go further
    down to see that.
  • Not Synced
    Actually, by the time you reach the
    debconf thing,
  • Not Synced
    it has already been solved by
    something else, as I recall.
  • Not Synced
    Oh, no wait. The dependencies of debconf
    are already covered by the time you're
  • Not Synced
    forced to resolve this choice,
    so it's not really a big issue, either.
  • Not Synced
    You just pick the debconf one, cdebconf
    currently depends on debconf,
  • Not Synced
    so it's trivial that way.
  • Not Synced
    The question is, when do we have
    these conditions.
  • Not Synced
    We do that a lot actually.
  • Not Synced
    If you do builds in unstable for example.
  • Not Synced
    ??, only one suite only one architecture,
    piece of cake.
  • Not Synced
    If you install packages on a pure
    Jessie system with multiarch,
  • Not Synced
    you have the single suite one,
    which limits a lot of them,
  • Not Synced
    and then you do not have the single
    architecture.
  • Not Synced
    That's what it is, but in pure Squeeze,
    you won't have multiarch,
  • Not Synced
    because it didn't work back then.
  • Not Synced
    So that's even simpler.
  • Not Synced
    Also you can do unstable + experimental,
    or stable + backports if you fiddle a bit
  • Not Synced
    with the end resolver and say that it is
    not allowed to pick experimental,
  • Not Synced
    unless it is explicitly requested to pick
    experimental,
  • Not Synced
    and from there it only picks
    what it absolutely needs to solve it.
  • Not Synced
    You basically get the same result with a
    single suite restriction,
  • Not Synced
    but that requires you to actually code
    something for it.
  • Not Synced
    And then there's Britney, with the testing
    migration thing, and that's because
  • Not Synced
    she basically takes things, moves it into
    a new version of testing,
  • Not Synced
    and tries to test that this is still
    the same solution.
  • Not Synced
    So she forces it into a single
    suite currently.
  • Not Synced
    So these are common cases where
    it happens.
  • Not Synced
    This is not everywhere where it happens,
  • Not Synced
    so the stable upgrades are, funnily
    enough, still not a single suite and all that.
  • Not Synced
    It happens because there is only one
    package and architecture.
  • Not Synced
    Only one instance of package, you only
    have one version, one architecture that
  • Not Synced
    solves that particular dependency.
  • Not Synced
    Whereas with multiarch you could have
    the i386 and the amd64 version.
  • Not Synced
    If you do an upgrade you could have the
    old version and the new version
  • Not Synced
    that may or may not satisfy it.
  • Not Synced
    Also we have this thing where we
    don't like libraries to have
  • Not Synced
    alternative implementations, it sort of
    breaks things horribly.
  • Not Synced
    Especially when they are not actually
    agreeing on the interface they provide.
  • Not Synced
    There's the social aspect of it that we
    don't really bother having
  • Not Synced
    200 interchangeable implementations of
    everything.
  • Not Synced
    I think the record is something like
  • Not Synced
    five or six different versions
    of sendmail, implementations.
  • Not Synced
    ??
  • Not Synced
    I don't think so, but ?? might have.
  • Not Synced
    And even when you do have one of these
    explosions,
  • Not Synced
    it has to actually hide the breakage
    beneath one of these choice
  • Not Synced
    explosions, alternative explosions, for
    it to be a problem.
  • Not Synced
    You might have aspell over here on the
    right hand side,
  • Not Synced
    and you have a breakage over here which
    ?? to find out.
  • Not Synced
    So if you get the resolver to solve the
    other part before the first explosion,
  • Not Synced
    it realizes this is not working,
    I'll bail out.
  • Not Synced
    There's a couple things that we do to
    make this easier.
  • Not Synced
    The most interesting thing is of course,
    can we still make this simple without
  • Not Synced
    the single suite and single
    architecture restriction.
  • Not Synced
    We can to some extent.
  • Not Synced
    And that's where we move to upgrading,
  • Not Synced
    in deterministic polynomial time.
  • Not Synced
    This is where I spent most of my
    time working.
  • Not Synced
    So to start with, when we do an upgrade
    from stable to stable,
  • Not Synced
    and here I'm taking pure stable, no
    backports and all.
  • Not Synced
    The general recommended way to do it is
    to replace Wheezy with Jessie,
  • Not Synced
    hit apt-get update, and
    upgrade afterwards,
  • Not Synced
    and apt replaces all the old packages with
    the new version,
  • Not Synced
    if there is a new version.
  • Not Synced
    We also want all the packages from Wheezy
    that did not have any version in Jessie
  • Not Synced
    to be removed, and there might be a new
    essential package somewhere.
  • Not Synced
    So long story short, upgrade if it's
    present, remove if it is not present,
  • Not Synced
    and then install the new essential
    packages, if any.
  • Not Synced
    They don't tend to change that often.
  • Not Synced
    I'll be ignoring the installing part
    for now.
  • Not Synced
    It is of course valid and interesting,
    but we'll skip it.
  • Not Synced
    So let's take a simple example, somewhat
    contrived, with somewhat made-up numbers.
  • Not Synced
    They are not too far from reality, but
    they are not exact either.
  • Not Synced
    We have a system that we are
    upgrading from Wheezy to Jessie.
  • Not Synced
    I claim that there are 30 thousand
    packages in Wheezy,
  • Not Synced
    we've got 35 thousand in Jessie.
  • Not Synced
    The system we are working with has
    two thousand packages installed.
  • Not Synced
    Mine here has something like 1200.
  • Not Synced
    If you do a simple dpkg -l
  • Not Synced
    and then do a line count, you can see
    how many you have on your system,
  • Not Synced
    plus-minus five or so, to give you an idea
    of how many packages
  • Not Synced
    you're actually working with.
  • Not Synced
    We assume that all Wheezy packages got a
    new version in Jessie,
  • Not Synced
    because I'm about to do a pop quiz!
  • Not Synced
    So with these numbers, what is the maximum
    problem size of this upgrade.
  • Not Synced
    Any takers?
  • Not Synced
    Is it 30?
  • Not Synced
    Anyone for 35?
  • Not Synced
    37?
  • Not Synced
    One for 65.
  • Not Synced
    One for 67.5 thousand.
  • Not Synced
    And who believes that I was an asshole
    and did not include the right answer?
  • Not Synced
    Oh, so little faith.
  • Not Synced
    I'm glad you believe me.
  • Not Synced
    The right answer is 35 thousand.
  • Not Synced
    The trick is, when we do the upgrade, we
    replace Wheezy with Jessie,
  • Not Synced
    we do an update, so all the Wheezy
    packages not installed
  • Not Synced
    on our system disappear.
  • Not Synced
    Then you get all of the Jessie packages,
    with the assumption that
  • Not Synced
    no Jessie packages have a new version
    compared to Wheezy.
  • Not Synced
    You've got the two thousand from
    Wheezy on your system,
  • Not Synced
    plus the 35 thousand
    packages from Jessie.
  • Not Synced
    So that means your average
    stable-to-stable upgrade,
  • Not Synced
    if you do it this way, is actually only
    about 40% of the worst case
  • Not Synced
    that it could've been if you kept the
    old stable as well.
  • Not Synced
    There's also another awesome feature with
    removing the old stable repository.
  • Not Synced
    It means that every time you
    upgrade a package,
  • Not Synced
    your effective problem size
    decreased by one.
  • Not Synced
    That's not always useful, it's not
    always awesome,
  • Not Synced
    and it's not always the thing that solves
    your problem,
  • Not Synced
    but it means that if you can actually make
    apt upgrade a single thing
  • Not Synced
    every now and then, eventually it might
    be able to figure out the rest of
  • Not Synced
    the upgrade path, after ??
    200 packages or so.
  • Not Synced
    As we upgrade, we end up moving towards
    a single suite situation,
  • Not Synced
    so the more things we upgrade, the more we
    get to a single suite situation.
  • Not Synced
    Again, most ?? pure stable to stable
    upgrades.
  • Not Synced
    And of course the right answer would've
    been 65 thousand packages
  • Not Synced
    if you kept your stable.
  • Not Synced
    So that would've been a possible
    answer as well.
  • Not Synced
    Now upgrading.
  • Not Synced
    This should be as easy as mark new
    essential packages for install,
  • Not Synced
    mark everything that has a new version for
    upgrade, remove stuff,
  • Not Synced
    and then figure out if something is
    missing, then install that.
  • Not Synced
    This solves upgrading by installing, so
    I'm not really interested in doing this,
  • Not Synced
    because installing is hard.
  • Not Synced
    We can do something smarter than
    that for upgrading.
  • Not Synced
    When we do upgrades in a polynomial
    deterministic time,
  • Not Synced
    I'm not going to give you a 100% solution,
    that's not going to work.
  • Not Synced
    If I could, I would be very very rich.
  • Not Synced
    So, I intended this to sort of find some
    easy low hanging fruits
  • Not Synced
    that can be solved cheaply, and then you can
    throw a general purpose resolver at
  • Not Synced
    the rest of the problem, after you've done
    the simple parts.
  • Not Synced
    Or you can feed your solver a partial
    solution and ask it to fix it up
  • Not Synced
    so it would be a slightly
    smaller problem size.
  • Not Synced
    It relies on two things.
  • Not Synced
    One, it exploits a lot of common patterns
    on how we do dependencies.
  • Not Synced
    And the other thing is, if you have a
    valid system state,
  • Not Synced
    so if you have a system where all of your
    packages are in fact installable,
  • Not Synced
    which dpkg tends to enforce,
  • Not Synced
    you can verify that is true
    in polynomial time.
  • Not Synced
    You can also verify a change to it in
    polynomial time.
  • Not Synced
    So here I'm going to do a bit of theory,
    and a bit of English inbetween.
  • Not Synced
    We start with a not-broken system,
    that is we have a set of installable
  • Not Synced
    packages that are mutually
    co-installable and all that.
  • Not Synced
    We call that 'I'.
  • Not Synced
    Then we can add things to I, we can remove
    things from this set, we can take something
  • Not Synced
    in the set and replace it with something else,
    basically add and remove.
  • Not Synced
    That can be done in linear time.
  • Not Synced
    If I take a random package, mash it in,
    that's constant, maybe,
  • Not Synced
    depending on your implementation.
  • Not Synced
    The theory is we can compute a new
    set, where we remove something,
  • Not Synced
    add something else, we get a new set.
  • Not Synced
    Then we can verify this new set is indeed
    still a valid solution in polynomial time.
  • Not Synced
    So, a constant time modification
    can be verified in polynomial time.
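
A minimal sketch of that verification, under a simplified model (no versions, no Provides): each package carries a list of dependency clauses, each clause being a set of acceptable packages, plus a set of packages it cannot coexist with. The function names are mine, not an existing API.

```python
def is_valid(install_set, depends, conflicts):
    """Every installed package needs each of its clauses satisfied from within
    install_set and none of its conflicts installed: a plain polynomial scan."""
    for pkg in install_set:
        for clause in depends.get(pkg, []):
            if not (clause & install_set):
                return False              # an unsatisfied dependency clause
        if conflicts.get(pkg, set()) & (install_set - {pkg}):
            return False                  # a negative dependency is violated
    return True

def try_change(install_set, removed, added, depends, conflicts):
    """Apply a small modification and keep it only if the result still verifies."""
    candidate = (install_set - removed) | added
    return candidate if is_valid(candidate, depends, conflicts) else None
```
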
  • Not Synced
    The issue of the day is, randomly feeding
    packages in and out of the set
  • Not Synced
    is not going to work very well.
  • Not Synced
    So we need something smarter than
    just random modifications.
  • Not Synced
    Ah well, somebody might actually write
    something that works with that but anyway.
  • Not Synced
    So the first thing I did,
    as I mentioned earlier,
  • Not Synced
    we have these strictly equal
    dependencies inside binaries,
  • Not Synced
    ?? binaries from the same source.
  • Not Synced
    So I grouped binaries from the same source.
  • Not Synced
    If I was to mark one of them for upgrade,
  • Not Synced
    I would immediately pull the rest of
    them as well, if they were installed.
  • Not Synced
    It also happens to sort out a very
    common case of breaks-replaces,
  • Not Synced
    when you move files between two binaries
    in the same source.
  • Not Synced
    Then you just try to upgrade each of these
    groups, in some order.
  • Not Synced
    Preferably deterministically,
    but I didn't get that far.
  • Not Synced
    And if the result of one of these
    modifications is upgradable,
  • Not Synced
    we can test that fairly cheaply, we commit
    it, and you just rinse and repeat
  • Not Synced
    until you run out of things to test
    that lead to something new.
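
A sketch of that rinse-and-repeat loop, reusing the is_valid check from the earlier sketch. The source_of and new_version_of lookup tables are assumed inputs with made-up names, and the grouping is simplified compared to what a real tool would need.

```python
def greedy_upgrade(install_set, source_of, new_version_of, depends, conflicts):
    """Repeatedly try to upgrade whole same-source groups and keep a change
    only if the resulting set still verifies.  A simplified sketch, not the
    actual apt or Britney logic."""
    groups = {}
    for pkg in install_set:                      # group binaries by source
        groups.setdefault(source_of[pkg], set()).add(pkg)

    changed = True
    while changed:                               # rinse and repeat
        changed = False
        for src, members in list(groups.items()):
            upgraded = {new_version_of.get(p, p) for p in members}
            if upgraded == members:
                continue                         # nothing new for this source
            candidate = (install_set - members) | upgraded
            if is_valid(candidate, depends, conflicts):
                install_set = candidate          # commit the group upgrade
                groups[src] = upgraded
                changed = True
    return install_set
```
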
  • Not Synced
    This works largely depending
    on what you have installed,
  • Not Synced
    unsurprisingly.
  • Not Synced
    So if you have very few packages,
    it may work better than if you have more,
  • Not Synced
    And sometimes it works better when you
    have more, and that's really exciting.
  • Not Synced
    So I did an example Wheezy installation
    based on what I have installed
  • Not Synced
    on my machine.
  • Not Synced
    Not quite, but close, well related.
  • Not Synced
    So libc for example was immediately
    upgradable by this procedure.
  • Not Synced
    As well as some Java package,
    worked fine.
  • Not Synced
    man-db, eclipse, xterm, then I had to do a
    couple packages before that,
  • Not Synced
    which were all upgradable.
  • Not Synced
    I think libc was primarily ?? and maybe
    some libc linux thing.
  • Not Synced
    But there are actually a set of packages
    you can upgrade with this.
  • Not Synced
    This is of course not testing on
    configurations.
  • Not Synced
    In this particular configuration, I could
    upgrade dpkg like this,
  • Not Synced
    but I don't expect you to be able
    to do that in all cases.
  • Not Synced
    In fact I find it highly unlikely
    because in Wheezy to Jessie,
  • Not Synced
    dpkg has tons of breaks for
    all sorts of things,
  • Not Synced
    so if you have the right thing there,
    you break something else,
  • Not Synced
    and that has to be upgraded
    at the same time.
  • Not Synced
    And that is sure to build
    a loop eventually.
  • Not Synced
    So basically what we are exploiting here
    is the greater-than-equal version,
  • Not Synced
    which is the common way of doing things.
  • Not Synced
    We rebuild a package, it gets the new
    dependencies on a higher
  • Not Synced
    version of things.
  • Not Synced
    That means the foo in stable ??.
  • Not Synced
    Right so, everything depending on foo is
    equally happy with the version
  • Not Synced
    in Jessie as well, cause it's a lower
    bound, and Jessie has a newer version.
  • Not Synced
    So that works very well.
  • Not Synced
    This is the common case for libraries
    without ABI transitions.
  • Not Synced
    Apparently including libc
  • Not Synced
    and enables you to upgrade from the inside
    out, from the core of the onion and out,
  • Not Synced
    if you think of the graph as an onion.
  • Not Synced
    The algorithm is fairly dumb,
    unsurprisingly.
  • Not Synced
    It cannot solve Perl, it cannot solve Python,
    because they tend to involve
  • Not Synced
    pulling in a new package so you have to
    install that and all that.
  • Not Synced
    In Perl's case it could technically solve
    it, if it would merge groups together
  • Not Synced
    into larger groups, but anyway the runtime
    of this is something like I² * (I + E),
  • Not Synced
    which is upper bounded by
    I to the power of 4.
  • Not Synced
    So it's fairly cheap, it is fairly polynomial.
  • Not Synced
    It's very trivial to do.
  • Not Synced
    We can do better than that as mentioned,
    if we can have it figure out that
  • Not Synced
    dpkg needs a new version of libc,
    we could upgrade those together.
  • Not Synced
    That's not a problem on my system,
    but it might be on others.
  • Not Synced
    Then there's the part where dpkg is
    breaking all of this stuff,
  • Not Synced
    so you might have to upgrade it
    at the same time, or before dpkg.
  • Not Synced
    Then you end up with some sort of big loop
    or tree structure of groups you have to
  • Not Synced
    merge together or upgrade together,
    and it should just work.
  • Not Synced
    There's also the part where we have to
    handle renamed packages.
  • Not Synced
    This comes into ?? basically.
  • Not Synced
    They come with an ABI bump, which will be
    very relevant for stretch due to GCC 5.
  • Not Synced
    Basically if you want to do a sane
    restriction on this you could do something
  • Not Synced
    like, we have a clause that is not
    satisfied, but the thing we need for it
  • Not Synced
    is a package introduced in the new source,
    then we pull that.
  • Not Synced
    If it has the same dependencies, we've
    already solved it, that sort of thing.
  • Not Synced
    There's also the part where people rename
    the package from foo to foo replacement
  • Not Synced
    or something like that.
  • Not Synced
    Again, if it is from the same source, we
    might do some magic here,
  • Not Synced
    some heuristics here.
  • Not Synced
    And also at some point you will end up
    needing to install stuff,
  • Not Synced
    that is actually required for
    Wheezy to Jessie
  • Not Synced
    because it pulls in a new init system.
  • Not Synced
    If it had trivial no-guess solutions,
  • Not Synced
    by which I mean the first thing is of the
    alternative choices is a real package,
  • Not Synced
    and there is only one of them,
  • Not Synced
    you could solve this automatically
    in a deterministic way.
  • Not Synced
    And otherwise, give up and try
    something else.
  • Not Synced
    So this is the basic algorithm, and the
    basic ideas I've put down,
  • Not Synced
    and I've been fiddling a bit with,
    and didn't get very far with so far.
  • Not Synced
    Installing part 2.
  • Not Synced
    So after we've learned this with
    upgrading, we can now say:
  • Not Synced
    with single suite, single architecture
    it is trivial.
  • Not Synced
    Upgrading we can usually ?? it,
    reduce the problem size to some extent.
  • Not Synced
    We can do better on the
    installing part as well.
  • Not Synced
    We can look at these
    aspell packages again,
  • Not Synced
    because they are one of the big problems.
  • Not Synced
    We have a couple packages that appear
    to be almost identical,
  • Not Synced
    and then we also have packages that
    are clearly superior to others.
  • Not Synced
    Packages that are identical
    show up like this.
  • Not Synced
    So if we look long enough at the aspell
    package from Russia and the one
  • Not Synced
    from Ukraine, something
    very special happens.
  • Not Synced
    You will notice that the primary difference
    between the fields I've selected is
  • Not Synced
    the package name and the version.
  • Not Synced
    Exciting, because that means that if I
    can install aspell-uk,
  • Not Synced
    that's the only thing on the system,
  • Not Synced
    then the same solution is valid
    for the Russian one.
  • Not Synced
    So syntactically they are different, but
    semantically they differ only by name and version.
  • Not Synced
    Sometimes you can have version
    dependencies that are always
  • Not Synced
    satisfied ?? for example, then that's
    the same game actually.
  • Not Synced
    This is a simplified view because in
    theory you could have a package
  • Not Synced
    that refuses to work with the Ukranian
    one, but not the Russian one,
  • Not Synced
    and vice versa, so they actually become
    distinct solutions.
  • Not Synced
    But the general use of aspell dictionaries
    tends to be any one of them,
  • Not Synced
    and I really don't care
    which one you pick.
  • Not Synced
    So we find those by saying they have
    the same effective dependency clauses.
  • Not Synced
    We can also look at the negative
    dependencies, that's fairly important too,
  • Not Synced
    they have to have the same there.
  • Not Synced
    Here we need to remember that negative
    dependencies, we do not think of them
  • Not Synced
    as just directed, they're not.
  • Not Synced
    So it really doesn't matter if foo breaks
    bar or bar breaks foo,
  • Not Synced
    the important part is they
    don't work together.
  • Not Synced
    Then we have to assert that they satisfy
    the same clauses.
  • Not Synced
    Both of them are a valid solution to the
    same clause as the other one.
  • Not Synced
    And this becomes interesting when you have
    dependency ranges,
  • Not Synced
    which is not truly a dependency range,
    but you have two clauses that say,
  • Not Synced
    I will need foo greater than
    version 1, and I need it strictly
  • Not Synced
    less than version 2, then one of them
    contains both of them,
  • Not Synced
    and the other one does not, when you
    are doing upgrades for example.
  • Not Synced
    It has to satisfy the same
    clauses as well.
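
In the same simplified model as the earlier sketches, the equivalence test can be written as grouping packages by a signature of exactly those three things; satisfies is an assumed precomputed map from a package to the dependency clauses it can satisfy.

```python
from collections import defaultdict

def equivalence_classes(packages, depends, conflicts, satisfies):
    """Group packages with identical effective dependencies, identical negative
    dependencies, and an identical set of satisfied clauses (think aspell-ru
    versus aspell-uk).  Versions and Provides expansion are left out."""
    classes = defaultdict(list)
    for pkg in packages:
        signature = (
            frozenset(frozenset(clause) for clause in depends.get(pkg, [])),
            frozenset(conflicts.get(pkg, set())),
            frozenset(satisfies.get(pkg, set())),
        )
        classes[signature].append(pkg)
    return [group for group in classes.values() if len(group) > 1]
```
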
  • Not Synced
    A little trick thing that I discovered
    a little too late, but it's fun.
  • Not Synced
    These things can generally be done
    in polynomial time, that ballpark.
  • Not Synced
    I haven't really computed
    the actual results.
  • Not Synced
    But we can go further than that because
    equivalent or identical packages
  • Not Synced
    are easy to find, but there is
    something better than that.
  • Not Synced
    We can also have the case where one of the
    packages is clearly better than the other.
  • Not Synced
    For example, it has fewer of the same
    effective dependencies.
  • Not Synced
    So I need less to solve this package.
  • Not Synced
    It has fewer of the same negative
    dependencies, fewer things to
  • Not Synced
    not work with this,
    this is also good.
  • Not Synced
    And finally it solves at least as many
    dependency clauses as the other one.
  • Not Synced
    This is typically something you find
    in upgrade scenarios.
  • Not Synced
    So your version and the new version do not
    have any different dependencies
  • Not Synced
    for example, they don't have any new
    conflict relations, but the new version
  • Not Synced
    might solve more ?? dependencies.
  • Not Synced
    It might have more packages it can solve.
  • Not Synced
    In that case, you can just unconditionally
    take the new version,
  • Not Synced
    because if a solution works
    for the new version,
  • Not Synced
    it may or may not work for the old one,
  • Not Synced
    but if the solution works
    with the old one,
  • Not Synced
    it definitely works with the new one.
  • Not Synced
    So that's sort of a freebie.
  • Not Synced
    You're just solving for one of them, if
    that doesn't work,
  • Not Synced
    you know you don't have to try the other,
    that's the important part.
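
And the one-way version of that test, again only a sketch under the same simplified model: a package dominates another when it needs no more, conflicts with no more, and satisfies at least the same clauses, so only the dominating one ever has to be tried.

```python
def dominates(a, b, depends, conflicts, satisfies):
    """True if package a is at least as good as b: fewer or equal dependency
    clauses, fewer or equal negative dependencies, and it satisfies at least
    the clauses that b satisfies.  Sketch only, versions are ignored."""
    deps_a = {frozenset(c) for c in depends.get(a, [])}
    deps_b = {frozenset(c) for c in depends.get(b, [])}
    return (deps_a <= deps_b
            and conflicts.get(a, set()) <= conflicts.get(b, set())
            and satisfies.get(a, set()) >= satisfies.get(b, set()))
```
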
  • Not Synced
    And the point with these is basically
    that identical packages are just
  • Not Synced
    two-way substitutable.
  • Not Synced
    This being a one-way substitution instead.
  • Not Synced
    I haven't figured out how to find these,
    in general, more effectively than
  • Not Synced
    I do the identical ones.
  • Not Synced
    The identical ones are easy to find,
    but I would like to get some where
  • Not Synced
    we can find these superior packages
    trivially, because they are more
  • Not Synced
    useful in general, or there might
    be more of them as well.
  • Not Synced
    This is as far as I got, including writing
    slides five minutes before the talk.
  • Not Synced
    So are there any questions?
  • Not Synced
    There's a mic there,
    or there's a runner.
  • Not Synced
    [question] Thanks for your interesting
    talk. I have two questions in fact.
  • Not Synced
    The first question concerns your problem
    of finding equivalent packages.
  • Not Synced
    Are you aware of the ?? tool.
  • Not Synced
    [Niels] I have read through the article
    about it.
  • Not Synced
    The thing I remember with that was that it
    wrote somewhere that
  • Not Synced
    this should blow up exponentially
    but it doesn't.
  • Not Synced
    [question] Well, these are all theoretically
    hard problems, of course.
  • Not Synced
    [Niels] [laughing] Yes.
    [question] So this tool in fact
  • Not Synced
    tries to group, to find out how to
    classify packages with respect to
  • Not Synced
    which other packages they are
    co-installable with.
  • Not Synced
    [Niels] Yes.
    [question] And this works.
  • Not Synced
    One of the steps is to group together
    packages which behave the same.
  • Not Synced
    With respect to co-installability
    with other packages.
  • Not Synced
    So this would be your notion of ??
    if I understood it correctly.
  • Not Synced
    [Niels] It sounds like it.
    [question] At least it is very similar,
  • Not Synced
    so maybe we can do this offline later,
    but this is definitely something you
  • Not Synced
    should look at I think.
  • Not Synced
    And my other question is about this
    deterministic ptime installability.
  • Not Synced
    Of course you are well aware of the fact
    that this is a result which holds under
  • Not Synced
    some restrictions of the problems,
    of course,
  • Not Synced
    as you don't plan to solve P = NP, so can
    you, for instance, if you already know
  • Not Synced
    that it has this good complexity when
    you don't have any alternatives.
  • Not Synced
    As you've already said these are
    explicitly, not implicitly,
  • Not Synced
    and also when you don't have any conflicts,
    either implicitly or explicitly,
  • Not Synced
    you seem to have a different class.
  • Not Synced
    Can you clarify precisely under which
    conditions on the dependency graph
  • Not Synced
    you get a ptime complexity of the problem.
  • Not Synced
    [Niels] So actually the negative
    dependencies are not an issue.
  • Not Synced
    If you take a package and the entire
    transitive dependencies below it
  • Not Synced
    can always be solved by looking
    elsewhere in this.
  • Not Synced
    So you have a set of transitive
    dependencies, and you can always
  • Not Synced
    solve it by following a clause where
    there's only one option,
  • Not Synced
    or there's a package you've already picked
    earlier.
  • Not Synced
    Then locally this becomes a simple regular
    graph, well you can massage it into one.
  • Not Synced
    I don't think it happens often enough for
    us to rely on it 100%,
  • Not Synced
    I think there may be cases, or corners of
    the graph where it happens locally,
  • Not Synced
    and then when you leave that part,
    go further down the stack,
  • Not Synced
    or outside that part, you get back
    to the original problem,
  • Not Synced
    but will have parts of it that
    are locally polynomial,
  • Not Synced
    which will be interesting to find.
  • Not Synced
    [question] So whenever the
    dependency problem fails,
  • Not Synced
    and you have to do it manually, I've
    always thought it would be optimal
  • Not Synced
    if a user could submit their way to
    resolve the issue to a public place
  • Not Synced
    and share it so others can see.
  • Not Synced
    And then it would be easier
    to solve the dependency problem.
  • Not Synced
    You have things like popcon, where you
    already upload what packages
  • Not Synced
    you've installed, so it would be quite a
    small step to upload the problems
  • Not Synced
    you have, and the solutions.
  • Not Synced
    So I'd like to hear if you have
    any views on that.
  • Not Synced
    [Niels] I think it could be interesting to
    have that,
  • Not Synced
    but mostly as a way of generating
    test data for solvers.
  • Not Synced
    My take is that as a user, this problem
    should be simple enough that
  • Not Synced
    we can solve this in the tools,
    so the user doesn't have to
  • Not Synced
    manually fix the problem.
  • Not Synced
    But as I said, feedback getting actual test
    case data might be useful for that.
  • Not Synced
    There was a question there.
  • Not Synced
    [question] You went off your slides.
    You say that in order for an upgrade
  • Not Synced
    from a stable distribution to the next
    stable one,
  • Not Synced
    it is only considered to be a ?? upgrade
    when you have only packages
  • Not Synced
    in the new stable, which means that if you
    have a package which no longer exists
  • Not Synced
    in the new stable, the package
    is removed, is that really a good thing?
  • Not Synced
    [Niels] This is the short version of it.
  • Not Synced
    Usually when there is not a new version
    in the stable release,
  • Not Synced
    it's because we've removed it, and we no
    longer support it.
  • Not Synced
    I suppose there may be cases where the
    user might still want your version
  • Not Synced
    of the package to be there, if it's
    installable and all.
  • Not Synced
    This is one of the cases where we may
    be debating over what is
  • Not Synced
    practically useful, and what's the
    theoretical idea of doing this upgrade.
  • Not Synced
    That might collide.
  • Not Synced
    So this was just useful for me to get an
    idea of where I was headed,
  • Not Synced
    what problem I was
    trying to solve basically.
  • Not Synced
    [question] So maybe I can just give a
    remark to your question.
  • Not Synced
    So the problem when you are trying to
    precompute solutions to
  • Not Synced
    insolubility problems, then it of course
    also depends on which suite you are
  • Not Synced
    taking the packages you want to install
    or upgrade,
  • Not Synced
    but also what you have currently
    installed on your machine.
  • Not Synced
    This of course, there are hardly two users
    who have installed the same packages
  • Not Synced
    on their machine.
  • Not Synced
    For that reason, it is difficult to
    precompute this kind of problem,
  • Not Synced
    I would say.
  • Not Synced
    And the other remark is that in practice,
    even for very hard instances,
  • Not Synced
    like you have many different suites from
    which you install without having any
  • Not Synced
    pin preferences among them, stuff like
    that, in practice you get very good
  • Not Synced
    solutions with other solver technologies
    like ??, or other solvers,
  • Not Synced
    which can be used as external solvers
    in apt for precisely the same reason
  • Not Synced
    that you have already identified, but in
    practice,
  • Not Synced
    even theoretically hard problems are often
    much much easier than what you
  • Not Synced
    could obtain in the worst case.
  • Not Synced
    [Niels] I think we're running out of time.
  • Not Synced
    So if there are no more questions...
  • Not Synced
    last one.
  • Not Synced
    [question] So do you have any
    preliminary results from timings,
  • Not Synced
    or something like that, at least for your
    system where you've tried it.
  • Not Synced
    [Niels] No. What I've got is from Britney
    actually.
  • Not Synced
    I added some steps to Britney, and
    we got a dataset from Kali Linux.
  • Not Synced
    They were trying to go from their version
    of Wheezy to Jessie,
  • Not Synced
    which was in August, so that was
    before Jessie froze.
  • Not Synced
    And basically we were working with
    70 thousand nodes, or packages.
  • Not Synced
    So basically, all of Wheezy, plus all of
    Jessie, had a unique version each.
  • Not Synced
    The interesting thing is that,
  • Not Synced
    for example, the average node in that
    graph had like 3 to 4 dependencies,
  • Not Synced
    The median was 3 dependency clauses,
    and the average was 4.3.
  • Not Synced
    Each of these clauses had an average of
  • Not Synced
    -ah I can't read this shit anymore-
  • Not Synced
    had a median of 2 possible options for it,
    and an average of 1.8.
  • Not Synced
    And this is before Britney accounts for
    excluding things.
  • Not Synced
    So this is the raw graph and then,
  • Not Synced
    based on that she selects
    something that is in testing,
  • Not Synced
    and everything outside
    of that she ignores.
  • Not Synced
    So when I have two options, a lot of the
    time she would throw out one of them,
  • Not Synced
    because that one is not in testing.
  • Not Synced
    So that's the numbers I have.
  • Not Synced
    I have more if you want to see them but,
  • Not Synced
    it's not very detailed, it's not to the
    point where I think I can predict
  • Not Synced
    something useful from it.
  • Not Synced
    [Moderator] So okay Niels, thank you
    for your talk,
  • Not Synced
    and this was a really good insight into
    the problems of dependency resolving.
  • Not Synced
    Thank you very much.
  • Not Synced
    [applause]
Title:
Debian_dependency_resolution_in_polynomial_time.webm
Video Language:
English
Team:
Debconf
Project:
2015_debconf15
