< Return to Video

Debian_dependency_resolution_in_polynomial_time.webm

  • Not Synced
    (Offscreen) So welcome to the talk from Niels
  • Not Synced
    about dependency resolution in
    polynomial time.
  • Not Synced
    (Neils) I'm Neils, I'm doing my second
    talk today.
  • Not Synced
    This time about something slightly
    more technical.
  • Not Synced
    I wanted to do dependency resolution
    in deterministic polynomial time,
  • Not Synced
    which is what this is all about.
  • Not Synced
    I didn't get as far with this as I wanted.
  • Not Synced
    I wanted to have a tool ready and have
    some results, which I didn't quite finish.
  • Not Synced
    So these will be my findings work in
    progress notes.
  • Not Synced
    We have six topics I have to cover.
  • Not Synced
    A quick introduction.
  • Not Synced
    Then something about the hard problem.
  • Not Synced
    Then the part the part where it actually
    turns out to be highly tractable
  • Not Synced
    in practice, if you don't think too much
    about it.
  • Not Synced
    And then there is some installing,
    upgrading, and installing again
  • Not Synced
    for how to improve your solvers.
  • Not Synced
    Starting with introduction.
  • Not Synced
    I'm hoping mostly to debunk the myth
    that dependency resolution is a very
  • Not Synced
    hard problem that we cannot solve,
  • Not Synced
    and we should remove packages in the hopes
    that they will somehow keep the archive
  • Not Synced
    from exploding, and the dependency
    resolvers to break, and apt to die,
  • Not Synced
    and god knows what.
  • Not Synced
    I've been working on Britney, which is a
    different problem to solve, but it
  • Not Synced
    involves the same techniques, or uses some
    of the same techniques.
  • Not Synced
    Some will be directly applicable to apt,
    some will not.
  • Not Synced
    There's not a one size fits all solution, sorry.
  • Not Synced
    And I much less haven't wrote it yet,
    even if there was.
  • Not Synced
    So defining the problem.
  • Not Synced
    When you try to install a package on your
    system,
  • Not Synced
    there's actually two problems
    being solved.
  • Not Synced
    One is the part where apt figures out if I
    have to install Eclipse.
  • Not Synced
    You'll be needing a lot of Java packages,
    you'll be needing a lot of other stuff.
  • Not Synced
    And so it figures out which packages are
    needed for this to make sense.
  • Not Synced
    The separate part of the problem,
    the second problem is,
  • Not Synced
    once you have this list, they have to be
    unpackaged and configured in a certain
  • Not Synced
    order for this all to make sense.
  • Not Synced
    They are in theory two distinct problems.
  • Not Synced
    I'll be talking about the first problem,
  • Not Synced
    because that's actually
    dependency resolution.
  • Not Synced
    The other thing is just ordering.
  • Not Synced
    The ordering thing is certainly also very
    important part.
  • Not Synced
    I don't want to dismiss that, it's just
  • Not Synced
    it is in fact theoretically a polynomial
    time problem.
  • Not Synced
    So to solve the ordering problem, you
    basically compute all the action-
  • Not Synced
    Given a set of actions that need
    to be done,
  • Not Synced
    then there are some constraints between
    some of them,
  • Not Synced
    like you unpack things before you
    configure it.
  • Not Synced
    Maybe you deconfigure something else
    before that.
  • Not Synced
    So you end up with a list of partial ??
    constraints.
  • Not Synced
    From that you build a graph with ordering.
  • Not Synced
    It's fairly simple to do, in practice,
    if you have the right tools for it.
  • Not Synced
    Then without cycles, this is a trivial
    things where it just orders it
  • Not Synced
    and gives you this is the order,
    go fix.
  • Not Synced
    When there's cycles you will get a single
    action consisting of multiple things
  • Not Synced
    to do at the same time, which is really
    impossible to do.
  • Not Synced
    It turns out that the way we defined this
    whole thing,
  • Not Synced
    if you have a package that does not have a
    post install script,
  • Not Synced
    it does not need to be configured separately.
  • Not Synced
    That means that it's just unpacked or
    not unpacked.
  • Not Synced
    That tends to break most cylcles.
  • Not Synced
    If you want, there's problems with
    ordering constraints.
  • Not Synced
    Help us to find a way to get rid of most
    of the post install scripts,
  • Not Synced
    such as the ones just running lvconfig
    and nothing else.
  • Not Synced
    That could solve some of the cycles,
  • Not Synced
    and otherwise it's just feeding it to dpkg
    and it works.
  • Not Synced
    If you have a cycle, it's a separate
    problem fixing that.
  • Not Synced
    I won't be covering that.
  • Not Synced
    As I said, the runtime is polynomial.
  • Not Synced
    It's something like the size of the
    package, a regular graph.
  • Not Synced
    Which is fairly polynomial.
  • Not Synced
    This even covers finding cycles.
  • Not Synced
    Again breaking cycles is just removing
    post install scripts.
  • Not Synced
    And of course if you think this is too
    simple,
  • Not Synced
    you can always implement more features
    on top of it,
  • Not Synced
    such as trying to optimize for minimal
    downtime of services and all that.
  • Not Synced
    You can make any trivial problem very hard
    very soon with that.
  • Not Synced
    So, the hard problem.
  • Not Synced
    First of all, the players.
  • Not Synced
    We've got apt, aptitude, cupt, whatever.
  • Not Synced
    We've got Britney.
  • Not Synced
    We've got DOSE, edos, whatever its
    name is today.
  • Not Synced
    They solve the hard problem.
  • Not Synced
    Or they should be trying to do.
  • Not Synced
    Notable tools not affected by it.
  • Not Synced
    dpkg.
  • Not Synced
    So dpkg surely figures out if you are
    missing dependencies. It says,
  • Not Synced
    "I cannot resolve this, and I don't know
    where to fetch it, so I'm just giving up."
  • Not Synced
    And that is the only thing you can do.
  • Not Synced
    Which means it only verifies the solution
    which is known to be polynomial,
  • Not Synced
    and is therefore is not a game player.
  • Not Synced
    DAK rm is also doing a polynomial time
    check, it just happens to be so
  • Not Synced
    for other reasons.
  • Not Synced
    That's basically the known players.
  • Not Synced
    There are probably others.
  • Not Synced
    But we're moving onto the
    hard problem itself
  • Not Synced
    So the problem we actually have is,
  • Not Synced
    we've got version dependencies,
    we've got alternative choices,
  • Not Synced
    we've got virtual packages.
  • Not Synced
    All three makes this problem hard.
  • Not Synced
    You basically have to remove all possible
    alternative choices, guesses, whatever,
  • Not Synced
    to make this simple.
  • Not Synced
    It just so happens if you move version
    dependencies among other things,
  • Not Synced
    things become really fun.
  • Not Synced
    ?? to solve the problem, but upgrading
    gets really really spicy.
  • Not Synced
    So how do we ensure that dpkg
    is configured before your new package,
  • Not Synced
    use a feature new dev package
    is installed.
  • Not Synced
    That's sort of impossible.
  • Not Synced
    It becomes simple, but we end up with a
    broken dependency ??,
  • Not Synced
    so it's not really an option.
  • Not Synced
    Technically there's also a way to make
    it simple by having multiple versions
  • Not Synced
    of things, and removing negative
    dependencies,
  • Not Synced
    but that's not necessarily easy to do either.
  • Not Synced
    Presumably file configs will get in our
    way or something.
  • Not Synced
    We have to redefine where we place all of
    the files, that's not going to be
  • Not Synced
    something we're doing anytime soon.
  • Not Synced
    So yeah, it's not really an option.
  • Not Synced
    To give you a better understanding of the
    problem, please consider this example.
  • Not Synced
    If we have coreutils at some version,
    depending on either this version
  • Not Synced
    of libc or a newer one.
  • Not Synced
    From this can we immediately conclude
  • Not Synced
    that if libc 2.19 is known to be
    installable,
  • Not Synced
    that coreutils will also be installable?
  • Not Synced
    How many people think we can immediately
    conclude something from this?
  • Not Synced
    How many people think we cannot
    conclude anything from this?
  • Not Synced
    We'll that's good.
  • Not Synced
    So it turns out that with negative
    dependencies and without negative
  • Not Synced
    dependencies, it doesn't really matter.
  • Not Synced
    With negative dependencies, libc or
    whatever it depends on could
  • Not Synced
    just conflict with coreutils, we're
    broken, it's out.
  • Not Synced
    And without negative dependencies, you
    could have one of the things where
  • Not Synced
    it depends on a version that's newer or
    older, and no.
  • Not Synced
    It's not really trivial to do.
  • Not Synced
    You can't conclude something locally,
    in theory.
  • Not Synced
    That's something that can
    become a problem.
  • Not Synced
    Anyway, it is highly tractable
    in practice.
  • Not Synced
    Because if you do have a break, conflicts,
    or otherwise negative dependency,
  • Not Synced
    it tends to be upper bound.
  • Not Synced
    So if the previous version of coreutils
    was installable with that version,
  • Not Synced
    the new version will also probably be.
  • Not Synced
    Likewise, most dependencies, circular or
    otherwise, tend to be upper bound.
  • Not Synced
    There are cases of version ranges which
    is a lower bound and upper bound
  • Not Synced
    at the same time.
  • Not Synced
    They exist, they are just not so
    common yet.
  • Not Synced
    And that's a good thing.
  • Not Synced
    The number of possible solutions to
    any clause
  • Not Synced
    actually tends to be fairly limited.
  • Not Synced
    Of course there are the exceptions.
  • Not Synced
    Packages from unstable.
  • Not Synced
    They might just be missing, in which case
    it's really hard to solve the dependency.
  • Not Synced
    You've got mutually exclusive packages
    all the sendmail stuff, MTAs.
  • Not Synced
    The version ranges I mentioned before.
  • Not Synced
    And then of course the strictly equal
    versions, which we see inside
  • Not Synced
    packages built from the same
    source packages.
  • Not Synced
    The redeeming feature here: same source.
  • Not Synced
    Because that is actually very simple
    to figure out
  • Not Synced
    if they are from the same source.
  • Not Synced
    So they tend to be upgraded ??
    anyhow.
  • Not Synced
    The new versions tend to be available
    at the same time sorta thing.
  • Not Synced
    The problem made hard in a nutshell, is
    this contrived example for example.
  • Not Synced
    You have a package you want to try,
    and it depends on any number of foos.
  • Not Synced
    Each of the foos can either be solved by
  • Not Synced
    picking up the bar package
    or the good one.
  • Not Synced
    Which might be a hint to which one we
    would choose if we want to solve it.
  • Not Synced
    And the bar packages, depends on a
    bad package that does not work
  • Not Synced
    with the starting-package,
    and we are broken.
  • Not Synced
    And the good package just doesn't have any
    dependencies and is therefore good.
  • Not Synced
    In the perfect example you saw,
  • Not Synced
    you try foo, good, foo1, and then good
    and be done.
  • Not Synced
    In practice, if you don't have some way
    of ordering these so you know
  • Not Synced
    one of them is better than the other.
  • Not Synced
    You now have an n times m trace,
    where you have to guess up and down,
  • Not Synced
    which one am I doing, backtrack back and
    forth and all, and it's going to be horrible.
  • Not Synced
    And this contrived example can of course
    be solved by some solvers that can see
  • Not Synced
    the pattern, but you can always a
    more contrived pattern
  • Not Synced
    that's harder to understand.
  • Not Synced
    So I tried to keep it somewhat simple.
  • Not Synced
    The good thing is, very few people write
    software like this.
  • Not Synced
    And that is the most redeeming feature
    about the package ??.
  • Not Synced
    We do have exceptions.
  • Not Synced
    Aspell-dictionaries, ispell-dictionaries.
  • Not Synced
    Basically if you depend on
    ispell-dictionary,
  • Not Synced
    which is a virtual package provided by 50
    different ispell packages,
  • Not Synced
    or aspell packages, and then they depend
    on something else after that.
  • Not Synced
    So in theory they can be a part of
    creating this problem.
  • Not Synced
    Fortunately, they themselves are fairly
    simple once you've passed them.
  • Not Synced
    You also have multiarch foreign, or
    multiarch allowed with
  • Not Synced
    package interdependencies.
  • Not Synced
    So if you have multiple architectures,
    you in theory can have any number
  • Not Synced
    of things satisfying the same clause.
  • Not Synced
    This gives extra unnecssary work for most
    resolvers,
  • Not Synced
    if you enable them to do
    multiarch and cross things.
  • Not Synced
    We have 12 of them, but I suspect most
    users of multiarch are somewhat limited
  • Not Synced
    to maybe 3.
  • Not Synced
    Possible exception being people like
    writing resolvers and trying them out,
  • Not Synced
    torture test them.
  • Not Synced
    The real problem, if you had to do this in
    the archive you would need to do,
  • Not Synced
    for example something like, write
    indifferent awk implementations.
  • Not Synced
    We have three.
  • Not Synced
    And then afterwards, for example,
    have m different distinct,
  • Not Synced
    but equally valid libc implementations.
  • Not Synced
    It's not something we're likely to do in
    the near future.
  • Not Synced
    And you have to do this on multiple
    levels,
  • Not Synced
    because you can just presolve the
    essential set most of the time.
  • Not Synced
    The real problem: if you had to do this in
    the archive, you would need to do,
  • Not Synced
    for example, something like write
    N different awk implementations.
  • Not Synced
    We have 3.
  • Not Synced
    Then afterwards for example, have
    M different distinct but equally valid
  • Not Synced
    libc implementations.
  • Not Synced
    It's not something we're likely to do
    in the near future.
  • Not Synced
    And you have to do this on multiple
    levels.
  • Not Synced
    Because you can just presolve the
    essential set most of the time,
  • Not Synced
    this particular thing is not very
    interesting.
  • Not Synced
    And the data packages can in theory
    blow up,
  • Not Synced
    when you have truly interchangable data,
    like we do with the aspell packages.
  • Not Synced
    But they tend to either not have any
    dependencies after the aspell,
  • Not Synced
    after the data package, or they have a
    simple loopback into what you came from.
  • Not Synced
    So the blowup blowup tends to be limited
    to one level.
  • Not Synced
    If you have enough of those, it's still
    going to be a problem,
  • Not Synced
    but it keeps it simple.
  • Not Synced
    Onto installing.
  • Not Synced
    The easier case, as mentioned is when you
    have a single suit
  • Not Synced
    and a single architecture.
  • Not Synced
    All things in the graph collapse into
    what looks like a regular graph,
  • Not Synced
    and therefore become simple,
    trivial.
  • Not Synced
    This actually happens so much that
    if you take Lintian,
  • Not Synced
    it has 26 distinct dependency clauses.
  • Not Synced
    If you look at each of them,
  • Not Synced
    when you only have a single suite
    and architecture, there's at most
  • Not Synced
    one package directly solving each
    clause locally.
  • Not Synced
    Then further down the graph somewhere
    you depend on for example, debconf,
  • Not Synced
    or debconf-2.0 which is one of these
    alternatives, giving rise to its ??
  • Not Synced
    Linitan itself is not subject to becoming
    a backtrack point.
  • Not Synced
    There's no guesswork when you reach
    Lintian, which of its dependencies
  • Not Synced
    should I pick, which version of
    it and all.
  • Not Synced
    That's not there, you have to go further
    down to see that.
  • Not Synced
    Actually, but the time you reach the
    debconf thing,
  • Not Synced
    it has already been solved by
    something else, as I recall.
  • Not Synced
    Oh, no wait. The dependencies of debconf
    is already covered by the time you're
  • Not Synced
    forced to resolve this choice,
    so it's not really a big issue, either.
  • Not Synced
    You just pick the debconf one, c-debconf
    current on debconf so it's trivial that way.
  • Not Synced
    The question is, when do we have
    these conditions.
  • Not Synced
    We do that a lot actually.
  • Not Synced
    If you do builts in unstable for example.
  • Not Synced
    ??, only one suite only one architecture,
    piece of cake.
  • Not Synced
    If you install packages of a pure
    Jesse system, with multiarch,
  • Not Synced
    you have the single suite one,
    which limits a lot of them,
  • Not Synced
    and then you have not the single
    architecture.
  • Not Synced
    That's what it is, but in pure Squeeze,
    you won't have multiarch because
  • Not Synced
    because it didn't work back then.
  • Not Synced
    So that's even simpler.
  • Not Synced
    Also you can do unstable plus experimental,
    or stable plus backports if you fiddle a bit
  • Not Synced
    with the end resolver and say that it is
    not allowed to pick experimental,
  • Not Synced
    unless it is explicitly requested to pick
    experimental,
  • Not Synced
    and from there it only picks
    what it absolutely needs to solve it.
  • Not Synced
    You basically get the same result with a
    single suite restriction,
  • Not Synced
    but that requires you to actually code
    something for it.
  • Not Synced
    And then there's Britney, with the testing
    migration thing, and that's because
  • Not Synced
    she basically takes things, moves it into
    a new version of testing,
  • Not Synced
    and tries to test that this is still
    the same solution.
  • Not Synced
    So she forces it into a single
    suite currently.
  • Not Synced
    So these are common cases where
    it happens.
  • Not Synced
    This is not everywhere where it happens,
  • Not Synced
    so the stable upgrades are still funny
    enough, not a single suite and all that.
  • Not Synced
    It happens because there is only one
    package and architecture.
  • Not Synced
    Only one instance of package, you only
    have one version, one architecture that
  • Not Synced
    solves that particular dependency.
  • Not Synced
    Whereas with multiarch you could have,
    i386 and the amd64 version.
  • Not Synced
    If you do an upgrade you could have the
    old version and the new version
  • Not Synced
    that may or may not satisfy it.
  • Not Synced
    Also we have this thing where we
    don't like libraries to have
  • Not Synced
    alternative implementations, it sort of
    breaks things horribly.
  • Not Synced
    Especially when they are not actually
    agreeing on the interface they provide.
  • Not Synced
    There's the social aspect of it that we
    don't really bother having
  • Not Synced
    200 interchangeable implementations of
    everything.
  • Not Synced
    I think the record is something like
  • Not Synced
    five or six different versions
    of sendmail, implemenatations.
  • Not Synced
    ??
  • Not Synced
    I don't think so, but ?? might have.
  • Not Synced
    And even when you do have one of these
    explosions,
  • Not Synced
    it has to actually hide the breakage
    beneath one of these choice
  • Not Synced
    explosion alternative explosions for
    it to be a problem.
  • Not Synced
    You might have aspell over here on the
    right hand side,
  • Not Synced
    and you have a breakage over here which
    ?? to find out.
  • Not Synced
    So if you use the resolver to just the first
    explosion to solve the other part,
  • Not Synced
    it realizes this is not working,
    I'll bail out.
  • Not Synced
    There's a couple things that we do to
    make this easier.
  • Not Synced
    The mos interesting thing is of course,
    can we still make this simple without
  • Not Synced
    the single suite and single
    architecture restriction.
  • Not Synced
    We can to some extent.
  • Not Synced
    And that's where we move to upgrading,
  • Not Synced
    in deterministic polynomial time.
  • Not Synced
    This is where I spent most of my
    time working.
  • Not Synced
    So to start with, when we do an upgrade
    from stable to stable,
  • Not Synced
    and here I'm taking pure stable, no
    backports and all.
  • Not Synced
    The general recommended way to do it is
    to replace Weezy with Jesse,
  • Not Synced
    hit apt-get update, and
    upgrade afterwards,
  • Not Synced
    and apt replaces all the old packages with
    the new version, if there is a new version.
  • Not Synced
    We also want all the packages from Weezy
    that did not have any version in Jesse
  • Not Synced
    to be removed, and there might be a new
    essential package somewhere.
  • Not Synced
    So long story short, upgrade if it's
    present, remove if it is not present,
  • Not Synced
    and then install the new essential
    packages, if any.
  • Not Synced
    They don't tend to change that often.
  • Not Synced
    I'll be ignoring the installing part
    for now.
  • Not Synced
    It is of course valid and interesting,
    but we'll skip it.
  • Not Synced
    So let's take a simple example, somewhat
    contrived, somewhat made up the numbers.
  • Not Synced
    They are not too far from reality, but
    they are not exact either.
Title:
Debian_dependency_resolution_in_polynomial_time.webm
Video Language:
English
Team:
Debconf
Project:
2015_debconf15

English subtitles

Incomplete

Revisions Compare revisions