(Offscreen) So welcome to the talk from Niels
about dependency resolution in
polynomial time.
(Niels) I'm Niels, and I'm doing my second
talk today.
This time about something slightly
more technical.
I wanted to do dependency resolution
in deterministic polynomial time,
which is what this is all about.
I didn't get as far with this as I wanted.
I wanted to have a tool ready and have
some results, which I didn't quite finish.
So these will be my findings and
work-in-progress notes.
We have six topics I have to cover.
A quick introduction.
Then something about the hard problem.
Then the part where it actually
turns out to be highly tractable
in practice, if you don't think too much
about it.
And then there is some installing,
upgrading, and installing again
for how to improve your solvers.
Starting with the introduction.
I'm hoping mostly to debunk the myth
that dependency resolution is a very
hard problem that we cannot solve,
and that we should remove packages in the
hope that this will somehow keep the archive
from exploding, the dependency resolvers
from breaking, apt from dying,
and god knows what.
I've been working on Britney, which is a
different problem to solve, but it
involves the same techniques, or uses some
of the same techniques.
Some will be directly applicable to apt,
some will not.
There's not a one size fits all solution, sorry.
And even if there were one,
I haven't written it yet.
So defining the problem.
When you try to install a package on your
system,
there are actually two problems
being solved.
One is the part where apt figures out that
if I have to install Eclipse,
I'll be needing a lot of Java packages
and a lot of other stuff.
And so it figures out which packages are
needed for this to make sense.
The separate part of the problem,
the second problem, is that
once you have this list, the packages have
to be unpacked and configured in a certain
order for this all to make sense.
They are in theory two distinct problems.
I'll be talking about the first problem,
because that's actually
dependency resolution.
The other thing is just ordering.
The ordering thing is certainly also a very
important part.
I don't want to dismiss that; it's just
that it is in fact a polynomial-time
problem in theory.
So to solve the ordering problem, you
basically take the set of actions that need
to be done.
Then there are some constraints between
some of them,
like you unpack a thing before you
configure it.
Maybe you deconfigure something else
before that.
So you end up with a list of partial
ordering constraints.
From that you build a graph with ordering.
It's fairly simple to do, in practice,
if you have the right tools for it.
Then without cycles, this is a trivial
thing where it just orders everything
and tells you: this is the order,
go do it.
When there are cycles, you will get a single
action consisting of multiple things
to do at the same time, which is really
impossible to do.
It turns out that the way we defined this
whole thing,
if you have a package that does not have a
post install script,
it does not need to be configured separately.
That means that it's just unpacked or
not unpacked.
That tends to break most cycles.
So if you want fewer problems with
ordering constraints,
help us find a way to get rid of most
of the post install scripts,
such as the ones just running ldconfig
and nothing else.
That could solve some of the cycles,
and otherwise it's just feeding it to dpkg
and it works.
If you have a cycle, it's a separate
problem fixing that.
I won't be covering that.
As I said, the runtime is polynomial.
It's something on the order of the size
of the package graph.
Which is fairly polynomial.
This even covers finding cycles.
Again breaking cycles is just removing
post install scripts.
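As an aside, a rough sketch of that ordering step might look like the following. This is purely illustrative, not dpkg or apt code, and the package names are made up: actions are nodes, the "must happen before" constraints are edges, and a topological sort gives the order. A cycle shows up as an error and would have to be collapsed into one combined action, or broken, for example by a package having no post install script and thus no separate configure step.

```python
from graphlib import TopologicalSorter, CycleError

# constraints: action -> set of actions that must happen before it
# (hypothetical package names, purely for illustration)
constraints = {
    "configure libfoo1":   {"unpack libfoo1"},
    "unpack foo-tools":    {"unpack libfoo1"},
    "configure foo-tools": {"unpack foo-tools", "configure libfoo1"},
}

try:
    # linear in the size of the graph; this is the polynomial part
    order = list(TopologicalSorter(constraints).static_order())
    print(order)
except CycleError as err:
    # the members of the cycle would become one combined "do together" action
    print("cycle detected:", err.args[1])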
And of course if you think this is too
simple,
you can always implement more features
on top of it,
such as trying to optimize for minimal
downtime of services and all that.
You can make any trivial problem very hard
very quickly that way.
So, the hard problem.
First of all, the players.
We've got apt, aptitude, cupt, whatever.
We've got Britney.
We've got DOSE, edos, whatever its
name is today.
They solve the hard problem.
Or at least they should be trying to.
Notable tools not affected by it: dpkg.
So dpkg surely figures out if you are
missing dependencies. It says,
"I cannot resolve this, and I don't know
where to fetch it, so I'm just giving up."
And that is the only thing you can do.
Which means it only verifies the solution,
which is known to be polynomial,
and it is therefore not a player in this game.
DAK rm is also doing a polynomial time
check, it just happens to be so
for other reasons.
That's basically the known players.
There are probably others.
But we're moving on to the
hard problem itself.
So the problem we actually have is,
we've got version dependencies,
we've got alternative choices,
we've got virtual packages.
All three make this problem hard.
You basically have to remove all possible
alternative choices, guesses, whatever,
to make this simple.
It just so happens that if you remove
version dependencies, among other things,
things become really fun.
It might make the problem simpler to solve,
but upgrading gets really, really spicy.
So how do we ensure that the new dpkg
is configured before your new package,
which uses a feature of that new dpkg,
is installed?
That's sort of impossible.
The problem becomes simple, but we end up
with broken dependencies,
so it's not really an option.
Technically there's also a way to make
it simple by having multiple versions
of things, and removing negative
dependencies,
but that's not necessarily easy to do either.
Presumably file conflicts will get in our
way or something.
We would have to redefine where we place all
of the files, and that's not going to be
something we're doing anytime soon.
So yeah, it's not really an option.
To give you a better understanding of the
problem, please consider this example.
If we have coreutils at some version,
depending on either this version
of libc or a newer one.
From this, can we immediately conclude
that if libc 2.19 is known to be
installable,
coreutils will also be installable?
How many people think we can immediately
conclude something from this?
How many people think we cannot
conclude anything from this?
Well, that's good.
So it turns out that with negative
dependencies and without negative
dependencies, it doesn't really matter.
With negative dependencies, libc or
whatever it depends on could
just conflict with coreutils, we're
broken, it's out.
And without negative dependencies, you
could have one of those cases where
something depends on a version that's newer
or older, and then no.
It's not really trivial to do.
You can't conclude something locally,
in theory.
That's something that can
become a problem.
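To make that concrete, here is a tiny, hypothetical sketch of why installability is not a local property. The package names, versions, and the "blocker" package are made up, and the version comparison is simplified string comparison rather than dpkg's; the point is only that a third package with a Breaks can ruin things even when the dependency itself is satisfiable.

```python
# toy universe: name -> (version, depends, breaks); everything here is invented
UNIVERSE = {
    "libc6":     ("2.19", [], []),
    "coreutils": ("8.23", [("libc6", "2.19")], []),    # Depends: libc6 (>= 2.19)
    "blocker":   ("1.0",  [], ["coreutils"]),          # Breaks: coreutils
}

def dep_satisfied(dep, selection):
    name, min_version = dep
    # simplified: plain string comparison, not dpkg's version ordering
    return name in selection and UNIVERSE[name][0] >= min_version

def consistent(selection):
    """True if every Depends is met and no Breaks fires within the selection."""
    for pkg in selection:
        _, depends, breaks = UNIVERSE[pkg]
        if not all(dep_satisfied(d, selection) for d in depends):
            return False
        if any(victim in selection for victim in breaks):
            return False
    return True

print(consistent({"libc6"}))                          # True: libc6 alone is fine
print(consistent({"libc6", "coreutils"}))             # True: the dependency holds
print(consistent({"libc6", "coreutils", "blocker"}))  # False: the Breaks kicks in
```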
Anyway, it is highly tractable
in practice.
Because if you do have a Breaks, a Conflicts,
or some other negative dependency,
it tends to have an upper bound.
So if the previous version of coreutils
was installable with that version,
the new version will also probably be.
Likewise, most dependencies, circular or
otherwise, tend to be upper bound.
There are cases of version ranges, which
are a lower bound and an upper bound
at the same time.
They exist, they are just not so
common yet.
And that's a good thing.
The number of possible solutions to
any clause
actually tends to be fairly limited.
Of course there are the exceptions.
Packages from unstable.
They might just be missing, in which case
it's really hard to solve the dependency.
You've got mutually exclusive packages,
all the sendmail stuff, the MTAs.
The version ranges I mentioned before.
And then of course the strictly equal
versions, which we see inside
packages built from the same
source packages.
The redeeming feature here: same source.
Because it is actually very simple
to figure out
whether they are from the same source.
So they tend to be upgraded together
anyhow.
The new versions tend to be available
at the same time, sort of thing.
The problem made hard, in a nutshell, is
this contrived example.
You have a package you want to try,
and it depends on any number of foos.
Each of the foos can either be solved by
picking up the bar package
or the good one.
Which might be a hint to which one we
would choose if we want to solve it.
And the bar package depends on a
bad package that does not work
with the starting package,
and we are broken.
And the good package just doesn't have any
dependencies and is therefore good.
In the perfect case, you try foo1,
pick good, try foo2, pick good, and so on,
and are done.
In practice, if you don't have some way
of ordering these so you know
one of them is better than the other,
you now have an n times m trace,
where you have to guess up and down
which one you are doing, backtrack back and
forth and all that, and it's going to be horrible.
And this contrived example can of course
be solved by some solvers that can see
the pattern, but you can always make a
more contrived pattern
that's harder to see through.
So I tried to keep it somewhat simple.
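For illustration, a deliberately naive backtracking resolver on that contrived example could behave like this. This is not how apt or Britney works; it is a toy, with made-up package names, that guesses "bar" first and only checks for the conflict once every foo clause has been decided, which is exactly the kind of behaviour that makes the guessing explode.

```python
# "start" depends on foo1..fooN; each foo is satisfied by "bar" or "good";
# "bar" drags in "bad", which conflicts with "start".  All names are invented.
N = 10
PROVIDERS = ["bar", "good"]   # the unlucky guess order: "bar" is tried first
attempts = 0                  # complete assignments checked

def closure(choice):
    # bar pulls in bad; good has no dependencies
    return {choice, "bad"} if choice == "bar" else {choice}

def solve(clause_index, selection):
    global attempts
    if clause_index == N:                 # all foo clauses have been decided
        attempts += 1
        # the conflict is only noticed here, at the very end
        return None if "bad" in selection else selection
    for choice in PROVIDERS:
        result = solve(clause_index + 1, selection | closure(choice))
        if result is not None:
            return result
    return None

solution = solve(0, {"start"})
print(sorted(solution), "found after", attempts, "complete guesses")  # 2**N guesses
```

Flipping the guess order so that good is tried first, or propagating the conflict as soon as bad enters the selection, makes the same search finish almost immediately, which is the point about ordering the alternatives.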
The good thing is, very few people write
software like this.
And that is the most redeeming feature
of the package archive.
We do have exceptions.
Aspell-dictionaries, ispell-dictionaries.
Basically if you depend on
ispell-dictionary,
which is a virtual package provided by 50
different ispell packages,
or aspell packages, and then they depend
on something else after that.
So in theory they can be a part of
creating this problem.
Fortunately, they themselves are fairly
simple once you get past them.
You also have multiarch foreign, or
multiarch allowed with
package interdependencies.
So if you have multiple architectures,
you in theory can have any number
of things satisfying the same clause.
This gives extra, unnecessary work to most
resolvers,
if you enable them to do
multiarch and cross things.
We have 12 of them, but I suspect most
users of multiarch are somewhat limited
to maybe 3.
Possible exception being people like
writing resolvers and trying them out,
torture test them.
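As a rough illustration of that blow-up, consider how a single alternatives clause expands once several foreign architectures are enabled. The names below are hypothetical and this is not apt's actual expansion logic; it just shows that the candidate count is alternatives times architectures.

```python
# hypothetical setup: the architectures enabled for multiarch on this system
ENABLED_ARCHES = ["amd64", "i386", "armhf"]

def candidates(clause, multiarch_foreign=True):
    """Expand an alternatives clause like 'aspell-en | aspell-de' into
    concrete package:architecture candidates a resolver would consider."""
    alternatives = [alt.strip() for alt in clause.split("|")]
    arches = ENABLED_ARCHES if multiarch_foreign else ENABLED_ARCHES[:1]
    return [f"{alt}:{arch}" for alt in alternatives for arch in arches]

print(candidates("aspell-en | aspell-de"))
# 2 alternatives x 3 architectures = 6 candidates for a single clause
```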
The real problem: if you had to do this in
the archive, you would need to do
something like, for example, write
n different awk implementations.
We have three.
And then afterwards, for example,
have m different distinct,
but equally valid libc implementations.
It's not something we're likely to do in
the near future.
And you would have to do this on multiple
levels,
because otherwise you can just pre-solve the
essential set most of the time.