Debian GNU/Hurd status update
Let's talk about GNU/Hurd
For us it's mostly about freedom 0,
that is, the ability to use software,
basically, for any purpose.
And for us, the important thing is that
you shouldn't have to ask
the system administrator for things.
You should be allowed to do
whatever you want.
So for instance, why are fdisk, mke2fs,
etc. hidden in /sbin?
I want to be able to build disk images,
play with them, mount them, etc.
So just be able to work with the kind of
disk and network access I have,
and do whatever I want with this.
It's about freedom to innovate as well, if
I want to use an experimental filesystem,
just play with it, without being afraid of
crashing the machine.
You should be able to just
run the file system
and let the system administrator be happy
with this because it's safe to do this.
And also, it's a way to provide freedom
from misbehaving programs
like a driver which doesn't work so well,
some things like this.
Just to give an idea, in GNU/Hurd,
you have the kernel which does basically
almost nothing,
just managing tasks, the memory and
inter-process communications,
and then you have a lot of daemons
doing the actual stuff,
so pfinet is the TCP/IP stack,
and ext2fs does the filesystem thing.
And then, you have the user,
just running programs.
And these tools actually just talk
to the daemons through the microkernel,
the microkernel doesn't do much, it just
passes requests along.
For instance, if a server crashes,
then that's fine.
Say a driver crashes,
or just hangs:
you can just kill it, and then pfinet
will open a new instance of the driver
and it will just work, thanks to TCP just
retransmitting to the other computer.
So it's just an error, it's not
something fatal.
At some point, at my desk, I could
switch off the light,
and then that would crash my laptop.
Because switching off the light would
reset my USB hard disk,
and then the kernel of the laptop
wouldn't like this.
This is not something which is
supposed to happen.
So, with a server approach, this is
completely fixed.
It's also easier to debug, it's really
nice to be able to gdb a TCP/IP stack,
when there is something happening in there,
just run gdb, you can gprof it, etc.
You can also dare to do crazier things.
For instance, the Linux console doesn't
support much, because we don't want to put
too much complex code in there.
On GNU/Hurd the console actually supports
things like Chinese,
double-width support, etc.
which is not supported
by the Linux console,
and that's right, because you don't want
to put too much crazy stuff in the kernel.
Here since it's just a userland program,
then you're fine,
and so we do have Chinese support in,
actually, textmode in the Debian Installer.
Just to show an example, so here I have
ftpfs which uses the TCP/IP stack
to actually mount a remote directory,
and then I can use isofs to mount an ISO
image which is inside that FTP server.
And then I can just let cp copy a file
from the ISO image which is on the server.
So this translates this way: I ran
this command a long time ago,
just to say that "ftp:" in my home
directory is any FTP server,
and then I can take a "~/ftp:/etc." URL
and give that to isofs
and then mount that on my "mnt",
and then I can just browse inside the ISO
image, without having to download
the whole ISO image, without having to ask
root for this kind of thing, etc.
And I can also permanently store this
in ext2fs.
So just to give an example, I have a
translator on my signature file,
which just calls fortune, so when I
"cat .signature" [demo],
I get one signature or another, because
each time I open the file,
it's a new instance of fortune which is
started.
You can see that, indeed, this is stored
in my signature file.
So this is fun!
Another example: as a user, I can start
my own TCP/IP stack,
tell it to use a virtual network
interface,
and then put the TCP/IP service on some
node in my home,
and then I can run openvpn to actually
push and pull packets
from that virtual interface, and build a
VPN with somewhere else.
And then I can remap what is
supposed to be the system TCP/IP stack
to my own socket,
and then I get a new shell for which the
system TCP/IP stack is actually
my own TCP/IP stack.
So I can decide which program actually
uses this TCP/IP stack,
and just do my own VPN without having to
ask the administrator for anything.
But also, for instance it happens quite
often that you have a binary,
maybe not sh, but like, python or perl
or whatever,
you have a program which wants /bin/sh
to be actually bash or whatever,
so I want to change this, so I can remap
this, so for instance [demo]
if I look at sh, so as usual,
oh it's green,
but you can see here that it's dash,
and if I remap /bin/sh into /bin/bash, for
instance, I get a new shell where actually,
sh is not the same, so it's remapped into
/bin/bash,
and so it's actually bash which actually
shows up here.
So I do really choose how I work, what my
environment looks like.
And for instance, I can remap the whole
/bin directory into my own directory,
where I expose /bin, but also
other things,
so that programs which have /bin/something
hardcoded into them,
I can use them without having to ask the
administrator to install stuff inside /bin.
So it's kind of interesting, a bit like
stow, Nix, Guix, but done in a nice way.
How does it work? Well it's actually
relatively simple in principle,
it's simply that libc doesn't talk with
the kernel or whatever,
it always uses RPCs to ask nicely
about opening files etc.,
and so it's really natural in GNU/Hurd
that you can redirect things.
So for instance, the remap translator here
is like, maybe,
200-300 lines
[Transcriber's note: 150 actually],
because it's just a matter of
"you open a file, OK, I look at the file
path, is it something I want to translate?
Yes, I translate that, and then I open
the real file,
and give the new handle to the program",
and that's all, so it's extremely simple.
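[Editorial sketch, not shown in the talk. It only
illustrates the "look at the path, translate it, open the
real file" logic in plain Python. It is not the code of the
Hurd remap translator, and the names remap_path,
open_remapped and /home/me/bin are made up for this
example.]

```python
import os


def remap_path(path, mapping):
    """Return the path that should really be opened for `path`.

    `mapping` maps an absolute path to its replacement. A key matches
    either the exact path or, as a directory prefix, anything below it.
    """
    path = os.path.normpath(path)
    # Try the longest keys first so the most specific rule wins.
    for source in sorted(mapping, key=len, reverse=True):
        prefix = source.rstrip("/")
        if path == source:
            return mapping[source]
        if path.startswith(prefix + "/"):
            return mapping[source].rstrip("/") + path[len(prefix):]
    # Not something we want to translate: leave it alone.
    return path


def open_remapped(path, mapping, flags=os.O_RDONLY):
    """Open the real file behind `path` and hand back its descriptor."""
    return os.open(remap_path(path, mapping), flags)


if __name__ == "__main__":
    rules = {"/bin/sh": "/bin/bash", "/bin": "/home/me/bin"}
    for asked in ("/bin/sh", "/bin/ls", "/etc/passwd"):
        print(asked, "->", remap_path(asked, rules))
```

[With these rules, /bin/sh is sent to /bin/bash, everything
else under /bin is looked up in the hypothetical
/home/me/bin, and other paths pass through unchanged.]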
So everything in GNU/Hurd is an RPC and
so it is interposable,
and then translators get exposed in the
filesystem, we have seen the TCP/IP stack,
it's just a path inside the filesystem.
And then the user can decide whatever
they want to interpose.
So, for instance fakeroot, on Linux,
is quite big,
because it has to interpose libc symbols,
and every time libc invents something new,
then it breaks in fakeroot
because fakeroot has to know about this
new symbol, etc. and interpose them,
either through ptrace or ld or whatever.
In GNU/Hurd, fakeroot is, like,
a thousand lines long,
because it just implements a few basic
things,
and then everything just works: it just
interposes basic authentication hooks,
and libc uses them all the time.
So it's fully virtualizable, and with
a really fine-grained interface,
because you can precisely decide
which RPCs are interposed,
or which files in the filesystem
are interposed.
And then you can just use your home
directory, the TCP/IP stack,
and pile stuff over it, the way you want.
Just to give a crazy example, we have
a lot of stuff,
I actually have an ISO image inside a
partitioned disk image on FTP over a VPN.
And this is not so crazy.
Maybe the ISO image inside the partitioned
disk image is a bit too much,
but one file inside the partitioned disk
image on FTP over VPN is not so crazy,
because maybe you are on a hostile
network, so you have to use a VPN,
and then you want to access a file
you know is inside a disk image,
I don't know, a known disk image which is
provided on a public FTP server,
and you don't want to download the whole
image just to get, I don't know,
the README file or something like this.
So it's not so crazy, and it just
works nicely.
So a bit more Debian stuff.
Porting packages to Hurd is quite easy
in principle,
because it's just a POSIX system, there is
a lot more than just POSIX,
but it provides a POSIX interface.
So portable programs should be
really fine.
Just for fun, some dumb issues, so for
instance some programs think that
if it's Linux or BSD, then they can
include windows.h...
Why not...
If the system has mach.h, that must be
MacOS,
because MacOS is the only system in the
world that uses Mach, I don't know why...
Some people try to grep cpuinfo, which
doesn't exist on GNU/Hurd yet,
and so they end up running a bare "make -j"
with no job limit, which just explodes the system,
I mean even on a Linux system it's just
the same, unless it's a small program,
but with a lot of C++ files it's horrible.
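[Editorial sketch, not from the talk: one way a build
script could pick a job count without reading
/proc/cpuinfo, falling back to a single job when the
count is unknown. The helper names parallel_jobs and
make_command are made up for this example.]

```python
import os


def parallel_jobs(default=1):
    """Number of build jobs to run, without reading /proc/cpuinfo."""
    count = os.cpu_count()  # may be None when the count is unknown
    return count if count and count > 0 else default


def make_command(target="all"):
    """Build a make invocation that always carries an explicit -jN."""
    return ["make", "-j{}".format(parallel_jobs()), target]


if __name__ == "__main__":
    print(" ".join(make_command()))
```

[The point of the design is that "-j" is never passed
without a number, so a failed CPU count degrades to a
slow build rather than an unbounded one.]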
Some people include limits.h from linux/
instead of just the standard one, well...
A problematic thing is people who
hardcode errno values;
the values of errno are not standardized,
so you shouldn't hardcode them, like,
in testsuite results or things like this.
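[Editorial sketch, not from the talk: a test can compare
against the symbolic errno name instead of a number. The
function name open_error_name is made up for this
example.]

```python
import errno
import os


def open_error_name(path):
    """Try to open `path`; return the symbolic errno name on failure."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # errno.errorcode maps this platform's number back to its name.
        return errno.errorcode.get(exc.errno, "UNKNOWN")
    os.close(fd)
    return None


if __name__ == "__main__":
    missing = os.path.join(os.sep, "nonexistent-dir-for-errno-demo", "file")
    # Compare names, never the raw number behind them.
    print(open_error_name(missing) == "ENOENT")
```

[A check written this way does not care which number the
platform assigns to ENOENT.]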
And quite often in configure it's
hardcoded that
only Linux knows -lpthread or -ldl, etc.
so quite often programs are not
generic enough,
and that's just easy to fix, but we have
more and more of these.
So we have a porters' page which goes into
a bit more detail about these.
I wanted to talk a bit more about
PATH_MAX, it is not defined on GNU/Hurd,
for very good reasons, and it is allowed
by POSIX not to define it,
just to say that there is no limitation on
the PATH_MAX value,
we don't have a limit on the size of
the paths.
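[Editorial sketch, not from the talk: instead of assuming
a PATH_MAX constant, a program can ask the system at run
time and cope with "no limit". The helper name path_limit
is made up, and returning None for both "no limit" and
"cannot tell" is a simplification of this example.]

```python
import os


def path_limit(directory="/"):
    """Return the path length limit reported for `directory`,
    or None when no fixed limit is reported."""
    try:
        limit = os.pathconf(directory, "PC_PATH_MAX")
    except (OSError, ValueError):
        # The query is not supported here; treat the limit as unknown.
        return None
    # A negative result means the system defines no limit.
    return limit if limit > 0 else None


if __name__ == "__main__":
    limit = path_limit("/")
    print("no fixed limit" if limit is None else limit)
```

[Code that gets None back has to size its buffers
dynamically, which is the safe thing to do anyway.]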
And indeed it has fragile semantics, it
has never meant
"a reasonable size for an array of
characters to store a path".
On Linux it's 4096, that's a whole page,
that's a whole TLB entry for
just one file name.
It's extremely costly, most people don't
have such long paths,
and so it's really a pity to use so much
memory, because it's always a whole page
because it will always be aligned
on 4k etc.
So, well, that's a waste, for one.
And paths can actually be longer,
there is no strict limitation,
you can mkdir something, cd into that,
mkdir again, cd, etc.,
you can do that as much as you want,
there is no limitation on this,
it's just that when you call
"get current working directory",
you won't get it completely.
And actually, some programs misbehave
in that case,
because they won't see these files,
they will actually be kind of hidden,
or protected, or I don't know,
you cannot remove them just giving
the path, you have to cd, cd, cd, cd,
and then you can access the file.
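[Editorial sketch, not from the talk: the "mkdir, cd,
mkdir, cd" experiment done with directory file
descriptors, followed by an attempt to reach the deepest
directory through its full path. The function name
probe_deep_path and the sizes used are made up for this
example. On a Linux system I would expect the full-path
access to be refused as too long, while the relative
steps all succeed.]

```python
import errno
import os
import tempfile


def probe_deep_path(depth=30, name="d" * 200):
    """Nest `depth` directories using only relative steps, then try to
    reach the deepest one through its full path.

    Returns (full_path_length, error_name), where error_name is None
    when the full path could be used directly.
    """
    base = tempfile.mkdtemp()
    fds = [os.open(base, os.O_RDONLY | os.O_DIRECTORY)]
    try:
        for _ in range(depth):
            # mkdir and open relative to the parent: the "cd, mkdir" dance.
            os.mkdir(name, dir_fd=fds[-1])
            fds.append(os.open(name, os.O_RDONLY | os.O_DIRECTORY,
                               dir_fd=fds[-1]))
        full_path = os.path.join(base, *([name] * depth))
        try:
            os.stat(full_path)
            error_name = None
        except OSError as exc:
            error_name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return len(full_path), error_name
    finally:
        # Remove from the deepest level upwards, again by relative name.
        while len(fds) > 1:
            os.close(fds.pop())
            os.rmdir(name, dir_fd=fds[-1])
        os.close(fds.pop())
        os.rmdir(base)


if __name__ == "__main__":
    length, error_name = probe_deep_path()
    print(length, error_name)
```

[Note that the cleanup also has to walk the tree by
relative names, which is exactly the "cd, cd, cd"
inconvenience described above.]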
And for no reason, actually, because Linux
internally doesn't have
such a limitation.
And also, it's stupid, but POSIX didn't
really say precisely whether
the final \0 actually is included in
PATH_MAX or not,
so people would allocate PATH_MAX+1,
or maybe not.
So we have a lot of code which maybe
doesn't actually work,
but nobody tests it, because
they would never have such long paths.
So I'm a bit afraid of all this code using
PATH_MAX.
You should be afraid as well.
Just to give an overview of the state.
We have i386 support, we have 64-bit
support which has been started,
we have the kernel booting,
and now it's mostly translating between
32 and 64 in our RPCs.
We have drivers for network boards as a
userland translator, using the DDE layer.
We have disk, we have a Xen port.
We have preliminary sound support which was
announced today, using Rump,
the Rump kernel.
We don't have USB yet.
It is quite stable, I haven't reinstalled
my boxes for, like, a decade,
I don't remember when I installed them,
actually.
And then the buildd machines just keep
building packages for weeks
without a problem.
We have 81% of the archive.
We have the native Debian Installer which
is really working great.
As for recent work, an interesting thing is
a distributed mtab translator
to provide /proc/mounts in a hurdish way.
We have quite a few optimizations which
went in to improve the performance.
We had releases quite some time ago,
I really recommend having a look at this
one, it's fun.
We have some Wheezy and Jessie snapshots,
they are not official, but for us it's
really an official thing.
An important thing I wanted to discuss
this week is the removal from ftp-master.
This has been due for quite a few years now,
honestly,
it's really not useful to mirror the hurd
packages over the whole world,
because there are not even as many users
as the number of mirrors.
So OK, that's fine for just the removal
from the main archive in terms of mirroring.
But then we have a lot of consequences.
For instance, buildd.debian.org is really
an important thing,
because that is where the release team
schedules transitions,
and losing this, for us, would mean really
tedious work,
because I've been there, doing, actually,
the transition work,
the same work as the release team, and
it's really painful to do this again.
So we would really like to have a solution
for this.
Maybe get that fed from debian-ports, and
then that's fine, we can be on
debian-ports, as long as at least
there is some synchronization between
the two.
And also, getting exposed on the buildd
package status page,
so that people are aware that there is
some port which is failing,
and maybe they are keen on spending some
time on it, maybe not,
but at least let them know about it.
And also, a corner thing, when we have
a version upgrade, like gcc or perl,
the release team asks
"OK, we'll have to upgrade the buildds",
and at the moment they don't even have
an account on them,
so they cannot check whether the version
is good or not.
Maybe we should just provide an account,
we'd thus need to know who we need to give
an account to.
Basically, my idea would be
"OK, that's fine not being on ftp-master".
The thing is we still want to have most of
the support of Debian,
to make our life less of a burden,
as much as possible,
without any extra load on
the release team, etc.
We do understand well that we don't want
to put work on people's hands.
But we would still like to get some benefit, and
probably there are solutions for this.
And conversely, all of this, I mean, not
putting more work on us Hurd porters,
would actually be the same solutions that
existing ports on debian-ports
would be really happy to have, to improve
their life, to have less work to do, [...]
So maybe we want to think about a real
status for Second Class Citizens,
like Hurd, but also the sparc, hppa, etc.
Maybe we want to have a BoF at some point,
so we can gather and discuss this.
Future work, the most interesting thing is
probably using the Rump drivers,
because at the moment we use DDE but
it's not really going forward.
We thought it would be a way to get newer
drivers, Linux drivers,
without extra effort, but it doesn't
actually happen at the moment,
while Rump does go forward, we see work
being done with Xen etc.
So this is probably a long-term solution.
Maybe we'll have another distribution
through Guix.
This is progressing, we are quite far from
doing this,
so for now Debian is really the only Hurd
distribution that we have, so we'll see.
And of course, just come and have fun with
your own pet project, just join, thanks!
[Michael Banck] Any quick questions before
we run to lunch?
[Steve Chamberlain] Hello, I just wondered
if you're using Hurd on that laptop
for the presentation?
[Samuel Thibault] Yeah, yeah,
this is running Hurd, yes.
[SC] So it's quite, like, usable everyday?
[ST] Well, not everyday because
without USB,
you cannot mount a USB stick for instance,
so that's quite inconvenient,
but yeah, I could probably use it everyday.
I don't, I mean, for work,
I cannot afford this, but yeah.
Also, we don't have wireless drivers
at the moment.
We hope that with the Rump drivers
we would get this.
So, yes.
Some people do use it everyday.
Not me.
[SC] But those would be the major thing
missing for more people
to be able to use it.
OK, thanks.
[MB] Any more questions? We've run out of
questions, then thanks again. Thanks.