[Tollef]: Hello, I'm Tollef. I'm part of the DSA team. With me today we have zobel, who's also a DSA member, and we're here to talk a little bit about what DSA does. Obviously, if you have questions or anything, we'd be happy to try to answer those.
[zobel]: Try to keep this as some sort of round table, because we want a discussion; this is not going to be a talk, so whenever you have questions, just ask.
[Tollef]: DSA currently consists of 7 people. Most of us are in Europe, but we also have Luca, who is in Canada. Apart from that, paravoid is on "holiday", and ?? and various other people.
[zobel]: The duties we have as Debian System Administrators are basically to build and maintain the infrastructure you are all using for running our distribution. It's the general sysadmin stuff: installing security updates, keeping machines up to date, keeping the hardware running, creating accounts for you, running DNS and mail.
[Tollef]: One thing we actually don't do is... we provide the base, we provide the OS and support for that; we don't run the services. So for lists.debian.org, we're not the people you want to talk to if there is some problem with spam handling. Then you want to talk to this guy, who is also part of the listmaster team. Similarly for bugs.debian.org and the web pages: we make sure that apache is running, but if you find typos on the web pages, don't blame us.
[zobel]: I don't know how many machines we run these days. I think it's around 150/160 machines in total, including the VMs.
[Tollef]: No, if you count VMs, it's more like 250.
[zobel]: OK. We run those machines currently at about 30 locations worldwide. Also part of our duty is to deal with hosters and the local admins. If they have firewalls running in front of our machines, we try to convince them to disable the firewall parts for our machines, so we get to manage that stuff ourselves.
[Tollef]: This is often ??; we have some locations where the machines are ??
connected, for instance, and this breaks secure NTP. There are various places where we have to make accommodations, because it's hard to get the hardware to be somewhere else; maybe it's dev boards for an architecture which is being bootstrapped. In some cases we kind of have to endure a little bit of pain for that, but most hosters and most local admins are really nice people, really easy to deal with and very, very accommodating. I mean, we don't pay for any of this. It's all sponsored and given to us, free of charge. We're quite lucky.
[zobel]: It differs from location to location. We currently have locations where we have a full rack which we can populate with hardware; there are other locations where we just have 1 or 2 machines sitting there and doing the jobs for us. Keep in mind that none of us, all 7 persons, are paid to do the sysadmin job. We are all doing that in our volunteer time. So if you speak up on IRC, sometimes you will not get a reaction within 5 minutes, but I think that's mostly clear to all of you.
[Tollef]: Because we have so many machines, we like automation. We run puppet everywhere. It was chosen some time ago, and it generally does the right thing and generally works okay. This often makes for some interesting problems when bootstrapping, because apparently ruby is really awesome to bootstrap. Right, Steve? [laughs] Especially on arm. We also like git: we have the entire puppet repository in git, our domains are in git, our wiki is in git.
[zobel]: Everything.
[Tollef]: Yeah, basically everything can be put into git. You probably don't want to do it with a database, but anything else, put it into git.
[zobel]: We have some sort of account management tool, which we are currently rewriting, called userdir-LDAP or ud-ldap. Luca has done quite a lot of work on the rewrite. I think it's already handling the generating stuff, just rolled out to the debian.org machines; all the other parts of ud-ldap are still using the old codebase, which is ugly to read: it reads like Perl and bash, written in Python. If you have spare time and knowledge of Python, help us finish the rewrite. Jump in.
[Tollef]: The new ud is a Django project, so it's fairly nice and well written. What ud-ldap actually does is... there is a local LDAP server which runs on the machine called ??, which is db.debian.org, and from there it generates static files which are synced out to all machines. So even though we're using LDAP for account information, we don't have a single point of failure. If that machine goes down, it means you can't update your password or your SSH keys, but you can still log in at various places.
[zobel]: It also works around network issues between machines. If SSH between ?? and the porting machine is broken, or whatever, you can still log in to machines. We monitor our machines using Munin and, nowadays, Icinga. We had some performance issues with Munin, with the wheezy version I think it was, and there was other stuff, but Munin works quite well for us in general. If there are web pages, like Icinga or Munin, asking for a password, then the user is just dsa-guest with either no password or just a random password. This is just to protect our services against script kiddies, or whoever wants to see what effect their script is having on the Debian services, without them seeing the results directly; everyone who knows how the Debian systems work can get access there.
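The static-file approach Tollef describes, account data held in LDAP on one host but rendered to plain files that are synced to every machine so logins survive that host going down, can be sketched roughly like this. The record layout and field names are invented for illustration; they are not ud-ldap's actual schema:

```python
# Hypothetical sketch of a ud-ldap-style generator: render passwd- and
# authorized_keys-style files from account records. The dict format is
# invented for illustration; real ud-ldap reads an LDAP directory.

def render_passwd(accounts):
    """Render /etc/passwd-style lines, sorted by numeric uid."""
    lines = []
    for a in sorted(accounts, key=lambda a: a["uid_number"]):
        lines.append("{name}:x:{uid_number}:{gid_number}:{gecos}:{home}:{shell}".format(**a))
    return "\n".join(lines) + "\n"

def render_ssh_keys(accounts):
    """Map each login name to its authorized_keys file content."""
    return {a["name"]: "\n".join(a["ssh_keys"]) + "\n"
            for a in accounts if a["ssh_keys"]}

accounts = [
    {"name": "alice", "uid_number": 2001, "gid_number": 2001,
     "gecos": "Alice", "home": "/home/alice", "shell": "/bin/bash",
     "ssh_keys": ["ssh-ed25519 AAAA... alice@laptop"]},
    {"name": "bob", "uid_number": 2000, "gid_number": 2000,
     "gecos": "Bob", "home": "/home/bob", "shell": "/bin/zsh",
     "ssh_keys": []},
]

passwd = render_passwd(accounts)   # one file, synced to all hosts
keys = render_ssh_keys(accounts)   # per-user authorized_keys content
```

The point of the design is the last step: the rendered files are pushed to every machine, so authentication keeps working even when the LDAP host is unreachable.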
[Tollef]: It's also so that we don't accidentally end up with spiders walking around, because the Munin web interface generates the graphs on the fly using rrdtool, and that can consume great amounts of CPU power; web spiders are really good at wasting CPU power for us, so we want to keep them off those pages.
[zobel]: To track the issues we currently have, hardware failures, accounts we need to create and so on, we use Request Tracker on rt.debian.org, which some other teams use as well. You can even mail it, or use the web interface. Debian Developers only, I think, for viewing the web interface?
[Tollef]: For most people it's read-only. You can interface with Request Tracker through mail, of course. If you need to send something there, send it to rt@rt.debian.org and make sure to include "Debian RT" in the subject, or else we'll just throw it away because then it's spam. It's a really efficient spam filter. Slightly annoying when you submit your first ticket.
[zobel]: The last talk we gave about the DSA team was, I think, 2 years ago. We tried to summarise what we've done in the last 2 years. When was that meeting in Oslo, 3 years ago?
[Tollef]: 3 years ago, yes.
[zobel]: 3 years ago we decided that we want at least the infrastructure hardware (not the porting hardware) on machines that are under warranty, so we can open a ticket at HP, IBM or whatever and ask them to send replacement parts when hardware breaks. We use server-grade hardware; currently most of the machines are HP machines: DL380s, DL360s, DL580s.
[Tollef]: They work quite well. I think we're mostly done with that transition. It turns out that having actual servers, rather than something someone put under a desk and forgot about, actually makes for less pain and more uptime.
[zobel]: We try to consolidate the number of data centres we have core services running in. Currently there are about 3 to 5 data centres we have quite a lot of services running in.
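The subject-keyword rule Tollef describes can be sketched in a couple of lines; the exact matching the real rt.debian.org setup performs is an assumption here, so treat this purely as an illustration of the idea:

```python
# Hypothetical sketch of the rt.debian.org intake rule described above:
# a mail only becomes a ticket if its subject contains "Debian RT"
# (assumed case-insensitive here); everything else is dropped as spam.

def accept_ticket(subject):
    """Return True if the mail should be accepted as an RT ticket."""
    return "debian rt" in subject.lower()
```

As noted in the talk, this is a crude but effective spam filter: spammers don't know the magic phrase, while humans only trip over it on their first ticket.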
That's Man-da; Bytemark; GRNET, still a little bit; also OSUOSL and UBC.
[Tollef]: UBC-ECE, yes. We also have some other places with fewer machines, but since it's often painful to have a single machine in a location, we try to avoid that. It's kind of a tradeoff: you want to have enough locations that you have resilience, but you don't want so many that you basically have 2 machines everywhere, and each time there is a problem you have to deal with somebody you haven't spoken to in 2 years, because that was when the last problem occurred.
[zobel]: For the core services we are currently using ganeti for virtualization, which is some sort of KVM-based virtualization framework.
[Tollef]: It's a cluster manager which came out of Google, and which works really well. Its target is clusters from 1 to 50 machines; free software, of course.
[zobel]: It works very well for us. What I have been trying to work on in the last few months is the single sign-on framework for web applications, thankfully together with Enrico, who helped quite a lot with that. We rewrote the ugly Perl code I wrote into a Python Django application. We hope to be able to provide single sign-on also for non-debian.org web services, which, with the current software we use for debian.org, didn't work out for security reasons. So let's see where we stand in 2 years with SSO for web stuff.
[Tollef]: We had a problem earlier this year: the backup server we had would die, and then die, and then die, with various problems. It claimed to have hard drive errors, but it looked more like controller errors, and so on. Obviously, running without backups isn't a terribly good idea. We bootstrapped another backup server, but it was running at the Bytemark data centre, and because we have many other services hosted there, that's not a very good situation: if something happens at that data centre and it burns down, suddenly we've lost both the backups and the services being backed up.
So a month ago, we got a new machine. It's hosted at DGI?
[zobel]: DGI, in Düsseldorf.
[Tollef]: And it's happily chugging along making backups. We're currently using Bacula for backups, and it's working okay; we're having some interesting problems with the scheduling of backups, so we're probably going to need to do something to fix things there.
[zobel]: Luca is doing the ud-ldap rewrite, as already mentioned earlier; we need helping hands there. I think Paul and Peter are working on the snapshot infrastructure, especially the QA integration for snapshot.
[Tollef]: We had a donation from Leaseweb earlier this year. Similar to the backups: it turns out that when servers grow big enough, you get lots of disks dying, and Linux isn't terribly good at handling it when enough of your disks die. We had one machine that died, with a controller failure, again! We tried to revive it; it wasn't really successful. So we ended up getting this donation from Leaseweb, and we now have a small cluster of machines in their data centre.
[zobel]: snapshot is currently about 23 terabytes...
[Tollef]: Something like that.
[zobel]: ...30 terabytes in size, which is currently the biggest "archive" we maintain. We have been trying to roll out SSL everywhere.
[Tollef]: It's been something we wanted to do for a while: to enable HTTPS and so on everywhere, even on public and open resources. It wasn't really triggered by the Snowden things, but it was along the same lines. It was like, we should probably actually move forward with this, because it turns out there are entirely too many people who are TCP-dumping too much. [single applause] We're pushing for more SSL everywhere. There was a little bit of controversy around this when we did it to people.debian.org, because it turns out that ??? had some problems with verifying the certificate and so on.
It's not a completely uncontroversial and smooth move, but sometimes you need to make a little bit of sacrifice to actually get the security we want. Related to that, we also pushed some bits towards using CDNs, which are also interesting in the context of SSL, because you have to give your cert to somebody else; there is a tradeoff there. You kind of have to trust your provider.
[zobel]: What we are also doing, thanks to the fact that we got a huge donation from Bytemark, I think one and a half years ago? It was a full blade centre and 6 MSA shelves.
[Tollef]: 3 chassis plus 3 ??.
[zobel]: Currently we still have some spare CPU cycles left at Bytemark. We are setting up OpenStack at the Bytemark data centre on one or two blades. In the end, the idea is that Debian Developers can start VMs there themselves, similar to the VMs we are using for our infrastructure, so we can more easily migrate debian.net services to debian.org services, giving you some sort of common infrastructure like the one we use in Debian. So you can help us migrate services, or we can help you migrate from your hardware to the Debian infrastructure.
[Tollef]: Part of the reason for that is: it turns out that running various half-official services on people's home machines and co-lo machines isn't a terribly good idea. Often they'll run for years, and then somebody will get bored, or they'll quit Debian, or they'll go broke, the machine will burn down... something will happen, the services disappear and people get upset. So if a service is half-official, we would rather move it onto debian.org hardware. If you have a service which is kind of a half-official thing and you want to make it more official, and actually have somebody do the base OS maintenance for you so you don't have to worry about that, then please come and talk to us. We're quite happy to provide you with reasonable VMs.
[zobel]: How to contact us.
There are several mailing lists. There is the debian-admin@lists.debian.org list, where we discussed that this mailing list will more or less be open to every Debian Developer; debian-devel people can subscribe to that mailing list as well. There's the dsa@debian.org email address, which we changed to because there was a debian-admin@debian.org, and there was quite a lot of confusion between the debian-admin@lists and the debian-admin@debian.org addresses. So we decided to move to a new email alias, which is dsa@debian.org. You can also hang around on IRC, as mentioned earlier, in the #debian-admin channel. Feel free to join there; if you have any issues, just raise them and talk to us.
[Tollef]: Like any people in any team in Debian, we obviously have more things to do than we actually have time for, so help is very much appreciated. Getting help with sysadmin tasks is kind of an interesting challenge, because you can't just give out root on all debian.org machines to somebody who shows up and goes: "I would like to rewrite your authentication infrastructure." However, since we keep the puppet repository and so on in git, it's at least possible for people to get in and contribute. Send us patches, show up, discuss things. If you think something can be improved, that's quite likely, and we would be happy to discuss how to do it. Documentation is always welcome; there is a bit of documentation for things like debile.debian.org and so on, but more is always welcome. Also, just hanging out on IRC and answering people's questions is often surprisingly useful.
[zobel]: We also really want to grow the team beyond the seven-person team we currently are. A few months ago we spoke to a Debian Developer (is he here in the room? Might be!) who said he currently does not want to become a member of the DSA team because he has too many other things, other duties, in Debian.
Just talk to us and help us, and at some point we'll probably get annoyed with doing so many tasks for you, and we'll just give out the root access then.
[Tollef]: That's how it usually works in Debian: at some point you've contributed enough that it's more annoying to merge and review your patches than to just give you access. So that happens.
[zobel]: I think that's all for the slides, so just ask...
[Tollef]: Questions!
[question1]: I guess this is more DSA than the listmaster pieces: are they in puppet as well?
[zobel]: No, the list stuff is not in puppet. The exim config we are using on debian.org machines is in puppet, but lists uses postfix. Alex Wirt is also sitting here in the lecture room; he could easily answer your questions about lists.debian.org. More questions! No more questions? [laughter]
[question2]: As one of the local admins for a bunch of buildds, I know that every now and again we get asked to open up more ports, because we're one of those evil places with a firewall, even for the DMZ. Do you actually have a central list of all the things that you want to be able to get access to, you know? That kind of thing would be awesome, so that I could just point, say, the ARM network sysadmins at it, instead of every now and again having to say: "Oh, and we need this extra thing" and then going backwards and forwards, because their immediate response is: "Well, why?" If we can give them a list, and just give them a notification when there are a few new things we'd like, it might go easier.
[Tollef]: I don't think we have a list as such. What we do have is our firewall config, which is ??. So we have a list of things we want to be able to accept on the various servers; even though we don't have a list as in "go to this web page and here you have these ports and their justification", we can generate that. So yeah, that's a good idea; we should do something like that.
[question3]: Can you explain more about your backup system? I think you covered it very briefly.
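The "generate a list for local admins" idea could look roughly like this: walk a machine-readable rule set and emit hostnames, ports and justifications. The data and format below are entirely invented for illustration; DSA's real firewall configuration is not shown in the talk:

```python
# Hypothetical sketch: turn per-host firewall rules (invented data, not
# DSA's real puppet/ferm config) into a readable summary that can be
# handed to hosting sites' network admins as justification.

RULES = {
    "buildd.example.org": [
        (22, "tcp", "ssh admin access"),
        (4949, "tcp", "munin monitoring"),
    ],
    "mirror.example.org": [
        (22, "tcp", "ssh admin access"),
        (873, "tcp", "rsync mirror pulls"),
    ],
}

def port_summary(rules):
    """One line per host/port with the reason the port must be open."""
    lines = []
    for host in sorted(rules):
        for port, proto, reason in sorted(rules[host]):
            lines.append(f"{host} {proto}/{port} - {reason}")
    return "\n".join(lines)

print(port_summary(RULES))
```

Since the rules already live in configuration management, a report like this can be regenerated on every change rather than maintained by hand.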
[Tollef]: We have bacula. It's a centralised backup system; it's kind of a mix of push and pull, in that you have a central director which tells the machines that are to be backed up that you are now going to be backing your things up to this storage daemon over here. Then it also tells the storage daemon to please expect a connection from this machine. We run the director, which is this central component, on adm at Bytemark. The actual storage is at DGI, and obviously the various machines being backed up are everywhere. One of the painful things about bacula is that, even though we are backing up to hard drives, it still thinks we are actually backing up to tape drives. The nicest thing about hard drives is that you generally don't have seek time in the same way you have seek time on tapes, so you don't care about rewinding tapes; switching to a different tape is called "opening another file" and it doesn't take very long. We also have the problem that bacula doesn't have the concept of... if you look at a backup system like backuppc, it never does full backups; it will only do incremental backups and then has a hardlink farm. bacula will do a full backup, then incrementals, then a full backup, then incrementals. This makes less sense when you have hard drives than when you have tapes. Also, the scheduler isn't very smart: if it can't back up a machine for some reason, then instead of rescheduling that backup it will, depending on how you configure it, just skip it. Some of our hosts actually don't have that good connections, so when you're trying to do a full backup, which can take 24 hours, you really don't want that TCP stream to be disconnected, because then you've lost that full backup. And it also ends up batching the full backups, so they're very clustered rather than being nicely spread out.
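As a rough illustration of the clustering problem just described, one simple way to spread full backups is to derive each host's full-backup day from a hash of its name, so assignments are deterministic but evenly distributed across the cycle. This is only a sketch of the idea, not how bacula or DSA's setup actually schedules anything:

```python
# Hypothetical sketch: assign each host a stable full-backup slot by
# hashing its name over a cycle of days, instead of letting all full
# backups land on the same day. Purely illustrative.

import hashlib

def full_backup_day(host, cycle_days=7):
    """Deterministic slot in [0, cycle_days) derived from the hostname."""
    digest = hashlib.sha256(host.encode()).digest()
    return int.from_bytes(digest[:4], "big") % cycle_days

hosts = ["adm", "mirror", "buildd1", "buildd2", "snapshot"]
schedule = {h: full_backup_day(h) for h in hosts}
```

Because the slot depends only on the hostname, adding or removing one host never reshuffles the others, which is the property you want when some full backups take 24 hours over slow links.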
One of the things we're looking at is writing a different scheduler for bacula, to basically just tell it "please do a full backup of this host, now", rather than relying on the built-in scheduler.
[zobel]: (inaudible)
[question3]: I'm the maintainer of a package called 'bup'. It's not a full-fledged backup system with a scheduler, et cetera, but for its backend it uses git packfiles rather than tapes. If you're interested in git, it may be some interesting technology to take a look at.
[Tollef]: Last time I looked at bup, it didn't actually support expiring backups, which makes for some pain.
[question3]: There are some workarounds, but it's one of the current limitations.
[Tollef]: For us, that would mean we would run into... I'm sure ?? or ?? would be very happy, but I'm not sure that our treasurer would be as happy. We need the ability to expire backups, just because we don't have infinitely sized hard drives, and backups are actually quite big.
[zobel]: One of the other issues with bacula is that currently all of the full backups run at the same time, so we run into some sort of ?? limitations; it's not a big issue, but it's annoying that all machines are doing their full backups at the same time. Any other questions?
[question4]: You touched on it earlier: Single Sign-On. What services are next for that?
[Tollef]: Don't run away from the mic!
[Enrico]: I'll answer that as far as I know; many people may have different plans. Single Sign-On is currently using DACS, which I would suggest against, in general... [laughter] Having looked deeply into it, it probably seemed like a good idea at the time, but the internet moved in a different direction. But DACS is still useful, because it's an apache thing, so one can just put a directory of static files under DACS, and that can be done quite reasonably simply. At DebConf I want to discuss with the currently available DSAs about finishing the DACS setup, putting the basic stuff in puppet and making a guide for deploying new stuff.
Any Debian Developer that deploys services can set up DACS reasonably easily, but the way I see we should go in the future is OAuth 2, which is what we are using for the conference thing. That is a bit more like a standard that may work now, and which hopefully supports logout! [laughs] which DACS does not do very well. I have not studied OAuth 2, so it won't be me who does it. If any of you knows OAuth 2 and wants to sit down with me and explain it step by step during DebConf, then please do; I would like to migrate NM and Debian Contributors to OAuth 2, if at all possible. But I do want to understand the protocol before I touch it. So the direction, as far as I'm concerned, will be OAuth 2. We may get stuck with DACS, because it integrates with apache, but I'm not comfortable with it, and there are too many hacky things needed to make things work as expected. My personal dream would be to at some point move to OAuth 2 and then replace DACS with just an OAuth 2 provider.
[zobel]: Another limitation of our current DACS setup is that it only works for the debian.org domain; otherwise we would need to give out credentials. There is some jurisdiction key and federation key, and we would need to give the debian.net services access to them. That's one of the other limitations of our current DACS setup. So probably OAuth 2 might be the way to go. But in the end it's up to you, the Debian Developers, to help extend the single sign-on.
[Enrico]: As for new DACS services, ?? set up something that uses DACS.
[zobel]: It's a new PTS implementation. I think he just wants to check whether a person is logged in, so that they can modify some news on the new PTS implementation, and so on.
[Enrico]: One good thing with DACS at the moment is that login is optional: it totally supports serving a site as it is, and if one is logged in via single sign-on, then more stuff can happen.
[zobel]: I think OAuth 2 is a better thing for the wiki to do.
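The optional-login pattern Enrico mentions, where the SSO layer in front of the application (DACS behind apache, in the real setup) sets REMOTE_USER when the visitor is authenticated and the application serves anonymous content otherwise, can be sketched as a minimal WSGI app. This is illustrative only, not Debian's actual configuration:

```python
# Hypothetical sketch of optional login: the app reads REMOTE_USER if
# the SSO layer set it, and degrades gracefully to anonymous serving.
# Minimal WSGI, no framework, for illustration only.

def app(environ, start_response):
    user = environ.get("REMOTE_USER")  # set by the SSO layer, or absent
    body = f"hello, {user}\n" if user else "hello, anonymous visitor\n"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode()]

def call(environ):
    """Drive the app directly, simulating a request without a server."""
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

The design point is that authentication is additive: the same URL works for everyone, and logged-in visitors simply see more.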
[Enrico]: Does Moin do OAuth 2?
[Steve]: (inaudible)
[zobel]: I think I looked that up a few months ago, and I think it supports OAuth 2.
[Enrico]: DACS will give you a REMOTE_USER variable, so in theory it's easy, but if it does OAuth 2, then it's more future-proof, in my opinion.
[Steve]: The fun thing with the wiki as well, and I was going to touch on this in my Wiki and Web BoF (see, advertising too!), is that we've also currently got, like, thousands of existing user accounts. Now, obviously, for people who've already got an Alioth or a Debian LDAP account, we will encourage them to merge and just move over to those, but for the many thousands of others who haven't, we're going to have to come up with something. I don't know what that is.
[Tollef]: I don't have any response to that on the spot. It's tempting to say they can just get themselves an Alioth account. Some people might be upset at that answer. I guess there's also the question of how many of these accounts are actually active, rather than somebody who registered back in 2005 and hasn't used the account since.
[Enrico]: I'd be happy to have a conversation about this during DebConf. For Debian Contributors, I require an Alioth account to get credited on the site, because I don't want to have a user database in Debian Contributors. It may be too strict a requirement, or it may be that we just document that if you do anything in Debian, you get an Alioth account. Let's talk about it separately.
[Tollef]: In that case, I think we need to have a conversation with my other hat on, which is also the hat of various other people: the Alioth admin hat.
[Steve L.]: As the person who inflicted Alioth logins on everybody for DebConf this year, I have been getting feedback that, in particular, the sign-up process for Alioth is a bit of an obstacle. So there are a few things there which I think we should talk about streamlining.
As the person who decided that we were, for this year, moving away from Penta and moving to Summit: no, I did not want to have an authentication database. I didn't want password hashes in Summit, and so I said yes, we're going to have to figure out how to hook this up to Debian SSO. The consequence of that was: yes, we had the Debian SSO, which was only available to Debian Developers, and Alioth was the other database that was out there. So I guess, my fault; I apologise to anyone that was stressed about the rollout of that, because I didn't entirely coordinate with all of the parties ahead of time, but I think it's hanging together fairly well. We should talk sometime this week about where we go forward with that, and whether Alioth is the right authentication provider. But I think it's important we agree there be an authentication provider for these kinds of services, whether that lives in Alioth or somewhere else.
[Enrico]: With a flat namespace.
[Steve L.]: With a flat username space, yes. Which we kind of have today.
[Enrico]: (inaudible)
[Steve L.]: The way OAuth provides them is that you get the domain name with it. So in fact all Debian Developers have two different... It's a "flat namespace" and DDs all have two they can use. [laughter]
[zobel]: More questions?
[question5]: You mentioned that all our hosting is sponsored by the hosts, and we get some hardware donations at least. I think we buy some as well, don't we? My question isn't really about that; it's about how much support we get. There's 1.5, well 2, tending to 1, hardware manufacturers among the sponsors there: how much support do we get from them for doing interesting stuff? I'm thinking, you mentioned we get fairly regular controller failures on some of our hardware, and all of the sponsors we've got have nice but hard-to-set-up multipath things. It seems to me it would be interesting for them to set things up like that on the Debian infrastructure. Is that kind of thing possible, or?
[Tollef]: Yeah, so we do have that in some places, like the Bytemark setup, the UBC-ECE setup and so on. There we have a SAN, we have a bunch of machines, and either it's doing SATA or it's doing Fibre Channel...
[zobel]: iSCSI.
[Tollef]: iSCSI as well, yeah. So we do have a bunch of that. The problem is, if you want to do data storage where you have 25 terabytes available and you want to do that on a SAN, that's very much not cheap; that's really quite expensive. That's the reason why the machines with special storage requirements, basically backups and snapshot, are different. That's also why those two machines have like 5 controllers each; that's why they are different in that regard. We do get a bunch of sponsorship from the hardware vendors. We usually buy HP gear, mostly because we've had good experience with it and it generally works.
[zobel]: We have good connections at HP.
[Tollef]: We also historically had good connections; they've been good about giving us hardware in the past, and they are happy to sponsor Debian and DebConf, both in actual terms of money given to us and in terms of pretty nice prices. I don't think we've actually approached them about saying "could you please give us this enormously expensive piece of hardware?" It's often hard for them to give that away, because it has to come out of somebody's budget, and somehow they don't have large SANs just hidden under their desks.
[zobel]: More questions? Criticism?
[question6]: Hi, I was just curious about your mail infrastructure. It doesn't look like you use DKIM or SPF, or DMARC records. Do you have plans for any of that?
[Tollef]: There's been some experimentation with domain keys. Luca has been playing with that. There's this interesting ?? ??. We generally don't provide outgoing SMTP for random people, because that's painful.
[zobel]: We are not a mail provider.
[Tollef]: Yes. Obviously you get a @debian.org account; you get incoming email, which we then forward on to somewhere, where you'll hopefully remember to update the forward when that account expires, rather than giving us bounces. A big change, which we forgot to mention, is that we are actually in the process of reworking the entire way we do mail. We have drastically reduced the number of incoming mail servers, so most mail now goes to a set of two MXs.
[zobel]: It will increase in the future. At MIT, we will open up one more mail server.
[Tollef]: Well, we can. Currently we have two, and if special mail routing is needed, mail will be routed to the right internal host. But most hosts no longer listen for incoming mail from the internet. Which is a good thing, not only because it means we don't have to run spamassassin everywhere.
[zobel]: Peter did this DAME SMTP thing; weasel, Peter, wanted to do ?? DAME encryption ?? for outgoing mails.
[Tollef]: So we're experimenting with a bunch of things. What I was going to say about domain keys is that, because we don't provide outgoing mail servers, you need to be able to provide the infrastructure with what your key is going to be. Luca has been working on some patches to ud-ldap to do this, so it can show up in DNS and so on. So yes, things are happening. If you're interested in that, do grab us and we can talk more about it.
[zobel]: I think we are done, because the timer's almost over. I have one small announcement to make: Luca offered some RIPE NCC Atlas probes to give away, and the persons who applied for those probes and got onto the list to receive them, please come and talk to me directly after the talk so I can hand them out, because Luca is not here at DebConf14 this year.
[Anibal]: Any plans to use Yubikeys?
[Tollef]: I'm part of the maintainer team of the yubikey tools in Debian. I would very much like to use them for some things.
We need to find out how they would best fit into the infrastructure if we're going to do that. One thing that has been mentioned is that, in some cases, we want to do actual two-factor authentication; currently there is no two-factor authentication anywhere.
[zobel]: Help us set up that infrastructure!
[Tollef]: There are no concrete plans, but yes, we are very much aware of yubikeys, and I'm kind of looking for good places to put them in. I like them. I like both the company and the product. They are also quite happy to sponsor free software stuff.
[zobel]: I think we are done.
[Tollef]: I think we're out of time.
[zobel]: Thank you for being here.
[Tollef]: If you have any more questions, grab us afterwards.
[applause]