One year of fedmsg in Debian

  • 0:02 - 0:05
    I am Nicolas Dandrimont.
  • 0:06 - 0:09
    I am going to talk to you about a year of fedmsg in Debian.
  • 0:09 - 0:12
    We had a problem before with infrastructure in distributions.
  • 0:14 - 0:16
    All services are a bit like people.
  • Non sincronizzato
    There are dozens of services maintained by many people
  • Non sincronizzato
    and each of those services has its own way of communicating with the rest of the world
  • Non sincronizzato
    Meaning that if you want to spin up a new service
    that needs to talk to other services in the distribution
  • Non sincronizzato
    which is basically any service you want to include
  • Non sincronizzato
    you will need to implement a bunch of communication systems
  • Non sincronizzato
    For instance, in the Debian infrastructure
  • Non sincronizzato
    we have our archive software, which is dak,
  • Non sincronizzato
    that mostly uses emails and databases to communicate.
  • Non sincronizzato
    The metadata is available in an RFC822 format with no real API.
  • Non sincronizzato
    The database is not public either.
  • Non sincronizzato
    The build queue management software, which is called wanna-build,
  • Non sincronizzato
    polls a database every so often to know what needs to get built.
  • Non sincronizzato
    There is no API outside of its database
  • Non sincronizzato
    which isn't public either.
  • Non sincronizzato
    Our bug tracking system, which is called debbugs,
  • Non sincronizzato
    works via email, stores its data in flat files, for now,
  • Non sincronizzato
    and exposes a read-only SOAP API.
  • Non sincronizzato
    With our source control management, pushes to the distribution-provided repositories on Alioth
  • Non sincronizzato
    can trigger an IRC bot or some emails
  • Non sincronizzato
    but there is no real central notification mechanism.
  • Non sincronizzato
    We have some kludges that are available to overcome those issues.
  • Non sincronizzato
    We have the Ultimate Debian Database
  • Non sincronizzato
    which contains a snapshot of many of the databases that underlie the Debian infrastructure.
  • Non sincronizzato
    This means that every so often,
  • Non sincronizzato
    there is a cron job that runs and imports data from a service here, a service there.
  • Non sincronizzato
    There is no realtime data.
  • Non sincronizzato
    It's useful for distro-wide QA stuff because you don't need to have realtime data
  • Non sincronizzato
    But when you want some notification for trying to build a new package or something
  • Non sincronizzato
    That doesn't work very well
  • Non sincronizzato
    and the consistency between the different data sources is not guaranteed.
  • Non sincronizzato
    We have another central notification system, which is the Package Tracking System,
  • Non sincronizzato
    which also is cron-triggered or email-triggered
  • Non sincronizzato
    You can update the data from the BTS using ??
  • Non sincronizzato
    You can subscribe to email updates on a given package
  • Non sincronizzato
    But the messages are not uniform,
  • Non sincronizzato
    they can't be machine parsed.
  • Non sincronizzato
    There are a few headers but they are not sufficient to know what the message is about.
  • Non sincronizzato
    And it's still not realtime.
  • Non sincronizzato
    The Fedora people invented something that could improve stuff which is called fedmsg.
  • Non sincronizzato
    It was actually introduced in 2009.
  • Non sincronizzato
    It's a unified message bus that can reduce the coupling between the different services in a distribution.
  • Non sincronizzato
    The idea is that services can subscribe to one or several message topics, register callbacks and react to events
  • Non sincronizzato
    that are triggered by all the services in the distribution.
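A minimal sketch of the subscribe-and-callback idea described above, using only the Python standard library. The `Bus` class, its method names, and the topic strings are hypothetical illustrations, not fedmsg's API. Matching topics by string prefix is an assumption here, borrowed from the way ZeroMQ subscriptions filter messages.

```python
from collections import defaultdict


class Bus:
    """Toy in-process message bus (hypothetical, not fedmsg itself)."""

    def __init__(self):
        # Maps a topic prefix to the callbacks registered for it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic_prefix, callback):
        """Register a callback for every topic starting with topic_prefix."""
        self._subscribers[topic_prefix].append(callback)

    def publish(self, topic, msg):
        """Deliver msg to all matching callbacks; return the delivery count."""
        delivered = 0
        for prefix, callbacks in self._subscribers.items():
            if topic.startswith(prefix):
                for callback in callbacks:
                    callback(topic, msg)
                    delivered += 1
        return delivered


# Usage: one service reacts to uploads only, another records every topic.
bus = Bus()
seen_uploads = []
seen_all = []
bus.subscribe("org.example.upload", lambda topic, msg: seen_uploads.append(msg))
bus.subscribe("org.example", lambda topic, msg: seen_all.append(topic))

bus.publish("org.example.upload.accepted", {"package": "hello"})
bus.publish("org.example.bug.new", {"bug": 1})
```

The publisher never needs to know who is listening, which is the decoupling the talk is after: a new consumer only adds a subscription.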
  • Non sincronizzato
    There is a bunch of stuff that is already implemented in fedmsg.
  • Non sincronizzato
    You get a stream of data with all the activity in your infrastructure which allows you to do statistics for instance
  • Non sincronizzato
    You decouple interdependent services because you can swap something for another
  • Non sincronizzato
    Or just listen to the messages and start doing stuff directly without having to fiddle with a database or something.
  • Non sincronizzato
    You can get a pluggable unified notification system that can gather all the events in the project and send them by email, by IRC,
  • Non sincronizzato
    on your mobile phone, on your desktop, everywhere you want.
  • Non sincronizzato
    Fedora people use fedmsg to implement a badge system
  • Non sincronizzato
    which is some kind of gamification of the development process of the distribution.
  • Non sincronizzato
    They implemented a live web dashboard.
  • Non sincronizzato
    They implemented an IRC feed.
  • Non sincronizzato
    And then they also got some bot bans on social networks because they were flooding.
  • Non sincronizzato
    How does it work?
  • Non sincronizzato
    Well, the first idea was to use AMQP as implemented by qpid.
  • Non sincronizzato
    Basically, you take all your services and you have them send their messages to a central broker.
  • Non sincronizzato
    Then you have several listeners that can send messages to clients.
  • Non sincronizzato
    There were a few issues with this.
  • Non sincronizzato
    Basically, you have a single point of failure at the central broker.
  • Non sincronizzato
    And the brokers weren't really reliable.
  • Non sincronizzato
    When they tested it under load, the brokers were tipping over.
  • Non sincronizzato
    The current implementation of fedmsg uses 0mq.
  • Non sincronizzato
    Basically what you get is not a single broker.
  • Non sincronizzato
    You get a mesh of interconnected services.
  • Non sincronizzato
    Basically, you can connect only to the services that you want to listen to.
  • Non sincronizzato
    The big drawback of this is that each and every service has to open up a port on the public Internet
  • Non sincronizzato
    for people to be able to connect to it.
  • Non sincronizzato
    There are some solutions for that which I will talk about.
  • Non sincronizzato
    But the main advantage is that you have no central broker
  • Non sincronizzato
    and they got like a hundred-fold speedup over the previous implementation.
  • Non sincronizzato
    You also have an issue with service discovery.
  • Non sincronizzato
    You can write a broker which gives you back your single point of failure.
  • Non sincronizzato
    You can use DNS, which means that you can say "Hey, I added a new service, let's use this SRV record to get to it".
  • Non sincronizzato
    Or you can distribute a text file.
  • Non sincronizzato
    Last year, during the Google Summer of Code, I mentored Simon Chopin
  • Non sincronizzato
    ...who implemented the DNS solution for integration in fedmsg in Debian.
  • Non sincronizzato
    The Fedora people as they control their whole infrastructure just distribute a text file
  • Non sincronizzato
    ...with the list of servers that are sending fedmsg messages.
  • Non sincronizzato
    How do you use it?
  • Non sincronizzato
    This is the Fedora topology.
  • Non sincronizzato
    I didn't have much time to do the Debian one.
  • Non sincronizzato
    It's much simpler. I'll talk about it later.
  • Non sincronizzato
    Basically, the messages are split into topics, and the topics form a hierarchy.
  • Non sincronizzato
    It's really easy to filter out the things that you want to listen to.
  • Non sincronizzato
    For instance, you can filter all the messages that concern package uploads by using the dak service.
  • Non sincronizzato
    Or everything that involves a given package or something else.
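To make the topic hierarchy concrete, here is a small standard-library sketch of filtering by a dotted topic prefix or by package. The topic names and the `package` payload key are hypothetical; the real names used on the Debian bus may differ. Comparing whole dotted segments rather than raw string prefixes is a design choice of this sketch, so that `org.example.dak` does not accidentally match `org.example.dakota`.

```python
def topic_matches(topic, prefix):
    """Return True if topic equals prefix or sits below it in the hierarchy."""
    topic_parts = topic.split(".")
    prefix_parts = prefix.split(".")
    return topic_parts[:len(prefix_parts)] == prefix_parts


def filter_messages(messages, prefix=None, package=None):
    """Keep messages under a topic prefix and/or about a given package."""
    selected = []
    for message in messages:
        if prefix is not None and not topic_matches(message["topic"], prefix):
            continue
        if package is not None and message["msg"].get("package") != package:
            continue
        selected.append(message)
    return selected


# Hypothetical messages; real topic names and payload keys may differ.
messages = [
    {"topic": "org.example.dak.upload", "msg": {"package": "hello"}},
    {"topic": "org.example.bts.new", "msg": {"package": "hello"}},
    {"topic": "org.example.dak.upload", "msg": {"package": "zsh"}},
]

dak_only = filter_messages(messages, prefix="org.example.dak")
hello_only = filter_messages(messages, package="hello")
```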
  • Non sincronizzato
    Publishing messages is really trivial.
  • Non sincronizzato
    From Python, you only have to import the module,
  • Non sincronizzato
    do fedmsg.publish with a dict of the data that you want to send.
  • Non sincronizzato
    And that's it, your message is published.
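As an illustration of what "a dict of the data" could look like once it is on the wire, the sketch below wraps a payload in a JSON envelope using only the standard library, so it runs without fedmsg installed. The envelope field names and the topic are assumptions made for this example. With fedmsg installed, the equivalent step would be a call along the lines of `fedmsg.publish(topic=..., modname=..., msg=...)`, which handles the envelope and the socket for you.

```python
import json
import time
import uuid


def build_envelope(topic, msg, sequence):
    """Wrap a payload dict in a simple envelope (illustrative field names)."""
    return {
        "topic": topic,
        "msg": msg,
        "timestamp": int(time.time()),
        "msg_id": str(uuid.uuid4()),
        "seq": sequence,
    }


def serialize(envelope):
    """Encode the envelope as JSON text, ready to be written to a socket."""
    return json.dumps(envelope, sort_keys=True)


envelope = build_envelope(
    topic="org.example.dak.upload",  # hypothetical topic name
    msg={"package": "hello", "version": "2.9-1"},
    sequence=1,
)
wire_text = serialize(envelope)
decoded = json.loads(wire_text)  # what a subscriber would get back
```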
  • Non sincronizzato
    From the shell, it's really easy too.
  • Non sincronizzato
    You just have a command called fedmsg-logger that you can pipe some input to.
  • Non sincronizzato
    And it goes on the bus, so it's really simple.
  • Non sincronizzato
    Receiving messages is trivial too.
  • Non sincronizzato
    In Python, you load the configuration
  • Non sincronizzato
    ...and you just have an iterator
  • Non sincronizzato
    [audio stops]
  • Non sincronizzato
    was a replay mechanism with just a sequence number
  • Non sincronizzato
    which will have your client query the event sender for new messages that you would have missed
  • Non sincronizzato
    ...in case of a network failure or anything.
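The replay idea can be sketched as follows, assuming each sender numbers its messages consecutively and keeps a history it can serve on request. The `Sender` and `ReplayTracker` classes are hypothetical helpers written for this example, not part of fedmsg.

```python
class Sender:
    """Numbers outgoing messages and remembers them for later replay."""

    def __init__(self):
        self._history = {}
        self._next_sequence = 1

    def emit(self, msg):
        """Create the next numbered envelope and store it in the history."""
        envelope = {"seq": self._next_sequence, "msg": msg}
        self._history[self._next_sequence] = envelope
        self._next_sequence += 1
        return envelope

    def replay(self, sequences):
        """Return stored envelopes for the requested sequence numbers."""
        return [self._history[s] for s in sequences if s in self._history]


class ReplayTracker:
    """Client-side state: the last sequence number seen from one sender."""

    def __init__(self):
        self.last_seen = None

    def missing_before(self, sequence):
        """Return the sequence numbers skipped before this message arrived."""
        if self.last_seen is None:
            missing = []
        else:
            missing = list(range(self.last_seen + 1, sequence))
        if self.last_seen is None or sequence > self.last_seen:
            self.last_seen = sequence
        return missing


sender = Sender()
sent = [sender.emit({"n": n}) for n in range(4)]  # sequence numbers 1 to 4

tracker = ReplayTracker()
received = []
for envelope in (sent[0], sent[1], sent[3]):  # the message with seq 3 is lost
    missing = tracker.missing_before(envelope["seq"])
    received.extend(sender.replay(missing))  # fetch what was missed
    received.append(envelope)
```

Because the tracker holds one counter per sender, this only works when the client talks to a single source directly.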
  • Non sincronizzato
    That's how basically the system works.
  • Non sincronizzato
    Now, what about fedmsg in Debian?
  • Non sincronizzato
    During the last Google Summer of Code, a lot happened thanks to Simon Chopin's involvement.
  • Non sincronizzato
    He did most of the packaging of fedmsg and its dependencies
  • Non sincronizzato
    ...which means that you can just apt-get install fedmsg and get it running.
  • Non sincronizzato
    It's available in sid, jessie and wheezy-backports.
  • Non sincronizzato
    He adapted the code of fedmsg to make it distribution agnostic.
  • Non sincronizzato
    He had a lot of support from upstream developers in Fedora to make that happen.
  • Non sincronizzato
    They are really excited to have their stuff being used by Debian or by other organizations,
  • Non sincronizzato
    ...that find fedmsg to be the right solution for event notification.
  • Non sincronizzato
    And finally, we bootstrapped the Debian bus by using mailing-list subscriptions
  • Non sincronizzato
    ...to get bug notifications and package upload notifications
  • Non sincronizzato
    ...and on mentors.debian.net which is a service I can control, so it's easy to add new stuff to it.
  • Non sincronizzato
    What then?
  • Non sincronizzato
    After the Google Summer of Code, there were some packaging adaptations to make it easier to run services based on fedmsg,
  • Non sincronizzato
    ...proper backports and maintenance of the bus
  • Non sincronizzato
    ...which mostly means keeping the software up-to-date
  • Non sincronizzato
    ...because the upstream is really active and responsive to bug reports.
  • Non sincronizzato
    It's really nice to work with them.
  • Non sincronizzato
    Since July 14th 2013 which is the day we started sending messages on the bus,
  • Non sincronizzato
    ...we had around 200k messages split across 155k bug mails and 45k uploads
  • Non sincronizzato
    ...which proves that Debian is a really active project, I guess.
  • Non sincronizzato
    [laughs]
  • Non sincronizzato
    The latest development with fedmsg is the packaging of Datanommer
  • Non sincronizzato
    ...which is a database component that can store messages that have been sent to the bus.
  • Non sincronizzato
    It allows Fedora to do queries on their messages
  • Non sincronizzato
    ...and give people the achievements that they earned, like "yeah, you had a hundred build failures"
  • Non sincronizzato
    ...or stuff like that.
    [laughs]
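A rough sketch of the "store every message, then query it" idea, using an in-memory SQLite database from the standard library. The table layout, topic names, and usernames are made up for this example and do not reflect Datanommer's actual schema.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory message store with a minimal, made-up schema."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE messages (topic TEXT, username TEXT, body TEXT)"
    )
    return connection


def store_message(connection, topic, username, msg):
    """Persist one bus message; the payload is kept as JSON text."""
    connection.execute(
        "INSERT INTO messages (topic, username, body) VALUES (?, ?, ?)",
        (topic, username, json.dumps(msg)),
    )


def count_by_user(connection, topic):
    """Count stored messages per user for one topic."""
    rows = connection.execute(
        "SELECT username, COUNT(*) FROM messages "
        "WHERE topic = ? GROUP BY username ORDER BY username",
        (topic,),
    )
    return dict(rows.fetchall())


store = open_store()
for user in ("alice", "bob", "alice"):
    store_message(store, "org.example.build.failed", user, {"package": "hello"})
store_message(store, "org.example.build.ok", "bob", {"package": "hello"})

failures = count_by_user(store, "org.example.build.failed")
```

A badge system could then compare such counts against thresholds, for instance awarding something at a hundred build failures.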
  • Non sincronizzato
    One big issue with fedmsg that I mentioned earlier is that Debian services are widely distributed.
  • Non sincronizzato
    Sometimes, firewall restrictions are out of Debian's control,
  • Non sincronizzato
    ...which is also the case with the Fedora infrastructure
  • Non sincronizzato
    ...because some of their servers are hosted within Red Hat
  • Non sincronizzato
    ...and Red Hat networking sometimes doesn't want to open firewall ports.
  • Non sincronizzato
    So we need a way for services to push their messages instead of having clients pull the messages.
  • Non sincronizzato
    There is a component in fedmsg which has been created by the Fedora people which is called fedmsg-relay
  • Non sincronizzato
    ...which basically is just a tube where you push your message using a 0mq socket
  • Non sincronizzato
    ...and it then pushes it to the subscribers on the other side.
  • Non sincronizzato
    It just allows you to bypass firewalls.
  • Non sincronizzato
    The issue is that it uses a non-standard port and a non-standard protocol.
  • Non sincronizzato
    It's just 0mq, so it basically puts your data on the wire and that's it.
  • Non sincronizzato
    So, I am pondering a way for services to push their messages using more classic web services.
  • Non sincronizzato
    You will take your JSON dictionary and push it by POST through HTTPS.
  • Non sincronizzato
    And after that, the message is sent to the bus
  • Non sincronizzato
    ...which I think will make it easier to integrate with other Debian services.
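Since this HTTPS push path is only being considered in the talk, the following is a speculative sketch of what the receiving side might do with a POSTed body: validate the JSON, then hand it to whatever publishes on the bus. The function name, the required `topic` and `msg` keys, and the status codes are all assumptions for illustration; TLS and authentication of the sending service are left out.

```python
import json


def handle_post(body, publish):
    """Validate a POSTed JSON document and forward it to the bus.

    body is the raw request body as bytes; publish is any callable taking
    (topic, msg). Returns an (HTTP status code, reason) pair.
    """
    try:
        document = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and malformed JSON
        return 400, "body is not valid JSON"
    if not isinstance(document, dict):
        return 400, "expected a JSON object"
    topic = document.get("topic")
    msg = document.get("msg")
    if not isinstance(topic, str) or not isinstance(msg, dict):
        return 400, "need a string 'topic' and an object 'msg'"
    publish(topic, msg)
    return 202, "accepted"


# Usage with a stand-in publisher that just records what it was given.
forwarded = []


def record(topic, msg):
    forwarded.append((topic, msg))


status_ok = handle_post(
    b'{"topic": "org.example.upload", "msg": {"package": "hello"}}', record
)
status_bad = handle_post(b"not json", record)
```

Keeping the validation in a plain function makes it independent of the web framework that would eventually sit in front of it.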
  • Non sincronizzato
    This was a really short talk.
  • Non sincronizzato
    I hope there will be some discussion afterwards.
  • Non sincronizzato
    In conclusion, I am really glad it works.
  • Non sincronizzato
    For the moment, it's really separate from the Debian infrastructure.
  • Non sincronizzato
    So the big challenge will be to try to integrate fedmsg into the Debian infrastructure
  • Non sincronizzato
    ...and use it for real.
  • Non sincronizzato
    If you want to contact me, I am olasd,
  • Non sincronizzato
    ...I am here for the whole conference.
  • Non sincronizzato
    If you want to talk to me about it, if you want to help me,
  • Non sincronizzato
    ...I am a little bit alone on this project, so I'd be glad if someone joined.
  • Non sincronizzato
    I'll be glad to hold a hacking session later this week.
  • Non sincronizzato
    Thanks for your attention!
  • Non sincronizzato
    [applause]
  • Non sincronizzato
    Was it that clear?
  • Non sincronizzato
    You talked about the ??? use to publish SRV record.
  • Non sincronizzato
    I missed some of the details of what that means.
  • Non sincronizzato
    What is in an SRV record and how do I do discovery on it?
  • Non sincronizzato
    The idea is that to actually receive messages, you need the host and the port of the sender.
  • Non sincronizzato
    If you have several WSGI workers, you have several ports that you need to listen to.
  • Non sincronizzato
    What we do with the SRV record is basically under the domain name of the service,
  • Non sincronizzato
    ...for example ftp-master.debian.org, we would have _fedmsg._tcp.ftp-master.debian.org
  • Non sincronizzato
    ...which will point to the four or five workers that you would use to get the messages.
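The lookup described in this answer can be sketched as two small steps: build the SRV owner name following the usual `_service._proto.name` convention, then turn the returned records into endpoints to connect to. The standard library cannot query SRV records itself, so the records below are hypothetical stand-ins for what a DNS library would return; the worker host name and ports are invented.

```python
def srv_name(service_domain, service="fedmsg", protocol="tcp"):
    """Build the SRV owner name for a service under the given domain."""
    return "_{0}._{1}.{2}".format(service, protocol, service_domain)


def endpoints_from_srv(records):
    """Turn (priority, weight, port, target) tuples into tcp:// URLs.

    A subscriber needs every worker, so the records are only sorted to
    get a stable order, not to pick a single target.
    """
    return [
        "tcp://{0}:{1}".format(target.rstrip("."), port)
        for _priority, _weight, port, target in sorted(records)
    ]


# Hypothetical answers, as a DNS library might return them for the query.
records = [
    (10, 0, 3001, "worker.example.org."),
    (10, 0, 3000, "worker.example.org."),
]

name = srv_name("ftp-master.debian.org")
endpoints = endpoints_from_srv(records)
```

Adding a worker then only requires publishing one more SRV record; subscribers pick it up on their next lookup.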
  • Non sincronizzato
    So if I don't know that ftp-master.debian.org is something that I want to subscribe to as a mechanism for getting the details,
  • Non sincronizzato
    ...is there something which tells me that ftp-master.debian.org is a host to begin with?
  • Non sincronizzato
    No, not yet.
  • Non sincronizzato
    Only part of the problem is solved.
  • Non sincronizzato
    Currently there is no list of every single service that publishes messages.
  • Non sincronizzato
    What they do in Fedora and what we do in Debian too, for public consumption,
  • Non sincronizzato
    ...there is a component called the gateway which will connect to all the message sources
  • Non sincronizzato
    ...and rewrite the messages to send them to clients.
  • Non sincronizzato
    You don't get the replay mechanism because it works only for a single source
  • Non sincronizzato
    ...and you solve your discovery problem, but you get back the single point of failure.
  • Non sincronizzato
Title:
One year of fedmsg in Debian
Video Language:
English
Team:
Debconf
Project:
2014_debconf14
