I am Nicolas Dandrimont.
I am going to talk to you about a year of fedmsg in Debian.
We had a problem before with infrastructure in distributions.
All services are a bit like people.
There are dozens of services maintained by many people
and each of those services has its own way of communicating with the rest of the world
Meaning that if you want to spin up a new service
that needs to talk to other services in the distribution
which is basically any service you want to include
you will need to implement a bunch of communication systems
For instance, in the Debian infrastructure
we have our archive software, which is dak,
that mostly uses emails and databases to communicate.
The metadata is available in an RFC822 format with no real API.
The database is not public either.
The build queue management software, which is called wanna-build,
polls a database every so often to know what needs to get built.
There is no API outside of its database,
which isn't public either.
Our bug tracking system, which is called debbugs,
works via email, stores its data in flat files, for now,
and exposes a read-only SOAP API.
Our source control management pushes to the distribution-provided repositories on alioth
can trigger an IRC bot or some emails
but there is no real central notification mechanism.
We have some kludges that are available to overcome those issues.
We have the Ultimate Debian Database
which contains a snapshot of a lot of the databases that are underlying the Debian infrastructure
This means that every so often,
there is a cron that runs and imports data from a service here, a service there.
There is no realtime data.
It's useful for distro-wide QA stuff because you don't need to have realtime data
But when you want some notification for trying to build a new package or something
That doesn't work very well
and the consistency between the different data sources is not guaranteed.
We have another central notification system, which is the package tracking system,
which also is cron-triggered or email-triggered
You can update the data from the BTS using ??
You can subscribe to email updates on a given package
But the messages are not uniform,
they can't be machine parsed.
There are a few headers but they are not sufficient to know what the message is about.
And it's still not realtime.
The Fedora people invented something that could improve stuff which is called fedmsg.
It was actually introduced in 2009.
It's a unified message bus that can reduce the coupling between the different services in a distribution.
The idea is that services can subscribe to one or several message topics, register callbacks and react to events
that are triggered by all the services in the distribution.
There is a bunch of stuff that is already implemented in fedmsg.
You get a stream of data with all the activity in your infrastructure which allows you to do statistics for instance
You decouple interdependent services because you can swap something for another
Or just listen to the messages and start doing stuff directly without having to fiddle with a database or something.
You can get a pluggable unified notification system that can gather all the events in the project and send them by email, by IRC,
on your mobile phone, on your desktop, everywhere you want.
Fedora people use fedmsg to implement a badge system
which is some kind of gamification of the development process of the distribution.
They implemented a live web dashboard.
They implemented an IRC feed.
And then they also got some bot bans on social networks because they were flooding.
How does it work?
Well, the first idea was to use AMQP as implemented by qpid.
Basically, you take all your services and you have them send their messages to a central broker.
Then you have several listeners that can send messages to clients.
There were a few issues with this.
Basically, you have a single point of failure at the central broker.
And the brokers weren't really reliable.
When they tested it under load, the brokers were tipping over.
The actual implementation of fedmsg uses 0mq.
Basically what you get is not a single broker.
You get a mesh of interconnected services.
Basically, you can connect only to the services that you want to listen to.
The big drawback of this is that each and every service has to open up a port on the public Internet
for people to be able to connect to it.
There are some solutions for that which I will talk about.
But the main advantage is that you have no central broker
and they got like a hundred-fold speedup over the previous implementation.
You also have an issue with service discovery.
You can write a broker which gives you back your single point of failure.
You can use DNS which means that you can say "Hey I added a new service, let's use this SRV record to get to it"
Or you can distribute a text file.
Last year, during the Google Summer of Code, I mentored Simon Chopin
...who implemented the DNS solution for integration in fedmsg in Debian.
The Fedora people as they control their whole infrastructure just distribute a text file
...with the list of servers that are sending fedmsg messages.
How do you use it?
This is the Fedora topology.
I didn't have much time to do the Debian one.
It's really simpler. I'll talk about it later.
Basically, the messages are split in topics where you have a hierarchy of topics.
It's really easy to filter out the things that you want to listen to.
For instance, you can filter all the messages that concern package upload by using the dak service.
Or everything that involves a given package or something else.
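A minimal sketch of what filtering on a topic hierarchy can look like, assuming dot-separated topics. The topic names and payloads below are hypothetical and only illustrate the idea; they are not taken from the actual Debian bus.

```python
def topic_matches(topic, prefix):
    """Return True if `topic` equals `prefix` or sits below it in the hierarchy."""
    return topic == prefix or topic.startswith(prefix + ".")


def filter_messages(messages, prefix):
    """Keep only the messages whose topic falls under `prefix`."""
    return [m for m in messages if topic_matches(m["topic"], prefix)]


# Hypothetical topics and payloads, for illustration only.
messages = [
    {"topic": "org.debian.dev.dak.upload", "msg": {"package": "hello"}},
    {"topic": "org.debian.dev.debbugs.report", "msg": {"package": "hello"}},
    {"topic": "org.debian.dev.dakota.ping", "msg": {}},
]

uploads = filter_messages(messages, "org.debian.dev.dak")
print([m["topic"] for m in uploads])
```

Matching on `prefix + "."` keeps a sibling such as the made-up `dakota` service out of the `dak` results, which a plain string-prefix comparison would let through.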
Publishing messages is really trivial.
From Python, you only have to import the module,
do fedmsg.publish with a dict of the data that you want to send.
And that's it, your message is published.
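As a rough sketch of the Python side, assuming fedmsg is installed and configured on the machine: the `topic` and `modname` values and the fields of the dict are hypothetical, chosen only to show the shape of the call.

```python
def build_upload_message(package, version, uploader):
    """Build the plain dict that will be sent as the message body."""
    return {"package": package, "version": version, "uploader": uploader}


def publish_upload(package, version, uploader):
    """Publish one upload event on the bus (needs fedmsg installed and configured)."""
    import fedmsg  # imported lazily so the helper above works without fedmsg

    fedmsg.publish(
        topic="upload",  # hypothetical topic suffix
        modname="dak",   # hypothetical module name
        msg=build_upload_message(package, version, uploader),
    )


body = build_upload_message("hello", "2.9-1", "olasd")
print(body)
```

Calling `publish_upload("hello", "2.9-1", "olasd")` on a configured host would put that dict on the bus.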
From the shell, it's really easy too.
You just have a command called fedmsg-logger that you can pipe some input to.
And it goes on the bus, so it's really simple.
Receiving messages is trivial too.
In Python, you load the configuration
...and you just have an iterator
[audio stops]
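A sketch of the receiving side, assuming fedmsg is installed and its configuration lists the endpoints to connect to. The `wanted` helper, the `.upload` topic suffix and the `package` field are hypothetical; only the load-configuration-then-iterate pattern comes from the talk.

```python
def wanted(topic, envelope, package):
    """Decide whether a received message is an upload of the given package."""
    body = envelope.get("msg", {})
    return topic.endswith(".upload") and body.get("package") == package


def listen(package):
    """Print matching messages as they arrive (needs fedmsg installed and configured)."""
    import fedmsg
    import fedmsg.config

    config = fedmsg.config.load_config([], None)
    # tail_messages is an iterator that blocks until the next message arrives.
    for name, endpoint, topic, envelope in fedmsg.tail_messages(**config):
        if wanted(topic, envelope, package):
            print(topic, envelope["msg"])


# The filter can be tried without a bus, on a hand-written envelope.
sample = {"msg": {"package": "hello", "version": "2.9-1"}}
print(wanted("org.debian.dev.dak.upload", sample, "hello"))
```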
was a replay mechanism with just a sequence number
which will have your client query the event sender for new messages that you would have missed
...in case of a network failure or anything.
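The client-side bookkeeping for such a replay can be sketched as follows, assuming each sender numbers its messages consecutively. `ReplayTracker` and its method names are hypothetical, and the actual query to the sender is left out.

```python
def missing_sequence_numbers(last_seen, incoming):
    """Sequence numbers skipped between the last message seen and the new one."""
    return list(range(last_seen + 1, incoming))


class ReplayTracker:
    """Remember the last sequence number seen for each sender."""

    def __init__(self):
        self.last_seen = {}  # sender name -> last sequence number received

    def observe(self, sender, seq_id):
        """Record a message and return the sequence numbers to ask the sender for."""
        previous = self.last_seen.get(sender)
        if previous is None:
            gap = []  # first message from this sender, nothing to compare with
            self.last_seen[sender] = seq_id
        else:
            gap = missing_sequence_numbers(previous, seq_id)
            self.last_seen[sender] = max(previous, seq_id)
        return gap


tracker = ReplayTracker()
tracker.observe("dak", 1)
tracker.observe("dak", 2)
gap = tracker.observe("dak", 5)  # messages 3 and 4 were missed
print(gap)
```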
That's how basically the system works.
Now, what about fedmsg in Debian?
During the last Google Summer of Code, a lot happened thanks to Simon Chopin's involvement.
He did most of the packaging of fedmsg and its dependencies
...which means that you can just apt-get install fedmsg and get it running.
It's available in sid, jessie and wheezy-backports.
He adapted the code of fedmsg to make it distribution agnostic.
He had a lot of support from upstream developers in Fedora to make that happen.
They are really excited to have their stuff being used by Debian or by other organizations,
...that fedmsg was the right solution for event notification.
And finally, we bootstrapped the Debian bus by using mailing-list subscriptions
...to get bug notifications and package upload notifications
...and on mentors.debian.net which is a service I can control, so it's easy to add new stuff to it.
What then?
After the Google Summer of Code, there were some packaging adaptations to make it easier to run services based on fedmsg,
...proper backports and maintenance of the bus
...which mostly means keeping the software up-to-date
...because the upstream is really active and responsive to bug reports.
It's really nice to work with them.
Since July 14th 2013 which is the day we started sending messages on the bus,
...we had around 200k messages split across 155k bug mails and 45k uploads
...which proves that Debian is a really active project, I guess.
[laughs]
The latest development with fedmsg is the packaging of Datanommer
...which is a database component that can store messages that have been sent to the bus.
It allows Fedora to do queries on their messages
...and give people the achievements that they did like "yeah, you had a hundred build failures"
...or stuff like that.
[laughs]
One big issue with fedmsg that I mentioned earlier is that Debian services are widely distributed.
Sometimes, firewall restrictions are out of Debian's control,
...which is also the case with the Fedora infrastructure
...because some of their servers are hosted within Red Hat
...and Red Hat networking sometimes doesn't want to open firewall ports.
So we need a way for services to push their messages instead of having clients pull the messages.
There is a component in fedmsg which has been created by the Fedora people which is called fedmsg-relay
...which basically is just a tube where you push your message using a 0mq socket
...and it then pushes it to the subscribers on the other side.
It just allows you to bypass firewalls.
The issue is that it uses a non-standard port and a non-standard protocol.
It's just 0mq so it basically puts your data on the wire and that's it.
So, I am pondering a way for services to push their messages using more classic web services.
You will take your JSON dictionary and push it by POST through HTTPS.
And then after that send the message to the bus
...which I think will make it easier to integrate with other Debian services.
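One way such a push could look from the service side, as a sketch only: the endpoint URL, the topic name and the JSON layout are hypothetical, since no such web service is described in the talk beyond the idea of a POST over HTTPS.

```python
import json
import urllib.request


def build_push_request(url, topic, body):
    """Prepare an HTTPS POST carrying one message as a JSON document."""
    payload = json.dumps({"topic": topic, "msg": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_push_request(
    "https://fedmsg.example.org/publish",  # hypothetical endpoint
    "org.debian.dev.mentors.upload",       # hypothetical topic
    {"package": "hello", "version": "2.9-1"},
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

The receiving web service would then be the only component that needs to talk 0mq, so the pushing service only needs outgoing HTTPS.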
This was a really short talk.
I hope there will be some discussion afterwards.
In conclusion, I am really glad it works.
For the moment, it's really apart from the Debian infrastructure.
So the big challenge will be to try to integrate fedmsg into the Debian infrastructure
...and use it for real.
If you want to contact me, I am olasd,
...I am here for the whole conference.
If you want to talk to me about it, if you want to help me,
...I am a little bit alone on this project, so I'll be glad if someone would join.
I'll be glad to hold a hacking session later this week.
Thanks for your attention!
[applause]
Was it that clear?
You talked about the ??? use to publish SRV record.
I missed some of the details of what that means.
What is in an SRV record and how do I do discovery on it?
The idea is that to actually receive messages, you need the host and the port of the sender.
If you have several WSGI workers, you have several ports that you need to listen to.
What we do with the SRV record is basically under the domain name of the service,
...for example ftp-master.debian.org, we would have _fedmsg._tcp.ftp-master.debian.org
...which will point to the four or five workers that you would use to get the messages.
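A sketch of how SRV answers could be turned into endpoints to connect to. SRV records are conventionally named `_service._proto.domain`. The records below are hypothetical, written out by hand rather than fetched from DNS (the Python standard library has no SRV resolver), and the port numbers are made up.

```python
from collections import namedtuple

# The four fields carried by an SRV answer.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def srv_name(domain, service="fedmsg", proto="tcp"):
    """Name to query, following the _service._proto.domain convention."""
    return "_{0}._{1}.{2}".format(service, proto, domain)


def endpoints_from_srv(records):
    """Turn SRV answers into tcp:// endpoints, one per worker."""
    ordered = sorted(records, key=lambda r: (r.priority, r.port))
    return ["tcp://{0}:{1}".format(r.target.rstrip("."), r.port) for r in ordered]


# Hypothetical answers for one service running two workers.
records = [
    SrvRecord(10, 5, 3001, "ftp-master.debian.org."),
    SrvRecord(10, 5, 3000, "ftp-master.debian.org."),
]
print(srv_name("ftp-master.debian.org"))
print(endpoints_from_srv(records))
```

A subscriber would connect to every endpoint in the list, since each worker publishes on its own port.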
So if I don't know that ftp-master.debian.org is something that I want to subscribe to as a mechanism for getting the details,
...is there something which tells me that ftp-master.debian.org is a host to begin with?
No, not yet.
Only part of the problem is solved.
Currently there is no list of every single service that publishes messages.
What they do in Fedora and what we do in Debian too, for public consumption,
...there is a component called the gateway which will connect to all the message sources
...and rewrite the messages to send them to clients.
You don't get the replay mechanism because it works only for a single source
...and you solve your discovery problem, but you get back the single point of failure.