I am Nicolas Dandrimont.
I am going to talk to you about a year of fedmsg in Debian.
We had a problem before with infrastructure in distributions.
Services are a bit like people.
There are dozens of services maintained by many people
and each of those services has its own way of communicating with the rest of the world
Meaning that if you want to spin up a new service
that needs to talk to other services in the distribution
which is basically any service you want to include
you will need to implement a bunch of communication systems
For instance, in the Debian infrastructure
we have our archive software, which is dak,
that mostly uses emails and databases to communicate.
The metadata is available in an RFC822 format with no real API.
The database is not public either.
The build queue management software, which is wanna-build,
polls a database every so often to know what needs to get built.
There is no API outside of its database
that isn't public either
Our bug tracking system, which is called debbugs,
works via email, stores its data in flat files, for now,
and exposes a read-only SOAP API.
Our source control management pushes in the distro-provided repos on alioth
can trigger an IRC bot or some emails
but there is no real central notification mechanism.
We have some kludges that are available to overcome those issues.
We have the Ultimate Debian Database
which contains a snapshot of a lot of the databases that are underlying the Debian infrastructure
This means that every so often,
there is a cron that runs and imports data from a service here, a service there.
There is no realtime data.
It's useful for distro-wide QA stuff because you don't need to have realtime data
But when you want some notification for trying to build a new package or something
That doesn't work very well
and the consistency between the data sources is not guaranteed.
We have another central notification system which is the package tracking system
which also is cron-triggered or email-triggered
You can update the data from the BTS using ??
But you can subscribe to email updates on a given package
But the messages are not uniform,
they can't be machine parsed.
There are a few headers but they are not sufficient to know what the messages are about.
And it's still not realtime.
The Fedora people invented something that could improve stuff which is called fedmsg.
It was actually introduced in 2009.
It's a unified message bus that can reduce the coupling between different services of a distribution.
Services can subscribe to one or several message topics, register callbacks and react to events
that are triggered by all the services in the distribution.
There is a bunch of stuff that is already implemented in fedmsg.
You get a stream of data with all the activity in your infrastructure which allows you to do statistics for instance
You decouple interdependent services because you can swap one thing with another
Or just listen to the messages and start doing stuff directly without having to ?? a database or something.
You can get a pluggable unified notification system that can gather all the events in the project and send them by email, by IRC
on your mobile phone, on your desktop, everywhere you want.
Fedora people use fedmsg to implement a badge system
which is some kind of gamification of the development process of the distribution
They implemented a live web dashboard
They implemented an IRC feed.
And then they also got some bot bans on social networks because they were flooding
How does it work?
Well, the first idea was to use AMQP as implemented by qpid
Basically, you take all your services and you have them send their messages in a central broker
and then you have several listeners that can send messages to clients.
There were a few issues with this.
Basically, you have a single point of failure at the central broker
And the brokers weren't really reliable.
When they tested it under load, the brokers were tipping over ??
The actual implementation of fedmsg uses 0mq.
Basically what you get is not a single broker.
You get a mesh of interconnected services.
Basically, you can connect only to the service that you want to listen to.
The big drawback of this is that each and every service has to open up a port on the public Internet
for people to be able to connect to it.
There are some solutions for that which I will talk about.
But the main advantage is that you have no central broker
And they got like a hundred-fold speedup over the previous implementation.
You also have an issue with service discovery
You can write a broker which gives you back your single point of failure
You can use DNS which means that you can say "Hey I added a new service, let's use this SRV record to get to it"
Or you can distribute a text file.
Last year, during the Google Summer of Code, I mentored Simon Chopin
who implemented the DNS solution for integration in fedmsg in Debian.
The Fedora people as they control their whole infrastructure just distribute a text file
with the list of servers that are sending fedmsg messages
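As a rough illustration of the text-file approach, here is a minimal sketch that turns a list of senders into a mapping from service name to the endpoints a subscriber would connect to. The file format, host names and ports are invented for this example; the talk does not describe the format Fedora actually distributes.

```python
def parse_endpoints(text):
    """Parse lines of the form '<service-name> <zeromq-endpoint>'.

    Blank lines and '#' comments are skipped. A service may appear on
    several lines, once per endpoint it publishes on.
    """
    endpoints = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, address = line.split()
        endpoints.setdefault(name, []).append(address)
    return endpoints


# Hypothetical list of senders; names, hosts and ports are illustrative only.
EXAMPLE = """
# service        endpoint
dak              tcp://dak.example.org:3000
debbugs          tcp://bugs.example.org:3000
debbugs          tcp://bugs.example.org:3001
"""

if __name__ == "__main__":
    for name, addresses in sorted(parse_endpoints(EXAMPLE).items()):
        print(name, addresses)
```

A DNS-based variant would fill the same mapping from SRV records instead of a file, so the subscriber code would not need to change.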
How do you use it?
This is the Fedora topology.
I didn't have much time to do the Debian one.
It's really simpler. I'll talk about it later.
Basically, the messages are split in topics where you have a hierarchy of topics.
It's really easy to filter out the things that you want to listen to.
For instance, you can filter all the messages that concern package upload by using the dak service.
Or everything that involves a given package or something else.
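To make the topic hierarchy concrete, here is a small sketch of filtering by topic prefix. The topic names and payloads are hypothetical and only meant to show the shape of the idea, not the topics actually used on the Debian bus.

```python
def topic_matches(topic, prefix):
    """True if `topic` equals `prefix` or sits below it in the dotted hierarchy."""
    return topic == prefix or topic.startswith(prefix + ".")


def filter_messages(messages, prefix):
    """Yield only the (topic, body) pairs whose topic falls under `prefix`."""
    for topic, body in messages:
        if topic_matches(topic, prefix):
            yield topic, body


# Hypothetical topics and payloads, for illustration only.
STREAM = [
    ("org.debian.dev.dak.upload", {"source": "hello"}),
    ("org.debian.dev.debbugs.report", {"bug": 1}),
    ("org.debian.dev.dakota.ping", {}),
]

if __name__ == "__main__":
    for topic, body in filter_messages(STREAM, "org.debian.dev.dak"):
        print(topic, body)
```

ZeroMQ subscriptions themselves match on a raw prefix of the message; this sketch matches on whole dotted components instead, so a prefix ending in `dak` does not accidentally pick up a service called `dakota`.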
Publishing messages is really trivial.
From Python, you only have to import the module,
do fedmsg.publish with a dict of the data that you want to send
And that's it, your message is published.
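A minimal sketch of what publishing could look like from a service. `fedmsg.publish` is the call mentioned in the talk; the topic, the module name and the fields of the dict are assumptions made up for this example, and the call only works on a machine where fedmsg is installed and configured.

```python
def build_upload_message(source, version, uploader):
    """Build the dict describing one package upload (field names are assumptions)."""
    return {"source": source, "version": version, "uploader": uploader}


def publish_upload(source, version, uploader):
    """Send one upload event on the bus. Requires fedmsg to be installed and configured."""
    import fedmsg  # imported lazily so the helper above works without fedmsg

    fedmsg.publish(
        topic="upload",   # hypothetical topic suffix
        modname="dak",    # hypothetical module name for the sending service
        msg=build_upload_message(source, version, uploader),
    )
```

Calling `publish_upload("hello", "2.9-1", "Jane Doe <jane@example.org>")` would then put one message on the bus.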
From the shell, it's really easy too.
You just have a command called fedmsg-logger that you can pipe some input to
And it goes on the bus, so it's really simple.
Receiving messages is trivial too.
In Python, you load the configuration
and you just have an iterator
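The receiving side can be sketched along the same lines. This follows the pattern shown in the fedmsg documentation (load the configuration, then iterate over `fedmsg.tail_messages`); the `describe` helper and its output format are additions for this example, and the loop needs a working fedmsg installation to run.

```python
def describe(topic, msg):
    """Format one received message as a single log line."""
    # The payload supplied by the publisher sits under the "msg" key of the envelope.
    body = msg.get("msg", {})
    return "%s: %s" % (topic, ", ".join("%s=%s" % kv for kv in sorted(body.items())))


def listen():
    """Print every message seen on the bus. Requires fedmsg to be installed and configured."""
    import fedmsg
    import fedmsg.config

    config = fedmsg.config.load_config([], None)
    # tail_messages is a generator that blocks until the next message arrives.
    for name, endpoint, topic, msg in fedmsg.tail_messages(**config):
        print(describe(topic, msg))
```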
(video problems, resume at 10:10)
was a replay mechanism with just a sequence number
which will have your client query the event senders for new messages that you would have missed
in case of a network failure ??
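Since the recording cuts out just before this part, the following is only a guess at the idea based on the description above: the client remembers the last sequence number it saw from each sender and, when it notices a jump, works out which messages it has to ask for again. The class name and its interface are hypothetical, and the query to the sender itself is not shown.

```python
class GapDetector:
    """Track the last sequence number seen per sender and report what was missed."""

    def __init__(self):
        self.last_seen = {}

    def observe(self, sender, seq):
        """Record message `seq` from `sender`; return the sequence numbers skipped."""
        previous = self.last_seen.get(sender)
        if previous is not None and seq <= previous:
            return []  # duplicate or out-of-order message: nothing new to fetch
        self.last_seen[sender] = seq
        if previous is None:
            return []  # first message from this sender: no baseline to compare with
        return list(range(previous + 1, seq))
```

A client would call `observe` for every incoming message and request a replay from the sender whenever the returned list is not empty.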
That's how basically the system works.
Now, what about fedmsg in Debian
During the last Google Summer of Code, a lot happened thanks to Simon Chopin's involvement
He did most of the packaging of fedmsg and its dependencies
It means that you can just apt-get install fedmsg and get it running
It's available in sid, jessie and wheezy-backports
He adapted the code of fedmsg to make it distribution agnostic
So he had a lot of support from upstream developers in Fedora to make that happen
They are really excited to have their stuff being used by Debian or by other organizations
?? fedmsg was the right solution for event notification
And finally, we bootstrapped the Debian bus by using mailing-list subscriptions
to get bug notifications and package upload notifications
and on mentors.debian.net which is a service I can control, so it's easy to add new stuff to it.
What then?
After the Google Summer of Code, there were some packaging adaptations to make it easier to run services based on fedmsg,
proper backports and maintenance of the bus
Which mostly means keeping the software up-to-date
because the upstream is really active and responsive to bug reports
It's really nice to work with them
Since July 14th 2013 which is the day we started sending messages on the bus
we had around 200k messages split across 155k bug mails and 45k uploads
which proves that Debian is a really active project, I guess
[laughs]
The latest development with fedmsg is the packaging of Datanommer
Which is a database component that can store messages that have been sent to the bus
It allows Fedora to do queries on their messages
and give people the achievements that they did like "yeah, you got a hundred build failures"
or stuff like that [laughs]
One big issue with fedmsg that I said earlier is that Debian services are widely distributed
Sometimes, firewall restrictions are out of Debian's control
which is also the case with the Fedora infrastructure
because some of their servers are hosted within Red Hat
and Red Hat networking sometimes doesn't want to open firewall ports
So we need a way for services to push their messages instead of having clients pull the messages
There is a component in fedmsg which has been created by the Fedora people which is called fedmsg-relay
Which basically is just a tube where you push your message using a 0mq socket
and it then pushes it to the subscribers on the other side
It just allows you to bypass firewalls
The issue is that it uses a non-standard port and a non-standard protocol
It's just 0mq so it basically puts your data on the wire and that's it.
So, I am pondering a way for services to push their messages using more classic web services
You will take your JSON dictionary and push it by POST through HTTPS
And then after that send the message to the bus
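One way such a push gateway could look, sketched with only the Python standard library. Nothing here is an existing fedmsg component: the request format (a JSON object with a `topic` and a `msg`), the status codes and the function names are assumptions, authentication of the sending service is left out, and TLS is assumed to be handled by a web server in front of it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_post(body, publish):
    """Validate a JSON request body and hand it to `publish`; return (status, reply)."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return 400, "body is not valid JSON"
    if not isinstance(payload, dict) or not isinstance(payload.get("topic"), str):
        return 400, "expected a JSON object with a string 'topic'"
    publish(payload["topic"], payload.get("msg", {}))
    return 202, "accepted"


def make_handler(publish):
    """Build a request handler class that forwards POSTed messages to `publish`."""

    class Gateway(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, reply = handle_post(self.rfile.read(length), publish)
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(reply.encode("utf-8"))

    return Gateway


def serve(publish, port=8080):
    """Run the gateway on localhost; TLS is assumed to be terminated by a front-end web server."""
    HTTPServer(("127.0.0.1", port), make_handler(publish)).serve_forever()
```

Here `publish` is any callable taking a topic and a dict; on a machine that is part of the bus it could be a thin wrapper around `fedmsg.publish`.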
Which I think will make it easier to integrate with other Debian services
This was a really short talk
I hope there will be some discussion afterwards
In conclusion, ??
I am really glad ??
For the moment, it's really apart from the Debian infrastructure
So the big challenge will be to try to integrate fedmsg into the Debian infrastructure
Use it for real
If you want to contact me, I am olasd
I am here for the whole conference
If you want to talk to me about it, if you want to help me,
I am a little bit alone on this project, so I'll be glad if someone would join
I'll be glad to hold a hacking session later this week
Thanks for your attention
[applause]
Was it that clear?