English subtitles

One year of fedmsg in Debian

Revision 5, created 08/30/2014 by DrZaius.

  1. I am Nicolas Dandrimont.
  2. I am going to talk to you about a year of fedmsg in Debian.
  3. We had a problem before with infrastructure in distributions.
  4. All services are a bit like people.
  5. There are dozens of services maintained by many people,
  6. and each of those services has its own way of communicating with the rest of the world.
  7. This means that if you want to spin up a new service
    that needs to talk to other services in the distribution,
  8. which is basically any service you want to include,
  9. you will need to implement a bunch of communication systems.
  10. For instance, in the Debian infrastructure,
  11. we have our archive software, which is called dak,
  12. that mostly uses emails and databases to communicate.
  13. The metadata is available in an RFC822 format with no real API.
  14. The database is not public either.
  15. The build queue management software, which is called wanna-build,
  16. polls a database every so often to know what needs to get built.
  17. There is no API outside of its database,
  18. which isn't public either.
  19. Our bug tracking system, which is called debbugs,
  20. works via email, stores its data in flat files, for now,
  21. and exposes a read-only SOAP API.
  22. Source control pushes to the distribution-provided repositories on alioth
  23. can trigger an IRC bot or some emails,
  24. but there is no real central notification mechanism.
  25. We have some kludges available to work around those issues.
  26. We have the Ultimate Debian Database,
  27. which contains a snapshot of a lot of the databases underlying the Debian infrastructure.
  28. This means that every so often,
  29. a cron job runs and imports data from a service here, a service there.
  30. There is no realtime data.
  31. It's useful for distro-wide QA stuff because you don't need realtime data for that.
  32. But when you want a notification for trying to build a new package or something,
  33. that doesn't work very well,
  34. and the consistency between the different data sources is not guaranteed.
  35. We have another central notification system, which is the package tracking system,
  36. which is also cron-triggered or email-triggered.
  37. You can update the data from the BTS using ??
  38. You can subscribe to email updates on a given package,
  39. but the messages are not uniform,
  40. so they can't easily be machine parsed.
  41. There are a few headers, but they are not sufficient to know what the message is about.
  42. And it's still not realtime.
  43. The Fedora people invented something that could improve things, which is called fedmsg.
  44. It was actually introduced in 2009.
  45. It's a unified message bus that can reduce the coupling between the different services in a distribution.
  46. The idea is that services can subscribe to one or several message topics, register callbacks and react to events
  47. that are triggered by all the services in the distribution.
  48. There is a bunch of stuff that is already implemented with fedmsg.
  49. You get a stream of data with all the activity in your infrastructure, which allows you to do statistics, for instance.
  50. You decouple interdependent services because you can swap one thing for another,
  51. or just listen to the messages and start doing stuff directly without having to fiddle with a database or something.
  52. You can get a pluggable unified notification system that can gather all the events in the project and send them by email, by IRC,
  53. on your mobile phone, on your desktop, everywhere you want.
  54. Fedora people use fedmsg to implement a badge system,
  55. which is some kind of gamification of the development process of the distribution.
  56. They implemented a live web dashboard.
  57. They implemented an IRC feed.
  58. And then they also got some bots banned on social networks because they were flooding.
  59. How does it work?
  60. Well, the first idea was to use AMQP as implemented by qpid.
  61. Basically, you take all your services and you have them send their messages to a central broker.
  62. Then you have several listeners that can send messages to clients.
  63. There were a few issues with this.
  64. Basically, you have a single point of failure at the central broker.
  65. And the brokers weren't really reliable.
  66. When they tested it under load, the brokers were tipping over.
  67. The actual implementation of fedmsg uses 0mq.
  68. Basically, what you get is not a single broker.
  69. You get a mesh of interconnected services.
  70. Basically, you can connect only to the services that you want to listen to.
  71. The big drawback of this is that each and every service has to open up a port on the public Internet
  72. for people to be able to connect to it.
  73. There are some solutions for that, which I will talk about.
  74. But the main advantage is that you have no central broker,
  75. and they got something like a hundred-fold speedup over the previous implementation.
  76. You also have an issue with service discovery.
  77. You can write a broker, which gives you back your single point of failure.
  78. You can use DNS, which means that you can say "Hey, I added a new service, let's use this SRV record to get to it".
  79. Or you can distribute a text file.
  80. Last year, during the Google Summer of Code, I mentored Simon Chopin
  81. ...who implemented the DNS solution for integration in fedmsg in Debian.
  82. The Fedora people, as they control their whole infrastructure, just distribute a text file
  83. ...with the list of servers that are sending fedmsg messages.
  84. How do you use it?
  85. This is the Fedora topology.
  86. I didn't have much time to do the Debian one.
  87. It's much simpler. I'll talk about it later.
  88. Basically, the messages are split into topics, where you have a hierarchy of topics.
  89. It's really easy to filter for the things that you want to listen to.
  90. For instance, you can select all the messages that concern package uploads by filtering on the dak service.
  91. Or everything that involves a given package, or something else.
  92. Publishing messages is really trivial.
  93. From Python, you only have to import the module,
  94. call fedmsg.publish with a dict of the data that you want to send,
  95. and that's it, your message is published.
  96. From the shell, it's really easy too.
  97. You just have a command called fedmsg-logger that you can pipe some input to,
  98. and it goes on the bus, so it's really simple.
  99. Receiving messages is trivial too.
  100. In Python, you load the configuration
  101. ...and you just have an iterator
  102. [audio stops]
  103. ...was a replay mechanism with just a sequence number,
  104. which will have your client query the event sender for new messages that you would have missed
  105. ...in case of a network failure or anything.
  106. That's basically how the system works.
  107. Now, what about fedmsg in Debian?
  108. During the last Google Summer of Code, a lot happened thanks to Simon Chopin's involvement.
  109. He did most of the packaging of fedmsg and its dependencies,
  110. ...which means that you can just apt-get install fedmsg and get it running.
  111. It's available in sid, jessie and wheezy-backports.
  112. He adapted the code of fedmsg to make it distribution-agnostic.
  113. He had a lot of support from upstream developers in Fedora to make that happen.
  114. They are really excited to have their stuff being used by Debian or by other organizations,
  115. ...as it shows that fedmsg was the right solution for event notification.
  116. And finally, we bootstrapped the Debian bus by using mailing-list subscriptions
  117. ...to get bug notifications and package upload notifications,
  118. ...and on mentors.debian.net, which is a service I control, so it's easy to add new stuff to it.
  119. What then?
  120. After the Google Summer of Code, there were some packaging adaptations to make it easier to run services based on fedmsg,
  121. ...proper backports and maintenance of the bus,
  122. ...which mostly means keeping the software up to date,
  123. ...because upstream is really active and responsive to bug reports.
  124. It's really nice to work with them.
  125. Since July 14th 2013, which is the day we started sending messages on the bus,
  126. ...we have had around 200k messages, split across 155k bug mails and 45k uploads,
  127. ...which proves that Debian is a really active project, I guess.
  128. [laughs]
  129. The latest development with fedmsg is the packaging of Datanommer,
  130. ...which is a database component that can store messages that have been sent to the bus.
  131. It allows Fedora to do queries on their messages
  132. ...and give people achievements for what they did, like "yeah, you had a hundred build failures"
  133. ...or stuff like that.
    [laughs]
  134. One big issue with fedmsg, as I said earlier, is that Debian services are widely distributed.
  135. Sometimes, firewall restrictions are out of Debian's control,
  136. ...which is also the case with the Fedora infrastructure
  137. ...because some of their servers are hosted within Red Hat,
  138. ...and Red Hat networking sometimes doesn't want to open firewall ports.
  139. So we need a way for services to push their messages instead of having clients pull the messages.
  140. There is a component in fedmsg, created by the Fedora people, which is called fedmsg-relay,
  141. ...which basically is just a tube where you push your message using a 0mq socket
  142. ...and it then pushes it to the subscribers on the other side.
  143. It just allows you to bypass firewalls.
  144. The issue is that it uses a non-standard port and a non-standard protocol.
  145. It's just 0mq, so it basically puts your data on the wire and that's it.
  146. So, I am pondering a way for services to push their messages using more classic web services.
  147. You would take your JSON dictionary and push it by POST over HTTPS,
  148. and then, after that, send the message to the bus,
  149. ...which I think will make it easier to integrate with other Debian services.
  150. This was a really short talk.
  151. I hope there will be some discussion afterwards.
  152. In conclusion, I am really glad it works.
  153. For the moment, it's really separate from the Debian infrastructure.
  154. So the big challenge will be to try to integrate fedmsg into the Debian infrastructure
  155. ...and use it for real.
  156. If you want to contact me, I am olasd,
  157. ...I am here for the whole conference.
  158. If you want to talk to me about it, if you want to help me,
  159. ...I am a little bit alone on this project, so I'd be glad if someone would join.
  160. I'd be glad to hold a hacking session later this week.
  161. Thanks for your attention!
  162. [applause]
  163. Was it that clear?
  164. You talked about the ??? used to publish SRV records.
  165. I missed some of the details of what that means.
  166. What is in an SRV record and how do I do discovery on it?
  167. The idea is that to actually receive messages, you need the host and the port of the sender.
  168. If you have several WSGI workers, you have several ports that you need to listen to.
  169. What we do with the SRV record is basically, under the domain name of the service,
  170. ...for example ftp-master.debian.org, we would have _fedmsg._tcp.ftp-master.debian.org
  171. ...which will point to the four or five workers that you would use to get the messages.
  172. So if I don't know that ftp-master.debian.org is something that I want to subscribe to, as a mechanism for getting the details,
  173. ...is there something which tells me that ftp-master.debian.org is a host to begin with?
  174. No, not yet.
  175. Only part of the problem is solved.
  176. Currently there is no list of every single service that publishes messages.
  177. What they do in Fedora, and what we do in Debian too, for public consumption:
  178. ...there is a component called the gateway, which will connect to all the message sources
  179. ...and rewrite the messages to send them to clients.
  180. You don't get the replay mechanism, because it only works for a single source,
  181. ...so you solve your discovery problem, but you get back the single point of failure.
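
The talk describes services subscribing to a hierarchy of topics and registering callbacks that react to events. The following is a minimal in-process sketch of that idea, not fedmsg's own consumer API: the TopicDispatcher class and the Debian topic names are hypothetical, and matching is a plain string prefix test, which mirrors how ZeroMQ SUB sockets filter on a prefix.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A handler receives the full topic string and the decoded message body.
Handler = Callable[[str, dict], None]


class TopicDispatcher:
    """Hypothetical stand-in for a fedmsg-style subscriber.

    Callbacks are registered on topic prefixes; a message is delivered to
    every callback whose prefix is a string prefix of the message topic.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, prefix: str, handler: Handler) -> None:
        """Register a callback for every topic starting with `prefix`."""
        self._handlers[prefix].append(handler)

    def dispatch(self, topic: str, msg: dict) -> int:
        """Run all callbacks whose prefix matches `topic`; return how many ran."""
        called = 0
        for prefix, handlers in self._handlers.items():
            if topic.startswith(prefix):
                for handler in handlers:
                    handler(topic, msg)
                    called += 1
        return called


# Hypothetical topic names shaped like "<prefix>.<environment>.<service>.<event>".
uploads: List[str] = []
dispatcher = TopicDispatcher()
dispatcher.subscribe("org.debian.dev.dak.", lambda topic, msg: uploads.append(msg["source"]))
dispatcher.dispatch("org.debian.dev.dak.upload", {"source": "hello"})
dispatcher.dispatch("org.debian.dev.debbugs.new_bug", {"bug": 123456})
print(uploads)  # ['hello']
```

Ending the prefix with a dot ("org.debian.dev.dak.") keeps a subscription to one service from also matching a sibling whose name merely starts with the same letters.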
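
For the DNS-based discovery discussed in the talk and in the questions, a client needs to turn the SRV answers for a name such as _fedmsg._tcp.ftp-master.debian.org into the list of worker endpoints to connect to. A minimal sketch, assuming the answers are available as text lines in the "priority weight port target" form printed by `dig +short SRV <name>`; the srv_to_endpoints helper and the example.org hosts are hypothetical.

```python
from typing import List, Tuple


def srv_to_endpoints(records: List[str]) -> List[str]:
    """Turn SRV answers ("priority weight port target") into tcp:// endpoints.

    Every worker is returned, since a subscriber listens to all of them;
    the sort only makes the order deterministic (lower priority first,
    then higher weight first, as RFC 2782 orders preference).
    """
    parsed: List[Tuple[int, int, str, int]] = []
    for line in records:
        priority, weight, port, target = line.split()
        # Targets from DNS are fully qualified and end with a dot; drop it.
        parsed.append((int(priority), -int(weight), target.rstrip("."), int(port)))
    parsed.sort()
    return [f"tcp://{host}:{port}" for _, _, host, port in parsed]


# Hypothetical answers for two WSGI workers on one host.
answers = ["10 0 3001 fedmsg.example.org.", "0 0 3000 fedmsg.example.org."]
print(srv_to_endpoints(answers))
# ['tcp://fedmsg.example.org:3000', 'tcp://fedmsg.example.org:3001']
```

As the question at the end of the talk points out, this only resolves the ports for a service you already know about; it does not tell you which services exist.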
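
Publishing, as described in the talk, is a single call with a dict; the library then wraps that dict with metadata before it goes on the wire. The sketch below shows that wrapping step with the standard library only. The EnvelopeBuilder class and its field names are assumptions for illustration, not fedmsg's code; the per-sender counter "i" is the kind of sequence number the replay mechanism relies on.

```python
import itertools
import json
import time
import uuid


class EnvelopeBuilder:
    """Hypothetical helper that wraps a payload dict before it goes on the bus.

    The envelope field names are illustrative assumptions.
    """

    def __init__(self, topic_prefix: str, environment: str) -> None:
        self.topic_prefix = topic_prefix
        self.environment = environment
        self._seq = itertools.count(1)  # per-sender sequence number, starts at 1

    def build(self, modname: str, topic: str, msg: dict) -> str:
        """Return the JSON text that would be sent for this message."""
        envelope = {
            "topic": ".".join([self.topic_prefix, self.environment, modname, topic]),
            "i": next(self._seq),
            "timestamp": time.time(),
            "msg_id": str(uuid.uuid4()),
            "msg": msg,
        }
        return json.dumps(envelope)


builder = EnvelopeBuilder("org.debian", "dev")
wire = builder.build("dak", "upload", {"source": "hello", "version": "2.9-1"})
print(json.loads(wire)["topic"])  # org.debian.dev.dak.upload
```

With fedmsg itself installed and configured, the one-liner from the talk looks like fedmsg.publish(topic="upload", modname="dak", msg={...}), and from the shell you pipe text into fedmsg-logger; both handle this wrapping for you.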
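
The replay mechanism mentioned in the talk works from a per-sender sequence number: if a client sees a jump, it asks that sender for the messages in between. A minimal sketch of the client-side gap detection, assuming each sender numbers its messages 1, 2, 3, ...; the ReplayTracker class is hypothetical, and the actual replay request to the sender is not shown.

```python
from typing import Dict, List


class ReplayTracker:
    """Detects gaps in per-sender sequence numbers (hypothetical sketch)."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def observe(self, sender: str, seq: int) -> List[int]:
        """Record a message and return the sequence numbers that were missed.

        A late or replayed message (seq <= last seen) never moves the
        high-water mark backwards and reports no new gap.
        """
        last = self._last_seen.get(sender)
        missing: List[int] = []
        if last is not None and seq > last + 1:
            missing = list(range(last + 1, seq))
        if last is None or seq > last:
            self._last_seen[sender] = seq
        return missing


tracker = ReplayTracker()
tracker.observe("tcp://fedmsg.example.org:3000", 1)
tracker.observe("tcp://fedmsg.example.org:3000", 2)
print(tracker.observe("tcp://fedmsg.example.org:3000", 5))  # [3, 4]
```

Because the counter is per sender, this only works when the client talks to each source directly, which is why the gateway described at the end of the talk loses replay.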
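
Datanommer, as described in the talk, stores the messages sent to the bus so they can be queried later, for example to award "a hundred build failures". The sketch below uses sqlite3 to show the store-then-query idea; the schema, the helper functions and the buildd topic name are hypothetical and do not reflect Datanommer's actual models.

```python
import json
import sqlite3
from typing import Any, Dict


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a minimal message archive (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " topic TEXT NOT NULL, username TEXT, timestamp REAL, body TEXT)"
    )
    return conn


def store(conn: sqlite3.Connection, topic: str, username: str,
          timestamp: float, msg: Dict[str, Any]) -> None:
    """Archive one message; the body is kept as JSON text."""
    conn.execute(
        "INSERT INTO messages (topic, username, timestamp, body) VALUES (?, ?, ?, ?)",
        (topic, username, timestamp, json.dumps(msg)),
    )


def count_for_user(conn: sqlite3.Connection, topic_prefix: str, username: str) -> int:
    """Count a user's messages under a topic prefix.

    substr() is used instead of LIKE so that "_" or "%" in topics are not wildcards.
    """
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE substr(topic, 1, ?) = ? AND username = ?",
        (len(topic_prefix), topic_prefix, username),
    ).fetchone()
    return count


def earned_badge(conn: sqlite3.Connection, username: str,
                 topic_prefix: str, threshold: int = 100) -> bool:
    """A gamification rule: at least `threshold` matching messages."""
    return count_for_user(conn, topic_prefix, username) >= threshold


FAILED = "org.example.dev.buildd.build.failed"  # hypothetical topic
conn = open_store()
for n in range(100):
    store(conn, FAILED, "alice", float(n), {"package": "hello"})
store(conn, FAILED, "bob", 0.0, {"package": "hello"})
print(earned_badge(conn, "alice", FAILED), earned_badge(conn, "bob", FAILED))  # True False
```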
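
The talk ends by proposing that services push their JSON dictionaries by POST over HTTPS to something that then publishes them on the bus, instead of going through fedmsg-relay's 0mq socket. This is only an idea in the talk, so everything below is hypothetical: the endpoint URL, the {"topic", "msg"} body layout and the helper names are assumptions, and authentication of pushing services is left out. The sketch only builds the request with the standard library.

```python
import json
import urllib.request

# Hypothetical endpoint for an HTTPS-to-bus relay; no such URL comes from the talk.
PUSH_URL = "https://fedmsg-push.example.org/publish"


def build_push_request(topic: str, msg: dict, url: str = PUSH_URL) -> urllib.request.Request:
    """Wrap a message as a JSON POST request for an HTTPS-to-bus relay."""
    body = json.dumps({"topic": topic, "msg": msg}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    The relay would then republish the message on the bus (not shown).
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


req = build_push_request("org.debian.dev.debbugs.new_bug", {"bug": 123456})
print(req.get_method(), req.full_url)  # POST https://fedmsg-push.example.org/publish
```

The appeal of this design, as the talk suggests, is that outbound HTTPS on port 443 is usually allowed where an extra 0mq port is not, and plain HTTP tooling is easier to bolt onto existing Debian services.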