English subtitles

One year of fedmsg in Debian


Showing Revision 5, created 08/30/2014 by DrZaius.

  1. I am Nicolas Dandrimont.
  2. I am going to talk to you about a year of fedmsg in Debian.
  3. We had a problem before with infrastructure in distributions.
  4. All services are a bit like people.
  5. There are dozens of services maintained by many people,
  6. and each of those services has its own way of communicating with the rest of the world.
  7. This means that if you want to spin up a new service
    that needs to talk to other services in the distribution,
  8. which is basically any service you want to include,
  9. you will need to implement a bunch of communication systems.
  10. For instance, in the Debian infrastructure,
  11. we have our archive software, which is dak,
  12. which mostly uses emails and databases to communicate.
  13. The metadata is available in RFC 822 format, with no real API.
  14. The database is not public either.
  15. The build queue management software, which is called wanna-build,
  16. polls a database every so often to know what needs to get built.
  17. There is no API outside of its database,
  18. which isn't public either.
  19. Our bug tracking system, which is called debbugs,
  20. works via email, stores its data in flat files (for now),
  21. and exposes a read-only SOAP API.
  22. In our source control management, pushes to the distribution-provided repositories on alioth
  23. can trigger an IRC bot or some emails,
  24. but there is no real central notification mechanism.
  25. We have some kludges available to overcome those issues.
  26. We have the Ultimate Debian Database,
  27. which contains a snapshot of many of the databases underlying the Debian infrastructure.
  28. This means that every so often,
  29. a cron job runs and imports data from a service here, a service there.
  30. There is no realtime data.
  31. It's useful for distro-wide QA work, because you don't need realtime data for that.
  32. But when you want a notification when trying to build a new package or something,
  33. that doesn't work very well,
  34. and consistency between the different data sources is not guaranteed.
  35. We have another central notification system, which is the package tracking system,
  36. which is also cron-triggered or email-triggered.
  37. You can update the data from the BTS using ??
  38. You can subscribe to email updates on a given package,
  39. but the messages are not uniform.
  40. They can be machine parsed,
  41. but there are only a few headers, and they are not enough to know what the message is about.
  42. And it's still not realtime.
  43. The Fedora people invented something that could improve things, which is called fedmsg.
  44. It was actually introduced in 2009.
  45. It's a unified message bus that can reduce the coupling between the different services in a distribution.
  46. The idea is that services can subscribe to one or several message topics, register callbacks, and react to events
  47. that are triggered by all the services in the distribution.
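The subscribe, register-callback, react model can be sketched as a small dispatcher. It assumes messages arrive as (name, endpoint, topic, msg) tuples, which is the shape fedmsg's receiving iterator yields as far as I know; here a plain list stands in for the live bus so the sketch runs offline. The Dispatcher class, the topic names and the payload keys are hypothetical.

```python
from collections import defaultdict


class Dispatcher:
    """Route bus messages to callbacks registered on dotted topic prefixes."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, topic_prefix, callback):
        # One prefix can carry several callbacks; each receives (topic, msg).
        self._callbacks[topic_prefix].append(callback)

    def dispatch(self, topic, msg):
        """Call every callback whose prefix matches on a segment boundary."""
        fired = 0
        for prefix, callbacks in self._callbacks.items():
            if topic == prefix or topic.startswith(prefix + "."):
                for callback in callbacks:
                    callback(topic, msg)
                    fired += 1
        return fired

    def run(self, stream):
        # `stream` yields (name, endpoint, topic, msg) tuples.
        for _name, _endpoint, topic, msg in stream:
            self.dispatch(topic, msg)


# Hypothetical topics and payloads; the list stands in for the bus.
uploads, bugs = [], []
dispatcher = Dispatcher()
dispatcher.register("org.debian.prod.dak", lambda t, m: uploads.append(m["source"]))
dispatcher.register("org.debian.prod.debbugs", lambda t, m: bugs.append(m["bug"]))
dispatcher.run([
    ("dak", "tcp://dak.example.org:9940", "org.debian.prod.dak.upload", {"source": "hello"}),
    ("debbugs", "tcp://bugs.example.org:9940", "org.debian.prod.debbugs.receive", {"bug": 123456}),
    ("dak", "tcp://dak.example.org:9940", "org.debian.prod.dakota.upload", {"source": "nope"}),
])
```

Matching on whole dot-separated segments keeps a subscription to `org.debian.prod.dak` from also catching an unrelated `org.debian.prod.dakota` service.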
  48. A bunch of things are already implemented with fedmsg.
  49. You get a stream of data with all the activity in your infrastructure, which allows you to compute statistics, for instance.
  50. You decouple interdependent services, because you can swap one for another,
  51. or just listen to the messages and start doing stuff directly, without having to fiddle with a database or something.
  52. You can get a pluggable, unified notification system that can gather all the events in the project and send them by email, by IRC,
  53. to your mobile phone, to your desktop, everywhere you want.
  54. The Fedora people use fedmsg to implement a badge system,
  55. which is some kind of gamification of the development process of the distribution.
  56. They implemented a live web dashboard.
  57. They implemented an IRC feed.
  58. And they also got some bots banned on social networks because they were flooding.
  59. How does it work?
  60. Well, the first idea was to use AMQP, as implemented by qpid.
  61. Basically, you take all your services and have them send their messages to a central broker.
  62. Then you have several listeners that can send messages to clients.
  63. There were a few issues with this.
  64. Basically, you have a single point of failure at the central broker.
  65. And the brokers weren't really reliable.
  66. When they tested it under load, the brokers were tipping over.
  67. The actual implementation of fedmsg uses 0mq.
  68. Basically, what you get is not a single broker.
  69. You get a mesh of interconnected services.
  70. Basically, you can connect only to the services that you want to listen to.
  71. The big drawback of this is that each and every service has to open up a port on the public Internet
  72. for people to be able to connect to it.
  73. There are some solutions for that, which I will talk about.
  74. But the main advantage is that you have no central broker,
  75. and they got something like a hundred-fold speedup over the previous implementation.
  76. You also have an issue with service discovery.
  77. You can write a broker, which gives you back your single point of failure.
  78. You can use DNS, which means that you can say "Hey, I added a new service, let's use this SRV record to get to it."
  79. Or you can distribute a text file.
  80. Last year, during the Google Summer of Code, I mentored Simon Chopin
  81. ...who implemented the DNS solution for integrating fedmsg in Debian.
  82. The Fedora people, as they control their whole infrastructure, just distribute a text file
  83. ...with the list of servers that are sending fedmsg messages.
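The text-file approach amounts to a static map from service names to the 0mq endpoints their workers publish on. In fedmsg itself this list lives in the configuration as an endpoints mapping; the plain-text format, the load_endpoints helper, and the hosts and ports below are hypothetical stand-ins chosen so the sketch runs on its own.

```python
def load_endpoints(text):
    """Parse 'name tcp://host:port' lines into {name: [endpoint, ...]}.

    Blank lines and '#' comments are ignored; a name may appear on several
    lines when a service runs several worker processes.
    """
    endpoints = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        try:
            name, url = line.split()
        except ValueError:
            raise ValueError(f"line {lineno}: expected 'name endpoint'") from None
        if not url.startswith("tcp://"):
            raise ValueError(f"line {lineno}: unsupported endpoint {url!r}")
        endpoints.setdefault(name, []).append(url)
    return endpoints


SAMPLE = """
# service   endpoint        (hypothetical hosts and ports)
dak         tcp://dak.example.org:9940
debbugs     tcp://bugs.example.org:9940
debbugs     tcp://bugs.example.org:9941
"""
```

The trade-off is the one the talk describes: it works when one team controls every host, but each new service means editing and redistributing the file.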
  84. How do you use it?
  85. This is the Fedora topology.
  86. I didn't have much time to do the Debian one.
  87. It's much simpler; I'll talk about it later.
  88. Basically, the messages are split into topics, organized in a hierarchy.
  89. That makes it really easy to filter the things that you want to listen to.
  90. For instance, you can filter all the messages that concern package uploads by using the dak service.
  91. Or everything that involves a given package, or something else.
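Hierarchical topics make both kinds of filtering a few lines of code. A 0mq subscriber filters on a raw byte prefix of the topic, so the sketch below adds a segment-aware check with a `*` wildcard for one segment, plus an optional filter on the message body. The `org.debian.prod.<service>.<event>` topic names and the `package` payload key are hypothetical.

```python
def topic_matches(topic, pattern):
    """Prefix match on dot-separated segments; '*' matches any one segment."""
    topic_parts = topic.split(".")
    pattern_parts = pattern.split(".")
    if len(pattern_parts) > len(topic_parts):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_parts, topic_parts))


def select(messages, pattern, package=None):
    """Keep (topic, msg) pairs matching `pattern` and, optionally, a package."""
    return [
        (topic, msg)
        for topic, msg in messages
        if topic_matches(topic, pattern)
        and (package is None or msg.get("package") == package)
    ]


MESSAGES = [  # hypothetical topics and payloads
    ("org.debian.prod.dak.upload", {"package": "hello"}),
    ("org.debian.prod.dak.upload", {"package": "coreutils"}),
    ("org.debian.prod.debbugs.receive", {"package": "hello"}),
]
```

`select(MESSAGES, "org.debian.*.dak")` keeps both uploads, while `select(MESSAGES, "org.debian", package="hello")` keeps everything about `hello` regardless of which service sent it.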
  92. Publishing messages is really trivial.
  93. From Python, you only have to import the module,
  94. call fedmsg.publish with a dict of the data that you want to send,
  95. and that's it, your message is published.
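In fedmsg the publishing call looks roughly like `fedmsg.publish(topic="upload", modname="dak", msg={...})`, with the library filling in the rest from its configuration. To show what ends up on the wire without a running bus, this stdlib-only sketch builds a comparable envelope by hand. The field names `i`, `timestamp` and `msg_id` mirror fields fedmsg messages carry as far as I know, but treat the exact field set, the two-frame layout and the `org.debian.prod` prefix as assumptions.

```python
import json
import time
import uuid


def build_envelope(topic_prefix, environment, modname, topic, msg, seq):
    """Wrap a payload dict roughly the way a publisher might before sending.

    The full topic follows a prefix.environment.service.event pattern; the
    other fields (sequence number, timestamp, id) are illustrative.
    """
    return {
        "topic": ".".join([topic_prefix, environment, modname, topic]),
        "msg": msg,
        "i": seq,                     # per-sender sequence number
        "timestamp": time.time(),     # seconds since the epoch
        "msg_id": str(uuid.uuid4()),  # unique id (format is an assumption)
    }


def to_frames(envelope):
    """Two frames: the topic (used for 0mq prefix filtering) and a JSON body."""
    topic = envelope["topic"].encode("utf-8")
    body = json.dumps(envelope, sort_keys=True).encode("utf-8")
    return [topic, body]


envelope = build_envelope("org.debian", "prod", "dak", "upload",
                          {"source": "hello", "version": "2.9-1"}, seq=1)
frames = to_frames(envelope)
```

Sending the topic as its own frame is what lets subscribers discard uninteresting messages before decoding any JSON.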
  96. From the shell, it's really easy too.
  97. You just have a command called fedmsg-logger that you can pipe some input into,
  98. and it goes on the bus, so it's really simple.
  99. Receiving messages is trivial too.
  100. In Python, you load the configuration
  101. ...and you just have an iterator
  102. [audio stops]
  103. ...was a replay mechanism with just a sequence number,
  104. which has your client query the event sender for new messages that it would have missed
  105. ...in case of a network failure or anything.
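The replay idea can be sketched in memory: the sender keeps a log keyed by sequence number, and the receiver asks for whatever falls between the last number it saw and the one that just arrived. In a real deployment that query goes over the network to the sender; here a direct method call stands in for it, and the Sender and Receiver classes are hypothetical.

```python
class Sender:
    """Keeps every message it published so it can answer replay queries."""

    def __init__(self):
        self.log = []  # list of (seq, msg); seq starts at 1

    def publish(self, msg):
        seq = len(self.log) + 1
        self.log.append((seq, msg))
        return seq, msg

    def replay(self, after, before):
        """Messages with after < seq < before, in order."""
        return [(s, m) for s, m in self.log if after < s < before]


class Receiver:
    """Detects gaps in sequence numbers and backfills them from the sender."""

    def __init__(self, sender):
        self.sender = sender
        self.last_seq = 0
        self.received = []

    def on_message(self, seq, msg):
        if seq <= self.last_seq:
            return  # duplicate, or already recovered through replay
        if seq > self.last_seq + 1:
            # Missed last_seq+1 .. seq-1: ask the sender for them.
            for _s, missed in self.sender.replay(self.last_seq, seq):
                self.received.append(missed)
        self.received.append(msg)
        self.last_seq = seq


sender = Sender()
receiver = Receiver(sender)
sent = [sender.publish(f"event {n}") for n in range(1, 6)]
for seq, msg in sent:
    if seq not in (3, 4):  # simulate a network drop losing 3 and 4
        receiver.on_message(seq, msg)
```

A single counter per sender is enough because the receiver only has to notice that a number was skipped; it never needs to know what the missing messages contained until it asks.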
  106. That's basically how the system works.
  107. Now, what about fedmsg in Debian?
  108. During last year's Google Summer of Code, a lot happened thanks to Simon Chopin's involvement.
  109. He did most of the packaging of fedmsg and its dependencies,
  110. ...which means that you can just apt-get install fedmsg and get it running.
  111. It's available in sid, jessie and wheezy-backports.
  112. He adapted the code of fedmsg to make it distribution-agnostic.
  113. He had a lot of support from the upstream developers in Fedora to make that happen.
  114. They are really excited to have their stuff used by Debian or by other organizations,
  115. ...as a sign that fedmsg was the right solution for event notification.
  116. And finally, we bootstrapped the Debian bus by using mailing-list subscriptions
  117. ...to get bug notifications and package upload notifications,
  118. ...and on mentors.debian.net, which is a service I control, so it's easy to add new stuff to it.
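A mailing-list bridge of this kind boils down to parsing each incoming mail and republishing a few of its headers as a message. The sketch uses Python's standard email module; the header names are modeled on the X-Debian-PR-* headers that bug tracker mails carry, but treat them, the output keys, and the sample mail as assumptions.

```python
from email import message_from_string


def bug_mail_to_message(raw_mail):
    """Turn a bug-report notification mail into a plain dict for the bus.

    The X-Debian-PR-* header names are an assumption about what the bug
    tracker adds to its mails; missing headers simply come out as None.
    """
    mail = message_from_string(raw_mail)
    return {
        "package": mail.get("X-Debian-PR-Package"),
        "source": mail.get("X-Debian-PR-Source"),
        "event": mail.get("X-Debian-PR-Message"),
        "subject": mail.get("Subject", ""),
        "sender": mail.get("From", ""),
    }


RAW = """\
From: Jane Doe <jane@example.org>
Subject: Bug#123456: hello: segfaults on startup
X-Debian-PR-Message: report 123456
X-Debian-PR-Package: hello
X-Debian-PR-Source: hello

It crashes.
"""
message = bug_mail_to_message(RAW)
```

The resulting dict could then be handed to the publishing call under a topic of your choosing; the mail body is left out on purpose, since headers are what subscribers filter on.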
  119. What then?
  120. After the Google Summer of Code, there were some packaging adaptations to make it easier to run services based on fedmsg,
  121. ...proper backports, and maintenance of the bus,
  122. ...which mostly means keeping the software up-to-date,
  123. ...because upstream is really active and responsive to bug reports.
  124. It's really nice to work with them.
  125. Since July 14th, 2013, which is the day we started sending messages on the bus,
  126. ...we've had around 200k messages, split across 155k bug mails and 45k uploads,
  127. ...which proves that Debian is a really active project, I guess.
  128.
  129. The latest development around fedmsg is the packaging of Datanommer,
  130. ...which is a database component that stores the messages that have been sent to the bus.
  131. It allows Fedora to run queries on their messages
  132. ...and give people achievements for what they did, like "yeah, you had a hundred build failures"
  133. ...or stuff like that.
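Storing bus traffic and querying it for achievements can be sketched with sqlite3. This is not Datanommer's schema or API: the table layout, the badge rule, and the topic name are all hypothetical, chosen only to show how a "hundred build failures" style badge reduces to a count query.

```python
import json
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages ("
        " id INTEGER PRIMARY KEY,"
        " topic TEXT NOT NULL,"
        " username TEXT,"
        " body TEXT NOT NULL)"
    )
    return db


def store(db, topic, username, msg):
    # The payload is kept as JSON text so any message shape fits.
    db.execute(
        "INSERT INTO messages (topic, username, body) VALUES (?, ?, ?)",
        (topic, username, json.dumps(msg)),
    )


def count_events(db, username, topic):
    (count,) = db.execute(
        "SELECT COUNT(*) FROM messages WHERE username = ? AND topic = ?",
        (username, topic),
    ).fetchone()
    return count


def earned_badge(db, username, topic, threshold):
    """Hypothetical badge rule: at least `threshold` events of one kind."""
    return count_events(db, username, topic) >= threshold


FAILED = "org.debian.prod.buildd.build.failed"  # hypothetical topic
db = open_store()
for n in range(100):
    store(db, FAILED, "alice", {"package": "hello", "attempt": n})
for n in range(3):
    store(db, FAILED, "bob", {"package": "hello", "attempt": n})
```

Once every message is stored, new badges are just new queries over history, with no change to the services that publish.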
  134. One big issue with fedmsg, as I said earlier, is that Debian services are widely distributed.
  135. Sometimes, firewall restrictions are out of Debian's control,
  136. ...which is also the case with the Fedora infrastructure,
  137. ...because some of their servers are hosted within Red Hat
  138. ...and Red Hat networking sometimes doesn't want to open firewall ports.
  139. So we need a way for services to push their messages instead of having clients pull them.
  140. There is a component in fedmsg, created by the Fedora people, which is called fedmsg-relay,
  141. ...which is basically just a tube where you push your message using a 0mq socket,
  142. ...and it then pushes it to the subscribers on the other side.
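The relay's role (accepting pushes from producers that can only open outbound connections, then fanning each message out to subscribers) can be sketched in-process. The real fedmsg-relay does this with 0mq sockets; here a queue.Queue stands in for the inbound socket and plain callbacks stand in for subscribers, and the Relay class is hypothetical.

```python
import queue


class Relay:
    """In-process stand-in for a push-in, publish-out relay."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.subscribers = []

    def push(self, topic, msg):
        # Producers behind a firewall only ever connect *to* the relay.
        self.inbox.put((topic, msg))

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def pump(self):
        """Drain the inbox, delivering each message to every subscriber."""
        delivered = 0
        while True:
            try:
                topic, msg = self.inbox.get_nowait()
            except queue.Empty:
                return delivered
            for callback in self.subscribers:
                callback(topic, msg)
                delivered += 1


relay = Relay()
seen = []
relay.subscribe(lambda topic, msg: seen.append(("irc", topic)))
relay.subscribe(lambda topic, msg: seen.append(("mail", topic)))
relay.push("org.debian.prod.dak.upload", {"source": "hello"})  # hypothetical
delivered = relay.pump()
```

Only the relay needs an open inbound port, which is exactly what makes it useful when the producing hosts sit behind firewalls the project does not control.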
  143. It just allows you to bypass firewalls.
  144. The issue is that it uses a non-standard port and a non-standard protocol.
  145. It's just 0mq, so it basically puts your data on the wire and that's it.
  146. So I am pondering a way for services to push their messages using more classic web services.
  147. You would take your JSON dictionary and push it with a POST over HTTPS,
  148. and then the receiving end would send the message to the bus,
  149. ...which I think will make it easier to integrate with other Debian services.
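The proposed HTTPS push can be sketched with the standard library: the client side builds a JSON POST, and the server side validates the body before it would be handed to the bus. The URL, the `{"topic": ..., "msg": ...}` payload shape, and both function names are hypothetical; nothing is actually sent here.

```python
import json
import urllib.request

PUSH_URL = "https://bus-push.example.org/publish"  # hypothetical endpoint


def build_push_request(url, topic, msg):
    """Client side: a JSON POST that only needs outbound HTTPS."""
    body = json.dumps({"topic": topic, "msg": msg}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def handle_push(body):
    """Server side: validate the payload before handing it to the bus."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    topic, msg = payload.get("topic"), payload.get("msg")
    if not isinstance(topic, str) or not isinstance(msg, dict):
        raise ValueError("expected {'topic': str, 'msg': object}")
    return topic, msg


request = build_push_request(PUSH_URL, "org.debian.prod.dak.upload", {"source": "hello"})
# urllib.request.urlopen(request) would send it; that step is skipped here.
```

Port 443 and plain JSON are the whole point of the design: almost any host and language can make this request. A real endpoint would also have to authenticate the pushing service before relaying, which this sketch leaves out.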
  150. This was a really short talk.
  151. I hope there will be some discussion afterwards.
  152. In conclusion, I am really glad it works.
  153. For the moment, it's really separate from the Debian infrastructure.
  154. So the big challenge will be to integrate fedmsg into the Debian infrastructure
  155. ...and use it for real.
  156. If you want to contact me, I am olasd,
  157. ...and I am here for the whole conference.
  158. If you want to talk to me about it, if you want to help me,
  159. ...I am a little bit alone on this project, so I'd be glad if someone joined.
  160. I'd be glad to hold a hacking session later this week.
  161. Thanks for your attention!
  162.
  163. Was it that clear?
  164. You talked about using ??? to publish SRV records.
  165. I missed some of the details of what that means.
  166. What is in an SRV record, and how do I do discovery with it?
  167. The idea is that to actually receive messages, you need the host and the port of the sender.
  168. If you have several WSGI workers, you have several ports that you need to listen to.
  169. What we do with the SRV record is basically, under the domain name of the service,
  170. ...for example ftp-master.debian.org, we would have _fedmsg._tcp.ftp-master.debian.org,
  171. ...which will point to the four or five workers that you would use to get the messages.
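Turning SRV answers into connectable endpoints takes only a few lines. In a real client the answers would come from a DNS library (for example dnspython's `dns.resolver.resolve(name, "SRV")`, whose answers expose `priority`, `weight`, `port` and `target`); here they are hard-coded so the sketch runs offline. Because each record is a separate worker publishing its own messages, the subscriber connects to all of them rather than picking one, as ordinary RFC 2782 selection would. The owner name in the comment, the ports, and the SrvRecord type are assumptions.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    """The RFC 2782 fields of one SRV answer (owner name and TTL omitted)."""
    priority: int
    weight: int
    port: int
    target: str


def endpoints_from_srv(records):
    """Return one 'tcp://host:port' endpoint per record, in a stable order.

    Every worker must be subscribed to, so no record is dropped; sorting by
    (priority, -weight, port) only makes the output deterministic.
    """
    ordered = sorted(records, key=lambda r: (r.priority, -r.weight, r.port))
    # DNS targets are fully qualified and end with '.'; strip it for 0mq.
    return [f"tcp://{r.target.rstrip('.')}:{r.port}" for r in ordered]


# Hypothetical answers for _fedmsg._tcp.ftp-master.debian.org: one per worker.
ANSWERS = [
    SrvRecord(10, 0, 9942, "ftp-master.debian.org."),
    SrvRecord(10, 0, 9940, "ftp-master.debian.org."),
    SrvRecord(10, 0, 9941, "ftp-master.debian.org."),
]
```

Adding a worker then means adding one DNS record, with no client-side configuration change.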
  172. So if I don't know that ftp-master.debian.org is something that I want to subscribe to, as a mechanism for getting the details,
  173. ...is there something which tells me that ftp-master.debian.org is a host to begin with?
  174. No, not yet.
  175. Only part of the problem is solved.
  176. Currently there is no list of all the services that publish messages.
  177. What they do in Fedora, and what we do in Debian too, for public consumption:
  178. ...there is a component called the gateway, which connects to all the message sources
  179. ...and rewrites the messages to send them to clients.
  180. You don't get the replay mechanism, because it works only for a single source,
  181. ...so you solve your discovery problem, but you get back the single point of failure.
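The gateway trade-off can be sketched as a time-ordered merge of per-sender streams. The gateway function and its (timestamp, seq, topic, msg) event layout are hypothetical; the point is that the merged stream gets a single new numbering, so a client of the gateway can no longer ask an individual sender to replay what it missed.

```python
import heapq


def gateway(sources):
    """Merge several senders' streams into one, renumbering as it goes.

    `sources` maps a sender name to its (timestamp, seq, topic, msg) events,
    each list already sorted by timestamp.
    """
    tagged = (
        [(ts, name, seq, topic, msg) for ts, seq, topic, msg in events]
        for name, events in sources.items()
    )
    merged = heapq.merge(*tagged, key=lambda e: (e[0], e[1]))
    for out_seq, (ts, name, _src_seq, topic, msg) in enumerate(merged, start=1):
        # The sender's own sequence number is dropped here: clients only see
        # the gateway's numbering, so per-sender replay no longer applies.
        yield {"i": out_seq, "timestamp": ts, "sender": name,
               "topic": topic, "msg": msg}


SOURCES = {  # hypothetical senders and topics
    "dak": [(1.0, 7, "org.debian.prod.dak.upload", {"source": "hello"}),
            (3.0, 8, "org.debian.prod.dak.upload", {"source": "bash"})],
    "debbugs": [(2.0, 41, "org.debian.prod.debbugs.receive", {"bug": 123456})],
}
out = list(gateway(SOURCES))
```

Clients now need to know only the gateway's address, which solves discovery, but every message flows through that one process.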
  182.