36C3 preroll music Andre Klapper: Alright, thank you. Thanks for your interest. I'm Andre, I'm with the Wikimedia Foundation, and one of the things I'm currently trying to find out is how to measure the activity of people in our technical communities. And you probably know that Wikimedia is a large, large project. There are more than 900 websites, and there are many areas where you can contribute, technically, in different ways. And we're currently trying to get an overview. And even that is hard. So, it is a complex task. And in this talk, I would like to quickly show you what we already have in place, and what we want to get in place, and maybe also a little bit of the problems and the complexity. So, it's more for your interest, or if you're also curious to play with technical metrics, statistics, things like these. What we currently have is mostly about git repositories, code repositories, and we mostly use Gerrit for code review. We have our own Gerrit instance at gerrit.wikimedia.org. And for this we've been having a platform called wikimedia.biterg.io. If you've seen a standard Elasticsearch and Kibana platform, this might be familiar to you. It is all Free and Open Source, it's actually a Linux Foundation project, you can find it under chaoss.community, chaoss with double s, and the code base is public on GitHub. So any other Free and Open Source software project can also set this up for themselves. We have it hosted by Bitergia, but it is also possible to set this up yourself, if you're interested in gathering statistics about your Free and Open Source project. And there's also a documentation page on MediaWiki.org which is called community metrics. I think I have screenshots here, because I never trust the Internet at conferences, but I could also show you live… so this is the GitHub page of the chaoss project by the Linux Foundation where you could get the code. 
This is, I hope the zoom is sufficient, wikimedia.biterg.io. So this is the overview page. You can see the navigation up here, and you get some basic statistics about the most active people in the git repositories, which organizations we have, so here you can see Wikimedia Foundation, individuals, hello welt, Wikimedia Deutschland. So this is the contributor base we have, by organization, by affiliation. And down here there are way more statistics, git, Gerrit, mailing lists, we index a lot of things. We also index a little bit of our issue tracking system, which is Phabricator, and some edits on MediaWiki.org. And, for example, now, if I go to Gerrit and the overview page, because we use Gerrit for code review, there are more specific statistics, and as it's Elasticsearch and Kibana based, you might know this if you've played with this, whenever you click on a certain value, you can filter by that value. So, for example, if I use the pie chart here, and only want to see the numbers for independent volunteer contributors, I click it, and you see the numbers now change. Obviously a bit lower, and you see up here that a filter has been applied, and you can continue with these things. Then you can also filter here by code repository, for example, the MediaWiki core repository. If I click on that one, it also filters for the value, and you can basically drill down to the statistics you want to gather here. And, as I only have 15 minutes, there are way more things you can find out here, also, for example, who reviews patches in Gerrit, how long patches have been open, median time, all these things you might want to gather to find out how well we are doing as a project, when it comes to both involving volunteers, and also giving them the feedback and engagement in code review that you would like to give. Or, also, areas for improvement. 
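The comparison of median review times by affiliation mentioned here can be sketched in a few lines of Python. This is only a minimal illustration, not how the dashboard computes its figures: the sample records, the affiliation labels and the function names `days_open` and `median_days_by_affiliation` are all made up for this example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical sample records: (affiliation, date opened, date merged).
# These values are invented for illustration, not real Wikimedia data.
SAMPLE_PATCHES = [
    ("Wikimedia Foundation", "2019-11-01", "2019-11-02"),
    ("Wikimedia Foundation", "2019-11-03", "2019-11-06"),
    ("Wikimedia Foundation", "2019-11-05", "2019-11-07"),
    ("Independent", "2019-11-01", "2019-11-08"),
    ("Independent", "2019-11-04", "2019-11-09"),
    ("Independent", "2019-11-10", "2019-11-30"),
]


def days_open(opened: str, merged: str) -> int:
    """Return the number of days between opening and merging a patch."""
    fmt = "%Y-%m-%d"
    return (datetime.strptime(merged, fmt) - datetime.strptime(opened, fmt)).days


def median_days_by_affiliation(patches):
    """Group patches by affiliation and return the median days open per group."""
    durations = defaultdict(list)
    for affiliation, opened, merged in patches:
        durations[affiliation].append(days_open(opened, merged))
    return {aff: median(values) for aff, values in durations.items()}


if __name__ == "__main__":
    for aff, med in sorted(median_days_by_affiliation(SAMPLE_PATCHES).items()):
        print(f"{aff}: median {med} days until merge")
```

The median is used instead of the mean so that a single patch that stays open for months does not dominate the figure.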
For example, in the Wikimedia Foundation we obviously have engineering teams, and some of them maintain certain code repositories, so you can filter the view for certain code repositories, and then you realize, for example, that sometimes it takes longer to review patches written by volunteers than patches written by your coworkers. These are the kinds of things which you maybe already assumed, but it's nice to actually have data. There's also a few caveats here. So, for example, I usually don't use the git statistics, because Gerrit is where the code review happens. Once a patch proposed in Gerrit has been accepted and merged into the git repository, you would also see that in the git repository, but as all our software is Open Source, Free Software, we also of course pull in a lot of git repositories from other upstream projects, because we use a lot of software invented and maintained somewhere else to run our servers. So the git statistics also include activity from other companies that we've imported into the git repositories. So, that's kind of misleading. And there's a few more caveats, and I hope all of them are listed on the community metrics page on MediaWiki.org, because at some point I had to create a section "behavior that might surprise you". That page also has some examples for the most common questions I get from interested people, and also co-workers, for example when you want to publish an annual report and show how many volunteer contributors you have in the code bases, and these things. So that is what we have. These were the screenshots in case the Wi-Fi doesn't work. And now the section, what is patchwork. A spoiler: Basically everything else. Because this was the look at git and git repositories and Gerrit for code review. But there is way more going on when it comes to technical contributions and code in Wikimedia. There is GitHub. 
So, we have some projects, quite a few, that don't use Wikimedia git, Wikimedia Gerrit, but they prefer GitHub, because it's a different contribution system or workflow. So, we already track some of that, but we still have to improve, even just finding a way to find all the repositories related to Wikimedia development on GitHub. Because they're not all under the same organization. When it comes to what I just showed you, wikimedia.biterg.io, we define what is being indexed in a public JSON file, "projects". So, this is also linked from the community metrics page on MediaWiki.org, where we define basically what gets indexed. And it's a long list as you can see, also some mailing lists, but there's a lot of code actually on the wikis. Inside of wiki pages. So, there are user scripts, there are gadgets, like small JavaScript things that enhance functionality, and they're actually quite common. So, for example, Wikimedia Commons, or English or German Wikipedia, they have a lot of gadgets even enabled by default, which make some tasks easier. For example, on Commons a common gadget is adding a category to a photo or image that has been uploaded. That's way easier if you use a gadget which is enabled by default. There are Lua modules, and there are templates. For example the info boxes that you see on the side in many Wikipedia articles, for example, if you look up a Wikipedia article about a person. These are all templates. And they're all stored on wiki. So, this is harder to track, to get a full overview of that. And some extension code: we have about 130 MediaWiki extensions deployed on Wikimedia servers. But if you take a look only at the extension home pages on MediaWiki.org, there are more than 2000. So there's a lot of code out there, and sometimes this code is even stored just by copy and paste, putting it on a wiki page, and saying: here, copy and paste this, and it should work. 
Which might not be the best revision system ever when it comes to maintaining code, but it's a quick and dirty way, so these things exist. And one other example, unknown code repository locations. We also have something called Toolforge. That's what some people call "cloud services" nowadays. So you can host your own little helper tools, which other people then can also use, on a cloud services platform called Toolforge that we offer. One example would be page views. So, if you want to see which pages are the most popular on some wiki, that's one example out of actually thousands of tools now. And though, of course, the rules are that you must publish the source code, it's sometimes really hard to make sure that this happens, and where it happens. So for most repositories we have an index, but for some we actually don't know, which is also something to work out. So, recently, to even get a number of things, or to get an idea of what we can measure, what we have, and how much we have, I started to create a table, and even visualizing that was an interesting task. I'm still not sure if anybody understands this, but black basically means it doesn't exist. There is nothing to measure, to index. Green means, yes, we do measure this already. Yellow means, it's tricky, but it's kind of possible via some scripts or using the API to get numbers out of the wikis, in certain namespaces, for example the Module namespace. And red means, it's very hard, but we'd like to get this data at some point. Plus, there is also the complexity, so the numbers you see here are sometimes correct numbers, sometimes more of a vague ballpark figure about how many items, code repositories, projects we're actually talking about. And with some numbers, we're even wondering. 
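As a rough illustration of the "yellow" case, getting numbers out of a wiki via the API for one namespace, here is a minimal Python sketch. It assumes the standard MediaWiki Action API (`action=query`, `list=allpages`) and namespace number 828, which is where Lua modules live on wikis with the Scribunto extension. The function names `http_fetch` and `count_pages` and the User-Agent string are made up for this example, and this is not the script actually used for the table.

```python
import json
import urllib.parse
import urllib.request

# Assumption: any wiki's api.php endpoint can be substituted here.
API_URL = "https://www.mediawiki.org/w/api.php"
MODULE_NAMESPACE = 828  # "Module" namespace used for Lua modules


def http_fetch(params):
    """Send one GET request to the API and return the decoded JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "namespace-count-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_pages(namespace, fetch=http_fetch):
    """Count all pages in one namespace, following API continuation."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "aplimit": "max",
        "format": "json",
    }
    total = 0
    while True:
        data = fetch(params)
        total += len(data["query"]["allpages"])
        if "continue" not in data:
            return total
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}


if __name__ == "__main__":
    print(count_pages(MODULE_NAMESPACE))
```

Passing `fetch` as a parameter keeps the counting logic testable without network access. Running this against every one of the roughly 900 sites would mean one such loop per wiki, which is part of why these numbers are tricky to collect.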
For example, it says 270 000 modules and templates on the 900 sites, websites we have on Wikimedia servers, and this is what the database query says on Hive, but we're not really trusting that number yet. So, this is actually what we're going to be after over the next months, to also have way better data, and a way better overview of where our developers actually are. Because we know that in code repositories, in Gerrit code review, we have about 200 to 400 code contributors per month. And we now also know that we have about 500 to 600 people who work on user scripts and gadgets per year. But for many other things, we don't know yet, and that's what I'm trying to improve over the next months, or, maybe realistically, years. Let's see. But, yeah. So, that's basically it. I hope this was a bit interesting. If you have any comments, questions, feel free to catch me here. I'm sometimes around the table. Feel free to catch me after this talk. These are links with more information, or, if you don't manage to catch me, feel free to use the community metrics page on MediaWiki.org, the first link: there is a discussion page, and there you can also bring up anything, ideas, ask questions. I watch that page, and, usually, reply. Thank you! applause postroll music Subtitles created by c3subtitles.de in the year 2021. Join, and help us!