0:00:00.000,0:00:12.010 36C3 preroll music 0:00:12.010,0:00:22.720 Andre Klapper: Alright, thank you. Thanks[br]for your interest. I'm Andre, I'm with the 0:00:22.720,0:00:28.130 Wikimedia Foundation, and one of the[br]things I'm currently trying to find out is 0:00:28.130,0:00:37.090 how to measure the activity of people in our[br]technical communities. And you probably 0:00:37.090,0:00:42.020 know that Wikimedia is a large, large[br]project. There's like more than 900 0:00:42.020,0:00:47.680 websites, and there are many areas where you[br]can contribute, technically, in different 0:00:47.680,0:00:53.330 ways. And we're currently trying to get an[br]overview. And even that is hard. 0:00:53.330,0:01:02.280 So, it is a complex task. And in this talk, I would[br]like to quickly show you what we already 0:01:02.280,0:01:08.220 have in place, and what we want to get in[br]place, and maybe also little bits of the 0:01:08.220,0:01:14.030 problems and the complexity. So, it's more[br]like, for your interest, or if you're 0:01:14.030,0:01:24.260 curious also to play with technical[br]metrics, statistics, things like these. 0:01:24.260,0:01:30.830 What we have currently is mostly about[br]git repositories, code repositories, and 0:01:30.830,0:01:35.030 we mostly use Gerrit for code review. We[br]have our own Gerrit instance at 0:01:35.030,0:01:43.320 gerrit.wikimedia.org. And for this we[br]have a platform called 0:01:43.320,0:01:52.070 wikimedia.biterg.io. If you've seen an[br]Elasticsearch, Kibana, standard platform 0:01:52.070,0:01:58.979 thingy, this might be familiar to you. It[br]is all Free and Open Source, it's actually 0:01:58.979,0:02:03.259 a Linux Foundation project, you can find[br]it under chaoss.community, chaoss with 0:02:03.259,0:02:09.399 double s, and the code base is public on[br]GitHub. So any other free and open source 0:02:09.399,0:02:14.859 software project can also set this up for[br]themselves. 
We have it hosted by Bitergia, 0:02:14.859,0:02:19.019 but this is also possible to set up[br]yourself, if you're interested in 0:02:19.019,0:02:27.150 gathering statistics about your Free and[br]Open Source project. And there's also a 0:02:27.150,0:02:36.269 documentation page on MediaWiki.org which[br]is called community metrics. I think I 0:02:36.269,0:02:40.959 have screenshots here, because I never[br]trust the Internet at conferences, but I 0:02:40.959,0:02:47.319 could also show you live… so this is the[br]GitHub page of the CHAOSS project by the 0:02:47.319,0:02:55.010 Linux Foundation where you could get the[br]code. This is, I hope the zoom is 0:02:55.010,0:03:03.699 sufficient, wikimedia.biterg.io. So this is[br]the overview page. You can see the 0:03:03.699,0:03:12.790 navigation up here, and you get some basic[br]statistics about the most active people in 0:03:12.790,0:03:18.260 the git repositories, which organizations[br]we have, so here you can see Wikimedia 0:03:18.260,0:03:26.080 Foundation, individuals, Hallo Welt,[br]Wikimedia Deutschland. So these are, this 0:03:26.080,0:03:31.619 is the contributor base we have, by[br]organization, by affiliation. And down 0:03:31.619,0:03:37.620 here there's way more statistics, Git,[br]Gerrit, mailing lists, we index a lot of 0:03:37.620,0:03:43.230 things. We also index a little bit our[br]issue tracking system, which is 0:03:43.230,0:03:51.469 Phabricator, and some edits on[br]MediaWiki.org. And, for example, now, if I 0:03:51.469,0:03:58.999 go to Gerrit and the overview page,[br]because we use Gerrit for code review, 0:03:58.999,0:04:06.109 they have more specific statistics, and as[br]it's Elasticsearch, Kibana based, you 0:04:06.109,0:04:09.930 might know this if you've played with[br]this, whenever you click on a certain 0:04:09.930,0:04:15.029 value, you can filter by that value. 
So,[br]for example, if I use the pie chart here, 0:04:15.029,0:04:19.590 and only want to see the numbers for[br]independent volunteer contributors, 0:04:19.590,0:04:26.400 I click it, and you see the numbers now[br]change. Obviously a bit lower, and you see 0:04:26.400,0:04:30.530 up here, that a filter has been applied,[br]and you can continue with these things. 0:04:30.530,0:04:36.250 Then you can go filter here also via code[br]repository, for example, the MediaWiki 0:04:36.250,0:04:42.500 core repository. If I click on that one,[br]it also filters for the value, and you can 0:04:42.500,0:04:49.510 basically drill down the statistics you[br]want to gather here. And there's, as I 0:04:49.510,0:04:53.871 only have 15 minutes, there's way more[br]things you can find out here, also, for 0:04:53.871,0:05:02.600 example, who reviews patches in Gerrit,[br]how long patches have been open, median 0:05:02.600,0:05:08.870 time, all these things you might want to[br]gather to find out how well are we doing 0:05:08.870,0:05:15.540 as a project, when it comes to both[br]involving volunteers, and also give them 0:05:15.540,0:05:21.350 the feedback when it comes to code review,[br]and engagement, that you would like to 0:05:21.350,0:05:26.470 give. Or, also, areas for improvement. For[br]example, in Wikimedia Foundation obviously 0:05:26.470,0:05:33.100 we have engineering teams, and some of[br]them maintain certain code repositories, 0:05:33.100,0:05:39.261 so you can filter the view for certain[br]code repositories, and then see, for 0:05:39.261,0:05:44.640 example, you realize sometimes that[br]patches written by volunteers, it takes 0:05:44.640,0:05:49.130 longer to review them than patches written[br]by your coworkers. And these kinds of 0:05:49.130,0:05:54.180 things which you maybe already assumed,[br]but it's nice to have actually data. 0:05:54.180,0:06:02.810 There's also a few caveats here. 
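Editor's note: the comparison described above (review time for volunteer patches versus coworker patches) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields `affiliation`, `opened` and `merged` and the function `median_days_to_merge` are hypothetical names, the sample data is invented, and this is not how the dashboard itself computes its figures.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_days_to_merge(changes):
    """Return the median number of days from upload to merge, per affiliation.

    `changes` is an iterable of dicts with the hypothetical keys
    "affiliation", "opened" and "merged" (ISO 8601 strings, or None
    for "merged" when the change is still open).
    """
    days_by_affiliation = defaultdict(list)
    for change in changes:
        if change["merged"] is None:
            continue  # still open: no merge time to measure yet
        opened = datetime.fromisoformat(change["opened"])
        merged = datetime.fromisoformat(change["merged"])
        days = (merged - opened).total_seconds() / 86400
        days_by_affiliation[change["affiliation"]].append(days)
    return {name: median(values) for name, values in days_by_affiliation.items()}


# Invented sample records, for illustration only.
SAMPLE_CHANGES = [
    {"affiliation": "staff", "opened": "2019-12-01T00:00:00", "merged": "2019-12-02T00:00:00"},
    {"affiliation": "staff", "opened": "2019-12-01T00:00:00", "merged": "2019-12-04T00:00:00"},
    {"affiliation": "volunteer", "opened": "2019-12-01T00:00:00", "merged": "2019-12-03T00:00:00"},
    {"affiliation": "volunteer", "opened": "2019-12-01T00:00:00", "merged": "2019-12-07T00:00:00"},
    {"affiliation": "volunteer", "opened": "2019-12-01T00:00:00", "merged": "2019-12-11T00:00:00"},
    {"affiliation": "volunteer", "opened": "2019-12-05T00:00:00", "merged": None},
]
```

Calling `median_days_to_merge(SAMPLE_CHANGES)` groups the merged changes by affiliation and reports one median per group. The median is used rather than the mean because a handful of very old patches would otherwise dominate the figure.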
So, for[br]example, I usually don't use the git 0:06:02.810,0:06:10.310 statistics, because Gerrit is where the[br]code review happens. And once a patch 0:06:10.310,0:06:15.430 proposed in Gerrit has been accepted and[br]merged in the git repository, you would 0:06:15.430,0:06:20.700 also see that in the git repository, but[br]as all our software is Open Source, Free 0:06:20.700,0:06:26.420 Software, we also of course pull in a lot[br]of git repositories from other upstream 0:06:26.420,0:06:31.020 projects, because we use a lot of software[br]invented and maintained somewhere else to 0:06:31.020,0:06:38.550 run our servers. So the git statistics[br]also include activity that we've imported 0:06:38.550,0:06:43.790 within the git repositories from other[br]companies. So, that's kind of misleading. 0:06:43.790,0:06:48.820 And there's a few more caveats, which are[br]actually, I hope all of them are listed on 0:06:48.820,0:06:54.350 the community metrics page on[br]MediaWiki.org, because at some point I had 0:06:54.350,0:07:01.230 to create a section "behavior that might[br]surprise you". That page also has 0:07:01.230,0:07:05.820 some examples for the[br]most common questions I get from 0:07:05.820,0:07:12.820 interested people, and also co-workers,[br]or, you want to publish an annual report, 0:07:12.820,0:07:16.300 and show how many volunteer contributors[br]you have in the code bases and these 0:07:16.300,0:07:27.870 things. So that is what we have. These[br]were the screenshots in case the Wi-Fi 0:07:27.870,0:07:35.990 doesn't work. And now the section, what is[br]patchwork. A spoiler: Basically everything 0:07:35.990,0:07:43.120 else. Because this was the look at git and[br]git repositories and Gerrit for code 0:07:43.120,0:07:49.480 review. But there is way more going on[br]when it comes to technical contributions 0:07:49.480,0:07:58.590 and code in Wikimedia. 
There is GitHub.[br]So, we have some projects, quite a few, 0:07:58.590,0:08:02.461 that don't use Wikimedia git, Wikimedia[br]Gerrit, but they prefer GitHub, because 0:08:02.461,0:08:10.860 it's a different contribution system or[br]workflow. So, we already track some of 0:08:10.860,0:08:15.840 that, but we still have to improve even[br]finding a way to find all the 0:08:15.840,0:08:20.100 repositories related to Wikimedia[br]Development on GitHub. Because they're not 0:08:20.100,0:08:27.090 all under the same organization. When it[br]comes to what I just showed you, 0:08:27.090,0:08:33.650 wikimedia.biterg.io, we define what is[br]being indexed in a public JSON file, 0:08:33.650,0:08:38.409 "projects". So, this is also linked from[br]the community metrics page on 0:08:38.409,0:08:43.379 mediawiki.org, where we define basically[br]what gets indexed. And it's a long 0:08:43.379,0:08:50.579 list as you can see, also some[br]mailing lists, but there's a lot of code 0:08:50.579,0:08:57.149 actually on the Wikis. Inside of Wiki[br]pages. So, there are user scripts, there 0:08:57.149,0:09:02.830 are gadgets, like small JavaScript things[br]that enhance functionality, and they're 0:09:02.830,0:09:08.759 actually quite common. So, for example,[br]Wikimedia Commons, or English or German 0:09:08.759,0:09:15.059 Wikipedia, they have a lot of gadgets even[br]enabled by default, which makes some 0:09:15.059,0:09:22.279 behavior easier. For example, on Commons a[br]common gadget is adding a category to a 0:09:22.279,0:09:26.640 photo or image that has been uploaded.[br]That's way easier if you use a gadget 0:09:26.640,0:09:34.240 which is enabled by default. There are Lua[br]modules, and there's templates. For 0:09:34.240,0:09:39.241 example the info boxes that you see in[br]many Wikipedia articles on the side, for 0:09:39.241,0:09:43.839 example, if you look up a Wikipedia[br]article about a person. These are all 0:09:43.839,0:09:51.009 templates. 
And they're all stored on Wiki.[br]So, this is harder to track, to get a full 0:09:51.009,0:10:00.079 overview of that. And some extension code.[br]We have about 130 MediaWiki 0:10:00.079,0:10:06.449 extensions deployed on Wikimedia servers.[br]But if you take a look only at the 0:10:06.449,0:10:11.860 extension home pages on MediaWiki.org,[br]there are more than 2000. So there's a lot 0:10:11.860,0:10:16.100 of code out there, and sometimes this code[br]is even stored just by copy and paste 0:10:16.100,0:10:20.510 putting it on a Wiki page, and saying:[br]here, copy and paste this, and it should 0:10:20.510,0:10:26.720 work. Which might not be the best revision[br]system when it comes to maintaining code, 0:10:26.720,0:10:33.139 ever, but it's a quick and dirty way, so[br]these things exist. And one other example, 0:10:33.139,0:10:40.199 unknown code repository locations. We also[br]have something called Toolforge. That's 0:10:40.199,0:10:44.920 what some people call "cloud services"[br]nowadays. So you can host your own little 0:10:44.920,0:10:50.579 helper tools which other people then can[br]also use, on a cloud services platform 0:10:50.579,0:10:55.069 called Toolforge that we offer. One[br]example would be, for example, page views. 0:10:55.069,0:11:02.770 So, if you want to see which pages are the[br]most popular on some Wiki, that's one 0:11:02.770,0:11:08.319 example out of, also thousands of tools[br]now actually. And though, of course, the 0:11:08.319,0:11:14.019 rules are that you must publish the source[br]code, it's sometimes really hard to also 0:11:14.019,0:11:18.249 make sure that this happens, and where it[br]happens. So for most repositories, we 0:11:18.249,0:11:23.329 know, we have an index, but for some we[br]actually don't know, which is also 0:11:23.329,0:11:31.790 something to work out. 
So, recently, even[br]getting a number of things, or getting an 0:11:31.790,0:11:38.790 idea, like, what can we measure, what[br]do we have, how much do we have, I started 0:11:38.790,0:11:43.829 to create a table, and even visualizing[br]that was an interesting task. I'm 0:11:43.829,0:11:49.439 still not sure if anybody understands[br]this, but black basically means doesn't 0:11:49.439,0:11:55.970 exist. There is nothing[br]to measure, to index. Green means, yes 0:11:55.970,0:12:02.830 we do measure this already. And[br]yellow means, it's tricky, but 0:12:02.830,0:12:09.459 it's kind of possible via some scripts or[br]using the API to get numbers out of the 0:12:09.459,0:12:15.420 Wikis, in certain name spaces, for example[br]the module name space. And red means, it's 0:12:15.420,0:12:22.600 very hard, but we'd like to get this data[br]at some point. Plus, also the complexity, 0:12:22.600,0:12:28.579 so the numbers you see here are sometimes[br]correct numbers, sometimes more of a 0:12:28.579,0:12:34.670 ballpark vague figure about how many[br]items, code repositories, projects we're 0:12:34.670,0:12:39.089 actually talking about. And with some[br]numbers, we're even wondering. For 0:12:39.089,0:12:46.199 example, it says 270 000 modules and[br]templates on the 900 sites, websites 0:12:46.199,0:12:53.019 we have on Wikimedia servers, and this is[br]what the database query says on Hive, but 0:12:53.019,0:12:58.179 we're not really trusting that number yet.[br]So, this is actually what we're going to 0:12:58.179,0:13:03.139 be after over the next months to also have[br]way better data, and a way better overview 0:13:03.139,0:13:07.890 of where our developers actually are.[br]Because we know, in code repositories, we 0:13:07.890,0:13:17.209 have about 200 to 400 code contributors,[br]in Gerrit code review, per month. 
0:13:17.209,0:13:24.480 And we now also know that we have about 500,[br]600 people who work on user scripts and 0:13:24.480,0:13:30.619 gadgets, per year. But for many other[br]things, we don't know yet, and that's what 0:13:30.619,0:13:36.199 I'm trying to improve over the next[br]months, or, maybe realistically, years. 0:13:36.199,0:13:45.299 Let's see. But, yeah. So, that's basically[br]it. I hope this was a bit interesting. 0:13:45.299,0:13:51.089 If you have any comments, questions, feel[br]free to catch me here. I'm sometimes 0:13:51.089,0:13:56.329 around the table. Feel free to catch me[br]after this talk. These are links with more 0:13:56.329,0:14:03.019 information, or, if you don't manage to[br]catch me, feel also free on the community 0:14:03.019,0:14:09.110 metrics page on MediaWiki.org, the first[br]link, there is a discussion page, and 0:14:09.110,0:14:14.939 there you can also bring up anything,[br]ideas, ask questions, I watch that page, 0:14:14.939,0:14:18.149 and, usually, reply. Thank you! 0:14:18.149,0:14:21.049 applause 0:14:21.049,0:14:24.809 postroll music 0:14:24.809,0:14:48.000 Subtitles created by c3subtitles.de[br]in the year 2021. Join, and help us!
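Editor's note: the talk mentions that numbers for on-wiki code can be obtained "using the API", for example for the module name space. A minimal sketch of that idea in Python follows. It assumes the standard MediaWiki Action API `allpages` list and namespace id 828 (the `Module:` namespace added by the Scribunto extension); the function names, the injectable `fetch` parameter and the User-Agent string are illustrative choices, and this is not the script or query the speaker actually uses.

```python
import json
import urllib.parse
import urllib.request

MODULE_NAMESPACE = 828  # "Module:" namespace id on wikis running Scribunto


def http_fetch(api_url, params):
    """Send one GET request to a MediaWiki api.php endpoint and decode the JSON."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "namespace-count-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_pages_in_namespace(api_url, namespace, fetch=http_fetch):
    """Count pages in one namespace by paging through list=allpages.

    `fetch` is injectable so the paging logic can be tried without network access.
    """
    base_params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": str(namespace),
        "aplimit": "max",
        "format": "json",
    }
    total = 0
    continuation = {}
    while True:
        data = fetch(api_url, {**base_params, **continuation})
        total += len(data.get("query", {}).get("allpages", []))
        # The API returns a "continue" object while more results remain;
        # its keys are passed back unchanged with the next request.
        continuation = data.get("continue")
        if not continuation:
            return total
```

A call would look like `count_pages_in_namespace("https://commons.wikimedia.org/w/api.php", MODULE_NAMESPACE)`. Paging through every title is slow on large wikis, and the count includes documentation subpages and redirects, which is one reason such figures are ballpark numbers rather than exact ones.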