0:00:00.000,0:00:12.010
36C3 preroll music
0:00:12.010,0:00:22.720
Andre Klapper: Alright, thank you. Thanks[br]for your interest. I'm Andre, I'm with the
0:00:22.720,0:00:28.130
Wikimedia Foundation, and one of the[br]things I'm currently trying to find out is
0:00:28.130,0:00:37.090
how to measure activity, people in our[br]technical communities. And you probably
0:00:37.090,0:00:42.020
know that Wikimedia is a large, large[br]project. There's like more than 900
0:00:42.020,0:00:47.680
websites, and there's many areas where you[br]can contribute, technically, in different
0:00:47.680,0:00:53.330
ways. And we're currently trying to get an[br]overview. And even that is hard.
0:00:53.330,0:01:02.280
So, it is a complex task. And in this talk, I would[br]like to quickly show you what we already
0:01:02.280,0:01:08.220
have in place, and what we want to get in[br]place, and maybe also little bits of the
0:01:08.220,0:01:14.030
problems and the complexity. So, it's more[br]like, for your interest, or if you're
0:01:14.030,0:01:24.260
curious also to play with technical[br]metrics, statistics, things like these.
0:01:24.260,0:01:30.830
What we have currently is, mostly is about[br]git repositories, code repositories, and
0:01:30.830,0:01:35.030
we mostly use Gerrit for code review. We[br]have our own Gerrit instance at
0:01:35.030,0:01:43.320
gerrit.wikimedia.org. And for this we've[br]been having a platform called
0:01:43.320,0:01:52.070
wikimedia.biterg.io. If you've seen an[br]ElasticSearch, Kibana, standard platform
0:01:52.070,0:01:58.979
thingy, this might be familiar to you. It[br]is all Free and Open Source, it's actually
0:01:58.979,0:02:03.259
a Linux Foundation project, you can find[br]it under chaoss.community, chaoss with
0:02:03.259,0:02:09.399
double s, and the code base is public on[br]GitHub. So any other free and open source
0:02:09.399,0:02:14.859
software project can also set this up for[br]themselves. We have it hosted by Bitergia,
0:02:14.859,0:02:19.019
but this is also possible to set up[br]yourself, if you're interested in
0:02:19.019,0:02:27.150
gathering statistics about your Free and[br]Open Source project. And there's also a
0:02:27.150,0:02:36.269
documentation page on MediaWiki.org which[br]is called community metrics. I think I
0:02:36.269,0:02:40.959
have screenshots here, because I never[br]trust the Internet at conferences, but I
0:02:40.959,0:02:47.319
could also show you live… so this is the[br]GitHub page of the CHAOSS project by the
0:02:47.319,0:02:55.010
Linux Foundation where you could get the[br]code. This is, I hope the zoom is
0:02:55.010,0:03:03.699
sufficient, wikimedia.biterg.io. So this is[br]the overview page. You can see the
0:03:03.699,0:03:12.790
navigation up here, and you get some basic[br]statistics about the most active people in
0:03:12.790,0:03:18.260
the git repositories, which organizations[br]we have, so here you can see Wikimedia
0:03:18.260,0:03:26.080
Foundation, individuals, Hallo Welt,[br]Wikimedia Deutschland. So these are, this
0:03:26.080,0:03:31.619
is the contributor base we have, by[br]organization, by affiliation. And down
0:03:31.619,0:03:37.620
here there's way more statistics, Git,[br]Gerrit, mailing lists, we index a lot of
0:03:37.620,0:03:43.230
things. We also index a little bit our[br]issue tracking system, which is
0:03:43.230,0:03:51.469
Phabricator, and some edits on[br]MediaWiki.org. And, for example, now, if I
0:03:51.469,0:03:58.999
go to Gerrit and the overview page,[br]because we use Gerrit for code review,
0:03:58.999,0:04:06.109
they have more specific statistics, and as[br]it's ElasticSearch, Kibana based, you
0:04:06.109,0:04:09.930
might know this if you've played with[br]this, whenever you click on a certain
0:04:09.930,0:04:15.029
value, you can filter by that value. So,[br]for example, if I use the pie chart here,
0:04:15.029,0:04:19.590
and only want to see the numbers for[br]independent volunteer contributors,
0:04:19.590,0:04:26.400
I click it, and you see the numbers now[br]change. Obviously a bit lower, and you see
0:04:26.400,0:04:30.530
up here, that a filter has been applied,[br]and you can continue with these things.
0:04:30.530,0:04:36.250
Then you can go filter here also via code[br]repository, for example, the MediaWiki
0:04:36.250,0:04:42.500
core repository. If I click on that one,[br]it also filters for the value, and you can
0:04:42.500,0:04:49.510
basically drill down the statistics you[br]want to gather here. And there's, as I
0:04:49.510,0:04:53.871
only have 15 minutes, there's way more[br]things you can find out here, also, for
0:04:53.871,0:05:02.600
example, who reviews patches in Gerrit,[br]how long patches have been open, median
0:05:02.600,0:05:08.870
time, all these things you might want to[br]gather to find out how well are we doing
0:05:08.870,0:05:15.540
as a project, when it comes to both[br]involving volunteers, and also give them
0:05:15.540,0:05:21.350
the feedback when it comes to code review,[br]and engagement, that you would like to
0:05:21.350,0:05:26.470
give. Or, also, areas for improvement. For[br]example, in Wikimedia Foundation obviously
0:05:26.470,0:05:33.100
we have engineering teams, and some of[br]them maintain certain code repositories,
0:05:33.100,0:05:39.261
so you can filter the view for certain[br]code repositories, and then see, for
0:05:39.261,0:05:44.640
example, you realize sometimes that[br]patches written by volunteers, it takes
0:05:44.640,0:05:49.130
longer to review them than patches written[br]by your coworkers. And these kinds of
0:05:49.130,0:05:54.180
things which you maybe already assumed,[br]but it's nice to have actually data.
0:05:54.180,0:06:02.810
There's also a few caveats here. So, for[br]example, I usually don't use the git
0:06:02.810,0:06:10.310
statistics, because Gerrit is where the[br]code review happens. And once a patch
0:06:10.310,0:06:15.430
proposed in Gerrit has been accepted and[br]merged in the git repository, you would
0:06:15.430,0:06:20.700
also see that in the git repository, but[br]as all our software is Open Source, Free
0:06:20.700,0:06:26.420
Software, we also of course pull in a lot[br]of git repositories from other upstream
0:06:26.420,0:06:31.020
projects, because we use a lot of software[br]invented and maintained somewhere else to
0:06:31.020,0:06:38.550
run our servers. So the git statistics[br]also include activity that we've imported
0:06:38.550,0:06:43.790
within the git repositories from other[br]companies. So, that's kind of misleading.
0:06:43.790,0:06:48.820
And there's a few more caveats, which are[br]actually, I hope all of them are listed on
0:06:48.820,0:06:54.350
the community metrics page on[br]MediaWiki.org, because at some point I had
0:06:54.350,0:07:01.230
to create a section "behavior that might[br]surprise you". It also, that page also has
0:07:01.230,0:07:05.820
some examples like, how can I, for the[br]most common questions I get from
0:07:05.820,0:07:12.820
interested people, and also co-workers,[br]or, you want to publish an annual report,
0:07:12.820,0:07:16.300
and show how many volunteer contributors[br]you have in the code bases and these
0:07:16.300,0:07:27.870
things. So that is what we have. These[br]were the screenshots in case the Wi-Fi
0:07:27.870,0:07:35.990
doesn't work. And now the section, what is[br]patchwork. A spoiler: Basically everything
0:07:35.990,0:07:43.120
else. Because this was the look at git and[br]git repositories and Gerrit for code
0:07:43.120,0:07:49.480
review. But there is way more going on[br]when it comes to technical contributions
0:07:49.480,0:07:58.590
and code in Wikimedia. There is GitHub.[br]So, we have some projects, quite a few,
0:07:58.590,0:08:02.461
that don't use Wikimedia git, Wikimedia[br]Gerrit, but they prefer GitHub, because
0:08:02.461,0:08:10.860
it's a different contribution system or[br]workflow. So, we already track some of
0:08:10.860,0:08:15.840
that, but we still have to improve even[br]finding a way how to find all the
0:08:15.840,0:08:20.100
repositories related to Wikimedia[br]Development on GitHub. Because they're not
0:08:20.100,0:08:27.090
all under the same organization. When it[br]comes to what I just showed you,
0:08:27.090,0:08:33.650
wikimedia.biterg.io, we define what is[br]being indexed in a public JSON file,
0:08:33.650,0:08:38.409
"projects". So, this is also linked from[br]the community metrics page on
0:08:38.409,0:08:43.379
mediawiki.org, where we define basically[br]what's, what gets indexed. And it's a long
0:08:43.379,0:08:50.579
list as you can say– see, also some[br]mailing lists, but there's a lot of code
0:08:50.579,0:08:57.149
actually on the Wikis. Inside of Wiki[br]pages. So, there are user scripts, there
0:08:57.149,0:09:02.830
are gadgets, like small JavaScript things[br]that enhance functionality, and they're
0:09:02.830,0:09:08.759
actually quite common. So, for example,[br]Wikimedia Commons, or English or German
0:09:08.759,0:09:15.059
Wikipedia, they have a lot of gadgets even[br]enabled by default, which makes some
0:09:15.059,0:09:22.279
behavior easier. For example, on Commons a[br]common gadget is adding a category to a
0:09:22.279,0:09:26.640
photo or image that has been uploaded.[br]That's way easier if you use a gadget
0:09:26.640,0:09:34.240
which is enabled by default. There are Lua[br]modules, and there's templates. For
0:09:34.240,0:09:39.241
example the info boxes that you see in[br]many Wikipedia articles on the side, for
0:09:39.241,0:09:43.839
example, if you look up a Wikipedia[br]article about a person. These are all
0:09:43.839,0:09:51.009
templates. And they're all stored on Wiki.[br]So, this is harder to track, to get a full
0:09:51.009,0:10:00.079
overview of that. And some extension code,[br]even we have about 130 MediaWiki
0:10:00.079,0:10:06.449
extensions deployed on Wikimedia servers.[br]But if you take a look only at the
0:10:06.449,0:10:11.860
extension home pages on MediaWiki.org,[br]there are more than 2000. So there's a lot
0:10:11.860,0:10:16.100
of code out there, and sometimes this code[br]is even stored just by copy and paste
0:10:16.100,0:10:20.510
putting it on a Wiki page, and saying:[br]here, copy and paste this, and it should
0:10:20.510,0:10:26.720
work. Which might not be the best revision[br]system when it comes to maintaining code,
0:10:26.720,0:10:33.139
ever, but it's a quick and dirty way, so[br]these things exist. And one other example,
0:10:33.139,0:10:40.199
unknown code repository locations. We also[br]have something called Toolforge. That's
0:10:40.199,0:10:44.920
what some people call "cloud services"[br]nowadays. So you can host your own little
0:10:44.920,0:10:50.579
helper tools which other people then can[br]also use, on a cloud services platform
0:10:50.579,0:10:55.069
called Toolforge that we offer. One[br]example would be, for example, page views.
0:10:55.069,0:11:02.770
So, if you want to see which pages are the[br]most popular on some Wiki, that's one
0:11:02.770,0:11:08.319
example out of, also thousands of tools[br]now actually. And though, of course, the
0:11:08.319,0:11:14.019
rules are that you must publish the source[br]code, it's sometimes really hard to also
0:11:14.019,0:11:18.249
make sure that this happens, and where it[br]happens. So for most repositories, we
0:11:18.249,0:11:23.329
know, we have an index, but for some we[br]actually don't know, which is also
0:11:23.329,0:11:31.790
something to work out. So, recently, even[br]getting a number of things, or getting an
0:11:31.790,0:11:38.790
idea, like, what can we measure, what[br]do we have, how much do we have, I started
0:11:38.790,0:11:43.829
to create a table, and even visualizing[br]that was an interesting task. I'm
0:11:43.829,0:11:49.439
still not sure if anybody understands[br]this, but black basically means doesn't
0:11:49.439,0:11:55.970
exist. You don't need to, there is nothing[br]to, to measure, to index. Green means, yes
0:11:55.970,0:12:02.830
we do measure this already. And the red[br]ones mean, yellow means, it's tricky, but
0:12:02.830,0:12:09.459
it's kind of possible via some scripts or[br]using the API to get numbers out of the
0:12:09.459,0:12:15.420
Wikis, in certain namespaces, for example[br]the Module namespace. And red means, it's
0:12:15.420,0:12:22.600
very hard, but we'd like to get this data[br]at some point. Plus, also the complexity,
0:12:22.600,0:12:28.579
so the numbers you see here are sometimes[br]correct numbers, sometimes more of a
0:12:28.579,0:12:34.670
ballpark vague figure about how many[br]items, code repositories, projects we're
0:12:34.670,0:12:39.089
actually talking about. And with some[br]numbers, we're even wondering. For
0:12:39.089,0:12:46.199
example, it says 270 000 modules and[br]templates on the 900 sites, websites
0:12:46.199,0:12:53.019
we have on Wikimedia servers, and this is[br]what the database query says on Hive, but
0:12:53.019,0:12:58.179
we're not really trusting that number yet.[br]So, this is actually what we're going to
0:12:58.179,0:13:03.139
be after over the next months to also have[br]way better data, and a way better overview
0:13:03.139,0:13:07.890
of where our developers actually are.[br]Because we know, in code repositories, we
0:13:07.890,0:13:17.209
have about 200 to 400 code contributors,[br]in Gerrit code review, per month.
0:13:17.209,0:13:24.480
And we now also know that we have about 500,[br]600 people who work on user scripts and
0:13:24.480,0:13:30.619
gadgets, per year. But for many other[br]things, we don't know yet, and that's what
0:13:30.619,0:13:36.199
I'm trying to improve over the next[br]months, or, maybe realistically, years.
0:13:36.199,0:13:45.299
Let's see. But, yeah. So, that's basically[br]it. I hope this was a bit interesting.
0:13:45.299,0:13:51.089
If you have any comments, questions, feel[br]free to catch me here. I'm sometimes
0:13:51.089,0:13:56.329
around the table. Feel free to catch me[br]after this talk. These are links with more
0:13:56.329,0:14:03.019
information, or, if you don't manage to[br]catch me, feel also free on the community
0:14:03.019,0:14:09.110
metrics page on MediaWiki.org, the first[br]link, there is a discussion page, and
0:14:09.110,0:14:14.939
there you can also bring up anything,[br]ideas, ask questions, I watch that page,
0:14:14.939,0:14:18.149
and, usually, reply. Thank you!
0:14:18.149,0:14:21.049
applause
0:14:21.049,0:14:24.809
postroll music
0:14:24.809,0:14:48.000
Subtitles created by c3subtitles.de[br]in the year 2021. Join, and help us!