36C3 preroll music
Andre Klapper: Alright, thank you. Thanks
for your interest. I'm Andre, I'm with the
Wikimedia Foundation, and one of the
things I'm currently trying to find out is
how to measure the activity of people in our
technical communities. And you probably
know that Wikimedia is a large, large
project. There are more than 900
websites, and there are many areas where you
can contribute, technically, in different
ways. And we're currently trying to get an
overview. And even that is hard.
So, it is a complex task. And in this talk, I would
like to quickly show you what we already
have in place, and what we want to get in
place, and maybe also a little bit of the
problems and the complexity. So, it's more
for your interest, or if you're also
curious to play with technical
metrics, statistics, things like these.
What we have currently is mostly about
git repositories, code repositories, and
we mostly use Gerrit for code review. We
have our own Gerrit instance at
gerrit.wikimedia.org. And for this we
have a platform at
wikimedia.biterg.io. If you've seen a
standard Elasticsearch and Kibana platform,
this might be familiar to you. It
is all Free and Open Source, it's actually
a Linux Foundation project, you can find
it under chaoss.community, chaoss with
double s, and the code base is public on
GitHub. So any other free and open source
software project can also set this up for
themselves. We have it hosted by Bitergia,
but it is also possible to set it up
yourself, if you're interested in
gathering statistics about your Free and
Open Source project. And there's also a
documentation page on MediaWiki.org which
is called "Community metrics". I think I
have screenshots here, because I never
trust the Internet at conferences, but I
could also show you live… So this is the
GitHub page of the CHAOSS project by the
Linux Foundation, where you can get the
code. This is, I hope the zoom is
sufficient, wikimedia.biterg.io. So this is
the overview page. You can see the
navigation up here, and you get some basic
statistics about the most active people in
the git repositories, which organizations
we have, so here you can see Wikimedia
Foundation, individuals, Hallo Welt!,
Wikimedia Deutschland. So this
is the contributor base we have, by
organization, by affiliation. And down
here there are way more statistics: Git,
Gerrit, mailing lists, we index a lot of
things. We also index a little bit of our
issue tracking system, which is
Phabricator, and some edits on
MediaWiki.org. And, for example, now, if I
go to Gerrit and the overview page,
because we use Gerrit for code review,
there are more specific statistics, and as
it's based on Elasticsearch and Kibana, you
might know this if you've played with
it: whenever you click on a certain
value, you can filter by that value. So,
for example, if I use the pie chart here,
and only want to see the numbers for
independent volunteer contributors,
I click it, and you see the numbers now
change. Obviously a bit lower, and you see
up here, that a filter has been applied,
and you can continue with these things.
Then you can also filter here by code
repository, for example, the MediaWiki
core repository. If I click on that one,
it also filters for that value, and you can
basically drill down to the statistics you
want to gather here. And there's, as I
only have 15 minutes, there's way more
things you can find out here, also, for
example, who reviews patches in Gerrit,
how long patches have been open, median
time, all these things you might want to
gather to find out how well we are doing
as a project, when it comes to both
involving volunteers and giving them
the feedback and engagement in code
review that you would like to
give. Or, also, areas for improvement. For
example, at the Wikimedia Foundation we
obviously have engineering teams, and some of
them maintain certain code repositories,
so you can filter the view for certain
code repositories, and then you
sometimes realize, for example, that
patches written by volunteers take
longer to review than patches written
by your coworkers. These are the kinds of
things which you maybe already assumed,
but it's nice to actually have data.
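
As a rough illustration of the comparison described here, the following Python sketch computes the median time from patch upload to first review, grouped by the author's affiliation. The record layout, the affiliation labels, and the sample values are hypothetical; they are not the schema or the data behind wikimedia.biterg.io.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical sample records: (author affiliation, patch uploaded, first review).
PATCHES = [
    ("Independent", "2019-11-01T10:00", "2019-11-05T10:00"),
    ("Independent", "2019-11-02T09:00", "2019-11-04T09:00"),
    ("Wikimedia Foundation", "2019-11-01T10:00", "2019-11-01T16:00"),
    ("Wikimedia Foundation", "2019-11-03T08:00", "2019-11-04T08:00"),
    ("Wikimedia Foundation", "2019-11-03T08:00", "2019-11-03T20:00"),
]


def median_review_hours(patches):
    """Return {affiliation: median hours between upload and first review}."""
    waits = defaultdict(list)
    for affiliation, uploaded, reviewed in patches:
        delta = datetime.fromisoformat(reviewed) - datetime.fromisoformat(uploaded)
        waits[affiliation].append(delta.total_seconds() / 3600)
    return {aff: median(hours) for aff, hours in waits.items()}


if __name__ == "__main__":
    for aff, hours in sorted(median_review_hours(PATCHES).items()):
        print(f"{aff}: {hours:.1f} h")
```

The median is used rather than the mean because a few patches that stay open for a very long time would otherwise dominate the figure.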
There are also a few caveats here. So, for
example, I usually don't use the git
statistics, because Gerrit is where the
code review happens. And once a patch
proposed in Gerrit has been accepted and
merged into the git repository, you would
also see that in the git repository, but
as all our software is Open Source, Free
Software, we also of course pull in a lot
of git repositories from other upstream
projects, because we use a lot of software
invented and maintained somewhere else to
run our servers. So the git statistics
also include activity that we've imported
within the git repositories from other
companies. So, that's kind of misleading.
And there are a few more caveats. I hope
all of them are listed on
the community metrics page on
MediaWiki.org, because at some point I had
to create a section "behavior that might
surprise you". That page also has
some examples for the
most common questions I get from
interested people and co-workers, for
instance if you want to publish an annual
report and show how many volunteer
contributors you have in the code bases and
these things. So that is what we have. These
were the screenshots in case the Wi-Fi
doesn't work. And now the section, what is
patchwork. A spoiler: Basically everything
else. Because this was the look at git and
git repositories and Gerrit for code
review. But there is way more going on
when it comes to technical contributions
and code in Wikimedia. There is GitHub.
So, we have some projects, quite a few,
that don't use Wikimedia git, Wikimedia
Gerrit, but they prefer GitHub, because
it's a different contribution system or
workflow. So, we already track some of
that, but we still have to find a better
way to even find all the
repositories related to Wikimedia
development on GitHub, because they're not
all under the same organization. When it
comes to what I just showed you,
wikimedia.biterg.io, we define what is
being indexed in a public JSON file,
"projects". So, this is also linked from
the community metrics page on
MediaWiki.org, where we basically define
what gets indexed. And it's a long
list, as you can see, also with some
mailing lists. But there's also a lot of code
actually on the wikis, inside of wiki
pages. So, there are user scripts, there
are gadgets, like small JavaScript things
that enhance functionality, and they're
actually quite common. So, for example,
Wikimedia Commons, or English or German
Wikipedia, they have a lot of gadgets even
enabled by default, which makes some
behavior easier. For example, on Commons a
common gadget is adding a category to a
photo or image that has been uploaded.
That's way easier if you use a gadget
which is enabled by default. There are Lua
modules, and there's templates. For
example the info boxes that you see in
many Wikipedia articles on the side, for
example, if you look up a Wikipedia
article about a person. These are all
templates. And they're all stored on Wiki.
So, this is harder to track, to get a full
overview of that. And some extension code:
we have about 130 MediaWiki
extensions deployed on Wikimedia servers.
But if you only take a look at the
extension home pages on MediaWiki.org,
there are more than 2000. So there's a lot
of code out there, and sometimes this code
is even stored just by pasting it onto
a wiki page, and saying:
here, copy and paste this, and it should
work. Which might not be the best revision
system ever when it comes to maintaining
code, but it's a quick and dirty way, so
these things exist. And one other example,
unknown code repository locations. We also
have something called Toolforge. That's
what some people call "cloud services"
nowadays. So you can host your own little
helper tools, which other people can then
also use, on a cloud services platform
called Toolforge that we offer. One
example would be page views:
if you want to see which pages are the
most popular on some wiki. That's one
example out of thousands of tools
by now, actually. And though, of course, the
rules are that you must publish the source
code, it's sometimes really hard to
make sure that this happens, and to know
where it happens. So for most repositories
we have an index, but for some we
actually don't know, which is also
something to work out. So, recently, even
to get a number, or to get an
idea of what we can measure, what
we have, and how much we have, I started
to create a table, and even visualizing
that was an interesting task. I'm
still not sure if anybody understands
this, but black basically means it doesn't
exist: there is nothing
to measure or to index. Green means, yes,
we do measure this already.
Yellow means it's tricky, but
it's kind of possible via some scripts or
by using the API to get numbers out of the
wikis in certain namespaces, for example
the Module namespace. And red means it's
very hard, but we'd like to get this data
at some point. Plus, also the complexity,
so the numbers you see here are sometimes
correct numbers, sometimes more of a
vague ballpark figure about how many
items, code repositories, projects we're
actually talking about. And with some
numbers, we're even wondering. For
example, it says 270 000 modules and
templates on the 900 websites
we have on Wikimedia servers, and this is
what the database query says on Hive, but
we're not really trusting that number yet.
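
One way to cross-check such a number for a single wiki is to page through the MediaWiki Action API with `list=allpages`, restricted to one namespace. The sketch below is an assumption about how that could look, not the query used for the table in this talk: the function names and the User-Agent string are made up for illustration, 828 is the namespace ID of Module pages, and the count includes redirects unless they are filtered out.

```python
import json
import urllib.parse
import urllib.request

MODULE_NAMESPACE = 828  # namespace ID of "Module:" pages (Lua modules)


def fetch_json(api_url, params):
    """Perform one GET request against a MediaWiki Action API endpoint."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "namespace-count-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_pages(api_url, namespace, fetch=fetch_json):
    """Count the pages of one namespace by paging through list=allpages."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "aplimit": "max",
        "format": "json",
    }
    total = 0
    while True:
        data = fetch(api_url, params)
        total += len(data["query"]["allpages"])
        if "continue" not in data:
            return total
        # The response carries the parameters needed to request the next batch.
        params = {**params, **data["continue"]}
```

Calling `count_pages("https://www.mediawiki.org/w/api.php", MODULE_NAMESPACE)` would count the Lua modules on one wiki; repeating that for about 900 sites is slow, which is one reason a database query is attractive for the overall figure.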
So, this is actually what we're going to
be after over the next months to also have
way better data, and a way better overview
of where our developers actually are.
Because we know, in code repositories, we
have about 200 to 400 code contributors,
in Gerrit code review, per month.
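
A figure like "contributors per month" comes down to counting distinct authors in each calendar month. A minimal Python sketch, assuming hypothetical (author, upload date) pairs rather than real Gerrit data:

```python
from collections import defaultdict

# Hypothetical (author, ISO date of patch upload) pairs.
CHANGES = [
    ("alice", "2019-10-03"),
    ("bob", "2019-10-17"),
    ("alice", "2019-10-29"),
    ("alice", "2019-11-02"),
    ("carol", "2019-11-11"),
    ("dave", "2019-11-30"),
]


def monthly_contributors(changes):
    """Return {"YYYY-MM": number of distinct authors active in that month}."""
    authors = defaultdict(set)
    for author, date in changes:
        authors[date[:7]].add(author)  # "YYYY-MM" prefix of the ISO date
    return {month: len(names) for month, names in sorted(authors.items())}
```

Using a set per month means an author with many patches in the same month is still counted once.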
And we now also know that we have about 500 to
600 people who work on user scripts and
gadgets, per year. But for many other
things, we don't know yet, and that's what
I'm trying to improve over the next
months, or, maybe realistically, years.
Let's see. But, yeah. So, that's basically
it. I hope this was a bit interesting.
If you have any comments, questions, feel
free to catch me here. I'm sometimes
around the table. Feel free to catch me
after this talk. These are links with more
information, or, if you don't manage to
catch me, also feel free to use the community
metrics page on MediaWiki.org, the first
link: there is a discussion page, and
there you can also bring up anything,
ideas, questions. I watch that page
and usually reply. Thank you!
applause
postroll music
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!