-
36C3 preroll music
-
Andre Klapper: Alright, thank you. Thanks
for your interest. I'm Andre, I'm with the
-
Wikimedia Foundation, and one of the
things I'm currently trying to find out is
-
how to measure the activity of people in our
technical communities. And you probably
-
know that Wikimedia is a large, large
project. There's like more than 900
-
websites, and there's many areas where you
can contribute, technically, in different
-
ways. And we're currently trying to get an
overview. And even that is hard.
-
So, it is a complex task. And in this talk, I would
like to quickly show you what we already
-
have in place, and what we want to get in
place, and maybe also little bits of the
-
problems and the complexity. So, it's more
like, for your interest, or if you're
-
curious also to play with technical
metrics, statistics, things like these.
-
What we currently have is mostly about
git repositories, code repositories, and
-
we mostly use Gerrit for code review. We
have our own Gerrit instance at
-
gerrit.wikimedia.org. And for this we've
been having a platform called
-
wikimedia.biterg.io. If you've seen an
Elasticsearch, Kibana, standard platform
-
thingy, this might be familiar to you. It
is all Free and Open Source, it's actually
-
a Linux Foundation project, you can find
it under chaoss.community, chaoss with
-
double s, and the code base is public on
GitHub. So any other free and open source
-
software project can also set this up for
themselves. We have it hosted by Bitergia,
-
but this is also possible to set up
yourself, if you're interested in
-
gathering statistics about your Free and
Open Source project. And there's also a
-
documentation page on MediaWiki.org which
is called community metrics. I think I
-
have screenshots here, because I never
trust the Internet at conferences, but I
-
could also show you live… so this is the
GitHub page of the CHAOSS project by the
-
Linux Foundation where you could get the
code. This is, I hope the zoom is
-
sufficient, wikimedia.biterg.io. So this is
the overview page. You can see the
-
navigation up here, and you get some basic
statistics about the most active people in
-
the git repositories, which organizations
we have, so here you can see Wikimedia
-
Foundation, individuals, Hallo Welt,
Wikimedia Deutschland. So these are, this
-
is the contributor base we have, by
organization, by affiliation. And down
-
here there's way more statistics, Git,
Gerrit, mailing lists, we index a lot of
-
things. We also index a little bit of our
issue tracking system, which is
-
Phabricator, and some edits on
MediaWiki.org. And, for example, now, if I
-
go to Gerrit and the overview page,
because we use Gerrit for code review,
-
they have more specific statistics, and as
it's Elasticsearch, Kibana based, you
-
might know this if you've played with
this, whenever you click on a certain
-
value, you can filter by that value. So,
for example, if I use the pie chart here,
-
and only want to see the numbers for
independent volunteer contributors,
-
I click it, and you see the numbers now
change. Obviously a bit lower, and you see
-
up here, that a filter has been applied,
and you can continue with these things.
-
Then you can go filter here also via code
repository, for example, the MediaWiki
-
core repository. If I click on that one,
it also filters for the value, and you can
-
basically drill down the statistics you
want to gather here. And there's, as I
-
only have 15 minutes, there's way more
things you can find out here, also, for
-
example, who reviews patches in Gerrit,
how long patches have been open, median
-
time, all these things you might want to
gather to find out how well are we doing
-
as a project, when it comes to both
involving volunteers, and also giving them
-
the feedback when it comes to code review,
and engagement, that you would like to
-
give. Or, also, areas for improvement. For
example, in Wikimedia Foundation obviously
-
we have engineering teams, and some of
them maintain certain code repositories,
-
so you can filter the view for certain
code repositories, and then see, for
-
example, you realize sometimes that
patches written by volunteers, it takes
-
longer to review them than patches written
by your coworkers. And these kinds of
-
things which you maybe already assumed,
but it's nice to actually have data.
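
The volunteer-versus-coworker comparison described above can be sketched roughly as follows. This is only an illustration: the record layout, the affiliation labels, and the sample numbers are assumptions made up for the example, not data or field names from wikimedia.biterg.io.

```python
from collections import defaultdict
from statistics import median

# Hypothetical review records: (author affiliation, hours until first review).
# These values are invented for illustration only.
SAMPLE_REVIEWS = [
    ("volunteer", 72.0),
    ("volunteer", 30.0),
    ("volunteer", 120.0),
    ("staff", 4.0),
    ("staff", 10.0),
    ("staff", 6.0),
]


def median_wait_by_affiliation(reviews):
    """Group waiting times by affiliation and return the median per group."""
    grouped = defaultdict(list)
    for affiliation, hours in reviews:
        grouped[affiliation].append(hours)
    return {affiliation: median(hours) for affiliation, hours in grouped.items()}


result = median_wait_by_affiliation(SAMPLE_REVIEWS)
for affiliation, hours in sorted(result.items()):
    print(f"{affiliation}: median wait {hours} hours")
```

The median is used here rather than the mean because a handful of patches that stay open for a very long time would otherwise dominate the figure.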
-
There's also a few caveats here. So, for
example, I usually don't use the git
-
statistics, because Gerrit is where the
code review happens. And once a patch
-
proposed in Gerrit has been accepted and
merged in the git repository, you would
-
also see that in the git repository, but
as all our software is Open Source, Free
-
Software, we also of course pull in a lot
of git repositories from other upstream
-
projects, because we use a lot of software
invented and maintained somewhere else to
-
run our servers. So the git statistics
also include activity that we've imported
-
within the git repositories from other
companies. So, that's kind of misleading.
-
And there's a few more caveats, which are
actually, I hope all of them are listed on
-
the community metrics page on
MediaWiki.org, because at some point I had
-
to create a section "behavior that might
surprise you". It also, that page also has
-
some examples like, how can I, for the
most common questions I get from
-
interested people, and also co-workers,
or, you want to publish an annual report,
-
and show how many volunteer contributors
you have in the code bases and these
-
things. So that is what we have. These
were the screenshots in case the Wi-Fi
-
doesn't work. And now the section, what is
patchwork. A spoiler: Basically everything
-
else. Because this was the look at git and
git repositories and Gerrit for code
-
review. But there is way more going on
when it comes to technical contributions
-
and code in Wikimedia. There is GitHub.
So, we have some projects, quite a few,
-
that don't use Wikimedia git, Wikimedia
Gerrit, but they prefer GitHub, because
-
it's a different contribution system or
workflow. So, we already track some of
-
that, but we still have to improve even
finding a way how to find all the
-
repositories related to Wikimedia
Development on GitHub. Because they're not
-
all under the same organization. When it
comes to what I just showed you,
-
wikimedia.biterg.io, we define what is
being indexed in a public JSON file,
-
"projects". So, this is also linked from
the community metrics page on
-
mediawiki.org, where we define basically
what's, what gets indexed. And it's a long
-
list, as you can see, also some
mailing lists, but there's a lot of code
-
actually on the Wikis. Inside of Wiki
pages. So, there are user scripts, there
-
are gadgets, like small JavaScript things
that enhance functionality, and they're
-
actually quite common. So, for example,
Wikimedia Commons, or English or German
-
Wikipedia, they have a lot of gadgets even
enabled by default, which makes some
-
behavior easier. For example, on Commons a
common gadget is adding a category to a
-
photo or image that has been uploaded.
That's way easier if you use a gadget
-
which is enabled by default. There are Lua
modules, and there's templates. For
-
example the info boxes that you see in
many Wikipedia articles on the side, for
-
example, if you look up a Wikipedia
article about a person. These are all
-
templates. And they're all stored on Wiki.
So, this is harder to track, to get a full
-
overview of that. And some extension code,
even. We have about 130 MediaWiki
-
extensions deployed on Wikimedia servers.
But if you take a look only at the
-
extension home pages on MediaWiki.org,
there are more than 2000. So there's a lot
-
of code out there, and sometimes this code
is even stored just by copy and paste
-
putting it on a Wiki page, and saying:
here, copy and paste this, and it should
-
work. Which might not be the best revision
system when it comes to maintaining code,
-
ever, but it's a quick and dirty way, so
these things exist. And one other example,
-
unknown code repository locations. We also
have something called Toolforge. That's
-
what some people call "cloud services"
nowadays. So you can host your own little
-
helper tools which other people then can
also use, on a cloud services platform
-
called Toolforge that we offer. One
example would be page views.
-
So, if you want to see which pages are the
most popular on some Wiki, that's one
-
example out of, also thousands of tools
now actually. And though, of course, the
-
rules are that you must publish the source
code, it's sometimes really hard to also
-
make sure that this happens, and where it
happens. So for most repositories, we
-
know, we have an index, but for some we
actually don't know, which is also
-
something to work out. So, recently, even
getting a number of things, or getting an
-
idea, like, what can we measure, what
do we have, how much do we have, I started
-
to create a table, and even visualizing
that was, was an interesting task. I'm
-
still not sure if anybody understands
this, but black basically means doesn't
-
exist. You don't need to, there is nothing
to, to measure, to index. Green means, yes
-
we do measure this already. And
yellow means it's tricky, but
-
it's kind of possible via some scripts or
using the API to get numbers out of the
-
Wikis, in certain namespaces, for example
the Module namespace. And red means, it's
-
very hard, but we'd like to get this data
at some point. Plus, also the complexity,
-
so the numbers you see here are sometimes
correct numbers, sometimes more of a
-
ballpark vague figure about how many
items, code repositories, projects we're
-
actually talking about. And with some
numbers, we're even wondering. For
-
example, it says 270 000 modules and
templates on the 900 sites, websites
-
we have on Wikimedia servers, and this is
what the database query says on Hive, but
-
we're not really trusting that number yet.
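
One way to cross-check such a number for a single wiki is to count the pages in a namespace through the MediaWiki Action API. The following is a minimal sketch, not the script used for the table: it assumes the standard `list=allpages` query module and namespace 828 for Module pages, and the `fetch_json` parameter is a hypothetical callable (for example, a thin wrapper around an HTTP client) that takes a URL and returns the decoded JSON response.

```python
from urllib.parse import urlencode

MODULE_NAMESPACE = 828  # "Module:" namespace on wikis with Lua modules


def build_allpages_url(api_base, namespace, continue_from=None):
    """Build an Action API URL that lists pages in one namespace."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "aplimit": "max",
        "format": "json",
    }
    if continue_from is not None:
        # Resume where the previous batch of results stopped.
        params["apcontinue"] = continue_from
    return api_base + "?" + urlencode(params)


def count_pages(fetch_json, api_base, namespace):
    """Count pages in a namespace, following API continuation.

    fetch_json is a hypothetical callable: URL in, decoded JSON dict out.
    """
    total = 0
    continue_from = None
    while True:
        data = fetch_json(build_allpages_url(api_base, namespace, continue_from))
        total += len(data["query"]["allpages"])
        continue_from = data.get("continue", {}).get("apcontinue")
        if continue_from is None:
            return total
```

Called as `count_pages(my_fetcher, "https://commons.wikimedia.org/w/api.php", MODULE_NAMESPACE)`, this would walk one wiki at a time; covering all sites would mean repeating it per wiki, which is part of why a database query is attractive.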
So, this is actually what we're going to
-
be after over the next months to also have
way better data, and a way better overview
-
of where our developers actually are.
Because we know, in code repositories, we
-
have about 200 to 400 code contributors,
in Gerrit code review, per month.
-
And we now also know that we have about 500,
600 people who work on user scripts and
-
gadgets, per year. But for many other
things, we don't know yet, and that's what
-
I'm trying to improve over the next
months, or, maybe realistically, years.
-
Let's see. But, yeah. So, that's basically
it. I hope this was a bit interesting.
-
If you have any comments, questions, feel
free to catch me here. I'm sometimes
-
around the table. Feel free to catch me
after this talk. These are links with more
-
information, or, if you don't manage to
catch me, feel free to go to the community
-
metrics page on MediaWiki.org, the first
link, there is a discussion page, and
-
there you can also bring up anything,
ideas, ask questions, I watch that page,
-
and, usually, reply. Thank you!
-
applause
-
postroll music
-
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!