0:00:00.000,0:00:20.510
36C3 preroll music
0:00:20.510,0:00:24.750
Daniel: Good morning! I'm glad you all[br]made it here this early on the last day. I
0:00:24.750,0:00:32.439
know it can't be easy, it wasn't easy for[br]me. I have to warn you that the way I
0:00:32.439,0:00:36.160
prepared for this talk is a bit[br]experimental. I didn't make a slide set, I
0:00:36.160,0:00:44.559
just made a mind map and I'll just click[br]through it while I talk to you. So,
0:00:44.559,0:00:51.180
this talk is about modernizing Wikipedia.[br]As you have probably noticed, visiting
0:00:51.180,0:00:58.500
Wikipedia can feel a bit like visiting a[br]website from 10-15 years ago but before I
0:00:58.500,0:01:05.280
talk about any problems or things to[br]improve, I first want to revisit that the
0:01:05.280,0:01:11.619
software and the infrastructure we[br]built around it has been running Wikipedia
0:01:11.619,0:01:20.160
and its sister sites for the last... well[br]nearly 19 years now and it's extremely
0:01:20.160,0:01:32.200
successful. We serve 17 billion page[br]views a month, yes?
0:01:32.200,0:01:40.870
Person in the audience: Could you make it[br]louder or speak up and also make the image
0:01:40.870,0:01:42.870
bigger?
0:01:42.870,0:01:43.870
inaudible dialogue
0:01:43.870,0:01:45.870
Daniel: Is this better? Like if I speak up[br]I will lose my voice in 10 minutes, it's
0:01:45.870,0:01:55.720
already in it, no it's fine. We have[br]technology for this. I can... the light
0:01:55.720,0:02:05.490
doesn't help, yeah the contrast could be[br]better. Is it better like this? Okay cool.
0:02:05.490,0:02:13.840
All right so yeah we are serving 17[br]billion page views a month, which is quite
0:02:13.840,0:02:19.560
a lot. Wikipedia exists in about 100[br]languages. If you attended the talk about
0:02:19.560,0:02:24.250
the Wikimedia infrastructure yesterday, we[br]talked about 300 languages. We actually
0:02:24.250,0:02:29.989
support 300 languages for localization but[br]we have Wikipedia in about 100, if I'm not
0:02:29.989,0:02:38.689
completely off. I find this picture quite[br]fascinating. This is a visualization of
0:02:38.689,0:02:43.719
all the places in the world that are[br]described on Wikipedia and sister projects
0:02:43.719,0:02:49.319
and I find this quite impressive although[br]it's also a nice display of cultural bias
0:02:49.319,0:03:00.790
of course. We, that is the Wikimedia[br]Foundation, run about 900 to 1,000 wikis
0:03:00.790,0:03:06.680
depending on how you count, but there are[br]many many more MediaWiki installations
0:03:06.680,0:03:11.459
out there, some of them big and many many[br]of them small. We have actually no idea
0:03:11.459,0:03:17.150
how many small instances there are. So[br]it's a very powerful very flexible and
0:03:17.150,0:03:23.730
versatile piece of software but, you know,[br]sometimes it can feel like... you can do a
0:03:23.730,0:03:28.329
lot of things with it, right, but[br]sometimes it feels like it's a bit
0:03:28.329,0:03:42.180
overburdened and maybe you should look at[br]improving the foundations. So one of the
0:03:42.180,0:03:47.829
things that make MediaWiki great but also[br]sometimes hard to use is that kind of
0:03:47.829,0:03:52.609
everything is text, everything is markup,[br]everything is done with wikitext,
0:03:52.609,0:04:02.529
which has grown in complexity over the[br]years, so if you look at the anatomy of a
0:04:02.529,0:04:09.159
wiki page it can be a bit daunting. You[br]have different syntax for markup and
0:04:09.159,0:04:16.150
different kinds of transclusion or[br]templates and media and some things
0:04:16.150,0:04:21.739
actually, you know, get displayed in[br]place, some things show up in a completely
0:04:21.739,0:04:26.340
different place on the page it can be[br]rather confusing and daunting for
0:04:26.340,0:04:31.720
newcomers. And also things like having a[br]conversation just talking to people like,
0:04:31.720,0:04:35.540
you know, having a conversation thread[br]looks like this. You open the page you
0:04:35.540,0:04:40.510
look through the markup and you indent to[br]make a conversation thread and then you
0:04:40.510,0:04:43.480
get confused about the indenting and[br]someone messes with the formatting and
0:04:43.480,0:04:52.120
it's all excellent. There have been many[br]attempts over the years to improve the
0:04:52.120,0:05:00.290
situation, we have things like Echo which[br]notifies you, for instance when someone
0:05:00.290,0:05:09.130
mentions your name or someone... It is[br]also used to welcome people and do this
0:05:09.130,0:05:12.400
kind of achievement unlocked[br]notifications: hey, you did your first
0:05:12.400,0:05:19.900
edit, this is great, welcome! To make[br]people a bit more engaged with the system
0:05:19.900,0:05:24.380
but it's really mostly improvements around[br]the fringes. We have had a system called
0:05:24.380,0:05:31.350
Flow for a while to improve the way[br]conversations work. So you have more like
0:05:31.350,0:05:37.960
a thread structure that the software[br]actually knows about but then there are
0:05:37.960,0:05:42.160
many, well quite a few people who have[br]been around for a while that are very used
0:05:42.160,0:05:46.900
to the manual system and also there's a[br]lot of tools to support this manual system
0:05:46.900,0:05:52.780
which of course are incompatible with[br]making things more modern. So we use this
0:05:52.780,0:05:56.250
for instance on MediaWiki.org which is a[br]site which is basically a self
0:05:56.250,0:06:03.000
documentation site of MediaWiki but on[br]most Wikipedias this is not enabled or at
0:06:03.000,0:06:14.530
least not used by default everywhere. The[br]biggest attempt to move away from the text
0:06:14.530,0:06:23.050
only approach is Wikidata, which we[br]started in 2012. The idea of Wikidata of
0:06:23.050,0:06:29.580
course, if you didn't attend the many great[br]talks we had about it here over the
0:06:29.580,0:06:36.470
course of the Congress, is a way to[br]basically model the world using structured
0:06:36.470,0:06:45.470
data, using a semantic approach instead of[br]natural language which has its own
0:06:45.470,0:06:50.740
complexities but at least it's a way to[br]represent the knowledge of the world in a
0:06:50.740,0:06:56.790
way that machines can understand. So this[br]would be an alternative to wikitext but
0:06:56.790,0:07:09.389
still the vast majority of things[br]especially on Wikipedia are just markup.
0:07:09.389,0:07:13.800
And this markup is pretty powerful and[br]there's lots of ways to extend it and to
0:07:13.800,0:07:21.050
do things with it. So a lot of things on[br]MediaWiki are just DIY, just do it
0:07:21.050,0:07:29.250
yourself. Templates are a great example of[br]this. Infoboxes of course, the nice blue
0:07:29.250,0:07:34.730
boxes here on the right side of pages, are[br]done using templates but these templates
0:07:34.730,0:07:39.090
are just for formatting, there is no data[br]processing, there's no database or
0:07:39.090,0:07:47.530
structured data backing them. It's just[br]basically, you know, it's still just
0:07:47.530,0:07:56.630
markup. It's still... you have a predefined[br]layout but you're still feeding it text, not
0:07:56.630,0:08:04.520
data. You have parameters but the values[br]of the parameters are still again maybe
0:08:04.520,0:08:11.610
templates or links or you have markup in[br]them, like you know HTML line breaks and
0:08:11.610,0:08:18.860
stuff. So it's kind of semi structured.[br]And this of course is also used to do
0:08:18.860,0:08:24.100
things like workflow. The template... Oh[br]no, this was actually an infobox, wrong
0:08:24.100,0:08:34.229
picture, wrong capture. This is also used[br]to do workflows, so if a page on Wikipedia
0:08:34.229,0:08:39.789
gets nominated for deletion you manually[br]put a template on the page that defines
0:08:39.789,0:08:44.870
why this is supposed to be deleted and[br]then you have to go to a different page
0:08:44.870,0:08:49.390
and put a different template there, giving[br]more explanation and this again is used
0:08:49.390,0:08:55.149
for discussion. It's a lot of structure[br]created by the community and maintained by
0:08:55.149,0:09:02.730
the community, using conventions and tools[br]built on top of what is essentially just a
0:09:02.730,0:09:10.620
pile of markup. And because doing all this[br]manually is kind of painful, early on
0:09:10.620,0:09:17.360
we created a system to allow people to add[br]JavaScript to the site, which is then
0:09:17.360,0:09:27.019
maintained on wiki pages by the community[br]and it can tweak and automate. But again,
0:09:27.019,0:09:30.589
it doesn't really have much to work with,[br]right? It basically messes with whatever
0:09:30.589,0:09:35.470
it can, it directly interacts with the DOM[br]of the page, whenever the layout of the
0:09:35.470,0:09:41.040
software changes, things break. So this is[br]not great for compatibility but it's
0:09:41.040,0:09:54.730
used a lot and it is very important for[br]the community to have this power. Sorry, I
0:09:54.730,0:10:00.110
wish there was a better way to show these[br]pictures. Okay, that's just to give you an
0:10:00.110,0:10:05.220
idea of what kind of thing is implemented[br]that way and maintained by the community
0:10:05.220,0:10:10.189
on their site. One of the problems we have[br]with that is: these are bound to a wiki
0:10:10.189,0:10:19.410
and I just told you that we run over 900[br]of these not over 9,000 and it would be
0:10:19.410,0:10:26.300
great if you could just share them between[br]wikis but we can't. And again, there have
0:10:26.300,0:10:30.790
been... we have been talking about it a[br]lot and it seems like it shouldn't be so
0:10:30.790,0:10:36.759
hard, but you kind of need to write these[br]tools differently, if you want to share
0:10:36.759,0:10:39.899
them across sites, because different sites[br]use different conventions, they use
0:10:39.899,0:10:45.529
different templates. Then it just doesn't[br]work and you actually have to write decent
0:10:45.529,0:10:50.970
software that uses internationalization if[br]you want to use it across wikis. While
0:10:50.970,0:10:55.019
these are usually just, you know, one-off[br]hacks with everything hard-coded, we would
0:10:55.019,0:10:58.450
have to put in place an[br]internationalization system and it's
0:10:58.450,0:11:02.910
actually a lot of effort and there's a lot[br]of things that are actually unclear about
0:11:02.910,0:11:15.260
it. So, before I dive more deeply into the[br]different things that will make it hard to
0:11:15.260,0:11:20.529
improve on the current situation and the[br]things that we are doing to improve it, do
0:11:20.529,0:11:27.309
we have any questions or do you have any[br]other - do you have any things you may
0:11:27.309,0:11:34.519
find particularly, well, annoying or[br]particularly outdated, when interacting
0:11:34.519,0:11:40.920
with Wikipedia? Any thoughts on that?[br]Beyond what I just said?
0:11:40.920,0:11:48.769
Microphone: The strict separation, just in[br]Wikipedia, between mobile layout and
0:11:48.769,0:11:54.259
desktop layout.[br]Daniel: Yeah. So, actually having a
0:11:54.259,0:12:02.069
reactive layout system that would just[br]work for mobile and desktop in the same
0:12:02.069,0:12:09.130
way and allowing the designers and UX[br]experts, who work on the system to just do
0:12:09.130,0:12:15.180
this once and not two or maybe even three[br]times - because of course we also have
0:12:15.180,0:12:20.550
native applications for different[br]platforms - would be great and it's
0:12:20.550,0:12:24.360
something that we're looking into at the[br]moment. But it's not, you know, it's not
0:12:24.360,0:12:29.519
that easy. We could build a completely new[br]system that does this, but then again you
0:12:29.519,0:12:33.249
would be telling people: "You can no[br]longer use the old system", but now they
0:12:33.249,0:12:39.019
have built all these tools that rely on[br]how the old system works and you have to
0:12:39.019,0:12:52.089
port all of this over so there's a lot of[br]inertia. Any other thoughts? Everyone is
0:12:52.089,0:13:03.720
still asleep, that's excellent. So I can[br]continue. So, another thing that makes it
0:13:03.720,0:13:10.879
difficult to change how MediaWiki works or[br]to improve it is that we are trying to do
0:13:10.879,0:13:19.180
well to be at least two things at once: on[br]the one hand we are running a top 5
0:13:19.180,0:13:24.360
website and serving over 100,000 requests[br]per second using the system and on the
0:13:24.360,0:13:30.540
other hand, at least until now, we have[br]always made sure that you can just
0:13:30.540,0:13:33.800
download MediaWiki and install it on a[br]shared hosting platform you don't even
0:13:33.800,0:13:38.920
need root on the system, right? You don't[br]even need administrative privileges you
0:13:38.920,0:13:44.769
can just set it up and run it in your web[br]space and it will work. And, having the
0:13:44.769,0:13:51.779
same piece of software do both, run in a[br]minimal environment and run at scale, is
0:13:51.779,0:13:55.040
rather difficult and it also means that[br]there's a lot of things that we can't
0:13:55.040,0:14:02.110
easily do, right? All this modern[br]microservice architecture, separate front-end
0:14:02.110,0:14:09.309
and back-end systems, all of that means[br]that it's a lot more complicated to set up
0:14:09.309,0:14:15.720
and needs more knowledge or more[br]infrastructure to set up and so far that
0:14:15.720,0:14:19.500
meant we can't do it, because so far there[br]was this requirement that you should
0:14:19.500,0:14:23.569
really be able to just run it on your[br]shared hosting. And we are currently
0:14:23.569,0:14:29.639
considering to what extent we can continue[br]this, I mean, container based hosting is
0:14:29.639,0:14:34.620
picking up. Maybe this is an alternative,[br]it's still unclear, but it seems like this
0:14:34.620,0:14:45.999
is something that we need to reconsider.[br]Yeah, but if we make this harder to do
0:14:45.999,0:14:52.739
then a lot of current users of MediaWiki[br]would maybe not, well, maybe no longer
0:14:52.739,0:14:57.230
exist or at least would not exist as they[br]do now, right. You probably have seen
0:14:57.230,0:15:05.259
this nice MediaWiki instance the Congress[br]wiki. Which - with a completely customized
0:15:05.259,0:15:09.689
skin and a lot of extensions installed to[br]allow people to define their sessions
0:15:09.689,0:15:14.410
there and making sure these sessions[br]automatically get listed and get put into
0:15:14.410,0:15:20.660
a calendar - this is all done using[br]extensions, like Semantic MediaWiki, that
0:15:20.660,0:15:34.279
allow you to basically define queries in[br]the wikitext markup. Yeah, another thing
0:15:34.279,0:15:42.079
that, of course, slows down development is[br]that Wikimedia does engineering on a,
0:15:42.079,0:15:48.130
well, comparatively a shoestring budget,[br]right? The budget of the Wikimedia
0:15:48.130,0:15:52.199
Foundation, the annual budget is something[br]like a hundred million dollars, that
0:15:52.199,0:15:58.009
sounds like a lot of money, but if you[br]compare it to other companies running a
0:15:58.009,0:16:03.209
top five or top ten website it's like two[br]percent of their budget or something like
0:16:03.209,0:16:10.769
that, right? It's really, I mean, 100[br]million is not peanuts but compared to
0:16:10.769,0:16:16.699
what other companies invest to achieve[br]this kind of goal it kind of is. So, what
0:16:16.699,0:16:22.230
this budget translates into is something[br]like 300, depending on how you count,
0:16:22.230,0:16:28.800
between three hundred and four hundred[br]staff. So, this is the people who run all
0:16:28.800,0:16:32.189
of this, including all the community[br]outreach all the social aspects all the
0:16:32.189,0:16:40.920
administrative aspects. Less than half of[br]these are the engineers who do all this.
0:16:40.920,0:16:50.989
And we have like, something like 2,500[br]servers, bare-metal, so, which is not a
0:16:50.989,0:16:57.619
lot for this kind of thing. Which also[br]means that we have to design the software
0:16:57.619,0:17:07.079
to be not just scalable but also quite[br]efficient. The modern approach to scaling
0:17:07.079,0:17:11.640
is usually: scale horizontally, make it so[br]you can just spin up another virtual
0:17:11.640,0:17:19.280
machine in some cloud service, but, yeah,[br]we run our own service, we run our own
0:17:19.280,0:17:24.440
servers, so we can design to scale[br]horizontally, but it means ordering
0:17:24.440,0:17:32.070
hardware and setting it up and it's going[br]to take half a year or so. And we don't
0:17:32.070,0:17:38.390
actually have that many people who do[br]this, so, scalability and performance are
0:17:38.390,0:17:49.000
also important factors when designing the[br]software. Okay. Before I dive into what we
0:17:49.000,0:18:03.860
are actually doing - any questions? This[br]one in the back. Wait for the mic, please.
0:18:03.860,0:18:07.330
In the very...[br]Q: Hi!
0:18:07.330,0:18:12.950
Daniel: Hello.[br]Q: So, you said you don't have that many
0:18:12.950,0:18:22.990
people, but how many do you actually have?[br]Daniel: For... it's something like 150 engineers
0:18:22.990,0:18:27.170
worldwide. It always depends on what you[br]count, right? So you count the people, who
0:18:27.170,0:18:32.260
- do you count engineers, who work on the[br]native apps, do you count engineers, who
0:18:32.260,0:18:36.980
work on the Wikimedia cloud services -[br]actually we do have cloud services, we
0:18:36.980,0:18:41.190
offer them to the community to run their[br]own things, but we don't run our stuff on
0:18:41.190,0:18:45.560
other people's cloud. Yeah, so depending[br]on how you count or something and whether
0:18:45.560,0:18:50.210
you count the people working here in[br]Germany for Wikimedia Germany, which is a
0:18:50.210,0:18:57.760
separate organization technically - it's[br]something like 150 engineers.
0:18:57.760,0:19:08.210
Q: Thanks![br]Q: I'm interested: What are the reasons
0:19:08.210,0:19:13.880
that you don't run on other people's[br]services like on the cloud. I mean, then
0:19:13.880,0:19:17.090
it will be easy to scale horizontally,[br]right?
0:19:17.090,0:19:25.330
Daniel: There's, well, one reason is being[br]independent, right? If we, yeah, I imagine
0:19:25.330,0:19:32.350
we ran all our stuff on Amazon's[br]infrastructure and then maybe Amazon
0:19:32.350,0:19:38.060
doesn't like the way that the Wikipedia[br]article about Amazon is written - what do
0:19:38.060,0:19:42.050
we do, right? Maybe they shut us down,[br]maybe they make things very expensive,
0:19:42.050,0:19:47.360
maybe they make things very painful for[br]us, maybe there is at least some kind of
0:19:47.360,0:19:54.070
self-censorship mechanism happening and we[br]want to avoid that. There are
0:19:54.070,0:19:58.440
thoughts about this, there are thoughts[br]like maybe we can do this at least for
0:19:58.440,0:20:04.270
development infrastructure and CI, not for[br]production or maybe we can make it so that
0:20:04.270,0:20:12.200
we run stuff in the cloud services by more[br]than one vendor, so we basically we spread
0:20:12.200,0:20:17.860
out so we are not reliant on a single[br]company. We are thinking about these
0:20:17.860,0:20:21.820
things but so far the way to actually stay[br]independent has been to run our own
0:20:21.820,0:20:28.300
servers.[br]Q: You've been talking about scalability
0:20:28.300,0:20:35.490
and changing the architecture, that kind[br]of seems to imply to me that there's a
0:20:35.490,0:20:42.270
problem with scaling at the moment or that[br]it's foreseeable that things are not gonna
0:20:42.270,0:20:46.580
work out if you just keep doing what[br]you're doing at the moment. Can you maybe
0:20:46.580,0:20:52.480
elaborate on that.[br]Daniel: So, there's, I think there's two sides
0:20:52.480,0:20:56.850
to this. On the one hand the reason I[br]mentioned it is just that a lot of things
0:20:56.850,0:21:01.610
that are really easy to do basically for[br]me, right, "works on my machine", are really
0:21:01.610,0:21:08.920
hard to do if you want to do them at[br]scale. That's one aspect. The other aspect
0:21:08.920,0:21:16.670
is MediaWiki is pretty much a PHP monolith[br]and that means scaling it always means
0:21:16.670,0:21:23.680
copying the monolith instead of breaking it down[br]so you have smaller units that you can
0:21:23.680,0:21:29.040
scale and just say, yeah, I don't know, I[br]need more instances for authentication
0:21:29.040,0:21:33.910
handling or something like that. That[br]would be more efficient, right, because
0:21:33.910,0:21:40.730
you have higher granularity, you can just[br]scale the things that you actually need
0:21:40.730,0:21:47.530
but that of course needs rearchitecting.[br]It's not like things are going to explode
0:21:47.530,0:21:52.910
if we don't do that very soon, it's not,[br]so there's not like an urgent problem
0:21:52.910,0:21:58.400
there. The reason for us to rearchitect is[br]more, to gain more flexibility in
0:21:58.400,0:22:03.330
development, because if you have a[br]monolith that is pretty entangled, code
0:22:03.330,0:22:16.130
changes are risky and take a long time.[br]Q: How many people work on product design
0:22:16.130,0:22:25.460
or like user experience research to, like,[br]sit down with users and try to understand
0:22:25.460,0:22:28.440
what their needs are and from there[br]proceed.
0:22:28.440,0:22:33.230
Daniel: Across... I don't have an exact number,[br]something like five.
0:22:33.230,0:22:37.930
Audience: Do you think that's sufficient?[br]Herald: The question was, whether it's
0:22:37.930,0:22:46.800
sufficient. So just...[br]Daniel: Probably not? But it's more than,
0:22:46.800,0:22:50.310
that's more people than we have for[br]database administration, and that's also
0:22:50.310,0:23:06.040
not sufficient.[br]Herald: Are there further questions? I don't
0:23:06.040,0:23:16.270
think so.[br]Daniel: Okay. So, one of the things that
0:23:16.270,0:23:20.320
holds us back a bit, is that there's[br]literally thousands of extensions for
0:23:20.320,0:23:26.870
MediaWiki and the extension mechanism is[br]heavily reliant on hooks, so basically on
0:23:26.870,0:23:39.600
callbacks. And, we have - I don't have a[br]picture, I have a link here - we have a
0:23:39.600,0:23:44.500
great number of these. So, you see, each[br]paragraph is basically documenting one
0:23:44.500,0:23:51.970
callback that you can use to modify the[br]behavior of the software and, I mean,
0:23:51.970,0:23:59.240
there's, I have never counted, but[br]something like a thousand? And all of them
0:23:59.240,0:24:07.520
are of course interfaces to extra - to[br]software that is maintained externally, so
0:24:07.520,0:24:12.611
they have to be kept stable and if you[br]have a large chunk of software that you
0:24:12.611,0:24:16.730
want to restructure but you have a[br]thousand fixed points that you can't
0:24:16.730,0:24:22.960
change, things become rather difficult.[br]It's basi.. yeah, these hook points kind
0:24:22.960,0:24:27.640
of, like, they act like nails in the[br]architecture and then you kind of have to
0:24:27.640,0:24:36.650
wiggle around them - it's fun. We are[br]working to change that. We want to
0:24:36.650,0:24:43.950
architect it so the interface that is[br]exposed to these hooks becomes much more
0:24:43.950,0:24:51.360
narrow and the things that these hooks or[br]these callback functions can do are much
0:24:51.360,0:24:58.690
more restricted. There's currently an RFC[br]open for this, has been open for a while
0:24:58.690,0:25:04.690
actually. The problem is that in order to[br]assess whether the proposal is actually
0:25:04.690,0:25:11.530
viable you have to survey all the current[br]users of these hooks and make sure that we
0:25:11.530,0:25:15.660
can, that the use case is still covered in the[br]new system and, yeah, we have like a
0:25:15.660,0:25:21.030
thousand hook points and we have like a[br]thousand extensions, so that's quite a bit of
0:25:21.030,0:25:31.060
work. Another thing that I'm currently[br]working on is establishing a stable
0:25:31.060,0:25:36.990
interface policy. This may sound pretty[br]obvious - it has a lot of pretty obvious
0:25:36.990,0:25:42.430
things like, yeah, if you have a class and[br]there's a public method then that's a
0:25:42.430,0:25:46.410
stable interface, it will not just change[br]without notice, we have a deprecation policy
0:25:46.410,0:25:53.020
and all that. But if you have worked with[br]extensible systems that rely on the
0:25:53.020,0:25:58.350
mechanisms of object-oriented programming,[br]you may have come across the question
0:25:58.350,0:26:05.040
whether a protected method is part of this[br]stable interface of the software or not,
0:26:05.040,0:26:10.010
or maybe the constructor? I don't know, if[br]you have worked in environments that use
0:26:10.010,0:26:15.860
dependency injection the idea is basically[br]that the constructor signature should be
0:26:15.860,0:26:21.270
able to change at any time but then you[br]have extensions that are subclassing and
0:26:21.270,0:26:25.640
things break. So, this is why we are[br]trying to establish a much more
0:26:25.640,0:26:32.750
restrictive stable interface policy, that[br]would make explicit things like
0:26:32.750,0:26:36.650
constructor signatures actually not being[br]stable and that gives us a lot more wiggle
0:26:36.650,0:26:51.030
room to restructure the software.[br]MediaWiki itself has grown as a software
0:26:51.030,0:26:58.750
for the last 18 years or so and, at least[br]in the beginning, was mostly created by
0:26:58.750,0:27:06.330
volunteers. And in a monolithic[br]architecture there's a great tendency to
0:27:06.330,0:27:11.070
just, you know, find and grab the thing[br]that you want to use and just use it.
0:27:11.070,0:27:19.100
Which leads to, well, structures like this[br]one: everything depends on everything. And
0:27:19.100,0:27:26.360
if you change one bit of code everything[br]else may or may not break. And with, yeah.
0:27:26.360,0:27:31.350
And if you don't have great test coverage[br]at the same time this just makes it so
0:27:31.350,0:27:35.312
that any change becomes very risky and you[br]have to do a lot of manual testing a lot
0:27:35.312,0:27:43.690
of manual digging around, touching a lot[br]of files and we are for the last year,
0:27:43.690,0:27:50.510
year and a half we have started a[br]concerted effort to tie the worst - to cut
0:27:50.510,0:27:57.760
the worst ties, to decouple these things[br]that are, basically, that have the most impact.
0:27:57.760,0:28:03.320
There's a few objects in the software that[br]rep... - for instance one that represents
0:28:03.320,0:28:08.280
the user and one that represents a title[br]that are used everywhere and the way
0:28:08.280,0:28:14.240
they're implemented currently also means[br]that they depend on everything and that of
0:28:14.240,0:28:29.620
course is not a good situation. On a,[br]well, a similar idea on a higher level is
0:28:29.620,0:28:34.400
decomposition of the software. So the[br]decoupling was about the software
0:28:34.400,0:28:39.990
architecture, this is about the system[br]architecture: breaking up the
0:28:39.990,0:28:45.490
monolith itself into multiple services that[br]serve different purposes. The specifics of
0:28:45.490,0:28:50.281
this diagram are not really relevant to[br]this talk. This is more to, you know, give
0:28:50.281,0:28:57.710
you an impression of the complexity and[br]the sort of work we are doing there. The
0:28:57.710,0:29:05.580
idea is that perhaps we could split out[br]certain functionality into its own service
0:29:05.580,0:29:11.160
into a separate application, like maybe[br]move all the search functionality into
0:29:11.160,0:29:17.150
something separate and self-contained, but[br]then the question is how do you, again,
0:29:17.150,0:29:23.280
compose this into the final user interface[br]- at some point these things have to get
0:29:23.280,0:29:28.420
composed together again - and again this[br]is a very trivial issue if you
0:29:28.420,0:29:32.470
only want this to work on your[br]machine or you only need to serve a
0:29:32.470,0:29:39.680
hundred users or something. But doing this[br]at scale doing it at the rate of something
0:29:39.680,0:29:45.230
like 10,000 page views a second, I said a[br]hundred thousand requests earlier but that
0:29:45.230,0:29:51.790
includes resources, icons, CSS and all[br]that. So, yeah, then you have to think
0:29:51.790,0:29:58.470
pretty hard about what you can cache and,[br]thank you, how you can recombine things
0:29:58.470,0:30:02.760
without having to recompute everything and[br]this is something that we are currently
0:30:02.760,0:30:08.580
looking into - coming up with an[br]architecture that allows us to compose and
0:30:08.580,0:30:23.220
recombine the output of different[br]background services. Okay. Before I
0:30:23.220,0:30:27.600
started this talk I said I would probably[br]roughly use half of my time going through
0:30:27.600,0:30:33.310
the presentation and I guess I just hit[br]that spot on. So, this is all I have
0:30:33.310,0:30:41.070
prepared but I'm happy to talk to you more[br]about the things I said or maybe any other
0:30:41.070,0:30:48.050
aspects of this that you may be interested[br]in. If you have any comments or questions. Oh!
0:30:48.050,0:30:56.800
Three already.[br]Q: First of all thanks a lot for the
0:30:56.800,0:31:03.150
presentation, such a really interesting[br]case of a legacy system and thanks for the
0:31:03.150,0:31:10.130
honesty. It was really interesting as a,[br]you know, software engineer to see how
0:31:10.130,0:31:15.101
that works. I have a question about[br]decoupling, so, I mean, I kind of, you
0:31:15.101,0:31:23.190
have like, probably your system is[br]enormous and how do you find, so to say,
0:31:23.190,0:31:29.100
the most evil, you know, parts which[br]sort of have to be decoupled. Do you use other
0:31:29.100,0:31:34.820
software, with, you know, like,[br]metrics and stuff, or do you just know,
0:31:34.820,0:31:38.370
kind of intuitively..[br]Daniel: Yeah, it's actually, this is quite
0:31:38.370,0:31:44.970
interesting and maybe I can, maybe we can[br]talk about it a bit more in depth later.
0:31:44.970,0:31:49.020
Very quickly: it's a combination. On the[br]one hand you just have the anecdotal
0:31:49.020,0:31:53.280
experience of what is actually annoying[br]when you work with the software and you
0:31:53.280,0:31:59.111
try to fix it and on the other hand I try[br]to find good tooling for this and the
0:31:59.111,0:32:05.440
existing tooling tends to die when you[br]just run it against our code base. So, one
0:32:05.440,0:32:09.930
of the things that you are looking for are[br]cyclic dependencies but the number of
0:32:09.930,0:32:15.080
possible cycles in a graph grows[br]exponentially with the number of nodes. And
0:32:15.080,0:32:17.710
if you have a pretty tightly knit graph[br]that number quickly goes into the
0:32:17.710,0:32:26.580
millions. And, yeah, the tool just goes to[br]100% CPU and never returns. So, I spent
0:32:26.580,0:32:33.600
quite a bit of time trying to find[br]heuristics to get around that - was a lot
0:32:33.600,0:32:41.550
of fun. I can, yeah, we can talk about[br]that later, if you like. Okay, thanks.
0:32:41.550,0:32:49.221
Q: So what exactly is this Wikidata you[br]mentioned before. Is it like an extension
0:32:49.221,0:32:55.580
or is it a completely different project?[br]Daniel: Wiki - so there's an extension called
0:32:55.580,0:33:04.630
Wikibase, that implements this, well I[br]would say, ontological modeling interface
0:33:04.630,0:33:11.980
for MediaWiki and that is used to run a[br]website called Wikidata which has
0:33:11.980,0:33:19.500
something like 30 million items modeled[br]that describe the world and serve as a
0:33:19.500,0:33:25.610
machine-readable data back-end to other[br]wiki projects, other Wikimedia projects.
0:33:25.610,0:33:32.890
Yeah, I used to work on that project for[br]Wikimedia Germany. I moved on to do
0:33:32.890,0:33:41.150
different things now for a couple of[br]years. Lukas here in front is probably the
0:33:41.150,0:33:50.190
person most knowledgeable about the latest[br]and greatest in the Wikidata development.
0:33:50.190,0:33:56.240
Q: You've briefly talked about test[br]coverage. I would be interested..
0:33:56.240,0:33:58.650
Daniel: Sorry?[br]Q: You talked about test coverage.
0:33:58.650,0:34:02.010
Daniel: Yes.[br]Q: I would be interested in whether you amped up
0:34:02.010,0:34:07.660
your efforts to help you modernize it and[br]how your current situation is with test
0:34:07.660,0:34:11.809
coverage.[br]Daniel: Test coverage for MediaWiki core is below
0:34:11.809,0:34:21.809
50%. In some parts it's below 10% which is[br]very worrying. One thing that we started
0:34:21.809,0:34:30.050
to look into, like half a year ago, is[br]instead of writing unit tests for all the
0:34:30.050,0:34:36.010
code that we actually want to throw away,[br]before we touch it, we tried to improve
0:34:36.010,0:34:40.900
the test coverage using integration tests[br]on the API level. So we are currently in
0:34:40.900,0:34:48.240
the process of writing a suite of tests,[br]not just for the API modules, but for all
0:34:48.240,0:34:54.540
the functionality, all the application[br]logic behind the API. And that will
0:34:54.540,0:35:01.070
hopefully cover most of the relevant code[br]paths and will give us confidence when we
0:35:01.070,0:35:12.420
refactor the code.[br]Q: Thanks.
0:35:12.420,0:35:26.280
Herald: Other questions?[br]Q: So you said that you have this legacy
0:35:26.280,0:35:32.240
system and eventually you have to move[br]away from it but are there any, like, I
0:35:32.240,0:35:39.820
don't know, plans for the near future to,[br]I don't know. At some point you have to
0:35:39.820,0:35:47.310
cut the current infrastructure to your[br]extensions and so on and it's a hard cut, I
0:35:47.310,0:35:53.330
see. But are there any plans to build it[br]up from scratch or what are the plans?
0:35:53.330,0:35:58.060
Daniel: Yeah, we are not going to rewrite from[br]scratch - that's a pretty sure-fire way to
0:35:58.060,0:36:05.370
just kill the system. We will have to make[br]some tough decisions about backwards
0:36:05.370,0:36:11.340
compatibility and probably reconsider some[br]of the requirements and constraints we
0:36:11.340,0:36:17.100
have, well, with respect to the platforms[br]we run on and also the platforms we serve.
0:36:17.100,0:36:21.130
One of the things that we have been very[br]careful to do in the past for instance is
0:36:21.130,0:36:26.530
to make sure that you can do pretty much[br]everything with MediaWiki with no
0:36:26.530,0:36:32.800
JavaScript on the client side. And that[br]requirement is likely to drop. You will
0:36:32.800,0:36:40.010
still be able to read of course, without[br]any JavaScript or anything, but the extent
0:36:40.010,0:36:45.910
of functionality you will have without[br]JavaScript on the client side is likely to
0:36:45.910,0:36:51.140
be greatly reduced - that kind of thing.[br]Also we will probably end up breaking
0:36:51.140,0:36:57.660
compatibility with at least some of the[br]user-created tools. Hopefully we can offer
0:36:57.660,0:37:02.390
good alternatives, good APIs, good[br]libraries that people can actually port
0:37:02.390,0:37:11.070
to, that are less brittle. I hope that[br]will motivate people and maybe repay them
0:37:11.070,0:37:15.950
a bit for the pain of having their tool[br]broken, if we can give them something that
0:37:15.950,0:37:21.119
is more stable, more reliable, and[br]hopefully even nicer to use. Yeah, so,
0:37:21.119,0:37:25.930
it's small increments, bits and pieces[br]all over the system. There's no, you know,
0:37:25.930,0:37:32.550
no great master plan, no big change to[br]point to really.
0:37:32.550,0:37:45.470
Herald: Okay, okay, further questions?[br]Daniel: I plan to just sit outside here at
0:37:45.470,0:37:54.800
the table later if you just want to come[br]and chat so we can also do that there.
0:37:54.800,0:38:01.250
Herald: Okay, so, last call are there any[br]other questions? It does not appear so,
0:38:01.250,0:38:08.110
so, I'd like to ask for a huge round of applause for[br]Daniel for this talk.
0:38:08.110,0:38:12.627
Applause
0:38:12.627,0:38:14.730
36C3 postroll music
0:38:14.730,0:38:38.320
Subtitles created by c3subtitles.de[br]in the year 2020. Join, and help us!