0:00:00.000,0:00:20.510 36C3 preroll music 0:00:20.510,0:00:24.750 Daniel: Good morning! I'm glad you all[br]made it here this early on the last day. I 0:00:24.750,0:00:32.439 know it can't be easy, it wasn't easy for[br]me. I have to warn you that the way I 0:00:32.439,0:00:36.160 prepared for this talk is a bit[br]experimental. I didn't make a slide set, I 0:00:36.160,0:00:44.559 just made a mind map and I'll just click[br]through it while I talk to you. So, 0:00:44.559,0:00:51.180 this talk is about modernizing Wikipedia.[br]As you probably have noticed, visiting 0:00:51.180,0:00:58.500 Wikipedia can feel a bit like visiting a[br]website from 10-15 years ago, but before I 0:00:58.500,0:01:05.280 talk about any problems or things to[br]improve, I first want to revisit that the 0:01:05.280,0:01:11.619 software and the infrastructure we[br]build around it has been running Wikipedia 0:01:11.619,0:01:20.160 and its sister sites for the last... well[br]nearly 19 years now and it's extremely 0:01:20.160,0:01:32.200 successful. We serve 17 billion page[br]views a month, yes? 0:01:32.200,0:01:40.870 Person in the audience: Could you make it[br]louder or speak up and also make the image 0:01:40.870,0:01:42.870 bigger? 0:01:42.870,0:01:43.870 inaudible dialogue 0:01:43.870,0:01:45.870 Daniel: Is this better? Like if I speak up[br]I will lose my voice in 10 minutes, it's 0:01:45.870,0:01:55.720 already in it, no it's fine. We have[br]technology for this. I can... the light 0:01:55.720,0:02:05.490 doesn't help, yeah the contrast could be[br]better. Is it better like this? Okay cool. 0:02:05.490,0:02:13.840 All right so yeah we are serving 17[br]billion page views a month, which is quite 0:02:13.840,0:02:19.560 a lot. Wikipedia exists in about 100[br]languages. If you attended the talk about 0:02:19.560,0:02:24.250 the Wikimedia infrastructure yesterday, we[br]talked about 300 languages. 
We actually 0:02:24.250,0:02:29.989 support 300 languages for localization but[br]we have Wikipedia in about 100, if I'm not 0:02:29.989,0:02:38.689 completely off. I find this picture quite[br]fascinating. This is a visualization of 0:02:38.689,0:02:43.719 all the places in the world that are[br]described on Wikipedia and sister projects 0:02:43.719,0:02:49.319 and I find this quite impressive although[br]it's also a nice display of cultural bias 0:02:49.319,0:03:00.790 of course. We, that is Wikimedia[br]Foundation, run about 900 to 1000 wikis 0:03:00.790,0:03:06.680 depending on how you count, but there are[br]many many more MediaWiki installations 0:03:06.680,0:03:11.459 out there, some of them big and many many[br]of them small. We have actually no idea 0:03:11.459,0:03:17.150 how many small instances there are. So[br]it's a very powerful very flexible and 0:03:17.150,0:03:23.730 versatile piece of software but, you know,[br]sometimes it can feel like... you can do a 0:03:23.730,0:03:28.329 lot of things with it, right, but[br]sometimes it feels like it's a bit 0:03:28.329,0:03:42.180 overburdened and maybe you should look at[br]improving the foundations. So one of the 0:03:42.180,0:03:47.829 things that make MediaWiki great but also[br]sometimes hard to use is that kind of 0:03:47.829,0:03:52.609 everything is text, everything is markup,[br]everything is done with wikitext, 0:03:52.609,0:04:02.529 which has grown in complexity over the[br]years so if you look at the anatomy of a 0:04:02.529,0:04:09.159 wiki page it can be a bit daunting. You[br]have different syntax for markup and 0:04:09.159,0:04:16.150 different kinds of transclusion or[br]templates and media and some things 0:04:16.150,0:04:21.739 actually, you know, get displayed in[br]place, some things show up in a completely 0:04:21.739,0:04:26.340 different place on the page, it can be[br]rather confusing and daunting for 0:04:26.340,0:04:31.720 newcomers. 
And also things like having a[br]conversation just talking to people like, 0:04:31.720,0:04:35.540 you know, having a conversation thread[br]looks like this. You open the page you 0:04:35.540,0:04:40.510 look through the markup and you indent to[br]make a conversation thread and then you 0:04:40.510,0:04:43.480 get confused about the indenting and[br]someone messes with the formatting and 0:04:43.480,0:04:52.120 it's all excellent. There have been many[br]attempts over the years to improve the 0:04:52.120,0:05:00.290 situation, we have things like Echo which[br]notifies you, for instance when someone 0:05:00.290,0:05:09.130 mentions your name or someone... It is[br]also used to welcome people and do this 0:05:09.130,0:05:12.400 kind of achievement unlocked[br]notifications: hey, you did your first 0:05:12.400,0:05:19.900 edit, this is great, welcome! To make[br]people a bit more engaged with the system 0:05:19.900,0:05:24.380 but it's really mostly improvements around[br]the fringes. We have had a system called 0:05:24.380,0:05:31.350 Flow for a while to improve the way[br]conversations work. So you have more like 0:05:31.350,0:05:37.960 a thread structure that the software[br]actually knows about but then there are 0:05:37.960,0:05:42.160 many, well quite a few people who have[br]been around for a while that are very used 0:05:42.160,0:05:46.900 to the manual system and also there's a[br]lot of tools to support this manual system 0:05:46.900,0:05:52.780 which of course are incompatible with[br]making things more modern. So we use this 0:05:52.780,0:05:56.250 for instance on MediaWiki.org which is a[br]site which is basically a self 0:05:56.250,0:06:03.000 documentation site of MediaWiki but on[br]most Wikipedias this is not enabled or at 0:06:03.000,0:06:14.530 least not used by default everywhere. The[br]biggest attempt to move away from the text 0:06:14.530,0:06:23.050 only approach is Wikidata, which we[br]started in 2012. 
The idea of Wikidata of 0:06:23.050,0:06:29.580 course, if you didn't attend the many great[br]talks we had about it here over the 0:06:29.580,0:06:36.470 course of the Congress, is a way to[br]basically model the world using structured 0:06:36.470,0:06:45.470 data, using a semantic approach instead of[br]natural language which has its own 0:06:45.470,0:06:50.740 complexities but at least it's a way to[br]represent the knowledge of the world in a 0:06:50.740,0:06:56.790 way that machines can understand. So this[br]would be an alternative to wikitext but 0:06:56.790,0:07:09.389 still the vast majority of things[br]especially on Wikipedia are just markup. 0:07:09.389,0:07:13.800 And this markup is pretty powerful and[br]there's lots of ways to extend it and to 0:07:13.800,0:07:21.050 do things with it. So a lot of things on[br]MediaWiki are just DIY, just do it 0:07:21.050,0:07:29.250 yourself. Templates are a great example of[br]this. Infoboxes of course, the nice blue 0:07:29.250,0:07:34.730 boxes here on the right side of pages, are[br]done using templates but these templates 0:07:34.730,0:07:39.090 are just for formatting, there is no data[br]processing, there's no database or 0:07:39.090,0:07:47.530 structured data backing them. It's just[br]basically, you know, it's still just 0:07:47.530,0:07:56.630 markup. It's still... you have a predefined[br]layout but you're still feeding it text, not 0:07:56.630,0:08:04.520 data. You have parameters but the values[br]of the parameters are still again maybe 0:08:04.520,0:08:11.610 templates or links or you have markup in[br]them, like you know HTML line breaks and 0:08:11.610,0:08:18.860 stuff. So it's kind of semi structured.[br]And this of course is also used to do 0:08:18.860,0:08:24.100 things like workflow. The template... Oh[br]no, this was actually an infobox, wrong 0:08:24.100,0:08:34.229 picture, wrong capture. 
This is also used[br]to do workflows, so if a page on Wikipedia 0:08:34.229,0:08:39.789 gets nominated for deletion you manually[br]put a template on the page that defines 0:08:39.789,0:08:44.870 why this is supposed to be deleted and[br]then you have to go to a different page 0:08:44.870,0:08:49.390 and put a different template there, giving[br]more explanation and this again is used 0:08:49.390,0:08:55.149 for discussion. It's a lot of structure[br]created by the community and maintained by 0:08:55.149,0:09:02.730 the community, using conventions and tools[br]built on top of what is essentially just a 0:09:02.730,0:09:10.620 pile of markup. And because doing all this[br]manually is kind of painful, early on 0:09:10.620,0:09:17.360 we created a system to allow people to add[br]JavaScript to the site, which is then 0:09:17.360,0:09:27.019 maintained on wiki pages by the community[br]and it can tweak and automate. But again, 0:09:27.019,0:09:30.589 it doesn't really have much to work with,[br]right? It basically messes with whatever 0:09:30.589,0:09:35.470 it can, it directly interacts with the DOM[br]of the page, whenever the layout of the 0:09:35.470,0:09:41.040 software changes, things break. So this is[br]not great for compatibility but it's 0:09:41.040,0:09:54.730 used a lot and it is very important for[br]the community to have this power. Sorry, I 0:09:54.730,0:10:00.110 wish there was a better way to show these[br]pictures. Okay, that's just to give you an 0:10:00.110,0:10:05.220 idea of what kind of thing is implemented[br]that way and maintained by the community 0:10:05.220,0:10:10.189 on their site. One of the problems we have[br]with that is: these are bound to a wiki 0:10:10.189,0:10:19.410 and I just told you that we run over 900[br]of these, not over 9,000, and it would be 0:10:19.410,0:10:26.300 great if you could just share them between[br]wikis but we can't. And again, there have 0:10:26.300,0:10:30.790 been... 
we have been talking about it a[br]lot and it seems like it shouldn't be so 0:10:30.790,0:10:36.759 hard, but you kind of need to write these[br]tools differently, if you want to share 0:10:36.759,0:10:39.899 them across sites, because different sites[br]use different conventions, they use 0:10:39.899,0:10:45.529 different templates. Then it just doesn't[br]work and you actually have to write decent 0:10:45.529,0:10:50.970 software that uses internationalization if[br]you want to use it across wikis. While 0:10:50.970,0:10:55.019 these are usually just you know one-off[br]hacks with everything hard-coded we would 0:10:55.019,0:10:58.450 have to put in place an[br]internationalization system and it's 0:10:58.450,0:11:02.910 actually a lot of effort and there's a lot[br]of things that are actually unclear about 0:11:02.910,0:11:15.260 it. So, before I dive more deeply into the[br]different things that will make it hard to 0:11:15.260,0:11:20.529 improve on the current situation and the[br]things that we are doing to improve it do 0:11:20.529,0:11:27.309 we have any questions or do you have any[br]other - do you have any things you may 0:11:27.309,0:11:34.519 find particularly, well, annoying or[br]particularly outdated, when interacting 0:11:34.519,0:11:40.920 with Wikipedia? Any thoughts on that?[br]Beyond what I just said? 0:11:40.920,0:11:48.769 Microphone: The strict separation, just in[br]Wikipedia, between mobile layout and 0:11:48.769,0:11:54.259 desktop layout.[br]Daniel: Yeah. 
So, actually having a 0:11:54.259,0:12:02.069 reactive layout system that would just[br]work for mobile and desktop in the same 0:12:02.069,0:12:09.130 way and allowing the designers and UX[br]experts, who work on the system to just do 0:12:09.130,0:12:15.180 this once and not two or maybe even three[br]times - because of course we also have 0:12:15.180,0:12:20.550 native applications for different[br]platforms - would be great and it's 0:12:20.550,0:12:24.360 something that we're looking into at the[br]moment. But it's not, you know, it's not 0:12:24.360,0:12:29.519 that easy. We could build a completely new[br]system that does this, but then again you 0:12:29.519,0:12:33.249 would be telling people: "You can no[br]longer use the old system", but now they 0:12:33.249,0:12:39.019 have built all these tools that rely on[br]how the old system works and you have to 0:12:39.019,0:12:52.089 port all of this over so there's a lot of[br]inertia. Any other thoughts? Everyone is 0:12:52.089,0:13:03.720 still asleep, that's excellent. So I can[br]continue. So, another thing that makes it 0:13:03.720,0:13:10.879 difficult to change how MediaWiki works or[br]to improve it is that we are trying to do, 0:13:10.879,0:13:19.180 well, to be at least two things at once: on[br]the one hand we are running a top 5 0:13:19.180,0:13:24.360 website and serving over 100,000 requests[br]per second using the system and on the 0:13:24.360,0:13:30.540 other hand, at least until now, we have[br]always made sure that you can just 0:13:30.540,0:13:33.800 download MediaWiki and install it on a[br]shared hosting platform, you don't even 0:13:33.800,0:13:38.920 need root on the system, right? You don't[br]even need administrative privileges, you 0:13:38.920,0:13:44.769 can just set it up and run it in your web[br]space and it will work. 
And, having the 0:13:44.769,0:13:51.779 same piece of software do both, run in a[br]minimal environment and run at scale, is 0:13:51.779,0:13:55.040 rather difficult and it also means that[br]there's a lot of things that we can't 0:13:55.040,0:14:02.110 easily do, right? All this modern micro[br]service architecture separate front-end 0:14:02.110,0:14:09.309 and back-end systems, all of that means[br]that it's a lot more complicated to set up 0:14:09.309,0:14:15.720 and needs more knowledge or more[br]infrastructure to set up and so far that 0:14:15.720,0:14:19.500 meant we can't do it, because so far there[br]was this requirement that you should 0:14:19.500,0:14:23.569 really be able to just run it on your[br]shared hosting. And we are currently 0:14:23.569,0:14:29.639 considering to what extent we can continue[br]this, I mean, container based hosting is 0:14:29.639,0:14:34.620 picking up. Maybe this is an alternative[br]it's still unclear but it seems like this 0:14:34.620,0:14:45.999 is something that we need to reconsider.[br]Yeah, but if we make this harder to do 0:14:45.999,0:14:52.739 then a lot of current users of MediaWiki[br]would maybe not, well, maybe no longer 0:14:52.739,0:14:57.230 exist or at least would not exist as they[br]do now, right. You probably have seen 0:14:57.230,0:15:05.259 this nice MediaWiki instance the Congress[br]wiki. Which - with a completely customized 0:15:05.259,0:15:09.689 skin and a lot of extensions installed to[br]allow people to define their sessions 0:15:09.689,0:15:14.410 there and making sure these sessions[br]automatically get listed and get put into 0:15:14.410,0:15:20.660 a calendar - this is all done using[br]extensions, like Semantic MediaWiki, that 0:15:20.660,0:15:34.279 allow you to basically define queries in[br]the wiki text markup. 
Yeah, another thing 0:15:34.279,0:15:42.079 that, of course, slows down development is[br]that Wikimedia does engineering on a, 0:15:42.079,0:15:48.130 well, comparatively a shoestring budget,[br]right? The budget of the Wikimedia 0:15:48.130,0:15:52.199 Foundation, the annual budget is something[br]like a hundred million dollars, that 0:15:52.199,0:15:58.009 sounds like a lot of money, but if you[br]compare it to other companies running a 0:15:58.009,0:16:03.209 top five or top ten website it's like two[br]percent of their budget or something like 0:16:03.209,0:16:10.769 that, right? It's really, I mean, 100[br]million is not peanuts but compared to 0:16:10.769,0:16:16.699 what other companies invest to achieve[br]this kind of goal it kind of is. So, what 0:16:16.699,0:16:22.230 this budget translates into is something[br]like 300, depending on how you count, 0:16:22.230,0:16:28.800 between three hundred and four hundred[br]staff. So, this is the people who run all 0:16:28.800,0:16:32.189 of this, including all the community[br]outreach, all the social aspects, all the 0:16:32.189,0:16:40.920 administrative aspects. Less than half of[br]these are the engineers who do all this. 0:16:40.920,0:16:50.989 And we have like, something like 2,500[br]servers, bare-metal, so, which is not a 0:16:50.989,0:16:57.619 lot for this kind of thing. Which also[br]means that we have to design the software 0:16:57.619,0:17:07.079 to be not just scalable but also quite[br]efficient. The modern approach to scaling 0:17:07.079,0:17:11.640 is usually scale horizontally, make it so[br]you can just spin up another virtual 0:17:11.640,0:17:19.280 machine in some cloud service, but, yeah,[br]we run our own service, we run our own 0:17:19.280,0:17:24.440 servers, so we can design to scale[br]horizontally, but it means ordering 0:17:24.440,0:17:32.070 hardware and setting it up and it's going[br]to take half a year or so. 
And we don't 0:17:32.070,0:17:38.390 actually have that many people who do[br]this, so, scalability and performance are 0:17:38.390,0:17:49.000 also important factors when designing the[br]software. Okay. Before I dive into what we 0:17:49.000,0:18:03.860 are actually doing - any questions? This[br]one in the back. Wait for the mic, please. 0:18:03.860,0:18:07.330 In the very...[br]Q: Hi! 0:18:07.330,0:18:12.950 Daniel: Hello.[br]Q: So, you said you don't have that many 0:18:12.950,0:18:22.990 people, but how many do you actually have?[br]Daniel: For... it's something like 150 engineers 0:18:22.990,0:18:27.170 worldwide. It always depends on what you[br]count, right? So you count the people, who 0:18:27.170,0:18:32.260 - do you count engineers, who work on the[br]native apps, do you count engineers, who 0:18:32.260,0:18:36.980 work on the Wikimedia cloud services -[br]actually we do have cloud services, we 0:18:36.980,0:18:41.190 offer them to the community to run their[br]own things, but we don't run our stuff on 0:18:41.190,0:18:45.560 other people's cloud. Yeah, so depending[br]on how you count or something and whether 0:18:45.560,0:18:50.210 you count the people working here in[br]Germany for Wikimedia Germany, which is a 0:18:50.210,0:18:57.760 separate organization technically - it's[br]something like 150 engineers. 0:18:57.760,0:19:08.210 Q: Thanks![br]Q: I'm interested: What are the reasons 0:19:08.210,0:19:13.880 that you don't run on other people's[br]services like on the cloud? I mean, then 0:19:13.880,0:19:17.090 it will be easy to scale horizontally,[br]right? 0:19:17.090,0:19:25.330 Daniel: There's, well, one reason is being[br]independent, right? If we, yeah, imagine 0:19:25.330,0:19:32.350 we ran all our stuff on Amazon's[br]infrastructure and then maybe Amazon 0:19:32.350,0:19:38.060 doesn't like the way that the Wikipedia[br]article about Amazon is written - what do 0:19:38.060,0:19:42.050 we do, right? 
Maybe they shut us down,[br]maybe they make things very expensive, 0:19:42.050,0:19:47.360 maybe they make things very painful for[br]us, maybe there is some, at least, like, a 0:19:47.360,0:19:54.070 self-censorship mechanism happening and we[br]want to avoid that. There are 0:19:54.070,0:19:58.440 thoughts about this, there are thoughts[br]like maybe we can do this at least for 0:19:58.440,0:20:04.270 development infrastructure and CI, not for[br]production or maybe we can make it so that 0:20:04.270,0:20:12.200 we run stuff in the cloud services by more[br]than one vendor, so we basically spread 0:20:12.200,0:20:17.860 out so we are not reliant on a single[br]company. We are thinking about these 0:20:17.860,0:20:21.820 things but so far the way to actually stay[br]independent has been to run our own 0:20:21.820,0:20:28.300 servers.[br]Q: You've been talking about scalability 0:20:28.300,0:20:35.490 and changing the architecture, that kind[br]of seems to imply to me that there's a 0:20:35.490,0:20:42.270 problem with scaling at the moment or that[br]it's foreseeable that things are not gonna 0:20:42.270,0:20:46.580 work out if you just keep doing what[br]you're doing at the moment. Can you maybe 0:20:46.580,0:20:52.480 elaborate on that?[br]Daniel: So, there's, I think there's two sides 0:20:52.480,0:20:56.850 to this. On the one hand the reason I[br]mentioned it is just that a lot of things 0:20:56.850,0:21:01.610 that are really easy to do basically for[br]me, right, "works on my machine", are really 0:21:01.610,0:21:08.920 hard to do if you want to do them at[br]scale. That's one aspect. 
The other aspect 0:21:08.920,0:21:16.670 is MediaWiki is pretty much a PHP monolith[br]and that means scaling it always means 0:21:16.670,0:21:23.680 copying the monolith, instead of breaking it down[br]so you have smaller units that you can 0:21:23.680,0:21:29.040 scale and just say, yeah, I don't know, I[br]need more instances for authentication 0:21:29.040,0:21:33.910 handling or something like that. That[br]would be more efficient, right, because 0:21:33.910,0:21:40.730 you have higher granularity, you can just[br]scale the things that you actually need 0:21:40.730,0:21:47.530 but that of course needs rearchitecting.[br]It's not like things are going to explode 0:21:47.530,0:21:52.910 if we don't do that very soon, it's not,[br]so there's not like an urgent problem 0:21:52.910,0:21:58.400 there. The reason for us to rearchitect is[br]more, to gain more flexibility in 0:21:58.400,0:22:03.330 development, because if you have a[br]monolith that is pretty entangled, code 0:22:03.330,0:22:16.130 changes are risky and take a long time.[br]Q: How many people work on product design 0:22:16.130,0:22:25.460 or like user experience research to, like,[br]sit down with users and try to understand 0:22:25.460,0:22:28.440 what their needs are and from there[br]proceed? 0:22:28.440,0:22:33.230 Daniel: Across... I don't have an exact number,[br]something like five. 0:22:33.230,0:22:37.930 Audience: Do you think that's sufficient?[br]Herald: The question was, whether it's 0:22:37.930,0:22:46.800 sufficient. So just...[br]Daniel: Probably not? But it's more than, 0:22:46.800,0:22:50.310 that's more people than we have for[br]database administration, and that's also 0:22:50.310,0:23:06.040 not sufficient.[br]Herald: Are there further questions? I don't 0:23:06.040,0:23:16.270 think so.[br]Daniel: Okay. 
So, one of the things, that 0:23:16.270,0:23:20.320 holds us back a bit, is that there's[br]literally thousands of extensions for 0:23:20.320,0:23:26.870 MediaWiki and the extension mechanism is[br]heavily reliant on hooks, so basically on 0:23:26.870,0:23:39.600 callbacks. And, we have - I don't have a[br]picture, I have a link here - we have a 0:23:39.600,0:23:44.500 great number of these. So, you see, each[br]paragraph is basically documenting one 0:23:44.500,0:23:51.970 callback that you can use to modify the[br]behavior of the software and, I mean, 0:23:51.970,0:23:59.240 there's, I have never counted, but[br]something like a thousand? And all of them 0:23:59.240,0:24:07.520 are of course interfaces to extra - to[br]software that is maintained externally, so 0:24:07.520,0:24:12.611 they have to be kept stable and if you[br]have a large chunk of software that you 0:24:12.611,0:24:16.730 want to restructure but you have a[br]thousand fixed points that you can't 0:24:16.730,0:24:22.960 change, things become rather difficult.[br]It's basi... yeah, these hook points kind 0:24:22.960,0:24:27.640 of, like, they act like nails in the[br]architecture and then you kind of have to 0:24:27.640,0:24:36.650 wiggle around them - it's fun. We are[br]working to change that. We want to 0:24:36.650,0:24:43.950 architect it so the interface that is[br]exposed to these hooks becomes much more 0:24:43.950,0:24:51.360 narrow and the things that these hooks or[br]these callback functions can do are much 0:24:51.360,0:24:58.690 more restricted. There's currently an RFC[br]open for this, has been open for a while 0:24:58.690,0:25:04.690 actually. 
The problem is that in order to[br]assess whether the proposal is actually 0:25:04.690,0:25:11.530 viable you have to survey all the current[br]users of these hooks and make sure that we 0:25:11.530,0:25:15.660 can, the use case is still covered in the[br]new system and, yeah, we have like a 0:25:15.660,0:25:21.030 thousand hook points and we have like a[br]thousand extensions, that's quite a bit of 0:25:21.030,0:25:31.060 work. Another thing that I'm currently[br]working on is establishing a stable 0:25:31.060,0:25:36.990 interface policy. This may sound pretty[br]obvious - it has a lot of pretty obvious 0:25:36.990,0:25:42.430 things like, yeah, if you have a class and[br]there's a public method then that's a 0:25:42.430,0:25:46.410 stable interface, it will not just change[br]without notice, we have a deprecation policy 0:25:46.410,0:25:53.020 and all that. But if you have worked with[br]extensible systems that rely on the 0:25:53.020,0:25:58.350 mechanisms of object-oriented programming,[br]you may have come across the question 0:25:58.350,0:26:05.040 whether a protected method is part of this[br]stable interface of the software or not, 0:26:05.040,0:26:10.010 or maybe the constructor? I don't know, if[br]you have worked in environments that use 0:26:10.010,0:26:15.860 dependency injection the idea is basically[br]that the constructor signature should be 0:26:15.860,0:26:21.270 able to change at any time but then you[br]have extensions that are subclassing and 0:26:21.270,0:26:25.640 things break. 
So, this is why we are[br]trying to establish a much more 0:26:25.640,0:26:32.750 restrictive stable interface policy, that[br]would make explicit things like 0:26:32.750,0:26:36.650 constructor signatures actually not being[br]stable and that gives us a lot more wiggle 0:26:36.650,0:26:51.030 room to restructure the software.[br]MediaWiki itself has grown as a software 0:26:51.030,0:26:58.750 for the last 18 years or so and, at least[br]in the beginning, was mostly created by 0:26:58.750,0:27:06.330 volunteers. And in a monolithic[br]architecture there's a great tendency to 0:27:06.330,0:27:11.070 just, you know, find and grab the thing[br]that you want to use and just use it. 0:27:11.070,0:27:19.100 Which leads to, well, structures like this[br]one: everything depends on everything. And 0:27:19.100,0:27:26.360 if you change one bit of code everything[br]else may or may not break. And with, yeah. 0:27:26.360,0:27:31.350 And if you don't have great test coverage[br]at the same time this just makes it so 0:27:31.350,0:27:35.312 that any change becomes very risky and you[br]have to do a lot of manual testing, a lot 0:27:35.312,0:27:43.690 of manual digging around, touching a lot[br]of files and we are for the last year, 0:27:43.690,0:27:50.510 year and a half we have started a[br]concerted effort to tie the worst - to cut 0:27:50.510,0:27:57.760 the worst ties, to decouple these things[br]that are, basically that have most impact. 0:27:57.760,0:28:03.320 There's a few objects in the software that[br]rep... - for instance one that represents 0:28:03.320,0:28:08.280 the user and one that represents a title -[br]that are used everywhere and the way 0:28:08.280,0:28:14.240 they're implemented currently also means[br]that they depend on everything and that of 0:28:14.240,0:28:29.620 course is not a good situation. 
On a,[br]well, a similar idea on a higher level is 0:28:29.620,0:28:34.400 decomposition of the software. So the[br]decoupling was about the software 0:28:34.400,0:28:39.990 architecture, this is about the system[br]architecture: breaking up the 0:28:39.990,0:28:45.490 monolith itself into multiple services that[br]serve different purposes. The specifics of 0:28:45.490,0:28:50.281 this diagram are not really relevant to[br]this talk. This is more to, you know, give 0:28:50.281,0:28:57.710 you an impression of the complexity and[br]the sort of work we are doing there. The 0:28:57.710,0:29:05.580 idea is that perhaps we could split out[br]certain functionality into its own service 0:29:05.580,0:29:11.160 into a separate application, like maybe[br]move all the search functionality into 0:29:11.160,0:29:17.150 something separate and self-contained, but[br]then the question is how do you, again, 0:29:17.150,0:29:23.280 compose this into the final user interface[br]- at some point these things have to get 0:29:23.280,0:29:28.420 composed together again - and again this[br]is a very trivial issue if you 0:29:28.420,0:29:32.470 only want this to work on your[br]machine or you only need to serve a 0:29:32.470,0:29:39.680 hundred users or something. But doing this[br]at scale, doing it at the rate of something 0:29:39.680,0:29:45.230 like 10,000 page views a second, I said a[br]hundred thousand requests earlier but that 0:29:45.230,0:29:51.790 includes resources, icons, CSS and all[br]that. So, yeah, then you have to think 0:29:51.790,0:29:58.470 pretty hard about what you can cache and,[br]thank you, how you can recombine things 0:29:58.470,0:30:02.760 without having to recompute everything and[br]this is something that we are currently 0:30:02.760,0:30:08.580 looking into - coming up with an[br]architecture that allows us to compose and 0:30:08.580,0:30:23.220 recombine the output of different[br]background services. Okay. 
Before I 0:30:23.220,0:30:27.600 started this talk I said I would probably[br]roughly use half of my time going through 0:30:27.600,0:30:33.310 the presentation and I guess I just hit[br]that spot on. So, this is all I have 0:30:33.310,0:30:41.070 prepared but I'm happy to talk to you more[br]about the things I said or maybe any other 0:30:41.070,0:30:48.050 aspects of this that you may be interested[br]in. Any comments or questions? Oh! 0:30:48.050,0:30:56.800 Three already.[br]Q: First of all thanks a lot for the 0:30:56.800,0:31:03.150 presentation, such a really interesting[br]case of a legacy system and thanks for the 0:31:03.150,0:31:10.130 honesty. It was really interesting as a,[br]you know, software engineer to see how 0:31:10.130,0:31:15.101 that works. I have a question about[br]decoupling, so, I mean, I kind of, you 0:31:15.101,0:31:23.190 have like, probably your system is[br]enormous and how do you find, so to say, 0:31:23.190,0:31:29.100 the most evil, you know, parts which[br]sort of have to be decoupled? Do you use other 0:31:29.100,0:31:34.820 software, with, you know, this, like,[br]metrics and stuff or do you just know, 0:31:34.820,0:31:38.370 kind of intuitively...[br]Daniel: Yeah, it's actually, this is quite 0:31:38.370,0:31:44.970 interesting and maybe I can, maybe we can[br]talk about it a bit more in depth later. 0:31:44.970,0:31:49.020 Very quickly: it's a combination. On the[br]one hand you just have the anecdotal 0:31:49.020,0:31:53.280 experience of what is actually annoying[br]when you work with the software and you 0:31:53.280,0:31:59.111 try to fix it and on the other hand I try[br]to find good tooling for this and the 0:31:59.111,0:32:05.440 existing tooling tends to die when you[br]just run it against our code base. 
So, one 0:32:05.440,0:32:09.930 of the things that you are looking for are[br]cyclic dependencies but the number of 0:32:09.930,0:32:15.080 possible cycles in a graph grows[br]exponentially with the number of nodes. And 0:32:15.080,0:32:17.710 if you have a pretty tightly knit graph[br]that number quickly goes into the 0:32:17.710,0:32:26.580 millions. And, yeah, the tool just goes to[br]100% CPU and never returns. So, I spent 0:32:26.580,0:32:33.600 quite a bit of time trying to find[br]heuristics to get around that - was a lot 0:32:33.600,0:32:41.550 of fun. I can, yeah, we can talk about[br]that later, if you like. Okay, thanks. 0:32:41.550,0:32:49.221 Q: So what exactly is this Wikidata you[br]mentioned before? Is it like an extension 0:32:49.221,0:32:55.580 or is it a completely different project?[br]Daniel: Wiki - so there's an extension called 0:32:55.580,0:33:04.630 Wikibase, that implements this, well I[br]would say, ontological modeling interface 0:33:04.630,0:33:11.980 for MediaWiki and that is used to run a[br]website called Wikidata which has 0:33:11.980,0:33:19.500 something like 30 million items modeled[br]that describe the world and serve as a 0:33:19.500,0:33:25.610 machine-readable data back-end to other[br]wiki projects, other Wikimedia projects. 0:33:25.610,0:33:32.890 Yeah, I used to work on that project for[br]Wikimedia Germany. I moved on to do 0:33:32.890,0:33:41.150 different things now for a couple of[br]years. Lukas here in front is probably the 0:33:41.150,0:33:50.190 person most knowledgeable about the latest[br]and greatest in the Wikidata development. 0:33:50.190,0:33:56.240 Q: You've shortly talked about test[br]coverage. I would be interested... 0:33:56.240,0:33:58.650 Daniel: Sorry?[br]Q: You talked about test coverage. 
0:33:58.650,0:34:02.010 Daniel: Yes.[br]Q: I would be interested in if you amped 0:34:02.010,0:34:07.660 your efforts to help you modernize it and[br]how your current situation is with test 0:34:07.660,0:34:11.809 coverage.[br]Daniel: Test coverage for MediaWiki core is below 0:34:11.809,0:34:21.809 50%. In some parts it's below 10% which is[br]very worrying. One thing that we started 0:34:21.809,0:34:30.050 to look into, like half a year ago, is[br]instead of writing unit tests for all the 0:34:30.050,0:34:36.010 code that we actually want to throw away,[br]before we touch it, we tried to improve 0:34:36.010,0:34:40.900 the test coverage using integration tests[br]on the API level. So we are currently in 0:34:40.900,0:34:48.240 the process of writing a suite of tests,[br]not just for the API modules, but for all 0:34:48.240,0:34:54.540 the functionality, all the application[br]logic behind the API. And that will 0:34:54.540,0:35:01.070 hopefully cover most of the relevant code[br]paths and will give us confidence when we 0:35:01.070,0:35:12.420 refactor the code.[br]Q: Thanks. 0:35:12.420,0:35:26.280 Herald: Other questions?[br]Q: So you said that you have this legacy 0:35:26.280,0:35:32.240 system and eventually you have to move[br]away from it but are there any, like, I 0:35:32.240,0:35:39.820 don't know, plans for the near future to,[br]I don't know. At some point you have to 0:35:39.820,0:35:47.310 cut the current infrastructure to your[br]extensions and so on and it's a hard cut, I 0:35:47.310,0:35:53.330 see. But are there any plans to build it[br]up from scratch or what are the plans? 0:35:53.330,0:35:58.060 Daniel: Yeah, we are not going to rewrite from[br]scratch - that's a pretty surefire way to 0:35:58.060,0:36:05.370 just kill the system.
We will have to make[br]some tough decisions about backwards 0:36:05.370,0:36:11.340 compatibility and probably reconsider some[br]of the requirements and constraints we 0:36:11.340,0:36:17.100 have, well, with respect to the platforms[br]we run on and also the platforms we serve. 0:36:17.100,0:36:21.130 One of the things that we have been very[br]careful to do in the past for instance is 0:36:21.130,0:36:26.530 to make sure that you can do pretty much[br]everything with MediaWiki with no 0:36:26.530,0:36:32.800 JavaScript on the client side. And that[br]requirement is likely to drop. You will 0:36:32.800,0:36:40.010 still be able to read of course, without[br]any JavaScript or anything, but the extent 0:36:40.010,0:36:45.910 of functionality you will have without[br]JavaScript on the client side is likely to 0:36:45.910,0:36:51.140 be greatly reduced - that kind of thing.[br]Also we will probably end up breaking 0:36:51.140,0:36:57.660 compatibility with at least some of the[br]user-created tools. Hopefully we can offer 0:36:57.660,0:37:02.390 good alternatives, good APIs, good[br]libraries that people can actually port 0:37:02.390,0:37:11.070 to, that are less brittle. I hope that[br]will motivate people and maybe repay them 0:37:11.070,0:37:15.950 a bit for the pain of having their tool[br]broken, if we can give them something that 0:37:15.950,0:37:21.119 is more stable, more reliable, and[br]hopefully even nicer to use. Yeah, so, 0:37:21.119,0:37:25.930 it's small increments, bits, and pieces[br]all over the system. There's no, you know, 0:37:25.930,0:37:32.550 no great master plan, no big change to[br]point to really. 0:37:32.550,0:37:45.470 Herald: Okay, okay, further questions?[br]Daniel: I plan to just sit outside here at 0:37:45.470,0:37:54.800 the table later if you just want to come[br]and chat so we can also do that there. 0:37:54.800,0:38:01.250 Herald: Okay, so, last call: are there any[br]other questions?
It does not appear so, 0:38:01.250,0:38:08.110 so, I'd like to ask for a huge applause for[br]Daniel for this talk. 0:38:08.110,0:38:12.627 Applause 0:38:12.627,0:38:14.730 36C3 postroll music 0:38:14.730,0:38:38.320 Subtitles created by c3subtitles.de[br]in the year 2020. Join, and help us!