36C3 preroll music
Daniel: Good morning! I'm glad you all made it here this early on the last day. I know it can't be easy; it wasn't easy for me. I have to warn you that the way I prepared for this talk is a bit experimental: I didn't make a slide set, I just made a mind map, and I'll click through it while I talk to you. So, this talk is about modernizing Wikipedia. As you have probably noticed, visiting Wikipedia can feel a bit like visiting a website from 10 to 15 years ago. But before I talk about any problems or things to improve, I first want to point out that the software, and the infrastructure we have built around it, has been running Wikipedia and its sister sites for the last, well, nearly 19 years now, and it's extremely successful. We serve 17 billion page views a month, yes?
Person in the audience: Could you make it louder or speak up, and also make the image bigger?
inaudible dialogue
Daniel: Is this better? If I speak up I will lose my voice in 10 minutes, but no, it's fine, we have technology for this. The light doesn't help; the contrast could be better. Is it better like this? Okay, cool. All right, so, yeah, we are serving 17 billion page views a month, which is quite a lot. Wikipedia exists in about 100 languages. If you attended the talk about the Wikimedia infrastructure yesterday, we talked about 300 languages there: we actually support 300 languages for localization, but we have Wikipedia in about 100, if I'm not completely off. I find this picture quite fascinating. This is a visualization of all the places in the world that are described on Wikipedia and its sister projects, and I find it quite impressive, although it's also a nice display of cultural bias, of course. We, that is the Wikimedia Foundation, run about 900 to 1,000 wikis, depending on how you count, but there are many, many more MediaWiki installations out there, some of them big and many, many of them small. We actually have no idea how many small instances there are.
So it's a very powerful, very flexible and versatile piece of software, but sometimes it can feel like... you can do a lot of things with it, right, but sometimes it feels a bit overburdened, and maybe we should look at improving the foundations. One of the things that makes MediaWiki great, but also sometimes hard to use, is that kind of everything is text, everything is markup, everything is done with wikitext, which has grown in complexity over the years. So if you look at the anatomy of a wiki page, it can be a bit daunting. You have different syntax for markup and different kinds of transclusion, for templates and media, and some things actually get displayed in place while some things show up in a completely different place on the page; it can be rather confusing and daunting for newcomers. And also things like having a conversation, just talking to people: a conversation thread looks like this. You open the page, you look through the markup, and you indent to make a conversation thread, and then you get confused about the indenting and someone messes with the formatting and it's all... excellent. There have been many attempts over the years to improve the situation. We have things like Echo, which notifies you, for instance, when someone mentions your name. It is also used to welcome people and do these kinds of achievement-unlocked notifications: hey, you did your first edit, this is great, welcome! To make people a bit more engaged with the system. But it's really mostly improvements around the fringes. We have had a system called Flow for a while to improve the way conversations work.
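How fragile that indentation-based threading is can be seen in a small sketch. This is a hypothetical illustration of the convention (each leading colon marks one reply level), not MediaWiki's actual parser:

```python
# Sketch: recover a talk-page thread structure from indentation-based
# wikitext, where each leading ":" conventionally marks one reply level.
# Illustrative only - not how MediaWiki itself processes talk pages.

def thread_depths(wikitext: str) -> list[tuple[int, str]]:
    """Return (reply depth, comment text) pairs for each non-empty line."""
    result = []
    for line in wikitext.splitlines():
        stripped = line.lstrip(":")
        depth = len(line) - len(stripped)  # number of leading colons
        if stripped.strip():
            result.append((depth, stripped.strip()))
    return result

talk = """Should we merge these articles? --Alice
:I think so. --Bob
::Agreed. --Carol
:Not sure, the topics differ. --Dave"""

for depth, comment in thread_depths(talk):
    print("  " * depth + comment)
```

A stray colon typed by mistake, or a reformatted reply, silently changes the whole tree, which is exactly the kind of ambiguity a system with real thread structure removes.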
So you have more of a thread structure that the software actually knows about. But then there are quite a few people who have been around for a while and are very used to the manual system, and there are also a lot of tools supporting that manual system, which of course are incompatible with making things more modern. So we use this, for instance, on MediaWiki.org, which is basically the self-documentation site of MediaWiki, but on most Wikipedias this is not enabled, or at least not used by default everywhere. The biggest attempt to move away from the text-only approach is Wikidata, which we started in 2012. The idea of Wikidata, if you didn't attend the many great talks we had about it over the course of the Congress, is basically to model the world using structured data, using a semantic approach instead of natural language. That has its own complexities, but at least it's a way to represent the knowledge of the world in a way that machines can understand. So this would be an alternative to wikitext, but still, the vast majority of things, especially on Wikipedia, are just markup. And this markup is pretty powerful, and there are lots of ways to extend it and do things with it. So a lot of things in MediaWiki are just DIY, do it yourself. Templates are a great example of this. Infoboxes, of course, the nice blue boxes here on the right side of pages, are done using templates, but these templates are just for formatting; there is no data processing, no database or structured data backing them. It's still just markup. You have a predefined layout, but you're still feeding it text, not data. You have parameters, but the values of those parameters may again be templates or links, or have markup in them, like HTML line breaks and so on. So it's kind of semi-structured. And this of course is also used to do things like workflows. The template...
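To illustrate the "semi-structured" point: a naive parameter extractor for a flat infobox call might look like this sketch (hypothetical code, not MediaWiki's parser), and it breaks the moment a value contains a nested template or a piped link:

```python
# Sketch: naive extraction of infobox parameters from wikitext.
# Works for flat "key = value" pairs, but breaks as soon as a value
# itself contains "|" (piped links, nested templates) - which is why
# this data is only semi-structured.

def parse_infobox(wikitext: str) -> dict[str, str]:
    inner = wikitext.strip().removeprefix("{{").removesuffix("}}")
    parts = inner.split("|")          # too naive for real wikitext
    params = {}
    for part in parts[1:]:            # parts[0] is the template name
        if "=" in part:
            key, _, value = part.partition("=")
            params[key.strip()] = value.strip()
    return params

box = "{{Infobox person | name = Ada Lovelace | born = 1815 }}"
print(parse_infobox(box))   # {'name': 'Ada Lovelace', 'born': '1815'}
```

Feed it `{{Infobox person | birthplace = [[London|UK]] }}` and the `|` inside the link splits the value apart; handling real template values needs a full parser, which is exactly the complexity that "it's still just markup" hides.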
Oh no, this was actually an infobox; wrong picture, wrong caption. This is also used for workflows: if a page on Wikipedia gets nominated for deletion, you manually put a template on the page that explains why it is supposed to be deleted, and then you have to go to a different page and put a different template there, giving more explanation, and that again is used for discussion. It's a lot of structure created by the community and maintained by the community, using conventions and tools built on top of what is essentially just a pile of markup. And because doing all this manually is kind of painful, at some point we created a system that allows people to add JavaScript to the site, which is then maintained in wiki pages by the community and can tweak and automate things. But again, it doesn't really have much to work with, right? It basically messes with whatever it can: it directly interacts with the DOM of the page, and whenever the layout of the software changes, things break. So this is not great for compatibility, but it is used a lot, and it is very important for the community to have this power. Sorry, I wish there was a better way to show these pictures. Okay, that's just to give you an idea of what kind of thing is implemented that way and maintained by the community on the site. One of the problems we have with that is that these are bound to a single wiki, and I just told you that we run over 900 of them, and it would be great if you could just share them between wikis, but we can't. And again, we have been talking about this a lot, and it seems like it shouldn't be so hard, but you kind of need to write these tools differently if you want to share them across sites, because different sites use different conventions, they use different templates. Then it just doesn't work, and you actually have to write decent software that uses internationalization if you want to use it across wikis.
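What "decent software that uses internationalization" means in practice can be sketched as a message catalog with a language fallback chain, conceptually similar to how MediaWiki messages work; the keys, messages, and fallback table here are made up:

```python
# Sketch of message lookup with a language fallback chain - the kind of
# machinery a cross-wiki tool needs instead of hard-coded strings.
# All keys, messages and fallbacks below are invented for illustration.

MESSAGES = {
    "en": {"greeting": "Welcome, $1!", "saved": "Your edit was saved."},
    "de": {"greeting": "Willkommen, $1!"},
}
FALLBACKS = {"de": ["en"], "de-at": ["de", "en"]}

def msg(key: str, lang: str, *params: str) -> str:
    """Look up a message, walking the fallback chain, filling $1, $2, ..."""
    for candidate in [lang] + FALLBACKS.get(lang, ["en"]):
        text = MESSAGES.get(candidate, {}).get(key)
        if text is not None:
            for i, p in enumerate(params, start=1):
                text = text.replace(f"${i}", p)
            return text
    return f"<{key}>"   # missing-message marker

print(msg("greeting", "de", "Daniel"))  # Willkommen, Daniel!
print(msg("saved", "de"))               # falls back to English
```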
While these are usually just one-off hacks with everything hard-coded, we would have to put an internationalization system in place, and that's actually a lot of effort, and there are a lot of things that are still unclear about it. So, before I dive more deeply into the different things that make it hard to improve on the current situation, and the things we are doing to improve it: do we have any questions, or do you have any things you find particularly annoying or particularly outdated when interacting with Wikipedia? Any thoughts on that, beyond what I just said?
Microphone: The strict separation, just in Wikipedia, between mobile layout and desktop layout.
Daniel: Yeah. So, having a responsive layout system that would just work for mobile and desktop in the same way, allowing the designers and UX experts who work on the system to do this once and not two or maybe even three times - because of course we also have native applications for different platforms - would be great, and it's something we're looking into at the moment. But it's not that easy. We could build a completely new system that does this, but then you would be telling people: "You can no longer use the old system", when they have built all these tools that rely on how the old system works, and you would have to port all of this over, so there's a lot of inertia. Any other thoughts? Everyone is still asleep, that's excellent. So I can continue. So, another thing that makes it difficult to change or improve how MediaWiki works is that we are trying to be at least two things at once: on the one hand we are running a top-5 website, serving over 100,000 requests per second with this system, and on the other hand, at least until now, we have always made sure that you can just download MediaWiki and install it on a shared hosting platform; you don't even need root on the system, right?
You don't even need administrative privileges; you can just set it up in your web space and it will work. And having the same piece of software do both, run in a minimal environment and run at scale, is rather difficult. It also means there are a lot of things we can't easily do, right? All this modern microservice architecture, separate front-end and back-end systems - all of that means the software is a lot more complicated to set up and needs more knowledge or more infrastructure, and so far that meant we couldn't do it, because so far there was this requirement that you should really be able to just run it on your shared hosting. We are currently considering to what extent we can continue this. I mean, container-based hosting is picking up; maybe that is an alternative. It's still unclear, but it seems like this is something we need to reconsider. But if we make this harder to do, then a lot of current users of MediaWiki would maybe no longer exist, or at least would not exist as they do now. You have probably seen this nice MediaWiki instance, the Congress wiki - with a completely customized skin and a lot of extensions installed that allow people to define their sessions there and make sure these sessions automatically get listed and put into a calendar. This is all done using extensions, like Semantic MediaWiki, that allow you to basically define queries in the wikitext markup. Yeah, another thing that slows down development, of course, is that Wikimedia does engineering on a comparatively shoestring budget. The annual budget of the Wikimedia Foundation is something like a hundred million dollars. That sounds like a lot of money, but if you compare it to other companies running a top-five or top-ten website, it's like two percent of their budget or something like that, right?
It's really - I mean, 100 million is not peanuts, but compared to what other companies invest to achieve this kind of goal, it kind of is. What this budget translates into is, depending on how you count, between three hundred and four hundred staff. That's the people who run all of this, including all the community outreach, all the social aspects, all the administrative aspects. Less than half of them are the engineers who do all this. And we have something like 2,500 bare-metal servers, which is not a lot for this kind of thing. That also means we have to design the software to be not just scalable but also quite efficient. The modern approach to scaling is usually to scale horizontally: make it so you can just spin up another virtual machine in some cloud service. But we run our own servers, so we can design to scale horizontally, but it means ordering hardware and setting it up, and that's going to take half a year or so. And we don't actually have that many people who do this, so scalability and performance are also important factors when designing the software. Okay. Before I dive into what we are actually doing - any questions? This one in the back. Wait for the mic, please.
Q: Hi!
Daniel: Hello.
Q: So, you said you don't have that many people, but how many do you actually have?
Daniel: It's something like 150 engineers worldwide. It always depends on what you count, right? Do you count engineers who work on the native apps? Do you count engineers who work on the Wikimedia cloud services? We actually do have cloud services - we offer them to the community to run their own things, but we don't run our stuff on other people's cloud.
So depending on how you count, and whether you count the people working here in Germany for Wikimedia Germany, which is technically a separate organization, it's something like 150 engineers.
Q: Thanks!
Q: I'm interested: what are the reasons that you don't run on other people's services, like on the cloud? I mean, then it would be easy to scale horizontally, right?
Daniel: Well, one reason is being independent, right? Imagine we ran all our stuff on Amazon's infrastructure, and then maybe Amazon doesn't like the way the Wikipedia article about Amazon is written - what do we do? Maybe they shut us down, maybe they make things very expensive, maybe they make things very painful for us, maybe some kind of self-censorship mechanism starts happening, and we want to avoid that. There are thoughts about this, like: maybe we can do it at least for development infrastructure and CI, not for production; or maybe we can run stuff in the cloud services of more than one vendor, so we spread out and are not reliant on a single company. We are thinking about these things, but so far the way to actually stay independent has been to run our own servers.
Q: You've been talking about scalability and changing the architecture. That kind of seems to imply that there's a problem with scaling at the moment, or that it's foreseeable that things are not going to work out if you just keep doing what you're doing. Can you maybe elaborate on that?
Daniel: So, I think there are two sides to this. On the one hand, the reason I mentioned it is just that a lot of things that are really easy to do on my machine - "works on my machine", right - are really hard to do at scale. That's one aspect.
The other aspect is that MediaWiki is pretty much a PHP monolith, and that means scaling it always means copying the whole monolith. Breaking it down so you have smaller units that you can scale individually - saying, I don't know, I need more instances for authentication handling, or something like that - would be more efficient, right, because you have higher granularity: you can scale just the things you actually need. But that of course needs rearchitecting. It's not like things are going to explode if we don't do it very soon; there's no urgent problem there. The reason for us to rearchitect is more to gain flexibility in development, because if you have a monolith that is pretty entangled, code changes are risky and take a long time.
Q: How many people work on product design or user experience research, to sit down with users and try to understand what their needs are, and proceed from there?
Daniel: Across the organization... I don't have an exact number, something like five.
Audience: Do you think that's sufficient?
Herald: The question was whether it's sufficient.
Daniel: Probably not? But that's more people than we have for database administration, and that's also not sufficient.
Herald: Are there further questions? I don't think so.
Daniel: Okay. So, one of the things that holds us back a bit is that there are literally thousands of extensions for MediaWiki, and the extension mechanism is heavily reliant on hooks, so basically on callbacks. And we have - I don't have a picture, I have a link here - we have a great number of these. So, you see, each paragraph is basically documenting one callback that you can use to modify the behavior of the software, and, I mean, I have never counted, but it's something like a thousand?
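A hook mechanism like the one described can be modeled in a few lines. This is a simplified sketch; the hook name and handler here are invented, though MediaWiki hooks do follow the convention that a handler returning false aborts the action:

```python
# Simplified model of a hook/callback extension mechanism: core code
# announces named events, extensions register callbacks, and every
# registered callback becomes a de-facto stable interface.

from collections import defaultdict
from typing import Callable

_hooks: dict[str, list[Callable]] = defaultdict(list)

def register(name: str, callback: Callable) -> None:
    _hooks[name].append(callback)

def run_hook(name: str, *args) -> bool:
    """Run callbacks in order; returning False aborts the action."""
    for callback in _hooks[name]:
        if callback(*args) is False:
            return False
    return True

# An "extension" vetoing page saves that contain spam (hook name invented):
register("PageSaveHook", lambda title, text: "spam" not in text)

print(run_hook("PageSaveHook", "Main Page", "Hello world"))   # True
print(run_hook("PageSaveHook", "Main Page", "buy spam now"))  # False
```

Every argument passed to every such hook is something external code may now depend on, which is how a thousand hook points become a thousand nails in the architecture.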
And all of them are of course interfaces to software that is maintained externally, so they have to be kept stable. And if you have a large chunk of software that you want to restructure, but you have a thousand fixed points that you can't change, things become rather difficult. These hook points act like nails in the architecture, and you kind of have to wiggle around them - it's fun. We are working to change that. We want to rearchitect it so that the interface exposed to these hooks becomes much narrower, and the things these callback functions can do are much more restricted. There's currently an RfC open for this - it has been open for a while, actually. The problem is that in order to assess whether the proposal is actually viable, you have to survey all the current users of these hooks and make sure the use cases are still covered in the new system, and, yeah, we have like a thousand hook points and a thousand extensions, so that's quite a bit of work. Another thing I'm currently working on is establishing a stable interface policy. This may sound pretty obvious - it contains a lot of pretty obvious things, like: if you have a class and there's a public method, then that's a stable interface, it will not just change without notice; we have a deprecation policy, and all that. But if you have worked with extensible systems that rely on the mechanisms of object-oriented programming, you may have come across the question whether a protected method is part of the stable interface of the software or not. Or maybe the constructor? If you have worked in environments that use dependency injection, the idea is basically that the constructor signature should be able to change at any time - but then you have extensions that subclass your classes, and things break.
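The constructor problem can be demonstrated in a sketch (all class names invented): an extension subclass hard-codes the core constructor signature, and a dependency-injection refactor of the core class breaks it.

```python
# Sketch: why constructor signatures make a poor extension interface.
# The "core" class originally took just a title; a dependency-injection
# refactor added two injected services. Names are invented.

class PageRenderer:
    """Core class after the refactor: services are now injected."""
    def __init__(self, title: str, parser: object, permission_checker: object):
        self.title = title
        self.parser = parser
        self.permission_checker = permission_checker

class FancyRenderer(PageRenderer):
    """An 'extension' written against the old one-argument signature."""
    def __init__(self, title: str):
        super().__init__(title)  # was valid before the refactor

try:
    FancyRenderer("Main Page")
except TypeError as err:
    print("extension broke:", err)
```

Declaring constructors explicitly unstable, as the policy proposes, makes this breakage the extension's problem to absorb rather than a reason core can never be refactored.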
So, this is why we are trying to establish a much more restrictive stable interface policy, one that would make explicit things like constructor signatures not actually being stable, and that gives us a lot more wiggle room to restructure the software. MediaWiki itself has grown as a piece of software for the last 18 years or so and, at least in the beginning, was mostly created by volunteers. And in a monolithic architecture there's a great tendency to just find and grab the thing you want to use and use it. Which leads to structures like this one: everything depends on everything. And if you change one bit of code, everything else may or may not break. And if you don't have great test coverage at the same time, any change becomes very risky, and you have to do a lot of manual testing, a lot of manual digging around, touching a lot of files. So for the last year, year and a half, we have been running a concerted effort to cut the worst ties, to decouple the things that have the most impact. There are a few objects in the software - for instance one that represents the user and one that represents a title - that are used everywhere, and the way they are currently implemented also means that they depend on everything, and that of course is not a good situation. A similar idea on a higher level is decomposition of the software. The decoupling was about the software architecture; this is about the system architecture: breaking up the monolith itself into multiple services that serve different purposes. The specifics of this diagram are not really relevant to this talk; it's more to give you an impression of the complexity and the sort of work we are doing there.
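The decoupling of objects like the title class can be sketched as extracting a narrow interface, so callers depend on the few methods they actually use rather than on the whole entangled class. This is a simplified model under invented names; MediaWiki's real refactoring went in a similar direction with interfaces such as LinkTarget:

```python
# Sketch: decoupling a "god object" by depending on a narrow interface.
# Class and method names are illustrative, not MediaWiki's real API.

from typing import Protocol

class LinkTargetLike(Protocol):
    """The few methods most callers actually need from a full Title."""
    def get_namespace(self) -> int: ...
    def get_text(self) -> str: ...

class Title:
    # Imagine dozens more methods here, pulling in permissions,
    # caching, parsing, i18n... - the "depends on everything" problem.
    def __init__(self, namespace: int, text: str):
        self._ns, self._text = namespace, text
    def get_namespace(self) -> int:
        return self._ns
    def get_text(self) -> str:
        return self._text

def render_link(target: LinkTargetLike) -> str:
    # Depends only on the narrow interface, so this code no longer
    # drags the whole Title dependency graph along with it.
    return f'<a href="/wiki/{target.get_text()}">{target.get_text()}</a>'

print(render_link(Title(0, "Main_Page")))
```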
The idea is that perhaps we could split out certain functionality into its own service, a separate application - maybe move all the search functionality into something separate and self-contained. But then the question is how you compose this into the final user interface again; at some point these things have to get composed together. And again, this is a fairly trivial issue if you only want this to work on your machine, or you only need to serve a hundred users or something. But doing this at scale, at a rate of something like 10,000 page views a second - I said a hundred thousand requests earlier, but that includes resources, icons, CSS and all that - then you have to think pretty hard about what you can cache and how you can recombine things without having to recompute everything. This is something we are currently looking into: coming up with an architecture that allows us to compose and recombine the output of different background services. Okay. Before I started this talk I said I would probably use roughly half of my time going through the presentation, and I guess I hit that spot on. So, this is all I have prepared, but I'm happy to talk more about the things I said, or any other aspects of this you may be interested in. Any comments or questions? Oh! Three already.
Q: First of all, thanks a lot for the presentation - a really interesting case of a legacy system - and thanks for the honesty. It was really interesting as a software engineer to see how that works. I have a question about decoupling: your system is probably enormous, so how do you find, so to say, the most evil parts, the ones that really have to be decoupled? Do you use other software for this, with metrics and stuff, or do you just know, kind of intuitively...
Daniel: Yeah, this is quite interesting, and maybe we can talk about it a bit more in depth later. Very quickly: it's a combination. On the one hand you just have the anecdotal experience of what is actually annoying when you work with the software and try to fix things. On the other hand, I have tried to find good tooling for this, and the existing tooling tends to die when you run it against our code base. One of the things you are looking for is cyclic dependencies, but the number of possible cycles in a graph grows exponentially with the number of nodes, and if you have a pretty tightly knit graph, that number quickly goes into the millions. And, yeah, the tool just goes to 100% CPU and never returns. So I have spent quite a bit of time trying to find heuristics to get around that - it was a lot of fun. We can talk about that later, if you like. Okay, thanks.
Q: So what exactly is this Wikidata you mentioned before? Is it an extension or a completely different project?
Daniel: So, there's an extension called Wikibase, which implements this, well, I would say ontological modeling interface for MediaWiki, and that is used to run a website called Wikidata, which has something like 30 million items modeled that describe the world and serve as a machine-readable data back-end to the other Wikimedia projects. I used to work on that project for Wikimedia Germany; I moved on to do different things a couple of years ago. Lukas here in front is probably the person most knowledgeable about the latest and greatest in Wikidata development.
Q: You talked briefly about test coverage...
Daniel: Sorry?
Q: You talked about test coverage.
Daniel: Yes.
Q: I would be interested in whether you have ramped up your efforts there to help you modernize, and what your current situation with test coverage is.
Daniel: Test coverage for MediaWiki core is below 50%.
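One heuristic for the cycle problem mentioned above: you don't have to enumerate cycles at all. Every cycle lies inside a strongly connected component, and SCCs can be computed in linear time, so any component with more than one node already flags a tangled region. A sketch, with an invented toy dependency graph:

```python
# Sketch: instead of enumerating every dependency cycle (exponential in
# the worst case), find strongly connected components in linear time.
# Every cycle lives inside an SCC, so multi-node SCCs flag tangled code.
# Uses Kosaraju's two-pass algorithm with an iterative DFS.

def sccs(graph: dict[str, list[str]]) -> list[set[str]]:
    order: list[str] = []
    visited: set[str] = set()

    def dfs(start: str, g: dict, out: list) -> None:
        """Iterative post-order DFS, appending finished nodes to out."""
        stack = [(start, iter(g.get(start, [])))]
        visited.add(start)
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(g.get(nxt, []))))
                    break
            else:
                stack.pop()
                out.append(node)

    for node in graph:                       # pass 1: finish order
        if node not in visited:
            dfs(node, graph, order)

    reverse: dict[str, list[str]] = {}       # transpose the graph
    for node, edges in graph.items():
        for target in edges:
            reverse.setdefault(target, []).append(node)

    visited.clear()
    components = []
    for node in reversed(order):             # pass 2: on the transpose
        if node not in visited:
            comp: list[str] = []
            dfs(node, reverse, comp)
            components.append(set(comp))
    return components

# Toy graph: User -> Title -> Parser -> User is a cycle; Logger is clean.
deps = {"User": ["Title"], "Title": ["Parser"], "Parser": ["User"], "Logger": []}
print([sorted(c) for c in sccs(deps) if len(c) > 1])
```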
In some parts it's below 10%, which is very worrying. One thing we started looking into about half a year ago is, instead of writing unit tests for all the code that we actually want to throw away before we touch it, trying to improve the test coverage using integration tests at the API level. So we are currently in the process of writing a suite of tests, not just for the API modules, but for all the functionality, all the application logic behind the API. That will hopefully cover most of the relevant code paths and give us confidence when we refactor the code.
Q: Thanks.
Herald: Other questions?
Q: So, you said that you have this legacy system and eventually you have to move away from it, but are there any plans for the near future? At some point you have to cut the current infrastructure for your extensions and so on, and it's a hard cut, I see. Are there any plans to build it up from scratch, or what are the plans?
Daniel: Yeah, we are not going to rewrite from scratch - that's a pretty surefire way to just kill the system. We will have to make some tough decisions about backwards compatibility, and probably reconsider some of the requirements and constraints we have with respect to the platforms we run on and also the platforms we serve. One of the things we have been very careful to do in the past, for instance, is to make sure that you can do pretty much everything in MediaWiki with no JavaScript on the client side. That requirement is likely to drop. You will still be able to read without any JavaScript, of course, but the extent of the functionality you get without client-side JavaScript is likely to be greatly reduced - that kind of thing. Also, we will probably end up breaking compatibility with at least some of the user-created tools. Hopefully we can offer good alternatives: good APIs, good libraries that people can actually port to, that are less brittle.
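The API-level testing idea from the coverage answer can be sketched like this. The wiki "application" here is a toy stand-in, not MediaWiki's actual Action API; the point is that the test exercises behavior only through the API contract, so the internals behind it are free to be refactored:

```python
# Sketch: testing through the API surface instead of unit-testing the
# internal classes slated for refactoring. Handlers here are invented
# stand-ins for real API modules.

def api_edit(pages: dict, title: str, text: str) -> dict:
    """Toy stand-in for an edit API handler."""
    created = title not in pages
    pages[title] = text
    return {"result": "Success", "new": created}

def api_get(pages: dict, title: str) -> dict:
    """Toy stand-in for a page-content API handler."""
    if title not in pages:
        return {"error": "missingtitle"}
    return {"title": title, "text": pages[title]}

def test_edit_then_read():
    pages = {}
    assert api_edit(pages, "Sandbox", "Hello")["new"] is True
    assert api_get(pages, "Sandbox")["text"] == "Hello"
    assert api_edit(pages, "Sandbox", "Hello again")["new"] is False
    assert api_get(pages, "Missing") == {"error": "missingtitle"}

test_edit_then_read()
print("API-level tests passed")
```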
I hope that will motivate people and maybe repay them a bit for the pain of having their tools broken, if we can give them something that is more stable, more reliable, and hopefully even nicer to use. So, yeah, it's small increments, bits and pieces all over the system; there's no great master plan, no single big change to point to, really.
Herald: Okay, further questions?
Daniel: I plan to just sit outside at the table here later, if you want to come and chat, so we can also do that there.
Herald: Okay, so, last call: are there any other questions? It does not appear so, so I'd like to ask for a huge applause for Daniel for this talk.
Applause
36C3 postroll music
Subtitles created by c3subtitles.de in the year 2020. Join, and help us!