36C3 Wikipaka WG: Modernizing Wikipedia

0:00 - 0:21

36C3 preroll music
0:21 - 0:25

Daniel: Good morning! I'm glad you all
made it here this early on the last day. I
0:25 - 0:32

know it can't be easy; it wasn't easy for
me. I have to warn you that the way I
0:32 - 0:36

prepared for this talk is a bit
experimental. I didn't make a slide set I
0:36 - 0:45

just made a mind map and I'll just click
through it while I talk to you. So,
0:45 - 0:51

this talk is about modernizing Wikipedia.
As you probably have noticed, visiting
0:51 - 0:58

Wikipedia can feel a bit like visiting a
website from 10-15 years ago. But before I
0:58 - 1:05

talk about any problems or things to
improve, I first want to revisit that the
1:05 - 1:12

software and the infrastructure we
build around it has been running Wikipedia
1:12 - 1:20

and its sister sites for the last... well
nearly 19 years now and it's extremely
1:20 - 1:32

successful. We serve 17 billion page
views a month, yes?
1:32 - 1:41

Person in the audience: Could you make it
louder or speak up and also make the image
1:41 - 1:43

bigger?
1:43 - 1:44

inaudible dialogue
1:44 - 1:46

Daniel: Is this better? Like if I speak up
I will lose my voice in 10 minutes, it's
1:46 - 1:56

already in it, no it's fine. We have
technology for this. I can... the light
1:56 - 2:05

doesn't help, yeah the contrast could be
better. Is it better like this? Okay cool.
2:05 - 2:14

All right so yeah we are serving 17
billion page views a month, which is quite
2:14 - 2:20

a lot. Wikipedia exists in about 100
languages. If you attended the talk about
2:20 - 2:24

the Wikimedia infrastructure yesterday, we
talked about 300 languages. We actually
2:24 - 2:30

support 300 languages for localization but
we have Wikipedia in about 100, if I'm not
2:30 - 2:39

completely off. I find this picture quite
fascinating. This is a visualization of
2:39 - 2:44

all the places in the world that are
described on Wikipedia and sister projects
2:44 - 2:49

and I find this quite impressive although
it's also a nice display of cultural bias
2:49 - 3:01

of course. We, that is Wikimedia
Foundation, run about 900 to 1,000 wikis
3:01 - 3:07

depending on how you count, but there are
many many more MediaWiki installations
3:07 - 3:11

out there, some of them big and many many
of them small. We have actually no idea
3:11 - 3:17

how many small instances there are. So
it's a very powerful very flexible and
3:17 - 3:24

versatile piece of software but, you know,
sometimes it can feel like... you can do a
3:24 - 3:28

lot of things with it, right, but
sometimes it feels like it's a bit
3:28 - 3:42

overburdened and maybe you should look at
improving the foundations. So one of the
3:42 - 3:48

things that make MediaWiki great but also
sometimes hard to use is that kind of
3:48 - 3:53

everything is text, everything is markup,
everything is done with wikitext,
3:53 - 4:03

which has grown in complexity over the
years, so if you look at the anatomy of a
4:03 - 4:09

wiki page it can be a bit daunting. You
have different syntax for markup and
4:09 - 4:16

different kinds of transclusion or
templates and media and some things
4:16 - 4:22

actually, you know, get displayed in
place, some things show up in a completely
4:22 - 4:26

different place on the page it can be
rather confusing and daunting for
4:26 - 4:32

newcomers. And also things like having a
conversation just talking to people like,
4:32 - 4:36

you know, having a conversation thread
looks like this. You open the page you
4:36 - 4:41

look through the markup and you indent to
make a conversation thread and then you
4:41 - 4:43

get confused about the indenting and
someone messes with the formatting and
4:43 - 4:52

it's all excellent. There have been many
attempts over the years to improve the
4:52 - 5:00

situation. We have things like Echo, which
notifies you, for instance when someone
5:00 - 5:09

mentions your name or someone... It is
also used to welcome people and do this
5:09 - 5:12

kind of achievement unlocked
notifications: hey, you did your first
5:12 - 5:20

edit, this is great welcome! To make
people a bit more engaged with the system
5:20 - 5:24

but it's really mostly improvements around
the fringes. We have had a system called
5:24 - 5:31

Flow for a while to improve the way
conversations work. So you have more like
5:31 - 5:38

a thread structure that the software
actually knows about but then there are
5:38 - 5:42

many, well quite a few people who have
been around for a while that are very used
5:42 - 5:47

to the manual system and also there's a
lot of tools to support this manual system
5:47 - 5:53

which of course are incompatible with
making things more modern. So we use this
5:53 - 5:56

for instance on MediaWiki.org which is a
site which is basically a self
5:56 - 6:03

documentation site of MediaWiki but on
most Wikipedias this is not enabled or at
6:03 - 6:15

least not used by default everywhere. The
biggest attempt to move away from the text
6:15 - 6:23

only approach is Wikidata, which we
started in 2012. The idea of Wikidata of
6:23 - 6:30

course, if you didn't attend the many great
talks we had about it here over the
6:30 - 6:36

course of the Congress, is a way to
basically model the world using structured
6:36 - 6:45

data, using a semantic approach instead of
natural language which has its own
6:45 - 6:51

complexities but at least it's a way to
represent the knowledge of the world in a
6:51 - 6:57

way that machines can understand. So this
would be an alternative to wikitext but
6:57 - 7:09

still the vast majority of things
especially on Wikipedia are just markup.
7:09 - 7:14

And this markup is pretty powerful and
there's lots of ways to extend it and to
7:14 - 7:21

do things with it. So a lot of things on
MediaWiki are just DIY, just do it
7:21 - 7:29

yourself. Templates are a great example of
this. Infoboxes of course, the nice blue
7:29 - 7:35

boxes here on the right side of pages, are
done using templates but these templates
7:35 - 7:39

are just for formatting, there is no data
processing, there's no database or
7:39 - 7:48

structured data backing them. It's just
basically, you know, it's still just
7:48 - 7:57

markup. It's still... you have a predefined
layout but you're still feeding it text, not
7:57 - 8:05

data. You have parameters but the values
of the parameters are still again maybe
8:05 - 8:12

templates or links or you have markup in
them, like you know HTML line breaks and
8:12 - 8:19

stuff. So it's kind of semi structured.
And this of course is also used to do
8:19 - 8:24

things like workflow. The template... Oh
no, this was actually an infobox, wrong
8:24 - 8:34

picture, wrong capture. This is also used
to do workflows, so if a page on Wikipedia
8:34 - 8:40

gets nominated for deletion you manually
put a template on the page that defines
8:40 - 8:45

why this is supposed to be deleted and
then you have to go to a different page
8:45 - 8:49

and put a different template there, giving
more explanation and this again is used
8:49 - 8:55

for discussion. It's a lot of structure
created by the community and maintained by
8:55 - 9:03

the community, using conventions and tools
built on top of what is essentially just a
9:03 - 9:11

pile of markup. And because doing all this
manually is kind of painful, early on
9:11 - 9:17

we created a system to allow people to add
JavaScript to the site, which is then
9:17 - 9:27

maintained on wiki pages by the community
and it can tweak and automate. But again,
9:27 - 9:31

it doesn't really have much to work with,
right? It basically messes with whatever
9:31 - 9:35

it can, it directly interacts with the DOM
of the page, whenever the layout of the
9:35 - 9:41

software changes, things break. So this is
not great for compatibility but it's
9:41 - 9:55

used a lot and it is very important for
the community to have this power. Sorry, I
9:55 - 10:00

wish there was a better way to show these
pictures. Okay, that's just to give you an
10:00 - 10:05

idea of what kind of thing is implemented
that way and maintained by the community
10:05 - 10:10

on their site. One of the problems we have
with that is: these are bound to a wiki
10:10 - 10:19

and I just told you that we run over 900
of these not over 9,000 and it would be
10:19 - 10:26

great if you could just share them between
wikis but we can't. And again, there have
10:26 - 10:31

been... we have been talking about it a
lot and it seems like it shouldn't be so
10:31 - 10:37

hard, but you kind of need to write these
tools differently, if you want to share
10:37 - 10:40

them across sites, because different sites
use different conventions, they use
10:40 - 10:46

different templates. Then it just doesn't
work and you actually have to write decent
10:46 - 10:51

software that uses internationalization if
you want to use it across wikis. While
10:51 - 10:55

these are usually just you know one-off
hacks with everything hard-coded we would
10:55 - 10:58

have to put in place an
internationalization system and it's
10:58 - 11:03

actually a lot of effort and there's a lot
of things that are actually unclear about
11:03 - 11:15

it. So, before I dive more deeply into the
different things that will make it hard to
11:15 - 11:21

improve on the current situation and the
things that we are doing to improve it do
11:21 - 11:27

we have any questions or do you have any
other - do you have any things you may
11:27 - 11:35

find particularly, well, annoying or
particularly outdated, when interacting
11:35 - 11:41

with Wikipedia? Any thoughts on that?
Beyond what I just said?
11:41 - 11:49

Microphone: The strict separation, just in
Wikipedia, between mobile layout and
11:49 - 11:54

desktop layout.
Daniel: Yeah. So, actually having a
11:54 - 12:02

reactive layout system that would just
work for mobile and desktop in the same
12:02 - 12:09

way and allowing the designers and UX
experts, who work on the system to just do
12:09 - 12:15

this once and not two or maybe even three
times - because of course we also have
12:15 - 12:21

native applications for different
platforms - would be great and it's
12:21 - 12:24

something that we're looking into at the
moment. But it's not, you know, it's not
12:24 - 12:30

that easy. We could build a completely new
system, that does this but then again you
12:30 - 12:33

would be telling people: "You can no
longer use the old system", but now they
12:33 - 12:39

have built all these tools that rely on
how the old system works and you have to
12:39 - 12:52

port all of this over so there's a lot of
inertia. Any other thoughts? Everyone is
12:52 - 13:04

still asleep that's excellent. So I can
continue. So, another thing that makes it
13:04 - 13:11

difficult to change how MediaWiki works or
to improve it is that we are trying to do,
13:11 - 13:19

well, to be at least two things at once: on
the one hand we are running a top 5
13:19 - 13:24

website and serving over 100,000 requests
per second using the system and on the
13:24 - 13:31

other hand, at least until now, we have
always made sure that you can just
13:31 - 13:34

download MediaWiki and install it on a
shared hosting platform you don't even
13:34 - 13:39

need root on the system, right? You don't
even need administrative privileges you
13:39 - 13:45

can just set it up and run it in your web
space and it will work. And, having the
13:45 - 13:52

same piece of software do both, run in a
minimal environment and run at scale, is
13:52 - 13:55

rather difficult and it also means that
there's a lot of things that we can't
13:55 - 14:02

easily do, right? All this modern
microservice architecture, separate front-end
14:02 - 14:09

and back-end systems, all of that means
that it's a lot more complicated to set up
14:09 - 14:16

and needs more knowledge or more
infrastructure to set up and so far that
14:16 - 14:20

meant we can't do it, because so far there
was this requirement that you should
14:20 - 14:24

really be able to just run it on your
shared hosting. And we are currently
14:24 - 14:30

considering to what extent we can continue
this, I mean, container based hosting is
14:30 - 14:35

picking up. Maybe this is an alternative
it's still unclear but it seems like this
14:35 - 14:46

is something that we need to reconsider.
Yeah, but if we make this harder to do
14:46 - 14:53

then a lot of current users of MediaWiki
would maybe not, well, maybe no longer
14:53 - 14:57

exist or at least would not exist as they
do now, right. You probably have seen
14:57 - 15:05

this nice MediaWiki instance the Congress
wiki. Which - with a completely customized
15:05 - 15:10

skin and a lot of extensions installed to
allow people to define their sessions
15:10 - 15:14

there and making sure these sessions
automatically get listed and get put into
15:14 - 15:21

a calendar - this is all done using
extensions, like Semantic MediaWiki, that
15:21 - 15:34

allow you to basically define queries in
the wiki text markup. Yeah, another thing
15:34 - 15:42

that, of course, slows down development is
that Wikimedia does engineering on a,
15:42 - 15:48

well, comparatively a shoestring budget,
right? The budget of the Wikimedia
15:48 - 15:52

Foundation, the annual budget is something
like a hundred million dollars, that
15:52 - 15:58

sounds like a lot of money, but if you
compare it to other companies running a
15:58 - 16:03

top five or top ten website it's like two
percent of their budget or something like
16:03 - 16:11

that, right? It's really, I mean, 100
million is not peanuts but compared to
16:11 - 16:17

what other companies invest to achieve
this kind of goal it kind of is. So, what
16:17 - 16:22

this budget translates into is something
like 300, depending on how you count,
16:22 - 16:29

between three hundred and four hundred
staff. So, this is the people who run all
16:29 - 16:32

of this, including all the community
outreach all the social aspects all the
16:32 - 16:41

administrative aspects. Less than half of
these are the engineers who do all this.
16:41 - 16:51

And we have like, something like 2,500
servers, bare-metal, so, which is not a
16:51 - 16:58

lot for this kind of thing. Which also
means that we have to design the software
16:58 - 17:07

to be not just scalable but also quite
efficient. The modern approach to scaling
17:07 - 17:12

is usually: scale horizontally, make it so
you can just spin up another virtual
17:12 - 17:19

machine in some cloud service, but, yeah,
we run our own service, we run our own
17:19 - 17:24

servers, so we can design to scale
horizontally, but it means ordering
17:24 - 17:32

hardware and setting it up and it's going
to take half a year or so. And we don't
17:32 - 17:38

actually have that many people who do
this, so, scalability and performance are
17:38 - 17:49

also important factors when designing the
software. Okay. Before I dive into what we
17:49 - 18:04

are actually doing - any questions? This
one in the back. Wait for the mic, please.
18:04 - 18:07

In the very...
Q: Hi!
18:07 - 18:13

Daniel: Hello.
Q: So, you said you don't have that many
18:13 - 18:23

people, but how many do you actually have?
Daniel: For... it's something like 150 engineers
18:23 - 18:27

worldwide. It always depends on what you
count, right? So you count the people, who
18:27 - 18:32

- do you count engineers, who work on the
native apps, do you count engineers, who
18:32 - 18:37

work on the Wikimedia cloud services -
actually we do have cloud services, we
18:37 - 18:41

offer them to the community to run their
own things, but we don't run our stuff on
18:41 - 18:46

other people's cloud. Yeah, so depending
on how you count or something and whether
18:46 - 18:50

you count the people working here in
Germany for Wikimedia Germany, which is a
18:50 - 18:58

separate organization technically - it's
something like 150 engineers.
18:58 - 19:08

Q: Thanks!
Q: I'm interested: What are the reasons
19:08 - 19:14

that you don't run on other people's
services like on the cloud. I mean, then
19:14 - 19:17

it will be easy to scale horizontally,
right?
19:17 - 19:25

Daniel: There's, well, one reason is being
independent, right? If we, yeah, I imagine
19:25 - 19:32

we ran all our stuff on Amazon's
infrastructure and then maybe Amazon
19:32 - 19:38

doesn't like the way that the Wikipedia
article about Amazon is written - what do
19:38 - 19:42

we do, right? Maybe they shut us down,
maybe they make things very expensive,
19:42 - 19:47

maybe they make things very painful for
us, maybe there is at least some, like,
19:47 - 19:54

self-censorship mechanism happening and we
want to avoid that. There are
19:54 - 19:58

thoughts about this; there are thoughts
like maybe we can do this at least for
19:58 - 20:04

development infrastructure and CI, not for
production or maybe we can make it so that
20:04 - 20:12

we run stuff in the cloud services by more
than one vendor, so we basically we spread
20:12 - 20:18

out so we are not reliant on a single
company. We are thinking about these
20:18 - 20:22

things but so far the way to actually stay
independent has been to run our own
20:22 - 20:28

servers.
Q: You've been talking about scalability
20:28 - 20:35

and changing the architecture, that kind
of seems to imply to me that there's a
20:35 - 20:42

problem with scaling at the moment or that
it's foreseeable that things are not gonna
20:42 - 20:47

work out if you just keep doing what
you're doing at the moment. Can you maybe
20:47 - 20:52

elaborate on that.
Daniel: So, there's, I think there's two sides
20:52 - 20:57

to this. On the one hand the reason I
mentioned it is just that a lot of things
20:57 - 21:02

that are really easy to do basically for
me, right, "works on my machine", are really
21:02 - 21:09

hard to do if you want to do them at
scale. That's one aspect. The other aspect
21:09 - 21:17

is MediaWiki is pretty much a PHP monolith
and that means scaling it always means
21:17 - 21:24

copying the monolith and breaking it down
so you have smaller units that you can
21:24 - 21:29

scale and just say, yeah, I don't know, I
need more instances for authentication
21:29 - 21:34

handling or something like that. That
would be more efficient, right, because
21:34 - 21:41

you have higher granularity, you can just
scale the things that you actually need
21:41 - 21:48

but that of course needs rearchitecting.
It's not like things are going to explode
21:48 - 21:53

if we don't do that very soon, it's not,
so there's not like an urgent problem
21:53 - 21:58

there. The reason for us to rearchitect is
more, to gain more flexibility in
21:58 - 22:03

development, because if you have a
monolith that is pretty entangled, code
22:03 - 22:16

changes are risky and take a long time.
Q: How many people work on product design
22:16 - 22:25

or like user experience research to, like,
sit down with users and try to understand
22:25 - 22:28

what their needs are and from there
proceed.
22:28 - 22:33

A: Across... I don't have an exact number,
something like five.
22:33 - 22:38

Audience: Do you think that's sufficient?
Herald: The question was, whether it's
22:38 - 22:47

sufficient. So just...
Daniel: Probably not? But it's more than,
22:47 - 22:50

that's more people than we have for
database administration, and that's also
22:50 - 23:06

not sufficient.
Herald: Are there further questions? I don't
23:06 - 23:16

think.
Daniel: Okay. So, one of the things, that
23:16 - 23:20

holds us back a bit, is that there's
literally thousands of extensions for
23:20 - 23:27

MediaWiki and the extension mechanism is
heavily reliant on hooks, so basically on
23:27 - 23:40

callbacks. And, we have - I don't have a
picture, I have a link here - we have a
23:40 - 23:44

great number of these. So, you see, each
paragraph is basically documenting one
23:44 - 23:52

callback that you can use to modify the
behavior of the software and, I mean,
23:52 - 23:59

there's, I have never counted, but
something like a thousand? And all of them
23:59 - 24:08

are of course interfaces to extra - to
software that is maintained externally, so
24:08 - 24:13

they have to be kept stable and if you
have a large chunk of software that you
24:13 - 24:17

want to restructure but you have a
thousand fixed points that you can't
24:17 - 24:23

change, things become rather difficult.
It's basi.. yeah, these hook points kind
24:23 - 24:28

of, like, they act like nails in the
architecture and then you kind of have to
24:28 - 24:37

wiggle around them - it's fun. We are
working to change that. We want to
24:37 - 24:44

architect it so the interface that is
exposed to these hooks becomes much more
24:44 - 24:51

narrow and the things that these hooks or
these callback functions can do is much
24:51 - 24:59

more restricted. There's currently an RFC
open for this, has been open for a while
24:59 - 25:05

actually. The problem is that in order to
assess whether the proposal is actually
25:05 - 25:12

viable you have to survey all the current
users of these hooks and make sure that we
25:12 - 25:16

can, the use case is still covered in the
new system and, yeah, we have like a
25:16 - 25:21

thousand hook points and we have like a
thousand extensions, that's quite a bit of
25:21 - 25:31

work. Another thing that I'm currently
working on is establishing a stable
25:31 - 25:37

interface policy. This may sound pretty
obvious - it has a lot of pretty obvious
25:37 - 25:42

things like, yeah, if you have a class and
there's a public method then that's a
25:42 - 25:46

stable interface it will not just change
without notice, we have a deprecation policy
25:46 - 25:53

and all that. But if you have worked with
extensible systems that rely on the
25:53 - 25:58

mechanisms of object-oriented programming,
you may have come across the question
25:58 - 26:05

whether a protected method is part of this
stable interface of the software or not,
26:05 - 26:10

or maybe the constructor? I don't know, if
you have worked in environments that use
26:10 - 26:16

dependency injection the idea is basically
that the constructor signature should be
26:16 - 26:21

able to change at any time but then you
have extensions that are subclassing and
26:21 - 26:26

things break. So, this is why we are
trying to establish a much more
26:26 - 26:33

restrictive stable interface policy, that
would make explicit things like
26:33 - 26:37

constructor signatures actually not being
stable and that gives us a lot more wiggle
26:37 - 26:51

room to restructure the software.
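As a rough illustration of the constructor problem, here is a minimal Python sketch. MediaWiki itself is PHP, and the names `PageStoreV1`, `PageStoreV2`, `AuditedPageStore` and the `cache` parameter are hypothetical, made up for this example. It only shows the general mechanism: an extension subclass that forwards arguments to the parent constructor stops working once core adds a constructor dependency.

```python
"""Sketch: why constructor signatures are hard to keep stable under subclassing.

All names here are hypothetical; this is not MediaWiki code.
"""


class PageStoreV1:
    """'Core' class as first released: one injected dependency."""

    def __init__(self, db):
        self.db = db


class PageStoreV2:
    """The same core class after a refactoring added a second dependency."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache


def make_extension_class(core_class):
    """Build the 'extension' subclass against a given core version."""

    class AuditedPageStore(core_class):
        def __init__(self, db):
            # Written against V1: forwards exactly one argument.
            super().__init__(db)
            self.audit_log = []

    return AuditedPageStore


def extension_works_with(core_class):
    """Return True if the extension subclass can still be constructed."""
    extension_class = make_extension_class(core_class)
    try:
        extension_class(db="fake-db")
        return True
    except TypeError:
        # Raised because the parent constructor now requires 'cache'.
        return False


if __name__ == "__main__":
    print("works with V1:", extension_works_with(PageStoreV1))
    print("works with V2:", extension_works_with(PageStoreV2))
```

Declaring constructor signatures explicitly unstable tells extension authors up front that this kind of subclass is not covered by the compatibility promise.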
MediaWiki itself has grown as a software
26:51 - 26:59

for the last 18 years or so and, at least
in the beginning, was mostly created by
26:59 - 27:06

volunteers. And in a monolithic
architecture there's a great tendency to
27:06 - 27:11

just, you know, find and grab the thing
that you want to use and just use it.
27:11 - 27:19

Which leads to, well, structures like this
one: everything depends on everything. And
27:19 - 27:26

if you change one bit of code everything
else may or may not break. And with, yeah.
27:26 - 27:31

And if you don't have great test coverage
at the same time this just makes it so
27:31 - 27:35

that any change becomes very risky and you
have to do a lot of manual testing a lot
27:35 - 27:44

of manual digging around, touching a lot
of files and we are for the last year,
27:44 - 27:51

year and a half we have started a
concerted effort to tie the worst - to cut
27:51 - 27:58

the worst ties, to decouple these things
that are, basically, that have most impact.
27:58 - 28:03

there's a few objects in the software that
rep... - for instance one that represents
28:03 - 28:08

the user and one that represents a title
that are used everywhere and the way
28:08 - 28:14

they're implemented currently also means
that they depend on everything and that of
28:14 - 28:30

course is not a good situation. On a,
well, a similar idea on a higher level is
28:30 - 28:34

decomposition of the software. So the
decoupling was about the software
28:34 - 28:40

architecture; this is about the system
architecture: breaking up the
28:40 - 28:45

monolith itself into multiple services that
serve different purposes. The specifics of
28:45 - 28:50

this diagram are not really relevant to
this talk. This is more to, you know, give
28:50 - 28:58

you an impression of the complexity and
the sort of work we are doing there. The
28:58 - 29:06

idea is that perhaps we could split out
certain functionality into its own service
29:06 - 29:11

into a separate application, like maybe
move all the search functionality into
29:11 - 29:17

something separate and self-contained, but
then the question is how do you, again,
29:17 - 29:23

compose this into the final user interface
- at some point these things have to get
29:23 - 29:28

composed together again - and again this
is a very trivial issue if you
29:28 - 29:32

only want this to work on your
machine or you only need to serve a
29:32 - 29:40

hundred users or something. But doing this
at scale doing it at the rate of something
29:40 - 29:45

like 10,000 page views a second, I said a
hundred thousand requests earlier but that
29:45 - 29:52

includes resources, icons, CSS and all
that. So, yeah, then you have to think
29:52 - 29:58

pretty hard about what you can cache and,
thank you, how you can recombine things
29:58 - 30:03

without having to recompute everything and
this is something that we are currently
30:03 - 30:09

looking into - coming up with an
architecture that allows us to compose and
30:09 - 30:23

recombine the output of different
background services. Okay. Before I
30:23 - 30:28

started this talk I said I would probably
roughly use half of my time going through
30:28 - 30:33

the presentation and I guess I just hit
that spot on. So, this is all I have
30:33 - 30:41

prepared but I'm happy to talk to you more
about the things I said or maybe any other
30:41 - 30:48

aspects of this that you may be interested
in. Any comments or questions? Oh!
30:48 - 30:57

Three already.
Q: First of all thanks a lot for the
30:57 - 31:03

presentation, such a really interesting
case of a legacy system and thanks for the
31:03 - 31:10

honesty. It was really interesting as a,
you know, software engineer to see how
31:10 - 31:15

that works. I have a question about
decoupling, so, I mean, I kind of, you
31:15 - 31:23

have like, probably your system is
enormous and how do you find, so to say,
31:23 - 31:29

the most evil, you know, parts which
sort of have to be decoupled. Do you use other
31:29 - 31:35

software, with, you know, this, like, what
a metrics and stuff or do you just know,
31:35 - 31:38

kind of intuitively..
Daniel: Yeah, it's actually, this is quite
31:38 - 31:45

interesting and maybe I can, maybe we can
talk about it a bit more in depth later.
31:45 - 31:49

Very quickly: it's a combination. On the
one hand you just have the anecdotal
31:49 - 31:53

experience of what is actually annoying
when you work with the software and you
31:53 - 31:59

try to fix it and on the other hand I try
to find good tooling for this and the
31:59 - 32:05

existing tooling tends to die when you
just run it against our code base. So, one
32:05 - 32:10

of the things that you are looking for are
cyclic dependencies but the number of
32:10 - 32:15

possible cycles in a graph grows
exponentially with the number of nodes. And
32:15 - 32:18

if you have a pretty tightly knit graph
that number quickly goes into the
32:18 - 32:27

millions. And, yeah, the tool just goes to
100% CPU and never returns. So, I spent
32:27 - 32:34

quite a bit of time trying to find
heuristics to get around that - was a lot
32:34 - 32:42

of fun. I can, yeah, we can talk about
that later, if you like. Okay, thanks.
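One well-known way to sidestep enumerating every cycle is to compute strongly connected components instead: every dependency cycle lies entirely inside one component, and Tarjan's algorithm finds all components in time linear in the size of the graph. The sketch below is only an illustration of that idea, not necessarily the heuristic Daniel used, and the class names in the example graph are hypothetical.

```python
"""Sketch: find groups of mutually dependent classes without enumerating cycles."""


def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps each node to a list of nodes it depends on."""
    index_of = {}    # discovery order of each visited node
    lowlink = {}     # smallest discovery index reachable from the node
    stack = []       # nodes of the components still being built
    on_stack = set()
    components = []

    def visit(node):
        index_of[node] = lowlink[node] = len(index_of)
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index_of:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index_of[dep])
        if lowlink[node] == index_of[node]:
            # node is the root of a component: pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index_of:
            visit(node)
    return components


def tangles(graph):
    """Components with more than one node, i.e. groups that contain cycles.

    Direct self-dependencies (a node listing itself) are ignored here.
    """
    return [c for c in strongly_connected_components(graph) if len(c) > 1]


if __name__ == "__main__":
    # Hypothetical dependency graph; the names are illustrative only.
    deps = {
        "User": ["Title", "Database"],
        "Title": ["User", "Language"],
        "Language": ["Title"],
        "Database": [],
        "Logger": ["Database"],
    }
    for group in tangles(deps):
        print(sorted(group))
```

The recursive form is fine for a sketch; on a very deep dependency graph an iterative version would be needed to stay within Python's recursion limit.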
32:42 - 32:49

Q: So what exactly is this Wikidata you
mentioned before. Is it like an extension
32:49 - 32:56

or is it a completely different project?
Daniel: Wiki - so there's an extension called
32:56 - 33:05

Wikibase, that implements this, well I
would say, ontological modeling interface
33:05 - 33:12

for MediaWiki and that is used to run a
website called Wikidata which has
33:12 - 33:20

something like 30 million items modeled
that describe the world and serve as a
33:20 - 33:26

machine-readable data back-end to other
wiki project, other Wikimedia projects.
33:26 - 33:33

Yeah, I used to work on that project for
Wikimedia Germany. I moved on to do
33:33 - 33:41

different things now for a couple of
years. Lukas here in front is probably the
33:41 - 33:50

person most knowledgeable about the latest
and greatest in the Wikidata development.
33:50 - 33:56

Q: You've briefly talked about test
coverage. I would be interested..
33:56 - 33:59

Daniel: Sorry?
Q: You talked about test coverage.
33:59 - 34:02

Daniel: Yes.
Q: I would be interested in if you amped
34:02 - 34:08

your efforts to help you modernize it and
how your current situation is with test
34:08 - 34:12

coverage.
Daniel: Test coverage for MediaWiki core is below
34:12 - 34:22

50%. In some parts it's below 10% which is
very worrying. One thing that we started
34:22 - 34:30

to look into, like half a year ago, is
instead of writing unit tests for all the
34:30 - 34:36

code that we actually want to throw away,
before we touch it, we tried to improve
34:36 - 34:41

the test coverage using integration tests
on the API level. So we are currently in
34:41 - 34:48

the process of writing a suite of tests,
not just for the API modules, but for all
34:48 - 34:55

the functionality, all the application
logic behind the API. And that will
34:55 - 35:01

hopefully cover most of the relevant code
paths and will give us confidence when we
35:01 - 35:12

refactor the code.
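The idea of testing at the API level can be sketched as follows, in Python for brevity. `TinyWiki`, `handle_api_request` and the `edit`/`read` actions are hypothetical stand-ins and not MediaWiki's actual API. The point is only that the checks go through the public entry point, so the internals behind it can be restructured without rewriting the tests.

```python
"""Sketch: testing application logic through the API boundary only."""


class TinyWiki:
    """Toy stand-in for the application; internals are free to change."""

    def __init__(self):
        self._revisions = {}  # title -> list of revision texts

    def handle_api_request(self, params):
        """Single public entry point, loosely modeled on a web API call."""
        action = params.get("action")
        title = params.get("title")
        if action == "edit":
            self._revisions.setdefault(title, []).append(params["text"])
            return {"result": "ok", "revision": len(self._revisions[title])}
        if action == "read":
            history = self._revisions.get(title)
            if not history:
                return {"error": "missing"}
            return {"text": history[-1], "revision": len(history)}
        return {"error": "unknown-action"}


def run_api_tests(wiki_factory):
    """Exercise behavior via API requests only; never touch private state."""
    wiki = wiki_factory()

    # Two edits followed by a read should return the latest revision.
    wiki.handle_api_request({"action": "edit", "title": "36C3", "text": "v1"})
    wiki.handle_api_request({"action": "edit", "title": "36C3", "text": "v2"})
    response = wiki.handle_api_request({"action": "read", "title": "36C3"})
    assert response == {"text": "v2", "revision": 2}

    # Reading a page that was never written should report an error.
    response = wiki.handle_api_request({"action": "read", "title": "Nope"})
    assert response == {"error": "missing"}

    return True


if __name__ == "__main__":
    run_api_tests(TinyWiki)
    print("API-level checks passed")
```

Because `run_api_tests` takes a factory, the same checks could be pointed at a refactored implementation to confirm that observable behavior did not change.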
Q: Thanks.
35:12 - 35:26

Herald: Other questions?
Q: So you said that you have this legacy
35:26 - 35:32

system and eventually you have to move
away from it but are there any, like, I
35:32 - 35:40

don't know, plans for the near future to,
I don't know. At some point you have to
35:40 - 35:47

cut the current infrastructure to your
extensions and so on and it's a hard cut, I
35:47 - 35:53

see. But are there any plans to build it
up from scratch or what are the plans?
35:53 - 35:58

Daniel: Yeah, we are not going to rewrite from
scratch - that's a pretty surefire way to
35:58 - 36:05

just kill the system. We will have to make
some tough decisions about backwards
36:05 - 36:11

compatibility and probably reconsider some
of the requirements and constraints we
36:11 - 36:17

have, well, with respect to the platforms
we run on and also the platforms we serve.
36:17 - 36:21

One of the things that we have been very
careful to do in the past for instance is
36:21 - 36:27

to make sure that you can do pretty much
everything with MediaWiki with no
36:27 - 36:33

JavaScript on the client side. And that
requirement is likely to drop. You will
36:33 - 36:40

still be able to read of course, without
any JavaScript or anything, but the extent
36:40 - 36:46

of functionality you will have without
JavaScript on the client side is likely to
36:46 - 36:51

be greatly reduced - that kind of thing.
Also we will probably end up breaking
36:51 - 36:58

compatibility to at least some of the
user-created tools. Hopefully we can offer
36:58 - 37:02

good alternatives, good APIs, good
libraries that people can actually port
37:02 - 37:11

to, that are less brittle. I hope that
will motivate people and maybe repay them
37:11 - 37:16

a bit for the pain of having their tool
broken. If we can give them something that
37:16 - 37:21

is more stable, more reliable, and
hopefully even nicer to use. Yeah, so,
37:21 - 37:26

it's small increments, bits, and pieces
all over the system there's no, you know,
37:26 - 37:33

no great master plan, no big change to
point to really.
37:33 - 37:45

Herald: Okay, okay, further questions?
Daniel: I plan to just sit outside here at
37:45 - 37:55

the table later if you just want to come
and chat so we can also do that there.
37:55 - 38:01

Herald: Okay, so, last call are there any
other questions? It does not appear so,
38:01 - 38:08

so, I'd like to ask for a huge applause for
Daniel for this talk.
38:08 - 38:13

Applause
38:13 - 38:15

36C3 postroll music
38:15 - 38:38

Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!

Title:: 36C3 Wikipaka WG: Modernizing Wikipedia
Video Language:: English
Duration:: 38:40

	Bar Sch edited English subtitles for 36C3 Wikipaka WG: Modernizing Wikipedia
	Jule 2210@rc3 edited English subtitles for 36C3 Wikipaka WG: Modernizing Wikipedia
