
36C3 Wikipaka WG: Infrastructure of Wikipedia

  • 0:00 - 0:19
    Music
  • 0:19 - 0:25
    Herald: Hi! Welcome, welcome to the Wikipaka-
    WG, in this extremely crowded Esszimmer.
  • 0:25 - 0:32
    I'm Jakob, I'm your Herald for tonight
    until 10:00 and I'm here to welcome you
  • 0:32 - 0:37
    and to welcome these wonderful three guys
    on the stage. They're going to talk about
  • 0:37 - 0:45
    the infrastructure of Wikipedia.
    And yeah, they are Lucas, Amir, and Daniel
  • 0:45 - 0:53
    and I hope you'll have fun!
    Applause
  • 0:53 - 0:57
    Amir Sarabadani: Hello, my name is
    Amir, um, I'm a software engineer at
  • 0:57 - 1:01
    Wikimedia Deutschland, which is the German
    chapter of Wikimedia Foundation. Wikimedia
  • 1:01 - 1:07
    Foundation runs Wikipedia. Here is Lucas.
    Lucas is also a software engineer, at
  • 1:07 - 1:10
    Wikimedia Deutschland, and Daniel here is
    a software architect at Wikimedia
  • 1:10 - 1:15
    Foundation. We are all based in Germany,
    Daniel in Leipzig, we are in Berlin. And
  • 1:15 - 1:21
    today we want to talk about how we run
    Wikipedia, with using donors' money and
  • 1:21 - 1:30
    not lots of advertisement and collecting
    data. So in this talk, first we are going
  • 1:30 - 1:35
    to go on an inside-out approach. So we are
    going to first talk about the application
  • 1:35 - 1:40
    layer and then the outside layers, and
    then we go to an outside-in approach and
  • 1:40 - 1:49
    then talk about how you're going to hit
    Wikipedia from the outside.
  • 1:49 - 1:53
    So first of all, let me give you some
    information. First of
  • 1:53 - 1:57
    all, all of Wikimedia, Wikipedia
    infrastructure is run by Wikimedia
  • 1:57 - 2:02
    Foundation, an American nonprofit
    charitable organization. We don't run any
  • 2:02 - 2:08
    ads and we are only 370 people. If you
    count Wikimedia Deutschland or all other
  • 2:08 - 2:12
    chapters, it's around 500 people in total.
    It's nothing compared to the companies
  • 2:12 - 2:20
    outside. But all of the content is
    managed by volunteers. Even our staff
  • 2:20 - 2:24
    doesn't edit or add content to
    Wikipedia. And we support 300 languages,
  • 2:24 - 2:30
    which is a very large number. And
    Wikipedia, it's eighteen years old, so it
  • 2:30 - 2:38
    can vote now. And also, Wikipedia has some
    really, really weird articles. Um, I want
  • 2:38 - 2:43
    to ask you: have you
    encountered any really weird articles
  • 2:43 - 2:48
    in Wikipedia? My favorite is a list of
    people who died on the toilet. But if you
  • 2:48 - 2:55
    know anything, raise your hands. Uh, do
    you know any weird articles in Wikipedia?
  • 2:55 - 2:59
    Do you know some?
    Daniel Kinzler: Oh, the classic one….
  • 2:59 - 3:04
    Amir: You need to unmute yourself. Oh,
    okay.
  • 3:04 - 3:10
    Daniel: This is technology. I don't know
    anything about technology. OK, no. The, my
  • 3:10 - 3:14
    favorite example is "people killed by
    their own invention". That's yeah. That's
  • 3:14 - 3:21
    a lot of fun. Look it up. It's amazing.
    Lucas Werkmeister: There's also a list,
  • 3:21 - 3:25
    there is also a list of prison escapes
    using helicopters. I almost said
  • 3:25 - 3:29
    helicopter escapes using prisons, which
    doesn't make any sense. But that was also
  • 3:29 - 3:32
    a very interesting list.
    Daniel: I think we also have a category of
  • 3:32 - 3:35
    lists of lists of lists.
    Amir: That's a page.
  • 3:35 - 3:39
    Lucas: And every few months someone thinks
    it's funny to redirect it to Russell's
  • 3:39 - 3:43
    paradox or so.
    Daniel: Yeah.
  • 3:43 - 3:49
    Amir: But also beside that, people cannot
    read Wikipedia in Turkey or China. But
  • 3:49 - 3:54
    three days ago, actually, the block in
    Turkey was ruled unconstitutional, but
  • 3:54 - 4:01
    it's not lifted yet. Hopefully they will
    lift it soon. Um, so Wikipedia, Wikimedia
  • 4:01 - 4:06
    projects are not just Wikipedia. There are lots
    and lots of projects. Some of them are not
  • 4:06 - 4:12
    as successful as Wikipedia, um, uh,
    like Wikinews. But, for example,
  • 4:12 - 4:16
    Wikipedia is the most successful one, and
    there's another one, that's Wikidata. It's
  • 4:16 - 4:22
    being developed by Wikimedia Deutschland.
    I mean the Wikidata team, with Lucas, um,
  • 4:22 - 4:27
    and it's being used in infoboxes – it
    has the data that Wikipedia or the Google
  • 4:27 - 4:31
    Knowledge Graph or Siri or Alexa use.
    It's basically sort of a backbone of
  • 4:31 - 4:38
    all of the data, uh, through the whole
    Internet. Um, so our infrastructure. Let
  • 4:38 - 4:43
    me… So first of all, our infrastructure is
    all Open Source. By principle, we never
  • 4:43 - 4:48
    use any commercial software. Uh, we could
    use lots of things. They were even
  • 4:48 - 4:54
    sometimes offered to us for free, but we
    refused to use them. The second
  • 4:54 - 4:59
    thing is we have two primary data centers
    for failover: when, for example, a
  • 4:59 - 5:04
    whole data center goes offline, we can
    fail over to the other data center. We have
  • 5:04 - 5:11
    three caching points of presence or
    CDNs. Our CDNs are all over the world. Uh,
  • 5:11 - 5:15
    also, we have our own CDN. We don't
    use Cloudflare, because
  • 5:15 - 5:21
    we care about the privacy of
    the users, and it is very important because, for
  • 5:21 - 5:25
    example, people edit from countries where
    it might be, uh, dangerous for them to edit
  • 5:25 - 5:30
    Wikipedia. So we really care about keeping the
    data as protected as possible.
  • 5:30 - 5:32
    Applause
  • 5:32 - 5:39
    Amir: Uh, we have 17 billion page views
    per month, which goes up and down
  • 5:39 - 5:44
    based on the season and everything, and we
    have around 100 to 200 thousand requests
  • 5:44 - 5:48
    per second. That's different from
    pageviews, because requests can be requests
  • 5:48 - 5:55
    for objects, can be API calls, can be lots of
    things. And we have 300,000 new editors
  • 5:55 - 6:03
    per month and we run all of this with 1300
    bare metal servers. So right now, Daniel
  • 6:03 - 6:07
    is going to talk about the application
    layer and the inside of that
  • 6:07 - 6:12
    infrastructure.
    Daniel: Thanks, Amir. Oh, the clicky
  • 6:12 - 6:20
    thing. Thank you. So the application layer
    is basically the software that actually
  • 6:20 - 6:25
    does what a wiki does, right? It lets you
    edit pages, create or update pages and
  • 6:25 - 6:30
    then serve the page views. interference
    noise
    The challenge for Wikipedia, of
  • 6:30 - 6:37
    course, is serving all the many page views
    that Amir just described. The core of the
  • 6:37 - 6:43
    application is a classic LAMP application.
    interference noise I have to stop
  • 6:43 - 6:50
    moving. Yes? Is that it? It's a classic
    LAMP stack application. So it's written in
  • 6:50 - 6:57
    PHP, it runs on an Apache server. It uses
    MySQL as a database in the backend. We
  • 6:57 - 7:02
    used to use a HHVM instead of the… Yeah,
    we…
  • 7:02 - 7:14
    Herald: Here. Sorry. Take this one.
    Daniel: Hello. We used to use HHVM as the
  • 7:14 - 7:21
    PHP engine, but we just switched back to
    the mainstream PHP, using PHP 7.2 now,
  • 7:21 - 7:25
    because Facebook decided that HHVM is
    going to be incompatible with the standard
  • 7:25 - 7:35
    and they were just basically developing it
    for, for themselves. Right. So we have
  • 7:35 - 7:43
    separate clusters of servers for serving
    requests, for serving different requests,
  • 7:43 - 7:48
    page views on the one hand, and also
    handling edits. Then we have a cluster for
  • 7:48 - 7:55
    handling API calls and then we have a
    bunch of servers set up to handle
  • 7:55 - 8:01
    asynchronous jobs, things that happen in
    the background, the job runners, and…
  • 8:01 - 8:05
    I guess video scaling is a very obvious
    example of that. It just takes too long to
  • 8:05 - 8:12
    do it on the fly. But we use it for many
    other things as well. MediaWiki, MediaWiki
  • 8:12 - 8:16
    is kind of an amazing thing because you
    can just install it on your own shared-
  • 8:16 - 8:23
    hosting, 10-bucks-a-month's webspace and
    it will run. But you can also use it to,
  • 8:23 - 8:29
    you know, serve half the world. And so
    it's a very powerful and versatile system,
  • 8:29 - 8:34
    which also… I mean, this, this wide span
    of different applications also creates
  • 8:34 - 8:41
    problems. That's something that I will
    talk about tomorrow. But for now, let's
  • 8:41 - 8:49
    look at the fun things. So if you want to
    serve a lot of page views, you have to do
  • 8:49 - 8:56
    a lot of caching. And so we have a whole…
    yeah, a whole set of different caching
  • 8:56 - 9:01
    systems. The most important one is
    probably the parser cache. So as you
  • 9:01 - 9:07
    probably know, wiki pages are created in,
    in a markup language, Wikitext, and they
  • 9:07 - 9:13
    need to be parsed and turned into HTML.
    And the result of that parsing is, of
  • 9:13 - 9:20
    course, cached. And that cache is semi-
    persistent, it… nothing really ever drops
  • 9:20 - 9:25
    out of it. It's a huge thing. And it's, it
    lives in a dedicated MySQL database
  • 9:25 - 9:33
    system. Yeah. We use memcached a lot for
    all kinds of miscellaneous things,
  • 9:33 - 9:39
    anything that we need to keep around and
    share between server instances. And we
  • 9:39 - 9:44
    have been using redis for a while, for
    anything that we want to have available,
  • 9:44 - 9:48
    not just between different servers, but
    also between different data centers,
  • 9:48 - 9:53
    because redis is a bit better about
    synchronizing things between
  • 9:53 - 10:00
    different systems. We still use it for
    session storage especially, though we are
  • 10:00 - 10:10
    about to move away from that and we'll be
    using Cassandra for session storage instead.
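
    The basic pattern for this kind of cache is get-or-compute. Here is a minimal Python sketch using the pymemcache client; this is only an illustration of the idea (MediaWiki's real cache abstraction is in PHP and far more elaborate), and the address and key are made up.

      # Get-or-compute caching pattern (illustration only).
      from pymemcache.client.base import Client

      cache = Client(("127.0.0.1", 11211))   # assumed local memcached for the sketch

      def get_or_compute(key, compute, ttl=300):
          """Return the cached value for key, computing and storing it on a miss."""
          value = cache.get(key)
          if value is not None:
              return value
          value = compute()                  # expensive work happens only on a miss
          cache.set(key, value, expire=ttl)
          return value

      sitename = get_or_compute("sitename:enwiki", lambda: b"Wikipedia")
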
  • 10:10 - 10:19
    We have a bunch of additional services
    running for specialized purposes, like
  • 10:19 - 10:27
    scaling images, rendering formulas, math
    formulas, ORES is pretty interesting. ORES
  • 10:27 - 10:33
    is a system for automatically detecting
    vandalism or rating edits. So this is a
  • 10:33 - 10:38
    machine learning based system for
    detecting problems and highlighting edits
  • 10:38 - 10:45
    that may not be, may not be great and need
    more attention. We have some additional
  • 10:45 - 10:51
    services that process our content for
    consumption on mobile devices, chopping
  • 10:51 - 10:56
    pages up into bits and pieces that then
    can be consumed individually and many,
  • 10:56 - 11:08
    many more. In the background, we also have
    to manage events, right, we use Kafka for
  • 11:08 - 11:15
    message queuing, and we use that to notify
    different parts of the system about
  • 11:15 - 11:20
    changes. On the one hand, we use that to
    feed the job runners that I just
  • 11:20 - 11:28
    mentioned. But we also use it, for
    instance, to purge the entries in the
  • 11:28 - 11:35
    CDN when pages become updated and things
    like that. OK, the next session is going
  • 11:35 - 11:40
    to be about the databases. Are there, very
    quickly, we will have quite a bit of time
  • 11:40 - 11:45
    for discussion afterwards. But are there
    any questions right now about what we said
  • 11:45 - 11:57
    so far? Everything extremely crystal
    clear. OK, no clarity is left? I see. Oh,
  • 11:57 - 12:08
    one question, in the back.
    Q: Can you maybe turn the volume up a
  • 12:08 - 12:20
    little bit? Thank you.
    Daniel: Yeah, I think this is your
  • 12:20 - 12:28
    section, right? Oh, it's Amir again. Sorry.
    Amir: So I want to talk about my favorite
  • 12:28 - 12:32
    topic, the dungeons of every
    production system: databases. The database
  • 12:32 - 12:40
    of Wikipedia is really interesting and
    complicated on its own. We use MariaDB, we
  • 12:40 - 12:46
    switched from MySQL in 2013 for lots of
    complicated reasons. As, as I said,
  • 12:46 - 12:50
    because we are really open source, you can
    go and check not just our database tree,
  • 12:50 - 12:55
    which shows, like, how it looks and what
    the replicas and masters are. Actually, you
  • 12:55 - 13:00
    can even query Wikipedia's database
    live. When you have that, you can just go
  • 13:00 - 13:03
    to that address and log in with your
    Wikipedia account and can just do whatever
  • 13:03 - 13:07
    you want. Like, it was a funny thing that
    a couple of months ago, someone sent me a
  • 13:07 - 13:13
    message, sent me a message like, oh, I
    found a security issue. You can just query
  • 13:13 - 13:18
    Wikipedia's database. I was like, no, no,
    actually, we let this happen.
  • 13:18 - 13:22
    It's like, it's sanitized. We removed the
    password hashes and everything. But still,
  • 13:22 - 13:28
    you can use this. But if you want
    to know, like, how the clusters work, the
  • 13:28 - 13:32
    database clusters: because it got too
    big, they first started sharding, but now
  • 13:32 - 13:36
    we have sections that are basically
    different clusters. Uh, really large wikis
  • 13:36 - 13:43
    have their own section. For example,
    English Wikipedia is s1. German Wikipedia
  • 13:43 - 13:51
    with two or three other small wikis are in
    s5. Wikidata is on s8, and so on. And
  • 13:51 - 13:56
    each section has a master and several
    replicas. But one of the replicas is
  • 13:56 - 14:02
    actually a master in another data center
    because of the failover that I told you.
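
    To make the section idea concrete, here is a rough Python sketch of routing a query to the right section and host. The section assignments (s1, s5, s8) are the real examples from the talk; the host names and the mapping structure are invented for illustration and are not the actual MediaWiki configuration.

      # Simplified sketch of section-based database routing (host names are made up).
      import random

      SECTION_BY_WIKI = {
          "enwiki": "s1",        # English Wikipedia
          "dewiki": "s5",        # German Wikipedia shares s5 with a few small wikis
          "wikidatawiki": "s8",  # Wikidata
      }

      # Each section has one master (writes) and several replicas (reads).
      SECTION_HOSTS = {
          "s1": {"master": "db-s1-master.example", "replicas": ["db-s1-r1.example", "db-s1-r2.example"]},
          "s5": {"master": "db-s5-master.example", "replicas": ["db-s5-r1.example"]},
          "s8": {"master": "db-s8-master.example", "replicas": ["db-s8-r1.example", "db-s8-r2.example"]},
      }

      def pick_host(wiki, write=False):
          """Writes go to the section master, reads to a random replica."""
          hosts = SECTION_HOSTS[SECTION_BY_WIKI[wiki]]
          return hosts["master"] if write else random.choice(hosts["replicas"])

      print(pick_host("enwiki"))              # some read replica in s1
      print(pick_host("wikidatawiki", True))  # the s8 master
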
  • 14:02 - 14:08
    So, basically, two layers of
    replication exist. This, what I'm
  • 14:08 - 14:13
    telling you, is about the metadata. But for
    wikitext, we also need to have a completely
  • 14:13 - 14:19
    different set of databases. But there
    we use consistent hashing to just scale it
  • 14:19 - 14:28
    horizontally, so we can just put more
    databases in for that. Uh, but I don't
  • 14:28 - 14:32
    know if you know it, but Wikipedia stores
    every edit. So you have the text of,
  • 14:32 - 14:37
    the wikitext of every edit in the whole
    history in the database. Um, also we have
  • 14:37 - 14:42
    the parser cache that Daniel explained, and
    the parser cache also uses consistent hashing.
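
    As a rough illustration of why consistent hashing lets you add storage nodes without reshuffling most keys, here is a minimal hash ring in Python. This is a generic sketch, not Wikimedia's actual implementation; the node names are invented.

      # Minimal consistent-hash ring (illustrative sketch only).
      import bisect
      import hashlib

      class HashRing:
          def __init__(self, nodes, vnodes=64):
              self.ring = []  # sorted list of (hash_value, node)
              for node in nodes:
                  for i in range(vnodes):  # virtual nodes smooth out the distribution
                      self.ring.append((self._hash(f"{node}#{i}"), node))
              self.ring.sort()

          @staticmethod
          def _hash(key):
              return int(hashlib.md5(key.encode()).hexdigest(), 16)

          def node_for(self, key):
              """Return the first node clockwise from the key's position on the ring."""
              idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
              return self.ring[idx][1]

      ring = HashRing(["pc1", "pc2", "pc3"])
      print(ring.node_for("enwiki:Barack_Obama"))  # the same key always maps to the same node
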
  • 14:42 - 14:47
    So we can just scale it horizontally. But
    for metadata, it is slightly more
  • 14:47 - 14:56
    complicated. Um, the metadata is
    being used to render the page. So in order
  • 14:56 - 15:02
    to do this, this is, for example, a very
    short version of the database tree that I
  • 15:02 - 15:07
    showed you. You can even go and look at the
    other ones, but this is s1. This is s1 in eqiad, which
  • 15:07 - 15:12
    is the main data center; the master is this
    number and it replicates to some of these,
  • 15:12 - 15:17
    and then this one, the second one, which
    is numbered in the 2000s because it's in the second data
  • 15:17 - 15:25
    center, and it's a master for the other one.
    And it has its own replication
  • 15:25 - 15:31
    there, and cross-data-center replication, because
    that master data center is in
  • 15:31 - 15:37
    Ashburn, Virginia. The second data center
    is in Dallas, Texas. So they need to have a
  • 15:37 - 15:43
    cross-DC replication, and that happens
    over TLS to make sure that no one starts
  • 15:43 - 15:49
    listening in between these two. And we
    have snapshots and even dumps of the whole
  • 15:49 - 15:53
    history of Wikipedia. You can go to
    dumps.wikimedia.org and download the whole
  • 15:53 - 15:59
    history of every wiki you want, except the
    ones that we had to remove for privacy
  • 15:59 - 16:05
    reasons, and with lots and lots of
    backups. I recently realized we have lots
  • 16:05 - 16:15
    of backups. In total it is 570 TB of data
    and in total 150 database servers, and the
  • 16:15 - 16:20
    queries that happen on them are around
    350,000 queries per second and, in total,
  • 16:20 - 16:29
    it requires 70 terabytes of RAM. And
    also we have another storage system that is
  • 16:29 - 16:35
    called Elasticsearch which, as you can guess,
    is being used for search – the box at the top
  • 16:35 - 16:39
    right, if you're using desktop. It's
    different on mobile, I think. And also it
  • 16:39 - 16:45
    depends on whether your language is RTL as well.
    It is run by a team called Search
  • 16:45 - 16:48
    Platform, and because none of us are from
    Search Platform, we cannot explain it in
  • 16:48 - 16:54
    much depth; we only know slightly how it works.
    Also, we have media storage for
  • 16:54 - 16:58
    all of the free pictures that are being
    uploaded to Wikimedia. Like, for example,
  • 16:58 - 17:02
    we have categories in Commons – Commons
    is our wiki that holds all of the free
  • 17:02 - 17:08
    media – we have a category in Commons
    called cats looking left and you have a
  • 17:08 - 17:16
    category cats looking right, so we have
    lots and lots of images. It's 390 terabytes
  • 17:16 - 17:21
    of media, 1 billion objects, and it uses Swift.
    Swift is the object storage component
  • 17:21 - 17:29
    of OpenStack, and it has several
    layers of caching, frontend and backend.
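
    For a flavour of what talking to a Swift cluster looks like, here is a small sketch with the python-swiftclient library. The auth URL, credentials and container layout are invented for the example and are not the production setup.

      # Fetching an object from Swift (illustrative; credentials/paths are made up).
      from swiftclient.client import Connection

      conn = Connection(
          authurl="https://swift.example.org/auth/v1.0",
          user="media:reader",
          key="secret",
      )

      # Media files live in containers; the object name encodes the hashed path.
      headers, body = conn.get_object("media-originals", "a/ab/Cat_looking_left.jpg")
      print(headers.get("content-type"), len(body), "bytes")
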
  • 17:29 - 17:37
    Yeah, that's mostly it. And we want to
    talk about traffic now, and so this picture
  • 17:37 - 17:44
    is from when Sweden in 1967 switched from
    driving on the left to driving on the
  • 17:44 - 17:49
    right. This is basically what happens in the
    Wikipedia infrastructure as well. So we
  • 17:49 - 17:55
    have five caching sites and the most
    recent one is eqsin, which is in Singapore;
  • 17:55 - 17:59
    three of them are just CDN sites: ulsfo, codfw,
    esams and eqsin. Sorry, ulsfo, esams and
  • 17:59 - 18:07
    eqsin are just CDNs. We also have two
    points of presence, one in Chicago and the
  • 18:07 - 18:15
    other one is also in Amsterdam, but we
    won't get into that. So, we have, as I said,
  • 18:15 - 18:20
    we have our own content delivery network,
    and our traffic allocation is done by
  • 18:20 - 18:27
    GeoDNS, which is actually written and
    maintained by one of the traffic people,
  • 18:27 - 18:32
    and we can pool and depool DCs. It has a
    time to live of 10 minutes, so
  • 18:32 - 18:38
    if a data center goes down, it
    takes 10 minutes for the depooling to
    actually propagate, and the same to repool it again.
    being depooled and repooled again. And we
    And we use LVS as the transport layer; it's a layer
  • 18:47 - 18:56
    3 and 4 load balancer for
    Linux, and it supports consistent hashing. And
  • 18:56 - 19:01
    also, we grew so big that we
    needed to have something that manages the
  • 19:01 - 19:07
    load balancers, so we wrote our
    own system, called PyBal.
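
    What a load-balancer manager like PyBal does, in essence, is run health checks against backends and depool the ones that fail. Here is a toy Python sketch of that loop; it is not PyBal's actual code or interface, and the host names are invented.

      # Toy health-check/depool loop in the spirit of a load-balancer manager.
      import socket

      BACKENDS = {"mw1261.example": True, "mw1262.example": True}  # host -> pooled?

      def healthy(host, port=443, timeout=2.0):
          """Consider a backend healthy if it accepts TCP connections."""
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  return True
          except OSError:
              return False

      def run_checks():
          for host in BACKENDS:
              ok = healthy(host)
              if BACKENDS[host] and not ok:
                  BACKENDS[host] = False   # depool: stop sending traffic here
              elif not BACKENDS[host] and ok:
                  BACKENDS[host] = True    # repool once it recovers

      run_checks()
      print([h for h, pooled in BACKENDS.items() if pooled])
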
  • 19:07 - 19:11
    Also, lots of companies actually peer with us. We,
    for example, directly connect to the
  • 19:11 - 19:20
    Amsterdam exchange, AMS-IX. So this is how the
    caching works – which is, anyway,
  • 19:20 - 19:25
    there are lots of reasons for this. Let's
    just get started. We use TLS, we
  • 19:25 - 19:31
    support TLS 1.2, and then as
    the first layer we have nginx-. Do you
  • 19:31 - 19:40
    know it – does anyone know what nginx-
    means? So, that's related but not – not
  • 19:40 - 19:47
    correct. So there is nginx, which is the free
    version, and there is nginx plus, which is
  • 19:47 - 19:52
    the commercial version. But we
    don't use nginx to do load balancing or
  • 19:52 - 19:56
    anything, so we stripped out everything
    from it, and we just use it for TLS
  • 19:56 - 20:02
    termination, so we call it nginx minus; it's an
    internal joke. And then we have the Varnish
  • 20:02 - 20:10
    frontend. Varnish is also a caching layer,
    and the frontend is in memory,
  • 20:10 - 20:15
    which is very, very fast, and you have the
    backend, which is on storage, on the
  • 20:15 - 20:23
    hard disk, but that is slow. The fun thing
    is that the CDN caching layer alone takes 90%
  • 20:23 - 20:27
    of our requests: 90% of responses
    happen because the request just gets to Varnish and just
  • 20:27 - 20:35
    returns, and only when that doesn't work does it go
    through to the application layer. Varnish
  • 20:35 - 20:41
    has a TTL of 24 hours, so if you
    change an article, it also gets invalidated
  • 20:41 - 20:47
    by the application. So if someone edits, the
    CDN actually purges the result. And the
  • 20:47 - 20:52
    thing is, the frontend is sharded so it can
    handle spikes in requests: you come in, the load
  • 20:52 - 20:56
    balancer just randomly sends your request
    to a frontend, but then the backend is
  • 20:56 - 21:01
    different – if the frontend can't find it,
    it sends it to the backend, and the backend
  • 21:01 - 21:10
    is actually sort of – how is it called? –
    hashed by request, so, for
  • 21:10 - 21:15
    example, the article on Barack Obama is only
    being served from one node in the data
  • 21:15 - 21:22
    center, in the CDN. If none of this works, it
    actually hits the other data center.
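
    The routing idea – random across frontends, hashed by URL across backends – can be sketched like this in Python. This is only an illustration of the scheme; the real logic lives in Varnish/VCL, and the node names are invented.

      # Frontend chosen at random, backend chosen by hashing the request URL.
      import hashlib
      import random

      FRONTENDS = ["cp-fe-1", "cp-fe-2", "cp-fe-3"]
      BACKENDS  = ["cp-be-1", "cp-be-2", "cp-be-3"]

      def pick_frontend():
          # Any frontend can serve any URL from its in-memory cache.
          return random.choice(FRONTENDS)

      def pick_backend(url):
          # A given URL always maps to the same backend, so e.g. the Barack Obama
          # article is only cached on one backend node per site.
          h = int(hashlib.sha1(url.encode()).hexdigest(), 16)
          return BACKENDS[h % len(BACKENDS)]

      url = "https://en.wikipedia.org/wiki/Barack_Obama"
      print(pick_frontend(), "->", pick_backend(url))
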
  • 21:22 - 21:30
    So, yeah, I actually explained all of this.
    We have two caching clusters: one
  • 21:30 - 21:36
    is called text and the other one is called
    upload – it's not confusing at all – and if
  • 21:36 - 21:43
    you want to find out, you can just do mtr
    en.wikipedia.org and the end
  • 21:43 - 21:50
    node is text-lb.wikimedia.org, which is
    our text cluster, but if you go to
  • 21:50 - 21:58
    upload.wikimedia.org, you get to hit the
    upload cluster. Yeah, this is it so far, and
  • 21:58 - 22:04
    it has lots of problems, because,
    a), Varnish is open core, so the version
  • 22:04 - 22:09
    that we use is open source – we don't use
    the commercial one – but the open source one
  • 22:09 - 22:21
    doesn't support TLS. What? What happened?
    Okay. No, no, no! You shouldn't – I just –
  • 22:21 - 22:36
    you're not supposed to see this. Okay,
    sorry for the – huh? Okay, okay, sorry. So
  • 22:36 - 22:40
    Varnish has lots of problems, Varnish is
    open core, it doesn't support TLS
  • 22:40 - 22:45
    termination, which forces us to have this
    nginx- system just to do TLS
  • 22:45 - 22:50
    termination, and makes our system complicated.
    It also doesn't work very well over time, and that
  • 22:50 - 22:56
    causes us to have a cron job to restart
    every Varnish node twice a week. We have a
  • 22:56 - 23:04
    cron job that restarts every Varnish
    node, which is embarrassing. But also, on
  • 23:04 - 23:09
    the other hand, when the Varnish
    backend wants to talk to the
  • 23:09 - 23:13
    application layer, it also doesn't support
    TLS termination, so we use
  • 23:13 - 23:20
    IPsec, which is even more embarrassing, but
    we are changing it. So we are moving to
  • 23:20 - 23:25
    Apache Traffic Server, ATS, which
    is very, very nice and it's also open
  • 23:25 - 23:31
    source, fully open source, with the
    Apache Foundation. ATS does the TLS,
  • 23:31 - 23:37
    does the TLS termination, and still,
    for now, we have a Varnish frontend that
  • 23:37 - 23:45
    still exists, but the backend is also going
    to change to ATS, so we call this the ATS
  • 23:45 - 23:50
    sandwich: two ATS layers, and
    there in the middle there's Varnish. The
  • 23:50 - 23:55
    good thing is that when the TLS termination
    moves to ATS, you can actually use
  • 23:55 - 24:01
    TLS 1.3, which is more modern and more
    secure and even faster, so it
  • 24:01 - 24:06
    basically drops 100 milliseconds from
    every request that goes to Wikipedia.
  • 24:06 - 24:12
    That translates to centuries of our
    users' time every month. The ATS work is going
  • 24:12 - 24:19
    on and hopefully it will go live soon, and
    once that is done – so this is the new
  • 24:19 - 24:26
    version – and, as I said, when
    we can do this we can actually use the
  • 24:26 - 24:37
    more secure TLS instead of IPsec to talk
    between data centers. Yes. And now it's
  • 24:37 - 24:42
    time that Lucas talks about what happens
    when you type in en.wikipedia.org.
  • 24:42 - 24:45

    Lucas: Yes, this makes sense, thank you.
  • 24:45 - 24:49
    So, first of all, what you see on the
    slide here as the image doesn't really
  • 24:49 - 24:52
    have anything to do with what happens when
    you type in wikipedia.org because it's an
  • 24:52 - 24:57
    offline Wikipedia reader but it's just a
    nice image. So this is basically a summary
  • 24:57 - 25:03
    of everything they already said. So if,
    which is the most common case, you are
  • 25:03 - 25:11
    lucky and request a URL which is cached, then,
    so, first your computer asks for the IP
  • 25:11 - 25:16
    address of en.wikipedia.org; it reaches
    this DNS daemon, and because we're at
  • 25:16 - 25:19
    Congress here, it tells you the closest
    data center is the one in Amsterdam, so
  • 25:19 - 25:26
    esams, and it's going to hit the edge, what
    we call the load balancers/routers there, then
  • 25:26 - 25:32
    go through TLS termination through
    nginx-, and then it's going to hit the
  • 25:32 - 25:37
    Varnish caching servers, either frontend or
    backend, and then you get a response and
  • 25:37 - 25:41
    that's already it and nothing else is ever
    bothered again. It doesn't even reach any
  • 25:41 - 25:46
    other data center, which is very nice, and
    that's, as you said, around 90% of the
  • 25:46 - 25:52
    requests we get, and if you're unlucky and
    the URL you requested is not in the
  • 25:52 - 25:57
    Varnish in the Amsterdam data center then
    it gets forwarded to the eqiad data
  • 25:57 - 26:02
    center, which is the primary one and there
    it still has a chance to hit the cache and
  • 26:02 - 26:05
    perhaps this time it's there and then the
    response is going to get cached in the
  • 26:05 - 26:10
    frontend, no, in the Amsterdam Varnish and
    you're also going to get a response and we
  • 26:10 - 26:14
    still don't have to run any application
    stuff. If we do have to hit any
  • 26:14 - 26:17
    application stuff and then Varnish is
    going to forward that, if it's
  • 26:17 - 26:23
    upload.wikimedia.org, it goes to the media
    storage Swift, if it's any other domain it
  • 26:23 - 26:28
    goes to MediaWiki and then MediaWiki does
    a ton of work to connect to the database,
  • 26:28 - 26:34
    in this case the first shard (s1) for English
    Wikipedia, get the wikitext from there,
  • 26:34 - 26:39
    get the wikitext of all the related pages
    and templates. No, wait, I forgot
  • 26:39 - 26:44
    something. First it checks if the HTML for
    this page is available in the parser cache, so
  • 26:44 - 26:47
    that's another caching layer, and this
    application cache – this parser cache –
  • 26:47 - 26:54
    might be either memcached or the database
    cache behind it, and if it's not there,
  • 26:54 - 26:58
    then it has to go get the wikitext, get
    all the related things and render that
  • 26:58 - 27:04
    into HTML, which takes a long time and goes
    through some pretty ancient code.
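
    That lookup order – parser cache first, render from wikitext only on a miss – looks roughly like this. It is a simplified, runnable Python sketch with a toy parser, not MediaWiki's actual code, which is in PHP and far more involved.

      # Simplified page-view flow: try the parser cache, render only on a miss.
      import re

      PARSER_CACHE = {}                                     # stand-in for the real parser cache
      WIKITEXT_DB = {"Example": "'''Example''' is a [[page]]."}

      def parse_to_html(wikitext):
          # Toy "parser": bold and links only, standing in for the slow real one.
          html = re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)
          return re.sub(r"\[\[(.+?)\]\]", r'<a href="/wiki/\1">\1</a>', html)

      def get_html(title):
          key = f"parsercache:enwiki:{title}"
          if key in PARSER_CACHE:
              return PARSER_CACHE[key]                      # cheap path: cached HTML
          html = parse_to_html(WIKITEXT_DB[title])          # expensive path on a miss
          PARSER_CACHE[key] = html                          # kept until the page changes
          return html

      print(get_html("Example"))
      print(get_html("Example"))                            # second call comes from the cache
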
  • 27:04 - 27:08
    If you are doing an edit or an upload, it's
    even worse, because then it always has to go
  • 27:08 - 27:14
    to MediaWiki and then it not only has to
    store this new edit, either in the media
  • 27:14 - 27:20
    back-end or in the database, it also has to
    update a bunch of stuff, like, especially
  • 27:20 - 27:25
    if you-- first of all, it has to purge the
    cache, it has to tell all the Varnish
  • 27:25 - 27:29
    servers that there's a new version of this
    URL available so that it doesn't take a
  • 27:29 - 27:34
    full day until the time-to-live expires.
    It also has to update a bunch of things,
  • 27:34 - 27:39
    for example, if you edited a template, it
    might have been used in a million pages
  • 27:39 - 27:44
    and the next time anyone requests one of
    those million pages, those should also
  • 27:44 - 27:49
    actually be rendered again using the new
    version of the template so it has to
  • 27:49 - 27:54
    invalidate the cache for all of those, and
    all of that is deferred through the job queue.
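
    The deferral idea – respond to the editor quickly and queue the heavy fan-out work – can be sketched like this. It is only an illustration; the real system uses Kafka-backed job queues and dedicated job runners, and the page names are examples.

      # Sketch of deferring fan-out work after an edit (illustrative only).
      from collections import deque

      job_queue = deque()

      def save_revision(page):
          print("saved new revision of", page)

      def handle_edit(page, pages_using_template):
          save_revision(page)                    # the only synchronous part
          job_queue.append(("purge_cdn", page))  # tell the CDN this URL changed
          for p in pages_using_template:         # could be a million pages
              job_queue.append(("reparse", p))   # re-render them later, lazily

      def run_jobs():
          while job_queue:                       # job runners do this in the background
              kind, page = job_queue.popleft()
              print("job:", kind, page)

      handle_edit("Template:Infobox", ["Berlin", "Physics", "Ada Lovelace"])
      run_jobs()
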
  • 27:54 - 28:01
    It might also have to calculate thumbnails
    if you uploaded a file, or
  • 28:01 - 28:07
    re-transcode media files, because maybe you
    uploaded in – what do we support? – you
  • 28:07 - 28:10
    upload in WebM and the browser only
    supports some other media codec or
  • 28:10 - 28:13
    something, we transcode that and also
    encode it down to the different
  • 28:13 - 28:20
    resolutions, so then it goes through that
    whole dance and, yeah, that was already
  • 28:20 - 28:24
    those slides. Is Amir going to talk again
    about how we manage -
  • 28:24 - 28:30
    Amir: I mean okay yeah I quickly come back
    just for a short break to talk about
  • 28:30 - 28:37
    management, because managing
    1300 bare metal servers plus a Kubernetes
  • 28:30 - 28:37
    cluster is not easy. So what we do is that
    we use Puppet for configuration
  • 28:37 - 28:43
    management on our bare metal systems; it's,
    fun fact, around 50,000 lines of Puppet code. I
  • 28:48 - 28:52
    mean, lines of code is not a great
    indicator but you can roughly get an
  • 28:52 - 28:59
    estimate of how things work, and we
    have 100,000 lines of Ruby, and we have our
  • 28:59 - 29:04
    own CI and CD cluster – we don't
    store anything in GitHub or GitLab, we
  • 29:04 - 29:11
    have our own system, which is based on
    Gerrit – and for that we have a set of
  • 29:11 - 29:16
    Jenkins servers, and Jenkins does all of these
    kinds of things. And also, because we have a
  • 29:16 - 29:22
    Kubernetes cluster for some of
    our services, if you merge a change
  • 29:22 - 29:26
    in Gerrit, it also builds the Docker-
    files and containers and pushes them to
  • 29:26 - 29:35
    production. And also, in order to run remote
    SSH commands, we have Cumin, which is an in-
  • 29:35 - 29:39
    house automation tool we built
    for our systems; for example, you
  • 29:39 - 29:46
    go there and say, OK, depool this node, or
    run this command on all of the
  • 29:46 - 29:53
    Varnish nodes that I told you about, like when you
    want to restart them.
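
    For a flavour of what "run this command on many hosts" looks like, here is a minimal Python sketch using Paramiko. Cumin itself is a separate, more capable tool with its own query language, so this is not its actual interface; the host names and command are examples.

      # Minimal "run a command on many hosts" sketch with Paramiko (not Cumin).
      import paramiko

      HOSTS = ["cp3050.example", "cp3052.example"]   # e.g. Varnish nodes to restart

      def run_everywhere(command):
          for host in HOSTS:
              client = paramiko.SSHClient()
              client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
              client.connect(host, username="ops")   # key-based auth assumed
              _, stdout, stderr = client.exec_command(command)
              output = stdout.read().decode().strip() or stderr.read().decode().strip()
              print(host, output)
              client.close()

      run_everywhere("sudo systemctl restart varnish")
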
  • 29:53 - 29:58
    And with this, I hand back to Lucas.
    Lucas: So, I am going to talk a bit more
  • 29:58 - 30:02
    about Wikimedia Cloud Services which is a
    bit different in that it's not really our
  • 30:02 - 30:06
    production stuff but it's where you
    people, the volunteers of the Wikimedia
  • 30:06 - 30:11
    movement can run their own code, so you
    can request a project which is kind of a
  • 30:11 - 30:16
    group of users and then you get assigned a
    pool of, like, this much CPU and this
  • 30:16 - 30:21
    much RAM and you can create virtual
    machines with those resources and then do
  • 30:21 - 30:29
    stuff there and run basically whatever you
    want. To create and boot and shut down the
  • 30:21 - 30:29
    VMs and stuff, we use OpenStack, and there's
    a Horizon frontend for that, which you use
  • 30:29 - 30:33
    through the browser, and it logs you out
    all the time, but otherwise it works pretty
  • 30:33 - 30:36
    well. Internally, ideally you manage the
    VMs using Puppet, but a lot of people just
  • 30:36 - 30:43
    SSH in and then do whatever they need to
    set up the VM manually, and that happens as
  • 30:43 - 30:48
    well. And there's a few big projects like
    Toolforge, where you can run your own web-
  • 30:53 - 30:57
    based tools or the beta cluster which is
    basically a copy of some of the biggest
  • 30:57 - 31:02
    wikis like there's a beta English
    Wikipedia, beta Wikidata, beta Wikimedia
  • 31:02 - 31:08
    Commons using mostly the same
    configuration as production but using the
  • 31:08 - 31:12
    current master version of the software
    instead of whatever we deploy once a week so
  • 31:12 - 31:16
    if there's a bug, we see it earlier
    hopefully, even if we didn't catch it
  • 31:16 - 31:20
    locally, because the beta cluster is more
    similar to the production environment and
  • 31:20 - 31:24
    also the continuous - continuous
    integration service run in Wikimedia Cloud
  • 31:24 - 31:29
    Services as well. Yeah and also you have
    to have Kubernetes somewhere on these
  • 31:29 - 31:34
    slides right, so you can use that to
    distribute work between the tools in
  • 31:34 - 31:37
    Toolforge or you can use the grid engine
    which does a similar thing but it's like
  • 31:37 - 31:43
    three decades old and has been through five forks
    now, I think; the current fork we use is Son
  • 31:43 - 31:47
    of Grid Engine, and I don't know what it
    was called before, but that's Cloud
  • 31:47 - 31:55
    Services.
    Amir: So, in a nutshell, these are our
  • 31:55 - 32:01
    systems. We have 1300 bare metal servers
    with lots and lots of caching, like lots
  • 32:01 - 32:07
    of layers of caching, because mostly we
    serve reads and we can just keep them as
  • 32:07 - 32:12
    cached versions. And all of this is open
    source; you can contribute to it if you
  • 32:12 - 32:18
    want to, and a lot of the configuration
    is also open. And this is the way I got
  • 32:18 - 32:22
    hired: I just started contributing
    to the system and they were like, yeah, you can
  • 32:22 - 32:32
    come and work for us, so this is a –
    Daniel: That's actually how all of us got
  • 32:32 - 32:38
    hired.
    Amir: So yeah, and this is the whole thing
  • 32:38 - 32:48
    that happens in Wikimedia and if you want
    to - no, if you want to help us, we are
  • 32:48 - 32:51
    hiring. You can just go to jobs at
    wikimedia.org, if you want to work for
  • 32:51 - 32:54
    Wikimedia Foundation. If you want to work
    with Wikimedia Deutschland, you can go to
  • 32:54 - 32:59
    wikimedia.de, and at the bottom there's a
    link for jobs, because the link got too
  • 32:59 - 33:03
    long. If you want
    to contribute to us, there are so many ways
  • 33:03 - 33:08
    to contribute; as I said, there are so many
    bugs. We have our own Grafana system,
  • 33:08 - 33:13
    you can just look at the monitoring, and
    Phabricator is our bug tracker; you can
  • 33:13 - 33:21
    just go there and find a bug and fix
    things. Actually, we have one repository
  • 33:21 - 33:26
    that is private, but it only holds the
    certificates for TLS and things that are
  • 33:26 - 33:31
    really, really private that we cannot
    release. But also, there is
  • 33:31 - 33:34
    documentation: the documentation for the
    infrastructure is at
  • 33:34 - 33:40
    wikitech.wikimedia.org and documentation
    for configuration is at noc.wikimedia.org
  • 33:40 - 33:47
    plus the documentation of our codebase.
    The documentation for MediaWiki itself is
  • 33:47 - 33:53
    at mediawiki.org. And also we have our
    own URL shortener: you can go to
  • 33:53 - 33:59
    w.wiki and shorten any URL in the
    Wikimedia infrastructure – we reserved the
  • 33:59 - 34:09
    dollar sign for the donate site. And yeah, if
    you have any questions, please.
  • 34:09 - 34:17
    Applause
  • 34:17 - 34:22
    Daniel: So, you know, we have quite a bit of
    time for questions, so if anything wasn't
  • 34:22 - 34:27
    clear or you're curious about anything,
    please, please ask.
  • 34:27 - 34:37
    AM: So, one question about what is not in the
    presentation: do you have to deal with
  • 34:37 - 34:42
    hacking attacks?
    Amir: So the first rule of security issues
  • 34:42 - 34:49
    is that we don't talk about security issues,
    but let's say this baby has all sorts of
  • 34:49 - 34:56
    attacks happening; usually we have
    DDoS. One happened a couple of
  • 34:56 - 35:00
    months ago that was very successful. I
    don't know if you read the news about
  • 35:00 - 35:05
    that, but we also have an infrastructure
    to handle this; we have a security team
  • 35:05 - 35:13
    that handles these cases, and yes.
    AM: Hello, how do you manage access to your
  • 35:13 - 35:20
    infrastructure for your employees?
    Amir: So, it's SSH – we have an LDAP
  • 35:20 - 35:25
    group, and LDAP for the web-based
    systems, but for SSH we
  • 35:25 - 35:31
    have strict protocols, and then you get a
    private key, and some people usually
  • 35:31 - 35:35
    protect their private key using YubiKeys,
    and then you can SSH into the
  • 35:35 - 35:40
    system, basically.
    Lucas: Yeah, well, there's some
  • 35:40 - 35:45
    firewalling setup, but there's only one
    server per data center that you can
  • 35:45 - 35:48
    actually reach through SSH and then you
    have to tunnel through that to get to any
  • 35:48 - 35:51
    other server.
    Amir: And also, like, we have an
  • 35:51 - 35:56
    internal firewall, and basically, if
    you are inside of production, you
  • 35:56 - 36:01
    cannot talk to the outside. If you,
    for example, do git clone from github.com, it
  • 36:01 - 36:07
    doesn't work. You
    can only access tools that are inside the
  • 36:07 - 36:13
    Wikimedia Foundation infrastructure.
    AM: Okay, hi, you said you do TLS
  • 36:13 - 36:19
    termination through nginx; do you still
    allow non-HTTPS, so non-secure, access?
  • 36:19 - 36:23
    Amir: No, we dropped that a really long
    time ago, but also –
  • 36:23 - 36:25
    Lucas: 2013 or so
    Amir: Yeah, 2015
  • 36:25 - 36:29
    Lucas: 2015
    Amir: In 2013 we started serving most of the
  • 36:29 - 36:36
    traffic over it, but in '15 we dropped all of the
    non-HTTPS protocols, and recently we even
  • 36:36 - 36:44
    dropped – we are not serving any SSL
    requests anymore – and TLS 1.1 is also being
  • 36:44 - 36:48
    phased out, so we are sending a warning
    to the users, like, you're using TLS 1.1,
  • 36:48 - 36:55
    please migrate to these newer things that
    came out around 10 years ago, so yeah.
  • 36:55 - 37:00
    Lucas: Yeah I think the deadline for that
    is like February 2020 or something then
  • 37:00 - 37:05
    we'll only have TLS 1.2
    Amir: And soon we are going to support TLS
  • 37:05 - 37:07
    1.3
    Lucas: Yeah
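
    For reference, you can check which TLS version your own client negotiates with the site using only Python's standard library:

      # Check the negotiated TLS version against en.wikipedia.org.
      import socket
      import ssl

      ctx = ssl.create_default_context()
      with socket.create_connection(("en.wikipedia.org", 443)) as sock:
          with ctx.wrap_socket(sock, server_hostname="en.wikipedia.org") as tls:
              print(tls.version())   # e.g. "TLSv1.2" or "TLSv1.3"
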
  • 37:07 - 37:12
    Are there any questions?
    Q: so does read-only traffic
  • 37:12 - 37:18
    from logged in users hit all the way
    through to the parser cache or is there
  • 37:18 - 37:22
    another layer of caching for that?
    Amir: Yes we, you bypass all of
  • 37:22 - 37:28
    that, you can.
    Daniel: We need one more microphone. Yes,
  • 37:28 - 37:34
    it actually does and this is a pretty big
    problem and something we want to look into
  • 37:34 - 37:39
    clears throat but it requires quite a
    bit of rearchitecting. If you are
  • 37:39 - 37:44
    interested in this kind of thing, maybe
    come to my talk tomorrow at noon.
  • 37:44 - 37:49
    Amir: Yeah, one thing we are
    planning to do is active-active, so we have
  • 37:49 - 37:56
    two primaries and the read requests
    from, like, the users can hit
  • 37:56 - 37:58
    the secondary data center instead of the
    main one.
  • 37:58 - 38:04
    Lucas: I think there was a question way in
    the back there, for some time already
  • 38:04 - 38:14
    AM: Hi, I got a question. I read on
    Wikitech that you are using Ganeti as a
  • 38:04 - 38:14
    virtualization platform for some parts; can
    you tell us something about this, or what
  • 38:14 - 38:19
    parts of Wikipedia or Wikimedia are hosted
    on this platform?
  • 38:19 - 38:25
    Amir: I'm not – oh sorry – so I don't
    know this very, very surely, so take
  • 38:25 - 38:30
    it with a grain of salt, but as far as I
    know Ganeti is used to build very small
  • 38:30 - 38:34
    VMs in production that we need for very,
    very small microsites that we serve to
  • 38:34 - 38:40
    the users. So we build just one or two VMs;
    we don't use it very often, I think
  • 38:40 - 38:46
    so.
    AM: Do you also think about open hardware?
  • 38:55 - 39:04
    Amir: I don't know, you can –
    Daniel: Not – not for servers. I think for
  • 39:04 - 39:08
    the offline Reader project, but this is not
    actually run by the Foundation, it's
  • 39:08 - 39:10
    supported but it's not something that the
    Foundation does. They were sort of
  • 39:10 - 39:15
    thinking about open hardware but really
    open hardware in practice usually means,
  • 39:15 - 39:20
    you - you don't, you know, if you really
    want to go down to the chip design, it's
  • 39:20 - 39:25
    pretty tough, so yeah, it's
    usually not practical, sadly.
  • 39:25 - 39:32
    Amir: And one thing I can say is
    that we have some machines that
  • 39:32 - 39:37
    are really powerful that we give to the
    researchers to run analysis on the data
  • 39:37 - 39:43
    itself, and we needed to have GPUs for
    those, but the problem was – was there
  • 39:43 - 39:49
    wasn't any open source driver for them, so
    we migrated to AMD, I think, but the AMD one
  • 39:49 - 39:54
    didn't fit in the rack; it was quite an
    endeavor to get it to work for our
  • 39:54 - 40:04
    researchers to use the GPU.
    AM: I'm still impressed that you answer
  • 40:04 - 40:11
    90% out of the cache. Do all people access
    the same pages or is the cache that huge?
  • 40:11 - 40:21
    So what percentage of - of the whole
    database is in the cache then?
  • 40:21 - 40:30
    Daniel: I don't have the exact numbers to
    be honest, but a large percentage of the
  • 40:30 - 40:37
    whole database is in the cache. I mean it
    expires after 24 hours so really obscure
  • 40:37 - 40:43
    stuff isn't there, but I mean, it's
    a power-law distribution,
  • 40:43 - 40:48
    right? You have a few pages that are
    accessed a lot and you have many many many
  • 40:48 - 40:55
    pages that are not actually accessed
    at all for a week or so except maybe for a
  • 40:55 - 41:02
    crawler, so I don't know a number. My
    guess would be it's less than 50% that is
  • 41:02 - 41:07
    actually cached but, you know, that still
    covers 90%-- it's probably the top 10% of
  • 41:07 - 41:12
    pages would still cover 90% of the
    pageviews, but I don't-- this would be
  • 41:12 - 41:16
    actually-- I should look this up, it would
    be interesting numbers to have, yes.
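
    As a back-of-the-envelope check of that power-law intuition, you can compute how many pageviews the most popular pages would cover under a Zipf-like distribution. The page count and exponent below are arbitrary illustrations, not measured Wikipedia data.

      # Toy Zipf model: what share of views do the top 10% of pages get?
      N = 1_000_000                      # pretend number of distinct pages
      s = 1.0                            # Zipf exponent (illustrative)
      weights = [1 / (rank ** s) for rank in range(1, N + 1)]
      total = sum(weights)
      top10 = sum(weights[: N // 10])
      print(f"top 10% of pages cover about {top10 / total:.0%} of pageviews")  # ~84% here
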
  • 41:16 - 41:21
    Lucas: Do you know if this is 90% of the
    pageviews or 90% of the get requests
  • 41:21 - 41:24
    because, like, requests for the JavaScript
    would also be cached more often, I assume
  • 41:24 - 41:28
    Daniel: I would expect that for non-
    pageviews, it's even higher
  • 41:28 - 41:30
    Lucas: Yeah
    Daniel: Yeah, because you know all the
  • 41:30 - 41:34
    icons and- and, you know, JavaScript
    bundles and CSS and stuff doesn't ever
  • 41:34 - 41:40
    change
    Lucas: I'm gonna say for everything it's 90%,
  • 41:40 - 41:51
    but there's a question back there
    AM: Hey. Do your data centers run on green
  • 41:51 - 41:55
    energy?
    Amir: Very valid question. So, the
  • 41:55 - 42:03
    Amsterdam one is fully green, but the
    other ones are partially green, partially
  • 42:03 - 42:11
    coal and, like, gas. As far as I know, there
    are some plans to make them move away from
  • 42:11 - 42:15
    it, but on the other hand we realized that
    we don't produce that much carbon
  • 42:15 - 42:21
    emission, because we don't have many servers
    and we don't use that much data. There was an
  • 42:21 - 42:27
    estimate, and we realized our carbon
    emission – the servers
  • 42:27 - 42:35
    in the data centers plus all of the
    travel that all of us have to do and all of
  • 42:35 - 42:38
    the events – is about 250 households; it's very,
    very small. It's, I think, one
  • 42:38 - 42:45
    thousandth of the comparable
    figure for Facebook, even if you scale it
  • 42:45 - 42:51
    down to the same traffic, because
    Facebook collects the data, it runs very
  • 42:51 - 42:54
    sophisticated machine learning algorithms –
    that's really complicated – but for
  • 42:54 - 43:01
    Wikimedia, we don't do this, so we don't
    need much energy. Does that answer
  • 43:01 - 43:05
    your question?
    Herald: Do we have any other
  • 43:05 - 43:16
    questions left? Yeah sorry
    AM: Hi, how many developers do you need to
    maintain the whole infrastructure, and how
  • 43:16 - 43:20
    many developers, or let's say how many
    developer hours, did you need to build the
  • 43:20 - 43:24
    whole infrastructure? The question is,
    because what I find very interesting about
  • 43:24 - 43:29
    the talk is that it's a non-profit, so as an
    example for other nonprofits: how much
  • 43:29 - 43:34
    money are we talking about in order to
    build something like this as a digital
  • 43:34 - 43:39
    common?
  • 43:46 - 43:49
    Daniel: If this is just about actually
    running all this, so just operations, it's
  • 43:49 - 43:54
    less than 20 people, I think, which means, if
    you basically divide the requests
  • 43:54 - 44:00
    per second by people, you get to something
    like 8,000 requests per second per
  • 44:00 - 44:04
    operations engineer, which I think is a
    pretty impressive number. This is probably
  • 44:04 - 44:10
    a lot higher; I would really like
    to know if there's any organization that
  • 44:10 - 44:17
    tops that. I don't actually know the
    actual operations budget; I know it's
  • 44:17 - 44:25
    in the two-digit millions annually. Total
    hours for building this over the last 18
  • 44:25 - 44:29
    years, I have no idea. For the
    first five or so years, the people doing
  • 44:29 - 44:35
    it were actually volunteers. We still had
    volunteer database administrators and
  • 44:35 - 44:42
    stuff until maybe ten years ago, eight
    years ago, so yeah, really nobody
  • 44:42 - 44:45
    did any accounting of this; I can only
    guess.
  • 44:57 - 45:04
    AM: Hello, a tools question. A few years
    back I saw some interesting examples of
  • 45:04 - 45:09
    Saltstack use at Wikimedia, but right now
    I see only Puppet and Cumin mentioned,
  • 45:09 - 45:18
    so, kind of, what happened with that?
    Amir: I think we ditched Saltstack. You –
  • 45:18 - 45:23
    I can't say, because none of us are in
    the Cloud Services team and I don't think
  • 45:23 - 45:27
    I can answer you, but if you look at
    wikitech.wikimedia.org,
  • 45:27 - 45:31
    last time I checked it probably says, like,
    it's deprecated and obsolete, we don't use
  • 45:31 - 45:32
    it anymore.
  • 45:37 - 45:40
    AM: Do you use the batch jobs, like the job
  • 45:40 - 45:46
    runners, to fill spare capacity on the web-
    serving servers, or do you have dedicated
  • 45:46 - 45:52
    servers for those roles?
    Lucas: I think they're dedicated.
  • 45:52 - 45:56
    Amir: The job runners, if you're asking about job runners,
    are dedicated, yes, they are. There are, I
  • 45:56 - 46:03
    think, 5 per primary data center, so –
    Daniel: Yeah, I mean, do we
  • 46:03 - 46:07
    actually have any spare capacity on
    anything? We don't have that much hardware;
  • 46:07 - 46:09
    everything is pretty much at a hundred
    percent.
  • 46:09 - 46:14
    Lucas: I think we still have some server
    that is just called misc1111 or something
  • 46:14 - 46:19
    which run five different things at once;
    you can look for those on Wikitech.
  • 46:19 - 46:26
    Amir: But – oh, sorry, it's not five,
    it's 20 per data center, 20 per primary
  • 46:26 - 46:31
    data center, that's our job runners, and they
    run 700 jobs per second.
  • 46:31 - 46:36
    Lucas: And I think that does not include
    the video scaler so those are separate
  • 46:36 - 46:38
    again
    Amir: No, they merged them in like a month
  • 46:38 - 46:40
    ago
    Lucas: Okay, cool
  • 46:47 - 46:51
    AM: Maybe a little bit off topic: can you
    tell us a little bit about the decision-making
  • 46:51 - 46:56
    process for technical decisions,
    architecture decisions? How does it work
  • 46:56 - 47:02
    in an organization like this – the decision-
    making process for architectural
  • 47:02 - 47:03
    decisions, for example?
  • 47:08 - 47:11
    Daniel: Yeah, so Wikimedia has a
  • 47:11 - 47:17
    committee for making high-level technical
    decisions; it's called the Wikimedia
  • 47:17 - 47:24
    Technical Committee, TechCom, and we run an
    RFC process, so any decision that is
  • 47:24 - 47:28
    cross-cutting, strategic, or especially
    hard to undo should go through this
  • 47:28 - 47:34
    process, and it's pretty informal:
    basically you file a ticket and start
  • 47:34 - 47:38
    this process. It gets announced
    on the mailing list; hopefully you get
  • 47:38 - 47:45
    input and feedback, and at some point
    it's approved for implementation. We're
  • 47:45 - 47:49
    currently looking into improving this
    process; it's not – sometimes it works
  • 47:49 - 47:52
    pretty well, sometimes things don't get
    that much feedback, but it still makes
  • 47:52 - 47:56
    sure that people are aware of these high-
    level decisions.
  • 47:56 - 48:00
    Amir: Daniel is the chair of that
    committee
  • 48:02 - 48:08
    Daniel: Yeah, if you want to complain
    about the process, please do.
  • 48:14 - 48:21
    AM: Yes, regarding CI and CD along the
    pipeline – of course, with that much traffic
  • 48:21 - 48:27
    you want to keep everything consistent,
    right. So are there any testing
  • 48:27 - 48:32
    strategies that you have set internally,
    like, of course, unit tests, integration
  • 48:32 - 48:36
    tests, but do you do something like
    continuous end-to-end testing on beta
  • 48:36 - 48:40
    instances?
    Amir: So, we have the beta cluster, but also
  • 48:40 - 48:45
    we do deploys – we call it the train – and so
    we deploy once a week: all of the changes
  • 48:45 - 48:50
    get merged to one, like, a branch, and the
    branch gets cut every Tuesday, and it
  • 48:50 - 48:55
    first goes to the test wikis and
    then it goes to all of the wikis that are
  • 48:55 - 48:59
    not Wikipedias, plus Catalan and Hebrew
    Wikipedia. So basically Hebrew and Catalan
  • 48:59 - 49:04
    Wikipedia volunteered to be the guinea pigs
    for the next wikis, and if everything works
  • 49:04 - 49:08
    fine it usually goes there, and if it's like, oh,
    a fatal error – and we have logging – then
  • 49:08 - 49:13
    it's like, okay, we need to fix this,
    and we fix it immediately, and then it goes
  • 49:13 - 49:19
    live to all wikis. This is one way of
    looking at it, so, okay, yeah.
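
    The weekly train is essentially a staged rollout over groups of wikis. Here is a compact Python sketch of that idea; the group membership and version string are simplified examples based on the description above, not the real deployment tooling.

      # Staged "train" rollout sketch (groups simplified from the description above).
      GROUPS = [
          ["testwiki", "test2wiki"],                 # group 0: test wikis
          ["commonswiki", "cawiki", "hewiki"],       # group 1: non-Wikipedias plus Catalan/Hebrew
          ["enwiki", "dewiki", "frwiki"],            # group 2: the remaining Wikipedias
      ]

      def deploy(wiki, version):
          print(f"{wiki} now runs {version}")

      def roll_out(version, error_rate):
          for group in GROUPS:
              for wiki in group:
                  deploy(wiki, version)
              if error_rate() > 0.01:                # watch the logs; fatals mean stop and fix
                  raise RuntimeError("fix or roll back before promoting further")

      roll_out("1.35.0-wmf.example", error_rate=lambda: 0.0)
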
  • 49:19 - 49:23
    Daniel: So, our test coverage is not as
    great as it should be and so we kind of,
  • 49:23 - 49:31
    you know, abuse our users for this. We
    are, of course, working to improve this
  • 49:31 - 49:37
    and one thing that we started recently is
    a program for creating end-to-end tests
  • 49:37 - 49:43
    for all the API modules we have, in the
    hope that we can thereby cover pretty much
  • 49:43 - 49:50
    all of the application logic bypassing the
    user interface. I mean, full end-to-end
  • 49:50 - 49:53
    should, of course, include the user
    interface but user interface tests are
  • 49:53 - 49:58
    pretty brittle and often test, you know,
    where things are on the screen, and it just
  • 49:58 - 50:03
    seems to us that it makes a lot of sense
    to have more- to have tests that actually
  • 50:03 - 50:07
    test the application logic for what the
    system actually should be doing, rather
  • 50:07 - 50:16
    than what it should look like. And, yeah,
    we are currently working on making – so,
  • 50:16 - 50:20
    yeah, basically this has been a proof of
    concept, and we're currently working to
  • 50:20 - 50:27
    actually integrate it in CI. That
    perhaps should land once everyone is back
  • 50:27 - 50:35
    from the vacations and then we have to
    write about a thousand or so tests, I
  • 50:35 - 50:38
    guess.
    Lucas: I think there's also a plan to move
  • 50:38 - 50:43
    to a system where we actually deploy
    basically after every commit and can
  • 50:43 - 50:46
    immediately roll back if something goes
    wrong but that's more midterm stuff and
  • 50:46 - 50:48
    I'm not sure what the current status of
    that proposal is
  • 50:48 - 50:50
    Amir: And it will be in Kubernetes, so it
    will be completely different
  • 50:50 - 50:56
    Daniel: That would be amazing
    Lucas: But right now, we are on this
  • 50:56 - 51:00
    weekly basis, if something goes wrong, we
    roll back to the last week's version of
  • 51:00 - 51:06
    the code
    Herald: Are there any
  • 51:06 - 51:19
    questions left? Sorry. Yeah. Okay, um, I
    don't think so. So, yeah, thank you for
  • 51:19 - 51:25
    this wonderful talk. Thank you for all
    your questions. Um, yeah, I hope you liked
  • 51:25 - 51:30
    it. Um, see you around, yeah.
  • 51:30 - 51:34
    Applause
  • 51:34 - 51:39
    Music
  • 51:39 - 52:01
    Subtitles created by c3subtitles.de
    in the year 2021. Join, and help us!