WEBVTT

00:00:00.000 --> 00:00:18.772
<i>Music</i>

00:00:18.772 --> 00:00:25.332
Herald:Hi! Welcome, welcome in Wikipaka-
WG, in this extremely crowded Esszimmer.

00:00:25.332 --> 00:00:32.079
I'm Jakob, I'm your Herald for tonight
until 10:00 and I'm here to welcome you

00:00:32.079 --> 00:00:36.690
and to welcome these wonderful three guys
on the stage. They're going to talk about

00:00:36.690 --> 00:00:44.710
the infrastructure of Wikipedia.
And yeah, they are Lucas, Amir, and Daniel

00:00:44.710 --> 00:00:52.970
and I hope you'll have fun!
<i>Applause</i>

00:00:52.970 --> 00:00:57.059
Amir Sarabadani: Hello, my name is
Amir, um, I'm a software engineer at

00:00:57.059 --> 00:01:01.130
Wikimedia Deutschland, which is the German
chapter of Wikimedia Foundation. Wikimedia

00:01:01.130 --> 00:01:06.520
Foundation runs Wikipedia. Here is Lucas.
Lucas is also a software engineer, at

00:01:06.520 --> 00:01:10.300
Wikimedia Deutschland, and Daniel here is
a software architect at Wikimedia

00:01:10.300 --> 00:01:15.110
Foundation. We are all based in Germany,
Daniel in Leipzig, we are in Berlin. And

00:01:15.110 --> 00:01:21.420
today we want to talk about how we run
Wikipedia, with using donors' money and

00:01:21.420 --> 00:01:29.910
not lots of advertisement and collecting
data. So in this talk, first we are going

00:01:29.910 --> 00:01:34.860
to go on an inside-out approach. So we are
going to first talk about the application

00:01:34.860 --> 00:01:39.830
layer and then the outside layers, and
then we go to an outside-in approach and

00:01:39.830 --> 00:01:48.635
then talk about how you're going to hit
Wikipedia from the outside.

00:01:48.635 --> 00:01:53.320
So first of all, let's some,
let me get you some information. First of

00:01:53.320 --> 00:01:57.259
all, all of Wikimedia, Wikipedia
infrastructure is run by Wikimedia

00:01:57.259 --> 00:02:01.810
Foundation, an American nonprofit
charitable organization. We don't run any

00:02:01.810 --> 00:02:07.960
ads and we are only 370 people. If you
count Wikimedia Deutschland or all other

00:02:07.960 --> 00:02:12.500
chapters, it's around 500 people in total.
It's nothing compared to the companies

00:02:12.500 --> 00:02:19.530
outside. But all of the content is
managed by volunteers. Even our staff

00:02:19.530 --> 00:02:24.170
doesn't do edits, add content to
Wikipedia. And we support 300 languages,

00:02:24.170 --> 00:02:29.501
which is a very large number. And 
Wikipedia, it's eighteen years old, so it

00:02:29.501 --> 00:02:37.950
can vote now. And also, Wikipedia has some
really, really weird articles. Um, I want

00:02:37.950 --> 00:02:42.510
to ask you, what is your, if you have
encountered any really weird article

00:02:42.510 --> 00:02:47.970
in Wikipedia? My favorite is a list of
people who died on the toilet. But if you

00:02:47.970 --> 00:02:54.620
know anything, raise your hands. Uh, do
you know any weird articles in Wikipedia?

00:02:54.620 --> 00:02:58.750
Do you know some?
Daniel Kinzler: Oh, the classic one….

00:02:58.750 --> 00:03:03.600
Amir: You need to unmute yourself. Oh,
okay.

00:03:03.600 --> 00:03:09.551
Daniel: This is technology. I don't know
anything about technology. OK, no. The, my

00:03:09.551 --> 00:03:13.900
favorite example is "people killed by
their own invention". That's yeah. That's

00:03:13.900 --> 00:03:20.510
a lot of fun. Look it up. It's amazing.
Lucas Werkmeister: There's also a list,

00:03:20.510 --> 00:03:24.810
there is also a list of prison escapes
using helicopters. I almost said

00:03:24.810 --> 00:03:28.790
helicopter escapes using prisons, which
doesn't make any sense. But that was also

00:03:28.790 --> 00:03:31.830
a very interesting list.
Daniel: I think we also have a category of

00:03:31.830 --> 00:03:35.310
lists of lists of lists.
Amir: That's a page.

00:03:35.310 --> 00:03:39.040
Lucas: And every few months someone thinks
it's funny to redirect it to Russel's

00:03:39.040 --> 00:03:42.940
paradox or so.
Daniel: Yeah.

00:03:42.940 --> 00:03:49.209
Amir: But also beside that, people cannot
read Wikipedia in Turkey or China. But

00:03:49.209 --> 00:03:54.450
three days ago, actually, the block in
Turkey was ruled unconstitutional, but

00:03:54.450 --> 00:04:01.000
it's not lifted yet. Hopefully they will
lift it soon. Um, so Wikipedia, Wikimedia

00:04:01.000 --> 00:04:05.660
projects is just not Wikipedia. It's lots
and lots of projects. Some of them are not

00:04:05.660 --> 00:04:11.650
as successful as the Wikipedia. Um, uh,
like Wikinews. But uh, for example,

00:04:11.650 --> 00:04:16.190
Wikipedia is the most successful one, and
there's another one, that's Wikidata. It's

00:04:16.190 --> 00:04:21.680
being developed by Wikimedia Deutschland.
I mean the Wikidata team, with Lucas, um,

00:04:21.680 --> 00:04:26.520
and it's being used – it's infobox – it
has the data that Wikipedia or Google

00:04:26.520 --> 00:04:31.449
Knowledge Graph or Siri or Alexa uses.
It's basically, it's sort of a backbone of

00:04:31.449 --> 00:04:37.981
all of the data, uh, through the whole
Internet. Um, so our infrastructure. Let

00:04:37.981 --> 00:04:42.910
me… So first of all, our infrastructure is
all Open Source. By principle, we never

00:04:42.910 --> 00:04:48.081
use any commercial software. Uh, we could
use a lots of things. They are even

00:04:48.081 --> 00:04:54.330
sometimes were given us for free, but we
were, refused to use them. Second

00:04:54.330 --> 00:04:59.060
thing is we have two primary data center
for like failovers, when, for example, a

00:04:59.060 --> 00:05:03.960
whole datacenter goes offline, so we can
failover to another data center. We have

00:05:03.960 --> 00:05:11.100
three caching points of presence or
CDNs. Our CDNs are all over the world. Uh,

00:05:11.100 --> 00:05:15.180
also, we have our own CDN. We don't have,
we don't use CloudFlare, because

00:05:15.180 --> 00:05:20.960
CloudFlare, we care about the privacy of
the users and is very important that, for

00:05:20.960 --> 00:05:25.490
example, people edit from countries that
might be, uh, dangerous for them to edit

00:05:25.490 --> 00:05:29.810
Wikipedia. So we really care to keep the
data as protected as possible.

00:05:29.810 --> 00:05:32.400
<i>Applause</i>

00:05:32.400 --> 00:05:39.460
Amir: Uh, we have 17 billion page views
per month, and, which goes up and down

00:05:39.460 --> 00:05:44.350
based on the season and everything, we
have around 100 to 200 thousand requests

00:05:44.350 --> 00:05:48.449
per second. It's different from the
pageview because requests can be requests

00:05:48.449 --> 00:05:54.540
to the objects, can be API, can be lots of
things. And we have 300,000 new editors

00:05:54.540 --> 00:06:03.120
per month and we run all of this with 1300
bare metal servers. So right now, Daniel

00:06:03.120 --> 00:06:07.010
is going to talk about the application
layer and the inside of that

00:06:07.010 --> 00:06:11.830
infrastructure.
Daniel: Thanks, Amir. Oh, the clicky

00:06:11.830 --> 00:06:20.330
thing. Thank you. So the application layer
is basically the software that actually

00:06:20.330 --> 00:06:25.050
does what a wiki does, right? It lets you
edit pages, create or update pages and

00:06:25.050 --> 00:06:29.650
then search the page views. <i>interference
noise</i> The challenge for Wikipedia, of

00:06:29.650 --> 00:06:37.150
course, is serving all the many page views
that Amir just described. The core of the

00:06:37.150 --> 00:06:42.690
application is a classic LAMP application.
<i>interference noise</i> I have to stop

00:06:42.690 --> 00:06:50.130
moving. Yes? Is that it? It's a classic
LAMP stack application. So it's written in

00:06:50.130 --> 00:06:57.080
PHP, it runs on an Apache server. It uses
MySQL as a database in the backend. We

00:06:57.080 --> 00:07:01.630
used to use a HHVM instead of the… Yeah,
we…

00:07:01.630 --> 00:07:13.830
Herald: Hier. Sorry. Nimm mal das hier.
Daniel: Hello. We used to use HHVM as the

00:07:13.830 --> 00:07:20.810
PHP engine, but we just switched back to
the mainstream PHP, using PHP 7.2 now,

00:07:20.810 --> 00:07:24.720
because Facebook decided that HHVM is
going to be incompatible with the standard

00:07:24.720 --> 00:07:35.430
and they were just basically developing it
for, for themselves. Right. So we have

00:07:35.430 --> 00:07:42.740
separate clusters of servers for serving
requests, for serving different requests,

00:07:42.740 --> 00:07:48.020
page views on the one hand, and also
handling edits. Then we have a cluster for

00:07:48.020 --> 00:07:55.350
handling API calls and then we have a
bunch of servers set up to handle

00:07:55.350 --> 00:08:01.050
asynchronous jobs, things that happen in
the background, the job runners, and…

00:08:01.050 --> 00:08:05.240
I guess video scaling is a very obvious
example of that. It just takes too long to

00:08:05.240 --> 00:08:11.720
do it on the fly. But we use it for many
other things as well. MediaWiki, MediaWiki

00:08:11.720 --> 00:08:15.930
is kind of an amazing thing because you
can just install it on your own shared-

00:08:15.930 --> 00:08:23.419
hosting, 10-bucks-a-month's webspace and
it will run. But you can also use it to,

00:08:23.419 --> 00:08:29.270
you know, serve half the world. And so
it's a very powerful and versatile system,

00:08:29.270 --> 00:08:34.479
which also… I mean, this, this wide span
of different applications also creates

00:08:34.479 --> 00:08:41.000
problems. That's something that I will
talk about tomorrow. But for now, let's

00:08:41.000 --> 00:08:49.230
look at the fun things. So if you want to
serve a lot of page views, you have to do

00:08:49.230 --> 00:08:55.550
a lot of caching. And so we have a whole…
yeah, a whole set of different caching

00:08:55.550 --> 00:09:00.880
systems. The most important one is
probably the parser cache. So as you

00:09:00.880 --> 00:09:07.431
probably know, wiki pages are created in,
in a markup language, Wikitext, and they

00:09:07.431 --> 00:09:13.290
need to be parsed and turned into HTML.
And the result of that parsing is, of

00:09:13.290 --> 00:09:19.940
course, cached. And that cache is semi-
persistent, it… nothing really ever drops

00:09:19.940 --> 00:09:25.060
out of it. It's a huge thing. And it's, it
lives in a dedicated MySQL database

00:09:25.060 --> 00:09:33.490
system. Yeah. We use memcached a lot for
all kinds of miscellaneous things,

00:09:33.490 --> 00:09:38.930
anything that we need to keep around and
share between server instances. And we

00:09:38.930 --> 00:09:43.589
have been using redis for a while, for
anything that we want to have available,

00:09:43.589 --> 00:09:47.560
not just between different servers, but
also between different data centers,

00:09:47.560 --> 00:09:53.200
because redis is a bit better about
synchronizing things between, between

00:09:53.200 --> 00:09:59.820
different systems, we still use it for
session storage, especially, though we are

00:09:59.820 --> 00:10:09.600
about to move away from that and we'll be
using Cassandra for session storage. We

00:10:09.600 --> 00:10:19.310
have a bunch of additional services
running for specialized purposes, like

00:10:19.310 --> 00:10:27.120
scaling images, rendering formulas, math
formulas, ORES is pretty interesting. ORES

00:10:27.120 --> 00:10:33.400
is a system for automatically detecting
vandalism or rating edits. So this is a

00:10:33.400 --> 00:10:38.120
machine learning based system for
detecting problems and highlighting edits

00:10:38.120 --> 00:10:45.060
that may not be, may not be great and need
more attention. We have some additional

00:10:45.060 --> 00:10:50.940
services that process our content for
consumption on mobile devices, chopping

00:10:50.940 --> 00:10:56.480
pages up into bits and pieces that then
can be consumed individually and many,

00:10:56.480 --> 00:11:08.200
many more. In the background, we also have
to manage events, right, we use Kafka for

00:11:08.200 --> 00:11:14.640
message queuing, and we use that to notify
different parts of the system about

00:11:14.640 --> 00:11:19.980
changes. On the one hand, we use that to
feed the job runners that I just

00:11:19.980 --> 00:11:27.540
mentioned. But we also use it, for
instance, to purge the entries in the

00:11:27.540 --> 00:11:35.050
CDN when pages become updated and things
like that. OK, the next session is going

00:11:35.050 --> 00:11:40.269
to be about the databases. Are there, very
quickly, we will have quite a bit of time

00:11:40.269 --> 00:11:45.230
for discussion afterwards. But are there
any questions right now about what we said

00:11:45.230 --> 00:11:57.120
so far? Everything extremely crystal
clear. OK, no clarity is left? I see. Oh,

00:11:57.120 --> 00:12:07.570
one question, in the back.
Q: Can you maybe turn the volume up a

00:12:07.570 --> 00:12:20.220
little bit? Thank you.
Daniel: Yeah, I think this is your

00:12:20.220 --> 00:12:27.959
section, right? Oh, its Amir again. Sorry.
Amir: So I want to talk about my favorite

00:12:27.959 --> 00:12:32.279
topic, the dungeons of, dungeons of every
production system, databases. The database

00:12:32.279 --> 00:12:39.580
of Wikipedia is really interesting and
complicated on its own. We use MariaDB, we

00:12:39.580 --> 00:12:45.870
switched from MySQL in 2013 for lots of
complicated reasons. As, as I said,

00:12:45.870 --> 00:12:50.200
because we are really open source, you can
go and not just check our database tree,

00:12:50.200 --> 00:12:55.310
that says, like, how it looks and what's
the replicas and masters. Actually, you

00:12:55.310 --> 00:12:59.650
can even query the Wikipedia's database
live when you have that, you can just go

00:12:59.650 --> 00:13:02.930
to that address and login with your
Wikipedia account and just can do whatever

00:13:02.930 --> 00:13:07.430
you want. Like, it was a funny thing that
a couple of months ago, someone sent me a

00:13:07.430 --> 00:13:12.970
message, sent me a message like, oh, I
found a security issue. You can just query

00:13:12.970 --> 00:13:18.000
Wikipedia's database. I was like, no, no,
it's actually, we, we let this happen.

00:13:18.000 --> 00:13:21.900
It's like, it's sanitized. We removed the
password hashes and everything. But still,

00:13:21.900 --> 00:13:27.779
you can use this. And, but if you wanted
to say, like, how the clusters work, the

00:13:27.779 --> 00:13:32.029
database clusters, because it gets too
big, they first started sharding, but now

00:13:32.029 --> 00:13:36.279
we have sections that are basically
different clusters. Uh, really large wikis

00:13:36.279 --> 00:13:42.839
have their own section. For example,
English Wikipedia is s1. German Wikipedia

00:13:42.839 --> 00:13:50.820
with two or three other small wikis are in
s5. Wikidata is on s8, and so on. And

00:13:50.820 --> 00:13:56.250
each section have a master and several
replicas. But one of the replicas is

00:13:56.250 --> 00:14:01.700
actually a master in another data center
because of the failover that I told you.

00:14:01.700 --> 00:14:08.079
So it is, basically two layers of
replication exist. This is, what I'm

00:14:08.079 --> 00:14:13.070
telling you, is about metadata. But for
Wikitext, we also need to have a complete

00:14:13.070 --> 00:14:19.450
different set of databases. But it can be,
we use consistent hashing to just scale it

00:14:19.450 --> 00:14:27.630
horizontally so we can just put more
databases on it, for that. Uh, but I don't

00:14:27.630 --> 00:14:32.070
know if you know it, but Wikipedia stores
every edit. So you have the text of,

00:14:32.070 --> 00:14:36.930
Wikitext of every edit in the whole
history in the database. Uhm, also we have

00:14:36.930 --> 00:14:41.910
parser cache that Daniel explained, and
parser cache is also consistent hashing.

00:14:41.910 --> 00:14:47.000
So we just can horizontally scale it. But
for metadata, it is slightly more

00:14:47.000 --> 00:14:56.440
complicated. Um, metadata shows and is
being used to render the page. So in order

00:14:56.440 --> 00:15:01.680
to do this, this is, for example, a very
short version of the database tree that I

00:15:01.680 --> 00:15:07.019
showed you. You can even go and look for
other ones but this is a s1. s1 eqiad this

00:15:07.019 --> 00:15:12.100
is the main data center the master is this
number and it replicates to some of this

00:15:12.100 --> 00:15:16.860
and then this 7, the second one that this
was with 2000 because it's the second data

00:15:16.860 --> 00:15:24.750
center and it's a master of the other one.
And it has its own replications

00:15:24.750 --> 00:15:30.680
between cross three replications because
the master, that master data center is in

00:15:30.680 --> 00:15:37.399
Ashburn, Virginia. The second data center
is in Dallas, Texas. So they need to have a

00:15:37.399 --> 00:15:43.220
cross DC replication and that happens
with a TLS to make sure that no one starts

00:15:43.220 --> 00:15:49.200
to listen to, in between these two, and we
have snapshots and even dumps of the whole

00:15:49.200 --> 00:15:53.440
history of Wikipedia. You can go to
dumps.wikimedia.org and download the whole

00:15:53.440 --> 00:15:59.130
reserve every wiki you want, except the
ones that we had to remove for privacy

00:15:59.130 --> 00:16:04.899
reasons and with a lots and lots of
backups. I recently realized we have lots

00:16:04.899 --> 00:16:15.149
of backups. And in total it is 570 TB of data
and total 150 database servers and a

00:16:15.149 --> 00:16:20.269
queries that happens to them is around
350,000 queries per second and, in total,

00:16:20.269 --> 00:16:29.459
it requires 70 terabytes of RAM. So and
also we have another storage section that

00:16:29.459 --> 00:16:35.000
called Elasticsearch which you can guess
it- it's being used for search, on the top

00:16:35.000 --> 00:16:39.050
right, if you're using desktop. It's
different in mobile, I think. And also it

00:16:39.050 --> 00:16:44.610
depends on if you're rtl language as well,
but also it runs by a team called search

00:16:44.610 --> 00:16:47.550
platform because none of us are from
search platform we cannot explain it this

00:16:47.550 --> 00:16:54.010
much we don't know much how it works it
slightly. Also we have a media storage for

00:16:54.010 --> 00:16:58.420
all of the free pictures that's being
uploaded to Wikimedia like, for example,

00:16:58.420 --> 00:17:02.400
if you have a category in Commons. Commons
is our wiki that holds all of the free

00:17:02.400 --> 00:17:08.130
media and if we have a category in Commons
called cats looking at left and you have

00:17:08.130 --> 00:17:15.630
category cats looking at right so we have
lots and lots of images. It's 390 terabytes

00:17:15.630 --> 00:17:20.620
of media, 1 billion object and uses Swift.
Swift is the object is storage component

00:17:20.620 --> 00:17:29.190
of OpenStack and it has it has several
layers of caching, frontend, backend.

00:17:29.190 --> 00:17:36.799
Yeah, that's mostly it. And we want to
talk about traffic now and so this picture

00:17:36.799 --> 00:17:43.929
is when Sweden in 1967 moved from a left-
driving from left to there driving to

00:17:43.929 --> 00:17:48.999
right. This is basically what happens in
Wikipedia infrastructure as well. So we

00:17:48.999 --> 00:17:54.942
have five caching layers and the most
recent one is eqsin which is in Singapore,

00:17:54.942 --> 00:17:59.310
the three one are just CDN ulsfo, codfw,
esams and eqsin. Sorry, ulsfo, esams and

00:17:59.310 --> 00:18:06.590
eqsin are just CDNs. We have also two
points of presence, one in Chicago and the

00:18:06.590 --> 00:18:15.080
other one is also in Amsterdam, but we
don't get to that. So, we have, as I said,

00:18:15.080 --> 00:18:20.230
we have our own content delivery network
with our traffic or allocation is done by

00:18:20.230 --> 00:18:26.860
GeoDNS which actually is written and
maintained by one of the traffic people,

00:18:26.860 --> 00:18:32.140
and we can pool and depool DCs. It has a
time to live of 10 minute- 10 minutes, so

00:18:32.140 --> 00:18:37.950
if a data center goes down. We have - it
takes 10 minutes to actually propagate for

00:18:37.950 --> 00:18:47.110
being depooled and repooled again. And we
use LVS as transport layer and this layer

00:18:47.110 --> 00:18:55.799
3 and 4 of the Linux load balancer for
Linux and supports consistent hashing and

00:18:55.799 --> 00:19:00.679
also we ever got we grow so big that we
needed to have something that manages the

00:19:00.679 --> 00:19:07.100
load balancer so we wrote something our
own system is called pybal. And also we -

00:19:07.100 --> 00:19:11.210
lots of companies actually peer with us. We
for example directly connect to

00:19:11.210 --> 00:19:20.440
Amsterdam amps X. So this is how the
caching works, which is, anyway, it's

00:19:20.440 --> 00:19:24.779
there is lots of reasons for this. Let's
just get the started. We use TLS, we

00:19:24.779 --> 00:19:31.080
support TLS 1.2 where we have K then
the first layer we have nginx-. Do you

00:19:31.080 --> 00:19:40.049
know it - does anyone know what nginx-
means? And so that's related but not - not

00:19:40.049 --> 00:19:46.780
correct. So we have nginx which is the free
version and we have nginx plus which is

00:19:46.780 --> 00:19:51.729
the commercial version and nginx. But we
don't use nginx to do load balancing or

00:19:51.729 --> 00:19:56.389
anything so we stripped out everything
from it, and we just use it for TLS

00:19:56.389 --> 00:20:02.019
termination so we call it nginx-, is an
internal joke. So and then we have Varnish

00:20:02.019 --> 00:20:09.809
frontend. Varnish also is a caching layer
and this is the frontend is on the memory

00:20:09.809 --> 00:20:15.000
which is very very fast and you have the
backend which is on the storage and the

00:20:15.000 --> 00:20:22.559
hard disk but this is slow. The fun thing
is like just CDN caching layer takes 90%

00:20:22.559 --> 00:20:26.869
of our requests. Its response and 90% of
because just gets to the Varnish and just

00:20:26.869 --> 00:20:34.720
return and then with doesn't work it goes
through the application layer. The Varnish

00:20:34.720 --> 00:20:41.259
holds-- it has a TTL of 24 hours so if you
change an article, it also get invalidated

00:20:41.259 --> 00:20:47.159
by the application. So if someone added the
CDN actually purges the result. And the

00:20:47.159 --> 00:20:52.330
thing is, the frontend is shorted that can
spike by request so you come here load

00:20:52.330 --> 00:20:56.470
balancer just randomly sends your request
to a frontend but then the backend is

00:20:56.470 --> 00:21:00.989
actually, if the frontend can't find it,
it sends it to the backend and the backend

00:21:00.989 --> 00:21:09.700
is actually sort of - how is it called? -
it's a used hash by request, so, for

00:21:09.700 --> 00:21:15.402
example, article of Barack Obama is only
being served from one node in the data

00:21:15.402 --> 00:21:22.059
center in the CDN. If none of this works it
actually hits the other data center. So,

00:21:22.059 --> 00:21:29.940
yeah, I actually explained all of this. So
we have two - two caching clusters and one

00:21:29.940 --> 00:21:35.820
is called text and the other one is called
upload, it's not confusing at all, and if

00:21:35.820 --> 00:21:42.559
you want to find out, you can just do mtr
en.wikipedia.org and you - you're - the end

00:21:42.559 --> 00:21:49.909
node is text-lb.wikimedia.org which is the
our text storage but if you go to

00:21:49.909 --> 00:21:57.789
upload.wikimedia.org, you get to hit the
upload cluster. Yeah this is so far, what

00:21:57.789 --> 00:22:03.669
is it, and it has lots of problems because
a) varnish is open core, so the version

00:22:03.669 --> 00:22:09.309
that you use is open source we don't use
the commercial one, but the open core one

00:22:09.309 --> 00:22:21.009
doesn't support TLS. What? What happened?
Okay. No, no, no! You should I just-

00:22:21.009 --> 00:22:35.789
you're not supposed to see this. Okay,
sorry for the- huh? Okay, okay sorry. So

00:22:35.789 --> 00:22:40.119
Varnish has lots of problems, Varnish is
open core, it doesn't support TLS

00:22:40.119 --> 00:22:45.220
termination which makes us to have this
nginx- their system just to do TLS

00:22:45.220 --> 00:22:49.539
termination, makes our system complicated.
It doesn't work very well with so if that

00:22:49.539 --> 00:22:55.970
causes us to have a cron job to restart
every Varnish node twice a week. We have a

00:22:55.970 --> 00:23:04.330
cron job that this restarts every Vanish
node which is embarrassing, but also, on

00:23:04.330 --> 00:23:08.809
the other hand then the end of Varnish
like backend wants to talk to the

00:23:08.809 --> 00:23:13.010
application layer, it also doesn't support
terminate - TLS termination, so we use

00:23:13.010 --> 00:23:19.970
IPSec which is even more embarrassing, but
we are changing it. So we call it, if you

00:23:19.970 --> 00:23:25.080
are using a particular fixed server which
is very very nice and it's also open

00:23:25.080 --> 00:23:31.070
source, a fully open source like in with
Apache Foundation, Apache does the TLS,

00:23:31.070 --> 00:23:37.169
does the TLS by termination and still
for now we have a Varnish frontend that

00:23:37.169 --> 00:23:44.809
still exists but a backend is also going
to change to the ATS, so we call this ATS

00:23:44.809 --> 00:23:49.970
sandwich. Two ATS happening between and
there the middle there's a Varnish. The

00:23:49.970 --> 00:23:55.269
good thing is that the TLS termination
when it moves to ATS, you can actually use

00:23:55.269 --> 00:24:01.499
TLS 1.3 which is more modern and more
secure and even very faster so it

00:24:01.499 --> 00:24:05.889
basically drops 100 milliseconds from
every request that goes to Wikipedia.

00:24:05.889 --> 00:24:12.350
That translates to centuries of our
users' time every month, but ATS is going

00:24:12.350 --> 00:24:19.480
on and hopefully it will go live soon and
once these are done, so this is the new

00:24:19.480 --> 00:24:25.669
version. And, as I said, the TLS and when
we can do this we can actually use the

00:24:25.669 --> 00:24:36.519
more secure instead of IPSec to talk about
between data centers. Yes. And now it's

00:24:36.519 --> 00:24:42.260
time that Lucas talks about what happens
when you type in en.wikipedia.org.

00:24:42.260 --> 00:24:44.879

Lucas: Yes, this makes sense, thank you.

00:24:44.879 --> 00:24:49.070
So, first of all, what you see on the
slide here as the image doesn't really

00:24:49.070 --> 00:24:52.299
have anything to do with what happens when
you type in wikipedia.org because it's an

00:24:52.299 --> 00:24:57.249
offline Wikipedia reader but it's just a
nice image. So this is basically a summary

00:24:57.249 --> 00:25:02.850
of everything they already said, so if,
which is the most common case, you are

00:25:02.850 --> 00:25:10.969
lucky and get a URL which is cached, then,
so, first your computer asked for the IP

00:25:10.969 --> 00:25:15.619
address of en.wikipedia.org it reaches
this whole DNS daemon and because we're at

00:25:15.619 --> 00:25:19.239
Congress here it tells you the closest
data center is the one in Amsterdam, so

00:25:19.239 --> 00:25:25.759
esams and it's going to hit the edge, what
we call load bouncers/router there, then

00:25:25.759 --> 00:25:31.929
going through TLS termination through
nginx- and then it's going to hit the

00:25:31.929 --> 00:25:36.809
Varnish caching server, either frontend or
backends and then you get a response and

00:25:36.809 --> 00:25:40.940
that's already it and nothing else is ever
bothered again. It doesn't even reach any

00:25:40.940 --> 00:25:46.320
other data center which is very nice and
so that's, you said around 90% of the

00:25:46.320 --> 00:25:52.419
requests we get, and if you're unlucky and
the URL you requested is not in the

00:25:52.419 --> 00:25:57.400
Varnish in the Amsterdam data center then
it gets forwarded to the eqiad data

00:25:57.400 --> 00:26:01.519
center, which is the primary one and there
it still has a chance to hit the cache and

00:26:01.519 --> 00:26:04.840
perhaps this time it's there and then the
response is going to get cached in the

00:26:04.840 --> 00:26:09.739
frontend, no, in the Amsterdam Varnish and
you're also going to get a response and we

00:26:09.739 --> 00:26:13.639
still don't have to run any application
stuff. If we do have to hit any

00:26:13.639 --> 00:26:17.450
application stuff and then Varnish is
going to forward that, if it's

00:26:17.450 --> 00:26:22.970
upload.wikimedia.org, it goes to the media
storage Swift, if it's any other domain it

00:26:22.970 --> 00:26:28.450
goes to MediaWiki and then MediaWiki does
a ton of work to connect to the database,

00:26:28.450 --> 00:26:33.529
in this case the first shard for English
Wikipedia, get the wiki text from there,

00:26:33.529 --> 00:26:38.599
get the wiki text of all the related pages
and templates. No, wait I forgot

00:26:38.599 --> 00:26:43.519
something. First it checks if the HTML for
this page is available in parser cache, so

00:26:43.519 --> 00:26:46.909
that's another caching layer, and this
application cache - this parser cache

00:26:46.909 --> 00:26:53.529
might either be memcached or the database
cache behind it and if it's not there,

00:26:53.529 --> 00:26:57.679
then it has to go get the wikitext, get
all the related things and render that

00:26:57.679 --> 00:27:03.679
into HTML which takes a long time and goes
through some pretty ancient code and if

00:27:03.679 --> 00:27:07.779
you are doing an edit or an upload, it's
even worse, because then always has to go

00:27:07.779 --> 00:27:13.969
to MediaWiki and then it not only has to
store this new edit, either in the media

00:27:13.969 --> 00:27:19.629
back-end or in the database, it also has
update a bunch of stuff, like, especially

00:27:19.629 --> 00:27:25.200
if you-- first of all, it has to purge the
cache, it has to tell all the Varnish

00:27:25.200 --> 00:27:28.999
servers that there's a new version of this
URL available so that it doesn't take a

00:27:28.999 --> 00:27:33.940
full day until the time-to-live expires.
It also has to update a bunch of things,

00:27:33.940 --> 00:27:38.639
for example, if you edited a template, it
might have been used in a million pages

00:27:38.639 --> 00:27:43.750
and the next time anyone requests one of
those million pages, those should also

00:27:43.750 --> 00:27:49.019
actually be rendered again using the new
version of the template so it has to

00:27:49.019 --> 00:27:54.149
invalidate the cache for all of those and
all that is deferred through the job queue

00:27:54.149 --> 00:28:01.440
and it might have to calculate thumbnails
if you uploaded the file or create a -

00:28:01.440 --> 00:28:06.609
retranscode media files because maybe you
uploaded in - what do we support? - you

00:28:06.609 --> 00:28:09.839
upload in WebM and the browser only
supports some other media codec or

00:28:09.839 --> 00:28:12.869
something, we transcode that and also
encode it down to the different

00:28:12.869 --> 00:28:19.740
resolutions, so then it goes through that
whole dance and, yeah, that was already

00:28:19.740 --> 00:28:23.769
those slides. Is Amir going to talk again
about how we manage -

00:28:23.769 --> 00:28:29.519
Amir: I mean okay yeah I quickly come back
just for a short break to talk about

00:28:29.519 --> 00:28:36.690
managing to manage because managing 100-
1300 bare metal hardware plus a Kubernetes

00:28:36.690 --> 00:28:42.700
cluster is not easy, so what we do is that
we use Puppet for configuration

00:28:42.700 --> 00:28:48.220
management in our bare metal systems, it's
fun, five to 50,000 lines of Puppet code. I

00:28:48.220 --> 00:28:52.119
mean, lines of code is not a great
indicator but you can roughly get an

00:28:52.119 --> 00:28:59.149
estimate of how its things work and we
have 100,000 lines of Ruby and we have our

00:28:59.149 --> 00:29:04.429
CI and CD cluster, we have so we don't
store anything in GitHub or GitLab, we

00:29:04.429 --> 00:29:10.559
have our own system which is based on
Gerrit and for that we have a system of

00:29:10.559 --> 00:29:15.539
Jenkins and the Jenkins does all of this
kind of things and also because we have a

00:29:15.539 --> 00:29:21.960
Kubernetes cluster for services, some of
our services, if you make a merger change

00:29:21.960 --> 00:29:26.440
in the Gerrit it also builds the Docker
files and containers and push it up to the

00:29:26.440 --> 00:29:35.440
production and also in order to run remote
SSH commands, we have cumin that's like in

00:29:35.440 --> 00:29:39.200
the house automation and we built this
farm for our systems and for example you

00:29:39.200 --> 00:29:45.570
go there and say ok we pull this node or
run this command in all of the data

00:29:45.570 --> 00:29:52.889
Varnish nodes that I told you like you
want to restart them. And with this I get

00:29:52.889 --> 00:29:57.899
back to Lucas.
Lucas: So, I am going to talk a bit more

00:29:57.899 --> 00:30:01.929
about Wikimedia Cloud Services which is a
bit different in that it's not really our

00:30:01.929 --> 00:30:06.269
production stuff but it's where you
people, the volunteers of the Wikimedia

00:30:06.269 --> 00:30:11.489
movement can run their own code, so you
can request a project which is kind of a

00:30:11.489 --> 00:30:15.509
group of users and then you get assigned a
pool of you have this much CPU and this

00:30:15.509 --> 00:30:20.999
much RAM and you can create virtual
machines with those resources and then do

00:30:20.999 --> 00:30:29.119
stuff there and run basically whatever you
want, to create and boot and shut down the

00:30:29.119 --> 00:30:33.360
VMs and stuff we use OpenStack and there's
a Horizon frontend for that which you use

00:30:33.360 --> 00:30:36.409
through the browser and it's largely out
all the time but otherwise it works pretty

00:30:36.409 --> 00:30:42.619
well. Internally, ideally you manage the
VMs using Puppet but a lot of people just

00:30:42.619 --> 00:30:47.860
SSH in and then do whatever they need to
set up the VM manually and it happens,

00:30:47.860 --> 00:30:52.759
well, and there's a few big projects like
Toolforge where you can run your own web-

00:30:52.759 --> 00:30:57.499
based tools or the beta cluster which is
basically a copy of some of the biggest

00:30:57.499 --> 00:31:02.499
wikis like there's a beta English
Wikipedia, beta Wikidata, beta Wikimedia

00:31:02.499 --> 00:31:08.320
Commons using mostly the same
configuration as production but using the

00:31:08.320 --> 00:31:12.450
current master version of the software
instead of whatever we deploy once a week so

00:31:12.450 --> 00:31:15.840
if there's a bug, we see it earlier
hopefully, even if we didn't catch it

00:31:15.840 --> 00:31:20.279
locally, because the beta cluster is more
similar to the production environment and

00:31:20.279 --> 00:31:24.230
also the continuous - continuous
integration service run in Wikimedia Cloud

00:31:24.230 --> 00:31:28.979
Services as well. Yeah and also you have
to have Kubernetes somewhere on these

00:31:28.979 --> 00:31:33.609
slides right, so you can use that to
distribute work between the tools in

00:31:33.609 --> 00:31:37.179
Toolforge or you can use the grid engine
which does a similar thing but it's like

00:31:37.179 --> 00:31:42.519
three decades old and through five forks
now I think the current fork we use is son

00:31:42.519 --> 00:31:46.999
of grid engine and I don't know what it
was called before, but that's Cloud

00:31:46.999 --> 00:31:54.789
Services.
Amir: So in a nutshell, this is our - our

00:31:54.789 --> 00:32:01.090
systems. We have 1300 bare metal services
with lots and lots of caching, like lots

00:32:01.090 --> 00:32:06.919
of layers of caching, because mostly we
serves read and we can just keep them as a

00:32:06.919 --> 00:32:12.179
cached version and all of this is open
source, you can contribute to it, if you

00:32:12.179 --> 00:32:18.089
want to and there's a lot of configuration
is also open and I - this is the way I got

00:32:18.089 --> 00:32:21.940
hired like I open it started contributing
to the system I feel like yeah we can-

00:32:21.940 --> 00:32:31.549
come and work for us, so this is a -
Daniel: That's actually how all of us got

00:32:31.549 --> 00:32:38.350
hired.
Amir: So yeah, and this is the whole thing

00:32:38.350 --> 00:32:47.570
that happens in Wikimedia and if you want
to - no, if you want to help us, we are

00:32:47.570 --> 00:32:51.419
hiring. You can just go to jobs at
wikimedia.org, if you want to work for

00:32:51.419 --> 00:32:54.379
Wikimedia Foundation. If you want to work
with Wikimedia Deutschland, you can go to

00:32:54.379 --> 00:32:59.179
wikimedia.de and at the bottom there's a
link for jobs because the links got too

00:32:59.179 --> 00:33:03.469
long. If you can contribute, if you want
to contribute to us, there is so many ways

00:33:03.469 --> 00:33:07.929
to contribute, as I said, there's so many
bugs, we have our own graphical system,

00:33:07.929 --> 00:33:12.721
you can just look at the monitor and a
Phabricator is our bug tracker, you can

00:33:12.721 --> 00:33:20.639
just go there and find the bug and fix
things. Actually, we have one repository

00:33:20.639 --> 00:33:26.469
that is private but it only holds the
certificate for as TLS and things that are

00:33:26.469 --> 00:33:31.499
really really private then we cannot
remove them. But also there are

00:33:31.499 --> 00:33:33.779
documentations, the documentation for
infrastructure is at

00:33:33.779 --> 00:33:40.409
wikitech.wikimedia.org and documentation
for configuration is at noc.wikimedia.org

00:33:40.409 --> 00:33:46.599
plus the documentation of our codebase.
The documentation for MediaWiki itself is

00:33:46.599 --> 00:33:52.989
at mediawiki.org and also we have a our
own system of URL shortener you can go to

00:33:52.989 --> 00:33:58.789
w.wiki and short and shorten any URL in
Wikimedia structure so we reserved the

00:33:58.789 --> 00:34:08.779
dollar sign for the donate site and yeah,
you have any questions, please.

00:34:08.779 --> 00:34:16.540
<i>Applause</i>

00:34:16.540 --> 00:34:21.679
Daniel: It's if you know we have quite a bit of
time for questions so if anything wasn't

00:34:21.679 --> 00:34:27.149
clear or they're curious about anything
please, please ask.

00:34:27.149 --> 00:34:37.200
AM: So one question what is not in the
presentation. Do you have any efforts with

00:34:37.200 --> 00:34:42.460
hacking attacks?
Amir: So the first rule of security issues

00:34:42.460 --> 00:34:49.210
is that we don't talk about security issues
but let's say this baby has all sorts of

00:34:49.210 --> 00:34:56.240
attacks happening, we have usually we have
DDo. Once there was happening a couple of

00:34:56.240 --> 00:34:59.819
months ago that was very successful. I
don't know if you read the news about

00:34:59.819 --> 00:35:05.200
that, but we also, we have a infrastructure
to handle this, we have a security team

00:35:05.200 --> 00:35:12.740
that handles these cases and yes.
AM: Hello how do you manage access to your

00:35:12.740 --> 00:35:20.069
infrastructure from your employees?
Amir: So it's SS-- so we have a LDAP

00:35:20.069 --> 00:35:25.390
group and LDAP for the web-based
systems but for SSH and for this ssh we

00:35:25.390 --> 00:35:30.660
have strict protocols and then you get a
private key and some people usually

00:35:30.660 --> 00:35:35.480
protect their private key using UV keys
and then you have you can SSH to the

00:35:35.480 --> 00:35:40.420
system basically.
Lucas: Yeah, well, there's some

00:35:40.420 --> 00:35:44.720
firewalling setup but there's only one
server for data center that you can

00:35:44.720 --> 00:35:48.221
actually reach through SSH and then you
have to tunnel through that to get to any

00:35:48.221 --> 00:35:51.359
other server.
Amir: And also, like, we have we have a

00:35:51.359 --> 00:35:55.500
internal firewall and it's basically if
you go to the inside of the production you

00:35:55.500 --> 00:36:01.450
cannot talk to the outside. You even, you
for example do git clone github.org, it

00:36:01.450 --> 00:36:07.200
doesn't, github.com doesn't work. It
only can access tools that are for inside

00:36:07.200 --> 00:36:13.390
Wikimedia Foundation infrastructure.
AM: Okay, hi, you said you do TLS

00:36:13.390 --> 00:36:18.640
termination through nginx, do you still
allow non-HTTPS so it should be non-secure access.

00:36:18.640 --> 00:36:22.780
Amir: No we dropped it a really long
time ago but also

00:36:22.780 --> 00:36:25.069
Lucas: 2013 or so
Amir: Yeah, 2015

00:36:25.069 --> 00:36:28.651
Lucas: 2015
Amir: 2013 started serving the most of the

00:36:28.651 --> 00:36:35.740
traffic but 15, we dropped all of the
HTTP- non-HTTPS protocols and recently even

00:36:35.740 --> 00:36:43.940
dropped and we are not serving any SSL
requests anymore and TLS 1.1 is also being

00:36:43.940 --> 00:36:48.460
phased out, so we are sending you a warning
to the users like you're using TLS 1.1,

00:36:48.460 --> 00:36:54.810
please migrate to these new things that
came out around 10 years ago, so yeah

00:36:54.810 --> 00:36:59.849
Lucas: Yeah I think the deadline for that
is like February 2020 or something then

00:36:59.849 --> 00:37:04.710
we'll only have TLS 1.2
Amir: And soon we are going to support TLS

00:37:04.710 --> 00:37:06.640
1.3
Lucas: Yeah

00:37:06.640 --> 00:37:12.460
Are there any questions?
Q: so does read-only traffic

00:37:12.460 --> 00:37:18.029
from logged in users hit all the way
through to the parser cache or is there

00:37:18.029 --> 00:37:22.280
another layer of caching for that?
Amir: Yes we, you bypass all of

00:37:22.280 --> 00:37:28.470
that, you can.
Daniel: We need one more microphone. Yes,

00:37:28.470 --> 00:37:33.869
it actually does and this is a pretty big
problem and something we want to look into

00:37:33.869 --> 00:37:38.930
<i>clears throat</i> but it requires quite a
bit of rearchitecting. If you are

00:37:38.930 --> 00:37:44.250
interested in this kind of thing, maybe
come to my talk tomorrow at noon.

00:37:44.250 --> 00:37:48.819
Amir: Yeah one reason we can, we are
planning to do is active active so we have

00:37:48.819 --> 00:37:56.500
two primaries and the read request gets
request - from like the users can hit

00:37:56.500 --> 00:37:58.460
their secondary data center instead of the
main one.

00:37:58.460 --> 00:38:03.990
Lucas: I think there was a question way in
the back there, for some time already

00:38:03.990 --> 00:38:13.950
AM: Hi, I got a question. I read on the
Wikitech that you are using karate as a

00:38:13.950 --> 00:38:19.040
validation platform for some parts, can
you tell us something about this or what

00:38:19.040 --> 00:38:24.619
parts of Wikipedia or Wikimedia are hosted
on this platform?

00:38:24.619 --> 00:38:29.589
Amir: I am I'm not oh sorry so I don't
know this kind of very very sure but take

00:38:29.589 --> 00:38:34.390
it with a grain of salt but as far as I
know karate is used to build a very small

00:38:34.390 --> 00:38:39.829
VMs in productions that we need for very
very small micro sites that we serve to

00:38:39.829 --> 00:38:45.619
the users. So we built just one or two VMs,
we don't use it very as often as I think

00:38:45.619 --> 00:38:54.819
so.
AM: Do you also think about open hardware?

00:38:54.819 --> 00:39:03.950
Amir: I don't, you can
Daniel: Not - not for servers. I think for

00:39:03.950 --> 00:39:07.500
the offline Reader project, but this is not
actually run by the Foundation, it's

00:39:07.500 --> 00:39:10.289
supported but it's not something that the
Foundation does. They were sort of

00:39:10.289 --> 00:39:15.100
thinking about open hardware but really
open hardware in practice usually means,

00:39:15.100 --> 00:39:19.609
you - you don't, you know, if you really
want to go down to the chip design, it's

00:39:19.609 --> 00:39:25.210
pretty tough, so yeah, it's- it's it- it's
usually not practical, sadly.

00:39:25.210 --> 00:39:31.660
Amir: And one thing I can say but this is
that we have a some machine - machines that

00:39:31.660 --> 00:39:37.150
are really powerful that we give to the
researchers to run analysis on the between

00:39:37.150 --> 00:39:43.369
this itself and we needed to have GPUs for
those but the problem was - was there

00:39:43.369 --> 00:39:49.109
wasn't any open source driver for them so
we migrated and use AMD I think, but AMD

00:39:49.109 --> 00:39:53.609
didn't fit in the rack it was a quite a
endeavor to get it to work for our

00:39:53.609 --> 00:40:03.710
researchers to help you CPU.
AM: I'm still impressed that you answer

00:40:03.710 --> 00:40:10.920
90% out of the cache. Do all people access
the same pages or is the cache that huge?

00:40:10.920 --> 00:40:21.160
So what percentage of - of the whole
database is in the cache then?

00:40:21.160 --> 00:40:29.760
Daniel: I don't have the exact numbers to
be honest, but a large percentage of the

00:40:29.760 --> 00:40:36.769
whole database is in the cache. I mean it
expires after 24 hours so really obscure

00:40:36.769 --> 00:40:43.430
stuff isn't there but I mean it's- it's a-
it's a- it's a power-law distribution

00:40:43.430 --> 00:40:47.890
right? You have a few pages that are
accessed a lot and you have many many many

00:40:47.890 --> 00:40:55.420
pages that are not actually accessed
at all for a week or so except maybe for a

00:40:55.420 --> 00:41:01.740
crawler, so I don't know a number. My
guess would be it's less than 50% that is

00:41:01.740 --> 00:41:06.520
actually cached but, you know, that still
covers 90%-- it's probably the top 10% of

00:41:06.520 --> 00:41:11.630
pages would still cover 90% of the
pageviews, but I don't-- this would be

00:41:11.630 --> 00:41:15.509
actually-- I should look this up, it would
be interesting numbers to have, yes.

00:41:15.509 --> 00:41:20.710
Lucas: Do you know if this is 90% of the
pageviews or 90% of the get requests

00:41:20.710 --> 00:41:24.279
because, like, requests for the JavaScript
would also be cached more often, I assume

00:41:24.279 --> 00:41:27.529
Daniel: I would expect that for non-
pageviews, it's even higher

00:41:27.529 --> 00:41:30.010
Lucas: Yeah
Daniel: Yeah, because you know all the

00:41:30.010 --> 00:41:34.150
icons and- and, you know, JavaScript
bundles and CSS and stuff doesn't ever

00:41:34.150 --> 00:41:40.309
change
Lucas: I'm gonna say for every 180 min 90%

00:41:40.309 --> 00:41:50.790
but there's a question back there
AM: Hey. Do your data centers run on green

00:41:50.790 --> 00:41:55.220
energy?
Amir: Very valid question. So, the

00:41:55.220 --> 00:42:03.450
Amsterdam city n1 is a full green but the
other ones are partially green, partially

00:42:03.450 --> 00:42:10.840
coal and like gas. As far as I know, there
are some plans to make them move away from

00:42:10.840 --> 00:42:15.170
it but the other hand we realized that if
we don't produce as much as a carbon

00:42:15.170 --> 00:42:21.349
emission because we don't have much servers
and we don't use much data, there was a

00:42:21.349 --> 00:42:26.789
summation and that we realized our carbon
emission is basically as the same as 200

00:42:26.789 --> 00:42:34.720
and in the datacenter plus all of their
travel that all of this have to and all of

00:42:34.720 --> 00:42:37.880
the events is 250 households, it's very
very small it's I think it's one

00:42:37.880 --> 00:42:44.890
thousandth of the comparable
traffic with Facebook even if you just cut

00:42:44.890 --> 00:42:50.650
down with the same traffic because
Facebook collects the data, it runs very

00:42:50.650 --> 00:42:54.269
sophisticated machine learning algorithms
that's that's a real complicate, but for

00:42:54.269 --> 00:43:01.119
Wikimedia, we don't do this so we don't
need much energy. Does - does the answer

00:43:01.119 --> 00:43:04.920
your question?
Herald: Do we have any other

00:43:04.920 --> 00:43:15.720
questions left? Yeah sorry
AM: hi how many developers do you need to

00:43:15.720 --> 00:43:19.789
maintain the whole infrastructure and how
many developers or let's say head

00:43:19.789 --> 00:43:24.500
developer hours you needed to build the
whole infrastructure like the question is

00:43:24.500 --> 00:43:29.329
because what I find very interesting about
the talk it's a non-profit, so as an

00:43:29.329 --> 00:43:34.109
example for other nonprofits is how much
money are we talking about in order to

00:43:34.109 --> 00:43:38.760
build something like this as a digital
common.

00:43:45.630 --> 00:43:48.980
Daniel: If this is just about actually
running all this so just operations is

00:43:48.980 --> 00:43:53.530
less than 20 people I think which makes if
you if you basically divide the requests

00:43:53.530 --> 00:43:59.869
per second by people you get to something
like 8,000 requests per second per

00:43:59.869 --> 00:44:04.369
operations engineer which I think is a
pretty impressive number. This is probably

00:44:04.369 --> 00:44:09.809
a lot higher I would I would really like
to know if there's any organization that

00:44:09.809 --> 00:44:17.270
tops that. I don't actually know the whole
the the actual operations budget I know is

00:44:17.270 --> 00:44:24.559
it two two-digit millions annually. Total
hours for building this over the last 18

00:44:24.559 --> 00:44:29.069
years, I have no idea. For the for the
first five or so years, the people doing

00:44:29.069 --> 00:44:34.609
it were actually volunteers. We still had
volunteer database administrators and

00:44:34.609 --> 00:44:42.160
stuff until maybe ten years ago, eight
years ago, so yeah it's really nobody

00:44:42.160 --> 00:44:44.589
did any accounting of this I can only
guess.

00:44:56.669 --> 00:45:03.810
AM: Hello a tools question. I a few years
back I saw some interesting examples of

00:45:03.810 --> 00:45:09.089
saltstack use for Wikimedia but right now
I see only Puppet that come in mentioned

00:45:09.089 --> 00:45:17.819
so kind of what happened with that
Amir: I think we dished saltstack you -

00:45:17.819 --> 00:45:22.970
I don't I cannot because none of us are in
the Cloud Services team and I don't think

00:45:22.970 --> 00:45:27.380
I can answer you but if you look at the
wikitech.wikimedia.org, it's

00:45:27.380 --> 00:45:30.869
probably if last time I checked says like
it's deprecated and obsolete we don't use

00:45:30.869 --> 00:45:32.144
it anymore.

00:45:37.394 --> 00:45:39.920
AM: Do you use the bat-ropes like the top

00:45:39.920 --> 00:45:46.130
runners to fill spare capacity on the web
serving servers or do you have dedicated

00:45:46.130 --> 00:45:51.589
servers for the roles.
Lucas: I think they're dedicated.

00:45:51.589 --> 00:45:56.390
Amir: The job runners if you're asking job runners 
are dedicated yes they are they are I

00:45:56.390 --> 00:46:02.910
think 5 per primary data center so
Daniel: Yeah they don't, I mean do we do we

00:46:02.910 --> 00:46:06.559
actually have any spare capacity on
anything? We don't have that much hardware

00:46:06.559 --> 00:46:08.700
everything is pretty much at a hundred
percent.

00:46:08.700 --> 00:46:14.109
Lucas: I think we still have some server
that is just called misc1111 or something

00:46:14.109 --> 00:46:18.620
which run five different things at once,
you can look for those on wikitech.

00:46:18.620 --> 00:46:25.820
Amir: But but we go oh sorry it's not five
it's 20 per data center 20 per primary

00:46:25.820 --> 00:46:31.440
data center that's our job runner and they
run 700 jobs per second.

00:46:31.440 --> 00:46:35.690
Lucas: And I think that does not include
the video scaler so those are separate

00:46:35.690 --> 00:46:38.109
again
Amir: No, they merged them in like a month

00:46:38.109 --> 00:46:40.040
ago
Lucas: Okay, cool

00:46:47.470 --> 00:46:51.420
AM: Maybe a little bit off topic that can
tell us a little bit about decision making

00:46:51.420 --> 00:46:55.750
process for- for technical decision,
architecture decisions, how does it work

00:46:55.750 --> 00:47:01.890
in an organization like this: decision
making process for architectural

00:47:01.890 --> 00:47:03.409
decisions for example.

00:47:08.279 --> 00:47:11.009
Daniel: Yeah so Wikimedia has a

00:47:11.009 --> 00:47:16.539
committee for making high-level technical
decisions, it's called a Wikimedia

00:47:16.539 --> 00:47:23.609
Technical Committee, techcom and we run an
RFC process so any decision that is a

00:47:23.609 --> 00:47:27.540
cross-cutting strategic are especially
hard to undo should go through this

00:47:27.540 --> 00:47:33.579
process and it's pretty informal,
basically you file a ticket and start

00:47:33.579 --> 00:47:38.000
this process. It gets announced
in the mailing list, hopefully you get

00:47:38.000 --> 00:47:45.009
input and feedback and at some point it is
it's approved for implementation. We're

00:47:45.009 --> 00:47:48.640
currently looking into improving this
process, it's not- sometimes it works

00:47:48.640 --> 00:47:52.200
pretty well, sometimes things don't get
that much feedback but it still it makes

00:47:52.200 --> 00:47:55.890
sure that people are aware of these high-
level decisions

00:47:55.890 --> 00:47:59.790
Amir: Daniel is the chair of that
committee

00:48:02.160 --> 00:48:07.839
Daniel: Yeah, if you want to complain
about the process, please do.

00:48:13.549 --> 00:48:21.440
AM: yes regarding CI and CD across along the
pipeline, of course with that much traffic

00:48:21.440 --> 00:48:27.359
you want to keep everything consistent
right. So is there any testing

00:48:27.359 --> 00:48:32.150
strategies that you have said internally,
like of course unit tests integration

00:48:32.150 --> 00:48:35.790
tests but do you do something like
continuous end to end testing on beta

00:48:35.790 --> 00:48:40.100
instances?
Amir: So if we have beta cluster but also

00:48:40.100 --> 00:48:44.670
we do deploy, we call it train and so
we deploy once a week, all of the changes

00:48:44.670 --> 00:48:50.349
gets merged to one, like a branch and the
branch gets cut in every Tuesday and it

00:48:50.349 --> 00:48:54.680
first goes to the test wikis and
then it goes to all of the wikis that are

00:48:54.680 --> 00:48:59.270
not Wikipedia except Catalan and Hebrew
Wikipedia. So basically Hebrew and Catalan

00:48:59.270 --> 00:49:03.759
Wikipedia volunteer to be the guinea pigs
of the next wikis and if everything works

00:49:03.759 --> 00:49:07.599
fine usually it goes there and is like oh
the fatal mater and we have a logging and

00:49:07.599 --> 00:49:12.579
then it's like okay we need to fix this
and we fix it immediately and then it goes

00:49:12.579 --> 00:49:18.690
live to all wikis. This is one way of
looking at it well so okay yeah

00:49:18.690 --> 00:49:23.279
Daniel: So, our test coverage is not as
great as it should be and so we kind of,

00:49:23.279 --> 00:49:30.970
you know, abuse our users for this. We
are, of course, working to improve this

00:49:30.970 --> 00:49:37.230
and one thing that we started recently is
a program for creating end-to-end tests

00:49:37.230 --> 00:49:43.460
for all the API modules we have, in the
hope that we can thereby cover pretty much

00:49:43.460 --> 00:49:49.849
all of the application logic bypassing the
user interface. I mean, full end-to-end

00:49:49.849 --> 00:49:52.770
should, of course, include the user
interface but user interface tests are

00:49:52.770 --> 00:49:58.180
pretty brittle and often tests you know
where things are on the screen and it just

00:49:58.180 --> 00:50:02.559
seems to us that it makes a lot of sense
to have more- to have tests that actually

00:50:02.559 --> 00:50:07.259
test the application logic for what the
system actually should be doing, rather

00:50:07.259 --> 00:50:15.910
than what it should look like and, yeah,
we are currently working on making- so

00:50:15.910 --> 00:50:20.210
yeah, basically this has been a proof of
concept and we're currently working to

00:50:20.210 --> 00:50:27.079
actually integrate it in- in CI. That
perhaps should land once everyone is back

00:50:27.079 --> 00:50:34.560
from the vacations and then we have to
write about a thousand or so tests, I

00:50:34.560 --> 00:50:37.930
guess.
Lucas: I think there's also a plan to move

00:50:37.930 --> 00:50:42.559
to a system where we actually deploy
basically after every commit and can

00:50:42.559 --> 00:50:45.910
immediately roll back if something goes
wrong but that's more midterm stuff and

00:50:45.910 --> 00:50:48.339
I'm not sure what the current status of
that proposal is

00:50:48.339 --> 00:50:50.450
Amir: And it will be in Kubernetes, so it
will be completely different

00:50:50.450 --> 00:50:55.529
Daniel: That would be amazing
Lucas: But right now, we are on this

00:50:55.529 --> 00:50:59.730
weekly basis, if something goes wrong, we
roll back to the last week's version of

00:50:59.730 --> 00:51:06.049
the code
Herald: Are there are any questions-

00:51:06.049 --> 00:51:18.549
questions left? Sorry. Yeah. Okay, um, I
don't think so. So, yeah, thank you for

00:51:18.549 --> 00:51:25.329
this wonderful talk. Thank you for all
your questions. Um, yeah, I hope you liked

00:51:25.329 --> 00:51:29.750
it. Um, see you around, yeah.

00:51:29.750 --> 00:51:33.725
<i>Applause</i>

00:51:33.725 --> 00:51:39.270
<i>Music</i>

00:51:39.270 --> 00:52:01.000
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!