1
00:00:00,000 --> 00:00:18,772
Music
2
00:00:18,772 --> 00:00:25,332
Herald: Hi! Welcome, welcome to the Wikipaka-
WG, in this extremely crowded Esszimmer.
3
00:00:25,332 --> 00:00:32,079
I'm Jakob, I'm your Herald for tonight
until 10:00 and I'm here to welcome you
4
00:00:32,079 --> 00:00:36,690
and to welcome these wonderful three guys
on the stage. They're going to talk about
5
00:00:36,690 --> 00:00:44,710
the infrastructure of Wikipedia.
And yeah, they are Lucas, Amir, and Daniel
6
00:00:44,710 --> 00:00:52,970
and I hope you'll have fun!
Applause
7
00:00:52,970 --> 00:00:57,059
Amir Sarabadani: Hello, my name is
Amir, um, I'm a software engineer at
8
00:00:57,059 --> 00:01:01,130
Wikimedia Deutschland, which is the German
chapter of the Wikimedia Foundation. Wikimedia
9
00:01:01,130 --> 00:01:06,520
Foundation runs Wikipedia. Here is Lucas.
Lucas is also a software engineer, at
10
00:01:06,520 --> 00:01:10,300
Wikimedia Deutschland, and Daniel here is
a software architect at Wikimedia
11
00:01:10,300 --> 00:01:15,110
Foundation. We are all based in Germany,
Daniel in Leipzig, we are in Berlin. And
12
00:01:15,110 --> 00:01:21,420
today we want to talk about how we run
Wikipedia using donors' money, and
13
00:01:21,420 --> 00:01:29,910
not with lots of advertisements and collecting
data. So in this talk, first we are going
14
00:01:29,910 --> 00:01:34,860
to take an inside-out approach. So we are
going to first talk about the application
15
00:01:34,860 --> 00:01:39,830
layer and then the outside layers, and
then we go to an outside-in approach and
16
00:01:39,830 --> 00:01:48,635
then talk about how you're going to hit
Wikipedia from the outside.
17
00:01:48,635 --> 00:01:53,320
So first of all,
let me give you some information. First of
18
00:01:53,320 --> 00:01:57,259
all, all of the Wikipedia
infrastructure is run by the Wikimedia
19
00:01:57,259 --> 00:02:01,810
Foundation, an American nonprofit
charitable organization. We don't run any
20
00:02:01,810 --> 00:02:07,960
ads and we are only 370 people. If you
count Wikimedia Deutschland or all other
21
00:02:07,960 --> 00:02:12,500
chapters, it's around 500 people in total.
It's nothing compared to the companies
22
00:02:12,500 --> 00:02:19,530
outside. But all of the content is
managed by volunteers. Even our staff
23
00:02:19,530 --> 00:02:24,170
doesn't edit or add content to
Wikipedia. And we support 300 languages,
24
00:02:24,170 --> 00:02:29,501
which is a very large number. And
Wikipedia, it's eighteen years old, so it
25
00:02:29,501 --> 00:02:37,950
can vote now. And also, Wikipedia has some
really, really weird articles. Um, I want
26
00:02:37,950 --> 00:02:42,510
to ask you: have you
encountered any really weird articles
27
00:02:42,510 --> 00:02:47,970
in Wikipedia? My favorite is a list of
people who died on the toilet. But if you
28
00:02:47,970 --> 00:02:54,620
know anything, raise your hands. Uh, do
you know any weird articles in Wikipedia?
29
00:02:54,620 --> 00:02:58,750
Do you know some?
Daniel Kinzler: Oh, the classic one….
30
00:02:58,750 --> 00:03:03,600
Amir: You need to unmute yourself. Oh,
okay.
31
00:03:03,600 --> 00:03:09,551
Daniel: This is technology. I don't know
anything about technology. OK, no. The, my
32
00:03:09,551 --> 00:03:13,900
favorite example is "people killed by
their own invention". That's yeah. That's
33
00:03:13,900 --> 00:03:20,510
a lot of fun. Look it up. It's amazing.
Lucas Werkmeister: There's also a list,
34
00:03:20,510 --> 00:03:24,810
there is also a list of prison escapes
using helicopters. I almost said
35
00:03:24,810 --> 00:03:28,790
helicopter escapes using prisons, which
doesn't make any sense. But that was also
36
00:03:28,790 --> 00:03:31,830
a very interesting list.
Daniel: I think we also have a category of
37
00:03:31,830 --> 00:03:35,310
lists of lists of lists.
Amir: That's a page.
38
00:03:35,310 --> 00:03:39,040
Lucas: And every few months someone thinks
it's funny to redirect it to Russell's
39
00:03:39,040 --> 00:03:42,940
paradox or so.
Daniel: Yeah.
40
00:03:42,940 --> 00:03:49,209
Amir: But also, besides that, people cannot
read Wikipedia in Turkey or China. But
41
00:03:49,209 --> 00:03:54,450
three days ago, actually, the block in
Turkey was ruled unconstitutional, but
42
00:03:54,450 --> 00:04:01,000
it's not lifted yet. Hopefully they will
lift it soon. Um, so the Wikimedia
43
00:04:01,000 --> 00:04:05,660
projects are not just Wikipedia. There are lots
and lots of projects. Some of them are not
44
00:04:05,660 --> 00:04:11,650
as successful as Wikipedia. Um, uh,
like Wikinews. But uh, for example,
45
00:04:11,650 --> 00:04:16,190
Wikipedia is the most successful one, and
there's another one, that's Wikidata. It's
46
00:04:16,190 --> 00:04:21,680
being developed by Wikimedia Deutschland.
I mean the Wikidata team, with Lucas, um,
47
00:04:21,680 --> 00:04:26,520
and it's being used in infoboxes – it
has the data that Wikipedia or the Google
48
00:04:26,520 --> 00:04:31,449
Knowledge Graph or Siri or Alexa use.
It's basically, it's sort of a backbone of
49
00:04:31,449 --> 00:04:37,981
all of the data, uh, through the whole
Internet. Um, so our infrastructure. Let
50
00:04:37,981 --> 00:04:42,910
me… So first of all, our infrastructure is
all Open Source. By principle, we never
51
00:04:42,910 --> 00:04:48,081
use any commercial software. Uh, we could
use lots of things; they were even
52
00:04:48,081 --> 00:04:54,330
sometimes given to us for free, but we
refused to use them. The second
53
00:04:54,330 --> 00:04:59,060
thing is we have two primary data centers
for failovers, when, for example, a
54
00:04:59,060 --> 00:05:03,960
whole datacenter goes offline, so we can
failover to another data center. We have
55
00:05:03,960 --> 00:05:11,100
three caching points of presence or
CDNs. Our CDNs are all over the world. Uh,
56
00:05:11,100 --> 00:05:15,180
also, we have our own CDN. We don't
use Cloudflare, because
57
00:05:15,180 --> 00:05:20,960
we care about the privacy of
the users, and it's very important because, for
58
00:05:20,960 --> 00:05:25,490
example, people edit from countries where
it might be, uh, dangerous for them to edit
59
00:05:25,490 --> 00:05:29,810
Wikipedia. So we really care to keep the
data as protected as possible.
60
00:05:29,810 --> 00:05:32,400
Applause
61
00:05:32,400 --> 00:05:39,460
Amir: Uh, we have 17 billion page views
per month, which goes up and down
62
00:05:39,460 --> 00:05:44,350
based on the season and everything, we
have around 100 to 200 thousand requests
63
00:05:44,350 --> 00:05:48,449
per second. It's different from the
page views because requests can be requests
64
00:05:48,449 --> 00:05:54,540
for objects, can be API calls, can be lots of
things. And we have 300,000 new editors
65
00:05:54,540 --> 00:06:03,120
per month and we run all of this with 1300
bare metal servers. So right now, Daniel
66
00:06:03,120 --> 00:06:07,010
is going to talk about the application
layer and the inside of that
67
00:06:07,010 --> 00:06:11,830
infrastructure.
Daniel: Thanks, Amir. Oh, the clicky
68
00:06:11,830 --> 00:06:20,330
thing. Thank you. So the application layer
is basically the software that actually
69
00:06:20,330 --> 00:06:25,050
does what a wiki does, right? It lets you
edit pages, create or update pages and
70
00:06:25,050 --> 00:06:29,650
then serve the page views. interference
noise The challenge for Wikipedia, of
71
00:06:29,650 --> 00:06:37,150
course, is serving all the many page views
that Amir just described. The core of the
72
00:06:37,150 --> 00:06:42,690
application is a classic LAMP application.
interference noise I have to stop
73
00:06:42,690 --> 00:06:50,130
moving. Yes? Is that it? It's a classic
LAMP stack application. So it's written in
74
00:06:50,130 --> 00:06:57,080
PHP, it runs on an Apache server. It uses
MySQL as a database in the backend. We
75
00:06:57,080 --> 00:07:01,630
used to use HHVM instead of the… Yeah,
we…
76
00:07:01,630 --> 00:07:13,830
Herald: Here. Sorry. Take this one here.
Daniel: Hello. We used to use HHVM as the
77
00:07:13,830 --> 00:07:20,810
PHP engine, but we just switched back to
the mainstream PHP, using PHP 7.2 now,
78
00:07:20,810 --> 00:07:24,720
because Facebook decided that HHVM is
going to be incompatible with the standard
79
00:07:24,720 --> 00:07:35,430
and they were just basically developing it
for, for themselves. Right. So we have
80
00:07:35,430 --> 00:07:42,740
separate clusters of servers for serving
requests, for serving different requests,
81
00:07:42,740 --> 00:07:48,020
page views on the one hand, and also
handling edits. Then we have a cluster for
82
00:07:48,020 --> 00:07:55,350
handling API calls and then we have a
bunch of servers set up to handle
83
00:07:55,350 --> 00:08:01,050
asynchronous jobs, things that happen in
the background, the job runners, and…
84
00:08:01,050 --> 00:08:05,240
I guess video scaling is a very obvious
example of that. It just takes too long to
85
00:08:05,240 --> 00:08:11,720
do it on the fly. But we use it for many
other things as well. MediaWiki, MediaWiki
86
00:08:11,720 --> 00:08:15,930
is kind of an amazing thing because you
can just install it on your own shared-
87
00:08:15,930 --> 00:08:23,419
hosting, 10-bucks-a-month webspace and
it will run. But you can also use it to,
88
00:08:23,419 --> 00:08:29,270
you know, serve half the world. And so
it's a very powerful and versatile system,
89
00:08:29,270 --> 00:08:34,479
which also… I mean, this, this wide span
of different applications also creates
90
00:08:34,479 --> 00:08:41,000
problems. That's something that I will
talk about tomorrow. But for now, let's
91
00:08:41,000 --> 00:08:49,230
look at the fun things. So if you want to
serve a lot of page views, you have to do
92
00:08:49,230 --> 00:08:55,550
a lot of caching. And so we have a whole…
yeah, a whole set of different caching
93
00:08:55,550 --> 00:09:00,880
systems. The most important one is
probably the parser cache. So as you
94
00:09:00,880 --> 00:09:07,431
probably know, wiki pages are created in,
in a markup language, Wikitext, and they
95
00:09:07,431 --> 00:09:13,290
need to be parsed and turned into HTML.
And the result of that parsing is, of
96
00:09:13,290 --> 00:09:19,940
course, cached. And that cache is semi-
persistent, it… nothing really ever drops
97
00:09:19,940 --> 00:09:25,060
out of it. It's a huge thing. And it's, it
lives in a dedicated MySQL database
98
00:09:25,060 --> 00:09:33,490
system. Yeah. We use memcached a lot for
all kinds of miscellaneous things,
99
00:09:33,490 --> 00:09:38,930
anything that we need to keep around and
share between server instances. And we
100
00:09:38,930 --> 00:09:43,589
have been using redis for a while, for
anything that we want to have available,
101
00:09:43,589 --> 00:09:47,560
not just between different servers, but
also between different data centers,
102
00:09:47,560 --> 00:09:53,200
because redis is a bit better about
synchronizing things between, between
103
00:09:53,200 --> 00:09:59,820
different systems, we still use it for
session storage, especially, though we are
104
00:09:59,820 --> 00:10:09,600
about to move away from that and we'll be
using Cassandra for session storage. We
105
00:10:09,600 --> 00:10:19,310
have a bunch of additional services
running for specialized purposes, like
106
00:10:19,310 --> 00:10:27,120
scaling images, rendering formulas, math
formulas, ORES is pretty interesting. ORES
107
00:10:27,120 --> 00:10:33,400
is a system for automatically detecting
vandalism or rating edits. So this is a
108
00:10:33,400 --> 00:10:38,120
machine learning based system for
detecting problems and highlighting edits
109
00:10:38,120 --> 00:10:45,060
that may not be, may not be great and need
more attention. We have some additional
110
00:10:45,060 --> 00:10:50,940
services that process our content for
consumption on mobile devices, chopping
111
00:10:50,940 --> 00:10:56,480
pages up into bits and pieces that then
can be consumed individually and many,
112
00:10:56,480 --> 00:11:08,200
many more. In the background, we also have
to manage events, right, we use Kafka for
113
00:11:08,200 --> 00:11:14,640
message queuing, and we use that to notify
different parts of the system about
114
00:11:14,640 --> 00:11:19,980
changes. On the one hand, we use that to
feed the job runners that I just
115
00:11:19,980 --> 00:11:27,540
mentioned. But we also use it, for
instance, to purge the entries in the
116
00:11:27,540 --> 00:11:35,050
CDN when pages become updated and things
like that. OK, the next session is going
117
00:11:35,050 --> 00:11:40,269
to be about the databases. Are there, very
quickly, we will have quite a bit of time
118
00:11:40,269 --> 00:11:45,230
for discussion afterwards. But are there
any questions right now about what we said
119
00:11:45,230 --> 00:11:57,120
so far? Everything extremely crystal
clear. OK, no clarity is left? I see. Oh,
120
00:11:57,120 --> 00:12:07,570
one question, in the back.
Q: Can you maybe turn the volume up a
121
00:12:07,570 --> 00:12:20,220
little bit? Thank you.
Daniel: Yeah, I think this is your
122
00:12:20,220 --> 00:12:27,959
section, right? Oh, it's Amir again. Sorry.
Amir: So I want to talk about my favorite
123
00:12:27,959 --> 00:12:32,279
topic, the dungeons of, dungeons of every
production system, databases. The database
124
00:12:32,279 --> 00:12:39,580
of Wikipedia is really interesting and
complicated on its own. We use MariaDB, we
125
00:12:39,580 --> 00:12:45,870
switched from MySQL in 2013 for lots of
complicated reasons. As, as I said,
126
00:12:45,870 --> 00:12:50,200
because we are really open source, you can
go and not just check our database tree,
127
00:12:50,200 --> 00:12:55,310
that says, like, how it looks and what
the replicas and masters are. Actually, you
128
00:12:55,310 --> 00:12:59,650
can even query Wikipedia's database
live. If you want that, you can just go
129
00:12:59,650 --> 00:13:02,930
to that address and log in with your
Wikipedia account and can just do whatever
130
00:13:02,930 --> 00:13:07,430
you want. Like, it was a funny thing that
a couple of months ago, someone sent me a
131
00:13:07,430 --> 00:13:12,970
message, sent me a message like, oh, I
found a security issue. You can just query
132
00:13:12,970 --> 00:13:18,000
Wikipedia's database. I was like, no, no,
it's actually, we, we let this happen.
133
00:13:18,000 --> 00:13:21,900
It's like, it's sanitized. We removed the
password hashes and everything. But still,
134
00:13:21,900 --> 00:13:27,779
you can use this. But if you want
to know, like, how the clusters work, the
135
00:13:27,779 --> 00:13:32,029
database clusters, because it gets too
big, they first started sharding, but now
136
00:13:32,029 --> 00:13:36,279
we have sections that are basically
different clusters. Uh, really large wikis
137
00:13:36,279 --> 00:13:42,839
have their own section. For example,
English Wikipedia is s1. German Wikipedia
138
00:13:42,839 --> 00:13:50,820
with two or three other small wikis are in
s5. Wikidata is on s8, and so on. And
139
00:13:50,820 --> 00:13:56,250
each section has a master and several
replicas. But one of the replicas is
140
00:13:56,250 --> 00:14:01,700
actually a master in another data center
because of the failover that I told you.
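The two-layer replication layout Amir describes can be sketched roughly like this. This is a hypothetical illustration: the section names (s1, s8) match the talk, but the server names and the pick_server helper are made up, not the real configuration.

```python
import random

# Hypothetical sketch: each section has a master in the primary data center
# (eqiad) plus replicas, and the secondary data center (codfw) has an
# intermediate master (itself a replica of the eqiad master) with its own
# replicas - the two layers of replication described in the talk.
SECTIONS = {
    "s1": {  # English Wikipedia
        "eqiad": {"master": "db1-master", "replicas": ["db1-r1", "db1-r2"]},
        "codfw": {"master": "db2-master", "replicas": ["db2-r1", "db2-r2"]},
    },
    "s8": {  # Wikidata
        "eqiad": {"master": "db8-master", "replicas": ["db8-r1"]},
        "codfw": {"master": "db8-dc2-master", "replicas": ["db8-dc2-r1"]},
    },
}

def pick_server(section, dc="eqiad", write=False):
    """Writes go to the active DC's master; reads spread over its replicas."""
    cluster = SECTIONS[section][dc]
    return cluster["master"] if write else random.choice(cluster["replicas"])

pick_server("s1", write=True)   # the s1 master in the primary DC
pick_server("s8")               # one of the s8 replicas
```

Failing over a data center then amounts to flipping which DC's intermediate master is treated as the active master.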
141
00:14:01,700 --> 00:14:08,079
So basically, two layers of
replication exist. What I'm
142
00:14:08,079 --> 00:14:13,070
telling you is about metadata. But for
Wikitext, we also need to have a completely
143
00:14:13,070 --> 00:14:19,450
different set of databases. But there,
we use consistent hashing to just scale it
144
00:14:19,450 --> 00:14:27,630
horizontally so we can just put more
databases on it, for that. Uh, but I don't
145
00:14:27,630 --> 00:14:32,070
know if you know it, but Wikipedia stores
every edit. So you have the text of,
146
00:14:32,070 --> 00:14:36,930
Wikitext of every edit in the whole
history in the database. Uhm, also we have
147
00:14:36,930 --> 00:14:41,910
the parser cache that Daniel explained, and
the parser cache also uses consistent hashing.
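The consistent-hashing idea Amir mentions for external storage and the parser cache works roughly like this. A minimal sketch with made-up node names, not the actual MediaWiki implementation: each key always maps to the same node, and adding a node only remaps a small share of keys.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to storage nodes; adding a node remaps only ~1/n of the keys."""

    def __init__(self, nodes, vnodes=100):
        # Place several virtual points per node on a hash ring.
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                h = int(hashlib.md5(f"{node}:{i}".encode()).hexdigest(), 16)
                self.ring.append((h, node))
        self.ring.sort()

    def get_node(self, key):
        # A key belongs to the first ring point at or after its hash (wrapping).
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        i = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["es1", "es2", "es3"])  # hypothetical node names
node = ring.get_node("enwiki:revision:12345")     # always the same node
```

This is what makes the wikitext storage horizontally scalable: to grow it, you just add nodes, and most existing keys stay where they are.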
148
00:14:41,910 --> 00:14:47,000
So we can just horizontally scale it. But
for metadata, it is slightly more
149
00:14:47,000 --> 00:14:56,440
complicated. Um, metadata is what's
being used to render the page. So in order
150
00:14:56,440 --> 00:15:01,680
to do this, this is, for example, a very
short version of the database tree that I
151
00:15:01,680 --> 00:15:07,019
showed you. You can even go and look for
other ones, but this is s1. s1 in eqiad, this
152
00:15:07,019 --> 00:15:12,100
is the main data center; the master is this
number and it replicates to some of these,
153
00:15:12,100 --> 00:15:16,860
and then this one, the second one, starts
with 2000 because it's in the second data
154
00:15:16,860 --> 00:15:24,750
center, and it's a master for the other ones.
And it has its own replications
155
00:15:24,750 --> 00:15:30,680
between, cross-DC replications, because
the master data center is in
156
00:15:30,680 --> 00:15:37,399
Ashburn, Virginia. The second data center
is in Dallas, Texas. So they need to have a
157
00:15:37,399 --> 00:15:43,220
cross DC replication and that happens
with TLS to make sure that no one starts
158
00:15:43,220 --> 00:15:49,200
to listen to, in between these two, and we
have snapshots and even dumps of the whole
159
00:15:49,200 --> 00:15:53,440
history of Wikipedia. You can go to
dumps.wikimedia.org and download the whole
160
00:15:53,440 --> 00:15:59,130
history of every wiki you want, except the
ones that we had to remove for privacy
161
00:15:59,130 --> 00:16:04,899
reasons, and with lots and lots of
backups. I recently realized we have lots
162
00:16:04,899 --> 00:16:15,149
of backups. And in total it is 570 TB of data
on 150 database servers in total, and the
163
00:16:15,149 --> 00:16:20,269
queries that happen on them are around
350,000 queries per second and, in total,
164
00:16:20,269 --> 00:16:29,459
it requires 70 terabytes of RAM. And
also we have another storage section
165
00:16:29,459 --> 00:16:35,000
called Elasticsearch which, you can guess,
is being used for search, on the top
166
00:16:35,000 --> 00:16:39,050
right, if you're using desktop. It's
different on mobile, I think. And also it
167
00:16:39,050 --> 00:16:44,610
depends on if you use an RTL language as well.
It's run by a team called Search
168
00:16:44,610 --> 00:16:47,550
Platform; because none of us are from
Search Platform, we cannot explain it
169
00:16:47,550 --> 00:16:54,010
much; we only know slightly how it
works. Also we have a media storage for
170
00:16:54,010 --> 00:16:58,420
all of the free pictures that are being
uploaded to Wikimedia. Commons
171
00:16:58,420 --> 00:17:02,400
is our wiki that holds all of the free
media, and, for example, we have a
172
00:17:02,400 --> 00:17:08,130
category in Commons called "cats looking
at left" and a category "cats looking at
173
00:17:08,130 --> 00:17:15,630
right", so we have lots and lots of
images. It's 390 terabytes
174
00:17:15,630 --> 00:17:20,620
of media, 1 billion objects, and uses Swift.
Swift is the object storage component
175
00:17:20,620 --> 00:17:29,190
of OpenStack and it has several
layers of caching, frontend, backend.
176
00:17:29,190 --> 00:17:36,799
Yeah, that's mostly it. And we want to
talk about traffic now and so this picture
177
00:17:36,799 --> 00:17:43,929
is from when Sweden in 1967 moved from
driving on the left to driving on the
178
00:17:43,929 --> 00:17:48,999
right. This is basically what happens in
Wikipedia infrastructure as well. So we
179
00:17:48,999 --> 00:17:54,942
have five caching layers and the most
recent one is eqsin which is in Singapore,
180
00:17:54,942 --> 00:17:59,310
three of them are just CDNs: ulsfo, codfw,
esams and eqsin. Sorry, ulsfo, esams and
181
00:17:59,310 --> 00:18:06,590
eqsin are just CDNs. We have also two
points of presence, one in Chicago and the
182
00:18:06,590 --> 00:18:15,080
other one is also in Amsterdam, but we
won't get into that. So, we have, as I said,
183
00:18:15,080 --> 00:18:20,230
we have our own content delivery network
and our traffic allocation is done by
184
00:18:20,230 --> 00:18:26,860
GeoDNS which actually is written and
maintained by one of the traffic people,
185
00:18:26,860 --> 00:18:32,140
and we can pool and depool DCs. It has a
time to live of 10 minutes, so
186
00:18:32,140 --> 00:18:37,950
if a data center goes down, it
takes 10 minutes to actually propagate its
187
00:18:37,950 --> 00:18:47,110
being depooled and repooled again. And we
use LVS as the transport layer; it's a layer
188
00:18:47,110 --> 00:18:55,799
3 and 4 load balancer for
Linux and supports consistent hashing, and
189
00:18:55,799 --> 00:19:00,679
also we grew so big that we
needed to have something that manages the
189
00:18:55,799 --> 00:19:00,679
load balancers, so we wrote our
own system, which is called PyBal. And also,
191
00:19:07,100 --> 00:19:11,210
lots of companies actually peer with us. We
for example directly connect to
192
00:19:11,210 --> 00:19:20,440
Amsterdam, AMS-IX. So this is how the
caching works, which is, anyway –
193
00:19:20,440 --> 00:19:24,779
there are lots of reasons for this. Let's
just get started. We use TLS, we
194
00:19:24,779 --> 00:19:31,080
support TLS 1.2, and then as
the first layer we have nginx-. Do you
195
00:19:31,080 --> 00:19:40,049
know it - does anyone know what nginx-
means? And so that's related but not - not
196
00:19:40,049 --> 00:19:46,780
correct. So we have nginx which is the free
version and we have nginx plus which is
197
00:19:46,780 --> 00:19:51,729
the commercial version. But we
don't use nginx to do load balancing or
198
00:19:51,729 --> 00:19:56,389
anything so we stripped out everything
from it, and we just use it for TLS
199
00:19:56,389 --> 00:20:02,019
termination, so we call it nginx-; it's an
internal joke. And then we have Varnish
200
00:20:02,019 --> 00:20:09,809
frontend. Varnish also is a caching layer;
the frontend is in memory,
201
00:20:09,809 --> 00:20:15,000
which is very, very fast, and you have the
backend, which is on storage, the
202
00:20:15,000 --> 00:20:22,559
hard disk, but this is slow. The fun thing
is, the CDN caching layer alone takes 90%
203
00:20:22,559 --> 00:20:26,869
of our requests. 90% of responses
just get to the Varnish and just
204
00:20:26,869 --> 00:20:34,720
return, and when that doesn't work, it goes
through to the application layer. The Varnish
205
00:20:34,720 --> 00:20:41,259
holds – it has a TTL of 24 hours, so if you
change an article, it also gets invalidated
206
00:20:41,259 --> 00:20:47,159
by the application. So if someone edits,
the CDN actually purges the result. And the
207
00:20:47,159 --> 00:20:52,330
thing is, the frontend is spread by request,
so it can absorb spikes: you come here, the load
208
00:20:52,330 --> 00:20:56,470
balancer just randomly sends your request
to a frontend, but then the backend is
209
00:20:56,470 --> 00:21:00,989
actually – if the frontend can't find it,
it sends it to the backend – and the backend
210
00:21:00,989 --> 00:21:09,700
is actually sort of – how is it called? –
hashed by request, so, for
211
00:21:09,700 --> 00:21:15,402
example, article of Barack Obama is only
being served from one node in the data
212
00:21:15,402 --> 00:21:22,059
center in the CDN. If none of this works it
actually hits the other data center. So,
213
00:21:22,059 --> 00:21:29,940
yeah, I actually explained all of this. So
we have two - two caching clusters and one
214
00:21:29,940 --> 00:21:35,820
is called text and the other one is called
upload, it's not confusing at all, and if
215
00:21:35,820 --> 00:21:42,559
you want to find out, you can just do mtr
en.wikipedia.org, and the end
216
00:21:42,559 --> 00:21:49,909
node is text-lb.wikimedia.org, which is
our text storage, but if you go to
217
00:21:49,909 --> 00:21:57,789
upload.wikimedia.org, you get to hit the
upload cluster. Yeah, this is it so far,
218
00:21:57,789 --> 00:22:03,669
and it has lots of problems, because
a) Varnish is open core, so the version
219
00:22:03,669 --> 00:22:09,309
that we use is open source, we don't use
the commercial one, but the open core one
220
00:22:09,309 --> 00:22:21,009
doesn't support TLS. What? What happened?
Okay. No, no, no! You should I just-
221
00:22:21,009 --> 00:22:35,789
you're not supposed to see this. Okay,
sorry for the- huh? Okay, okay sorry. So
222
00:22:35,789 --> 00:22:40,119
Varnish has lots of problems, Varnish is
open core, it doesn't support TLS
223
00:22:40,119 --> 00:22:45,220
termination, which makes us have this
nginx- system there just to do TLS
224
00:22:40,119 --> 00:22:45,220
termination; it makes our system complicated.
It doesn't work very well, which
225
00:22:49,539 --> 00:22:55,970
causes us to have a cron job to restart
every Varnish node twice a week. We have a
226
00:22:55,970 --> 00:23:04,330
cron job that restarts every Varnish
node, which is embarrassing, but also, on
227
00:23:04,330 --> 00:23:08,809
the other hand, when the Varnish
backend wants to talk to the
228
00:23:08,809 --> 00:23:13,010
application layer, it also doesn't support
TLS termination, so we use
229
00:23:13,010 --> 00:23:19,970
IPSec, which is even more embarrassing, but
we are changing it. So now we
230
00:23:19,970 --> 00:23:25,080
are using Apache Traffic Server, which
is very, very nice and it's also open
231
00:23:25,080 --> 00:23:31,070
source, fully open source, with the
Apache Foundation. ATS does the TLS
232
00:23:31,070 --> 00:23:37,169
termination, and still,
for now, we have a Varnish frontend that
233
00:23:37,169 --> 00:23:44,809
still exists, but the backend is also going
to change to ATS, so we call this the ATS
234
00:23:44,809 --> 00:23:49,970
sandwich: two ATSes, and in
the middle there's a Varnish. The
235
00:23:49,970 --> 00:23:55,269
good thing is that the TLS termination
when it moves to ATS, you can actually use
236
00:23:55,269 --> 00:24:01,499
TLS 1.3 which is more modern and more
secure and even much faster, so it
237
00:24:01,499 --> 00:24:05,889
basically drops 100 milliseconds from
every request that goes to Wikipedia.
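A quick back-of-the-envelope check of that saving, using the request rate quoted earlier in the talk (100,000 to 200,000 requests per second); the midpoint figure here is just an assumption for the arithmetic.

```python
# Rough check: how much cumulative user waiting time does shaving
# ~100 ms off every request save across a month?
requests_per_second = 150_000   # midpoint of the 100k-200k req/s quoted earlier
saved_per_request_s = 0.100     # ~100 milliseconds per request

seconds_per_month = 30 * 24 * 3600
saved_seconds_per_month = requests_per_second * saved_per_request_s * seconds_per_month
saved_years_per_month = saved_seconds_per_month / (365 * 24 * 3600)
print(f"~{saved_years_per_month:,.0f} years of waiting saved per month")
```

That comes out to well over a thousand years of aggregate waiting time per month, which is where the "centuries" figure comes from.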
238
00:24:05,889 --> 00:24:12,350
That translates to centuries of our
users' time every month. The ATS work is going
239
00:24:12,350 --> 00:24:19,480
on and hopefully it will go live soon, and
once these are done – so this is the new
240
00:24:19,480 --> 00:24:25,669
version – and, as I said, when
we can do this, we can actually use
241
00:24:25,669 --> 00:24:36,519
something more secure than IPSec to talk
between data centers. Yes. And now it's
242
00:24:36,519 --> 00:24:42,260
time that Lucas talks about what happens
when you type in en.wikipedia.org.
243
00:24:42,260 --> 00:24:44,879
Lucas: Yes, this makes sense, thank you.
244
00:24:44,879 --> 00:24:49,070
So, first of all, what you see on the
slide here as the image doesn't really
245
00:24:49,070 --> 00:24:52,299
have anything to do with what happens when
you type in wikipedia.org because it's an
246
00:24:52,299 --> 00:24:57,249
offline Wikipedia reader but it's just a
nice image. So this is basically a summary
247
00:24:57,249 --> 00:25:02,850
of everything they already said, so if,
which is the most common case, you are
248
00:25:02,850 --> 00:25:10,969
lucky and get a URL which is cached, then,
so, first your computer asks for the IP
249
00:25:10,969 --> 00:25:15,619
address of en.wikipedia.org. It reaches
the GeoDNS daemon, and because we're at
250
00:25:15,619 --> 00:25:19,239
Congress here it tells you the closest
data center is the one in Amsterdam, so
251
00:25:19,239 --> 00:25:25,759
esams, and it's going to hit the edge, what
we call load balancers/routers there, then
252
00:25:25,759 --> 00:25:31,929
going through TLS termination through
nginx- and then it's going to hit the
253
00:25:31,929 --> 00:25:36,809
Varnish caching server, either frontend or
backend, and then you get a response and
254
00:25:36,809 --> 00:25:40,940
that's already it and nothing else is ever
bothered again. It doesn't even reach any
255
00:25:40,940 --> 00:25:46,320
other data center which is very nice and
so that's, you said around 90% of the
256
00:25:46,320 --> 00:25:52,419
requests we get, and if you're unlucky and
the URL you requested is not in the
257
00:25:52,419 --> 00:25:57,400
Varnish in the Amsterdam data center then
it gets forwarded to the eqiad data
258
00:25:57,400 --> 00:26:01,519
center, which is the primary one and there
it still has a chance to hit the cache and
259
00:26:01,519 --> 00:26:04,840
perhaps this time it's there and then the
response is going to get cached in the
260
00:26:04,840 --> 00:26:09,739
frontend, no, in the Amsterdam Varnish and
you're also going to get a response and we
261
00:26:09,739 --> 00:26:13,639
still don't have to run any application
stuff. If we do have to hit any
262
00:26:13,639 --> 00:26:17,450
application stuff and then Varnish is
going to forward that, if it's
263
00:26:17,450 --> 00:26:22,970
upload.wikimedia.org, it goes to the media
storage Swift, if it's any other domain it
264
00:26:22,970 --> 00:26:28,450
goes to MediaWiki and then MediaWiki does
a ton of work to connect to the database,
265
00:26:28,450 --> 00:26:33,529
in this case the first shard for English
Wikipedia, get the wiki text from there,
266
00:26:33,529 --> 00:26:38,599
get the wiki text of all the related pages
and templates. No, wait I forgot
267
00:26:38,599 --> 00:26:43,519
something. First it checks if the HTML for
this page is available in parser cache, so
268
00:26:43,519 --> 00:26:46,909
that's another caching layer, and this
application cache - this parser cache
269
00:26:46,909 --> 00:26:53,529
might either be memcached or the database
cache behind it and if it's not there,
270
00:26:53,529 --> 00:26:57,679
then it has to go get the wikitext, get
all the related things and render that
271
00:26:57,679 --> 00:27:03,679
into HTML which takes a long time and goes
through some pretty ancient code and if
272
00:27:03,679 --> 00:27:07,779
you are doing an edit or an upload, it's
even worse, because then it always has to go
273
00:27:07,779 --> 00:27:13,969
to MediaWiki and then it not only has to
store this new edit, either in the media
274
00:27:13,969 --> 00:27:19,629
back-end or in the database, it also has
to update a bunch of stuff, like, especially
275
00:27:19,629 --> 00:27:25,200
if you-- first of all, it has to purge the
cache, it has to tell all the Varnish
276
00:27:25,200 --> 00:27:28,999
servers that there's a new version of this
URL available so that it doesn't take a
277
00:27:28,999 --> 00:27:33,940
full day until the time-to-live expires.
It also has to update a bunch of things,
278
00:27:33,940 --> 00:27:38,639
for example, if you edited a template, it
might have been used in a million pages
279
00:27:38,639 --> 00:27:43,750
and the next time anyone requests one of
those million pages, those should also
280
00:27:43,750 --> 00:27:49,019
actually be rendered again using the new
version of the template so it has to
281
00:27:49,019 --> 00:27:54,149
invalidate the cache for all of those and
all that is deferred through the job queue
282
00:27:54,149 --> 00:28:01,440
and it might have to calculate thumbnails
if you uploaded the file or create a -
283
00:28:01,440 --> 00:28:06,609
retranscode media files because maybe you
uploaded in - what do we support? - you
284
00:28:06,609 --> 00:28:09,839
upload in WebM and the browser only
supports some other media codec or
285
00:28:09,839 --> 00:28:12,869
something, we transcode that and also
encode it down to the different
286
00:28:12,869 --> 00:28:19,740
resolutions, so then it goes through that
whole dance and, yeah, that was already
287
00:28:19,740 --> 00:28:23,769
those slides. Is Amir going to talk again
about how we manage -
288
00:28:23,769 --> 00:28:29,519
Amir: I mean, okay, yeah, I quickly come back
just for a short break to talk about
289
00:28:29,519 --> 00:28:36,690
how we manage all of this, because managing
1300 bare-metal servers plus a Kubernetes
290
00:28:36,690 --> 00:28:42,700
cluster is not easy, so what we do is that
we use Puppet for configuration
291
00:28:42,700 --> 00:28:48,220
management on our bare-metal systems; it's
fun, around 50,000 lines of Puppet code. I
292
00:28:48,220 --> 00:28:52,119
mean, lines of code is not a great
indicator, but you can roughly get an
293
00:28:52,119 --> 00:28:59,149
estimate of how things work, and we
have 100,000 lines of Ruby, and we have our
294
00:28:59,149 --> 00:29:04,429
CI and CD cluster. We don't
store anything on GitHub or GitLab; we
295
00:29:04,429 --> 00:29:10,559
have our own system, which is based on
Gerrit, and for that we have a set of
296
00:29:10,559 --> 00:29:15,539
Jenkins servers, and Jenkins does all of these
kinds of things, and also, because we have a
297
00:29:15,539 --> 00:29:21,960
Kubernetes cluster for some of
our services, if you merge a change
298
00:29:21,960 --> 00:29:26,440
in Gerrit, it also builds the Docker
images and containers and pushes them up to
299
00:29:26,440 --> 00:29:35,440
production. Also, in order to run remote
SSH commands, we have Cumin, which is an in-
300
00:29:35,440 --> 00:29:39,200
house automation tool that we built
for our systems; for example, you
301
00:29:39,200 --> 00:29:45,570
go there and say, OK, depool this node, or
run this command on all of the
302
00:29:45,570 --> 00:29:52,889
Varnish nodes that I told you about, like when you
want to restart them. And with this I get
303
00:29:52,889 --> 00:29:57,899
back to Lucas.
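Cumin's real interface is richer than this, but the core idea - fan one command out to a selected set of nodes and collect the results - can be sketched as follows. The host names and the command runner are stand-ins for a runnable local demo, not Cumin's actual API:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_on_host(host, command):
    # A real tool would wrap this in "ssh <host> ...". For a local,
    # runnable sketch we just execute the command in a subprocess.
    result = subprocess.run(
        [sys.executable, "-c", command],
        capture_output=True, text=True,
    )
    return host, result.stdout.strip()

def fan_out(hosts, command):
    # Run the same command on every selected host in parallel and
    # collect the output per host - roughly what Cumin does over SSH.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(pool.map(lambda h: run_on_host(h, command), hosts))

# e.g. run a command on all cache nodes (hypothetical host names):
outputs = fan_out(["cp1001", "cp1002"], "print('varnish restarted')")
```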
Lucas: So, I am going to talk a bit more
304
00:29:57,899 --> 00:30:01,929
about Wikimedia Cloud Services which is a
bit different in that it's not really our
305
00:30:01,929 --> 00:30:06,269
production stuff but it's where you
people, the volunteers of the Wikimedia
306
00:30:06,269 --> 00:30:11,489
movement can run their own code, so you
can request a project which is kind of a
307
00:30:11,489 --> 00:30:15,509
group of users and then you get assigned a
pool of you have this much CPU and this
308
00:30:15,509 --> 00:30:20,999
much RAM and you can create virtual
machines with those resources and then do
309
00:30:20,999 --> 00:30:29,119
stuff there and run basically whatever you
want, to create and boot and shut down the
310
00:30:29,119 --> 00:30:33,360
VMs and stuff we use OpenStack and there's
a Horizon frontend for that which you use
311
00:30:33,360 --> 00:30:36,409
through the browser, and it logs you out
all the time, but otherwise it works pretty
312
00:30:36,409 --> 00:30:42,619
well. Internally, ideally you manage the
VMs using Puppet but a lot of people just
313
00:30:42,619 --> 00:30:47,860
SSH in and then do whatever they need to
set up the VM manually, and that happens as
314
00:30:47,860 --> 00:30:52,759
well, and there's a few big projects like
Toolforge where you can run your own web-
315
00:30:52,759 --> 00:30:57,499
based tools or the beta cluster which is
basically a copy of some of the biggest
316
00:30:57,499 --> 00:31:02,499
wikis like there's a beta English
Wikipedia, beta Wikidata, beta Wikimedia
317
00:31:02,499 --> 00:31:08,320
Commons using mostly the same
configuration as production but using the
318
00:31:08,320 --> 00:31:12,450
current master version of the software
instead of whatever we deploy once a week so
319
00:31:12,450 --> 00:31:15,840
if there's a bug, we see it earlier
hopefully, even if we didn't catch it
320
00:31:15,840 --> 00:31:20,279
locally, because the beta cluster is more
similar to the production environment and
321
00:31:20,279 --> 00:31:24,230
also the continuous - continuous
integration service run in Wikimedia Cloud
322
00:31:24,230 --> 00:31:28,979
Services as well. Yeah and also you have
to have Kubernetes somewhere on these
323
00:31:28,979 --> 00:31:33,609
slides right, so you can use that to
distribute work between the tools in
324
00:31:33,609 --> 00:31:37,179
Toolforge or you can use the grid engine
which does a similar thing but it's like
325
00:31:37,179 --> 00:31:42,519
three decades old and through five forks
now; I think the current fork we use is Son
326
00:31:42,519 --> 00:31:46,999
of Grid Engine, and I don't know what it
was called before, but that's Cloud
327
00:31:46,999 --> 00:31:54,789
Services.
Amir: So in a nutshell, these are our
328
00:31:54,789 --> 00:32:01,090
systems. We have 1300 bare-metal servers
with lots and lots of caching, like lots
329
00:32:01,090 --> 00:32:06,919
of layers of caching, because mostly we
serve reads and we can just keep them as a
330
00:32:06,919 --> 00:32:12,179
cached version, and all of this is open
source, you can contribute to it if you
331
00:32:12,179 --> 00:32:18,089
want to, and a lot of the configuration
is also open, and this is the way I got
332
00:32:18,089 --> 00:32:21,940
hired: I just started contributing
to the system and they were like, yeah, you can
333
00:32:21,940 --> 00:32:31,549
come and work for us, so this is a -
Daniel: That's actually how all of us got
334
00:32:31,549 --> 00:32:38,350
hired.
Amir: So yeah, and this is the whole thing
335
00:32:38,350 --> 00:32:47,570
that happens in Wikimedia and if you want
to - no, if you want to help us, we are
336
00:32:47,570 --> 00:32:51,419
hiring. You can just go to jobs at
wikimedia.org, if you want to work for
337
00:32:51,419 --> 00:32:54,379
Wikimedia Foundation. If you want to work
with Wikimedia Deutschland, you can go to
338
00:32:54,379 --> 00:32:59,179
wikimedia.de and at the bottom there's a
link for jobs because the links got too
339
00:32:59,179 --> 00:33:03,469
long. If you want
to contribute to us, there are so many ways
340
00:33:03,469 --> 00:33:07,929
to contribute; as I said, there are so many
bugs, we have our own Grafana system,
341
00:33:07,929 --> 00:33:12,721
you can just look at the monitoring, and
Phabricator is our bug tracker, you can
342
00:33:12,721 --> 00:33:20,639
just go there and find the bug and fix
things. Actually, we have one repository
343
00:33:20,639 --> 00:33:26,469
that is private, but it only holds the
certificates for TLS and things that are
344
00:33:26,469 --> 00:33:31,499
really, really private, so we cannot
publish them. But there is also
345
00:33:31,499 --> 00:33:33,779
documentation: the documentation for the
infrastructure is at
346
00:33:33,779 --> 00:33:40,409
wikitech.wikimedia.org and the documentation
for the configuration is at noc.wikimedia.org
347
00:33:40,409 --> 00:33:46,599
plus the documentation of our codebase.
The documentation for MediaWiki itself is
348
00:33:46,599 --> 00:33:52,989
at mediawiki.org, and also we have our
own URL shortener: you can go to
349
00:33:52,989 --> 00:33:58,789
w.wiki and shorten any URL in the
Wikimedia infrastructure; we reserved the
350
00:33:58,789 --> 00:34:08,779
dollar sign for the donate site. And yeah,
if you have any questions, please.
351
00:34:08,779 --> 00:34:16,540
Applause
352
00:34:16,540 --> 00:34:21,679
Daniel: You know, we have quite a bit of
time for questions, so if anything wasn't
353
00:34:21,679 --> 00:34:27,149
clear or you're curious about anything,
please, please ask.
354
00:34:27,149 --> 00:34:37,200
AM: So, one question about what is not in the
presentation: do you have to deal with
355
00:34:37,200 --> 00:34:42,460
hacking attacks?
Amir: So the first rule of security issues
356
00:34:42,460 --> 00:34:49,210
is that we don't talk about security issues,
but let's say this baby has all sorts of
357
00:34:49,210 --> 00:34:56,240
attacks happening; we usually have
DDoS attacks. One happened a couple of
358
00:34:56,240 --> 00:34:59,819
months ago that was very successful. I
don't know if you read the news about
359
00:34:59,819 --> 00:35:05,200
that, but we have an infrastructure
to handle this, we have a security team
360
00:35:05,200 --> 00:35:12,740
that handles these cases and yes.
AM: Hello, how do you manage access to your
361
00:35:12,740 --> 00:35:20,069
infrastructure for your employees?
Amir: So we have an LDAP
362
00:35:20,069 --> 00:35:25,390
group, and LDAP for the web-based
systems, but for SSH we
363
00:35:25,390 --> 00:35:30,660
have strict protocols, and then you get a
private key, and some people usually
364
00:35:30,660 --> 00:35:35,480
protect their private key using YubiKeys,
and then you can SSH into the
365
00:35:35,480 --> 00:35:40,420
system, basically.
Lucas: Yeah, well, there's some
366
00:35:40,420 --> 00:35:44,720
firewalling set up, but there's only one
server per data center that you can
367
00:35:44,720 --> 00:35:48,221
actually reach through SSH and then you
have to tunnel through that to get to any
368
00:35:48,221 --> 00:35:51,359
other server.
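The bastion setup Lucas describes - one reachable SSH host per data center, everything else tunneled through it - corresponds to SSH's jump-host (`-J`/ProxyJump) feature. A tiny helper that builds such a command, with hypothetical host names:

```python
def ssh_via_bastion(bastion, target, command):
    # Only one bastion host per data center is reachable from outside;
    # every other server is reached by jumping (-J) through it.
    return ["ssh", "-J", bastion, target, command]

# Hypothetical host names, just to show the shape of the command:
cmd = ssh_via_bastion("bastion.example.org", "db1001.internal", "uptime")
```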
Amir: And also, like, we have an
369
00:35:51,359 --> 00:35:55,500
internal firewall, and basically if
you are inside production you
370
00:35:55,500 --> 00:36:01,450
cannot talk to the outside. If you,
for example, do git clone from github.com, it
371
00:36:01,450 --> 00:36:07,200
doesn't work. It
can only access tools that are inside the
372
00:36:07,200 --> 00:36:13,390
Wikimedia Foundation infrastructure.
AM: Okay, hi, you said you do TLS
373
00:36:13,390 --> 00:36:18,640
termination through nginx; do you still
allow non-HTTPS, that is, non-secure access?
374
00:36:18,640 --> 00:36:22,780
Amir: No we dropped it a really long
time ago but also
375
00:36:22,780 --> 00:36:25,069
Lucas: 2013 or so
Amir: Yeah, 2015
376
00:36:25,069 --> 00:36:28,651
Lucas: 2015
Amir: In 2013 HTTPS started serving most of the
377
00:36:28,651 --> 00:36:35,740
traffic, but in 2015 we dropped all of the
non-HTTPS protocols, and recently we even
378
00:36:35,740 --> 00:36:43,940
stopped serving any SSL
requests, and TLS 1.1 is also being
379
00:36:43,940 --> 00:36:48,460
phased out, so we are sending a warning
to the users, like, you're using TLS 1.1,
380
00:36:48,460 --> 00:36:54,810
please migrate to these new things that
came out around 10 years ago, so yeah
381
00:36:54,810 --> 00:36:59,849
Lucas: Yeah I think the deadline for that
is like February 2020 or something then
382
00:36:59,849 --> 00:37:04,710
we'll only have TLS 1.2
Amir: And soon we are going to support TLS
383
00:37:04,710 --> 00:37:06,640
1.3
Lucas: Yeah
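The phase-out just described - refusing SSL and TLS 1.0/1.1, keeping only TLS 1.2 and newer - is the kind of policy a modern TLS stack can express directly. A small illustration using Python's standard `ssl` module (this is a generic client-side example, not Wikimedia's actual server configuration):

```python
import ssl

# Build a context that refuses anything older than TLS 1.2,
# mirroring the protocol phase-out described in the talk.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# Connections negotiating SSLv3, TLS 1.0, or TLS 1.1 would now fail
# during the handshake; only TLS 1.2+ is inside the allowed range.
```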
384
00:37:06,640 --> 00:37:12,460
Are there any questions?
Q: so does read-only traffic
385
00:37:12,460 --> 00:37:18,029
from logged in users hit all the way
through to the parser cache or is there
386
00:37:18,029 --> 00:37:22,280
another layer of caching for that?
Amir: Yes, you bypass all of
387
00:37:22,280 --> 00:37:28,470
that.
Daniel: We need one more microphone. Yes,
388
00:37:28,470 --> 00:37:33,869
it actually does and this is a pretty big
problem and something we want to look into
389
00:37:33,869 --> 00:37:38,930
clears throat but it requires quite a
bit of rearchitecting. If you are
390
00:37:38,930 --> 00:37:44,250
interested in this kind of thing, maybe
come to my talk tomorrow at noon.
391
00:37:44,250 --> 00:37:48,819
Amir: Yeah, one thing we are
planning to do is active-active, so we have
392
00:37:48,819 --> 00:37:56,500
two primaries, and read requests
from the users can hit
393
00:37:56,500 --> 00:37:58,460
the secondary data center instead of the
main one.
394
00:37:58,460 --> 00:38:03,990
Lucas: I think there was a question way in
the back there, for some time already
395
00:38:03,990 --> 00:38:13,950
AM: Hi, I got a question. I read on
Wikitech that you are using Ganeti as a
396
00:38:13,950 --> 00:38:19,040
virtualization platform for some parts; can
you tell us something about this, or what
397
00:38:19,040 --> 00:38:24,619
parts of Wikipedia or Wikimedia are hosted
on this platform?
398
00:38:24,619 --> 00:38:29,589
Amir: Oh, sorry - so I don't
know this for very, very sure, so take
399
00:38:29,589 --> 00:38:34,390
it with a grain of salt, but as far as I
know, Ganeti is used to build very small
400
00:38:34,390 --> 00:38:39,829
VMs in production that we need for very,
very small microsites that we serve to
401
00:38:39,829 --> 00:38:45,619
the users. So we build just one or two VMs;
we don't use it very often, I
402
00:38:45,619 --> 00:38:54,819
think.
AM: Do you also think about open hardware?
403
00:38:54,819 --> 00:39:03,950
Amir: I don't know - you can -
Daniel: Not - not for servers. I think for
404
00:39:03,950 --> 00:39:07,500
the offline Reader project, but this is not
actually run by the Foundation, it's
405
00:39:07,500 --> 00:39:10,289
supported but it's not something that the
Foundation does. They were sort of
406
00:39:10,289 --> 00:39:15,100
thinking about open hardware but really
open hardware in practice usually means,
407
00:39:15,100 --> 00:39:19,609
you know, if you really
want to go down to the chip design, it's
408
00:39:19,609 --> 00:39:25,210
pretty tough, so yeah, it's
usually not practical, sadly.
409
00:39:25,210 --> 00:39:31,660
Amir: And one thing I can say is
that we have some machines that
410
00:39:31,660 --> 00:39:37,150
are really powerful that we give to the
researchers to run analyses on the data
411
00:39:37,150 --> 00:39:43,369
itself, and we needed to have GPUs for
those, but the problem was there
412
00:39:43,369 --> 00:39:49,109
wasn't any open-source driver for them, so
we migrated to AMD, I think, but the AMD card
413
00:39:49,109 --> 00:39:53,609
didn't fit in the rack; it was quite an
endeavor to get it to work for our
414
00:39:53,609 --> 00:40:03,710
researchers to use the GPUs.
AM: I'm still impressed that you serve
415
00:40:03,710 --> 00:40:10,920
90% out of the cache. Do all people access
the same pages or is the cache that huge?
416
00:40:10,920 --> 00:40:21,160
So what percentage of - of the whole
database is in the cache then?
417
00:40:21,160 --> 00:40:29,760
Daniel: I don't have the exact numbers to
be honest, but a large percentage of the
418
00:40:29,760 --> 00:40:36,769
whole database is in the cache. I mean it
expires after 24 hours so really obscure
419
00:40:36,769 --> 00:40:43,430
stuff isn't there, but I mean, it's
a power-law distribution,
420
00:40:43,430 --> 00:40:47,890
right? You have a few pages that are
accessed a lot and you have many many many
421
00:40:47,890 --> 00:40:55,420
pages that are not actually accessed
at all for a week or so except maybe for a
422
00:40:55,420 --> 00:41:01,740
crawler, so I don't know a number. My
guess would be it's less than 50% that is
423
00:41:01,740 --> 00:41:06,520
actually cached, but, you know, that still
covers 90% - it's probably the top 10% of
424
00:41:06,520 --> 00:41:11,630
pages that would still cover 90% of the
pageviews, but I don't know - this would be
425
00:41:11,630 --> 00:41:15,509
actually-- I should look this up, it would
be interesting numbers to have, yes.
426
00:41:15,509 --> 00:41:20,710
Lucas: Do you know if this is 90% of the
pageviews or 90% of the get requests
427
00:41:20,710 --> 00:41:24,279
because, like, requests for the JavaScript
would also be cached more often, I assume
428
00:41:24,279 --> 00:41:27,529
Daniel: I would expect that for non-
pageviews, it's even higher
429
00:41:27,529 --> 00:41:30,010
Lucas: Yeah
Daniel: Yeah, because you know all the
430
00:41:30,010 --> 00:41:34,150
icons and- and, you know, JavaScript
bundles and CSS and stuff doesn't ever
431
00:41:34,150 --> 00:41:40,309
change
Lucas: I'm gonna say for everything it's 90% -
432
00:41:40,309 --> 00:41:50,790
but there's a question back there
AM: Hey. Do your data centers run on green
433
00:41:50,790 --> 00:41:55,220
energy?
Amir: Very valid question. So, the
434
00:41:55,220 --> 00:42:03,450
Amsterdam one is fully green, but the
other ones are partially green, partially
435
00:42:03,450 --> 00:42:10,840
coal and gas. As far as I know, there
are some plans to move away from
436
00:42:10,840 --> 00:42:15,170
that, but on the other hand, we realized that
we don't produce that much carbon
437
00:42:15,170 --> 00:42:21,349
emission, because we don't have many servers
and we don't use much data; someone
438
00:42:21,349 --> 00:42:26,789
did the math, and we realized our carbon
emission - the
439
00:42:26,789 --> 00:42:34,720
data centers plus all of the
travel that the staff have to do and all of
440
00:42:34,720 --> 00:42:37,880
the events - is about 250 households; it's very,
very small. I think it's one
441
00:42:37,880 --> 00:42:44,890
thousandth of Facebook's with
comparable traffic, even if you just scale
442
00:42:44,890 --> 00:42:50,650
it down to the same traffic, because
Facebook collects data and runs very
443
00:42:50,650 --> 00:42:54,269
sophisticated machine-learning algorithms -
that's really complicated - but for
444
00:42:54,269 --> 00:43:01,119
Wikimedia, we don't do this, so we don't
need much energy. Does that answer
445
00:43:01,119 --> 00:43:04,920
your question?
Herald: Do we have any other
446
00:43:04,920 --> 00:43:15,720
questions left? Yeah sorry
AM: Hi, how many developers do you need to
447
00:43:15,720 --> 00:43:19,789
maintain the whole infrastructure, and how
many developers, or let's say
448
00:43:19,789 --> 00:43:24,500
developer hours, did you need to build the
whole infrastructure? The question is
449
00:43:24,500 --> 00:43:29,329
because what I find very interesting about
the talk is that it's a non-profit, so as an
450
00:43:29,329 --> 00:43:34,109
example for other nonprofits: how much
money are we talking about in order to
451
00:43:34,109 --> 00:43:38,760
build something like this as a digital
commons?
452
00:43:45,630 --> 00:43:48,980
Daniel: If this is just about actually
running all this, so just operations, it's
453
00:43:48,980 --> 00:43:53,530
less than 20 people, I think, which means, if
you basically divide the requests
454
00:43:53,530 --> 00:43:59,869
per second by people you get to something
like 8,000 requests per second per
455
00:43:59,869 --> 00:44:04,369
operations engineer which I think is a
pretty impressive number; it's probably
456
00:44:04,369 --> 00:44:09,809
a lot higher. I would really like
to know if there's any organization that
457
00:44:09,809 --> 00:44:17,270
tops that. I don't actually know the
actual operations budget; I know
458
00:44:17,270 --> 00:44:24,559
it's two-digit millions annually. Total
hours for building this over the last 18
459
00:44:24,559 --> 00:44:29,069
years, I have no idea. For the
first five or so years, the people doing
460
00:44:29,069 --> 00:44:34,609
it were actually volunteers. We still had
volunteer database administrators and
461
00:44:34,609 --> 00:44:42,160
stuff until maybe ten years ago, eight
years ago, so yeah, really, nobody
462
00:44:42,160 --> 00:44:44,589
did any accounting of this; I can only
guess.
463
00:44:56,669 --> 00:45:03,810
AM: Hello, a tools question. A few years
back I saw some interesting examples of
464
00:45:03,810 --> 00:45:09,089
Salt use at Wikimedia, but right now
I see only Puppet and Cumin mentioned,
465
00:45:09,089 --> 00:45:17,819
so, kind of, what happened with that?
Amir: I think we ditched Salt -
466
00:45:17,819 --> 00:45:22,970
I cannot say for sure, because none of us are on
the Cloud Services team, and I don't think
467
00:45:22,970 --> 00:45:27,380
I can answer you, but if you look at
wikitech.wikimedia.org,
468
00:45:27,380 --> 00:45:30,869
the last time I checked it said, like,
it's deprecated and obsolete; we don't use
469
00:45:30,869 --> 00:45:32,144
it anymore.
470
00:45:37,394 --> 00:45:39,920
AM: Do you use the job
471
00:45:39,920 --> 00:45:46,130
runners to fill spare capacity on the web-
serving servers, or do you have dedicated
472
00:45:46,130 --> 00:45:51,589
servers for those roles?
Lucas: I think they're dedicated.
473
00:45:51,589 --> 00:45:56,390
Amir: If you're asking about job runners, they
are dedicated, yes; they are, I
474
00:45:56,390 --> 00:46:02,910
think, 5 per primary data center, so -
Daniel: Yeah, I mean, do we
475
00:46:02,910 --> 00:46:06,559
actually have any spare capacity on
anything? We don't have that much hardware
476
00:46:06,559 --> 00:46:08,700
everything is pretty much at a hundred
percent.
477
00:46:08,700 --> 00:46:14,109
Lucas: I think we still have some server
that is just called misc1111 or something
478
00:46:14,109 --> 00:46:18,620
which runs five different things at once;
you can look for those on wikitech.
479
00:46:18,620 --> 00:46:25,820
Amir: Oh, sorry, it's not five,
it's 20 per primary
480
00:46:25,820 --> 00:46:31,440
data center - those are our job runners - and they
run 700 jobs per second.
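The deferred-work pattern those job runners implement - an edit enqueues small job descriptions, and dedicated workers chew through them later - can be sketched as follows. This is a deliberately minimal stand-in, not MediaWiki's real job queue:

```python
from collections import deque

class JobQueue:
    # Minimal stand-in for a job queue: producers enqueue small job
    # descriptions, and dedicated runner processes execute them later.
    def __init__(self):
        self.jobs = deque()
        self.processed = []

    def enqueue(self, kind, **params):
        self.jobs.append((kind, params))

    def run_all(self):
        # In production this loop lives on dedicated job-runner hosts,
        # decoupled from the web request that enqueued the work.
        while self.jobs:
            kind, params = self.jobs.popleft()
            self.processed.append((kind, params))

queue = JobQueue()
# An edit to a widely used template defers the expensive part, one
# cheap job per affected page (page titles here are just examples):
for page in ["Berlin", "Paris", "Rome"]:
    queue.enqueue("refreshLinks", page=page)
queue.enqueue("transcodeVideo", file="Talk.webm")
queue.run_all()
```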
481
00:46:31,440 --> 00:46:35,690
Lucas: And I think that does not include
the video scaler so those are separate
482
00:46:35,690 --> 00:46:38,109
again
Amir: No, they merged them in like a month
483
00:46:38,109 --> 00:46:40,040
ago
Lucas: Okay, cool
484
00:46:47,470 --> 00:46:51,420
AM: Maybe a little bit off topic: can you
tell us a little bit about the decision-making
485
00:46:51,420 --> 00:46:55,750
process for technical decisions,
architecture decisions? How does it work
486
00:46:55,750 --> 00:47:01,890
in an organization like this - the decision-
making process for architectural
487
00:47:01,890 --> 00:47:03,409
decisions, for example?
488
00:47:08,279 --> 00:47:11,009
Daniel: Yeah so Wikimedia has a
489
00:47:11,009 --> 00:47:16,539
committee for making high-level technical
decisions, it's called a Wikimedia
490
00:47:16,539 --> 00:47:23,609
Technical Committee, techcom and we run an
RFC process, so any decision that is
491
00:47:23,609 --> 00:47:27,540
cross-cutting, strategic, or especially
hard to undo should go through this
492
00:47:27,540 --> 00:47:33,579
process and it's pretty informal,
basically you file a ticket and start
493
00:47:33,579 --> 00:47:38,000
this process. It gets announced
in the mailing list, hopefully you get
494
00:47:38,000 --> 00:47:45,009
input and feedback and at some point it is
it's approved for implementation. We're
495
00:47:45,009 --> 00:47:48,640
currently looking into improving this
process, it's not- sometimes it works
496
00:47:48,640 --> 00:47:52,200
pretty well, sometimes things don't get
that much feedback, but it still makes
497
00:47:52,200 --> 00:47:55,890
sure that people are aware of these high-
level decisions
498
00:47:55,890 --> 00:47:59,790
Amir: Daniel is the chair of that
committee
499
00:48:02,160 --> 00:48:07,839
Daniel: Yeah, if you want to complain
about the process, please do.
500
00:48:13,549 --> 00:48:21,440
AM: Yes, regarding CI and CD along the
pipeline: of course, with that much traffic
501
00:48:21,440 --> 00:48:27,359
you want to keep everything consistent,
right. So are there any testing
502
00:48:27,359 --> 00:48:32,150
strategies that you have internally?
Like, of course, unit tests, integration
503
00:48:32,150 --> 00:48:35,790
tests but do you do something like
continuous end to end testing on beta
504
00:48:35,790 --> 00:48:40,100
instances?
Amir: So we have the beta cluster, but also
505
00:48:40,100 --> 00:48:44,670
we do a deploy - we call it the train - and so
we deploy once a week: all of the changes
506
00:48:44,670 --> 00:48:50,349
get merged into one branch, and the
branch gets cut every Tuesday, and it
507
00:48:50,349 --> 00:48:54,680
first goes to the test wikis, and
then it goes to all of the wikis that are
508
00:48:54,680 --> 00:48:59,270
not Wikipedia, plus Catalan and Hebrew
Wikipedia. So basically Hebrew and Catalan
509
00:48:59,270 --> 00:49:03,759
Wikipedia volunteered to be the guinea pigs
for the next wikis, and if everything works
510
00:49:03,759 --> 00:49:07,599
fine, it usually goes out, and if there's, like, oh,
a fatal error - we have logging -
511
00:49:07,599 --> 00:49:12,579
then it's like, okay, we need to fix this,
and we fix it immediately, and then it goes
512
00:49:12,579 --> 00:49:18,690
live to all wikis. That's one way of
looking at it, so, okay, yeah
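The weekly train's staged rollout can be modelled as ordered groups that the new branch advances through, halting on fatals. This is a simplified sketch; the group membership shown is illustrative, based only on what was said above:

```python
# Simplified model of the weekly deployment train: a branch is cut,
# then rolled out group by group. Group contents are illustrative.
TRAIN_GROUPS = [
    ["test.wikipedia.org"],                      # first: test wikis
    ["ca.wikipedia.org", "he.wikipedia.org",
     "www.wikidata.org"],                        # then: non-Wikipedias
                                                 # plus the guinea pigs
    ["en.wikipedia.org", "de.wikipedia.org"],    # finally: all the rest
]

def rollout(groups, is_healthy):
    # Advance the train one group at a time; stop when the logging
    # shows fatals, fix, and resume (the fix-and-resume part is
    # omitted here for brevity).
    deployed = []
    for group in groups:
        if not is_healthy():
            break
        deployed.extend(group)
    return deployed

wikis = rollout(TRAIN_GROUPS, is_healthy=lambda: True)
```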
513
00:49:18,690 --> 00:49:23,279
Daniel: So, our test coverage is not as
great as it should be and so we kind of,
514
00:49:23,279 --> 00:49:30,970
you know, abuse our users for this. We
are, of course, working to improve this
515
00:49:30,970 --> 00:49:37,230
and one thing that we started recently is
a program for creating end-to-end tests
516
00:49:37,230 --> 00:49:43,460
for all the API modules we have, in the
hope that we can thereby cover pretty much
517
00:49:43,460 --> 00:49:49,849
all of the application logic bypassing the
user interface. I mean, full end-to-end
518
00:49:49,849 --> 00:49:52,770
should, of course, include the user
interface but user interface tests are
519
00:49:52,770 --> 00:49:58,180
pretty brittle and often test, you know,
where things are on the screen, and it just
520
00:49:58,180 --> 00:50:02,559
seems to us that it makes a lot of sense
to have more- to have tests that actually
521
00:50:02,559 --> 00:50:07,259
test the application logic for what the
system actually should be doing, rather
522
00:50:07,259 --> 00:50:15,910
than what it should look like and, yeah,
we are currently working on making- so
523
00:50:15,910 --> 00:50:20,210
yeah, basically this has been a proof of
concept and we're currently working to
524
00:50:20,210 --> 00:50:27,079
actually integrate it in- in CI. That
perhaps should land once everyone is back
525
00:50:27,079 --> 00:50:34,560
from the vacations and then we have to
write about a thousand or so tests, I
526
00:50:34,560 --> 00:50:37,930
guess.
Lucas: I think there's also a plan to move
527
00:50:37,930 --> 00:50:42,559
to a system where we actually deploy
basically after every commit and can
528
00:50:42,559 --> 00:50:45,910
immediately roll back if something goes
wrong but that's more midterm stuff and
529
00:50:45,910 --> 00:50:48,339
I'm not sure what the current status of
that proposal is
530
00:50:48,339 --> 00:50:50,450
Amir: And it will be in Kubernetes, so it
will be completely different
531
00:50:50,450 --> 00:50:55,529
Daniel: That would be amazing
Lucas: But right now, we are on this
532
00:50:55,529 --> 00:50:59,730
weekly basis, if something goes wrong, we
roll back to the last week's version of
533
00:50:59,730 --> 00:51:06,049
the code
Herald: Are there are any questions-
534
00:51:06,049 --> 00:51:18,549
questions left? Sorry. Yeah. Okay, um, I
don't think so. So, yeah, thank you for
535
00:51:18,549 --> 00:51:25,329
this wonderful talk. Thank you for all
your questions. Um, yeah, I hope you liked
536
00:51:25,329 --> 00:51:29,750
it. Um, see you around, yeah.
537
00:51:29,750 --> 00:51:33,725
Applause
538
00:51:33,725 --> 00:51:39,270
Music
539
00:51:39,270 --> 00:52:01,000
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!