WEBVTT 00:00:00.000 --> 00:00:18.890 Music 00:00:18.890 --> 00:00:24.350 Herald: Hello everybody, we are ready to get started. We have Lucas and Amir here 00:00:24.350 --> 00:00:29.170 and they want to give us a quick introduction of a project from the 00:00:29.170 --> 00:00:33.540 Wikimedia Foundation called "Cloud Services" and how it might be 00:00:33.540 --> 00:00:39.110 useful to all of us. So let's give a round of welcoming applause to Lucas and Amir. 00:00:39.110 --> 00:00:42.850 Applause 00:00:42.850 --> 00:00:49.490 Lucas: Thanks! Yeah, hello. So "Wikimedia Cloud Services" is basically this big 00:00:49.490 --> 00:00:55.230 collection of all kinds of different things which are useful if you want to do 00:00:55.230 --> 00:00:58.780 techy things in the Wikimedia universe, like with Wikipedia or other 00:00:58.780 --> 00:01:05.920 projects, and you get them free of charge, so you can just use them, and the only 00:01:05.920 --> 00:01:09.880 requirement is that you use them for something that's kind of relevant to the 00:01:09.880 --> 00:01:14.400 mission of Wikimedia, of promoting free knowledge and that kind of stuff. And it's 00:01:14.400 --> 00:01:18.360 kind of split into the things that you can do with your regular Wikimedia account, 00:01:18.360 --> 00:01:22.750 which any registered user can do, and then there's also things you need a special 00:01:22.750 --> 00:01:26.250 account for, on a different system called Wikitech, and Amir is going to talk more 00:01:26.250 --> 00:01:30.030 about those later. But first let's just look into some of the things you can do 00:01:30.030 --> 00:01:34.420 with your regular Wikimedia account. And if you want to follow any of these links 00:01:34.420 --> 00:01:39.220 there's a shortcut here. I was about to switch to the next tab, so let's just stay 00:01:39.220 --> 00:01:45.211 here for a few seconds, yeah. 
So the first thing is the API sandbox, which is for if you 00:01:45.211 --> 00:01:51.590 want to use the MediaWiki API to figure out what you have on a page or to make 00:01:51.590 --> 00:01:55.820 edits or any kind of stuff. The API sandbox is a special page that's really 00:01:55.820 --> 00:02:00.630 useful to find out how to use the API. For example, here are all the different actions I 00:02:00.630 --> 00:02:08.179 can use; let's say query, which is the kind of general catch-all action that's here, and 00:02:08.179 --> 00:02:12.700 then I get down here a list of all the parameters I can use with query, such as: 00:02:12.700 --> 00:02:20.160 I want to have all the user info, and what kind of user info do I want? I want 00:02:20.160 --> 00:02:24.280 options, bla bla bla. I would like to have a different format version. So it 00:02:24.280 --> 00:02:29.070 gives you all these nice inputs for figuring out exactly how to use the API, 00:02:29.070 --> 00:02:33.000 what's valid, what's not valid, and then you can make the API request and there you get 00:02:33.000 --> 00:02:38.440 a response, and we can't read anything because it's zoomed in way too much. But 00:02:38.440 --> 00:02:42.880 it's very helpful when trying to use the API, and then at the end here you can see 00:02:42.880 --> 00:02:49.280 what you need to do in your own code to make the same API request. 
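A minimal sketch of the kind of request the API sandbox helps you assemble, assuming the query action with the user info module, the options property, and format version 2 as picked in the demo. The endpoint URL and the helper name `build_userinfo_url` are assumptions for illustration, not something shown in the talk, and the request is only built here, not sent.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; every MediaWiki wiki has its own api.php.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_userinfo_url(endpoint: str = API_ENDPOINT) -> str:
    """Return a GET URL for an action=query request asking for user info."""
    params = {
        "action": "query",     # the general catch-all action
        "meta": "userinfo",    # query module that returns info on the current user
        "uiprop": "options",   # which pieces of user info to include
        "format": "json",
        "formatversion": "2",  # the newer JSON output layout
    }
    # urlencode keeps the insertion order of the dict and escapes values.
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_userinfo_url())
```

The resulting URL can be pasted into a browser or handed to any HTTP client; the sandbox itself remains the easier way to discover which parameter values are valid.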
And for 00:02:49.280 --> 00:02:54.949 anything that you can't do with the normal API - so if you want to do some kind of 00:02:54.949 --> 00:02:59.570 more expensive analysis - you can often do that with Quarry, which is a tool that 00:02:59.570 --> 00:03:04.910 lets you write SQL queries against databases that are almost like the ones in 00:03:04.910 --> 00:03:09.310 production. You don't have user passwords and stuff, but you'll have all 00:03:09.310 --> 00:03:16.100 the database tables with page metadata and connections between them and the logs and 00:03:16.100 --> 00:03:20.370 all kinds of stuff, and you can just write your SQL here, send it, and you get the 00:03:20.370 --> 00:03:25.310 results. For example, here's the number of lexemes published per day, so it's some kind 00:03:25.310 --> 00:03:30.790 of selecting from the page table where the namespace is the lexeme namespace and 00:03:30.790 --> 00:03:41.260 grouping that by the date, and then we get something like all the way down to 00:03:41.260 --> 00:03:46.260 September, which is apparently when I ran this query; here there were 116 00:03:46.260 --> 00:03:52.630 lexemes created on this day. Or here someone had a list of edits to JavaScript 00:03:52.630 --> 00:03:57.810 and CSS pages on Hungarian Wikipedia, so you can run these queries against any wiki 00:03:57.810 --> 00:04:07.090 you like, like this Hungarian Wikipedia one. And if you can't get by with just SQL, what 00:04:07.090 --> 00:04:13.340 you also have is this thing called PAWS, which gives you a Jupyter instance, if 00:04:13.340 --> 00:04:18.970 you've heard of that. You can basically write your own Python code here and do it 00:04:18.970 --> 00:04:23.900 in a very convenient way because there's all kinds of auto-completion and helpful 00:04:23.900 --> 00:04:36.780 things. So I can just try to copy this and run the code (then I needed a new cell 00:04:36.780 --> 00:04:43.000 below it… there we go, Thanks!) 
and if I type item I should get helpful hints about what 00:04:43.000 --> 00:04:50.350 I can do with the item (if it's not hanging or something… or the Tab, Control 00:04:50.350 --> 00:04:58.650 Space, no… oh, there we go, yeah), and it's also a very useful way to work with 00:04:58.650 --> 00:05:07.310 Pywikibot, or you can also directly get a normal shell here. And one thing (oops, did 00:05:07.310 --> 00:05:11.890 I click the wrong thing? I would like to have… oh no, I don't want a bash notebook, I 00:05:11.890 --> 00:05:20.750 want a new terminal, that's what I want). And here you have for example database 00:05:20.750 --> 00:05:33.010 dumps in (where was it?) public/dumps/ something public again… So if you want to 00:05:33.010 --> 00:05:40.660 do some kind of analysis here on the data dumps you can get them here and then have 00:05:40.660 --> 00:05:47.530 all the computing that you want, I guess, to analyze the wiki more thoroughly, and all 00:05:47.530 --> 00:05:51.720 of this is hosted in the Wikimedia Cloud for you and you don't need your own server 00:05:51.720 --> 00:05:56.950 or anything. Oh yeah, I had two more examples of that, for example here: I used 00:05:56.950 --> 00:06:01.750 that too, so there were a lot of items on Wikipedia where there was some encoding 00:06:01.750 --> 00:06:06.360 error, this should be an apostrophe like down here and instead it was this kind of 00:06:06.360 --> 00:06:12.680 I with an accent, and I hacked together some ugly Java/Python code to make all of 00:06:12.680 --> 00:06:16.609 these edits, and it was already logged in as well, I didn't need to worry about 00:06:16.609 --> 00:06:21.060 logging in or having a password or anything. So it's a very convenient way to 00:06:21.060 --> 00:06:29.280 make edits as well. 
Or you can build something nicer here, you can insert like 00:06:29.280 --> 00:06:36.950 markdown cells to explain what you're doing and how the code works and build 00:06:36.950 --> 00:06:41.700 nice notebooks like that, which are almost self-explanatory. And those are some of 00:06:41.700 --> 00:06:44.880 the things you can do just with your Wikimedia account and now Amir is going to 00:06:44.880 --> 00:06:49.180 talk about some other things. Amir: Thanks Lucas! So the thing that we 00:06:49.180 --> 00:06:55.010 can do is that maybe some of you, like me, think that doing things in the browser is for 00:06:55.010 --> 00:07:00.130 kids, I need to do things in a terminal, I need to connect to a system, and then you 00:07:00.130 --> 00:07:05.340 can ask for a Wikitech account, which you can just make in 00:07:05.340 --> 00:07:11.980 this place called Wikitech. (where is the li… no no but I do'… the main thing, the 00:07:11.980 --> 00:07:20.520 main list. yeah okay) And so in here you make a Wikitech account and 00:07:20.520 --> 00:07:24.650 it gets approved quickly and then you get the shell and then you can just quickly go 00:07:24.650 --> 00:07:30.440 there (where is yer…) and you can go to this shell and just log in and then you 00:07:30.440 --> 00:07:35.260 have access to a big set of nodes in the cloud and you can just do whatever you 00:07:35.260 --> 00:07:40.520 want. Also you have access to the core dumps and you have access to the replica 00:07:40.520 --> 00:07:58.670 database. Let me show it to you. 
[mumbling] So for example you can do ls 00:07:58.670 --> 00:08:13.750 /public/dumps/public/wikidatawiki/ and then you get - oh there's like all sorts 00:08:13.750 --> 00:08:18.790 of time and everything that you want to, but if you also… you can do something else 00:08:18.790 --> 00:08:32.780 is that you can just do sql wikidatawiki and then you go inside Wikidata's 00:08:32.780 --> 00:08:36.150 database. I mean, you don't have the rights, you cannot write to 00:08:36.150 --> 00:08:40.329 the replica because it's a replica, and also it's sanitized so it doesn't have 00:08:40.329 --> 00:08:49.740 the, like, hashes of user passwords and stuff like that, but still you can do just select 00:08:49.740 --> 00:09:09.130 * from recentchanges limit 5, and yeah, and then you get all of the things 00:09:09.130 --> 00:09:15.140 that you want. You cannot even describe anything you want to directly into their 00:09:15.140 --> 00:09:20.171 system. And then there is also, we have something called the job grid, so you can 00:09:20.171 --> 00:09:25.310 just put a cron on anything that you want to, or just run 00:09:25.310 --> 00:09:30.690 something directly, and it goes to a big node of cloud Kubernetes and then just 00:09:30.690 --> 00:09:35.820 runs everything that you want to. Here there's more information about it, 00:09:35.820 --> 00:09:42.690 in here there's like a long help that says like, oh, I used to run this job and 00:09:42.690 --> 00:09:48.260 then job of what it does and you can get this, so you just need to… it's a bash 00:09:48.260 --> 00:09:54.780 command, you can run any bash command and send it, okay, return me this output to this 00:09:54.780 --> 00:10:00.140 place. And another thing that you can do is also there's a web server 00:10:00.140 --> 00:10:05.380 that you can access everything directly so you can just put a PHP file there and into 00:10:05.380 --> 00:10:12.720 the Apache and then, yeah, for 
example this is an example that we built 00:10:12.720 --> 00:10:19.380 together I think two Christmases ago, but this was like, you can just see this is 00:10:19.380 --> 00:10:23.710 a PHP file, the source code is available, and we just copy-pasted that 00:10:23.710 --> 00:10:28.350 source code into like a directory and it was there, and every time you click on it 00:10:28.350 --> 00:10:32.210 you get most of the edits that happen on descriptions on Wikidata that might be 00:10:32.210 --> 00:10:38.040 vandalism and we can fix it. Also, this is not the only thing that you can do 00:10:38.040 --> 00:10:45.050 with this, you can also put a Python Flask application, is this the file 00:10:45.050 --> 00:10:50.714 implants and then this can be just a Python application and you can just have 00:10:50.714 --> 00:10:57.830 the file there, and also Node.js, Java, there's so many of them. Also you can have your 00:10:57.830 --> 00:11:01.230 own database, like I have something that has its own database, for example QuickCategories 00:11:01.230 --> 00:11:09.520 in here has jobs that are here, this tool has its own built-in 00:11:09.520 --> 00:11:15.480 database inside our cloud services and it uses it just fine. You can do that 00:11:15.480 --> 00:11:21.930 as well. And also there's Cloud VPS, which doesn't do any Kubernetes, you just 00:11:21.930 --> 00:11:27.070 can make a VPS of your own and then do whatever you want with it. So for example 00:11:27.070 --> 00:11:31.770 you get a project and you get the quota, it's slightly more limited, but 00:11:31.770 --> 00:11:35.799 also you have access to the whole VPS, you have sudo rights on it, you can do whatever 00:11:35.799 --> 00:11:40.400 you feel like with it. So we have like for example this project in here and it's 00:11:40.400 --> 00:11:47.050 called tools and then there's proxies and you can for example go into that instance 00:11:47.050 --> 00:11:52.170 and reboot it and do whatever you 
want and you can make a new instance and look at your 00:11:52.170 --> 00:11:58.770 quota and look at everything else there. And you can even make a wiki 00:11:58.770 --> 00:12:05.740 on one of those Cloud VPS systems, which for example we did in here. If you 00:12:05.740 --> 00:12:10.750 look at it, it's just a wiki, and the difference is that for other ones, for 00:12:10.750 --> 00:12:14.590 example for the vandalism dashboard, you have tools.wmflabs.org and then 00:12:14.590 --> 00:12:22.149 slash wdvd, which is the tool itself, but in here we get our own subdomain, which 00:12:22.149 --> 00:12:28.870 will be wikidata-lexeme.wmflabs.org, and you can even put 00:12:28.870 --> 00:12:35.900 all sorts of subdomains under wmflabs.org as long as it's not taken. So you can 00:12:35.900 --> 00:12:42.260 build a MediaWiki instance or you can just put completely new software, 00:12:42.260 --> 00:12:47.810 anything, you can put a word processor, who cares, and then you can use it. It's very 00:12:47.810 --> 00:12:58.970 simple, your own thing, and you can help lots of experience. Anything else? 00:12:58.970 --> 00:13:00.250 Lucas: I don't think so. Most 00:13:00.250 --> 00:13:06.430 important I would say is Toolforge to run your websites, or if that's not enough for 00:13:06.430 --> 00:13:10.550 you, Cloud VPS, and then you get your own VM where you can do absolutely anything you 00:13:10.550 --> 00:13:19.970 want as long as it matches those rules and stuff, and I think that's it. Are there any questions? 00:13:19.970 --> 00:13:24.890 Herald: Hello, thank you very much for the talk, that was very quick, so maybe 00:13:24.890 --> 00:13:34.600 anybody has a question here, I'll give you my microphone to ask it. 
I don't see any 00:13:34.600 --> 00:13:41.630 hands. Nope, okay, I don't think we have questions, but if you're just too shy to 00:13:41.630 --> 00:13:47.019 ask, I think these guys are always hanging around here around the wikipaka wiki, so 00:13:47.019 --> 00:13:52.510 if you have anything you want to talk about you'll find them later. Okay, then 00:13:52.510 --> 00:13:56.130 give a round of applause again for Lucas and Amir. 00:13:56.130 --> 00:13:58.830 Applause 00:13:58.830 --> 00:14:26.000 Music