[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,0:00:01.31,0:00:06.42,Default,,0000,0000,0000,,all right so welcome to today's lecture Dialogue: 0,0:00:04.44,0:00:08.76,Default,,0000,0000,0000,,which is going to be on data wrangling Dialogue: 0,0:00:06.42,0:00:10.62,Default,,0000,0000,0000,,and data wrangling might be a phrase that Dialogue: 0,0:00:08.76,0:00:12.63,Default,,0000,0000,0000,,sounds a little bit odd to you but the Dialogue: 0,0:00:10.62,0:00:14.94,Default,,0000,0000,0000,,basic idea of data wrangling is that you Dialogue: 0,0:00:12.63,0:00:16.80,Default,,0000,0000,0000,,have data in one format and you want it Dialogue: 0,0:00:14.94,0:00:18.93,Default,,0000,0000,0000,,in some different format and this Dialogue: 0,0:00:16.80,0:00:20.82,Default,,0000,0000,0000,,happens all of the time I'm not just Dialogue: 0,0:00:18.93,0:00:22.86,Default,,0000,0000,0000,,talking about like converting images but Dialogue: 0,0:00:20.82,0:00:25.08,Default,,0000,0000,0000,,it could be like you have a text file or Dialogue: 0,0:00:22.86,0:00:27.48,Default,,0000,0000,0000,,a log file and what you really want is this Dialogue: 0,0:00:25.08,0:00:29.43,Default,,0000,0000,0000,,data in some other format like you want Dialogue: 0,0:00:27.48,0:00:32.40,Default,,0000,0000,0000,,a graph or you want statistics over the Dialogue: 0,0:00:29.43,0:00:35.16,Default,,0000,0000,0000,,data anything that goes from one piece Dialogue: 0,0:00:32.40,0:00:37.11,Default,,0000,0000,0000,,of data to another representation of Dialogue: 0,0:00:35.16,0:00:40.08,Default,,0000,0000,0000,,that data is what I would call data Dialogue: 0,0:00:37.11,0:00:42.18,Default,,0000,0000,0000,,wrangling we've seen some examples of Dialogue: 0,0:00:40.08,0:00:43.74,Default,,0000,0000,0000,,this kind of data wrangling already Dialogue: 0,0:00:42.18,0:00:45.75,Default,,0000,0000,0000,,previously in the semester like Dialogue: 
0,0:00:43.74,0:00:48.00,Default,,0000,0000,0000,,basically whenever you use the pipe Dialogue: 0,0:00:45.75,0:00:49.74,Default,,0000,0000,0000,,operator that lets you sort of take Dialogue: 0,0:00:48.00,0:00:51.45,Default,,0000,0000,0000,,output from one program and feed it Dialogue: 0,0:00:49.74,0:00:54.15,Default,,0000,0000,0000,,through another program you are doing Dialogue: 0,0:00:51.45,0:00:55.29,Default,,0000,0000,0000,,data wrangling in one way or another but Dialogue: 0,0:00:54.15,0:00:57.96,Default,,0000,0000,0000,,what we're going to do in this lecture is Dialogue: 0,0:00:55.29,0:00:59.85,Default,,0000,0000,0000,,take a look at some of the fancier ways Dialogue: 0,0:00:57.96,0:01:01.86,Default,,0000,0000,0000,,you can do data wrangling and some of Dialogue: 0,0:00:59.85,0:01:05.64,Default,,0000,0000,0000,,the really useful ways you can do data Dialogue: 0,0:01:01.86,0:01:06.99,Default,,0000,0000,0000,,wrangling in order to do any kind of Dialogue: 0,0:01:05.64,0:01:09.00,Default,,0000,0000,0000,,data wrangling though you need a data Dialogue: 0,0:01:06.99,0:01:12.24,Default,,0000,0000,0000,,source you need some data to operate on Dialogue: 0,0:01:09.00,0:01:14.40,Default,,0000,0000,0000,,in the first place and there are a lot Dialogue: 0,0:01:12.24,0:01:16.56,Default,,0000,0000,0000,,of good candidates for that kind of data Dialogue: 0,0:01:14.40,0:01:18.93,Default,,0000,0000,0000,,we give some examples in the exercise Dialogue: 0,0:01:16.56,0:01:20.58,Default,,0000,0000,0000,,section for today's lecture notes in Dialogue: 0,0:01:18.93,0:01:23.40,Default,,0000,0000,0000,,this particular one though I'm going to Dialogue: 0,0:01:20.58,0:01:25.50,Default,,0000,0000,0000,,be using a system log so I have a server Dialogue: 0,0:01:23.40,0:01:27.18,Default,,0000,0000,0000,,that's running somewhere in the Netherlands Dialogue: 0,0:01:25.50,0:01:29.75,Default,,0000,0000,0000,,because that seemed like a reasonable Dialogue: 
0,0:01:27.18,0:01:32.79,Default,,0000,0000,0000,,thing at the time and on that server Dialogue: 0,0:01:29.75,0:01:34.38,Default,,0000,0000,0000,,it's running sort of a regular logging Dialogue: 0,0:01:32.79,0:01:36.63,Default,,0000,0000,0000,,daemon that comes with systemd Dialogue: 0,0:01:34.38,0:01:39.03,Default,,0000,0000,0000,,it's a sort of relatively standard Linux Dialogue: 0,0:01:36.63,0:01:41.88,Default,,0000,0000,0000,,logging mechanism and there's a command Dialogue: 0,0:01:39.03,0:01:44.70,Default,,0000,0000,0000,,called journalctl on Linux systems that Dialogue: 0,0:01:41.88,0:01:46.44,Default,,0000,0000,0000,,will let you view the system log and so Dialogue: 0,0:01:44.70,0:01:48.69,Default,,0000,0000,0000,,what I'm gonna do is I'm gonna do some Dialogue: 0,0:01:46.44,0:01:50.01,Default,,0000,0000,0000,,transformations over that log and see if Dialogue: 0,0:01:48.69,0:01:52.83,Default,,0000,0000,0000,,we can extract something interesting Dialogue: 0,0:01:50.01,0:01:56.28,Default,,0000,0000,0000,,from it you'll see though that if I run Dialogue: 0,0:01:52.83,0:01:59.33,Default,,0000,0000,0000,,this command I end up with a lot of data Dialogue: 0,0:01:56.28,0:02:01.98,Default,,0000,0000,0000,,because this is a log that has just like Dialogue: 0,0:01:59.33,0:02:03.36,Default,,0000,0000,0000,,there's a lot of stuff in it right a lot Dialogue: 0,0:02:01.98,0:02:06.30,Default,,0000,0000,0000,,of things have happened on my server and Dialogue: 0,0:02:03.36,0:02:08.25,Default,,0000,0000,0000,,this goes back to like January first and Dialogue: 0,0:02:06.30,0:02:10.56,Default,,0000,0000,0000,,there are logs that go even further back on Dialogue: 0,0:02:08.25,0:02:12.12,Default,,0000,0000,0000,,this there's a lot of stuff so the first Dialogue: 0,0:02:10.56,0:02:13.44,Default,,0000,0000,0000,,thing we're gonna do is try to limit it Dialogue: 0,0:02:12.12,0:02:16.26,Default,,0000,0000,0000,,down to only Dialogue: 
0,0:02:13.44,0:02:18.06,Default,,0000,0000,0000,,one piece of content and here the grep Dialogue: 0,0:02:16.26,0:02:19.83,Default,,0000,0000,0000,,command is your friend so we're gonna Dialogue: 0,0:02:18.06,0:02:23.22,Default,,0000,0000,0000,,pipe this through grep and we're gonna Dialogue: 0,0:02:19.83,0:02:24.81,Default,,0000,0000,0000,,grep for SSH right so SSH we haven't Dialogue: 0,0:02:23.22,0:02:26.76,Default,,0000,0000,0000,,really talked to you about yet but it is Dialogue: 0,0:02:24.81,0:02:28.56,Default,,0000,0000,0000,,a way to access computers remotely Dialogue: 0,0:02:26.76,0:02:30.78,Default,,0000,0000,0000,,through the command line and in Dialogue: 0,0:02:28.56,0:02:32.19,Default,,0000,0000,0000,,particular what happens when you put a Dialogue: 0,0:02:30.78,0:02:34.08,Default,,0000,0000,0000,,server on the public Internet is that Dialogue: 0,0:02:32.19,0:02:35.70,Default,,0000,0000,0000,,lots and lots of people around the world Dialogue: 0,0:02:34.08,0:02:37.53,Default,,0000,0000,0000,,try to connect to it and log in and Dialogue: 0,0:02:35.70,0:02:39.36,Default,,0000,0000,0000,,take over your server and so I want to Dialogue: 0,0:02:37.53,0:02:41.48,Default,,0000,0000,0000,,see how those people are trying to do Dialogue: 0,0:02:39.36,0:02:44.85,Default,,0000,0000,0000,,that and so I'm going to grep for SSH Dialogue: 0,0:02:41.48,0:02:47.70,Default,,0000,0000,0000,,and you'll see pretty quickly that this Dialogue: 0,0:02:44.85,0:02:51.27,Default,,0000,0000,0000,,also generates a bunch of content at Dialogue: 0,0:02:47.70,0:02:55.98,Default,,0000,0000,0000,,least in theory this is gonna be real Dialogue: 0,0:02:51.27,0:02:58.65,Default,,0000,0000,0000,,slow there we go so this generates tons Dialogue: 0,0:02:55.98,0:03:00.24,Default,,0000,0000,0000,,and tons and tons of content and it's Dialogue: 0,0:02:58.65,0:03:01.86,Default,,0000,0000,0000,,really hard to even just visualize Dialogue: 0,0:03:00.24,0:03:05.07,Default,,0000,0000,0000,,what's 
going on here so let's look at Dialogue: 0,0:03:01.86,0:03:06.66,Default,,0000,0000,0000,,only what user names people have used to Dialogue: 0,0:03:05.07,0:03:09.78,Default,,0000,0000,0000,,try to log into my server so you'll see Dialogue: 0,0:03:06.66,0:03:12.54,Default,,0000,0000,0000,,some of these lines say disconnected Dialogue: 0,0:03:09.78,0:03:14.94,Default,,0000,0000,0000,,disconnected from invalid user and then Dialogue: 0,0:03:12.54,0:03:17.43,Default,,0000,0000,0000,,some user name I want only those lines Dialogue: 0,0:03:14.94,0:03:19.08,Default,,0000,0000,0000,,that's all I really care about I'm gonna Dialogue: 0,0:03:17.43,0:03:21.75,Default,,0000,0000,0000,,make one more change here though which Dialogue: 0,0:03:19.08,0:03:26.46,Default,,0000,0000,0000,,is if you think about how this pipeline Dialogue: 0,0:03:21.75,0:03:29.16,Default,,0000,0000,0000,,does if I here do disconnected from so Dialogue: 0,0:03:26.46,0:03:31.32,Default,,0000,0000,0000,,this pipeline at the bottom here what Dialogue: 0,0:03:29.16,0:03:33.42,Default,,0000,0000,0000,,that will do is it will send the entire Dialogue: 0,0:03:31.32,0:03:36.21,Default,,0000,0000,0000,,log file over the network to my machine Dialogue: 0,0:03:33.42,0:03:38.25,Default,,0000,0000,0000,,and then locally run grep to find only Dialogue: 0,0:03:36.21,0:03:40.53,Default,,0000,0000,0000,,the lines that contain ssh and then Dialogue: 0,0:03:38.25,0:03:42.15,Default,,0000,0000,0000,,locally filter them further this seems a Dialogue: 0,0:03:40.53,0:03:44.22,Default,,0000,0000,0000,,little bit wasteful because I don't care Dialogue: 0,0:03:42.15,0:03:45.96,Default,,0000,0000,0000,,about most of these lines and the remote Dialogue: 0,0:03:44.22,0:03:48.90,Default,,0000,0000,0000,,site is also running a shell so what I Dialogue: 0,0:03:45.96,0:03:51.51,Default,,0000,0000,0000,,can actually do is I can have that Dialogue: 0,0:03:48.90,0:03:53.52,Default,,0000,0000,0000,,entire command run on the server right 
Dialogue: 0,0:03:51.51,0:03:55.20,Default,,0000,0000,0000,,so I'm telling SSH the command I Dialogue: 0,0:03:53.52,0:03:57.42,Default,,0000,0000,0000,,want you to run on the server is this Dialogue: 0,0:03:55.20,0:04:01.23,Default,,0000,0000,0000,,pipeline of three things and then what I Dialogue: 0,0:03:57.42,0:04:02.70,Default,,0000,0000,0000,,get back I want to pipe through less so Dialogue: 0,0:04:01.23,0:04:04.26,Default,,0000,0000,0000,,what does this do well it's gonna do Dialogue: 0,0:04:02.70,0:04:06.15,Default,,0000,0000,0000,,that same filtering that we did but it's Dialogue: 0,0:04:04.26,0:04:08.28,Default,,0000,0000,0000,,gonna do it on the server side and the Dialogue: 0,0:04:06.15,0:04:11.73,Default,,0000,0000,0000,,server is only going to send me those Dialogue: 0,0:04:08.28,0:04:13.29,Default,,0000,0000,0000,,lines that I care about and then when I Dialogue: 0,0:04:11.73,0:04:16.32,Default,,0000,0000,0000,,pipe it locally through the program Dialogue: 0,0:04:13.29,0:04:17.52,Default,,0000,0000,0000,,called less less is a pager you'll see Dialogue: 0,0:04:16.32,0:04:19.29,Default,,0000,0000,0000,,some examples of this you've actually Dialogue: 0,0:04:17.52,0:04:21.90,Default,,0000,0000,0000,,seen some of them already like when you Dialogue: 0,0:04:19.29,0:04:24.18,Default,,0000,0000,0000,,type man and some command that opens in Dialogue: 0,0:04:21.90,0:04:26.67,Default,,0000,0000,0000,,a pager and a pager is a convenient way Dialogue: 0,0:04:24.18,0:04:27.39,Default,,0000,0000,0000,,to take a long piece of content and fit Dialogue: 0,0:04:26.67,0:04:29.76,Default,,0000,0000,0000,,it into your terminal Dialogue: 0,0:04:27.39,0:04:31.89,Default,,0000,0000,0000,,window and have you scroll down and Dialogue: 0,0:04:29.76,0:04:33.15,Default,,0000,0000,0000,,scroll up and navigate it so that it Dialogue: 0,0:04:31.89,0:04:36.12,Default,,0000,0000,0000,,doesn't just like scroll past your Dialogue: 0,0:04:33.15,0:04:37.41,Default,,0000,0000,0000,,screen 
and so if I run this it still Dialogue: 0,0:04:36.12,0:04:40.80,Default,,0000,0000,0000,,takes a little while because it has to Dialogue: 0,0:04:37.41,0:04:42.92,Default,,0000,0000,0000,,parse through a lot of log files and in Dialogue: 0,0:04:40.80,0:04:45.93,Default,,0000,0000,0000,,particular grep is buffering and Dialogue: 0,0:04:42.92,0:04:46.92,Default,,0000,0000,0000,,therefore it decides to be relatively Dialogue: 0,0:04:45.93,0:04:56.04,Default,,0000,0000,0000,,unhelpful Dialogue: 0,0:04:46.92,0:05:01.26,Default,,0000,0000,0000,,I may do this without let's see if Dialogue: 0,0:04:56.04,0:05:05.19,Default,,0000,0000,0000,,that's more helpful why doesn't it want Dialogue: 0,0:05:01.26,0:05:09.95,Default,,0000,0000,0000,,to be helpful to me fine I'm gonna cheat Dialogue: 0,0:05:05.19,0:05:09.95,Default,,0000,0000,0000,,a little just ignore me Dialogue: 0,0:05:17.38,0:05:22.52,Default,,0000,0000,0000,,or the internet is really slow those are Dialogue: 0,0:05:20.57,0:05:27.14,Default,,0000,0000,0000,,two possible options luckily there's a Dialogue: 0,0:05:22.52,0:05:30.47,Default,,0000,0000,0000,,fix for that because previously I have Dialogue: 0,0:05:27.14,0:05:33.08,Default,,0000,0000,0000,,run the following command so this Dialogue: 0,0:05:30.47,0:05:34.34,Default,,0000,0000,0000,,command just takes the output of that Dialogue: 0,0:05:33.08,0:05:36.56,Default,,0000,0000,0000,,command and sticks it into a file Dialogue: 0,0:05:34.34,0:05:38.66,Default,,0000,0000,0000,,locally on my computer alright so I ran Dialogue: 0,0:05:36.56,0:05:40.97,Default,,0000,0000,0000,,this when I was up in my office and so Dialogue: 0,0:05:38.66,0:05:43.49,Default,,0000,0000,0000,,what this did is it downloaded all of Dialogue: 0,0:05:40.97,0:05:45.53,Default,,0000,0000,0000,,the SSH log entries that matched Dialogue: 0,0:05:43.49,0:05:47.33,Default,,0000,0000,0000,,disconnect from so I have those locally Dialogue: 0,0:05:45.53,0:05:49.07,Default,,0000,0000,0000,,and this is 
really handy right there's Dialogue: 0,0:05:47.33,0:05:50.99,Default,,0000,0000,0000,,no reason for me to stream the full log Dialogue: 0,0:05:49.07,0:05:52.64,Default,,0000,0000,0000,,every single time because I know that Dialogue: 0,0:05:50.99,0:05:55.22,Default,,0000,0000,0000,,that starting pattern is what I'm going Dialogue: 0,0:05:52.64,0:05:57.26,Default,,0000,0000,0000,,to want anyway so we can take a look at Dialogue: 0,0:05:55.22,0:05:59.48,Default,,0000,0000,0000,,ssh.log and you will see there are Dialogue: 0,0:05:57.26,0:06:01.76,Default,,0000,0000,0000,,lots and lots and lots of lines that all Dialogue: 0,0:05:59.48,0:06:04.94,Default,,0000,0000,0000,,say disconnected from invalid user Dialogue: 0,0:06:01.76,0:06:06.23,Default,,0000,0000,0000,,authenticating users etc right so these Dialogue: 0,0:06:04.94,0:06:08.87,Default,,0000,0000,0000,,are the lines that we have to work on Dialogue: 0,0:06:06.23,0:06:10.55,Default,,0000,0000,0000,,and this also means that going forward Dialogue: 0,0:06:08.87,0:06:12.50,Default,,0000,0000,0000,,we don't have to go through this whole Dialogue: 0,0:06:10.55,0:06:16.22,Default,,0000,0000,0000,,SSH process we can just cat that file Dialogue: 0,0:06:12.50,0:06:18.08,Default,,0000,0000,0000,,and then operate on it directly so Dialogue: 0,0:06:16.22,0:06:21.68,Default,,0000,0000,0000,,here I can also demonstrate this pager Dialogue: 0,0:06:18.08,0:06:23.72,Default,,0000,0000,0000,,so if I do cat ssh.log Dialogue: 0,0:06:21.68,0:06:25.22,Default,,0000,0000,0000,,and I pipe it through less it gives me a Dialogue: 0,0:06:23.72,0:06:28.85,Default,,0000,0000,0000,,pager where I can scroll up and down Dialogue: 0,0:06:25.22,0:06:30.56,Default,,0000,0000,0000,,make that a little bit smaller maybe so Dialogue: 0,0:06:28.85,0:06:33.32,Default,,0000,0000,0000,,I can scroll through Dialogue: 0,0:06:30.56,0:06:36.26,Default,,0000,0000,0000,,this file and I can do so with what are Dialogue: 
0,0:06:33.32,0:06:37.82,Default,,0000,0000,0000,,roughly vim bindings so Ctrl-U to Dialogue: 0,0:06:36.26,0:06:42.77,Default,,0000,0000,0000,,scroll up Ctrl-D to scroll down and Dialogue: 0,0:06:37.82,0:06:45.17,Default,,0000,0000,0000,,q to exit this is still a lot of Dialogue: 0,0:06:42.77,0:06:47.00,Default,,0000,0000,0000,,content though and these lines contain a Dialogue: 0,0:06:45.17,0:06:48.44,Default,,0000,0000,0000,,bunch of garbage that I'm not really Dialogue: 0,0:06:47.00,0:06:50.03,Default,,0000,0000,0000,,interested in what I really want to see Dialogue: 0,0:06:48.44,0:06:52.61,Default,,0000,0000,0000,,is what are these user names Dialogue: 0,0:06:50.03,0:06:55.79,Default,,0000,0000,0000,,and here the tool that we're going to Dialogue: 0,0:06:52.61,0:06:59.21,Default,,0000,0000,0000,,start using is one called sed sed is a Dialogue: 0,0:06:55.79,0:07:01.04,Default,,0000,0000,0000,,stream editor it's Dialogue: 0,0:06:59.21,0:07:04.10,Default,,0000,0000,0000,,a modification of a much earlier program Dialogue: 0,0:07:01.04,0:07:05.54,Default,,0000,0000,0000,,called ed which was a really weird Dialogue: 0,0:07:04.10,0:07:12.32,Default,,0000,0000,0000,,editor that none of you will probably Dialogue: 0,0:07:05.54,0:07:16.27,Default,,0000,0000,0000,,want to use yeah oh tsp is the name of Dialogue: 0,0:07:12.32,0:07:16.27,Default,,0000,0000,0000,,the remote computer I'm connecting to Dialogue: 0,0:07:16.39,0:07:23.72,Default,,0000,0000,0000,,so sed is a stream editor and it Dialogue: 0,0:07:19.85,0:07:26.06,Default,,0000,0000,0000,,basically lets you make changes to the Dialogue: 0,0:07:23.72,0:07:28.49,Default,,0000,0000,0000,,contents of a stream you can think of it Dialogue: 0,0:07:26.06,0:07:29.87,Default,,0000,0000,0000,,a little bit like doing replacements but Dialogue: 0,0:07:28.49,0:07:30.41,Default,,0000,0000,0000,,it's actually a full programming Dialogue: 
0,0:07:29.87,0:07:33.44,Default,,0000,0000,0000,,language Dialogue: 0,0:07:30.41,0:07:35.18,Default,,0000,0000,0000,,over the stream that is given one of the Dialogue: 0,0:07:33.44,0:07:38.06,Default,,0000,0000,0000,,most common things you do with sed Dialogue: 0,0:07:35.18,0:07:40.61,Default,,0000,0000,0000,,though is to just run replacement Dialogue: 0,0:07:38.06,0:07:44.59,Default,,0000,0000,0000,,expressions on an input stream what do Dialogue: 0,0:07:40.61,0:07:44.59,Default,,0000,0000,0000,,these look like well let me show you Dialogue: 0,0:07:45.16,0:07:50.00,Default,,0000,0000,0000,,here I'm gonna pipe this to sed and Dialogue: 0,0:07:47.78,0:07:52.54,Default,,0000,0000,0000,,I'm going to say that I want to remove Dialogue: 0,0:07:50.00,0:07:58.37,Default,,0000,0000,0000,,everything that comes before Dialogue: 0,0:07:52.54,0:08:00.98,Default,,0000,0000,0000,,disconnected from so this might look a Dialogue: 0,0:07:58.37,0:08:03.95,Default,,0000,0000,0000,,little weird the observation is that the Dialogue: 0,0:08:00.98,0:08:06.23,Default,,0000,0000,0000,,date and the host name and the sort of Dialogue: 0,0:08:03.95,0:08:07.31,Default,,0000,0000,0000,,process ID of the SSH daemon I don't Dialogue: 0,0:08:06.23,0:08:09.74,Default,,0000,0000,0000,,care about I can just remove that Dialogue: 0,0:08:07.31,0:08:11.93,Default,,0000,0000,0000,,straightaway and I can also remove that Dialogue: 0,0:08:09.74,0:08:13.58,Default,,0000,0000,0000,,like disconnected from bit because that Dialogue: 0,0:08:11.93,0:08:15.17,Default,,0000,0000,0000,,seems to be present in every single log Dialogue: 0,0:08:13.58,0:08:18.20,Default,,0000,0000,0000,,entry so I just want to get rid of it Dialogue: 0,0:08:15.17,0:08:20.36,Default,,0000,0000,0000,,and so what I write is a sed expression Dialogue: 0,0:08:18.20,0:08:21.98,Default,,0000,0000,0000,,in this particular case it's an s Dialogue: 0,0:08:20.36,0:08:25.73,Default,,0000,0000,0000,,expression which is a substitute Dialogue: 
0,0:08:21.98,0:08:27.62,Default,,0000,0000,0000,,expression it takes two arguments that Dialogue: 0,0:08:25.73,0:08:30.59,Default,,0000,0000,0000,,are basically enclosed in these slashes Dialogue: 0,0:08:27.62,0:08:32.36,Default,,0000,0000,0000,,so the first one is the search string Dialogue: 0,0:08:30.59,0:08:34.43,Default,,0000,0000,0000,,and the second one which is currently Dialogue: 0,0:08:32.36,0:08:36.47,Default,,0000,0000,0000,,empty is a replacement string so here Dialogue: 0,0:08:34.43,0:08:39.56,Default,,0000,0000,0000,,I'm saying search for the following Dialogue: 0,0:08:36.47,0:08:40.82,Default,,0000,0000,0000,,pattern and replace it with blank and Dialogue: 0,0:08:39.56,0:08:43.10,Default,,0000,0000,0000,,then I'm gonna pipe it into less at the Dialogue: 0,0:08:40.82,0:08:45.38,Default,,0000,0000,0000,,end do you see that now what it's done Dialogue: 0,0:08:43.10,0:08:49.76,Default,,0000,0000,0000,,is trim off the beginning of all these Dialogue: 0,0:08:45.38,0:08:52.22,Default,,0000,0000,0000,,lines and that seems really handy but Dialogue: 0,0:08:49.76,0:08:54.74,Default,,0000,0000,0000,,you might wonder what is this pattern Dialogue: 0,0:08:52.22,0:08:57.89,Default,,0000,0000,0000,,that I've built up here right this is Dialogue: 0,0:08:54.74,0:08:59.48,Default,,0000,0000,0000,,this dot star what does that mean this Dialogue: 0,0:08:57.89,0:09:01.82,Default,,0000,0000,0000,,is an example of a regular expression Dialogue: 0,0:08:59.48,0:09:03.62,Default,,0000,0000,0000,,and regular expressions are something Dialogue: 0,0:09:01.82,0:09:04.97,Default,,0000,0000,0000,,that you may have come across in Dialogue: 0,0:09:03.62,0:09:06.71,Default,,0000,0000,0000,,programming in the past Dialogue: 0,0:09:04.97,0:09:08.03,Default,,0000,0000,0000,,but it's something that once you go into Dialogue: 0,0:09:06.71,0:09:09.92,Default,,0000,0000,0000,,the command line you will find yourself Dialogue: 0,0:09:08.03,0:09:12.55,Default,,0000,0000,0000,,using a lot 
especially for this kind of Dialogue: 0,0:09:09.92,0:09:16.04,Default,,0000,0000,0000,,data wrangling regular expressions are Dialogue: 0,0:09:12.55,0:09:18.08,Default,,0000,0000,0000,,essentially a powerful way to match text Dialogue: 0,0:09:16.04,0:09:19.58,Default,,0000,0000,0000,,you can use it for other things than Dialogue: 0,0:09:18.08,0:09:23.03,Default,,0000,0000,0000,,text too but text is the most common Dialogue: 0,0:09:19.58,0:09:26.84,Default,,0000,0000,0000,,example and in regular expressions you Dialogue: 0,0:09:23.03,0:09:29.81,Default,,0000,0000,0000,,have a number of special characters that Dialogue: 0,0:09:26.84,0:09:31.58,Default,,0000,0000,0000,,say don't just match this character but Dialogue: 0,0:09:29.81,0:09:34.21,Default,,0000,0000,0000,,match for example a particular type of Dialogue: 0,0:09:31.58,0:09:36.98,Default,,0000,0000,0000,,character or a particular set of options Dialogue: 0,0:09:34.21,0:09:39.77,Default,,0000,0000,0000,,it essentially generates a program for Dialogue: 0,0:09:36.98,0:09:42.04,Default,,0000,0000,0000,,you that searches the given text dot for Dialogue: 0,0:09:39.77,0:09:46.00,Default,,0000,0000,0000,,example means any single Dialogue: 0,0:09:42.04,0:09:48.73,Default,,0000,0000,0000,,character and star if you follow a Dialogue: 0,0:09:46.00,0:09:51.91,Default,,0000,0000,0000,,character with a star it means zero or Dialogue: 0,0:09:48.73,0:09:54.40,Default,,0000,0000,0000,,more of that character and so in this Dialogue: 0,0:09:51.91,0:09:57.58,Default,,0000,0000,0000,,case this pattern is saying zero or more Dialogue: 0,0:09:54.40,0:10:00.49,Default,,0000,0000,0000,,of any character followed by the literal Dialogue: 0,0:09:57.58,0:10:02.68,Default,,0000,0000,0000,,string disconnected from I'm saying Dialogue: 0,0:10:00.49,0:10:05.56,Default,,0000,0000,0000,,match that and then replace it with Dialogue: 0,0:10:02.68,0:10:07.66,Default,,0000,0000,0000,,blank regular expressions have a number Dialogue: 
0,0:10:05.56,0:10:09.31,Default,,0000,0000,0000,,of these kinds of special characters that Dialogue: 0,0:10:07.66,0:10:11.50,Default,,0000,0000,0000,,have various meanings you can take Dialogue: 0,0:10:09.31,0:10:12.46,Default,,0000,0000,0000,,advantage of I talked about star which Dialogue: 0,0:10:11.50,0:10:14.56,Default,,0000,0000,0000,,is zero or more Dialogue: 0,0:10:12.46,0:10:16.15,Default,,0000,0000,0000,,there's also plus which is one or more Dialogue: 0,0:10:14.56,0:10:17.62,Default,,0000,0000,0000,,right so this is saying I want the Dialogue: 0,0:10:16.15,0:10:19.14,Default,,0000,0000,0000,,previous expression to match at least Dialogue: 0,0:10:17.62,0:10:22.51,Default,,0000,0000,0000,,once Dialogue: 0,0:10:19.14,0:10:24.91,Default,,0000,0000,0000,,you also have square brackets so square Dialogue: 0,0:10:22.51,0:10:27.18,Default,,0000,0000,0000,,brackets let you match one of many Dialogue: 0,0:10:24.91,0:10:29.80,Default,,0000,0000,0000,,different characters so here let us Dialogue: 0,0:10:27.18,0:10:36.37,Default,,0000,0000,0000,,build up a string let's say something like Dialogue: 0,0:10:29.80,0:10:41.68,Default,,0000,0000,0000,,aba and I want to substitute a and b with Dialogue: 0,0:10:36.37,0:10:43.90,Default,,0000,0000,0000,,nothing okay so here what I'm telling Dialogue: 0,0:10:41.68,0:10:46.54,Default,,0000,0000,0000,,the pattern to do is to replace any Dialogue: 0,0:10:43.90,0:10:50.08,Default,,0000,0000,0000,,character that is either a or b with Dialogue: 0,0:10:46.54,0:10:52.81,Default,,0000,0000,0000,,nothing so if I make the first character Dialogue: 0,0:10:50.08,0:10:54.10,Default,,0000,0000,0000,,b it will still produce ba you might Dialogue: 0,0:10:52.81,0:10:56.02,Default,,0000,0000,0000,,wonder though why did it only replace Dialogue: 0,0:10:54.10,0:10:57.70,Default,,0000,0000,0000,,once well it's because what regular Dialogue: 0,0:10:56.02,0:11:00.16,Default,,0000,0000,0000,,expressions will do especially in this Dialogue: 
0,0:10:57.70,0:11:01.57,Default,,0000,0000,0000,,default mode is they will just match the Dialogue: 0,0:11:00.16,0:11:04.27,Default,,0000,0000,0000,,pattern once and then apply the Dialogue: 0,0:11:01.57,0:11:07.36,Default,,0000,0000,0000,,replacement once per line that is what Dialogue: 0,0:11:04.27,0:11:09.28,Default,,0000,0000,0000,,sed normally does you can provide the g Dialogue: 0,0:11:07.36,0:11:12.25,Default,,0000,0000,0000,,modifier which says do this as many Dialogue: 0,0:11:09.28,0:11:14.14,Default,,0000,0000,0000,,times as it keeps matching which in this Dialogue: 0,0:11:12.25,0:11:15.79,Default,,0000,0000,0000,,case would erase the entire line because Dialogue: 0,0:11:14.14,0:11:18.70,Default,,0000,0000,0000,,every single character is either an a or Dialogue: 0,0:11:15.79,0:11:21.10,Default,,0000,0000,0000,,a b if I added a c here it would remove Dialogue: 0,0:11:18.70,0:11:23.02,Default,,0000,0000,0000,,everything but the c if I added other Dialogue: 0,0:11:21.10,0:11:24.37,Default,,0000,0000,0000,,characters in the middle of this string Dialogue: 0,0:11:23.02,0:11:26.26,Default,,0000,0000,0000,,somewhere they would all be preserved Dialogue: 0,0:11:24.37,0:11:34.21,Default,,0000,0000,0000,,but anything that is an a or a b is Dialogue: 0,0:11:26.26,0:11:37.89,Default,,0000,0000,0000,,removed you can also do things like add Dialogue: 0,0:11:34.21,0:11:37.89,Default,,0000,0000,0000,,modifiers to this for example Dialogue: 0,0:11:42.33,0:11:51.73,Default,,0000,0000,0000,,what would this do this is saying I want Dialogue: 0,0:11:46.72,0:11:52.80,Default,,0000,0000,0000,,zero or more of the string ab and I'm Dialogue: 0,0:11:51.73,0:11:55.27,Default,,0000,0000,0000,,gonna replace them with nothing Dialogue: 0,0:11:52.80,0:11:57.40,Default,,0000,0000,0000,,this means that if I have a standalone a Dialogue: 0,0:11:55.27,0:11:59.56,Default,,0000,0000,0000,,it will not be replaced if I have a Dialogue: 
0,0:11:57.40,0:12:01.54,Default,,0000,0000,0000,,standalone b it will not be replaced but Dialogue: 0,0:11:59.56,0:12:09.58,Default,,0000,0000,0000,,if I have the string ab it will be Dialogue: 0,0:12:01.54,0:12:11.94,Default,,0000,0000,0000,,removed which yeah oh sed is Dialogue: 0,0:12:09.58,0:12:11.94,Default,,0000,0000,0000,,stupid Dialogue: 0,0:12:12.34,0:12:18.25,Default,,0000,0000,0000,,the -E here is because sed is a really Dialogue: 0,0:12:15.16,0:12:19.93,Default,,0000,0000,0000,,old tool and so it supports only a very Dialogue: 0,0:12:18.25,0:12:22.27,Default,,0000,0000,0000,,old version of regular expressions Dialogue: 0,0:12:19.93,0:12:24.07,Default,,0000,0000,0000,,generally you will want to run it with Dialogue: 0,0:12:22.27,0:12:25.81,Default,,0000,0000,0000,,-E which makes it use a more Dialogue: 0,0:12:24.07,0:12:28.62,Default,,0000,0000,0000,,modern syntax that supports more things Dialogue: 0,0:12:25.81,0:12:30.94,Default,,0000,0000,0000,,if you are in a place where you can't Dialogue: 0,0:12:28.62,0:12:33.16,Default,,0000,0000,0000,,you have to prefix these with back Dialogue: 0,0:12:30.94,0:12:35.65,Default,,0000,0000,0000,,slashes to say I want the special Dialogue: 0,0:12:33.16,0:12:37.18,Default,,0000,0000,0000,,meaning of parentheses otherwise they Dialogue: 0,0:12:35.65,0:12:39.99,Default,,0000,0000,0000,,will just match a literal parenthesis Dialogue: 0,0:12:37.18,0:12:43.51,Default,,0000,0000,0000,,which is probably not what you want so Dialogue: 0,0:12:39.99,0:12:46.39,Default,,0000,0000,0000,,notice how this replaced the ab here Dialogue: 0,0:12:43.51,0:12:48.79,Default,,0000,0000,0000,,and it replaced the ab here but it Dialogue: 0,0:12:46.39,0:12:51.04,Default,,0000,0000,0000,,left this c and it also left the a at Dialogue: 0,0:12:48.79,0:12:54.10,Default,,0000,0000,0000,,the end because that a does not match Dialogue: 0,0:12:51.04,0:12:55.74,Default,,0000,0000,0000,,this pattern anymore and you can 
group Dialogue: 0,0:12:54.10,0:12:58.18,Default,,0000,0000,0000,,these patterns in whatever ways you want Dialogue: 0,0:12:55.74,0:13:00.85,Default,,0000,0000,0000,,you also have things like alternations Dialogue: 0,0:12:58.18,0:13:07.42,Default,,0000,0000,0000,,you can say anything that matches ab or Dialogue: 0,0:13:00.85,0:13:10.51,Default,,0000,0000,0000,,bc I want to remove and here you'll Dialogue: 0,0:13:07.42,0:13:12.22,Default,,0000,0000,0000,,notice that this ab got removed this bc Dialogue: 0,0:13:10.51,0:13:14.74,Default,,0000,0000,0000,,did not get removed even though it Dialogue: 0,0:13:12.22,0:13:17.95,Default,,0000,0000,0000,,matches the pattern because the ab had Dialogue: 0,0:13:14.74,0:13:20.50,Default,,0000,0000,0000,,already been removed this ab is removed Dialogue: 0,0:13:17.95,0:13:22.96,Default,,0000,0000,0000,,right but the c stays in place this ab Dialogue: 0,0:13:20.50,0:13:25.87,Default,,0000,0000,0000,,is removed and this c stays because it Dialogue: 0,0:13:22.96,0:13:29.47,Default,,0000,0000,0000,,still does not match that if I made this Dialogue: 0,0:13:25.87,0:13:31.75,Default,,0000,0000,0000,,if I remove this a then now this ab Dialogue: 0,0:13:29.47,0:13:34.00,Default,,0000,0000,0000,,pattern will not match this b so it'll Dialogue: 0,0:13:31.75,0:13:36.28,Default,,0000,0000,0000,,be preserved and then bc will match bc Dialogue: 0,0:13:34.00,0:13:37.81,Default,,0000,0000,0000,,and it'll go away Dialogue: 0,0:13:36.28,0:13:39.94,Default,,0000,0000,0000,,regular expressions can be all sorts of Dialogue: 0,0:13:37.81,0:13:41.53,Default,,0000,0000,0000,,complicated when you first encounter Dialogue: 0,0:13:39.94,0:13:42.79,Default,,0000,0000,0000,,them and even once you get more Dialogue: 0,0:13:41.53,0:13:45.16,Default,,0000,0000,0000,,experience with them they can be Dialogue: 0,0:13:42.79,0:13:47.77,Default,,0000,0000,0000,,daunting to look at and this is why very Dialogue: 0,0:13:45.16,0:13:49.60,Default,,0000,0000,0000,,often 
you want to use something like a Dialogue: 0,0:13:47.77,0:13:51.70,Default,,0000,0000,0000,,regular expression debugger which we'll Dialogue: 0,0:13:49.60,0:13:52.56,Default,,0000,0000,0000,,look at in a little bit but first let's Dialogue: 0,0:13:51.70,0:13:55.50,Default,,0000,0000,0000,,try to make up a Dialogue: 0,0:13:52.56,0:13:57.30,Default,,0000,0000,0000,,pattern that will match the logs and Dialogue: 0,0:13:55.50,0:14:00.39,Default,,0000,0000,0000,,match the logs that we've been working Dialogue: 0,0:13:57.30,0:14:02.07,Default,,0000,0000,0000,,with so far so here I'm gonna just sort Dialogue: 0,0:14:00.39,0:14:04.68,Default,,0000,0000,0000,,of extract a couple of lines from this Dialogue: 0,0:14:02.07,0:14:08.91,Default,,0000,0000,0000,,file let's say the first five so these Dialogue: 0,0:14:04.68,0:14:12.30,Default,,0000,0000,0000,,lines all now look like this right and Dialogue: 0,0:14:08.91,0:14:15.36,Default,,0000,0000,0000,,what we want to do is we want to only Dialogue: 0,0:14:12.30,0:14:21.21,Default,,0000,0000,0000,,have the user name okay so what might Dialogue: 0,0:14:15.36,0:14:30.12,Default,,0000,0000,0000,,this look like well here's one thing we Dialogue: 0,0:14:21.21,0:14:32.67,Default,,0000,0000,0000,,could try to do actually let me show you Dialogue: 0,0:14:30.12,0:14:34.37,Default,,0000,0000,0000,,one thing first let me take a Dialogue: 0,0:14:32.67,0:14:38.99,Default,,0000,0000,0000,,line that says something like Dialogue: 0,0:14:34.37,0:14:44.28,Default,,0000,0000,0000,,disconnected from invalid user Dialogue: 0,0:14:38.99,0:14:46.62,Default,,0000,0000,0000,,disconnected from maybe four to one one Dialogue: 0,0:14:44.28,0:14:49.74,Default,,0000,0000,0000,,whatever okay so this is an example of a Dialogue: 0,0:14:46.62,0:14:54.20,Default,,0000,0000,0000,,login line where someone tried to log in Dialogue: 0,0:14:49.74,0:14:54.20,Default,,0000,0000,0000,,with the username disconnected from Dialogue: 
0,0:14:54.50,0:15:05.40,Default,,0000,0000,0000,,missing an S disconnected thank you Dialogue: 0,0:15:03.20,0:15:08.31,Default,,0000,0000,0000,,you'll notice that this actually removed Dialogue: 0,0:15:05.40,0:15:10.77,Default,,0000,0000,0000,,the username as well and this is because Dialogue: 0,0:15:08.31,0:15:11.94,Default,,0000,0000,0000,,when you use dot star and any of these Dialogue: 0,0:15:10.77,0:15:14.49,Default,,0000,0000,0000,,sort of range expressions in regular Dialogue: 0,0:15:11.94,0:15:17.07,Default,,0000,0000,0000,,expressions they are greedy they will Dialogue: 0,0:15:14.49,0:15:19.89,Default,,0000,0000,0000,,match as much as they can so in this Dialogue: 0,0:15:17.07,0:15:22.13,Default,,0000,0000,0000,,case this was the username that we Dialogue: 0,0:15:19.89,0:15:24.93,Default,,0000,0000,0000,,wanted to retain but this pattern Dialogue: 0,0:15:22.13,0:15:27.06,Default,,0000,0000,0000,,actually matched all the way up until Dialogue: 0,0:15:24.93,0:15:28.62,Default,,0000,0000,0000,,the second occurrence of it or the last Dialogue: 0,0:15:27.06,0:15:30.96,Default,,0000,0000,0000,,occurrence of it and so everything Dialogue: 0,0:15:28.62,0:15:33.00,Default,,0000,0000,0000,,before it including the username itself Dialogue: 0,0:15:30.96,0:15:34.47,Default,,0000,0000,0000,,got removed and so we need to come up Dialogue: 0,0:15:33.00,0:15:36.15,Default,,0000,0000,0000,,with a slightly cleverer matching Dialogue: 0,0:15:34.47,0:15:38.19,Default,,0000,0000,0000,,strategy than just saying sort of dot Dialogue: 0,0:15:36.15,0:15:39.96,Default,,0000,0000,0000,,star because it means that if we have Dialogue: 0,0:15:38.19,0:15:41.34,Default,,0000,0000,0000,,particularly adversarial input we might Dialogue: 0,0:15:39.96,0:15:44.43,Default,,0000,0000,0000,,end up with something that we didn't Dialogue: 0,0:15:41.34,0:15:47.67,Default,,0000,0000,0000,,expect okay so let's see how we might Dialogue: 0,0:15:44.43,0:15:56.85,Default,,0000,0000,0000,,try to match 
these lines let's just do a Dialogue: 0,0:15:47.67,0:16:00.66,Default,,0000,0000,0000,,head first well let's try to construct Dialogue: 0,0:15:56.85,0:16:02.97,Default,,0000,0000,0000,,this up from the beginning we first of Dialogue: 0,0:16:00.66,0:16:05.19,Default,,0000,0000,0000,,all know that we want -E right Dialogue: 0,0:16:02.97,0:16:07.17,Default,,0000,0000,0000,,because we want to not have to put all Dialogue: 0,0:16:05.19,0:16:09.84,Default,,0000,0000,0000,,these backslashes everywhere Dialogue: 0,0:16:07.17,0:16:14.88,Default,,0000,0000,0000,,these lines look like they say from and Dialogue: 0,0:16:09.84,0:16:16.77,Default,,0000,0000,0000,,then some of them say invalid but some Dialogue: 0,0:16:14.88,0:16:19.17,Default,,0000,0000,0000,,of them do not right this line has Dialogue: 0,0:16:16.77,0:16:21.69,Default,,0000,0000,0000,,invalid that one does not question mark Dialogue: 0,0:16:19.17,0:16:26.03,Default,,0000,0000,0000,,here is saying zero or one so I want Dialogue: 0,0:16:21.69,0:16:31.32,Default,,0000,0000,0000,,zero or one of invalid space Dialogue: 0,0:16:26.03,0:16:34.32,Default,,0000,0000,0000,,user what else well that's going to be a Dialogue: 0,0:16:31.32,0:16:36.53,Default,,0000,0000,0000,,double space so we can't have that and Dialogue: 0,0:16:34.32,0:16:40.44,Default,,0000,0000,0000,,then there's gonna be some username and Dialogue: 0,0:16:36.53,0:16:43.16,Default,,0000,0000,0000,,then there's gonna be what exactly is Dialogue: 0,0:16:40.44,0:16:46.29,Default,,0000,0000,0000,,gonna be what looks like an IP address Dialogue: 0,0:16:43.16,0:16:50.19,Default,,0000,0000,0000,,so here we can use our range syntax and Dialogue: 0,0:16:46.29,0:16:53.49,Default,,0000,0000,0000,,say zero to nine and a dot right that's Dialogue: 0,0:16:50.19,0:16:58.17,Default,,0000,0000,0000,,what IP addresses are and we want many Dialogue: 0,0:16:53.49,0:17:00.30,Default,,0000,0000,0000,,of those then it says port so we're Dialogue: 
0,0:16:58.17,0:17:03.06,Default,,0000,0000,0000,,just going to match a literal port and Dialogue: 0,0:17:00.30,0:17:07.98,Default,,0000,0000,0000,,then another number zero to nine and Dialogue: 0,0:17:03.06,0:17:09.15,Default,,0000,0000,0000,,we're going to want plus of that the Dialogue: 0,0:17:07.98,0:17:10.05,Default,,0000,0000,0000,,other thing we're going to do here is Dialogue: 0,0:17:09.15,0:17:11.88,Default,,0000,0000,0000,,we're going to do what's known as Dialogue: 0,0:17:10.05,0:17:13.44,Default,,0000,0000,0000,,anchoring the regular expression so Dialogue: 0,0:17:11.88,0:17:15.78,Default,,0000,0000,0000,,there are two special characters in Dialogue: 0,0:17:13.44,0:17:17.70,Default,,0000,0000,0000,,regular expressions there's caret or Dialogue: 0,0:17:15.78,0:17:19.80,Default,,0000,0000,0000,,hat which matches the beginning of a Dialogue: 0,0:17:17.70,0:17:22.44,Default,,0000,0000,0000,,line and there's dollar which matches Dialogue: 0,0:17:19.80,0:17:24.84,Default,,0000,0000,0000,,the end of a line so here we're gonna Dialogue: 0,0:17:22.44,0:17:27.99,Default,,0000,0000,0000,,say that this regular expression has to match Dialogue: 0,0:17:24.84,0:17:29.76,Default,,0000,0000,0000,,the complete line the reason we do this Dialogue: 0,0:17:27.99,0:17:33.29,Default,,0000,0000,0000,,is because imagine that someone made Dialogue: 0,0:17:29.76,0:17:35.25,Default,,0000,0000,0000,,their username the entire log string Dialogue: 0,0:17:33.29,0:17:38.46,Default,,0000,0000,0000,,then now if you try to match this Dialogue: 0,0:17:35.25,0:17:40.73,Default,,0000,0000,0000,,pattern it would match the username Dialogue: 0,0:17:38.46,0:17:42.99,Default,,0000,0000,0000,,itself which is not what we want Dialogue: 0,0:17:40.73,0:17:44.49,Default,,0000,0000,0000,,generally you will want to try to anchor Dialogue: 0,0:17:42.99,0:17:46.86,Default,,0000,0000,0000,,your patterns wherever you can to avoid Dialogue: 0,0:17:44.49,0:17:49.92,Default,,0000,0000,0000,,those kinds of oddities 
okay let's see Dialogue: 0,0:17:46.86,0:17:51.96,Default,,0000,0000,0000,,what that gave us that removed many of Dialogue: 0,0:17:49.92,0:17:54.36,Default,,0000,0000,0000,,the lines but not all of them so this Dialogue: 0,0:17:51.96,0:17:56.88,Default,,0000,0000,0000,,one for example includes this preauth at Dialogue: 0,0:17:54.36,0:18:02.76,Default,,0000,0000,0000,,the end so we'll want to cut that off if Dialogue: 0,0:17:56.88,0:18:04.55,Default,,0000,0000,0000,,there's a space preauth square brackets Dialogue: 0,0:18:02.76,0:18:07.35,Default,,0000,0000,0000,,are special so we need to escape them Dialogue: 0,0:18:04.55,0:18:10.65,Default,,0000,0000,0000,,right now let's see what happens if we Dialogue: 0,0:18:07.35,0:18:12.36,Default,,0000,0000,0000,,try more lines of this no it still gets Dialogue: 0,0:18:10.65,0:18:13.71,Default,,0000,0000,0000,,something weird some of these lines are Dialogue: 0,0:18:12.36,0:18:16.74,Default,,0000,0000,0000,,not empty right which means that the Dialogue: 0,0:18:13.71,0:18:18.99,Default,,0000,0000,0000,,pattern did not match this one for Dialogue: 0,0:18:16.74,0:18:20.01,Default,,0000,0000,0000,,example it says authenticating user Dialogue: 0,0:18:18.99,0:18:24.69,Default,,0000,0000,0000,,instead of invalid Dialogue: 0,0:18:20.01,0:18:27.30,Default,,0000,0000,0000,,user okay so this has to match invalid or Dialogue: 0,0:18:24.69,0:18:30.90,Default,,0000,0000,0000,,authenticating zero or one time before Dialogue: 0,0:18:27.30,0:18:34.53,Default,,0000,0000,0000,,user how about now okay that looks Dialogue: 0,0:18:30.90,0:18:36.99,Default,,0000,0000,0000,,pretty promising but this output is not Dialogue: 0,0:18:34.53,0:18:38.88,Default,,0000,0000,0000,,particularly helpful right here we've Dialogue: 0,0:18:36.99,0:18:41.36,Default,,0000,0000,0000,,just erased every line of our log files Dialogue: 0,0:18:38.88,0:18:43.89,Default,,0000,0000,0000,,successfully which is not very helpful Dialogue: 
0,0:18:41.36,0:18:46.11,Default,,0000,0000,0000,,instead what we really wanted to do is Dialogue: 0,0:18:43.89,0:18:48.78,Default,,0000,0000,0000,,when we match the username right over Dialogue: 0,0:18:46.11,0:18:50.31,Default,,0000,0000,0000,,here we really wanted to remember what Dialogue: 0,0:18:48.78,0:18:53.31,Default,,0000,0000,0000,,that username was because that is what Dialogue: 0,0:18:50.31,0:18:55.77,Default,,0000,0000,0000,,we want to print out and the way we can Dialogue: 0,0:18:53.31,0:19:00.30,Default,,0000,0000,0000,,do that in regular expressions is using Dialogue: 0,0:18:55.77,0:19:03.63,Default,,0000,0000,0000,,something like capture groups so capture Dialogue: 0,0:19:00.30,0:19:06.57,Default,,0000,0000,0000,,groups are a way to say that I want to Dialogue: 0,0:19:03.63,0:19:10.35,Default,,0000,0000,0000,,remember this value and reuse it later Dialogue: 0,0:19:06.57,0:19:12.18,Default,,0000,0000,0000,,and in regular expressions any bracketed Dialogue: 0,0:19:10.35,0:19:14.46,Default,,0000,0000,0000,,expression any parenthesized expression is Dialogue: 0,0:19:12.18,0:19:16.77,Default,,0000,0000,0000,,going to be such a capture group so we Dialogue: 0,0:19:14.46,0:19:18.57,Default,,0000,0000,0000,,already actually have one here which is Dialogue: 0,0:19:16.77,0:19:20.85,Default,,0000,0000,0000,,this first group and now we're creating Dialogue: 0,0:19:18.57,0:19:22.59,Default,,0000,0000,0000,,a second one here notice that these Dialogue: 0,0:19:20.85,0:19:24.87,Default,,0000,0000,0000,,parentheses don't do anything to the Dialogue: 0,0:19:22.59,0:19:27.21,Default,,0000,0000,0000,,matching right because they're just Dialogue: 0,0:19:24.87,0:19:28.80,Default,,0000,0000,0000,,saying this expression as a unit but we Dialogue: 0,0:19:27.21,0:19:32.55,Default,,0000,0000,0000,,don't have any modifiers after it so Dialogue: 0,0:19:28.80,0:19:34.98,Default,,0000,0000,0000,,it's just matched one time and then the Dialogue: 
0,0:19:32.55,0:19:36.81,Default,,0000,0000,0000,,reason matching groups are useful or Dialogue: 0,0:19:34.98,0:19:38.37,Default,,0000,0000,0000,,capture groups are useful is because you Dialogue: 0,0:19:36.81,0:19:40.92,Default,,0000,0000,0000,,can refer back to them in the Dialogue: 0,0:19:38.37,0:19:43.80,Default,,0000,0000,0000,,replacement so in the replacement here I Dialogue: 0,0:19:40.92,0:19:45.63,Default,,0000,0000,0000,,can say backslash two this is the way Dialogue: 0,0:19:43.80,0:19:47.76,Default,,0000,0000,0000,,that you refer to the name of a capture Dialogue: 0,0:19:45.63,0:19:50.25,Default,,0000,0000,0000,,group in this case I'm Dialogue: 0,0:19:47.76,0:19:53.34,Default,,0000,0000,0000,,saying match the entire line and then in Dialogue: 0,0:19:50.25,0:19:55.38,Default,,0000,0000,0000,,the replacement put in the value you Dialogue: 0,0:19:53.34,0:19:57.33,Default,,0000,0000,0000,,captured in the second capture group Dialogue: 0,0:19:55.38,0:20:00.02,Default,,0000,0000,0000,,right remember this is the first capture Dialogue: 0,0:19:57.33,0:20:03.33,Default,,0000,0000,0000,,group and this is the second one and Dialogue: 0,0:20:00.02,0:20:05.67,Default,,0000,0000,0000,,this gives me all the usernames now if Dialogue: 0,0:20:03.33,0:20:08.58,Default,,0000,0000,0000,,you look back at what we wrote this is Dialogue: 0,0:20:05.67,0:20:10.05,Default,,0000,0000,0000,,pretty complicated right it might make Dialogue: 0,0:20:08.58,0:20:12.00,Default,,0000,0000,0000,,sense now that we walk through it and Dialogue: 0,0:20:10.05,0:20:14.13,Default,,0000,0000,0000,,why it had to be the way it was but this Dialogue: 0,0:20:12.00,0:20:16.14,Default,,0000,0000,0000,,is like not obvious that this is how Dialogue: 0,0:20:14.13,0:20:19.68,Default,,0000,0000,0000,,these lines work and this is where a Dialogue: 0,0:20:16.14,0:20:22.26,Default,,0000,0000,0000,,regular expression debugger can come in Dialogue: 
0,0:20:19.68,0:20:25.41,Default,,0000,0000,0000,,really really handy so we have one here Dialogue: 0,0:20:22.26,0:20:27.51,Default,,0000,0000,0000,,there are many online but here I've sort Dialogue: 0,0:20:25.41,0:20:31.71,Default,,0000,0000,0000,,of pre-filled in this expression that we Dialogue: 0,0:20:27.51,0:20:34.38,Default,,0000,0000,0000,,just used and notice that it tells me Dialogue: 0,0:20:31.71,0:20:37.47,Default,,0000,0000,0000,,all the matching it does in fact now this Dialogue: 0,0:20:34.38,0:20:42.95,Default,,0000,0000,0000,,window is a little small with this font Dialogue: 0,0:20:37.47,0:20:45.62,Default,,0000,0000,0000,,size but if I do here this explanation Dialogue: 0,0:20:42.95,0:20:48.32,Default,,0000,0000,0000,,says dot star matches any character Dialogue: 0,0:20:45.62,0:20:52.17,Default,,0000,0000,0000,,between zero and unlimited times Dialogue: 0,0:20:48.32,0:20:54.27,Default,,0000,0000,0000,,followed by disconnected from literally Dialogue: 0,0:20:52.17,0:20:56.79,Default,,0000,0000,0000,,followed by a capture group and then Dialogue: 0,0:20:54.27,0:20:59.19,Default,,0000,0000,0000,,walks you through all the stuff and Dialogue: 0,0:20:56.79,0:21:00.96,Default,,0000,0000,0000,,that's one thing but it also lets you Dialogue: 0,0:20:59.19,0:21:03.51,Default,,0000,0000,0000,,give a test string and then matches the Dialogue: 0,0:21:00.96,0:21:05.37,Default,,0000,0000,0000,,pattern against every single test string Dialogue: 0,0:21:03.51,0:21:07.46,Default,,0000,0000,0000,,that you give and highlights what the Dialogue: 0,0:21:05.37,0:21:11.49,Default,,0000,0000,0000,,different capture groups for example are Dialogue: 0,0:21:07.46,0:21:15.06,Default,,0000,0000,0000,,so here we made user a capture group Dialogue: 0,0:21:11.49,0:21:16.98,Default,,0000,0000,0000,,right so it'll say okay the full string Dialogue: 0,0:21:15.06,0:21:19.11,Default,,0000,0000,0000,,matched right the whole thing is blue so Dialogue: 
0,0:21:16.98,0:21:21.18,Default,,0000,0000,0000,,it matched green is the first capture Dialogue: 0,0:21:19.11,0:21:23.37,Default,,0000,0000,0000,,group red is the second capture group Dialogue: 0,0:21:21.18,0:21:26.13,Default,,0000,0000,0000,,and this is the third because preauth Dialogue: 0,0:21:23.37,0:21:27.75,Default,,0000,0000,0000,,was also put into parentheses and this Dialogue: 0,0:21:26.13,0:21:31.02,Default,,0000,0000,0000,,can be a handy way to try to debug your Dialogue: 0,0:21:27.75,0:21:35.61,Default,,0000,0000,0000,,regular expressions for example if I put Dialogue: 0,0:21:31.02,0:21:41.07,Default,,0000,0000,0000,,disconnected from and let's add a new Dialogue: 0,0:21:35.61,0:21:45.24,Default,,0000,0000,0000,,line here and I make the username Dialogue: 0,0:21:41.07,0:21:46.53,Default,,0000,0000,0000,,disconnected from now that line already Dialogue: 0,0:21:45.24,0:21:49.95,Default,,0000,0000,0000,,had the username be disconnected from Dialogue: 0,0:21:46.53,0:21:54.15,Default,,0000,0000,0000,,great here's me thinking ahead you'll Dialogue: 0,0:21:49.95,0:21:56.01,Default,,0000,0000,0000,,notice that with this pattern this was Dialogue: 0,0:21:54.15,0:21:58.74,Default,,0000,0000,0000,,no longer a problem because it got Dialogue: 0,0:21:56.01,0:22:02.58,Default,,0000,0000,0000,,matched as the username what happens if we Dialogue: 0,0:21:58.74,0:22:07.17,Default,,0000,0000,0000,,take this entire line or this entire Dialogue: 0,0:22:02.58,0:22:13.83,Default,,0000,0000,0000,,line and make that the username now what Dialogue: 0,0:22:07.17,0:22:15.18,Default,,0000,0000,0000,,happens it gets really confused right so Dialogue: 0,0:22:13.83,0:22:18.39,Default,,0000,0000,0000,,this is where regular expressions can be Dialogue: 0,0:22:15.18,0:22:21.78,Default,,0000,0000,0000,,a pain to get right because it now tries Dialogue: 0,0:22:18.39,0:22:23.97,Default,,0000,0000,0000,,to match it matches the first place Dialogue: 
0,0:22:21.78,0:22:27.42,Default,,0000,0000,0000,,where username appears or the first Dialogue: 0,0:22:23.97,0:22:29.70,Default,,0000,0000,0000,,invalid in this case the second invalid Dialogue: 0,0:22:27.42,0:22:31.83,Default,,0000,0000,0000,,because this is greedy we can make this Dialogue: 0,0:22:29.70,0:22:36.36,Default,,0000,0000,0000,,non-greedy by putting a question mark Dialogue: 0,0:22:31.83,0:22:38.52,Default,,0000,0000,0000,,here so if you suffix a plus or a star Dialogue: 0,0:22:36.36,0:22:40.86,Default,,0000,0000,0000,,with a question mark it becomes a non-greedy Dialogue: 0,0:22:38.52,0:22:42.54,Default,,0000,0000,0000,,match so it will not try to match Dialogue: 0,0:22:40.86,0:22:43.82,Default,,0000,0000,0000,,as much as possible and then you see Dialogue: 0,0:22:42.54,0:22:46.03,Default,,0000,0000,0000,,that this actually gets parsed correctly Dialogue: 0,0:22:43.82,0:22:47.95,Default,,0000,0000,0000,,because this dot star Dialogue: 0,0:22:46.03,0:22:49.48,Default,,0000,0000,0000,,will stop at the first disconnected Dialogue: 0,0:22:47.95,0:22:52.45,Default,,0000,0000,0000,,from which is the one that's actually Dialogue: 0,0:22:49.48,0:22:57.07,Default,,0000,0000,0000,,emitted by SSH the one that actually Dialogue: 0,0:22:52.45,0:22:58.72,Default,,0000,0000,0000,,appears in our logs as you can probably Dialogue: 0,0:22:57.07,0:23:00.79,Default,,0000,0000,0000,,tell from the explanation of this so far Dialogue: 0,0:22:58.72,0:23:03.13,Default,,0000,0000,0000,,regular expressions can get really Dialogue: 0,0:23:00.79,0:23:05.32,Default,,0000,0000,0000,,complicated and there are all sorts of Dialogue: 0,0:23:03.13,0:23:07.33,Default,,0000,0000,0000,,weird modifiers that you might have to Dialogue: 0,0:23:05.32,0:23:09.13,Default,,0000,0000,0000,,apply in your pattern the only way to Dialogue: 0,0:23:07.33,0:23:10.75,Default,,0000,0000,0000,,really learn them is to start with Dialogue: 0,0:23:09.13,0:23:12.97,Default,,0000,0000,0000,,simple ones and then 
build them up until Dialogue: 0,0:23:10.75,0:23:14.86,Default,,0000,0000,0000,,they match what you need often you're Dialogue: 0,0:23:12.97,0:23:16.15,Default,,0000,0000,0000,,just doing some like one-off job like Dialogue: 0,0:23:14.86,0:23:17.77,Default,,0000,0000,0000,,when we're hacking out the usernames Dialogue: 0,0:23:16.15,0:23:19.87,Default,,0000,0000,0000,,here and you don't need to care about Dialogue: 0,0:23:17.77,0:23:21.61,Default,,0000,0000,0000,,all the special conditions right you Dialogue: 0,0:23:19.87,0:23:24.19,Default,,0000,0000,0000,,don't have to care about someone having Dialogue: 0,0:23:21.61,0:23:26.02,Default,,0000,0000,0000,,the SSH username perfectly match your Dialogue: 0,0:23:24.19,0:23:27.43,Default,,0000,0000,0000,,login format that's probably not Dialogue: 0,0:23:26.02,0:23:29.44,Default,,0000,0000,0000,,something that matters because you're Dialogue: 0,0:23:27.43,0:23:30.73,Default,,0000,0000,0000,,just trying to find the usernames but Dialogue: 0,0:23:29.44,0:23:32.71,Default,,0000,0000,0000,,regular expressions are really powerful Dialogue: 0,0:23:30.73,0:23:33.73,Default,,0000,0000,0000,,and you want to be careful if you're Dialogue: 0,0:23:32.71,0:23:36.87,Default,,0000,0000,0000,,doing something where it actually Dialogue: 0,0:23:33.73,0:23:36.87,Default,,0000,0000,0000,,matters you had a question Dialogue: 0,0:23:41.38,0:23:47.56,Default,,0000,0000,0000,,regular expressions by default only Dialogue: 0,0:23:43.51,0:23:58.63,Default,,0000,0000,0000,,match per line anyway they will not Dialogue: 0,0:23:47.56,0:24:01.21,Default,,0000,0000,0000,,match across newlines so the way Dialogue: 0,0:23:58.63,0:24:04.68,Default,,0000,0000,0000,,that sed works is that it operates per Dialogue: 0,0:24:01.21,0:24:10.39,Default,,0000,0000,0000,,line and so sed will do this Dialogue: 0,0:24:04.68,0:24:12.25,Default,,0000,0000,0000,,expression for every line okay questions Dialogue: 0,0:24:10.39,0:24:14.41,Default,,0000,0000,0000,,about 
regular expressions or this pattern Dialogue: 0,0:24:12.25,0:24:16.39,Default,,0000,0000,0000,,so far it is a complicated pattern so if Dialogue: 0,0:24:14.41,0:24:17.56,Default,,0000,0000,0000,,it feels confusing like don't be Dialogue: 0,0:24:16.39,0:24:31.45,Default,,0000,0000,0000,,worried about it look at it in the Dialogue: 0,0:24:17.56,0:24:33.55,Default,,0000,0000,0000,,debugger later yep so keep in mind Dialogue: 0,0:24:31.45,0:24:36.13,Default,,0000,0000,0000,,that we're assuming here that the Dialogue: 0,0:24:33.55,0:24:38.59,Default,,0000,0000,0000,,user only has control over their Dialogue: 0,0:24:36.13,0:24:41.80,Default,,0000,0000,0000,,username right so the worst that they Dialogue: 0,0:24:38.59,0:24:43.51,Default,,0000,0000,0000,,could do is take like this entire entry Dialogue: 0,0:24:41.80,0:24:48.49,Default,,0000,0000,0000,,and make that the username let's see Dialogue: 0,0:24:43.51,0:24:51.49,Default,,0000,0000,0000,,what happens right so that works Dialogue: 0,0:24:48.49,0:24:53.71,Default,,0000,0000,0000,,and the reason for this is this question Dialogue: 0,0:24:51.49,0:24:56.20,Default,,0000,0000,0000,,mark means that the moment we hit the Dialogue: 0,0:24:53.71,0:24:58.82,Default,,0000,0000,0000,,disconnected keyword we start parsing the Dialogue: 0,0:24:56.20,0:25:00.77,Default,,0000,0000,0000,,rest of the pattern right and the Dialogue: 0,0:24:58.82,0:25:03.20,Default,,0000,0000,0000,,first occurrence of disconnected is Dialogue: 0,0:25:00.77,0:25:05.72,Default,,0000,0000,0000,,printed by SSH before anything the user Dialogue: 0,0:25:03.20,0:25:08.21,Default,,0000,0000,0000,,controls so in this particular instance Dialogue: 0,0:25:05.72,0:25:21.05,Default,,0000,0000,0000,,even this will not confuse the pattern Dialogue: 0,0:25:08.21,0:25:24.92,Default,,0000,0000,0000,,yep if well so if you're writing a this Dialogue: 0,0:25:21.05,0:25:26.15,Default,,0000,0000,0000,,sort of odd matching will in general Dialogue: 
0,0:25:24.92,0:25:29.12,Default,,0000,0000,0000,,when you're doing data wrangling is like Dialogue: 0,0:25:26.15,0:25:31.37,Default,,0000,0000,0000,,not security it's not security related Dialogue: 0,0:25:29.12,0:25:33.89,Default,,0000,0000,0000,,but it might mean that you get really Dialogue: 0,0:25:31.37,0:25:35.30,Default,,0000,0000,0000,,weird data back and so if you're doing Dialogue: 0,0:25:33.89,0:25:37.40,Default,,0000,0000,0000,,something like plotting data you might Dialogue: 0,0:25:35.30,0:25:39.56,Default,,0000,0000,0000,,drop data points that matter you might Dialogue: 0,0:25:37.40,0:25:41.45,Default,,0000,0000,0000,,parse out the wrong number and then like Dialogue: 0,0:25:39.56,0:25:43.37,Default,,0000,0000,0000,,your plots suddenly have data points that Dialogue: 0,0:25:41.45,0:25:45.56,Default,,0000,0000,0000,,weren't in the original data and so it's Dialogue: 0,0:25:43.37,0:25:47.42,Default,,0000,0000,0000,,more that if you find yourself writing a Dialogue: 0,0:25:45.56,0:25:49.07,Default,,0000,0000,0000,,complicated regular expression like Dialogue: 0,0:25:47.42,0:25:51.71,Default,,0000,0000,0000,,double check that it's actually matching Dialogue: 0,0:25:49.07,0:25:56.57,Default,,0000,0000,0000,,what you think it's matching even if Dialogue: 0,0:25:51.71,0:25:58.22,Default,,0000,0000,0000,,it's not security related and as you can Dialogue: 0,0:25:56.57,0:26:00.95,Default,,0000,0000,0000,,imagine these patterns can get really Dialogue: 0,0:25:58.22,0:26:02.81,Default,,0000,0000,0000,,complicated like for example there's a Dialogue: 0,0:26:00.95,0:26:04.21,Default,,0000,0000,0000,,big debate about how do you match an Dialogue: 0,0:26:02.81,0:26:06.23,Default,,0000,0000,0000,,email address with a regular expression Dialogue: 0,0:26:04.21,0:26:08.87,Default,,0000,0000,0000,,and you might think of something like Dialogue: 0,0:26:06.23,0:26:10.85,Default,,0000,0000,0000,,this so this is a very straightforward Dialogue: 
0,0:26:08.87,0:26:13.91,Default,,0000,0000,0000,,one that just says letters and numbers Dialogue: 0,0:26:10.85,0:26:15.62,Default,,0000,0000,0000,,and underscores and percent followed Dialogue: 0,0:26:13.91,0:26:17.80,Default,,0000,0000,0000,,by a plus because in Gmail you can have Dialogue: 0,0:26:15.62,0:26:22.10,Default,,0000,0000,0000,,pluses in email addresses with a suffix Dialogue: 0,0:26:17.80,0:26:24.62,Default,,0000,0000,0000,,in this case the plus is just for any Dialogue: 0,0:26:22.10,0:26:25.73,Default,,0000,0000,0000,,number of these but at least one because Dialogue: 0,0:26:24.62,0:26:26.93,Default,,0000,0000,0000,,you can't have an email address that Dialogue: 0,0:26:25.73,0:26:29.27,Default,,0000,0000,0000,,doesn't have anything before the at and Dialogue: 0,0:26:26.93,0:26:31.79,Default,,0000,0000,0000,,then similarly after the domain right Dialogue: 0,0:26:29.27,0:26:33.14,Default,,0000,0000,0000,,and the top-level domain has to be at Dialogue: 0,0:26:31.79,0:26:35.06,Default,,0000,0000,0000,,least two characters and can't include Dialogue: 0,0:26:33.14,0:26:38.00,Default,,0000,0000,0000,,digits right you can have .com but Dialogue: 0,0:26:35.06,0:26:40.04,Default,,0000,0000,0000,,you can't have a .7 it turns out Dialogue: 0,0:26:38.00,0:26:42.14,Default,,0000,0000,0000,,this is not really correct right there Dialogue: 0,0:26:40.04,0:26:43.22,Default,,0000,0000,0000,,are a bunch of valid email addresses Dialogue: 0,0:26:42.14,0:26:44.36,Default,,0000,0000,0000,,that will not be matched by this and Dialogue: 0,0:26:43.22,0:26:45.56,Default,,0000,0000,0000,,there are a bunch of invalid email Dialogue: 0,0:26:44.36,0:26:50.63,Default,,0000,0000,0000,,addresses that will be matched by this Dialogue: 0,0:26:45.56,0:26:52.40,Default,,0000,0000,0000,,so there are many many suggestions and Dialogue: 0,0:26:50.63,0:26:54.53,Default,,0000,0000,0000,,there are people who've built like full Dialogue: 
0,0:26:52.40,0:26:58.46,Default,,0000,0000,0000,,test suites to try to see which regular Dialogue: 0,0:26:54.53,0:27:00.89,Default,,0000,0000,0000,,expression is best and this Dialogue: 0,0:26:58.46,0:27:02.90,Default,,0000,0000,0000,,particular one is for URLs there are Dialogue: 0,0:27:00.89,0:27:06.47,Default,,0000,0000,0000,,similar ones for email where they found Dialogue: 0,0:27:02.90,0:27:07.91,Default,,0000,0000,0000,,that the best one is this one I don't Dialogue: 0,0:27:06.47,0:27:10.79,Default,,0000,0000,0000,,recommend you trying to understand this Dialogue: 0,0:27:07.91,0:27:13.72,Default,,0000,0000,0000,,pattern but this one apparently will Dialogue: 0,0:27:10.79,0:27:15.83,Default,,0000,0000,0000,,almost perfectly match what the like Dialogue: 0,0:27:13.72,0:27:17.84,Default,,0000,0000,0000,,internet standard for email addresses Dialogue: 0,0:27:15.83,0:27:20.00,Default,,0000,0000,0000,,says is a valid email address and that Dialogue: 0,0:27:17.84,0:27:22.25,Default,,0000,0000,0000,,includes all sorts of weird Unicode code Dialogue: 0,0:27:20.00,0:27:24.44,Default,,0000,0000,0000,,points this is just to say regular Dialogue: 0,0:27:22.25,0:27:26.06,Default,,0000,0000,0000,,expressions can be really hairy and if Dialogue: 0,0:27:24.44,0:27:28.88,Default,,0000,0000,0000,,you end up somewhere like this there's Dialogue: 0,0:27:26.06,0:27:30.62,Default,,0000,0000,0000,,probably a better way to do it for Dialogue: 0,0:27:28.88,0:27:35.32,Default,,0000,0000,0000,,example if you find yourself trying to Dialogue: 0,0:27:30.62,0:27:38.30,Default,,0000,0000,0000,,parse HTML or something or parse like Dialogue: 0,0:27:35.32,0:27:40.31,Default,,0000,0000,0000,,parse JSON with regular expressions you Dialogue: 0,0:27:38.30,0:27:42.23,Default,,0000,0000,0000,,should probably use a different tool and Dialogue: 0,0:27:40.31,0:27:44.48,Default,,0000,0000,0000,,there is an exercise that has you do Dialogue: 
0,0:27:42.23,0:27:49.96,Default,,0000,0000,0000,,this not with regular expressions mind Dialogue: 0,0:27:44.48,0:27:53.18,Default,,0000,0000,0000,,you yeah there's all sorts of Dialogue: 0,0:27:49.96,0:27:54.74,Default,,0000,0000,0000,,suggestions and they give you deep deep Dialogue: 0,0:27:53.18,0:27:56.66,Default,,0000,0000,0000,,dives into how they work if you want to Dialogue: 0,0:27:54.74,0:28:01.67,Default,,0000,0000,0000,,look that up it's in the lecture Dialogue: 0,0:27:56.66,0:28:04.28,Default,,0000,0000,0000,,notes okay so now we have the list of Dialogue: 0,0:28:01.67,0:28:05.96,Default,,0000,0000,0000,,usernames so let's go back to data Dialogue: 0,0:28:04.28,0:28:08.21,Default,,0000,0000,0000,,wrangling right like this list of Dialogue: 0,0:28:05.96,0:28:10.25,Default,,0000,0000,0000,,usernames is still not that interesting to Dialogue: 0,0:28:08.21,0:28:15.79,Default,,0000,0000,0000,,me right let's see how many lines Dialogue: 0,0:28:10.25,0:28:15.79,Default,,0000,0000,0000,,there are so if I do wc -l there are Dialogue: 0,0:28:15.91,0:28:21.47,Default,,0000,0000,0000,,one hundred and ninety eight thousand Dialogue: 0,0:28:18.32,0:28:23.26,Default,,0000,0000,0000,,lines so wc is the word count program Dialogue: 0,0:28:21.47,0:28:26.03,Default,,0000,0000,0000,,-l makes it count the number of lines Dialogue: 0,0:28:23.26,0:28:27.53,Default,,0000,0000,0000,,this is a lot of lines and if I start Dialogue: 0,0:28:26.03,0:28:29.69,Default,,0000,0000,0000,,scrolling through them that still Dialogue: 0,0:28:27.53,0:28:31.73,Default,,0000,0000,0000,,doesn't really help me right like I need Dialogue: 0,0:28:29.69,0:28:37.13,Default,,0000,0000,0000,,statistics over this I need aggregates Dialogue: 0,0:28:31.73,0:28:38.45,Default,,0000,0000,0000,,of some kind and the sed tool is like Dialogue: 0,0:28:37.13,0:28:40.10,Default,,0000,0000,0000,,useful for many things it gives you a Dialogue: 
0,0:28:38.45,0:28:43.01,Default,,0000,0000,0000,,full programming language it can do Dialogue: 0,0:28:40.10,0:28:45.02,Default,,0000,0000,0000,,weird things like insert text or only Dialogue: 0,0:28:43.01,0:28:46.40,Default,,0000,0000,0000,,print matching lines but it's not Dialogue: 0,0:28:45.02,0:28:48.56,Default,,0000,0000,0000,,necessarily the perfect tool for Dialogue: 0,0:28:46.40,0:28:50.33,Default,,0000,0000,0000,,everything right like sometimes there Dialogue: 0,0:28:48.56,0:28:53.42,Default,,0000,0000,0000,,are better tools like for example you Dialogue: 0,0:28:50.33,0:28:55.40,Default,,0000,0000,0000,,could write a line counter in sed you Dialogue: 0,0:28:53.42,0:28:56.84,Default,,0000,0000,0000,,just should never sed is a terrible Dialogue: 0,0:28:55.40,0:29:00.44,Default,,0000,0000,0000,,programming language except for Dialogue: 0,0:28:56.84,0:29:02.74,Default,,0000,0000,0000,,searching and replacing but there are Dialogue: 0,0:29:00.44,0:29:07.94,Default,,0000,0000,0000,,other useful tools so for example Dialogue: 0,0:29:02.74,0:29:09.71,Default,,0000,0000,0000,,there's a tool called sort so sort this Dialogue: 0,0:29:07.94,0:29:12.08,Default,,0000,0000,0000,,is also not going to be very helpful but Dialogue: 0,0:29:09.71,0:29:13.85,Default,,0000,0000,0000,,sort takes a bunch of lines of input Dialogue: 0,0:29:12.08,0:29:16.94,Default,,0000,0000,0000,,sorts them and then prints them to your Dialogue: 0,0:29:13.85,0:29:19.13,Default,,0000,0000,0000,,output so in this case I now get the Dialogue: 0,0:29:16.94,0:29:20.54,Default,,0000,0000,0000,,sorted output of that list it is still Dialogue: 0,0:29:19.13,0:29:23.84,Default,,0000,0000,0000,,two hundred thousand lines long so it's Dialogue: 0,0:29:20.54,0:29:24.76,Default,,0000,0000,0000,,still not very helpful to me but now I Dialogue: 0,0:29:23.84,0:29:27.34,Default,,0000,0000,0000,,can combine it with Dialogue: 0,0:29:24.76,0:29:30.55,Default,,0000,0000,0000,,the tool called uniq so uniq will 
Dialogue: 0,0:29:27.34,0:29:33.13,Default,,0000,0000,0000,,look at a sorted list of lines and it Dialogue: 0,0:29:30.55,0:29:34.93,Default,,0000,0000,0000,,will only print those that are unique so Dialogue: 0,0:29:33.13,0:29:37.09,Default,,0000,0000,0000,,if you have multiple instances of any Dialogue: 0,0:29:34.93,0:29:40.75,Default,,0000,0000,0000,,given line it will only print it once Dialogue: 0,0:29:37.09,0:29:44.29,Default,,0000,0000,0000,,and then I can say uniq -c so this is Dialogue: 0,0:29:40.75,0:29:46.03,Default,,0000,0000,0000,,gonna say count the number of duplicates Dialogue: 0,0:29:44.29,0:29:48.01,Default,,0000,0000,0000,,for any lines that are duplicated and Dialogue: 0,0:29:46.03,0:29:52.00,Default,,0000,0000,0000,,eliminate them what does this look like Dialogue: 0,0:29:48.01,0:29:56.05,Default,,0000,0000,0000,,well if I run it it's gonna take a while Dialogue: 0,0:29:52.00,0:29:59.71,Default,,0000,0000,0000,,there were thirteen zze usernames there Dialogue: 0,0:29:56.05,0:30:01.24,Default,,0000,0000,0000,,were ten ZX VF usernames etc and Dialogue: 0,0:29:59.71,0:30:03.46,Default,,0000,0000,0000,,I can scroll through this this is still Dialogue: 0,0:30:01.24,0:30:06.13,Default,,0000,0000,0000,,a very long list right but at least now Dialogue: 0,0:30:03.46,0:30:08.20,Default,,0000,0000,0000,,it's a little bit more collated than it Dialogue: 0,0:30:06.13,0:30:10.77,Default,,0000,0000,0000,,was let's see how many lines I'm down Dialogue: 0,0:30:08.20,0:30:10.77,Default,,0000,0000,0000,,to now okay Dialogue: 0,0:30:13.48,0:30:17.38,Default,,0000,0000,0000,,twenty-four thousand lines it's still Dialogue: 0,0:30:15.46,0:30:19.81,Default,,0000,0000,0000,,too much it's not useful information to Dialogue: 0,0:30:17.38,0:30:22.96,Default,,0000,0000,0000,,me but I can keep burning down this with Dialogue: 0,0:30:19.81,0:30:24.73,Default,,0000,0000,0000,,more tools for example what I might care Dialogue: 
0,0:30:22.96,0:30:29.05,Default,,0000,0000,0000,,about is which user names have been used Dialogue: 0,0:30:24.73,0:30:31.33,Default,,0000,0000,0000,,the most well I can do sort again and I Dialogue: 0,0:30:29.05,0:30:35.56,Default,,0000,0000,0000,,can say I want a numeric sort on the Dialogue: 0,0:30:31.33,0:30:38.98,Default,,0000,0000,0000,,first column of the input so -n says Dialogue: 0,0:30:35.56,0:30:41.32,Default,,0000,0000,0000,,numeric sort -k lets you select a Dialogue: 0,0:30:38.98,0:30:43.72,Default,,0000,0000,0000,,whitespace separated column from the input to Dialogue: 0,0:30:41.32,0:30:45.76,Default,,0000,0000,0000,,sort by and the reason I'm giving one Dialogue: 0,0:30:43.72,0:30:47.68,Default,,0000,0000,0000,,comma one here is because I want to Dialogue: 0,0:30:45.76,0:30:49.69,Default,,0000,0000,0000,,start at the first column and stop at Dialogue: 0,0:30:47.68,0:30:52.15,Default,,0000,0000,0000,,the first column alternatively I could Dialogue: 0,0:30:49.69,0:30:54.13,Default,,0000,0000,0000,,say I want you to sort by this list of Dialogue: 0,0:30:52.15,0:30:58.30,Default,,0000,0000,0000,,columns but in this case I just want to Dialogue: 0,0:30:54.13,0:31:01.84,Default,,0000,0000,0000,,sort by that column and then I want only Dialogue: 0,0:30:58.30,0:31:06.72,Default,,0000,0000,0000,,the ten last lines so sort by default Dialogue: 0,0:31:01.84,0:31:08.89,Default,,0000,0000,0000,,will output in ascending order so Dialogue: 0,0:31:06.72,0:31:10.33,Default,,0000,0000,0000,,the ones with the highest counts are Dialogue: 0,0:31:08.89,0:31:14.56,Default,,0000,0000,0000,,gonna be at the bottom and then I want Dialogue: 0,0:31:10.33,0:31:17.47,Default,,0000,0000,0000,,only the last ten lines and now when I run Dialogue: 0,0:31:14.56,0:31:20.59,Default,,0000,0000,0000,,this I actually get a useful bit of data Dialogue: 0,0:31:17.47,0:31:21.73,Default,,0000,0000,0000,,right it tells me there were eleven Dialogue: 
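The counting pipeline described above (sort, uniq -c, a numeric sort on the first column, then tail) can be sketched as follows. The usernames here are made-up sample data standing in for the real log, and only two lines are kept instead of ten so the result stays small.

```shell
# Hypothetical usernames, one per line, standing in for the extracted log data.
usernames='root
admin
root
test
root
admin'

# sort groups identical lines together, uniq -c collapses each group and
# prefixes it with its count, sort -nk1,1 orders numerically by that count
# column (ascending), and tail keeps the highest counts at the bottom.
top=$(printf '%s\n' "$usernames" | sort | uniq -c | sort -nk1,1 | tail -n2)
printf '%s\n' "$top"
```

With this sample input the last line printed is the count for root, since ascending order puts the most frequent username at the bottom.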
0,0:31:20.59,0:31:24.73,Default,,0000,0000,0000,,thousand login attempts with the Dialogue: 0,0:31:21.73,0:31:26.50,Default,,0000,0000,0000,,username root there were four thousand Dialogue: 0,0:31:24.73,0:31:29.53,Default,,0000,0000,0000,,with 123456 as Dialogue: 0,0:31:26.50,0:31:33.79,Default,,0000,0000,0000,,username etc and this is pretty handy Dialogue: 0,0:31:29.53,0:31:36.04,Default,,0000,0000,0000,,right and now suddenly this giant log Dialogue: 0,0:31:33.79,0:31:38.23,Default,,0000,0000,0000,,file actually produces useful Dialogue: 0,0:31:36.04,0:31:40.54,Default,,0000,0000,0000,,information for me this is what I really wanted Dialogue: 0,0:31:38.23,0:31:44.23,Default,,0000,0000,0000,,from that log file now maybe I want to Dialogue: 0,0:31:40.54,0:31:46.53,Default,,0000,0000,0000,,just like do a quick disabling of root Dialogue: 0,0:31:44.23,0:31:50.61,Default,,0000,0000,0000,,for example for SSH login on my machine Dialogue: 0,0:31:46.53,0:31:50.61,Default,,0000,0000,0000,,which I recommend you all do by the way Dialogue: 0,0:31:51.21,0:31:56.56,Default,,0000,0000,0000,,in this particular case we don't Dialogue: 0,0:31:53.41,0:31:58.51,Default,,0000,0000,0000,,actually need the -k for sort because sort Dialogue: 0,0:31:56.56,0:32:00.85,Default,,0000,0000,0000,,by default will sort by the entire line Dialogue: 0,0:31:58.51,0:32:01.99,Default,,0000,0000,0000,,and the number happens to come first but Dialogue: 0,0:32:00.85,0:32:04.06,Default,,0000,0000,0000,,it's useful to know about these Dialogue: 0,0:32:01.99,0:32:06.01,Default,,0000,0000,0000,,additional flags and you might wonder Dialogue: 0,0:32:04.06,0:32:07.33,Default,,0000,0000,0000,,well how would I know that these flags Dialogue: 0,0:32:06.01,0:32:08.56,Default,,0000,0000,0000,,exist how would I know that these Dialogue: 0,0:32:07.33,0:32:11.41,Default,,0000,0000,0000,,programs even exist Dialogue: 0,0:32:08.56,0:32:12.85,Default,,0000,0000,0000,,well the programs you usually pick up 
just Dialogue: 0,0:32:11.41,0:32:15.90,Default,,0000,0000,0000,,from being told about them in classes Dialogue: 0,0:32:12.85,0:32:19.03,Default,,0000,0000,0000,,like here the flags are usually like I Dialogue: 0,0:32:15.90,0:32:22.30,Default,,0000,0000,0000,,want to sort by something that is not Dialogue: 0,0:32:19.03,0:32:24.16,Default,,0000,0000,0000,,the full line your first instinct should Dialogue: 0,0:32:22.30,0:32:25.93,Default,,0000,0000,0000,,be to type man sort and then read Dialogue: 0,0:32:24.16,0:32:27.67,Default,,0000,0000,0000,,through the page and it very quickly Dialogue: 0,0:32:25.93,0:32:29.23,Default,,0000,0000,0000,,will tell you here's how to select a Dialogue: 0,0:32:27.67,0:32:35.92,Default,,0000,0000,0000,,particular column here's how to sort by Dialogue: 0,0:32:29.23,0:32:38.49,Default,,0000,0000,0000,,a number etc okay what if now that I Dialogue: 0,0:32:35.92,0:32:40.42,Default,,0000,0000,0000,,have this like top let's say top 20 list Dialogue: 0,0:32:38.49,0:32:42.79,Default,,0000,0000,0000,,let's say I don't actually care about Dialogue: 0,0:32:40.42,0:32:45.01,Default,,0000,0000,0000,,the counts I just want like a comma Dialogue: 0,0:32:42.79,0:32:47.47,Default,,0000,0000,0000,,separated list of the user names because Dialogue: 0,0:32:45.01,0:32:49.51,Default,,0000,0000,0000,,I'm gonna like send it to myself by Dialogue: 0,0:32:47.47,0:32:53.41,Default,,0000,0000,0000,,email every day or something like that Dialogue: 0,0:32:49.51,0:32:56.91,Default,,0000,0000,0000,,like these are the top 20 usernames well Dialogue: 0,0:32:53.41,0:32:56.91,Default,,0000,0000,0000,,I can do this Dialogue: 0,0:32:58.29,0:33:02.56,Default,,0000,0000,0000,,ok that's a lot more weird commands but Dialogue: 0,0:33:01.36,0:33:07.33,Default,,0000,0000,0000,,they're commands that are useful to know Dialogue: 0,0:33:02.56,0:33:09.88,Default,,0000,0000,0000,,about so awk is a column based stream Dialogue: 
0,0:33:07.33,0:33:12.43,Default,,0000,0000,0000,,processor so we talked about sed which Dialogue: 0,0:33:09.88,0:33:15.64,Default,,0000,0000,0000,,is a stream editor so it tries to edit Dialogue: 0,0:33:12.43,0:33:18.82,Default,,0000,0000,0000,,text primarily in the inputs awk on the Dialogue: 0,0:33:15.64,0:33:20.65,Default,,0000,0000,0000,,other hand also lets you edit text it is Dialogue: 0,0:33:18.82,0:33:23.29,Default,,0000,0000,0000,,still a full programming language but Dialogue: 0,0:33:20.65,0:33:25.66,Default,,0000,0000,0000,,it's more focused on columnar data so in Dialogue: 0,0:33:23.29,0:33:28.39,Default,,0000,0000,0000,,this case awk by default will parse its Dialogue: 0,0:33:25.66,0:33:30.19,Default,,0000,0000,0000,,input in whitespace separated columns Dialogue: 0,0:33:28.39,0:33:32.17,Default,,0000,0000,0000,,and then let you operate on those Dialogue: 0,0:33:30.19,0:33:33.43,Default,,0000,0000,0000,,columns separately in this case I'm Dialogue: 0,0:33:32.17,0:33:38.32,Default,,0000,0000,0000,,saying just print the second column Dialogue: 0,0:33:33.43,0:33:40.30,Default,,0000,0000,0000,,which is the user name right paste is a Dialogue: 0,0:33:38.32,0:33:43.03,Default,,0000,0000,0000,,command that takes a bunch of lines and Dialogue: 0,0:33:40.30,0:33:46.35,Default,,0000,0000,0000,,pastes them together into a single line Dialogue: 0,0:33:43.03,0:33:49.45,Default,,0000,0000,0000,,that's the -s with the delimiter comma Dialogue: 0,0:33:46.35,0:33:51.74,Default,,0000,0000,0000,,so in this case from this I want to Dialogue: 0,0:33:49.45,0:33:53.93,Default,,0000,0000,0000,,get a comma separated list of the top Dialogue: 0,0:33:51.74,0:33:56.12,Default,,0000,0000,0000,,user names which I can then do whatever Dialogue: 0,0:33:53.93,0:33:57.50,Default,,0000,0000,0000,,useful thing I might want maybe I want Dialogue: 0,0:33:56.12,0:33:59.15,Default,,0000,0000,0000,,to stick this in a config file of Dialogue: 
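The awk and paste step described above can be sketched like this. The count lines are made-up sample data in the shape that uniq -c produces, and the trailing - tells paste to read standard input.

```shell
# Hypothetical "count username" lines, as produced by the counting step.
counts='      2 admin
      3 root'

# awk splits each line on whitespace and prints only the second column;
# paste -s joins all lines into one, and -d, uses a comma as the delimiter.
list=$(printf '%s\n' "$counts" | awk '{ print $2 }' | paste -sd, -)
echo "$list"   # admin,root
```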
0,0:33:57.50,0:34:00.43,Default,,0000,0000,0000,,disallowed usernames or something along Dialogue: 0,0:33:59.15,0:34:04.04,Default,,0000,0000,0000,,those lines Dialogue: 0,0:34:00.43,0:34:05.72,Default,,0000,0000,0000,,um awk is worth talking a little bit Dialogue: 0,0:34:04.04,0:34:08.51,Default,,0000,0000,0000,,more about because it turns out to be a Dialogue: 0,0:34:05.72,0:34:12.86,Default,,0000,0000,0000,,really powerful language for this kind Dialogue: 0,0:34:08.51,0:34:16.19,Default,,0000,0000,0000,,of data wrangling we mentioned briefly Dialogue: 0,0:34:12.86,0:34:19.01,Default,,0000,0000,0000,,what this print $2 does but it Dialogue: 0,0:34:16.19,0:34:21.02,Default,,0000,0000,0000,,turns out that for awk you can do some Dialogue: 0,0:34:19.01,0:34:22.85,Default,,0000,0000,0000,,really really fancy things so for Dialogue: 0,0:34:21.02,0:34:25.13,Default,,0000,0000,0000,,example let's go back to here where we Dialogue: 0,0:34:22.85,0:34:29.42,Default,,0000,0000,0000,,just have the usernames I say let's Dialogue: 0,0:34:25.13,0:34:31.67,Default,,0000,0000,0000,,still do sort and uniq because Dialogue: 0,0:34:29.42,0:34:32.09,Default,,0000,0000,0000,,otherwise the list gets far too Dialogue: 0,0:34:31.67,0:34:34.04,Default,,0000,0000,0000,,long Dialogue: 0,0:34:32.09,0:34:36.80,Default,,0000,0000,0000,,and let's say that I only want to print Dialogue: 0,0:34:34.04,0:34:40.76,Default,,0000,0000,0000,,the usernames that match a particular Dialogue: 0,0:34:36.80,0:34:51.44,Default,,0000,0000,0000,,pattern let's say for example that I Dialogue: 0,0:34:40.76,0:34:56.57,Default,,0000,0000,0000,,want to see I want all of the usernames Dialogue: 0,0:34:51.44,0:34:59.60,Default,,0000,0000,0000,,that only appear once and that start Dialogue: 0,0:34:56.57,0:35:02.36,Default,,0000,0000,0000,,with a c and end with an e that's a Dialogue: 0,0:34:59.60,0:35:04.31,Default,,0000,0000,0000,,really weird thing to look for but in Dialogue: 
0,0:35:02.36,0:35:06.41,Default,,0000,0000,0000,,awk this is really simple to express I Dialogue: 0,0:35:04.31,0:35:11.20,Default,,0000,0000,0000,,can say I want the first column to be 1 Dialogue: 0,0:35:06.41,0:35:15.19,Default,,0000,0000,0000,,and I want the second column to match Dialogue: 0,0:35:11.20,0:35:15.19,Default,,0000,0000,0000,,the following regular expression Dialogue: 0,0:35:20.48,0:35:32.03,Default,,0000,0000,0000,,hey this could probably just be dot and Dialogue: 0,0:35:26.12,0:35:33.92,Default,,0000,0000,0000,,then I want to print the whole line so Dialogue: 0,0:35:32.03,0:35:36.23,Default,,0000,0000,0000,,unless I mess something up this will Dialogue: 0,0:35:33.92,0:35:38.90,Default,,0000,0000,0000,,give me all the usernames that start Dialogue: 0,0:35:36.23,0:35:42.86,Default,,0000,0000,0000,,with a c end with an e and only appear Dialogue: 0,0:35:38.90,0:35:44.78,Default,,0000,0000,0000,,once in my log now that might not be a Dialogue: 0,0:35:42.86,0:35:46.64,Default,,0000,0000,0000,,very useful thing to do with the data Dialogue: 0,0:35:44.78,0:35:48.23,Default,,0000,0000,0000,,what I'm trying to do in this lecture is Dialogue: 0,0:35:46.64,0:35:49.94,Default,,0000,0000,0000,,show you the kind of tools that are Dialogue: 0,0:35:48.23,0:35:51.62,Default,,0000,0000,0000,,available and in this particular case Dialogue: 0,0:35:49.94,0:35:53.18,Default,,0000,0000,0000,,this pattern is like not that Dialogue: 0,0:35:51.62,0:35:54.98,Default,,0000,0000,0000,,complicated even though what we're doing Dialogue: 0,0:35:53.18,0:35:58.34,Default,,0000,0000,0000,,is sort of weird and this is because Dialogue: 0,0:35:54.98,0:35:59.57,Default,,0000,0000,0000,,very often on Linux with Linux tools in Dialogue: 0,0:35:58.34,0:36:02.57,Default,,0000,0000,0000,,particular and command-line tools in Dialogue: 0,0:35:59.57,0:36:04.61,Default,,0000,0000,0000,,general the tools are built to be based Dialogue: 0,0:36:02.57,0:36:06.44,Default,,0000,0000,0000,,on lines of 
input and lines of output Dialogue: 0,0:36:04.61,0:36:09.08,Default,,0000,0000,0000,,and very often those lines are going to Dialogue: 0,0:36:06.44,0:36:18.08,Default,,0000,0000,0000,,have multiple columns and awk is Dialogue: 0,0:36:09.08,0:36:22.16,Default,,0000,0000,0000,,great for operating over columns now awk Dialogue: 0,0:36:18.08,0:36:26.75,Default,,0000,0000,0000,,is not just able to do things like Dialogue: 0,0:36:22.16,0:36:29.06,Default,,0000,0000,0000,,match per line but it lets you do things Dialogue: 0,0:36:26.75,0:36:31.22,Default,,0000,0000,0000,,like let's say I want the number of Dialogue: 0,0:36:29.06,0:36:32.90,Default,,0000,0000,0000,,these right I want to know how many user Dialogue: 0,0:36:31.22,0:36:36.83,Default,,0000,0000,0000,,names match this pattern well I can do Dialogue: 0,0:36:32.90,0:36:39.71,Default,,0000,0000,0000,,wc -l that works just fine all right Dialogue: 0,0:36:36.83,0:36:41.99,Default,,0000,0000,0000,,there are 31 such user names but awk is Dialogue: 0,0:36:39.71,0:36:44.78,Default,,0000,0000,0000,,a programming language this is something Dialogue: 0,0:36:41.99,0:36:46.82,Default,,0000,0000,0000,,that you will probably never end up Dialogue: 0,0:36:44.78,0:36:49.43,Default,,0000,0000,0000,,doing yourself but it's important to Dialogue: 0,0:36:46.82,0:36:53.20,Default,,0000,0000,0000,,know that you can every now and again it Dialogue: 0,0:36:49.43,0:36:53.20,Default,,0000,0000,0000,,is actually useful to know about these Dialogue: 0,0:36:53.62,0:37:02.42,Default,,0000,0000,0000,,this might be hard to read on my screen Dialogue: 0,0:36:57.14,0:37:04.96,Default,,0000,0000,0000,,I just realized let me try to fix that Dialogue: 0,0:37:02.42,0:37:04.96,Default,,0000,0000,0000,,in a second Dialogue: 0,0:37:07.30,0:37:17.65,Default,,0000,0000,0000,,let's do yeah apparently fish does not Dialogue: 0,0:37:14.47,0:37:19.75,Default,,0000,0000,0000,,want me to do that um so here BEGIN is a Dialogue: 
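The awk filter described above (first column equal to 1, second column starting with c and ending with e, print the whole line) might look like the sketch below. The input lines are hypothetical, and the regular expression uses the simpler dot form that the speaker suggests would probably work.

```shell
# Hypothetical "count username" lines.
counts='1 charlie
1 candice
4 clyde
1 bob
1 cafe'

# Pattern: count is exactly 1 AND username matches ^c.*e$; action: print the line.
matches=$(printf '%s\n' "$counts" | awk '$1 == 1 && $2 ~ /^c.*e$/ { print $0 }')
printf '%s\n' "$matches"

# Counting the matches with wc -l (tr strips padding that some wc versions add).
n=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "$n"
```

Here clyde is dropped because its count is 4, and bob is dropped because it does not match the regular expression.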
0,0:37:17.65,0:37:22.54,Default,,0000,0000,0000,,special pattern that only matches the Dialogue: 0,0:37:19.75,0:37:25.78,Default,,0000,0000,0000,,zeroth line END is a special pattern Dialogue: 0,0:37:22.54,0:37:28.18,Default,,0000,0000,0000,,that only matches after the last line Dialogue: 0,0:37:25.78,0:37:29.62,Default,,0000,0000,0000,,and then this is gonna be a normal Dialogue: 0,0:37:28.18,0:37:32.02,Default,,0000,0000,0000,,pattern that's matched against every Dialogue: 0,0:37:29.62,0:37:34.15,Default,,0000,0000,0000,,line so what I'm saying here is on the Dialogue: 0,0:37:32.02,0:37:36.58,Default,,0000,0000,0000,,zeroth line set the variable rows to Dialogue: 0,0:37:34.15,0:37:40.42,Default,,0000,0000,0000,,zero on every line that matches this Dialogue: 0,0:37:36.58,0:37:42.31,Default,,0000,0000,0000,,pattern increment rows and after you Dialogue: 0,0:37:40.42,0:37:44.92,Default,,0000,0000,0000,,have matched the last line print the Dialogue: 0,0:37:42.31,0:37:47.50,Default,,0000,0000,0000,,value of rows and this will have the Dialogue: 0,0:37:44.92,0:37:50.26,Default,,0000,0000,0000,,same effect as running wc -l but all Dialogue: 0,0:37:47.50,0:37:52.81,Default,,0000,0000,0000,,within awk in this particular instance like Dialogue: 0,0:37:50.26,0:37:55.60,Default,,0000,0000,0000,,wc -l is just fine but sometimes you want Dialogue: 0,0:37:52.81,0:37:57.43,Default,,0000,0000,0000,,to do things like you might want Dialogue: 0,0:37:55.60,0:37:59.11,Default,,0000,0000,0000,,to keep a dictionary or a map of some Dialogue: 0,0:37:57.43,0:38:01.12,Default,,0000,0000,0000,,kind you might want to compute Dialogue: 0,0:37:59.11,0:38:03.22,Default,,0000,0000,0000,,statistics you might want to do things Dialogue: 0,0:38:01.12,0:38:05.47,Default,,0000,0000,0000,,like I want the second match of this Dialogue: 0,0:38:03.22,0:38:07.63,Default,,0000,0000,0000,,pattern so you need a stateful matcher Dialogue: 0,0:38:05.47,0:38:09.10,Default,,0000,0000,0000,,like ignore the 
first match but then Dialogue: 0,0:38:07.63,0:38:11.14,Default,,0000,0000,0000,,print everything following the second Dialogue: 0,0:38:09.10,0:38:12.64,Default,,0000,0000,0000,,match and for that this kind of simple Dialogue: 0,0:38:11.14,0:38:18.49,Default,,0000,0000,0000,,programming in awk can be useful to know Dialogue: 0,0:38:12.64,0:38:22.93,Default,,0000,0000,0000,,about in fact we could in this pattern Dialogue: 0,0:38:18.49,0:38:24.79,Default,,0000,0000,0000,,get rid of sed and sort and uniq and Dialogue: 0,0:38:22.93,0:38:26.80,Default,,0000,0000,0000,,grep that we originally used to produce Dialogue: 0,0:38:24.79,0:38:28.21,Default,,0000,0000,0000,,this file and do it all in awk Dialogue: 0,0:38:26.80,0:38:30.88,Default,,0000,0000,0000,,but you probably don't want to do that Dialogue: 0,0:38:28.21,0:38:34.54,Default,,0000,0000,0000,,it would be probably too painful to be Dialogue: 0,0:38:30.88,0:38:37.36,Default,,0000,0000,0000,,worth it it's worth talking a little bit Dialogue: 0,0:38:34.54,0:38:38.100,Default,,0000,0000,0000,,about the other kinds of tools that you Dialogue: 0,0:38:37.36,0:38:41.17,Default,,0000,0000,0000,,might want to use on the command line Dialogue: 0,0:38:38.100,0:38:45.04,Default,,0000,0000,0000,,the first of these is a really handy Dialogue: 0,0:38:41.17,0:38:49.93,Default,,0000,0000,0000,,program called bc so bc is the Berkeley Dialogue: 0,0:38:45.04,0:38:51.45,Default,,0000,0000,0000,,calculator I believe man bc I think bc Dialogue: 0,0:38:49.93,0:38:54.07,Default,,0000,0000,0000,,is originally from Berkeley calculator Dialogue: 0,0:38:51.45,0:38:56.17,Default,,0000,0000,0000,,anyway it is a very simple command-line Dialogue: 0,0:38:54.07,0:38:58.96,Default,,0000,0000,0000,,calculator but instead of giving you a Dialogue: 0,0:38:56.17,0:39:00.76,Default,,0000,0000,0000,,prompt it reads from standard in so I Dialogue: 0,0:38:58.96,0:39:04.90,Default,,0000,0000,0000,,can do something like echo 1 plus 2 and Dialogue: 
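The BEGIN and END counter described earlier can be sketched as a single awk program. The input is hypothetical sample data, and rows is the variable name used in the lecture.

```shell
# Hypothetical "count username" lines.
counts='1 charlie
1 candice
4 clyde
1 bob
1 cafe'

# BEGIN runs before the first line, the middle rule runs on every matching
# line, and END runs after the last line has been processed.
rows=$(printf '%s\n' "$counts" | awk '
  BEGIN { rows = 0 }
  $1 == 1 && $2 ~ /^c.*e$/ { rows += 1 }
  END { print rows }
')
echo "$rows"   # 3
```

The result matches what piping the filtered lines through wc -l would give, but the state lives inside awk, which is the part that generalizes to things like keeping a map or computing statistics.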
0,0:39:00.76,0:39:06.79,Default,,0000,0000,0000,,pipe it to bc -l because many of Dialogue: 0,0:39:04.90,0:39:11.32,Default,,0000,0000,0000,,these programs normally operate in like Dialogue: 0,0:39:06.79,0:39:15.70,Default,,0000,0000,0000,,a stupid mode where they're unhelpful so Dialogue: 0,0:39:11.32,0:39:17.47,Default,,0000,0000,0000,,here it prints 3 wow very impressive but Dialogue: 0,0:39:15.70,0:39:19.78,Default,,0000,0000,0000,,it turns out this can be really handy Dialogue: 0,0:39:17.47,0:39:21.10,Default,,0000,0000,0000,,imagine you have a file with a bunch of Dialogue: 0,0:39:19.78,0:39:26.34,Default,,0000,0000,0000,,lines Dialogue: 0,0:39:21.10,0:39:32.02,Default,,0000,0000,0000,,let's say something like oh I don't know Dialogue: 0,0:39:26.34,0:39:35.02,Default,,0000,0000,0000,,this file and let's say I want to sum up Dialogue: 0,0:39:32.02,0:39:36.91,Default,,0000,0000,0000,,the number of logins the number of user Dialogue: 0,0:39:35.02,0:39:40.03,Default,,0000,0000,0000,,names that have not been used only once Dialogue: 0,0:39:36.91,0:39:43.87,Default,,0000,0000,0000,,all right so the ones where the count is Dialogue: 0,0:39:40.03,0:39:48.55,Default,,0000,0000,0000,,not equal to one I want to print just Dialogue: 0,0:39:43.87,0:39:50.95,Default,,0000,0000,0000,,the count right this gives me the Dialogue: 0,0:39:48.55,0:39:52.93,Default,,0000,0000,0000,,counts for all the non single-use user Dialogue: 0,0:39:50.95,0:39:55.18,Default,,0000,0000,0000,,names and then I want to know how many Dialogue: 0,0:39:52.93,0:39:56.74,Default,,0000,0000,0000,,are there of these notice that I can't Dialogue: 0,0:39:55.18,0:39:59.11,Default,,0000,0000,0000,,just count the lines that wouldn't work Dialogue: 0,0:39:56.74,0:40:02.20,Default,,0000,0000,0000,,right because there are numbers on each Dialogue: 0,0:39:59.11,0:40:05.95,Default,,0000,0000,0000,,line I want to sum well I can use paste Dialogue: 0,0:40:02.20,0:40:08.10,Default,,0000,0000,0000,,to paste 
by plus so this pastes every Dialogue: 0,0:40:05.95,0:40:12.04,Default,,0000,0000,0000,,line together into a plus expression Dialogue: 0,0:40:08.10,0:40:14.20,Default,,0000,0000,0000,,right and this is now an arithmetic Dialogue: 0,0:40:12.04,0:40:18.91,Default,,0000,0000,0000,,expression so I can pipe it through bc -l Dialogue: 0,0:40:14.20,0:40:20.92,Default,,0000,0000,0000,,and now there have been a hundred and Dialogue: 0,0:40:18.91,0:40:22.72,Default,,0000,0000,0000,,ninety one thousand logins that share a Dialogue: 0,0:40:20.92,0:40:25.54,Default,,0000,0000,0000,,username with at least one other login Dialogue: 0,0:40:22.72,0:40:27.70,Default,,0000,0000,0000,,again probably not something you really Dialogue: 0,0:40:25.54,0:40:29.56,Default,,0000,0000,0000,,care about but this is just to show you Dialogue: 0,0:40:27.70,0:40:34.36,Default,,0000,0000,0000,,that you can extract this data pretty Dialogue: 0,0:40:29.56,0:40:36.07,Default,,0000,0000,0000,,easily and there's all sorts of other Dialogue: 0,0:40:34.36,0:40:37.81,Default,,0000,0000,0000,,stuff you can do with this for example Dialogue: 0,0:40:36.07,0:40:40.81,Default,,0000,0000,0000,,there are tools that let you compute Dialogue: 0,0:40:37.81,0:40:43.66,Default,,0000,0000,0000,,statistics over inputs so for example Dialogue: 0,0:40:40.81,0:40:45.85,Default,,0000,0000,0000,,for this list of numbers where I Dialogue: 0,0:40:43.66,0:40:49.59,Default,,0000,0000,0000,,just took the numbers and just printed Dialogue: 0,0:40:45.85,0:40:54.88,Default,,0000,0000,0000,,out just the distribution of numbers I Dialogue: 0,0:40:49.59,0:40:56.08,Default,,0000,0000,0000,,could do things like use R R is a Dialogue: 0,0:40:54.88,0:40:57.64,Default,,0000,0000,0000,,separate programming language that's Dialogue: 0,0:40:56.08,0:41:02.23,Default,,0000,0000,0000,,specifically built for statistical Dialogue: 0,0:40:57.64,0:41:03.57,Default,,0000,0000,0000,,analysis and I can say let's see if I Dialogue: 
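The summing trick described above (keep the counts that are not 1, join them with plus signs, hand the expression to bc) can be sketched as follows. The counts are made-up sample values, and the last line assumes bc is installed.

```shell
# Hypothetical "count username" lines.
counts='13 zze
1 zzz
10 zxvf
4 root'

# Keep only the counts that are not 1, then join the lines with "+".
expression=$(printf '%s\n' "$counts" | awk '$1 != 1 { print $1 }' | paste -sd+ -)
echo "$expression"            # 13+10+4

# bc reads the arithmetic expression from standard input and evaluates it.
echo "$expression" | bc -l    # 27
```

Counting lines would give 3 here, which is the number of usernames rather than the number of logins, which is why the values have to be summed.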
0,0:41:02.23,0:41:06.28,Default,,0000,0000,0000,,got this right Dialogue: 0,0:41:03.57,0:41:10.44,Default,,0000,0000,0000,,this is again a different programming Dialogue: 0,0:41:06.28,0:41:13.21,Default,,0000,0000,0000,,language that you would have to learn Dialogue: 0,0:41:10.44,0:41:14.20,Default,,0000,0000,0000,,but if you already know R or you can Dialogue: 0,0:41:13.21,0:41:23.86,Default,,0000,0000,0000,,pipe them through other languages Dialogue: 0,0:41:14.20,0:41:26.38,Default,,0000,0000,0000,,too like so so this gives me summary Dialogue: 0,0:41:23.86,0:41:30.16,Default,,0000,0000,0000,,statistics over that input stream of Dialogue: 0,0:41:26.38,0:41:33.31,Default,,0000,0000,0000,,numbers so the median number of login Dialogue: 0,0:41:30.16,0:41:34.33,Default,,0000,0000,0000,,attempts per user name is 3 the max is Dialogue: 0,0:41:33.31,0:41:35.98,Default,,0000,0000,0000,,10,000 that was root Dialogue: 0,0:41:34.33,0:41:39.25,Default,,0000,0000,0000,,we saw before and it'll tell me the average Dialogue: 0,0:41:35.98,0:41:40.60,Default,,0000,0000,0000,,was 8 for this might not matter in this Dialogue: 0,0:41:39.25,0:41:42.04,Default,,0000,0000,0000,,particular instance like this might not Dialogue: 0,0:41:40.60,0:41:43.66,Default,,0000,0000,0000,,be interesting numbers but if you're Dialogue: 0,0:41:42.04,0:41:45.79,Default,,0000,0000,0000,,looking at things like output from your Dialogue: 0,0:41:43.66,0:41:46.78,Default,,0000,0000,0000,,benchmarking script or something else Dialogue: 0,0:41:45.79,0:41:48.52,Default,,0000,0000,0000,,where you have some numerical Dialogue: 0,0:41:46.78,0:41:52.90,Default,,0000,0000,0000,,distribution and you want to look at Dialogue: 0,0:41:48.52,0:41:54.25,Default,,0000,0000,0000,,them these tools are really handy we can Dialogue: 0,0:41:52.90,0:41:57.64,Default,,0000,0000,0000,,even do some simple plotting if we Dialogue: 0,0:41:54.25,0:42:01.33,Default,,0000,0000,0000,,wanted to right so this has a bunch of Dialogue: 
0,0:41:57.64,0:42:06.22,Default,,0000,0000,0000,,numbers let's do let's go back to our Dialogue: 0,0:42:01.33,0:42:11.86,Default,,0000,0000,0000,,sort -nk1,1 and look at only the Dialogue: 0,0:42:06.22,0:42:17.77,Default,,0000,0000,0000,,top 5 gnuplot is a plotter that lets Dialogue: 0,0:42:11.86,0:42:19.15,Default,,0000,0000,0000,,you take things from standard in I'm not Dialogue: 0,0:42:17.77,0:42:22.48,Default,,0000,0000,0000,,expecting you to know all of these Dialogue: 0,0:42:19.15,0:42:23.95,Default,,0000,0000,0000,,programming languages because they Dialogue: 0,0:42:22.48,0:42:25.81,Default,,0000,0000,0000,,really are programming languages in Dialogue: 0,0:42:23.95,0:42:30.58,Default,,0000,0000,0000,,their own right but it is just to show you Dialogue: 0,0:42:25.81,0:42:34.36,Default,,0000,0000,0000,,what is possible right so this is now a Dialogue: 0,0:42:30.58,0:42:37.36,Default,,0000,0000,0000,,histogram of how many times each of the Dialogue: 0,0:42:34.36,0:42:41.02,Default,,0000,0000,0000,,top 5 user names have been used for my Dialogue: 0,0:42:37.36,0:42:43.81,Default,,0000,0000,0000,,server since January 1st and it's just Dialogue: 0,0:42:41.02,0:42:45.34,Default,,0000,0000,0000,,one command line it's a somewhat Dialogue: 0,0:42:43.81,0:42:48.57,Default,,0000,0000,0000,,complicated command line but it's just Dialogue: 0,0:42:45.34,0:42:48.57,Default,,0000,0000,0000,,one command line thing that you can do Dialogue: 0,0:42:50.52,0:42:54.79,Default,,0000,0000,0000,,there are two sort of special types of Dialogue: 0,0:42:53.59,0:42:56.29,Default,,0000,0000,0000,,data wrangling that I want to talk to Dialogue: 0,0:42:54.79,0:42:58.42,Default,,0000,0000,0000,,you about in the last little bit Dialogue: 0,0:42:56.29,0:43:01.98,Default,,0000,0000,0000,,of time that we have and the first one Dialogue: 0,0:42:58.42,0:43:07.75,Default,,0000,0000,0000,,is command line argument wrangling Dialogue: 0,0:43:01.98,0:43:09.22,Default,,0000,0000,0000,,sometimes 
you might have something that Dialogue: 0,0:43:07.75,0:43:11.14,Default,,0000,0000,0000,,actually we looked at in the last Dialogue: 0,0:43:09.22,0:43:14.17,Default,,0000,0000,0000,,lecture like you have things like find Dialogue: 0,0:43:11.14,0:43:17.76,Default,,0000,0000,0000,,that produces a list of files or maybe Dialogue: 0,0:43:14.17,0:43:17.76,Default,,0000,0000,0000,,something that produces a list of Dialogue: 0,0:43:19.38,0:43:23.08,Default,,0000,0000,0000,,arguments for your benchmarking script Dialogue: 0,0:43:21.94,0:43:24.67,Default,,0000,0000,0000,,like you want to run it with a Dialogue: 0,0:43:23.08,0:43:26.02,Default,,0000,0000,0000,,particular distribution of arguments Dialogue: 0,0:43:24.67,0:43:28.81,Default,,0000,0000,0000,,like let's say you had a script that Dialogue: 0,0:43:26.02,0:43:29.98,Default,,0000,0000,0000,,printed the number of iterations to run Dialogue: 0,0:43:28.81,0:43:31.63,Default,,0000,0000,0000,,a particular project and you wanted like Dialogue: 0,0:43:29.98,0:43:33.52,Default,,0000,0000,0000,,an exponential distribution or something Dialogue: 0,0:43:31.63,0:43:35.50,Default,,0000,0000,0000,,and this prints the number of iterations Dialogue: 0,0:43:33.52,0:43:37.96,Default,,0000,0000,0000,,on each line and you want to run your Dialogue: 0,0:43:35.50,0:43:39.19,Default,,0000,0000,0000,,benchmark for each one well here is a Dialogue: 0,0:43:37.96,0:43:43.42,Default,,0000,0000,0000,,tool called xargs Dialogue: 0,0:43:39.19,0:43:46.21,Default,,0000,0000,0000,,that's your friend so xargs takes lines Dialogue: 0,0:43:43.42,0:43:47.62,Default,,0000,0000,0000,,of input and turns them into arguments Dialogue: 0,0:43:46.21,0:43:50.17,Default,,0000,0000,0000,,and this might Dialogue: 0,0:43:47.62,0:43:52.27,Default,,0000,0000,0000,,look a little weird see if I can come up Dialogue: 0,0:43:50.17,0:43:55.48,Default,,0000,0000,0000,,with a good example for this so I Dialogue: 0,0:43:52.27,0:43:56.77,Default,,0000,0000,0000,,program in Rust 
and Rust lets you Dialogue: 0,0:43:55.48,0:43:58.54,Default,,0000,0000,0000,,install multiple versions of the Dialogue: 0,0:43:56.77,0:44:01.36,Default,,0000,0000,0000,,compiler so in this case you can see Dialogue: 0,0:43:58.54,0:44:04.42,Default,,0000,0000,0000,,that I have stable beta I have a couple Dialogue: 0,0:44:01.36,0:44:05.86,Default,,0000,0000,0000,,of earlier stable releases and I have Dialogue: 0,0:44:04.42,0:44:08.98,Default,,0000,0000,0000,,a bunch of different dated nightlies Dialogue: 0,0:44:05.86,0:44:12.01,Default,,0000,0000,0000,,and this is all very well but over time Dialogue: 0,0:44:08.98,0:44:14.14,Default,,0000,0000,0000,,like I don't really need the nightly Dialogue: 0,0:44:12.01,0:44:14.89,Default,,0000,0000,0000,,version from like March of last year Dialogue: 0,0:44:14.14,0:44:16.45,Default,,0000,0000,0000,,anymore Dialogue: 0,0:44:14.89,0:44:17.71,Default,,0000,0000,0000,,I can probably delete that every now and Dialogue: 0,0:44:16.45,0:44:21.55,Default,,0000,0000,0000,,again and maybe I want to clean these up Dialogue: 0,0:44:17.71,0:44:25.33,Default,,0000,0000,0000,,a little well this is a list of lines so Dialogue: 0,0:44:21.55,0:44:29.77,Default,,0000,0000,0000,,I can grep for nightly I can get rid of Dialogue: 0,0:44:25.33,0:44:32.17,Default,,0000,0000,0000,,so -v is don't match I don't want to Dialogue: 0,0:44:29.77,0:44:34.54,Default,,0000,0000,0000,,match the current nightly okay so Dialogue: 0,0:44:32.17,0:44:37.81,Default,,0000,0000,0000,,this is now a list of dated nightlies Dialogue: 0,0:44:34.54,0:44:42.73,Default,,0000,0000,0000,,maybe I want only the ones from 2019 Dialogue: 0,0:44:37.81,0:44:45.37,Default,,0000,0000,0000,,and now I want to remove each of these Dialogue: 0,0:44:42.73,0:44:48.34,Default,,0000,0000,0000,,toolchains from my machine I could copy Dialogue: 0,0:44:45.37,0:44:52.63,Default,,0000,0000,0000,,paste each one into so there's a rustup Dialogue: 
0,0:44:48.34,0:44:56.11,Default,,0000,0000,0000,,toolchain remove or uninstall maybe Dialogue: 0,0:44:52.63,0:44:58.06,Default,,0000,0000,0000,,toolchain uninstall right so I could Dialogue: 0,0:44:56.11,0:44:59.47,Default,,0000,0000,0000,,manually type out the name of each one Dialogue: 0,0:44:58.06,0:45:01.03,Default,,0000,0000,0000,,or copy/paste them but that Dialogue: 0,0:44:59.47,0:45:03.70,Default,,0000,0000,0000,,gets annoying really quickly because I Dialogue: 0,0:45:01.03,0:45:10.66,Default,,0000,0000,0000,,have the list right here so instead how Dialogue: 0,0:45:03.70,0:45:14.89,Default,,0000,0000,0000,,about I sed away this sort of this Dialogue: 0,0:45:10.66,0:45:17.77,Default,,0000,0000,0000,,suffix that it adds right so now it's Dialogue: 0,0:45:14.89,0:45:20.80,Default,,0000,0000,0000,,just that and then I use xargs so Dialogue: 0,0:45:17.77,0:45:23.77,Default,,0000,0000,0000,,xargs takes a list of inputs and turns Dialogue: 0,0:45:20.80,0:45:27.06,Default,,0000,0000,0000,,them into arguments so I want this to Dialogue: 0,0:45:23.77,0:45:30.73,Default,,0000,0000,0000,,become arguments to rustup toolchain Dialogue: 0,0:45:27.06,0:45:32.71,Default,,0000,0000,0000,,uninstall and just for my own sanity's Dialogue: 0,0:45:30.73,0:45:33.91,Default,,0000,0000,0000,,sake I'm gonna make this echo just so Dialogue: 0,0:45:32.71,0:45:36.46,Default,,0000,0000,0000,,it's going to show which command it's Dialogue: 0,0:45:33.91,0:45:39.46,Default,,0000,0000,0000,,gonna run well it's relatively unhelpful Dialogue: 0,0:45:36.46,0:45:41.77,Default,,0000,0000,0000,,or hard to read but at least you see Dialogue: 0,0:45:39.46,0:45:43.99,Default,,0000,0000,0000,,the command it's going to execute if I Dialogue: 0,0:45:41.77,0:45:45.55,Default,,0000,0000,0000,,remove this echo is rustup toolchain Dialogue: 0,0:45:43.99,0:45:47.52,Default,,0000,0000,0000,,uninstall and then the list of Dialogue: 
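The xargs dry run described above can be sketched as follows. The toolchain names are hypothetical stand-ins for the listing shown in the lecture, the sed expression is one plausible way to strip the platform suffix, and echo is kept in front so nothing is actually uninstalled.

```shell
# Hypothetical toolchain names in the style of the listing shown in the lecture.
toolchains='stable-x86_64-unknown-linux-gnu
nightly-x86_64-unknown-linux-gnu
nightly-2019-01-05-x86_64-unknown-linux-gnu
nightly-2018-11-20-x86_64-unknown-linux-gnu
nightly-2019-03-12-x86_64-unknown-linux-gnu'

# Keep the nightlies, drop the current (undated) nightly with grep -v,
# keep only the ones from 2019, strip the platform suffix with sed, and let
# xargs turn the remaining lines into arguments of a single command.
cmd=$(printf '%s\n' "$toolchains" \
  | grep nightly \
  | grep -v 'nightly-x86' \
  | grep 2019 \
  | sed 's/-x86.*//' \
  | xargs echo rustup toolchain uninstall)
echo "$cmd"
```

Dropping the echo would make xargs run rustup toolchain uninstall with the two dated nightlies as arguments, so checking the echoed command first is a cheap safety step.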
0,0:45:45.55,0:45:51.13,Default,,0000,0000,0000,,nightlies as arguments to that program Dialogue: 0,0:45:47.52,0:45:52.63,Default,,0000,0000,0000,,and so if I run this it uninstalls Dialogue: 0,0:45:51.13,0:45:56.11,Default,,0000,0000,0000,,every toolchain instead of me having to Dialogue: 0,0:45:52.63,0:45:57.52,Default,,0000,0000,0000,,copy paste them so this is one example Dialogue: 0,0:45:56.11,0:45:59.11,Default,,0000,0000,0000,,where this kind of data wrangling Dialogue: 0,0:45:57.52,0:46:00.67,Default,,0000,0000,0000,,actually can be useful for other tasks Dialogue: 0,0:45:59.11,0:46:01.48,Default,,0000,0000,0000,,than just looking at data it's just Dialogue: 0,0:46:00.67,0:46:04.42,Default,,0000,0000,0000,,going from one Dialogue: 0,0:46:01.48,0:46:07.15,Default,,0000,0000,0000,,format to another you can also wrangle Dialogue: 0,0:46:04.42,0:46:09.55,Default,,0000,0000,0000,,binary data so a good example of this is Dialogue: 0,0:46:07.15,0:46:11.71,Default,,0000,0000,0000,,stuff like videos and images where you Dialogue: 0,0:46:09.55,0:46:14.77,Default,,0000,0000,0000,,might actually want to operate over them Dialogue: 0,0:46:11.71,0:46:17.11,Default,,0000,0000,0000,,in some interesting way so for example Dialogue: 0,0:46:14.77,0:46:19.72,Default,,0000,0000,0000,,there's a tool called ffmpeg ffmpeg is Dialogue: 0,0:46:17.11,0:46:23.08,Default,,0000,0000,0000,,for encoding and decoding video and to Dialogue: 0,0:46:19.72,0:46:24.31,Default,,0000,0000,0000,,some extent images I'm gonna set its log Dialogue: 0,0:46:23.08,0:46:26.80,Default,,0000,0000,0000,,level to panic because otherwise it Dialogue: 0,0:46:24.31,0:46:30.73,Default,,0000,0000,0000,,prints a bunch of stuff I want it to Dialogue: 0,0:46:26.80,0:46:34.57,Default,,0000,0000,0000,,read from /dev/video0 which is my video Dialogue: 0,0:46:30.73,0:46:37.30,Default,,0000,0000,0000,,of my webcam video device and I want it Dialogue: 0,0:46:34.57,0:46:40.42,Default,,0000,0000,0000,,to take the first 
frame so I just want it Dialogue: 0,0:46:37.30,0:46:42.67,Default,,0000,0000,0000,,to take a picture and I want it to take Dialogue: 0,0:46:40.42,0:46:45.79,Default,,0000,0000,0000,,an image rather than a single frame Dialogue: 0,0:46:42.67,0:46:48.07,Default,,0000,0000,0000,,video file and I want it to print its Dialogue: 0,0:46:45.79,0:46:50.41,Default,,0000,0000,0000,,output so the image it captures to Dialogue: 0,0:46:48.07,0:46:52.57,Default,,0000,0000,0000,,standard output - is usually the way you Dialogue: 0,0:46:50.41,0:46:54.43,Default,,0000,0000,0000,,tell the program to use standard input Dialogue: 0,0:46:52.57,0:46:56.20,Default,,0000,0000,0000,,or output rather than a given file so Dialogue: 0,0:46:54.43,0:46:58.93,Default,,0000,0000,0000,,here it expects a file name and the file Dialogue: 0,0:46:56.20,0:47:00.79,Default,,0000,0000,0000,,name - means standard output in this Dialogue: 0,0:46:58.93,0:47:02.55,Default,,0000,0000,0000,,context and then I want to pipe that Dialogue: 0,0:47:00.79,0:47:05.50,Default,,0000,0000,0000,,through a program called convert Dialogue: 0,0:47:02.55,0:47:08.17,Default,,0000,0000,0000,,convert is an image manipulation program Dialogue: 0,0:47:05.50,0:47:12.28,Default,,0000,0000,0000,,I want to tell convert to read from Dialogue: 0,0:47:08.17,0:47:16.05,Default,,0000,0000,0000,,standard input and turn the image into Dialogue: 0,0:47:12.28,0:47:19.39,Default,,0000,0000,0000,,the color space gray and then write the Dialogue: 0,0:47:16.05,0:47:22.12,Default,,0000,0000,0000,,resulting image into the file - which is Dialogue: 0,0:47:19.39,0:47:25.12,Default,,0000,0000,0000,,standard output and I then want to pipe Dialogue: 0,0:47:22.12,0:47:28.72,Default,,0000,0000,0000,,that into gzip we're just gonna compress Dialogue: 0,0:47:25.12,0:47:30.58,Default,,0000,0000,0000,,this image file and that's also going to Dialogue: 0,0:47:28.72,0:47:33.45,Default,,0000,0000,0000,,just operate on standard input standard Dialogue: 
0,0:47:30.58,0:47:37.78,Default,,0000,0000,0000,,output and then I'm going to pipe that Dialogue: 0,0:47:33.45,0:47:41.35,Default,,0000,0000,0000,,to my remote server and on that I'm Dialogue: 0,0:47:37.78,0:47:44.05,Default,,0000,0000,0000,,going to decode that image and then I'm Dialogue: 0,0:47:41.35,0:47:46.84,Default,,0000,0000,0000,,gonna store a copy of that image so Dialogue: 0,0:47:44.05,0:47:49.03,Default,,0000,0000,0000,,remember tee reads input prints it to Dialogue: 0,0:47:46.84,0:47:51.25,Default,,0000,0000,0000,,standard out and to a file this is gonna Dialogue: 0,0:47:49.03,0:47:55.75,Default,,0000,0000,0000,,make a copy of the decoded image file Dialogue: 0,0:47:51.25,0:47:58.21,Default,,0000,0000,0000,,as copy.png and then it's gonna Dialogue: 0,0:47:55.75,0:48:00.55,Default,,0000,0000,0000,,continue to stream that out so now I'm Dialogue: 0,0:47:58.21,0:48:04.99,Default,,0000,0000,0000,,gonna bring that back into a local Dialogue: 0,0:48:00.55,0:48:07.24,Default,,0000,0000,0000,,stream and here I'm going to display Dialogue: 0,0:48:04.99,0:48:08.55,Default,,0000,0000,0000,,that in an image displayer let's see Dialogue: 0,0:48:07.24,0:48:13.24,Default,,0000,0000,0000,,if that works Dialogue: 0,0:48:08.55,0:48:15.05,Default,,0000,0000,0000,,hey right so this now did a round-trip Dialogue: 0,0:48:13.24,0:48:18.34,Default,,0000,0000,0000,,to my server Dialogue: 0,0:48:15.05,0:48:21.38,Default,,0000,0000,0000,,and then came back over pipes and Dialogue: 0,0:48:18.34,0:48:23.06,Default,,0000,0000,0000,,there's now a computer there's a Dialogue: 0,0:48:21.38,0:48:25.82,Default,,0000,0000,0000,,decompressed version of this file at Dialogue: 0,0:48:23.06,0:48:29.36,Default,,0000,0000,0000,,least in theory on my server let's see Dialogue: 0,0:48:25.82,0:48:38.18,Default,,0000,0000,0000,,if that's there scp tsp copy.png to Dialogue: 0,0:48:29.36,0:48:40.90,Default,,0000,0000,0000,,here and CP 8 yeah hey same file ended Dialogue: 
0,0:48:38.18,0:48:43.58,Default,,0000,0000,0000,,up on the server so our pipeline worked Dialogue: 0,0:48:40.90,0:48:45.89,Default,,0000,0000,0000,,again this is a sort of silly example Dialogue: 0,0:48:43.58,0:48:48.29,Default,,0000,0000,0000,,but lets you see the power of building Dialogue: 0,0:48:45.89,0:48:50.15,Default,,0000,0000,0000,,these pipelines where it doesn't have to Dialogue: 0,0:48:48.29,0:48:52.31,Default,,0000,0000,0000,,be textual data it's just taking data Dialogue: 0,0:48:50.15,0:48:55.10,Default,,0000,0000,0000,,from any format to any other like for Dialogue: 0,0:48:52.31,0:48:58.28,Default,,0000,0000,0000,,example if I wanted to I can do cat Dialogue: 0,0:48:55.10,0:49:00.71,Default,,0000,0000,0000,,/dev/video0 and then pipe that to a server Dialogue: 0,0:48:58.28,0:49:02.66,Default,,0000,0000,0000,,that like Anish controls and then he Dialogue: 0,0:49:00.71,0:49:05.42,Default,,0000,0000,0000,,could watch that video stream by piping Dialogue: 0,0:49:02.66,0:49:08.90,Default,,0000,0000,0000,,it into a video player on his machine if Dialogue: 0,0:49:05.42,0:49:13.10,Default,,0000,0000,0000,,we wanted to right you just need to know Dialogue: 0,0:49:08.90,0:49:15.20,Default,,0000,0000,0000,,that these things exist there are a bunch Dialogue: 0,0:49:13.10,0:49:17.18,Default,,0000,0000,0000,,of exercises for this lab and some of Dialogue: 0,0:49:15.20,0:49:19.31,Default,,0000,0000,0000,,them rely on you having a data source Dialogue: 0,0:49:17.18,0:49:21.11,Default,,0000,0000,0000,,that looks a little bit like a log on Dialogue: 0,0:49:19.31,0:49:22.46,Default,,0000,0000,0000,,macOS and Linux we give you some Dialogue: 0,0:49:21.11,0:49:24.59,Default,,0000,0000,0000,,commands you can try to experiment with Dialogue: 0,0:49:22.46,0:49:26.63,Default,,0000,0000,0000,,but keep in mind that it's not Dialogue: 0,0:49:24.59,0:49:28.97,Default,,0000,0000,0000,,that important exactly what data source Dialogue: 
0,0:49:26.63,0:49:30.29,Default,,0000,0000,0000,,you use this is more find some data Dialogue: 0,0:49:28.97,0:49:32.24,Default,,0000,0000,0000,,source where you think there might Dialogue: 0,0:49:30.29,0:49:33.68,Default,,0000,0000,0000,,be an interesting signal and then try to Dialogue: 0,0:49:32.24,0:49:35.51,Default,,0000,0000,0000,,extract something interesting from it Dialogue: 0,0:49:33.68,0:49:38.66,Default,,0000,0000,0000,,that is what all of the exercises are Dialogue: 0,0:49:35.51,0:49:41.24,Default,,0000,0000,0000,,about we will not have class on Monday Dialogue: 0,0:49:38.66,0:49:43.37,Default,,0000,0000,0000,,because it's MLK Day so next lecture Dialogue: 0,0:49:41.24,0:49:45.44,Default,,0000,0000,0000,,will be Tuesday on command line Dialogue: 0,0:49:43.37,0:49:47.42,Default,,0000,0000,0000,,environments any questions about what Dialogue: 0,0:49:45.44,0:49:51.41,Default,,0000,0000,0000,,we've covered so far or the pipelines or Dialogue: 0,0:49:47.42,0:49:52.79,Default,,0000,0000,0000,,regular expressions I really recommend Dialogue: 0,0:49:51.41,0:49:54.80,Default,,0000,0000,0000,,that you look into regular expressions Dialogue: 0,0:49:52.79,0:49:57.23,Default,,0000,0000,0000,,and try to learn them they are extremely Dialogue: 0,0:49:54.80,0:49:59.30,Default,,0000,0000,0000,,handy both for this and in programming Dialogue: 0,0:49:57.23,0:50:00.44,Default,,0000,0000,0000,,in general and if you have any questions Dialogue: 0,0:49:59.30,0:50:02.56,Default,,0000,0000,0000,,come to office hours and we'll help you Dialogue: 0,0:50:00.44,0:50:02.56,Default,,0000,0000,0000,,out
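
The xargs step from the toolchain cleanup earlier in this lecture can be sketched roughly as follows. This is a dry run under stated assumptions, not the exact command typed on screen: the toolchain names are made up, the `toolchains` variable is a hypothetical stand-in for the real toolchain list, and the `grep` filter and the `sed` pattern are guesses at how the list was narrowed and the suffix removed.

```shell
# Hypothetical stand-in for the real toolchain list;
# these names are invented for illustration only.
toolchains='stable-x86_64-unknown-linux-gnu (default)
nightly-2019-01-01-x86_64-unknown-linux-gnu
nightly-2019-02-01-x86_64-unknown-linux-gnu'

# Keep only the nightlies, strip the platform suffix with sed, and let
# xargs turn the remaining lines into arguments of a single command.
# The leading `echo` makes this a dry run: the command is printed, not run.
dry_run=$(printf '%s\n' "$toolchains" \
  | grep nightly \
  | sed 's/-x86_64.*//' \
  | xargs echo rustup toolchain uninstall)

printf '%s\n' "$dry_run"
```

Dropping `echo` from the xargs invocation would run the uninstall for real, so it is worth keeping the dry run until the printed command looks right.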