Hello everyone. We are getting started here on our August lunch and learn session, presented by Kinney Group's Atlas Customer Experience team. My name is Alice Devaney. I am the engineering manager for the Atlas Customer Experience team, and I'm excited to be presenting this month's session on intermediate-level Splunk searching. So thank you all for attending. I hope you get some good ideas out of this. I certainly encourage engagement through the chat, and I'll have some information at the end on following up and speaking with my team directly on any issues or interests that you have around the types of concepts we're going to cover today.

So, jumping into an intermediate-level session: I do want to say that we have previously done a basic-level searching session, and we are really progressing from that, picking up right where we left off. We've done that session with quite a few of our customers individually, and if you're interested in doing that session, or this one, with a larger team, we're happy to discuss and coordinate that.
So, getting started, we're going to take a look at the final search from our basic search session. We're going to walk through that and understand some of the concepts, and then we're going to take a step back and look a little more generally at SPL operations, understanding how different commands apply to data. That is really the next level of understanding for how you can write more complex searches and know when to use certain types of commands. And of course, in this session we're going to have a series of demos using a few specific commands, highlighting the different SPL command types that we discuss in the second portion. You'll get to see that on the tutorial data, which you can also use in your own environment, or a test environment, very simply.

So I will always encourage, especially with search content, that you look into the additional resources that I have listed here. The search reference documentation is one of my favorite bookmarks, one that I use frequently in my own environments and when working in customer environments.
It is really the best quick resource for information on the syntax and examples of any search command, and it is always a great resource to have. The search manual is a little more conceptual, but as you're learning more about different types of search operations, it's very helpful to be able to review this documentation and have reference material you can come back to as you study and get better at writing more complex search content. I have also linked here the documentation on how to use the Splunk tutorial data. If you've not done that before, it's a very simple process, and Splunk provides consistently updated download files that you can upload directly into any Splunk environment. That's what I'm going to be using today, and provided that you search over the appropriate time windows for when you downloaded the tutorial dataset, these searches will work on the tutorial data as well.
So I highly encourage, if you want to go through and test out some of the content after the fact: you'll be able to access a recording, and if you'd like the slides I'm presenting from today, which I highly encourage because there are a lot of useful links in here, reach out to my team. Again, right at the end of the slides we'll have that info.

So, looking at our overview of basic search, I just want to cover conceptually the two categories that we discuss in that session. The first is the statistical and charting functions, which in those demos consist of aggregate and time functions. Aggregate functions are going to be your commonly used statistical functions meant for summarization, while time functions actually use the timestamp field _time, or any other time that you've extracted from the data, looking at earliest and latest relative time values in a summative fashion. Evaluation functions are the separate category, where we discuss comparison and conditional statements, using your if and your case functions in evals.
There are also datetime functions that apply operations to events individually, so not necessarily summarization, but interacting with the time values themselves, maybe changing the time format. And then there are multivalue eval functions; we touch on those very lightly, and the treatment is more conceptual in basic search. So today, as part of our demo, we're going to dive in and look at multivalue eval functions later in the presentation.

On this slide I have highlighted in gray the search that we end basic search with. It is broken up into three segments, where the first line is a filter down to a dataset. This is, very simply, how you source most of your data in most of your searches in Splunk, and we always want to be as specific as possible. The logical way to do that, most often, is by identifying an index and a sourcetype, and possibly some specific values of given fields in that data, before you start applying other operations. In our case, we want to work with the whole dataset, and then we move into applying our eval statements.
The purpose of the evals is to create some new fields to work with, and we have two operations here. You can see that on the first line we're starting with an error check field. These are web access logs, so we're looking at the HTTP status codes in the status field, and we have a logical condition here: for anything greater than or equal to 400, we want to return "error". It's a very simple example, kept as easy as possible. If you want to get specific about your 200s and your 300s, it's the exact same type of logic; you would likely apply a case statement to get additional conditions and more granular output in an error check field, or some other field indicating what you want to see out of your status code. In this case, it's simply "error", or the value "non-error" if we have, say, a 200.

We're also using a time function to create a second field called day. You may be familiar with some of the fields that you get by default for most any event in Splunk that are breakdowns of the timestamp; you have day, month, and many others.
In this case, I want a specific format for day, so we use the strftime function with a time format variable applied to the actual extracted timestamp for Splunk, _time. So coming out of the second line, we've accessed our data and created two new fields to use, and then we actually perform the charting with a statistical function, using timechart. We can see here that we are counting the events that have the "error" value for our created error check field.

So I'm going to pivot over to Splunk here, and we're going to look at this search. I have commented out most of the logic, and we'll step back through it. We are looking at our web access log events here, and we then want to apply our eval. By applying the eval, we get our error check field that provides "error" or "non-error", and we're seeing that we have mostly non-error events. Then we have the day field, and day is providing the full name of the day of the week from the timestamp for all of these events.
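For reference, the final basic-search query being walked through here would look something like the following. This is a sketch reconstructed from the discussion; the index and sourcetype names (main, access_combined_wcookie) are assumptions based on the standard Splunk tutorial data, not confirmed from the slides:

```spl
index=main sourcetype=access_combined_wcookie
| eval error_check=if(status>=400, "error", "non-error"), day=strftime(_time, "%A")
| timechart count(eval(error_check="error")) AS errors BY day
```

The first line filters to the dataset, the eval line creates the two working fields, and timechart performs the summarization.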
So with our timechart, this is the summarization, with one caveat: we're spanning by default over a single day, so it may not be very logical to split by the day field when we are already using a timechart command that divides our results by the time bin, effectively a span of one day. We were able to see, with the counts in the individual days split not only through the timechart but also by the day field, that we only had values where the two matched up on the actual day. But what we can do is change our split-by field to host and get a more reasonable presentation. So here we have our hosts one, two, and three, and then, across days, counts of the error events that we observe.

So that is the search that we end on in basic search. The concepts there are accessing our data and searching in a descriptive manner using our metadata fields, the index and the sourcetype; the evaluation functions, where we're creating new fields and manipulating data; and then a timechart function that provides summarized statistics based on a time range.
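The adjusted version, with the split-by field changed from day to host, might look like this, again assuming the tutorial index and sourcetype names:

```spl
index=main sourcetype=access_combined_wcookie
| eval error_check=if(status>=400, "error", "non-error")
| timechart span=1d count(eval(error_check="error")) AS errors BY host
```

This yields one column per host, with a daily count of error events in each row.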
So we will pivot back, and we're going to step out of the SPL for a second to talk about the different kinds of search operations that we just performed. You'll hear these terms if you are really diving deeper into the actual operations of Splunk searching, and you can get very detailed regarding the optimization of searches around these types of commands and the order in which you choose to execute SPL. Today I'm going to focus on how these operations actually apply to the data, helping you make better decisions about which commands are best for the scenario that you have or the output that you want to see. In future sessions we will discuss the actual optimization of searches through this optimal ordering of functions, and some other means. But just a caveat there: today we're going to talk pretty specifically about these command types individually, how they work with data, and then how you see them in combination.

So, our types of SPL commands: the top three, in bold, are what we'll focus on in our examples.
The first of these is streaming operations, which are executed on individual events as they're returned by a search. You can think of this like your evals: something is going to be done to every single event, modifying fields where they're available.

We also have generating commands. Generating commands are used situationally, where you're sourcing data from non-indexed datasets. You would see that with inputlookup commands, or maybe tstats pulling information from the tsidx files and generating statistical output based on the data available there.

Transforming commands you will see about as often as streaming commands, generally speaking, and more often than generating commands. Transforming is intended to order results into a data table, and I often think of this much like how we discuss the statistical functions in basic search, as summarization functions, where you're looking to condense your overall dataset into manageable, consumable results.
So these 00:13:24.880 --> 00:13:28.320 operations that apply that summarization 00:13:28.320 --> 00:13:31.720 are transforming. We do have two 00:13:31.720 --> 00:13:35.600 additional types of SPL commands, the 00:13:35.600 --> 00:13:39.480 first is orchestrating. You can read 00:13:39.480 --> 00:13:41.680 about these, I will not discuss in great 00:13:41.680 --> 00:13:45.199 detail. They are used to manipulate 00:13:45.199 --> 00:13:48.639 how searches are actually processed or 00:13:48.639 --> 00:13:50.800 or how commands are processed. And 00:13:50.800 --> 00:13:54.079 they don't directly affect the results 00:13:54.079 --> 00:13:56.079 in a search, how we think about say 00:13:56.079 --> 00:13:59.839 applying a stats or an eval to a data 00:13:59.839 --> 00:14:02.320 set. So if you're interested, 00:14:02.320 --> 00:14:04.399 definitely check it out. Linked 00:14:04.399 --> 00:14:07.418 documentation has details there. 00:14:07.418 --> 00:14:11.120 Dataset processing is seen much more often, 00:14:11.120 --> 00:14:15.000 and you do have some conditional 00:14:15.000 --> 00:14:18.680 scenarios where commands can act as 00:14:18.680 --> 00:14:21.759 dataset processing, so the 00:14:21.759 --> 00:14:23.959 distinction for dataset processing is 00:14:23.959 --> 00:14:26.360 going to be that you are operating in 00:14:26.360 --> 00:14:29.800 bulk on a single completed dataset at 00:14:29.800 --> 00:14:32.240 one time. So we'll look at an 00:14:32.240 --> 00:14:34.260 example of that. 00:14:34.260 --> 00:14:36.600 I want to pivot back to our main 00:14:36.600 --> 00:14:38.360 three that we're going to be focusing on, 00:14:38.360 --> 00:14:39.839 and I have mentioned some of these 00:14:39.839 --> 00:14:43.800 examples already. The eval functions 00:14:43.800 --> 00:14:45.880 that we've been talking about so far are 00:14:45.880 --> 00:14:47.920 perfect examples of our streaming 00:14:47.920 --> 00:14:51.440 commands. 
Where we are creating new fields for each entry or log event, or modifying values for all of the results that are available, that is where we are streaming with the search functions. Inputlookup is possibly the most common generating command that I see, because someone intends to source a dataset stored in a CSV file or a KV store collection, and you're able to bring that back as a report and use that logic in your queries. That does not require indexed data, or any index at all, to actually return the results that you want to see.

And we've talked about stats, very generally speaking, with a lot of unique functions you can apply there, where it provides a tabular output. It serves that purpose of summarization, so we're really reformatting the data into a tabular report.

So we see in this example search that we often combine these different types of search operations. In this example, I have data that already exists in a CSV file.
We are applying a streaming command here, evaluating each line to see if it matches a condition and then returning the results based on that evaluation. And then we're applying a transforming command at the end, which is that stats summarization, getting the maximum value for the count of errors and the host associated with it. So let's pivot over to Splunk and take a look at that example.

I'm just going to grab my search here. I pre-commented out the lines following inputlookup, just to show that this generating command is not looking for any specific indexed data. We're pulling the results that I have in a CSV file directly into this output, and so we have a count of errors observed across multiple hosts. You might think our where command is reformatting data, in the sense that it transforms the results, but the evaluation of a where function applies effectively to every event that is returned. So it is a streaming command that is going to filter down our result set based on our condition that the error count is less than 200.
The following line is our transforming command. We have two results left, and we want to see our maximum value here: 187 on host 3. So our scenario covers the case where you may have hosts trending toward a negative state. You're aware that the second host had already exceeded its threshold value for errors, but host 3 also appears to be trending toward this threshold. So we're combining these types of commands, understanding the logical condition that you're searching for, and then also providing that consumable output: all three of our command types in one search.

So I'm going to jump to an SPL demo, and as I go through these different commands, I'm going to be referencing back to the different command types that we're working with. In a lot of these searches I'll introduce small commands that I won't talk about in great detail, and that really is the purpose of using your search manual and your search reference documentation.
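Put together, a search combining all three command types discussed here might look like the sketch below. The lookup file name (host_errors.csv) and the field names (error_count, host) are illustrative assumptions, not the exact names from the demo:

```spl
| inputlookup host_errors.csv
| where error_count < 200
| stats max(error_count) AS max_errors BY host
```

Here inputlookup generates results from the CSV with no indexed data involved, where streams over each row to filter on the condition, and stats transforms what remains into a summary table.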
So I will glance over each use case and talk about how the command is meant to be applied; then, for your own scenarios where you have a problem to solve, you can reference the docs to find where you can apply functions similar to what we observe in the demonstration here.

The first command I'm going to focus on is the rex command. Rex is a streaming command that you often see applied to datasets that do not have all of their data extracted in the format you want to use in your reporting or your logic. This could very well be handled in the configuration of props and transforms, extracting fields at the right time as data is indexed, but as you're bringing in new data sources you need to understand what's available for use in Splunk, and a lot of times you'll find yourself needing to extract new fields inline in your searches and use those in your search logic. Rex also has a sed mode, which I often see used to test the masking of data inline before putting it into indexing configurations.
So you would generally see rex used when you don't have those fields available and you need to use them at that time. We're also going to take a look at an example of masking data, to test your syntax for a sed-style replace before it goes into config files. So we will jump back over.

I'm going to start with a search on an index and sourcetype, my tutorial data. This is actual Linux secure logging, so these are OS security logs, and we're looking at all of the web hosts that we've been focusing on previously. In our events, you can see that we have, first, an event that reads "Failed password for invalid user inet". We're provided a source IP and a source port, but when we go to see the fields that are extracted, that extraction is not being done for us automatically. So, just to start testing our logic and see if we can get the results we want, we're going to use the rex command. In doing so, we are applying this operation across every event; again, a streaming command. We are looking at the _raw field, so we're actually working against the raw text of each of these log events.
The rex syntax is simply to provide, in double quotes, a regex match, and we're using named groups for the field extractions. So for every event where we see "Failed password for invalid user", we extract a user field, a source IP field, and a source port field. For the sake of simplicity, I tried to keep the regex simple; you can make it as complex as you need to for your data. In our extracted fields, I've pre-selected these so we can see that our user field is now available, and it applies to the events where the regex was actually valid, matching on the "Failed password for invalid user" string.

So now that we have our fields extracted, we can actually use them.
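The inline extraction described here can be sketched as follows. The exact pattern and the field names (user, src_ip, src_port) are my reconstruction of what the demo shows, and the index name is assumed from the tutorial setup:

```spl
index=main sourcetype=linux_secure
| rex field=_raw "Failed password for invalid user (?<user>\S+) from (?<src_ip>\d{1,3}(?:\.\d{1,3}){3}) port (?<src_port>\d+)"
```

Events that don't match the pattern simply pass through without the new fields; rex extracts, it does not filter.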
We want to do a stats count as failed logins. Any time you see an operation followed by "as" and a unique name, that is just a rename within the transforming function. It's an easy way to keep your field references consistent, and it saves you from renaming later on; in this case, you would otherwise have to reference the generated distinct-count name directly. It's just a way to keep things clean and easy to use in further lines of SPL. So we are counting our failed logins, looking at the distinct count of the source IP values that we have, and then splitting that by the host and the user. You can see here that this tutorial data is actually pretty flat across most of the sources, so we're not going to have any outliers or spikes in our stats, but you can see the resulting presentation.

On line four we do have a sort command, and this is an example of a dataset processing command, where we are evaluating a full, completed dataset and reordering it. Given the logic here, we want to sort descending on these numeric values.
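The aggregation and sort steps would look something like this, with the renamed fields as assumed above:

```spl
| stats count AS failed_logins, dc(src_ip) AS unique_sources BY host, user
| sort - failed_logins
```

Without the AS clauses, later lines would have to reference the generated names, such as dc(src_ip), verbatim.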
So keep in mind as you're 00:24:29.000 --> 00:24:31.200 operating on different fields, it's going 00:24:31.200 --> 00:24:33.799 to be the same sort of either basic 00:24:33.799 --> 00:24:37.159 numeric or the lexicographical ordering 00:24:37.159 --> 00:24:40.360 that you typically see in Splunk. 00:24:40.840 --> 00:24:45.720 So we do have a second example 00:24:45.720 --> 00:24:49.200 with the sed style replace. 00:24:54.240 --> 00:24:58.640 So you can see in my events here 00:24:58.640 --> 00:25:01.640 we are searching the tutorial and 00:25:01.640 --> 00:25:05.039 vendor sales index and source type. And 00:25:05.039 --> 00:25:06.720 I've gone ahead and applied one 00:25:06.720 --> 00:25:09.399 operation, and this is going to be a 00:25:09.399 --> 00:25:11.880 helpful operation to understand really 00:25:11.880 --> 00:25:14.679 what we are replacing and how to get 00:25:14.679 --> 00:25:18.159 consistent operation on these fields. 00:25:18.159 --> 00:25:20.279 So in this case, we are actually creating 00:25:20.279 --> 00:25:23.559 an ID length field where we are going to 00:25:23.559 --> 00:25:26.960 choose to mask the value of account ID 00:25:26.960 --> 00:25:29.120 in our rex command. We want to know that 00:25:29.120 --> 00:25:31.679 that's a consistent number of characters 00:25:31.679 --> 00:25:33.799 through all of our data. It's very 00:25:33.799 --> 00:25:37.080 simple to spot check, but just to be 00:25:37.080 --> 00:25:39.440 certain, we want to apply this to all of 00:25:39.440 --> 00:25:42.760 our data, in this case, a streaming command 00:25:42.760 --> 00:25:45.520 through this eval. We 00:25:45.520 --> 00:25:49.279 are changing the type of the data 00:25:49.279 --> 00:25:51.919 because account ID is actually numeric. 00:25:51.919 --> 00:25:53.720 We're making that a string value so that 00:25:53.720 --> 00:25:56.720 we can look at the length.
These are 00:25:56.720 --> 00:25:58.840 common functions in most programming 00:25:58.840 --> 00:26:01.559 languages, and so the syntax here in 00:26:01.559 --> 00:26:04.039 SPL is quite simple. Just to be able 00:26:04.039 --> 00:26:06.520 to get that contextual feel, we 00:26:06.520 --> 00:26:09.399 understand we have 16 characters for 00:26:09.399 --> 00:26:12.480 100% of our events in the account IDs. 00:26:12.480 --> 00:26:17.000 So actually applying our rex command, 00:26:17.000 --> 00:26:20.760 we are going to now specify a unique 00:26:20.760 --> 00:26:23.919 field, not just underscore raw. We are 00:26:23.919 --> 00:26:27.159 applying the sed mode, and this is a 00:26:27.159 --> 00:26:30.799 sed syntax replacement looking 00:26:30.799 --> 00:26:33.559 for a capture group of the 00:26:33.559 --> 00:26:35.880 first 12 digits. And then we're 00:26:35.880 --> 00:26:39.240 replacing that with a series of 12 X's. 00:26:39.240 --> 00:26:42.039 So you can see in our first event, the 00:26:42.039 --> 00:26:45.320 account ID is now masked, we only have 00:26:45.320 --> 00:26:48.520 the remaining four digits to be able to 00:26:48.520 --> 00:26:52.320 identify that. So if our data was 00:26:52.320 --> 00:26:55.360 appropriately indexed 00:26:55.360 --> 00:26:58.039 in Splunk with the full account IDs, but 00:26:58.039 --> 00:27:00.360 for the sake of reporting we want to 00:27:00.360 --> 00:27:04.840 be able to mask those for the audience, 00:27:04.840 --> 00:27:07.799 then we're able to use the sed 00:27:07.799 --> 00:27:11.919 replace.
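A sketch of the masking search described here; `AcctID` as the field name in the vendor sales data is an assumption, as is the exact sed pattern:

```spl
index=main sourcetype=vendor_sales
| eval id_length=len(tostring(AcctID))
| rex field=AcctID mode=sed "s/^\d{12}/XXXXXXXXXXXX/"
```

The `eval` first confirms a consistent 16-character length (`tostring` converts the numeric account ID so `len` can apply), and the `mode=sed` replacement then masks the first 12 digits, leaving the last four visible.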
And then to finalize a report, 00:27:11.919 --> 00:27:13.880 this is just an example of the top 00:27:13.880 --> 00:27:16.399 command which does a few operations 00:27:16.399 --> 00:27:18.120 together and makes for a good 00:27:18.120 --> 00:27:20.720 shorthand report, taking all the 00:27:20.720 --> 00:27:24.080 unique values of the provided field, 00:27:24.080 --> 00:27:26.480 giving you a count of those values, and 00:27:26.480 --> 00:27:29.000 then showing the percentage 00:27:29.000 --> 00:27:31.679 of the makeup for the total dataset 00:27:31.679 --> 00:27:34.520 that that unique value accounts for. So 00:27:34.520 --> 00:27:37.399 again, pretty flat in this tutorial data 00:27:37.399 --> 00:27:40.200 in seeing a very consistent 00:27:40.200 --> 00:27:45.159 0.03% across these different account IDs. 00:27:46.679 --> 00:27:51.080 So we have looked at a few examples 00:27:51.080 --> 00:27:54.640 with the rex command, and that is 00:27:54.640 --> 00:27:57.039 again, streaming. We're going to look at 00:27:57.039 --> 00:27:59.120 another streaming command 00:27:59.120 --> 00:28:02.399 which is going to be a set of 00:28:02.399 --> 00:28:07.200 multivalue eval functions. And so again, 00:28:07.200 --> 00:28:09.559 if you're going to have a bookmark for search 00:28:09.559 --> 00:28:12.320 documentation, multivalue eval functions 00:28:12.320 --> 00:28:14.559 are a great one to have because when 00:28:14.559 --> 00:28:17.240 you encounter these, it really takes 00:28:17.240 --> 00:28:19.960 some time to figure out how to actually 00:28:19.960 --> 00:28:25.960 operate on data. And so the 00:28:25.960 --> 00:28:29.559 multivalue functions are really just 00:28:29.559 --> 00:28:31.799 a collection from which, depending on your use 00:28:31.799 --> 00:28:34.679 case, you're able to determine the 00:28:34.679 --> 00:28:39.080 best one to apply.
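The shorthand report described above would look something like this (the `AcctID` field name and the masking step are assumptions carried over from the previous example):

```spl
index=main sourcetype=vendor_sales
| rex field=AcctID mode=sed "s/^\d{12}/XXXXXXXXXXXX/"
| top limit=20 AcctID
```

`top` produces one row per unique value with `count` and `percent` columns in a single command, which is why it makes a good quick report.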
You see it often used 00:28:39.080 --> 00:28:42.840 with JSON and XML, so data formats 00:28:42.840 --> 00:28:44.880 that are actually naturally going to 00:28:44.880 --> 00:28:47.360 provide a multivalue field where you 00:28:47.360 --> 00:28:50.480 have repeated tags or keys across 00:28:50.480 --> 00:28:54.320 unique events as they're extracted. 00:28:54.320 --> 00:28:56.360 And you often see in 00:28:56.360 --> 00:28:58.480 Windows event logs, you actually have 00:28:58.480 --> 00:29:01.360 repeated key values where your values 00:29:01.360 --> 00:29:02.960 are different and the position in the 00:29:02.960 --> 00:29:05.200 event is actually specific to a 00:29:05.200 --> 00:29:08.840 condition, so you may have a need 00:29:08.840 --> 00:29:11.440 for extraction or interaction with one 00:29:11.440 --> 00:29:14.399 of those unique values to actually 00:29:14.399 --> 00:29:18.600 get a reasonable outcome from your data. 00:29:18.600 --> 00:29:22.799 And so we're going to use 00:29:22.799 --> 00:29:25.960 multivalue eval functions when we 00:29:25.960 --> 00:29:28.679 have a change we want to make to the 00:29:28.679 --> 00:29:31.880 presentation of data and we're able 00:29:31.880 --> 00:29:34.880 to do so with multivalue fields. This I 00:29:34.880 --> 00:29:36.720 would say often occurs when you have 00:29:36.720 --> 00:29:39.960 multivalue data and then you want to 00:29:39.960 --> 00:29:43.080 be able to change the format of the 00:29:43.080 --> 00:29:45.640 multivalue fields there. And then 00:29:45.640 --> 00:29:46.960 we're also going to look at a quick 00:29:46.960 --> 00:29:51.279 example of actually using multivalue 00:29:51.279 --> 00:29:54.880 evaluation as a logical condition. 00:29:54.880 --> 00:30:00.039 So the first example.
00:30:03.320 --> 00:30:05.679 We're going to start with a 00:30:05.679 --> 00:30:08.720 simple table looking at our web access 00:30:08.720 --> 00:30:11.240 logs, and so we're just going to pull 00:30:11.240 --> 00:30:14.880 in our status and referer domain fields. 00:30:14.880 --> 00:30:18.440 And so you can see we've got an 00:30:18.440 --> 00:30:23.000 HTTP status code, and we've got the 00:30:23.000 --> 00:30:26.120 format of a protocol subdomain 00:30:26.120 --> 00:30:29.519 TLD. And our scenario here is that for 00:30:29.519 --> 00:30:31.559 simplicity of reporting, we just want 00:30:31.559 --> 00:30:33.760 to work with this referer domain field 00:30:33.760 --> 00:30:38.320 and be able to simplify that. So in 00:30:38.320 --> 00:30:41.799 actually splitting out the field in this 00:30:41.799 --> 00:30:44.880 case, split referer domain, and then 00:30:44.880 --> 00:30:47.720 choosing the period character as our 00:30:47.720 --> 00:30:50.399 point to split the data. We're creating a 00:30:50.399 --> 00:30:52.919 multivalue from what was previously 00:30:52.919 --> 00:30:57.200 just a single value field. And using 00:30:57.200 --> 00:31:01.600 this, we can actually create a new field 00:31:01.600 --> 00:31:06.080 by using the index of a multivalue field, 00:31:06.080 --> 00:31:08.039 and in this case, we're looking at 00:31:08.039 --> 00:31:10.740 indexes 0, 1, and 2. 00:31:10.740 --> 00:31:13.279 The multivalue index function allows 00:31:13.279 --> 00:31:15.799 us to target a specific field and then 00:31:15.799 --> 00:31:18.559 choose a starting and ending index to 00:31:18.559 --> 00:31:21.320 extract given values. There are a number 00:31:21.320 --> 00:31:23.320 of ways to do this. In our case here 00:31:23.320 --> 00:31:25.039 where we have three entries, it's quite 00:31:25.039 --> 00:31:26.639 simple just to give the start and end 00:31:26.639 --> 00:31:28.639 of the range for the 00:31:28.639 --> 00:31:29.841 two entries 00:31:29.841 --> 00:31:35.360 we want.
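The split-and-index operation described here might be sketched as follows; the sourcetype name is an assumption from the tutorial data, and `mvjoin` is one way (not necessarily the one shown on screen) to recombine the extracted entries:

```spl
index=main sourcetype=access_combined_wcookie
| eval parts=split(referer_domain, ".")
| eval domain=mvjoin(mvindex(parts, 1, 2), ".")
| table status referer_domain domain
```

`split` turns the single-value `referer_domain` into a multivalue field, and `mvindex(parts, 1, 2)` pulls the entries at indexes 1 through 2, dropping the protocol-plus-subdomain portion at index 0.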
So as we are working to recreate 00:31:35.360 --> 00:31:39.200 our domain, and so that is just applying 00:31:39.200 --> 00:31:41.720 for this new domain field, we have 00:31:41.720 --> 00:31:44.200 buttercupgames.com in what was 00:31:44.200 --> 00:31:47.686 previously the 00:31:47.686 --> 00:31:51.440 http://www.buttercupgames.com. We can now use those fields 00:31:51.440 --> 00:31:54.720 in a transformation function. In this 00:31:54.720 --> 00:31:58.039 case, a simple stats count by status and 00:31:58.039 --> 00:32:00.200 the domain. 00:32:02.600 --> 00:32:06.960 So I do want to look at another 00:32:06.960 --> 00:32:10.240 example here that is similar, but 00:32:10.240 --> 00:32:13.639 we're going to use a multivalue function 00:32:13.639 --> 00:32:16.919 to actually test a condition. And so I'm 00:32:16.919 --> 00:32:18.399 going to, 00:32:18.399 --> 00:32:21.639 in this case, be searching the same 00:32:21.639 --> 00:32:24.240 data. We're going to start with a stats 00:32:24.240 --> 00:32:28.639 command, and so a stats count as well as 00:32:28.639 --> 00:32:32.039 a values of status. And so the values 00:32:32.039 --> 00:32:33.360 function is going to provide all the 00:32:33.360 --> 00:32:37.480 unique values of a given field based 00:32:37.480 --> 00:32:41.840 on the split by. And so that produces 00:32:41.840 --> 00:32:44.960 a multivalue field here in the case of 00:32:44.960 --> 00:32:47.279 status. We have quite a few events 00:32:47.279 --> 00:32:50.799 that have multiple status codes, and as 00:32:50.799 --> 00:32:52.960 we're interested in pulling those events 00:32:52.960 --> 00:32:57.480 out, we can use an mvcount function to 00:32:57.480 --> 00:33:01.200 evaluate and filter our dataset to 00:33:01.200 --> 00:33:04.240 those specific events.
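The stats-values-mvcount pattern just described might look like this (sourcetype and the `clientip` split-by field are assumptions from the tutorial data):

```spl
index=main sourcetype=access_combined_wcookie
| stats count values(status) AS status BY clientip
| where mvcount(status) > 1
```

`values(status)` builds a multivalue field of unique status codes per client IP, and `mvcount` inside a `where` keeps only the rows with more than one distinct code.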
So a very simple 00:33:04.240 --> 00:33:07.200 operation here, you're just looking at what has 00:33:07.200 --> 00:33:10.240 more than a single value 00:33:10.240 --> 00:33:13.399 for status, but very useful as you're 00:33:13.399 --> 00:33:15.919 applying this in reporting, especially in 00:33:15.919 --> 00:33:19.000 combination with others and with more 00:33:19.000 --> 00:33:22.639 complex conditions. 00:33:22.639 --> 00:33:28.200 So that is our set of multivalue 00:33:28.200 --> 00:33:32.519 eval functions there as streaming commands. 00:33:34.240 --> 00:33:38.279 So for a final section of 00:33:38.279 --> 00:33:42.000 the demo, I want to talk about a concept 00:33:42.000 --> 00:33:44.720 that is not so much a set of functions, 00:33:44.720 --> 00:33:47.960 but really enables more complex 00:33:47.960 --> 00:33:50.159 and interesting searching and can allow 00:33:50.159 --> 00:33:52.799 us to use a few different types of 00:33:52.799 --> 00:33:57.240 commands in our SPL. And so the concept of 00:33:57.240 --> 00:34:00.200 subsearching for both filtering and 00:34:00.200 --> 00:34:04.279 enrichment is taking secondary search 00:34:04.279 --> 00:34:06.960 results, and we're using that to 00:34:06.960 --> 00:34:10.659 affect a primary search. So a subsearch 00:34:10.659 --> 00:34:12.200 will be executed, the results 00:34:12.200 --> 00:34:15.079 returned, and depending on how it's used, 00:34:15.079 --> 00:34:17.760 this is going to be processed in the 00:34:17.760 --> 00:34:21.599 original search. 00:34:21.599 --> 00:34:24.359 We'll look at an example where it is 00:34:24.359 --> 00:34:27.399 filtering. So based on the results, we get 00:34:27.399 --> 00:34:31.240 effectively a value equals X or value 00:34:31.240 --> 00:34:34.320 equals Y for one of our fields that 00:34:34.320 --> 00:34:37.159 we're looking at in the subsearch.
00:34:37.159 --> 00:34:39.320 And then we're also going to look at an 00:34:39.320 --> 00:34:42.399 enrichment example, so you see this often 00:34:42.399 --> 00:34:45.760 when you have a dataset maybe saved 00:34:45.760 --> 00:34:48.480 in a lookup table or you just have a 00:34:48.480 --> 00:34:50.079 simple reference where you want to bring 00:34:50.079 --> 00:34:52.879 in more context, maybe descriptions of 00:34:52.879 --> 00:34:54.560 event codes, things like 00:34:54.560 --> 00:34:59.640 that. So in that case, 00:35:02.160 --> 00:35:05.440 we'll look at the first command here. Now, 00:35:05.440 --> 00:35:08.160 I'm going to run my search, and we're 00:35:08.160 --> 00:35:12.119 going to pivot over to a subsearch 00:35:12.119 --> 00:35:14.740 tab here. And so you can see our subsearch 00:35:14.740 --> 00:35:19.720 looking at the secure logs. 00:35:19.720 --> 00:35:21.880 We are actually just pulling out the 00:35:21.880 --> 00:35:24.359 search to see what the results are or 00:35:24.359 --> 00:35:26.079 what's going to be returned from that 00:35:26.079 --> 00:35:28.839 subsearch. So we're applying the same 00:35:28.839 --> 00:35:31.200 rex that we had before to extract our 00:35:31.200 --> 00:35:33.720 fields. We're applying a where, a streaming 00:35:33.720 --> 00:35:35.920 command looking for anything that's not 00:35:35.920 --> 00:35:38.599 null for user. We observed that we had 00:35:38.599 --> 00:35:40.920 about 60% of our events that were going 00:35:40.920 --> 00:35:43.359 to be null based on not having a user 00:35:43.359 --> 00:35:46.970 field, and so looking at that total dataset, 00:35:46.970 --> 00:35:50.280 we're just going to count by our 00:35:50.280 --> 00:35:53.839 source IP. And this is often a quick way 00:35:53.839 --> 00:35:56.839 to really just get a list of unique 00:35:56.839 --> 00:35:59.880 values of any given field. 
And then 00:35:59.880 --> 00:36:03.119 operating on that to return just the 00:36:03.119 --> 00:36:05.079 list of values, there are a few different ways to 00:36:05.079 --> 00:36:08.800 do that, I see stats count pretty often. 00:36:08.800 --> 00:36:10.599 And in this case, we're actually tabling 00:36:10.599 --> 00:36:13.960 out just keeping our source IP field and 00:36:13.960 --> 00:36:16.800 renaming it to client IP, so the resulting 00:36:16.800 --> 00:36:20.560 dataset is a single column table 00:36:20.560 --> 00:36:21.440 with 00:36:21.440 --> 00:36:26.319 182 results, and the field name is client 00:36:26.319 --> 00:36:29.880 IP. So when returned to the original 00:36:29.880 --> 00:36:32.119 search, we're running this as a sub 00:36:32.119 --> 00:36:36.319 search, the effective result of this is 00:36:36.319 --> 00:36:39.960 actually client IP equals my first value 00:36:39.960 --> 00:36:43.800 here or client IP equals my second value 00:36:43.800 --> 00:36:46.960 and so on through the full dataset. And 00:36:46.960 --> 00:36:49.200 so looking at our search here, we're 00:36:49.200 --> 00:36:52.359 applying this to the access logs. You can 00:36:52.359 --> 00:36:55.280 see that we had a field named source IP 00:36:55.280 --> 00:36:58.520 in the secure logs and we renamed to 00:36:58.520 --> 00:37:02.160 client IP so that we could apply this to 00:37:02.160 --> 00:37:05.760 the access logs where client IP is the 00:37:05.760 --> 00:37:09.480 actual field name for the source IP 00:37:09.480 --> 00:37:13.560 data. And in this case, we are filtering 00:37:13.560 --> 00:37:16.079 to the client IPs relevant in the secure 00:37:16.079 --> 00:37:19.839 logs for our web access logs.
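The subsearch filter described here might be sketched as follows; the index, sourcetype, and field names are assumptions from the tutorial data:

```spl
index=main sourcetype=access_combined_wcookie
    [ search index=main sourcetype=secure "Failed password for invalid user"
      | rex "invalid user (?<user>\S+) from (?<src_ip>\S+)"
      | where isnotnull(user)
      | stats count BY src_ip
      | table src_ip
      | rename src_ip AS clientip ]
| ...
```

Because the subsearch's single remaining column is named `clientip`, Splunk expands its rows into an implicit `(clientip=A OR clientip=B OR ...)` filter on the outer access-log search, which is why the rename to match the outer field name matters.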
00:37:19.839 --> 00:37:23.960 So uncommenting here, we have a 00:37:23.960 --> 00:37:26.800 series of operations that we're doing, 00:37:26.800 --> 00:37:29.000 and I'm just going to run them all at 00:37:29.000 --> 00:37:33.079 once and talk through that we are 00:37:33.079 --> 00:37:37.240 counting the status or we're counting 00:37:37.240 --> 00:37:40.319 the events by status and client IP 00:37:40.319 --> 00:37:42.640 for the client IPs that were relevant to 00:37:42.640 --> 00:37:44.880 authentication failures in the secure 00:37:44.880 --> 00:37:48.760 logs. We are then creating a status count 00:37:48.760 --> 00:37:52.040 field just by combining our status 00:37:52.040 --> 00:37:54.680 and count fields, adding a colon 00:37:54.680 --> 00:37:58.640 between them. And then we are doing a 00:37:58.640 --> 00:38:02.079 second stats statement here to 00:38:02.079 --> 00:38:03.960 actually combine all of our newly 00:38:03.960 --> 00:38:06.319 created fields together in a more 00:38:06.319 --> 00:38:10.560 condensed report. So a transforming command, 00:38:10.560 --> 00:38:12.520 then streaming for creating our new 00:38:12.520 --> 00:38:15.359 field, another transforming command, and 00:38:15.359 --> 00:38:17.880 then our sort for dataset processing 00:38:17.880 --> 00:38:20.920 actually gives us the results here for a 00:38:20.920 --> 00:38:25.480 given client IP. And so we are, in this 00:38:25.480 --> 00:38:28.440 case, looking for the scenario that 00:38:28.440 --> 00:38:31.319 these client IPs that are involved in 00:38:31.319 --> 00:38:34.240 authentication failures to the web 00:38:34.240 --> 00:38:37.319 servers. In this case, these were all over 00:38:37.319 --> 00:38:39.680 SSH. We want to see if there are 00:38:39.680 --> 00:38:42.760 interactions by these same source IPs 00:38:42.760 --> 00:38:46.079 actually on the website that we're 00:38:46.079 --> 00:38:50.200 hosting. 
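The pipeline of operations just described might look like this. It is a sketch: the sourcetype and field names are assumptions, and the final sort key (`total`, via an assumed `sum(count)`) is my guess at the numeric value being descended on:

```spl
index=main sourcetype=access_combined_wcookie
    [ search index=main sourcetype=secure "Failed password for invalid user"
      | rex "invalid user (?<user>\S+) from (?<src_ip>\S+)"
      | stats count BY src_ip
      | table src_ip
      | rename src_ip AS clientip ]
| stats count BY clientip status
| eval status_count=status . ":" . count
| stats values(status_count) AS status_counts sum(count) AS total BY clientip
| sort - total
```

That sequence is transforming (`stats`), then streaming (`eval` concatenating status and count with a colon), then transforming again (the second `stats` condensing the report), then dataset processing (`sort`).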
So seeing a high number of 00:38:50.200 --> 00:38:53.400 failed values, looking at actions also is 00:38:53.400 --> 00:38:55.599 a use case here for just bringing in 00:38:55.599 --> 00:38:57.680 that context and seeing if there's any 00:38:57.680 --> 00:39:00.520 sort of relationship between the data. 00:39:00.520 --> 00:39:04.000 This is discussed often as correlation 00:39:04.000 --> 00:39:07.680 of logs. I'm usually careful about using 00:39:07.680 --> 00:39:09.440 the term correlation in talking about 00:39:09.440 --> 00:39:11.119 Splunk queries especially in Enterprise 00:39:11.119 --> 00:39:12.640 security talking about correlation 00:39:12.640 --> 00:39:16.119 searches where I typically think of 00:39:16.119 --> 00:39:18.480 correlation searches as being 00:39:18.480 --> 00:39:20.599 overarching concepts that cover data 00:39:20.599 --> 00:39:23.920 from multiple data sources, and in this 00:39:23.920 --> 00:39:26.480 case, correlating events would be looking 00:39:26.480 --> 00:39:28.400 at unique data types that are 00:39:28.400 --> 00:39:31.240 potentially related in finding that 00:39:31.240 --> 00:39:33.839 logical connection for the condition. 00:39:33.839 --> 00:39:35.880 That's a little bit more up to the user. 00:39:35.880 --> 00:39:38.319 It's not quite as easy as say, 00:39:38.319 --> 00:39:41.520 pointing to a specific data 00:39:41.520 --> 00:39:44.880 model. So we are going to look at one 00:39:44.880 --> 00:39:47.920 more subsearch here, and this case is 00:39:47.920 --> 00:39:52.240 going to apply the join command. And 00:39:52.240 --> 00:39:55.680 so I talk about using lookup files or 00:39:55.680 --> 00:39:59.000 other data returned by subsearches 00:39:59.000 --> 00:40:01.599 to enrich, to bring more data in 00:40:01.599 --> 00:40:05.599 rather than filter. 
We are going to 00:40:05.599 --> 00:40:08.960 look at our first part of the command 00:40:08.960 --> 00:40:11.480 here, and this is actually just a 00:40:11.480 --> 00:40:15.720 simple stats report based on this rex 00:40:15.720 --> 00:40:18.079 that keeps coming through the SPL to 00:40:18.079 --> 00:40:21.000 give us those user and source IP fields. 00:40:21.000 --> 00:40:24.079 So our result here is authentication 00:40:24.079 --> 00:40:26.200 failures for all these web hosts so 00:40:26.200 --> 00:40:28.760 similar to what we had previously 00:40:28.760 --> 00:40:31.200 returned. And then we're going to take a 00:40:31.200 --> 00:40:33.319 look at the results of the subsearch 00:40:33.319 --> 00:40:35.400 here. I'm going to actually split this up so that we 00:40:35.400 --> 00:40:38.839 can see the first two lines. We're 00:40:38.839 --> 00:40:41.760 looking at our web access logs for 00:40:41.760 --> 00:40:45.560 purchase actions, and then we are 00:40:45.560 --> 00:40:50.599 looking at our stats count for errors 00:40:50.599 --> 00:40:52.960 and stats count for successes. We have 00:40:52.960 --> 00:40:55.079 pretty limited status code to return in 00:40:55.079 --> 00:40:59.240 this data so this is viable for 00:40:59.240 --> 00:41:01.800 the data present to observe our 00:41:01.800 --> 00:41:04.420 errors and successes. 00:41:04.420 --> 00:41:05.880 And then we are actually 00:41:05.880 --> 00:41:08.160 creating a new field based on the 00:41:08.160 --> 00:41:10.839 statistics that we're generating, 00:41:10.839 --> 00:41:13.920 looking at our transaction errors so 00:41:13.920 --> 00:41:18.000 where we have high or low numbers 00:41:18.000 --> 00:41:22.079 of failed purchase actions, and then 00:41:22.079 --> 00:41:25.599 summarizing that. 
So in the case of our 00:41:25.599 --> 00:41:27.800 final command here, another transforming 00:41:27.800 --> 00:41:30.640 command of table just to reduce this to 00:41:30.640 --> 00:41:35.079 a small dataset to use in the subsearch. 00:41:35.079 --> 00:41:37.440 And so in this case, we have our host 00:41:37.440 --> 00:41:39.400 value and then our transaction error 00:41:39.400 --> 00:41:41.480 rate that we observe from the web access 00:41:41.480 --> 00:41:44.760 logs. And then over in our other search 00:41:44.760 --> 00:41:48.640 here, we are going to perform a left 00:41:48.640 --> 00:41:51.400 join based on this host field. So you see 00:41:51.400 --> 00:41:53.359 in our secure logs, we still have the 00:41:53.359 --> 00:41:55.800 same host value, and this is going to be 00:41:55.800 --> 00:41:59.640 used to actually add our 00:41:59.640 --> 00:42:02.760 transaction error rates in for each 00:42:02.760 --> 00:42:06.400 host. So as we observe increased 00:42:06.400 --> 00:42:08.640 authentication failures, if there's a 00:42:08.640 --> 00:42:11.960 scenario for a breach and some sort of 00:42:11.960 --> 00:42:14.960 interruption to the ability to serve out 00:42:14.960 --> 00:42:17.520 or perform these purchase actions that 00:42:17.520 --> 00:42:20.960 are affecting the intended 00:42:20.960 --> 00:42:23.200 operations of the web servers, we can 00:42:23.200 --> 00:42:25.280 see that here. Of course in our tutorial 00:42:25.280 --> 00:42:27.319 data, there's not really much that's 00:42:27.319 --> 00:42:29.880 jumping out or showing that there is 00:42:29.880 --> 00:42:32.599 any correlation between the two, but the 00:42:32.599 --> 00:42:34.640 purpose of the join is to bring in that 00:42:34.640 --> 00:42:37.440 extra dataset to give the context to 00:42:37.440 --> 00:42:39.839 further investigate.00:42:41.040 --> 00:42:47.440 So that is the final 00:42:47.440 --> 00:42:52.359 portion of the SPL demo.
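The enrichment join described here might be sketched as follows. The field names and the error-rate definition (treating non-200 statuses on purchase actions as errors) are assumptions, since the exact eval logic isn't captured in the transcript:

```spl
index=main sourcetype=secure "Failed password for invalid user"
| rex "invalid user (?<user>\S+) from (?<src_ip>\S+)"
| stats count AS auth_failures BY host
| join type=left host
    [ search index=main sourcetype=access_combined_wcookie action=purchase
      | stats count(eval(status!=200)) AS errors count(eval(status=200)) AS successes BY host
      | eval error_rate=round(errors / (errors + successes) * 100, 2)
      | table host error_rate ]
```

Unlike the earlier filtering subsearch, the `join` keeps every row of the primary (authentication failure) results and adds the `error_rate` column from the subsearch wherever the `host` values match.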
And I do want 00:42:52.359 --> 00:42:54.920 to say for any questions, I'm going to 00:42:54.920 --> 00:42:56.960 take a look at the chat, I'll do my best 00:42:56.960 --> 00:43:00.079 to answer any questions, and then if 00:43:00.079 --> 00:43:03.079 you have any other questions, please 00:43:03.079 --> 00:43:05.800 feel free to reach out to my team at 00:43:05.800 --> 00:43:08.599 support@kinneygroup.com, and we'll be 00:43:08.599 --> 00:43:11.920 happy to get back to you and help. I 00:43:11.920 --> 00:43:15.440 am taking a look through. 00:43:32.200 --> 00:43:33.760 Okay, seeing some questions on 00:43:33.760 --> 00:43:38.280 performance of the rex, sed, regex 00:43:38.280 --> 00:43:41.599 commands. So off the top of my head, 00:43:41.599 --> 00:43:43.800 I'm not sure about a direct performance 00:43:43.800 --> 00:43:46.400 comparison of the individual commands. 00:43:46.400 --> 00:43:49.200 Definitely want to look into that, and 00:43:49.200 --> 00:43:52.280 definitely follow up if you'd like to 00:43:52.280 --> 00:43:54.280 explain a more detailed scenario or 00:43:54.280 --> 00:43:57.119 look at some SPL that we can apply and 00:43:57.119 --> 00:43:59.699 observe those changes. 00:43:59.699 --> 00:44:01.680 The question on getting the 00:44:01.680 --> 00:44:05.480 dataset, that is what I mentioned at 00:44:05.480 --> 00:44:07.520 the beginning. Reach out to us for the 00:44:07.520 --> 00:44:10.119 slides or just reach out about the 00:44:10.119 --> 00:44:15.480 link. And the Splunk tutorial data, you 00:44:15.480 --> 00:44:17.880 can actually search that as well.
And 00:44:17.880 --> 00:44:20.400 there's documentation on how to use the 00:44:20.400 --> 00:44:22.400 tutorial data, one of the first links 00:44:22.400 --> 00:44:25.640 there, takes you to a page that has- 00:44:25.640 --> 00:44:29.079 it is a tutorial data zip file, and 00:44:29.079 --> 00:44:31.079 instructions on how to [inaudible] that, it's 00:44:31.079 --> 00:44:34.079 just an upload for your specific 00:44:34.079 --> 00:44:37.599 environment. So in add data and then 00:44:37.599 --> 00:44:40.040 upload data, two clicks, and upload 00:44:40.040 --> 00:44:43.400 your file. So that is freely available 00:44:43.400 --> 00:44:45.760 for anyone, and again, that package is 00:44:45.760 --> 00:44:47.440 dynamically updated as well so your time 00:44:47.440 --> 00:44:51.359 stamps are pretty close to normal 00:44:51.359 --> 00:44:53.440 as you download the app, kind of depends 00:44:53.440 --> 00:44:55.920 on the time of the cycle for the 00:44:55.920 --> 00:44:58.559 update, but search overall time, you 00:44:58.559 --> 00:45:02.359 won't have any issues there. And then 00:45:02.359 --> 00:45:05.119 yeah, again on receiving slides, reach 00:45:05.119 --> 00:45:08.240 out to my team, and we're happy to 00:45:08.240 --> 00:45:10.240 provide those, discuss further, and we'll 00:45:10.240 --> 00:45:16.040 have the recording available 00:45:16.040 --> 00:45:18.400 for this session. You should be able to, 00:45:18.400 --> 00:45:20.680 after the recording processes when 00:45:20.680 --> 00:45:22.880 the session ends, actually use the 00:45:22.880 --> 00:45:24.640 same link, and you can watch this 00:45:24.640 --> 00:45:26.480 recording and post without having to 00:45:26.480 --> 00:45:31.800 sign up or transfer that file so- 00:45:33.680 --> 00:45:38.319 So okay, Chris, seeing your 00:45:38.319 --> 00:45:41.240 comment there, let me know if you want 00:45:41.240 --> 00:45:44.480 to reach out to me directly, anyone as 00:45:44.480 --> 00:45:49.440 well. 
We can discuss what slides and 00:45:49.440 --> 00:45:51.640 presentation you had attended, I'm not 00:45:51.640 --> 00:45:55.359 sure I have the attendance report 00:45:55.359 --> 00:45:57.319 for what you've seen previously, so 00:45:57.319 --> 00:46:00.240 happy to get those for you. 00:46:06.720 --> 00:46:10.319 All right and seeing- thanks Brett. 00:46:10.319 --> 00:46:13.079 So you see Brett Woodruff in the chat 00:46:13.079 --> 00:46:16.680 commenting, a systems engineer on the 00:46:16.680 --> 00:46:18.640 Expertise on Demand team, so a very 00:46:18.640 --> 00:46:20.400 knowledgeable guy, and he's going to be 00:46:20.400 --> 00:46:23.720 presenting next month's session. That 00:46:23.720 --> 00:46:25.400 is going to take this concept that we 00:46:25.400 --> 00:46:28.760 talked about in subsearching as just a 00:46:28.760 --> 00:46:30.760 general search topic, and he's going to go 00:46:30.760 --> 00:46:34.319 specifically into data enrichment using 00:46:34.319 --> 00:46:38.079 joins, lookup commands, and how we see 00:46:38.079 --> 00:46:41.079 that used in the wild. So definitely 00:46:41.079 --> 00:46:43.359 excited for that one, encourage you to 00:46:43.359 --> 00:46:46.480 register for that event. 00:46:46.920 --> 00:46:52.240 All right, I'm not seeing any more questions. 00:46:57.800 --> 00:47:02.119 All right, with that I am stopping my 00:47:02.119 --> 00:47:05.079 share. I'm going to hang around for a few 00:47:05.079 --> 00:47:07.440 minutes, but thank you all for 00:47:07.440 --> 00:47:11.079 attending, and we'll see you on the next session.