1 00:00:02,000 --> 00:00:03,319 Hello everyone. 2 00:00:04,289 --> 00:00:06,839 We are getting started here on 3 00:00:06,839 --> 00:00:09,880 our August lunch and learn session 4 00:00:09,880 --> 00:00:12,639 presented by Kinney Group's Atlas Customer 5 00:00:12,639 --> 00:00:16,400 Experience team. My name is Alice Devaney. I 6 00:00:16,400 --> 00:00:19,160 am the engineering manager for the Atlas 7 00:00:19,160 --> 00:00:21,960 Customer Experience team, and I'm excited 8 00:00:21,960 --> 00:00:24,800 to be presenting this month's session on 9 00:00:24,800 --> 00:00:28,000 intermediate-level Splunk searching. So 10 00:00:28,000 --> 00:00:30,199 thank you all for attending. I hope you 11 00:00:30,199 --> 00:00:33,000 get some good ideas out of this. 12 00:00:33,000 --> 00:00:35,120 I certainly encourage engagement through 13 00:00:35,120 --> 00:00:37,040 the chat, and I'll have some 14 00:00:37,040 --> 00:00:39,800 information at the end on following up 15 00:00:39,800 --> 00:00:42,239 and speaking with my team directly on 16 00:00:42,239 --> 00:00:45,879 any issues or interests that you have 17 00:00:45,879 --> 00:00:48,000 around these types of concepts that 18 00:00:48,000 --> 00:00:51,520 we're going to cover today. So jumping 19 00:00:51,520 --> 00:00:55,199 into an intermediate-level session. 20 00:00:55,199 --> 00:00:57,960 I do want to say that we have previously 21 00:00:57,960 --> 00:01:02,120 done a basic level searching 22 00:01:02,120 --> 00:01:05,280 session so that we are really 23 00:01:05,280 --> 00:01:07,360 progressing from that, picking up right 24 00:01:07,360 --> 00:01:09,400 where we left off. We've done that 25 00:01:09,400 --> 00:01:10,640 session with quite a few of our 26 00:01:10,640 --> 00:01:12,920 customers individually and highly 27 00:01:12,920 --> 00:01:14,640 recommend if you're interested in doing 28 00:01:14,640 --> 00:01:18,200 that or this session with a larger team, 29 00:01:18,200 --> 00:01:19,920 we're happy to discuss and 30 00:01:19,920 --> 00:01:22,840 coordinate that. So getting started, 31 00:01:22,840 --> 00:01:25,600 we're going to take a look at the final 32 00:01:25,600 --> 00:01:29,000 search from our basic search session. 33 00:01:29,000 --> 00:01:31,159 And we're going to walk through that, 34 00:01:31,159 --> 00:01:34,159 understand some of the concepts, and 35 00:01:34,159 --> 00:01:36,479 then we're going to take a step back, 36 00:01:36,479 --> 00:01:39,479 look a little more generally at SPL 37 00:01:39,479 --> 00:01:41,759 operations and understanding how 38 00:01:41,759 --> 00:01:46,200 different commands apply to data, and 39 00:01:46,200 --> 00:01:49,320 really that next level of understanding 40 00:01:49,320 --> 00:01:51,759 for how you can write more complex 41 00:01:51,759 --> 00:01:54,119 searches and understand really when 42 00:01:54,119 --> 00:01:57,119 to use certain types of commands. And 43 00:01:57,119 --> 00:01:59,560 of course, in the session we're going 44 00:01:59,560 --> 00:02:04,399 to have a series of demos using 45 00:02:04,399 --> 00:02:07,360 a few specific commands, highlighting the 46 00:02:07,360 --> 00:02:10,440 different SPL command types that we 47 00:02:10,440 --> 00:02:12,840 discuss in the second portion and get 48 00:02:12,840 --> 00:02:15,879 to see that on the tutorial data that 49 00:02:15,879 --> 00:02:18,160 you can also use in your environment, 50 00:02:18,160 --> 00:02:20,840 in a test environment very 51 00:02:20,840 --> 00:02:24,200 simply. So I will always encourage 52 00:02:24,200 --> 00:02:27,720 especially with search content that you 53 00:02:27,720 --> 00:02:30,319 look into the additional resource that I 54 00:02:30,319 --> 00:02:34,120 have listed here. The search reference 55 00:02:34,120 --> 00:02:36,440 documentation is one of my favorite 56 00:02:36,440 --> 00:02:38,760 bookmarks that I use frequently in my 57 00:02:38,760 --> 00:02:41,000 own environments and working in customer 58 00:02:41,000 --> 00:02:43,560 environments. It is really the 59 00:02:43,560 --> 00:02:46,000 best quick resource to get information 60 00:02:46,000 --> 00:02:49,560 on syntax and examples of any search 61 00:02:49,560 --> 00:02:51,760 command and is always a great 62 00:02:51,760 --> 00:02:55,000 resource to have. The search manual is a 63 00:02:55,000 --> 00:02:57,080 little bit more conceptual, but as you're 64 00:02:57,080 --> 00:02:59,120 learning more about different types of 65 00:02:59,120 --> 00:03:00,360 search operations, 66 00:03:00,360 --> 00:03:02,440 it's very helpful to be able to review 67 00:03:02,440 --> 00:03:03,500 this documentation 68 00:03:03,500 --> 00:03:05,560 and have reference 69 00:03:05,560 --> 00:03:08,680 material that you can come back to as 70 00:03:08,680 --> 00:03:11,080 you are studying and trying to get 71 00:03:11,080 --> 00:03:13,480 better and writing more complex 72 00:03:13,480 --> 00:03:16,879 search content. I have also linked here 73 00:03:16,879 --> 00:03:18,959 the documentation on how to use the 74 00:03:18,959 --> 00:03:21,799 Splunk tutorial data, so if you've not 75 00:03:21,799 --> 00:03:23,360 done that before, it's a very simple 76 00:03:23,360 --> 00:03:25,920 process, and there are consistently 77 00:03:25,920 --> 00:03:28,280 updated download files that Splunk 78 00:03:28,280 --> 00:03:30,680 provides that you're able to directly 79 00:03:30,680 --> 00:03:33,439 upload into any Splunk environment. So 80 00:03:33,439 --> 00:03:35,560 that's what I'm going to be using today, 81 00:03:35,560 --> 00:03:39,000 and given that you are searching over 82 00:03:39,000 --> 00:03:41,400 appropriate time windows for when you 83 00:03:41,400 --> 00:03:43,920 download the tutorial dataset, these 84 00:03:43,920 --> 00:03:46,519 searches will work on the tutorial 85 00:03:46,519 --> 00:03:48,760 data as well. So highly encourage, after 86 00:03:48,760 --> 00:03:50,879 the fact, if you want to go through 87 00:03:50,879 --> 00:03:53,760 and test out some of the content, 88 00:03:53,760 --> 00:03:56,920 you'll be able to access a recording as 89 00:03:56,920 --> 00:03:59,360 well as if you'd like the slides that 90 00:03:59,360 --> 00:04:00,959 I'm presenting off of today, which I 91 00:04:00,959 --> 00:04:02,280 highly encourage because there are a lot 92 00:04:02,280 --> 00:04:04,799 of useful links in here, reach out to 93 00:04:04,799 --> 00:04:06,760 my team. Again, right at the end of the 94 00:04:06,760 --> 00:04:08,599 slides we'll have that info. 95 00:04:08,599 --> 00:04:13,079 So looking at our overview of basic 96 00:04:13,079 --> 00:04:15,799 search, I just want to cover 97 00:04:15,799 --> 00:04:18,120 conceptually the two categories that 98 00:04:18,120 --> 00:04:21,639 we discuss in that session. And so those 99 00:04:21,639 --> 00:04:24,199 two are the statistical and charting 100 00:04:24,199 --> 00:04:28,479 functions which consist of in those 101 00:04:28,479 --> 00:04:31,479 demos aggregate and time functions. So 102 00:04:31,479 --> 00:04:33,919 aggregate functions are going to be your 103 00:04:33,919 --> 00:04:37,400 commonly used statistical functions 104 00:04:37,400 --> 00:04:40,400 meant for summarization, and then time 105 00:04:40,400 --> 00:04:43,199 functions actually using the 106 00:04:43,199 --> 00:04:46,639 timestamp field underscore time or any 107 00:04:46,639 --> 00:04:48,600 other time that you've extracted from 108 00:04:48,600 --> 00:04:51,759 data and looking at earliest, latest 109 00:04:51,759 --> 00:04:55,000 relative time values in a 110 00:04:55,000 --> 00:04:58,240 summative fashion. And then evaluation 111 00:04:58,240 --> 00:05:02,320 functions are the separate type where 112 00:05:02,320 --> 00:05:04,400 we discuss comparison and conditional 113 00:05:04,400 --> 00:05:07,600 statements so using your if and your 114 00:05:07,600 --> 00:05:10,240 case commands in 115 00:05:10,240 --> 00:05:14,120 evals. Also datetime functions that 116 00:05:14,120 --> 00:05:17,160 apply operations to events uniquely 117 00:05:17,160 --> 00:05:19,759 so not necessarily summarization, but 118 00:05:19,759 --> 00:05:22,280 interacting with the time values 119 00:05:22,280 --> 00:05:24,319 themselves, maybe changing the time 120 00:05:24,319 --> 00:05:27,000 format, and then multivalue evalq 121 00:05:27,000 --> 00:05:29,360 functions, we touch on that very lightly, 122 00:05:29,360 --> 00:05:31,720 and it is more conceptual in basic 123 00:05:31,720 --> 00:05:34,000 search. So today we're going to dive in 124 00:05:34,000 --> 00:05:36,120 as part of our demo and look at 125 00:05:36,120 --> 00:05:39,160 multivalue eval functions later in 126 00:05:39,160 --> 00:05:41,319 the presentation. 127 00:05:41,479 --> 00:05:44,880 So on this slide here I 128 00:05:44,880 --> 00:05:48,800 have highlighted in gray the search 129 00:05:48,800 --> 00:05:52,120 that we end basic search with. And so 130 00:05:52,120 --> 00:05:55,000 that is broken up into three segments 131 00:05:55,000 --> 00:05:57,479 where we have the first line being a 132 00:05:57,479 --> 00:06:00,240 filter to a dataset. This is very 133 00:06:00,240 --> 00:06:03,120 simply how you are sourcing most of your 134 00:06:03,120 --> 00:06:06,319 data in most of your searches in Splunk. 135 00:06:06,319 --> 00:06:08,000 And we always want to be a specific 136 00:06:08,000 --> 00:06:11,000 as possible. You'll most often see the 137 00:06:11,000 --> 00:06:13,039 logical way to do that is by 138 00:06:13,039 --> 00:06:15,680 identifying an index and a source type, 139 00:06:15,680 --> 00:06:18,120 possibly some specific values of given 140 00:06:18,120 --> 00:06:20,199 fields in that data before you start 141 00:06:20,199 --> 00:06:22,720 applying other operations. In our case, we 142 00:06:22,720 --> 00:06:25,199 want to work with a whole dataset, 143 00:06:25,199 --> 00:06:28,880 and then we move into applying our eval 144 00:06:28,880 --> 00:06:30,120 statements. 145 00:06:30,120 --> 00:06:33,080 So in the evals, the purpose of these is 146 00:06:33,080 --> 00:06:36,560 to create some new fields to work with, 147 00:06:36,560 --> 00:06:40,080 and so we have two operations here. 148 00:06:40,080 --> 00:06:42,440 And you can see that on the first line, 149 00:06:42,440 --> 00:06:46,120 we're starting with an error check field. 150 00:06:46,120 --> 00:06:49,160 These are web access logs, so we're 151 00:06:49,160 --> 00:06:52,720 looking at the HTTP status codes as the 152 00:06:52,720 --> 00:06:56,039 status field, and we have a logical 153 00:06:56,039 --> 00:06:57,599 condition here for greater than or equal 154 00:06:57,599 --> 00:07:00,680 to 400, we want to return errors. And so 155 00:07:00,680 --> 00:07:04,120 very simple example, making it as easy 156 00:07:04,120 --> 00:07:05,879 as possible. If you want to get specifics 157 00:07:05,879 --> 00:07:08,720 on your 200s and your 300s, it's the 158 00:07:08,720 --> 00:07:11,639 exact same type of logic to go and apply 159 00:07:11,639 --> 00:07:14,120 likely a case statement to get some 160 00:07:14,120 --> 00:07:17,199 additional conditions and more unique 161 00:07:17,199 --> 00:07:20,520 output in an error check or some sort of 162 00:07:20,520 --> 00:07:23,800 field indicating what you want to 163 00:07:23,800 --> 00:07:25,919 see out of your status code so this case, 164 00:07:25,919 --> 00:07:30,080 simple errors. Or the value of non error 165 00:07:30,080 --> 00:07:32,120 if we have say a 200. 166 00:07:32,120 --> 00:07:35,400 We're also using a time function to 167 00:07:35,400 --> 00:07:39,160 create a second field called day. You 168 00:07:39,160 --> 00:07:41,759 may be familiar with some of the 169 00:07:41,759 --> 00:07:46,360 fields that you get out of by default 170 00:07:46,360 --> 00:07:49,759 for most any events in Splunk and 171 00:07:49,759 --> 00:07:51,759 that they're related to breakdowns of 172 00:07:51,759 --> 00:07:56,000 the time stamps. You have day, month, 173 00:07:56,000 --> 00:07:58,240 and many others. In this case, I want to 174 00:07:58,240 --> 00:08:00,560 get a specific format for day so we use 175 00:08:00,560 --> 00:08:03,479 a strftime function, and we have a 176 00:08:03,479 --> 00:08:07,039 time format variable here on the actual 177 00:08:07,039 --> 00:08:10,280 extracted time stamp for Splunk. So 178 00:08:10,280 --> 00:08:12,039 coming out of the second line, we've 179 00:08:12,039 --> 00:08:14,319 accessed our data, we have created two 180 00:08:14,319 --> 00:08:17,479 new fields to use, and then we are 181 00:08:17,479 --> 00:08:20,960 actually performing charting with a 182 00:08:20,960 --> 00:08:23,680 statistical function, and so that is 183 00:08:23,680 --> 00:08:26,240 using timechart. And we can see here 184 00:08:26,240 --> 00:08:29,159 that we are counting our events that 185 00:08:29,159 --> 00:08:33,479 actually have the error value for our 186 00:08:33,479 --> 00:08:36,000 created error check field. And so I'm 187 00:08:36,000 --> 00:08:39,279 going to pivot over to Splunk here, 188 00:08:39,279 --> 00:08:40,880 and we're going to look at this search, 189 00:08:40,880 --> 00:08:43,440 and I have commented out most of the 190 00:08:43,440 --> 00:08:46,279 logic, we'll step back through it. We 191 00:08:46,279 --> 00:08:49,200 are looking in our web access log events 192 00:08:49,200 --> 00:08:52,800 here, and we want to then apply our 193 00:08:52,800 --> 00:08:58,240 eval. And so by applying the eval, we can 194 00:08:58,240 --> 00:09:01,279 get our error check field that provides 195 00:09:01,279 --> 00:09:03,279 error or non-error. We're seeing that we 196 00:09:03,279 --> 00:09:05,160 have mostly non-error 197 00:09:05,160 --> 00:09:09,680 events. And then we have the day field, 198 00:09:09,680 --> 00:09:11,760 and so day is actually providing the 199 00:09:11,760 --> 00:09:14,440 full name of day for the time stamp for 200 00:09:14,440 --> 00:09:17,800 all these events. So with our timechart, 201 00:09:17,800 --> 00:09:22,200 this is the summarization with a 202 00:09:22,200 --> 00:09:24,160 condition actually that we're spanning 203 00:09:24,160 --> 00:09:27,720 by default over a single day, so this may 204 00:09:27,720 --> 00:09:31,839 not be a very logical use of a split by 205 00:09:31,839 --> 00:09:34,910 day when we are already using a timechart 206 00:09:34,910 --> 00:09:37,079 command that is dividing our 207 00:09:37,079 --> 00:09:41,040 results by the time bin, effectively a 208 00:09:41,040 --> 00:09:46,079 span of one day. But what we can do is 209 00:09:46,079 --> 00:09:50,440 change our split by field to host and 210 00:09:50,440 --> 00:09:52,600 get a little bit more of a reasonable 211 00:09:52,600 --> 00:09:54,720 presentation. We were able to see with 212 00:09:54,720 --> 00:09:57,720 the counts in the individual days not 213 00:09:57,720 --> 00:09:59,600 only split through the time chart, but by 214 00:09:59,600 --> 00:10:02,399 the day field that we only had values 215 00:10:02,399 --> 00:10:04,959 where our matrix matched up for the 216 00:10:04,959 --> 00:10:09,680 actual day. So here we have our hosts 217 00:10:09,680 --> 00:10:12,640 one, two, and three, and then across days 218 00:10:12,640 --> 00:10:15,640 counts of the error events that we 219 00:10:15,640 --> 00:10:20,160 observe. So that is the search that we 220 00:10:20,160 --> 00:10:22,440 end on in basic search. The concepts 221 00:10:22,440 --> 00:10:25,040 there being accessing our data, 222 00:10:25,040 --> 00:10:27,279 searching in a descriptive manner, using 223 00:10:27,279 --> 00:10:29,320 our metadata fields, the index and the 224 00:10:29,320 --> 00:10:32,200 source type, the evaluation functions 225 00:10:32,200 --> 00:10:33,920 where we're creating new fields, 226 00:10:33,920 --> 00:10:37,639 manipulating data, and then we have a 227 00:10:37,639 --> 00:10:40,200 timechart function that is providing 228 00:10:40,200 --> 00:10:42,880 some summarized statistics here based 229 00:10:42,880 --> 00:10:44,480 on a time range. 230 00:10:44,480 --> 00:10:48,680 So we will pivot back, and we're 231 00:10:48,680 --> 00:10:51,399 going to take a step back out of the SPL 232 00:10:51,399 --> 00:10:54,200 for a second just to talk about these 233 00:10:54,200 --> 00:10:56,519 different kinds of search operations 234 00:10:56,519 --> 00:10:59,360 that we just performed. So you'll hear 235 00:10:59,360 --> 00:11:03,079 these terms if you are really kind of 236 00:11:03,079 --> 00:11:06,040 diving deeper into actual operations of 237 00:11:06,040 --> 00:11:09,920 Splunk searching. And you can get very 238 00:11:09,920 --> 00:11:12,560 detailed regarding the optimization of 239 00:11:12,560 --> 00:11:16,279 searches around these types of 240 00:11:16,279 --> 00:11:17,680 commands and the order in which you 241 00:11:17,680 --> 00:11:21,399 choose to execute SPL. Today I'm going to 242 00:11:21,399 --> 00:11:24,240 focus on how these operations actually 243 00:11:24,240 --> 00:11:27,240 apply to the data and helping you to 244 00:11:27,240 --> 00:11:29,320 make better decisions about what 245 00:11:29,320 --> 00:11:32,320 commands are best for the scenario that 246 00:11:32,320 --> 00:11:34,240 you have or the output that you want to 247 00:11:34,240 --> 00:11:37,639 see. And in future sessions, we will 248 00:11:37,639 --> 00:11:39,360 discuss the actual optimization of 249 00:11:39,360 --> 00:11:42,079 searches through this optimal order 250 00:11:42,079 --> 00:11:46,440 of functions and some other means. 251 00:11:46,440 --> 00:11:48,200 But just a caveat there that we're going 252 00:11:48,200 --> 00:11:50,440 to talk pretty specifically today 253 00:11:50,440 --> 00:11:52,839 just about these individually, how 254 00:11:52,839 --> 00:11:54,720 they work with data, and then how you 255 00:11:54,720 --> 00:11:56,569 see them in combination. 256 00:11:56,569 --> 00:11:59,839 So our types of SPL commands, 257 00:11:59,839 --> 00:12:03,160 the top three in bold we'll focus on in 258 00:12:03,160 --> 00:12:06,079 our examples. The first of which is 259 00:12:06,079 --> 00:12:08,223 streaming operations 260 00:12:08,223 --> 00:12:10,760 which are executed on 261 00:12:10,760 --> 00:12:13,079 individual events as they're returned by a 262 00:12:13,079 --> 00:12:15,399 search. So you can think of this like 263 00:12:15,399 --> 00:12:16,990 your evals 264 00:12:16,990 --> 00:12:18,880 that is going to be doing 265 00:12:18,880 --> 00:12:21,440 something to every single event, 266 00:12:21,440 --> 00:12:24,279 modifying fields when they're available. 267 00:12:24,279 --> 00:12:28,399 We do have generating functions. So 268 00:12:28,399 --> 00:12:30,800 generating function are going to be used 269 00:12:30,800 --> 00:12:33,839 situationally where you're sourcing data 270 00:12:33,839 --> 00:12:38,079 from non-indexed datasets, and so you 271 00:12:38,079 --> 00:12:40,839 would see that from either input 272 00:12:40,839 --> 00:12:43,760 lookup commands or maybe tstats, 273 00:12:43,760 --> 00:12:46,120 pulling information from the tsidx 274 00:12:46,120 --> 00:12:48,920 files, and so generating the 275 00:12:48,920 --> 00:12:51,079 statistical output based on the data 276 00:12:51,079 --> 00:12:55,040 available there. Transforming commands 277 00:12:55,040 --> 00:12:58,560 you will see as often as streaming 278 00:12:58,560 --> 00:13:00,600 commands, generally speaking, and more 279 00:13:00,600 --> 00:13:02,800 often than generating commands where 280 00:13:02,800 --> 00:13:05,399 transforming is intended to order 281 00:13:05,399 --> 00:13:08,519 results into a data table. And I often 282 00:13:08,519 --> 00:13:11,320 think of this much like how we discuss 283 00:13:11,320 --> 00:13:13,639 the statistical functions in basic 284 00:13:13,639 --> 00:13:17,160 search as summarization functions where 285 00:13:17,160 --> 00:13:19,519 you're looking to condense your overall 286 00:13:19,519 --> 00:13:22,680 dataset into really manageable 287 00:13:22,680 --> 00:13:24,880 consumable results. So these 288 00:13:24,880 --> 00:13:28,320 operations that apply that summarization 289 00:13:28,320 --> 00:13:31,720 are transforming. We do have two 290 00:13:31,720 --> 00:13:35,600 additional types of SPL commands, the 291 00:13:35,600 --> 00:13:39,480 first is orchestrating. You can read 292 00:13:39,480 --> 00:13:41,680 about these, I will not discuss in great 293 00:13:41,680 --> 00:13:45,199 detail. They are used to manipulate 294 00:13:45,199 --> 00:13:48,639 how searches are actually processed or 295 00:13:48,639 --> 00:13:50,800 or how commands are processed. And 296 00:13:50,800 --> 00:13:54,079 they don't directly affect the results 297 00:13:54,079 --> 00:13:56,079 in a search, how we think about say 298 00:13:56,079 --> 00:13:59,839 applying a stats or an eval to a data 299 00:13:59,839 --> 00:14:02,320 set. So if you're interested, 300 00:14:02,320 --> 00:14:04,399 definitely check it out. Linked 301 00:14:04,399 --> 00:14:07,418 documentation has details there. 302 00:14:07,418 --> 00:14:11,120 Dataset processing is seen much more often, 303 00:14:11,120 --> 00:14:15,000 and you do have some conditional 304 00:14:15,000 --> 00:14:18,680 scenarios where commands can act as 305 00:14:18,680 --> 00:14:21,759 dataset processing, so the 306 00:14:21,759 --> 00:14:23,959 distinction for dataset processing is 307 00:14:23,959 --> 00:14:26,360 going to be that you are operating in 308 00:14:26,360 --> 00:14:29,800 bulk on a single completed dataset at 309 00:14:29,800 --> 00:14:32,240 one time. So we'll look at an 310 00:14:32,240 --> 00:14:34,260 example of that. 311 00:14:34,260 --> 00:14:36,600 I want to pivot back to our main 312 00:14:36,600 --> 00:14:38,360 three that we're going to be focusing on, 313 00:14:38,360 --> 00:14:39,839 and I have mentioned some of these 314 00:14:39,839 --> 00:14:43,800 examples already. The eval functions 315 00:14:43,800 --> 00:14:45,880 that we've been talking about so far are 316 00:14:45,880 --> 00:14:47,920 perfect examples of our streaming 317 00:14:47,920 --> 00:14:51,440 commands. So where we are creating new 318 00:14:51,440 --> 00:14:55,600 fields for each entry or log event, 319 00:14:55,600 --> 00:14:59,399 where we are modifying values for all of 320 00:14:59,399 --> 00:15:01,920 the results that are available. That 321 00:15:01,920 --> 00:15:05,279 is where we are streaming with the 322 00:15:05,279 --> 00:15:08,560 search functions. Inputlookup is 323 00:15:08,560 --> 00:15:09,959 possibly one of the most common 324 00:15:09,959 --> 00:15:12,399 generating commands that I see 325 00:15:12,399 --> 00:15:15,199 because someone is intending to 326 00:15:15,199 --> 00:15:18,720 source a dataset stored in a CSV file 327 00:15:18,720 --> 00:15:21,480 or a KV store collection, and you're 328 00:15:21,480 --> 00:15:23,720 able to bring that back as a report and 329 00:15:23,720 --> 00:15:27,719 use that logic in your queries. 330 00:15:27,719 --> 00:15:29,639 So that is 331 00:15:29,639 --> 00:15:33,399 not requiring the index data or 332 00:15:33,399 --> 00:15:35,560 any index data to actually return the 333 00:15:35,560 --> 00:15:38,490 results that you want to see. 334 00:15:38,914 --> 00:15:41,319 And we've talked about stats, very 335 00:15:41,319 --> 00:15:43,600 generally speaking, with a lot of 336 00:15:43,600 --> 00:15:46,440 unique functions you can apply there 337 00:15:46,440 --> 00:15:49,560 where this is going to provide a tabular 338 00:15:49,560 --> 00:15:53,560 output. And it is serving that purpose of 339 00:15:53,560 --> 00:15:54,800 summarization, so we're really 340 00:15:54,800 --> 00:15:57,560 reformatting the data into that 341 00:15:57,560 --> 00:16:00,920 tabular report. 342 00:16:02,000 --> 00:16:06,519 So we see in this example search here 343 00:16:06,519 --> 00:16:09,000 that we are often combining these 344 00:16:09,000 --> 00:16:12,360 different types of search operations. So 345 00:16:12,360 --> 00:16:15,240 in this example that we have, I have 346 00:16:15,240 --> 00:16:19,319 data that already exists in a CSV file. 347 00:16:19,319 --> 00:16:22,839 We are applying a streaming command here, 348 00:16:22,839 --> 00:16:26,000 where, evaluating each line to see if 349 00:16:26,000 --> 00:16:28,399 we match a condition, and then returning 350 00:16:28,399 --> 00:16:29,639 the results 351 00:16:29,639 --> 00:16:32,240 based on that evaluation. And then we're 352 00:16:32,240 --> 00:16:34,199 applying a transforming command at the 353 00:16:34,199 --> 00:16:36,639 end which is that stats summarization, 354 00:16:36,639 --> 00:16:40,480 getting the maximum values for the 355 00:16:40,480 --> 00:16:44,319 count of errors and the host that is 356 00:16:44,319 --> 00:16:47,600 associated with that. So let's pivot over 357 00:16:47,600 --> 00:16:52,079 to Splunk and we'll take a look at that example. 358 00:16:54,160 --> 00:16:56,319 So I'm just going to grab my 359 00:16:56,319 --> 00:16:59,440 search here and I precommented out 360 00:16:59,440 --> 00:17:04,212 the specific lines following inputlookup 361 00:17:04,212 --> 00:17:06,079 just to see that this generating 362 00:17:06,079 --> 00:17:07,799 command here is not looking for any 363 00:17:07,799 --> 00:17:10,160 specific index data. We're pulling 364 00:17:10,160 --> 00:17:13,240 directly the results that I have in a 365 00:17:13,240 --> 00:17:17,720 CSV file here into this output, and so 366 00:17:17,720 --> 00:17:20,520 we have a count of errors observed 367 00:17:20,520 --> 00:17:25,439 across multiple hosts. Our where command 368 00:17:25,439 --> 00:17:28,520 you might think is reformatting data 369 00:17:28,520 --> 00:17:31,000 in the sense it is transforming the 370 00:17:31,000 --> 00:17:34,160 results, but the evaluation of a where 371 00:17:34,160 --> 00:17:37,320 function does apply effectively to every 372 00:17:37,320 --> 00:17:41,760 event that is returned. So it is a 373 00:17:41,760 --> 00:17:43,960 streaming command that is going to 374 00:17:43,960 --> 00:17:46,559 filter down our result set based on our 375 00:17:46,559 --> 00:17:49,120 condition that the error count is less 376 00:17:49,120 --> 00:17:50,919 than 200. 377 00:17:50,919 --> 00:17:54,760 So the following line is our 378 00:17:54,760 --> 00:17:57,320 transforming command where we have two 379 00:17:57,320 --> 00:18:02,240 results left 187 for host 3. We want 380 00:18:02,240 --> 00:18:06,039 to see our maximum values here of 187 on 381 00:18:06,039 --> 00:18:09,960 host 3. So our scenario here has really 382 00:18:09,960 --> 00:18:13,400 covered where you may have hosts 383 00:18:13,400 --> 00:18:15,960 that are trending toward a negative 384 00:18:15,960 --> 00:18:19,280 state. You're aware that the second 385 00:18:19,280 --> 00:18:22,039 host had already exceeded its 386 00:18:22,039 --> 00:18:25,360 threshold value for errors, but host 3 387 00:18:25,360 --> 00:18:27,440 also appears to be trending toward this 388 00:18:27,440 --> 00:18:30,159 threshold. So being able to combine 389 00:18:30,159 --> 00:18:33,000 these types of commands, understand 390 00:18:33,000 --> 00:18:35,240 the logical condition that you're 391 00:18:35,240 --> 00:18:37,679 searching for, and then also providing 392 00:18:37,679 --> 00:18:40,840 that consumable output. So combining 393 00:18:40,840 --> 00:18:44,480 all three of our types of commands here. 394 00:18:46,166 --> 00:18:49,440 So I'm going to jump to an SPL 395 00:18:49,440 --> 00:18:53,159 demo, and as I go through these different 396 00:18:53,159 --> 00:18:55,840 commands, I'm going to be referencing 397 00:18:55,840 --> 00:18:58,360 back to the different command types that 398 00:18:58,360 --> 00:19:00,080 we're working with. I'm going to 399 00:19:00,080 --> 00:19:02,360 introduce in a lot of these searches 400 00:19:02,360 --> 00:19:04,679 a lot of small commands that I won't 401 00:19:04,679 --> 00:19:07,000 talk about in great detail and that 402 00:19:07,000 --> 00:19:09,360 really is the purpose of using your 403 00:19:09,360 --> 00:19:11,640 search manual, using your search 404 00:19:11,640 --> 00:19:14,760 reference documentation. So I will 405 00:19:14,760 --> 00:19:17,400 glance over the use case, talk about 406 00:19:17,400 --> 00:19:19,559 how it's meant to be applied, and then 407 00:19:19,559 --> 00:19:22,200 using in your own scenarios where you 408 00:19:22,200 --> 00:19:24,400 have problem you need to solve, 409 00:19:24,400 --> 00:19:26,880 referencing the docs to find out where 410 00:19:26,880 --> 00:19:29,960 you can apply similar functions to 411 00:19:29,960 --> 00:19:32,559 what we observe in the the demonstration here. 412 00:19:32,559 --> 00:19:36,760 So the first command I'm going to 413 00:19:36,760 --> 00:19:40,880 focus on is the rex command. So rex is a 414 00:19:40,880 --> 00:19:43,480 streaming command that you often see 415 00:19:43,480 --> 00:19:46,559 applied to datasets that do not fully 416 00:19:46,559 --> 00:19:49,720 have data extracted in the format that 417 00:19:49,720 --> 00:19:53,159 you want to be using in your 418 00:19:53,159 --> 00:19:56,760 reporting or in your logic. And so 419 00:19:56,760 --> 00:20:00,120 this could very well be handled actually 420 00:20:00,120 --> 00:20:03,440 in the configuration of props and 421 00:20:03,440 --> 00:20:06,080 transforms and extracting fields at the 422 00:20:06,080 --> 00:20:08,480 right times and indexing data, but as 423 00:20:08,480 --> 00:20:10,280 your bringing new data sources, you need 424 00:20:10,280 --> 00:20:12,480 to understand what's available for use 425 00:20:12,480 --> 00:20:14,360 in Splunk. A lot of times you'll find 426 00:20:14,360 --> 00:20:16,840 yourself needing to extract new fields 427 00:20:16,840 --> 00:20:19,200 in line in your searches and be able 428 00:20:19,200 --> 00:20:22,080 to use those in your search logic. Rex 429 00:20:22,080 --> 00:20:28,039 also has a sed mode that I also see 430 00:20:28,039 --> 00:20:31,600 testing done for masking of data in line 431 00:20:31,600 --> 00:20:34,080 prior to actually putting that into 432 00:20:34,080 --> 00:20:36,356 indexing configurations. 433 00:20:36,356 --> 00:20:38,000 So rex you would 434 00:20:38,000 --> 00:20:41,200 generally see used when you don't 435 00:20:41,200 --> 00:20:43,039 have those fields available, you need to 436 00:20:43,039 --> 00:20:45,640 use them at that time. And then we're 437 00:20:45,640 --> 00:20:47,120 going to take a look at an example of 438 00:20:47,120 --> 00:20:49,640 masking data as well to test your 439 00:20:49,640 --> 00:20:53,480 syntax for a sed style replace in 440 00:20:53,480 --> 00:21:00,600 config files. So we will jump back over. 441 00:21:04,679 --> 00:21:06,880 So I'm going to start with a search on 442 00:21:06,880 --> 00:21:10,120 an index source type, my tutorial data. 443 00:21:10,120 --> 00:21:13,159 And then this is actual Linux secure 444 00:21:13,159 --> 00:21:16,159 logging so these are going to be OS 445 00:21:16,159 --> 00:21:19,039 security logs, and we're looking at all 446 00:21:19,039 --> 00:21:21,039 of our web hosts that we've been 447 00:21:21,039 --> 00:21:22,850 focusing on previously. 448 00:21:22,850 --> 00:21:25,000 In our events, you can see 449 00:21:25,000 --> 00:21:29,039 that we have first here an event that 450 00:21:29,039 --> 00:21:31,870 has failed password for invalid user inet, 451 00:21:31,870 --> 00:21:34,320 We're provided a source IP, a source 452 00:21:34,320 --> 00:21:36,559 port, and we go to see the fields that 453 00:21:36,559 --> 00:21:38,919 are extracted and that's not 454 00:21:38,919 --> 00:21:41,919 being done for us automatically. So just 455 00:21:41,919 --> 00:21:43,880 to start testing our logic to see if we 456 00:21:43,880 --> 00:21:46,799 can get the results we want to see, 457 00:21:46,799 --> 00:21:49,760 we're going to use the rex command. And 458 00:21:49,760 --> 00:21:53,240 in doing so, we are applying this 459 00:21:53,240 --> 00:21:55,440 operation across every event, again, a 460 00:21:55,440 --> 00:21:59,600 streaming command. We are looking at the 461 00:21:59,600 --> 00:22:01,279 raw field, so we're actually looking at 462 00:22:01,279 --> 00:22:04,679 the raw text of each of these log events. 463 00:22:04,679 --> 00:22:07,480 And then the rex syntax is simply to 464 00:22:07,480 --> 00:22:11,960 provide in double quotes a regex 465 00:22:11,960 --> 00:22:14,840 match, and we're using named groups for 466 00:22:14,840 --> 00:22:17,440 field extractions. So for every single 467 00:22:17,440 --> 00:22:19,440 event that we see failed password for 468 00:22:19,440 --> 00:22:22,919 invalid user, we are actually extracting 469 00:22:22,919 --> 00:22:26,400 a user field, the source IP field, and the 470 00:22:26,400 --> 00:22:28,799 source port field. For the sake of 471 00:22:28,799 --> 00:22:30,880 simplicity, I tried to keep the regex simple. 472 00:22:30,880 --> 00:22:33,760 You can make this as complex as you need 473 00:22:33,760 --> 00:22:37,679 to for your needs, for your data. And 474 00:22:37,679 --> 00:22:40,960 so in our extracted fields, I've 475 00:22:40,960 --> 00:22:42,840 actually pre-selected these so we can 476 00:22:42,840 --> 00:22:46,240 see our user is now available, and this 477 00:22:46,240 --> 00:22:50,039 applies to the events where the regex was 478 00:22:50,039 --> 00:22:53,159 actually valid and matching on the 479 00:22:53,159 --> 00:22:57,440 failed password for invalid user, etc string. 480 00:22:57,440 --> 00:23:00,120 So now that we have our fields 481 00:23:00,120 --> 00:23:03,799 extracted, we can actually use these. And 482 00:23:03,799 --> 00:23:04,629 we want 483 00:23:04,629 --> 00:23:09,400 to do a stats count as failed logins, so 484 00:23:09,400 --> 00:23:13,400 anytime you see an operation as and 485 00:23:13,400 --> 00:23:16,640 then a unique name, just a rename 486 00:23:16,640 --> 00:23:19,080 through the transformation function, 487 00:23:19,080 --> 00:23:21,480 easier way to actually keep 488 00:23:21,480 --> 00:23:23,480 consistency with referencing your 489 00:23:23,480 --> 00:23:26,760 fields as well as not have to rename 490 00:23:26,760 --> 00:23:29,919 later on with some additional- in this 491 00:23:29,919 --> 00:23:31,679 case, you'd have to reference the name 492 00:23:31,679 --> 00:23:34,520 distinct count so just a way to keep 493 00:23:34,520 --> 00:23:38,320 things clean and easy to use in further 494 00:23:38,320 --> 00:23:42,159 lines of SPL. So we are counting our 495 00:23:42,159 --> 00:23:43,919 failed logins, we're looking at the 496 00:23:43,919 --> 00:23:47,840 distinct count of the source IP values 497 00:23:47,840 --> 00:23:50,000 that we have, and then we're splitting 498 00:23:50,000 --> 00:23:52,960 that by the host and the user. So you can 499 00:23:52,960 --> 00:23:55,720 see here, this tutorial data is 500 00:23:55,720 --> 00:23:57,880 actually pretty flat across most of the 501 00:23:57,880 --> 00:24:00,120 sources so we're not going to have 502 00:24:00,120 --> 00:24:04,679 any outliers or spikes in our stats here, 503 00:24:04,679 --> 00:24:07,960 but you can see the resulting presentation. 504 00:24:08,960 --> 00:24:11,440 In line four, we do have a 505 00:24:11,440 --> 00:24:14,840 sort command, and this is an example of a 506 00:24:14,840 --> 00:24:17,520 dataset processing command where we are 507 00:24:17,520 --> 00:24:20,400 actually evaluating a full completed 508 00:24:20,400 --> 00:24:23,640 dataset and reordering it. Given the 509 00:24:23,640 --> 00:24:26,000 logic here, we want to descend on these 510 00:24:26,000 --> 00:24:29,000 numeric values. So keep mind as you're 511 00:24:29,000 --> 00:24:31,200 operating on different fields, it's going 512 00:24:31,200 --> 00:24:33,799 to be the same sort of either basic 513 00:24:33,799 --> 00:24:37,159 numeric or the lexicographical ordering 514 00:24:37,159 --> 00:24:40,360 that you typically see in Splunk. 515 00:24:40,840 --> 00:24:45,720 So we do have a second example 516 00:24:45,720 --> 00:24:49,200 with the sed style replace. 517 00:24:54,240 --> 00:24:58,640 So you can see in my events here 518 00:24:58,640 --> 00:25:01,640 we are searching the tutorial and 519 00:25:01,640 --> 00:25:05,039 vendor sales index and source type. And 520 00:25:05,039 --> 00:25:06,720 I've gone ahead and applied one 521 00:25:06,720 --> 00:25:09,399 operation, and this is going to be a 522 00:25:09,399 --> 00:25:11,880 helpful operation to understand really 523 00:25:11,880 --> 00:25:14,679 what we are replacing and how to get 524 00:25:14,679 --> 00:25:18,159 consistent operation on these fields. 525 00:25:18,159 --> 00:25:20,279 So in this case, we are actually creating 526 00:25:20,279 --> 00:25:23,559 an ID length field where we are going to 527 00:25:23,559 --> 00:25:26,960 choose to mask the value of account ID 528 00:25:26,960 --> 00:25:29,120 in our rex command. We want to know that 529 00:25:29,120 --> 00:25:31,679 that's a consistent number of characters 530 00:25:31,679 --> 00:25:33,799 through all of our data. It's very 531 00:25:33,799 --> 00:25:37,080 simple to spot check, but just to be 532 00:25:37,080 --> 00:25:39,440 certain, we want to apply this to all of 533 00:25:39,440 --> 00:25:42,760 our data, in this case, streaming command 534 00:25:42,760 --> 00:25:45,520 through this eval. We 535 00:25:45,520 --> 00:25:49,279 are changing the type of the data 536 00:25:49,279 --> 00:25:51,919 because account ID is actually numeric. 537 00:25:51,919 --> 00:25:53,720 We're making that a string value so that 538 00:25:53,720 --> 00:25:56,720 we can look at the length. These are 539 00:25:56,720 --> 00:25:58,840 common functions in any programming 540 00:25:58,840 --> 00:26:01,559 languages, and so the syntax here in 541 00:26:01,559 --> 00:26:04,039 SPL is quite simple. Just to be able 542 00:26:04,039 --> 00:26:06,520 to get that contextual feel, we 543 00:26:06,520 --> 00:26:09,399 understand we have 16 characters for 544 00:26:09,399 --> 00:26:12,480 100% of our events in the account IDs. 545 00:26:12,480 --> 00:26:17,000 So actually applying our rex command, 546 00:26:17,000 --> 00:26:20,760 we are going to now specify a unique 547 00:26:20,760 --> 00:26:23,919 field, not just underscore raw. We are 548 00:26:23,919 --> 00:26:27,159 applying the sed mode, and this is a 549 00:26:27,159 --> 00:26:30,799 sed syntax replacement looking 550 00:26:30,799 --> 00:26:33,559 for the- it's a capture group for the 551 00:26:33,559 --> 00:26:35,880 first 12 digits. And then we're 552 00:26:35,880 --> 00:26:39,240 replacing that with a series of 12 X's. 553 00:26:39,240 --> 00:26:42,039 So you can see in our first event, the 554 00:26:42,039 --> 00:26:45,320 account ID is now masked, we only have 555 00:26:45,320 --> 00:26:48,520 the remaining four digits to be able to 556 00:26:48,520 --> 00:26:52,320 identify that. And so if our data was 557 00:26:52,320 --> 00:26:55,360 indexed and is appropriately done so 558 00:26:55,360 --> 00:26:58,039 in Splunk with the full account IDs, but 559 00:26:58,039 --> 00:27:00,360 for the sake of reporting we want to 560 00:27:00,360 --> 00:27:04,840 be able to mask that for the audience, 561 00:27:04,840 --> 00:27:07,799 then we're able to use the sed 562 00:27:07,799 --> 00:27:11,919 replace. And then to finalize a report, 563 00:27:11,919 --> 00:27:13,880 this is just an example of the top 564 00:27:13,880 --> 00:27:16,399 command which does a few operations 565 00:27:16,399 --> 00:27:18,120 together and makes for a good 566 00:27:18,120 --> 00:27:20,720 shorthand report, taking all the 567 00:27:20,720 --> 00:27:24,080 unique values of the provided field, 568 00:27:24,080 --> 00:27:26,480 giving you a count of those values, and 569 00:27:26,480 --> 00:27:29,000 then showing the percentage 570 00:27:29,000 --> 00:27:31,679 of the makeup for the total dataset 571 00:27:31,679 --> 00:27:34,520 that that unique value accounts for. So 572 00:27:34,520 --> 00:27:37,399 again, pretty flat in this tutorial data 573 00:27:37,399 --> 00:27:40,200 in seeing a very consistent 574 00:27:40,200 --> 00:27:45,159 .03% across these different account IDs. 575 00:27:46,679 --> 00:27:51,080 So we have looked at a few examples 576 00:27:51,080 --> 00:27:54,640 with the rex command, and that is 577 00:27:54,640 --> 00:27:57,039 again, streaming. We're going to look at 578 00:27:57,039 --> 00:27:59,120 another streaming command 579 00:27:59,120 --> 00:28:02,399 which is going to be a set of 580 00:28:02,399 --> 00:28:07,200 multivalue eval functions. And so again, 581 00:28:07,200 --> 00:28:09,559 if you're to have a bookmark for search 582 00:28:09,559 --> 00:28:12,320 documentation, multivalue eval functions 583 00:28:12,320 --> 00:28:14,559 are a great one to have because when 584 00:28:14,559 --> 00:28:17,240 you encounter these, it really takes 585 00:28:17,240 --> 00:28:19,960 some time to figure out how to actually 586 00:28:19,960 --> 00:28:25,960 operate on data. And so the 587 00:28:25,960 --> 00:28:29,559 multivalue functions are really just 588 00:28:29,559 --> 00:28:31,799 a collection that depending on your use 589 00:28:31,799 --> 00:28:34,679 case, you're able to determine the 590 00:28:34,679 --> 00:28:39,080 best to apply. You see it often used 591 00:28:39,080 --> 00:28:42,840 with JSON and XML so data formats 592 00:28:42,840 --> 00:28:44,880 that are actually naturally going to 593 00:28:44,880 --> 00:28:47,360 provide a multivalue field where you 594 00:28:47,360 --> 00:28:50,480 have repeated tags or keys across 595 00:28:50,480 --> 00:28:54,320 unique events as they're extracted. 596 00:28:54,320 --> 00:28:56,360 And you often see a lot of times in 597 00:28:56,360 --> 00:28:58,480 Windows event logs, you actually have 598 00:28:58,480 --> 00:29:01,360 repeated key values where your values 599 00:29:01,360 --> 00:29:02,960 are different and the position in the 600 00:29:02,960 --> 00:29:05,200 event is actually specific to a 601 00:29:05,200 --> 00:29:08,840 condition, so you may have a need 602 00:29:08,840 --> 00:29:11,440 for extraction or interaction with one 603 00:29:11,440 --> 00:29:14,399 of those unique values to actually 604 00:29:14,399 --> 00:29:18,600 get a reasonable outcome from your data. 605 00:29:18,600 --> 00:29:22,799 And so we're going to use 606 00:29:22,799 --> 00:29:25,960 multivalue eval functions when we 607 00:29:25,960 --> 00:29:28,679 have a change we want make to the 608 00:29:28,679 --> 00:29:31,880 presentation of data and we're able 609 00:29:31,880 --> 00:29:34,880 to do so with multivalue fields. This I 610 00:29:34,880 --> 00:29:36,720 would say often occurs when you have 611 00:29:36,720 --> 00:29:39,960 multivalue data and then you want to 612 00:29:39,960 --> 00:29:43,080 be able to change the format of the 613 00:29:43,080 --> 00:29:45,640 multivalue fields there. And then 614 00:29:45,640 --> 00:29:46,960 we're also going to look at a quick 615 00:29:46,960 --> 00:29:51,279 example of actually using multivalue 616 00:29:51,279 --> 00:29:54,880 evaluation as a logical condition. 617 00:29:54,880 --> 00:30:00,039 So the first example. 618 00:30:03,320 --> 00:30:05,679 We're going to start with a 619 00:30:05,679 --> 00:30:08,720 simple table looking at our web access 620 00:30:08,720 --> 00:30:11,240 logs, and so we're just going to pull 621 00:30:11,240 --> 00:30:14,880 in our status and referer domain fields. 622 00:30:14,880 --> 00:30:18,440 And so you can see we've got a 623 00:30:18,440 --> 00:30:23,000 HTTP status code, and we've got the 624 00:30:23,000 --> 00:30:26,120 format of a protocol subdomain 625 00:30:26,120 --> 00:30:29,519 TLD. And our scenario here is that for a 626 00:30:29,519 --> 00:30:31,559 simplicity of reporting, we just want 627 00:30:31,559 --> 00:30:33,760 to work with this referer domain field 628 00:30:33,760 --> 00:30:38,320 and be able to simplify that. So in 629 00:30:38,320 --> 00:30:41,799 actually splitting out the field in this 630 00:30:41,799 --> 00:30:44,880 case, split referer domain, and then 631 00:30:44,880 --> 00:30:47,720 choosing the period character as our 632 00:30:47,720 --> 00:30:50,399 point to split the data. We're creating a 633 00:30:50,399 --> 00:30:52,919 multivalue from what was previously 634 00:30:52,919 --> 00:30:57,200 just a single value field. And using 635 00:30:57,200 --> 00:31:01,600 this, we can actually create a new field 636 00:31:01,600 --> 00:31:06,080 by using the index of a multivalue field, 637 00:31:06,080 --> 00:31:08,039 and in this case, we're looking at 638 00:31:08,039 --> 00:31:10,740 index 012. 639 00:31:10,740 --> 00:31:13,279 The multivalue index function allows 640 00:31:13,279 --> 00:31:15,799 us to target a specific field and then 641 00:31:15,799 --> 00:31:18,559 choose a starting and ending index to 642 00:31:18,559 --> 00:31:21,320 extract given values. There are a number 643 00:31:21,320 --> 00:31:23,320 of ways to do this. In our case here 644 00:31:23,320 --> 00:31:25,039 where we have three entries, it's quite 645 00:31:25,039 --> 00:31:26,639 simple just to give that start and end 646 00:31:26,639 --> 00:31:28,639 of range as the 647 00:31:28,639 --> 00:31:29,841 two entries 648 00:31:29,841 --> 00:31:35,360 apart. So as we are working to recreate 649 00:31:35,360 --> 00:31:39,200 our domain, and so that is just applying 650 00:31:39,200 --> 00:31:41,720 for this new domain field, we have 651 00:31:41,720 --> 00:31:44,200 buttercupgames.com in what was 652 00:31:44,200 --> 00:31:47,686 previously the HTTP www.buttercup 653 00:31:47,686 --> 00:31:51,440 games.com. We can now use those fields 654 00:31:51,440 --> 00:31:54,720 in a transformation function. In this 655 00:31:54,720 --> 00:31:58,039 case, simple stats count by status in 656 00:31:58,039 --> 00:32:00,200 the domain. 657 00:32:02,600 --> 00:32:06,960 So I do want to look at another 658 00:32:06,960 --> 00:32:10,240 example here that is similar, but 659 00:32:10,240 --> 00:32:13,639 we're going to use a multivalue function 660 00:32:13,639 --> 00:32:16,919 to actually test a condition. And so I'm 661 00:32:16,919 --> 00:32:18,399 going to, 662 00:32:18,399 --> 00:32:21,639 in this case, be searching the same 663 00:32:21,639 --> 00:32:24,240 data. We're going to start with a stats 664 00:32:24,240 --> 00:32:28,639 command, and so a stats count as well as 665 00:32:28,639 --> 00:32:32,039 a values of status. And so the values 666 00:32:32,039 --> 00:32:33,360 function is going to provide all the 667 00:32:33,360 --> 00:32:37,480 unique values of a given field based 668 00:32:37,480 --> 00:32:41,840 on the split by. And so that produces 669 00:32:41,840 --> 00:32:44,960 a multivalue field here in the case of 670 00:32:44,960 --> 00:32:47,279 status. We have quite a few events 671 00:32:47,279 --> 00:32:50,799 that have multiple status codes, and as 672 00:32:50,799 --> 00:32:52,960 we're interested in pulling those events 673 00:32:52,960 --> 00:32:57,480 out, we can use an mvcount function to 674 00:32:57,480 --> 00:33:01,200 evaluate and filter our dataset to 675 00:33:01,200 --> 00:33:04,240 those specific events. So a very simple 676 00:33:04,240 --> 00:33:07,200 operation here, you're just looking at what has 677 00:33:07,200 --> 00:33:10,240 the- what has more than a single value 678 00:33:10,240 --> 00:33:13,399 for status, but very useful as you're 679 00:33:13,399 --> 00:33:15,919 applying this in reporting especially in 680 00:33:15,919 --> 00:33:19,000 combination with others and with more 681 00:33:19,000 --> 00:33:22,639 complex conditions. 682 00:33:22,639 --> 00:33:28,200 So that is our set of multivalue 683 00:33:28,200 --> 00:33:32,519 eval functions there as streaming commands. 684 00:33:34,240 --> 00:33:38,279 So for a final section of 685 00:33:38,279 --> 00:33:42,000 the demo, I want to talk about a concept 686 00:33:42,000 --> 00:33:44,720 that is not so much a set of functions, 687 00:33:44,720 --> 00:33:47,960 but really enables more complex 688 00:33:47,960 --> 00:33:50,159 and interesting searching and can allow 689 00:33:50,159 --> 00:33:52,799 us to use a few different types of 690 00:33:52,799 --> 00:33:57,240 commands in our SPL. And so the concept of 691 00:33:57,240 --> 00:34:00,200 subsearching for both filtering and 692 00:34:00,200 --> 00:34:04,279 enrichment is taking secondary search 693 00:34:04,279 --> 00:34:06,960 results, and we're using that to 694 00:34:06,960 --> 00:34:10,659 affect a primary search. So a subsearch 695 00:34:10,659 --> 00:34:12,200 will be executed, the results 696 00:34:12,200 --> 00:34:15,079 returned, and depending on how it's used, 697 00:34:15,079 --> 00:34:17,760 this is going to be processed in the 698 00:34:17,760 --> 00:34:21,599 original search, and that is going to- 699 00:34:21,599 --> 00:34:24,359 We'll look at an example that it is 700 00:34:24,359 --> 00:34:27,399 filtering. So based on the results, we get 701 00:34:27,399 --> 00:34:31,240 a effectively a value equals X or value 702 00:34:31,240 --> 00:34:34,320 equals y for one of our fields that 703 00:34:34,320 --> 00:34:37,159 we're looking at in the subsearch. 704 00:34:37,159 --> 00:34:39,320 And then we're also going to look at an 705 00:34:39,320 --> 00:34:42,399 enrichment example, so you see this often 706 00:34:42,399 --> 00:34:45,760 when you have a dataset maybe saved 707 00:34:45,760 --> 00:34:48,480 in a lookup table or you just have a 708 00:34:48,480 --> 00:34:50,079 simple reference where you want to bring 709 00:34:50,079 --> 00:34:52,879 in more context, maybe descriptions of 710 00:34:52,879 --> 00:34:54,560 event codes, things like 711 00:34:54,560 --> 00:34:59,640 that. So in that case, 712 00:35:02,160 --> 00:35:05,440 we'll look at the first command here. Now, 713 00:35:05,440 --> 00:35:08,160 I'm going to run my search, and we're 714 00:35:08,160 --> 00:35:12,119 going to pivot over to a subsearch 715 00:35:12,119 --> 00:35:14,740 tab here. And so you can see our subsearch 716 00:35:14,740 --> 00:35:19,720 looking at the secure logs. 717 00:35:19,720 --> 00:35:21,880 We are actually just pulling out the 718 00:35:21,880 --> 00:35:24,359 search to see what the results are or 719 00:35:24,359 --> 00:35:26,079 what's going to be returned from that 720 00:35:26,079 --> 00:35:28,839 subsearch. So we're applying the same 721 00:35:28,839 --> 00:35:31,200 rex that we had before to extract our 722 00:35:31,200 --> 00:35:33,720 fields. We're applying a where, a streaming 723 00:35:33,720 --> 00:35:35,920 command looking for anything that's not 724 00:35:35,920 --> 00:35:38,599 null for user. We observed that we had 725 00:35:38,599 --> 00:35:40,920 about 60% of our events that were going 726 00:35:40,920 --> 00:35:43,359 to be null based on not having a user 727 00:35:43,359 --> 00:35:46,970 field, and so looking at that total dataset, 728 00:35:46,970 --> 00:35:50,280 we're just going to count by our 729 00:35:50,280 --> 00:35:53,839 source IP. And this is often a quick way 730 00:35:53,839 --> 00:35:56,839 to really just get a list of unique 731 00:35:56,839 --> 00:35:59,880 values of any given field. And then 732 00:35:59,880 --> 00:36:03,119 operating on that to return just the 733 00:36:03,119 --> 00:36:05,079 the list of values, few different ways to 734 00:36:05,079 --> 00:36:08,800 do that, I see stats count pretty often. 735 00:36:08,800 --> 00:36:10,599 And in this case, we're actually tabling 736 00:36:10,599 --> 00:36:13,960 out just keeping our source IP field and 737 00:36:13,960 --> 00:36:16,800 renaming it to client IP, so the resulting 738 00:36:16,800 --> 00:36:20,560 dataset is a single column table 739 00:36:20,560 --> 00:36:21,440 with 740 00:36:21,440 --> 00:36:26,319 182 results, and the field name is client 741 00:36:26,319 --> 00:36:29,880 IP. So when returned to the original 742 00:36:29,880 --> 00:36:32,119 search, we're running this as a sub 743 00:36:32,119 --> 00:36:36,319 search, the effective result of this is 744 00:36:36,319 --> 00:36:39,960 actually client IP equals my first value 745 00:36:39,960 --> 00:36:43,800 here or client IP equals my second value 746 00:36:43,800 --> 00:36:46,960 and so on through the full dataset. And 747 00:36:46,960 --> 00:36:49,200 so looking at our search here, we're 748 00:36:49,200 --> 00:36:52,359 applying this to the access logs. You can 749 00:36:52,359 --> 00:36:55,280 see that we had a field named source IP 750 00:36:55,280 --> 00:36:58,520 in the secure logs and we renamed to 751 00:36:58,520 --> 00:37:02,160 client IP so that we could apply this to 752 00:37:02,160 --> 00:37:05,760 the access logs where client IP is the 753 00:37:05,760 --> 00:37:09,480 actual field name for the source IP 754 00:37:09,480 --> 00:37:13,560 data. And in this case, we are filtering 755 00:37:13,560 --> 00:37:16,079 to the client IP's relevant in the secure 756 00:37:16,079 --> 00:37:19,839 logs for our web access logs. 757 00:37:19,839 --> 00:37:23,960 So uncommenting here, we have a 758 00:37:23,960 --> 00:37:26,800 series of operations that we're doing, 759 00:37:26,800 --> 00:37:29,000 and I'm just going to run them all at 760 00:37:29,000 --> 00:37:33,079 once and talk through that we are 761 00:37:33,079 --> 00:37:37,240 counting the status or we're counting 762 00:37:37,240 --> 00:37:40,319 the events by status and client IP 763 00:37:40,319 --> 00:37:42,640 for the client IPs that were relevant to 764 00:37:42,640 --> 00:37:44,880 authentication failures in the secure 765 00:37:44,880 --> 00:37:48,760 logs. We are then creating a status count 766 00:37:48,760 --> 00:37:52,040 field just by combining our status 767 00:37:52,040 --> 00:37:54,680 and count fields, adding a colon 768 00:37:54,680 --> 00:37:58,640 between them. And then we are doing a 769 00:37:58,640 --> 00:38:02,079 second stats statement here to 770 00:38:02,079 --> 00:38:03,960 actually combine all of our newly 771 00:38:03,960 --> 00:38:06,319 created fields together in a more 772 00:38:06,319 --> 00:38:10,560 condensed report. So a transforming command, 773 00:38:10,560 --> 00:38:12,520 then streaming for creating our new 774 00:38:12,520 --> 00:38:15,359 field, another transforming command, and 775 00:38:15,359 --> 00:38:17,880 then our sort for dataset processing 776 00:38:17,880 --> 00:38:20,920 actually gives us the results here for a 777 00:38:20,920 --> 00:38:25,480 given client IP. And so we are, in this 778 00:38:25,480 --> 00:38:28,440 case, looking for the scenario that 779 00:38:28,440 --> 00:38:31,319 these client IPs that are involved in 780 00:38:31,319 --> 00:38:34,240 authentication failures to the web 781 00:38:34,240 --> 00:38:37,319 servers. In this case, these were all over 782 00:38:37,319 --> 00:38:39,680 SSH. We want to see if there are 783 00:38:39,680 --> 00:38:42,760 interactions by these same source IPs 784 00:38:42,760 --> 00:38:46,079 actually on the website that we're 785 00:38:46,079 --> 00:38:50,200 hosting. So seeing a high number of 786 00:38:50,200 --> 00:38:53,400 failed values, looking at actions also is 787 00:38:53,400 --> 00:38:55,599 a use case here for just bringing in 788 00:38:55,599 --> 00:38:57,680 that context and seeing if there's any 789 00:38:57,680 --> 00:39:00,520 sort of relationship between the data. 790 00:39:00,520 --> 00:39:04,000 This is discussed often as correlation 791 00:39:04,000 --> 00:39:07,680 of logs. I'm usually careful about using 792 00:39:07,680 --> 00:39:09,440 the term correlation in talking about 793 00:39:09,440 --> 00:39:11,119 Splunk queries especially in Enterprise 794 00:39:11,119 --> 00:39:12,640 security talking about correlation 795 00:39:12,640 --> 00:39:16,119 searches where I typically think of 796 00:39:16,119 --> 00:39:18,480 correlation searches as being 797 00:39:18,480 --> 00:39:20,599 overarching concepts that cover data 798 00:39:20,599 --> 00:39:23,920 from multiple data sources, and in this 799 00:39:23,920 --> 00:39:26,480 case, correlating events would be looking 800 00:39:26,480 --> 00:39:28,400 at unique data types that are 801 00:39:28,400 --> 00:39:31,240 potentially related in finding that 802 00:39:31,240 --> 00:39:33,839 logical connection for the condition. 803 00:39:33,839 --> 00:39:35,880 That's a little bit more up to the user. 804 00:39:35,880 --> 00:39:38,319 It's not quite as easy as say, 805 00:39:38,319 --> 00:39:41,520 pointing to a specific data 806 00:39:41,520 --> 00:39:44,880 model. So we are going to look at one 807 00:39:44,880 --> 00:39:47,920 more subsearch here, and this case is 808 00:39:47,920 --> 00:39:52,240 going to apply the join command. And 809 00:39:52,240 --> 00:39:55,680 so I talk about using lookup files or 810 00:39:55,680 --> 00:39:59,000 other data returned by subsearches 811 00:39:59,000 --> 00:40:01,599 to enrich, to bring more data in 812 00:40:01,599 --> 00:40:05,599 rather than filter. We are going to 813 00:40:05,599 --> 00:40:08,960 look at our first part of the command 814 00:40:08,960 --> 00:40:11,480 here, and this is actually just a 815 00:40:11,480 --> 00:40:15,720 simple stats report based on this rex 816 00:40:15,720 --> 00:40:18,079 that keeps coming through the SPL to 817 00:40:18,079 --> 00:40:21,000 give us those user and source IP fields. 818 00:40:21,000 --> 00:40:24,079 So our result here is authentication 819 00:40:24,079 --> 00:40:26,200 failures for all these web hosts so 820 00:40:26,200 --> 00:40:28,760 similar to what we had previously 821 00:40:28,760 --> 00:40:31,200 returned. And then we're going to take a 822 00:40:31,200 --> 00:40:33,319 look at the results of the subsearch 823 00:40:33,319 --> 00:40:35,400 here. I'm going to actually split this up so that we 824 00:40:35,400 --> 00:40:38,839 can see the first two lines. We're 825 00:40:38,839 --> 00:40:41,760 looking at our web access logs for 826 00:40:41,760 --> 00:40:45,560 purchase actions, and then we are 827 00:40:45,560 --> 00:40:50,599 looking at our stats count for errors 828 00:40:50,599 --> 00:40:52,960 and stats count for successes. We have 829 00:40:52,960 --> 00:40:55,079 pretty limited status code to return in 830 00:40:55,079 --> 00:40:59,240 this data so this is viable for 831 00:40:59,240 --> 00:41:01,800 the data present to observe our 832 00:41:01,800 --> 00:41:04,420 errors and successes. 833 00:41:04,420 --> 00:41:05,880 And then we are actually 834 00:41:05,880 --> 00:41:08,160 creating a new field based on the 835 00:41:08,160 --> 00:41:10,839 statistics that we're generating, 836 00:41:10,839 --> 00:41:13,920 looking at our transaction errors so 837 00:41:13,920 --> 00:41:18,000 where we have high or low numbers 838 00:41:18,000 --> 00:41:22,079 of failed purchase actions, and then 839 00:41:22,079 --> 00:41:25,599 summarizing that. So in the case of our 840 00:41:25,599 --> 00:41:27,800 final command here, another transforming 841 00:41:27,800 --> 00:41:30,640 command of table just to reduce this to 842 00:41:30,640 --> 00:41:35,079 a small dataset to use in the subsearch. 843 00:41:35,079 --> 00:41:37,440 And so in this case, we have our host 844 00:41:37,440 --> 00:41:39,400 value and then our transaction error 845 00:41:39,400 --> 00:41:41,480 rate that we observe from the web access 846 00:41:41,480 --> 00:41:44,760 logs. And then over in our other search 847 00:41:44,760 --> 00:41:48,640 here, we are going to perform a left 848 00:41:48,640 --> 00:41:51,400 join based on this host field. So you see 849 00:41:51,400 --> 00:41:53,359 in our secure logs, we still have the 850 00:41:53,359 --> 00:41:55,800 same host value, and this is going to be 851 00:41:55,800 --> 00:41:59,640 used to actually add our 852 00:41:59,640 --> 00:42:02,760 transaction error rates in for each 853 00:42:02,760 --> 00:42:06,400 host. So as we observe increased 854 00:42:06,400 --> 00:42:08,640 authentication failures, if there's a 855 00:42:08,640 --> 00:42:11,960 scenario for a breach and some sort of 856 00:42:11,960 --> 00:42:14,960 interruption to the ability to serve out 857 00:42:14,960 --> 00:42:17,520 or perform these purchase actions that 858 00:42:17,520 --> 00:42:20,960 are affecting the intended 859 00:42:20,960 --> 00:42:23,200 operations of the web servers, we can 860 00:42:23,200 --> 00:42:25,280 see that here. Of course in our tutorial 861 00:42:25,280 --> 00:42:27,319 data, there's not really much that 862 00:42:27,319 --> 00:42:29,880 jumping out or showing that there is 863 00:42:29,880 --> 00:42:32,599 any correlation between the two, but the 864 00:42:32,599 --> 00:42:34,640 purpose of the join is to bring in that 865 00:42:34,640 --> 00:42:37,440 extra dataset to give the context to 866 00:42:37,440 --> 00:42:39,839 further investigate. 867 00:42:41,040 --> 00:42:47,440 So that is the final 868 00:42:47,440 --> 00:42:52,359 portion of the SPL demo. And I do want 869 00:42:52,359 --> 00:42:54,920 to say for any questions, I'm going to 870 00:42:54,920 --> 00:42:56,960 take a look at the chat, I'll do my best 871 00:42:56,960 --> 00:43:00,079 to answer any questions, and then if 872 00:43:00,079 --> 00:43:03,079 you have any other questions, please 873 00:43:03,079 --> 00:43:05,800 feel free to reach out to my team at 874 00:43:05,800 --> 00:43:08,599 support@kennygroup.com, and we'll be 875 00:43:08,599 --> 00:43:11,920 happy to get back to you and help. I 876 00:43:11,920 --> 00:43:15,440 am taking a look through. 877 00:43:32,200 --> 00:43:33,760 Okay, seeing some questions on 878 00:43:33,760 --> 00:43:38,280 performance of the rex, sed, regex 879 00:43:38,280 --> 00:43:41,599 commands. So off the top of my head, 880 00:43:41,599 --> 00:43:43,800 I'm not sure about a direct performance 881 00:43:43,800 --> 00:43:46,400 comparison of the individual commands. 882 00:43:46,400 --> 00:43:49,200 Definitely want to look into that, and 883 00:43:49,200 --> 00:43:52,280 definitely follow up if you'd like to 884 00:43:52,280 --> 00:43:54,280 explain a more detailed scenario or 885 00:43:54,280 --> 00:43:57,119 look at some SPL that we can apply and 886 00:43:57,119 --> 00:43:59,699 observe those changes. 887 00:43:59,699 --> 00:44:01,680 The question on getting the 888 00:44:01,680 --> 00:44:05,480 dataset, that is what I mentioned at 889 00:44:05,480 --> 00:44:07,520 the beginning. Reach out to us for the 890 00:44:07,520 --> 00:44:10,119 slides or just reach out about the 891 00:44:10,119 --> 00:44:15,480 link. And the Splunk tutorial data, you 892 00:44:15,480 --> 00:44:17,880 can actually search that as well. And 893 00:44:17,880 --> 00:44:20,400 there's documentation on how to use the 894 00:44:20,400 --> 00:44:22,400 tutorial data, one of the first links 895 00:44:22,400 --> 00:44:25,640 there, takes you to a page that has- 896 00:44:25,640 --> 00:44:29,079 it is a tutorial data zip file, and 897 00:44:29,079 --> 00:44:31,079 instructions on how to [inaudible] that, it's 898 00:44:31,079 --> 00:44:34,079 just an upload for your specific 899 00:44:34,079 --> 00:44:37,599 environment. So in add data and then 900 00:44:37,599 --> 00:44:40,040 upload data, two clicks, and upload 901 00:44:40,040 --> 00:44:43,400 your file. So that is freely available 902 00:44:43,400 --> 00:44:45,760 for anyone, and again, that package is 903 00:44:45,760 --> 00:44:47,440 dynamically updated as well so your time 904 00:44:47,440 --> 00:44:51,359 stamps are pretty close to normal 905 00:44:51,359 --> 00:44:53,440 as you download the app, kind of depends 906 00:44:53,440 --> 00:44:55,920 on the time of the cycle for the 907 00:44:55,920 --> 00:44:58,559 update, but search overall time, you 908 00:44:58,559 --> 00:45:02,359 won't have any issues there. And then 909 00:45:02,359 --> 00:45:05,119 yeah, again on receiving slides, reach 910 00:45:05,119 --> 00:45:08,240 out to my team, and we're happy to 911 00:45:08,240 --> 00:45:10,240 provide those, discuss further, and we'll 912 00:45:10,240 --> 00:45:16,040 have the recording available 913 00:45:16,040 --> 00:45:18,400 for this session. You should be able to, 914 00:45:18,400 --> 00:45:20,680 after the recording processes when 915 00:45:20,680 --> 00:45:22,880 the session ends, actually use the 916 00:45:22,880 --> 00:45:24,640 same link, and you can watch this 917 00:45:24,640 --> 00:45:26,480 recording and post without having to 918 00:45:26,480 --> 00:45:31,800 sign up or transfer that file so- 919 00:45:33,680 --> 00:45:38,319 So okay, Chris, seeing your 920 00:45:38,319 --> 00:45:41,240 comment there, let me know if you want 921 00:45:41,240 --> 00:45:44,480 to reach out to me directly, anyone as 922 00:45:44,480 --> 00:45:49,440 well. We can discuss what slides and 923 00:45:49,440 --> 00:45:51,640 presentation you had attended, I'm not 924 00:45:51,640 --> 00:45:55,359 sure I have the attendance report 925 00:45:55,359 --> 00:45:57,319 for what you've seen previously, so 926 00:45:57,319 --> 00:46:00,240 happy to get those for you. 927 00:46:06,720 --> 00:46:10,319 All right and seeing- thanks Brett. 928 00:46:10,319 --> 00:46:13,079 So you see Brett Woodruff in the chat 929 00:46:13,079 --> 00:46:16,680 commenting, systems engineer on the 930 00:46:16,680 --> 00:46:18,640 expertise on demand team so very 931 00:46:18,640 --> 00:46:20,400 knowledgeable guy, and he's going to be 932 00:46:20,400 --> 00:46:23,720 presenting next month's session. That 933 00:46:23,720 --> 00:46:25,400 is going to take this concept that we 934 00:46:25,400 --> 00:46:28,760 talked about in the subsearching as a just 935 00:46:28,760 --> 00:46:30,760 general search topic, he's going to go 936 00:46:30,760 --> 00:46:34,319 specifically into data enrichment using 937 00:46:34,319 --> 00:46:38,079 joins, lookup commands, and how we see 938 00:46:38,079 --> 00:46:41,079 that used in the wild. So definitely 939 00:46:41,079 --> 00:46:43,359 excited for that one, encourage you to 940 00:46:43,359 --> 00:46:46,480 register for that event. 941 00:46:46,920 --> 00:46:52,240 All right, I'm not seeing any more questions. 942 00:46:57,800 --> 00:47:02,119 All right, with that I am stopping my 943 00:47:02,119 --> 00:47:05,079 share. I'm going to hang around for a few 944 00:47:05,079 --> 00:47:07,440 minutes, but thank you all for 945 00:47:07,440 --> 00:47:11,079 attending. and we'll see you on the next session.