Hello everyone.
We are getting started here on
our August lunch and learn session
presented by Kinney Group's Atlas Customer
Experience team. My name is Alice Devaney. I
am the engineering manager for the Atlas
Customer Experience team, and I'm excited
to be presenting this month's session on
intermediate-level Splunk searching. So
thank you all for attending. I hope you
get some good ideas out of this.
I certainly encourage engagement through
the chat, and I'll have some
information at the end on following up
and speaking with my team directly on
any issues or interests that you have
around these types of concepts that
we're going to cover today. So jumping
into an intermediate-level session.
I do want to say that we have previously
done a basic level searching
session, so we're really
progressing from that, picking up right
where we left off. We've done that
session with quite a few of our
customers individually and highly
recommend it; if you're interested in doing
that or this session with a larger team,
we're happy to discuss and
coordinate that. So getting started,
we're going to take a look at the final
search from our basic search session.
And we're going to walk through that,
understand some of the concepts, and
then we're going to take a step back,
look a little more generally at SPL
operations and understanding how
different commands apply to data, and
really that next level of understanding
for how you can write more complex
searches and understand really when
to use certain types of commands. And
of course, in the session we're going
to have a series of demos using
a few specific commands, highlighting the
different SPL command types that we
discuss in the second portion and get
to see that on the tutorial data that
you can also use in your environment,
in a test environment very
simply. So I will always encourage,
especially with search content, that you
look into the additional resources I
have listed here. The search reference
documentation is one of my favorite
bookmarks that I use frequently in my
own environments and working in customer
environments. It is really the
best quick resource to get information
on syntax and examples of any search
command and is always a great
resource to have. The search manual is a
little bit more conceptual, but as you're
learning more about different types of
search operations,
it's very helpful to be able to review
this documentation
and have reference
material that you can come back to as
you are studying and trying to get
better at writing more complex
search content. I have also linked here
the documentation on how to use the
Splunk tutorial data, so if you've not
done that before, it's a very simple
process, and there are consistently
updated download files that Splunk
provides that you're able to directly
upload into any Splunk environment. So
that's what I'm going to be using today,
and given that you are searching over
appropriate time windows for when you
download the tutorial dataset, these
searches will work on the tutorial
data as well. So I highly encourage going
through and testing out some of the
content after the fact; you'll be able to
access a recording, and if you'd like the
slides that I'm presenting from today,
which I highly recommend because there
are a lot of useful links in here, reach
out to my team. Again, right at the end of the
slides we'll have that info.
So looking at our overview of basic
search, I just want to cover
conceptually the two categories that
we discuss in that session. And so those
two are the statistical and charting
functions, which in those demos consist
of aggregate and time functions. So
aggregate functions are going to be your
commonly used statistical functions
meant for summarization, and then time
functions actually use the
timestamp field _time, or any
other time that you've extracted from
data, looking at earliest and latest
relative time values in a
summative fashion. And then evaluation
functions are the separate type where
we discuss comparison and conditional
statements, so using your if and your
case functions in
evals. Also datetime functions that
apply operations to events uniquely
so not necessarily summarization, but
interacting with the time values
themselves, maybe changing the time
format, and then multivalue eval
functions; we touch on that very lightly,
and it is more conceptual in basic
search. So today we're going to dive in
as part of our demo and look at
multivalue eval functions later in
the presentation.
So on this slide here I
have highlighted in gray the search
that we end basic search with. And so
that is broken up into three segments
where we have the first line being a
filter to a dataset. This is very
simply how you are sourcing most of your
data in most of your searches in Splunk.
And we always want to be as specific
as possible. You'll most often see the
logical way to do that is by
identifying an index and a source type,
possibly some specific values of given
fields in that data before you start
applying other operations. In our case, we
want to work with a whole dataset,
and then we move into applying our eval
statements.
So in the evals, the purpose of these is
to create some new fields to work with,
and so we have two operations here.
And you can see that on the first line,
we're starting with an error check field.
These are web access logs, so we're
looking at the HTTP status codes as the
status field, and we have a logical
condition here for greater than or equal
to 400, we want to return errors. And so
very simple example, making it as easy
as possible. If you want to get specifics
on your 200s and your 300s, it's the
exact same type of logic; you would likely
apply a case statement to get some
additional conditions and more distinct
output in an error check field, or some
sort of field indicating what you want to
see out of your status code. In this case,
it's simply errors, or the value of non-error
if we have, say, a 200.
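As a hedged sketch of that case-based approach, with illustrative class labels that are not from the demo, the eval would look something like this:

    | eval status_class=case(status>=500, "server error",
                             status>=400, "client error",
                             status>=300, "redirect",
                             status>=200, "success",
                             true(), "other")

The error check in the demo is effectively the two-way version of this, an if() with a single threshold.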
We're also using a time function to
create a second field called day. You
may be familiar with some of the
fields that you get by default
for most any event in Splunk that
are breakdowns of
the timestamp. You have day, month,
and many others. In this case, I want to
get a specific format for day, so we use
a strftime function with a
time format variable applied to the actual
extracted timestamp, _time. So
coming out of the second line, we've
accessed our data, we have created two
new fields to use, and then we are
actually performing charting with a
statistical function, and so that is
using timechart. And we can see here
that we are counting our events that
actually have the error value for our
created error check field. And so I'm
going to pivot over to Splunk here,
and we're going to look at this search,
and I have commented out most of the
logic, we'll step back through it. We
are looking in our web access log events
here, and we want to then apply our
eval. And so by applying the eval, we can
get our error check field that provides
error or non-error. We're seeing that we
have mostly non-error
events. And then we have the day field,
and so day is actually providing the
full name of day for the time stamp for
all these events. So with our timechart,
this is the summarization with a
condition, and we're spanning
by default over a single day, so this may
not be a very logical use of a split by
day when we are already using a timechart
command that is dividing our
results by the time bin, effectively a
span of one day. But what we can do is
change our split-by field to host and
get a little bit more of a reasonable
presentation. We were able to see, with
the counts in the individual days split
not only through the timechart but by
the day field, that we only had values
where the day field matched up with the
actual day of the time bin. So here we have our hosts
one, two, and three, and then across days
counts of the error events that we
observe. So that is the search that we
end on in basic search. The concepts
there being accessing our data,
searching in a descriptive manner, using
our metadata fields, the index and the
source type, the evaluation functions
where we're creating new fields,
manipulating data, and then we have a
timechart function that is providing
some summarized statistics here based
on a time range.
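Pieced together from that walkthrough, the final basic-search SPL looks roughly like the sketch below. The index and sourcetype names follow common Splunk tutorial-data conventions and may differ in your environment, and the triple-backtick inline comments assume Splunk 8.0 or later (remove them on older versions):

    index=main sourcetype=access_combined_wcookie   ``` filter to the web access log dataset ```
    | eval error_check=if(status>=400, "error", "non-error"),
           day=strftime(_time, "%A")                ``` %A gives the full weekday name ```
    | timechart count(eval(error_check="error")) as errors by host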
So we will pivot back, and we're
going to take a step back out of the SPL
for a second just to talk about these
different kinds of search operations
that we just performed. So you'll hear
these terms if you are really kind of
diving deeper into actual operations of
Splunk searching. And you can get very
detailed regarding the optimization of
searches around these types of
commands and the order in which you
choose to execute SPL. Today I'm going to
focus on how these operations actually
apply to the data and on helping you
make better decisions about what
commands are best for the scenario that
you have or the output that you want to
see. And in future sessions, we will
discuss the actual optimization of
searches through this optimal order
of functions and some other means.
But just a caveat there that we're going
to talk pretty specifically today
just about these individually, how
they work with data, and then how you
see them in combination.
So our types of SPL commands,
the top three in bold we'll focus on in
our examples. The first of which is
streaming operations
which are executed on
individual events as they're returned by a
search. So you can think of this like
your evals:
they are going to be doing
something to every single event,
modifying fields when they're available.
We do have generating functions. So
generating functions are going to be used
situationally where you're sourcing data
from non-indexed datasets, and so you
would see that from either inputlookup
commands or maybe tstats,
pulling information from the tsidx
files, and so generating the
statistical output based on the data
available there. Transforming commands
you will see as often as streaming
commands, generally speaking, and more
often than generating commands where
transforming is intended to order
results into a data table. And I often
think of this much like how we discuss
the statistical functions in basic
search as summarization functions where
you're looking to condense your overall
dataset into really manageable
consumable results. So these
operations that apply that summarization
are transforming. We do have two
additional types of SPL commands, the
first is orchestrating. You can read
about these, I will not discuss in great
detail. They are used to manipulate
how searches are actually processed or
how commands are processed. And
they don't directly affect the results
of a search in the way we think about, say,
applying a stats or an eval to a
dataset. So if you're interested,
definitely check it out. Linked
documentation has details there.
Dataset processing is seen much more often,
and you do have some conditional
scenarios where commands can act as
dataset processing, so the
distinction for dataset processing is
going to be that you are operating in
bulk on a single completed dataset at
one time. So we'll look at an
example of that.
I want to pivot back to our main
three that we're going to be focusing on,
and I have mentioned some of these
examples already. The eval functions
that we've been talking about so far are
perfect examples of our streaming
commands. So where we are creating new
fields for each entry or log event, or
where we are modifying values for all of
the results that are available, that
is where we are streaming with the
search functions. Inputlookup is
possibly one of the most common
generating commands that I see
because someone is intending to
source a dataset stored in a CSV file
or a KV store collection, and you're
able to bring that back as a report and
use that logic in your queries.
So that is
not requiring any indexed
data to actually return the
results that you want to see.
And we've talked about stats, very
generally speaking, with a lot of
unique functions you can apply there
where this is going to provide a tabular
output. And it is serving that purpose of
summarization, so we're really
reformatting the data into that
tabular report.
So we see in this example search here
that we are often combining these
different types of search operations. So
in this example that we have, I have
data that already exists in a CSV file.
We are applying a streaming command here,
where, which evaluates each row to see if
it matches a condition and then returns
the results
based on that evaluation. And then we're
applying a transforming command at the
end which is that stats summarization,
getting the maximum values for the
count of errors and the host that is
associated with that. So let's pivot over
to Splunk and we'll take a look at that example.
So I'm just going to grab my
search here, and I pre-commented out
the specific lines following inputlookup
just to see that this generating
command here is not looking for any
specific index data. We're pulling
directly the results that I have in a
CSV file here into this output, and so
we have a count of errors observed
across multiple hosts. Our where command
you might think is reformatting data
in the sense it is transforming the
results, but the evaluation of a where
function does apply effectively to every
event that is returned. So it is a
streaming command that is going to
filter down our result set based on our
condition that the error count is less
than 200.
So the following line is our
transforming command. We have two
results left, and we want
to see our maximum value here, which is 187 on
host 3. So our scenario here has really
covered where you may have hosts
that are trending toward a negative
state. You're aware that the second
host had already exceeded its
threshold value for errors, but host 3
also appears to be trending toward this
threshold. So we're able to combine
these types of commands, understand
the logical condition that we're
searching for, and also provide
that consumable output, combining
all three of our types of commands here.
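As a minimal sketch of that pattern, assuming a lookup file named errors_by_host.csv with host and error_count columns (both names are illustrative), the chain would look something like this:

    | inputlookup errors_by_host.csv    ``` generating: no indexed data required ```
    | where error_count < 200           ``` streaming: each row is evaluated against the condition ```
    | stats max(error_count) as max_error_count by host
    | sort - max_error_count

The exact final stats in the demo may differ; a per-host max sorted descending is just one way to surface the host trending closest to the threshold.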
So I'm going to jump to an SPL
demo, and as I go through these different
commands, I'm going to be referencing
back to the different command types that
we're working with. I'm going to
introduce in a lot of these searches
small commands that I won't
talk about in great detail, and that
really is the purpose of using your
search manual and your search
reference documentation. So I will
glance over the use case, talk about
how it's meant to be applied, and then,
when using these in your own scenarios where you
have a problem you need to solve,
reference the docs to find out where
you can apply similar functions to
what we observe in the demonstration here.
So the first command I'm going to
focus on is the rex command. So rex is a
streaming command that you often see
applied to datasets that do not fully
have data extracted in the format that
you want to be using in your
reporting or in your logic. And so
this could very well be handled
in the configuration of props and
transforms, extracting fields at the
right time when indexing data, but as
you're bringing in new data sources, you need
to understand what's available for use
in Splunk. A lot of times you'll find
yourself needing to extract new fields
inline in your searches and be able
to use those in your search logic. Rex
also has a sed mode that I often see
used to test masking of data inline
prior to actually putting that into
indexing configurations.
So rex you would
generally see used when you don't
have those fields available, you need to
use them at that time. And then we're
going to take a look at an example of
masking data as well to test your
syntax for a sed style replace in
config files. So we will jump back over.
So I'm going to start with a search on
an index source type, my tutorial data.
And then this is actual Linux secure
logging so these are going to be OS
security logs, and we're looking at all
of our web hosts that we've been
focusing on previously.
In our events, you can see
that we have first here an event that
says failed password for invalid user inet.
We're provided a source IP and a source
port, but when we go to see the fields that
are extracted, that's not
being done for us automatically. So just
to start testing our logic to see if we
can get the results we want to see,
we're going to use the rex command. And
in doing so, we are applying this
operation across every event, again, a
streaming command. We are looking at the
raw field, so we're actually looking at
the raw text of each of these log events.
And then the rex syntax is simply to
provide in double quotes a regex
match, and we're using named groups for
field extractions. So for every single
event that we see failed password for
invalid user, we are actually extracting
a user field, the source IP field, and the
source port field. For the sake of
simplicity, I tried to keep the regex simple.
You can make this as complex as you need
to for your needs, for your data. And
so in our extracted fields, I've
actually pre-selected these so we can
see our user is now available, and this
applies to the events where the regex
actually matched on the
"failed password for invalid user..." string.
So now that we have our fields
extracted, we can actually use these. And
we want
to do a stats count as failed logins. So
anytime you see an operation, then "as" and
a unique name, that's just a rename
through the transforming function. It's an
easier way to keep
consistency when referencing your
fields, and it means you don't have to rename
later on; in this
case, you'd otherwise have to reference the
distinct count function as the field name.
So it's just a way to keep
things clean and easy to use in further
lines of SPL. So we are counting our
failed logins, we're looking at the
distinct count of the source IP values
that we have, and then we're splitting
that by the host and the user. So you can
see here, this tutorial data is
actually pretty flat across most of the
sources so we're not going to have
any outliers or spikes in our stats here,
but you can see the resulting presentation.
In line four, we do have a
sort command, and this is an example of a
dataset processing command where we are
actually evaluating a full completed
dataset and reordering it. Given the
logic here, we want to descend on these
numeric values. So keep in mind, as you're
operating on different fields, it's going
to be the same sort of either basic
numeric or the lexicographical ordering
that you typically see in Splunk.
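Putting that first rex demo together as a sketch, where the index, sourcetype, and regex are assumptions based on the tutorial data's Linux secure logs and should be tightened for your own data:

    index=main sourcetype=secure "failed password"
    | rex field=_raw "invalid user (?<user>\S+) from (?<src_ip>\S+) port (?<src_port>\d+)"
    | stats count as failed_logins, dc(src_ip) as unique_source_ips by host, user
    | sort - failed_logins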
So we do have a second example
with the sed style replace.
So you can see in my events here
we are searching the tutorial data's
vendor sales index and sourcetype. And
I've gone ahead and applied one
operation, and this is going to be a
helpful operation to understand really
what we are replacing and how to get
consistent operation on these fields.
So in this case, we are actually creating
an ID length field where we are going to
choose to mask the value of account ID
in our rex command. We want to know that
that's a consistent number of characters
through all of our data. It's very
simple to spot check, but just to be
certain, we want to apply this to all of
our data, in this case, streaming command
through this eval. We
are changing the type of the data
because account ID is actually numeric.
We're making that a string value so that
we can look at the length. These are
common functions in most programming
languages, and so the syntax here in
SPL is quite simple. Just to be able
to get that contextual feel, we
understand we have 16 characters for
100% of our events in the account IDs.
So actually applying our rex command,
we are going to now specify a specific
field, not just _raw. We are
applying the sed mode, and this is a
sed-syntax replacement with
a capture group for the
first 12 digits. And then we're
replacing that with a series of 12 X's.
So you can see in our first event, the
account ID is now masked, we only have
the remaining four digits to be able to
identify that. And so if our data is
indexed appropriately
in Splunk with the full account IDs, but
for the sake of reporting we want to
be able to mask that for the audience,
then we're able to use the sed
replace. And then to finalize a report,
this is just an example of the top
command which does a few operations
together and makes for a good
shorthand report, taking all the
unique values of the provided field,
giving you a count of those values, and
then showing the percentage
of the makeup for the total dataset
that that unique value accounts for. So
again, pretty flat in this tutorial data
in seeing a very consistent
.03% across these different account IDs.
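A sketch of those masking steps, assuming the account ID field in the vendor sales data is extracted as AcctID and is consistently 16 digits (check the actual field name in your environment):

    index=main sourcetype=vendor_sales
    | eval id_length=len(tostring(AcctID))   ``` id_length should come back as 16 for every event ```
    | rex field=AcctID mode=sed "s/^(\d{12})/XXXXXXXXXXXX/"
    | top AcctID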
So we have looked at a few examples
with the rex command, and that is
again, streaming. We're going to look at
another streaming command
which is going to be a set of
multivalue eval functions. And so again,
if you're going to have a bookmark for search
documentation, multivalue eval functions
are a great one to have, because when
you encounter these, it really takes
some time to figure out how to actually
operate on data. And so the
multivalue functions are really just
a collection that depending on your use
case, you're able to determine the
best to apply. You see it often used
with JSON and XML, data formats
that are naturally going to
provide a multivalue field where you
have repeated tags or keys across
unique events as they're extracted.
And you often see in
Windows event logs that you actually have
repeated keys where the values
are different and the position in the
event is actually specific to a
condition, so you may have a need
to extract or interact with one
of those unique values to actually
get a reasonable outcome from your data.
And so we're going to use
multivalue eval functions when we
have a change we want to make to the
presentation of data and we're able
to do so with multivalue fields. This I
would say often occurs when you have
multivalue data and then you want to
be able to change the format of the
multivalue fields there. And then
we're also going to look at a quick
example of actually using multivalue
evaluation as a logical condition.
So the first example.
We're going to start with a
simple table looking at our web access
logs, and so we're just going to pull
in our status and referer domain fields.
And so you can see we've got an
HTTP status code, and we've got the
format of a protocol, subdomain, and
TLD. And our scenario here is that, for
simplicity of reporting, we just want
to work with this referer domain field
and be able to simplify that. So we're
actually splitting out the field, in this
case the referer domain field,
choosing the period character as our
point to split the data. We're creating a
multivalue field from what was previously
just a single-value field. And using
this, we can actually create a new field
by using the index of the multivalue field;
in this case, we're looking at
indexes 0, 1, and 2.
The multivalue index function allows
us to target a specific field and then
choose a starting and ending index to
extract given values. There are a number
of ways to do this. In our case here,
where we have three entries, it's quite
simple just to give the start and end
of the range covering the
two entries
we want. So we are working to recreate
our domain, and applying that
to this new domain field, we have
buttercupgames.com in place of what was
previously http://www.buttercupgames.com.
We can now use those fields
in a transforming function, in this
case a simple stats count by status and
domain.
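A sketch of that reshaping, assuming the referer domain is extracted as a field named referer_domain and that mvjoin is used to glue the two pieces back together (the demo may rebuild the string differently):

    index=main sourcetype=access_combined_wcookie
    | eval domain_parts=split(referer_domain, ".")          ``` one value becomes a multivalue field ```
    | eval domain=mvjoin(mvindex(domain_parts, 1, 2), ".")  ``` keep entries 1 and 2, rejoin as buttercupgames.com ```
    | stats count by status, domain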
So I do want to look at another
example here that is similar, but
we're going to use a multivalue function
to actually test a condition. And so I'm
going to,
in this case, be searching the same
data. We're going to start with a stats
command, and so a stats count as well as
a values of status. And so the values
function is going to provide all the
unique values of a given field based
on the split by. And so that produces
a multivalue field here in the case of
status. We have quite a few events
that have multiple status codes, and as
we're interested in pulling those events
out, we can use an mvcount function to
evaluate and filter our dataset to
those specific events. So it's a very simple
operation here; you're just looking at what
has more than a single value
for status, but it's very useful as you're
applying this in reporting, especially in
combination with other functions and more
complex conditions.
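A sketch of that condition test, assuming the split-by field is the access logs' clientip:

    index=main sourcetype=access_combined_wcookie
    | stats count, values(status) as status by clientip   ``` values() yields a multivalue status field per client ```
    | where mvcount(status) > 1                            ``` keep clients that returned more than one status code ```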
So that is our set of multivalue
eval functions there as streaming commands.
So for a final section of
the demo, I want to talk about a concept
that is not so much a set of functions,
but really enables more complex
and interesting searching and can allow
us to use a few different types of
commands in our SPL. And so the concept of
subsearching for both filtering and
enrichment is taking secondary search
results, and we're using that to
affect a primary search. So a subsearch
will be executed, the results
returned, and depending on how it's used,
this is going to be processed in the
original search.
We'll look at an example where it is
filtering: based on the results, we
effectively get value equals X or value
equals Y for one of the fields that
we're looking at in the subsearch.
And then we're also going to look at an
enrichment example, so you see this often
when you have a dataset maybe saved
in a lookup table or you just have a
simple reference where you want to bring
in more context, maybe descriptions of
event codes, things like
that. So in that case,
we'll look at the first command here. Now,
I'm going to run my search, and we're
going to pivot over to a subsearch
tab here. And so you can see our subsearch
looking at the secure logs.
We are actually just pulling out the
search to see what the results are or
what's going to be returned from that
subsearch. So we're applying the same
rex that we had before to extract our
fields. We're applying a where, a streaming
command looking for anything that's not
null for user. We observed that we had
about 60% of our events that were going
to be null based on not having a user
field, and so looking at that total dataset,
we're just going to count by our
source IP. And this is often a quick way
to really just get a list of unique
values of any given field. And then,
operating on that to return just the
list of values, there are a few different ways to
do that; I see stats count pretty often.
And in this case, we're actually tabling
out just keeping our source IP field and
renaming it to client IP, so the resulting
dataset is a single column table
with
182 results, and the field name is client
IP. So when returned to the original
search, since we're running this as a
subsearch, the effective result of this is
actually client IP equals my first value
here, or client IP equals my second value,
and so on through the full dataset. And
so looking at our search here, we're
applying this to the access logs. You can
see that we had a field named source IP
in the secure logs and we renamed to
client IP so that we could apply this to
the access logs where client IP is the
actual field name for the source IP
data. And in this case, we are filtering
our web access logs to the client IPs
relevant in the secure logs.
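Condensed into a sketch, with field and sourcetype names carried over from the earlier assumptions, the filtering pattern looks roughly like this; the subsearch runs first, and its single column of clientip values expands into an OR of clientip= terms in the outer search:

    index=main sourcetype=access_combined_wcookie
        [ search index=main sourcetype=secure
          | rex "invalid user (?<user>\S+) from (?<src_ip>\S+) port (?<src_port>\d+)"
          | where isnotnull(user)
          | stats count by src_ip
          | table src_ip
          | rename src_ip as clientip ]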
So uncommenting here, we have a
series of operations that we're doing,
and I'm just going to run them all at
once and talk through them. We are
counting the events by status and client IP
for the client IPs that were relevant to
authentication failures in the secure
logs. We are then creating a status count
field just by combining our status
and count fields, adding a colon
between them. And then we are doing a
second stats statement here to
actually combine all of our newly
created fields together in a more
condensed report. So a transforming command,
then streaming for creating our new
field, another transforming command, and
then our sort for dataset processing
actually gives us the results here for a
given client IP. And so we are, in this
case, looking at the scenario where
these client IPs are involved in
authentication failures to the web
servers, in this case all over
SSH, and we want to see if there are
interactions by these same source IPs
actually on the website that we're
hosting. So seeing a high number of
failed values, looking at actions also is
a use case here for just bringing in
that context and seeing if there's any
sort of relationship between the data.
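The follow-on reporting described here would then continue from that filtered set roughly as follows; the colon separator comes from the walkthrough, while the exact field names and the final sort key are assumptions:

    | stats count by status, clientip            ``` transforming: events per status per client ```
    | eval status_count=status.":".count         ``` streaming: e.g. 404:12 ```
    | stats values(status_count) as status_counts by clientip
    | sort clientip                              ``` dataset processing: reorder the finished table ```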
This is discussed often as correlation
of logs. I'm usually careful about using
the term correlation in talking about
Splunk queries, especially around Enterprise
Security and its correlation
searches, where I typically think of
correlation searches as being
overarching concepts that cover data
from multiple data sources. In this
case, correlating events means looking
at unique data types that are
potentially related and finding that
logical connection for the condition.
That's a little bit more up to the user;
it's not quite as easy as, say,
pointing to a specific data
model. So we are going to look at one
more subsearch here, and this one is
going to apply the join command.
I've talked about using lookup files or
other data returned by subsearches
to enrich, to bring more data in
rather than to filter. We are going to
look at our first part of the command
here, and this is actually just a
simple stats report based on this rex
that keeps coming through the SPL to
give us those user and source IP fields.
So our result here is authentication
failures for all these web hosts so
similar to what we had previously
returned. And then we're going to take a
look at the results of the subsearch
here. I'm going to actually split this up so that we
can see the first two lines. We're
looking at our web access logs for
purchase actions, and then we are
looking at our stats count for errors
and stats count for successes. We have a
pretty limited set of status codes returned in
this data, so this is a viable way, given
the data present, to observe our
errors and successes.
And then we are actually
creating a new field based on the
statistics that we're generating,
looking at our transaction errors so
where we have high or low numbers
of failed purchase actions, and then
summarizing that. So in the case of our
final command here, another transforming
command of table just to reduce this to
a small dataset to use in the subsearch.
And so in this case, we have our host
value and then our transaction error
rate that we observe from the web access
logs. And then over in our other search
here, we are going to perform a left
join based on this host field. So you see
in our secure logs, we still have the
same host value, and this is going to be
used to actually add our
transaction error rates in for each
host. So as we observe increased
authentication failures, if there's a
scenario for a breach and some sort of
interruption to the ability to serve out
or perform these purchase actions that
are affecting the intended
operations of the web servers, we can
see that here. Of course, in our tutorial
data, there's not really much that's
jumping out or showing that there is
any correlation between the two, but the
purpose of the join is to bring in that
extra dataset to give the context to
further investigate.
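As a sketch, the enrichment version pairs the secure-log failure counts with a per-host transaction error rate from the access logs via a left join on host; the action=purchase filter and the error-rate calculation are assumptions standing in for what the demo computed:

    index=main sourcetype=secure
    | rex "invalid user (?<user>\S+) from (?<src_ip>\S+) port (?<src_port>\d+)"
    | stats count as failed_logins by host
    | join type=left host
        [ search index=main sourcetype=access_combined_wcookie action=purchase
          | stats count(eval(status>=400)) as errors, count(eval(status<400)) as successes by host
          | eval transaction_error_rate=round(errors/(errors+successes), 3)
          | table host, transaction_error_rate ]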
So that is the final
portion of the SPL demo. And I do want
to say for any questions, I'm going to
take a look at the chat, I'll do my best
to answer any questions, and then if
you have any other questions, please
feel free to reach out to my team at
support@kinneygroup.com, and we'll be
happy to get back to you and help. I
am taking a look through.
Okay, seeing some questions on
performance of the rex, sed, regex
commands. So off the top of my head,
I'm not sure about a direct performance
comparison of the individual commands.
Definitely want to look into that, and
definitely follow up if you'd like to
explain a more detailed scenario or
look at some SPL that we can apply and
observe those changes.
The question on getting the
dataset, that is what I mentioned at
the beginning. Reach out to us for the
slides or just reach out about the
link. And the Splunk tutorial data, you
can actually search that as well. And
there's documentation on how to use the
tutorial data, one of the first links
there, takes you to a page that has-
it is a tutorial data zip file, and
instructions on how to [inaudible] that, it's
just an upload for your specific
environment. So in add data and then
upload data, two clicks, and upload
your file. So that is freely available
for anyone, and again, that package is
dynamically updated as well, so your
timestamps are pretty close to current
when you download the app; it kind of depends
on the timing of the update
cycle, but if you search over all time, you
won't have any issues there. And then
yeah, again on receiving slides, reach
out to my team, and we're happy to
provide those, discuss further, and we'll
have the recording available
for this session. You should be able to,
after the recording processes when
the session ends, actually use the
same link, and you can watch this
recording post-session without having to
sign up or transfer that file.
So okay, Chris, seeing your
comment there, let me know if you want
to reach out to me directly, and anyone else
can as well. We can discuss which slides and
presentations you had attended. I'm not
sure I have the attendance report
for what you've seen previously, so I'm
happy to get those for you.
All right and seeing- thanks Brett.
So you see Brett Woodruff in the chat
commenting, a systems engineer on the
Expertise on Demand team, so a very
knowledgeable guy, and he's going to be
presenting next month's session. That
is going to take this concept that we
talked about, subsearching, as just a
general search topic, and he's going to go
specifically into data enrichment using
joins, lookup commands, and how we see
that used in the wild. So definitely
excited for that one, encourage you to
register for that event.
All right, I'm not seeing any more questions.
All right, with that I am stopping my
share. I'm going to hang around for a few
minutes, but thank you all for
attending, and we'll see you at the next session.