-
[Music]
-
Now, network monitoring has been around for a long
-
time, probably ever since the first
-
networks were invented. Just like with
-
any system, just like with any electronic
-
device, we tend to want to be able to
-
monitor if everything is going okay. We
-
want to receive warnings, we want to be
-
alerted when something goes wrong, when
-
something fails. And this type of
-
monitoring is tremendously useful
-
especially in larger networks. Over time,
-
this monitoring has extended to security
-
monitoring as well. So we're not just
-
concerned about how is the network doing,
-
if it's working well, if you don't have
-
any failed devices, but we're also
-
starting to look at the network traffic.
-
How is the network utilized, who uses it,
-
who attempts to access it, what type of
-
traffic are they generating? And if we
-
try to gather all this type of
-
information, we try to make sense of it,
-
we try to correlate it, with a smart
-
enough device, we might be able to detect
-
attempts at intrusion or attacks that
-
are about to happen or that have
-
happened in the past, or proof that
-
we've been compromised or somebody in
-
the network has been infected. And all
-
that information is in there if you know
-
where to look and also if you have
-
the right tools to look for it.
-
In general, the term intrusion detection
-
refers to a system that is able to
-
monitor whatever can be observed in a
-
network, and in most cases, we're talking
-
about two things that can be observed.
-
First of all, we have network traffic,
-
and then we have application events or
-
logs generated by the operating systems
-
by the applications running on those
-
OS's and so on. So coming back here to
-
our network focus, we're talking about
-
intrusion detection at your network level.
-
We're going to call this one a network
-
based intrusion detection system or NIDS.
-
And we have many commercial solutions as
-
well as open source ones that are able
-
to perform this type of network-based
-
intrusion detection. Of course, all the
-
major security vendors are doing it. In
-
many examples, you're going to see the
-
IDS functionality built into the
-
functionality of a larger firewall or a
-
larger UTM device, especially for major
-
vendors out there. But you also have
-
solutions in the open source area
-
such as Snort, Suricata, or Zeek, formerly known as Bro.
-
They're all available. And some of them also
-
have commercial versions as well, but
-
they also provide you with free
-
versions that you can freely install and
-
try and run in your own environment. Now
-
the way these intrusion detection
-
systems work by definition is that they
-
rely on a database of signatures. And
-
those signatures are basically just a
-
way to describe what a specific traffic
-
pattern is supposed to look like in
-
order to detect a specific type of
-
attack or attempt at an intrusion. So we
-
might be looking at a sequence of
-
packets that looks in a certain way.
-
We might be looking at a specific type
-
of packet that doesn't play by the
-
normal protocol rules that it belongs to.
-
Or a specific type of payload or simply
-
just a signature, a byte sequence that
-
can be found in the packet
-
payload that indicates the fact that the
-
payload is malicious. And this behavior
-
is very similar to what you're seeing in
-
antivirus scanning or anti-malware
-
scanning. We're simply looking for a
-
sequence of bytes that indicates that-
-
well, if we find the sequence of bytes in
-
a specific executable file, it means that
-
the file is infected with that specific
-
virus that the sequence belongs to.
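To make the byte-sequence idea a bit more concrete, here is a minimal sketch in Python. The signature names and byte patterns below are made up purely for illustration, they are not taken from any real IDS or antivirus database, and a real engine would do far more than a plain substring search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    name: str
    pattern: bytes  # byte sequence that marks a payload as suspicious


# Hypothetical signature database; the entries are illustrative only.
SIGNATURES = [
    Signature("example-shell-download", b"wget http://"),
    Signature("example-nop-sled", b"\x90" * 16),
]


def match_signatures(payload: bytes, signatures=SIGNATURES) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in payload."""
    return [sig.name for sig in signatures if sig.pattern in payload]


if __name__ == "__main__":
    print(match_signatures(b"GET /?cmd=wget http://evil.example/x.sh HTTP/1.1"))
    print(match_signatures(b"GET /index.html HTTP/1.1"))
```

The same function works whether the bytes come from a packet payload or from a file on disk, which is the similarity with antivirus scanning described above.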
-
Now, in intrusion detection, again,
-
we're kind of doing the same thing, right?
-
We're looking for patterns, but we're not
-
just scanning individual packets.
-
Sometimes we need to collect more
-
packets in a sequence in order to
-
determine if the behavior of the client
-
that is generating those packets is
-
abnormal, and if it's abnormal, does
-
it indicate an attack pattern or not? So
-
long story short, intrusion detection
-
systems are strongly dependent on a
-
database of signatures. Now, more advanced
-
intrusion detection systems could also
-
correlate this network information with
-
log information. So we're seeing
-
something fishy in the network by
-
looking at the network traffic, let's
-
check the application logs that the
-
traffic is going towards, for example.
-
Let's see how that application reacts
-
and if we can see some abnormal
-
logs being generated by the app as well.
-
Now, correlating that information, the
-
traffic and the logs, might tell us more about
-
the actual attack or might increase the
-
confidence of the fact that we really
-
have identified a valid attack signature.
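As a rough sketch of what this correlation could look like, the Python below pairs each network alert with log events from the targeted host that fall within a short time window. The `NetworkAlert` and `LogEvent` types, the field names, and the 30 second window are all assumptions made for this example, not the way any particular product does it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class NetworkAlert:
    time: datetime
    dst_host: str   # host the suspicious traffic was going towards
    signature: str


@dataclass
class LogEvent:
    time: datetime
    host: str       # host that generated the log line
    message: str


def correlate(alerts, log_events, window=timedelta(seconds=30)):
    """Pair each alert with log events from its destination host
    that occurred within `window` of the alert time."""
    results = []
    for alert in alerts:
        related = [
            ev for ev in log_events
            if ev.host == alert.dst_host and abs(ev.time - alert.time) <= window
        ]
        results.append((alert, related))
    return results
```

An alert that comes back with related log events from the same host is a stronger indication than the network signature on its own.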
-
Not all solutions are able to do this of
-
course. Also, a very important distinction
-
for an intrusion detection system, with an
-
emphasis on detection is the fact that
-
these systems are never able to block
-
the malicious traffic once they identify
-
it. It's just like the name says, it's just
-
detection, it's not prevention, all right?
-
So we're not stopping the traffic. We
-
might be able to see an attack signature.
-
We might be able to raise some alerts,
-
generate some syslogs, but we're not
-
going to be able to block that specific
-
type of traffic. One positive side for
-
this is that well, if the device is not
-
inside of the traffic path, then the
-
attacker might not even be able to
-
detect it.
-
So most likely, the IDS is going to
-
work with a copy of the traffic just to
-
analyze it, but it's not going to be able
-
to stop the malicious traffic. And the
-
attacker is not going to be able to
-
detect the IDS device and might not even
-
be able to compromise it if they
-
intend to. In most situations, the IDS
-
device doesn't even have a valid IP
-
address within the network that they're
-
monitoring, so it cannot be addressed, it
-
cannot be compromised by communicating
-
with it directly. Alright, so since we
-
mentioned the fact that an IDS works
-
with just a copy of the traffic, let's
-
see how can we generate that copy of
-
traffic, right? They're not within the
-
traffic path, so we need to make a copy
-
of the traffic and just send it in a
-
separate channel, on a separate channel
-
to the IDS device for analysis. Now, my
-
way of doing this is by enabling Port
-
mirroring or SPAN. In Cisco speak, this
-
is switchboard analyzer. Just a
-
functionality on layer 2 or layer 3
-
switches that allows us to configure the
-
switch, and we're basically telling it
-
well, whatever traffic you're seeing on
-
ports let's say one, two, and three make a
-
copy of that traffic and forward it out
-
of port number eight. And of course, we're
-
assuming that on port number eight,
-
there's an IDS device connected right
-
there. So we're basically telling the
-
switch to make a copy of all the
-
interesting traffic and send it towards
-
the IDS. And of course, you might be
-
thinking here well, what if the switch is
-
overloaded, what if there's more traffic
-
generated on those ports than the
-
mirror port can actually support. Well
-
that's true, it might happen. So in
-
cases when the switch is overloaded and
-
there's too much traffic in the network,
-
packets might be dropped, and also frames
-
with errors might not be forwarded to the
-
mirrored port either. So we
-
might not be able to see 100% of all the
-
traffic, but in most cases, it's going to
-
be enough. And it's also one of the
-
features that basically doesn't require
-
you to install anything else in the
-
network, it's just a functionality, just a
-
configuration effort, just a couple of
-
commands on a switch. Another method for
-
duplicating traffic is by using a
-
passive or an active TAP. It's basically
-
a layer 1 device called a TAP, a test
-
access port. It's nothing else than a
-
kind of like a T-connector where the
-
main cable goes from one end to the next,
-
and there's a third cable that actually
-
receives a copy of the entire traffic
-
going through that segment of cable.
-
The device is not a smart one, so it's
-
not like a switch. It's not going to
-
look at the destination addresses and
-
forward entire frames. It's simply
-
going to duplicate the electrical or the
-
optical signals that it sees on the wire,
-
and it's going to make a complete and
-
identical copy of those signals onto the
-
third connection which, of course, is
-
ideally connected to the IDS device.
-
Now, this type of approach is, again,
-
completely undetectable.
-
SPAN is not detectable either, right? And
-
it also copies entire frames regardless if
-
those frames contain errors or not. As we
-
said with port mirroring, while the
-
frames need to be correct in order to be
-
copied, well, with a TAP, the TAP doesn't
-
care. It's basically just a signal
-
repeater, and we can do this for both
-
copper cables, so electrical signals,
-
as well as fiber optics, so optical
-
signals. The TAP will not care, it will just
-
blindly copy all the signals that it
-
receives. And finally the third method
-
for monitoring traffic is by having the
-
IDS device in the traffic path
-
but acting as a transparent device. Again,
-
without an IP address, it's basically
-
becoming a layer 2 device that is part
-
of the same VLAN that it's
-
bridging, but it cannot be addressed on
-
the network, it cannot be detected on
-
the network, and if it's a true IDS
-
device, then it's not going to be able to
-
block the actual traffic that goes
-
through it. Now, having the device placed
-
inside of the traffic path opens us to
-
the possibility of actually blocking the
-
traffic, and that's going to be a
-
different type of solution called
-
intrusion prevention system. And we'll
-
get there in just a moment. There's one
-
more type of intrusion detection device
-
or solution and that is a software
-
solution that can be installed directly
-
on the workstations. So I'm not talking
-
about a box that listens to network
-
traffic on an entire segment, but we're
-
talking here about a software solution,
-
basically a program that runs on your
-
endpoint machine, on your host machine, be
-
it a laptop or a desktop. Now, this one is
-
called host-based intrusion detection
-
because it runs on the host, and it does
-
have pretty much the same benefit
-
or the same abilities as a network-based
-
intrusion detection, so it's able to look
-
at the network traffic going in and out
-
of your network interface. It's able to
-
look at the logs generated by the
-
applications on your system, but since
-
they are running as an application on
-
your system, they can become even smarter
-
because they might have access now to
-
the actual process table. They might be
-
looking at the kernel, they might be able
-
to look at the memory to see what
-
processes are running, when did they
-
execute, who executed them, with what
-
privileges, and they can also openly look
-
at encrypted traffic. So if you are
-
communicating over SSL with a website,
-
well a network-based intrusion detection
-
might not be able to understand anything
-
that's going back and forth because it's
-
encrypted, but your host-based intrusion
-
detection
-
is located at the end of that encrypted
-
tunnel, so it is able to see that
-
unencrypted traffic before it even
-
enters the encrypted tunnel and right
-
after it leaves the encrypted tunnel. So
-
it's able to actually watch the entire
-
traffic flow in an unencrypted form. And
-
again, since we have pretty much full
-
permissions on the monitored host in
-
order to be able to properly monitor the,
-
you know, the process table and the
-
network connections and the network
-
traffic, we could also have a look at the
-
files on the disk.
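One simple way of looking at the files on disk is to record a cryptographic hash of each file while the system is known to be clean, and then compare against that record later. The Python below is a minimal sketch of that idea, assuming SHA-256 and an in-memory dictionary as the baseline; the function names are invented for this example and real products are considerably more involved.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record the current hash of every monitored file."""
    return {str(p): hash_file(Path(p)) for p in paths}


def check_integrity(baseline):
    """Compare files against the baseline.

    Returns a dict mapping path to "modified" or "missing";
    unchanged files are not listed.
    """
    findings = {}
    for path, expected in baseline.items():
        p = Path(path)
        if not p.is_file():
            findings[path] = "missing"
        elif hash_file(p) != expected:
            findings[path] = "modified"
    return findings
```

The baseline itself has to be stored somewhere the attacker cannot rewrite it, otherwise the comparison proves nothing.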
-
Why would you do that? Well that's
-
because monitoring the integrity of the
-
files on the disk, especially the
-
integrity of the operating system files,
-
and being able to detect when that
-
integrity fails, when a system file is
-
being replaced with a malicious one, when
-
a system file is becoming encrypted
-
or it is replaced with a completely
-
different version, that might be an
-
indication of compromise, that might be
-
an indication of the fact that you have
-
been infected with malware. So solutions
-
or functionality additional to host
-
based intrusion detection that monitor
-
files on your system, especially
-
operating system files, these are called
-
file integrity monitoring solutions. And
-
remember that we said that when we place
-
the intrusion detection device in the
-
traffic path, that
-
device actually becomes able to also
-
block the traffic that goes through it
-
which can make it an intrusion
-
prevention system, right? So detection
-
just alerts, just generates alerts or
-
events. Intrusion prevention is about
-
actually taking action or
-
acting upon the detected intrusion. So
-
what can such a device actually do
-
whenever they're seeing
-
something fishy going on inside of a
-
network? Well, they could do something as
-
simple as simply sending a TCP reset
-
packet to the originator of the
-
malicious connection. They could also
-
have some more advanced functionality
-
especially if it's the same
-
device that acts as a firewall. They
-
might be dynamically able to generate a
-
firewall rule to block similar traffic
-
like the one that was just detected as
-
being part of an attempt for an
-
attack or for a compromise. We could be
-
choosing if we're detecting something
-
that looks like a denial of service
-
attack, we could be choosing to limit the
-
amount of bandwidth that is allocated to
-
that specific type of traffic. Kind of
-
like policing that we're doing in
-
well, quality of service. In any case,
-
any type of action that the IPS device
-
can take against the malicious traffic,
-
we're going to call it active response.
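The choice between these responses can be pictured as a small policy function. The sketch below is only an illustration in Python: the `Action` names, the `"dos"` category label, and the order of the checks are assumptions made for this example, not the behavior of any specific IPS.

```python
from enum import Enum


class Action(Enum):
    ALERT_ONLY = "alert_only"   # detection only, nothing is stopped
    TCP_RESET = "tcp_reset"     # reset the malicious connection
    BLOCK_RULE = "block_rule"   # add a dynamic firewall rule
    RATE_LIMIT = "rate_limit"   # police the bandwidth of the traffic


def choose_response(category: str,
                    prevention_enabled: bool,
                    can_add_firewall_rules: bool) -> Action:
    """Pick an active response for a detected event (hypothetical policy)."""
    if not prevention_enabled:
        # A pure IDS can only raise alerts.
        return Action.ALERT_ONLY
    if category == "dos":
        # Something that looks like denial of service gets bandwidth-limited.
        return Action.RATE_LIMIT
    if can_add_firewall_rules:
        # Device also acts as a firewall: block similar traffic.
        return Action.BLOCK_RULE
    # Simplest response: reset the offending TCP connection.
    return Action.TCP_RESET
```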
-
And depending on how complex the device
-
is and how powerful the device is, you
-
might actually choose to look not just
-
at simple IPS or IDS signatures, but also
-
look for malware signatures. Yeah, that's
-
that's going to require you to, you know,
-
to decode encrypted traffic. It's going
-
to require you to identify potential
-
protocols that might be carrying files,
-
gather all those related packets that
-
belong to the same TCP stream to the
-
same flow, assemble them into an
-
executable file, store that in memory,
-
attempt to scan it with an antivirus
-
engine, and then determine if that flow
-
was actually malicious or not. Now, this
-
requires a lot of processing power. This
-
is going to create some sort of delay in
-
the networks of the users. They're going
-
to see their download unable to
-
finish or the application responding
-
slowly until the firewall, the UTM device,
-
or the intrusion prevention system is
-
actually able to scan those files
-
against malware signatures. On a lighter
-
approach, we could also just be looking
-
at URLs, looking for malicious domains or
-
domains that are associated with
-
malware or with the command and control
-
servers. We might be looking at URLs in
-
order to categorize those URLs and
-
figure out the reputation of that URL
-
and decide whether we want the
-
communication to that specific website
-
to proceed or not. So regardless if the
-
device is an IPS or an IDS, the detection
-
methods are pretty much the same. Now, the
-
difference is just in what the device is
-
actually doing. Is it only alerting or is
-
it actually taking an active response
-
approach to the traffic? But the
-
detection part is pretty much the same,
-
right? And when talking about detection,
-
we are going to start with the basic
-
type of detection that is where we're
-
just looking for signatures in the
-
database, which, of course, means that we
-
need to have an up-to-date database for
-
the device to be able to detect the
-
latest and the greatest attack. Now, this
-
is basically one of the reasons why
-
people choose to pay for commercial
-
solutions because databases maintained
-
by a dedicated software or security
-
vendor that deals with intrusion
-
prevention, those databases are going to
-
be much more often updated and kept up
-
to date in order to mirror as best as
-
possible the database of all the known
-
attack patterns ever detected in the
-
world. Now with open source solutions,
-
you're still going to have
-
a pretty good level of protection, but
-
you might not be able to detect an
-
attack that was just identified
-
six hours ago. Nevertheless and
-
regardless how up-to-date your database
-
is, you're still limited by the attack
-
patterns listed in that database. If an
-
attack emerges and doesn't match
-
anything in your database, it's still
-
going to go through,
-
which leads us to a different approach,
-
and that is a behavioral approach. So
-
instead of looking at specific streams
-
of bytes, specific headers, specific
-
sequences of packets, let's look at the
-
overall behavior of an application or of
-
a protocol.
-
Does it look like it's doing what it's
-
supposed to do? Is it generating more
-
packets than we're used to seeing? Is it
-
generating more traffic? Is it
-
generating an abnormal amount of control
-
information as opposed to real
-
data transfer? And we call this
-
behavioral monitoring. Now, in order for
-
behavioral monitoring to work, we need to
-
have something to compare that behavior
-
to, and say well, if it goes outside of
-
the known ranges,
-
then it looks like something's fishy.
-
Well that known range is supposed to be
-
your baseline. So such a device or such a
-
system is supposed to be trained first.
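A very reduced picture of what training and comparing against a baseline could mean is shown below in Python. It assumes the baseline is just the mean and standard deviation of one metric, for example packets per minute for one application, and it flags anything more than three standard deviations away. Real products use much richer models; the threshold and function names here are illustrative assumptions.

```python
from statistics import mean, stdev


def learn_baseline(samples):
    """Learn a baseline (mean, standard deviation) from training samples.

    Needs at least two samples collected during normal operation.
    """
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value that is more than `threshold` standard deviations
    away from the learned mean."""
    avg, sd = baseline
    if sd == 0:
        # Training data never varied: any different value is outside the range.
        return value != avg
    return abs(value - avg) / sd > threshold


if __name__ == "__main__":
    # Hypothetical packets-per-minute samples from the training period.
    baseline = learn_baseline([100, 110, 90, 105, 95])
    print(is_anomalous(104, baseline))
    print(is_anomalous(500, baseline))
```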
-
You're supposed to just leave it inside
-
of the network for let's say a week or
-
two. Just let it figure out what
-
a normal Monday morning looks like
-
in your network when everybody comes
-
into work and they start logging in and
-
start updating their
-
machines and perhaps even their mobile
-
phones on the company Wi-Fi. But
-
nevertheless, you have to let that
-
intrusion prevention solution learn what
-
your normal traffic looks like when
-
people start accessing internal
-
applications, when people start
-
accessing internet destinations, when
-
people start communicating, sharing files
-
between each other, when backups
-
start to happen at midnight perhaps,
-
right? You have to let it learn so that
-
in a couple of weeks when something goes
-
outside of the known range, when an
-
application behaves the way it did not
-
behave in the first training weeks, then
-
it's going to be able to raise an alarm
-
and perhaps indicate the fact that the
-
application has been compromised or that
-
somebody is using it in order to elevate
-
their privileges or just compromise your
-
network. And as you can probably guess,
-
this is one area where machine learning
-
is going to provide you a lot of benefit
-
provided that you take the time and effort
-
to educate, to teach the machine learning
-
system what your normal baseline
-
looks like. Now, of course, regardless how
-
complex or how well-tuned your solution
-
is going to be, there will be false
-
positives and there will be false
-
negatives, which is why I always tell
-
students there's an old saying that
-
I heard from someone in Cisco a long,
-
long time ago, and they said that IPS
-
without eyes
-
is useless. So IPS without human eyes is
-
useless. There's always going to be
-
the need to have a human being right
-
there evaluating and analyzing whether
-
the alerts generated by the intrusion
-
prevention or detection system are valid
-
or not. Does it need more fine-tuning or
-
do we need to raise an alarm? So what
-
devices can we actually find that
-
implement this type of advanced
-
functionality, be it detection or
-
prevention. Well unfortunately, this is
-
the place where we're slowly
-
stepping into the marketing area. That's
-
because the devices that we're going to
-
be listing here are not completely
-
different devices, but over time,
-
different naming conventions have
-
emerged, different marketing names have
-
been invented to make them sound cool, to
-
make them sound different from what the
-
other vendors were doing. So we're going
-
to start with the next generation
-
firewall, and we've had this type of
-
next generation
-
for about 12 or 15 years already. I've
-
been hearing the next generation term in
-
IT security for so long that
-
I'm starting to wonder
-
are we still next generation, are we- have
-
we skipped the generation? Are we now in
-
the next next generation or where does
-
it stop, where does it end, where
-
does the next generation begin, right? Now,
-
unfortunately marketing people don't
-
really ask themselves these questions. So
-
we're kind of stuck with this
-
terminology for now, and we're gonna keep
-
calling it next generation until I
-
don't know when, but regardless, a next
-
generation firewall is basically just a
-
layer 7 firewall. That's an application
-
layer firewall which is able to look at
-
the application layer payload, so we're
-
actually seeing the data being sent,
-
we're not just looking at the packet
-
headers. And it also has some sort of
-
detection or prevention system built in,
-
okay? So we have an IPS or an IDS built
-
in, which leads us back to the discussion
-
that we had before. So we have an
-
application layer firewall which can be
-
enriched with additional functionality.
-
Now that we have access to the actual
-
application payload, well, why not
-
look for intrusion signatures, why not
-
look for malware signatures, why not look
-
for spam signatures, right? So depending
-
on how complex the device is, if it at
-
least has IPS functionality built in,
-
we're going to call it a next
-
generation firewall. And here's the funny
-
part, if the next generation firewall has
-
a bunch of other additional features on
-
top of the IPS functionality, such as
-
malware scanning, antivirus scanning,
-
perhaps looking at the files and being
-
able to implement some data loss
-
prevention policies, it's able to look
-
at the URLs and categorize them and
-
analyze the reputation of the web pages,
-
and pretty much everything that we could
-
possibly think of that we could be doing
-
just by looking at the application data,
-
then we're going to call this a unified
-
threat management device, a UTM device.
-
Again, I don't think I need to repeat
-
this, but the more complex the device
-
becomes, the more stuff it needs to do in
-
order to decide whether to allow a
-
packet or not, the more resources, the
-
more CPU intensive it's going to be, the
-
more memory it's going to require, and the
-
more delay that is going to be introduced in the
-
network. So keep this in mind. Even though
-
it kind of sounds cool, right, to have all
-
that security functionality in a single
-
box,
-
which by the way, try to make sure
-
it's not a single box of failure, single
-
point of failure, all right? [Laughs]
-
Even though it sounds cool to have all
-
this functionality in one place,
-
it's going to hit your performance
-
pretty badly, right? So keep this in mind.
-
Don't just enable everything blindly
-
because the end users, the applications,
-
and well, God forbid your
-
customers, your paying customers,
-
they're going to feel the effects of
-
your awesome UTM device, and
-
their application experience is going to
-
suffer. Now, a special type of network
-
monitoring device can also be considered,
-
a web application firewall. We've briefly
-
mentioned web application
-
firewalls in a previous video, and we
-
said that a WAF, a web application firewall,
-
is just a dedicated firewall that is
-
specifically trained and educated to
-
look at attack signatures aimed at web
-
applications. So we're looking for things
-
such as cross-site scripting, we're
-
looking for,
-
you know, directory traversals, we're
-
looking at SQL injection attacks. We're
-
looking at pretty much anything that
-
could be performed by a malicious user
-
that is trying to exploit an input
-
validation flaw in a web application. So
-
it's still an application layer firewall.
-
It still looks at the application
-
layer payload. It's just that it's a bit
-
more let's say, picky about what type of
-
traffic it is going to analyze. It's
-
only going to look at web traffic, and
-
it's only going to look for web
-
attacks, web application attacks. It's
-
mostly going to rely on signatures.
-
That's because we cannot really do much
-
when it comes to requests coming in from
-
our clients. Behavioral analysis
-
doesn't really play well here because
-
most attacks, especially web application
-
attacks, are just one single request, one
-
single query with a malicious payload.
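Here is a small Python sketch of what checking one single request against web attack signatures might look like. The three regular expressions are deliberately simplistic, made-up examples; a real WAF rule set is far larger and much harder to evade.

```python
import re
from urllib.parse import unquote_plus

# Illustrative signatures only; real WAF rule sets are far more thorough.
WEB_SIGNATURES = {
    "xss": re.compile(r"<\s*script", re.IGNORECASE),
    "directory_traversal": re.compile(r"\.\./|\.\.\\"),
    "sql_injection": re.compile(
        r"""['"]\s*(?:or|and)\s+\d+\s*=\s*\d+|union\s+select""",
        re.IGNORECASE,
    ),
}


def inspect_request(raw_query: str) -> list[str]:
    """Return the names of the signatures matched by a URL-encoded query string."""
    decoded = unquote_plus(raw_query)  # undo %XX and '+' encoding first
    return [name for name, rx in WEB_SIGNATURES.items() if rx.search(decoded)]


if __name__ == "__main__":
    print(inspect_request("q=%3Cscript%3Ealert(1)%3C/script%3E"))
    print(inspect_request("page=about"))
```

Each request is judged on its own, with no baseline or history involved.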
-
So in many situations, it's going to be
-
either black or white, right? We're
-
detecting an attempt at an intrusion,
-
we're detecting an attack in that
-
request or not. There's pretty much not
-
going to be much of a gray area with
-
web application firewalls. And you could
-
deploy a WAF as a separate device. It
-
could be a physical box, it could be a
-
virtual machine, it could be a
-
functionality within a UTM device, again,
-
all in one wonders. But it can also be a
-
part of the web server itself. So we have
-
plugins that install alongside the
-
actual web server that is hosting the
-
web application, such as plugins for the
-
Apache web server, for the IIS web server,
-
on Windows server, or for Nginx. So
-
we're installing these plugins right
-
there, and their purpose is to scan the
-
traffic that's coming in from the
-
clients before allowing that request to
-
be processed by the web server. Having
-
something such as a plugin that runs
-
alongside the web server on the same
-
machine, on the same box, opens us to the
-
risk of either having that machine
-
compromised by an attacker who, this time
-
doesn't target the web application, but
-
targets the scanning engine and can
-
intentionally cause, for example, a denial
-
of service, give it so much traffic to
-
analyze that the web server running on
-
the same machine is unable to actually
-
respond to valid requests. So there you
-
have it that's the denial of service attack.
-
Now, when it comes to actually monitoring
-
the network traffic, we said that a
-
solution would be to just simply mirror
-
all the traffic, and then look for
-
specific attack patterns inside of that
-
traffic. Now, that might not be always
-
feasible because the amount of traffic
-
entering a data center or the server
-
farm that hosts an application might be
-
huge, right? So in some situations, we
-
might not be able to analyze the exact
-
amount of traffic that goes in, but we
-
might be able to generate a summary of
-
that traffic and then analyze that
-
summary for intrusion attempts. Now, this
-
traffic summary is sometimes found under
-
the terminology of NetFlow or sFlow or
-
J-Flow, which is basically just a
-
technology implemented by various
-
vendors out there in which instead of
-
creating an exact copy of the traffic,
-
we're simply summarizing that traffic,
-
and then reporting that summary back to
-
some analysis software. So we're only
-
telling it what type of sources, what
-
type of destinations have communicated,
-
how many bytes were used, what type of
-
protocols have been used,
-
what type of flags have been set in that
-
specific type of traffic. But we don't
-
put the burden of sending the entire
-
actual traffic and the entire payload to
-
that analysis software. Now, this also
-
means that we're losing application
-
layer visibility, all right? Since we're
-
just summarizing the type of traffic,
-
we're only describing the metadata about
-
that traffic, we're losing everything that
-
pertains to the application layer, but
-
we're gaining a lot of performance, and
-
we can also store this summary
-
information long term for further
-
analysis somewhere along the line in the
-
future. Sometimes, looking at traffic is
-
simply not feasible. Maybe we cannot grab
-
all the traffic that's running through
-
the network. Maybe we don't have network
-
devices smart enough to generate those
-
summaries, those flow
-
reports for us. So another solution would
-
be to simply have a software monitoring
-
solution, a so-called network
-
performance monitor that queries
-
periodically your networking devices,
-
queries your routers, your switches,
-
your wireless LAN controllers, your
-
firewalls perhaps about the status of
-
their physical resources, status of their
-
interfaces, how much traffic is going
-
through their interfaces, what's the CPU
-
load, what's the memory usage, what's the
-
structure of the routing table, how does
-
the r table look like, how is the DHTP
-
traffic looking like, right? So any type
-
of monitoring information that can be
-
extracted out of these networking
-
devices, which, in turn, can be correlated
-
in order to figure out if we can see
-
some anomalies in there. One such
-
solution is, for example, SolarWinds NPM,
-
network performance monitor, which is a
-
dedicated solution for monitoring not
-
just networking devices, but also servers
-
and virtual machines about their
-
health, right? How are their network
-
interfaces looking, how much load is
-
there on their hardware resource or
-
their hardware components, are they
-
generating any alerts, do we have failed
-
interfaces, do we have failed processes, do
-
we have something that's- failed links,
-
are we detecting errors or overloaded
-
devices? Stuff like that.
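As an example of what a monitor can do with the raw numbers it polls, the Python sketch below turns two readings of an interface byte counter into a link utilization percentage. It assumes the counter counts octets and wraps around at 32 bits, and that the two readings have already been obtained somehow; the function name and parameters are made up for this illustration.

```python
def utilization_percent(octets_prev: int,
                        octets_now: int,
                        interval_s: float,
                        link_bps: float,
                        counter_bits: int = 32) -> float:
    """Compute link utilization from two readings of an octet counter.

    The modulo handles a single counter wrap between the two polls.
    """
    modulus = 2 ** counter_bits
    delta_octets = (octets_now - octets_prev) % modulus
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / link_bps


if __name__ == "__main__":
    # 37,500,000 octets in 300 seconds on a 100 Mbit/s link.
    print(utilization_percent(0, 37_500_000, 300, 100_000_000))
```

If the polling interval is long enough for the counter to wrap more than once, the result is silently too low, so fast links need shorter intervals or wider counters.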
-
Now, this type of performance monitoring
-
can be done over a variety of protocols.
-
In most cases, the SNMP protocol is going
-
to be used because it allows us to
-
report a lot of the hardware counters
-
and a lot of the interesting information
-
that we want to gather and store long
-
term. Also, we might be using WMI, that is
-
Windows Management Instrumentation, and a
-
couple other protocols as well. And of
-
course, we could enrich this collection
-
by collecting logs from the monitored
-
devices and appliances as well. And we
-
could be collecting those logs over
-
syslog, so we need to configure the
-
device to actually send those syslog
-
messages or at least a copy of them to
-
the monitoring device. Or we could rely
-
on an agent, an additional piece of
-
software installed on the server or on the
-
virtual machine that periodically
-
reports back to us everything of
-
interest regarding that specific
-
host. When talking about dedicated
-
software specifically designed to
-
analyze a lot of information coming from
-
the network be it network traffic, network
-
summaries such as NetFlow, logs, and any
-
kind of application data, that solution
-
is most likely going to be called a SIEM,
-
a security information and event
-
management system. Now the keyword in the
-
definition of SIEM is correlation. That
-
is it's not just a place where you just
-
dump all that information in a huge
-
database, it's a place that as you dump
-
that information, it's going to look for
-
patterns inside of it. It's going to try
-
to correlate network traffic with logs
-
or application data with NetFlow
-
data in order to figure out if some
-
anomalous behavior is detected in your
-
network. So SIEM solutions, and by the way,
-
these are pretty expensive solutions out
-
there, are never designed to be just log
-
storage, right? They're engines, smart
-
engines based on machine learning that
-
aim to detect patterns of intrusion by
-
analyzing and correlating information
-
found in multiple log files, and what's
-
interesting about the implementation of
-
SIEMs is that they're supposed to
-
collect logs from your network devices,
-
from your security devices, even from
-
your workstations, and your mobile
-
devices perhaps. And they're able to
-
understand and correlate all that
-
information and normalize all that
-
information even if it comes from tens
-
or hundreds of vendors or thousands of
-
devices,
-
and they're able to normalize that
-
information and make it look the same so
-
that in the end,
-
it can look for patterns inside of it,
-
and it also allows you to perform
-
queries in a language quite similar to a
-
regular SQL language and query all that
-
information regardless of the fact that
-
it actually came from tens or
-
hundreds of different vendors. And since
-
a SIEM without machine learning
-
functionality is not a very useful SIEM,
-
we could use those machine learning
-
features to look at user behavior as
-
well because in the end, we're trying not
-
to detect just, you know, attack patterns,
-
we're also trying to identify who is
-
conducting them. And a great risk comes
-
from insider threats, so if we are able
-
to monitor what our users are doing,
-
we're not talking here about just
-
watching what websites they're
-
visiting or taking frequent screenshots
-
of their workstations, no we're
-
not doing that, but we're looking at the
-
behavior that they're exhibiting
-
whenever they are interacting with
-
specific applications. And if the SIEM
-
has such an ability, we call that ability
-
user and entity behavior analytics, or UEBA. Don't
-
think that we're only performing here
-
a witch hunt against insider threats.
-
Think about the fact that we might be
-
able to detect abnormal behavior because
-
a user account has been compromised by a
-
hacker, and that hacker is now acting on
-
behalf of that user. The user might have
-
nothing to do with that abnormal
-
behavior, might not even know about it,
-
might not even be logged in at that
-
specific point in time. But the attacker
-
might be acting on behalf of that user.
-
If we're able to detect that abnormal
-
behavior, we might be able to detect the
-
attack going on right then. And stepping
-
just a bit into the realm of science
-
fiction here, I know that some vendors
-
will say no, this is not science fiction,
-
we're selling this, we've had huge
-
success with this. Well, yes and no. I'm
-
going to keep being a bit skeptical as to
-
how efficient this approach is. What I'm
-
talking here about is sentiment analysis
-
or emotion AI. That is, analyzing user
-
behavior in what content the user is
-
actually creating as in blog posts,
-
social media postings.
-
We're not talking here about actual, you
-
know, analyzing the contents of emails
-
and chats because that might, you
-
know, step into the privacy area which we
-
might not want to do. But by
-
analyzing publicly available information
-
generated by those users, we might be
-
able to detect disgruntled employees. We
-
might be able to detect unsatisfied
-
clients that might create some bad
-
reputation for the company, perhaps even
-
before they become so upset as to take
-
action or malicious action against our
-
company. Again, take this with a grain of
-
salt, and don't just think that if it
-
sounds awesome on paper, it has to be
-
awesome in real life. If it sounds too
-
good to be true, then it probably is too
-
good to be true.
-
And finally, the last term here that I
-
wanted you to know about is SOAR,
-
security orchestration, automation and
-
response. That's a mouthful, I know. It's
-
usually a functionality built into SIEM
-
solutions or it can be just a standalone
-
solution. What it basically tries to
-
address is the problem of too much
-
information, that is, being overwhelmed by
-
too many alerts, too many security events,
-
too many security incidents, too many
-
incidents that we need to determine if
-
they're security related or not. [Laughs]
-
Basically the hell of any IT Department
-
that deals solely with monitoring the
-
network and the applications. And the
-
idea behind this is that a SOAR
-
solution is supposed to use some machine
-
learning techniques in order not just
-
to figure out which anomalous events are
-
occurring in the network, but by
-
analyzing those anomalous events, it is
-
able to take some action against them. So
-
it could, at some point, determine if an
-
attack is going on, even if it happens
-
in the middle of the night, and take
-
action immediately by blocking some
-
ports, by creating an access list, by
-
disabling- temporarily disabling some user
-
accounts that might have been
-
compromised. So that's security
-
orchestration, automation and response.
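To make that a bit more concrete, here is a minimal sketch of what a SOAR-style playbook could look like. Everything in it is an assumption made for illustration: the `Alert` fields, the `Responder` class standing in for real firewall and directory integrations, and the severity threshold are hypothetical and not taken from any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    kind: str      # hypothetical alert types: "port_scan", "compromised_account"
    target: str    # the IP address or user name the alert is about
    severity: int  # 1 (low) to 10 (critical)


@dataclass
class Responder:
    """Stand-in for real firewall and directory integrations (hypothetical)."""
    actions: list = field(default_factory=list)

    def block_ip(self, ip):
        self.actions.append(f"acl deny {ip}")

    def disable_account(self, user):
        self.actions.append(f"disable user {user}")

    def open_ticket(self, alert):
        self.actions.append(f"ticket {alert.kind} {alert.target}")


def run_playbook(alert, responder, auto_threshold=7):
    # Low-severity alerts only produce a ticket, so a human makes the call.
    if alert.severity < auto_threshold:
        responder.open_ticket(alert)
    elif alert.kind == "port_scan":
        responder.block_ip(alert.target)
    elif alert.kind == "compromised_account":
        # Temporarily disable the account, even at 3 a.m. with nobody around.
        responder.disable_account(alert.target)
    else:
        # Unknown alert types are never acted on automatically.
        responder.open_ticket(alert)
    return responder.actions


night_shift = Responder()
run_playbook(Alert("compromised_account", "alice", 9), night_shift)
run_playbook(Alert("port_scan", "203.0.113.7", 8), night_shift)
run_playbook(Alert("port_scan", "198.51.100.4", 3), night_shift)
```

The design choice worth noticing is the threshold: automatic action is reserved for high-severity, well-understood alert types, and everything else falls back to a ticket for a human to review.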
-
Just to be sure everybody is clear on this,
-
especially for the exam, where does the
-
SIEM get its information from? Well,
-
first of all, it's going to get it from
-
logs, right? Syslogs. That's going to be
-
the main source of information. How do
-
you collect logs? Well you don't really
-
collect them. You expect those devices to
-
send those to you, so those devices need
-
to be configured be it networking
-
devices. They might be servers, they
-
might be virtual machines, whatever type
-
of device you have, just configure them
-
to send your logs to a secondary
-
destination if the SIEM is not the
-
primary one. Just make sure they send a
-
copy of those syslogs to the SIEM device
-
as well. Next, the SIEM can also collect
-
data by installing agents on specific
-
systems. Now of course, we might not be
-
able to install agents on let's say
-
routers or switches apart from some
-
recent devices that are running Docker
-
containers perhaps. But in most cases, SIEM
-
agents are designed to be installed on
-
Windows and Linux systems. Then they're
-
running as background processes that
-
periodically scan the system and
-
report back to the SIEM the logs
-
generated by the operating system, the
-
running applications, the logs generated
-
by the applications, actually, running on
-
that host, depending on how the agent is
-
configured. The built-in listeners or
-
collectors that you're seeing here on
-
the slide refer to the fact that the
-
SIEM is pre-configured or has plugins
-
that allow it to understand what
-
different vendors are reporting back to
-
it. So it's going to have different
-
plugins to understand logs coming in
-
from, you know, Cisco devices, HP devices,
-
Dell, VMware, whatever vendor it is, it
-
needs some sort of a plugin to
-
understand that specific log format and
-
more than that, it needs to
-
understand the contents of the payload of
-
what the log is saying. SNMP traps, again,
-
most monitoring information is going to
-
come in through an SNMP query or as an
-
SNMP trap generated by the device back
-
to the SIEM. And also NetFlow. NetFlow or
-
different variants implemented by
-
different vendors are basically just
-
summaries of the traffic flows detected
-
over a certain period of time, collected,
-
and then sent over to the SIEM device
-
in order for that traffic summary to be
-
analyzed. Finally, the SIEM can also
-
capture raw packet data if it has
-
dedicated sensors that are able to
-
generate a copy of the traffic and send
-
it back to the SIEM, or we can even have
-
sensors installed inside our network that
-
are monitoring real traffic, and they're
-
only telling back to the SIEM or they're
-
reporting back to the SIEM a summary of
-
that traffic. This is very useful when
-
your devices don't have enough reporting
-
or monitoring capabilities to report
-
back to the SIEM device, and instead, you need
-
to install some specific sensors that
-
look at the traffic, and then tell the
-
SIEM the necessary information that it
-
needs to perform those correlations.
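As a rough illustration of what "a summary of that traffic" means, here is a small sketch of a sensor collapsing individual packets into per-flow totals, which is the general idea behind NetFlow-style records. The packet dictionaries and field names are made up for the example and are not the real NetFlow export format.

```python
from collections import defaultdict


def summarize_flows(packets):
    """Collapse packets into per-flow totals keyed by the classic 5-tuple."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p["size"]
    return dict(flows)


# Hypothetical packets observed by the sensor during one reporting interval.
packets = [
    {"src": "10.0.0.5", "dst": "10.0.0.9", "sport": 51000, "dport": 443, "proto": "tcp", "size": 1500},
    {"src": "10.0.0.5", "dst": "10.0.0.9", "sport": 51000, "dport": 443, "proto": "tcp", "size": 900},
    {"src": "10.0.0.7", "dst": "10.0.0.1", "sport": 40000, "dport": 53, "proto": "udp", "size": 80},
]

summary = summarize_flows(packets)
for flow, totals in summary.items():
    print(flow, totals)
```

Only the summary would be sent to the SIEM, which is why this approach is so much lighter than shipping a full copy of the raw packets.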
-
Sometimes a sensor such as this one might be
-
an IPS or an IDS device even. Log
-
normalization is a feature built into
-
most SIEM Solutions out there. And
-
normalization is required, and it's a
-
very important feature because the SIEM
-
is designed to collect information from
-
hundreds of vendors and thousands of
-
different appliances, each of them
-
running different operating systems on
-
different versions, and they're all
-
building syslogs and SNMP traps in
-
different formats. Some are
-
reporting them as a text, some are
-
generating logs in binary format, some
-
logs are in JSON format, some are in XML
-
format or CSV format, depending on how
-
the vendor actually designed its logging
-
and monitoring abilities. We might even
-
find differences as to how the logs are
-
actually encoded. Some of them might
-
be using UTF, some of them might be using
-
some regional encoding. We might even run
-
into some issues due to the fact that
-
the new line character is represented
-
differently between Windows and Linux
-
systems, and that also might be reflected
-
in the payload included in the logs that
-
we're receiving as part of the
-
monitoring process. Not to mention the
-
fact that the SNMP MIBs, basically the
-
database schemas that each vendor is
-
using for their own software solutions
-
or hardware appliances, these are
-
completely different not just among
-
vendors, but also among different
-
products from the same vendor. So in
-
order to have all this bunch of
-
information collected in some
-
centralized location and to be able to
-
query all this information and to be
-
able to approach it in a consistent
-
manner, we need normalization. That is
-
taking all this information coming from
-
so many vendors in so many formats and
-
making that information look exactly the
-
same so that it can be stored in a
-
single database that can be queried at
-
once regardless of the source of that
-
information. So what are we using to
-
normalize all this information coming
-
from all these vendors? Well, you guessed
-
it. We're gonna need some plugins. Some of
-
these plugins come from the SIEM vendor
-
itself. So they're going to be
-
pre-packaged with vendor plugins
-
from major vendors out there. Some of
-
these plugins are going to come from the
-
actual vendors. So if a smaller vendor
-
creates, let's say, smaller
-
firewalls at some point, and they want to
-
be able to integrate with the
-
large-scale SIEM Solutions, they're going
-
to provide you with a plugin for their
-
own environment as well. And another type
-
of normalization that is really, really
-
important is timestamp normalization.
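Here is a minimal sketch of both kinds of normalization working together: one small "plugin" per log format, each producing the same record layout with the timestamp converted to UTC. The three vendor formats, the field names, and the sample log lines are all invented for the example; real SIEM plugins are far more involved.

```python
import csv
import io
import json
from datetime import datetime, timezone


def to_utc_iso(dt):
    """Convert any timezone-aware datetime to one canonical UTC string."""
    return dt.astimezone(timezone.utc).isoformat()


def from_json(line):
    # Hypothetical vendor A: JSON payload with a Unix epoch timestamp.
    rec = json.loads(line)
    ts = datetime.fromtimestamp(rec["epoch"], tz=timezone.utc)
    return {"time": to_utc_iso(ts), "host": rec["device"], "message": rec["msg"]}


def from_csv(line):
    # Hypothetical vendor B: CSV, 24-hour local time with a UTC offset.
    ts_text, host, message = next(csv.reader(io.StringIO(line)))
    ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S %z")
    return {"time": to_utc_iso(ts), "host": host, "message": message}


def from_text(line):
    # Hypothetical vendor C: plain text, 12-hour clock, US date order.
    parts = line.split(" ", 5)
    ts = datetime.strptime(" ".join(parts[:4]), "%m/%d/%Y %I:%M:%S %p %z")
    return {"time": to_utc_iso(ts), "host": parts[4], "message": parts[5]}


PLUGINS = {"json": from_json, "csv": from_csv, "text": from_text}


def normalize(fmt, raw_line):
    # Strip both Windows (\r\n) and Linux (\n) line endings before parsing.
    return PLUGINS[fmt](raw_line.rstrip("\r\n"))


events = [
    normalize("json", '{"epoch": 1710505800, "device": "app1", "msg": "Login failed"}\n'),
    normalize("csv", "2024-03-15 14:30:00 +0200,sw1,Port up\r\n"),
    normalize("text", "03/15/2024 08:30:00 AM -0400 fw1 Login failed\n"),
]
for event in events:
    print(event)
```

The three sample lines describe the same instant written in three different ways, so after normalization they all carry the same `time` value and can be compared or correlated directly.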
-
Don't forget that we're looking for
-
anomalies in network traffic and in
-
network events. And if we don't have
-
timestamp normalization, if we don't make
-
sure that all the events that we're
-
looking at are actually stored with
-
their right timestamp, at their right
-
moment in time when they actually
-
happened, we have no chance of detecting
-
anomalies in the network. So we might
-
have devices that have a badly
-
configured clock. We might have devices
-
that have been configured for different
-
time zones. We might have devices that
-
display timestamps, those time
-
values in their logs, in one
-
format versus another format. Some of
-
them might be using 24-hour time, some of them
-
might be using 12-hour time. Some of them
-
might include the daylight savings time.
-
Some of them might be using UTC or
-
Unix epoch time. It's up to the vendor, so
-
normalizing these timestamps is also a
-
very, very, important topic here that
-
needs to be taken care of by the SIEM
-
solution before that event indicated by
-
that specific timestamp is stored in the
-
database alongside the others. Now,
-
how can a SIEM solution look for
-
anomalies in that huge database that we
-
just talked about? Well, it could be done
-
in a number of ways. We could just rely
-
on simple if then else matches, so we're
-
looking for, you know, specific events,
-
specific types of logs being generated
-
in a specific time range perhaps. This
-
type of approach is the fastest one
-
because it basically boils down to
-
a simple query in that huge database
-
stored by the SIEM appliance.
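As a small sketch of that "simple query" idea, here is a rule expressed as a plain SQL query over normalized events. It uses an in-memory SQLite database purely as a stand-in for the SIEM's event store; the table layout, the sample events, and the `failed_logins` helper are assumptions for the example, and real SIEM query languages differ from product to product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (time TEXT, host TEXT, src_ip TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2024-03-15T02:58:10+00:00", "vpn1", "203.0.113.7", "Login failed for admin"),
        ("2024-03-15T03:01:42+00:00", "vpn1", "203.0.113.7", "Login failed for admin"),
        ("2024-03-15T03:02:05+00:00", "web1", "198.51.100.4", "Login succeeded for bob"),
        ("2024-03-15T09:15:00+00:00", "vpn1", "203.0.113.7", "Login failed for admin"),
    ],
)


def failed_logins(conn, src_ip, start, end):
    """Exact-match rule: failed logins from one IP inside one time range."""
    # Normalized ISO 8601 UTC strings sort chronologically, so BETWEEN works on text.
    return conn.execute(
        "SELECT time, host FROM events "
        "WHERE src_ip = ? AND message LIKE '%Login failed%' "
        "AND time BETWEEN ? AND ? ORDER BY time",
        (src_ip, start, end),
    ).fetchall()


hits = failed_logins(conn, "203.0.113.7",
                     "2024-03-15T02:00:00+00:00", "2024-03-15T04:00:00+00:00")
print(hits)
```

Note that the time-range filter only works because every timestamp was normalized to the same format beforehand.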
-
Unfortunately, if there are unknown
-
threats, if there are attacks that we
-
know nothing about, that we don't have a
-
signature for, where we don't know what to
-
look for, we're not going to be able to
-
detect them. Kind of makes sense, right? So
-
another approach would be heuristic rule
-
matching. This is a type of rule matching
-
where we're not exactly looking for an
-
exact match
-
for the specific type of event, but we're
-
looking for something that's pretty
-
close to it, all right? So this type of
-
approach
-
relies on a more permissive set of rules.
-
So if an event doesn't 100% match a rule, let's
-
say if we have some events that are
-
pretty close to it and match it like
-
let's say 80% or 90%, they can still trigger it.
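One way to picture a "pretty close" match is to score an event by the fraction of a rule's conditions it satisfies, and to alert above a threshold instead of requiring all of them. This is only a sketch of the idea under that assumption; the conditions, field names, and the 80% threshold are hypothetical, and real products use their own scoring.

```python
# A hypothetical rule made of five independent conditions.
RULE = {
    "many_failed_logins": lambda e: e["failed_logins"] > 10,
    "outside_business_hours": lambda e: e["hour"] < 7 or e["hour"] > 19,
    "unusual_country": lambda e: e["country"] not in e["usual_countries"],
    "privileged_account": lambda e: e["user"] in {"root", "admin"},
    "touched_many_hosts": lambda e: e["hosts_touched"] >= 3,
}


def match_score(event, rule=RULE):
    """Fraction of the rule's conditions that the event satisfies."""
    hits = sum(1 for check in rule.values() if check(event))
    return hits / len(rule)


def is_suspicious(event, threshold=0.8):
    # An exact matcher would require a score of 1.0; this one is more permissive.
    return match_score(event) >= threshold


near_match = {
    "failed_logins": 25, "hour": 3, "country": "XX",
    "usual_countries": {"RO", "DE"}, "user": "alice", "hosts_touched": 5,
}
ordinary = {
    "failed_logins": 1, "hour": 10, "country": "RO",
    "usual_countries": {"RO", "DE"}, "user": "alice", "hosts_touched": 1,
}
print(match_score(near_match), is_suspicious(near_match))
print(match_score(ordinary), is_suspicious(ordinary))
```

Here `near_match` satisfies four of the five conditions, so an exact matcher would ignore it while the permissive one flags it.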
-
Now this also requires you to fine-tune
-
your rule set, so if at some point by
-
doing heuristic rule matching, you're
-
detecting some anomalies, but you don't
-
have a rule that matches that anomaly
-
100%, well, you better create it, right?
-
You better fine tune your rule set and
-
add some more rules or tweak the
-
existing ones to match that newly
-
detected anomaly, and just to recap this
-
here, behavioral analysis implemented
-
in a SIEM relies on the fact that you need
-
to build a baseline. You need to tell the
-
SIEM what your normal looks like, what
-
your normal traffic looks like, what
-
the normal logs generated by all
-
the devices and all the applications in
-
your network look like. So that, in turn,
-
can be used as a starting point in order
-
to detect potential, well, mismatches that
-
might indicate attacks or attempts at
-
compromising your network. Now of course,
-
this is going to create a lot of false
-
positives. So you might run into a situation
-
where an alert is being raised because
-
an application starts generating some
-
huge backups because some admin has
-
modified the backup policy. Now the SIEM
-
device sees a lot of traffic in there,
-
raises an alert, raises everyone from their
-
sleep at 3 a.m., saying
-
that, oh my god, this looks like a data
-
exfiltration attempt. Somebody is
-
dumping all the data from our database,
-
and then an admin has to come in and
-
intervene and say, my dear SIEM, what's
-
happening in there, what you're seeing is
-
just a full backup happening at 3am in
-
the morning. It's okay, right? Don't freak
-
out about it, okay? So it does require
-
human intervention for fine tuning these
-
rules.
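The baseline idea, and the backup false positive that comes with it, can be sketched in a few lines. This assumes a very simple statistical baseline (mean and standard deviation of nightly traffic volume); the sample numbers, the three-sigma cutoff, and the `approved` set used to model an admin's tuning are all made up for illustration.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Describe 'normal' as the average and the typical spread of past values."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, max_sigma=3.0):
    avg, spread = baseline
    return abs(value - avg) > max_sigma * spread


def should_alert(job, value, baseline, approved=frozenset()):
    # 'approved' models human fine-tuning: jobs an admin has marked as expected.
    return is_anomalous(value, baseline) and job not in approved


# Hypothetical nightly outbound traffic from the database server, in GB.
nightly_gb = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 5.1]
baseline = build_baseline(nightly_gb)

print(should_alert("nightly-sync", 5.3, baseline))           # within the usual spread
print(should_alert("full-backup", 40.0, baseline))           # looks like exfiltration
print(should_alert("full-backup", 40.0, baseline,
                   approved={"full-backup"}))                # after the admin steps in
```

The last call is the 3 a.m. conversation in code form: the traffic is still far outside the baseline, but the admin has told the system that this particular job is fine.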
-
On the other hand, we have anomaly
-
analysis. And this is, by definition, a
-
type of analysis that is performed
-
whenever we're comparing observed
-
behavior with known standard behavior,
-
especially when we're comparing what
-
we're seeing as part of a protocol's
-
behavior with how the SIEM device
-
knows that the protocol is supposed to
-
behave according to its RFC, according to
-
its definition. Finally with trend
-
analysis, we're going to be looking at
-
historic data and try to extrapolate it.
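A minimal sketch of that kind of extrapolation, assuming a plain straight-line fit over past values, could look like this. The weekly backup sizes, the 25% tolerance, and the function names are hypothetical; a real product would likely use something more sophisticated than a linear trend.

```python
def fit_line(history):
    """Least-squares straight line through (0, y0), (1, y1), ...; needs 2+ points."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean


def expected_next(history):
    """Extrapolate the fitted line one step past the end of the history."""
    slope, intercept = fit_line(history)
    return intercept + slope * len(history)


def is_unexpected(observed, history, tolerance=0.25):
    # Alert only when the new value strays too far from the forecast.
    forecast = expected_next(history)
    return abs(observed - forecast) > tolerance * forecast


weekly_backup_gb = [5.0, 8.0]             # hypothetical sizes for two past weeks
print(expected_next(weekly_backup_gb))    # forecast for the following week
print(is_unexpected(12.0, weekly_backup_gb))
print(is_unexpected(30.0, weekly_backup_gb))
```

With this history the forecast grows along with the data, so a moderately larger backup passes quietly while a sudden jump well beyond the trend still raises an alert.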
-
For example, if we see that the backups
-
are increasing every single week because
-
more and more data is generated, the
-
SIEM device might be able to generate a
-
pattern so that if we see five gigabytes
-
in a backup this week, and eight gigabytes
-
of backups next week, when it is going to
-
see 12 gigabytes two weeks from now, it's
-
not going to raise an alert because it
-
expected the backup volume to increase
-
by that amount. But I don't need to tell
-
you that not everything can be safely
-
predicted this way. Finally, after all
-
that advanced correlation and machine
-
learning and AI features, the SIEMs
-
actually can be used as a database for
-
event storage, and they can be queried by
-
human users, by admins if you know what
-
to look for. Perhaps you just need to
-
investigate some event. Perhaps you need
-
to perform some forensic analysis.
-
So those databases become available to
-
you, to any admin basically, simply by
-
creating specific rules in order to
-
match specific types of events stored in
-
there. So you could create simple rules
-
that match based on
-
specific conditions. Look for one
-
specific IP address or look for a
-
specific time range, look for one
-
specific string that might occur in all
-
those log payloads. Maybe look for a user
-
and see what are the events that are
-
generated by the user or that
-
implicate that user and so on and so
-
forth. So the SIEM appliances are going
-
to allow you to create some queries very
-
similar to what you might be already
-
used to if you ever used SQL in the past
-
because all that data is basically
-
stored in a structured database which
-
can be queried with an SQL like
-
language. And finally, don't forget that
-
at the end of the day, not everybody has
-
money to invest in a SIEM solution, so you
-
might end up having to analyze your logs
-
by yourself, just navigating a bunch of
-
logs. And this is where a bunch of text
-
matching utilities, especially some
-
utilities that are built into most Linux
-
distributions are going to come in and
-
help you tremendously. Now, this is not a
-
Linux course, and the exam is not going
-
to expect you to know everything about
-
all these command line commands. But I
-
would say that knowing at least the
-
commands right here on the slide is
-
going to help you figure out a couple of
-
the outputs on the exam. Alright, so without
-
going into too much detail here, let's
-
have a look in one of my folders here
-
that stores log files on an Ubuntu
-
distribution, this is running on WSL,
-
right, Windows subsystem for Linux. We
-
have a log file right here, dpkg.log,
-
which is the log that's generated by the
-
package manager. So this log is going to
-
tell me which package-based operations
-
have been conducted on this machine
-
from its beginning, from its
-
installation, right? What did I install,
-
what did I uninstall, what did I upgrade?
-
So there might be some useful information
-
in here. So let's just see a couple of
-
these commands. 'cat' is the concatenate
-
command in Linux and can also be used to
-
list the contents of
-
text files. So cat dpkg.log is going to
-
provide you a long listing right
-
here, trying to display all the contents
-
of the text file right at the console.
-
Now, this file right here, we can also
-
pipe it. So we send the result of this cat
-
command to another command, which could
-
be word count: wc -l. This is
-
going to count the lines in this log file.
-
So you can see it's over 9000 lines
-
long. Pretty tough to search for some
-
information in a 9000 line log file. So
-
what we can do right here is, for example,
-
limit the amount of information that
-
we're displaying on the screen. This is
-
where the head or tail commands come in.
-
The head command, as you can probably
-
guess, is going to provide you with a
-
listing of the first 10 lines in this
-
log file. Similarly the tail command is
-
going to provide you a listing of the
-
last 10 lines in a log file. The tail
-
command is very useful for log files
-
that get appended frequently. So if you just
-
want to see the last modifications
-
made in this file, use the tail
-
command. Of course, the number of lines is
-
configurable. We're not going to go into
-
all these parameters right now. If you're
-
interested in finding out more about any
-
Linux command, any Linux utility, just use
-
the man pages, man tail,
-
and it's going to provide you with the
-
manual pages that are going to tell you
-
what are all the possible configuration
-
flags or settings that can be added to
-
this command. Here's the dash n, for
-
example, number of lines. Output the
-
last number of lines, you can add it as a
-
minus n parameter or dash dash lines
-
equals how many lines you want to
-
display on the screen. Quit with the
-
letter Q. Now, the grep utility is a
-
regular expression evaluator, which can
-
be, of course, used to run some complex
-
regular expressions, which are going to
-
help you tremendously dig through a lot
-
of information and extract what is actually
-
useful to you. But you can also do some
-
very simple string matching using grep.
-
For example, if we are displaying the
-
dpkg log here and piping this to the
-
grep command and search for, let's
-
say, installation of a specific package,
-
such as, let me see, ansible, right? I did
-
use this machine for ansible in the past.
-
So there you go. These are all the log
-
entries in here generated by the ansible
-
package. Notice that we've been through a
-
number of ansible versions in here.
-
Starting from version 2.8.1, we went
-
through 2.9.19, 2.9.27, and so on. We can
-
even see the evolution of this package
-
on this machine. Now, this is just a very,
-
very simple example here. I just wanted
-
to let you know that you do have a lot
-
of utilities available at your disposal
-
for manual log searching if you don't
-
have a SIEM solution available, all right?
-
Now, there's a lot more to talk about
-
this, but since this is not a Linux
-
training, we're gonna stop right here.
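If you would rather script these steps than type them at the shell, here is a rough Python sketch of what the line count, head, tail, and simple grep matching shown above are doing. The function names simply mirror the commands and the file path in the usage lines is an assumption; this covers only plain substring matching, not grep's regular expressions.

```python
from collections import deque
from itertools import islice


def count_lines(path):
    """Roughly what `cat file | wc -l` reports."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in f)


def head(path, n=10):
    """First n lines, like `head -n`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


def tail(path, n=10):
    """Last n lines, like `tail -n`; the deque keeps only the newest n."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


def grep(path, text):
    """Lines containing a plain substring, like a simple `grep text`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f if text in line]


# Example usage, assuming a dpkg.log file exists in the current folder:
# print(count_lines("dpkg.log"))
# print(tail("dpkg.log", 5))
# print(grep("dpkg.log", "ansible"))
```

All four helpers read the file line by line, so they stay usable even on log files much longer than the one in the demo.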
-
Alright everyone, thanks so much for
-
watching. I know there's been a lot of
-
information in this video, but I hope you
-
found this useful and informative, and I
-
hope to see you on the next video as
-
well. Don't forget to leave a comment if
-
you like this. Support the channel if you
-
can, if you wish, if you find this useful
-
in your studies, and see you in the next
-
video. Bye, bye.
-
[Music]