can help determine the sophistication of
their systems and the potential advantages
and pitfalls.
Compare data sources.
What is the data set used in the
artificial intelligence?
Over time, security analytics has advanced
from events, to alerts, to incidents. As more
systems become digital and cloud-based,
there’s even more data available (think IoT
sensors and micro-service interactions). The
increasing data volume has improved the
potential benefit while exacerbating the
challenge of finding meaning in the data
quickly enough to act on it effectively.
For enterprise threat detection and analysis,
meaningful data depends on context
from application, transaction, user and
session visibility. These data elements
may come from different sources, or from
network traffic analytics that capture and
analyse the entire L2-7 data set, subsets, or
metadata. The highest-level (L7) application
data set offers the most interesting data
for behavioural anomalies and pattern
matching; the lower-level packet data offers
the most details for forensics.
What network data
architectures do you support?
Be sure your vendor can collect the network
data in your evolving business architecture,
including north-south, east-west and cloud
visibility. The network is the best source
of data from which AI systems can learn
about all aspects of your business, because
every significant activity touches the
network. And that dependency is rising,
as cloud services, including Software-as-
a-Service (SaaS), depend on the network
as well. Further, wire data is different from
logs and other self-reported data, as it is
empirically observed.
What data do you collect from
encrypted traffic?
Encryption is increasingly a default
technique, used by the good guys to foil
government monitoring and protect data
privacy, and by the bad guys to
foil security tools and extend the lifespan
of new techniques. More of this traffic
uses advanced encryption based on
perfect forward secrecy (PFS) rather than
conventional public key encryption (PKE).
Conceptually, where PKE protects many
sessions with the same long-lived key, PFS
encrypts each session with a single-use key.
If someone steals a session key, they can
access only that individual session’s data.
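To make the single-use key idea concrete, here is a
minimal sketch of an ephemeral Diffie-Hellman
exchange, the general mechanism behind PFS. It is an
illustration under stated assumptions, not how any
particular product or TLS library implements it: the
prime `P`, the generator `G` and all function names
are hypothetical, and the parameters are far too
small for real use.

```python
import hashlib
import secrets

# Toy parameters for illustration only. A 127-bit prime is far too small
# for real traffic; production systems use vetted groups or elliptic curves.
P = 2**127 - 1
G = 3


def ephemeral_keypair():
    """Create a fresh private/public pair that is discarded after one session."""
    private = secrets.randbelow(P - 3) + 2  # value in the range [2, P - 2]
    public = pow(G, private, P)
    return private, public


def session_key(own_private, peer_public):
    """Derive the session key from our private value and the peer's public value."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).hexdigest()


def new_session():
    """Run one exchange and return the key each side computes independently."""
    client_private, client_public = ephemeral_keypair()
    server_private, server_public = ephemeral_keypair()
    client_key = session_key(client_private, server_public)
    server_key = session_key(server_private, client_public)
    return client_key, server_key
```

Each call to `new_session()` yields a key that both
sides agree on but that differs from every other
session, so a stolen key exposes one session only.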
The increasing prevalence and complexity
of encryption presents a challenge for
AI-driven analytics. Decryption approaches
can introduce a time lag and counteract the
security benefits of encrypting in the first
place. AI-driven security analytics have to
be able to decrypt data at line rate in a way
that doesn’t expose sensitive data to risk.
Few tools meet this requirement.
Decryption is a common feature of many
perimeter (north-south) security systems,
including firewalls and web gateways.
However, many remote users interact
directly with the cloud provider, without
traversing traditional on-premises networks.
And perimeter systems perform other
control duties. When resources get strained,
the system will let encrypted traffic go
through uninspected.
One hedge for analytics systems is to
decode only headers and other metadata for
encrypted traffic. This leaves a huge blind
spot. The actual content of the traffic
may include information that is vital to catching
bad actors: malware, suspicious database
commands, sensitive files, SQL injection
attacks, command and control behaviour
and more.
Security-aware companies are encrypting
to improve their defences against friend
and foe. The downside of all this encryption
is that the activity of an attacker or insider
who gets into the network will be hidden as
well, proceeding unmonitored. In
attack terms, east-west activities include
reconnaissance scans for vulnerable
and desirable targets, lateral movement
between devices and transfer of data from
an internal-only system to a system with
external access privileges. These actions are
low-frequency, high-risk activities that are
ripe for AI analytics.
It’s likely that your business will encrypt
more traffic within and beyond your
enterprise, so the vendor’s model for
analysing this traffic should be a make-or-
break consideration.
FEATURE: THREAT ANALYSIS
How many and which protocols
are providing data?
Once you peel off encryption, there’s a
wealth of data at the application protocol
layer. Most businesses have many different
protocols running, yet most analytics start
with just one or a few. Visibility into your
existing protocols with extensibility to your
custom protocols will be very important in
detecting relevant activities with AI. As you
evolve your digital business and introduce
new specialty systems (IoT sensors, supply
chain partners, cloud services), customisation
features may become vital.
Compare data science.
How do you train and tune
your AI engines?
There’s value in both supervised and
unsupervised machine learning. Supervision
can help refine and evolve algorithms to
improve accuracy and detection of new and
specialised threat artifacts. Unsupervised
training can be used to identify previously
unknown attacks and insider threats. Plus,
you don’t have to pay for, or wait for, vendor
staff to enhance detection.
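As a rough illustration of the unsupervised approach,
the sketch below flags hosts whose behaviour deviates
from a baseline learned from the data itself, with no
labelled attacks. It uses a modified z-score based on
the median and median absolute deviation. The feature
(number of distinct internal peers contacted), the
host names and the function name are assumptions made
for this example, not any vendor's method.

```python
from statistics import median


def flag_outliers(peer_counts, threshold=3.5):
    """Return hosts whose distinct-peer count sits far above the group norm.

    peer_counts maps a host name to the number of distinct internal peers
    it contacted in some time window (a hypothetical feature).
    """
    values = list(peer_counts.values())
    med = median(values)
    # Median absolute deviation: a spread measure that one extreme host
    # cannot distort the way it would distort a standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No measurable spread, so this simple sketch declines to score.
        return []
    return sorted(
        host
        for host, value in peer_counts.items()
        if 0.6745 * (value - med) / mad > threshold
    )


# Hypothetical sample: one host is touching far more peers than the rest,
# as a reconnaissance scan would.
sample = {
    "hr-laptop": 4,
    "build-server": 9,
    "db-01": 6,
    "kiosk": 3,
    "dev-laptop": 5,
    "printer": 2,
    "unknown-host": 240,
}
print(flag_outliers(sample))  # ['unknown-host']
```

A supervised system would instead learn from examples
already labelled as malicious or benign; the two
approaches are complementary rather than exclusive.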
The devil is in the details. Some vendors
perform AI on a dedicated system at the
customer site, so learning comes only from
local detections, plus occasional updates
from the vendor. Other vendors collect data
from multiple customers and train AI engines
across customer data sets. This shared
education advances detection more quickly
through crowdsourcing of data. When you
overlay these data training strategies onto
the variations in data sources discussed in
the previous section, AI analytics quality
can drop quickly. Plus, the data shared with
the cloud may be minimised or anonymised
to protect privacy. These three variables
(type of training, data sources and data
anonymisation) all affect the potential for
accurate, timely AI.
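One way to picture minimising and anonymising data
before it is shared is sketched below: identifiers are
replaced with keyed hashes and fields that are not
needed are dropped. This is an assumption about what
such a step could look like, not a description of any
vendor's pipeline. The record fields and function names
are hypothetical, and keyed hashing is strictly
pseudonymisation, which is weaker than full
anonymisation.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret: bytes) -> str:
    """Replace an identifier with a keyed hash that is stable per secret."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def minimise_record(record: dict, secret: bytes) -> dict:
    """Keep only the fields needed for training and mask the identifiers.

    Field names here are hypothetical. Anything not listed, such as a
    user name, is dropped before the record leaves the customer site.
    """
    return {
        "src": pseudonymise(record["src_ip"], secret),
        "dst": pseudonymise(record["dst_ip"], secret),
        "protocol": record["protocol"],
        "bytes": record["bytes"],
    }
```

Because the same address always maps to the same token
under a given secret, patterns across records survive,
but detail such as subnet relationships is lost. That
trade-off is one reason anonymisation can affect
detection quality.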
How do you reduce
false positives?
Early efforts in AI often add to the decision
clutter with alerts of dubious value. This
weakness can be offset by better detectors
(algorithms), better data sources (such as
the list of file names communicated rather
than simply a record of communications),
INTELLIGENTCIO