Derek Lin, Chief Data Scientist at Exabeam, talks to us about AI, ML and Deep Learning and how to avoid getting lost in the hype.
When it comes to Artificial Intelligence (AI) and Machine Learning (ML), there’s no shortage of buzz and hype. Often referred to interchangeably, Artificial Intelligence and Machine Learning are part of our daily reality and technology lexicon – whether it’s in a product marketing pitch or a Netflix recommendation for what to watch next.
In cybersecurity, as these and other emerging technologies like Deep Learning (DL) evolve, their capabilities have become a driving force shaping modern cybersecurity solutions. At the same time, security practitioners, fatigued by the barrage of AI and ML messaging, are raising suspicions about vendor claims.
At the InteropITX conference in 2018, panellists echoed the same sentiment about the hype, asking what can be legitimately claimed as Artificial Intelligence. The audience was encouraged to look beyond the marketing spin and find out what’s really being offered.
I’m glad to see the hype cycle has reached its peak. It’s a healthy sign that security practitioners are asking the right questions and demanding to know what constitutes reality.
In order to ask the right questions, let’s start with a correct understanding of the terminology. Despite all the marketing messaging, for many of us it isn’t always clear what these terms actually mean.
Artificial Intelligence
AI is often misunderstood and not everyone agrees on its meaning. The term Artificial Intelligence first appeared in the 1950s to describe systems comprising a set of human-defined, if/then decision rules – which have always been easily broken and hard to maintain.
For example, the static correlation rules that raise alerts in traditional security information and event management (SIEM) cannot learn and adapt, which results in a high number of false positives. Such AI systems appear intelligent because they make decisions, but in reality those decisions are 100% predetermined, based on static rules drafted by humans.
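To make the point concrete, here is a caricature of such a rule; the threshold, window and logic are invented for illustration and don’t reflect any particular vendor’s rule language.

```python
# A hypothetical static SIEM-style correlation rule, shown only to illustrate
# the "if/then" nature of such systems. The threshold and window are
# arbitrary assumptions, not any real product's rule.
def failed_login_rule(failed_logins_in_window: int, threshold: int = 5) -> bool:
    """Raise an alert if failed logins in a fixed 10-minute window exceed a fixed threshold."""
    return failed_logins_in_window > threshold

# The rule never adapts: a noisy service account trips it every day
# (false positives), while a patient attacker who stays under the
# threshold never does.
print(failed_login_rule(7))   # True  -> alert
print(failed_login_rule(4))   # False -> no alert
```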
But the word ‘intelligence’ has stuck with the public since AI’s introduction. Why not? It sounds cool. Yet today AI is often little more than a catchy marketing label, liberally applied to any system that performs tasks having some semblance of automated decision-making.
Machine Learning
Machine Learning is often mentioned in the same breath as AI, but it is more specific: it uses algorithms that learn from collected data to make predictions, classify items and generate insights.
Machine Learning is a formal body of methods grounded in solid mathematical foundations. To apply it to cybersecurity, the right problems must be matched with the right Machine Learning tools.
But not all problems require advanced Machine Learning tools. For example, some popular indicators used in user behaviour analytics (UBA) are based on simple statistical analysis, such as p-value hypothesis testing used for rare event detection.
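As a minimal sketch of that idea, the snippet below flags an activity count that is improbably high under a user’s historical baseline; the Poisson baseline and the 0.01 significance level are assumptions for illustration, not a description of any specific UBA product.

```python
# Minimal sketch of p-value-based rare-event detection.
from scipy.stats import poisson

def is_rare_event(observed_count: int, baseline_rate: float, alpha: float = 0.01) -> bool:
    """Flag an activity count that is improbably high under the user's baseline.

    observed_count : today's count of the activity (e.g. file downloads)
    baseline_rate  : the user's historical mean count per day (assumed Poisson)
    alpha          : significance level; smaller = fewer, higher-confidence alerts
    """
    # One-sided test: P(X >= observed_count) under Poisson(baseline_rate)
    p_value = poisson.sf(observed_count - 1, baseline_rate)
    return p_value < alpha

# Example: a user who averages 3 downloads a day suddenly makes 20
print(is_rare_event(observed_count=20, baseline_rate=3.0))  # True -> worth a look
```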
On the other hand, many cybersecurity problems cannot be solved without Machine Learning. Consider phishing scam domain detection. Here, a domain’s URL, WHOIS data and other properties, together with the known (legitimate or malicious) labels of past URLs, are examined in a supervised learning setting to predict whether the domain is malicious, without resorting to conventional, but less effective, blacklist-based matching.
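A minimal sketch of that supervised setting might look like the following; the handful of features, the toy data and the choice of a random forest classifier are illustrative assumptions rather than a description of any real detector.

```python
# Toy supervised classifier for phishing-domain detection, assuming numeric
# features have already been extracted from each URL and its WHOIS record.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [domain_length, digit_count, subdomain_depth, domain_age_days]
X = np.array([
    [12,  0, 1, 3650],   # short, established domain
    [34,  6, 3,    5],   # long, digit-heavy, newly registered
    [15,  1, 1, 2400],
    [41, 10, 4,    2],
])
y = np.array([0, 1, 0, 1])  # known labels: 0 = legitimate, 1 = malicious

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Score an unseen domain (long, digit-heavy, registered yesterday)
print(model.predict([[38, 8, 3, 1]]))  # likely [1], i.e. malicious
```

Unlike a blacklist, the model generalises to domains it has never seen, because it learns which combinations of properties tend to indicate malice.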
Deep Learning
This is all the rage today. As with AI, Deep Learning evokes an air of sophistication, but it’s also subject to misunderstandings. As a tool within Machine Learning, Deep Learning is highly dependent on matching the right problems to the right tools.
Deep Learning applications are best suited to image processing and natural language processing. In cybersecurity, it has found a home in packet stream and malware binary analysis. These areas benefit most from supervised learning, where labelled (i.e. legitimate vs. malicious) data is available.
But for insider threat detection, Deep Learning doesn’t enjoy wide adoption, for several technical reasons. One is the black box nature of the model: it’s impossible to explain the causes of the alerts, which renders investigations difficult.
Peer behind the messaging and examine what’s under the hood
The cybersecurity marketplace is buzzing with AI and ML terminology. This isn’t surprising as data-driven approaches do lead to exciting applications that were never possible before. That said, it’s all too easy to get confused and thus, lost in the hype.
It’s important to question how the problems or use cases are being framed, which analytical approaches are being used, and why. Transparency and a thorough understanding of the terms and their use cases will help you demystify the hype.