Before we dive into cybersecurity and how the industry is currently using AI, let’s first define the term. Artificial intelligence (AI), as the term is used today, is the overarching concept that encompasses machine learning (both supervised, including deep learning, and unsupervised) as well as other algorithmic approaches that go beyond plain statistics. These other approaches include natural language processing (NLP), natural language understanding (NLU), reinforcement learning, and knowledge representation. These are the most relevant approaches in cybersecurity.
Given this definition, how far along are cybersecurity products in terms of deploying AI and ML?
I see more and more cybersecurity companies using ML and AI in some way. The question is to what extent. I’ve already written about the dangers of algorithms. It has become too easy for any software engineer to play data scientist. It’s as simple as downloading a library and calling the .start() function. The challenge lies in the fact that the engineer often has no idea what just happened inside the algorithm or how to use it correctly. Does the algorithm work with data that are not normally distributed? Should the data be normalized before being fed into the algorithm? How are the results to be interpreted? I gave a talk at BlackHat where I showed what happens when we don’t know what an algorithm is doing.
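To make the normalization question concrete, here is a minimal sketch in plain Python. The host names, the two features (bytes sent and failed logins), and all values are invented for illustration, and a simple nearest-neighbor lookup stands in for whatever distance-based algorithm a product might use. It only shows that the same query can match a different neighbor depending on whether the features are standardized first.

```python
import math
from statistics import mean, pstdev

# Hypothetical per-host features: (bytes_sent, failed_logins). Invented values.
hosts = {
    "web01": (1_000_000, 1),
    "web02": (1_050_000, 40),
    "db01": (1_400_000, 2),
}
query = (1_040_000, 2)  # a host with very few failed logins


def nearest(points, q):
    """Return the name of the point with the smallest Euclidean distance to q."""
    return min(points, key=lambda name: math.dist(points[name], q))


def standardize(points, q):
    """Z-score every feature column using the mean and std of the known points."""
    columns = list(zip(*points.values()))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]

    def scale(p):
        return tuple((v - m) / s for v, m, s in zip(p, mus, sigmas))

    return {name: scale(p) for name, p in points.items()}, scale(q)


# On raw values the byte counts dominate the distance, so failed logins
# are effectively ignored.
raw_match = nearest(hosts, query)

# After standardization both features contribute on a comparable scale.
scaled_hosts, scaled_query = standardize(hosts, query)
scaled_match = nearest(scaled_hosts, scaled_query)

print("nearest on raw features:     ", raw_match)
print("nearest on standardized data:", scaled_match)
```

On the raw values the query lands next to the host with 40 failed logins, simply because byte counts are several orders of magnitude larger than login counts. Whether standardization is the right preprocessing step depends on the algorithm and the data, which is exactly the kind of judgment the questions above call for.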
So the mere fact that a company is using AI or ML in their product isn’t a good indicator that the product is actually doing something smart. On the contrary, most of the companies I’ve looked at that claim to use AI for some core functionality are doing it “wrong” in some way. To be fair, there are some companies that stick to the right principles, hire real data scientists, apply algorithms correctly, and interpret the data correctly.
How AI is used in security
In general, I see the right application of AI in the supervised machine learning camp, where a lot of labeled data is available: malware detection (telling benign binaries from malicious ones), malware classification (assigning malware to a malware family), document and website classification, and document analysis and natural language understanding for phishing and BEC detection. There is some early but promising work on analyzing graphs (or social networks) for communication analysis. However, this requires a lot of data and contextual information that is not easy to get your hands on. Then there are some companies that use belief networks to model expert knowledge, such as for event logging or insider threat detection. But unfortunately, these companies are few and far between.
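As a rough illustration of why labeled data matters for the supervised use cases above, the following sketch trains a tiny multinomial naive Bayes classifier with add-one smoothing on four invented messages. Naive Bayes is chosen here only because it fits in a few lines; it is not a claim about what any vendor uses, and the messages and the labels "phish" and "benign" are made up.

```python
import math
from collections import Counter

# Toy labeled messages, invented purely for illustration.
training = [
    ("urgent wire transfer needed today", "phish"),
    ("verify your account password now", "phish"),
    ("meeting notes attached for review", "benign"),
    ("lunch on friday with the team", "benign"),
]


def train(samples):
    """Count words per label; these counts are the entire 'model'."""
    word_counts = {}          # label -> Counter of word occurrences
    label_counts = Counter()  # label -> number of training messages
    for text, label in samples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        score = math.log(label_counts[label] / total_messages)  # class prior
        for word in text.split():
            if word in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
                score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training)
print(classify("urgent password transfer", model))
print(classify("team meeting friday", model))
```

With only four training messages the classifier knows nothing beyond the exact words it has seen, which is the small-scale version of the data problem: the quality of a supervised model is bounded by the amount and variety of labeled examples behind it.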
That leads us to the next question: What are the top use cases for AI in security?
Personally, I am excited about a few areas that I think are quite promising for advancing cybersecurity efforts:
- Using NLP and NLU to understand people’s email habits and then identify malicious activity (BEC, phishing, etc.). We initially tried sentiment analysis on messaging data, but it quickly became clear that sentiment analysis is better left to tasks such as analyzing tweets for brand sentiment than applied to assessing human (or phishing) behavior. It’s a little early for that. But there has been some success with topic modeling, token classification of things like account numbers, and even analysis of language use.
- Using graph analytics to map data movement and data origins to understand when exfiltration or malicious data changes occur. This topic has not been well researched, and I am not yet aware of any company or product that does it well. It’s a difficult problem on many levels, from data collection to deduplication and interpretation. But that is also what makes this research interesting.
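The idea of mapping data movement as a graph can be sketched in a few lines. This is a toy model under stated assumptions: the node names, the list of movement events, and the set of "external" destinations are all hypothetical, and a real system would first have to solve the collection and deduplication problems mentioned above. The sketch builds a directed graph from (source, destination) events and uses breadth-first search to find the shortest chain along which data from a given source reaches an external location.

```python
from collections import defaultdict, deque

# Hypothetical data-movement events as (source, destination) pairs.
movements = [
    ("hr_database", "analyst_laptop"),
    ("analyst_laptop", "shared_drive"),
    ("shared_drive", "personal_cloud"),
    ("build_server", "artifact_store"),
]
# Destinations assumed to lie outside the organization (illustrative only).
external = {"personal_cloud"}


def build_graph(edges):
    """Build an adjacency map: node -> set of nodes that received its data."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    return graph


def exfil_path(graph, start, external_nodes):
    """Breadth-first search from start; return the shortest path that ends
    at an external node, or None if no external node is reachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in external_nodes:
            return path
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_graph(movements)
print(exfil_path(graph, "hr_database", external))
print(exfil_path(graph, "build_server", external))
```

Reachability is the easy part. Deciding whether a given path is legitimate business use or exfiltration requires the contextual information about users and data that is hard to obtain in practice.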
Given the above, it doesn’t look like we’ve made much progress on AI for security. Why is that? I would attribute it to a few things:
- Access to training data. We have to test and validate every hypothesis we make, and that is hard without data. We need complex data sets that show user interactions across applications, their data, and cloud apps, along with contextual information about the users and their data. This type of data is hard to come by, especially given privacy concerns and regulations like the GDPR, which put more scrutiny on the processes surrounding research.
- A lack of engineers who understand both data science and security. We need security professionals with a lot of experience to work on these problems. By security professionals I mean people who have a deep understanding of (and hands-on experience with) operating systems and applications, networks, and cloud infrastructures. You are unlikely to find such experts who also have data science knowledge. It helps to pair them with data scientists, but a lot is lost in the communication between the two.
- Research funding. There are few companies that do real security research. Take a bigger security company. It may be doing malware research, but how many such companies have real data science teams researching new approaches? Microsoft has some great researchers working on relevant problems. Bank of America makes an effort to fund universities to work on problems that are pressing for the bank. But this work generally does not see the light of day in off-the-shelf security products. In general, security vendors do not invest in research that is not directly related to their products. And when they do, they want to see results pretty quickly. Startups can fill the gaps here. Their challenge is to make their approaches scalable. That means not only scaling to a lot of data, but also being relevant in a variety of customer environments with dozens of different processes, applications, usage patterns, and so on. This brings us full circle to the data problem: you need data from a wide variety of environments to form hypotheses and test your approaches.
Is there anything the security buyer should be doing differently to motivate security vendors to do AI better?
I don’t think the security buyer is to blame for anything. The buyer shouldn’t need to know anything about how security products work. The products should do what they claim to do and do it well. I think that’s one of the deadly sins of the security industry: building overly complex products. As Ron Rivest said on a panel the other day, “Complexity is the enemy of security.”
Raffael Marty is a technology manager, entrepreneur and investor and writes about artificial intelligence, big data and the product landscape surrounding the cybersecurity market.
This story originally appeared on Raffy.ch. Copyright 2021