Machine learning (ML) techniques for network intrusion detection have gained notable traction in the web security industry over the past decade. Some intrusion detection systems (IDS) have successfully used these techniques to detect and deflect network intrusions before they could cause significant harm to network services. Simply put, an IDS builds a baseline model of what normal traffic looks like, using data retrieved from web access logs as input. An online processing system then maintains a model of what expected network traffic looks like, what malicious traffic looks like, or both. When traffic deviates from the expected model by more than a defined threshold, the IDS flags it as malicious. The theory was that the more data the system sees, the more accurate the model becomes. This provides a flexible system for traffic analysis, seemingly perfect for constantly evolving and growing web traffic patterns. However, this fairytale did not last long. It was soon found that attackers had been avoiding detection by ‘poisoning’ the classifier models used by PCA-based (principal component analysis) detection systems. [1] Adversaries slowly retrain the detection model by sending large volumes of seemingly benign web traffic, making the classification model more tolerant of outliers and of actual malicious attempts. They succeeded. In this talk, we will give a live demo of this ‘model-poisoning’ attack and analyze methods that have been proposed to make ML-based network anomaly detection systems less susceptible to manipulation by attackers. [2] Instead of diving into the ML theory, we will emphasize examples of these systems working in the real world, the attacks that render them impotent, and how this affects developers looking to protect themselves from network intrusion. Most importantly, we will look towards the future of ML-based network intrusion detection.
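To make the poisoning idea concrete, here is a minimal sketch in Python. It is not the PCA-based detector from the cited work, and not the demo from the talk. It assumes a deliberately simple stand-in: a detector that tracks a running mean of one feature (requests per minute) and flags anything above a fixed multiple of that mean. The class `OnlineDetector`, the function `poison`, and every parameter value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class OnlineDetector:
    """Toy online anomaly detector over one feature (requests per minute).

    Hypothetical stand-in for a real IDS model: it keeps an exponentially
    weighted running mean and flags values above `tolerance * mean`.
    """
    mean: float
    tolerance: float = 1.5  # assumed threshold multiplier
    alpha: float = 0.1      # assumed learning rate of the running mean

    def is_anomalous(self, value: float) -> bool:
        return value > self.tolerance * self.mean

    def observe(self, value: float) -> bool:
        """Classify a value, then learn from it only if it looked benign."""
        flagged = self.is_anomalous(value)
        if not flagged:
            self.mean = (1 - self.alpha) * self.mean + self.alpha * value
        return flagged


def poison(detector: OnlineDetector, target: float, max_steps: int = 10_000) -> int:
    """Feed traffic just under the threshold until `target` is accepted.

    Returns the number of poisoning observations that were sent.
    """
    steps = 0
    while detector.is_anomalous(target) and steps < max_steps:
        # Each probe sits just below the current threshold, so it is
        # accepted as benign and drags the running mean upwards.
        probe = 0.99 * detector.tolerance * detector.mean
        detector.observe(probe)
        steps += 1
    return steps


detector = OnlineDetector(mean=100.0)
attack_rate = 1000.0  # ten times the initial baseline

flagged_before = detector.is_anomalous(attack_rate)
steps = poison(detector, attack_rate)
flagged_after = detector.is_anomalous(attack_rate)

print(f"flagged before poisoning: {flagged_before}")
print(f"flagged after {steps} poisoning steps: {flagged_after}")
```

In this sketch the attack rate is flagged at first and accepted once the baseline has drifted far enough. No single poisoning observation is ever flagged, which is what makes the drift hard to notice. Real detectors model many features at once, but the same slow-drift pressure applies to any model that keeps learning from traffic it has classified as benign.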
Clarence recently graduated with a B.S. and M.S. in Computer Science from Stanford University, specializing in data mining and artificial intelligence. He currently works at Shape Security, a startup in Silicon Valley building a product that protects its customers from malicious bot intrusion. At Shape, he works on the system that tackles this problem from the angle of big data analysis. Clarence is a community speaker with Intel, traveling around the USA speaking about topics related to the Internet of Things and hardware hacking. He is also the organizer of the “Data Mining for Cyber Security” meetup group in the SF Bay Area. Clarence is based in Mountain View, California.