Making & Breaking Machine Learning Anomaly Detectors in Real Life

Machine learning-based (ML) techniques for network intrusion detection have gained notable traction in the web security industry over the past decade. Some Intrusion Detection Systems (IDS) have successfully used these techniques to detect and deflect network intrusions before they could cause significant harm to network services. Simply put, an IDS constructs a model of what normal traffic looks like, using data retrieved from web access logs as input. An online processing system then maintains a model of expected network traffic, of malicious traffic, or of both. When incoming traffic deviates from the expected model by more than a defined threshold, the IDS flags it as malicious. The theory is that the more data the system sees, the more accurate its model becomes, giving a flexible approach to traffic analysis that seems perfect for constantly evolving and growing web traffic patterns.

However, this fairytale did not last long. It was soon found that attackers had been evading detection by ‘poisoning’ the classifier models used by these PCA-based (Principal Component Analysis) detection systems. [1] By sending large volumes of seemingly benign web traffic, adversaries slowly retrain the detection model, making it more tolerant of outliers and of actual malicious attempts. They succeeded.

In this talk, we will give a live demo of this ‘model-poisoning’ attack and analyze methods that have been proposed to reduce the susceptibility of ML-based network anomaly detection systems to manipulation by attackers. [2] Instead of diving into the ML theory behind these systems, we will focus on examples of them working in the real world, the attacks that render them impotent, and what this means for developers looking to protect themselves from network intrusion. Most importantly, we will look towards the future of ML-based network intrusion detection.
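
To make the mechanics concrete, the sketch below is a minimal toy illustration (not the talk's demo code) of a PCA-residual anomaly detector and a slow poisoning attack against it. The 2-D feature space, the 3-sigma threshold, and the 20-day chaff schedule are all illustrative assumptions; real systems operate on far richer traffic features.

# Toy sketch of a PCA-residual anomaly detector and gradual model poisoning.
# All dimensions, thresholds, and schedules here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def fit_detector(traffic):
    """Fit a 1-component PCA model of 'normal' traffic and pick a threshold."""
    mean = traffic.mean(axis=0)
    centered = traffic - mean
    # Top principal direction of the observed traffic.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc = vt[0]
    # Residual = part of each sample not explained by the principal direction.
    residuals = np.linalg.norm(centered - np.outer(centered @ pc, pc), axis=1)
    threshold = residuals.mean() + 3 * residuals.std()
    return mean, pc, threshold

def is_anomalous(x, mean, pc, threshold):
    centered = x - mean
    residual = np.linalg.norm(centered - (centered @ pc) * pc)
    return residual > threshold

# "Normal" traffic: 2-D feature vectors that mostly vary along axis 0.
normal = rng.normal(0, 1, size=(500, 2)) * np.array([5.0, 0.5])
attack = np.array([0.0, 8.0])          # large deviation along the quiet axis

mean, pc, thr = fit_detector(normal)
print("attack flagged before poisoning:", is_anomalous(attack, mean, pc, thr))

# Poisoning: over many retraining windows, the attacker sends bursts of
# benign-looking traffic shifted a bit further toward the attack direction
# each time, so the retrained model gradually tolerates that direction.
poisoned = normal.copy()
for day in range(1, 21):
    chaff = rng.normal(0, 1, size=(50, 2)) * np.array([5.0, 0.5])
    chaff[:, 1] += 0.4 * day            # shift grows slowly across windows
    poisoned = np.vstack([poisoned, chaff])
    mean, pc, thr = fit_detector(poisoned)

print("attack flagged after poisoning: ", is_anomalous(attack, mean, pc, thr))

Under these assumptions, the attack vector should be flagged against the clean model but accepted after the twenty rounds of gradual retraining, which is the essence of the poisoning strategy described above.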

Presented by

Links