Detecting Malicious Websites Using Machine Learning

We have developed a set of techniques for uncovering malicious websites that operate under the veil of TLS. A growing number of websites that disseminate malware are now served over HTTPS using valid SSL certificates, not only self-signed ones. Because the traffic is encrypted, IPS/IDS and other network security tools find it increasingly difficult to inspect the payload and thus prevent the propagation of malware. Is all hope lost?

No. We present a set of newly tuned algorithms that use Machine Learning (ML) to distinguish between malicious and non-malicious websites with a high degree of accuracy. We train our algorithms on data extracted with the Bro IDS/IPS tool, using a novel idea that significantly simplifies the training phase. Bro is a simple and very effective tool for analyzing network traffic and extracting data from it.
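As a rough illustration of the extraction step, the sketch below parses a Bro-style tab-separated log and derives a few per-connection attributes. It assumes the usual Bro log layout (a `#fields` header line, tab-separated rows, `-` for unset values). The field names, the derived attributes, and the sample rows are hypothetical choices made for this example; they are not the feature set used in the talk.

```python
def parse_bro_log(text):
    """Parse a Bro-style tab-separated log into a list of dicts."""
    fields = None
    rows = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            # The header names the columns of every data row that follows.
            fields = line.split("\t")[1:]
        elif line.startswith("#") or not line.strip():
            # Skip other metadata lines (#separator, #types, #close) and blanks.
            continue
        elif fields is not None:
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def extract_features(record):
    """Derive illustrative attributes from one record (hypothetical feature set)."""
    subject = record.get("subject", "-")
    issuer = record.get("issuer", "-")
    return {
        # A certificate whose subject equals its issuer is treated as self-signed.
        "self_signed": subject != "-" and subject == issuer,
        # Bro writes "-" when a field such as the server name is unset.
        "has_sni": record.get("server_name", "-") != "-",
        "validation_ok": record.get("validation_status", "-") == "ok",
    }


# Made-up sample rows, for illustration only.
SAMPLE_LOG = "\n".join([
    "#fields\tserver_name\tsubject\tissuer\tvalidation_status",
    "example.com\tCN=example.com\tCN=Example CA\tok",
    "-\tCN=localhost\tCN=localhost\tself signed certificate",
    "#close\t2017-01-01-00-00-00",
])

records = parse_bro_log(SAMPLE_LOG)
features = [extract_features(r) for r in records]
```

Each resulting dictionary is one training sample; in practice the attribute list would be much longer and would depend on which Bro logs are collected.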

The extracted data is loaded into multiple ML frameworks, such as Splunk and AWS ML, where we run a series of Machine Learning algorithms to identify the attributes that correlate with malicious sites. The algorithms we used also allow us to categorize the certificates used in the delivery and control of malware. Our analysis reveals a number of emerging patterns that even allow hijacked devices and self-signed certificates to be identified. We present the results of our analysis, which show which attributes are most relevant for detecting malicious SSL certificates, as well as the performance of the ML algorithms. Finally, we show how well the trained algorithms detect new malicious sources.
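One simple way to ask which attributes are most relevant is to score each one by information gain against the malicious/benign label. The sketch below does this in plain Python. Information gain is a stand-in chosen for illustration, since the abstract does not name the algorithms actually run in Splunk or AWS ML, and the samples and labels are made-up toy data, not results from the analysis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, labels, attribute):
    """Reduction in label entropy obtained by splitting on one attribute."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(samples, labels):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    scores = {a: information_gain(samples, labels, a) for a in samples[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data for illustration only; attribute names are hypothetical.
SAMPLES = [
    {"self_signed": True, "has_sni": False},
    {"self_signed": True, "has_sni": True},
    {"self_signed": False, "has_sni": True},
    {"self_signed": False, "has_sni": False},
]
LABELS = ["malicious", "malicious", "benign", "benign"]

ranking = rank_attributes(SAMPLES, LABELS)
```

In this toy set `self_signed` separates the two classes perfectly and ranks first, while `has_sni` carries no information about the label. Real traffic would not split this cleanly.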

This presentation showcases a real-life use of Machine Learning for the detection of malicious TLS websites. Machine Learning is gaining popularity for analyzing cyber data, and these algorithms are broadly applicable to many aspects of cybersecurity. Our aim is to galvanize the community to develop more interesting ways of applying big data analytics in cybersecurity.

Presented by