The ability to detect automated behavior within cyber-relevant log data is a useful tool for the network defender, as malicious activity executed by scripts or bots is likely to leave identifiable traces in logs. This paper presents a methodology for detecting certain types of automated activity within logs by matching observed temporal patterns. The methodology is scalable: brute-force methods for identifying groups of nearest neighbors are infeasible on large datasets, so it implements a locality-sensitive hashing algorithm instead. Applied to cyber-relevant log data, this coordination detection methodology can be used to develop features for further analysis, such as anomaly detection to flag potentially malicious activity or unsupervised clustering to characterize classes of automated behavior. Alternatively, the methodology could be used to fuse disparate data sources by generating a ‘temporal signature’ key and allowing fuzzy matching on this key. Examples of each type of application are presented using a dataset of billions of netflow records.
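The abstract does not say which locality-sensitive hashing scheme is used, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes each entity's activity is reduced to a "temporal signature": the set of fixed-width time bins in which it was active. It then substitutes MinHash with banding, a standard LSH scheme for Jaccard similarity, to find candidate groups of entities with matching timing without comparing every pair. The function names, the 60-second bin width, and the band/row counts are all hypothetical choices for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def temporal_signature(timestamps, bin_seconds=60):
    """Hypothetical 'temporal signature': the set of time bins with activity."""
    return frozenset(int(t // bin_seconds) for t in timestamps)


def minhash(bins, num_hashes, seed=0):
    """MinHash vector: for each salted hash function, the minimum hash over the set.

    Two sets agree in any one position with probability equal to their
    Jaccard similarity.
    """
    sig = []
    for i in range(num_hashes):
        salt = f"{seed}:{i}:".encode()
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(salt + str(b).encode(), digest_size=8).digest(),
                "big",
            )
            for b in bins
        ))
    return sig


def lsh_buckets(signatures, bands, rows):
    """Banding step (assumed LSH scheme): split each MinHash vector into bands.

    Each (band index, band values) tuple acts as a hash key, so entities with
    similar timing tend to share at least one bucket.
    """
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(key)
    return buckets


def candidate_pairs(activity, bin_seconds=60, bands=16, rows=4):
    """Return pairs of entities that collide in at least one LSH bucket."""
    sigs = {
        key: minhash(temporal_signature(ts, bin_seconds), bands * rows)
        for key, ts in activity.items()
        if ts  # skip entities with no events; MinHash of an empty set is undefined
    }
    pairs = set()
    for members in lsh_buckets(sigs, bands, rows).values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Toy data (invented for illustration): hostA and hostB beacon every 300 s,
# with hostB jittered by 2 s; hostC is active in entirely different bins.
activity = {
    "hostA": list(range(0, 3600, 300)),
    "hostB": [t + 2 for t in range(0, 3600, 300)],
    "hostC": list(range(60, 3600, 600)),
}
pairs = candidate_pairs(activity)
print(pairs)
```

Only bucket-mates are compared, so the cost grows roughly with the number of entities rather than with the number of pairs. In a data-fusion setting like the one described above, the band tuples could also serve as fuzzy join keys across sources. Candidate pairs would normally be verified with an exact similarity check before use.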
Dr. Lauren Deason is a data scientist at PUNCH Cyber Analytics Group. For over two years she has worked on DARPA’s Network Defense program, developing algorithms that automatically flag suspicious activity based on various cyber-relevant logs. Before becoming a data scientist, she worked for over a decade as an International Trade Economist and a Math Instructor. She holds a PhD in Economics from the University of Maryland, College Park, an MA in Mathematics from the University of California, Berkeley, and a BS in Applied Mathematics from the University of Virginia.