Time Signature Based Matching for Data Fusion and Coordination Detection in Cyber Relevant Logs

Detecting automated behavior in cyber-relevant log data is useful to the network defender, because malicious activity executed by scripts or bots is likely to leave identifiable traces in logs. This paper presents a methodology for detecting certain types of automated activity within logs by matching observed temporal patterns. The methodology is scalable: it uses a locality-sensitive hashing algorithm to overcome the infeasibility of brute-force methods for identifying groups of nearest neighbors in large datasets. Applied to cyber-relevant log data, this coordination detection methodology can be used to develop features for further analysis, such as anomaly detection to flag potentially malicious activity or unsupervised clustering to characterize classes of automated behavior. Alternatively, the methodology could be used to fuse disparate data sources by generating a ‘temporal signature’ key and allowing fuzzy matching on this key. Examples of each type of application are presented using a dataset of billions of records of netflow data.
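
As a rough illustration of the general idea, the sketch below groups entities whose activity falls into similar time bins without comparing every pair. The abstract does not specify how the temporal signature is built or which hash family is used, so everything here is an assumption made for illustration: the signature is taken to be the set of active time bins, the hashing scheme is MinHash with banding, and the names `time_signature`, `minhash`, and `candidate_groups` are hypothetical.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1; bin indices must stay below it


def time_signature(timestamps, bin_seconds=60):
    """Assumed signature: the set of time-bin indices in which an entity was active."""
    return {int(t // bin_seconds) for t in timestamps}


def make_hashes(count, seed=0):
    """Random (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(count)]


def minhash(bins, hashes):
    """MinHash sketch: for each hash function, keep the minimum value over the set."""
    return tuple(min((a * x + b) % PRIME for x in bins) for a, b in hashes)


def candidate_groups(events, bin_seconds=60, bands=8, rows=4, seed=0):
    """Bucket entities by bands of their MinHash sketch.

    `events` maps an entity id (for example a source address) to its timestamps.
    Entities sharing at least one band land in the same bucket and are returned
    as a candidate group; similar signatures are likely, not guaranteed, to collide.
    """
    hashes = make_hashes(bands * rows, seed)
    buckets = defaultdict(set)
    for entity, timestamps in events.items():
        bins = time_signature(timestamps, bin_seconds)
        if not bins:
            continue  # no activity, nothing to hash
        sketch = minhash(bins, hashes)
        for i in range(bands):
            band = sketch[i * rows:(i + 1) * rows]
            buckets[(i, band)].add(entity)
    return {frozenset(group) for group in buckets.values() if len(group) > 1}


if __name__ == "__main__":
    # Toy data: "a" and "b" are active in the same minutes, "c" is not.
    toy = {
        "a": [5, 65, 125, 185],
        "b": [10, 70, 130, 190],
        "c": [1000, 5000, 9000],
    }
    print(candidate_groups(toy))
```

In this sketch the same bucketing could serve as a fuzzy join key across two log sources, since entities from either source that share a band would fall into the same bucket. The band and row counts trade recall against false candidates and would need tuning on real data.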

Presented by