Wed 08/07 10:00 Security data science -- Getting the fundamentals right

BSidesLV 2019

Presented by: Richard Harang
Date: Wednesday August 07, 2019
Time: 10:00 - 10:55
Location: Ground Truth

A data science team is now table stakes for most security operations. However, data science for security poses unique challenges that differ from those of both traditional data science and traditional security. Rather than clean data sets with reliable ground truth labels, obvious metrics, and clear featurization strategies, security data sets tend to be messy, ambiguous, and noisy, with metrics that can be difficult to operationalize, and they require significant expert knowledge to build good features.

In this self-contained and broadly accessible talk, drawing on real-world experience leading basic research at a global anti-malware/security company, we’ll cover everything but the modeling bit of security data science and give attendees a roadmap for maximizing their effectiveness when starting their own security data science teams and/or projects. We’ll walk through how to collect, clean, and label security-relevant data; how to approach feature construction and extraction; how to organize and manage reproducible experiments; and finally how to manage evaluation, both for head-to-head comparison of candidate models and for mapping model metrics to business outcomes. Along the way, we’ll cover the major pitfalls of doing security data science with an experienced team, as well as the areas that ‘traditional’ data scientists often have trouble with.

Richard Harang

Richard Harang is a Director of Data Science Research at Sophos with over eight years of research experience at the intersection of computer security, machine learning, and privacy. Prior to joining Sophos, he served as a scientist at the U.S. Army Research Laboratory, where he led the research group investigating the applications of machine learning and statistical analysis to problems in network security. He received his PhD in Statistics from the University of California, Santa Barbara. His research interests include randomized methods in machine learning, adversarial machine learning, and ways to use machine learning to support human analysis. By day he uses bad guys to catch math. By night he teaches killer robots to protect his garden from squirrels.