Stretching the Sandbox with Malware Feature Vectors

DerbyCon V - Unity

Presented by: Mike Schladt
Date: Sunday, September 27, 2015
Time: 12:30 - 13:20
Location: Track 1
Track: Break Me

Love ’em or hate ’em, the malware sandbox has become a staple for incident responders and researchers alike. In true hacker fashion, many of us have stockpiled thousands of HTML and JSON reports, squirreled away for the rainy day when something sparks a memory of that one incident, with that one sample, where that one thing occurred. Tragically, for most malware reports that day never comes.

This presentation discusses one technique for giving new life to dynamically generated malware observables: putting sandbox reports to work with feature vector clustering. Feature vectors have long been used in the mathematical community to support machine learning and pattern recognition. Applying the same concepts to dynamically generated observables makes it possible to visualize how closely malware samples are related to one another. The key to this process lies in constructing statistically significant feature sets, so the presentation details the methodology for turning predominantly text-based reports into a series of meaningful quantitative data points. Finally, it explores a process for evaluating the effectiveness of individual features by applying it to real-world data.

To deliver tangible operational benefits from this exercise, an open source reporting module for Cuckoo Sandbox will be released that generates the presented feature vectors, along with code for visualizing sample proximity.

Mike Schladt

