Java and Flash are, and will continue to be, popular attack vectors. To combat this, we’ll put these two file formats under the microscope and throw some data science at them. For each file format, we will take a quick look at its layout and then explore some of its features. Then, using sets of malicious and clean files, we will walk through the process we used to identify important features and show the results from several different machine learning algorithms built on these feature sets. We’ll use several open source tools and libraries for the data exploration and analysis, including pandas, scikit-learn, and the data hacking library we’ve already released. IPython notebooks containing the analysis will be released at the start of the talk.
David has been in the security field for over 10 years. He enjoys static file analysis and tearing apart shellcode. He's starting to add data analysis techniques to his toolbox, where before he relied only on hex editors, debuggers, and disassemblers. He dislikes wearing pants and has a strong antisock agenda.