CROWDSOURCE: AN OPEN SOURCE, CROWD TRAINED MACHINE LEARNING MODEL FOR MALWARE CAPABILITY DETECTION

Black Hat USA 2013

Presented by: Joshua Saxe
Date: Wednesday July 31, 2013
Time: 10:45 - 11:15
Location: Palace 2

Because the number of unique malware binaries on the Internet is exploding and manual analysis of those binaries is slow, security practitioners today have only limited visibility into the functionality implemented by the global population of malware. To date, little work has focused explicitly on quickly and automatically detecting the broad range of high-level malware functionality, such as the ability of malware to take screenshots, communicate via IRC, or surreptitiously operate users’ webcams.

To address this gap, we debut CrowdSource, an open source, machine learning-based reverse engineering tool. CrowdSource approaches the problem of malware capability identification in a novel way: by training a malware capability detection engine on millions of technical documents from the web. Our intuition is that malware reverse engineers already rely heavily on the web “crowd” (performing web searches to discover the purpose of obscure function calls and byte strings, for example), so automated, machine learning-based approaches should also take advantage of this rich and as yet untapped data source.
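The intuition above can be illustrated with a toy sketch. This is not the CrowdSource algorithm, which the abstract does not specify; it is a minimal example assuming a small hand-labeled corpus (the hypothetical DOCS list) standing in for web documents, and a simple smoothed log-likelihood-ratio score that links strings found in a binary to capability labels. All names here (DOCS, tokenize, train, rank_capabilities) are invented for illustration.

```python
from collections import Counter, defaultdict
import math
import re

# Hypothetical stand-in for technical documents from the web, each tagged
# with the capability it discusses. A real corpus would be far larger.
DOCS = [
    ("screenshot", "Use BitBlt and GetDC to capture the screen into a bitmap"),
    ("screenshot", "CreateCompatibleBitmap with BitBlt copies the desktop image"),
    ("irc", "Send PRIVMSG and JOIN commands to the IRC server after NICK"),
    ("irc", "The bot replies to PING with PONG and sends PRIVMSG to the channel"),
    ("webcam", "capCreateCaptureWindowA opens the webcam for video capture"),
]

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{2,}")


def tokenize(text):
    """Split text into lowercase identifier-like tokens of 3+ characters."""
    return [t.lower() for t in TOKEN.findall(text)]


def train(docs):
    """Count, per capability label, how many documents mention each token."""
    counts = defaultdict(Counter)
    for label, text in docs:
        counts[label].update(set(tokenize(text)))
    doc_totals = Counter(label for label, _ in docs)
    return counts, doc_totals


def rank_capabilities(strings, counts, doc_totals):
    """Rank labels by a smoothed log-likelihood ratio over known tokens."""
    vocab = set().union(*counts.values())
    # Ignore tokens the corpus has never seen; they carry no evidence here.
    tokens = {t for s in strings for t in tokenize(s)} & vocab
    total_docs = sum(doc_totals.values())
    scores = {}
    for label, n_in in doc_totals.items():
        n_out = total_docs - n_in
        score = 0.0
        for tok in tokens:
            in_count = counts[label][tok]
            out_count = sum(c[tok] for l, c in counts.items() if l != label)
            # Add-one smoothing keeps both probabilities strictly positive.
            p_in = (in_count + 1) / (n_in + 2)
            p_out = (out_count + 1) / (n_out + 2)
            score += math.log(p_in / p_out)
        scores[label] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


counts, doc_totals = train(DOCS)
# Strings as they might be extracted from a binary's import table (assumed).
ranking = rank_capabilities(["BitBlt", "GetDC", "kernel32.dll"], counts, doc_totals)
print(ranking[0][0])
```

In this sketch, symbols that co-occur with a capability label in the corpus raise that label's score, while symbols seen mostly under other labels lower it. Strings the corpus has never seen are ignored.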

As a novel malware capability detection approach, CrowdSource does the following:

CrowdSource is funded under the DARPA Cyber Fast Track initiative, is being developed by the machine learning and malware analysis group at Invincea Labs, and is scheduled for a beta, open source release to the security community this October. In this presentation we will give complete details on the CrowdSource algorithm as it stands, including results demonstrating that CrowdSource can already rapidly reverse engineer a variety of currently active malware variants.

Joshua Saxe

Josh Saxe is a lead research engineer at Invincea Labs, where he serves as technical lead on the DARPA Cyber Genome program, seeking to produce automated systems that discover, analyze and visualize evolutionary relationships between malicious software artifacts. Josh also serves as technical lead on a DARPA Cyber Fast Track effort dubbed "CrowdSource," on which he leads the development of algorithms for rapidly and automatically characterizing novel malware binaries' functionality using crowdsourced, machine learning-based methods.

