Recent research has shown that many machine learning algorithms can be fooled by adversarial examples. These cleverly crafted inputs are designed to toe the line of a classifier's decision boundary, and are typically constructed by slightly perturbing correctly classified samples until the classifier misclassifies them, even though the samples remain largely unchanged. Researchers have published ways to construct these examples with full, partial, or no knowledge of the target classifier, and have shown their applicability to a variety of domains, including security.
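As a rough illustration of the "perturb until the label flips" idea (not the method used in the talk), here is a minimal sketch against a toy linear classifier; the weights, step size, and dimensionality are all made up for the example:

```python
import numpy as np

# Toy linear classifier: predicts class 1 when w.x + b > 0 (illustrative only).
rng = np.random.default_rng(0)
w = rng.normal(size=20)
b = 0.1

def predict(x):
    return int(np.dot(w, x) + b > 0)

# Start from a sample the classifier assigns to class 1.
x = rng.normal(size=20)
while predict(x) != 1:
    x = rng.normal(size=20)

# FGSM-style perturbation: for a linear model the gradient w.r.t. x is just w,
# so step against sign(w) in small increments until the label flips.
epsilon = 0.05
x_adv = x.copy()
for _ in range(200):
    if predict(x_adv) != 1:
        break
    x_adv -= epsilon * np.sign(w)

print("original label:", predict(x), "adversarial label:", predict(x_adv))
print("L-inf perturbation size:", np.abs(x_adv - x).max())
```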
In this talk, we’ll discuss several experiments where we attempted to turn Meterpreter – a well-known and well-signatured RAT – into an adversarial example. To do this, we leveraged the open-source gym-malware package, which treats the classifier as a black box and uses reinforcement learning to train an agent to apply perturbations that produce evasive malware. Deviating from existing work, our approach trained the agent on differently compiled versions of Meterpreter rather than on a large corpus of unrelated malware samples. The results of our experiments were underwhelming, showing little difference between our trained agent and random perturbations. However, further analysis of the results highlights interesting trends and areas for future research.
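For context, the black-box loop that gym-malware exposes looks roughly like the sketch below. The environment id ("malware-v0") and the exact reset/step semantics are assumptions based on the package's documentation, and the random policy here is only the baseline mentioned above; a trained agent would replace it.

```python
import gym
import gym_malware  # registers the malware-manipulation environments (assumed id: "malware-v0")

# Each action applies a functionality-preserving perturbation (e.g. adding a
# section, appending bytes) to the current PE file; the reward comes from the
# black-box classifier's verdict on the modified binary.
env = gym.make("malware-v0")

for ep in range(10):
    observation = env.reset()  # feature vector of a sample from the training corpus
    done = False
    total_reward = 0.0
    while not done:
        # Random baseline: swap in the trained agent's policy for the real experiment.
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print(f"episode {ep}: reward={total_reward:.2f}")
```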