Historically, machine learning for information security has prioritized defense: think intrusion detection systems, malware classification and botnet traffic identification. Offense can benefit from data just as well. Social networks, especially Twitter with its access to extensive personal data, bot-friendly API, colloquial syntax and prevalence of shortened links, are the perfect venues for spreading machine-generated malicious content. We present a recurrent neural network that learns to tweet phishing posts targeting specific users. The model is trained using spear phishing pen-testing data, and to make a click-through more likely, it is dynamically seeded with topics extracted from timeline posts of both the target and the users they retweet or follow. We augment the model with clustering to identify high-value targets based on their level of social engagement, such as their number of followers and retweets, and measure success using click-through rates of IP-tracked links. Taken together, these techniques enable the world's first automated end-to-end spear phishing campaign generator for Twitter.
John Seymour is a Data Scientist at ZeroFOX, Inc. by day, and a Ph.D. student at the University of Maryland, Baltimore County by night. He researches the intersection of machine learning and InfoSec in both roles. He's mostly interested in avoiding, and helping others avoid, some of the major pitfalls in machine learning, especially in dataset preparation (seriously, do people still use malware datasets from 1998?). He has spoken at both DEF CON and BSides, and aims to add Black Hat USA and SecTor to the list in the near future. Twitter: @_delta_zero
Philip Tully is a Senior Data Scientist at ZeroFOX, a social media security company based in Baltimore. He employs natural language processing and computer vision techniques to develop predictive models for combating threats emanating from social media. His pivot into the realm of InfoSec is recent, but his experience in machine learning and artificial neural networks is not. Rather than learning patterns within text and image data, his previous work focused on learning patterns of spikes in large-scale recurrently connected neural circuit models. He is an all-but-defended computer science PhD student, in the final stages of completing a joint degree at the Royal Institute of Technology (KTH) and the University of Edinburgh. Twitter: @phtully