De-anonymizing Programmers from Source Code and Binaries

DEF CON 26

Presented by: Dr. Aylin Caliskan, Rachel Greenstadt
Date: Friday August 10, 2018
Time: 10:00 - 10:45
Location: Track 2

Many hackers like to contribute code, binaries, and exploits under pseudonyms, but how anonymous are these contributions really? In this talk, we will discuss our work on programmer de-anonymization from the standpoint of machine learning. We will show how abstract syntax trees contain stylistic fingerprints and how these can be used to potentially identify programmers from code and binaries. We perform programmer de-anonymization using both obfuscated binaries, and real-world code found in single-author GitHub repositories and the leaked Nulled.IO hacker forum.

Rachel Greenstadt

Dr. Rachel Greenstadt (PI) is an Associate Professor of Computer Science at Drexel University where she teaches graduate-level courses in computer security, privacy, and machine learning. She founded the Privacy, Security, and Automation Laboratory at Drexel University in 2008. Dr. Greenstadt was among the first to explore the effect of adversarial attacks on stylometric methods, and the first to demonstrate empirically how stylometric methods can fail in adversarial settings while succeeding in non-adversarial settings. She has a history of speaking at hacker conferences including DEF CON 14, ShmooCon 2009, 31C3, and 32C3. Dr. Greenstadt's scholarship has been recognized by the privacy research community. She is an alum of the DARPA Computer Science Study Group and a recipient of the NSF CAREER Award. Her work has received the PET Award for Outstanding Research in Privacy Enhancing Technologies and the Andreas Pfitzmann Best Student Paper Award. She currently serves as co-editor-in-chief of the journal Proceedings on Privacy Enhancing Technologies (PoPETs). Her research has been featured in the New York Times, the New Republic, Der Spiegel, and other local and international media outlets. @ragreens

Dr. Aylin Caliskan

Aylin Caliskan is an assistant professor of computer science at George Washington University. Her research interests include the emerging science of bias in machine learning, fairness in artificial intelligence, data privacy, and security. Her work aims to characterize and quantify aspects of natural and artificial intelligence using a multitude of machine learning and language processing techniques. In her recent publication in Science, she demonstrated how semantics derived from language corpora contain human-like biases. In addition, she developed novel privacy attacks to de-anonymize programmers using code stylometry. Her presentations on both de-anonymization and bias in machine learning are the recipients of best talk awards. Her work on semi-automated anonymization of writing style furthermore received the Privacy Enhancing Technologies Symposium Best Paper Award. Her research has received extensive press coverage across the globe. Aylin holds a PhD in Computer Science from Drexel University and a Master of Science in Robotics from the University of Pennsylvania. She has previously spoken at 29C3, 31C3, 32C3, and 33C3. @aylin_cim


KhanFu - Mobile schedules for INFOSEC conferences.
Mobile interface | Alternate Formats