Over the past year, several researchers have given talks proposing the use of embedding models for measuring code similarity. At least four different embedding models are currently available; however, these methods have not yet been evaluated against one another. In this talk, I will provide an overview of the four methods and compare their computational complexity. I will also attempt to measure how well these embeddings encode interesting information by comparing the similarity of the embeddings generated by the different systems.
Rob is a researcher on the Advanced Threat Hunting team at Booz Allen Hamilton Dark Labs. He has a PhD in computer science from the University of Maryland, Baltimore County and multiple years of experience in the computer security field. His research interests include semantic modeling of computer programs, reverse engineering, and cataloging the inevitable failure of human efforts to build well-engineered systems.