With the astonishing rate of new and modified malware samples being released daily, automation of analysis is needed to classify and cluster together similar samples, exclude basic and uninteresting variations, and focus costly manual analysis work on novel and interesting features (e.g., added or remove pieces of code with a given semantic). We will discuss the challenges in analyzing large malware datasets in a (semi)automatic fashion, and look at some recent research results that may help with the task, by leveraging the concept of “behavior” applied to malicious code.