Everybody knows that "Machine Learning" is the winning square in buzzword bingo, but how can we apply it to a real problem? More importantly, what output can we derive from the analysis? This talk will cover clustering file types from unlabeled data, based on various static features. Then, using information from the clusters, is it possible to automatically generate YARA signatures to go hunting for similar files? We believe so, and we'll show you how you can do this at home.