We have already turned the collection of malicious data into an art form. However, to produce good verdicts (most importantly, a low false-positive rate), our benign (or White) data must enjoy the same level of confidence as our malicious (or Black) data. When working with Machine Learning algorithms, the correctness of the White data is usually taken for granted, but in reality obtaining it is far from simple. In this talk, we will focus on the collection of White data: where do we get it from, and how do we collect it?
The talk is based on research we performed over the past year, during which we developed a methodology for collecting and building such repositories of clean data. We will share this methodology with the audience.
With both a BSc and an MSc in Computer Science, and a career performing R&D for the IDF and for industry, Irena is a security and intelligence researcher with a disturbing affection for "Hello Kitty". When she is not watching cartoons, she runs the Threat Intelligence team at Check Point, performing innovative malware research and developing infrastructure for better detection and new research techniques.