Dangerous Minds: The Art of Guerrilla Data Mining

It is no secret that in today's world, information is as valuable as, or maybe even more valuable than, any security tool that we have out there. Information is the key. That is why the motto of the US Information Awareness Office (IAO) is "scientia est potentia", which means "knowledge is power". The IAO, just like the CIA, the FBI, and others, makes information its business. Aside from these, there are multiple military related projects, like TALON, ECHELON, ADVISE, and MATRIX, that are concerned with information gathering and analysis.

The goal of the Veritas Project is to model itself on the same general threat intelligence premise as the organizations above, but primarily based on a community sharing approach and using tools, technologies, and techniques that are freely available. Often, concepts that are part of artificial intelligence, data mining, and text mining are thought to be highly complex and difficult. Make no mistake, these concepts are indeed difficult, but there are tools out there that facilitate the use of these techniques without having to learn all the concepts and math behind them. And as Sir Isaac Newton once said, "If I have seen further it is by standing on the shoulders of giants".

The combination of all the techniques presented in this site is what we call "Guerrilla Data Mining". It's supposed to be fast, easy, and accessible to anyone. The techniques place more emphasis on practicality than on theory. For example, the tools and techniques presented can be used to visualize trends (e.g. security trends over time), summarize large and diverse data sets (forums, blogs, IRC), find commonalities (e.g. profiles of computer criminals), gather a high level understanding of a topic (e.g. the US economy, military activities), and automatically categorize different topics to assist research (e.g. malware taxonomy).

Aside from the framework and techniques themselves, the Veritas Project hopes to present a number of ongoing studies that use "guerrilla data mining". Ultimately, our goal is to provide as much information as possible on how each study was done so other people can generate their own studies and share them through the project. The following studies are currently available and will be presented:<blockquote>1. Computer Security Trends - This is the banner study for the Veritas Project. The study uses various clustering, text mining, and visualization techniques to track security trends based on security news and forum data. The idea is to detect increases and changes in "chatter" on different security topics and determine which topics are related to which. For example, tracking "Credit Cards" would give you associations to Hannaford, Stolen Laptops, and Heartland. This is useful for security researchers who want to track trends and see which events are related to other events. The study is based on more than a year's worth of data consisting of thousands of data items.

  1. American Minds - This is a non-security implementation of the Veritas Project. The study aims to assist in understanding what is currently on the minds of people here in the US based on a mixture of text and data mining techniques. This is based on data gathered from different forums such as the Obama Townhall meeting. Current research of the Veritas Project on this data set uses sentiment analysis, which is meant to provide a positive or negative rating that gives an indication of the mood and reaction of the people. This is currently in progress.

  2. Malware Taxonomy - This research is meant to see how Artificial Intelligence techniques would classify malware. This produces a "non-standard" classification since it takes into consideration all aspects of the description given by AV vendors.

  3. "Computer Criminals" - This is a profiling study of people labeled as "computer criminals". The goal of the study is to find any commonalities in the actions and backgrounds of these people. This research is ongoing. Estimated time for completion is May 2009.

  4. Military Intelligence - This is a second phase of the Computer Security Trends study. It will use the same techniques and presentation as the Computer Security Trends study but will be based on a different data set. The objective is to track events that are military in nature. This research is ongoing. The first batch will be available on May 1, 2009.

  5. A Company Profile - This is meant as a proof-of-concept on the marketing and corporate intelligence aspects of text mining. The study was used to find out the web "fingerprint" of a security company to show what aspects of security their current business model is based on.

  6. Conficker - This study was done on the eve of the April 1 Conficker brouhaha. Someone asked us to create a profile on the current news regarding Conficker. This study was fast and dirty, done in less than 15 minutes, as it was meant to give just a quick overview of the Conficker worm. This illustrates the speed at which this kind of analysis can be done.

  7. IRC Hacker Chatter - This study was inspired by the TV show NCIS. At one point the lead was investigating a particular terrorist and asked whether INTERPOL had detected an increase in chatter concerning that particular name. We applied the same concept to detecting potential increases in chatter about high value targets and topics in an IRC chat session. As a test, we used an old and fairly popular transcript of several "hackers" chatting to see whether names and associations would appear.</blockquote>
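To make the "chatter" idea from the Computer Security Trends and IRC studies concrete, here is a minimal sketch in Python. It is not the Veritas Project's actual code or tooling. It counts how often watched topics are mentioned per period, flags periods where mentions jump, and counts which topics show up in the same item. The watch list, the function names, the doubling threshold, and the sample headlines are all made up for illustration, and plain substring matching stands in for the clustering and text mining techniques that the studies use.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical watch list; the real studies derive topics from the data.
TOPICS = ["credit cards", "hannaford", "heartland", "stolen laptops"]


def topics_in(text, topics=TOPICS):
    """Return the set of watched topics mentioned in one text item."""
    lowered = text.lower()
    return {t for t in topics if t in lowered}


def chatter_by_period(items, topics=TOPICS):
    """Count topic mentions per period; items is an iterable of (period, text)."""
    counts = defaultdict(Counter)
    for period, text in items:
        counts[period].update(topics_in(text, topics))
    return counts


def spikes(counts, topic, factor=2.0):
    """Return periods where mentions reach `factor` times the previous period."""
    periods = sorted(counts)
    flagged = []
    for prev, cur in zip(periods, periods[1:]):
        before, now = counts[prev][topic], counts[cur][topic]
        # max(before, 1) avoids flagging every first mention of a topic
        # as an infinite increase.
        if now >= factor * max(before, 1):
            flagged.append(cur)
    return flagged


def associations(items, topics=TOPICS):
    """Count how often two watched topics appear in the same item."""
    pairs = Counter()
    for _, text in items:
        found = sorted(topics_in(text, topics))
        pairs.update(combinations(found, 2))
    return pairs


# Invented sample items, for illustration only (not data from the studies).
SAMPLE = [
    ("2009-01", "Heartland discloses breach affecting credit cards"),
    ("2009-02", "More banks reissue credit cards after Heartland breach"),
    ("2009-02", "Credit cards exposed in Hannaford case revisited"),
    ("2009-02", "Stolen laptops remain a top cause of data loss"),
    ("2009-02", "Credit cards fraud rings grow"),
]

if __name__ == "__main__":
    counts = chatter_by_period(SAMPLE)
    print("Spikes for 'credit cards':", spikes(counts, "credit cards"))
    for pair, n in associations(SAMPLE).most_common():
        print(pair, n)
```

Swapping the period key for a timestamp bucket and the text for IRC lines would apply the same counting to a chat transcript. A real study would likely need tokenization, synonym handling, and a more careful baseline than "twice the previous period".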

For more details, a preview of these studies can be viewed on our website:

http://www.zerodays.org/veritas/index.php

Presented by