Screen Scraper Tricks: Extracting Data from Difficult Websites

DEF CON 17

Presented by: Michael Schrenk
Date: Sunday August 02, 2009
Time: 14:00 - 14:50
Location: Track 3
Track: Track 3

Screen scrapers and data mining bots often encounter problems when extracting data from modern websites. Obstacles like AJAX discourage many bot writers from completing screen scraping projects. The good news is that you can overcome most challenges if you learn a few tricks.

This session describes the (sometimes mind-numbing) roadblocks that can come between you and your ability to apply a screen scraper to a website. You'll discover simple techniques for extracting data from websites that freely employ DHTML, AJAX, complex cookie management, and other techniques. You will also learn how "agencies" create large-scale CAPTCHA solutions.

All the tools discussed in this talk are available for free, offer complete customization and run on multiple platforms.

Michael Schrenk

Michael Schrenk is a webbot developer and the author of "Webbots, Spiders, and Screen Scrapers" (2007, No Starch Press). He has also written for ComputerWorld, php|architect, and Web Techniques magazines. Mike has also given presentations at DEF CON X, XI, and XV. He works for a wide range of clients across North America as well as in Russia, Spain, and The Netherlands. Stop by www.schrenk.com and say hello.

