There’s an escalating arms race between bots and the people who protect sites from them. Bots, or web scrapers, can be used to gather valuable data, probe large collections of sites for vulnerabilities, exploit found weaknesses, and are often unfazed by traditional solutions like robots.txt files, Ajax loading, and even CAPTCHAs. I’ll give an overview of both sides of the battle and explain what what really separates the bots from the humans. I’ll also demonstrate and easy new tool that can be used to crack CAPTCHAs with high rates of success, some creative approaches to honeypots, and demonstrate how to scrape many “bot-proof” sites.
Ryan Mitchell is Software Engineer at LinkeDrive in Boston, where she develops their API and data analysis tools. She is a graduate of Olin College of Engineering, and is a masters degree student at Harvard University School of Extension Studies. Prior to joining LinkeDrive, she was a Software Engineer building web scrapers and bots at Abine Inc, and regularly does freelance work, building web scrapers for clients, primarily in the financial and retail industries. Ryan is also the author of two books: “Instant Web Scraping with Java” (Packt Publishing, 2013) and “Web Scraping with Python” (O’Reilly Media, 2015) Twitter: @Kludgist Amazon Author Page: http://www.amazon.com/Ryan-Mitchell/e/B00MQI8TVQ Website: http://ryanemitchell.com