From "Shotgun Parsers" to Better Software Stacks

Everyone agrees that aggressive input checking and validation of input-handling code are crucial to secure programming. Yet vulnerabilities still abound, and exploitation still defies all kinds of protective measures (e.g., DEP, ASLR, EMET). Suppressed in some system layers and environments, exploitation quickly resurfaces in others in even more versatile forms. This means that we still don't know how to properly design a software stack that safely handles data across several layers of abstraction.

Any code that transforms data has to make some assumptions about what it receives; it's up to some other code to recognize whether the data actually satisfies those assumptions. The sole purpose of this recognizer is to protect the subsequent, innocent code from being lured into memory corruption or otherwise aiding and abetting pwnage.
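A minimal sketch of the recognizer idea, assuming a hypothetical length-prefixed message format: a 1-byte type, a 2-byte big-endian length, then exactly that many payload bytes. The names `recognize`, `process`, `handle`, and `Message` are illustrative only. The recognizer checks the entire input against the format before anything else runs, and the processing code only ever receives an already-validated object:

```python
import struct
from dataclasses import dataclass

KNOWN_TYPES = {1, 2}  # hypothetical set of valid message types


class RejectedInput(ValueError):
    """Raised when input does not match the (hypothetical) message format."""


@dataclass(frozen=True)
class Message:
    msg_type: int
    payload: bytes


def recognize(data: bytes) -> Message:
    """Accept the input only if ALL of it matches the format:
    type (1 byte) | length (2 bytes, big-endian) | payload (exactly `length` bytes)."""
    if len(data) < 3:
        raise RejectedInput("truncated header")
    msg_type, length = struct.unpack(">BH", data[:3])
    if msg_type not in KNOWN_TYPES:
        raise RejectedInput(f"unknown type {msg_type}")
    # Reject both short and over-long inputs: the length field must be exact.
    if len(data) != 3 + length:
        raise RejectedInput("length field does not match payload size")
    return Message(msg_type, data[3:])


def process(msg: Message) -> str:
    # Runs only on recognized input, so it may rely on the format's invariants.
    return f"type={msg.msg_type} bytes={len(msg.payload)}"


def handle(data: bytes) -> str:
    # Recognition happens fully and first; processing never sees rejected data.
    try:
        msg = recognize(data)
    except RejectedInput as err:
        return f"rejected: {err}"
    return process(msg)
```

For example, `handle(b"\x01\x00\x03abc")` is accepted, while `handle(b"\x01\x00\x05abc")` is rejected before `process` runs because the declared length overstates the payload. The design choice is that `process` never has to re-check, or forget to re-check, anything the recognizer already established.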

Sadly, a lot of actual input-handling code is a mixture of data processing and recognition scattered throughout a codebase. Its "sanity checking" is neither strong enough to verify all the implicit assumptions nor written with those assumptions in mind. We call such input-handling code "shotgun parsers" and argue that it is the number one reason for the ubiquitous insecurity of Internet-facing programs.
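To make the contrast concrete, here is a hedged sketch using a hypothetical `key=value` record format, with a plain dict standing in for application state (all names are illustrative). The first function is shotgun-style: it checks each record only partially, and only after earlier records have already been applied. The second recognizes the whole input first and performs side effects only once every record is known to be well-formed:

```python
def shotgun_apply(records: list[str], db: dict[str, str]) -> None:
    """Shotgun style: checks are interleaved with side effects and incomplete."""
    for rec in records:
        key, _, value = rec.partition("=")
        if not key:
            # By the time this fires, earlier records are already in `db`.
            raise ValueError(f"empty key in {rec!r}")
        # A missing "=" is never checked: "oops" is stored as db["oops"] = "".
        db[key] = value


def recognize_records(records: list[str]) -> list[tuple[str, str]]:
    """Recognizer: validate every record against the full format, no side effects."""
    parsed = []
    for rec in records:
        key, sep, value = rec.partition("=")
        if not key or not sep:
            raise ValueError(f"malformed record: {rec!r}")
        parsed.append((key, value))
    return parsed


def apply_records(records: list[str], db: dict[str, str]) -> None:
    """Process only after the entire input has been recognized."""
    parsed = recognize_records(records)  # raises before any state is touched
    for key, value in parsed:
        db[key] = value
```

With input `["a=1", "=2"]`, the shotgun version has already written `a` when it rejects the second record, and it silently accepts a record with no `=` at all; `apply_records` rejects both cases before modifying the state.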

In this talk, we will discuss examples of shotgun parsers across the layers of a TCP/IP stack, along with well-attested exploits against them drawn from the pages of Phrack and other sources. We'll discuss the software engineering principles that could have prevented them and the engineering methods that we believe will lead away from shotgun parsers, toward software stacks that can finally be trusted to process inputs safely.

Our previous talks (see http://langsec.org/) concentrated on theory; in this talk, we take the practical software engineering view.
