From datacenters and mobile devices to aircraft, automobiles, and banks, many systems require new ways of tightly constraining the flow of information such that untrusted and malicious devices can never affect critical system tasks, or that cryptographic secrets can never leak. To create such systems, information flow controls must be incorporated from the ground up, including at hardware design time, in order to provide a formal basis for a system's root of trust.
My thesis demonstrates a new method for constructing and analyzing architectures capable of tracking all information flow within the machine, including timing and covert channels. The key to such an approach is a novel gate-level information flow tracking method (GLIFT) which provides a way to compose complex logical structures with well defined information flow properties. Starting from a simple NAND gate, we create more complex structures including muxes, control, registers, and finally a microprocessor.
This Execution Lease processor provides programmers (i.e. system software) with explicit control over all information flows, and I am currently working on implementing a micro-kernel that uses our proposed ISA to construct information-tight partitions. Finally, by formalizing the GLIFT technique as a static analysis tool, we can prove that the kernel and hardware together enforce a security policy for all possible user-level processes and external inputs. A short summary of this work is available as an IEEE Micro Top Pick paper.
I have recently started to collaborate with Xun Li and Ben Hardekopf to formalize common secure-hardware design patterns in the form of a type system for a hardware description language. We propose Caisson -- a finite state-machine based language -- using which we are able to design complex processors that maintain strict information flow control even in the presence of pipelining, caches, and other micro-architectural features. Our initial work has been presented in PLAS 2010 workshop, and a more complete version is currently under submission.
I am working with Jonny Valamehr to examine the use of novel technologies such as 3D-Integration to add introspection hardware in the last stages of processor fabrication. This 3-D Integration decouples the function and economics of security policy enforcement from the underlying computing hardware. As a result, security enhancements are manufacturing options applicable only to those systems that require them. In our ACSAC paper, we present a set of circuit level primitives that can be used by the 3D control plane to manipulate signals on the base computation plane.
As programmers are asked to manage more complicated parallel machines, it is likely that they will become increasingly dependent on tools such as multi-threaded data race detectors, memory bounds checkers, dynamic dataflow trackers, and various performance profilers to understand and maintain their software. Rather than performing all the instrumentation and analysis on the main processor, we explore the idea of using the increasingly high-throughput board level interconnect available on many systems to offload analysis to a parallel off-chip accelerator.
We have developed Hardgrind, a prototype system that maps a DMA based analysis engine, sitting on a PCI-mounted FPGA, into the Valgrind instrumentation framework. Using this prototype we characterize the potential of such a system to both accelerate existing software development tools and enable a new class of heavyweight dynamic analysis. While many issues still remain with such an approach, we demonstrate that program analysis speedups of 29% to 440% could be achieved today with strictly off-the-shelf components on some of the state-of-the-art tools, and we carefully quantify the bottlenecks to illuminate several new opportunities for further architectural innovation.
Many dynamic analysis schemes rely on the ability to associate metadata (or ``tags'') with all of virtual or physical memory. If one wishes to store large 32-bit tags, multiple tags per data element, or tags at the granularity of bytes rather than words, then directly storing one tag on chip to cover one byte or word (in a cache or otherwise) can be an expensive proposition. We show that dataflow tags naturally exhibit a very high degree of spatial-value locality, an observation we can exploit by storing metadata on ranges of addresses (which cover a non-aligned contiguous span of memory) rather than on individual elements. In fact, a small 128 entry on-chip range cache (with area equivalent to 4KB of SRAM) hits more than 98% of the time on average.
The key to this approach is our proposed method by which ranges of tags are kept in cache in an optimally RLE-compressed form, queried at high speed, swapped in and out with secondary memory storage, and (most important for dataflow tracking) rapidly stitched together into the largest possible ranges as new tags are written on every store, all the while correctly handling the cases of unaligned and overlapping ranges. We examine the effectiveness of this approach by simulating its use in definedness tracking (covering both the stack and the heap), in tracking network-derived dataflow through a multi-language web application, and through a synthesizable prototype on an FPGA.
One summer at NEC Labs, I worked with Hari Cadambi on scaling MapReduce for many-core architectures. Back in college at IIT Guwahati, I interned at the EDA Group in Torino (Italy) with Professors Enrico Macii and Luca Benini, and the VLSI-ASIC group at CDAC, Trivandrum, designing hardware for fun and profit.