is working on better ways to design full systems, including the
processors, specialized hardware, system software, and the
application software itself. We are currently attacking that
very large problem space with domain-specific languages that
describe systems quickly and efficiently, together with the
ability to transform system descriptions written in those
languages into both very fast simulators, capable of RTL-level
cycle accuracy, of the performance and power consumption of
computer systems and full implementations of those systems.
We initially focused on FPGA-Accelerated
Simulation Technologies (FAST), a methodology to build
extremely fast, cycle-accurate full system simulators that run
real applications on top of real operating systems.
The methodology uses a split functional (e.g., ISA) / timing
(e.g., micro-architecture) description that is simpler and more
reusable than integrated approaches.
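
To make the split concrete, here is a minimal sketch in Python. It is a toy illustration only, not the FAST implementation: the three-instruction ISA, the names `functional_model`, `timing_model`, and `TraceEntry`, and the fixed load latency are all assumptions made up for this example. The functional model only computes what the program does; the timing model only decides how many cycles it takes, so either half can be swapped without touching the other.

```python
from dataclasses import dataclass


@dataclass
class TraceEntry:
    """One retired instruction, as reported by the functional model."""
    pc: int
    op: str
    is_load: bool


def functional_model(program, regs):
    """Execute a hypothetical toy ISA and yield one trace entry per instruction.

    This half knows nothing about cycles; it only updates architectural state.
    """
    pc = 0
    while pc < len(program):
        op, dst, a, b = program[pc]
        if op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "load":
            regs[dst] = a  # toy stand-in for a memory read: 'a' is the value
        yield TraceEntry(pc, op, op == "load")
        if op == "halt":
            return
        pc += 1


def timing_model(trace, load_latency=3):
    """Consume the trace and count cycles under an assumed micro-architecture.

    This half knows nothing about register values; it only models time.
    """
    cycles = 0
    for entry in trace:
        cycles += load_latency if entry.is_load else 1
    return cycles


program = [
    ("load", "r1", 5, None),
    ("load", "r2", 7, None),
    ("add", "r3", "r1", "r2"),
    ("halt", None, None, None),
]
regs = {}
cycles = timing_model(functional_model(program, regs))
print(regs["r3"], cycles)
```

Changing `load_latency` models a different memory system without any change to the functional half, which is the kind of reuse the split description is meant to allow.
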
Our unicore version is able to run the x86 ISA, boot unmodified
Windows XP, Linux 2.4, and Linux 2.6, and run unmodified
applications on top of those operating systems at simulation
speeds in the 1.2 MIPS range (between 100 and 1000 times faster
than Intel's and AMD's cycle-accurate simulators), which is fast
enough to type into Microsoft Word (click
here to see a real-time video of us doing exactly that).
We are close to completing a multicore version that will support 64
cores with coherent caches and will be significantly faster.
In addition to modeling performance, we are studying how to better
model the power consumption of computer systems. We are able
to estimate the power of commercial out-of-order designs with 8%
cycle-by-cycle RMS error compared to the best power simulators and
models internal to ARM and Freescale. Our models run at
10MHz in an FPGA, while standard tools run 5 to 6 orders of
magnitude slower. Our FPL 2010 paper
describes a snapshot of that work. We are currently working
with Intel to further improve our power modeling methodology using
HSPICE output as the reference, rather than standard power
In the past, I was part of the Research
Accelerator for Multiple Processors (RAMP) project, which researched
methods to model 1000-core systems using FPGAs.
We have been making progress on combining a split functional/timing
description of a computer system with the full implementation.
Our DAC 2011 paper
describes a snapshot of that work.
We have been working on domain-specific languages that can dramatically
reduce system and hardware design times. In some sense, the
split functional/timing description is the structure of our
domain-specific language. Our submitted FPGA 2012 paper describes
our domain-specific language for network processing applications,
which generates a 100Gbps network processor for a single FPGA from a
networking-specific language based on C. The power
consumption of that part is comparable to or better than that of any
ASIC-based network processor we know of, and it outperforms 32 Intel
Nehalem cores doing the same task.
This research is supported by a Department
of Energy Early Faculty Career Award, the National Science Foundation
(including a CAREER Award), SRC, Intel, Xilinx, IBM
Faculty Awards, Freescale,
AMD, and Altera.
Email: derek at ece period utexas period
Office: ENS Building, room 540
Snail mail: The University of Texas at Austin, 2501 Speedway, ENS Building, room 540, C0803, Austin, TX 78712