The Third Annual IEEE Workshop on Workload Characterization
In conjunction with ICCD 2000:
The Second Annual Workshop on Hardware Support for Objects and Microarchitectures for Java
9:00-9:15
Opening Remarks
Craig Chase, ICCD 2000 General Chair, The University of Texas at Austin
Sandip Kundu, ICCD 2000 Program Chair, Intel Corporation
9:15-10:00
Keynote Address: On the road to a mobile information society
Dirk Friebel
Nokia GmbH - Nokia Research Center
Everyone is becoming increasingly mobile. The wide adoption of technologies such as mobile phones, laptop computers and personal digital assistants is a good example of this. We expect immediate access to information from a wide variety of online sources. These market dynamics are driving us towards a future where users will be able to handle all types of information and services.
Mobile Information Society is:
· Mobility and the Internet are the drivers
· Work is no longer a place, it is wherever you need to be
· Mobile access to information and services solves real business problems and will improve the quality of life
· Solutions which enable mobility are available today
· Nokia has the global presence and the key core competencies in mobility and in the enabling technologies required for mobility
· Nokia will lead the way to the Mobile Information Society "And putting the Internet in everyone's pocket"
10:30-12:00
Session 1.1: New Architectures
Session Chair: Mauricio Breternitz, Motorola
This session is about architectures for new paradigms. The first paper illustrates the performance degradation of secure information processing and its mitigation with microarchitecture techniques. The second describes a novel instruction-set architecture for accelerating both secure and multimedia information processing. The third paper describes an efficient simulator generator for new instruction-set architectures.
1.1.1. Architectural Impact of Secure Socket Layer on Internet Servers
K. Kant and R. Iyer, Intel, USA
P. Mohapatra, Michigan State University, USA
1.1.2. Fast Subword Permutation Instructions Using Omega and Flip Network Stages
Xiao Yang and Ruby B. Lee, Princeton University, USA
1.1.3. Sleipnir - An Instruction Level Simulator Generator
Tor Jeremiassen, Lucent Technologies, USA
Session 1.2: Fault-Simulation and ATPG at Different Design Levels
Session Chair: Nur Touba, The University of Texas at Austin
The first paper of this session proposes a new algorithm for dynamic fault grouping and simulation scheduling for transient fault simulation of analog circuits. In the second paper, a novel approach is presented for computing feedback polynomials and seeds of an LFSR such that a pseudo-exhaustive test is obtained. Finally, in the last paper a functional level test pattern generator combining genetic algorithms and symbolic methods is described.
1.2.1. Analog Transient Concurrent Fault Simulation with Dynamic Fault Grouping
Junwei Hou and Abhijit Chatterjee, Georgia Institute of Technology, USA
1.2.2. Pseudoexhaustive TPG with a Provably Low Number of LFSR Seeds
Dimitri Kagaris and Spyros Tragoudas, Southern Illinois University, USA
1.2.3. An Application of Genetic Algorithms and BDDs to Functional Testing
Fabrizio Ferrandi and Donatella Sciuto, Politecnico di Milano, Italy
Alessandro Fin and Franco Fummi, Università di Verona, Italy
Session 1.3: Advanced Design Techniques
Session Chair: Ken Shepard, Columbia University
In this session, we consider design and analysis techniques for advanced digital integrated circuits. The first paper describes skewed static logic for low power and high performance. The second paper describes an analysis framework for estimating power-supply integrity, while the last paper considers a local clocking mechanism based on a tunable delay line that calibrates itself from a low-frequency global clock.
1.3.1. High-Performance, Low-Power Skewed Static Logic in Very Deep Submicron Technology
Chulwoo Kim, Jaesik Lee, Kwang-Hyun Baek, Eric Martina, and Sung-Mo Kang, University of Illinois at Urbana-Champaign, USA
1.3.2. Estimation of Inductive and Resistive Switching Noise on Power Supply Network in Deep Sub-micron CMOS Circuits
Shiyou Zhao, Kaushik Roy, and Chengkok Koh, Purdue University, USA
1.3.3. Self-Calibrating Clocks for Globally Asynchronous Locally Synchronous Systems
S.W. Moore, G.S. Taylor, P.A. Cunningham, R.D. Mullins, and P. Robinson, University of Cambridge, United Kingdom
12:00-1:00
Lunch
Sponsored by Intel
1:00-3:00
Session 2.1: Improving CPU Performance
Session Chair: Brian Grayson, Motorola
This session has four papers that show how to optimize CPU performance by improving interactions with the memory system. The techniques presented include a space-efficient value prediction strategy, followed by a dynamic program-flow-driven data placement mechanism and improved instruction cache. The third paper presents an architectural extension to support high-performance heap data allocation. The final paper presents an analysis of efficient ways to use a software-controlled primary cache system for scientific computations.
2.1.1. Hybridizing and Coalescing Load Value Predictors
Martin Burtscher, University of Colorado, Boulder, USA
Benjamin G. Zorn, Microsoft Corporation, USA
2.1.2. A 2-way Thrashing-Avoidance Cache (TAC): An Efficient Instruction Cache Scheme for Object-Oriented Languages
Yul Chu and M. R. Ito, University of British Columbia, Canada
2.1.3. Architectural Support for Dynamic Memory Management
J. Morris Chang, Witawas Srisa-an and Chia-Tien Dan Lo, Illinois Institute of Technology, USA
2.1.4. SCIMA: Software Controlled Integrated Memory Architecture for High Performance Computing
Masaaki Kondo, Hideki Okawara, and Hiroshi Nakamura, The University of Tokyo, Japan
Taisuke Boku, University of Tsukuba, Japan
Session 2.2: Parasitic Modeling, Analysis and Optimization
Session Chair: Tom Dillinger, Sun Microsystems
This session begins with an exposition of new developments in the area of signal integrity. The first paper uses the idea of timing windows to perform static timing analysis in the presence of crosstalk. Next, a discussion on the design of output drivers under ground bounce considerations is presented. The last two papers are related to parasitic modeling: the third paper in the session builds a simple yet elegant model for spiral inductors, while the last presents a comparative analysis of parallel implementations of several capacitance extraction algorithms.
2.2.1. Worst Delay Estimation in Crosstalk Aware Static Timing Analysis
Tong Xiao and Malgorzata Marek-Sadowska, University of California, Santa Barbara, USA
2.2.2. Analysis and Optimization of Ground Bounce in Digital CMOS Circuits
Payam Heydari and Massoud Pedram, University of Southern California, USA
2.2.3. An Efficient and Accurate Model for RF/Microwave Spiral Inductors Using Microstrip Lines Theory
Nasser Masoumi, S. Safavi-Naeini, and Mohamed I. Elmasry, University of Waterloo, Canada
2.2.4. Comparative Study of Parallel Algorithms for 3-D Capacitance Extraction on Distributed Memory Multiprocessors
Yanhong Yuan and Prithviraj Banerjee, Northwestern University, USA
Session 2.3: Low Power and Arithmetic
Session Chair: Margarida Jacome, The University of Texas at Austin
This session presents new techniques for low power microprocessor design, a high-performance adder design, and a performance comparison of signal processing algorithms on commercial SIMD and VLIW microprocessors.
2.3.1. A Novel Low-Power Microprocessor
Rolf Hakenes and Yiannos Manoli, Institute of Microelectronics, University of Saarland
2.3.2. A Power Perspective of Value Speculation for Superscalar Microprocessors
Rafael Moreno, Luis Piñuel, Silvia del Pino, Francisco Tirado, Universidad Complutense de Madrid, Spain
2.3.3. Multilevel Reverse-Carry Adder
Javier D. Bruguera, University of Santiago de Compostela, Spain
Tomas Lang, University of California at Irvine, USA
2.3.4. Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures
Deependra Talla, Lizy John, Viktor Lapinskii, and Brian Evans, The University of Texas at Austin, USA
3:30-5:00
Session 3.1: Servers and Parallelism
Session Chair: Ruby Lee, Princeton University
This session presents papers involving an optimized index and data buffering mechanism for database workloads, an analysis of sharing patterns in multiprocessors, and power-performance trade-offs in multithreaded microprocessors.
3.1.1. Unified Index and Data Fine-Granularity Buffering: Approach and Implementation
Qiang Cao and Josep Torrellas, University of Illinois, USA
H. V. Jagadish, University of Michigan, USA
3.1.2. Analysis of Shared Memory Misses and Reference Patterns
Jeffrey B. Rothman, Lyris Technologies, Inc., USA
Alan Jay Smith, University of California, Berkeley, USA
3.1.3. Power-Sensitive Multithreaded Architecture
John S. Seng and Dean M. Tullsen, University of California, San Diego, USA
George Z. N. Cai, Intel, USA
Session 3.2: Circuit Optimization and Analysis
Session Chair: Shervin Hojat, IBM
This session presents papers on circuit optimizations for improved performance. The first paper performs a set of concurrent optimizations to modify a circuit to meet a performance objective, using a Lagrangian multiplier based method. The second paper also performs a set of concurrent optimizations, except that these are more closely related to device fabrication: it considers the possibility of varying channel lengths and oxide thickness to improve performance. The third paper presents a clustering-based approach to selecting a set of buffers for a library. Finally, the session closes with a description of a full-chip power analysis methodology for a state-of-the-art microprocessor.
3.2.1. Delay Constrained Optimization by Simultaneous Fanout Tree Construction, Buffer Insertion and Gate Sizing
I-Min Liu and Adnan Aziz, The University of Texas at Austin, USA
3.2.2. Application-based, Transistor-level Full-chip Power Analysis for 700 MHz PowerPC Microprocessor
Yi-Kan Cheng, Magma Design Automation, USA
David Bearden and Kanti Suryadevara, Motorola, USA
3.2.3. Buffer Library Selection (short paper)
Charles J. Alpert, R. Gopal Gandham, Jose L. Neves, and Stephen T. Quay, IBM, USA
3.2.4. High-Performance Low-Power CMOS Circuits Using Multiple Channel Length and Multiple Oxide Thickness (short paper)
Naran Sirisantana, Liqiong Wei, and Kaushik Roy, Purdue University, USA
Session 3.3: Logic Circuit Families
Session Chair: Shyh-Jye Jou, National Central University
This session considers novel circuit approaches for digital logic. The first paper describes high-performance, low-power logic using current-mode threshold gates, while the second paper considers design techniques based on skewed CMOS. The last paper describes a technique for speeding up static and dynamic logic.
3.3.1. Current-Mode Threshold Logic Gates
S. Bobba and I.N. Hajj, University of Illinois at Urbana-Champaign, USA
3.3.2. Skewed CMOS: Noise-Immune High-Performance Low-Power Static Circuit Family
Dinesh Somasekhar, Intel, USA
Alexandre Solomatnikov and Kaushik Roy, Purdue University, USA
3.3.3. Output Prediction Logic: A High-Performance CMOS Design Technique
Larry McMurchie, Su Kio, Gin Yee, Tyler Thorp, and Carl Sechen, University of Washington, USA
5:00-6:30
Poster Session
9:00-10:00
Keynote Address: The Future of Populist Parallelism
Greg Pfister
Senior Technical Staff Member
IBM Advanced Technology & Architecture, Server Design
This keynote covers past and future history of clusters. It will extrapolate the future from an observable cycle of cluster creation that has recurred several times in the past, combining that with the pieces and the glue that will be available in the future.
10:30-12:00
Session 4.1: Intelligent Memory
Session Chair: Steven Reinhardt, University of Michigan
This session focuses on issues relating to DRAM-based memory systems, buffering and prefetching policies, as well as trade-offs involved in processors-in-memory under multiprogramming workloads.
4.1.1. A Study of Channeled DRAM Memory Architectures
Lars Friebe, University of Hannover, Germany
Yoshikazu Yabe, NEC Corp., Japan
Masato Motomura, NEC Corp., Japan
4.1.2. DRAM-Page Based Prediction and Prefetching
Haifeng Yu and Gershon Kedem, Duke University, USA
4.1.3. Reducing Cost and Tolerating Defects in Page-Based Intelligent Memory
Mark Oskin, Diana Keen, Justin Hensley, Lucian-Vlad Lita, and Frederic T. Chong, University of California, Davis, USA
Session 4.2: Processor Microarchitecture
Session Chair: Steve Furber, The University of Manchester
This session presents new state-of-the-art techniques in cache memory, instruction streaming buffer, and branch decoupled processor design.
4.2.1. A selective temporal and aggressive spatial cache system based on time interval
Jung-Hoon Lee, Jang-Soo Lee, and Shin-Dug Kim, Yonsei University, Korea
4.2.2. Design of Instruction Stream Buffer with Trace Support for X86 Processors
Jih-Ching Chiu, I-Huan Huang, and Chung-Ping Chung, National Chiao Tung University, Taiwan
4.2.3. A Trace Based Evaluation of Speculative Branch Decoupling
Anshuman Nadkarni and Akhilesh Tyagi, Iowa State University, USA
Session 4.3: Digital Logic Techniques
Session Chair: Barbara Chappell, Intel
This session considers novel circuit techniques applied to digital ICs. The first paper provides an interesting technique for using the DRAM cell array to do addition. The second paper presents a fixed-width array multiplier and a Booth multiplier for digital signal processing applications, while the last paper considers a dynamic flip-flop design for low power.
4.3.1. An Adder Using Charge Sharing and Its Application in DRAMs
Hak-soo Yu, Songjun Lee, and Jacob Abraham, The University of Texas at Austin, USA
4.3.2. Fixed-Width Multiplier for DSP Applications
Shyh-Jye Jou and Hui-Hsuan Wang, National Central University, Taiwan
4.3.3. Dynamic Flip-Flop with Improved Power
Nikola Nedovic and Vojin G. Oklobdzija, University of California, Davis, USA
12:00-1:00
Lunch
Sponsored by Texas Instruments
1:00-3:00
Session 5.1: Embedded Processors: Architecture and System-design Issues
Session Chair: Ricardo Gonzales, Tensilica
This session has four papers on special purpose processors covering embedded, mobile, low power and DSP processors.
5.1.1. Mobile Processors
Farinaz Koushanfar and Miodrag Potkonjak, University of California, Los Angeles, USA
Jan Rabaey, University of California, Berkeley, USA
5.1.2. AMULET3: A 100 MIPS Asynchronous Embedded Processor
S.B. Furber, D.A. Edwards, and J.D. Garside, The University of Manchester, UK
5.1.3. Xtensa with User Defined DSP Coprocessor Micro-architectures
Gulbin Ezer, Tensilica, Inc., USA
5.1.4. Predictive Strategies for Low-Power RTOS Scheduling
Pavan Kumar and Mani Srivastava, University of California, Los Angeles, USA
Session 5.2: Floorplanning and Partitioning
Session Chair: Tim Burks, Magma Design Automation
Attend this session to obtain insights into new (and old) approaches to floorplanning and partitioning! The first two papers deal with the problem of floorplanning, one using the idea of B* trees, and another utilizing a hierarchical method based on partitioning. The third paper in the session presents a comparative evaluation of several existing multi-way partitioning algorithms, with detailed experimental results. Rounding off the session is a paper that designs a datapath using concurrent one-dimensional floorplanning.
5.2.1. Rectilinear Block Placement Using B*-Trees
Guang-Ming Wu, Yun-Chih Chang, and Yao-Wen Chang, National Chiao Tung University, Taiwan
5.2.2. Fast Hierarchical Floorplanning With Congestion and Timing Control
Abhishek Ranjan, Kiarash Bazargan, and Majid Sarrafzadeh, Northwestern University, USA
5.2.3. An Evaluation of Move-Based Multi-Way Partitioning Algorithms
E. Yarack, Silicon Graphics, USA
J. Carletta, University of Akron, USA
5.2.4. Assignment-Space Exploration Approach to Concurrent Data-path/Floorplan Synthesis
Koji Ohashi, Mineo Kaneko, and Satoshi Tayu, Japan Advanced Institute of Science and Technology, Japan
Session 5.3: Basic Algorithms in Verification and Test
Session Chair: Yatin Hoskote, Intel
This session reports on work aimed at improving the basic infrastructure for verification and test. The first paper deals with how to improve the performance of a SAT solver for the common case when constraints are dynamically added and removed. The second paper deals with the problem of finding good variable orders for word-level decision diagrams. In the third paper, sensitivity levels of test patterns are defined and used to guide simulation-based ATPG for combinational circuits. Finally, the fourth paper investigates how a previously introduced compaction technique for scan test sets affects the quality of the tests.
5.3.1. On Solving Stack-Based Incremental Satisfiability Problems
Joonyoung Kim, Jesse Whittemore, and Karem Sakallah, University of Michigan, USA
5.3.2. Efficient Dynamic Minimization of Word-Level DDs based on Lower Bound Computation
Wolfgang Guenther and Rolf Drechsler, University of Freiburg, Germany
Stefan Hoereth, Siemens AG, Germany
5.3.3. Sensitivity Levels of Test Patterns and Their Usefulness in Simulation-Based Test Generation
Irith Pomeranz and Sudhakar M. Reddy, University of Iowa, USA
5.3.4. On Test Application Time and Defect Detection Capabilities of Test Sets for Scan Designs
Irith Pomeranz and Sudhakar M. Reddy, University of Iowa, USA
3:30-5:30
Session 6.1: Special Session
Advancements in DSP Architecture
Session Chair: Jim Bondi, Texas Instruments
Organizer: Nagaraj NS, Texas Instruments
This session has three papers describing various architecture aspects of the latest DSP processor designs from Texas Instruments.
6.1.1. Effective hardware based two way Loop Cache for high performance low power processors
Tim Anderson and Sanjive Agarwala, Texas Instruments, USA
6.1.2. A multi-level memory system architecture for high-performance DSP applications
Sanjive Agarwala, Charles Fuoco, Tim Anderson, and Dave Comisky, Texas Instruments, USA
6.1.3. A scalable high performance DMA architecture for high-performance DSP applications
Dave Comisky, Sanjive Agarwala, and Charles Fuoco, Texas Instruments, USA
Session 6.2: Advanced Architectural Design and Synthesis
Session Chair: Edward Grochowski, Intel
This session describes new and innovative approaches to architectural design and synthesis. The first paper discusses a compilation methodology for pipeline reconfigurable architectures. Next, a processor design system for complex pipelined designs is presented. The third paper uses a symbolic framework to solve the binding problem for embedded VLIW ASIPs. The final paper describes methods for taking designs from a software description and interfacing them with hardware using a C++ framework.
6.2.1. A Fast and Efficient Compiler for Pipeline Reconfigurable Architectures
Srihari Cadambi and Seth Copen Goldstein, Carnegie Mellon University, USA
6.2.2. PEAS-III: An ASIP Design Environment
Makiko Itoh, Shigeaki Higaki, Yoshinori Takeuchi, Akira Kitajima, and Masaharu Imai, Osaka University, Japan
Jun Sato, Tsuruoka National College of Technology, Japan
Akichika Shiomi, Shizuoka University, Japan
6.2.3. Symbolic Binding for VLIW ASIPs (short paper)
Satish Pillai and Margarida Jacome, The University of Texas at Austin, USA
6.2.4. Interfacing Hardware and Software Using C++ Class Libraries (short paper)
Dinesh Ramanathan and Rajesh Gupta, University of California, Irvine, USA
Ray Roth, CynApps Inc., USA
Session 6.3: Application and Case Studies in Test and Verification
Session Chair: Carl Pixley, Motorola
In this session practical aspects of applying modern test and verification techniques are discussed. The session starts with a paper on applying formal property verification in industry to ensure a design satisfies some desired properties. The second paper deals with verifying that the actual detailed implementation has not introduced any new bugs. This is followed by a paper dealing with how to correct/report any such bug. The final paper describes a set of production level procedures used to identify and verify the test structure and behavior of the BIST hardware as implemented for IBM's TestBench test generation system.
6.3.1. Formal Verification of an Industrial System-on-a-chip
Hoon Choi, Myung-Kyoon Yim, Jae-Young Lee, Byeon-Whee Yun, and Yun-Tae Lee, Samsung Electronics, Korea
6.3.2. Equivalence Checking Using a Structural SAT-Solver, BDDs, and Simulation
Viresh Paruthi and Andreas Kuehlmann, IBM, USA
6.3.3. Efficient Design Error Correction of Digital Circuits
Dirk W. Hoffmann and Thomas Kropf, University of Tübingen, Germany
6.3.4. An Automatic Validation Methodology for Logic BIST in High Performance VLSI Design
Michael Cogswell, James Sage, Don Pearl, and Alan Troidl, IBM, USA
5:30-6:30
Poster Session
7:00-9:00
Banquet
Speaker: TBD
9:00-10:00
Bryan Ackland, Lucent Technologies
Where are DSP architectures heading? How can high performance and low power be achieved? How to exploit parallelism in high end DSPs?
10:30-12:00
Session 7.1: Logic Optimization
Session Chair: Chin-Long Wey, Michigan State University
This session considers techniques for the optimization of multi-level and two-level logic representations.
7.1.1. Efficient Logic Optimization Using Regular Extraction
Thomas Kutzschebauch, IBM T. J. Watson Research Center, USA
7.1.2. Binary and Multi-valued SPFD-based Wire Removal in PLA Networks
Subarnarekha Sinha, University of California, Berkeley, USA
Sunil P. Khatri, University of Colorado at Boulder, USA
Robert K. Brayton and A. Sangiovanni-Vincentelli, University of California, Berkeley, USA
7.1.3. Minimization of Ordered Pseudo Kronecker Decision Diagram
Per Lindgren, Lulea University of Technology, Sweden
Rolf Drechsler and Bernd Becker, Albert-Ludwigs University, Germany
Session 7.2: High Level Specification and Synthesis
Session Chair: Pranav Ashar, NEC
This session describes recent activity in the area of high level synthesis and design. The first paper addresses the productivity gap between the promise and reality of behavioral synthesis and considers how best it may be integrated within current design flows. The second paper attacks the problem of interfacing IP's operating at different clock frequencies and the final paper of the session is related to multi-level communication synthesis.
7.2.1. Rethinking Behavioral Synthesis for a Better Integration within Existing Design Flows
Wander Oliveira Cesario and Ahmed Amine Jerraya, TIMA Laboratory, France
Zoltan Sugar and Imed Moussa, Arexsys, France
7.2.2. Synthesis and Optimization of Interface Hardware between IP's Operating at Different Clock Frequencies
Bong-Il Park, In-Cheol Park, and Chong-Min Kyung, KAIST, Korea
Hoon Choi, Samsung Electronics, Korea
7.2.3. Multi-level Communication Synthesis of Heterogeneous Multilanguage Specification
F. Hessel, TIMA & PUCRS, Brazil
P. Coste, G. Nicolescu, P. LeMarrec, N. Zergainoh, and A. Jerraya, TIMA Laboratory, France