- Student Information Sheet: please fill out this form, attach a recent, recognizable photograph, and turn it in on Monday, January 26th.
This section lists papers referenced in class. Some links may require you to log in with your UT EID if accessed off-campus.
- Processor Microarchitectures
- J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, D. Shippy. Introduction to the Cell multiprocessor. 2005.
- H. Corporaal. Design of Transport Triggered Architectures. 4th Great Lakes Symposium on VLSI, 1994.
- J. R. Goodman et al. PIPE: A VLSI Decoupled Architecture. ISCA-12, 1985.
- Gurindar S. Sohi et al. Multiscalar Processors. ISCA, 1995.
- James E. Smith et al. Implementing Precise Interrupts in Pipelined Processors. IEEE Transactions on Computers, Vol. 37, No. 5, May 1988.
- Richard M. Russell. The CRAY-1 Computer System. Commun. ACM 21, 1 (January 1978), 63-72.
- John H. Kelm, Daniel R. Johnson, Matthew R. Johnson, Neal C. Crago, William Tuohy, Aqeel Mahesri, Steven S. Lumetta, Matthew I. Frank, Sanjay J. Patel. Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator. ISCA 2009.
- James E. Thornton. The CDC 6600 Project. Annals of the History of Computing, vol. 2, no. 4, pp. 338-348, Oct.-Dec. 1980.
- A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir. The NYU Ultracomputer: Designing an MIMD Shared Memory Parallel Computer. IEEE Transactions on Computers, vol. C-32, no. 2, pp. 175-189, Feb. 1983.
- Adrián Cristal, Daniel Ortega, Josep Llosa, Mateo Valero. Kilo-instruction Processors. ISHPC 2003.
- Out-of-Order and Superscalar
- R. M. Tomasulo. An Efficient Algorithm for Exploiting Multiple Arithmetic Units. IBM Journal of Research and Development, 1967.
- Arthur H. Veen. Dataflow Machine Architecture. ACM Computing Surveys (CSUR), 1986.
- J. R. Gurd, C. C. Kirkham, and I. Watson. The Manchester Prototype Dataflow Computer. Communications of the ACM, 1985.
- H. Corporaal. Design of Transport Triggered Architectures. 4th Great Lakes Symposium on VLSI: Design Automation of High Performance VLSI Systems, 1994.
- Yale Patt, Wen-mei Hwu, and Michael Shebanow. HPS, a New Microarchitecture: Rationale and Introduction. MICRO-18, 1985.
- Yale Patt, Stephen W. Melvin, Wen-mei Hwu, and Michael Shebanow. Critical Issues Regarding HPS, a High Performance Microarchitecture. MICRO-18, 1985.
- James E. Smith. Decoupled Access/Execute Computer Architectures. 1984 (revised journal version).
- Subbarao Palacharla, Norman Jouppi, J. E. Smith. Complexity-Effective Superscalar Processors. ISCA, 1997.
- Jared Stark, Mary D. Brown, and Yale N. Patt. On Pipelining Dynamic Instruction Scheduling Logic. MICRO'00, 2000.
- M. D. Smith, M. Johnson, M. A. Horowitz. Limits on Multiple Instruction Issue. ASPLOS-3, 1989.
- Mattan Erez et al. Spills, Fills, and Kills: An Architecture for Reducing Register-Memory Traffic. Technical Report, Concurrent VLSI Architecture (TR-23), Stanford University, July 2000.
- Jack B. Dennis, David P. Misunas. A Preliminary Architecture for a Basic Data-Flow Processor. ISCA 1975.
- Arvind and R. S. Nikhil. Executing a Program on the MIT Tagged-Token Dataflow Architecture. IEEE Transactions on Computers, vol. 39, no. 3, pp. 300-318, Mar. 1990.
- Simultaneous Multithreading
- Burton Smith. Architecture and Applications of the HEP Multiprocessor Computer System. Proc. SPIE, vol. 298, Real-Time Signal Processing IV, 1981, pp. 241-248.
- Mario Nemirovsky, Forrest Brewer, Roger C. Wood. DISC: Dynamic Instruction Stream Computer. MICRO'91, 1991.
- D. Donalson, M. Serrano, R. Wood, M. Nemirovsky. DISC: Dynamic Instruction Stream Computer, an Evaluation of Performance. Proceedings of the Twenty-Sixth Hawaii International Conference on System Sciences, 1993.
- H. Hirata, K. Kimura, S. Nagamine, Y. Mochizuki, A. Nishimura, Y. Nakase, T. Nishizawa. An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads. ISCA-19, 1992.
- D. M. Tullsen, S. J. Eggers, H. M. Levy. Simultaneous Multithreading: Maximizing On-Chip Parallelism. Proceedings of ISCA-22, June 1995.
- Robert S. Chappell et al. Simultaneous Subordinate Microthreading (SSMT). ISCA-26, 1999.
- Future Trends
- H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, D. Burger. Dark Silicon and the End of Multicore Scaling. ISCA, 2011.
- R. Baumann. Soft Errors in Advanced Computer Systems. IEEE Design and Test of Computers, 2005.
- Yale Patt. Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution. Proceedings of the IEEE, vol. 89, no. 11, 2001.
- Charles Leiserson, Neil Thompson, Joel Emer, Bradley Kuszmaul, Butler Lampson, Daniel Sanchez, Tao Schardl. There's Plenty of Room at the Top: What Will Drive Computer Performance After Moore's Law? Science, Vol. 368, No. 6495, 2020.
- Superblocks and Hyperblocks
- Trace Cache
- Stephen W. Melvin and Yale N. Patt. Performance Benefits of Large Execution Atomic Units in Dynamically Scheduled Machines. ICS 3, 1989.
- Alexander Peleg and Uri Weiser. Dynamic Flow Instruction Cache Memory Organized Around Trace Segments Independent of Virtual Address Line. U.S. Patent 5,381,533, 1994.
- Daniel H. Friendly, Sanjay J. Patel, and Yale N. Patt. Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism. MICRO'97, 1997.
- Sanjay J. Patel, Marius Evers, and Yale N. Patt. Improving Trace Cache Effectiveness with Branch Promotion and Trace Packing. ISCA 25, 1998.
- Eric Rotenberg, Steve Bennett, and Jim Smith. Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching. MICRO'96, 1996.
- Eric Rotenberg, Quinn Jacobson, Yiannakis Sazeides, and Jim Smith. Trace Processors. MICRO'97, 1997.
- Bryan Black, Bohuslav Rychlik, and John Paul Shen. The Block-Based Trace Cache. ISCA 26, 1999.
- Daniel Friendly, Sanjay Patel, and Yale Patt. Putting the Fill Unit to Work. MICRO 31, 1998.
- Cache Management Techniques
- Wen-Hann Wang and Jean-Loup Baer. On the Inclusion Properties for Multi-Level Cache Hierarchies. ISCA, 1988.
- Moinuddin K. Qureshi, David Thompson, and Yale N. Patt. The V-Way Cache: Demand-Based Associativity via Global Replacement. ISCA, 2005.
- Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu, and Yale N. Patt. A Case for MLP-Aware Cache Replacement. ISCA, 2006.
- Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely Jr., and Joel Emer. Adaptive Insertion Policies for High Performance Caching. ISCA, 2007.
- Gennady Pekhimenko et al. Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches. PACT'12, 2012.
- Norman P. Jouppi. Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers. ISCA, 1990.
- M. K. Qureshi, M. A. Suleman, and Y. N. Patt. Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines. HPCA, 2007.
- Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Phillip P. Gibbons, Michael A. Kozuch, and Todd C. Mowry. Exploiting Compressed Block Size as an Indicator of Future Reuse. HPCA, 2015.
- Data Prefetching
- Runahead Execution
- James Dundas and Trevor Mudge. Improving Data Cache Performance by Pre-executing Instructions Under a Cache Miss. ICS-11, 1997.
- Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt. Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors. HPCA-9, 2003.
- Onur Mutlu, Hyesoon Kim, and Yale N. Patt. Techniques for Efficient Processing in Runahead Execution Engines. ISCA-32, 2005.
- Onur Mutlu, Hyesoon Kim, and Yale N. Patt. Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns. MICRO, 2005.
- Branch Prediction
- James E. Smith. A Study of Branch Prediction Strategies. ISCA-8, 1981.
- Tse-Yu Yeh and Yale Patt. Two-Level Adaptive Training Branch Prediction. MICRO-24, 1991.
- Tse-Yu Yeh and Yale Patt. Alternative Implementations of Two-Level Adaptive Branch Prediction. ISCA-19, 1992.
- Shien-Tai Pan, Kimming So, Joseph T. Rahmeh. Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation. ASPLOS-V, 1992.
- Scott McFarling. Combining Branch Predictors. WRL Technical Note TN-36, 1993.
- Ravi Nair. Dynamic Path-Based Branch Correlation. MICRO-28, 1995.
- Eric Sprangle et al. The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference. ISCA-24, 1997.
- Daniel A. Jiménez and Calvin Lin. Dynamic Branch Prediction with Perceptrons. HPCA-7, 2001.
- Andre Seznec. Analysis of the O-GEHL Predictor. ISCA-32, 2005.
- Andre Seznec, Pierre Michaud. A Case for (Partially) Tagged Geometric History Length Branch Prediction. Journal of Instruction-Level Parallelism, Feb. 2006.
- David N. Armstrong, Hyesoon Kim, Onur Mutlu, and Yale N. Patt. Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery. MICRO, 2004.
- I-Cheng K. Chen, John T. Coffey, Trevor N. Mudge. Analysis of Branch Prediction via Data Compression. ASPLOS, 1996.
- Marius Evers, Sanjay J. Patel, Robert S. Chappell, Yale N. Patt. An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work. ISCA, 1998.
- Predication
- Block-Structured ISA
- Measurements
- Cache Coherence
- Goodman: J. R. Goodman. Using Cache Memory to Reduce Processor-Memory Traffic. Proceedings of the 10th Annual International Symposium on Computer Architecture, pp. 124-131, 1983.
- Illinois: Mark S. Papamarcos, Janak H. Patel. A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories. Proceedings of the 11th Annual International Symposium on Computer Architecture, pp. 348-354, 1984.
- Rudolph/Segall: Larry Rudolph, Zary Segall. Dynamic Decentralized Cache Schemes for MIMD Parallel Processors. Proceedings of the 11th Annual International Symposium on Computer Architecture, pp. 340-347, 1984.
- Berkeley: R. H. Katz, S. J. Eggers, et al. Implementing a Cache Consistency Protocol. The 12th Annual International Symposium on Computer Architecture, June 1985, pp. 276-283.
- Dragon: E. McCreight. The Dragon Computer System: An Early Overview. Technical Report, Xerox Corporation, Sept. 1984.
- Synapse: S. Frank et al. Synapse Tightly Coupled Multiprocessors: A New Approach to Solve Old Problems.
- Consistency Models
- Tagless Caches
- A. Sembrant, E. Hagersten, D. Black-Schaffer. TLC: A Tag-less Cache for Reducing Dynamic First Level Cache Energy. MICRO, 2013.
- A. Sembrant, E. Hagersten, D. Black-Schaffer. The Direct-to-Data (D2D) Cache: Navigating the Cache Hierarchy with a Single Lookup. ISCA, 2014.
- A. Sembrant, E. Hagersten, D. Black-Schaffer. Data Placement Across the Cache Hierarchy: Minimizing Data Movement with Reuse-Aware Placement. ICCD, 2016.
- A. Sembrant, E. Hagersten, D. Black-Schaffer. A Split Cache Hierarchy for Enabling Data-Oriented Optimizations. HPCA, 2017.
- RISC
- Other Topics
- Books
- Patents
- Andreas I. Moshovos, Scott E. Breach, Terani N. Vijaykumar, Gurindar S. Sohi. U.S. Patent 5,781,752. Issued July 14, 1998.