- Student Information Sheet – please fill out this form, attach a recent recognizable photograph, and turn it in Monday, February 7th.
This section lists the papers referenced in class. Some links may require you to log in using your UT EID if accessed off-campus.
Processor Microarchitectures
- J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, D. Shippy. Introduction to the Cell multiprocessor. 2005.
- H. Corporaal. Design of Transport Triggered Architectures. 4th Great Lakes Symposium on VLSI, 1995.
- J.R. Goodman et al. PIPE: A VLSI Decoupled Architecture. ISCA-12, 1985.
- Gurindar S. Sohi et al. Multiscalar Processors. ISCA, 1995.
- James E. Smith et al. Implementing Precise Interrupts in Pipelined Processors. IEEE Transactions on Computers, Vol. 37, No 5, May 1988.
- Richard M. Russell. The CRAY-1 computer system. Commun. ACM 21, 1 (January 1978), 63-72.
- John H. Kelm, Daniel R. Johnson, Matthew R. Johnson, Neal C. Crago, William Tuohy, Aqeel Mahesri, Steven S. Lumetta, Matthew I. Frank, Sanjay J. Patel. Rigel: an architecture and scalable programming interface for a 1000-core accelerator. ISCA 2009.
- James E. Thornton. The CDC 6600 Project. Annals of the History of Computing, vol. 2, no. 4, pp. 338-348, Oct.-Dec. 1980.
- A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph and M. Snir. The NYU Ultracomputer; Designing an MIMD Shared Memory Parallel Computer. IEEE Transactions on Computers, vol. C-32, no. 2, pp. 175-189, Feb. 1983.
- Adrián Cristal, Daniel Ortega, Josep Llosa, Mateo Valero. Kilo-instruction Processors. ISHPC 2003.
Out-of-Order and Superscalar
- Tomasulo, R. M. An Efficient Algorithm for Exploiting Multiple Arithmetic Units. IBM Journal of Research and Development, 1967.
- Arthur H. Veen. Dataflow machine architecture. ACM Computing Surveys (CSUR), 1986.
- J. R. Gurd, C. C. Kirkham, and I. Watson. The Manchester Prototype Dataflow Computer. Communications of the ACM, 1985.
- H. Corporaal. Design of transport triggered architectures. VLSI, 1994. Design Automation of High Performance VLSI Systems.
- Yale Patt, Wen-mei Hwu, and Michael Shebanow. HPS, a new microarchitecture: rationale and introduction. MICRO-18, 1985.
- Yale Patt, Stephen W. Melvin, Wen-mei Hwu, and Michael Shebanow. Critical issues regarding HPS, a high performance microarchitecture. MICRO-18, 1985.
- James E. Smith. Decoupled Access/Execute Computer. 1984. (revised journal version)
- Subbarao Palacharla, Norman Jouppi, J.E. Smith. Complexity-Effective Superscalar Processors. ISCA, 1997.
- Jared Stark, Mary D. Brown, and Yale N. Patt. On Pipelining Dynamic Instruction Scheduling Logic. MICRO'00, 2000.
- M.D. Smith, M. Johnson, M.A. Horowitz. Limits on multiple instruction issue. ASPLOS-3, 1989.
- Mattan Erez et al. Spills, Fills, and Kills: An Architecture for Reducing Register-Memory Traffic. Technical report, Concurrent VLSI Architecture (TR-23), Stanford University, July 2000.
- Jack B. Dennis, David P. Misunas. A preliminary architecture for a basic data-flow processor. ISCA 1975.
- Arvind and R. S. Nikhil. Executing a program on the MIT tagged-token dataflow architecture. IEEE Transactions on Computers, vol. 39, no. 3, pp. 300-318, Mar 1990.
Simultaneous Multithreading
- Burton Smith. Architecture and applications of the HEP multiprocessor computer system. Proc. SPIE, vol. 298 Real-Time Signal Processing IV, 1981, pp. 241-248.
- Mario Nemirovsky, Forrest Brewer, Roger C. Wood. DISC: Dynamic Instruction Stream Computer. MICRO'91, 1991.
- Donalson, D.; Serrano, M.; Wood, R.; Nemirovsky, M. DISC: dynamic instruction stream computer: an evaluation of performance. Proceedings of the Twenty-Sixth Hawaii International Conference on System Sciences, 1993.
- Hirata, H.; Kimura, K.; Nagamine, S.; Mochizuki, Y.; Nishimura, A.; Nakase, Y.; Nishizawa, T. An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads. ISCA-19, 1992.
- D.M. Tullsen, S.J. Eggers, H.M. Levy. Simultaneous Multithreading: Maximizing On-Chip Parallelism. Proceedings of ISCA-22, June 1995.
- Robert S. Chappell et al. Simultaneous subordinate microthreading (SSMT). ISCA-26, 1999.
Future Trends
- H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, D. Burger. Dark Silicon and the End of Multicore Scaling. ISCA, 2011.
- R. Baumann. Soft Errors in Advanced Computer Systems. IEEE Design and Test of Computers, 2005.
- Yale Patt. Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution. Proceedings of the IEEE, vol. 89 no 11, 2001.
- Charles Leiserson, Neil Thompson, Joel Emer, Bradley Kuszmaul, Butler Lampson, Daniel Sanchez, Tao Schardl. There’s plenty of room at the Top: What will drive computer performance after Moore’s law? Science, Vol. 368, No. 6495, 2020.
Superblocks and Hyperblocks
Trace Cache
- Stephen W. Melvin and Yale N. Patt. Performance benefits of large execution atomic units in dynamically scheduled machines. ICS 3, 1989.
- Alexander Peleg and Uri Weiser. Dynamic flow instruction cache memory organized around trace segments independent of virtual address line. U.S. Patent 5381533, 1994.
- Daniel H. Friendly, Sanjay J. Patel, and Yale N. Patt. Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism. MICRO'97, 1997.
- Sanjay J. Patel, Marius Evers, and Yale N. Patt. Improving trace cache effectiveness with branch promotion and trace packing. ISCA 25, 1998.
- Eric Rotenberg, Jim Smith, and Steve Bennett. Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching. MICRO'96, 1996.
- Eric Rotenberg, Quinn Jacobson, Yiannakis Sazeides, and Jim Smith. Trace processors. MICRO'97, 1997.
- Bryan Black, Bohuslav Rychlik, and John Paul Shen. The block-based trace cache. ISCA 26, 1999.
- Daniel Friendly, Sanjay Patel, and Yale Patt. Putting the fill unit to work. MICRO 31, 1998.
Cache Management Techniques
- Wen-Hann Wang and Jean-Loup Baer. On the inclusion properties for multi-level cache hierarchies. ISCA, 1988.
- Moinuddin K. Qureshi, David Thompson, and Yale N. Patt. The V-Way Cache: Demand-Based Associativity via Global Replacement. ISCA, 2005.
- Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu, and Yale N. Patt. A Case for MLP-Aware Cache Replacement. ISCA, 2006.
- Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely Jr., and Joel Emer. Adaptive Insertion Policies for High Performance Caching. ISCA, 2007.
- Gennady Pekhimenko et al. Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches. PACT'12, 2012.
- Norman P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. ISCA, 1990.
- M. K. Qureshi, M. A. Suleman, and Y. N. Patt. Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines. HPCA, 2007.
- Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry. Exploiting Compressed Block Size as an Indicator of Future Reuse. HPCA, 2015.
Data prefetching
Runahead Execution
- James Dundas and Trevor Mudge. Improving data cache performance by pre-executing instructions under a cache miss. ICS-11, 1997.
- Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt. Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors. HPCA-9, 2003.
- Onur Mutlu, Hyesoon Kim, and Yale N. Patt. Techniques for Efficient Processing in Runahead Execution Engines. ISCA-32, 2005.
- Onur Mutlu, Hyesoon Kim, and Yale N. Patt. Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns. MICRO, 2005.
Branch Prediction
- James E. Smith. A Study of Branch Prediction Strategies. ISCA-8, 1981.
- Tse-Yu Yeh and Yale Patt. Two-Level Adaptive Training Branch Prediction. MICRO-24, 1991.
- Tse-Yu Yeh and Yale Patt. Alternative implementations of two-level adaptive branch prediction. ISCA-19, 1992.
- Shien-Tai Pan, Kimming So, Joseph T. Rahmeh. Improving the accuracy of dynamic branch prediction using branch correlation. ASPLOS-V, 1992.
- Scott McFarling. Combining Branch Predictors. WRL Technical Note TN-36, 1993.
- Ravi Nair. Dynamic path-based branch correlation. MICRO-28, 1995.
- Eric Sprangle et al. The Agree Predictor: A Mechanism For Reducing Negative Branch History Interference. ISCA-24, 1997.
- Daniel A. Jiménez and Calvin Lin. Dynamic Branch Prediction with Perceptrons. HPCA-7, 2001.
- Andre Seznec. Analysis of the OGEHL predictor. ISCA-32, 2005.
- Andre Seznec, Pierre Michaud. A case for (partially) tagged Geometric History Length Branch Prediction. Journal of Instruction Level Parallelism, Feb. 2006.
- David N. Armstrong, Hyesoon Kim, Onur Mutlu, and Yale N. Patt. Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery. MICRO, 2004.
- I-Cheng K. Chen, John T. Coffey, Trevor N. Mudge. Analysis of branch prediction via data compression. ASPLOS, 1996.
- Marius Evers, Sanjay J. Patel, Robert S. Chappell, Yale N. Patt. An analysis of correlation and predictability: what makes two-level branch predictors work. ISCA, 1998.
Predication
Block-Structured ISA
Measurements
Cache Coherence
- Goodman: J. R. Goodman, "Using Cache Memory to Reduce Processor-Memory Traffic", Proceedings of the 10th Annual International Symposium on Computer Architecture, pp. 124-131, 1983.
- Illinois: Mark S. Papamarcos, Janak H. Patel, "A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories", Proceedings of the 11th Annual International Symposium on Computer Architecture, pp. 348-354, 1984.
- Rudolph/Segall: Larry Rudolph, Zary Segall, "Dynamic Decentralized Cache Schemes for MIMD Parallel Processors", Proceedings of the 11th Annual International Symposium on Computer Architecture, pp. 340-347, 1984.
- Berkeley: Katz, R.H., S.J. Eggers, et al., "Implementing a Cache Consistency Protocol", The 12th Annual International Symposium on Computer Architecture, June 1985, pp. 276-283.
- Dragon: McCreight, E., "The Dragon computer system: An early overview", Technical Report, Xerox Corporation, Sept. 1984.
- Synapse: Frank, S. et al., "Synapse tightly coupled multiprocessors: A New Approach to Solve Old Problems".
Consistency Models
Tagless Caches
- Sembrant, A., Hagersten, E., Black-Schaffer, D. TLC: A tag-less cache for reducing dynamic first level cache energy. MICRO, 2013.
- Sembrant, A., Hagersten, E., Black-Schaffer, D. The Direct-to-Data (D2D) Cache: Navigating the cache hierarchy with a single lookup. ISCA, 2014.
- Sembrant, A., Hagersten, E., Black-Schaffer, D. Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement. ICCD, 2016.
- Sembrant, A., Hagersten, E., Black-Schaffer, D. A split cache hierarchy for enabling data-oriented optimizations. HPCA, 2017.
RISC
Other Topics
Books
Patents
- Andreas I. Moshovos, Scott E. Breach, Terani N. Vijaykumar, Gurindar S. Sohi. U.S. Patent 5,781,752. Issued July 14, 1998.