- Student Information Sheet – please fill out this form, attach a recent recognizable photograph, and turn it in on Wednesday, February 3rd.
This section lists the papers referenced in class. Some links may require you to log in with your UT EID when accessed from off campus.
Processor Microarchitectures
- J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, D. Shippy. Introduction to the Cell multiprocessor. 2005.
- H. Corporaal. Design of Transport Triggered Architectures. 4th Great Lakes Symposium on VLSI, 1995.
- J.R. Goodman et al. PIPE: A VLSI Decoupled Architecture. ISCA-12, 1985.
- Gurindar S. Sohi et al. Multiscalar Processors. ISCA, 1995.
- James E. Smith et al. Implementing Precise Interrupts in Pipelined Processors. IEEE Transactions on Computers, Vol. 37, No. 5, May 1988.
Out-of-Order and Superscalar
- Tomasulo, R. M. An Efficient Algorithm for Exploiting Multiple Arithmetic Units. IBM Journal of Research and Development, 1967.
- Yale Patt, Wen-mei Hwu, and Michael Shebanow. HPS, a new microarchitecture: rationale and introduction. MICRO-18, 1985.
- Yale Patt, Stephen W. Melvin, Wen-mei Hwu, and Michael Shebanow. Critical issues regarding HPS, a high performance microarchitecture. MICRO-18, 1985.
- James E. Smith. Decoupled Access/Execute Computer. 1984. (revised journal version)
- Subbarao Palacharla, Norman Jouppi, J.E. Smith. Complexity-Effective Superscalar Processors. ISCA, 1997.
- M.D. Smith, M. Johnson, M.A. Horowitz. Limits on multiple instruction issue. ASPLOS-3, 1989.
- Mattan Erez et al. Spills, Fills, and Kills: An Architecture for Reducing Register-Memory Traffic. Concurrent VLSI Architecture technical report (TR-23), Stanford University, July 2000.
Simultaneous Multithreading
- Mario Nemirovsky, Forrest Brewer, Roger C. Wood. DISC: Dynamic Instruction Stream Computer. MICRO'91, 1991.
- Donalson, D.; Serrano, M.; Wood, R.; Nemirovsky, M. DISC: dynamic instruction stream computer - an evaluation of performance. Proceedings of the Twenty-Sixth Hawaii International Conference on System Sciences, 1993.
- Hirata, H.; Kimura, K.; Nagamine, S.; Mochizuki, Y.; Nishimura, A.; Nakase, Y.; Nishizawa, T. An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads. ISCA-19, 1992.
- D.M. Tullsen, S.J. Eggers, H.M. Levy. Simultaneous Multithreading: Maximizing On-Chip Parallelism. Proceedings of ISCA-22, June 1995.
- Robert S. Chappell et al. Simultaneous subordinate microthreading (SSMT). ISCA-26, 1999.
Future Trends
- H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, D. Burger. Dark Silicon and the End of Multicore Scaling. ISCA, 2011.
- R. Baumann. Soft Errors in Advanced Computer Systems. IEEE Design and Test of Computers, 2005.
- Yale Patt. Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution. Proceedings of the IEEE, vol. 89, no. 11, 2001.
Superblocks and Hyperblocks
Trace Cache
- Stephen W. Melvin and Yale N. Patt. Performance benefits of large execution atomic units in dynamically scheduled machines. ICS 3, 1989.
- Alexander Peleg and Uri Weiser. Dynamic flow instruction cache memory organized around trace segments independent of virtual address line. U.S. Patent 5381533, 1994.
- Daniel H. Friendly, Sanjay J. Patel, and Yale N. Patt. Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism. MICRO'97, 1997.
- Sanjay J. Patel, Marius Evers, and Yale N. Patt. Improving trace cache effectiveness with branch promotion and trace packing. ISCA 25, 1998.
- Eric Rotenberg, Jim Smith, and Steve Bennett. Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching. MICRO'96, 1996.
- Eric Rotenberg, Quinn Jacobson, Yiannakis Sazeides, and Jim Smith. Trace processors. MICRO'97, 1997.
- Bryan Black, Bohuslav Rychlik, and John Paul Shen. The block-based trace cache. ISCA 26, 1999.
- Daniel Friendly, Sanjay Patel, and Yale Patt. Putting the fill unit to work. MICRO 31, 1998.
Cache Management Techniques
- Wen-Hann Wang and Jean-Loup Baer. On the inclusion properties for multi-level cache hierarchies. ISCA, 1988.
- Moinuddin K. Qureshi, David Thompson, and Yale N. Patt. The V-Way Cache: Demand-Based Associativity via Global Replacement. ISCA, 2005.
- Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu, and Yale N. Patt. A Case for MLP-Aware Cache Replacement. ISCA, 2006.
- Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely Jr., and Joel Emer. Adaptive Insertion Policies for High Performance Caching. ISCA, 2007.
- Gennady Pekhimenko et al. Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches. PACT'12, 2012.
Runahead Execution
- James Dundas and Trevor Mudge. Improving data cache performance by pre-executing instructions under a cache miss. ICS-11, 1997.
- Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt. Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors. HPCA-9, 2003.
- Onur Mutlu, Hyesoon Kim, and Yale N. Patt. Techniques for Efficient Processing in Runahead Execution Engines. ISCA-32, 2005.
- Onur Mutlu, Hyesoon Kim, and Yale N. Patt. Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns. MICRO, 2005.
Branch Prediction
- James E. Smith. A Study of Branch Prediction Strategies. ISCA-8, 1981.
- Tse-Yu Yeh and Yale Patt. Two-Level Adaptive Training Branch Prediction. MICRO-24, 1991.
- Tse-Yu Yeh and Yale Patt. Alternative implementations of two-level adaptive branch prediction. ISCA-19, 1992.
- Shien-Tai Pan, Kimming So, Joseph T. Rahmeh. Improving the accuracy of dynamic branch prediction using branch correlation. ASPLOS-V, 1992.
- Scott McFarling. Combining Branch Predictors. WRL Technical Note TN-36, 1993.
- Ravi Nair. Dynamic path-based branch correlation. MICRO-28, 1995.
- Eric Sprangle et al. The Agree Predictor: A Mechanism For Reducing Negative Branch History Interference. ISCA-24, 1997.
- Daniel A. Jiménez and Calvin Lin. Dynamic Branch Prediction with Perceptrons. HPCA-7, 2001.
- Andre Seznec. Analysis of the OGEHL predictor. ISCA-32, 2005.
- Andre Seznec, Pierre Michaud. A case for (partially) tagged Geometric History Length Branch Prediction. Journal of Instruction Level Parallelism, Feb. 2006.
- David N. Armstrong, Hyesoon Kim, Onur Mutlu, and Yale N. Patt. Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery. MICRO, 2004.
Predication
Block-Structured ISA
Cache Coherence
- Goodman: J. R. Goodman, "Using Cache Memory to Reduce Processor-Memory Traffic", Proceedings of the 10th Annual International Symposium on Computer Architecture, pp. 124-131, 1983. [pdf]
- Illinois: Mark S. Papamarcos, Janak H. Patel, "A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories", Proceedings of the 11th Annual International Symposium on Computer Architecture, pp. 348-354, 1984. [pdf]
- Rudolph/Segall: Larry Rudolph, Zary Segall, "Dynamic Decentralized Cache Schemes for MIMD Parallel Processors", Proceedings of the 11th Annual International Symposium on Computer Architecture, pp. 340-347, 1984. [pdf]
- Berkeley: Katz, R.H., S.J. Eggers, et al., "Implementing a Cache Consistency Protocol", The 12th Annual International Symposium on Computer Architecture, June 1985, pp. 276-283. [pdf]
- Dragon: McCreight, E., "The Dragon computer system: An early overview", Technical Report, Xerox Corporation, Sept. 1984.
- Synapse: Frank, S. et al., "Synapse tightly coupled multiprocessors: A New Approach to Solve Old Problems". [pdf]
Consistency Models
Books
Patents
- Andreas I. Moshovos, Scott E. Breach, Terani N. Vijaykumar, Gurindar S. Sohi. U.S. Patent 5,781,752. Issued July 14, 1998.