EE 382N: Handouts

Administrative Handouts

Student Information Sheet – please fill out this form, attach a recent recognizable photograph, and turn it in Monday, January 26th

Exam 1 Preparation

previous exams: spring 2006, spring 2008, spring 2010, spring 2012, spring 2014, spring 2016, spring 2018, spring 2020, spring 2022, spring 2024

Class Slides

Papers

This section lists papers referenced in class. Some links may require you to log in with your UT EID when accessed from off campus.

Processor Microarchitectures
- J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, D. Shippy. Introduction to the Cell multiprocessor. 2005.
- H. Corporaal. Design of Transport Triggered Architectures. 4th Great Lakes Symposium on VLSI, 1995.
- J.R. Goodman et al. PIPE: A VLSI Decoupled Architecture. ISCA-12, 1985.
- Gurindar S. Sohi et al. Multiscalar Processors. ISCA, 1995.
- James E. Smith et al. Implementing Precise Interrupts in Pipelined Processors. IEEE Transactions on Computers, Vol. 37, No 5, May 1988.
- Richard M. Russell. The CRAY-1 computer system. Commun. ACM 21, 1 (January 1978), 63-72.
- John H. Kelm, Daniel R. Johnson, Matthew R. Johnson, Neal C. Crago, William Tuohy, Aqeel Mahesri, Steven S. Lumetta, Matthew I. Frank, Sanjay J. Patel. Rigel: an architecture and scalable programming interface for a 1000-core accelerator. ISCA 2009.
- James E. Thornton. The CDC 6600 Project. Annals of the History of Computing, vol. 2, no. 4, pp. 338-348, Oct.-Dec. 1980.
- A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph and M. Snir. The NYU Ultracomputer; Designing an MIMD Shared Memory Parallel Computer. IEEE Transactions on Computers, vol. C-32, no. 2, pp. 175-189, Feb. 1983.
- Adrián Cristal, Daniel Ortega, Josep Llosa, Mateo Valero. Kilo-instruction Processors. ISHPC 2003.
Out-of-Order and Superscalar
- Tomasulo, R. M. An Efficient Algorithm for Exploiting Multiple Arithmetic Units. IBM Journal of Research and Development, 1967.
- Arthur H. Veen. Dataflow machine architecture. ACM Computing Surveys (CSUR), 1986.
- J. R. Gurd, C. C. Kirkham, and I. Watson. The Manchester Prototype Dataflow Computer. Communications of the ACM, 1985.
- H. Corporaal. Design of transport triggered architectures. VLSI, 1994. Design Automation of High Performance VLSI Systems.
- Yale Patt, Wen-mei Hwu, and Michael Shebanow. HPS, a new microarchitecture: rationale and introduction. MICRO-18, 1985.
- Yale Patt, Stephen W. Melvin, Wen-mei Hwu, and Michael Shebanow. Critical issues regarding HPS, a high performance microarchitecture. MICRO-18, 1985.
- James E. Smith. Decoupled Access/Execute Computer. 1984. (revised journal version)
- Subbarao Palacharla, Norman Jouppi, J.E. Smith. Complexity-Effective Superscalar Processors. ISCA, 1997.
- Jared Stark, Mary D. Brown, and Yale N. Patt. On Pipelining Dynamic Instruction Scheduling Logic. MICRO'00, 2000.
- M.D. Smith, M. Johnson, M.A. Horowitz. Limits on multiple instruction issue. ASPLOS-3, 1989.
- Mattan Erez et al. Spills, Fills, and Kills: An Architecture for Reducing Register-Memory Traffic. Technical report, Concurrent VLSI Architecture (TR-23), Stanford University, July 2000.
- Jack B. Dennis, David P. Misunas. A preliminary architecture for a basic data-flow processor. ISCA 1975.
- Arvind and R. S. Nikhil. Executing a program on the MIT tagged-token dataflow architecture. IEEE Transactions on Computers, vol. 39, no. 3, pp. 300-318, Mar 1990.
Simultaneous Multithreading
- Burton Smith. Architecture and applications of the HEP multiprocessor computer system. Proc. SPIE, vol. 298 Real-Time Signal Processing IV, 1981, pp. 241-248.
- Mario Nemirovsky, Forrest Brewer, Roger C. Wood. DISC: Dynamic Instruction Stream Computer. MICRO'91, 1991.
- Donalson, D.; Serrano, M.; Wood, R.; Nemirovsky, M. DISC: dynamic instruction stream computer - an evaluation of performance. Proceedings of the Twenty-Sixth Hawaii International Conference on System Sciences, 1993.
- Hirata, H.; Kimura, K.; Nagamine, S.; Mochizuki, Y.; Nishimura, A.; Nakase, Y.; Nishizawa, T. An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads. ISCA-19, 1992.
- D.M. Tullsen, S.J. Eggers, H.M. Levy. Simultaneous Multithreading: Maximizing On-Chip Parallelism. Proceedings of ISCA-22, June 1995.
- Robert S. Chappell, et al. Simultaneous subordinate microthreading (SSMT). ISCA-26, 1999.
Future Trends
- H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, D. Burger, A. Seznec, P. Michaud. Dark Silicon and the End of Multicore Scaling. ISCA, 2011.
- R. Baumann. Soft Errors in Advanced Computer Systems. IEEE Design and Test of Computers, 2005.
- Yale Patt. Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution. Proceedings of the IEEE, vol. 89 no 11, 2001.
- Charles Leiserson, Neil Thompson, Joel Emer, Bradley Kuszmaul, Butler Lampson, Daniel Sanchez, Tao Schardl. There's plenty of room at the Top: What will drive computer performance after Moore's law? Science, Vol. 368, No. 6495.
Superblocks and Hyperblocks
- Scott Mahlke, et al. Effective compiler support for predicated execution using the hyperblock. MICRO-25, 1992.
- Pohua P. Chang, Scott A. Mahlke, et al. IMPACT: an architectural framework for multiple-instruction-issue processors. ISCA-18, 1991.
- Francis Tseng, Yale N. Patt. Achieving Out-of-Order Performance with Almost In-Order Complexity. ISCA 2008.
Trace Cache
- Stephen W. Melvin and Yale N. Patt. Performance benefits of large execution atomic units in dynamically scheduled machines. ICS 3, 1989.
- Alexander Peleg and Uri Weiser. Dynamic flow instruction cache memory organized around trace segments independent of virtual address line. U.S. Patent 5381533, 1994.
- Daniel H. Friendly, Sanjay J. Patel, and Yale N. Patt. Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism. MICRO'97, 1997.
- Sanjay J. Patel, Marius Evers, and Yale N. Patt. Improving trace cache effectiveness with branch promotion and trace packing. ISCA 25, 1998.
- Eric Rotenberg, Jim Smith, and Steve Bennett. Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching. MICRO'96, 1996.
- Eric Rotenberg, Quinn Jacobson, Yiannakis Sazeides, and Jim Smith. Trace processors. MICRO'97, 1997.
- Bryan Black, Bohuslav Rychlik, and John Paul Shen. The block-based trace cache. ISCA 26, 1999.
- Daniel Friendly, Sanjay Patel, and Yale Patt. Putting the fill unit to work. MICRO 31, 1998.
Cache Management Techniques
- Wen-Hann Wang and Jean-Loup Baer. On the inclusion properties for multi-level cache hierarchies. ISCA, 1988.
- Moinuddin K. Qureshi, David Thompson, and Yale N. Patt. The V-Way Cache: Demand-Based Associativity via Global Replacement. ISCA, 2005.
- Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu, and Yale N. Patt. A Case for MLP-Aware Cache Replacement. ISCA, 2006.
- Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely Jr., and Joel Emer. Adaptive Insertion Policies for High Performance Caching. ISCA, 2007.
- Gennady Pekhimenko et al. Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches. PACT'12, 2012.
- Norman P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. ISCA, 1990.
- M. K. Qureshi, M. A. Suleman, and Y. N. Patt. Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines. HPCA, 2007.
- Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Phillip P. Gibbons, Michael A. Kozuch, and Todd C. Mowry. Exploiting Compressed Block Size as an Indicator of Future Reuse. HPCA, 2015.
Data prefetching
- Eiman Ebrahimi, Onur Mutlu, Yale N. Patt. Techniques for Bandwidth-Efficient Prefetching of Linked Data Structures in Hybrid Prefetching Systems. HPCA, 2009.
- Eiman Ebrahimi, Onur Mutlu, Chang Joo Lee, Yale N. Patt. Coordinated control of multiple prefetchers in multi-core systems. MICRO, 2009.
Runahead Execution
- James Dundas and Trevor Mudge. Improving data cache performance by pre-executing instructions under a cache miss. ICS-11, 1997.
- Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt. Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors. HPCA-9, 2003.
- Onur Mutlu, Hyesoon Kim, and Yale N. Patt. Techniques for Efficient Processing in Runahead Execution Engines. ISCA-32, 2005.
- Onur Mutlu, Hyesoon Kim, and Yale N. Patt. Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns. MICRO, 2005.
Branch Prediction
- James E. Smith. A Study of Branch Prediction Strategies. ISCA-8, 1981.
- Tse-Yu Yeh and Yale Patt. Two-Level Adaptive Training Branch Prediction. MICRO-24, 1991.
- Tse-Yu Yeh and Yale Patt. Alternative implementations of two-level adaptive branch prediction. ISCA-19, 1992.
- Shien-Tai Pan, Kimming So, Joseph T. Rahmeh. Improving the accuracy of dynamic branch prediction using branch correlation. ASPLOS-V, 1992.
- Scott McFarling. Combining Branch Predictors. WRL Technical Note TN-36, 1993.
- Ravi Nair. Dynamic path-based branch correlation. MICRO-28, 1995.
- Eric Sprangle, et al. The Agree Predictor: A Mechanism For Reducing Negative Branch History Interference. ISCA-24, 1997.
- Daniel A. Jiménez and Calvin Lin. Dynamic Branch Prediction with Perceptrons. HPCA-7, 2001.
- Andre Seznec. Analysis of the OGEHL predictor. ISCA-32, 2005.
- Andre Seznec, Pierre Michaud. A case for (partially) tagged Geometric History Length Branch Prediction. Journal of Instruction Level Parallelism, Feb. 2006.
- David N. Armstrong, Hyesoon Kim, Onur Mutlu, and Yale N. Patt. Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery. MICRO, 2004.
- I-Cheng K. Chen, John T. Coffey, Trevor N. Mudge. Analysis of branch prediction via data compression. ASPLOS, 1996.
- Marius Evers, Sanjay J. Patel, Robert S. Chappell, Yale N. Patt. An analysis of correlation and predictability: what makes two-level branch predictors work. ISCA, 1998.
Predication
- Allen et al. Conversion of control dependence to data dependence. Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, 1983.
- Hyesoon Kim, Onur Mutlu, Jared Stark, and Yale N. Patt. Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution. MICRO, 2005.
Block-Structured ISA
- Stephen Melvin and Yale N. Patt. Exploiting Fine-grained Parallelism Through a Combination of Hardware and Software Techniques. ISCA-18, 1991.
- Eric Sprangle and Yale N. Patt. Facilitating superscalar processing via a combined static/dynamic register renaming scheme. MICRO-27, 1994.
- Eric Hao, et al. Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures. MICRO 29, 1996.
- Doug Burger, Stephen W. Keckler, Kathryn S. McKinley, et al. Scaling to the End of Silicon with EDGE Architectures. IEEE Computer, 37 (7), 2004.
Measurements
- Stephen Melvin and Yale N. Patt. SPAM: A Microcode Based Tool for Tracing Operating System Events. MICRO-20, 1987.
Cache Coherence
- Goodman: J. R. Goodman, "Using Cache Memory to Reduce Processor-Memory Traffic", Proceedings of the 10th Annual International Symposium on Computer Architecture, pp. 124-131, 1983 [pdf]
- Illinois: Mark S. Papamarcos, Janak H. Patel, "A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories", Proceedings of the 11th Annual International Symposium on Computer Architecture, pp. 348-354, 1984 [pdf]
- Rudolph/Segall: Larry Rudolph, Zary Segall, "Dynamic Decentralized Cache Schemes for MIMD Parallel Processors", Proceedings of the 11th Annual International Symposium on Computer Architecture, pp. 340-347, 1984 [pdf]
- Berkeley: Katz, R.H., S.J. Eggers, et al., "Implementing a Cache Consistency Protocol", The 12th Annual International Symposium on Computer Architecture, June 1985, pp. 276-283 [pdf]
- Dragon: McCreight, E., "The Dragon computer system: An early overview", Technical Report, Xerox Corporation, Sept. 1984.
- Synapse: Frank, S. et al., "Synapse tightly coupled multiprocessors: A New Approach to Solve Old Problems" [pdf]
Consistency Models
- Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Trans. on Computers, 1979.
Tagless Caches
- Sembrant, A., Hagersten, E., Black-Schaffer, D. TLC: A tag-less cache for reducing dynamic first level cache energy. MICRO, 2013.
- Sembrant, A., Hagersten, E., Black-Schaffer, D. The Direct-to-Data (D2D) Cache: Navigating the cache hierarchy with a single lookup. ISCA, 2014.
- Sembrant, A., Hagersten, E., Black-Schaffer, D. Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement. ICCD, 2016.
- Sembrant, A., Hagersten, E., Black-Schaffer, D. A split cache hierarchy for enabling data-oriented optimizations. HPCA, 2017.
RISC
- David A. Patterson, David R. Ditzel. The case for the reduced instruction set computer. ACM SIGARCH Computer Architecture News.
Other Topics
- Harvey Garner. The residue number system. IRE-AIEE-ACM '59.
- A. D. Booth. A Signed Binary Multiplication Technique. The Quarterly Journal of Mechanics and Applied Mathematics, January 1951.
- G. J. Myers, B. R. S. Buckingham. A hardware implementation of capability-based addressing. ACM SIGOPS 1980.
Books
- D. Siewiorek, C.G. Bell, A. Newell. Computer Structures: Principles and Examples. McGraw-Hill, 1982.
- P. Kogge. Architecture of Pipelined Computers. McGraw-Hill, 1981.
- B. Colwell. The Pentium Chronicles: The People, Passion, and Politics Behind Intel's Landmark Chips. Wiley-IEEE Computer Society Pr, 2005.
- T. Kidder. The Soul of A New Machine. Back Bay Books, 2000.
Patents
- Andreas I. Moshovos, Scott E. Breach, Terani N. Vijaykumar, Gurindar S. Sohi. U.S. Patent 5,781,752. Issued July 14, 1998.

x86 ISA

Other

Project advice from former students