Date: Fri, 12 Apr 2019 19:15:49 -0500
From: "Yale N. Patt"
To: a.deshmukh@utexas.edu, mohammadbehnia@utexas.edu, chestercai@utexas.edu
Subject: Re: Question about cache tag stores for LRU replacement policies.
Message-ID: <20190413001549.GD12617@ece.utexas.edu>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
A student writes. I first hesitated to send this to you all since it is
mostly esoteric and beyond what you should expect to see on the midterm.
However, it may provide some insights so I am forwarding it to you all with
the caveat that it is not the first thing to study for the midterm.
> Greetings Dr. Patt.
>
> I had a question regarding the bits required in the tag store for various
> LRU replacement policies, and how it is affected by the associativity of
> the cache. I thought about it and came to the following conclusion, but I
> would like to clarify whether my thought process is correct.
>
> For the victim-next victim policy, you would need two bits for each cache
> line to store the victim and the next victim status, giving you 2*N
> additional bits, where N is the associativity of the cache.
That is correct, 2N bits for each set, 8 bits for 4-way.
If we use this mechanism for N greater than 4-way, we run into a problem.
You recall that there are times when one of the ways that is neither the
victim (V) nor the next victim (NV) must be selected as the new NV. For 4-way,
there are two such ways, and we flip a coin to decide which is to become NV.
For 8-way, there are six such ways, so the mechanism becomes more cumbersome.
I have actually never seen the V-NV scheme implemented in anything other than
a 4-way cache.
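A minimal Python sketch of one 4-way V-NV set may make this concrete. The
storage cost (a 2-bit status per way, 2N bits per set) and the miss-time coin
flip among the two neither-V-nor-NV ways come from the discussion above; the
hit-time update rules below are my assumption of one reasonable policy, not
something the text fixes.

```python
import random

# Victim / next-victim (V-NV) replacement for one 4-way set.
# Each way carries a 2-bit status: V, NV, or neither.

NEITHER, NV, V = 0, 1, 2

class VictimNextVictim:
    def __init__(self, num_ways=4):
        self.status = [NEITHER] * num_ways
        self.status[0] = V                  # arbitrary initial V and NV
        self.status[1] = NV

    def _choose_new_nv(self, exclude=None):
        # Pick the new NV among ways that are currently neither V nor NV;
        # for 4-way there are two candidates, hence the coin flip.
        candidates = [w for w, s in enumerate(self.status)
                      if s == NEITHER and w != exclude]
        self.status[random.choice(candidates)] = NV

    def miss(self):
        """Return the way to replace; old NV becomes V, a new NV is chosen."""
        evicted = self.status.index(V)
        self.status[evicted] = NEITHER
        self.status[self.status.index(NV)] = V
        self._choose_new_nv(exclude=evicted)  # freshly filled way is not NV
        return evicted

    def hit(self, way):
        if self.status[way] == V:           # assumed rule: accessed V is
            self.status[way] = NEITHER      # cleared, NV is promoted to V,
            self.status[self.status.index(NV)] = V
            self._choose_new_nv(exclude=way)  # and a new NV is chosen
        elif self.status[way] == NV:        # assumed rule: accessed NV is
            self.status[way] = NEITHER      # cleared and a new NV is chosen
            self._choose_new_nv(exclude=way)
```

Note how `_choose_new_nv` is exactly the cumbersome part for 8-way: the
candidate list grows from two ways to six.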
> For a pseudo LRU, we would need 1 bit for a two-way cache, and 3 bits for
> a four-way cache, which makes sense. Would you then need N-1 bits in general
> for an N-way set associative cache, since you need to perform N - 1
> comparisons to determine the least recently used way?
Yes, assuming the less complicated and usual case where N is a power of two.
You can think of the bits as nodes in a binary tree. With eight leaf nodes,
you have seven non-leaf nodes, each corresponding to one bit. Four bits for
the comparisons of the eight leaf nodes, two bits for the comparisons of those
four bits, and one bit for the comparison of those two bits. The larger N is,
the worse the approximation to LRU becomes.
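To make the tree of bits concrete, here is a small Python sketch of
tree-based pseudo-LRU for an N-way set (the bit convention, 0 = left subtree
is the LRU side, is an arbitrary choice of mine):

```python
# Tree-based pseudo-LRU: N-1 bits form a binary tree over the N ways.
# Each bit records which subtree was touched LESS recently
# (0 = left, 1 = right). An access flips the bits on its root-to-leaf
# path to point away from the accessed way; finding the victim follows
# the bits from the root down.

class TreePLRU:
    def __init__(self, num_ways):
        assert num_ways & (num_ways - 1) == 0, "N must be a power of two"
        self.n = num_ways
        self.bits = [0] * (num_ways - 1)   # N-1 tree bits, root at index 0

    def access(self, way):
        """Update the tree bits so they point away from 'way'."""
        node, lo, hi = 0, 0, self.n
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:                  # went left: right side is now LRU
                self.bits[node] = 1
                node, hi = 2 * node + 1, mid
            else:                          # went right: left side is now LRU
                self.bits[node] = 0
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the tree bits to the (pseudo-)least recently used way."""
        node, lo, hi = 0, 0, self.n
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:       # left subtree is the LRU side
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo
```

Even at 4-way the approximation shows: after accesses 0, 1, 2, 3, 0 the true
LRU way is 1, but the tree points at 2, because the root bit only remembers
that the left half was touched most recently.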
> As for a "perfect" LRU policy, I am confused as to how that would be
> implemented. I get that a two-way cache would need just 1 LRU bit for the
> set, but how would it work for a four-way or eight-way cache? In general, I
> am kind of unsure as to how a "perfect" LRU is implemented.
First, no one actually implements perfect LRU. To do so, think of a graph
consisting of n! nodes. Each node represents one ordering of when the
n ways were accessed last. Based on which way just got accessed, we move to
the node that reflects the "LRU-ness" after that access. Thus, from each node
we need an arc to the appropriate next node if we access way 1, another arc if
we access way 2, etc., plus one more arc for the case where the access results
in a cache miss. The
combinational logic you would need is exactly what you would need in
generating the next state variables in a finite state machine (which of course
you all remember from EE316!). This would take far too much logic and time,
so no one does it that way; instead, we settle for a scheme like one of those
above.
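For comparison, here is a short Python sketch that keeps perfect LRU as an
explicit recency ordering, plus the ceil(log2(n!)) bit count an exact encoding
of one of those n! states would need (the helper name is mine, chosen for
illustration):

```python
from math import ceil, factorial, log2

# Perfect LRU for one N-way set, kept as an explicit recency ordering.
# The ordering is one of N! possible states -- the N!-node graph above.

class PerfectLRU:
    def __init__(self, num_ways):
        # order[0] is the least recently used way, order[-1] the most recent
        self.order = list(range(num_ways))

    def access(self, way):
        """On a hit, move the way to the most-recent position."""
        self.order.remove(way)
        self.order.append(way)

    def victim(self):
        """On a miss, replace the least recently used way."""
        return self.order[0]

def lru_state_bits(num_ways):
    """Bits needed to encode the exact recency ordering of one set."""
    return ceil(log2(factorial(num_ways)))
```

For 4-way that is ceil(log2(24)) = 5 bits per set, and for 8-way
ceil(log2(40320)) = 16 bits; the bits are the cheap part, and it is the
next-state logic over all those states that makes the scheme impractical.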
> Thank You
> <>
Good luck on the midterm.