Sun, 4 December 2016, 20:34

A student writes:

> Hello Dr. Patt,
> I have 2 questions regarding caches.
> 1. I'm confused on the definition of "Allocate on Write miss".
> As I understand, normal cache allocates on both read and write misses.

Actually, there is no "normal" case.  When we get a read miss, we always
bring in the entire cache line.  When we get a write miss, we can do one of
three things: (a) we can bring in the entire line and then overwrite
the part of it that the write targets, (b) we can bypass the cache and just
write the data to memory, or (c) we can allocate a line in the cache for the
address that missed but NOT bring in that line's contents from memory.

If we do (c), we write the data from the write miss into that line in the
cache.  To indicate that only the datum we have written is valid, we set a
valid bit for that datum.  The other data in the line get their own valid
bits, which we set to invalid.  A cache that divides each line into multiple
sub-lines, with a separate valid bit for each sub-line, is called a sector
cache.  The size of the sub-line is usually the size of the datum written.

A common use of a sector cache and the allocate on write miss is the building
of a stack frame where we are pushing data onto the stack.  It makes no sense
to read in a cache line if we are going to immediately thereafter write over
every byte in the cache line.
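
To make option (c) concrete, here is a minimal sketch of a sector cache that
allocates on a write miss without fetching the line.  Everything in it is an
assumption made for illustration: the 16-byte line, the 4-byte sub-line, the
direct-mapped organization, and the names SectorLine and SectorCache.  It also
ignores what happens to the line being replaced, which a real cache would have
to write back if it were dirty.

```python
from dataclasses import dataclass, field

LINE_BYTES = 16                       # assumed line size
SECTOR_BYTES = 4                      # assumed sub-line size (one datum)
SECTORS = LINE_BYTES // SECTOR_BYTES  # valid bits per line


@dataclass
class SectorLine:
    tag: int
    # One valid bit per sub-line; a freshly allocated line has none valid.
    valid: list = field(default_factory=lambda: [False] * SECTORS)
    data: list = field(default_factory=lambda: [0] * SECTORS)


class SectorCache:
    """Toy direct-mapped sector cache (hypothetical, for illustration only)."""

    def __init__(self, num_lines=4):
        self.num_lines = num_lines
        self.lines = [None] * num_lines

    def _split(self, addr):
        sector = (addr % LINE_BYTES) // SECTOR_BYTES
        line_no = addr // LINE_BYTES
        return line_no // self.num_lines, line_no % self.num_lines, sector

    def write(self, addr, value):
        tag, index, sector = self._split(addr)
        line = self.lines[index]
        if line is None or line.tag != tag:
            # Write miss: allocate the line but do NOT read it from memory.
            # (Eviction of the old line is ignored in this sketch.)
            line = SectorLine(tag)
            self.lines[index] = line
        line.data[sector] = value
        line.valid[sector] = True     # only the datum just written is valid

    def read(self, addr):
        tag, index, sector = self._split(addr)
        line = self.lines[index]
        if line is not None and line.tag == tag and line.valid[sector]:
            return line.data[sector]
        return None                   # a real cache would fetch from memory here


# Pushing four words, as when building a stack frame, fills the whole line
# one sub-line at a time without ever reading the line from memory.
cache = SectorCache()
for i, addr in enumerate(range(0x100, 0x110, SECTOR_BYTES)):
    cache.write(addr, i)
```

After the first write only one valid bit in the line is set; after the fourth,
all of them are, and the line is complete even though it was never fetched.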

> Does that mean "Allocate on write miss" allocates only on write miss?

See above.

> On a similar topic, kindly provide more information on sector cache and why
> having multiple valid bits for different segments of cache helps "allocate
> on write miss" cache.

See above.

> 2. In LRU replacement for 4-way set associative cache, we use 4 pairs of
> (V,NV) bits for each tag entry in the set, which constitutes 8 bits per set.
> Instead, can't we use 2 bits to specify V and 2 more to specify NV and get
> away with 4 extra bits per set?

Hah!  Excellent point.  You certainly could.  The first cache I was involved
with that used V, NV bits did it the way I taught you, so I always thought of
it that way.  But it could very well be that your way is better.  I would have
to look at the logic design of both schemes to see if your way creates too long
a propagation delay.  If not, yes, saving four bits per set is a win.  Thank
you.
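
The two encodings can be compared with a small sketch.  It assumes that in
each set exactly one way has its V bit set and exactly one has its NV bit set,
which is what lets a 2-bit way number stand in for four separate bits.  The
function names and the layout of the 4-bit field are hypothetical.

```python
WAYS = 4


def pack_indices(v_way, nv_way):
    """Pack two 2-bit way numbers into one 4-bit field: [V way | NV way]."""
    assert 0 <= v_way < WAYS and 0 <= nv_way < WAYS
    return (v_way << 2) | nv_way


def unpack_indices(bits):
    """Recover (V way, NV way) from the 4-bit field."""
    return (bits >> 2) & 0b11, bits & 0b11


def to_per_way_bits(bits):
    """Expand the 4-bit field into four (V, NV) pairs, one per way (8 bits)."""
    v_way, nv_way = unpack_indices(bits)
    return [(int(w == v_way), int(w == nv_way)) for w in range(WAYS)]


def from_per_way_bits(pairs):
    """Compress four (V, NV) pairs back into the 4-bit field."""
    v_way = [v for v, _ in pairs].index(1)
    nv_way = [nv for _, nv in pairs].index(1)
    return pack_indices(v_way, nv_way)


compact = pack_indices(2, 0)          # V is way 2, NV is way 0
per_way = to_per_way_bits(compact)    # the 8-bit form, one pair per way
```

In hardware, to_per_way_bits corresponds to a pair of 2-to-4 decoders.  That
decode step is the extra logic whose delay would have to be checked before
calling the 4-bit scheme a win.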

> Thanks,
> <<name withheld to protect the student who wishes to save 4 bits per set>>

Good luck on the last lab and final exam.

Yale Patt