another student follows up...
And, so, another email message:
Hello Prof. Patt,
I think I happened to be in the same TA session.
We discussed the following two scenarios:
1. We make the [index+byteinblock] bits limited
to the PageOffset field in the virtual address.
And translate the Page Number into Frame number
while we access the tag store in parallel.
2. We use some lower bits of the Page Number field
along with the PageOffset field to make the
[index + byteinblock] bits. And again perform
translation in parallel with the Tag store access.
We were not able to understand the advantage 1 has
over 2, or the motivation for doing 1, since limiting
our [index + byteinblock] bits to the Offset only
gives a smaller cache unless we increase associativity.
<<name withheld to protect the student who is trying to be helpful>>
If we limit the [index + byteinblock] to Offset only, the two
models are the same. Or, said another way, if we limit it as
stated, #2 is self-contradictory. Do you see why?
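To make the self-contradiction concrete, here is a small sketch (with hypothetical parameters: 4 KB pages, 64-byte blocks) that counts how many index bits must spill over into the Page Number field. When [index + byteinblock] fits entirely within the page offset, that count is zero, and scheme #2 degenerates into scheme #1.

```python
# Hypothetical parameters: 4 KB pages, 64-byte cache blocks.
PAGE_OFFSET_BITS = 12    # 4 KB page
BLOCK_OFFSET_BITS = 6    # 64-byte block

def index_bits_from_page_number(num_sets):
    """How many cache index bits must come from the virtual page number."""
    index_bits = num_sets.bit_length() - 1           # log2(num_sets), power of 2 assumed
    total = index_bits + BLOCK_OFFSET_BITS           # [index + byteinblock]
    return max(0, total - PAGE_OFFSET_BITS)

# 64 sets: index + byteinblock = 6 + 6 = 12 bits -- fits in the page
# offset, so zero bits come from the page number: the schemes coincide.
print(index_bits_from_page_number(64))     # 0

# 256 sets: 8 + 6 = 14 bits -- two bits must come from the page number,
# and only then is scheme #2 actually different from scheme #1.
print(index_bits_from_page_number(256))    # 2
```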
The other question (if I can correctly parse this into two questions)
is, "What is the advantage of #1 over #2?" Or, said another way, "Is
there any downside to #2?" Answer: yes, there is a downside to #2.
Just to put us all on the same page, scheme #2 is the virtually
indexed, physically tagged scheme I talked about in class.
Consider two processes that are sharing the same page of physical
memory but doing so in different places in their respective virtual
address spaces. In scheme #2, some of the index bits are coming
from the page number. If the two page numbers are different, the
cache block will end up in two different places in the cache, since
they will have different index bits. The identity (block number) of
the block is not a problem since you get that from the tag, which in
this case is the page frame number + the unmapped index bits.
What is the problem with having two copies of the block in the cache
at the same time? Wasted space? Sure. But worse, what happens if
the processor writes to one copy of that block?
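A sketch of that synonym problem under scheme #2, using assumed numbers (4 KB pages, 64-byte blocks, 256 sets, so two index bits come from the page number). Two processes map the same physical frame at different virtual page numbers, and the same physical block indexes into two different sets:

```python
# Assumed geometry: 4 KB pages, 64-byte blocks, 256 sets.
PAGE_OFFSET_BITS = 12
BLOCK_OFFSET_BITS = 6
INDEX_BITS = 8           # 2 of these 8 bits come from the page number

def cache_set(virtual_address):
    """Set index of the virtually indexed cache: VA bits [13:6]."""
    return (virtual_address >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

page_offset = 0x080      # same byte within the shared physical frame

# Process A maps the shared frame at virtual page 0x10;
# process B maps the very same frame at virtual page 0x11.
va_a = (0x10 << PAGE_OFFSET_BITS) | page_offset
va_b = (0x11 << PAGE_OFFSET_BITS) | page_offset

# Different page numbers -> different index bits -> two cache locations
# for one physical block, so a write to one copy strands the other.
print(hex(cache_set(va_a)))   # 0x2
print(hex(cache_set(va_b)))   # 0x42
```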
So, we have several solutions to this problem. (1) A back translation
table which keeps track of the locations in the cache that contain each
block. So a write to a block includes looking up the block in this table
to see if that same physical block is also stored elsewhere in the cache,
and if so, all copies of that block are updated. (2) Having the
operating system enforce sharing as follows: if multiple processes share
the same physical frame of memory, that frame must occupy the same virtual
page in each process's virtual address space.
Both schemes require extra work, so if the index bits are contained within
the page offset, scheme #1 does the job without either headache.
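The price of scheme #1 is the size constraint the student noticed: with all [index + byteinblock] bits inside the page offset, one way of the cache can span at most one page, so capacity grows only by adding ways. A sketch, assuming 4 KB pages:

```python
PAGE_SIZE = 4096   # bytes; 4 KB pages assumed

def max_cache_size(associativity):
    """Largest cache whose index is drawn entirely from page-offset bits.
    Each way is capped at one page; more capacity requires more ways."""
    return PAGE_SIZE * associativity

print(max_cache_size(1))   # 4096  : direct-mapped, capped at one page
print(max_cache_size(8))   # 32768 : 8-way set-associative gets you 32 KB
```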
Does this answer the question? If not, please ping me again.