another student follows up...

And, so, another email message:

	Hello Prof. Patt,

	I think I happened to be in the same TA session. 
	We discussed the following two scenarios:

	1. We make the [index + byteinblock] bits limited 
	to the PageOffset field in the virtual address. 
	And translate the Page Number into Frame number 
	while we access the tag store in parallel.

	2. We use some lower bits of the Page Number field 
	along with the PageOffset field to make the
	[index + byteinblock] bits. And again perform 
	translation in parallel with the Tag store access.

	We were not able to understand the advantage 1 has 
	over 2, or the motivation for doing 1, since limiting 
	our [index + byteinblock] to the Offset only means we 
	have a smaller cache unless we increase associativity.

	best regards
	<<name withheld to protect the student who is trying to be helpful>>

If we limit the [index + byteinblock] to Offset only, the two
models are the same.  Or, said another way, if we limit it as
stated, #2 is self-contradictory.  Do you see why?
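To make this concrete, here is a small sketch.  The field widths are my own
assumed parameters, not from the message: 4KB pages (12-bit PageOffset),
64-byte blocks, and a cache small enough that [index + byteinblock] is also
12 bits.  In that case the index computed from the virtual address equals
the index computed from the physical address, so the two schemes collapse
into the same thing:

```python
# Assumed parameters (mine, for illustration): 4 KB pages, 64-byte blocks,
# and a cache whose index + byteinblock fits exactly in the page offset.
PAGE_OFFSET_BITS = 12       # 4 KB pages
BYTE_IN_BLOCK_BITS = 6      # 64-byte blocks
INDEX_BITS = 6              # 64 sets -> index + byteinblock = 12 bits

def cache_index(address):
    """Extract the set index from an address (virtual or physical)."""
    return (address >> BYTE_IN_BLOCK_BITS) & ((1 << INDEX_BITS) - 1)

def translate(virtual_address, page_table):
    """Translate a virtual address to a physical address, given a page
    table mapping virtual page numbers to physical frame numbers."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    return (page_table[vpn] << PAGE_OFFSET_BITS) | offset

# A made-up translation: virtual page 5 maps to physical frame 9.
page_table = {5: 9}
va = (5 << PAGE_OFFSET_BITS) | 0x2A4
pa = translate(va, page_table)

# Since [index + byteinblock] lies entirely within the page offset, the
# index bits are untouched by translation: virtual and physical agree.
assert cache_index(va) == cache_index(pa)
```

The assertion holds precisely because translation replaces only the bits
above the page offset, and here no index bit comes from above the offset.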

The other question (if I can correctly parse this into two questions) 
is, "What is the advantage of #1 over #2?"  Or, said another way, "Is
there any downside to #2?"  Answer: yes, there is a downside to #2.

Just to put us all on the same page, scheme #2 is the virtually
indexed, physically tagged scheme I talked about in class.

Consider two processes that are sharing the same page of physical
memory but doing so in different places in their respective virtual
address spaces.  In scheme #2, some of the index bits are coming
from the page number.  If the two page numbers are different, the
cache block will end up in two different places in the cache, since
they will have different index bits.  The identity (block number) of 
the block is not a problem since you get that from the tag, which in 
this case is the page frame number + the unmapped index bits.
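Here is a sketch of that situation, with parameters I am assuming for
illustration (4 KB pages, 64-byte blocks, and a 256-set cache, so two of
the index bits come from the page number):

```python
# Assumed parameters (mine): 4 KB pages, 64-byte blocks, 256 sets,
# so index + byteinblock = 14 bits and 2 index bits come from the VPN.
PAGE_OFFSET_BITS = 12
BYTE_IN_BLOCK_BITS = 6
INDEX_BITS = 8

def cache_index(address):
    """Set index taken from the (virtual) address, as in scheme #2."""
    return (address >> BYTE_IN_BLOCK_BITS) & ((1 << INDEX_BITS) - 1)

# Two processes map the SAME physical frame at DIFFERENT virtual pages.
# Made-up page numbers, chosen so their low bits differ.
va_process_a = (0x10 << PAGE_OFFSET_BITS) | 0x100   # virtual page 0x10
va_process_b = (0x23 << PAGE_OFFSET_BITS) | 0x100   # virtual page 0x23

# Same physical block, same page offset -- but because scheme #2 takes
# some index bits from the page number, the block lands in two sets:
assert cache_index(va_process_a) != cache_index(va_process_b)
```

The two copies are the synonym (aliasing) problem: the cache now holds the
same physical block in two places, indexed under two different names.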

What is the problem with having two copies of the block in the cache
at the same time?  Wasted space?  Sure.  But worse, what happens if
the processor writes to one copy of that block?

So, we have several solutions to this problem.  (1) A back translation
table which keeps track of the locations in the cache that contain each
block.  So a write to a block includes looking up the block in this table
to see if that same physical block is also stored elsewhere in the cache,
and if so, all copies of that block are updated.  (2)  Having the
operating system enforce sharing as follows: if multiple processes share
the same physical frame of memory, that frame must occupy the same virtual
page in each process's virtual address space.
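The first solution (the back translation table) might be sketched as
follows.  This is a hypothetical toy model, not a real hardware design;
all the names are mine:

```python
# Hypothetical sketch of solution (1): a back translation table mapping
# each physical block number to the cache set indices holding a copy.
back_table = {}   # physical block number -> set of cache set indices

def record_fill(phys_block, set_index):
    """On a cache fill, note that a copy of phys_block now lives in the
    given set."""
    back_table.setdefault(phys_block, set()).add(set_index)

def write_block(phys_block, set_index, update_copy):
    """On a write, update EVERY cached copy of the block, not just the
    copy in the set the write hit in."""
    for location in back_table.get(phys_block, {set_index}):
        update_copy(location)

# Example: the same physical block 42 has been filled into two sets
# (the synonym case), so a write must reach both copies.
record_fill(42, 4)
record_fill(42, 196)

updated = []
write_block(42, 4, updated.append)
assert sorted(updated) == [4, 196]   # both copies get the write
```

Solution (2) needs no extra hardware at all: if the operating system
guarantees that shared frames always occupy the same virtual page, the
index bits drawn from the page number are identical in every process, so
the synonym never arises in the first place.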

Both schemes require extra work, so if the index bits are contained within
the page offset, scheme #1 does the job without either headache.

Does this answer the question?  If not, please ping me again.

Yale Patt