Sun, 6 Apr, 2025
Luke answered the questions below, but I wanted to add a little more.

> Q1. Is the cache line size determined in the architecture, or can it
> vary with software?
>
> Luke: To my knowledge, it is always determined with the architecture, but
> I do not know for sure. Maybe Dr. Patt knows of alternate
> implementations.

Historically, the cache has not been part of the ISA, but part of the microarchitecture, specified by the architect of the specific implementation (i.e., the microarchitect). Take for example the x86. There are many, many implementations of the x86. The microarchitect charged with specifying the details has the right to specify cache size, line size, etc. however he/she wishes, always depending on what he/she believes will yield the best performance for the computer. My students and I once published a paper specifying a V-Way cache, where we allowed the number of ways in a set to vary and be specified and respecified depending on its intended use. The paper was published in the top conference (ISCA) in 2005. But I have yet to see a microarchitect adopt it in a machine.

> Q2. Can multiple cache lines be sent at once? I.e., can a cache level
> receive multiple cache lines at once, or can it only be one at a time?
>
> A. In other words, I believe you're asking why don't we just get the
> next line too in addition to this one when we bring it in from memory?
> The answer is yes, we often have mechanisms that predict which cache
> lines to fetch before we reach them. A "next-line" prefetcher for
> example will bring the next cache line into the I-cache (as long as
> it's not busy with the current one), so that the cache line will be
> there before we fetch from that location.

I actually was not sure what the student question was about. I think Luke interpreted the question correctly, and his answer is correct.
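If it helps to see the next-line idea concretely, here is a toy sketch in Python (mine, not how any real hardware is built; the class and its counters are invented for illustration). On a miss it fetches the missing line and also prefetches the sequentially next line, hoping it will be needed soon.

```python
LINE_SIZE = 64  # bytes per cache line (an assumed, typical value)

class ToyICache:
    """A toy I-cache model with a next-line prefetcher (illustration only)."""
    def __init__(self):
        self.lines = set()        # addresses of resident cache lines
        self.demand_fetches = 0
        self.prefetches = 0

    def access(self, addr):
        line = addr // LINE_SIZE
        if line in self.lines:
            return "hit"
        # Demand-fetch the missing line...
        self.lines.add(line)
        self.demand_fetches += 1
        # ...and also prefetch the next sequential line if it is absent.
        if line + 1 not in self.lines:
            self.lines.add(line + 1)
            self.prefetches += 1
        return "miss"

cache = ToyICache()
cache.access(0)     # miss: line 0 is fetched, line 1 is prefetched
cache.access(64)    # hit: line 1 is already there thanks to the prefetch
```

Of course a real prefetcher runs in hardware, alongside the demand fetch, and only when the memory port is not busy; the sketch only shows the guess being made.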
It is often useful to guess that if a cache line is not in the cache, the next cache line is likely also not in the cache and will be needed for an access in the near future, so it is PREFETCHED. One must be careful, since bringing in that cache line means some other line needs to be kicked out. It may be the case that the prefetched line ends up not being accessed, while the kicked-out line does need to be accessed. In that case prefetching would make the program take longer to execute. There is a lot of discussion on that topic, unfortunately too much to deal with right now.

> Q3. When there are multiple levels of cache, can a lower level of
> cache directly access memory, or must it go through a higher level of
> cache first? I.e., if a computer has L1 and L2 caches, and the L1
> cache needs some data, can the L1 cache directly have a line sent from
> memory, or must it go from memory to L2 to L1?
>
> A. L1 should ask L2 for that cache line (otherwise L2 is not useful),
> but if L2 and all further levels don't have it, RAM could send that
> cache line straight to L1, skipping L2. The cache line would only
> exist in L2 if L1 had previously evicted it. This would be an
> exclusive cache, as opposed to an inclusive cache or a NINE
> (non-inclusive, non-exclusive) cache, where the line is loaded into L2
> also. To my understanding, the main benefit of this is that it makes
> better use of cache space, as opposed to potentially having copies of
> the line in both L1 and L2. In a multicore, another processor might go
> looking for a cache line in this processor, and it starts looking from
> the lowest level rather than the highest level (L1 is considered
> highest). A drawback of exclusive caches is that something in L1 would
> take longest to find, since it is not also in L2.

In my view, the most effective use of multiple levels of cache is accomplished by adding the line to all levels of the cache. As Luke mentioned, that is an inclusive cache.
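A toy sketch of the inclusive policy (again mine, for illustration only; the class names are invented): on an L1 miss we check L2, and on a miss everywhere we fill BOTH levels from memory, so every line in L1 is also in L2.

```python
class InclusiveHierarchy:
    """A toy two-level inclusive cache hierarchy (illustration only)."""
    def __init__(self):
        self.l1 = set()  # lines resident in L1
        self.l2 = set()  # lines resident in L2

    def access(self, line):
        if line in self.l1:
            return "L1 hit"
        if line in self.l2:
            self.l1.add(line)        # bring into L1; it stays in L2 too
            return "L2 hit"
        # Miss everywhere: fetch from memory and fill both levels.
        self.l1.add(line)
        self.l2.add(line)
        return "memory"

    def evict_from_l1(self, line):
        self.l1.discard(line)        # the line still resides in L2

    def inclusive(self):
        return self.l1 <= self.l2    # inclusion: every L1 line is in L2
```

Note that after an L1 eviction the line is still found in L2, and the inclusion property holds after every operation; that is the invariant that makes the coherence machinery (coming later in the semester) simpler.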
All lines in level 1 are in level 2, for example. Doing that provides an important opportunity (cache coherence) which we will discuss before the end of the semester. When we bring the line onto the chip, we first supply it to the processor so we can continue processing. We then update the caches so the line is in at least L1 and L2, and often in L3 also. If a later L1 access misses in the cache, L2 is checked. If the line is present there, it is brought into the processor and into L1, and a line is kicked out of L1 to make room for it.

> Q4. Are the USP and SSP a physical or virtual address (do we have to
> do address translation for R6)?
>
> A. The USP is a virtual address because it's maintained by the user
> process. The SSP is a virtual address as well, but since system space
> is mapped 1:1 with the first 24 pages of virtual space, it does not
> need to be translated while you are using it during a context switch.
> It is already a physical address as well as a virtual address, similar
> to how you would not translate the address of the page table, because
> it's already a physical address.

Actually, while Luke's answer is one way to define the stack pointers, the more common way is to make them actual registers in the data path. Access to registers is much faster than access to memory. To give you a better sense of registers, think of the program counter. Every process has its own address in the PC: the address of the instruction that process (when it is running) will execute next. If there are 50 processes, there are 50 addresses in the 50 program counters. The PC is part of the context of a process. It is one of the registers saved when that process is about to lose control of the machine, and loaded before that process takes control of the computer, so that the PC contains the address of the instruction that process needs to execute next.