Addr[1:0]
Addr[4:2]
Addr[7:5]
Addr[11:8]
577 Cycles. The first 8 memory accesses, A[0][0] to A[0][7], must occur sequentially with no overlap since they are all acesses to the same bank. Thus, it would take 80 cycles for the 1st 8 memory accesses, with the 8th access starting in cycle 70. Since the 8th and 9th memory accesses, A[0][7] and A[1][0], respectively, are to different banks, the accesses can overlap, and the 9th access can start in cycle 71 (70 cycles for the 1st 7 accesses plus 1 additional cycle of the 8th access). Continuing with this logic, the access to A[2][0] could start in cycle 142 (71x2). Finally, the access to A[7][0] could start in cycle 497 (71x7). Now all that remains are 8 more memory accesses, all to the same bank (A[7][0] to A[7][7]). This takes another 80 cycles, bringing the total to 577 cycles (497 + 80).
If the memory were not interleaved, all 64 memory accesses must happen sequentially with no overlap, so it would take a total of 640 cycles (64*10). Therefore, we do gain some benefit from this interleaving scheme, but not that much.
Yes, a change can be made. The new bits are:
Addr[1:0]
Addr[4:2]
Addr[11:9]
Addr[8:5]
73 Cycles. With the new interleaving scheme, consecutive memory accesses are to either to different banks of the same rank, or to different ranks all together. In both cases, the consecutive accesses can start immediately after each other. Therefore the latency of all memory accesses would be hidden except the first access. Total number of cycles = 10 + 63 = 73
Only one line of code needs to be changed:
sum = sum + A[i][j];
to
sum = sum + A[j][i];
Alternatively, you could keep that line the same, but swap the variable (i
/j
) of the inner
and outer loops as shown below.
Original code:
for(i = 0; i < 8; ++i){
for(j = 0; j < 8; ++j){
sum = sum + A[i][j];
}
}
New code:
for(j = 0; j < 8; ++j){
for(i = 0; i < 8; ++i){
sum = sum + A[i][j];
}
}
87 Cycles. Now consecutive memory accesses are to different banks, so the accesses can overlap. The 1st access, A[0][0], would begin at cycle 0, the 2nd, A[1][0], at cycle 1, and so on. The 8th access, A[7][0], would start at cycle 7. However, the 9th access, A[0][1], cannot start at cycle 8. It would have to wait 2 more cycles for the 1st access to finish since it is on the same bank as the 1st access; therefore, it would start at cycle 10. Continuing this logic, the access to A[0][2] would start at cycle 20, and finally, the access to A[0][7] would start at cycle 70. Now, all that is left are 8 accesses, but they are all to different banks so they can start 1 cycle after each other. The access to A[1][7] would begin at cycle 71, A[2][7] at 72, and finally A[7][7], the last memory access, would begin at cycle 77 and, therefore, end at cycle 87
The following table summarizes the differences between exceptions and interrupts.
Exceptions | Interrupts | |
---|---|---|
Cause | Internal to the running process | External to the running process |
When are they handled? | When detected (mostly) | When convenient (mostly) |
Are they maskable? | Almost never | Almost always |
Priority | Same as the process that caused the exception. | Depends (on the priority of the interrupting device) |
Context | Process | Handling is done withing the system context. |
Size of a page is 512 bytes.
Number of bits of address required to calculate the offset within a page is 9.
Number of frames in physical memory is (8K bytes) ÷ (512 bytes) = 213 ÷ 29 = 24.
Size of PFN is 4 bits.
Size of PTE equals 1 (Valid) + 1 (Modified) + 2 (access control) + 4 (PFN) = 8 bits = 1 byte.
Number of virtual pages in System Space is (3 × 212) ÷ (29) = 24 pages.
Size of System Page Table is 24 × 1 byte = 24 bytes = 24 × 8 bits = 192 bits.
We can determine from the given information that:
The breakdown for a Virtual Address (VA) must be:
The breakdown for a Physical Address (PA) must be:
Answer: x825C
Given the VAX 2-level translation scheme, we know that this VA must be in
system space. Therefore, the top 2 bits of the VA (VA[15:14]) must be 10
(in binary). We also know that the offset of the VA is the same as the offset
of the PA, so the bottom 8 bits of the VA (VA[7:0]) must be x5C
(01011100
in binary).
Now, all we have to figure out is the 6 bit Virtual Page Number (VPN). It
was given that the 1st access to physical memory (let's call this PA1) was at
location x108
. Once again, given the VAX 2-level translation scheme, we know
that PA1 = SBR + (size of PTE in bytes) × VPN. Solving this equation for VPN we get
VPN = (PA1 - SBR) ÷ (size of PTE in bytes).
It was given that PA1 is x108
, SBR is x100
, and the size of a PTE is 4 bytes. Therefore,
the VPN is x2
(000010
in binary). The complete VA is therefore 10 000010 01011100
(in binary)
which is x825C
.
Answer: x80000009
The contents of physical address x45C
(the 2nd access to physical memory) is a PTE.
We know that a PTE is a 32 bit value that consists of 1 valid bit (PTE[31]), and
a 4-bit PFN (PTE[3:0]). All other bits of the PTE (PTE[30:4]) are 0. The contents
of this PTE are used to form the address of the 3rd access to physical memory (x942
).
Therefore, the PFN bits of the PTE must be x9, and the valid bit must be 1. This
implies that the PTE is x80000009
.
Answer x0342
First, we must determine if this VA is in P0 space or P1 space. To determine this,
we have to figure out if P0BR or P1BR was used to compute the virtual address x825C
(the answer to part a). It was given that P0BR is x8250
, and that P1BR is x8350
.
Since x8350
is greater than x825C
, we know that we could not have used P1BR to
compute x825C
, and therefore we must have used P0BR which means the VA is in P0 space.
Therefore, the top 2 bits of the virtual address (VA[15:14]) must be 00
(in binary).
We also know that the offset of the VA is the same as the offset of the PA, so the
bottom 8 bits of the VA (VA[7:0]) must be x42
(01000010
in binary). Now, all we have
to figure out is the 6 bit Virtual Page Number (VPN). Once again, given the VAX 2-level
translation scheme, we know that x825C
= P0BR + ((size of PTE in bytes) × VPN ).
Solving this equation for VPN we get VPN = (x825C
- P0BR) ÷ (size of PTE in bytes).
It was given that P0BR is x8250
, and the size of a PTE is 4 bytes. Therefore, the
VPN is x3
(000011
in binary). The complete VA is therefore 00 000011 01000010
(in binary) which is x0342
.
# physical frames = physical address space ÷ size of frame = 29Bytes ÷ 25 Bytes/frame = 24 frames. Therefore, 4 bits are needed to specify the PFN.
Size of PTE = Valid bit + Modified bit + access control bits + PFN bits = 1 + 1 + 2 + 4 = 8 bits (1 Byte)
User space = (3/4) × Virtual address space = (3/4) × 27 pages = 3 × 25 pages. Each page of user space will have a PTE in the user space page table.
Size of user page table is # of entries × size of PTE = (3 × 25 entries) × 1 Byte/entry = 96 Bytes.
# of pages = 96 Bytes ÷ 25 Bytes/page = 3 pages.
We'll use the prefix “b” to indicate a binary number in this solution. Also, for clarity, we will call
the virtual address x7AC
the virtual address of X (VA_X).
x7AC
b01
(indicates that this is an address in user space)b0111101
b01100
x03D
of user spacex03D
× 1) = x380
+ x03D
= x3BD
.x01D
.x1E0
+ x01D
= x1FD
.x1FD
] = xB8
x8
x11D
x11D
] = x83
x3
x06C
We'll use the prefix “b” to indicate a binary number in this solution.
Virtual address VA_X =x3456789A
b00
(indicates that this is an address in P0 space)b 11 0100 0101 0110 0111 100
(x1A2B3C
)b010011010
x1A2B3C
of P0 spacex1A2B3C
× 4) = x8AC40000
+ x68ACF0
= x8B2CACF0
x59656
x22D958
x22D958
] = x800F5D37
xF5D37
x1EBA6EF0
x1EBA6EF0
] = x80000A72
xA72
x14E49A
Including instruction fetch, every instruction can generate a page fault. Ignoring instruction fetch, LDB, LDW, STB, STW, TRAP, RTI can generate a page fault (If the trap vector table or system stack is always in physical memory, then the TRAP or RTI won't generate a page fault).
Including the instruction fetch, RTI can generate the maximum number of page faults (3) and LDB, LDW, STB, STW, TRAP can generate the next most number of page faults (2). (Points will not be deducted for RTI)