ECE 382N FAQ / Tips

General Protection Exceptions;  Stack Segment Overflow

The only conditions that you need to test for general protection exceptions are:
1) writing to a read-only page of memory (as determined by the R/W flag of the TLB entry)
2) accessing a memory operand whose effective address is outside the CS, DS, ES, FS, or GS segment limit.

Accessing a memory operand outside of the SS limit is a Stack Segment Exception.  You do not have to test for this.
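
The two conditions listed above can be sketched as a small combinational check.  This is only a behavioral illustration, not a required design: the module name and every signal name are hypothetical, it assumes the R/W flag is 1 for a writable page, and it compares only the first byte of the operand against the limit.

```verilog
// Hypothetical sketch of the two general protection checks.
// Assumption: tlb_rw = 1 means the page is writable.
module gp_check (
   input         is_write,   // the access is a store
   input         tlb_rw,     // R/W flag from the matching TLB entry
   input         uses_ss,    // operand is addressed through SS
   input  [31:0] eff_addr,   // effective address (offset within the segment)
   input  [31:0] seg_limit,  // limit of the segment being used
   output        gp_fault
);
   wire write_to_read_only = is_write & ~tlb_rw;
   // SS limit violations are stack segment exceptions, so they are excluded.
   wire past_limit = ~uses_ss & (eff_addr > seg_limit);

   assign gp_fault = write_to_read_only | past_limit;
endmodule
```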

Memory-Mapped I/O

Note that the address space of I/O locations cannot be cached.  When you are writing tests to demonstrate the functionality of your I/O devices, you can use the TLB to mark a page of memory as being cache disabled.  (This is referred to as the PCD flag, or page-level cache disable bit in the x86 page table entries.)  Requests to such a page should bypass the cache and be sent straight to the bus.

x86 Info

I have a few miscellaneous comments to make.

Project Specifications

I have added a document on the project specifications.  This is primarily a compilation of the homeworks, but it also describes the memory hierarchy and specifies the exceptions and interrupts that should be implemented.  In a few weeks I will make an addendum to this document to specify values that should be hardcoded in Verilog, such as the virtual to physical address translations for the TLB entries and the segment limits.

ROM initialization

Someone asked me how to initialize the ROMs, and I thought other people might want to know this as well.

Suppose you have these instantiations in your TOP module:

   wire [4:0] rom1_in;
   wire [3:0] rom1_out;
   wire rom_enable;
   rom4b32w$ rom1 (rom1_in, rom_enable, rom1_out);

The rom4b32w$ part takes a 5-bit address as input, and the output is 4 bits.

You can use the $readmemb system task like this:

module rommodule;
   initial   // system tasks must be called from a procedural block
      $readmemb("rom/rom_A.mem", TOP.rom1.mem, 0);
endmodule

The text file named "rom_A.mem" in a subdirectory called "rom" should be a list of 4-bit binary numbers separated by whitespace; comments are allowed.  The first number in the file is the rom output if the address input is 0; the second number is the output if the address is 1, and so on.

The input file could look like this:
0110 // if 5'd0 is input, 4'b0110 is output
1001 1110 // if 5'd1 is input, 4'b1001 is output;
          // if 5'd2 is input, 4'b1110 is output
11 10 //  if 5'd3 is input, 4'b0011 is output;
      //  if 5'd4 is input, 4'b0010 is output

// You get the picture ... You can have up to 32 numbers in the file.

If there are fewer than 32 numbers, the rows at the end of the rom will be uninitialized.

The last argument to the $readmemb task specifies the address at which the initialization begins (0 in this case).

You can also use the $readmemh task to read in a list of hex numbers in the same way.

Reset / Power-up
You are not required to have an initial power-up routine.  In fact, it is not at all necessary for the project.  You can use a global RESET signal to reset all of your registers at the beginning of the simulation.  The EIP and the segment registers should be initialized to 0.  The general purpose registers do not need to be initialized.

The contents of the verilog library part regfile8x8$ are undefined on power-up.  However, for this class, you can behaviorally reset the contents to 0 if you are using this part for the segment registers.  For example, if you have an instantiation of regfile8x8$ called reg_file, you can initialize it as follows:

      reg_file.mem_array[0] = 8'h00;
      reg_file.mem_array[1] = 8'h00;
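
The two assignments above have to sit inside a procedural block.  A minimal sketch of a full behavioral reset follows.  The module regfile_stub is a hypothetical stand-in for regfile8x8$, and it assumes the part holds eight 8-bit entries in mem_array; check the library source for the real organization.

```verilog
// Stand-in for regfile8x8$ (assumption: eight 8-bit entries in mem_array).
module regfile_stub;
   reg [7:0] mem_array [0:7];
endmodule

module reset_example;
   regfile_stub reg_file ();
   integer i;

   // Behavioral reset at time 0 through a hierarchical reference.
   initial begin
      for (i = 0; i < 8; i = i + 1)
         reg_file.mem_array[i] = 8'h00;
      $display("entry 0 = %h, entry 7 = %h",
               reg_file.mem_array[0], reg_file.mem_array[7]);
   end
endmodule
```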

Gate fanout
Because we are not modeling wire capacitance, you should limit gate fanout to 5.  If you need a fanout larger than 5, you should use one of the inverter or buffer parts.

Homework 4

Homework 4 is due on March 8.  You do NOT have to implement the decode and control logic for all instructions for this assignment!  You only have to implement and demonstrate that it works correctly for ONE instruction.  The datapath should support all instructions, though.  (This was covered in Homework 3.)

You may make similar assumptions about the memory system as you did for Homework 2.

Operand Size Override Prefix

For Homework 3, you will need to support all three operand sizes.  If you examine the opcodes, you will notice that all 16- and 32-bit versions of each instruction have the same opcode, whereas 8-bit instructions have different opcodes.

Since 32 bits is the default operand size for your project, the 16 / 32-bit instructions use 32-bit operands unless there is an operand size override prefix (66h) in the instruction.
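
The rule above can be sketched as a small select.  All names here are hypothetical, and the encoding of op_size is an arbitrary choice made for the illustration; how you detect an 8-bit opcode is up to your decoder.

```verilog
// Hypothetical operand-size select; all names are illustrative.
module opsize_select (
   input        is_byte_opcode,  // opcode is one of the 8-bit forms
   input        has_66_prefix,   // operand size override prefix (66h) was seen
   output [1:0] op_size          // 00 = 8 bits, 01 = 16 bits, 10 = 32 bits
);
   assign op_size = is_byte_opcode ? 2'b00 :
                    has_66_prefix  ? 2'b01 : 2'b10;
endmodule
```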

Feedback on HW2

Most of the homeworks were very thorough.  I have a few general comments to make.

Many state machines had the following characteristic: as soon as the Instruction Register was loaded, there was a decode state that would dispatch to dozens and dozens of states for every possible combination of operation and addressing mode.  Many of these states were redundant.  In x86, the operation is orthogonal to the addressing mode.  You should take advantage of this to reduce the number of redundant states in your state machine.

You do not have to constrain yourself to a central-bus architecture.  You may also use as much hardware (adders in particular) as you feel necessary.

A few people are confused about bit order in x86.  Little-endianness in x86 means that the byte with the lowest address is the least significant byte of a piece of data.  Within a byte, bit order is the same as in a big-endian machine.  For example, suppose you read a word from memory at address N, and the bytes listed in order of increasing address are 00000001 00000000 (byte N is 00000001, and byte N+1 is 00000000).  Written with the most significant byte first, as on a big-endian machine, the value of this word is 0000000000000001.  (Note that there is a "byte swap" instruction for converting between big & little endian, not a "bit swap" instruction.)  The prefix or first byte of the opcode of an x86 instruction is at the lowest address.
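
The word example above can be checked with a short behavioral sketch.  The module and the two-entry memory are hypothetical; the only point is that the bytes are swapped while the bits inside each byte keep their order.

```verilog
module endian_example;
   reg  [7:0]  mem [0:1];                 // mem[0] is byte N, mem[1] is byte N+1
   wire [15:0] word = {mem[1], mem[0]};   // highest address is most significant

   initial begin
      mem[0] = 8'b00000001;
      mem[1] = 8'b00000000;
      #1 $display("word = %b", word);     // prints word = 0000000000000001
   end
endmodule
```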

Stack Operations

We will use a 32-bit wide stack for the project.  As explained in section 4.2.2 of Volume 1, the stack pointer (ESP) will always be incremented or decremented by four, even when pushing or popping a 16-bit value.

When pushing a 16-bit value or a segment register onto the stack, the value should be placed in the lower two bytes of the top of stack.  The upper two bytes can be undefined (they do not have to be filled with all 0's).

When pushing an 8-bit immediate value, this value should be sign-extended to 32 bits before pushing it on the stack.
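
A minimal sketch of the push rules above, assuming hypothetical signal names.  Filling the upper two bytes with zeros for a 16-bit push is just one legal choice, since those bytes may be undefined.

```verilog
// Hypothetical push datapath slice; all names are illustrative.
module push_data (
   input         is_imm8,    // pushing an 8-bit immediate
   input         is_16bit,   // pushing a 16-bit value or a segment register
   input  [7:0]  imm8,
   input  [15:0] val16,
   input  [31:0] val32,
   input  [31:0] esp,
   output [31:0] data,       // doubleword written to the top of stack
   output [31:0] next_esp
);
   assign data = is_imm8  ? {{24{imm8[7]}}, imm8} :   // sign-extend to 32 bits
                 is_16bit ? {16'h0000, val16}     :   // upper half is a don't-care
                            val32;
   assign next_esp = esp - 32'd4;   // always 4, regardless of operand size
endmodule
```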

Clock Cycles in your verilog implementation

Any behavioral inputs that you are using to debug your code (such as the A and B inputs and mux selects you used in Homework 1B) should change at the BEGINNING of the clock cycle.

The flip-flops in the class libraries are POSITIVE edge-triggered.  This means the clock cycle starts when the clock signal transitions from a 0 to a 1.

Suppose you have a cycle time of 10 ns and behaviorally cycle your clock in verilog with a statement like this:
    `define half_cycle 5

    always #(`half_cycle) clk = ~clk;

If you initialize clk to zero at the beginning of the simulation, the first complete clock period will be from 5 ns to 15 ns.  If you want the first clock period to start at time 0, you should initialize your clock to 1.
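
The timing described above can be seen with a short testbench sketch.  The module name and the $display message are hypothetical; the clock statement is the one shown earlier.

```verilog
`timescale 1ns/1ps
`define half_cycle 5

module clk_example;
   reg clk;

   initial clk = 0;                      // clock starts low
   always #(`half_cycle) clk = ~clk;

   // Report each rising edge, i.e. the start of each clock cycle.
   always @(posedge clk)
      $display("rising edge at %0d ns", $time);

   initial #22 $finish;
endmodule
```

With clk initialized to 0, the rising edges are reported at 5 ns and 15 ns.  Changing the initial value to 1 leaves the clock high at time 0 and moves the later rising edges to 10 ns and 20 ns.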

Schematics for HW2

You don't have to turn in detailed, gate-level circuit diagrams for this assignment.  A high-level view of the datapath with detail similar to the LC-2 datapath in the class handout is sufficient.  (You WILL need detailed circuit diagrams later in the semester -- for the final project report.)

HW 1b Critical Path Calculation

Many people have made a similar error in calculating the critical path.

In your design, the outputs of the dffs were fed into a mux.  Below is an example of a generic mux design in Verilog with a worst case delay of three nand2$ gates:

nand2$ n0 (sel_bar, sel, sel);
nand2$ n1 (sa, sel_bar, a);
nand2$ n2 (sb, sel, b);
nand2$ n3 (out, sa, sb);

In this design, the delay from the sel input to the out output is 0.6 ns, while the delay from the data inputs to the output is 0.4 ns.

The delay from clk to the q output of the dff parts is 0.08 ns.  To find the worst case delay from clk to the output of the mux, you cannot just add 0.6 ns and 0.08 ns.  The delay is 0.48 ns (0.08 ns + 0.4 ns) if the select signal does not change; it is 0.6 ns if the select signal changes (provided the data inputs change as well, of course).  Hence the worst case delay through this part is MAX (0.6, 0.48) = 0.6 ns.
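
The two path delays can be measured with a small testbench sketch.  It does not use the class library: built-in nand primitives with an assumed delay of 200 ps stand in for nand2$, and the module names are hypothetical.

```verilog
`timescale 1ps/1ps

// Stand-in mux: nand primitives with an assumed 200 ps delay
// take the place of the nand2$ library part.
module mux2_sketch (output out, input a, input b, input sel);
   wire sel_bar, sa, sb;
   nand #(200) n0 (sel_bar, sel, sel);
   nand #(200) n1 (sa, sel_bar, a);
   nand #(200) n2 (sb, sel, b);
   nand #(200) n3 (out, sa, sb);
endmodule

module mux_delay_tb;
   reg  a, b, sel;
   wire out;
   time t0;                      // time of the most recent input change

   mux2_sketch m (out, a, b, sel);

   // Start reporting only after the initial values have settled.
   initial begin
      #1000;
      forever @(out)
         $display("out = %b, %0d ps after the input change", out, $time - t0);
   end

   initial begin
      a = 1; b = 0; sel = 0; t0 = 0;
      #5000 t0 = $time; a = 0;     // data path: two gates
      #5000 t0 = $time; a = 1;     // data path again
      #5000 t0 = $time; sel = 1;   // select path: three gates
      #5000 $finish;
   end
endmodule
```

The two data changes are reported 400 ps after the input change and the select change 600 ps after it.  Adding the 0.08 ns clk-to-q delay only to the data path gives the 0.48 ns figure used above.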

Turning in HW2

Do not turn in hardcopies of your verilog code.  Place your code in a subdirectory called hw2 within your class directory.  Make sure that your hw2 subdirectory and the files in it are group readable and executable.  The class directories are located at: /home/projects/tmp/courses/spring_00/ee382n-14945.

You should turn in hardcopies for parts 1, 2, and 4.  You can slip them under the door for room 532 ENS.

Office Hours

Mary will hold office hours on Thursday, Feb. 17 from 5:00 to 6:15 in the 5th floor computer lab.

Homework 2: Which instruction formats should I implement for this assignment?

You do NOT have to implement the following until homework 3:
- 8-bit and 16-bit operand sizes
- any addressing modes that require the SIB byte
- segment override prefixes.

Here is what you DO need to implement for Homework 2:

- an operand size of 32 bits (i.e. doubleword).

You should support instructions that use ANY of the following types of operand addressing:
- immediate.  (This is indicated by the primary opcode)
- register.  (Sometimes indicated by the primary opcode, as in ADD EAX, imm32)
- Any other addressing modes possible using a ModR/M byte (but no SIB byte).
    This includes the following:
       - register (i.e. the Mod bits of the ModR/M byte are 11.  In addition,
                 the Reg bits of the ModR/M byte may also indicate a register operand.  )
       - base (The Mod bits are 00 AND the R/M bits are NOT 100 or 101)
       - displacement (the Mod bits are 00 and the R/M bits are 101)
       - base + displacement.  Note that the displacement can be 8 OR 32 bits.
          The displacement size is not necessarily the same as the operand size!

Hence for the ADD instruction, you should implement the following opcodes for this assignment:
ADD EAX, imm32 (destination operand is register, the other operand is immediate)
ADD r/m32, imm32 (destination operand could be obtained using ANY of the four
                     addressing modes listed above (i.e. r/m), source operand is immediate)
ADD r/m32, r32 (destination operand is r/m, source operand is register (indicated by Reg bits of ModR/M byte))
ADD r32, r/m32 (destination operand is register, source operand is r/m)

Note that the addressing modes above are used to determine an EFFECTIVE address for a memory operand.  You must always add this effective address to the segment base (i.e. DS << 4) to get the LINEAR address.
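
A behavioral sketch of the effective and linear address calculation for the ModR/M modes listed above (no SIB byte).  The module and signal names are hypothetical.  The outputs are meaningless when Mod is 11 (register operand), and R/M = 100 with Mod other than 11 would need a SIB byte, which this sketch does not handle.

```verilog
// Hypothetical address calculation for the no-SIB ModR/M modes.
module addr_calc (
   input  [1:0]  mod,       // Mod field of the ModR/M byte
   input  [2:0]  rm,        // R/M field of the ModR/M byte
   input  [31:0] base,      // contents of the register selected by R/M
   input  [7:0]  disp8,
   input  [31:0] disp32,
   input  [15:0] seg,       // segment register, e.g. DS
   output [31:0] eff_addr,
   output [31:0] lin_addr
);
   wire disp_only = (mod == 2'b00) && (rm == 3'b101);

   wire [31:0] disp = (mod == 2'b01) ? {{24{disp8[7]}}, disp8} :  // 8-bit, sign-extended
                      (mod == 2'b10) ? disp32 :
                      disp_only      ? disp32 :
                                       32'h0;                     // base only
   wire [31:0] base_term = disp_only ? 32'h0 : base;

   assign eff_addr = base_term + disp;
   assign lin_addr = {12'h000, seg, 4'h0} + eff_addr;   // segment base = seg << 4
endmodule
```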

Homework 2: Verilog implementation

You do NOT have to implement any decode logic for this assignment.  You WILL have to design the datapath logic needed to support the instruction subset specified in the HW2 description.  This includes the following: any logic needed for address calculations and result computations, general purpose registers, segment registers, the EFLAGS register, the Instruction Pointer, an instruction register, and any temporary registers you may need.

You may design your register file using the regfile library parts in lib3, but this is optional.  You may design your register file out of other library parts if you wish.

Although you do not have to implement the instruction-length calculation for this homework, you will need to implement the portion of the datapath needed to get the immediate and displacement fields from the instruction register to the logic that sources them (address calculation and/or result computations).

As I mentioned in class, you do not have to worry about memory or virtual address translation for this assignment.  You can implement a dummy module for the instruction and data memory as well as for the microsequencer.

Even if you intend to design a pipelined datapath for the project, you are not required to have a pipelined datapath for this assignment.  As the semester continues, you will make substantial additions and modifications to your design.  However, in later assignments, you should be able to build upon the datapath you implement for HW2, or you should at least be able to reuse the logic elements you design for HW2.

More on Homework 1b

In the last requirement for part 6, you are asked to print a waveform demonstrating the longest clock cycle for which the circuit does not work properly.  For example, if your critical path is 20 ns, run the simulation using a cycle time of 19.95 ns.  Because the waveform viewer has no knowledge of the setup time violations, the waveforms will appear correct for this case.  For this reason, rather than printing a waveform for this case, you can just document that you received a setup time violation.  (This is an alternative to printing the waveform when taking off an extra 0.2 ns as described below.)

dff Setup time violations

The setup time for the d flip-flop is 0.2 ns.  (You may have discovered this by reading the code in the library.)  If the d input is not stable 0.2 ns before the rising edge of the clock, you will get a setup time violation.  If this happens, you should get an error message when you run the simulation.  However, everything in the waveforms will appear correct.

In your calculations for the critical path, you should include the setup time even though the waveforms will appear correct if you violate it.  This means that in order to demonstrate the case where the cycle time is too short for the critical path propagation, you will have to take an extra 0.2 ns off the cycle time.

Homework 1b Clarifications

You can use the library parts nand2$, nand3$, and nand4$ (or nor2$, nor3$, nor4$) and dff$ from the library file /home/projects/courses/spring_00/ee382n-14945/lib/lib1 for this assignment.  You must build everything except the dff parts out of the nand (or nor) gates.

As in homework 1a, you can encode the control signals however you want.  The module header provided is just an example; you can use more control signals if you wish.

Homework 1a: Overflow, etc.

Don't worry about detecting overflow, the carry-out, or anything other than the 16-bit output for this assignment.

Homework 1a

For this assignment, you should use 2, 3, or 4-input NAND gates only.  For homework 1B, you will be using the class Verilog libraries to implement your design, and only 2, 3, or 4-input NAND gates are available.  You will not be penalized for using gates with 5 or more inputs on this assignment; just be aware that you will have to modify your design for homework 1B.

Homework 1a

A couple of points to make in response to some questions I've received: