Lab Assignment 1 Clarifications

Constants can be expressed in hex or in decimal. Hex constants consist of an ‘x’ or ‘X’ followed by one or more hex digits. Decimal constants consist of a ‘#’ followed by one or more decimal digits. Negative constants are identified by a minus sign immediately after the ‘x’ or ‘#’. For example, #-10 is the negative of decimal 10 (i.e., -10), and x-10 is the negative of x10 (i.e., -16).

Since the sign is explicitly specified, the rest of the constant is treated as an unsigned number. For example, x-FF is equivalent to -255. The ‘x’ tells us the number is in hex, the ‘-’ tells us it is a negative number, and “FF” is treated as an unsigned hex number (i.e., 255). Putting it all together gives us -255.
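The constant rules above can be sketched in C roughly as follows. This is only a minimal illustration, not the required implementation: the function name parse_constant and its return convention (0 on success, -1 on a malformed constant) are assumptions made for this example, and it does not check for overflow or for the range allowed by a particular instruction.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (the name and return convention are assumptions).
   Parses constants such as "#10", "#-10", "x10", "X3000" or "x-FF".
   On success stores the value in *out and returns 0.
   Returns -1 if the text is not a well-formed constant.
   No overflow or range checking is done here. */
int parse_constant(const char *s, long *out)
{
    int base;
    int negative = 0;
    int digits = 0;
    long value = 0;
    const char *p;

    if (s == NULL || out == NULL) return -1;

    /* The first character selects the base. */
    if (s[0] == '#') base = 10;
    else if (s[0] == 'x' || s[0] == 'X') base = 16;
    else return -1;

    /* An optional minus sign comes immediately after the 'x' or '#'. */
    p = s + 1;
    if (*p == '-') {
        negative = 1;
        p++;
    }

    /* The remaining characters are treated as an unsigned number. */
    for (; *p != '\0'; p++) {
        int d;
        if (isdigit((unsigned char)*p)) {
            d = *p - '0';
        } else if (base == 16 && isxdigit((unsigned char)*p)) {
            d = tolower((unsigned char)*p) - 'a' + 10;
        } else {
            return -1;              /* character not valid in this base */
        }
        value = value * base + d;
        digits++;
    }
    if (digits == 0) return -1;     /* "one or more digits" is required */

    *out = negative ? -value : value;
    return 0;
}
```

With this sketch, parse_constant("x-FF", &v) stores -255 in v, matching the example above, while a string such as "#1A" is rejected because ‘A’ is not a decimal digit.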
Your assembler does not have to check for multiple .ORIG pseudo-ops.
Since the .END pseudo-op is used to designate the end of the assembly language file, your assembler does not need to process anything that comes after the .END.
The trap vector for a TRAP instruction and the shift amount for SHF instructions must be non-negative values. If they are not, you should return error code 3.
The same label should not appear in the symbol table more than once. During pass 1 of the assembly process, you should check to make sure a label is not already in the symbol table before adding it to the symbol table. If the label is already in the symbol table, you should return error code 4.
An invalid label (i.e., one that contains non-alphanumeric characters, or one that starts with the letter ‘x’ or a number) also results in error code 4.
The standard C function isalnum() can be used to check if a character is alphanumeric.
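One possible way to apply the label rules with isalnum() is sketched below. The function name is_valid_label is hypothetical. Rejecting an uppercase ‘X’ as well as a lowercase ‘x’, and rejecting empty labels or labels longer than 20 characters, are assumptions of this sketch rather than rules stated here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABEL_LEN 20   /* assumed length limit for this sketch */

/* Hypothetical helper: returns 1 if the label looks valid, 0 otherwise.
   A caller would typically exit with error code 4 when this returns 0. */
int is_valid_label(const char *label)
{
    size_t i, len;

    if (label == NULL) return 0;

    len = strlen(label);
    if (len == 0 || len > MAX_LABEL_LEN) return 0;   /* assumption */

    /* A label may not start with 'x' (also 'X' here, as an assumption)
       or with a digit. */
    if (label[0] == 'x' || label[0] == 'X') return 0;
    if (isdigit((unsigned char)label[0])) return 0;

    /* Every character must be alphanumeric. */
    for (i = 0; i < len; i++) {
        if (!isalnum((unsigned char)label[i])) return 0;
    }
    return 1;
}
```

The casts to unsigned char are there because passing a negative char value to the ctype.h functions is undefined behavior.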
After you have gone through the input file for pass 1 of the assembler and your file pointer is at the end of the file, there are two ways you can get the file pointer back to the beginning. You can either close and reopen the file or you can use the standard C I/O function rewind().
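The rewind() approach might look like the sketch below. The helper count_lines is only a stand-in for a real pass over the file, and both function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical stand-in for one pass of the assembler:
   reads the whole file and returns the number of newline characters. */
int count_lines(FILE *in)
{
    int c;
    int lines = 0;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') lines++;
    }
    return lines;
}

/* Runs two passes over the same open file.
   Returns 1 if both passes saw the same number of lines, 0 otherwise. */
int two_pass(FILE *in, int *pass1_lines, int *pass2_lines)
{
    *pass1_lines = count_lines(in);   /* pass 1 leaves the file pointer at the end */
    rewind(in);                       /* back to the beginning; also clears the EOF indicator */
    *pass2_lines = count_lines(in);   /* pass 2 reads the same contents again */
    return *pass1_lines == *pass2_lines;
}
```

Closing and reopening the file with fclose() and fopen() gives the same result. rewind() simply avoids having to keep the file name around for the second pass.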

The following definitions can be used to create your symbol table:

#define MAX_LABEL_LEN 20
#define MAX_SYMBOLS 255

typedef struct{
  int address;
  char label[MAX_LABEL_LEN + 1];   /*Question for the reader: Why do we need to add 1? */
} TableEntry;

TableEntry symbolTable[MAX_SYMBOLS];

To check if two strings are the same, you can use the standard C string function strcmp(). To copy one string to another, you can use the standard C string function strcpy().
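As a rough illustration of how strcmp() and strcpy() could be combined with the symbol table definitions above, the sketch below looks up a label and adds it only if it is not already present. The names find_symbol, add_symbol and symbolCount are hypothetical, and returning -1 for a full table or an over-long label is an assumption of this example.

```c
#include <string.h>

#define MAX_LABEL_LEN 20
#define MAX_SYMBOLS 255

typedef struct {
    int address;
    char label[MAX_LABEL_LEN + 1];
} TableEntry;

TableEntry symbolTable[MAX_SYMBOLS];
int symbolCount = 0;   /* hypothetical: number of entries in use */

/* Returns the index of the label in the table, or -1 if it is absent. */
int find_symbol(const char *label)
{
    int i;

    for (i = 0; i < symbolCount; i++) {
        if (strcmp(symbolTable[i].label, label) == 0) return i;
    }
    return -1;
}

/* Adds a label during pass 1.
   Returns 0 on success, 4 if the label is already in the table
   (the duplicate-label error code), or -1 if the label is too long
   or the table is full (an assumption of this sketch). */
int add_symbol(const char *label, int address)
{
    if (strlen(label) > MAX_LABEL_LEN) return -1;
    if (find_symbol(label) >= 0) return 4;
    if (symbolCount >= MAX_SYMBOLS) return -1;

    strcpy(symbolTable[symbolCount].label, label);   /* safe: length checked above */
    symbolTable[symbolCount].address = address;
    symbolCount++;
    return 0;
}
```

A caller in pass 1 could exit with error code 4 whenever add_symbol() returns 4. The length check before strcpy() matters because strcpy() does not check the size of the destination buffer.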
If you decide to use any of the math functions in math.h, you also have to link the math library by using the command gcc -lm -ansi -o assemble assembler.c.
When your assembler finds an error in the input assembly language program, it is not required that you print out an error message to the screen. If you choose to do this to make debugging easier, that is fine. What is required is that you exit with the appropriate error code. This is what we will be checking for when we grade your program; we will ignore anything that is printed to the screen.
An assembly program that starts with comments before .ORIG is valid, and your assembler should ignore those comments. You can assume that no label will appear on the same line as .ORIG or .END.
You can safely assume that we will test your assembler with assembly programs that fall within the valid address range, i.e., x3000 to xFDFF. (updated - 02/06/09)