Constants can be expressed in hex or in decimal. Hex constants consist
of an ‘x’ or ‘X’ followed by one or more hex digits. Decimal constants
consist of a ‘#’ followed by one or more decimal digits. Negative constants
are identified by a minus sign immediately after the ‘x’ or ‘#’. For example, #-10
is the negative of decimal 10 (i.e., -10), and x-10 is the negative of x10 (i.e., -16).
Since the sign is explicitly specified, the rest of the constant is treated as
an unsigned number. For example, x-FF is equivalent to -255. The ‘x’ tells us
the number is in hex, the ‘-’ tells us it is a negative number, and “FF” is treated
as an unsigned hex number (i.e., 255). Putting it all together gives us -255.
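The constant rules above can be sketched in C roughly as follows. This is only an illustration, not a required implementation: the function name parse_constant and its return convention (0 on success, -1 on a malformed constant) are assumptions made for this example, and overflow of very large constants is not handled.

```c
#include <ctype.h>

/* Hypothetical helper (name and return convention are assumptions):
   converts a constant such as "#-10" or "x-FF" to an int.
   Returns 0 and stores the value in *out, or -1 if the text is malformed.
   Overflow is not checked in this sketch. */
int parse_constant(const char *s, int *out)
{
    int base;
    int negative = 0;
    long magnitude = 0;

    /* The first character selects the base. */
    if (s[0] == '#')
        base = 10;
    else if (s[0] == 'x' || s[0] == 'X')
        base = 16;
    else
        return -1;
    s++;

    /* An optional minus sign comes immediately after the 'x' or '#'. */
    if (*s == '-') {
        negative = 1;
        s++;
    }

    /* At least one digit is required. */
    if (*s == '\0')
        return -1;

    /* The remaining characters are treated as an unsigned number. */
    for (; *s != '\0'; s++) {
        int c = (unsigned char)*s;
        int digit;

        if (isdigit(c))
            digit = c - '0';
        else if (base == 16 && isxdigit(c))
            digit = toupper(c) - 'A' + 10;
        else
            return -1;
        magnitude = magnitude * base + digit;
    }

    *out = (int)(negative ? -magnitude : magnitude);
    return 0;
}
```

With this sketch, parse_constant("x-FF", &v) stores -255 in v, and parse_constant("#1F", &v) is rejected because ‘F’ is not a decimal digit.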
Your assembler does not have to check for multiple .ORIG pseudo-ops.
Since the .END pseudo-op is used to designate the end of the assembly
language file, your assembler does not need to process anything that comes
after the .END.
The trap vector for a TRAP instruction and the shift amount for SHF
instructions must be non-negative values. If they are not, you should return
error code 3.
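One possible way to express this check is shown below. The macro name ERR_INVALID_CONSTANT and the function name check_non_negative_operand are hypothetical; only the value 3 comes from the text above. Any upper bound on these operands is outside what this note states, so the sketch tests the sign only.

```c
/* Hypothetical name; the value 3 is the error code given in the text. */
#define ERR_INVALID_CONSTANT 3

/* Hypothetical helper: returns 0 if a trap vector or shift amount is
   acceptable with respect to sign, or error code 3 if it is negative.
   Upper-bound checks are intentionally left out of this sketch. */
int check_non_negative_operand(int value)
{
    return (value < 0) ? ERR_INVALID_CONSTANT : 0;
}
```

A caller could pass the parsed operand to this function and exit with the returned code whenever it is nonzero.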
The same label should not appear in the symbol table more than once. During pass 1 of the assembly process, you should check to make sure a label is not already in the symbol table before adding it to the symbol table. If the label is already in the symbol table, you should return error code 4.
An invalid label (i.e., one that contains non-alphanumeric characters, or one that starts with the letter ‘x’ or a number) is another example of error code 4.
The standard C function isalnum() can be used to check if a character is alphanumeric.
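The label rules above could be checked along these lines. The function name is_valid_label is made up for this example, and rejecting labels longer than MAX_LABEL_LEN (or empty ones) is an assumption based on the symbol table definitions rather than something stated here. The text names only the lowercase letter ‘x’, so that is all this sketch tests for.

```c
#include <ctype.h>
#include <string.h>

#define MAX_LABEL_LEN 20

/* Hypothetical helper: returns 1 if the label looks valid, 0 otherwise.
   The length check is an assumption; the other rules come from the text. */
int is_valid_label(const char *label)
{
    size_t i;
    size_t len = strlen(label);

    if (len == 0 || len > MAX_LABEL_LEN)
        return 0;

    /* A label may not start with the letter 'x' or with a number. */
    if (label[0] == 'x' || isdigit((unsigned char)label[0]))
        return 0;

    /* Every character must be alphanumeric. */
    for (i = 0; i < len; i++) {
        if (!isalnum((unsigned char)label[i]))
            return 0;
    }
    return 1;
}
```

When is_valid_label() returns 0, the assembler would exit with error code 4. The cast to unsigned char keeps the isalnum() and isdigit() calls well defined for characters with negative values.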
After you have gone through the input file for pass 1 of the assembler and
your file pointer is at the end of the file, there are two ways you can get the
file pointer back to the beginning. You can either close and reopen the file or
you can use the standard C I/O function rewind().
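A small sketch of the rewind() approach follows. The function count_lines is a hypothetical stand-in for the real work done in each pass; it only shows that a second read starts from the beginning again once rewind() has been called.

```c
#include <stdio.h>

/* Hypothetical stand-in for one pass over the input: counts newline
   characters from the current file position to the end of the file. */
int count_lines(FILE *fp)
{
    int c;
    int lines = 0;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    return lines;
}
```

Calling count_lines(fp) a second time without rewinding returns 0 because the file pointer is already at the end of the file. Calling rewind(fp) between the two calls lets the second pass see the whole file again.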
The following definitions can be used to create your symbol table:
    #define MAX_LABEL_LEN 20
    #define MAX_SYMBOLS 255

    typedef struct {
        int address;
        char label[MAX_LABEL_LEN + 1]; /* Question for the reader: Why do we need to add 1? */
    } TableEntry;

    TableEntry symbolTable[MAX_SYMBOLS];
To check if two strings are the same, you can use the standard C string function
strcmp(). To copy one string to another, you can use the standard C string function
strcpy().
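Putting the symbol table definitions together with strcmp() and strcpy(), the duplicate-label check from pass 1 might look like the sketch below. The names symbolCount, find_symbol, and add_symbol are hypothetical, as is the -1 return value for a full table or an over-long label; only error code 4 for a duplicate label comes from this document.

```c
#include <string.h>

#define MAX_LABEL_LEN 20
#define MAX_SYMBOLS 255

typedef struct {
    int address;
    char label[MAX_LABEL_LEN + 1];
} TableEntry;

TableEntry symbolTable[MAX_SYMBOLS];
int symbolCount = 0; /* hypothetical counter of entries in use */

/* Hypothetical helper: returns the index of label, or -1 if absent. */
int find_symbol(const char *label)
{
    int i;

    for (i = 0; i < symbolCount; i++) {
        if (strcmp(symbolTable[i].label, label) == 0)
            return i;
    }
    return -1;
}

/* Hypothetical helper: returns 0 on success, 4 if the label is already
   in the table, or -1 (an assumption) if it cannot be stored at all. */
int add_symbol(const char *label, int address)
{
    if (find_symbol(label) != -1)
        return 4;
    if (symbolCount >= MAX_SYMBOLS || strlen(label) > MAX_LABEL_LEN)
        return -1;

    /* The length check above guarantees that strcpy() fits the buffer. */
    strcpy(symbolTable[symbolCount].label, label);
    symbolTable[symbolCount].address = address;
    symbolCount++;
    return 0;
}
```

A linear search is enough here because the table holds at most MAX_SYMBOLS entries.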
If you decide to use any of the math functions in math.h, you also have to link the math library by using the command gcc -lm -ansi -o assemble assembler.c.
When your assembler finds an error in the input assembly language program, it is not required that you print out an error message to the screen. If you choose to do this to make debugging easier, that is fine. What is required is that you exit with the appropriate error code. This is what we will be checking for when we grade your program; we will ignore anything that is printed to the screen.
An assembly program that starts with comments before .ORIG is valid, and your assembler should ignore those comments. You can assume that there will be no label in front of .ORIG or .END on the same line.
You can safely assume that we will test your assembler with assembly programs that lie within the valid address range, i.e., x3000 to xFDFF. (updated - 02/06/09)