Department of Electrical and Computer Engineering

The University of Texas at Austin

EE 360N, Fall 2004
Lab Assignment 1
Due: Sunday, September 12, 11:59 pm
Yale N. Patt, Instructor
Aater Suleman, Huzefa Sanjeliwala, and Dam Sunwoo, TAs

The purpose of this lab is to reinforce the concepts of assembly language and assemblers. In this lab assignment, you will write an LC-3b Assembler, whose job will be to translate assembly language source code into the machine language (ISA) of the LC-3b. You will also write a program to solve a problem in the LC-3b Assembly Language.

In Lab Assignment 2, you will close the loop by completing the design of a simulator for the LC-3b, and test your assembler by having the simulator execute the program you wrote and assembled in this lab.

Part I: Write an assembler for the LC-3b Assembly Language

The general format of a line of assembly code which will be the input to your assembler is as follows:
        label opcode operands ; comments
The leftmost field on a line will be the label field. A valid label consists of one to 20 alphanumeric characters (i.e., capital or lowercase letters of the alphabet, or decimal digits), starting with a letter of the alphabet other than 'x'. A valid label cannot be the same as an opcode or a pseudo-op. The label is optional, i.e., the label field can be empty on some lines. Labels are necessary if the program is to branch to that instruction or if the location contains data that is to be addressed explicitly.
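The label rules above can be checked with a small routine. The following is a minimal sketch, not a required design: the names is_opcode_name and is_valid_label are hypothetical, the opcode list adds the BR condition-code variants to the list given in this handout, and matching is case-sensitive (an assumption; fold case first if your assembler accepts lowercase mnemonics).

```c
#include <ctype.h>
#include <string.h>

#define MAX_LABEL_LEN 20

/* Hypothetical helper: returns 1 if s matches an opcode mnemonic, else 0.
   Matching is case-sensitive (an assumption of this sketch). */
static int is_opcode_name(const char *s)
{
    static const char *names[] = {
        "ADD", "AND", "BR", "BRN", "BRZ", "BRP", "BRNZ", "BRNP", "BRZP",
        "BRNZP", "HALT", "JMP", "JSR", "JSRR", "LDB", "LDW", "LEA", "NOP",
        "NOT", "RET", "LSHF", "RSHFL", "RSHFA", "RTI", "STB", "STW",
        "TRAP", "XOR"
    };
    size_t i;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (strcmp(s, names[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if s satisfies the label rules, else 0. Pseudo-ops begin with
   '.', so they already fail the first-character test. */
int is_valid_label(const char *s)
{
    size_t len = strlen(s);
    size_t i;

    if (len < 1 || len > MAX_LABEL_LEN) {
        return 0;
    }
    /* First character: a letter, but not 'x' (reserved for hex constants). */
    if (!isalpha((unsigned char)s[0]) || s[0] == 'x') {
        return 0;
    }
    for (i = 1; i < len; i++) {
        if (!isalnum((unsigned char)s[i])) {
            return 0;
        }
    }
    return !is_opcode_name(s);
}
```

With this sketch, START and DATA1 are accepted, while ADD, xyz, 1abc, and any name longer than 20 characters are rejected.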

The opcode field can be any one of the following instructions:

ADD, AND, BR, HALT, JMP, JSR, JSRR, LDB, LDW, LEA, NOP, 
NOT, RET, LSHF, RSHFL, RSHFA, RTI, STB, STW, TRAP, XOR
The number of operands depends on the operation being performed. Operands can be register names, labels, or constants. A hexadecimal constant must be prefixed with the 'x' character; similarly, a decimal constant must be prefixed with the '#' character.
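One way to read such constants is sketched below. The function name parse_constant is hypothetical, and the sketch assumes that a minus sign, if present, follows the prefix directly (as in #-1 and x-10 in the examples in this handout). It does not check the range of the value; that depends on the instruction field the constant goes into.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: parses "#123", "#-5", "x1F", or "x-10".
   Returns 1 and stores the value in *out on success; returns 0 if the
   text is not a well-formed constant. Values too large for a long are
   clamped by strtol, so a later range check will still reject them. */
int parse_constant(const char *s, long *out)
{
    int base;
    int negative = 0;
    const char *p;
    long value;

    if (s[0] == '#') {
        base = 10;
    } else if (s[0] == 'x') {
        base = 16;
    } else {
        return 0;
    }
    s++;
    if (*s == '-') {
        negative = 1;
        s++;
    }
    if (*s == '\0') {
        return 0;                       /* prefix with no digits */
    }
    for (p = s; *p != '\0'; p++) {
        int ok = (base == 10) ? isdigit((unsigned char)*p)
                              : isxdigit((unsigned char)*p);
        if (!ok) {
            return 0;                   /* stray character in the constant */
        }
    }
    value = strtol(s, NULL, base);
    *out = negative ? -value : value;
    return 1;
}
```

For example, x25 parses to 37, x-10 to -16, and #4096 to 4096, while a bare 25 (no prefix) is rejected.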

Optionally, an instruction can be commented -- which is good style if the comment contains meaningful information. Comments follow the semicolon and are not interpreted by the Assembler. Note that the semicolon prefaces the comment, and a newline ends the comment. Other delimiters should not be used.

In this lab assignment, the NOP instruction translates into the machine language instruction 0x0000.

Note that you should also implement the HALT instruction as TRAP x25. Other TRAP commands (GETC, IN, OUT, PUTS) need not be recognized by your assembler for this assignment.

In addition to LC-3b instructions, an assembly language also contains pseudo-ops, sometimes called macro directives. These are messages from the programmer to the assembler that assist the assembler in performing the translation process. In the case of our LC-3b Assembly Language, we will only require three pseudo-ops to make our lives easier: .ORIG, .END, and .FILL.

An assembly language program will consist of some number of assembly language instructions, delimited by .ORIG and .END. The pseudo-op .END is a message to the assembler designating the end of the assembly language source program. The .ORIG pseudo-op provides two functions: it designates the start of the source program, and it specifies the location of the first instruction in the object module to be produced. For example, ".ORIG n" means "the next instruction will be assigned to location n." The pseudo-op ".FILL w" assigns the value w to the corresponding location in the object module. w is regarded as a word (16-bit value) by the .FILL pseudo-op.

The task of the assembler is that of line-by-line translation. The input is an assembly language file, and the output is an object (ISA) file (consisting of hexadecimal digits). To make it a little more concrete, here is a sample assembly language program:

;This program counts from 10 to 0 
        .ORIG x3000       
        LEA R0, TEN       ;This instruction will be loaded into memory location x3000
        LDW R1, R0, #0 
START   ADD R1, R1, #-1 
        BRZ DONE 
        BR START 
                          ;blank line 
DONE    TRAP x25          ;The last executable instruction 
TEN     .FILL x000A       ;This is 10 in 2's comp, hexadecimal 
        .END              ;The pseudo-op, delimiting the source program 
And its corresponding ISA program:
       0x3000
       0xE005
       0x6200 
       0x127F 
       0x0401 
       0x0FFD 
       0xF025 
       0x000A
Note that each line of the output is a four-digit hex number, prefixed with '0x', representing the 16-bit machine instruction. If each instruction in the output is not prefixed with '0x', it will not be recognized by the simulator which you will write in Lab Assignment 2. Also note that the BR instruction is assembled as the unconditional branch, BRnzp.
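A minimal sketch of producing one output line in the format shown in the sample output follows. The helper names format_word and write_word are hypothetical; uppercase hex digits are used only because the sample output uses them.

```c
#include <stdio.h>

/* Hypothetical helper: writes the 16-bit word as "0x" followed by four
   uppercase hex digits. buf must have room for at least 7 characters. */
void format_word(char *buf, unsigned int word)
{
    sprintf(buf, "0x%04X", word & 0xFFFFu);
}

/* Hypothetical helper: writes one line of the object file. */
void write_word(FILE *out, unsigned int word)
{
    char buf[7];

    format_word(buf, word);
    fprintf(out, "%s\n", buf);
}
```

For example, the word 0x000A is written as 0x000A rather than 0xA, which keeps every line the same width.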

When this program is loaded into the simulator, the instruction 0xE005 will be loaded into the memory location specified by the first line of the program, which is 0x3000. As instructions consist of two bytes, the second instruction, 0x6200, will be loaded into memory location 0x3002. Thus, the words at memory locations 0x3000 through 0x300C will contain the program.
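These addresses also account for the offset fields in the sample output. In the LC-3b ISA, a PC-relative offset counts words from the incremented PC (the address of the instruction plus 2). TEN is at 0x300C and the LEA is at 0x3000, so the offset is (0x300C - 0x3002) / 2 = 5, giving 0xE005. The sketch below shows that arithmetic for the 9-bit field used by BR and LEA; the function name pc_offset9 is hypothetical.

```c
/* Hypothetical helper: computes the 9-bit PC-relative offset field for an
   instruction at instr_addr that refers to label_addr. Returns 1 and stores
   the field (already masked to 9 bits) in *field; returns 0 if the target
   is not word-aligned relative to the PC or is out of range. */
int pc_offset9(unsigned int instr_addr, unsigned int label_addr,
               unsigned int *field)
{
    long diff = (long)label_addr - ((long)instr_addr + 2);
    long words;

    if (diff % 2 != 0) {
        return 0;                       /* offsets are counted in words */
    }
    words = diff / 2;
    if (words < -256 || words > 255) {
        return 0;                       /* does not fit in 9 signed bits */
    }
    *field = (unsigned int)((unsigned long)words & 0x1FFUL);
    return 1;
}
```

For BR START at 0x3008 with START at 0x3004, the offset is -3, the field is 0x1FD, and the assembled word is 0x0E00 | 0x1FD = 0x0FFD, matching the sample output.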

We have included below another example of an assembly language program, and the result of the assembly process. In this case, the .ORIG pseudo-op tells the assembler to place the program at memory address #4096.

        .ORIG #4096 
A       LEA R1, Y
        LDW R1, R1, #0
        LDW R1, R1, #0 
        ADD R1, R1, R1 
        ADD R1, R1, x-10  ;x-10 is the negative of x10 
        BRN A 
        HALT 
Y       .FILL #263 
        .FILL #13 
        .FILL #6 
        .END 
This program would be assembled into the following:
       0x1000 
       0xE206
       0x6240
       0x6240 
       0x1241 
       0x1270 
       0x09FA 
       0xF025 
       0x0107 
       0x000D 
       0x0006
Important note: Even though this program will assemble correctly, it may not do anything useful.

The Assembly Process, itself

Your assembler should make two passes of the input file. In the first pass, all the labels should be bound to specific memory addresses. You create a symbol table to contain those bindings. Whenever a new instruction label is encountered in the input file, it is assigned to the current memory address.
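A symbol table for the first pass can be as simple as a fixed-size array. The sketch below is one possible layout, not a required one: the type and function names are hypothetical, the sizes come from the limits stated in this handout (at most 255 labels, at most 20 characters each), and rejecting a duplicate label is an assumption, since the handout does not say how duplicates should be handled.

```c
#include <string.h>

#define MAX_SYMBOLS 255
#define MAX_LABEL_LEN 20

typedef struct {
    char name[MAX_LABEL_LEN + 1];
    unsigned int address;
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    int count;                          /* set to 0 before first use */
} SymbolTable;

/* Returns 1 and stores the address if name is in the table, else 0. */
int symtab_lookup(const SymbolTable *t, const char *name,
                  unsigned int *address)
{
    int i;

    for (i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0) {
            *address = t->entries[i].address;
            return 1;
        }
    }
    return 0;
}

/* Binds name to address. Returns 1 on success, 0 if the table is full,
   the name is too long, or the name is already bound (an assumption). */
int symtab_add(SymbolTable *t, const char *name, unsigned int address)
{
    unsigned int unused;

    if (t->count >= MAX_SYMBOLS || strlen(name) > MAX_LABEL_LEN) {
        return 0;
    }
    if (symtab_lookup(t, name, &unused)) {
        return 0;
    }
    strcpy(t->entries[t->count].name, name);
    t->entries[t->count].address = address;
    t->count++;
    return 1;
}
```

In the first pass, the current address starts at the .ORIG value and advances by 2 for each instruction or .FILL; each label found is added with the address current at that line. A lookup that fails during the second pass corresponds to the undefined-label error.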

The second pass performs the translation from assembly language to machine language, one line at a time. It is during this pass that the output file should be generated.
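As an illustration of translating a single line, the sketch below encodes the two forms of ADD from the LC-3b ISA (opcode 0001, then DR, SR1, and either SR2 or a mode bit followed by imm5). The function names are hypothetical, and the caller is assumed to have already validated the register numbers and the immediate range.

```c
/* Encodes ADD DR, SR1, SR2. Register numbers are assumed to be 0..7. */
unsigned int encode_add_reg(unsigned int dr, unsigned int sr1,
                            unsigned int sr2)
{
    return 0x1000u | (dr << 9) | (sr1 << 6) | sr2;
}

/* Encodes ADD DR, SR1, imm5. imm5 is assumed to be in -16..15; bit 5 is
   set to select the immediate form, and the value is masked to 5 bits. */
unsigned int encode_add_imm(unsigned int dr, unsigned int sr1, int imm5)
{
    return 0x1000u | (dr << 9) | (sr1 << 6) | 0x20u
           | ((unsigned int)imm5 & 0x1Fu);
}
```

With these helpers, ADD R1, R1, #-1 encodes to 0x127F and ADD R1, R1, R1 encodes to 0x1241, matching the two sample programs in this handout.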

You should write your program to take two command-line arguments. The first argument is the name of a file that contains a program written in LC-3b assembly language, which will be the input to your program. The second argument is the name of the file to which your program will write its output. In other words, this is the name of the file which will contain the LC-3b machine code corresponding to the input assembly language file. For example, we should be able to run your assembler with the following command-line input:


assemble <source.asm> <output.obj>
where assemble is the name of the executable file corresponding to your compiled and linked program; <source.asm> is the input assembly language file, and <output.obj> is the output file that will contain the assembled code.
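A minimal sketch of handling the two arguments is shown below. The function name open_files is hypothetical, and returning 4 for a missing argument or an unopenable file is an assumption; the handout does not assign an error code to these cases.

```c
#include <stdio.h>

/* Hypothetical helper: opens the input and output files named on the
   command line. Returns 0 on success. Returns 4 (an assumed choice) if
   the argument count is wrong or a file cannot be opened. */
int open_files(int argc, char *argv[], FILE **in, FILE **out)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <source.asm> <output.obj>\n",
                argc > 0 ? argv[0] : "assemble");
        return 4;
    }
    *in = fopen(argv[1], "r");
    if (*in == NULL) {
        fprintf(stderr, "cannot open %s\n", argv[1]);
        return 4;
    }
    *out = fopen(argv[2], "w");
    if (*out == NULL) {
        fprintf(stderr, "cannot open %s\n", argv[2]);
        fclose(*in);
        return 4;
    }
    return 0;
}
```

A main function would call open_files(argc, argv, &in, &out) first and pass any nonzero result to exit.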

You will need to include some basic error checking within your assembler to handle improperly constructed assembly language programs. Your assembler must detect three types of errors and must return three types of error codes. The errors to be detected are undefined labels (error code 1), invalid instructions (error code 2), and invalid constants (error code 3). An invalid constant is a constant that is too large to be assembled into an LC-3b instruction. If the .ORIG pseudo-op contains an address that is greater than an address that can be represented in the 16-bit address space, your program should return error code 3. Also, if the .ORIG statement specifies an address that is not word-aligned, your program should return error code 3. Your program must return the error codes via the exit(n) function, where n denotes the error code number. If the assembly language program does not contain any errors, you must exit with error code 0. Exiting with the correct codes is very important since they will be used in the grading process.
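The constant checks described above reduce to simple range tests. The sketch below uses hypothetical names throughout; treating a negative .ORIG address as error code 3 is an assumption, since the handout only mentions addresses that are too large or not word-aligned.

```c
/* Exit codes from this handout; the enumerator names are hypothetical. */
enum {
    ERR_NONE = 0,
    ERR_UNDEFINED_LABEL = 1,
    ERR_INVALID_OPCODE = 2,
    ERR_INVALID_CONSTANT = 3,
    ERR_OTHER = 4
};

/* Returns 1 if value fits in a two's complement field of the given
   width, e.g. bits = 5 for the imm5 field of ADD (-16..15). */
int fits_signed(long value, int bits)
{
    long max = (1L << (bits - 1)) - 1;
    long min = -max - 1;

    return value >= min && value <= max;
}

/* Returns ERR_NONE if address is a legal .ORIG operand, else
   ERR_INVALID_CONSTANT. Negative addresses are rejected (an assumption). */
int check_orig(long address)
{
    if (address < 0 || address > 0xFFFFL) {
        return ERR_INVALID_CONSTANT;    /* outside the 16-bit address space */
    }
    if (address % 2 != 0) {
        return ERR_INVALID_CONSTANT;    /* not word-aligned */
    }
    return ERR_NONE;
}
```

For example, fits_signed(20, 5) is 0, so ADD R0, R1, #20 would lead to exit(3), and check_orig(0x1001) returns 3 because the address is odd.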

This error checking is the bare minimum that we expect. You can return error code 4 for any other errors you find. Just be sure that the errors don't fall within the first three categories specified above.

Examples of error codes

1) Error code 1 : undefined label

A label is used by an instruction but the label is not in the symbol table.
 
  e.g.) 
	.ORIG x3000
	 LEA R0, DATA1
	.END 

       DATA1 is not defined in the assembly code. 
2) Error code 2 : invalid instruction

An invalid instruction is one that is not defined in the LC-3b ISA.
  
  e.g.)  
	.ORIG x1000
	 MUL R0, R1, R2 
	.END

  or 

	.ORIG x1000
	 ABC 
	.END 
3) Error code 3 : invalid constant

An invalid constant is a constant that is too large to be assembled into an LC-3b instruction. An odd constant that follows .ORIG is also an invalid constant.
    
  e.g.)
	.ORIG x1000
	 ADD R0, R1, #20  ; error 
	.END 

   or 

	.ORIG x1001       ; error 
	ADD R0, R1, #1
	.END
4) Error code 4 : errors which do not belong to any of the above categories.
  e.g.)
	.ORIG x1000
	 ADD R0, R1 
	.END 
  or 
	.ORIG x1000
	.FILL
	.END 
  or 
	.ORIG x1000
	 NOT R1, R2, R3
	.END

Note: Your assembler needs to recognize only labels as operands for LEA, BR, and JSR instructions. For example, if the following line is in an input assembly program, your assembler can exit with error code 4:

	LEA	R1, x100

Part II: Write a program to solve the following problem

Write a program in the LC-3b assembly language that finds the minimum of the eight 16-bit 2's complement numbers stored at memory locations x4000-x400E and stores the result in memory location x4010.

Your assembly language program must begin at memory location x3000. You will have no way of determining if your assembly language code works [yet!], but you can use it to determine if your assembler works!



Requirements of this lab assignment:

Important Note: Because we will be evaluating your code in Unix, please be sure your code compiles using gcc with the -ansi flag. This means that you need to write your code in C such that it conforms to the ANSI C standard.

To complete Lab Assignment 1, you will need to turn in the following:

  1. An adequately documented listing of your LC-3b Assembler.
  2. A source listing (LC-3b Assembly Language) of the "min8" program described above.
  3. An electronic submission of your code.

Things to watch for:

Be sure that your assembler can handle comments on any line, including lines that contain pseudo-ops and lines that contain only comments. Be careful with comments that follow a HALT, NOP, or RET instruction -- these instructions take no operands.
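One way to make comments harmless is to cut each line at the first semicolon before any other parsing. The sketch below uses the hypothetical name strip_comment and also trims trailing whitespace, so a line such as "HALT ;stop" is left with nothing after the opcode.

```c
#include <string.h>

/* Hypothetical helper: truncates line at the first ';' and then removes
   trailing spaces, tabs, and line terminators. A comment-only line
   becomes the empty string. The line is modified in place. */
void strip_comment(char *line)
{
    char *semi = strchr(line, ';');
    size_t len;

    if (semi != NULL) {
        *semi = '\0';
    }
    len = strlen(line);
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t' ||
                       line[len - 1] == '\n' || line[len - 1] == '\r')) {
        line[--len] = '\0';
    }
}
```

After this step, a line that is empty or contains only whitespace can simply be skipped in both passes.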

Your assembler should allow hexadecimal and decimal constants after both ISA instructions, like ADD, and after .FILL pseudo-ops.

You can assume that there will be at most 255 labels in an assembly program. You can also assume that the number of characters on a line will not exceed 255.