Department of Electrical and Computer Engineering
University of Texas

EE 360N, Fall 1999
Y. N. Patt, Instructor
Francis Tseng, Seema Prasad, TAs
Lab Assignment 1; Due September 10, 1999

The purpose of this lab is to acquaint you with the concepts of assembly language and assemblers. In this lab assignment, you will write a program to solve a problem in the LC-2 Assembly Language. You will also write an LC-2 Assembler, whose job will be to translate assembly language source code into the machine language (ISA) of the LC-2. In Lab Assignment 2, we will close the loop by having you a) complete the design of a simulator for the LC-2, and b) test your assembler by having it translate the program you wrote in this lab.

Part I: Write an assembler for the LC-2 Assembly Language

The general format of a line of assembly code which will be the input to your assembler program is as follows:

label opcode operands ;comments

The leftmost field on a line will be the label field. Valid labels contain a maximum of 10 characters and do not begin with a '#' or 'x'. The label is optional; i.e., the label can be empty for some lines if you wish. Labels are useful if the program is to branch to that instruction or if the location contains data that is to be addressed explicitly.

The opcode field can be any one of the following instructions:

ADD, AND, BR, JSR, JSRR, LD, LDI, LDR, LEA, NOP, NOT, RET, ST, STI, STR, TRAP

The number of operands depends on the operation being performed. The operands can be register names, labels, or constants. If a hexadecimal constant is used, it must be prefixed with the 'x' character. Similarly, decimal constants must be prefixed with a '#' character.
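As an illustration of the constant rules above, here is a minimal sketch of how a constant token might be recognized. The function name parse_constant and its return convention are our own assumptions, not a required interface; it leans on the standard strtol routine, which also handles the minus sign in constants such as x-10.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse "x3000", "x-10", "#256", or "#-1".
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_constant(const char *tok, long *out)
{
    int base;
    char *end;

    assert(tok != NULL && out != NULL);
    if (tok[0] == 'x')
        base = 16;              /* hexadecimal constant */
    else if (tok[0] == '#')
        base = 10;              /* decimal constant */
    else
        return 0;               /* no prefix: not a constant */

    if (tok[1] == '\0')
        return 0;               /* prefix with no digits */
    *out = strtol(tok + 1, &end, base);
    /* Succeed only if at least one digit was read and nothing trails it. */
    return end != tok + 1 && *end == '\0';
}
```

A token that fails this test is not necessarily an error; it may be a register name or a label, so the caller would normally try those next.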

Optionally, an instruction can be commented -- which is good style if the comment contains meaningful information. Comments follow the semicolon and are not interpreted by the Assembler. Note that the semicolon prefaces the comment, and a newline ends the comment. Other delimiters should not be used.
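One possible way to break a source line into its fields is sketched below. The name split_line and the limit MAX_TOKENS are assumptions made for this example; the sketch simply discards everything from the semicolon onward and then splits the rest on white space and commas. Deciding whether the first token is a label or an opcode is left to the caller.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 5   /* assumed: label + opcode + up to three operands */

/* Hypothetical helper: cut the comment off `line` (modified in place) and
   split what remains on spaces, tabs, commas, and line endings.
   Returns the number of tokens stored in tok[]. */
int split_line(char *line, char *tok[MAX_TOKENS])
{
    int n = 0;
    char *semi, *p;

    assert(line != NULL);
    semi = strchr(line, ';');
    if (semi != NULL)
        *semi = '\0';           /* everything after ';' is comment */

    for (p = strtok(line, " \t\r\n,"); p != NULL && n < MAX_TOKENS;
         p = strtok(NULL, " \t\r\n,"))
        tok[n++] = p;
    return n;
}
```

A line that contains only a comment, or nothing at all, yields zero tokens and can be skipped.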

In this lab assignment, you can assume that the RTI opcode does not exist and that a NOP instruction translates into the machine language instruction 0x8000.

In addition to LC-2 instructions, an assembly language also contains pseudo-ops (sometimes called macro directives). These are messages from the programmer to the assembler that assist the assembler in performing the translation process. In the case of our LC-2 Assembly Language, we will only require three pseudo-ops to make our lives easier: .ORIG, .END, and .FILL.

An assembly language program will consist of some number of assembly language instructions, delimited by .ORIG and .END. The pseudo-op .END is a message to the assembler designating the end of the assembly language source program. The .ORIG pseudo-op provides two functions: it designates the start of the source program, and it specifies the location of the first instruction in the object module to be produced.

For example, .ORIG n means "the next instruction will be assigned to location n."

The pseudo-op .FILL n assigns the value n to the corresponding location in the object module.

The task of the assembler is that of line-by-line translation. The input is an assembly language file, and the output is an ISA file (consisting of hexadecimal digits). To make it a little more concrete, here is a sample assembly language program:

        .ORIG x3000       ;This program counts from 10 to 0
        LD R1, TEN
START   ADD R1, R1, #-1
        BRZ DONE
        JMP START
                          ;blank line
DONE    TRAP x25          ;The last executable instruction
TEN     .FILL x000A       ;This is 10 in 2's comp, hexadecimal
        .END              ;The pseudo-op, delimiting the source program

and its corresponding ISA program:

0x3000
0x2205
0x127F
0x0404
0x4001
0xF025
0x000A

Note that each line of the output is a four-digit hex number, prefixed with '0x', representing the 16-bit machine instruction. If each instruction in the output is not prefixed with '0x', it will not be recognized by the simulator which you will write in Lab Assignment 2.
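A short sketch of producing a word in this format follows. The helper name format_word is hypothetical, and the use of uppercase hex digits is an assumption based on the sample output above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format one 16-bit word the way it appears in the
   object file, e.g. 0x000A. buf must hold at least 7 bytes. */
void format_word(unsigned int word, char buf[7])
{
    assert(buf != NULL);
    /* %04X pads with leading zeros so every line has four hex digits. */
    snprintf(buf, 7, "0x%04X", word & 0xFFFFu);
}
```

When writing directly to the output file, the same format string can be passed to fprintf, followed by a newline.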

When this program is loaded into the simulator, the instruction 0x2205 will be loaded into the memory location specified by the first line of the program, which is 0x3000. Thus, memory locations 0x3000 to 0x3005 will contain the program.
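To see where two of the words in the sample output come from, consider the sketch below. The field layouts in the comments are our reading of the LC-2 instruction formats (a 4-bit opcode in the high bits, then 3-bit register fields, with LD using the low nine bits of the label's address as a page offset); check them against the ISA description before relying on them. The function names are hypothetical.

```c
#include <assert.h>

/* ADD DR, SR1, #imm5 : 0001 | DR | SR1 | 1 | imm5 (assumed layout) */
unsigned int encode_add_imm(unsigned dr, unsigned sr1, int imm5)
{
    assert(dr < 8 && sr1 < 8 && imm5 >= -16 && imm5 <= 15);
    return (0x1u << 12) | (dr << 9) | (sr1 << 6) | (1u << 5)
         | ((unsigned)imm5 & 0x1Fu);   /* keep the low five bits */
}

/* LD DR, label : 0010 | DR | low nine bits of the label's address
   (assumed layout) */
unsigned int encode_ld(unsigned dr, unsigned label_addr)
{
    assert(dr < 8);
    return (0x2u << 12) | (dr << 9) | (label_addr & 0x1FFu);
}
```

With these layouts, ADD R1, R1, #-1 encodes to 0x127F, and LD R1, TEN encodes to 0x2205 once TEN is bound to address 0x3005, matching the sample output.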

We have included below another example of an assembly language program, and the result of the assembly process. In this case, we assume that the .ORIG pseudo-op allows assembly to start at arbitrary memory addresses (e.g., 256).

        .ORIG #256
A       LDI R1, Y
        ADD R1, R1, R1
        ADD R1, R1, x-10 ;x-10 is the negative of x10
        BRN A
        HALT
Y       .FILL #263
        .FILL #13
        .FILL #6
        .END

would be assembled into the following:

0x0100
0xA305
0x1241
0x1270
0x0900
0xF025
0x0107
0x000D
0x0006

Note: Even though this program will assemble correctly, it may not do anything useful.

Next, the assembly process itself:

Your assembler should make two passes of the input file. In the first pass, all the labels should be bound to specific memory addresses. You create a symbol table to contain those bindings. Whenever a new instruction label is encountered in the input file, it is assigned to the current memory address.

The second pass performs the translation from assembly language to machine language, one line at a time. It is during this pass that the output file should be generated.
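The symbol table that connects the two passes can be quite simple. The sketch below is one possibility, not a required design: the names sym_add and sym_lookup, the fixed capacity of 256 entries, and the decision to reject a label defined twice are all assumptions made for the example. The 10-character limit comes from the label rules given earlier.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS   256   /* assumed capacity */
#define MAX_LABEL_LEN 10    /* from the label rules */

struct symbol {
    char name[MAX_LABEL_LEN + 1];
    int  addr;
};

static struct symbol table[MAX_SYMBOLS];
static int n_symbols = 0;

/* Pass 1: bind `name` to `addr`. Returns 0 on success, -1 if the label is
   too long, already defined, or the table is full. */
int sym_add(const char *name, int addr)
{
    int i;

    assert(name != NULL);
    if (strlen(name) > MAX_LABEL_LEN || n_symbols == MAX_SYMBOLS)
        return -1;
    for (i = 0; i < n_symbols; i++)
        if (strcmp(table[i].name, name) == 0)
            return -1;
    strcpy(table[n_symbols].name, name);
    table[n_symbols].addr = addr;
    n_symbols++;
    return 0;
}

/* Pass 2: return the address bound to `name`, or -1 if it is undefined. */
int sym_lookup(const char *name)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < n_symbols; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].addr;
    return -1;
}
```

For the first sample program, pass 1 would call sym_add for START, DONE, and TEN with addresses 0x3001, 0x3004, and 0x3005. In pass 2, a lookup that returns -1 signals a label that was never defined.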

So that we can test your assembler more easily, you should write it such that it is run using the command:

a.out <source> <object>

where <source> is the input assembly language file, and <object> is the output file that will contain the assembled code.

You will need to include some basic error checking within your assembler to handle improperly constructed assembly language programs. Your assembler must detect three types of errors and must return three types of error codes. The errors to be detected are undefined labels (error code 1), invalid instructions (error code 2), and invalid constants (error code 3). An invalid constant is a constant that is too large to be assembled into an LC-2 instruction. If the .ORIG pseudo-op contains an address that is greater than an address that can be represented in the 16-bit address space, your program should return error code 3. Your program must return the error codes via the exit(n) function. If the assembly language program does not contain any errors, you must exit with error code 0.

This error checking is the bare minimum that we expect. You can return error code 4 for any other errors you find. Just be sure that the errors don't fall within the first three categories specified above.
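The error codes and the range checks behind error code 3 might be organized along the following lines. This is only a sketch: the enum names and the helpers fits_signed and check_orig are hypothetical, and treating a negative .ORIG address as an invalid constant is our assumption, since the text above only mentions addresses that are too large.

```c
#include <assert.h>

/* Exit codes from the assignment text. */
enum {
    ERR_NONE = 0,
    ERR_UNDEFINED_LABEL = 1,
    ERR_INVALID_INSTRUCTION = 2,
    ERR_INVALID_CONSTANT = 3,
    ERR_OTHER = 4
};

/* Does `value` fit in a signed two's-complement field of `bits` bits? */
int fits_signed(long value, int bits)
{
    long lo, hi;

    assert(bits > 0 && bits < 32);
    lo = -(1L << (bits - 1));
    hi = (1L << (bits - 1)) - 1;
    return value >= lo && value <= hi;
}

/* Returns ERR_NONE if `addr` fits in the 16-bit address space,
   ERR_INVALID_CONSTANT otherwise (negative addresses rejected by assumption). */
int check_orig(long addr)
{
    return (addr >= 0 && addr <= 0xFFFFL) ? ERR_NONE : ERR_INVALID_CONSTANT;
}
```

For example, an immediate operand that must fit in five bits would be checked with fits_signed(value, 5); on failure the assembler would call exit(ERR_INVALID_CONSTANT).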

Part II: Write a program to solve the following problem

The LC-2 has no bit-shift instruction. Shifting left is not a great problem -- adding a number to itself produces that effect. Shifting right by one bit is not so easy to do. So, your programming assignment is to write a program in LC-2 Assembly Language to shift the contents of the memory location x4000 one bit to the right. Your assembly program must begin at memory location x3000. You will have no way of determining whether your assembly language code works [yet!], but you can use it to determine whether your assembler works!

Requirements of this lab assignment:

Note: Because we will be evaluating your code in unix, please be sure your code compiles with gcc.

To complete Lab Assignment 1, you will need to turn in the following:

1. An adequately documented listing of your LC-2 Assembler.

2. A source listing (LC-2 Assembly Language) of the shift program described above.

3. The ISA (machine language) listing of the shift program, as output by the assembler.

4. You will also submit your code electronically. More details on this will be given later.

Things to watch for:

Be sure that your assembler can handle comments on any line, including lines that contain pseudo-ops and lines that contain only comments. Be careful with comments that follow a HALT or RET instruction -- these instructions take no operand.

Your assembler should allow hexadecimal and decimal constants after both ISA instructions, like ADD, and after .FILL pseudo-ops.