bigendian, littleendian

A student writes:

	Dear Dr. Patt,

	I have a doubt in 360M. When I was thinking of your telling about the bit that

You mean 360N, ...I hope.   360M is a very different course.

	identifies the big endian/little endian of the MIPS architecture in the class,
	I could not imagine a purpose for an ISA to have both endianness.

	is it like the ISA needs to support both kinds of microarchitecure
	implementations (lil and big endian)?

	or a microarchitecture that implements this ISA should implement both the
	endianess.

	or is it the like the ISA gives support to different kinds of applications,
	which gets efficiently executed on different endianess?.

	I dont get the use of the MIPS ISA incorporating both. I will be happy if this
	can be clarified a little.

	thanks for taking time and reading,

	<<name withheld to protect the student who wants to understand why 
	one microarchitecture, two endians>>

I think I will start with what both endians give you, and then deal with each of your 
suggestions.

First, an example to show how endian works.  Suppose the "natural word length" of 
the ISA is 32 bits, that is, the ALU processes 32 bits per cycle and the Registers 
are 32 bits wide.  And suppose memory is byte addressable.

Suppose the contents of memory are      A:   11111111
                                        A+1: 00000000
                                        A+2: 11110000
                                        A+3: 00001111

And, you have code 

	LEA  R1,A
	LDW  R2,R1,#0

which expects to load the four bytes of data starting at A into R2.  

If the ISA is bigendian, when you are done, 
R2 contains 11111111000000001111000000001111.

If the ISA is littleendian, when you are done, 
R2 contains 00001111111100000000000011111111.
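
To try this on a real machine, here is a minimal C sketch.  The names example_mem, 
BIG_ENDIAN_RESULT, LITTLE_ENDIAN_RESULT, and host_load_word are mine, made up for 
illustration.  The sketch assumes the host is either purely bigendian or purely 
littleendian, and it copies the four bytes into a 32 bit variable the way that 
machine's own word load would.

```c
#include <stdint.h>
#include <string.h>

/* The four bytes from the example, at addresses A, A+1, A+2, A+3. */
const uint8_t example_mem[4] = { 0xFF, 0x00, 0xF0, 0x0F };

/* Word obtained when the byte at A is the most significant (bigendian). */
#define BIG_ENDIAN_RESULT    0xFF00F00Fu
/* Word obtained when the byte at A is the least significant (littleendian). */
#define LITTLE_ENDIAN_RESULT 0x0FF000FFu

/* Load four bytes as one word, using the host machine's own byte order. */
uint32_t host_load_word(const uint8_t *addr)
{
    uint32_t r2;
    memcpy(&r2, addr, sizeof r2);   /* same bytes, host-defined ordering */
    return r2;
}
```

Calling host_load_word(example_mem) should return one of the two constants, and 
which one you get tells you the endianness of the machine you ran it on.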

Why would I want to support both, you might ask.  What you really want is the correct
32 bit value in R2.  The problem comes from the fact that whatever generated that value 
stored it in memory, and since memory is byte addressable, the 32 bits cannot be stored
in one memory location.  Whatever generated that data was forced to store it in four 
memory locations.  This value is part of a data set, generated by some computer.  That 
computer filled up A, A+1, A+2, A+3 with the 32 bit value based on the endianness of 
that computer.  

If the endianness of that computer is the same as the endianness of our computer, 
there is no problem.  But what if the endianness is different?  Then bits from those
four locations get loaded into the wrong fields of R2.

We could write a software routine starting with LDB to load the byte from A into a 
register, then shift it left 24 bits into its correct spot, then LDB the byte from A+1, 
shift it left 16 bits, then LDB the byte from A+2, ... and finally OR the four pieces
together.  It is much easier if the ISA provides a single bit which the 
microarchitecture uses to directly load the four bytes into the correct bit fields.
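
The software routine can be sketched in C as follows.  This is only an illustration, 
and the function names are hypothetical.  Each a[i] plays the role of one LDB, followed 
by a shift into position and an OR into the result.

```c
#include <stdint.h>

/* Bigendian interpretation: the byte at A is the most significant byte. */
uint32_t load_word_bigendian(const uint8_t *a)
{
    uint32_t r = 0;
    r |= (uint32_t)a[0] << 24;   /* LDB from A,   shift left 24 */
    r |= (uint32_t)a[1] << 16;   /* LDB from A+1, shift left 16 */
    r |= (uint32_t)a[2] << 8;    /* LDB from A+2, shift left 8  */
    r |= (uint32_t)a[3];         /* LDB from A+3, no shift      */
    return r;
}

/* Littleendian interpretation: the byte at A is the least significant byte. */
uint32_t load_word_littleendian(const uint8_t *a)
{
    uint32_t r = 0;
    r |= (uint32_t)a[0];         /* LDB from A,   no shift      */
    r |= (uint32_t)a[1] << 8;    /* LDB from A+1, shift left 8  */
    r |= (uint32_t)a[2] << 16;   /* LDB from A+2, shift left 16 */
    r |= (uint32_t)a[3] << 24;   /* LDB from A+3, shift left 24 */
    return r;
}
```

Both routines give the right answer on any host, but each costs four byte loads plus 
the shifts and ORs, where a word load does the same job in a single instruction.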

The very simple hardware solution: 32 2-input muxes that source the appropriate bit
(depending on the endianness) for each bit location in R2 (in this case).  Where do
you suppose we get the select lines for these 32 2-input muxes?
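
Here is a rough C model of those 32 muxes.  It is a sketch under my own assumptions, 
not a description of any particular implementation: mux2 and load_with_endian_muxes 
are hypothetical names, swap_sel stands in for the select lines (I assume all 32 muxes 
share one select signal derived from the endianness bit), and mem_word is assumed to 
arrive with the byte from A in bits [31:24].

```c
#include <stdint.h>

/* One 2-input mux: passes in0 when sel is 0, in1 when sel is 1. */
static unsigned mux2(unsigned sel, unsigned in0, unsigned in1)
{
    return sel ? in1 : in0;
}

/*
 * mem_word: the four bytes from memory, byte at A in bits [31:24].
 * swap_sel: assumed shared select line; 0 passes the word straight
 *           through, 1 routes each bit from the mirrored byte.
 */
uint32_t load_with_endian_muxes(uint32_t mem_word, unsigned swap_sel)
{
    uint32_t r2 = 0;
    for (int i = 0; i < 32; i++) {
        int byte = i / 8;                 /* which byte of R2 bit i is in */
        int bit  = i % 8;                 /* position within that byte    */
        int j    = (3 - byte) * 8 + bit;  /* same bit in the mirrored byte */
        unsigned straight = (mem_word >> i) & 1u;
        unsigned swapped  = (mem_word >> j) & 1u;
        r2 |= (uint32_t)mux2(swap_sel, straight, swapped) << i;
    }
    return r2;
}
```

In hardware the 32 muxes all operate at once.  The loop is only a way of writing them 
down one at a time.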

Now, let's deal with your three suggestions:  First, you offered:

        is it like the ISA needs to support both kinds of microarchitecure
        implementations (lil and big endian)?

Actually, I would rather think of it as one microarchitecture with the addition
of a little bit of logic to provide this function, as described above.

        or a microarchitecture that implements this ISA should implement both the
        endianess.

Yes.

        or is it the like the ISA gives support to different kinds of applications,
        which gets efficiently executed on different endianess?.

Not really different kinds of applications.  Rather: different data sets.  That is,
data sets generated by computers some of which are bigendian and some of which are
littleendian.

Your word "efficiently" is an important one, since it is always possible to write the
software routine to move the bits of each value to be in their proper position according
to the endianness needed, as we hinted at above.

One final note: Some ISAs contain a single instruction for this purpose, which is
effectively the same as the bit in the ISA described above.  SWAP R1,R2 would result
in 
		R1[31:24] = R2[7:0]
		R1[23:16] = R2[15:8]
		R1[15:8]  = R2[23:16]
		R1[7:0]   = R2[31:24]
		
which is much more efficient than a software routine.
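
For reference, the effect of SWAP can be written out in C.  The name swap32 is 
hypothetical, and this only models the result of the instruction on a 32 bit value; 
it says nothing about how any particular ISA implements it.

```c
#include <stdint.h>

/* Returns the value SWAP would leave in R1, given the contents of R2. */
uint32_t swap32(uint32_t r2)
{
    return ((r2 & 0x000000FFu) << 24)    /* R1[31:24] = R2[7:0]   */
         | ((r2 & 0x0000FF00u) << 8)     /* R1[23:16] = R2[15:8]  */
         | ((r2 & 0x00FF0000u) >> 8)     /* R1[15:8]  = R2[23:16] */
         | ((r2 & 0xFF000000u) >> 24);   /* R1[7:0]   = R2[31:24] */
}
```

Applied to the littleendian result from the example above, swap32 gives back the 
bigendian result, and applying it twice returns the original value.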

OK?

Yale Patt