Somehow, this one got lost, and just surfaced today.

It deals with one of my students reading something that is not correct.
My answer is somewhat long-winded, since I am compelled to correct the
misconceptions of the author quoted.  If you have other things on your
plate right now, you might want to save this for later.

        Dear Dr. Patt,
        I had a question about Pipelining,
        a topic you recently taught in 360N. I
        was browsing through some material about RISC and CISC
        processors when I noticed the following:

        " Even though RISC instructions can be nearly as complex as a CISC
        instruction, they suffer no loss in speed because each RISC instruction
        takes an equal number of clock cycles and hence can be pipelined. This
        results in RISC instructions being executed in parallelized fashion.
        On the other hand, CISC instructions are usually never pipelined
        because each instruction requires different execution times."

        Could you please explain the above? How does instructions taking
        equal/different execution time affect pipelining? Did I miss something
        important in the class on pipelining?

        Thanks in advance for your reply,

        <<name withheld to protect the RISC/CISC browser>>

Thank you for the message.  You don't say what you were browsing
through, so it is hard to know where the author of this quote was
coming from.  A larger problem in this day of self-appointed experts
posting their thoughts on the web is that the unsuspecting reader does
not know whether the poster has any clue whatsoever.  If the quote came
from a popular textbook, the problem is worse.

Be that as it may, I continue:

First, what counts as a RISC instruction and what counts as a CISC
instruction is not so clear-cut.  There are a number of attributes usually
associated with RISC, but how many of them are necessary before an
instruction qualifies as RISC?  The group that coined the term RISC insisted
that the instructions be simple and that each take one cycle of execution
time.  This was later disavowed by those who felt RISC meant exposing the
lowest level of control to the compiler.  Next, it was supposed to mean
instructions that can be executed via a short pipeline.  And so on.  If 360N
were a two-semester course, I would spend a lecture on the RISC/CISC
discussion.  I will probably do so in 382N in the Spring, and you are welcome
to come and listen even if you are not taking the course.

RISC instructions all take an equal number of cycles.  FALSE.  The simplest
counterexample is the Alpha ISA.  Even the first implementation (1991) had some
instructions taking 7 cycles, floating point instructions taking 11 cycles, etc.

CISC instructions are usually never pipelined.  FALSE.  The most common
"CISC" ISA today (the x86) is heavily pipelined.  In fact, all implementations
these days are pipelined.

You ask how equal/different execution times affect pipelining.  The author
whom you quote apparently does not understand out-of-order execution.  In the
days when every implementation was an in-order execution machine, one would
like the instructions to flow through the pipeline uniformly.  If an
instruction took too many stages, some thought it would clog up the pipeline
and prevent following instructions from flowing nicely.  But they were wrong.

Even the first Alpha chip, which processed instructions in-order, had
instructions with varying execution times.  It handled this by having the first
four stages in common, followed by a quatrification (my own invention!), where the instruction
went to one of four places, depending on the nature of the instruction.
From each of the four places, four separate pipelines continued.
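
A rough sketch of that arrangement may help.  It assumes one instruction
enters the shared front end per cycle and ignores dependencies entirely.  The
class names and back-end depths are made up for illustration and are not the
real Alpha numbers.  The point is only that instructions reach the split on
consecutive cycles no matter how deep the pipeline they are headed for.

```python
FRONT_END_STAGES = 4  # the shared stages described above

# Hypothetical back-end depths per instruction class (NOT the real Alpha values).
BACK_END_DEPTH = {"int": 3, "load_store": 3, "branch": 1, "fp": 6}


def schedule(kinds):
    """Given instruction classes in program order, one entering per cycle,
    return (enter_cycle, split_cycle, finish_cycle) for each instruction.
    Dependencies and structural hazards are ignored in this toy model."""
    out = []
    for i, kind in enumerate(kinds):
        enter = i                               # in-order entry, one per cycle
        split = enter + FRONT_END_STAGES        # leaves the common stages
        finish = split + BACK_END_DEPTH[kind]   # depth depends on the class
        out.append((enter, split, finish))
    return out


SCHEDULE = schedule(["fp", "int", "branch", "int"])
for kind, times in zip(["fp", "int", "branch", "int"], SCHEDULE):
    print(kind, times)
```

In this model the deep fp instruction finishes late, but the instructions
behind it still reach the split one cycle apart.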

More important is the notion of out-of-order execution.  Recall my pipeline
diagram with two hiccups, one for the reservation stations and one for the
retirement structure.  Instructions, whether they be RISC or
CISC, whatever that distinction really is, flow through the front end in-order.
Then they sit in the reservation stations until their operands are ready.
Then they get shipped to functional units (sometimes pipelined) which take
as many cycles as necessary, and then they get shipped to the retirement
structure until they can be retired in-order.  Thus, the number of stages
an instruction requires has nothing to do with the usefulness of the pipeline.
Out-of-order execution easily accommodates instructions of varying execution
times in a pipelined machine.
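
The flow just described can be sketched as a toy model.  The assumptions are
mine: one instruction enters the reservation stations per cycle, there are
unlimited functional units, registers are renamed so only true dependencies
matter, and any number of instructions may retire in a cycle.  The
instruction names, registers, and latencies are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int  # cycles spent in the functional unit (made-up values)


def simulate(program):
    """Return {name: (start, finish, retire)} cycle numbers for each instruction."""
    last_writer = {}   # register -> index of the latest instruction writing it
    done = []          # finish cycle of each instruction, by program index
    result = {}
    prev_retire = 0
    for i, ins in enumerate(program):
        dispatch = i   # in-order front end: one instruction per cycle
        # Wait in the reservation station until every producer has finished.
        ready = max((done[last_writer[s]] for s in ins.srcs if s in last_writer),
                    default=0)
        start = max(dispatch, ready)
        finish = start + ins.latency       # takes as many cycles as necessary
        retire = max(finish, prev_retire)  # retirement stays in program order
        done.append(finish)
        last_writer[ins.dest] = i
        prev_retire = retire
        result[ins.name] = (start, finish, retire)
    return result


PROGRAM = [
    Instr("div",  "r1", ("r2", "r3"), 11),
    Instr("add1", "r4", ("r1", "r5"), 1),   # needs the div result
    Instr("add2", "r6", ("r7", "r8"), 1),   # independent of the div
    Instr("mul",  "r9", ("r6", "r7"), 7),   # needs add2 only
]
TIMES = simulate(PROGRAM)
for name, times in TIMES.items():
    print(name, times)
```

In this model add2 and mul finish executing before the long div does, yet
nothing retires ahead of the div.  The 11-cycle instruction does not clog
anything behind it that does not depend on it.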

Sorry for the long answer.  But given what you read, I felt I needed to
correct it, and I don't have time to correct it in a shorter write-up.

Good luck with the rest of the course.

Yale Patt