
Re: [oc] Re: Processor Instruction reply for Andreas



If you combine the branch with the compare, the instruction time has to be
stretched to permit the ALU to complete its result. This in and of itself
is not necessarily a bad thing to do if you have nothing else productive to
do between the compare and the branch.

    cmp reg, limit, je exitLoop

The shorter-width instruction format is written below; however, the
instruction timing has to be such that the result from the compare is ready
at the time of execution of the branch (or the branch time is stretched to
allow for completion).

    cmp reg, limit
    je    exitLoop    // test only after completion of ALU function

One of the measures taken to gain performance back is to separate the branch
from the ALU operation. Then, between the ALU operation and the branch, you
are permitted to insert an additional instruction (non-ALU if the ALU has no
pipeline), thus recovering time otherwise wasted. If the output from the ALU
is latched, then you can use an instruction that affects the ALU outputs
_for_later_branch codes.

    cmp reg, limit
    mov reg2, reg3    // do something while waiting for ALU
    je exitLoop
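
To put rough numbers on the reordering above, here is a minimal timing
sketch in Python. The two-cycle ALU latency, the one-instruction-per-cycle
issue rule, and the names ALU_LATENCY and cycles are all assumptions made up
for illustration; they do not describe any particular processor.

```python
# Toy timing model. ALU_LATENCY and the one-issue-per-cycle rule are
# assumptions for illustration only.
ALU_LATENCY = 2  # cycles from issuing cmp until its result can be tested


def cycles(program):
    """Return total cycles for a list of opcode names.

    A "je" stalls until the result of the most recent "cmp" is ready;
    every instruction then occupies one issue cycle.
    """
    clock = 0        # cycle in which the next instruction may issue
    flags_ready = 0  # first cycle in which the compare result is valid
    for op in program:
        if op == "je":
            clock = max(clock, flags_ready)  # wait for the ALU
        if op == "cmp":
            flags_ready = clock + ALU_LATENCY
        clock += 1
    return clock


unscheduled = ["mov", "cmp", "je"]  # branch waits on the ALU
scheduled = ["cmp", "mov", "je"]    # mov fills the wait

print(cycles(unscheduled), cycles(scheduled))
```

Under these assumptions the same three instructions take one cycle less
when the mov is placed between the compare and the branch.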

A second measure commonly used is the delay slot. That is, the branch does
not occur until after the instruction that follows the branch, i.e. you get
one free instruction following the not-yet-executed branch.

    cmp reg, limit
    je exitLoop
    mov reg2, reg3    // executed regardless of branch
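
The delay-slot behaviour above can be sketched as a tiny interpreter in
Python. The tuple encoding of instructions, the run function, and the extra
register reg4 are hypothetical, invented only for this sketch; it models a
single delay slot and does not try to handle a branch placed inside a delay
slot.

```python
def run(program, regs):
    """Execute a tiny program with one branch delay slot.

    Instructions are tuples (hypothetical encoding):
      ("cmp", a, b)      set the equal flag if regs[a] == regs[b]
      ("je", target)     branch to index target if the equal flag is set
      ("mov", dst, src)  copy regs[src] into regs[dst]
    """
    pc = 0
    equal = False
    pending = None  # branch target that takes effect after the delay slot
    while pc < len(program):
        op = program[pc]
        next_pc = pc + 1
        if pending is not None:
            # This instruction is the delay slot; the branch lands after it.
            next_pc, pending = pending, None
        if op[0] == "cmp":
            equal = regs[op[1]] == regs[op[2]]
        elif op[0] == "mov":
            regs[op[1]] = regs[op[2]]
        elif op[0] == "je" and equal:
            pending = op[1]
        pc = next_pc
    return regs


program = [
    ("cmp", "reg", "limit"),
    ("je", 4),                # exitLoop is index 4, just past the end
    ("mov", "reg2", "reg3"),  # delay slot: executed regardless of branch
    ("mov", "reg4", "reg3"),  # skipped when the branch is taken
]

taken = run(program, {"reg": 5, "limit": 5, "reg2": 0, "reg3": 7, "reg4": 0})
not_taken = run(program, {"reg": 1, "limit": 5, "reg2": 0, "reg3": 7, "reg4": 0})
print(taken["reg2"], taken["reg4"], not_taken["reg2"], not_taken["reg4"])
```

In both runs reg2 receives the value of reg3, because the mov in the delay
slot executes whether or not the branch is taken; only the instruction after
the slot is skipped on a taken branch.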

For some nifty ideas look at the VLIW RISC examples on the Transmeta
website.

After you lay out your first thoughts on the instruction set, go back and
think about where the delay times are (e.g. completion of the ALU). Then
think about how you might use that time to do something productive without
complicating your design too much.

Jim Dempsey
