
Re: [oc] Re: Merlin Hybrid System




----- Original Message ----- 
From: "Andreas Bombe" <bombe@informatik.tu-muenchen.de>
To: <cores@opencores.org>
Sent: Thursday, December 06, 2001 1:13 PM
Subject: [oc] Re: Merlin Hybrid System
> > ...
> 
> I realize now that Transmeta call their code translation
> "virtualization" but that's not what I meant.  I was talking about
> creating virtual machines that run on the real hardware (without
> emulation) under control of a monitor program.  That's what VMware or
> plex86 do for x86.
> 
All x86-like processors (Pentium, Athlon, etc.) convert (interpret)
x86 code into an internal format rather than executing it
natively. The Transmeta device was mentioned because it
is not hard-coded. You can add extensions to the virtualization
by means of code changes to the Transmeta Code Morphing
software. I believe AMD did just this with modified Transmeta
Crusoe code to emulate a processor under design.

> Code translation does not depend much on the instruction set design.
> Virtualization on the other hand can either work by design or only be
> accomplished by performance eating hacks (as in the x86 case).
> 
> > I've conceived of a method
> > by which using an extension of the Crusoe technology that multiple
> > processors can be used effectively and transparently to make a single
> > threaded program use multiple threads, make a single processor
> > operating system use multiple processors, and make an n-processor
> > operating system use more than n-processors.
> 
> I don't believe that.  Are you saying that you could take one single
> threaded program, let's say a raytracer, and convert it to a multi
> threaded program?  Raytracing is nicely parallelizable:  Divide the
> picture up into parts or an animation into pictures and distribute the
> load. 

I am saying that there is a way (developed by me) wherein you can
take a single-threaded program and fragment it into smaller
entities (my name for these is strands) and that these strands can
run separately and concurrently (provided you have multiple processing
elements).
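
The strandification process itself is not described in this message,
so the following is only a rough sketch of the general idea of
fragmenting code into independently runnable pieces. It assumes a
made-up instruction format of (destination, sources) tuples and
conservatively groups instructions that touch a common register; the
name partition_into_chains and the format are hypothetical.

```python
# Hypothetical illustration only: NOT the strandification process.
# Groups straight-line instructions into chains that share no registers,
# so each chain could in principle run on its own processing element.

def partition_into_chains(instructions):
    """instructions: list of (dest, sources) tuples of register names."""
    parent = list(range(len(instructions)))  # union-find forest

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    last_user = {}  # register -> index of the last instruction touching it
    for idx, (dest, sources) in enumerate(instructions):
        for reg in (dest, *sources):
            if reg in last_user:
                # Conservative: any shared register ties the two together.
                union(idx, last_user[reg])
            last_user[reg] = idx

    chains = {}
    for idx in range(len(instructions)):
        chains.setdefault(find(idx), []).append(idx)
    return sorted(chains.values())


program = [
    ("a", ()),      # 0: a = const
    ("b", ()),      # 1: b = const
    ("c", ("a",)),  # 2: c = f(a)
    ("d", ("b",)),  # 3: d = g(b)
    ("e", ("c",)),  # 4: e = h(c)
]
print(partition_into_chains(program))
```

Instructions 0, 2 and 4 end up in one chain and 1 and 3 in another;
real code would also have to account for memory accesses, branches
and exceptions, which this sketch ignores.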

The strands are quasi-dependent on the thread but are not tightly
dependent on the operating system's control of the thread.
e.g. strands can continue to run during and after the operating system
context switches the thread.

Also, the creation and running of the strands are independent of any
x86 code changes and/or operating system changes.

A multithreaded Win32 application on a single-processor system
can have portions of multiple threads running concurrently while
the O/S (Windows) is under the assumption that only one thread
is active. Let me reword that. The replacement chip, once built,
would look to the motherboard and to the O/S (and applications)
as if it were a single processor. Inside this processor the
strandification process runs and distributes the processing to
multiple processing elements. This occurs now, to a much lesser
extent, in processor pipelines, where multiple paths of a branch
can begin to execute and where FPU operations run concurrently
with integer operations (and multimedia instructions, ...).

> How are you going to automate that (finding the line/frame loop
> and taking it apart)?  Your converter would have to _understand_ the
> program, i.e. it would have to be an AI.
> 

This is a well-defined process already. There are compilers that can
take, say, a FORTRAN program and parallelize it. So the techniques
have already been proven. Before you make a quick assumption
and jump to the conclusion that I am saying that you will have to
recompile your programs - I am not saying that. I am simply saying
it is possible to produce an automated system for conversion of
code sets written for a single processor to run on multiple processors.
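
To make the loop-parallelization idea concrete, here is a minimal
sketch of the kind of transformation a parallelizing compiler applies
to a loop whose iterations are independent. The function names
(shade, render_sequential, render_parallel) are made up for
illustration, and the per-pixel work is a stand-in. It shows only the
restructuring, not a speedup measurement: Python threads will not
speed up CPU-bound work of this kind.

```python
from concurrent.futures import ThreadPoolExecutor


def shade(pixel):
    # Stand-in for per-pixel work; the result depends only on its input,
    # so iterations of the loop below are independent of one another.
    return (pixel * pixel) % 255


def render_sequential(pixels):
    # The original single-threaded loop.
    return [shade(p) for p in pixels]


def render_parallel(pixels, workers=4):
    # Split the iteration space into contiguous chunks, one per worker,
    # run the unchanged loop body on each chunk, then concatenate the
    # partial results in their original order.
    chunk = max(1, (len(pixels) + workers - 1) // workers)
    parts = [pixels[i:i + chunk] for i in range(0, len(pixels), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(render_sequential, parts))
    return [value for part in results for value in part]


pixels = list(range(1000))
assert render_parallel(pixels) == render_sequential(pixels)
```

The split is only valid because no iteration reads a value written
by another; establishing that automatically is the hard part of the
analysis.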

Note that, as with the FORTRAN program above, significant
performance gains can be attained. And, as in that example, the
best possible performance is likely not attained without making
some programming considerations, i.e. although you are not
required to make programming changes, some changes can
yield better results.

> I doubt the transparency part.  As for the effectiveness, have you
> benchmarked it?
> 

Transparency doesn't necessarily mean undetectable. You can write
code that can detect which version of the Pentium or Athlon your
code is running on. But for the most part applications run
transparently.

As for benchmarking: how can I benchmark without building it?

At the moment this is all theory. The strandification process can
be diagrammed and an emulator could be written to compute
expected results. However, it is very difficult to construct an
emulator that can mimic a Pentium running Windows NT and
a few AutoCAD programs (complete with display adapter).

Assuming money can be raised, an emulator could be built
concurrently with the work on the prototype. From my point of
view, the construction of the emulator could be done using
the Transmeta Crusoe with firmware modified for purposes
of this emulation. Thus most of the work done on constructing
the emulator could be reused in the new chipset, which would
add further processing elements.

This concept provides for a migration strategy wherein you can
start with the current, and massive, installed code base. Only
after acceptance of the technology would software writers
consider making changes to take full advantage of the newer
processor design. This somewhat follows the path of adding
new instruction sets - old code runs, new code runs faster.

Jim Dempsey

