
[oc] Re: Merlin Hybrid System



On Fri, Dec 07, 2001 at 10:22:15AM -0600, Jim Dempsey wrote:
> 
> ----- Original Message ----- 
> From: "Andreas Bombe" <bombe@informatik.tu-muenchen.de>
> To: <cores@opencores.org>
> Sent: Thursday, December 06, 2001 1:13 PM
> Subject: [oc] Re: Merlin Hybrid System
> > > ...
> > 
> > I realize now that Transmeta call their code translation
> > "virtualization" but that's not what I meant.  I was talking about
> > creating virtual machines that run on the real hardware (without
> > emulation) under control of a monitor program.  That's what VMware or
> > plex86 do for x86.
> > 
> All x86-like processors, Pentium, Athlon, etc., do not execute native
> x86 instructions. They all convert (interpret) the x86 code into an
> internal format. The Transmeta device was mentioned because it
> is not hard coded. You can add extensions to the virtualization
> by means of code changes to the Transmeta Code Morphing
> software. I believe AMD did just this with modified Transmeta
> Crusoe code to emulate a processor under design.

Sure, I know all that.  Still my point was that I was not talking about
_that_.

> > > I've conceived of a method
> > > by which, using an extension of the Crusoe technology, multiple
> > > processors can be used effectively and transparently to make a single
> > > threaded program use multiple threads, make a single processor
> > > operating system use multiple processors, and make an n-processor
> > > operating system use more than n-processors.
> > 
> > I don't believe that.  Are you saying that you could take one single
> > threaded program, let's say a raytracer, and convert it to a multi
> > threaded program?  Raytracing is nicely parallelizable:  Divide the
> > picture up into parts or an animation into pictures and distribute the
> > load. 
> 
> I am saying that there is a way (developed by me) wherein you can
> take a single threaded program and fragment it down into smaller
> entities (my name for these is strands) and that these strands can
> run separately and concurrently (provided you have multiple processing
> elements).

And you can automate finding the places where to insert spinlocks so
that all non-atomic operations are safe but still not too much is
locked?

How small are those fragments, anyway?  You do want to separate at least
at the level of large loops, don't you?
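
To make the locking question concrete, here is a minimal sketch in Python.
It is not Jim's method, which has not been shown; `run_strands` and `strand`
are names made up for illustration, and `threading.Lock` merely stands in
for a spinlock.  The point is only that a shared read-modify-write has to be
found and guarded once the work is split up:

```python
import threading

def run_strands(n_strands, increments_per_strand):
    """Run n_strands threads that each bump a shared counter under a lock."""
    counter = {"value": 0}
    lock = threading.Lock()  # stand-in for a spinlock (assumption)

    def strand():
        for _ in range(increments_per_strand):
            # The read-modify-write below is not atomic.  Without the lock,
            # two strands could read the same old value and lose an update.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=strand) for _ in range(n_strands)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

total = run_strands(4, 1000)
```

Here the guarded region is obvious because I wrote it by hand.  An automatic
splitter would have to discover every such region from machine code alone,
and locking more than necessary would serialize the strands again.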

> A multithreaded Win32 application on a single processor system
> can have portions of multiple threads running concurrently while
> the O/S (Windows) is under the assumption that only one thread
> is active. Let me re-word that. The replacement chip, once built, would
> look to the motherboard and to the O/S (and applications) as
> if it were a single processor.

You're claiming magic SMP without code support.  That I've understood.
I still won't believe it until I can see it.

>                                Inside this processor the strandification
> process runs and distributes the processing to multiple processing
> elements. This occurs now to a much lesser extent with processor
> pipelines wherein multiple paths of a branch can begin to execute
> as well as where FPU operations are concurrent with integer
> operations (and multi-media instructions, ...).

Or where integer operations can operate concurrently with other integer
operations.  There is more than one integer unit in every Pentium or
Athlon, after all (and more than one FPU unit, except for Pentium 4).

> > How are you going to automate that (finding the line/frame loop
> > and taking it apart)?  Your converter would have to _understand_ the
> > program, i.e. it would have to be an AI.
> > 
> 
> This is a well defined process already. There are compilers that can
> take, say, a FORTRAN program and parallelize it. So the techniques
> have already been proven.

A FORTRAN source program also gives a lot of information.  It also is
somewhat stricter, so that it's easier to find points to parallelize.  I
cannot speculate about the quality of the output (or the input
requirements).  For machine code that has no high level harness it's
going to be a lot harder.
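
A small illustration of what such a parallelizer has to prove before it may
split a loop (a sketch in Python; `scale`, `running_sum` and `in_chunks` are
hypothetical names, and the chunks are processed one after another here, just
as independent workers would each see only their own slice):

```python
def scale(xs, k):
    # Each iteration touches only its own element: iterations are independent.
    return [k * x for x in xs]

def running_sum(xs):
    # Iteration i needs the result of iteration i-1: a loop-carried dependency.
    out, acc = [], 0
    for x in xs:
        acc += x
        out.append(acc)
    return out

def in_chunks(fn, xs, n_chunks):
    """Apply fn to each chunk on its own and concatenate the results."""
    size = max(1, (len(xs) + n_chunks - 1) // n_chunks)
    result = []
    for start in range(0, len(xs), size):
        result.extend(fn(xs[start:start + size]))
    return result

data = [1, 2, 3, 4, 5, 6]
# Splitting the independent loop gives the same answer as the serial run.
scale_ok = in_chunks(lambda c: scale(c, 2), data, 2) == scale(data, 2)
# Splitting the dependent loop naively does not.
sum_ok = in_chunks(running_sum, data, 2) == running_sum(data)
```

With the source in hand, a compiler can see the loop bounds and which arrays
are touched.  From a stream of machine instructions, telling the first kind
of loop from the second is the hard part.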

> > I doubt the transparency part.  As for the effectiveness, have you
> > benchmarked it?
> > 
> 
> Transparency doesn't necessarily mean undetectable. You can write
> code that can detect which version of the Pentium or Athlon your
> code is running on. But for the most part applications run
> transparently.
> 
> As for benchmarking: how can I benchmark without building it?

How can you tell it's effective without benchmarking it?

-- 
Andreas Bombe <bombe@informatik.tu-muenchen.de>    DSA key 0x04880A44