
Re: [oc] Thanks For Your Support Of Merlin

To: <cores@opencores.org>
Subject: Re: [oc] Thanks For Your Support Of Merlin
From: "Jim Dempsey" <dempsey@northnet.net>
Date: Fri, 7 Dec 2001 17:00:27 -0600
References: <F863E7fOgIhE2U7yX8s000175f3@hotmail.com> <01120714191600.00890@gandalf.merlintec.com>
Reply-To: cores@opencores.org
Sender: owner-cores@opencores.org

Jecel,

Some background information (taken from a letter to Transmeta).

As for x86 code conversion experience, I designed two products for
Network-Systems Design, Inc. POP and BCCx32. The latter was an extension to
the former. These products were used in the following manner. The user's
C/C++ compiler was directed to produce assembler output using a compiler
option switch. Either Borland C++ or Microsoft C++ could be used to generate
the .ASM code. The .ASM output was then used as input to my products. The
products performed pattern analysis of the .ASM code using fuzzy pattern
matching filters. This was a capability of another product I wrote (TECO-8)
for Network-Systems Design, Inc. The pattern analysis identified non-optimal
coding sequences produced by the compilers and replaced them with optimal
code, emitted as new assembler source code. This is similar to
your Code Morphing although this worked on the .ASM source code and Code
Morphing works on the instruction stream.
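
To make the idea of a pattern filter over .ASM text concrete, here is a
minimal sketch in C. It is not the POP matcher (which used fuzzy patterns);
it is a single exact-match rule, and the function name fold_double_shift is
hypothetical. It collapses two adjacent "shl ax,1" lines into one
"shl ax,2", which assumes a 186 or later target.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical one-rule peephole pass over assembler source lines.
   Two adjacent "shl ax,1" lines are replaced by a single "shl ax,2"
   (assumes a 186 or later target). All other lines are copied as-is.
   Returns the number of lines written to out[]. */
static size_t fold_double_shift(const char *in[], size_t n, const char *out[])
{
    size_t i = 0, m = 0;

    while (i < n) {
        if (i + 1 < n &&
            strcmp(in[i], "shl ax,1") == 0 &&
            strcmp(in[i + 1], "shl ax,1") == 0) {
            out[m++] = "shl ax,2";  /* replacement sequence */
            i += 2;                 /* consume both matched lines */
        } else {
            out[m++] = in[i++];     /* no match: pass the line through */
        }
    }
    return m;
}
```

Fed the first four lines of the far* listing further down, this pass would
return three lines, with the two shifts merged into one.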

The BCCx32 extended this much further. This was a product for a very small
niche market. This product required a C++ DOS application to run as fast as
possible and with unlimited data capacity (limited by the amount of physical
memory). The traditional 32-bit Protected Mode DOS Extender was not a
solution due to the high overhead of interrupt reflections. This may get a
little too technical, but it is interesting nonetheless.

On the 386 and later processor designs there is an operational mode that you
can get into while in Real Mode (non-V86 Mode). This is where you set the
memory manager granularity to small for CS and SS and set the granularity to
large for ES, DS, FS, GS. This can only be done in "real" Real Mode (non-V86
Mode). I wasn't the first person to do this. The interesting aspect of
setting up the processor in this mode is all of the MS-DOS programs and 3rd
party applications, compilers and such would run as if you were in standard
Real Mode or in V86 "Real Mode". In addition to this you could run
applications that could take advantage of this mode. In particular, programs
could now index ES, DS, FS, GS using both 16-bit indices and full 32-bit
indices. This means that a program has both 16-bit segmented code/data
characteristics and 32-bit data characteristics.

Up until BCCx32 the other programmers in the know would access large data
arrays above the 1MB mark using function calls or inline ASM statements.
There was no clean way for programmers to transparently reference all of
memory with an unmodified C++ application. With POP (Program Optimizer) it
occurred to me that I could identify the huge pointer references in
the .ASM code and then convert these to Flat Model 32-bit instruction
sequences that would reference the data optimally. The extended-capability
product was called BCCx32. (The Borland C++ compiler was BCC.EXE.) This product
required only a trivial modification to C++ source code in declaring the
Flat Model Pointers using the "huge" keyword in the pointer declaration.

Change:

long far* array;
To:
long huge* array;

In many cases the change was even simpler than that, because a common
programming practice was to use

#define PLONG long far*

And then used PLONG everywhere else in the source code. This practice would
require only the one line change

#define PLONG long huge*

The code

PLONG array = new long[1000000];

Would generate the standard huge model memory allocation from the C++
compiler and then the post processing by BCCx32 would convert the code to
use the Flat Model allocation routines. Further any data references via this
pointer would be converted to use Flat Model instruction sequences. The code
sequences for far*, huge* and flat* are listed below for a common array index.

long value = array[index];

; far* code in assembler

mov ax,word ptr DGROUP:_index
shl ax,1
shl ax,1
les bx,dword ptr DGROUP:_array
add bx,ax
mov ax,word ptr es:[bx+2]
mov dx,word ptr es:[bx]
mov word ptr DGROUP:_value+2,ax
mov word ptr DGROUP:_value,dx

; huge* code in assembler (note, code would be different if index were long)

mov ax,word ptr DGROUP:_index
cwd
shl ax,1
rcl dx,1
shl ax,1
rcl dx,1
add ax,word ptr DGROUP:_array
adc dx,0
mov cx,offset __AHSHIFT
shl dx,cl
add dx,word ptr DGROUP:_array+2
mov bx,ax
mov es,dx
mov ax,word ptr es:[bx+2]
mov dx,word ptr es:[bx]
mov word ptr DGROUP:_value+2,ax
mov word ptr DGROUP:_value,dx

; Now for the flat pointer code output by BCCx32

movsx eax,word ptr DGROUP:_index
mov ebx,dword ptr _array
mov eax,dword ptr [ebx][eax*4]
mov dword ptr DGROUP:_value,eax
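
As a cross-check on the huge* and flat listings, the address arithmetic can
be modeled in C. This is a sketch under stated assumptions, not code from
BCCx32: the function names are hypothetical, AHSHIFT = 12 is assumed to be
the real-mode value of __AHSHIFT, and the flat case assumes the data segment
base is 0 so that the 32-bit pointer is a linear address.

```c
#include <stdint.h>

/* Real-mode linear address of a segment:offset pair. */
static uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Models the huge* sequence: the byte offset index*4 is built in dx:ax,
   the low word is added to the pointer's offset, and the high word plus
   carry is shifted and added to the pointer's segment.
   ASSUMPTION: AHSHIFT = 12 (real-mode value of __AHSHIFT). */
static uint32_t huge_element_address(uint16_t base_seg, uint16_t base_off,
                                     int16_t index)
{
    enum { AHSHIFT = 12 };
    uint32_t byte_off = (uint32_t)(int32_t)index * 4u;         /* cwd; shl/rcl x2 */
    uint32_t sum = (byte_off & 0xFFFFu) + base_off;            /* add ax,_array   */
    uint16_t ax = (uint16_t)sum;
    uint16_t dx = (uint16_t)((byte_off >> 16) + (sum >> 16));  /* adc dx,0        */

    dx = (uint16_t)(dx << AHSHIFT);                            /* shl dx,cl       */
    dx = (uint16_t)(dx + base_seg);                            /* add dx,_array+2 */
    return linear(dx, ax);                                     /* es:[bx]         */
}

/* Models the flat sequence: 32-bit base plus scaled index, as in
   [ebx][eax*4]. ASSUMPTION: segment base is 0. */
static uint32_t flat_element_address(uint32_t base, int16_t index)
{
    return base + (uint32_t)(int32_t)index * 4u;
}
```

For the same starting pointer, both functions should land on the same
linear address; the flat form simply gets there with one scaled-index
address calculation instead of the shift, carry and segment arithmetic.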

The conversion prowess of BCCx32 surprised me on several occasions, and I
wrote the conversion code. As you may or may not know, the optimization
process is somewhat iterative in nature. The code you produce on the first
pass will use the information at hand to produce a more optimal solution to
the problem. However, the new code generated may exhibit non-optimal
solutions when conjoined with other newly optimized code fragments. By
making additional pattern filters and code regeneration algorithms an
additional degree of optimization can be had. In examining some of the code
conversions, I found one case where 21 lines of code and 3 tags, including a
function call to compute an index, 3 conditional branches and 2
unconditional branches, were optimized into 6 lines of code and the 1
necessary conditional branch.
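
The iterative nature of the process can be sketched in C as a loop that
reruns a pass until nothing changes. This is a toy illustration, not the
BCCx32 filters: the rule (dropping an adjacent push/pop of the same
register) and both function names are hypothetical, chosen only because one
application of the rule can expose another.

```c
#include <stddef.h>
#include <string.h>

/* One naive pass (hypothetical rule): drop an adjacent "push R" / "pop R"
   pair that names the same register R. Rewrites lines[] in place and
   returns the new line count. */
static size_t drop_push_pop_pairs(const char *lines[], size_t n)
{
    size_t m = 0;

    for (size_t i = 0; i < n; ) {
        if (i + 1 < n &&
            strncmp(lines[i], "push ", 5) == 0 &&
            strncmp(lines[i + 1], "pop ", 4) == 0 &&
            strcmp(lines[i] + 5, lines[i + 1] + 4) == 0) {
            i += 2;                  /* remove the redundant pair */
        } else {
            lines[m++] = lines[i++]; /* keep the line */
        }
    }
    return m;
}

/* Rerun the pass until it makes no change. Returns the final line count
   and stores the number of passes executed in *passes. */
static size_t optimize_to_fixed_point(const char *lines[], size_t n,
                                      int *passes)
{
    *passes = 0;
    for (;;) {
        size_t m = drop_push_pop_pairs(lines, n);
        ++*passes;
        if (m == n)
            return n;                /* nothing changed: done */
        n = m;
    }
}
```

With a nested sequence such as push ax, push bx, pop bx, pop ax, the first
pass removes only the inner pair; the outer pair becomes adjacent, and
therefore matchable, only on the next pass.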

if (XMStable[UnusedIndex].u.Memory.size >= size) {

Became

movzx eax,si
imul eax,10
mov ebx,dword ptr DGROUP:_XMStable
mov eax,dword ptr [ebx+6][eax]
cmp eax,dword ptr [bp+6]
jnae @17@562

The point to be made in discussing BCCx32 is that very similar problems
exist between the code optimization and conversion challenge from C++ huge
model to C++ Flat Model and the challenge of code optimization
and conversion from x86 code to your RISC code, which you call Code Morphing.
Both challenges rely heavily on pattern recognition and on knowing what you
are and are not permitted to do in the conversion process.
-------------------

Jim Dempsey