



                Producer: Smalltalk-80 to Objective-C Translator


                                  Brad J. Cox
                      Productivity Products International
                                  75 Glen Road
                              Sandy Hook, CT 06482
                                (203) 426 1875.


          Smalltalk-80 is a tool for  turning  raw  concepts  into  working
     software prototypes. Objective-C is a tool for turning proven concepts
     into fast, commercial-quality, production systems. Producer is a  tool
     for  bridging  the gap between prototyping and production by automati-
     cally translating Smalltalk-80 sources into Objective-C  sources.  The
     translation is guided by a rule base in which the programmer describes
     how differences between the Smalltalk-80 prototyping  environment  and
     the   Objective-C  production  environment  should  be  resolved  when
     translating the code.

          At SIGGRAPH-87, PPI will announce a  library  of  user  interface
     components  from which programmers build applications with iconic user
     interfaces.  The library and applications built using it are  portable
     across  diverse  window  systems,  initially X-Windows, SunWindows and
     Hewlett Packard's window system. While the Objective-C user  interface
     classes  are  different from Smalltalk's, they are similar enough that
     Producer can usually bridge the differences with some  hand-tuning  of
     the  translated  output.   We  confidently hope that Objective-C, this
     library and Producer will make automatic translation  of  Smalltalk-80
     prototypes  a  routine  part  of  many companies' software development
     lifecycle.

          I'm distributing Producer to enlist  your  help  in  testing  the
     practicality of this notion.



                                   Disclaimer


          Producer is not a mature software product but an embryo that
     could grow to maturity someday. Specifically, it is not supported or
     warranted in any way. I wrote it as an individual employed by PPI,
     and I have released it prior to maturity, as an individual, with the
     consent of the company. This document will make its strengths and
     some of its present shortcomings clear.

          However, even in its present state,  Producer  demonstrates  that
     automatic  translation  is technically feasible and its present imple-
     mentation provides a capable foundation on which to build.  Since  the
     market  for Smalltalk-80 translators is insufficient for PPI to pursue
     presently, we've released Producer for you to make what use of it  you
     can.

          I do ask that you keep me informed of your experiences  in  using
     it  in  its  current  state,  and  PPI requests that you feed back any


     Brad Cox                          1                      June 22, 1987








     improvements so that we can offer a fully supported  translation  pro-
     duct in the future. PPI retains the copyright and all other applicable
     rights. For example, you may not sell products that contain  any  part
     of the Producer distribution without PPI's permission.



                                  How it works


          The following is a brief description of how Producer works inter-
     nally.   This  was written from my recollection of how I left the code
     over a year ago. It may be inaccurate in places.

          Producer is basically a compiler. Its lexical analyzer (written
     in lex) divides Smalltalk-80 text into lexemes, and its parser
     (written in yacc) recognizes valid lexeme sequences and constructs an
     abstract representation of the program as an expression tree. The
     expression tree consists of instances of Objective-C classes, e.g.
     Method, Statement, Expression, Message, and Variable. The grammar was
     derived from the syntax diagrams in Goldberg and Robson; Smalltalk-80:
     The Language and its Implementation; Addison Wesley; 1986.

          The grammar was extended to recognize rules, which may also
     appear in the lexeme stream. Rules are enclosed in { braces } to help
     fend off shift-reduce conflicts from yacc. The parser stores the rules
     in separate data structures for use during code generation.

          At certain points, the parser sends the top of the expression
     tree a gen message to trigger code generation[1]. Recall that
     Smalltalk-80 is an extremely simple language with basically two
     components: data references (variables, literals, etc.) and messages.
     Rules may influence how each case is treated during code generation.

          Code generation proceeds in two passes. The first pass collects
     typing information for each symbol and message by examining the
     expression tree from the bottom up. The bottom-most nodes are either
     literals whose type is immediately obvious (e.g. 1, 2.3, or 'string'),
     or they are symbols whose type can be known or unknown. Symbol types
     become known either as the result of a previous type inferencing
     operation or because their type was specified in a rule. Unknown
     symbols default to id when first referenced.
     ____________________
         [1] I now regard this as a major architectural flaw whenever I see
     it in any application. It represents a key departure from an important
     but often ignored rule of object-oriented design. The expression tree
     classes should be abstract so that they could be reused in other
     tools. But their code generation methods pollute the abstraction with
     knowledge about a particular concrete interface: Objective-C. The code
     generation methods should have been provided in a separate hierarchy
     of classes that know how to connect the abstract classes to one of
     many potential concrete interfaces. This rule is simply a
     generalization of the model/view/controller paradigm to apply to
     interfaces of any kind, not just user interfaces.
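
          The leaf-typing step described above can be sketched in C. This
     is only an illustration under my own assumptions, not code from
     Producer: the names TypeTag, Leaf, SymbolEntry, findSymbol and
     typeOfLeaf are hypothetical, and the real translator uses Objective-C
     classes rather than plain structs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags; Producer's own Type class is not shown in this README. */
typedef enum { T_UNKNOWN, T_INT, T_FLOAT, T_STRING, T_ID } TypeTag;

/* The bottom-most nodes of the expression tree: literals or symbols. */
typedef enum { N_INT_LITERAL, N_FLOAT_LITERAL, N_STRING_LITERAL, N_SYMBOL } LeafKind;

typedef struct {
    const char *name;   /* Smalltalk variable name */
    TypeTag type;       /* set by a rule or by earlier inference, else T_UNKNOWN */
} SymbolEntry;

typedef struct {
    LeafKind kind;
    const char *name;   /* meaningful only when kind == N_SYMBOL */
} Leaf;

/* Linear search of the symbol table; returns NULL for a symbol never seen. */
static SymbolEntry *findSymbol(SymbolEntry *table, size_t count, const char *name)
{
    size_t i;
    for (i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Literals have an obvious type; symbols take their recorded type,
   and anything still unknown defaults to id. */
static TypeTag typeOfLeaf(const Leaf *leaf, SymbolEntry *table, size_t count)
{
    SymbolEntry *entry;
    assert(leaf != NULL);
    switch (leaf->kind) {
    case N_INT_LITERAL:    return T_INT;
    case N_FLOAT_LITERAL:  return T_FLOAT;
    case N_STRING_LITERAL: return T_STRING;
    case N_SYMBOL:
        entry = findSymbol(table, count, leaf->name);
        return (entry != NULL && entry->type != T_UNKNOWN) ? entry->type : T_ID;
    }
    return T_ID;
}
```

          A symbol whose type was pinned down by a rule keeps that type,
     while a symbol that no rule or earlier inference has described comes
     back as T_ID, mirroring the default to id.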

          Most of the internal nodes are messages. Message typing is
     slightly more complicated because any message can have multiple
     translations depending on how it is used: different rules may specify
     different translations for different receiver and argument types, and
     each translation may compute a different type. Since we assign types
     bottom up, types have already been assigned to the receiver and the
     arguments, so a translation for that selector is chosen by searching
     a table of possible translations for one matching the receiver and
     argument types.
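
          A minimal sketch of such a table search, in C, might look like
     the following. Everything here is an assumption made for
     illustration: the Translation struct, the T_ANY wildcard, the %r/%a
     pattern notation, the first-match-wins ordering and the contents of
     kPlusRules are hypothetical and are not taken from generic.ru or from
     Producer's sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags; T_ANY is a wildcard usable only in rules. */
typedef enum { T_ANY, T_INT, T_POINT, T_ID } TypeTag;

typedef struct {
    const char *selector;  /* Smalltalk selector, e.g. "+" */
    TypeTag receiver;      /* receiver type this translation applies to */
    TypeTag argument;      /* argument type this translation applies to */
    const char *pattern;   /* output text; %r = receiver, %a = argument */
    TypeTag result;        /* type computed by the translated expression */
} Translation;

/* Illustrative rules for "+": specific cases first, literal default last. */
static const Translation kPlusRules[] = {
    { "+", T_INT,   T_INT,   "(%r + %a)",      T_INT   },
    { "+", T_POINT, T_POINT, "ptPlus(%r, %a)", T_POINT },
    { "+", T_ANY,   T_ANY,   "[%r plus:%a]",   T_ID    }
};

static int typeMatches(TypeTag pattern, TypeTag actual)
{
    return pattern == T_ANY || pattern == actual;
}

/* Return the first translation whose selector, receiver type and
   argument type all match, or NULL when the table has none. */
static const Translation *chooseTranslation(const Translation *table, size_t count,
                                            const char *selector,
                                            TypeTag receiver, TypeTag argument)
{
    size_t i;
    assert(selector != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(table[i].selector, selector) == 0 &&
            typeMatches(table[i].receiver, receiver) &&
            typeMatches(table[i].argument, argument))
            return &table[i];
    }
    return NULL;
}
```

          Because the result type of the chosen entry becomes the type of
     the message node, the same search can then be repeated one level
     further up the tree.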

          In all cases, unless overridden by a specific rule, default
     translations are used. These amount to a fairly literal translation
     from Smalltalk-80 syntax to Objective-C syntax. However, exceptions
     are made for Smalltalk literal constants, which translate to C literal
     constants. As a result, 2+2 translates to [2 plus:2], which is
     guaranteed to fail catastrophically in Objective-C. The integer 2 is
     an object only in Smalltalk!

          The moral: Never believe the translator. Always monitor it
     closely. Remember the 90-10 rule. The automatic translation concept is
     capable, with suitable rules, of automatically translating only 90% of
     an application correctly; the other 10% (where the bugs will have
     congregated) is still up to you.



                             Implementation Status


          Producer currently represents about three  man-weeks  of  effort,
     spent  in  two  intensive  bursts  separated by about a year. The most
     recent burst was nearly a year and a half ago.  The first burst was to
     demonstrate  the  feasibility and practicality of the translation con-
     cept. The second burst was in the course of preparing  a  paper  that,
     coauthored  with Kurt Schmucker, will appear in the OOPSLA-87 proceed-
     ings. A (very) early draft is provided with this distribution.

          For something developed so quickly, the translator does an
     effective job of translation. I refer you to the paper for
     discussions of the strengths and limitations of the translation
     concept. This section discusses the current implementation of that
     concept, namely the items on my own must-do list for the planned,
     but not yet completed, third stage of Producer's evolution.

     (1)  Smalltalk-80 fileout format uses '!' delimiters in a fashion that
          I was never able to formalize correctly in Producer's yacc
          grammar. The symptom is that the translator reports syntax
          errors in nearly every translated file for certain of these
          delimiters. I'm told that fileout format has been documented in
          a paper somewhere, but I've never worked the repairs back into
          the code. The fix should be local to gram.y.

     (2)  The translator loads its rule base by reading files of  rules  as
          if  they were concatenated with the sources to be translated. The
          rule-specification syntax is abysmal, primarily  because  it  was
          chosen  to  minimize  the  amount of time I spent struggling with
          shift-reduce conflicts from yacc, rather than  making  the  rules
          intelligible  to  users. Smalltalk's formal grammar seemed unrea-
          sonably difficult for yacc to swallow, and I suspect the  problem
          may  lie  in  some  mistake I've made in translating Smalltalk-80
          syntax diagrams into yacc specifications.

     (3)  The program contains extensive provisions for reporting its cogi-
          tations in type inferencing. The various error, warning, logging,
          and debugging messages need to be tuned for greater utility.

     (4)  The code was based on an as yet unreleased library (phylum) called
          "Substrate", which supports features that are not yet in our
          standard product set, like Blocks, Coroutining, and exception
          handling. I made a fast editing pass to remove any dependencies
          on these nonstandard library features. I also added a file,
          Substrate.h, that defines stylistic conventions that I adhere to
          in all my work. See USE, IMPORT, EXPORT, etc. in the sources.

          The preceding problems are superficial and easily repaired. The
     following ones are somewhat more substantial in that they involve
     design work in addition to coding work.

     (1)  The type inferencing machinery infers types of newly-encountered
          (unknown) messages and variables by seeing how they are combined
          with variables and messages whose types are known a priori or
          else determined earlier through inferencing. The only types that
          are known a priori are those of literals like 1, 2.3, or
          'string'. This generally provides insufficient typing information
          from which to infer anything useful, so you should generally
          provide variable rules to pin down types for key instance
          variables and method arguments. You do this with rules that
          state, in effect, that `the type of the Smalltalk variable named
          foo is int, and the variable is called foobar in Objective-C'.
          Presently rules have global scope. If different Smalltalk classes
          use the name foo in ways that should be translated differently,
          different rule sets must be provided manually to the translator.
          Creating and managing these application-specific rule sets adds
          to the translation effort and tends to make rules non-reusable
          across translations. The rules should be organized with a
          scoping mechanism, ideally one based on inheritance.
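
          One way such an inheritance-based scoping mechanism might look is
     sketched below in C. This is a proposal-level illustration, not how
     Producer works today (its rules are global): the VariableRule and
     Scope structs, the parent link and lookupRule are all hypothetical
     names, and the foo/foobar rule is simply the example from the text
     above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One variable rule: "Smalltalk variable X has C type T and is called Y". */
typedef struct {
    const char *smalltalkName;
    const char *cName;
    const char *cType;
} VariableRule;

/* A rule scope per class, chained to the scope of its superclass. */
typedef struct Scope {
    const char *className;
    const struct Scope *parent;   /* superclass scope, NULL at the root */
    const VariableRule *rules;
    size_t ruleCount;
} Scope;

/* Search the class's own rules first, then each superclass in turn,
   so a subclass rule shadows an inherited rule of the same name. */
static const VariableRule *lookupRule(const Scope *scope, const char *name)
{
    size_t i;
    assert(name != NULL);
    for (; scope != NULL; scope = scope->parent) {
        for (i = 0; i < scope->ruleCount; i++)
            if (strcmp(scope->rules[i].smalltalkName, name) == 0)
                return &scope->rules[i];
    }
    return NULL;
}
```

          With this arrangement a generic rule for foo could live in the
     root scope and be reused across translations, while a class that uses
     foo differently overrides it locally instead of requiring a separate
     rule set.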

     (2)  The inferencing logic is ad hoc and quite possibly slow. However,
          the main bottleneck seems to be loading the rule base;
          translation speed has never been a real problem. Inferencing is
          presently deductive, and a more inductive scheme based on both
          forwards and backwards reasoning might produce higher quality
          translations. In other words, the translation of a given message
          expression is presently determined exclusively by whatever
          information can be inferred about the types of the receiver and
          arguments to that message (forward reasoning). Backward
          reasoning would also consider how the results of the expression
          are used in other expressions.

     (3)  Producer does not presently handle non-trivial uses of Blocks
          correctly, i.e. Block expressions that cannot be translated
          directly into C control statements like if, while, or for,
          which Producer handles just fine already. Nearly all occurrences
          of Smalltalk-80 Blocks could be handled without changing the
          Objective-C language by adding a trivially simple Block class to
          the library. A named instance variable holds a pointer to a
          static function and indexed instance variables hold copies of
          any variables that the block accesses in the instantiation
          site[2]. This copy could be taken entirely automatically by
          copying the instantiation site's stack frame. However, I prefer
          to have more control over space than that. So I've been using a
          scheme that requires the programmer (and someday the compiler)
          to specify which variables are really accessed by the block as
          arguments to the message that instantiates the block, like this:

               ... {
                   /* instantiation site: name the variables the block uses */
                   IMPORT void aStaticFunction();
                   id var1 = something, var2 = something;
                   aBlock = [Block function:aStaticFunction args:2, var1, var2];
                   [anyObject do:aBlock];
                   ...
               }
               /* receives the block's copies first, then the value: arguments */
               LOCAL void aStaticFunction(instantiationSiteVariables, value1, value2)
                   struct { id var1, var2; } *instantiationSiteVariables;
                   id value1, value2;
               {
                   if ([instantiationSiteVariables->var1 someMessage])
                       ...
               }


          The block will call the function when anyObject sends the block
          one of several evaluation messages (value:arg1 or value:arg1
          value:arg2 or ...). The first argument is a pointer to the
          block's copy of the instantiation site's variables. The trailing
          arguments contain the arguments that the invocation site passed
          in the value: message. I've used this approach extensively by
          writing the static functions by hand, and am trying to get our
          staff to extend the language to provide some kind of
          language-level support to make the syntax simpler. This approach
          could be, but has not yet been, taken by Producer.
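
          The same idea can be shown in plain C, without the Objective-C
     runtime. The sketch below is an assumption-laden illustration rather
     than the Block class itself: Block here is a struct, blockNew,
     blockValue and blockFree are hypothetical helpers, the value passed
     on evaluation is a long rather than an id, and SumVariables and
     addToTotal exist only as an example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The function behind a block: first a pointer to the block's copy of
   the instantiation site's variables, then the evaluation argument. */
typedef void (*BlockFunction)(void *capturedVariables, long value);

typedef struct {
    BlockFunction function;  /* pointer to the static function */
    void *captured;          /* private copy of the named variables */
} Block;

/* Create a block, copying 'size' bytes of variables from the site. */
static Block *blockNew(BlockFunction function, const void *variables, size_t size)
{
    Block *block;
    assert(function != NULL && variables != NULL && size > 0);
    block = malloc(sizeof *block);
    if (block == NULL)
        return NULL;
    block->captured = malloc(size);
    if (block->captured == NULL) {
        free(block);
        return NULL;
    }
    memcpy(block->captured, variables, size);
    block->function = function;
    return block;
}

/* Evaluate the block, in the spirit of sending it value:. */
static void blockValue(Block *block, long value)
{
    assert(block != NULL);
    block->function(block->captured, value);
}

static void blockFree(Block *block)
{
    if (block != NULL) {
        free(block->captured);
        free(block);
    }
}

/* Example: a block that accumulates into its own copy of 'total'. */
struct SumVariables { long total; };

static void addToTotal(void *capturedVariables, long value)
{
    struct SumVariables *variables = capturedVariables;
    variables->total += value;
}
```

          Evaluating such a block changes only the copy held by the block.
     The variable at the instantiation site keeps its old value, which is
     the behavioral difference from Smalltalk-80 that the copying scheme
     implies.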
     ____________________
         [2] In Smalltalk-80, the block seems to have access to the  instan-
     tiation  site's  variables,  so that the block can change variables in




          The inferencing machinery's primary current virtue is that it can
     be made to work for selected test cases. It leaves lots to be desired.
     Call me if you decide to extend it so that I can  prevent  unnecessary
     duplication of effort.



                             About the distribution


          The top level of the distribution consists of

         total 88
         -rw-r--r--  1 cox           181 Jun 22 14:32 Makefile
         -rw-r--r--  1 cox         26592 Jun 22 14:30 README
         drwxr-xr-x  2 cox           512 Jun 22 14:19 example
         -rw-r--r--  1 cox           166 Jun 16 13:18 log
         -rw-r--r--  1 cox           997 Jun 15 11:09 mac.me
         -rw-r--r--  1 cox         26751 Jun 15 11:02 producer.me
         -rw-r--r--  1 cox         21444 Jun 22 14:29 readme.me
         drwxr-xr-x  2 cox           512 Jun 12 10:22 rules
         drwxr-xr-x  2 cox          3072 Jun 22 14:31 src

     The Makefile governs formatting of the two documents: this README
     (from readme.me) and the draft of the OOPSLA-87 paper (from
     producer.me). The mac.me file contains text formatting macros that
     are common to both papers, used like this:

         nroff -me mac.me producer.me >Producer.f


          The rules directory contains a single file, generic.ru, that
     represents a first pass at an application-independent rule base. This
     set of rules translates Smalltalk to the conventions used in my
     prototype version of the user interface library.

          For example, it translates Smalltalk Integer operations to C int
     operations, and it translates Smalltalk Point operations to C macros
     that manage points as type PT: a pair of 16-bit coordinates in a
     32-bit C int. Thus pt(x,y) invokes a C macro that trims and shifts
     two ints, x and y, to fit side by side in a 32-bit integer,
     ptPlus(p,q) invokes a macro that computes the vector sum of two
     points, p and q, and so on.
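
          For readers who have not seen this packing trick, here is one
     way such macros could be written. The definitions are my own sketch
     and are not copied from the prototype library: only the names pt,
     ptPlus and PT come from the text above, while ptX, ptY, ptSigned, the
     choice of an unsigned 32-bit representation, and the placement of x
     in the high half are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Two signed 16-bit coordinates packed side by side in 32 bits.
   An unsigned carrier keeps the shifts and masks fully defined in C. */
typedef uint32_t PT;

/* Turn a 16-bit two's complement field back into a signed int. */
static int ptSigned(uint32_t half)
{
    assert(half <= 0xFFFFu);
    return half < 0x8000u ? (int)half : (int)half - 0x10000;
}

/* Trim x and y to 16 bits each and place x in the high half. */
#define pt(x, y)     (((PT)(uint16_t)(x) << 16) | (PT)(uint16_t)(y))

/* Extract the coordinates again. */
#define ptX(p)       ptSigned((PT)(p) >> 16)
#define ptY(p)       ptSigned((PT)(p) & 0xFFFFu)

/* Vector sum of two packed points. */
#define ptPlus(p, q) pt(ptX(p) + ptX(q), ptY(p) + ptY(q))
```

          Coordinates outside the range -32768..32767 are silently
     truncated by the trimming step, which is the price of fitting a
     point into a single machine word.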

         rules:
         total 35
         -rw-r--r--  1 cox         35567 Jun 12 10:22 generic.ru
     ____________________
         [2, continued] the instantiation site. In Objective-C the block
     receives a copy of the variables and cannot use them to communicate
     with the instantiation site. I believe that this is the sole
     functional difference between the two schemes.


          The example directory contains a fragment from the video
     animation program that appears at the end of the Smalltalk-80 video
     tape. BounceInBoxNode.st is the Smalltalk-80 source file,
     animation.ru contains the application-specific rule set, and
     BounceInBoxNode.m is the translated version built by Producer as
     invoked by Makefile[3].

         example:
         total 7
         -rw-r--r--  1 cox          1730 Jun 16 10:24 BounceInBoxNode.m
         -rw-r--r--  1 cox           868 Jun 16 10:18 BounceInBoxNode.st
         -rw-r--r--  1 cox           394 Jun 16 10:20 Makefile
         -rw-r--r--  1 cox          2178 Jun 16 10:18 animation.ru
         -rw-r--r--  1 cox           185 Jun 16 10:24 log
         -rw-r--r--  1 cox           239 Jun 16 10:18 st80.h


          The log file records the results of the translation session. The
     syntax error is innocuous, the result of the aforementioned problem
     in the grammar in handling '!' delimiters.

         Producer -c ../rules/generic.ru animation.ru BounceInBoxNode.st >BounceInBoxNode.m
         error 7:BounceInBoxNode.st: tegory:'Graphics-Animation'!! : syntax error
         *** Error code 1 (ignored)


          The src directory contains the sources for Producer, with its own
     Makefile. The Substrate.h header file, which is automatically
     included by the Producer.h header file, is technically a part of an
     internal lower level library, Substrate, on which Producer was
     originally developed. Substrate.h was copied and changed
     superficially so that Producer compiles correctly without the
     Substrate library.

         src:
         total 70
         -rw-r--r--  1 cox           483 Jun 12 10:21 AbstractTranslation.m
         -rw-r--r--  1 cox           282 Jun 12 10:21 ArgumentList.m
         -rw-r--r--  1 cox           897 Jun 12 10:21 Block.m
         -rw-r--r--  1 cox           143 Jun 12 10:21 CharConstant.m
         -rw-r--r--  1 cox          2205 Jun 12 10:21 Class.m
         -rw-r--r--  1 cox           630 Jun 12 10:21 Comment.m
         -rw-r--r--  1 cox           176 Jun 12 10:21 Constant.m
         -rw-r--r--  1 cox          2032 Jun 12 10:21 Expr.m
         -rw-r--r--  1 cox          1243 Jun 12 10:21 FunctionTranslation.m
         -rw-r--r--  1 cox          1484 Jun 12 10:21 Identifier.m
         -rw-r--r--  1 cox          1248 Jun 12 10:21 IdentifierTranslation.m
         -rw-r--r--  1 cox           105 Jun 12 10:21 List.m
         -rw-r--r--  1 cox          1985 Jun 15 11:55 METHODDECLS.m
         -rw-r--r--  1 cox          1384 Jun 15 11:51 Makefile
         -rw-r--r--  1 cox          4302 Jun 12 10:21 Method.m
         -rw-r--r--  1 cox          3136 Jun 12 10:21 Msg.m
         -rw-r--r--  1 cox           583 Jun 12 10:21 MsgArgPattern.m
         -rw-r--r--  1 cox           828 Jun 12 10:21 MsgNamePattern.m
         -rw-r--r--  1 cox          1280 Jun 12 10:21 MsgTranslation.m
         -rw-r--r--  1 cox           775 Jun 12 10:21 MsgTranslator.m
         -rw-r--r--  1 cox          1868 Jun 12 10:21 Node.m
         -rw-r--r--  1 cox           229 Jun 12 10:21 NumberConstant.m
         -rw-r--r--  1 cox          1402 Jun 15 11:27 Producer.h
         -rw-r--r--  1 cox           306 Jun 12 10:21 Return.m
         -rw-r--r--  1 cox           825 Jun 12 10:21 Scope.m
         -rw-r--r--  1 cox          3157 Jun 12 10:21 Selector.m
         -rw-r--r--  1 cox           253 Jun 12 10:21 SelectorConstant.m
         -rw-r--r--  1 cox           457 Jun 12 10:21 StArray.m
         -rw-r--r--  1 cox           492 Jun 12 10:21 Stmt.m
         -rw-r--r--  1 cox           381 Jun 12 10:21 StringConstant.m
         -rw-r--r--  1 cox          1268 Jun 12 10:21 StringTranslation.m
         -rw-r--r--  1 cox          2140 Jun 15 11:38 Substrate.h
         -rw-r--r--  1 cox          1405 Jun 15 11:53 Symbol.m
         -rw-r--r--  1 cox           452 Jun 12 10:21 Template.m
         -rw-r--r--  1 cox           901 Jun 12 10:21 Type.m
         -rw-r--r--  1 cox          1800 Jun 12 10:21 design.me
         -rw-r--r--  1 cox          3271 Jun 12 10:21 gen.m
         -rw-r--r--  1 cox          9007 Jun 12 10:21 gram.y
         -rw-r--r--  1 cox          3601 Jun 12 10:21 lex.l
         -rw-r--r--  1 cox          2212 Jun 12 10:21 main.m
         -rw-r--r--  1 cox           260 Jun 12 10:21 st80.h
         -rw-r--r--  1 cox           259 Jun 15 11:59 y.tab.h
     ____________________
         [3] The full source for the animation program is not provided. My
     copyright paranoia argued against providing even this fragment.


          The files are exactly as I left them nearly a  year  and  a  half
     ago, except for:

     (1)  The addition of this README document. An early draft of the
          OOPSLA-87 paper, sadly prior to Kurt Schmucker's improvements, is
          in producer.me.

     (2)  One recompilation pass to remove any obvious dependencies on my
          private Substrate library and to verify that Producer compiles
          and runs correctly on the standard Foundation library. I tested
          the changes by verifying that the Makefile in the example
          directory ran to completion, but this is hardly an ironclad
          guarantee.



                                   Using Producer


          Flags controlling the translation process, source files, and
     rules files are provided on the command line and are processed in the
     order they appear. The flags are[4]:

     -d:  Enable debugging functions (dbg()) scattered throughout the code.
          Seldom useful.

     -m:  Enables  the  Objective-C  Foundation  library  message   tracing
          feature. Seldom useful in Producer.

     -a:  Enables the Objective-C  Foundation  library  allocation  tracing
          feature. Seldom useful in Producer.

     -l:  Enables printing of each lexical token as produced by lex. Useful
          only for debugging lex.l.

     -g:  Enables automatic redirection of each class into a separate  file
          based on the class name parsed from the input file. Automatically
          puts class Foobar into file Foobar.m.

              CAREFUL! This puts at risk other files whose  name  might
              coincide with a Smalltalk-80 class name!


     -s:  Generate Smalltalk-80 sources in the output file  as  Objective-C
          comments (the default).

     -c:  Don't generate Smalltalk-80 sources in the output file.

     -i:  Generate information that was thought at one time  to  be  useful
          when debugging rules.

     -M:  Send storeOn: to the message rule  dictionary  just  before  ter-
          minating as a debugging aid.

     -I:  Send storeOn: to the variable rule dictionary  just  before  ter-
          minating as a debugging aid.

          Typically, the generic rules in rules/generic.ru are specified
     first, then any application-specific rules, then a single Smalltalk-80
     source file. Unless -g is set, the translated output appears on
     stdout. The various creaks, groans and mumbles that can be elicited
     about the translation process itself appear on stderr.
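
          Processing arguments strictly in the order they appear implies
     that a flag only affects the files named after it. The C sketch below
     illustrates that reading. It is not taken from main.m: Session,
     FileJob, MAX_FILES and scanArguments are hypothetical, only the -s,
     -c and -g flags are modeled, and the assumption that a flag applies to
     the files that follow it is mine.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILES 16

typedef struct {
    const char *path;         /* rules file or Smalltalk-80 source file */
    int keepSmalltalkSource;  /* -s/-c setting in effect when it was reached */
} FileJob;

typedef struct {
    int keepSmalltalkSource;  /* -s (the default) sets this, -c clears it */
    int filePerClass;         /* -g */
    FileJob jobs[MAX_FILES];
    size_t jobCount;
} Session;

/* Walk the arguments left to right. Returns 0 on success, or -1 for a
   flag this sketch does not model or for more than MAX_FILES files. */
static int scanArguments(int argc, const char *const argv[], Session *session)
{
    int i;
    assert(session != NULL);
    session->keepSmalltalkSource = 1;
    session->filePerClass = 0;
    session->jobCount = 0;
    for (i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (arg[0] == '-') {
            if (arg[1] == '\0' || arg[2] != '\0')
                return -1;
            switch (arg[1]) {
            case 's': session->keepSmalltalkSource = 1; break;
            case 'c': session->keepSmalltalkSource = 0; break;
            case 'g': session->filePerClass = 1; break;
            default:  return -1;
            }
        } else {
            if (session->jobCount == MAX_FILES)
                return -1;
            /* A real translator would read the file here, using the
               flags seen so far; the sketch only records them. */
            session->jobs[session->jobCount].path = arg;
            session->jobs[session->jobCount].keepSmalltalkSource =
                session->keepSmalltalkSource;
            session->jobCount++;
        }
    }
    return 0;
}
```

          Under this reading, the order "flags, generic rules,
     application rules, source file" matters because each file is handled
     with whatever settings and rules have accumulated by the time it is
     reached.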

          For the syntax for writing new rules, refer to  the  examples  in
     generic.ru  and  animation.ru,  and if necessary, the rules section of
     the grammar in gram.y.

          And good luck! Let me know how you fare...

     ____________________
         [4] I'm working from memory about what these flags mean. Some may
     be nonfunctional.






