NAME

PIR.pod - The Grammar of languages/PIR


DESCRIPTION

This document provides a more readable grammar of languages/PIR. The actual specification for PIR is a bit more complex. This human-readable grammar omits error handling and some other details that are unimportant for this PIR reference.


STATUS

For bugs and issues, see the section KNOWN ISSUES AND BUGS.

The grammar includes some constructs that are in the IMCC parser, but are not implemented. An example of this is the .global directive.


VERSION

0.1.1


LEXICAL CONVENTIONS

PIR Directives

PIR has a number of directives. All directives start with a dot. Macro identifiers (that is, macro names as written when a macro is expanded) also start with a dot (see below). Therefore, do not use any of the PIR directives as a macro identifier. The PIR directives are:

  .arg            .invocant          .pcc_call
  .const          .lex               .pcc_end_return
  .emit           .line              .pcc_end_yield
  .end            .loadlib           .pcc_end
  .endnamespace   .local             .pcc_sub
  .eom            .meth_call         .pragma
  .get_results    .namespace         .return
  .global         .nci_call          .result
  .HLL_map        .param             .sub
  .HLL            .pcc_begin_return  .sym
  .immediate      .pcc_begin_yield   .yield
  .include        .pcc_begin
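The rule above is easy to check mechanically. A minimal Python sketch, assuming the comparison is case-sensitive and that the table above is the complete list of directives; the names PIR_DIRECTIVES and is_usable_macro_name are illustrative only:

```python
# Directive names copied from the table above.
PIR_DIRECTIVES = frozenset("""
    .arg .invocant .pcc_call .const .lex .pcc_end_return .emit .line
    .pcc_end_yield .end .loadlib .pcc_end .endnamespace .local .pcc_sub
    .eom .meth_call .pragma .get_results .namespace .return .global
    .nci_call .result .HLL_map .param .sub .HLL .pcc_begin_return .sym
    .immediate .pcc_begin_yield .yield .include .pcc_begin
""".split())


def is_usable_macro_name(name: str) -> bool:
    """True if ".name" does not coincide with a PIR directive.

    The comparison is case-sensitive, which is an assumption of this sketch.
    """
    return "." + name not in PIR_DIRECTIVES


print(is_usable_macro_name("myMacro"))
print(is_usable_macro_name("local"))
```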

Registers

PIR has two types of registers: real registers and symbolic or temporary registers. Real registers are actual registers in the Parrot VM, and are written like:

  [S|N|I|P]n, where n is a number from 0 up to, but not including, 100.

Symbolic, or temporary registers are written like:

  $[S|N|I|P]n, where n is a non-negative integer.

Symbolic registers can be thought of as local variable identifiers that don't need a declaration. They save you from writing .local directives if you're in a hurry. Of course, code is more self-documenting when .local declarations are used.
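The two register notations can be told apart by their spelling alone. The following Python sketch is a hypothetical recognizer, not part of IMCC; it assumes real register numbers are written without leading zeros:

```python
import re

# Real registers: S, N, I or P followed by a number from 0 to 99
# (this sketch assumes no leading zeros, so "I05" is rejected).
REAL_REGISTER = re.compile(r"[SNIP](?:[0-9]|[1-9][0-9])")
# Symbolic registers: "$", then S, N, I or P, then a non-negative integer.
SYMBOLIC_REGISTER = re.compile(r"\$[SNIP][0-9]+")


def classify_register(token: str):
    """Return "real", "symbolic", or None if token is not a register."""
    if REAL_REGISTER.fullmatch(token):
        return "real"
    if SYMBOLIC_REGISTER.fullmatch(token):
        return "symbolic"
    return None
```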

Constants

An integer constant is a string of one or more digits. Examples: 0, 42.

A floating-point constant is a string of one or more digits, followed by a dot and one or more digits. Examples: 1.1, 42.567

A string constant is a single or double quoted series of characters. Examples: 'hello world', "Parrot".

TODO: PMC constants.
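The three constant forms map directly onto regular expressions. A minimal Python sketch; the helper constant_kind is hypothetical, and escape sequences inside strings and charset prefixes are deliberately ignored:

```python
import re

INT_CONSTANT = re.compile(r"[0-9]+")
FLOAT_CONSTANT = re.compile(r"[0-9]+\.[0-9]+")
# Assumption: escape sequences and charset prefixes are not handled here.
STRING_CONSTANT = re.compile(r"'[^']*'|\"[^\"]*\"")


def constant_kind(token: str):
    """Classify token as "int", "float" or "string", or return None."""
    for kind, pattern in (("int", INT_CONSTANT),
                          ("float", FLOAT_CONSTANT),
                          ("string", STRING_CONSTANT)):
        if pattern.fullmatch(token):
            return kind
    return None
```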

Identifiers

An identifier starts with a character from [_a-zA-Z], followed by zero or more characters from [_a-zA-Z0-9].

Examples: x, x1, _foo

Labels

A label is an identifier with a colon attached to it.

Examples: LABEL:

Macro identifiers

A macro identifier is an identifier prefixed with a dot. The macro identifier is used when the macro is expanded (invoked), not in the macro definition.

Examples: .myMacro
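Identifiers, labels and macro identifiers share the same core pattern and differ only in the trailing colon or leading dot. A small Python sketch of these lexical rules (the function token_kind is illustrative, not an IMCC API):

```python
import re

IDENTIFIER = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")
LABEL = re.compile(r"([_a-zA-Z][_a-zA-Z0-9]*):")
MACRO_IDENTIFIER = re.compile(r"\.([_a-zA-Z][_a-zA-Z0-9]*)")


def token_kind(token: str):
    """Return "identifier", "label" or "macro_id" for one token, else None."""
    if IDENTIFIER.fullmatch(token):
        return "identifier"
    if LABEL.fullmatch(token):
        return "label"
    if MACRO_IDENTIFIER.fullmatch(token):
        return "macro_id"
    return None
```

Note that token_kind also classifies a directive such as .local as a macro identifier; the lexical form alone cannot tell the two apart, which is why directive names must not be used as macro names.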


GRAMMAR RULES

Compilation Units

A PIR program consists of one or more compilation units. A compilation unit is a global, sub, constant or macro definition, or a pragma or emit block. PIR is a line-oriented language, which means that each statement ends in a newline (indicated as "nl"). Moreover, compilation units are always separated by a newline. Each kind of compilation unit is discussed in this document.

  program:
    compilation_unit [ nl compilation_unit ]*
  compilation_unit:
      global_def
    | sub_def
    | const_def
    | macro_def
    | pragma
    | emit

Subroutine definitions

  sub_def:
    [ ".sub" | ".pcc_sub" ] sub_id sub_pragmas? nl body
  sub_id:
    identifier | string_constant
  sub_pragmas:
    sub_pragma [ ","? sub_pragma ]*
  sub_pragma:
      ":load"
    | ":init"
    | ":immediate"
    | ":postcomp"
    | ":main"
    | ":anon"
    | ":lex"
    | wrap_pragma
    | vtable_pragma
    | multi_pragma
    | outer_pragma
  wrap_pragma:
    ":wrap" parenthesized_string
  vtable_pragma:
    ":vtable" parenthesized_string?
  parenthesized_string:
    "(" string_constant ")"
  multi_pragma:
    ":multi" "(" multi_types? ")"
  outer_pragma:
    ":outer" "(" sub_id ")"
  multi_types:
    multi_type [ "," multi_type ]*
  multi_type:
      type
    | "_"
    | keylist
    | identifier
    | string_constant
  body:
    param_decl*
    labeled_pir_instr*
    ".end"
  param_decl:
    ".param"  [ [ type identifier ] | register ] get_flags? nl
  get_flags:
    [ ":slurpy"
    | ":optional"
    | ":opt_flag"
    | named_flag
    ]+
  named_flag:
    ":named" parenthesized_string?

Example subroutines

The simplest example for a subroutine definition looks like:

        .sub foo
        # PIR instructions go here
        .end

The body of the subroutine can contain PIR instructions. The subroutine can be given one or more flags, indicating that the sub should behave in a special way; the available flags are those of the sub_pragma rule above.

The sub flags are listed after the sub name. They may be separated by a comma, but this is not necessary. The subroutine name can also be a string instead of a bareword, as shown in this example:

        .sub 'foo' :load, :init :anon
        # PIR body
        .end
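The optional comma between flags is the only subtle point of the sub_pragmas rule. The following Python sketch parses a sub header line under simplifying assumptions: flag arguments are kept as raw text and nested parentheses are not supported. parse_sub_header is a hypothetical helper:

```python
import re

# ".sub" or ".pcc_sub", a sub_id (quoted string or identifier), then flags.
SUB_HEADER = re.compile(
    r"\.(?:sub|pcc_sub)\s+('[^']*'|\"[^\"]*\"|[_a-zA-Z][_a-zA-Z0-9]*)(.*)")
# A flag such as :load or :outer(foo); arguments are kept as raw text.
FLAG = re.compile(r":[a-z_]+(?:\([^)]*\))?")
# Flags may be separated by whitespace and an optional comma.
SEPARATOR = re.compile(r"\s*,?\s*")


def parse_sub_header(line: str):
    """Return (name, [flags]) for a .sub line, or raise ValueError."""
    match = SUB_HEADER.fullmatch(line.strip())
    if match is None:
        raise ValueError("not a sub header: " + line)
    name, rest = match.group(1), match.group(2).strip()
    flags, pos = [], 0
    while pos < len(rest):
        if flags:
            pos = SEPARATOR.match(rest, pos).end()
        flag = FLAG.match(rest, pos)
        if flag is None:
            raise ValueError("unexpected text after sub name: " + rest[pos:])
        flags.append(flag.group(0))
        pos = flag.end()
    return name.strip("'\""), flags
```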

Parameter definitions have the following syntax:

        .sub main
          .param num nParam
          .param int argc :optional
          .param int has_argc :opt_flag
          .param pmc argv :slurpy
          .param string sParam :named('foo')
          .param $P0 :named('bar')
          # body
        .end

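As shown, parameter definitions may take flags as well. These are the flags of the get_flags rule above: :optional marks a parameter that the caller may omit, :opt_flag receives a flag telling whether the preceding :optional parameter was passed, :slurpy collects the remaining arguments into a single PMC, and :named binds the parameter to a named argument.

The correct order of the parameters depends on their flags; for example, an :opt_flag parameter must directly follow the :optional parameter it reports on.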
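One ordering rule that is easy to check mechanically is that an :opt_flag parameter must directly follow the :optional parameter it reports on. A minimal Python sketch, assuming each parameter is given as a (name, flags) pair; it checks only this one rule, and check_opt_flags is a hypothetical name:

```python
def check_opt_flags(params):
    """Raise ValueError unless every :opt_flag parameter follows an :optional one.

    `params` is a list of (name, flags) pairs in declaration order.
    """
    for index, (name, flags) in enumerate(params):
        if ":opt_flag" in flags:
            previous_flags = params[index - 1][1] if index > 0 else ()
            if ":optional" not in previous_flags:
                raise ValueError(name + ": :opt_flag must follow an :optional parameter")


# Passes: has_argc directly follows the optional parameter argc.
check_opt_flags([("argc", [":optional"]), ("has_argc", [":opt_flag"])])
```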

PIR instructions

  labeled_pir_instr:
    label? instr nl
  labeled_pasm_instr:
    label? pasm_instr nl
  instr:
    pir_instr | pasm_instr

NOTE: the rule 'pasm_instr' is not included in this reference grammar. pasm_instr defines the syntax for pure PASM instructions.

  pir_instr:
      local_decl
    | lexical_decl
    | const_def
    | conditional_stat
    | assignment_stat
    | open_namespace
    | close_namespace
    | return_stat
    | sub_invocation
    | macro_invocation
    | jump_stat
    | source_info

Local declarations

  local_decl:
    [ ".local" | ".sym" ] type local_id_list
  local_id_list:
    local_id [ "," local_id ]*
  local_id:
    identifier ":unique_reg"?

Examples local declarations

Local temporary variables can be declared with the directives .local or .sym. There is no difference between these directives, except within macro definitions (see Macros).

        .local int i
        .local num a, b, c
        .sym string s1, s2
        .sym pmc obj

The optional :unique_reg modifier will force the register allocator to associate the identifier with a unique register for the duration of the compilation unit.

        .local int j :unique_reg
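The local_decl rule can be sketched as a small Python parser. It is an illustration only: it accepts just the types of the type rule (IMCC also accepts class names) and reports, for each name, whether :unique_reg was given. parse_local_decl is a hypothetical helper:

```python
import re

# Built-in type names from the "type" rule; class names are not modelled.
LOCAL_DECL = re.compile(
    r"\.(local|sym)\s+(int|num|pmc|object|string|Array|Hash)\s+(.+)")
LOCAL_ID = re.compile(r"([_a-zA-Z][_a-zA-Z0-9]*)(\s+:unique_reg)?")


def parse_local_decl(line: str):
    """Return (directive, type, [(name, has_unique_reg), ...])."""
    match = LOCAL_DECL.fullmatch(line.strip())
    if match is None:
        raise ValueError("not a local declaration: " + line)
    directive, type_name, id_list = match.groups()
    names = []
    for part in id_list.split(","):
        local_id = LOCAL_ID.fullmatch(part.strip())
        if local_id is None:
            raise ValueError("bad local_id: " + part)
        names.append((local_id.group(1), local_id.group(2) is not None))
    return directive, type_name, names
```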

Lexical declarations

  lexical_decl:
    ".lex" string_constant "," target

Example lexical declarations

The declaration

        .lex 'i', $P0

indicates that $P0 holds the lexical variable named 'i'. Given this declaration and the following statement:

        $P1 = new .Integer

the statement $P0 = $P1 has the same effect as store_lex 'i', $P1.

Likewise, $P1 = $P0 has the same effect as $P1 = find_lex 'i'.

Instead of a register, one can also specify a local variable, like so:

        .local pmc p
        .lex 'i', p

Global definitions

  global_def:
    ".global" identifier

Example global declarations

This syntax is defined in the parser of IMCC, but its functionality is not implemented. The goal is to allow for global definitions outside of subroutines. That way, the variable can be accessed by all subroutines without doing a global lookup. It is unclear whether this feature will be implemented.

An example is:

        .global my_global_var

Constant definitions

  const_def:
    ".const" type identifier "=" constant_expr

Example constant definitions

        .const int answer = 42

defines an integer constant named 'answer' with the value 42. Note that the constant type and the value type should match; for example, you cannot assign a floating-point number to an integer constant. The PIR parser checks this.

Conditional statements

  conditional_stat:
    [ "if" | "unless" ]
    [ [ "null" target ]
    | [ simple_expr [ relational_op simple_expr ]? ]
    ] "goto" identifier

Examples conditional statements

The syntax for if and unless statements is the same, except for the keyword itself. Therefore the examples use one or the other.

        if null $P0 goto L1

checks whether $P0 is null; if it is, control jumps to label L1.

        unless $P0 goto L2
        unless x   goto L2
        unless 1.1 goto L2

If $P0, x or 1.1 is not true, control jumps to L2. When the argument is a PMC (like $P0 in the first example), its truth value depends on the PMC itself. For instance, some languages define the number 0 as true, while others (like C) consider it false.

        if x < y goto L1
        if y != z  goto L1

are examples that evaluate the relational expression after if. Any of the relational operators may be used here.
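The conditional_stat rule can be approximated with a single regular expression. This Python sketch is hypothetical; it treats any identifier, optionally prefixed with $, as a target, and it does not check that the label exists:

```python
import re

IDENT = r"[_a-zA-Z][_a-zA-Z0-9]*"
# simple_expr: a target (identifier or register), a number, or a quoted string.
OPERAND = r"(\$?" + IDENT + r"|[0-9]+(?:\.[0-9]+)?|'[^']*'|\"[^\"]*\")"
CONDITIONAL = re.compile(
    r"(if|unless)\s+"
    r"(?:null\s+(\S+)|" + OPERAND
    + r"(?:\s*(==|!=|<=|>=|<|>)\s*" + OPERAND + r")?)"
    r"\s+goto\s+(" + IDENT + r")")


def parse_conditional(line: str):
    """Return a dict describing an if/unless statement, or None."""
    match = CONDITIONAL.fullmatch(line.strip())
    if match is None:
        return None
    keyword, null_target, left, op, right, label = match.groups()
    if null_target is not None:
        return {"keyword": keyword, "null": null_target, "label": label}
    return {"keyword": keyword, "left": left, "op": op,
            "right": right, "label": label}
```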

Branching statements

  jump_stat:
    "goto" identifier

Examples branching statements

        goto MyLabel

The program will continue running at label 'MyLabel:'.

Operators

  relational_op:
      "=="
    | "!="
    | "<="
    | "<"
    | ">="
    | ">"
  binary_op:
      "+"
    | "-"
    | "/"
    | "**"
    | "*"
    | "%"
    | "<<"
    | ">>>"
    | ">>"
    | "&&"
    | "||"
    | "~~"
    | "|"
    | "&"
    | "~"
    | "."
  assign_op:
      "+="
    | "-="
    | "/="
    | "%="
    | "*="
    | ".="
    | "&="
    | "|="
    | "~="
    | "<<="
    | ">>="
    | ">>>="
  unary_op:
      "!"
    | "-"
    | "~"
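Several operators are prefixes of others (">" of ">>", ">>" of ">>>" and ">>>="), so a tokenizer must try the longest spelling first. A minimal Python sketch of longest-match operator recognition, assuming the operator lists above are complete; match_operator is a hypothetical helper:

```python
# Every operator spelling from relational_op, binary_op, assign_op and
# unary_op, sorted longest first so that ">>>=" wins over ">>" and ">".
OPERATORS = sorted({
    "==", "!=", "<=", "<", ">=", ">",                    # relational_op
    "+", "-", "/", "**", "*", "%", "<<", ">>>", ">>",
    "&&", "||", "~~", "|", "&", "~", ".",                # binary_op
    "+=", "-=", "/=", "%=", "*=", ".=", "&=", "|=", "~=",
    "<<=", ">>=", ">>>=",                                # assign_op
    "!",                                                 # unary_op ("-", "~" above)
}, key=len, reverse=True)


def match_operator(text: str, pos: int = 0):
    """Return the longest operator that starts at text[pos], or None."""
    for op in OPERATORS:
        if text.startswith(op, pos):
            return op
    return None
```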

Expressions

  expression:
      simple_expr
    | simple_expr binary_op simple_expr
    | unary_op simple_expr
  simple_expr:
      float_constant
    | int_constant
    | string_constant
    | target

Example expressions

        42
        42 + x
        1.1 / 0.1
        "hello" . "world"
        str1 . str2
        -100
        ~obj
        !isSomething

Arithmetic operators are only allowed on floating-point numbers and integer values (or variables of that type). Likewise, string concatenation (".") is only allowed on strings. These checks are not done by the PIR parser.

Assignments

  assignment_stat:
      target "=" short_sub_call
    | target "=" target keylist
    | target "=" expression
    | target "=" "new" [ int_constant | string_constant | macro_id ]
    | target "=" "new" keylist
    | target "=" "find_type" [ string_constant | string_reg | id ]
    | target "=" heredoc
    | target "=" "global" string_constant
    | target assign_op simple_expr
    | target keylist "=" simple_expr
    | "global" string_constant "=" target
    | result_var_list "=" short_sub_call

NOTE: the definition of assignment statements is not complete yet. As languages/PIR evolves, this will be completed.

  keylist:
    "[" keys "]"
  keys:
    key [ sep key ]*
  sep:
    "," | ";"
  key:
      simple_expr
    | simple_expr ".."
    | ".." simple_expr
    | simple_expr ".." simple_expr
  result_var_list:
    "(" result_vars ")"
  result_vars:
    result_var [ "," result_var ]*
  result_var:
    target get_flags?
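The four forms of key, including open-ended ranges, can be illustrated with a short Python sketch. It assumes that quoted keys contain no ",", ";" or ".." characters; parse_keylist is a hypothetical helper:

```python
import re


def parse_keylist(text: str):
    """Split a keylist such as "[1..3; 'a']" into key descriptions.

    Assumption: quoted keys contain no ',', ';' or '..'.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("a keylist is enclosed in [ ]")
    keys = []
    for raw in re.split(r"[,;]", text[1:-1]):  # sep: "," or ";"
        key = raw.strip()
        if ".." in key:
            # A missing bound (as in "..5" or "7..") becomes None.
            low, high = (part.strip() or None for part in key.split("..", 1))
            keys.append(("range", low, high))
        else:
            keys.append(("key", key))
    return keys
```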

Examples assignment statements

        $I1 = 1 + 2
        $I1 += 1
        $P0 = foo()
        $I0 = $P0[1]
        $I0 = $P0[12.34]
        $I0 = $P0["Hello"]
        $P0 = new 42 # but this is really not very clear, better use identifiers
        $S0 = <<'HELLO'
        ...
        HELLO
        $P0 = global "X"
        global "X" = $P0
        .local int a, b, c
        (a, b, c) = foo()

Heredoc

NOTE: the heredoc rules are not complete or tested. Some work is required here.

  heredoc:
    "<<" string_constant nl
    heredoc_string
    heredoc_label
  heredoc_label:
    ^^ identifier
  heredoc_string:
    [ \N | \n ]*

Example Heredoc

        .local string str
        str = <<'ENDOFSTRING'
          this text
               is stored in the
                     variable
            named 'str'. Whitespace and newlines
          are                  stored as well.
        ENDOFSTRING

Note that the heredoc terminator (ENDOFSTRING here) must be at the beginning of the line; no whitespace in front of it is allowed. Printing str would print:

          this text
               is stored in the
                     variable
            named 'str'. Whitespace and newlines
          are                  stored as well.
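Reading a heredoc amounts to collecting lines until one consists of exactly the terminator. A minimal Python sketch, assuming the source is already split into lines and that each body line keeps its newline; read_heredoc is illustrative only:

```python
def read_heredoc(lines, start, terminator):
    """Collect the heredoc body that follows lines[start].

    The body ends at the first line equal to `terminator`; because the
    comparison is exact, an indented terminator does not end the heredoc.
    Returns (text, index of the first line after the terminator).
    """
    body = []
    for index in range(start + 1, len(lines)):
        if lines[index] == terminator:
            return "".join(line + "\n" for line in body), index + 1
        body.append(lines[index])
    raise ValueError("unterminated heredoc: " + terminator)
```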

Invoking subroutines and methods

  sub_invocation:
    long_sub_call | short_sub_call
  long_sub_call:
    ".pcc_begin" nl
    arguments
    [ method_call | non_method_call] target nl
    [ local_decl nl ]*
    result_values
    ".pcc_end"
  non_method_call:
    ".pcc_call" | ".nci_call"
  method_call:
    ".invocant" target nl
    ".meth_call"
  parenthesized_args:
    "(" args ")"
  args:
    arg [ "," arg ]*
  arg:
    [ float_constant
    | int_constant
    | string_constant [ "=>" target ]?
    | target
    ]
    set_flags?
  arguments:
    [ ".arg" simple_expr set_flags? nl ]*
  result_values:
    [ ".result" target get_flags? nl ]*
  set_flags:
    [ ":flat"
    | named_flag
    ]+

Example long subroutine call

The long subroutine call syntax is well suited for generation by a language compiler targeting Parrot. Its syntax is rather verbose, but easy to read. The minimal invocation looks like this:

        .pcc_begin
        .pcc_call $P0
        .pcc_end

Invoking instance methods is a simple variation:

        .pcc_begin
        .invocant $P0
        .meth_call $P1
        .pcc_end

Passing arguments and retrieving return values is done like this:

        .pcc_begin
        .arg 42
        .pcc_call $P0
        .local int res
        .result res
        .pcc_end

Arguments can take flags as well; the argument flags are :flat and :named, as given by the set_flags rule above. For example:

        .local pmc arr
        arr = new .Array
        arr = 2
        arr[0] = 42
        arr[1] = 43
        .pcc_begin
        .arg arr :flat
        .arg $I0 :named('intArg')
        .pcc_call foo
        .pcc_end

The Native Calling Interface (NCI) allows for calling C routines, in order to talk to the world outside of Parrot. Its syntax is a slight variation; it uses .nci_call instead of .pcc_call.

        .pcc_begin
        .nci_call $P0
        .pcc_end

Short subroutine invocation

  short_sub_call:
    invocant? [ target | string_constant ] parenthesized_args
  invocant:
    target [ "." | "->" ]

Example short subroutine call

The short subroutine call syntax is useful when manually writing PIR code. Its simplest form is:

        foo()

Or a method call:

        obj.'toString'() # call the method 'toString'
        obj.x() # call the method whose name is stored in 'x'.

IMCC also accepts "->" instead of a dot, which may read more naturally to C++ programmers:

        obj->'toString'()

Of course, the short version can pass arguments as well, including all flags that are defined for the long version. Here is the example from the long subroutine invocation in its short form:

        .local pmc arr
        arr = new .Array
        arr = 2
        arr[0] = 42
        arr[1] = 43
        foo(arr :flat, $I0 :named('intArg'))

Please note that the short subroutine call does not allow for NCI calls.
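Because the long and short forms carry the same information, a compiler targeting Parrot can emit either one. The Python sketch below uses hypothetical helpers long_call and short_call to render the same call both ways; it only handles .pcc_call, since the short form has no NCI variant:

```python
def render_args(args):
    """Render (value, flags) pairs as they appear after .arg or inside ( )."""
    return [" ".join([value, *flags]) for value, flags in args]


def long_call(sub, args, results=()):
    """Emit the long .pcc_begin ... .pcc_end form of a call."""
    lines = [".pcc_begin"]
    lines += [".arg " + arg for arg in render_args(args)]
    lines.append(".pcc_call " + sub)
    lines += [".result " + target for target in results]
    lines.append(".pcc_end")
    return "\n".join(lines)


def short_call(sub, args):
    """Emit the equivalent short form, e.g. foo(arr :flat)."""
    return sub + "(" + ", ".join(render_args(args)) + ")"


args = [("arr", [":flat"]), ("$I0", [":named('intArg')"])]
print(long_call("foo", args))
print(short_call("foo", args))
```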

Return values from subroutines

  return_stat:
      long_return_stat
    | short_return_stat
    | long_yield_stat
    | short_yield_stat
    | tail_call
  long_return_stat:
    ".pcc_begin_return" nl
    return_directive*
    ".pcc_end_return"
  return_directive:
    ".return" simple_expr set_flags? nl

Example long return statement

Returning values from a subroutine is in fact similar to passing arguments to a subroutine. Therefore, the same flags can be used:

        .pcc_begin_return
        .return 42 :named('answer')
        .return $P0 :flat
        .pcc_end_return

In this example, the value 42 is returned as the named return value 'answer'. The aggregate value in $P0 is flattened, and each of its elements is passed as a separate return value.

Short return statement

  short_return_stat:
    ".return" parenthesized_args

Example short return statement

        .return(myVar, "hello", 2.76, 3.14)

Just as the return values in the long return statement could take flags, the short return statement may as well:

        .return(42 :named('answer'), $P0 :flat)

Long yield statements

  long_yield_stat:
    ".pcc_begin_yield" nl
    return_directive*
    ".pcc_end_yield"

Example long yield statement

A yield statement works the same as a normal return statement, except that the point where the subroutine was left is saved, so that the subroutine resumes from that point the next time it is invoked. Values are returned exactly as with normal return statements.

        .sub foo
          .pcc_begin_yield
          .return 42
          .pcc_end_yield
          # and later in the sub, one could return another value:
          .pcc_begin_yield
          .return 43
          .pcc_end_yield
        .end
        # when invoking twice:
        foo() # returns 42
        foo() # returns 43

Short yield statements

  short_yield_stat:
    ".yield" parenthesized_args

Example short yield statement

Again, the short yield statement has the same form as the short return statement.

        .yield("hello", 42)

Tail calls

  tail_call:
    ".return" short_sub_call

Example tail call

        .return foo()

returns the return values of foo. This is implemented as a tail call, which is more efficient than:

        .local pmc results
        (results :slurpy) = foo()
        .return(results :flat)

The call to foo can be considered a normal function call with respect to parameters: it can take the exact same format using argument flags.

Symbol namespaces

  open_namespace:
    ".namespace" identifier
  close_namespace:
    ".endnamespace" identifier

Example open/close namespaces

        .sub main
          .local int x
          x = 42
          say x
          .namespace NESTED
          .local int x
          x = 43
          say x
          .endnamespace NESTED
          say x
        .end

Will print:

        42
        43
        42

Please note that it is not necessary to pair these statements; it is acceptable to open a .namespace without closing it. The scope of the .namespace is limited to the subroutine.

Emit blocks

  emit:
    ".emit" nl
    labeled_pasm_instr*
    ".eom"

Example Emit block

An emit block only allows PASM instructions, not PIR instructions.

        .emit
           set I0, 10
           new P0, .Integer
           ret
         _foo:
           print "This is PASM subroutine foo\n"
           ret
        .eom

Macros

  macro_def:
    ".macro" identifier macro_parameters? nl
    macro_body
  macro_parameters:
    "(" id_list? ")"
  macro_body:
    .*?
    ".endm" nl
  macro_invocation:
    macro_id parenthesized_args?

Example Macros

NOTE: the macro definition is incomplete and untested. This should be fixed. For now, all characters up to, but not including, ".endm" are matched.

PIR Pragmas

  pragma:
      include
    | new_operators
    | loadlib
    | namespace
    | hll_mapping
    | hll_specifier
    | source_info
  include:
    ".include" string_constant
  new_operators:
    ".pragma" "n_operators" int_constant
  loadlib:
    ".loadlib" string_constant
  namespace:
    ".namespace" [ "[" namespace_id "]" ]?
  hll_specifier:
    ".HLL" string_constant "," string_constant
  hll_mapping:
    ".HLL_map" int_constant "," int_constant
  namespace_id:
    string_constant [ ";" string_constant ]*
  source_info:
    ".line" int_constant [ "," string_constant ]?
  id_list:
    identifier [ "," identifier ]*

Examples pragmas

        .include "myLib.pir"

includes the source from the file "myLib.pir" at the point of this directive.

        .pragma n_operators 1

makes Parrot automatically create new PMCs when using arithmetic operators, like:

        $P1 = new .Integer
        $P2 = new .Integer
        $P1 = 42
        $P2 = 43
        $P0 = $P1 * $P2
        # now, $P0 is automatically assigned a newly created PMC.

The .line directive specifies source line information, optionally with a file name:

        .line 100
        .line 100, "myfile.pir"

NOTE: currently, the line directive is implemented in IMCC as #line. See the PROPOSALS document for more information on this.

        .namespace ['Foo'] # namespace Foo
        .namespace ['Object';'Foo'] # nested namespace
        .namespace # no [ id ] means the root namespace is activated

The first example opens the namespace 'Foo'. When doing object-oriented programming, this indicates that the sub or method definitions that follow belong to the class 'Foo'. Of course, you can also define namespaces without doing OO programming.

Please note that this .namespace directive is different from the .namespace directive that is used within subroutines.
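The top-level form of .namespace can be recognized separately from the in-sub form, which takes a bare identifier. A Python sketch, assuming a "#" outside the brackets starts a comment and never occurs inside the quoted names; namespace_path is a hypothetical helper:

```python
import re

# Top-level form: ".namespace", optionally followed by [ 'a' ; 'b' ... ].
NAMESPACE = re.compile(r"\.namespace(?:\s*\[(.*)\])?")


def namespace_path(line: str):
    """Return the namespace path as a list; [] means the root namespace."""
    code = line.split("#", 1)[0].strip()  # drop a trailing comment
    match = NAMESPACE.fullmatch(code)
    if match is None:
        raise ValueError("not a top-level .namespace pragma: " + line)
    if match.group(1) is None:
        return []
    # Each namespace_id part is a quoted string; strip the quotes.
    return [name.strip()[1:-1] for name in match.group(1).split(";")]
```

A line such as .namespace NESTED, the form used inside subroutines, is rejected by this sketch.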

        .HLL "Lua", "lua_group"

is an example of specifying the High Level Language (HLL) for which the PIR is being generated. It is a shortcut for setting the namespace to 'Lua', and for loading the PMCs in the lua_group library.

        .HLL_map .Integer, .LuaNumber

tells Parrot that whenever an Integer is created somewhere in the system (C code), a LuaNumber object should be created instead.

        .loadlib "myLib"

is a shortcut for telling Parrot that the library "myLib" should be loaded when running the program. In effect, it is a shortcut for:

        .sub _load :load :anon
          loadlib $P0, "myLib"
        .end

TODO: check flags and syntax for this.

Tokens, types and targets

  string_constant:
    charset_specifier?  quoted_string
  charset_specifier:
      "ascii:"
    | "binary:"
    | "unicode:"
    | "iso-8859-1:"
  type:
      "int"
    | "num"
    | "pmc"
    | "object"
    | "string"
    | "Array"
    | "Hash"
  target:
    identifier | register

Notes on Tokens, types and targets

A string constant can be written like:

        "Hello world"

but if desirable, the character set can be specified:

        unicode:"Hello world"
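The charset_specifier is an optional prefix of the quoted string. A minimal Python sketch that separates the two parts; escape sequences are not handled and split_string_constant is a hypothetical name:

```python
import re

# string_constant: optional charset_specifier, then a quoted string.
STRING_CONSTANT = re.compile(
    r"(?:(ascii|binary|unicode|iso-8859-1):)?(\"[^\"]*\"|'[^']*')")


def split_string_constant(token: str):
    """Return (charset or None, text between the quotes)."""
    match = STRING_CONSTANT.fullmatch(token)
    if match is None:
        raise ValueError("not a string constant: " + token)
    return match.group(1), match.group(2)[1:-1]
```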

IMCC currently allows identifiers to be used as types. During parsing, the identifier is checked to see whether it names a defined class. The built-in types int, num, pmc and string are always available.

A target is something that can be assigned to, it is an L-value (but of course may be read just like an R-value). It is either an identifier or a register.


AUTHOR

Klaas-Jan Stol [parrotcode@gmail.com]


KNOWN ISSUES AND BUGS

Some work should be done on:


REFERENCES


CHANGES

0.1.1

0.1