This tutorial introduces developing an extensible language in Sterling. It builds an example with a host language and a few extensions. Note this only covers the semantic portion of the language, describing abstract syntax and language semantics.
Sterling languages are built by modules. These modules build on
each other, adding to the definitions given by the modules on
which they build. Because extensions to a module are written
separately, they may introduce the same name for different uses.
To avoid confusion, each name introduced in a module is qualified
with the module's name. For example,
our tutorial:host
module introduces a constructor
with the short name var
. The full name of this
constructor is tutorial:host:var
. Either the full
name or the short name may be used generally. However, if two
constructs with the same name are in the module, perhaps from two
separate built-on modules, the full name must be used to
differentiate them.
Modules in Sterling are defined as collections of files in a
single directory. Modules are named after their directories, so
the tutorial:host
module would be a collection of
files in a directory named host
inside a directory
named tutorial
(tutorial/host
). When we
run Sterling and tell it to process the tutorial:host
module, it looks for this directory structure to identify the
files belonging to the module.
The language we build here has four modules:
tutorial:host
)
let
bindings (tutorial:let
)
tutorial:typing
)
tutorial:num
)
Languages in Sterling start with a host language, a module on which all others build. Our host language will be the untyped lambda calculus.
We start each file in a module with a declaration of which module
we are defining. In our case, this will be
the tutorial:host
module:
Module tutorial:hostThis name will be used to identify this module for other modules to build on it, adding to its definitions.
We start by defining our abstract syntax:
tm ::= var(string) | abs(string, tm) | app(tm, tm)This defines one syntactic category,
tm
, with three
constructors:
var
constructor represents lambda calculus
variables. We use the Sterling built-in string
type for variable names.
abs
constructor represents lambda calculus
abstractions. This uses the Sterling string
type
for the bound variable and the tm
type for lambda
calculus terms for the body.
app
constructor represents the application
of one term to another.
For each syntactic category we introduce, we must also give its projection type:
Projection tm :Each constructor of a syntactic category introduced by an extension will introduce a projection of terms built by that constructor. These projections can then be used to define relations extensions introduce for constructs from other extensions, as we shall see below.
The empty projection type we have for tm
means the
projection will relation two terms of type tm
. In
some cases, it is useful to condition the projection on other
arguments. For example, we might want to be able to condition the
projection on the object-level types of sub-terms. In that case,
we would want a projection type with a typing context so we could
determine the types of sub-terms in the rules for projection.
We start by defining a relation checking whether a tm
is closed. We first declare it:
/*whether a tm is closed, that it does not use any free vars*/
Judgment closed : [string] tm*
This declares a relation named closed
with two
arguments, a list of strings and a tm
. Additionally,
the tm
is marked as the primary component of
the relation. Conceptually, the primary component is the argument
the relation is about. Here the closed
relation is about the tm
, with the list of
variable names being present only to support determining something
about the tm
.
We will use the primary component to ensure the semantics of relations for known constructs are not changed by extensions. When an extension introduces a rule defining a relation introduced by a built-on module, the primary component must be built by a constructor introduced by that extension. This prevents extensions from introducing rules applicable to pre-existing constructs and modifying their semantics.
We write rules for the relation, which are as one would expect:
mem X Vars ------------------ [C-Var] closed Vars var(X) closed X::Vars T --------------------- [C-Abs] closed Vars abs(X, T) closed Vars T1 closed Vars T2 ----------------------- [C-App] closed Vars app(T1, T2)In the
C-Var
rule we check whether the
variable X
is in the list of bound variables. This
uses the mem
relation in the Sterling standard
library rather than a relation defined in this module.
We also define evaluation for tm
s, with several
relations to support it:
/*eval T1 T2: small-step evaluation, where T1 steps to T2*/ Judgment eval : tm* tm /*subst X R B S: substitute R for X in B yielding S*/ Judgment subst : string tm tm* tm /*value T: T is a value*/ Judgment value : tm*These relations have rules generally as we would expect:
X != Y ----------------------- [S-Var-NEq] subst X R var(Y) var(Y) ------------------ [S-Var-Eq] subst X R var(X) R X != Y subst X R B S ----------------------------- [S-Abs-NEq] subst X R abs(Y, B) abs(Y, S) ----------------------------- [S-Abs-Eq] subst X R abs(X, B) abs(X, B) subst X R T1 S1 subst X R T2 S2 --------------------------------- [S-App] subst X R app(T1, T2) app(S1, S2) --------------- [V-Abs] value abs(X, T) eval T1 T11 ----------------------------- [E-App-Step1] eval app(T1, T2) app(T11, T2) value T1 eval T2 T21 ----------------------------- [E-App-Step2] eval app(T1, T2) app(T1, T21) absForm T1 X Body value T2 subst X T2 Body V ------------------ [E-App-Subst] eval app(T1, T2) VThe exception is the
E-App-Subst
rule. We would
expect this rule to have as its conclusion eval app(abs(X,
Body), T2) V
, explicitly stating the first term is an
abstraction. The reason we do not write this is that it prevents
extensions from introducing new abstract-like constructs that may
be applied. An extension cannot introduce a new rule defining
evaluation for app
because of the restrictions for
the primary component, so, if we want to allow app
to
apply extension-introduced function constructors, we cannot put
the abs
form into the rule directly.
To solve the problem of extensibility while still getting the
information we need, we define a relation absForm
that checks if a tm
is
somehow abs
-shaped, pulling out the bound variable
name and body if so:
/*absForm T X B: T is like the abstraction abs(X, T)*/
Judgment absForm : tm* string tm
--------------------- [AF-Abs]
absForm abs(X, T) X T
This relation can be extended by extensions so their constructs
can also be applied using app
.
Some extensions only add new constructors and define existing
judgments over them. This extension adds a new let
constructor and defines the host language's judgments on it.
The extension declares its name and that it builds on the host language:
Module tutorial:let Builds on tutorial:host
This adds one new constructor of the tm
category:
tm ::= ... | let(string, tm, tm)We start the constructor declaration with an ellipsis (
...
) to show this is an addition to the pre-existing
constructors of tm
rather than a new tm
category.
Because let
builds a syntactic category from a
built-on module, we need to give it a projection that eliminates
the let
constructor:
--------------------------------------------- [Proj-Let] |{tm}- let(X, T1, T2) ~~> app(abs(X, T2), T1)This says a term built by the
let
constructor is
similar to an application of a built abstraction over the same
sub-terms.
We also define the host language's relations for the
new let
constructor:
eval T1 T11 ----------------------------------- [E-Let-Step] eval let(X, T1, T2) let(X, T11, T2) value T1 subst X T1 T2 V --------------------- [E-Let-Subst] eval let(X, T1, T2) V X != Y subst X R T1 S1 subst X R T2 S2 --------------------------------------- [S-Let-NEq] subst X R let(Y, T1, T2) let(Y, S1, S2) subst X R T1 S1 --------------------------------------- [S-Let-Eq] subst X R let(X, T1, T2) let(X, S1, T2) closed Vars T1 closed X::Vars T2 -------------------------- [C-Let] closed Vars let(X, T1, T2)Note that each rule has a term built by the
let
constructor as the primary component in the conclusion. Each new
rule introduced in an extension for a relation introduced by a
built-on module must have the primary component built by a
constructor introduced by the extension so the semantics of
constructs not introduced by the extension are not affected by the
extension's new rules.
Extensions may also introduce new syntactic categories and new judgments. This extension introduces a notion of typing for the language.
As in all extensions, we declare the module name and the module on which it builds:
Module tutorial:typing Builds on tutorial:host
Because we will be typing tm
s, we need a way to
describe types. We introduce a new syntactic
category ty
:
ty ::= baseTy | arrowTy(ty, ty) Projection ty :
We also add a typed version of an abstraction, projecting to a
basic abstraction (abs
) and define the host
language's judgments on it:
tm ::= ... | tyAbs(string, ty, tm) ------------------------------------ [Proj-TyAbs] |{tm}- tyAbs(X, Ty, T) ~~> abs(X, T) X != Y subst X R B S ----------------------------------------- [S-TyAbs-NEq] subst X R tyAbs(Y, Ty, B) tyAbs(Y, Ty, S) ----------------------------------------- [S-TyAbs-Eq] subst X R tyAbs(X, Ty, B) tyAbs(X, Ty, B) --------------------- [V-TyAbs] value tyAbs(X, Ty, T) --------------------------- [AF-TyAbs] absForm tyAbs(X, Ty, T) X T closed X::Vars T --------------------------- [C-TyAbs] closed Vars tyAbs(X, Ty, T)Note that rule
AF-TyAbs
takes a tyAbs
apart as an abstraction so app(tyAbs("x", var("x")),
...)
can evaluate.
We introduce a judgment typeOf
for
typing tm
s:
Judgment typeOf : [(string, ty)] tm* tyWe define typing rules for all known constructs, those introduced by the typing extension and the host language:
lookup G X Ty ------------------ [T-Var] typeOf G var(X) Ty typeOf G T1 arrowTy(Ty1, Ty2) typeOf G T2 Ty1 ----------------------------- [T-App] typeOf G app(T1, T2) Ty2 typeOf (X, Ty1)::G T Ty2 ------------------------------------------- [T-TyAbs] typeOf G tyAbs(X, Ty1, T) arrowTy(Ty1, Ty2) typeOf (X, Ty1)::G T Ty2 ------------------------------------ [T-Abs] typeOf G abs(X, T) arrowTy(Ty1, Ty2)Note that in the
T-Abs
rule we use a meta-variable
for the argument type. This gives us the same semantics
as T-TyAbs
in that the function must have exactly one
type when used, but a program's writer does not need to specify
what that type is.
If we left the description of the typeOf
relation at
this point, it would work in the context of the host language and
typing extension. However, if we included the let
extension in the language composition, typing would not be defined
for it. Thus we include a special rule, marked
with *
after the rule name, to define the relation
for terms built by constructors from other extensions that do not
build on the current one:
|{tm}- T ~~> T_T typeOf G T_T Ty ---------------- [T-D]* typeOf G T TyFor example, this rule would be instantiated for the
let
extension's let
constructor,
defining its typing by the typing of its projection.
Each relation introduced by an extension with a primary component
introduced by one of the modules on which it builds must include a
single default rule. Often this takes the form seen
in T-D
, where the other arguments to the relation
are copied directly from the projection of the primary component.
However, this form is not required, and default rules may take any
form. The only requirement is that the primary component in the
conclusion must be a meta-variable, as it will be filled with
constructors from other extensions.
Extensions can also build on other extensions. We add an
extension introducing numbers, building on top of
the typing
extension.
We declare the module's name and that it builds on
the typing
extension:
Module tutorial:num Builds on tutorial:typingThis single
Builds on
declaration will transitively
include the modules on which tutorial:typing
builds,
so the current extension knows both the typing
extension and the host language.
We extend both tm
from the host language
and ty
from the typing
extension, giving
projections for the new constructors from both:
tm ::= ... | num(int) | plus(tm, tm) ty ::= ... | intTy ---------------------------------------------------------- [Proj-IntTy] |{ty}- intTy ~~> arrowTy(arrowTy(Ty, Ty), arrowTy(Ty, Ty)) ---------------------------------------------------------- [Proj-Plus] |{tm}- plus(T1, T2) ~~> abs("m", abs("n", abs("f", abs("x", app(var("m"), app(var("f"), app(var("n"), app(var("f"), var("x"))))))))) buildNumTerm I T --------------------------------------- [Proj-Num] |{tm}- num(I) ~~> abs("f", abs("x", T))The
[Proj-Num]
rule uses another relation to build
the appropriate Church encoding for its number:
/*Build the body of a Church encoding of a number,
using names "f" and "x"*/
Fixed Judgment buildNumTerm : int tm
======================= [BNT-Zero]
buildNumTerm 0 var("x")
N - 1 = NSub
buildNumTerm NSub T
=============================== [BNT-Step]
buildNumTerm N app(var("f"), T)
Unlike the other relations we have seen, this is a fixed
judgment. It is not extensible; all its rules are given by the
module introducing it. Because it is not extensible, it does not
have a primary component. Fixed judgments are useful for defining
relations over non-extensible types, such as lists, or to tie
together multiple extensible relations into a single relation that
uses them all. To remind the writer that the relation being
defined is fixed rather than extensible, the rules use a line made
from equals signs rather than hyphens.
The extension defines the relations introduced by both the host
language and the typing
extension for the constructs
it introduces:
eval T1 T11 ------------------------------- [E-Plus-Step1] eval plus(T1, T2) plus(T11, T2) value T1 eval T2 T21 ------------------------------- [E-Plus-Step2] eval plus(T1, T2) plus(T1, T21) I1 + I2 = I ---------------------------------- [E-Plus-Add] eval plus(num(I1), num(I2)) num(I) ----------------------- [S-Num] subst X R num(I) num(I) subst X R T1 S1 subst X R T2 S2 ----------------------------------- [S-Plus] subst X R plus(T1, T2) plus(S1, S2) ------------ [V-Num] value num(I) ------------------ [C-Num] closed Vars num(I) closed Vars T1 closed Vars T2 ------------------------ [C-Plus] closed Vars plus(T1, T2) --------------------- [T-Num] typeOf G num(I) intTy typeOf G T1 intTy typeOf G T2 intTy --------------------------- [T-Plus] typeOf G plus(T1, T2) intTyThe restrictions on extending the relations are the same whether the relation was introduced by the host language or the
typing
extension.
Note that the projection of plus
will not evaluate
with the rules above because it relies on the Church encoding of
numbers, that is, the projection of num(I)
. If we
believe it is important the projection capture evaluation
behavior, we can add a rule for absForm
for num
capturing its Church encoding:
buildNumTerm I T ------------------------------ [AF-Num] absForm num(I) "f" abs("x", T)This rule will also have the effect of making
app(num(3),
...)
evaluable. Strange as this may seem, the extension
writer can decide it is fine, especially in light of building on
the typing
extension that will rule out such a term
as ill-typed.