Tutorial

This tutorial introduces developing an extensible language in Sterling. It builds an example with a host language and a few extensions. Note this only covers the semantic portion of the language, describing abstract syntax and language semantics.

Sterling languages are built by modules. These modules build on each other, adding to the definitions given by the modules on which they build. Because extensions to a module are written separately, they may introduce the same name for different uses. To avoid confusion, each name introduced in a module is qualified with the module's name. For example, our tutorial:host module introduces a constructor with the short name var. The full name of this constructor is tutorial:host:var. Either the full name or the short name may be used generally. However, if two constructs with the same name are in the module, perhaps from two separate built-on modules, the full name must be used to differentiate them.

Modules in Sterling are defined as collections of files in a single directory. Modules are named after their directories, so the tutorial:host module would be a collection of files in a directory named host inside a directory named tutorial (tutorial/host). When we run Sterling and tell it to process the tutorial:host module, it looks for this directory structure to identify the files belonging to the module.

The language we build here has four modules:

Host language (tutorial:host)
Extension adding let bindings (tutorial:let)
Extension adding typing (tutorial:typing)
Extension adding numbers (tutorial:num)

Host Language

Languages in Sterling start with a host language, a module on which all others build. Our host language will be the untyped lambda calculus.

We start each file in a module with a declaration of which module we are defining. In our case, this will be the tutorial:host module:

Module tutorial:host

This name will be used to identify this module for other modules to build on it, adding to its definitions.

Defining Syntax

We start by defining our abstract syntax:

tm ::= var(string)
     | abs(string, tm)
     | app(tm, tm)

This defines one syntactic category, tm, with three constructors:

The var constructor represents lambda calculus variables. We use the Sterling built-in string type for variable names.
The abs constructor represents lambda calculus abstractions. This uses the Sterling string type for the bound variable and the tm type for lambda calculus terms for the body.
The app constructor represents the application of one term to another.

These are the only constructors we introduce in the host language, although extensions may add to them.

For each syntactic category we introduce, we must also give its projection type:

Projection tm :

Each constructor of a syntactic category introduced by an extension will introduce a projection of terms built by that constructor. These projections can then be used to define relations extensions introduce for constructs from other extensions, as we shall see below.

The empty projection type we have for tm means the projection will relation two terms of type tm. In some cases, it is useful to condition the projection on other arguments. For example, we might want to be able to condition the projection on the object-level types of sub-terms. In that case, we would want a projection type with a typing context so we could determine the types of sub-terms in the rules for projection.

Defining Relations

We start by defining a relation checking whether a tm is closed. We first declare it:

/*whether a tm is closed, that it does not use any free vars*/
Judgment closed : [string] tm*

This declares a relation named closed with two arguments, a list of strings and a tm. Additionally, the tm is marked as the primary component of the relation. Conceptually, the primary component is the argument the relation is about. Here the closed relation is about the tm, with the list of variable names being present only to support determining something about the tm.

We will use the primary component to ensure the semantics of relations for known constructs are not changed by extensions. When an extension introduces a rule defining a relation introduced by a built-on module, the primary component must be built by a constructor introduced by that extension. This prevents extensions from introducing rules applicable to pre-existing constructs and modifying their semantics.

We write rules for the relation, which are as one would expect:

mem X Vars
------------------ [C-Var]
closed Vars var(X)


closed X::Vars T
--------------------- [C-Abs]
closed Vars abs(X, T)


closed Vars T1
closed Vars T2
----------------------- [C-App]
closed Vars app(T1, T2)

In the C-Var rule we check whether the variable X is in the list of bound variables. This uses the mem relation in the Sterling standard library rather than a relation defined in this module.

We also define evaluation for tms, with several relations to support it:

/*eval T1 T2:  small-step evaluation, where T1 steps to T2*/
Judgment eval : tm* tm

/*subst X R B S:  substitute R for X in B yielding S*/
Judgment subst : string tm tm* tm

/*value T:  T is a value*/
Judgment value : tm*

These relations have rules generally as we would expect:

X != Y
----------------------- [S-Var-NEq]
subst X R var(Y) var(Y)


------------------ [S-Var-Eq]
subst X R var(X) R


X != Y
subst X R B S
----------------------------- [S-Abs-NEq]
subst X R abs(Y, B) abs(Y, S)


----------------------------- [S-Abs-Eq]
subst X R abs(X, B) abs(X, B)


subst X R T1 S1
subst X R T2 S2
--------------------------------- [S-App]
subst X R app(T1, T2) app(S1, S2)



--------------- [V-Abs]
value abs(X, T)



eval T1 T11
----------------------------- [E-App-Step1]
eval app(T1, T2) app(T11, T2)


value T1
eval T2 T21
----------------------------- [E-App-Step2]
eval app(T1, T2) app(T1, T21)


absForm T1 X Body
value T2
subst X T2 Body V
------------------ [E-App-Subst]
eval app(T1, T2) V

The exception is the E-App-Subst rule. We would expect this rule to have as its conclusion

eval app(abs(X,
    Body), T2) V

, explicitly stating the first term is an abstraction. The reason we do not write this is that it prevents extensions from introducing new abstract-like constructs that may be applied. An extension cannot introduce a new rule defining evaluation for app because of the restrictions for the primary component, so, if we want to allow app to apply extension-introduced function constructors, we cannot put the abs form into the rule directly.

To solve the problem of extensibility while still getting the information we need, we define a relation absForm that checks if a tm is somehow abs-shaped, pulling out the bound variable name and body if so:

/*absForm T X B:  T is like the abstraction abs(X, T)*/
Judgment absForm : tm* string tm

--------------------- [AF-Abs]
absForm abs(X, T) X T

This relation can be extended by extensions so their constructs can also be applied using app.

Let Extension

Some extensions only add new constructors and define existing judgments over them. This extension adds a new let constructor and defines the host language's judgments on it.

The extension declares its name and that it builds on the host language:

Module tutorial:let

Builds on tutorial:host

This adds one new constructor of the tm category:

tm ::= ...
     | let(string, tm, tm)

We start the constructor declaration with an ellipsis (...) to show this is an addition to the pre-existing constructors of tm rather than a new tm category.

Because let builds a syntactic category from a built-on module, we need to give it a projection that eliminates the let constructor:


--------------------------------------------- [Proj-Let]
|{tm}- let(X, T1, T2) ~~> app(abs(X, T2), T1)

This says a term built by the let constructor is similar to an application of a built abstraction over the same sub-terms.

We also define the host language's relations for the new let constructor:

eval T1 T11
----------------------------------- [E-Let-Step]
eval let(X, T1, T2) let(X, T11, T2)


value T1
subst X T1 T2 V
--------------------- [E-Let-Subst]
eval let(X, T1, T2) V



X != Y
subst X R T1 S1
subst X R T2 S2
--------------------------------------- [S-Let-NEq]
subst X R let(Y, T1, T2) let(Y, S1, S2)


subst X R T1 S1
--------------------------------------- [S-Let-Eq]
subst X R let(X, T1, T2) let(X, S1, T2)



closed Vars T1
closed X::Vars T2
-------------------------- [C-Let]
closed Vars let(X, T1, T2)

Note that each rule has a term built by the let constructor as the primary component in the conclusion. Each new rule introduced in an extension for a relation introduced by a built-on module must have the primary component built by a constructor introduced by the extension so the semantics of constructs not introduced by the extension are not affected by the extension's new rules.

Typing Extension

Extensions may also introduce new syntactic categories and new judgments. This extension introduces a notion of typing for the language.

As in all extensions, we declare the module name and the module on which it builds:

Module tutorial:typing

Builds on tutorial:host

Because we will be typing tms, we need a way to describe types. We introduce a new syntactic category ty:

ty ::= baseTy
     | arrowTy(ty, ty)

Projection ty :

We also add a typed version of an abstraction, projecting to a basic abstraction (abs) and define the host language's judgments on it:

tm ::= ...
     | tyAbs(string, ty, tm)

------------------------------------ [Proj-TyAbs]
|{tm}- tyAbs(X, Ty, T) ~~> abs(X, T)



X != Y
subst X R B S
----------------------------------------- [S-TyAbs-NEq]
subst X R tyAbs(Y, Ty, B) tyAbs(Y, Ty, S)


----------------------------------------- [S-TyAbs-Eq]
subst X R tyAbs(X, Ty, B) tyAbs(X, Ty, B)



--------------------- [V-TyAbs]
value tyAbs(X, Ty, T)



--------------------------- [AF-TyAbs]
absForm tyAbs(X, Ty, T) X T



closed X::Vars T
--------------------------- [C-TyAbs]
closed Vars tyAbs(X, Ty, T)

Note that rule AF-TyAbs takes a tyAbs apart as an abstraction so

app(tyAbs("x", var("x")),
    ...)

can evaluate.

We introduce a judgment typeOf for typing tms:

Judgment typeOf : [(string, ty)] tm* ty

We define typing rules for all known constructs, those introduced by the typing extension and the host language:

lookup G X Ty
------------------ [T-Var]
typeOf G var(X) Ty


typeOf G T1 arrowTy(Ty1, Ty2)
typeOf G T2 Ty1
----------------------------- [T-App]
typeOf G app(T1, T2) Ty2


typeOf (X, Ty1)::G T Ty2
------------------------------------------- [T-TyAbs]
typeOf G tyAbs(X, Ty1, T) arrowTy(Ty1, Ty2)


typeOf (X, Ty1)::G T Ty2
------------------------------------ [T-Abs]
typeOf G abs(X, T) arrowTy(Ty1, Ty2)

Note that in the T-Abs rule we use a meta-variable for the argument type. This gives us the same semantics as T-TyAbs in that the function must have exactly one type when used, but a program's writer does not need to specify what that type is.

If we left the description of the typeOf relation at this point, it would work in the context of the host language and typing extension. However, if we included the let extension in the language composition, typing would not be defined for it. Thus we include a special rule, marked with * after the rule name, to define the relation for terms built by constructors from other extensions that do not build on the current one:

|{tm}- T ~~> T_T
typeOf G T_T Ty
---------------- [T-D]*
typeOf G T Ty

For example, this rule would be instantiated for the let extension's let constructor, defining its typing by the typing of its projection.

Each relation introduced by an extension with a primary component introduced by one of the modules on which it builds must include a single default rule. Often this takes the form seen in T-D, where the other arguments to the relation are copied directly from the projection of the primary component. However, this form is not required, and default rules may take any form. The only requirement is that the primary component in the conclusion must be a meta-variable, as it will be filled with constructors from other extensions.

Number Extension

Extensions can also build on other extensions. We add an extension introducing numbers, building on top of the typing extension.

We declare the module's name and that it builds on the typing extension:

Module tutorial:num

Builds on tutorial:typing

This single Builds on declaration will transitively include the modules on which tutorial:typing builds, so the current extension knows both the typing extension and the host language.

We extend both tm from the host language and ty from the typing extension, giving projections for the new constructors from both:

tm ::= ...
     | num(int)
     | plus(tm, tm)

ty ::= ...
     | intTy

---------------------------------------------------------- [Proj-IntTy]
|{ty}- intTy ~~> arrowTy(arrowTy(Ty, Ty), arrowTy(Ty, Ty))


---------------------------------------------------------- [Proj-Plus]
|{tm}- plus(T1, T2) ~~>
       abs("m", abs("n", abs("f", abs("x",
          app(var("m"), app(var("f"),
              app(var("n"), app(var("f"), var("x")))))))))


buildNumTerm I T
--------------------------------------- [Proj-Num]
|{tm}- num(I) ~~> abs("f", abs("x", T))

The [Proj-Num] rule uses another relation to build the appropriate Church encoding for its number:

/*Build the body of a Church encoding of a number,
  using names "f" and "x"*/
Fixed Judgment buildNumTerm : int tm

======================= [BNT-Zero]
buildNumTerm 0 var("x")


N - 1 = NSub
buildNumTerm NSub T
=============================== [BNT-Step]
buildNumTerm N app(var("f"), T)

Unlike the other relations we have seen, this is a fixed judgment. It is not extensible; all its rules are given by the module introducing it. Because it is not extensible, it does not have a primary component. Fixed judgments are useful for defining relations over non-extensible types, such as lists, or to tie together multiple extensible relations into a single relation that uses them all. To remind the writer that the relation being defined is fixed rather than extensible, the rules use a line made from equals signs rather than hyphens.

The extension defines the relations introduced by both the host language and the typing extension for the constructs it introduces:

eval T1 T11
------------------------------- [E-Plus-Step1]
eval plus(T1, T2) plus(T11, T2)


value T1
eval T2 T21
------------------------------- [E-Plus-Step2]
eval plus(T1, T2) plus(T1, T21)


I1 + I2 = I
---------------------------------- [E-Plus-Add]
eval plus(num(I1), num(I2)) num(I)



----------------------- [S-Num]
subst X R num(I) num(I)


subst X R T1 S1
subst X R T2 S2
----------------------------------- [S-Plus]
subst X R plus(T1, T2) plus(S1, S2)



------------ [V-Num]
value num(I)



------------------ [C-Num]
closed Vars num(I)


closed Vars T1
closed Vars T2
------------------------ [C-Plus]
closed Vars plus(T1, T2)



--------------------- [T-Num]
typeOf G num(I) intTy


typeOf G T1 intTy
typeOf G T2 intTy
--------------------------- [T-Plus]
typeOf G plus(T1, T2) intTy

The restrictions on extending the relations are the same whether the relation was introduced by the host language or the typing extension.

Note that the projection of plus will not evaluate with the rules above because it relies on the Church encoding of numbers, that is, the projection of num(I). If we believe it is important the projection capture evaluation behavior, we can add a rule for absForm for num capturing its Church encoding:

buildNumTerm I T
------------------------------ [AF-Num]
absForm num(I) "f" abs("x", T)

This rule will also have the effect of making

app(num(3),
    ...)

evaluable. Strange as this may seem, the extension writer can decide it is fine, especially in light of building on the typing extension that will rule out such a term as ill-typed.