Reference Guide
Sterling is a system for defining extensible languages using
inference rules. It has three parts for defining languages: the
semantic portion, describing the abstract syntax and judgments in
the language; the concrete portion, describing the concrete syntax
of the language; and the main function portion, describing how the
language should be run on a program.
This is a reference for specific features, not a tutorial. A
Sterling tutorial can be found
elsewhere.
Running Sterling
To run Sterling to check a module named mo:du:le
, run
sterling mo:du:le
This will look for files in the directory
mo/du/le
rooted in the current directory. To add another root directory
location, use the
-I
flag:
sterling -I .. mo:du:le
This will find the module when run from within the
mo
directory. The
-I
flag may be used multiple times,
with each given directory added as a location. The root
directories of the module being checked and each of the modules on
which it builds must be added if they are not located in the
directory where Sterling is being run.
Running Sterling without flags other than the -I
flag
will check for errors but not result in other actions. To see
more options for running Sterling, use the --help
flag.
Semantic Portion
The semantic portion of the language is written
in .sos
files. It includes declaring abstract syntax
and declaring and defining judgments. This is also where modules
on which the current one builds are declared, with builds-on
declarations like
Builds on mo:du:le
Declaring Abstract Syntax
Syntactic categories and sets of constructors for them are
declared together:
expr ::= or(expr, expr)
| and(expr, expr)
| true
| false
| abs(string, ty, expr)
| app(expr, expr)
If a constructor takes no arguments we may leave the parentheses
off, as with
true
above.
We can add new constructors to an existing category by starting
the declaration of constructors with an ellipsis:
expr ::= ...
| plus(expr, expr)
| num(int)
| greater(expr, expr)
Each new syntactic category requires a declaration of its
projection type. For example, we might define projection for an
expression to depend on a typing context:
Projection expr : typingCtx
Quite often, we find we don't want projections to depend on
anything, so we can have no arguments:
Projection stmt :
We might project a repeat-while
construct into a
regular while
loop:
----------------------------------------------------------------- [Proj-RepeatWhile]
|{stmt}- repeatWhile(Body, Cond) ~~> seq(Body, while(Cond, Body))
We might project a
let
binding to the basic
simply-typed lambda calculus by checking the type of the bound
expression:
typeOf Ctx E1 Ty
------------------------------------------------------- [Proj-Let]
Ctx |{expr}- let(X, E1, E2) ~~> app(abs(X, Ty, E2), E1)
Declaring and Defining Judgments
The syntax of terms in Sterling includes the following constructs:
c(t1, t2, t3) |
constructor application (parentheses optional for
constructors without arguments) |
Var |
variable (starts with capital letter) |
<integer> |
integer literal |
"string" |
string literal |
(t1, t2, t3) |
tuple literal |
t1::t2 |
cons list |
[] |
empty list |
[t1, t2, t3] |
list literal |
(t : ty) |
ascription |
The syntax of judgments in Sterling includes the following
constructs:
t1 = t2 |
equality check |
t1 != t2 |
inequality check |
t1 < t2 |
less than for integers |
t1 > t2 |
greater than for integers |
t1 <= t2 |
less than or equal to for integers |
t1 >= t2 |
greater than or equal to for integers |
t1 + t2 = t3 |
integer addition |
t1 - t2 = t3 |
integer subtraction |
t1 * t2 = t3 |
integer multiplication |
t1 / t2 = t3 |
integer division |
t1 % t2 = t3 |
integer modulus |
t1 ++ t2 = t3 |
append for two strings or two lists |
t1, t2, t3 |{c}- t4 ~~> t5 |
projection |
rel t1 t2 t3 |
relation judgment |
!rel t1 t2 t3 |
negation of a relation judgment |
Fixed judgments have all their rules defined in the module
introducing them. We can declare a fixed ternary
judgment frel
as
Fixed Judgment frel : a b c
Rules for fixed judgments have the form
<premise judgment 1>
...
<premise judgment n>
===================== [<rule name>]
<conclusion judgment>
The line must include at least five equals signs. Rule names are
expected to be unique.
Fixed judgments may be declared to be polymorphic by including
type variables, written using capital letters, in their type
declarations. For example,
Fixed Judgment rel : [A] A
declares a relation relating a list containing some type of
element to a term of that type.
Extensible judgments have their rules extended by further
extensions building on the module in which they are introduced.
We can declare an extensible binary judgment erel
as
Judgment erel : a b*
The
*
marks the primary component around which the
relation is defined, required for each extensible judgment. Rules
for extensible judgments have the form
<premise judgment 1>
...
<premise judgment n>
--------------------- [<rule name>]
<conclusion judgment>
The line must include at least five hyphens. Extensions building
on the module introducing a relation may add new rules defining
it, provided the conclusion of each new rule has its primary
component built at the top level by a constructor introduced by
the extension introducing the rule.
Extensible judgments declared with a primary component category
from a module on which the current one builds (as opposed to a
category the current module introduces) must declare a lifting
rule, marked with *
after the rule name. These often
have the form of a copy rule,
|{cat}- B ~~> B_Proj
erel A B_Proj
--------------------- [Erel-Proj]*
erel A B
but may have any form. The conclusion must have a variable as its
primary component argument.
Extensibella Stand-In Rules
When using Sterling to write specifications
for Extensibella
reasoning, we often need to specify stand-in rules that will be
used to model the behavior of other, unknown extensions. These
are written as regular extensible rules, but within
an Extensibella_Stand_In
environment:
Extensibella_Stand_In {
<premises>
----------- [Erel-Q]
erel <args>
}
These rules do not have any effect on a language's semantics and
are only used in the generated Extensibella specification. All
arguments to the conclusion must be unique variables; structured
terms or repeated variable names are not allowed, as they would
limit the applicability of this rule that is supposed to
represent
all unknown rules. As with default rules,
stand-in rules can only be introduced by the module introducing
the relation being defined.
Concrete Portion
Terminals are declared with names and regular expressions, such as
name_t /[a-z][a-zA-Z0-9_]*
Regular expressions are as expected. Special characters must be
escaped using backslashes. Spaces are ignored by default, but can
be included by backslash escaping them.
Terminals that will be ignored by all productions in the grammar
can be declared as ignore terminals, such us
ignore /[\n\r\t\ ]+/
Strings matching ignore terminals may occur anywhere in a string
being parsed by the grammar and will be skipped.
Nonterminals and productions are declared together, similar to how
abstract syntax categories and constructors are declared.
Nonterminals are also declared with the type of abstract syntax
built for them:
nt<abs ty> ::= <prod 1>
| <prod 2>
| <prod 3>
When adding new productions to existing nonterminals, the
nonterminal type is given without an abstract syntax type and the
productions are prefaced with an ellipsis:
nt ::= ..
| <new prod 1>
| <new prod 2>
Productions have two parts, the signature and the abstract syntax
term, joined by ~~>
. Production signatures are built
from terminals and nonterminals, which may be given names to
reference them. For example,
backslash_t Name::name_t dot_t Body::term_c
might be the signature for a lambda abstract,
where
backslash_t
,
name_t
,
and
dot_t
are terminals and
term_c
is a
nonterminal.
The syntax of terms for building abstract syntax in Sterling:
Name |
names of production elements from the signature |
c(t1, t2, t3) |
abstract syntax constructors |
"string" |
string literals |
<integer> |
integer literal |
$to_int(t) |
convert a string to an integer |
t1[t2:t3] |
substring of a string with zero-based indices |
t1::t2 |
cons list |
[] |
empty list |
[t1, t2, t3] |
list literal |
(t1, t2, t3) |
tuple literal |
Note that the type of a named production element is string for a
terminal and, for a nonterminal, the type of its built abstract
syntax tree.
Main Function Portion
The main function portion of the language uses functions to tie
together the parsing and judgments written in the other two parts
of the language. Functions are declared as
Function <name> : <name 1 : type 1> <name 2 : type 2> <name 3 : type 3> -> <return type>{
<body>
}
Each parameter name must be capitalized. The types may be the
types in the
semantic portion of the
language as well as
bool
and
unit
types in only this portion of the language.
The starting point for running a Sterling module is the
main
function with type
[string] -> int
.
Each function's body is a single expression:
true |
Boolean true |
false |
Boolean false |
e1 || e2 |
Boolean disjunction |
e1 && e2 |
Boolean conjunction |
If e1 Then e2 Else e3 |
conditional |
e1 + e2 |
integer addition |
e1 - e2 |
integer subtraction |
e1 * e2 |
integer multiplication |
e1 / e2 |
integer division |
e1 % e2 |
integer modulus |
<integer> |
integer literal |
e1 < e2 |
integer less than |
e1 > e2 |
integer greater than |
e1 <= e2 |
integer less than or equal to |
e1 >= e2 |
integer greater than or equal to |
e1 = e2 |
equality comparison |
"string" |
string literals |
fun(e1, e2, e3) |
function call |
e1[e2] |
zero-based list indexing |
e1 ++ e2 |
list and string concatenation |
Let <Var> := e1 In e2 |
let binding (variable must be capitalized) |
<Var> |
variables referring to arguments to the function or
let-bound variables |
e1 Before e2 |
expression sequencing (first expression must have
type unit ) |
Print e |
print the string form of the expression's value |
Write e1 to e2 |
write the string form of expression e1 's
value to a file with name e2 |
Read e |
read the contents of a file |
Parse <nt> from e |
parse the given nonterminal from the string expression
(detailed discussion below) |
Derive {<judgment>} for <input vars>
assigning [<output vars>] |
derive the given relation for the given arguments,
setting the output variables (detailed discussion
below) |
Parsing takes a nonterminal name, for the nonterminal to parse,
and a string expression for the string from which to parse it.
A Parse
returns a triple of type (bool,
<abstract syntax>, string)
. These are
- whether the parse was successful (
true
) or not
- the abstract syntax produced from the parse tree (only
defined if the first element is
true
)
- errors describing why the string didn't parse (only defined
if the first element is
false
)
A common structure for parsing, then, is
Let Success, A, Err := Parse nt from String
In
If Success
Then <use A>
Else <use Err>
In deriving judgments, we need to treat them as functions so we
identify both input variables and output variables. Input
variables take their values from the current environment and
output variables produce values. For example, in
Derive {typeOf [] Tm Ty} for Tm assigning [Ty]
This reads the value of
Tm
, then derives a type for
it. Deriving a judgment produces a tuple of
n + 1 values,
where
n is the number of output variables. The first
element of the tuple has type
bool
and tells whether
the derivation is successful. The other elements in the tuple are
the values found for the output variables. In the example, the
output type of the expression would be
(bool, ty)
.
Miscellaneous
Names
All defined names start with a lowercase letter, followed by zero
or more lowercase letters, capital letters, digits, and
underscores. All variables must start with capital letters.
Qualified Names
All names introduced in any portion of the language are implicitly
qualified by the module in which they are declared. For example,
a type expr
declared in a
module mo:du:le
has as its qualified
name mo:du:le:expr
. The non-qualified name, the
short name, can be used as long as no other name for the same type
of thing is in scope, in which case the qualified names must be
used for both. For example, if two categories a:c
and b:c
are both in scope, they must be referenced by
their qualified names. If instead b:c
is a
constructor, they do not interfere and may be referenced by their
short names.
Built-In Types
Sterling has a few built-in types:
string
int
- lists, written
[ty]
- n-ary tuples, written as comma-separated lists of
types in parentheses (e.g.
(int, expr, [int])
)
Emacs Support
Sterling has modes written for Emacs for each part of the
language, found in
the Sterling
GitHub repository. To use these, load them in
your .emacs
file with the following lines:
(load "path/to/sterling/ide/sterling.el")
(load "path/to/sterling/ide/sterling-conc.el")
(load "path/to/sterling/ide/sterling-main.el")
This will make Emacs know to open Sterling files using the
appropriate mode automatically.