In general, extensible programming languages are languages where portions of them are developed separately, then put together. The specific view of extensible languages we take is one in which the language's syntax and semantics are given by a set of independently-developed modules building on each other. A common form for this to take is one where one module is the host language and the other modules are extensions adding to it. A language is built by choosing some subset of these modules, creating a language with all the constructs and semantics of all included modules.
While we are not tied to the host-and-extensions model, we will often use it, and in particular will use the term "extension" to refer to any module building on another, to highlight the fact it extends the other. We occasionally also use "host" to refer to modules not building on others. Also, most of our examples will use a direct host-and-extensions structure.
This view of extensibility is not new. Some examples of previous work using the host-and-extensions model:
Previous work has addressed frameworks for guaranteed composition of extensible languages. One aspect is parsing extensible languages, guaranteeing the descriptions of concrete syntax for extensible languages given in the context of independent modules will produce a non-ambiguous parser in any composition if they passed a modular check. Another aspect is well-definedness of semantic descriptions for extensible languages written in attribute grammars, guaranteeing the language semantics are fully-defined and not missing equations without the person building the composed language needing to write any glue code to fill in missing equations. Our work in general seeks to give guarantees for the correctness of the semantics of composed languages as another aspect of guaranteed composition.
In a basic formulation of language extensibility, a host language can introduce the following:
Extension modules can build on the definitions given by the modules on which they build by introducing the following:
A composed language is built by combining a set of modules. A
well-formed composition requires, for each module included, that
the modules on which it builds are also included. For example, if
module A
builds on modules B
and C
, a language composition
including A
must also include B
and C
. A composed language contains all the syntax
and semantics introduced by all modules included in it.
Notice that we have independent extension on two fronts, in both the syntax and semantics of our language. How, when one extension adds a new semantic relation, do we define it on the new syntax constructors introduced by another extension? This is part of Phil Wadler's famous expression problem. One option is to say the relation is not derivable on the new constructors. However, this would make the combination of new syntax and new semantic relations unusable in practice. New relations, such as a new static safety check, would never hold if a program included new syntax from a different extension, and thus we would need to choose between the safety check and the other new syntax.
The solution we use allows a new relation from one extension to be defined on new syntax from another extension. This solution relies on the notion of the primary component of a relation, given when the relation is declared. The primary component is the argument the relation is about. For example, a typing relation is about the expression being typed, with the typing context and type being secondary, only telling us something about the expression.
Each relation introduced by an extension where the primary
component is a syntactic category given by the base language is
given a special rule, called its default rule, to define
it on new syntax from other extensions. This rule is written with
a variable in the place of the primary component. For example, we
might have an optimize
relation reducing constant
computations in an expression, with the default rule stating
optimizing an unknown expression leaves it as it
is: optimize E E
(a safe, if uninspiring, choice).
When a composed language is built, combining several extensions,
the default rule is replicated for each new constructor from other
extensions, filling in the primary component with the new
constructor. For example, if other extensions introduced
constructors add3(expr, expr, expr)
and negate(expr)
, we would instantiate the default
rule for each of these
(using Sterling syntax):
------------------------------------------ optimize add3(E1, E2, E3) add3(E1, E2, E3) ---------------------------- optimize negate(E) negate(E)This gives the
optimize
relation a definition on any
syntax, even that not known to it when it was introduced.
Oftentimes a simple default rule, as we saw
with optimize
, is not sufficient for defining an
extension-introduced relation. Generally what we want is to use a
notion of projection of an unknown construct to a known one.
Each syntactic category given has a projection relation, and each
constructor an extension introduces building an imported syntactic
category must have a rule giving its projection. Then the
default rule for a relation can be written to find the projection
of the primary component and derive the relation on this
projection, oftentimes something like this (once again, in
Sterling syntax):
|{ty}- X ~~> XT rel A B XT C D --------------- [Rel-D]* rel A B X C DThis essentially copies the definition from the projection, and allows more interesting relations to be defined in extensions.
Defining new semantic relations in an extension relies on
understanding the semantics of existing constructs. For example,
writing the optimize
relation given above to reduce
constant computations such as 3 + 4
only works if we
know that 3 + 4
must always reduce to 7
.
If an extension could write a new rule for evaluation
allowing 3 + 4
to evaluate instead to another number,
or a different kind of value entirely, the optimization would no
longer accurately capture the language's semantics.
Similarly, proving properties in the context of extensible
languages relies on understanding the semantics of existing
constructs. In proving the optimize
relation given
above is correct, that the optimized expression evaluates to the
same value as the original expression, requires knowing constant
expressions such as 3 + 4
must reduce according to
the known rules. If an extension could write a new rule for
evaluation allowing 3 + 4
to evaluate instead to a
value other than 7
, we could not prove
the optimize
relation accurately captures the
language's semantics.
To ensure the semantics of the known language remain the same
under extension, new rules given in an extension defining a
relation it imports must have the primary component of the
conclusion be built at the top by a new constructor. For example,
the extension adding the add3
constructor could
define evaluation for it, but it could not write a rule defining
evaluation for the pre-existing addition.