Extensible Languages

In general, extensible programming languages are languages where portions of them are developed separately, then put together. The specific view of extensible languages we take is one in which the language's syntax and semantics are given by a set of independently-developed modules building on each other. A common form for this to take is one where one module is the host language and the other modules are extensions adding to it. A language is built by choosing some subset of these modules, creating a language with all the constructs and semantics of all included modules.

While we are not tied to the host-and-extensions model, we will often use it, and in particular will use the term "extension" to refer to any module building on another, to highlight the fact it extends the other. We occasionally also use "host" to refer to modules not building on others. Also, most of our examples will use a direct host-and-extensions structure.

This view of extensibility is not new. Some examples of previous work using the host-and-extensions model:

JastAdd is a Java compiler that permits adding extensions to Java.
SugarJ adds new syntax to Java by permitting libraries to export new syntax in addition to functions.
Silver is an attribute grammar system for writing extensible languages, and is an extensible language itself.
AbleC is an extensible version of the C programming language.

Our view is slightly different from that of the first two in that extensions may add both new syntax and semantics, with language composition also not requiring glue code; any composed language includes the semantics of all modules included in it, and these semantics are defined without the language composer needing to add any more definitions.

Previous work has addressed frameworks for guaranteed composition of extensible languages. One aspect is parsing extensible languages, guaranteeing the descriptions of concrete syntax for extensible languages given in the context of independent modules will produce a non-ambiguous parser in any composition if they passed a modular check. Another aspect is well-definedness of semantic descriptions for extensible languages written in attribute grammars, guaranteeing the language semantics are fully-defined and not missing equations without the person building the composed language needing to write any glue code to fill in missing equations. Our work in general seeks to give guarantees for the correctness of the semantics of composed languages as another aspect of guaranteed composition.

Basic Extensible Languages

In a basic formulation of language extensibility, a host language can introduce the following:

Syntactic categories
Constructors building those syntactic categories
Semantic relations
Rules defining those semantic relations

Extension modules can build on the definitions given by the modules on which they build by introducing the following:

New constructors of imported syntactic categories
New rules defining imported relations
New syntactic categories
Constructors building those new syntactic categories
New semantic relations
Rules defining those new semantic relations

A composed language is built by combining a set of modules. A well-formed composition requires, for each module included, that the modules on which it builds are also included. For example, if module A builds on modules B and C, a language composition including A must also include B and C. A composed language contains all the syntax and semantics introduced by all modules included in it.

Composing New Syntax and New Semantics

Notice that we have independent extension on two fronts, in both the syntax and semantics of our language. How, when one extension adds a new semantic relation, do we define it on the new syntax constructors introduced by another extension? This is part of Phil Wadler's famous expression problem. One option is to say the relation is not derivable on the new constructors. However, this would make the combination of new syntax and new semantic relations unusable in practice. New relations, such as a new static safety check, would never hold if a program included new syntax from a different extension, and thus we would need to choose between the safety check and the other new syntax.

The solution we use allows a new relation from one extension to be defined on new syntax from another extension. This solution relies on the notion of the primary component of a relation, given when the relation is declared. The primary component is the argument the relation is about. For example, a typing relation is about the expression being typed, with the typing context and type being secondary, only telling us something about the expression.

Each relation introduced by an extension where the primary component is a syntactic category given by the base language is given a special rule, called its default rule, to define it on new syntax from other extensions. This rule is written with a variable in the place of the primary component. For example, we might have an optimize relation reducing constant computations in an expression, with the default rule stating optimizing an unknown expression leaves it as it is: optimize E E (a safe, if uninspiring, choice). When a composed language is built, combining several extensions, the default rule is replicated for each new constructor from other extensions, filling in the primary component with the new constructor. For example, if other extensions introduced constructors add3(expr, expr, expr) and negate(expr), we would instantiate the default rule for each of these (using Sterling syntax):


------------------------------------------
optimize add3(E1, E2, E3) add3(E1, E2, E3)

----------------------------
optimize negate(E) negate(E)

This gives the optimize relation a definition on any syntax, even that not known to it when it was introduced.

Oftentimes a simple default rule, as we saw with optimize, is not sufficient for defining an extension-introduced relation. Generally what we want is to use a notion of projection of an unknown construct to a known one. Each syntactic category given has a projection relation, and each constructor an extension introduces building an imported syntactic category must have a rule giving its projection. Then the default rule for a relation can be written to find the projection of the primary component and derive the relation on this projection, oftentimes something like this (once again, in Sterling syntax):

|{ty}- X ~~> XT
rel A B XT C D
--------------- [Rel-D]*
rel A B X C D

This essentially copies the definition from the projection, and allows more interesting relations to be defined in extensions.

Limitations on Extending Relations

Defining new semantic relations in an extension relies on understanding the semantics of existing constructs. For example, writing the optimize relation given above to reduce constant computations such as 3 + 4 only works if we know that 3 + 4 must always reduce to 7. If an extension could write a new rule for evaluation allowing 3 + 4 to evaluate instead to another number, or a different kind of value entirely, the optimization would no longer accurately capture the language's semantics.

Similarly, proving properties in the context of extensible languages relies on understanding the semantics of existing constructs. In proving the optimize relation given above is correct, that the optimized expression evaluates to the same value as the original expression, requires knowing constant expressions such as 3 + 4 must reduce according to the known rules. If an extension could write a new rule for evaluation allowing 3 + 4 to evaluate instead to a value other than 7, we could not prove the optimize relation accurately captures the language's semantics.

To ensure the semantics of the known language remain the same under extension, new rules given in an extension defining a relation it imports must have the primary component of the conclusion be built at the top by a new constructor. For example, the extension adding the add3 constructor could define evaluation for it, but it could not write a rule defining evaluation for the pre-existing addition.