Solving Expression Problem with Visual Programming

author: Grzegorz Przybylski

Introduction

This thesis will tackle the Expression Problem. In short, Expression Problem occurs during the process of writing a program with some number of types of data, and some number of operations on all these types. When new types and new operations are added consecutively in arbitrary order, one must revisit existing definitions upon these additions. In object-oriented programming one must revisit all existing classes when adding a new method. In functional programming one must revisit all existing function when adding a data case.

In the first chapter we will review the history of the problem research and approaches to its solutions. Then we will define Expression Problem more formally, considering all the interpretations of the informal definition. We will separate a common understanding of the definition agreed upon by all previous papers and follow it by extensions – aspects considered by some works.

In the second chapter we will compare few notable solutions to the Expression Problem present in existing literature. For all of them we will follow a common example, prepared by the author of this work. For each solution a discussion will assess how it solves the extensions of the Expression Problem defined in the first chapter.

Finally, third chapter will present Visual Solution – a proposal of novel approach to the Expression Problem utilizing techniques of visual (non-textual) programming to solve the organizational aspects of the Expression Problem. The Visual Solution will be discussed in the same categories as previous solutions to provide fair comparison and enlist known limitations.

The main contributions of this work are:

Providing a common ground for comparison of existing solutions to the Expression Problem, including implementation in several different programming languages,
Proposing a new non-textual programming model for solving Expression Problem, along with a detailed description of the valid and invalid states of this model and its translation to a programming language for proving its expressiveness,
Design of a graphical user interface for an integrated development environment encompassing such a model, and description of usage of such interface.

Introduction to problem terminology

Function: an abstraction for a unit of code separated under a name, parametric with accordance to an arbitrary number of unknowns and reusable via a construct of function call. The Type of function parameter constrains a set of values that this parameter can take based on compile-time check.
Algorithm: a function that at the top level of indirection has a single implementation; generic in regard of its argument type but can in reality differ in implementation on the further levels of indirection via non-generic functions that it uses.
Method: a function, which has an arbitrary number of alternative implementations, of which one is selected at runtime based on the data type of one or more of its arguments.
Object Oriented Programming: a style of programming, where a definition of a data type is inseparable from definitions of functions on this data type and from declarations of partaking in an abstraction. Such a unit of data description, abstraction membership declarations and function definitions we call a Class.
Interface: an abstraction in Object Oriented Programming defined by a set of externally available methods.
Functional Programming: Functional Programming: a style of programming, where a data type is defined separately from functions operating on that data type. While in general functional programming is defined by more distinct qualities, they are not a concern of this work.
Compilation unit: a unit of executable code or a runtime linkable library that is a result of a single compilation operation, of code provided directly, and code linked statically by a reference to another available module of source code.
Compilation unit: a unit of executable code or a runtime linkable library that is a result of a single compilation operation, of code provided directly, and code linked statically by a reference to another available module of source code.
Extension of existing code: a change that regardless of compilation unit it is defined in, could be transferred to an external compilation unit with preservation of all semantics.
Modification of existing code: code change that is made in the compilation unit of the base version and cannot be transferred to an external compilation unit due to specifics of a language it is defined in. Remark: in equivalent code in two different programming languages the same change can be an extension in one and modification in the other.

Introduction to Expression Problem

History of expression problem research

Expression problem has been an object of some academic study since at least (Reynolds, 1978). The formulation of the problem changed together with new concepts being added to popular programming languages. In (Cook, 1991) two complementary methods of defining abstractions are described as “Procedural Data Abstractions” (PDA) and “Abstract Data Types” (ADT). Krishnamurthi et al. (1998) were the first to acknowledge two perspectives on the problem as functional and object oriented. They also pointed out that visitor pattern was a way of emulating functional approach in object oriented environment. This paper also introduces an example of expression problem with recursive data, which helps to identify unsoundness in potential solutions. It is also the first publicized text on the matter to describe it in terms of modern object-oriented programming languages, like Java. (Walder, 1998) was perhaps the first to use the name “Expression Problem” and present it on an example of expressions as they are understood by a (programming) language parser. Walder’s explanation for the name of the problem was that “Whether language can solve the Expression Problem is a salient indicator of its capacity for expression”. This formulation of Expression Problem was a provoker to further research from Rice University's Programming Languages Team (PLT). (Torgersen, 2004) introduced four new Expression Problem using generics.

(Tarr et al., 1999) discussed the problem wider than the Expression Problem: cross-cutting concerns in software development. They proposed a solution to be a decomposition of code with a composition rule according to a selected rule of organization.

(Zenger & Odersky, 2005) added a criterion of cross-unit composability to the definition of the problem, and introduced two solutions based on Scala’s mixin feature, which were extending existing solutions of object and functional decomposition to satisfy this new requirement.

Another line of research focused on the expression problem in a specific context of embedding programming languages in other languages, where a special subset of the expression problem occurs. The idea of embedding domain specific languages has been considered for a long time as well. It has been a centerpiece of the famous programming lecture and a book Structure and Interpretation of Computer Programs . A new approach to encoding languages, called “final tagless”, named after the title of the paper “Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages” by Carette et al., 2009 allowed to introduce new interpreters and extend the set of possible language terms independently. It also didn’t demand advanced language constructs, unlike the initial approach. Oliveira & Cook (2012) presented a similar approach realized in the context of an object oriented language.

Definition of expression problem

Basic definition

Definition

Suppose our program models an external domain and defines a set of data types describing a set of objects from the external domain. Those data types may differ by structure, but all conform to a common interface. We then want to define a set of operations that all should handle all aforementioned types, possibly in different ways, specific to individual types. Expression Problem seeks for a procedure of defining these data types and functions in such a way, that an addition of a new type does not modify any existing code and addition of a new operation also does not modify any existing code.

Example

The basic, and historically first example of Expression Problem, and source of the name is a scenario of incrementally changing requirements, where the implementer is tasked with creating a system for processing syntax trees for expressions in some hypothetical programming language. Initially the requirements only demand two types of expression – a number literal and binary plus literal and a single operation to conduct on them – expression evaluation. A typical solution in an object oriented style involves creating two classes: Literal and BinaryPlus, containing all the data necessary for each case, along with the Execute method for execution of the expression. Common sense, as well as the need for polymorphism, when it comes to the type of left and right subexpression of the binary plus will result in creation of the abstraction over these classes in the form of an abstract class, or an interface Expression, which includes the specification for the execute method for all the specific classes.


							abstract class Expression {
	public abstract int Evaluate();
}

class Literal(int value) : Expression {
	public int Value = value;
	public override int Evaluate() {
		return value;
	}
}
class AddOperator(Expression left, Expression right) : Expression {
	public Expression Left = left;
	public Expression Right = right;
	public override int Evaluate() {
		return Left.Evaluate() + Right.Evaluate();
	}
}

Listing 1.2.1: Implementation of the given specification in object-oriented approach (language: C#).

A solution in a functional style would see a definition of the Expression type as a sum type (also known as tagged union) of two cases: Literal and BinaryPlus. It would also define a single function execute, which takes an expression and returns its value. The function would have two separate implementations and one would be selected with a pattern matching technique. These pattern matches would be resolved at compile time on a call-by-call basis.

type Expression =
	| Literal of int
	| AddOperator of Expression * Expression

let rec evaluate = function
	| Literal value -> value
	| AddOperator(left, right) -> evaluate left + evaluate right

Listing 1.2.2: Implementation of the given specification in functional approach (language: F#).

After the required domain is modeled, specification is extended by one additional data type: a negation operator and one new operation: getting a string representation of the expression. In the object oriented approach adding the new data type requires no modification of existing code; only an addition of a new class. Adding a new operation however, requires changes to the abstraction on all the existing classes, as they now have to include the Stringify method.


							interface Expression {
	public int Evaluate();
	public string Stringify();
}

class Literal(int value) :Expression {
	public int Value = value;
	public int Evaluate() {
		return value;
	}
	public string Stringify() {
		return value.ToString();
	}
}

class AddOperator(Expression left,Expression right) : Expression {
	public Expression Left = left;
	public Expression Right = right;
	public int Evaluate() {
		return Left.Evaluate() + Right.Evaluate();
	}
	public string Stringify() {
		return $"({Left.Stringify()}) + ({Right.Stringify()})";
	}
}

class NegateOperator(Expressionoperand) : Expression {
	public Expression Operand = operand;
	public int Evaluate() {
		return -Operand.Evaluate();
	}
	public string Stringify() {
		return $"-({Operand.Stringify()})";
	}
}

Listing 1.2.3: Implementation of the extended specification in object-oriented approach (language: C#).

In the functional approach adding new function stringify requires no modification to the existing definitions. Adding the Negation data type requires modifying the definition of expression, as well as modification of all the existing functions, by addition of a pattern to match and an implementation.


							type Expression =
    | Literal of int
    | AddOperator of Expression * Expression
    | NegateOperator of Expression

let rec evaluate = function
    | Literal value -> value
    | AddOperator(left, right) -> evaluate left + evaluate right
    | NegateOperator(operand) -> -evaluate operand

let rec stringify = function
    | Literal value -> value.ToString()
    | AddOperator(left, right) -> $"({stringify left}) + ({stringify right})"
    | NegateOperator(operand) -> $"-({stringify operand})"

Listing 1.2.4: Implementation of the extended specification in functional approach (language: F#).

This example illustrates, how expression problem is not trivial, and why it has been a point of research for the past three decades. In the following part of the chapter, we will try to specify the exact meaning of this definition.

Extensions

Available literature mostly agrees on the definition quoted at the beginning of this chapter, but different sources interpret it in a variety of ways.

Syntactical code organization

Some sources focus on an organizational aspect of code – namely: how different definitions are being organized in syntactic structures (Tarr et al., 1999). In this interpretation our main point of concern is the fact that certain implementations are being bound syntactically into a single structure, and upon addition of new type/operation this structure needs to be extended, and therefore modified (from a syntactic standpoint). We take note of the phenomena of scattering and tangling. Scattering is a placement of several pieces of code, which touch on a single concern, in separate units of code. In our example scattering can be observed in two ways: in object-oriented solution methods that share a concern of the same operation are scattered across all the classes. In functional solution operations concerning one type of object is scattered between different functions. Tangling occurs when different units of code in a grouping touch on different concerns. In our example tangling is also visible in two ways. In object-oriented solution several methods are tangled in the same class, despite representing different operations. In functional solution implementations for different types are tangled in each function. This perspective, focused on code organization mostly applies to situations, where one compilation unit is being developed, and in the bound of that unit changes occur over time, extending the data and operations handled by the program.

External extensibility

A different branch of research is focused on the external extensibility (Szyperski, 1996). This perspective assumes an externally defined module with a set of data types and operations, which are being used in in-house developed program. In this line of reasoning the external module (dependency) and internal module (consumer) are separate compilation units. This brings a consequence of true inability of change of dependency code, as it is shipped in a compiled state with its source code potentially unavailable. Therefore, a solution requires a special structure of compilation targets, which allows for extension at link time (Pirkelbauer et al., 2007).

For an example, let us consider the same problem domain as before. Assume the external module consists of functionality from initial specification, i.e. two types of expression and a single operation defined on them. The implementation of this module is equal to that presented in Listing 1.2.1 and Listing 1.2.2, however now we will need to account for module related syntax (Listing 1.2.5 and Listing 1.2.6).


								namespace BasicExpressions;

public interface Expression
{
	public abstract int Evaluate();
}

public class Literal(int value) : Expression
{
	public int Value	{ get; set; } = value;
	public int Evaluate()	=> Value;
}

public class AddOperator<TExpression>(TExpression left, TExpression right) : Expression where TExpression : Expression
{
	public TExpression	Left	{ get; set; } = left;
	public TExpression	Right	{ get; set; } = right;
	public int	Evaluate()	=> Left.Evaluate() + Right.Evaluate();
}

Listing 1.2.5: Object-oriented implementation of the specification enclosed in a BasicExpressions module (language: C#).


								module BasicExpressions
type Expression =
    | Literal of int
    | AddOperator of Expression * Expression


let rec evaluate = function
    | Literal value -> value
    | AddOperator(left, right) -> evaluate left + evaluate right

Listing 1.2.6: Functional implementation of the specification enclosed in a BasicExpressions module (language: F#).

The dependency module may be distributed and used separately for different clients in a form of compilation unit. The problem is then understood as writing a consumer module, that uses definitions of data types and operations given by dependency, extends it with definitions of new data types and new operations and makes this extended functionality available for further consumers. Let us illustrate this with object-oriented and functional approach to this problem.


									namespace ExtendedExpressions;
using Basic = BasicExpressions;

public interface Expression : Basic.Expression
{
	public string Stringify();
}

public class Literal(int n) : Basic.Literal(n), Expression
{
	public string Stringify() => Value.ToString();
}

public class Addition(Expression left, Expression right) :
	Basic.Addition(left, right), Expression
{
	public string Stringify() => $"({Left.Stringify()}) + ({Right.Stringify()})";
}

public class Negation(Expression negated) : Expression
{
	public Expression Negated { get; set; } = negated;
	public int Evaluate() => -Negated.Evaluate();
	public string Stringify() => $"- ({Negated.Stringify()})";
}

Listing 1.2.7: Object-oriented extension of the functionality provided by the BasicExpressions module, as implemented in ExtendedExpressions (language: C#).

Naïve object-oriented approach (Listing 1.2.7) uses two strategies to extend the dependency, one for adding new data variants, and a second for adding new operations. For adding new data variants, it is once again enough to create a new class, which implements all the methods defined in dependency. For the addition of new methods, a new Expression interface is created, that extends the original Expression interface by all the newly introduced method declarations. New implementation for each existing data variant is then created, reusing all the old behaviour through inheritance, and adding new behaviour directly.


								module ExtendedExpressions
open BasicExpressions; module Basic = BasicExpressions

type Expression =
    | BasicExpression of Basic.Expression
    | Negation of (Expression)

let rec evaluate = function
    | BasicExpression b -> Basic.evaluate b
    | Negation e -> - evaluate e
let rec stringify = function
    | BasicExpression (Literal n) -> string n
    | BasicExpression (Addition (left,right)) -> 
$"({stringify (BasicExpression left)}) + ({stringify (BasicExpression right)})"
     | Negation e -> $"-({stringify e})"

Listing 1.2.8: Functional extension of the functionality provided by the BasicExpressions module, as implemented in ExtendedExpressions (language: F#).

Naïve functional approach (Listing 1.2.8) again, uses two strategies – one for extending data variants and the other for extending operations. If it is only required to add a new operation, it can be done by simply adding a new function. However, if at least one new data variant needs to be added it is necessary to create a new union type which enumerates the original Expression as one of its variants and enlists newly introduced variants separately. All functions, including those existing in the original module need to be redefined in terms of this new data type, delegating implementation to the original functions wherever possible.

Note, that both naïve approaches do not achieve their goals. In particular, the point of failure is wherever any of the data variants composes polymorphically of the base abstraction.

When we look at the redefinition of the Addition class, it inherits from original Addition, which composes two members of type BasicExpressions.Expression, but Addition would need them to be of type ExtendedExpressions.Expression to call the Strigify method recursively.

Similarly, in the functional approach a BasicExpression(Addition(left,right)) variant of Extended.Expression composes left and right as Basic.Expressions, so a data structure representing expression where negation is within addition is impossible to construct.

External composability

While considering an Expression Problem, with extension of external module, some references consider the influence of their extension on other existing clients of the dependency module and a possible diamond problem (see Figure 1.1 below) that can occur between modules (Zenger & Odersky, 2005). However, not all papers on the matter consider this interaction.

Figure 1.1: Illustration of a possible diamond problem. Module B defines new data while module C defines new operations. Module D is a unifying module, because it unifies definitions from unrelated modules. Depending on the solution, module D may be impossible to create, or objects constructed in module B may not be used with operations define in module C.

Polymorphism

Another aspect of Expression Problem which is not universally agreed on, is whether the data types should be unified by an abstraction (Carette et al., 2009; Zenger & Odersky, 2005). Adding this requirement usually is associated with enabling a certain form of polymorphism on all the data forms registered under that abstraction. If the requirement of abstraction is imposed, then the question arises about the form of this polymorphism. Static polymorphism allows operations to have polymorphic definitions, but with the restriction that all calls of these operations must have argument types resolvable statically by the compiler. Dynamic polymorphism allows a compilation of code that includes operation calls with argument types dependent on runtime, and narrowed only to the broadest abstraction that defines this operation.

In general, sources agree that this problem does not apply to dynamic and dynamically typed languages such as Python or JavaScript, which allow runtime modification of data and operation definitions (Carette et al., 2009; Haeri & Keir, 2019; Oliveira, 2009; Oliveira et al., 2013; Oliveira & Cook, 2012; Zenger & Odersky, 2005). In such languages extensibility is granted even for dynamically dispatched operations. However, these languages lack static guarantees that are granted by static type systems and as a result they allow for a whole class of runtime errors not present in statically typed languages.

Expression Problem in this paper

As this chapter shown, there are different aspects of the Expression Problem, which can be respected or omitted in different situations. Having this in mind, the next chapter will assess solutions proposed in literature separately for each criterion, instead of choosing a specific subset of them. Similarly, following chapter will assess my own solution according to all aspects of Expression Problem in separation.

Overview of proposed solutions

Continue from here