Synthesis

Software synthesis offers an alternative way of developing trustworthy programs. At a high level, program specifications describe what the function should do, and its corresponding implementation describes how it does it. While verification ensures that both views match, synthesis consists of translating these specifications (or expectations) into executable programs that realise them.

As we have seen previously, relatively precise specifications of complex operations can be written concisely. Synthesis can thus reduce development time and allow the user to focus on high-level aspects.

Deductive Synthesis Framework

The synthesizer takes a synthesis problem that it extracts from a choose or a hole (???) found in the program. Formally, we define a synthesis problem as

\[[[ ~~ \bar{a} ~~ \langle ~~ \Pi \rhd \phi ~~ \rangle ~~ \bar{x} ~~ ]]\]

which carries the following information:

  • a set of input variables \(\bar{a}\), initially all variables in scope of the choose,
  • a set of output variables \(\bar{x}\), corresponding to the values to synthesize,
  • a path-condition \(\Pi\), constraining \(\bar{a}\),
  • the specification \(\phi\) relating input variables to output variables.

The job of the synthesizer is to convert this problem into a solution \([ ~ P ~ | ~ T ~ ]\), which consists of

  • an executable program term \(T\) (an expression that may contain input variables),
  • a precondition \(P\) under which this program is valid.

To illustrate, we consider the following program:

def foo(a: BigInt, b: BigInt): BigInt = {
  if (a > b) {
    choose ( (x: BigInt) => x > a)
  } else {
    0
  }
}

From this program, we will extract the following initial synthesis problem:

\[\begin{split}[[ ~~ a, b ~~ \langle ~~ a > b ~ \rhd ~ x > a ~~ \rangle ~~ x ~~ ]]\end{split}\]

A possible solution to this problem would be:

\[[ ~ \top ~ | ~ a + 1 ~ ]\]
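Read as ordinary Scala, this solution replaces the choose by the term a + 1 under a trivially true precondition. The sketch below spot-checks the specification x > a on sample inputs that satisfy the path condition a > b; the sample range is an arbitrary assumption, and a spot check is of course not a proof.

```scala
// foo with the choose replaced by the synthesized term a + 1.
// The precondition is true, so no require is needed.
def foo(a: BigInt, b: BigInt): BigInt =
  if (a > b) a + 1 else BigInt(0)

// Spot-check the specification x > a wherever the path condition a > b holds.
val samples = for (a <- -5 to 5; b <- -5 to 5) yield (BigInt(a), BigInt(b))
val specHolds = samples.forall { case (a, b) => !(a > b) || foo(a, b) > a }
```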

To solve this problem, the synthesizer will apply a series of rules which will try to either

  1. Decompose this problem into sub-problems, which may be simpler to solve
  2. Immediately find a solution

This corresponds to an and-or graph of rule applications, which Leon will explore.
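The and-or structure can be pictured with two node types: a problem (an OR node) is solved if any one of its rule applications succeeds, and a rule application (an AND node) succeeds if all of its sub-problems are solved. A closing rule is then simply an application with no sub-problems. The names Problem and RuleApp below are hypothetical, not Leon's internal classes, and this sketch only decides solvability of a finite graph rather than building solutions.

```scala
// OR node: a synthesis problem with alternative rule applications.
final case class Problem(name: String, applications: List[RuleApp])
// AND node: one rule application; closing rules have no sub-problems.
final case class RuleApp(rule: String, subProblems: List[Problem])

// A problem is solved if some application has all of its sub-problems solved.
def solvable(p: Problem): Boolean =
  p.applications.exists(app => app.subProblems.forall(solvable))

// Example: a split whose second branch is stuck, next to a closing rule.
val closed = Problem("x > a", List(RuleApp("closing", Nil)))
val stuck = Problem("stuck", Nil)
val root = Problem("root", List(RuleApp("split", List(closed, stuck)), RuleApp("closing", Nil)))
```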

Decomposing Rules

Leon defines several rules that decompose a synthesis problem (a choose) into sub-problems that may be simpler to solve. Each such rule also defines how to build a solution for the original problem from the solutions of all its sub-problems, so these rules both decompose problems and recompose solutions. The decomposing rules are the following:

Equality Split

Given two input variables a and b of compatible types, this rule considers the cases where a = b and a != b. From:

choose(res => spec(a, b, res))

this rule generates two sub-chooses, and combines them as follows:

if (a == b) {
  choose(res => spec(a, a, res))
} else {
  choose(res => spec(a, b, res))
}
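One way to picture the recomposition step: if each sub-choose has been solved as a precondition/term pair (the [P | T] form above), the combined solution branches on a == b in both parts. The sketch models solutions as Scala functions over the two inputs; Solution, recomposeEqualitySplit and the example specification res * (a - b) == 1 are all hypothetical illustrations, not Leon's API.

```scala
// A solution [P | T] over two BigInt inputs, modelled as functions.
final case class Solution(
    pre: (BigInt, BigInt) => Boolean,
    term: (BigInt, BigInt) => BigInt
)

// Combine the solutions of the a == b and a != b sub-problems.
def recomposeEqualitySplit(eq: Solution, neq: Solution): Solution =
  Solution(
    (a, b) => if (a == b) eq.pre(a, b) else neq.pre(a, b),
    (a, b) => if (a == b) eq.term(a, b) else neq.term(a, b)
  )

// Hypothetical spec: res * (a - b) == 1.
// Case a == b: res * 0 == 1 is unsatisfiable, so the precondition is false.
val eqSol = Solution((_, _) => false, (_, _) => BigInt(0))
// Case a != b: res = a - b works exactly when |a - b| == 1.
val neqSol = Solution((a, b) => (a - b).abs == BigInt(1), (a, b) => a - b)

val combined = recomposeEqualitySplit(eqSol, neqSol)
```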

Inequality Split

Given two input variables a and b of numeric type, this rule considers the cases where a < b, a == b, and a > b. From:

choose(res => spec(a, b, res))

this rule generates three sub-chooses, and combines them as follows:

if (a < b) {
  choose(res => spec(a, b, res))
} else if (a > b) {
  choose(res => spec(a, b, res))
} else {
  choose(res => spec(a, a, res))
}
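As an illustration, take an assumed specification stating that res is the maximum of a and b. After Inequality Split, each branch is simple enough for a closing rule, which could yield a program of the following shape:

```scala
// Hypothetical spec: res is an upper bound of a and b and equals one of them.
def spec(a: BigInt, b: BigInt, res: BigInt): Boolean =
  res >= a && res >= b && (res == a || res == b)

// Program shape after Inequality Split, with each sub-choose solved.
def maxOf(a: BigInt, b: BigInt): BigInt =
  if (a < b) b
  else if (a > b) a
  else a // here a == b, so spec(a, a, res) is solved by res = a

val checked = (for (a <- -3 to 3; b <- -3 to 3) yield spec(a, b, maxOf(a, b))).forall(identity)
```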

ADT Split

Given a variable a of an algebraic data type T, this rule decomposes the problem into cases, one for each subtype of T. For instance, given the following hierarchy and problem:

abstract class T
case class A(f1: Int) extends T
case class B(f2: Boolean) extends T
case object C extends T

choose(res => spec(a, res))

this rule generates three sub-chooses, in which the input variable a is substituted by the appropriate case, and combines them as follows:

a match {
  case A(f1) => choose(res => spec(A(f1), res))
  case B(f2) => choose(res => spec(B(f2), res))
  case C     => choose(res => spec(C, res))
}
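Assuming, purely for illustration, a specification stating that res holds exactly when a is an A with a positive field or is B(true), each sub-choose only has to handle one shape of a and can refer to its fields directly. The spec and solve functions below are hypothetical:

```scala
abstract class T
case class A(f1: Int) extends T
case class B(f2: Boolean) extends T
case object C extends T

// Hypothetical spec, written without pattern matching.
def spec(a: T, res: Boolean): Boolean =
  res == ((a.isInstanceOf[A] && a.asInstanceOf[A].f1 > 0) || a == B(true))

// Program after ADT Split, with each sub-choose solved separately.
def solve(a: T): Boolean = a match {
  case A(f1) => f1 > 0 // solves spec(A(f1), res)
  case B(f2) => f2     // solves spec(B(f2), res)
  case C     => false  // solves spec(C, res)
}

val samples: List[T] = List(A(-1), A(0), A(5), B(true), B(false), C)
```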

Int Induction

Given an Int (or BigInt) variable a, this rule performs induction on a:

choose(res => spec(a, res))

this rule generates three sub-chooses, one for the base case and one for each inductive case (we allow negative numbers):

def tmp1(a: Int) = {
  if (a == 0) {
    choose(res => spec(a, res))
  } else if (a > 0) {
    val r1 = tmp1(a-1) // result for a-1, available to the sub-choose
    choose(res => spec(a, res))
  } else { // a < 0
    val r1 = tmp1(a+1) // result for a+1, available to the sub-choose
    choose(res => spec(a, res))
  }
}

tmp1(a)

This allows Leon to synthesize a well-structured recursive function.
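For example, with the assumed specification res == 2 * a, each sub-choose of tmp1 can use r1: the base case returns 0, and the two inductive cases adjust r1 by 2 in the appropriate direction. The resulting recursive function could look like this:

```scala
// Hypothetical spec: res == 2 * a. Each branch mirrors one sub-choose of tmp1.
def tmp1(a: Int): Int =
  if (a == 0) 0 // base case
  else if (a > 0) {
    val r1 = tmp1(a - 1) // r1 == 2 * (a - 1) by induction
    r1 + 2
  } else {
    val r1 = tmp1(a + 1) // r1 == 2 * (a + 1) by induction
    r1 - 2
  }
```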

One Point

This syntactic rule considers equalities of an output variable at the top level of the specification, and substitutes the variable with the corresponding expression in the rest of the formula. Given the following specification:

\[res1 = expr \land \phi\]

and assuming \(expr\) does not use \(res1\), we generate the alternative and arguably simpler specification:

\[\phi[res1 \rightarrow expr]\]
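As a small worked instance, with an assumed specification over the input \(a\) and the outputs \(res1, res2\):

\[res1 = a + 1 \land res2 > res1 \quad\leadsto\quad (res2 > res1)[res1 \rightarrow a + 1] \;=\; res2 > a + 1\]

Any solution of the simpler problem, for instance \(res2 = a + 2\), is then completed by setting \(res1 = a + 1\).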

Assert

The Assert rule scans the specification for predicates that only constrain input variables and lifts them out of the specification. Since these are constraints over the input variables, they typically represent the precondition necessary for the choose to be feasible. Given an input variable a:

choose(res => spec(a, res) && pred(a))

will become:

require(pred(a))

choose(res => spec(a, res))
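For instance, with the assumed integer-division specification q * b <= a && a < (q + 1) * b && a >= 0 && b > 0, Assert lifts a >= 0 && b > 0, which mentions only inputs, into a require. If the remaining choose is then solved by a / b, the result could look like this hypothetical div function:

```scala
// Hypothetical result of Assert: the input-only conjuncts became a require.
def div(a: BigInt, b: BigInt): BigInt = {
  require(a >= 0 && b > 0)
  a / b // one solution of choose(q => q * b <= a && a < (q + 1) * b)
}
```

Calling div outside its precondition fails at the require rather than producing a wrong result.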

Case Split

This rule considers a top-level disjunction and decomposes it:

choose(res => spec1(a, res) || spec2(a, res))

thus becomes two sub-chooses:

if (P) {
  choose(res => spec1(a, res))
} else {
  choose(res => spec2(a, res))
}

Here we note that P is not known until the first choose is solved, as it corresponds to its precondition.
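As an assumed example, consider absolute value specified as a disjunction: (a >= 0 && res == a) || (a < 0 && res == -a). Solving the first sub-choose (for instance by lifting a >= 0 with the Assert rule) yields the precondition P = a >= 0 and the term a, so the recomposed program could be:

```scala
// P (a >= 0) comes from the precondition of the first sub-choose's solution.
def absVal(a: BigInt): BigInt =
  if (a >= 0) a // solves a >= 0 && res == a
  else -a       // solves a < 0 && res == -a
```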

Equivalent Input

This rule discovers equivalences in the input variables in order to eliminate redundancies. We consider two kinds of equivalences:

1) Simple equivalences: the specification contains \(a = b\) at the top level.

2) ADT equivalences: the specification contains \(l.isInstanceOf[Cons] \land h = l.head \land t = l.tail\), which entails \(l = Cons(h, t)\) and thus allows us to substitute \(l\) by \(Cons(h, t)\).

Eliminating equivalences prevents explosion of redundant rule instantiations. For instance, if you have four integer variables where three of them are equivalent, Leon has 6 ways of applying Inequality Split. After eliminating equivalences, only one application remains possible.
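The counts in this example are the numbers of unordered pairs of distinct variables: C(4, 2) = 6 before, and C(2, 2) = 1 once the three equivalent variables are merged into one representative. A small sketch of that arithmetic, where the representative map is a hypothetical stand-in for the discovered equivalences:

```scala
// Each unordered pair of distinct variables is one possible Inequality Split.
def inequalitySplits(vars: Seq[String]): Int = vars.combinations(2).size

// Replace every variable by the representative of its equivalence class.
def eliminateEquivalences(vars: Seq[String], rep: Map[String, String]): Seq[String] =
  vars.map(v => rep.getOrElse(v, v)).distinct

val vars = Seq("a", "b", "c", "d")
val rep = Map("b" -> "a", "c" -> "a") // b == a and c == a appear in the spec
```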

Unused Input

This rule tracks input variables (variables originally in scope of the choose) that are not constrained by the specification or the path-condition. These input variables carry no information and are thus basically useless. The rule consequently eliminates them from the set of input variables with which rules may be instantiated.

Unconstrained Output

This rule is the dual of Unused Input: it tracks output variables (result values) that are not constrained. Such variables can be trivially synthesized by any value or expression of the right type. For instance:

choose ((x: Int, y: T) => spec(y))

becomes

(0, choose ((y: T) => spec(y)))

Leon will use the simplest value of the given type, when available. Note that this rule is not able to synthesize variables of generic types, as no literal values exist for these. While null may be appropriate in Scala, Leon does not define it.
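The choice of a default value per type can be pictured as a type class with instances only for types that have literals. SimplestValue and its instances are hypothetical; the point is that no instance exists for an unconstrained type parameter, which mirrors why the rule cannot handle generic types.

```scala
// Hypothetical type class: the simplest literal of a type, when one exists.
trait SimplestValue[T] { def value: T }

implicit val intSimplest: SimplestValue[Int] = new SimplestValue[Int] { def value: Int = 0 }
implicit val bigIntSimplest: SimplestValue[BigInt] = new SimplestValue[BigInt] { def value: BigInt = BigInt(0) }
implicit val booleanSimplest: SimplestValue[Boolean] = new SimplestValue[Boolean] { def value: Boolean = false }
implicit def listSimplest[A]: SimplestValue[List[A]] = new SimplestValue[List[A]] { def value: List[A] = Nil }

def simplest[T](implicit s: SimplestValue[T]): T = s.value

// choose((x: Int, y: T) => spec(y)) becomes (simplest[Int], choose((y: T) => spec(y))).
// For a bare type parameter U there is no instance, so simplest[U] does not compile.
```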

Closing Rules

While decomposing rules split problems in sub-problems, Leon also defines rules that are able to directly solve certain synthesis problems. These rules are crucial for the synthesis search to terminate efficiently. We define several closing rules that apply in different scenarios:

Ground

This rule applies when the synthesis problem has no input variables. If the specification is satisfiable, its model corresponds to a valid solution. We rely on SMT solvers to check satisfiability of the formulas. For instance:

choose ((x: Int, y: Int) => x > 42 && y > x)

can trivially be synthesized by (1000, 1001).

If the specification turns out to be UNSAT, the synthesis problem is impossible and we synthesize it as an error with a false precondition.
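Leon relies on an SMT solver here. The sketch below swaps that for a brute-force search over a small bounded range, which is only a stand-in: it can return a model, but failing to find one inside the range does not prove the specification UNSAT. The ground helper and the range are assumptions for illustration.

```scala
// Search a bounded domain for a model of a two-output specification.
// None means "no model in this range", not a proof of unsatisfiability.
def ground(spec: (Int, Int) => Boolean, domain: Range): Option[(Int, Int)] =
  domain.iterator
    .flatMap(x => domain.iterator.map(y => (x, y)))
    .find { case (x, y) => spec(x, y) }
```

With the range 0 to 100, the example specification x > 42 && y > x yields the model (43, 44), an equally valid alternative to (1000, 1001).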

Optimistic Ground

This rule acts like Ground, but without the requirement that the problem have no input variables. The reasoning is that even though the problem has input variables, the solution might still be a constant expression.

Optimistic Ground also tries to satisfy the specification, but it must then validate the resulting model. That is, given a valuation of the output variables, it checks whether there exists a valuation of the input variables for which the specification is violated. If such a counter-example is found, the model is discarded. If no counter-example exists, we solve the synthesis problem with the corresponding values.

The rule tries at most three times to discover a valid value.
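A bounded-enumeration sketch of this loop for one input a and one output x, again standing in for the SMT queries (the function name and ranges are assumptions): find a candidate output that works for some input, search for an input that refutes it, and give up after three attempts.

```scala
// spec(a, x): input a, output x. Bounded ranges stand in for the SMT queries.
def optimisticGround(
    spec: (Int, Int) => Boolean,
    inputs: Seq[Int],
    outputs: Seq[Int],
    triesLeft: Int = 3,
    excluded: Set[Int] = Set.empty
): Option[Int] =
  if (triesLeft == 0) None
  else {
    // Find a model (a, x) of the spec, skipping outputs already refuted.
    val candidate =
      (for (a <- inputs; x <- outputs if !excluded(x) && spec(a, x)) yield x).headOption
    candidate match {
      case None => None
      // No input violates the spec for x: x is a valid constant solution.
      case Some(x) if inputs.forall(a => spec(a, x)) => Some(x)
      // Counter-example found: discard x and try again.
      case Some(x) => optimisticGround(spec, inputs, outputs, triesLeft - 1, excluded + x)
    }
  }
```

A specification such as x * a == 0 admits the constant 0 and is solved on the first try, whereas x > a keeps producing refutable candidates and the search gives up after three attempts.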

CEGIS

CEGIS stands for Counter-Example-Guided Inductive Synthesis. It explores the space of small expressions to find valid solutions. Here we represent the space of programs by a tree whose branches are selected by free boolean variables. For instance, a tree for small integer operations could be:

def res(b, a1, a2) =      if (b1) 0
                     else if (b2) 1
                     else if (b3) a1
                     else if (b4) a2
                     else if (b5) c1(b, a1, a2) + c2(b, a1, a2)
                     else         c1(b, a1, a2) * c2(b, a1, a2)

def c1(b, a1, a2)  =      if (b7) 0
                     else if (b8) 1
                     else if (b9) a1
                     else         a2

def c2(b, a1, a2)  =      if (b10) 0
                     else if (b11) 1
                     else if (b12) a1
                     else          a2

At a high-level, it consists of the following loop:

  1. Find one expression and inputs that satisfy the specification: \(\exists \bar{b}, a1, a2. spec(a1, a2, res(\bar{b}, a1, a2))\). If this fails, we know that the solution is not in the search space. If this succeeds, we:
  2. Validate the expression represented by the model \(M_{\bar{b}}\) for all inputs by searching for a counter-example: \(\exists a1, a2. \lnot spec(a1, a2, res(M_{\bar{b}}, a1, a2))\). If such a counter-example exists, start over at (1) with this program excluded. If no counter-example exists, we have found a valid expression.

The space of expressions our CEGIS rule considers consists of small expressions of bounded depth (3). For each type, it contains a few literals, functions and operations returning that type that do not transitively call the function under synthesis (to prevent infinite loops), and recursive calls where one argument is decreasing.
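The loop can be sketched by enumerating the tree above explicitly instead of encoding it with boolean variables, and by replacing both SMT queries with search over a small bounded input domain. This sketch uses the common CEGIS variant that accumulates counter-examples and requires each new candidate to satisfy all of them, which in particular excludes every refuted program. Expr, cegis and the domain are assumptions for illustration.

```scala
import scala.annotation.tailrec

// The expression space from the tree above: leaves, or one + or * of two leaves.
sealed trait Expr {
  def eval(a1: Int, a2: Int): Int = this match {
    case Lit(v)    => v
    case A1        => a1
    case A2        => a2
    case Add(l, r) => l.eval(a1, a2) + r.eval(a1, a2)
    case Mul(l, r) => l.eval(a1, a2) * r.eval(a1, a2)
  }
}
case class Lit(v: Int) extends Expr
case object A1 extends Expr
case object A2 extends Expr
case class Add(l: Expr, r: Expr) extends Expr
case class Mul(l: Expr, r: Expr) extends Expr

val leaves: List[Expr] = List(Lit(0), Lit(1), A1, A2)
val space: List[Expr] =
  leaves ++ (for (l <- leaves; r <- leaves) yield Add(l, r)) ++
    (for (l <- leaves; r <- leaves) yield Mul(l, r))

def cegis(spec: (Int, Int, Int) => Boolean, domain: Seq[Int]): Option[Expr] = {
  val inputs = for (x <- domain; y <- domain) yield (x, y)
  @tailrec
  def loop(cexs: List[(Int, Int)]): Option[Expr] =
    // Step 1: a candidate consistent with every counter-example seen so far.
    space.find(e => cexs.forall { case (x, y) => spec(x, y, e.eval(x, y)) }) match {
      case None => None // the solution is not in this search space
      case Some(e) =>
        // Step 2: look for an input on which the candidate violates the spec.
        inputs.find { case (x, y) => !spec(x, y, e.eval(x, y)) } match {
          case None      => Some(e)
          case Some(cex) => loop(cex :: cexs)
        }
    }
  loop(Nil)
}
```

For the specification res == a1 + a2 over the domain -2 to 2, this search refutes 0 and then a1 + a1 before settling on a1 + a2; a specification such as res == a1 - a2 is not expressible in this space and yields None.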

TEGIS

This rule uses the same search space as CEGIS but relies only on tests (specified by the user or generated) to validate expressions. It is thus a generally faster way of discovering candidate expressions. However, these expressions are not guaranteed to be valid since they have only been validated by tests. Leon’s synthesis framework supports untrusted solutions which trigger an end-to-end validation step that relies on verification.
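To see why such solutions remain untrusted, the sketch below checks candidates from the same small space against tests only. With the assumed specification res == a1 * a2 and the single test input (2, 2), the candidate a1 + a1 is accepted first, although it differs from a1 * a2 elsewhere. Cand, tegis and the test set are hypothetical.

```scala
// A candidate expression over (a1, a2) together with a printable form.
final case class Cand(show: String, f: (Int, Int) => Int)

val leaves = List(
  Cand("0", (_, _) => 0), Cand("1", (_, _) => 1),
  Cand("a1", (a1, _) => a1), Cand("a2", (_, a2) => a2))
val space = leaves ++
  (for (l <- leaves; r <- leaves) yield Cand(s"${l.show} + ${r.show}", (x, y) => l.f(x, y) + r.f(x, y))) ++
  (for (l <- leaves; r <- leaves) yield Cand(s"${l.show} * ${r.show}", (x, y) => l.f(x, y) * r.f(x, y)))

// TEGIS: accept the first candidate that satisfies the spec on the tests only.
def tegis(spec: (Int, Int, Int) => Boolean, tests: Seq[(Int, Int)]): Option[Cand] =
  space.find(c => tests.forall { case (x, y) => spec(x, y, c.f(x, y)) })

val spec = (a1: Int, a2: Int, r: Int) => r == a1 * a2
val found = tegis(spec, Seq((2, 2)))
```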

String Conversion

This rule applies to pretty-printing problems given a non-empty list of examples, i.e. of the type:

choose ((x: String) =>
  (input, x) passes {
     case InputExample1 => "Output Example1"
     case InputExample2 => "Output Example2"
  }
)

It will create a set of functions and an expression that are consistent with the examples. The examples need to satisfy the following properties:

  • Primitive display: All primitive values (Int, Boolean, BigInt) present in the InputExample must appear in the OutputExample. The exception is Boolean, which can also be rendered differently (e.g. as “yes” and “no”).
  • Linearization: The output example must use the same order as the definition of InputExample; that is, no Set, Map or out-of-order rendering (e.g. case (1, 2) => "2, 1" will not work).

To further optimize the search, it is also better to ensure the following property:

  • Constant case class display: By default, if a hierarchy of case classes only contains parameterless variants, such as
abstract class StackThread
case class T1() extends StackThread
case class T2() extends StackThread

it will first try to render expressions with the following function:

def StackThreadToString(t: StackThread) = t match {
  case T1() => "T1"
  case T2() => "T2"
}
CONST + StackThreadToString(t) + CONST

where CONST will be inferred from the examples.
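How the two CONST strings might be read off the examples can be sketched with a hypothetical helper, inferConsts: locate the rendered constructor name in the first output, split around it, and check that the same prefix and suffix explain every other example. This is only a simplified illustration (it tries the first occurrence only), not Leon's actual inference.

```scala
abstract class StackThread
case class T1() extends StackThread
case class T2() extends StackThread

def StackThreadToString(t: StackThread): String = t match {
  case T1() => "T1"
  case T2() => "T2"
}

// Hypothetical helper: infer (prefix, suffix) such that
// output == prefix + StackThreadToString(input) + suffix for every example.
def inferConsts(examples: List[(StackThread, String)]): Option[(String, String)] =
  examples match {
    case Nil => None
    case (t, out) :: _ =>
      val name = StackThreadToString(t)
      val i = out.indexOf(name)
      if (i < 0) None
      else {
        val (pre, suf) = (out.take(i), out.drop(i + name.length))
        val consistent = examples.forall { case (t2, o2) => o2 == pre + StackThreadToString(t2) + suf }
        if (consistent) Some((pre, suf)) else None
      }
  }
```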