Skip to main content

Datatypes

Algebraic datatypes, known from programming languages such as ML, offer a convenient way for specifying common data structures. Records and tuples are special cases of algebraic datatypes, and so are scalars (enumeration types). But algebraic datatypes are more general. They can be used to specify finite lists, trees and other recursive structures.

Records

A record is specified as a datatype with a single constructor and as many arguments as record elements. The number of arguments to a record are always the same. The type system does not allow to extend records and there is no record subtyping.

The following example illustrates that two records are equal only if all the arguments are equal. It introduces the parametric type Pair, with constructor mk-pair and two arguments that can be accessed using the selector functions first and second.

Record Updates

You can create variants of records by updating selected fields.

Scalars (enumeration types)

A scalar sort is a finite domain sort. The elements of the finite domain are enumerated as distinct constants. For example, the sort S is a scalar type with three values A, B and C. It is possible for three constants of sort S to be distinct, but not for four constants.

Recursive datatypes

A recursive datatype declaration includes itself directly (or indirectly) as a component. A standard example of a recursive data-type is the one of lists. A parametric list can be specified in Z3 as

The List recursive datatype is builtin in Z3. The empty list is nil, and the constructor insert is used to build new lists. The accessors head and tail are defined as usual.

In the example above, we also assert that l1 and l2 are not nil. This is because the interpretation of head and tail is under-specified on nil. So then head and tail would not be able to distinguish nil from (insert (head nil) (tail nil)).

Mutually recursive datatypes

You can also specify mutually recursive datatypes for Z3. We list one example below.

In the example above, we have a tree of Booleans and a tree of integers. The leaf constant must return a tree of a specific sort. To specify the result sort, we use the qualified identifier (as leaf (Tree Int)). Note that, we do not need to use a qualified identifier for value, since Z3 can infer the intended declaration using the sort of the argument.

Z3 will not prove inductive facts

The ground decision procedures for recursive datatypes don't lift to establishing inductive facts. Z3 does not contain methods for producing proofs by induction. This may change in the future. In particular, consider the following example where the function p is true on all natural numbers, which can be proved by induction over Nat. Z3 enters a matching loop as it attempts instantiating the universally quantified implication.

Nested datatypes with Arrays and Sequences

In some applications it is convenient to have a sequence of types that are recursively defined. For example an abstract syntax tree of a program is a sequence of basic statements, and a basic statement can be an assignment or an if-then-else statement, where the then and else branches are statements. Similarly, it may be convenient to use a nesting of algebraic data-types and arrays. Z3 supports nesting ADTs over sequences and over ranges of arrays.

Using Datatypes for solving type constraints

In the following we use algebraic datatypes to represent type constraints for simply typed lambda calculus. Terms and types over simply typed lambda calculus are of the form

  • M::=xMMλx . MM ::= x \mid M M \mid \lambda x \ . \ M
  • τ::=intstringττ\tau ::= int \mid string \mid \tau \rightarrow \tau

where xx is bound variable, MMM M' applies the function MM to argument MM', and λx . M\lambda x \ . \ M is a lambda abstraction. The intint and stringstring types are type constants.

A type judgemnt is

  • ΓM:τ\Gamma \vdash M : \tau

where Γ\Gamma is a type environment that provides types to free variables in MM. An expression MM has a type τ\tau if there is a derivation using the rules:

  • Γ,x:τx:τ\Gamma, x : \tau \vdash x : \tau
  • If ΓM:ττ\Gamma \vdash M : \tau \rightarrow \tau', ΓM:τ\Gamma \vdash M' : \tau then ΓMM:τ\Gamma \vdash M M' : \tau'.
  • If Γ,x:τM:τ\Gamma, x : \tau \vdash M : \tau' then Γλx:M:ττ\Gamma \vdash \lambda x : M : \tau \rightarrow \tau'.

We can use constraints over algebraic data-types to determine if expressions can be typed.

  • type(MM)=Ytype(M M') = Y, type(M)=XYtype(M) = X \rightarrow Y, type(M)=Xtype(M') = X, for fresh X,YX, Y.
  • type(λx.M)=type(x)type(M)type(\lambda x . M) = type(x) \rightarrow type(M)

Checking if terms have principal types

We define types and expressions as algebraic data-types. The types of applications produces three constraints for applications and the single constraint for lambda abstraction. The encoding into SMTLIB uses several features. Besides algebraic data-types it uses uninterpreted functions instead of introducing fresh variables. It defines a recursive function that extracts type constraints from an expression.

Using UNSAT cores to identify mutually inconsistent type constraints

We can track each sub-expression and use unsatisfiable cores to identify a set of mutually inconsistent type constraints. When the core is minimial, it means that modifying any one of the subterms from the corresponding violated constraints can fix the type error. This provides some indication of error location, but isn't great for diagnostics. In the next section we use MaxSAT for more targeted diagnostics.

Using optimization to localize type errors

By asserting each type checking condition as a soft constraint and seeking an optimized solution to satisfy as many type constraints as possible, we obtain targeted information of what sub-terms could not be type checked.

We can read off the type annotations for remaining sub-terms using the current model. Albeit, it is not a user-friendly format.