Regular Expressions
SMTLIB2 standard The theory of unicode strings and regular expressions
The sort constructor RegEx
takes as argument a sequence type.
The set of regular expressions over strings is thus (RegEx String)
;
it is synonymous with the sort RegLan
defined in the SMTLIB2 format.
Summary of Operations
Operation | Brief Description |
---|---|
(str.to.re s) | Convert string to regular expression accepting s |
(str.in.re s r) | Determine if s is in the language generated by r |
re.allchar | The regular expression accepting every string |
re.nostr | The regular expression rejecting every string |
(re.range ch1 ch2) | The range of characters (represented as strings) between ch1 and ch2 |
(re.++ r1 r2 r3) | Concatenation of regular expressions |
(re.* r) | Kleene star |
(re.+ r) | Kleene plus |
(re.opt r) | Zero or one use of r |
((_ re.loop lo hi) r) | from lo to hi number of repetitions of r |
(re.union r1 r2) | The union of regular languages |
(re.inter r1 r2) | The intersection of regular languages |
(seq.to.re s) | Convert sequence to regular expression accepting s |
(seq.in.re s r) | Determine if sequence s is in the language generated by r |
(as re.all R) | The regular expression of sort R accepting every sequence |
(as re.empty R) | The regular expression of sort R rejecting every sequence |
(re.of.prop p) | Sequences of length 1 where character satisfies predicate p . The sort of p is (Array C Bool) , where C is the character sort. |
(re.replace_re s r dst) | Currently not supported: replace left-most smallest occurrence matching r in s by dst |
(re.replace_re_all s r dst) | Currently not supported: replace, traversing left-to-right, smallest matches, all occurrences matching r in s by dst |
The re.range
operator expects two strings each encoding a single character.
For example (re.range "a" "\u{ff}")
is a valid range of characters,
while (re.range "aa" "")
is the empty language.
For compatibility with the SMTLIB2 format
Z3 also accepts expressions of the form (re.loop r lo hi)
.
Z3 understands only the meaning of these terms when lo, hi
are
integer numerals.
What (not) to expect of regular expressions
The default solver for regular expressions unfolds membership relations of regular expressions lazily.
It uses symbolic derivatives .
This approach works for many membership and non-membership constraints, but is not a complete
procedure when membership constraints are combined with constraints over strings.
Note that the syntax allows forming symbolic regular expressions that contain uninterpreted non-terminals.
It also does not handle regular expressions symbolic sequences (it allows
to express non-regular languages).
Thus, the string s
in (str.to.re s)
should be
a string literal. You can write formulas with equalities over
regular expressions. Z3 is a decision procedure for equalities and disequalities between non-symbolic regular expressions.
Examples
The maximal length is 6 for a string of length 2 repeated at most 3 times.