# Regular Expressions

SMTLIB2 standardThe theory of unicode strings and regular expressions

The sort constructor `RegEx`

takes as argument a sequence type.
The set of regular expressions over strings is thus `(RegEx String)`

;
it is synonymous with the sort `RegLan`

defined in the SMTLIB2 format.

# Summary of Operations

Operation | Brief Description |
---|---|

`(str.to.re s)` | Convert string to regular expression accepting `s` |

`(str.in.re s r)` | Determine if `s` is in the language generated by `r` |

`re.allchar` | The regular expression accepting every string |

`re.nostr` | The regular expression rejecting every string |

`(re.range ch1 ch2)` | The range of characters (represented as strings) between `ch1` and `ch2` |

`(re.++ r1 r2 r3)` | Concatenation of regular expressions |

`(re.* r)` | Kleene star |

`(re.+ r)` | Kleene plus |

`(re.opt r)` | Zero or one use of `r` |

`((_ re.loop lo hi) r)` | from `lo` to `hi` number of repetitions of `r` |

`(re.union r1 r2)` | The union of regular languages |

`(re.inter r1 r2)` | The intersection of regular languages |

`(seq.to.re s)` | Convert sequence to regular expression accepting `s` |

`(seq.in.re s r)` | Determine if sequence `s` is in the language generated by `r` |

`(as re.all R)` | The regular expression of sort `R` accepting every sequence |

`(as re.empty R)` | The regular expression of sort `R` rejecting every sequence |

`(re.of.prop p)` | Sequences of length 1 where character satisfies predicate `p` . The sort of `p` is `(Array C Bool)` , where `C` is the character sort. |

`(re.replace_re s r dst)` | Currently not supported: replace left-most smallest occurrence matching `r` in `s` by `dst` |

`(re.replace_re_all s r dst)` | Currently not supported: replace, traversing left-to-right, smallest matches, all occurrences matching `r` in `s` by `dst` |

The `re.range`

operator expects two strings each encoding a single character.
For example `(re.range "a" "\u{ff}")`

is a valid range of characters,
while `(re.range "aa" "")`

is the empty language.

For compatibility with the SMTLIB2 format
Z3 also accepts expressions of the form `(re.loop r lo hi)`

.
Z3 understands only the meaning of these terms when `lo, hi`

are
integer numerals.

# What (not) to expect of regular expressions

The default solver for regular expressions unfolds membership relations of regular expressions lazily.
It uses symbolic derivatives .
This approach works for many membership and non-membership constraints, but is not a complete
procedure when membership constraints are combined with constraints over strings.
Note that the syntax allows forming *symbolic* regular expressions that contain uninterpreted non-terminals.
It also does not handle regular expressions symbolic sequences (it allows
to express non-regular languages).
Thus, the string `s`

in `(str.to.re s)`

should be
a string literal. You can write formulas with equalities over
regular expressions. Z3 is a decision procedure for equalities and disequalities between non-symbolic regular expressions.

# Examples

The maximal length is 6 for a string of length 2 repeated at most 3 times.