Single-Step Models

Syntheseus currently supports 8 established single-step models.

For convenience, each model comes with a default checkpoint trained on USPTO-50K. If no checkpoint directory is provided when loading a model, syntheseus automatically downloads the default checkpoint and caches it on disk for future use. The default cache path is $HOME/.cache/torch/syntheseus, but it can be overridden by setting the SYNTHESEUS_CACHE_DIR environment variable. See the table below for links to the default checkpoints.
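
The cache lookup described above can be sketched as follows. This is a minimal illustration and not the code syntheseus itself uses: the helper name resolve_cache_dir is hypothetical, and only the environment variable name and the default path are taken from the text.

```python
import os
from pathlib import Path


def resolve_cache_dir() -> Path:
    """Hypothetical helper mirroring the documented lookup order.

    1. Use SYNTHESEUS_CACHE_DIR if it is set and non-empty.
    2. Otherwise fall back to $HOME/.cache/torch/syntheseus.
    """
    override = os.environ.get("SYNTHESEUS_CACHE_DIR")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "torch" / "syntheseus"


if __name__ == "__main__":
    # Shows where default checkpoints would be cached in the current environment.
    print(resolve_cache_dir())
```

To redirect the cache, for example to a larger disk, set SYNTHESEUS_CACHE_DIR in the environment before loading any model.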

Model checkpoint link	Source
Chemformer	finetuned by us starting from checkpoint released by authors
GLN	released by authors
Graph2Edits	released by authors
LocalRetro	trained by us
MEGAN	trained by us
MHNreact	trained by us
RetroKNN	trained by us
RootAligned	released by authors

More advanced datasets

The USPTO-50K dataset is well-established but relatively small. Advanced users may prefer to retrain their models of interest on a larger dataset, such as USPTO-FULL or Pistachio. To do that, please follow the instructions in the original model repositories.

In reaction_prediction/cli/eval.py a forward model can be used for computing back-translation (round-trip) accuracy. See here for a Chemformer checkpoint finetuned for forward prediction on USPTO-50K. As with the backward direction, pretrained weights released by the original authors were used as a starting point.
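
To make the idea of round-trip accuracy concrete, here is a small sketch of one common way to define it: a backward prediction counts as a hit if the forward model maps the proposed reactants back to the original product. This is an assumption for illustration only and is not the logic in eval.py. The function name, the callable-based model interfaces, and the toy lookup tables are all hypothetical, and the string comparison assumes SMILES have already been canonicalized.

```python
from typing import Callable, Sequence

# Hypothetical interfaces (not the syntheseus API):
# a backward model maps a product SMILES to ranked reactant candidates,
# a forward model maps reactant SMILES to a single predicted product.
BackwardModel = Callable[[str], Sequence[str]]
ForwardModel = Callable[[str], str]


def round_trip_accuracy(
    products: Sequence[str],
    backward: BackwardModel,
    forward: ForwardModel,
    top_k: int = 1,
) -> float:
    """Fraction of products recovered by the forward model from any of
    the top-k backward predictions. Assumes canonical SMILES strings."""
    if not products:
        return 0.0
    hits = 0
    for product in products:
        candidates = backward(product)[:top_k]
        if any(forward(reactants) == product for reactants in candidates):
            hits += 1
    return hits / len(products)


if __name__ == "__main__":
    # Toy lookup tables standing in for real models.
    backward_table = {"CCO": ["C=C.O", "CC=O"], "CC(=O)O": ["CCO"]}
    forward_table = {"C=C.O": "CCO", "CC=O": "CCO", "CCO": "CC=O"}
    accuracy = round_trip_accuracy(
        ["CCO", "CC(=O)O"],
        backward=lambda p: backward_table[p],
        forward=lambda r: forward_table[r],
    )
    print(accuracy)
```

Unlike exact-match accuracy against a reference, this metric credits a backward prediction whenever the forward model agrees it yields the target product, even if the proposed reactants differ from the ground truth.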

Licenses

All checkpoints were produced using external model repositories and may therefore be subject to the license each model was released under. For more details about a particular model, see the top of the corresponding model wrapper file in reaction_prediction/inference/.