Finetune MatterSim#
Finetune Script#
MatterSim provides a finetune script to
finetune the pre-trained MatterSim model on a custom dataset.
You can find the script in the training
folder or in the
github link.
Finetune Parameters#
The finetune script accepts several command-line arguments to customize the training process. Below is a list of the available parameters:
run_name: (str) The name of the run. Default is “example”.
train_data_path: (str) Path to the training data file. Supports various file types readable by ASE (e.g., .xyz, .traj, .cif) and .pkl files. Default is “./sample.xyz”.
valid_data_path: (str) Path to the validation data file. Default is None.
load_model_path: (str) Path to load the pre-trained model. Default is “mattersim-v1.0.0-1m”.
save_path: (str) Path to save the trained model. Default is “./results”.
save_checkpoint: (bool) Whether to save checkpoints during training. Default is False.
ckpt_interval: (int) Interval (in epochs) to save checkpoints. Default is 10.
device: (str) Device to use for training, either “cuda” or “cpu”. Default is “cuda”.
cutoff: (float) Cutoff radius for interactions. Default is 5.0.
threebody_cutoff: (float) Cutoff radius for three-body interactions, should be smaller than the two-body cutoff. Default is 4.0.
epochs: (int) Number of training epochs. Default is 1000.
batch_size: (int) Batch size for training. Default is 16.
lr: (float) Learning rate for the optimizer. Default is 2e-4.
step_size: (int) Step size for the learning rate scheduler. Default is 10.
include_forces: (bool) Whether to include forces in the training. Default is True.
include_stresses: (bool) Whether to include stresses in the training. Default is False.
force_loss_ratio: (float) Ratio of force loss in the total loss. Default is 1.0.
stress_loss_ratio: (float) Ratio of stress loss in the total loss. Default is 0.1.
early_stop_patience: (int) Patience for early stopping. Default is 10.
seed: (int) Random seed for reproducibility. Default is 42.
re_normalize: (bool) Whether to re-normalize energy and forces according to new data. Default is False.
scale_key: (str) Key for scaling forces. Only used when
re_normalize
is True. Default is “per_species_forces_rms”.shift_key: (str) Key for shifting energy. Only used when
re_normalize
is True. Default is “per_species_energy_mean_linear_reg”.init_scale: (float) Initial scale value. Only used when
re_normalize
is True. Default is None.init_shift: (float) Initial shift value. Only used when
re_normalize
is True. Default is None.trainable_scale: (bool) Whether the scale is trainable. Only used when
re_normalize
is True. Default is False.trainable_shift: (bool) Whether the shift is trainable. Only used when
re_normalize
is True. Default is False.wandb: (bool) Whether to use Weights & Biases for logging. Default is False.
wandb_api_key: (str) API key for Weights & Biases. Default is None.
wandb_project: (str) Project name for Weights & Biases. Default is “wandb_test”.
These parameters allow you to customize the finetuning process to suit your specific dataset and computational resources.
Finetune Example#
You can replace the data path with your own data path.
torchrun --nproc_per_node=1 src/mattersim/training/finetune_mattersim.py \
--load_model_path mattersim-v1.0.0-1m \
--train_data_path xyz_files/train.xyz \
--valid_data_path xyz_files/valid.xyz \
--batch_size 16 \
--lr 2e-4 \
--step_size 20 \
--epochs 200 \
--save_path ./finetune_result \
--save_checkpoint \
--ckpt_interval 20 \
--include_stresses \
--include_forces