We have made a video demonstrating how to use GETMusic, available at this link.
Python environment:
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install tensorboard
pip install pyyaml
pip install tqdm
pip install transformers
pip install einops
pip install miditoolkit
pip install scipy
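After installing the packages, you can optionally run a quick sanity check to confirm that the CUDA build of PyTorch is active:

import torch

print(torch.__version__)          # expected: 1.12.1+cu113
print(torch.cuda.is_available())  # True if the CUDA 11.3 build can see a GPU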
We use the song “Childhood” by Tayu Lo for demonstration. It is located in the “example_data” folder.
To perform track generation, follow these steps:
python track_generation.py --load_path /path-of-checkpoint --file_path example_data/inference
Resume from /path-of-checkpoint
example_data/inference/childhood.mid
skip?n
Select condition tracks ('b' for bass, 'd' for drums, 'g' for guitar, 'l' for lead, 'p' for piano, 's' for strings, 'c' for chords; multiple choices; input any other key to skip): lc
Select content tracks ('l' for lead, 'b' for bass, 'd' for drums, 'g' for guitar, 'p' for piano, 's' for strings; multiple choices): dgp
In this example, we generate the drum, guitar, and piano tracks conditioned on the lead and chord tracks. We also truncate the song to 512 time units to avoid extrapolation.
sampling, the song has 512 time units
100%|████████| 100/100 [00:06<00:00, 16.52it/s]
The generation process is fast, and you can find the result saved as ‘example_data/inference/lc2dgp-childhood.mid’. You can open it with MuseScore for further composition.
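Since miditoolkit is part of the environment above, you can also take a quick programmatic look at the generated file without opening MuseScore. This is a minimal sketch; the file name follows the condition-to-content naming shown above:

import miditoolkit

# Load the generated MIDI (condition tracks 'lc' -> content tracks 'dgp')
midi = miditoolkit.midi.parser.MidiFile('example_data/inference/lc2dgp-childhood.mid')

for inst in midi.instruments:
    kind = 'drum' if inst.is_drum else f'program {inst.program}'
    print(f"{inst.name or 'unnamed'} ({kind}): {len(inst.notes)} notes")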
As shown on our demo page, we support hybrid generation that mixes track-wise composition and infilling. Specifying such composition needs may look complicated, but we have not found a simpler interface:
python position_generation.py --load_path /path-of-checkpoint --file_path example_data/inference
The script examines the tracks in the input MIDI and prints a visualization of the representation together with an input example, so that specifying the conditions is clear:
Resume from /path-of-checkpoint
example_data/inference/childhood.mid
skip?n
The music has {'lead'} tracks, with 865 positions
Representation Visualization:
0,1,2,3,4,5,6,7,8,...
(0)lead
(1)bass
(2)drum
(3)guitar
(4)piano
(5)string
(6)chord
Example: condition on 100 to 200 position of lead, 300 to 400 position of piano, write command like this:'0,100,200;4,300,400
Input positions you want to condition on:
Input positions you want to empty:
For example, you could answer:
Input positions you want to condition on: 0,0,200;6,0,
Input positions you want to empty: 1,0,;4,0,;5,0,
In this example, the condition consists of the first 200 positions of the lead track and the entire chord track, while the bass, piano, and string tracks are emptied in full. Leaving the end position blank, as in ‘6,0,’, means ‘until the end of the song’.
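To make the command format explicit, here is a small, hypothetical helper (not part of the released scripts) that parses such a command into (track, start, end) triples, treating an empty end field as "until the end of the song":

def parse_positions(command, song_length):
    # Illustrative only -- the released scripts do their own parsing.
    triples = []
    for part in command.strip().split(';'):
        if not part:
            continue
        track, start, end = part.split(',')
        triples.append((int(track), int(start), int(end) if end else song_length))
    return triples

# '0,0,200;6,0,' -> [(0, 0, 200), (6, 0, 865)] for a song with 865 positions
print(parse_positions('0,0,200;6,0,', song_length=865))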
All the examples above use chord guidance, where the chord track is automatically inferred from the input tracks. If you want to generate tracks from scratch but conditioned on chords, the simplest way is to input a song with the desired chord progression and let the model infer the chords. Unfortunately, we have not found a user-friendly way to specify a desired chord progression through interactive input, so this functionality is not exposed in the released code. However, you can modify the code if needed.
Here are some tips to enhance your experience with GETMusic:
About ‘bass’: if you generate a ‘bass’ track, the default instrument assigned by MuseScore is ‘低音提琴’ (Double Bass), which may not sound harmonious. Change it to ‘原音贝斯’ (Electric Bass / Bass Guitar).
Tune the volume: GETScore does not encode volume information, so you may need to tune the volume of each instrument to obtain a satisfactory mix. For example, our default volume for ‘string’ may be so loud that it covers the lead melody; turn it down if needed. A programmatic alternative is sketched below.
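As an alternative to adjusting volume in MuseScore, you could scale down note velocities with miditoolkit. This is only a sketch: the file path is a placeholder, and it assumes the string track is literally named ‘string’ in the generated MIDI:

import miditoolkit

midi = miditoolkit.midi.parser.MidiFile('example_data/inference/your-result.mid')  # placeholder path

for inst in midi.instruments:
    if inst.name and 'string' in inst.name.lower():
        for note in inst.notes:
            note.velocity = max(1, int(note.velocity * 0.6))  # make the strings quieter

midi.dump('example_data/inference/your-result-quieter.mid')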
Enable Chord Guidance: We recommend always enabling chord guidance when generating music to achieve a regular pattern in the generated music score.
Incremental generation: Our experience indicates that, when generating multiple tracks from scratch, generating them incrementally yields better results, both in the regularity of the music patterns and in overall quality. For example, you can conduct a two-stage generation such as the one sketched below.
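For instance (the split of tracks below is only an illustration; adapt it to your needs):
Stage 1: run python track_generation.py --load_path /path-of-checkpoint --file_path example_data/inference, choose ‘lc’ as the condition tracks and ‘bd’ as the content tracks, producing a MIDI that now contains lead, bass, and drums.
Stage 2: run the same command on the Stage-1 output, choose ‘lbd’ as the condition tracks and ‘gp’ as the content tracks, so that guitar and piano are generated on top of the Stage-1 result (chord guidance is still inferred automatically, as described above).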
Avoid Domain Gap: input songs whose style differs greatly from the training data may suffer from a domain gap and lead to less satisfactory results.
We do not release the training data or the data-cleansing scripts we used. However, we have included some MIDI files in the “example_data/train” folder to demonstrate data pre-processing:
python preprocess/to_oct.py example_data/train example_data/processed_train
The output will be:
SUCCESS: example_data/train/0_10230_TS0.mid
SUCCESS: example_data/train/0_10232_TS0.mid
SUCCESS: example_data/train/0_10239_TS0.mid
SUCCESS: example_data/train/0_01023_TS0.mid
4/4 (100.00%) MIDI files successfully processed
python preprocess/make_dict.py example_data/processed_train/ 3
The number 3 means that only tokens appearing more than 3 times are included in the vocabulary. The output will display the token details for each track. Use the last two rows of the output to modify the last two rows of ‘getmusic/utils/midi_config.py’ as follows:
tracks_start = [16, 144, 272, 408, 545, 745]
tracks_end = [143, 271, 407, 544, 744, 903]
python preprocess/binarize.py example_data/processed_train/pitch_dict.txt example_data/processed_train/oct.txt example_data/processed_train
The output will display the number of files in the validation and train sets:
# valid set: 10
100%|██████| 10/10 [00:00<00:00, 1227.77it/s]
valid set has 10 reps
| #train set: 9
100%|██████| 9/9 [00:00<00:00, 1317.12it/s]
train set has 9 reps
After executing these commands, you will find the following files in the “processed_train” folder:
|example_data
|----processed_train
|--------oct.txt
|--------pitch_dict.txt
|--------train_length.npy
|--------train.data
|--------train.idx
|--------valid_length.npy
|--------valid.data
|--------valid.idx
Modify other parameters, such as the scheduler, optimizer, and batch size, as per your requirements, and then start training:
python train.py
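If your training setup reads its hyperparameters from a YAML file, pyyaml (installed above) lets you inspect or tweak them programmatically. Note that the file name and keys below are hypothetical placeholders, not the actual GETMusic configuration schema; check the repository's training config for the real structure:

import yaml

with open('configs/train.yaml') as f:        # placeholder path
    cfg = yaml.safe_load(f)

cfg['dataloader']['batch_size'] = 4          # placeholder keys
cfg['solver']['max_epochs'] = 50

with open('configs/train.yaml', 'w') as f:
    yaml.safe_dump(cfg, f)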
We are grateful to the following authors who have made their code available: