Train an RBM
To train an RBM, use the rbms train script.
RBM hyperparameters
--num_hiddens
Number of hidden nodes of the RBM. Setting it to \(20\) or less makes it possible to recover the exact log-likelihood of the model by enumerating all hidden configurations (see the enumeration sketch after this list).
--batch_size
Batch size, defaults to \(2000\). The batch size controls the noise in the estimation of the positive term of the gradient. Setting it too low can lead to a very poor estimation and a bad training, while setting it too high gives an almost exact gradient and loses the benefits of SGD (remaining trapped in a local minimum, for example).
--num_chains
Number of parallel chains, defaults to \(2000\). It only impacts the estimation of the negative term of the gradient, so setting it much higher than the batch size does not bring any benefit.
--gibbs_steps
Number of sampling steps performed at each gradient update: the \(k\) in PCD-\(k\) (see the PCD-\(k\) sketch after this list).
--learning_rate
Learning rate, defaults to \(0.01\). A larger learning rate often leads to instability.
--num_updates
Number of gradient updates to perform. The training time is indexed by the number of gradient updates performed, not the number of epochs.
--beta
The inverse temperature used during training. Defaults to \(1\) and should not be changed.
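As an illustration of the --num_hiddens remark, here is a minimal, self-contained sketch of how the exact log-likelihood can be computed by enumerating all hidden configurations. It is not the rbms implementation: it assumes a standard binary RBM with energy \(E(v,h) = -v^\top W h - \mathrm{vbias}\cdot v - \mathrm{hbias}\cdot h\), which may differ from the parametrization used by the library.

```python
import torch
import torch.nn.functional as F

def exact_log_likelihood(v_data, W, vbias, hbias, chunk=4096):
    """Exact mean log-likelihood of binary data under an RBM with the assumed
    energy E(v, h) = -v @ W @ h - vbias @ v - hbias @ h. Enumerates all
    2**num_hiddens hidden configurations, tractable for num_hiddens <= ~20."""
    num_hiddens = W.shape[1]
    codes = torch.arange(2 ** num_hiddens, device=W.device)
    bits = torch.arange(num_hiddens, device=W.device)
    log_terms = []
    # Enumerate the hidden configurations in chunks to keep memory bounded.
    for start in range(0, codes.numel(), chunk):
        h = ((codes[start:start + chunk, None] >> bits) & 1).to(W.dtype)
        # log sum_v exp(-E(v, h)) = hbias.h + sum_i softplus(vbias_i + (W h)_i)
        log_terms.append(h @ hbias + F.softplus(vbias + h @ W.T).sum(dim=1))
    log_Z = torch.logsumexp(torch.cat(log_terms), dim=0)
    # Free energy of the data, marginalizing over h analytically.
    free_energy = -(v_data @ vbias) - F.softplus(hbias + v_data @ W).sum(dim=1)
    return (-free_energy - log_Z).mean()
```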
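Similarly, the roles of --batch_size, --num_chains, --gibbs_steps and --learning_rate can be summarized by a schematic PCD-\(k\) update: the positive term of the gradient is averaged over the mini-batch, while the negative term is estimated from the persistent chains after \(k\) Gibbs steps. This is again a standalone sketch under the same assumed conventions, not the code the script actually runs.

```python
import torch

def pcd_update(v_batch, chains, W, vbias, hbias, gibbs_steps=10, learning_rate=0.01):
    """One schematic PCD-k gradient ascent step on the log-likelihood.
    v_batch: (batch_size, num_visibles) data mini-batch -> positive term.
    chains:  (num_chains, num_visibles) persistent chains -> negative term."""
    # Positive term: <v h> under the data, with h averaged analytically.
    p_h_data = torch.sigmoid(v_batch @ W + hbias)
    pos = v_batch.T @ p_h_data / v_batch.shape[0]
    # Negative term: k steps of block Gibbs sampling on the persistent chains.
    for _ in range(gibbs_steps):
        h = torch.bernoulli(torch.sigmoid(chains @ W + hbias))
        chains = torch.bernoulli(torch.sigmoid(h @ W.T + vbias))
    p_h_model = torch.sigmoid(chains @ W + hbias)
    neg = chains.T @ p_h_model / chains.shape[0]
    # Gradient ascent on the log-likelihood (positive minus negative term).
    W += learning_rate * (pos - neg)
    vbias += learning_rate * (v_batch.mean(0) - chains.mean(0))
    hbias += learning_rate * (p_h_data.mean(0) - p_h_model.mean(0))
    return chains  # the chains persist across updates (the "P" in PCD)
```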
Save options
--filename
The path to the hdf5 archive in which to save the RBM during training. It will overwrite any previously existing file.
--n_save
The number of machines to save during the training.
--spacing
Can be exp or linear, defaults to exp. When exp is selected, the time between two saved models increases exponentially (the checkpoints are evenly spaced in log-scale). When linear is selected, the time between two saved models is constant. Saving lots of models can quickly become the computational bottleneck, leading to long execution times. (A small illustration of the two schemes follows this list.)
--log
Deprecated for now, so you can ignore it.
--acc_ptt
Target acceptance rate, defaults to \(0.25\). Models are saved when the acceptance rate between two consecutive models, when sampling them using PTT, drops below this threshold.
--acc_ll
Same as --acc_ptt but defaults to \(0.75\). This allows two different schemes for saving models.
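To visualize the difference between the two spacing schemes, here is a small hypothetical illustration (not the library's internal scheduling code) of the update indices at which models would be saved for --num_updates 10000 and --n_save 50:

```python
import numpy as np

num_updates, n_save = 10_000, 50
# linear: constant interval between two saved models.
linear = np.linspace(1, num_updates, n_save, dtype=int)
# exp: interval grows exponentially, i.e. checkpoints are evenly spaced on a
# log axis (duplicates coming from integer rounding are dropped).
exponential = np.unique(np.logspace(0, np.log10(num_updates), n_save).astype(int))
print(linear[:5])        # one save every ~200 updates, throughout training
print(exponential[:5])   # dense early in training, sparse later
```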
PyTorch options
--device
The device on which to run the computations. It follows the PyTorch semantics, so you can select which GPU to use with ‘cuda:1’ for example.
--dtype
The dtype of all the tensors. Can be int, double or float. The default is float, which corresponds to torch.float32.
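For reference, this is how those strings map onto PyTorch objects (the integer dtype behind int is an assumption here):

```python
import torch

device = torch.device("cuda:1")   # second GPU; "cpu" or "cuda" also work
dtype = torch.float32             # the default, selected with --dtype float
# "double" is torch.float64; the exact dtype behind "int" is assumed, not documented.
```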
Example
The command I typically use to train an RBM on MNIST-01 is:
rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
--filename output/rbm/MNIST01_from_scratch.h5 --num_updates 10000 \
--n_save 50 --spacing exp --num_hiddens 20 --batch_size 2000 --num_chains 2000 \
--learning_rate 0.01 --device cuda --dtype float
Restore the training
If you want to continue the training of an RBM (be it one recovered from an RCM or a previously trained one), you can use the same script. The difference is that you should add the --restore flag. Also, some arguments are no longer useful and can be safely ignored:
--num_hiddens
--batch_size
--gibbs_steps
--learning_rate
--num_chains
Finally, the updates will be appended to the archive you provide as input through --filename. When the --restore flag is set, the file will not be overwritten.
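For instance, restarting the run from the example above could look like the following hypothetical invocation, which only reuses flags documented on this page:
rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
--filename output/rbm/MNIST01_from_scratch.h5 --restore \
--num_updates 10000 --n_save 50 --spacing exp --device cuda --dtype float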