Train an RBM

To train an RBM, use the rbms train script.

RBM hyperparameters

  • --num_hiddens Number of hidden nodes of the RBM. Setting it to \(20\) or fewer makes it possible to compute the exact log-likelihood of the model by enumerating all hidden configurations (see the note after this list).

  • --batch_size Batch size, defaults to \(2000\). The batch size controls the noise in the estimation of the positive term of the gradient. Setting it too low can lead to a very poor estimation and a bad training, while setting it too high makes the gradient nearly exact, losing the benefits of SGD (the training can remain trapped in a local minimum, for example).

  • --num_chains Number of parallel chains, defaults to \(2000\). Setting it much higher than the batch size does not provide benefits, since it only affects the estimation of the negative term of the gradient.

  • --gibbs_steps Number of Gibbs sampling steps performed at each gradient update: the \(k\) in PCD-\(k\).

  • --learning_rate Learning rate, defaults to \(0.01\). A larger learning rate often leads to instability.

  • --num_updates Number of gradient updates to perform. The training time is indexed on the number of gradient updates, not the number of epochs.

  • --beta The inverse temperature used during training. Defaults to \(1\) and should not be changed.

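Note on the exact log-likelihood with few hidden nodes: assuming a Bernoulli-Bernoulli RBM with visible units \(v \in \{0,1\}^N\), hidden units \(h \in \{0,1\}^M\) and energy \(E(v, h) = -a^\top v - b^\top h - v^\top W h\) (the parameter names here are illustrative), the visible units can be summed out analytically:

\[
Z = \sum_{h \in \{0,1\}^M} e^{b^\top h} \prod_{i=1}^{N} \left(1 + e^{a_i + (W h)_i}\right),
\]

so with --num_hiddens \(M \le 20\) the partition function reduces to a sum over at most \(2^{20} \approx 10^6\) hidden configurations, and the log-likelihood can be evaluated exactly.
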
Save options

  • --filename The path to the HDF5 archive in which the RBM is saved during training. Any previously existing file at this path will be overwritten.

  • --n_save The number of intermediate machines to save during training.

  • --spacing Can be exp or linear, defaults to exp. With exp, the time between two consecutive saves grows exponentially (it will look evenly spaced in log-scale); with linear, it stays constant. Saving many models can quickly become the computational bottleneck and lead to long execution times. (A small sketch of the two schemes is given after this list.)

  • --log Deprecated for now, so you can ignore it.

  • --acc_ptt Target acceptance rate, defaults to \(0.25\). A model is saved when the acceptance rate between two consecutive saved models, when sampling them using PTT, drops below this threshold.

  • --acc_ll Same as above, but defaults to \(0.75\). This allows two different saving schemes to coexist.

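To make the difference between the two spacings concrete, here is a minimal sketch (NumPy, using the hypothetical values n_save = 50 and num_updates = 10000; it illustrates the idea, not the package's exact implementation):

import numpy as np

# Hypothetical placement of 50 save points over 10000 gradient updates.
num_updates, n_save = 10000, 50

# --spacing linear: constant gap between two consecutive saves.
linear_times = np.linspace(1, num_updates, n_save).astype(int)

# --spacing exp: saves are dense at the beginning and sparse at the end,
# i.e. evenly spaced on a log scale.
exp_times = np.unique(np.logspace(0, np.log10(num_updates), n_save).astype(int))

print(linear_times[:5])  # evenly spaced save points
print(exp_times[:5])     # clustered at the start of training
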
PyTorch options

  • --device The device on which to run the computations. Follows the PyTorch semantics, so you can select a specific GPU with cuda:1 for example.

  • --dtype The dtype of all the tensors. Can be int, double or float. The default is float, which corresponds to torch.float32.

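For reference, a minimal sketch of what these two flags correspond to on the PyTorch side (the variable names are illustrative, not the package's code):

import torch

# --device follows the usual PyTorch device strings: "cpu", "cuda", "cuda:1", ...
device = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cpu")

# --dtype float corresponds to torch.float32; --dtype double to torch.float64.
dtype = torch.float32

x = torch.zeros(4, 4, device=device, dtype=dtype)
print(x.device, x.dtype)
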
Example

The command I typically use to train an RBM on MNIST-01 is:

rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
--filename output/rbm/MNIST01_from_scratch.h5  --num_updates 10000 \
--n_save 50 --spacing exp --num_hiddens 20 --batch_size 2000 --num_chains 2000 \
--learning_rate 0.01 --device cuda --dtype float
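
After training, the models stored in the archive can be inspected directly; here is a minimal sketch using h5py (it only lists the contents, since the exact layout of the archive depends on the rbms version):

import h5py

# Print every group and dataset stored by `rbms train` in the --filename archive.
with h5py.File("output/rbm/MNIST01_from_scratch.h5", "r") as f:
    f.visit(print)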

Restore the training

If you want to continue the training of an RBM (whether it was recovered from an RCM or previously trained), you can use the same script. The difference is that you should add the --restore flag. Also, some arguments are no longer useful and can be safely ignored:

  • --num_hiddens

  • --batch_size

  • --gibbs_steps

  • --learning_rate

  • --num_chains

Finally, the updates will be added to the same archive you provide as input through --filename. When the --restore flag is set, the file will not be overwritten.
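
For example, to continue the training started in the example above (a sketch; the exact set of flags to keep may depend on your setup):

rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
--filename output/rbm/MNIST01_from_scratch.h5 --restore \
--num_updates 20000 --device cuda --dtype float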