Train an RBM
To train an RBM, use the rbms train script.
RBM hyperparameters
--num_hiddens: Number of hidden nodes for the RBM. Setting it to \(20\) or less makes it possible to recover the exact log-likelihood of the model by enumerating over all hidden configurations.
--batch_size: Batch size, defaults to \(2000\). Changing the batch size affects the noise in the estimation of the positive term of the gradient. Setting it too low can lead to a very poor estimation and bad training, but setting it too high can lead to an exact gradient, losing the benefits of SGD (remaining trapped in a local minimum, for example).
--num_chains: Number of parallel chains, defaults to \(2000\). Setting it to a much higher value than the batch size does not provide any benefit, since it only affects the estimation of the negative term of the gradient.
--gibbs_steps: Number of sampling steps performed at each gradient update. The \(k\) in PCD-\(k\).
--learning_rate: Learning rate. Defaults to \(0.01\); a larger learning rate often leads to instability.
--num_updates: Number of gradient updates to perform. The training time is indexed by the number of gradient updates performed, not the number of epochs.
--beta: The inverse temperature to use during training. Defaults to \(1\) and should not be changed.
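As a sketch of how these hyperparameters fit together, a PCD-\(20\) run with a smaller learning rate could look like the command below. The output filename is only a placeholder, and the values are illustrative rather than recommendations.
# illustrative PCD-20 run; dataset path and output name are placeholders
rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
--filename output/rbm/MNIST01_pcd20.h5 --num_updates 10000 \
--num_hiddens 20 --batch_size 2000 --num_chains 2000 \
--gibbs_steps 20 --learning_rate 0.005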
Save options
--filename: The path to the HDF5 archive in which to save the RBM during training. Any previously existing file will be overwritten.
--n_save: The number of machines to save during training.
--spacing: Can be exp or linear, defaults to exp. When exp is selected, the time between two saved models increases exponentially (it looks good in log scale). When linear is selected, the time between two saved models is constant. Saving many models can quickly become the computational bottleneck, leading to long execution times.
--log: For now it is deprecated, so you don't need to care about it.
--acc_ptt: Target acceptance rate. Defaults to \(0.25\). A model is saved when the acceptance rate between two consecutive saved models, when sampling them using PTT, drops below this threshold.
--acc_ll: Same as above but defaults to \(0.75\). This allows having two different schemes when saving models.
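For example, to save \(100\) models at linearly spaced updates instead of the default exponential spacing, a sketch could be the following (the output name and threshold values are illustrative):
# illustrative run saving 100 linearly spaced models with custom acceptance thresholds
rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
--filename output/rbm/MNIST01_linear_saves.h5 --num_updates 10000 \
--n_save 100 --spacing linear --acc_ptt 0.3 --acc_ll 0.7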
PyTorch options
--device: The device on which to run the computations. Follows the PyTorch semantics, so you can select which GPU to use with 'cuda:1' for example.
--dtype: The dtype of all the tensors. Can be int, double or float. The default is float, which corresponds to torch.float32.
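If you want, say, the second GPU and double precision, a minimal sketch (the output name is a placeholder) would be:
# illustrative run on the second GPU in double precision
rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
--filename output/rbm/MNIST01_double.h5 --num_updates 10000 \
--device cuda:1 --dtype double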
Example
The command I typically use to train an RBM on MNIST-01 is:
rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
--filename output/rbm/MNIST01_from_scratch.h5 --num_updates 10000 \
--n_save 50 --spacing exp --num_hiddens 20 --batch_size 2000 --num_chains 2000 \
--learning_rate 0.01 --device cuda --dtype float
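Since the archive is a standard HDF5 file, you can get a quick look at what was saved with the HDF5 command-line tools, if they are installed (the exact group layout depends on the rbms version):
# list the top-level objects stored in the training archive
h5ls output/rbm/MNIST01_from_scratch.h5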
Restore the training
If you want to continue the training of an RBM (be it one recovered from an RCM or a previously trained one), you can use the same script. The difference is that you should add the --restore flag. Also, some arguments are no longer useful and can be safely ignored:
--num_hiddens, --batch_size, --gibbs_steps, --learning_rate, --num_chains
Finally, the updates will be added to the same archive you provide as input through --filename. When the --restore flag is set, the file will not be overwritten.
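A restore run reusing the archive from the example above could then look like the following sketch (the value passed to --num_updates is illustrative):
# illustrative continuation of the previous training; the archive is updated in place
rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
--restore --filename output/rbm/MNIST01_from_scratch.h5 \
--num_updates 20000 --n_save 50 --spacing exp --device cuda --dtype float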