# Train an RBM

To train an RBM, use the `rbms train` script.

## RBM hyperparameters

- `--num_hiddens` Number of hidden nodes of the RBM. Setting it to $20$ or fewer makes it possible to compute the exact log-likelihood of the model by enumerating all hidden configurations.
- `--batch_size` Batch size, defaults to $2000$. The batch size affects the noise in the estimation of the positive term of the gradient. Setting it too low can lead to a very poor estimate and bad training, while setting it too high gives an almost exact gradient, losing the benefits of SGD (e.g. remaining trapped in a local minimum).
- `--num_chains` Number of parallel chains, defaults to $2000$. Setting it much higher than the batch size does not provide benefits, since it only affects the estimation of the negative term of the gradient.
- `--gibbs_steps` Number of sampling steps performed at each gradient update; the $k$ in PCD-$k$.
- `--learning_rate` Learning rate. Defaults to $0.01$; a larger learning rate often leads to instability.
- `--num_updates` Training time is indexed by the number of gradient updates performed, not by the number of epochs.
- `--beta` The inverse temperature used during training. Defaults to $1$ and should not be changed.

## Save options

- `--filename` Path to the HDF5 archive in which the RBM is saved during training. Any previously existing file at this path will be overwritten.
- `--n_save` Number of machines to save during training.
- `--spacing` Can be `exp` or `linear`, defaults to `exp`. With `exp`, the time between two saved models grows exponentially (it looks good in log scale). With `linear`, the time between two saved models is constant. Saving many models can quickly become the computational bottleneck and lead to long execution times.
- `--log` Deprecated for now, so you can ignore it.
- `--acc_ptt` Target acceptance rate, defaults to $0.25$. A model is saved when the acceptance rate between two consecutive saved models, when sampling them with PTT, drops below this threshold.
- `--acc_ll` Same as above but defaults to $0.75$. This allows two different saving schemes to be used at the same time.

## PyTorch options

- `--device` The device on which to run the computations. Follows the PyTorch semantics, so you can select which GPU to use with `cuda:1` for example.
- `--dtype` The dtype of all tensors. Can be `int`, `double` or `float`. The default is `float`, which corresponds to `torch.float32`.

## Example

The command I typically use to train an RBM on `MNIST-01` is

```bash
rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
    --filename output/rbm/MNIST01_from_scratch.h5 --num_updates 10000 \
    --n_save 50 --spacing exp --num_hiddens 20 --batch_size 2000 --num_chains 2000 \
    --learning_rate 0.01 --device cuda --dtype float
```

# Restore the training

If you want to continue the training of an RBM (be it one recovered from an RCM or a previously trained one), you can use the same script. The difference is that you should add the `--restore` flag. Some arguments are no longer useful and can be safely ignored:

- `--num_hiddens`
- `--batch_size`
- `--gibbs_steps`
- `--learning_rate`
- `--num_chains`

Finally, the updates will be appended to the same archive you provide as input through `--filename`. When the `--restore` flag is set, the file will **not** be overwritten. An example restore command is sketched below.
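As an illustration, continuing the training of the RBM from the example above could look like the sketch below. It reuses only flags documented on this page and is not a verbatim command from the package; in particular, whether `--num_updates` counts the total or the additional number of updates depends on the implementation, so check the help of the `rbms train` script for your version.

```bash
# Sketch: continue training a previously saved RBM (flags taken from this page).
# Adjust the dataset path and the number of updates to your setup.
rbms train -d ./path/to/MNIST.h5 --subset_labels 0 1 \
    --filename output/rbm/MNIST01_from_scratch.h5 --restore \
    --num_updates 20000 --device cuda --dtype float
```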
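If you want a quick look at which machines were actually saved, the output archive is a regular HDF5 file, so you can list its contents with the standard HDF5 command-line tools (assuming `h5ls` is installed; the exact group layout inside the archive depends on the `rbms` version):

```bash
# Recursively list the groups and datasets stored in the training archive.
h5ls -r output/rbm/MNIST01_from_scratch.h5
```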