Variational Transformer

The variational transformer is a Bayesian variant of the transformer that produces multiple different predictions by sampling different dropout masks. Dropout (Srivastava et al., 2014) is a regularization technique that randomly sets connections between neurons in a neural network to zero during training in order to avoid co-adaptation. Importantly, this technique is normally disabled during inference. Gal & Ghahramani (2016a) propose to keep dropout enabled during inference as well in order to approximate the weight posterior of neural networks. In follow-up work, Gal & Ghahramani (2016b) apply this technique to recurrent neural networks, and Xiao et al. (2020) apply it to transformer architectures.
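
To make the idea concrete, here is a minimal, hedged sketch of MC Dropout inference in plain PyTorch; it is not the API of this module, and the toy model stands in for a transformer. The model is kept in evaluation mode, only its dropout layers are switched back to training mode, and the predictive distribution is approximated by averaging softmax outputs over several stochastic forward passes:

```python
import torch
import torch.nn as nn

# Toy classifier with a dropout layer; stands in for a transformer here.
model = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(64, 3),
)
model.eval()  # Evaluation mode disables dropout by default.

# MC Dropout: put only the dropout layers back into training mode,
# so each forward pass samples a fresh dropout mask.
for module in model.modules():
    if isinstance(module, nn.Dropout):
        module.train()

x = torch.randn(8, 16)  # Batch of 8 dummy inputs.
num_samples = 10        # Number of stochastic forward passes.

with torch.no_grad():
    # Each pass uses a different dropout mask, yielding a different prediction.
    probs = torch.stack(
        [torch.softmax(model(x), dim=-1) for _ in range(num_samples)]
    )

mean_probs = probs.mean(dim=0)  # Approximate predictive distribution.
uncertainty = probs.var(dim=0)  # Disagreement across masks as uncertainty.
```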

Warning

In Xiao et al. (2020), it is not fully specified whether MC Dropout is applied to all available dropout layers. We opted to apply it to all of them, and found encouraging results (Ulmer et al., 2022).

In this module, we implement two versions:

  • nlp_uncertainty_zoo.models.variational_transformer.VariationalTransformer / nlp_uncertainty_zoo.models.variational_transformer.VariationalTransformerModule: MC Dropout applied to a transformer trained from scratch. See nlp_uncertainty_zoo.models.transformer for more information on how to use the Transformer model & module.

  • nlp_uncertainty_zoo.models.variational_transformer.VariationalBert / nlp_uncertainty_zoo.models.variational_transformer.VariationalBertModule: MC Dropout applied to a pre-trained and then fine-tuned BERT model. See nlp_uncertainty_zoo.models.bert for more information on how to use the Bert model & module. A conceptual sketch of this variant is shown after this list.
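
The following sketch illustrates what the second variant does conceptually, using the Hugging Face transformers API directly rather than the wrapper classes above; the checkpoint name is chosen purely for illustration, and the actual VariationalBert implementation may differ in details:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Any fine-tuned BERT checkpoint works; "bert-base-uncased" is used here
# purely for illustration (its classification head is not fine-tuned).
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
model.eval()

# MC Dropout: re-activate every available dropout layer at inference time,
# matching the choice described in the warning above.
for module in model.modules():
    if isinstance(module, torch.nn.Dropout):
        module.train()

inputs = tokenizer("An example sentence.", return_tensors="pt")

with torch.no_grad():
    probs = torch.stack(
        [
            torch.softmax(model(**inputs).logits, dim=-1)
            for _ in range(10)  # 10 stochastic forward passes.
        ]
    )

mean_probs = probs.mean(dim=0)  # Approximate predictive distribution.
```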

The application of MC Dropout to LSTMs can be found in nlp_uncertainty_zoo.models.variational_lstm.

Variational Transformer Module Documentation