SNGP Transformer¶
The Spectral-normalized Gaussian Process (SNGP) transformer consists of a transformer-based feature extractor, on top of which a Gaussian Process output layer is fitted.
This idea was proposed by Liu et al. (2020) and is an instance of Deep Kernel Learning (Wilson et al., 2015).
The spectral normalization is explained further in nlp_uncertainty_zoo.models.spectral, since it is also shared by other models.
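To illustrate the shared spectral-normalization ingredient: the idea is to bound the largest singular value (spectral norm) of each weight matrix, which keeps the feature extractor approximately distance-preserving. A minimal numpy sketch, using power iteration to estimate the spectral norm (function names here are illustrative, not the library's API):

```python
import numpy as np

def spectral_norm(weight: np.ndarray, n_iter: int = 200) -> float:
    """Estimate the largest singular value of a weight matrix via power iteration."""
    u = np.random.default_rng(0).normal(size=weight.shape[0])
    v = weight.T @ u
    for _ in range(n_iter):
        v = weight.T @ u
        v /= np.linalg.norm(v)
        u = weight @ v
        u /= np.linalg.norm(u)
    return float(u @ weight @ v)

def apply_spectral_norm(weight: np.ndarray, bound: float = 0.95) -> np.ndarray:
    """Rescale the matrix so its spectral norm does not exceed `bound`."""
    sigma = spectral_norm(weight)
    if sigma > bound:
        return weight * (bound / sigma)
    return weight
```

In practice this rescaling is applied to the transformer's weight matrices after each gradient update, so the whole feature extractor stays bi-Lipschitz and distances in input space remain meaningful in feature space.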
In this module, we implement two versions:

nlp_uncertainty_zoo.models.sngp_transformer.SNGPTransformer / nlp_uncertainty_zoo.models.sngp_transformer.SNGPTransformerModule: SNGP applied to a transformer trained from scratch. See nlp_uncertainty_zoo.models.transformer for more information on how to use the Transformer model & module.

nlp_uncertainty_zoo.models.sngp_transformer.SNGPBert / nlp_uncertainty_zoo.models.sngp_transformer.SNGPBertModule: SNGP applied to a pre-trained and then fine-tuned BERT model. See nlp_uncertainty_zoo.models.bert for more information on how to use the BERT model & module.
Warning
The nlp_uncertainty_zoo.models.sngp_transformer.SNGPTransformer
/ nlp_uncertainty_zoo.models.sngp_transformer.SNGPTransformerModule
were included for completeness, but might not be very stable.
Ulmer et al. (2022) already found training to be quite unstable even with pre-trained BERT models as feature extractors, and it is likely to be even more unstable when training the underlying transformer from scratch.
All the important model logic is encapsulated in the nlp_uncertainty_zoo.models.sngp_transformer.SNGPModule class to avoid code duplication.
Since many NLP tasks involve a large number of classes, we use the approximation detailed in Appendix A.1 of the paper.
To be able to compute uncertainty metrics like nlp_uncertainty_zoo.utils.metrics.mutual_information(), we choose not to use the mean-field approximation of the posterior in Equation (7).
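The reason is that the mean-field approximation collapses the posterior into a single softmax, whereas mutual information needs a distribution over predictions. A hedged sketch of how such a metric can be computed from posterior logit samples instead (the function here is illustrative, not the library's implementation):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def mutual_information(logit_samples: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """MI between prediction and posterior: H[mean of p] - mean of H[p].

    logit_samples: array of shape (n_samples, batch, n_classes) containing
    logits drawn from the GP posterior.
    """
    probs = softmax(logit_samples)
    mean_probs = probs.mean(axis=0)
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(-1)
    expected_entropy = -(probs * np.log(probs + eps)).sum(-1).mean(0)
    return predictive_entropy - expected_entropy
```

When all samples agree, the two entropy terms coincide and the mutual information is zero; disagreement between samples (epistemic uncertainty) drives it up.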