Bayesian LSTM

The Bayesian LSTM is based on the concept of Bayes-by-backprop, introduced by Blundell et al. (2015) and applied to recurrent networks by Fortunato et al. (2017). The idea is that instead of learning one single value per parameter, we learn a normal distribution over parameter values (so we actually learn two quantities per network parameter: its mean and its variance). During inference, we sample one parameter set from these distributions per forward pass to make a prediction.
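As a minimal plain-Python sketch of this idea (an illustration, not the package's actual implementation): each weight is parameterized by a mean mu and an unconstrained parameter rho, the standard deviation is obtained via a softplus to keep it positive, and every forward pass draws a fresh weight sample via the reparameterization trick.

```python
import math
import random

def softplus(rho):
    """Map the unconstrained parameter rho to a positive standard deviation."""
    return math.log1p(math.exp(rho))

def sample_weight(mu, rho):
    """Reparameterization trick: w = mu + sigma * eps, with eps ~ N(0, 1)."""
    eps = random.gauss(0.0, 1.0)
    return mu + softplus(rho) * eps

# Two learned quantities per network parameter: its mean and (via rho) its spread.
# The values below match the module's default posterior initialization
# (posterior_mu_init=-0.04, posterior_rho_init=-6).
mu, rho = -0.04, -6.0
samples = [sample_weight(mu, rho) for _ in range(5)]
# With rho = -6, sigma = softplus(-6) is roughly 0.0025, so samples stay close to mu.
```

Training then adjusts mu and rho for every weight so that the sampled networks both fit the data and stay close to the prior.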

In this case, we implement the Bayesian LSTM using the Blitz package.

Bayesian LSTM Module Documentation

Implements the Bayes-by-backprop LSTM by Fortunato et al. (2017).

class nlp_uncertainty_zoo.models.bayesian_lstm.BayesianLSTM(vocab_size: int, output_size: int, input_size: int = 650, hidden_size: int = 650, num_layers: int = 2, dropout: float = 0.3, prior_sigma_1: float = 0.7, prior_sigma_2: float = 0.8, prior_pi: float = 0.1, posterior_mu_init: float = -0.04, posterior_rho_init: float = -6, num_predictions: int = 10, is_sequence_classifier: bool = True, lr: float = 0.1, weight_decay: float = 0.001, optimizer_class: ~torch.optim.optimizer.Optimizer = <class 'torch.optim.adam.Adam'>, model_dir: ~typing.Optional[str] = None, device: ~typing.Union[~torch.device, str] = 'cpu', **model_params)

Bases: Model
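The prior_sigma_1, prior_sigma_2, and prior_pi arguments correspond to the scale-mixture prior of Blundell et al. (2015): each weight's prior is a mixture of two zero-mean Gaussians. A hedged plain-Python sketch of that prior's log density, using the constructor's default values (the package computes this internally via Blitz):

```python
import math

def gaussian_pdf(w, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma at w."""
    return math.exp(-w * w / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def scale_mixture_log_prior(w, pi=0.1, sigma_1=0.7, sigma_2=0.8):
    """Log density of pi * N(0, sigma_1^2) + (1 - pi) * N(0, sigma_2^2).

    Defaults match the constructor: prior_pi=0.1, prior_sigma_1=0.7,
    prior_sigma_2=0.8.
    """
    return math.log(pi * gaussian_pdf(w, sigma_1) + (1 - pi) * gaussian_pdf(w, sigma_2))

# The log prior is evaluated per sampled weight and enters the variational (ELBO) loss.
log_p = scale_mixture_log_prior(0.0)
```

Weights near zero get the highest prior density, so the mixture acts as a soft regularizer on the sampled networks.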

class nlp_uncertainty_zoo.models.bayesian_lstm.BayesianLSTMModule(vocab_size: int, output_size: int, input_size: int, hidden_size: int, num_layers: int, dropout: float, prior_sigma_1: float, prior_sigma_2: float, prior_pi: float, posterior_mu_init: float, posterior_rho_init: float, num_predictions: int, is_sequence_classifier: bool, device: Union[device, str], **build_params)

Bases: LSTMModule, MultiPredictionMixin

Implementation of a Bayes-by-backprop LSTM by Fortunato et al. (2017).

get_logits(input_: LongTensor, *args, num_predictions: Optional[int] = None, **kwargs) -> FloatTensor

Get the logits for an input. Results in a tensor of size batch_size x seq_len x output_size or batch_size x num_predictions x seq_len x output_size depending on the model type. Used to create inputs for the uncertainty metrics defined in nlp_uncertainty_zoo.metrics.

Parameters:
input_: torch.LongTensor

(Batch of) Indexed input sequences.

num_predictions: Optional[int]

Number of predictions (forward passes) used to make predictions.

Returns:
torch.FloatTensor

Logits for the current input.
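To illustrate the multi-prediction tensor shape returned here (a toy sketch with a stand-in model, not the package's code): logits from num_predictions stochastic forward passes are stacked along an extra dimension.

```python
import torch

def mc_logits(model, input_, num_predictions):
    """Stack logits from num_predictions stochastic forward passes:
    (batch_size, seq_len, output_size) -> (batch_size, num_predictions, seq_len, output_size)."""
    preds = [model(input_) for _ in range(num_predictions)]
    return torch.stack(preds, dim=1)

# Toy stand-in for a Bayesian model: weight noise is freshly sampled on every call,
# so repeated forward passes over the same input give different logits.
class NoisyToyModel(torch.nn.Module):
    def __init__(self, vocab_size, output_size, hidden_size=8):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, hidden_size)
        self.out = torch.nn.Linear(hidden_size, output_size)

    def forward(self, input_):
        hidden = self.embed(input_)
        # Simulate weight sampling by perturbing the output weights each pass.
        noisy_weight = self.out.weight + 0.01 * torch.randn_like(self.out.weight)
        return hidden @ noisy_weight.T + self.out.bias

model = NoisyToyModel(vocab_size=100, output_size=5)
input_ = torch.randint(0, 100, (4, 12))            # batch_size=4, seq_len=12
logits = mc_logits(model, input_, num_predictions=10)
# logits.shape == (4, 10, 12, 5)
```

The resulting four-dimensional tensor is exactly the shape the uncertainty metrics in nlp_uncertainty_zoo.metrics expect for multi-prediction models.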

predict(input_: LongTensor, *args, **kwargs) -> FloatTensor

Output a probability distribution over classes given an input. Results in a tensor of size batch_size x seq_len x output_size or batch_size x num_predictions x seq_len x output_size depending on the model type.

Parameters:
input_: torch.LongTensor

(Batch of) Indexed input sequences.

Returns:
torch.FloatTensor

Probability distribution over classes for the current input.
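Conceptually, predict turns the sampled logits into class probabilities by averaging the per-pass distributions. A plain-Python sketch of that averaging for a single token position (an illustration, not the package's exact code):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mc_average_probs(logit_samples):
    """Average the per-pass class distributions over the sampled forward passes."""
    dists = [softmax(sample) for sample in logit_samples]
    num = len(dists)
    return [sum(d[i] for d in dists) / num for i in range(len(dists[0]))]

# Three stochastic forward passes over four classes for one token.
samples = [[2.0, 0.5, -1.0, 0.1], [1.8, 0.7, -0.9, 0.0], [2.2, 0.4, -1.2, 0.2]]
probs = mc_average_probs(samples)
# probs is a valid distribution: entries are positive and sum to 1.
```

The spread between the individual per-pass distributions, rather than the average alone, is what the uncertainty metrics exploit.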

training: bool