Bayesian LSTM

The Bayesian LSTM is based on the concept of Bayes-by-backprop, introduced by Blundell et al. (2015) and applied to recurrent networks by Fortunato et al. (2017). The idea is that instead of learning one single value per parameter, we learn a normal distribution over parameter values (so we actually learn two quantities per network parameter: its mean and its variance). During inference, we sample one parameter set from these distributions per forward pass to make a prediction.
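As a minimal plain-Python sketch of this idea (an illustration, not the package's actual implementation): each weight is parameterized by a mean mu and an unconstrained parameter rho, the standard deviation is obtained via a softplus to keep it positive, and every forward pass draws a fresh weight sample via the reparameterization trick.

```python
import math
import random

def softplus(rho):
    """Map the unconstrained parameter rho to a positive standard deviation."""
    return math.log1p(math.exp(rho))

def sample_weight(mu, rho):
    """Reparameterization trick: w = mu + sigma * eps, with eps ~ N(0, 1)."""
    eps = random.gauss(0.0, 1.0)
    return mu + softplus(rho) * eps

# Two learned quantities per network parameter: its mean and (via rho) its spread.
# The values below match the module's default posterior initialization
# (posterior_mu_init=-0.04, posterior_rho_init=-6).
mu, rho = -0.04, -6.0
samples = [sample_weight(mu, rho) for _ in range(5)]
# With rho = -6, sigma = softplus(-6) is roughly 0.0025, so samples stay close to mu.
```

Training then adjusts mu and rho for every weight so that the sampled networks both fit the data and stay close to the prior.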

In this case, we implement the Bayesian LSTM using the Blitz package.

Bayesian LSTM Module Documentation

Implements the Bayes-by-backprop LSTM by Fortunato et al. (2017).

class nlp_uncertainty_zoo.models.bayesian_lstm.BayesianLSTM(vocab_size: int, output_size: int, input_size: int = 650, hidden_size: int = 650, num_layers: int = 2, dropout: float = 0.3, prior_sigma_1: float = 0.7, prior_sigma_2: float = 0.8, prior_pi: float = 0.1, posterior_mu_init: float = -0.04, posterior_rho_init: float = -6, num_predictions: int = 10, is_sequence_classifier: bool = True, lr: float = 0.1, weight_decay: float = 0.001, optimizer_class: ~torch.optim.optimizer.Optimizer = <class 'torch.optim.adam.Adam'>, model_dir: ~typing.Optional[str] = None, device: ~typing.Union[~torch.device, str] = 'cpu', **model_params)

Bases: Model
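The prior_sigma_1, prior_sigma_2, and prior_pi arguments correspond to the scale-mixture prior of Blundell et al. (2015): each weight's prior is a mixture of two zero-mean Gaussians. A hedged plain-Python sketch of that prior's log density, using the constructor's default values (the package computes this internally via Blitz):

```python
import math

def gaussian_pdf(w, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma at w."""
    return math.exp(-w * w / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def scale_mixture_log_prior(w, pi=0.1, sigma_1=0.7, sigma_2=0.8):
    """Log density of pi * N(0, sigma_1^2) + (1 - pi) * N(0, sigma_2^2).

    Defaults match the constructor: prior_pi=0.1, prior_sigma_1=0.7,
    prior_sigma_2=0.8.
    """
    return math.log(pi * gaussian_pdf(w, sigma_1) + (1 - pi) * gaussian_pdf(w, sigma_2))

# The log prior is evaluated per sampled weight and enters the variational (ELBO) loss.
log_p = scale_mixture_log_prior(0.0)
```

Weights near zero get the highest prior density, so the mixture acts as a soft regularizer on the sampled networks.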

class nlp_uncertainty_zoo.models.bayesian_lstm.BayesianLSTMModule(vocab_size: int, output_size: int, input_size: int, hidden_size: int, num_layers: int, dropout: float, prior_sigma_1: float, prior_sigma_2: float, prior_pi: float, posterior_mu_init: float, posterior_rho_init: float, num_predictions: int, is_sequence_classifier: bool, device: Union[device, str], **build_params)

Bases: LSTMModule, MultiPredictionMixin

Implementation of a Bayes-by-backprop LSTM by Fortunato et al. (2017).

get_logits(input_: LongTensor, *args, num_predictions: Optional[int] = None, **kwargs) -> FloatTensor

Get the logits for an input. Results in a tensor of size batch_size x seq_len x output_size or batch_size x num_predictions x seq_len x output_size depending on the model type. Used to create inputs for the uncertainty metrics defined in nlp_uncertainty_zoo.metrics.

Parameters:
input_: torch.LongTensor

(Batch of) Indexed input sequences.

num_predictions: Optional[int]

Number of predictions (forward passes) used to make predictions.

Returns:
torch.FloatTensor

Logits for the current input.
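To illustrate the multi-prediction tensor shape returned here (a toy sketch with a stand-in model, not the package's code): logits from num_predictions stochastic forward passes are stacked along an extra dimension.

```python
import torch

def mc_logits(model, input_, num_predictions):
    """Stack logits from num_predictions stochastic forward passes:
    (batch_size, seq_len, output_size) -> (batch_size, num_predictions, seq_len, output_size)."""
    preds = [model(input_) for _ in range(num_predictions)]
    return torch.stack(preds, dim=1)

# Toy stand-in for a Bayesian model: weight noise is freshly sampled on every call,
# so repeated forward passes over the same input give different logits.
class NoisyToyModel(torch.nn.Module):
    def __init__(self, vocab_size, output_size, hidden_size=8):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, hidden_size)
        self.out = torch.nn.Linear(hidden_size, output_size)

    def forward(self, input_):
        hidden = self.embed(input_)
        # Simulate weight sampling by perturbing the output weights each pass.
        noisy_weight = self.out.weight + 0.01 * torch.randn_like(self.out.weight)
        return hidden @ noisy_weight.T + self.out.bias

model = NoisyToyModel(vocab_size=100, output_size=5)
input_ = torch.randint(0, 100, (4, 12))            # batch_size=4, seq_len=12
logits = mc_logits(model, input_, num_predictions=10)
# logits.shape == (4, 10, 12, 5)
```

The resulting four-dimensional tensor is exactly the shape the uncertainty metrics in nlp_uncertainty_zoo.metrics expect for multi-prediction models.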

predict(input_: LongTensor, *args, **kwargs) -> FloatTensor

Output a probability distribution over classes given an input. Results in a tensor of size batch_size x seq_len x output_size or batch_size x num_predictions x seq_len x output_size depending on the model type.

Parameters:
input_: torch.LongTensor

(Batch of) Indexed input sequences.

Returns:
torch.FloatTensor

Probability distribution over classes for the current input.
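Conceptually, predict turns the sampled logits into class probabilities by averaging the per-pass distributions. A plain-Python sketch of that averaging for a single token position (an illustration, not the package's exact code):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mc_average_probs(logit_samples):
    """Average the per-pass class distributions over the sampled forward passes."""
    dists = [softmax(sample) for sample in logit_samples]
    num = len(dists)
    return [sum(d[i] for d in dists) / num for i in range(len(dists[0]))]

# Three stochastic forward passes over four classes for one token.
samples = [[2.0, 0.5, -1.0, 0.1], [1.8, 0.7, -0.9, 0.0], [2.2, 0.4, -1.2, 0.2]]
probs = mc_average_probs(samples)
# probs is a valid distribution: entries are positive and sum to 1.
```

The spread between the individual per-pass distributions, rather than the average alone, is what the uncertainty metrics exploit.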

training: bool